8i | 9i | 10g | 11g | 12c | 13c | 18c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Home » Articles » Misc » Here

VARIANCE, VAR_POP and VAR_SAMP Analytic Functions

This article gives an overview of the VARIANCE, VAR_POP and VAR_SAMP analytic functions. If you are new to analytic functions you should probably read this introduction to analytic functions first.

Related articles.

Setup

The examples in this article require the following table.

--DROP TABLE emp PURGE;

CREATE TABLE emp (
  empno    NUMBER(4) CONSTRAINT pk_emp PRIMARY KEY,
  ename    VARCHAR2(10),
  job      VARCHAR2(9),
  mgr      NUMBER(4),
  hiredate DATE,
  sal      NUMBER(7,2),
  comm     NUMBER(7,2),
  deptno   NUMBER(2)
);

INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
INSERT INTO emp VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
INSERT INTO emp VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
INSERT INTO emp VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
INSERT INTO emp VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
INSERT INTO emp VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
INSERT INTO emp VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
INSERT INTO emp VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
INSERT INTO emp VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
INSERT INTO emp VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);
INSERT INTO emp VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
INSERT INTO emp VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
INSERT INTO emp VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
COMMIT;

VARIANCE, VAR_POP and VAR_SAMP as Aggregate Functions

The VARIANCE, VAR_POP and VAR_SAMP aggregate functions are used to calculate the variance, population variance and sample variance of a set of data respectively. As aggregate functions they reduce the number of rows, hence the term "aggregate". If the data isn't grouped we turn the 14 rows in the EMP table to a single row with the aggregated values.

SELECT VARIANCE(sal) AS variance_sal,
       VAR_POP(sal) AS var_pop_sal,
       VAR_SAMP(sal) AS var_samp_sal
FROM   emp;

VARIANCE_SAL VAR_POP_SAL VAR_SAMP_SAL
------------ ----------- ------------
  1398313.87  1298434.31   1398313.87

SQL>

We can get more granularity of information by including a GROUP BY clause. In the following example we see the values on a per-department basis.

SELECT deptno,
       VARIANCE(sal) AS variance_sal,
       VAR_POP(sal) AS var_pop_sal,
       VAR_SAMP(sal) AS var_samp_sal
FROM   emp
GROUP BY deptno
ORDER BY deptno;

    DEPTNO VARIANCE_SAL VAR_POP_SAL VAR_SAMP_SAL
---------- ------------ ----------- ------------
        10   3585833.33  2390555.56   3585833.33
        20      1261875     1009500      1261875
        30   446666.667  372222.222   446666.667

SQL>

In both cases we have aggregated the data to get the values, returning less rows than we started with. Analytic functions allow us to return these aggregate values while retaining the original row data.

VARIANCE Analytic Function

If there is more than one record in the sample after discarding nulls, the VARIANCE function returns the result of the VAR_SAMP function, the variance of the sample. If there is only a single row in the sample after discarding nulls, the VARIANCE function returns the value "0". If there are no records in the set after discarding nulls, the return value is NULL.

The basic description for the VARIANCE analytic function is shown below. The order_by_clause and windowing_clause are not allowed. The analytic clause is described in more detail here.

VARIANCE([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]

Using an empty OVER clause turns the VARIANCE function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the variance of the salary for all employees, as well as all the original data.

SELECT empno,
       ename,
       deptno,
       sal,
       VARIANCE(sal) OVER () AS variance_sal
FROM   emp
ORDER BY deptno;

     EMPNO ENAME          DEPTNO        SAL VARIANCE_SAL
---------- ---------- ---------- ---------- ------------
      7782 CLARK              10       2450   1398313.87
      7839 KING               10       5000   1398313.87
      7934 MILLER             10       1300   1398313.87
      7566 JONES              20       2975   1398313.87
      7902 FORD               20       3000   1398313.87
      7876 ADAMS              20       1100   1398313.87
      7369 SMITH              20        800   1398313.87
      7788 SCOTT              20       3000   1398313.87
      7521 WARD               30       1250   1398313.87
      7844 TURNER             30       1500   1398313.87
      7499 ALLEN              30       1600   1398313.87
      7900 JAMES              30        950   1398313.87
      7698 BLAKE              30       2850   1398313.87
      7654 MARTIN             30       1250   1398313.87

SQL>

Adding the partitioning clause allows us to display the variance of the salary per department, along with the employee data for each department.

SELECT empno,
       ename,
       deptno,
       sal,
       VARIANCE(sal) OVER (PARTITION BY deptno) AS variance_sal_by_dept
FROM   emp;

     EMPNO ENAME          DEPTNO        SAL VARIANCE_SAL_BY_DEPT
---------- ---------- ---------- ---------- --------------------
      7782 CLARK              10       2450           3585833.33
      7839 KING               10       5000           3585833.33
      7934 MILLER             10       1300           3585833.33
      7566 JONES              20       2975              1261875
      7902 FORD               20       3000              1261875
      7876 ADAMS              20       1100              1261875
      7369 SMITH              20        800              1261875
      7788 SCOTT              20       3000              1261875
      7521 WARD               30       1250           446666.667
      7844 TURNER             30       1500           446666.667
      7499 ALLEN              30       1600           446666.667
      7900 JAMES              30        950           446666.667
      7698 BLAKE              30       2850           446666.667
      7654 MARTIN             30       1250           446666.667

SQL>

VAR_POP Analytic Function

The VAR_POP function returns the population variance after discarding nulls. The function performs the following calculation.

SUM((expr - (SUM(expr) / COUNT(expr)))2) / COUNT(expr)

The basic description for the VAR_POP analytic function is shown below. The analytic clause is described in more detail here.

VAR_POP(expr) [ OVER (analytic_clause) ]

Using an empty OVER clause turns the VAR_POP function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the population variance of the salary for all employees after discarding nulls, as well as all the original data.

SELECT empno,
       ename,
       deptno,
       sal,
       VAR_POP(sal) OVER () AS var_pop_sal
FROM   emp;

     EMPNO ENAME          DEPTNO        SAL VAR_POP_SAL
---------- ---------- ---------- ---------- -----------
      7369 SMITH              20        800  1298434.31
      7499 ALLEN              30       1600  1298434.31
      7521 WARD               30       1250  1298434.31
      7566 JONES              20       2975  1298434.31
      7654 MARTIN             30       1250  1298434.31
      7698 BLAKE              30       2850  1298434.31
      7782 CLARK              10       2450  1298434.31
      7788 SCOTT              20       3000  1298434.31
      7839 KING               10       5000  1298434.31
      7844 TURNER             30       1500  1298434.31
      7876 ADAMS              20       1100  1298434.31
      7900 JAMES              30        950  1298434.31
      7902 FORD               20       3000  1298434.31
      7934 MILLER             10       1300  1298434.31

SQL>

Adding the partitioning clause allows us to display the population variance of the salary per department, along with the employee data for each department.

SELECT empno,
       ename,
       deptno,
       sal,
       VAR_POP(sal) OVER (PARTITION BY deptno) AS var_pop_by_dept
FROM   emp;

     EMPNO ENAME          DEPTNO        SAL VAR_POP_BY_DEPT
---------- ---------- ---------- ---------- ---------------
      7782 CLARK              10       2450      2390555.56
      7839 KING               10       5000      2390555.56
      7934 MILLER             10       1300      2390555.56
      7566 JONES              20       2975         1009500
      7902 FORD               20       3000         1009500
      7876 ADAMS              20       1100         1009500
      7369 SMITH              20        800         1009500
      7788 SCOTT              20       3000         1009500
      7521 WARD               30       1250      372222.222
      7844 TURNER             30       1500      372222.222
      7499 ALLEN              30       1600      372222.222
      7900 JAMES              30        950      372222.222
      7698 BLAKE              30       2850      372222.222
      7654 MARTIN             30       1250      372222.222

SQL>

VAR_SAMP Analytic Function

The VAR_SAMP function returns the variance of the sample after discarding nulls. If there are no records in the set after discarding nulls, the return value is NULL. The function performs the following calculation.

(SUM(expr - (SUM(expr) / COUNT(expr)))2) / (COUNT(expr) - 1)

The basic description for the VAR_SAMP analytic function is shown below. The analytic clause is described in more detail here.

VAR_SAMP(expr) [ OVER (analytic_clause) ]

Using an empty OVER clause turns the VAR_SAMP function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the sample variance of the salary for all employees after discarding nulls, as well as all the original data.

SELECT empno,
       ename,
       deptno,
       sal,
       VAR_SAMP(sal) OVER () AS var_samp_sal
FROM   emp;

     EMPNO ENAME          DEPTNO        SAL VAR_SAMP_SAL
---------- ---------- ---------- ---------- ------------
      7369 SMITH              20        800   1398313.87
      7499 ALLEN              30       1600   1398313.87
      7521 WARD               30       1250   1398313.87
      7566 JONES              20       2975   1398313.87
      7654 MARTIN             30       1250   1398313.87
      7698 BLAKE              30       2850   1398313.87
      7782 CLARK              10       2450   1398313.87
      7788 SCOTT              20       3000   1398313.87
      7839 KING               10       5000   1398313.87
      7844 TURNER             30       1500   1398313.87
      7876 ADAMS              20       1100   1398313.87
      7900 JAMES              30        950   1398313.87
      7902 FORD               20       3000   1398313.87
      7934 MILLER             10       1300   1398313.87

SQL>

Adding the partitioning clause allows us to display the sample variance of the salary per department, along with the employee data for each department.

SELECT empno,
       ename,
       deptno,
       sal,
       VAR_SAMP(sal) OVER (PARTITION BY deptno) AS var_samp_by_dept
FROM   emp;

     EMPNO ENAME          DEPTNO        SAL VAR_SAMP_BY_DEPT
---------- ---------- ---------- ---------- ----------------
      7782 CLARK              10       2450       3585833.33
      7839 KING               10       5000       3585833.33
      7934 MILLER             10       1300       3585833.33
      7566 JONES              20       2975          1261875
      7902 FORD               20       3000          1261875
      7876 ADAMS              20       1100          1261875
      7369 SMITH              20        800          1261875
      7788 SCOTT              20       3000          1261875
      7521 WARD               30       1250       446666.667
      7844 TURNER             30       1500       446666.667
      7499 ALLEN              30       1600       446666.667
      7900 JAMES              30        950       446666.667
      7698 BLAKE              30       2850       446666.667
      7654 MARTIN             30       1250       446666.667

SQL>

Quick Links

The "*" indicates the function supports the full analytic syntax, including the windowing clause.

AVG * CLUSTER_DETAILS CLUSTER_DISTANCE CLUSTER_ID CLUSTER_PROBABILITY
CLUSTER_SET CORR * COUNT * COVAR_POP * COVAR_SAMP *
CUME_DIST DENSE_RANK FEATURE_DETAILS FEATURE_ID FEATURE_SET
FEATURE_VALUE FIRST FIRST_VALUE * LAG LAST
LAST_VALUE * LEAD LISTAGG MATCH_RECOGNIZE MAX *
MEDIAN MIN * NTH_VALUE * NTILE PERCENT_RANK
PERCENTILE_CONT PERCENTILE_DISC PREDICTION PREDICTION_COST PREDICTION_DETAILS
PREDICTION_PROBABILITY PREDICTION_SET RANK RATIO_TO_REPORT REGR_ (Linear Regression) Functions *
ROW_NUMBER STDDEV * STDDEV_POP * STDDEV_SAMP * SUM *
VAR_POP * VAR_SAMP * VARIANCE * String Aggregation Top-N Queries

For more information see:

Hope this helps. Regards Tim...

Back to the Top.