8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
VARIANCE, VAR_POP and VAR_SAMP Analytic Functions
This article gives an overview of the VARIANCE
, VAR_POP
and VAR_SAMP
analytic functions. If you are new to analytic functions you should probably read this introduction to analytic functions first.
- Setup
- VARIANCE, VAR_POP and VAR_SAMP as Aggregate Functions
- VARIANCE Analytic Function
- VAR_POP Analytic Function
- VAR_SAMP Analytic Function
- Quick Links
Related articles.
Setup
The examples in this article require the following table.
--DROP TABLE emp PURGE; CREATE TABLE emp ( empno NUMBER(4) CONSTRAINT pk_emp PRIMARY KEY, ename VARCHAR2(10), job VARCHAR2(9), mgr NUMBER(4), hiredate DATE, sal NUMBER(7,2), comm NUMBER(7,2), deptno NUMBER(2) ); INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20); INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30); INSERT INTO emp VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30); INSERT INTO emp VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20); INSERT INTO emp VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30); INSERT INTO emp VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30); INSERT INTO emp VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10); INSERT INTO emp VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20); INSERT INTO emp VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10); INSERT INTO emp VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30); INSERT INTO emp VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20); INSERT INTO emp VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30); INSERT INTO emp VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20); INSERT INTO emp VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10); COMMIT;
VARIANCE, VAR_POP and VAR_SAMP as Aggregate Functions
The VARIANCE
, VAR_POP
and VAR_SAMP
aggregate functions are used to calculate the variance, population variance and sample variance of a set of data respectively. As aggregate functions they reduce the number of rows, hence the term "aggregate". If the data isn't grouped we turn the 14 rows in the EMP
table to a single row with the aggregated values.
SELECT VARIANCE(sal) AS variance_sal, VAR_POP(sal) AS var_pop_sal, VAR_SAMP(sal) AS var_samp_sal FROM emp; VARIANCE_SAL VAR_POP_SAL VAR_SAMP_SAL ------------ ----------- ------------ 1398313.87 1298434.31 1398313.87 SQL>
We can get more granularity of information by including a GROUP BY
clause. In the following example we see the values on a per-department basis.
SELECT deptno, VARIANCE(sal) AS variance_sal, VAR_POP(sal) AS var_pop_sal, VAR_SAMP(sal) AS var_samp_sal FROM emp GROUP BY deptno ORDER BY deptno; DEPTNO VARIANCE_SAL VAR_POP_SAL VAR_SAMP_SAL ---------- ------------ ----------- ------------ 10 3585833.33 2390555.56 3585833.33 20 1261875 1009500 1261875 30 446666.667 372222.222 446666.667 SQL>
In both cases we have aggregated the data to get the values, returning less rows than we started with. Analytic functions allow us to return these aggregate values while retaining the original row data.
VARIANCE Analytic Function
If there is more than one record in the sample after discarding nulls, the VARIANCE
function returns the result of the VAR_SAMP
function, the variance of the sample. If there is only a single row in the sample after discarding nulls, the VARIANCE
function returns the value "0". If there are no records in the set after discarding nulls, the return value is NULL.
The basic description for the VARIANCE
analytic function is shown below. The order_by_clause and windowing_clause are not allowed. The analytic clause is described in more detail here.
VARIANCE([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]
Using an empty OVER
clause turns the VARIANCE
function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the variance of the salary for all employees, as well as all the original data.
SELECT empno, ename, deptno, sal, VARIANCE(sal) OVER () AS variance_sal FROM emp ORDER BY deptno; EMPNO ENAME DEPTNO SAL VARIANCE_SAL ---------- ---------- ---------- ---------- ------------ 7782 CLARK 10 2450 1398313.87 7839 KING 10 5000 1398313.87 7934 MILLER 10 1300 1398313.87 7566 JONES 20 2975 1398313.87 7902 FORD 20 3000 1398313.87 7876 ADAMS 20 1100 1398313.87 7369 SMITH 20 800 1398313.87 7788 SCOTT 20 3000 1398313.87 7521 WARD 30 1250 1398313.87 7844 TURNER 30 1500 1398313.87 7499 ALLEN 30 1600 1398313.87 7900 JAMES 30 950 1398313.87 7698 BLAKE 30 2850 1398313.87 7654 MARTIN 30 1250 1398313.87 SQL>
Adding the partitioning clause allows us to display the variance of the salary per department, along with the employee data for each department.
SELECT empno, ename, deptno, sal, VARIANCE(sal) OVER (PARTITION BY deptno) AS variance_sal_by_dept FROM emp; EMPNO ENAME DEPTNO SAL VARIANCE_SAL_BY_DEPT ---------- ---------- ---------- ---------- -------------------- 7782 CLARK 10 2450 3585833.33 7839 KING 10 5000 3585833.33 7934 MILLER 10 1300 3585833.33 7566 JONES 20 2975 1261875 7902 FORD 20 3000 1261875 7876 ADAMS 20 1100 1261875 7369 SMITH 20 800 1261875 7788 SCOTT 20 3000 1261875 7521 WARD 30 1250 446666.667 7844 TURNER 30 1500 446666.667 7499 ALLEN 30 1600 446666.667 7900 JAMES 30 950 446666.667 7698 BLAKE 30 2850 446666.667 7654 MARTIN 30 1250 446666.667 SQL>
VAR_POP Analytic Function
The VAR_POP
function returns the population variance after discarding nulls. The function performs the following calculation.
SUM((expr - (SUM(expr) / COUNT(expr)))2) / COUNT(expr)
The basic description for the VAR_POP
analytic function is shown below. The analytic clause is described in more detail here.
VAR_POP(expr) [ OVER (analytic_clause) ]
Using an empty OVER
clause turns the VAR_POP
function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the population variance of the salary for all employees after discarding nulls, as well as all the original data.
SELECT empno, ename, deptno, sal, VAR_POP(sal) OVER () AS var_pop_sal FROM emp; EMPNO ENAME DEPTNO SAL VAR_POP_SAL ---------- ---------- ---------- ---------- ----------- 7369 SMITH 20 800 1298434.31 7499 ALLEN 30 1600 1298434.31 7521 WARD 30 1250 1298434.31 7566 JONES 20 2975 1298434.31 7654 MARTIN 30 1250 1298434.31 7698 BLAKE 30 2850 1298434.31 7782 CLARK 10 2450 1298434.31 7788 SCOTT 20 3000 1298434.31 7839 KING 10 5000 1298434.31 7844 TURNER 30 1500 1298434.31 7876 ADAMS 20 1100 1298434.31 7900 JAMES 30 950 1298434.31 7902 FORD 20 3000 1298434.31 7934 MILLER 10 1300 1298434.31 SQL>
Adding the partitioning clause allows us to display the population variance of the salary per department, along with the employee data for each department.
SELECT empno, ename, deptno, sal, VAR_POP(sal) OVER (PARTITION BY deptno) AS var_pop_by_dept FROM emp; EMPNO ENAME DEPTNO SAL VAR_POP_BY_DEPT ---------- ---------- ---------- ---------- --------------- 7782 CLARK 10 2450 2390555.56 7839 KING 10 5000 2390555.56 7934 MILLER 10 1300 2390555.56 7566 JONES 20 2975 1009500 7902 FORD 20 3000 1009500 7876 ADAMS 20 1100 1009500 7369 SMITH 20 800 1009500 7788 SCOTT 20 3000 1009500 7521 WARD 30 1250 372222.222 7844 TURNER 30 1500 372222.222 7499 ALLEN 30 1600 372222.222 7900 JAMES 30 950 372222.222 7698 BLAKE 30 2850 372222.222 7654 MARTIN 30 1250 372222.222 SQL>
VAR_SAMP Analytic Function
The VAR_SAMP
function returns the variance of the sample after discarding nulls. If there are no records in the set after discarding nulls, the return value is NULL. The function performs the following calculation.
(SUM(expr - (SUM(expr) / COUNT(expr)))2) / (COUNT(expr) - 1)
The basic description for the VAR_SAMP
analytic function is shown below. The analytic clause is described in more detail here.
VAR_SAMP(expr) [ OVER (analytic_clause) ]
Using an empty OVER
clause turns the VAR_SAMP
function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the sample variance of the salary for all employees after discarding nulls, as well as all the original data.
SELECT empno, ename, deptno, sal, VAR_SAMP(sal) OVER () AS var_samp_sal FROM emp; EMPNO ENAME DEPTNO SAL VAR_SAMP_SAL ---------- ---------- ---------- ---------- ------------ 7369 SMITH 20 800 1398313.87 7499 ALLEN 30 1600 1398313.87 7521 WARD 30 1250 1398313.87 7566 JONES 20 2975 1398313.87 7654 MARTIN 30 1250 1398313.87 7698 BLAKE 30 2850 1398313.87 7782 CLARK 10 2450 1398313.87 7788 SCOTT 20 3000 1398313.87 7839 KING 10 5000 1398313.87 7844 TURNER 30 1500 1398313.87 7876 ADAMS 20 1100 1398313.87 7900 JAMES 30 950 1398313.87 7902 FORD 20 3000 1398313.87 7934 MILLER 10 1300 1398313.87 SQL>
Adding the partitioning clause allows us to display the sample variance of the salary per department, along with the employee data for each department.
SELECT empno, ename, deptno, sal, VAR_SAMP(sal) OVER (PARTITION BY deptno) AS var_samp_by_dept FROM emp; EMPNO ENAME DEPTNO SAL VAR_SAMP_BY_DEPT ---------- ---------- ---------- ---------- ---------------- 7782 CLARK 10 2450 3585833.33 7839 KING 10 5000 3585833.33 7934 MILLER 10 1300 3585833.33 7566 JONES 20 2975 1261875 7902 FORD 20 3000 1261875 7876 ADAMS 20 1100 1261875 7369 SMITH 20 800 1261875 7788 SCOTT 20 3000 1261875 7521 WARD 30 1250 446666.667 7844 TURNER 30 1500 446666.667 7499 ALLEN 30 1600 446666.667 7900 JAMES 30 950 446666.667 7698 BLAKE 30 2850 446666.667 7654 MARTIN 30 1250 446666.667 SQL>
Quick Links
The "*" indicates the function supports the full analytic syntax, including the windowing clause.
For more information see:
- VARIANCE
- VAR_POP
- VAR_SAMP
- Analytic Functions : All Articles
- STDDEV, STDDEV_POP and STDDEV_SAMP Analytic Functions
Hope this helps. Regards Tim...