8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
COVAR_POP and COVAR_SAMP Analytic Functions
This article gives an overview of the COVAR_POP
and COVAR_SAMP
analytic functions. If you are new to analytic functions you should probably read this introduction to analytic functions first.
- Setup
- COVAR_POP and COVAR_SAMP as Aggregate Functions
- COVAR_POP Analytic Function
- COVAR_SAMP Analytic Function
- Quick Links
Related articles.
- Analytic Functions : All Articles
- CORR Analytic Function
- STDDEV, STDDEV_POP and STDDEV_SAMP Analytic Functions
Setup
The examples in this article require the following table.
--DROP TABLE emp PURGE; CREATE TABLE emp ( empno NUMBER(4) CONSTRAINT pk_emp PRIMARY KEY, ename VARCHAR2(10), job VARCHAR2(9), mgr NUMBER(4), hiredate DATE, sal NUMBER(7,2), comm NUMBER(7,2), deptno NUMBER(2) ); INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20); INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30); INSERT INTO emp VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30); INSERT INTO emp VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20); INSERT INTO emp VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30); INSERT INTO emp VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30); INSERT INTO emp VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10); INSERT INTO emp VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20); INSERT INTO emp VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10); INSERT INTO emp VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30); INSERT INTO emp VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20); INSERT INTO emp VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30); INSERT INTO emp VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20); INSERT INTO emp VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10); COMMIT;
COVAR_POP and COVAR_SAMP as Aggregate Functions
The COVAR_POP
and COVAR_SAMP
aggregate functions are used to calculate the population covariance and sample covariance of a set of data respectively. As aggregate functions they reduce the number of rows, hence the term "aggregate". If the data isn't grouped we turn the 14 rows in the EMP
table to a single row with the aggregated values. In the following example we are displaying the covariance of the length of employment with the salary.
SELECT COVAR_POP(SYSDATE - hiredate, sal) AS covar_pop_val, COVAR_SAMP(SYSDATE - hiredate, sal) AS covar_samp_val FROM emp; COVAR_POP_VAL COVAR_SAMP_VAL ------------- -------------- -18125.255 -19519.505 SQL>
We can get more granularity of information by including a GROUP BY
clause. In the following example we see the values on a per-job basis.
SELECT job, COVAR_POP(SYSDATE - hiredate, sal) AS covar_pop_val_by_job, COVAR_SAMP(SYSDATE - hiredate, sal) AS covar_samp_val_by_job FROM emp GROUP BY job ORDER BY job; JOB COVAR_POP_VAL_BY_JOB COVAR_SAMP_VAL_BY_JOB --------- -------------------- --------------------- ANALYST 1.0000E-31 2.0000E-31 CLERK -55390.625 -73854.167 MANAGER 6102.77778 9154.16667 PRESIDENT 0 SALESMAN 3325 4433.33333 SQL>
In both cases we have aggregated the data to get the values, returning less rows than we started with. Analytic functions allow us to return these aggregate values while retaining the original row data.
COVAR_POP Analytic Function
The COVAR_POP
function returns the population covariance after discarding nulls. The function performs the following calculation.
(SUM(expr1 * expr2) - SUM(expr2) * SUM(expr1) / n) / n
The basic description for the COVAR_POP
analytic function is shown below. The analytic clause is described in more detail here.
COVAR_POP(expr1, expr2) [ OVER (analytic_clause) ]
Using an empty OVER
clause turns the COVAR_POP
function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the population covariance of the length of employment with the salary after discarding nulls, as well as all the original data.
SELECT empno, ename, deptno, sal, job, COVAR_POP(SYSDATE - hiredate, sal) OVER () AS covar_pop_val FROM emp; EMPNO ENAME DEPTNO SAL JOB COVAR_POP_VAL ---------- ---------- ---------- ---------- --------- ------------- 7369 SMITH 20 800 CLERK -18125.255 7499 ALLEN 30 1600 SALESMAN -18125.255 7521 WARD 30 1250 SALESMAN -18125.255 7566 JONES 20 2975 MANAGER -18125.255 7654 MARTIN 30 1250 SALESMAN -18125.255 7698 BLAKE 30 2850 MANAGER -18125.255 7782 CLARK 10 2450 MANAGER -18125.255 7788 SCOTT 20 3000 ANALYST -18125.255 7839 KING 10 5000 PRESIDENT -18125.255 7844 TURNER 30 1500 SALESMAN -18125.255 7876 ADAMS 20 1100 CLERK -18125.255 7900 JAMES 30 950 CLERK -18125.255 7902 FORD 20 3000 ANALYST -18125.255 7934 MILLER 10 1300 CLERK -18125.255 SQL>
Adding the partitioning clause allows us to display the population covariance of the length of employment with the salary per job, along with the employee data.
SELECT empno, ename, deptno, sal, job, COVAR_POP(SYSDATE - hiredate, sal) OVER (PARTITION BY job) AS covar_pop_val_by_job FROM emp; EMPNO ENAME DEPTNO SAL JOB COVAR_POP_VAL_BY_JOB ---------- ---------- ---------- ---------- --------- -------------------- 7788 SCOTT 20 3000 ANALYST 5.0000E-32 7902 FORD 20 3000 ANALYST 5.0000E-32 7934 MILLER 10 1300 CLERK -55390.625 7900 JAMES 30 950 CLERK -55390.625 7369 SMITH 20 800 CLERK -55390.625 7876 ADAMS 20 1100 CLERK -55390.625 7698 BLAKE 30 2850 MANAGER 6102.77778 7566 JONES 20 2975 MANAGER 6102.77778 7782 CLARK 10 2450 MANAGER 6102.77778 7839 KING 10 5000 PRESIDENT 0 7844 TURNER 30 1500 SALESMAN 3325 7654 MARTIN 30 1250 SALESMAN 3325 7521 WARD 30 1250 SALESMAN 3325 7499 ALLEN 30 1600 SALESMAN 3325 SQL>
COVAR_SAMP Analytic Function
The COVAR_SAMP
function returns the sample covariance after discarding nulls. If there are no records in the set after discarding nulls, the return value is NULL. The function performs the following calculation.
(SUM(expr1 * expr2) - SUM(expr1) * SUM(expr2) / n) / (n-1)
The basic description for the COVAR_SAMP
analytic function is shown below. The analytic clause is described in more detail here.
COVAR_SAMP(expr1, expr2) [ OVER (analytic_clause) ]
Using an empty OVER
clause turns the COVAR_SAMP
function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the sample covariance of the length of employment with the salary after discarding nulls, as well as all the original data.
SELECT empno, ename, deptno, sal, job, COVAR_SAMP(SYSDATE - hiredate, sal) OVER () AS covar_samp_val FROM emp; EMPNO ENAME DEPTNO SAL JOB COVAR_SAMP_VAL ---------- ---------- ---------- ---------- --------- -------------- 7369 SMITH 20 800 CLERK -19519.505 7499 ALLEN 30 1600 SALESMAN -19519.505 7521 WARD 30 1250 SALESMAN -19519.505 7566 JONES 20 2975 MANAGER -19519.505 7654 MARTIN 30 1250 SALESMAN -19519.505 7698 BLAKE 30 2850 MANAGER -19519.505 7782 CLARK 10 2450 MANAGER -19519.505 7788 SCOTT 20 3000 ANALYST -19519.505 7839 KING 10 5000 PRESIDENT -19519.505 7844 TURNER 30 1500 SALESMAN -19519.505 7876 ADAMS 20 1100 CLERK -19519.505 7900 JAMES 30 950 CLERK -19519.505 7902 FORD 20 3000 ANALYST -19519.505 7934 MILLER 10 1300 CLERK -19519.505 SQL>
Adding the partitioning clause allows us to display the sample covariance of the length of employment with the salary per job, along with the employee data.
SELECT empno, ename, deptno, sal, job, COVAR_SAMP(SYSDATE - hiredate, sal) OVER (PARTITION BY job) AS covar_samp_val_by_dept FROM emp; EMPNO ENAME DEPTNO SAL JOB COVAR_SAMP_VAL_BY_DEPT ---------- ---------- ---------- ---------- --------- ---------------------- 7788 SCOTT 20 3000 ANALYST 1.0000E-31 7902 FORD 20 3000 ANALYST 1.0000E-31 7934 MILLER 10 1300 CLERK -73854.167 7900 JAMES 30 950 CLERK -73854.167 7369 SMITH 20 800 CLERK -73854.167 7876 ADAMS 20 1100 CLERK -73854.167 7698 BLAKE 30 2850 MANAGER 9154.16667 7566 JONES 20 2975 MANAGER 9154.16667 7782 CLARK 10 2450 MANAGER 9154.16667 7839 KING 10 5000 PRESIDENT 7844 TURNER 30 1500 SALESMAN 4433.33333 7654 MARTIN 30 1250 SALESMAN 4433.33333 7521 WARD 30 1250 SALESMAN 4433.33333 7499 ALLEN 30 1600 SALESMAN 4433.33333 SQL>
Quick Links
The "*" indicates the function supports the full analytic syntax, including the windowing clause.
For more information see:
- COVAR_POP
- COVAR_SAMP
- Analytic Functions : All Articles
- CORR Analytic Function
- STDDEV, STDDEV_POP and STDDEV_SAMP Analytic Functions
Hope this helps. Regards Tim...