8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Home » Articles » Misc » Here

COVAR_POP and COVAR_SAMP Analytic Functions

This article gives an overview of the COVAR_POP and COVAR_SAMP analytic functions. If you are new to analytic functions you should probably read this introduction to analytic functions first.

Related articles.

Setup

The examples in this article require the following table.

--DROP TABLE emp PURGE;

CREATE TABLE emp (
  empno    NUMBER(4) CONSTRAINT pk_emp PRIMARY KEY,
  ename    VARCHAR2(10),
  job      VARCHAR2(9),
  mgr      NUMBER(4),
  hiredate DATE,
  sal      NUMBER(7,2),
  comm     NUMBER(7,2),
  deptno   NUMBER(2)
);

INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
INSERT INTO emp VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
INSERT INTO emp VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
INSERT INTO emp VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
INSERT INTO emp VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
INSERT INTO emp VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
INSERT INTO emp VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
INSERT INTO emp VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
INSERT INTO emp VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
INSERT INTO emp VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);
INSERT INTO emp VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
INSERT INTO emp VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
INSERT INTO emp VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
COMMIT;

COVAR_POP and COVAR_SAMP as Aggregate Functions

The COVAR_POP and COVAR_SAMP aggregate functions are used to calculate the population covariance and sample covariance of a set of data respectively. As aggregate functions they reduce the number of rows, hence the term "aggregate". If the data isn't grouped we turn the 14 rows in the EMP table to a single row with the aggregated values. In the following example we are displaying the covariance of the length of employment with the salary.

SELECT COVAR_POP(SYSDATE - hiredate, sal) AS covar_pop_val,
       COVAR_SAMP(SYSDATE - hiredate, sal) AS covar_samp_val
FROM emp;

COVAR_POP_VAL COVAR_SAMP_VAL
------------- --------------
   -18125.255     -19519.505

SQL>

We can get more granularity of information by including a GROUP BY clause. In the following example we see the values on a per-job basis.

SELECT job,
       COVAR_POP(SYSDATE - hiredate, sal) AS covar_pop_val_by_job,
       COVAR_SAMP(SYSDATE - hiredate, sal) AS covar_samp_val_by_job
FROM emp
GROUP BY job
ORDER BY job;

JOB       COVAR_POP_VAL_BY_JOB COVAR_SAMP_VAL_BY_JOB
--------- -------------------- ---------------------
ANALYST             1.0000E-31            2.0000E-31
CLERK               -55390.625            -73854.167
MANAGER             6102.77778            9154.16667
PRESIDENT                    0
SALESMAN                  3325            4433.33333

SQL>

In both cases we have aggregated the data to get the values, returning less rows than we started with. Analytic functions allow us to return these aggregate values while retaining the original row data.

COVAR_POP Analytic Function

The COVAR_POP function returns the population covariance after discarding nulls. The function performs the following calculation.

(SUM(expr1 * expr2) - SUM(expr2) * SUM(expr1) / n) / n

The basic description for the COVAR_POP analytic function is shown below. The analytic clause is described in more detail here.

COVAR_POP(expr1, expr2) [ OVER (analytic_clause) ]

Using an empty OVER clause turns the COVAR_POP function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the population covariance of the length of employment with the salary after discarding nulls, as well as all the original data.

SELECT empno,
       ename,
       deptno,
       sal,
       job,
       COVAR_POP(SYSDATE - hiredate, sal) OVER () AS covar_pop_val
FROM   emp;

     EMPNO ENAME          DEPTNO        SAL JOB       COVAR_POP_VAL
---------- ---------- ---------- ---------- --------- -------------
      7369 SMITH              20        800 CLERK        -18125.255
      7499 ALLEN              30       1600 SALESMAN     -18125.255
      7521 WARD               30       1250 SALESMAN     -18125.255
      7566 JONES              20       2975 MANAGER      -18125.255
      7654 MARTIN             30       1250 SALESMAN     -18125.255
      7698 BLAKE              30       2850 MANAGER      -18125.255
      7782 CLARK              10       2450 MANAGER      -18125.255
      7788 SCOTT              20       3000 ANALYST      -18125.255
      7839 KING               10       5000 PRESIDENT    -18125.255
      7844 TURNER             30       1500 SALESMAN     -18125.255
      7876 ADAMS              20       1100 CLERK        -18125.255
      7900 JAMES              30        950 CLERK        -18125.255
      7902 FORD               20       3000 ANALYST      -18125.255
      7934 MILLER             10       1300 CLERK        -18125.255

SQL>

Adding the partitioning clause allows us to display the population covariance of the length of employment with the salary per job, along with the employee data.

SELECT empno,
       ename,
       deptno,
       sal,
       job,
       COVAR_POP(SYSDATE - hiredate, sal) OVER (PARTITION BY job) AS covar_pop_val_by_job
FROM   emp;

     EMPNO ENAME          DEPTNO        SAL JOB       COVAR_POP_VAL_BY_JOB
---------- ---------- ---------- ---------- --------- --------------------
      7788 SCOTT              20       3000 ANALYST             5.0000E-32
      7902 FORD               20       3000 ANALYST             5.0000E-32
      7934 MILLER             10       1300 CLERK               -55390.625
      7900 JAMES              30        950 CLERK               -55390.625
      7369 SMITH              20        800 CLERK               -55390.625
      7876 ADAMS              20       1100 CLERK               -55390.625
      7698 BLAKE              30       2850 MANAGER             6102.77778
      7566 JONES              20       2975 MANAGER             6102.77778
      7782 CLARK              10       2450 MANAGER             6102.77778
      7839 KING               10       5000 PRESIDENT                    0
      7844 TURNER             30       1500 SALESMAN                  3325
      7654 MARTIN             30       1250 SALESMAN                  3325
      7521 WARD               30       1250 SALESMAN                  3325
      7499 ALLEN              30       1600 SALESMAN                  3325

SQL>

COVAR_SAMP Analytic Function

The COVAR_SAMP function returns the sample covariance after discarding nulls. If there are no records in the set after discarding nulls, the return value is NULL. The function performs the following calculation.

(SUM(expr1 * expr2) - SUM(expr1) * SUM(expr2) / n) / (n-1)

The basic description for the COVAR_SAMP analytic function is shown below. The analytic clause is described in more detail here.

COVAR_SAMP(expr1, expr2) [ OVER (analytic_clause) ]

Using an empty OVER clause turns the COVAR_SAMP function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the sample covariance of the length of employment with the salary after discarding nulls, as well as all the original data.

SELECT empno,
       ename,
       deptno,
       sal,
       job,
       COVAR_SAMP(SYSDATE - hiredate, sal) OVER () AS covar_samp_val
FROM   emp;

     EMPNO ENAME          DEPTNO        SAL JOB       COVAR_SAMP_VAL
---------- ---------- ---------- ---------- --------- --------------
      7369 SMITH              20        800 CLERK         -19519.505
      7499 ALLEN              30       1600 SALESMAN      -19519.505
      7521 WARD               30       1250 SALESMAN      -19519.505
      7566 JONES              20       2975 MANAGER       -19519.505
      7654 MARTIN             30       1250 SALESMAN      -19519.505
      7698 BLAKE              30       2850 MANAGER       -19519.505
      7782 CLARK              10       2450 MANAGER       -19519.505
      7788 SCOTT              20       3000 ANALYST       -19519.505
      7839 KING               10       5000 PRESIDENT     -19519.505
      7844 TURNER             30       1500 SALESMAN      -19519.505
      7876 ADAMS              20       1100 CLERK         -19519.505
      7900 JAMES              30        950 CLERK         -19519.505
      7902 FORD               20       3000 ANALYST       -19519.505
      7934 MILLER             10       1300 CLERK         -19519.505

SQL>

Adding the partitioning clause allows us to display the sample covariance of the length of employment with the salary per job, along with the employee data.

SELECT empno,
       ename,
       deptno,
       sal,
       job,
       COVAR_SAMP(SYSDATE - hiredate, sal) OVER (PARTITION BY job) AS covar_samp_val_by_dept
FROM   emp;

     EMPNO ENAME          DEPTNO        SAL JOB       COVAR_SAMP_VAL_BY_DEPT
---------- ---------- ---------- ---------- --------- ----------------------
      7788 SCOTT              20       3000 ANALYST               1.0000E-31
      7902 FORD               20       3000 ANALYST               1.0000E-31
      7934 MILLER             10       1300 CLERK                 -73854.167
      7900 JAMES              30        950 CLERK                 -73854.167
      7369 SMITH              20        800 CLERK                 -73854.167
      7876 ADAMS              20       1100 CLERK                 -73854.167
      7698 BLAKE              30       2850 MANAGER               9154.16667
      7566 JONES              20       2975 MANAGER               9154.16667
      7782 CLARK              10       2450 MANAGER               9154.16667
      7839 KING               10       5000 PRESIDENT
      7844 TURNER             30       1500 SALESMAN              4433.33333
      7654 MARTIN             30       1250 SALESMAN              4433.33333
      7521 WARD               30       1250 SALESMAN              4433.33333
      7499 ALLEN              30       1600 SALESMAN              4433.33333

SQL>

Quick Links

The "*" indicates the function supports the full analytic syntax, including the windowing clause.

AVG * BIT_AND_AGG * BIT_OR_AGG * BIT_XOR_AGG * CHECKSUM *
CLUSTER_DETAILS CLUSTER_DISTANCE CLUSTER_ID CLUSTER_SET CORR *
COUNT * COVAR_POP * COVAR_SAMP * CUME_DIST DENSE_RANK
FEATURE_DETAILS FEATURE_ID FEATURE_SET FEATURE_VALUE FIRST
FIRST_VALUE * KURTOSIS_POP * KURTOSIS_SAMP * LAG LAST
LAST_VALUE * LEAD LISTAGG MATCH_RECOGNIZE MAX *
MEDIAN MIN * NTH_VALUE * NTILE PERCENT_RANK
PERCENTILE_CONT PERCENTILE_DISC PREDICTION PREDICTION_COST PREDICTION
PREDICTION_COST PREDICTION_DETAILS PREDICTION_PROBABILITY PREDICTION_SET RANK
RATIO_TO_REPORT REGR_ (Linear Regression) Functions * ROW_NUMBER SKEWNESS_POP * SKEWNESS_SAMP *
STDDEV * STDDEV_POP * STDDEV_SAMP * SUM * VAR_POP *
VAR_SAMP * VARIANCE * String Aggregation Top-N Queries  

For more information see:

Hope this helps. Regards Tim...

Back to the Top.