8i | 9i | 10g | 11g | 12c | 13c | 18c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Home » Articles » Misc » Here

CORR Analytic Function

This article gives an overview of the CORR analytic function. If you are new to analytic functions you should probably read this introduction to analytic functions first.

Related articles.

Setup

The examples in this article require the following table.

--DROP TABLE emp PURGE;

CREATE TABLE emp (
  empno    NUMBER(4) CONSTRAINT pk_emp PRIMARY KEY,
  ename    VARCHAR2(10),
  job      VARCHAR2(9),
  mgr      NUMBER(4),
  hiredate DATE,
  sal      NUMBER(7,2),
  comm     NUMBER(7,2),
  deptno   NUMBER(2)
);

INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
INSERT INTO emp VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
INSERT INTO emp VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
INSERT INTO emp VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
INSERT INTO emp VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
INSERT INTO emp VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
INSERT INTO emp VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
INSERT INTO emp VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
INSERT INTO emp VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
INSERT INTO emp VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);
INSERT INTO emp VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
INSERT INTO emp VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
INSERT INTO emp VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
COMMIT;

CORR as an Aggregate Function

The CORR aggregate function is used to calculate the coefficient of correlation, or Pearson correlation coefficient. As an aggregate function it reduces the number of rows, hence the term "aggregate". If the data isn't grouped we turn the 14 rows in the EMP table to a single row with the aggregated values. The following example includes the CORR_S (Spearman rho correlation coefficient) and CORR_K (Kendall tau-b correlation coefficient) functions, which are only available as aggregate functions. In the following example we are correlating the length of employment with the salary.

SELECT CORR(SYSDATE - hiredate, sal) AS corr_val,
       CORR_S(SYSDATE - hiredate, sal) AS corr_s_val,
       CORR_K(SYSDATE - hiredate, sal) AS corr_k_val
FROM emp;

  CORR_VAL CORR_S_VAL CORR_K_VAL
---------- ---------- ----------
-.02116564 .082690238 .111733587

SQL>

We can get more granularity of information by including a GROUP BY clause. In the following example we see the values on a per-job basis.

SELECT job,
       CORR(SYSDATE - hiredate, sal) AS corr_val_by_job,
       CORR_S(SYSDATE - hiredate, sal) AS corr_s_val_by_job,
       CORR_K(SYSDATE - hiredate, sal) AS corr_k_val_by_job
FROM emp
GROUP BY job
ORDER BY job;

JOB       CORR_VAL_BY_JOB CORR_S_VAL_BY_JOB CORR_K_VAL_BY_JOB
--------- --------------- ----------------- -----------------
ANALYST
CLERK           -.3250778                .4        .333333333
MANAGER        .978243943                 1                 1
PRESIDENT
SALESMAN       .205987887        .210818511        .182574186

SQL>

In both cases we have aggregated the data to get the values, returning less rows than we started with. Analytic functions allow us to return these aggregate values while retaining the original row data.

CORR Analytic Function

The CORR analytic function is used to calculate the coefficient of correlation, or Pearson correlation coefficient. The function computes the following on rows in the set with no null values for either expression.

COVAR_POP(expr1, expr2) / (STDDEV_POP(expr1) * STDDEV_POP(expr2))

The function returns a value between +1 and -1, which represent the following, along with an example interpretation.

The basic description for the CORR analytic function is shown below. The analytic clause is described in more detail here.

CORR(expr1, expr2) [ OVER (analytic_clause) ]

Using an empty OVER clause turns the CORR function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the correlation of the length of employment with the salary for all employees, as well as all the original data.

SELECT empno,
       ename,
       deptno,
       sal,
       job,
       CORR(SYSDATE - hiredate, sal) OVER () AS corr_val
FROM   emp;

     EMPNO ENAME          DEPTNO        SAL JOB         CORR_VAL
---------- ---------- ---------- ---------- --------- ----------
      7369 SMITH              20        800 CLERK     -.02116564
      7499 ALLEN              30       1600 SALESMAN  -.02116564
      7521 WARD               30       1250 SALESMAN  -.02116564
      7566 JONES              20       2975 MANAGER   -.02116564
      7654 MARTIN             30       1250 SALESMAN  -.02116564
      7698 BLAKE              30       2850 MANAGER   -.02116564
      7782 CLARK              10       2450 MANAGER   -.02116564
      7788 SCOTT              20       3000 ANALYST   -.02116564
      7839 KING               10       5000 PRESIDENT -.02116564
      7844 TURNER             30       1500 SALESMAN  -.02116564
      7876 ADAMS              20       1100 CLERK     -.02116564
      7900 JAMES              30        950 CLERK     -.02116564
      7902 FORD               20       3000 ANALYST   -.02116564
      7934 MILLER             10       1300 CLERK     -.02116564

SQL>

Adding the partitioning clause allows us to display the correlation per department, along with the employee data.

SELECT empno,
       ename,
       deptno,
       sal,
       job,
       CORR(SYSDATE - hiredate, sal) OVER (PARTITION BY deptno) AS corr_val
FROM   emp;

     EMPNO ENAME          DEPTNO        SAL JOB         CORR_VAL
---------- ---------- ---------- ---------- --------- ----------
      7782 CLARK              10       2450 MANAGER   .074769211
      7839 KING               10       5000 PRESIDENT .074769211
      7934 MILLER             10       1300 CLERK     .074769211
      7566 JONES              20       2975 MANAGER   .044630566
      7902 FORD               20       3000 ANALYST   .044630566
      7876 ADAMS              20       1100 CLERK     .044630566
      7369 SMITH              20        800 CLERK     .044630566
      7788 SCOTT              20       3000 ANALYST   .044630566
      7521 WARD               30       1250 SALESMAN  .417964304
      7844 TURNER             30       1500 SALESMAN  .417964304
      7499 ALLEN              30       1600 SALESMAN  .417964304
      7900 JAMES              30        950 CLERK     .417964304
      7698 BLAKE              30       2850 MANAGER   .417964304
      7654 MARTIN             30       1250 SALESMAN  .417964304

SQL>

Alternatively we can display the correlation per job type, along with the employee data.

SELECT empno,
       ename,
       deptno,
       sal,
       job,
       CORR(SYSDATE - hiredate, sal) OVER (PARTITION BY job) AS corr_val
FROM   emp
ORDER BY JOB, deptno;

     EMPNO ENAME          DEPTNO        SAL JOB         CORR_VAL
---------- ---------- ---------- ---------- --------- ----------
      7902 FORD               20       3000 ANALYST
      7788 SCOTT              20       3000 ANALYST
      7934 MILLER             10       1300 CLERK      -.3250778
      7369 SMITH              20        800 CLERK      -.3250778
      7876 ADAMS              20       1100 CLERK      -.3250778
      7900 JAMES              30        950 CLERK      -.3250778
      7782 CLARK              10       2450 MANAGER   .978243943
      7566 JONES              20       2975 MANAGER   .978243943
      7698 BLAKE              30       2850 MANAGER   .978243943
      7839 KING               10       5000 PRESIDENT
      7654 MARTIN             30       1250 SALESMAN  .205987887
      7844 TURNER             30       1500 SALESMAN  .205987887
      7521 WARD               30       1250 SALESMAN  .205987887
      7499 ALLEN              30       1600 SALESMAN  .205987887

SQL>

Quick Links

The "*" indicates the function supports the full analytic syntax, including the windowing clause.

AVG * CLUSTER_DETAILS CLUSTER_DISTANCE CLUSTER_ID CLUSTER_PROBABILITY
CLUSTER_SET CORR * COUNT * COVAR_POP * COVAR_SAMP *
CUME_DIST DENSE_RANK FEATURE_DETAILS FEATURE_ID FEATURE_SET
FEATURE_VALUE FIRST FIRST_VALUE * LAG LAST
LAST_VALUE * LEAD LISTAGG MATCH_RECOGNIZE MAX *
MEDIAN MIN * NTH_VALUE * NTILE PERCENT_RANK
PERCENTILE_CONT PERCENTILE_DISC PREDICTION PREDICTION_COST PREDICTION_DETAILS
PREDICTION_PROBABILITY PREDICTION_SET RANK RATIO_TO_REPORT REGR_ (Linear Regression) Functions *
ROW_NUMBER STDDEV * STDDEV_POP * STDDEV_SAMP * SUM *
VAR_POP * VAR_SAMP * VARIANCE * String Aggregation Top-N Queries

For more information see:

Hope this helps. Regards Tim...

Back to the Top.