8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23ai | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

SKEWNESS_POP and SKEWNESS_SAMP Analytic Functions

This article gives an overview of the SKEWNESS_POP and SKEWNESS_SAMP analytic functions. If you are new to analytic functions you should probably read this introduction to analytic functions first.

Basics
Setup
SKEWNESS_POP and SKEWNESS_SAMP as Aggregate Functions
SKEWNESS_POP Analytic Function
SKEWNESS_SAMP Analytic Function
Quick Links

Basics

You can read a basic introduction to skewness here.

The SKEWNESS_POP and SKEWNESS_SAMP aggregate and analytic functions were added in Oracle 21c to measure asymmetry, or skew, in the distribution of data. In both cases they return a numeric value with the following meaning.

Negative : Left skew. The data is mostly on the right, but the long tail is on the left, so this is considered left skew or left tailed.
Zero : Data has a normal distribution. Most of the values cluster around the mean, with fewer at the tails. Zero is also returned if the data set has less than 3 rows.
Positive : Right skew. The data is mostly on the left, but the long tail is on the right, so this is considered right skew or right tailed.
Null : Null values in the expression are ignored, and the function will return null if all rows have a null value for the expression.

The SKEWNESS_POP and SKEWNESS_SAMP functions will not return the same results, but both should be representative of the distribution of data. The larger the data set, the more similar their results will be.

Setup

We need some data with various distribution patterns to measure using the SKEWNESS_POP and SKEWNESS_SAMP aggregate functions. The following table contains columns that represent a left skew, normal distribution and a right skew.

-- drop table t1 purge;

create table t1 (
  id         number generated always as identity,
  left_skew  number,
  normal     number,
  right_skew number
);


insert into t1 (left_skew, normal, right_skew)
select case
         when level > 9800 then 1
         else dbms_random.value(400, 500)
       end,
       dbms_random.normal,
       case
         when level < 200 then dbms_random.value(400, 500)
         else 1
       end
from   dual
connect by level <= 10000;
commit;

We will use the following table to demonstrate the use of the SKEWNESS_POP and SKEWNESS_SAMP analytic functions. The sample size is too small, but the EMP table will be familiar to people practicing analytic function using the articles on this site.

create table emp (
  empno    number(4) constraint pk_emp primary key,
  ename    varchar2(10),
  job      varchar2(9),
  mgr      number(4),
  hiredate date,
  sal      number(7,2),
  comm     number(7,2),
  deptno   number(2) );

insert into emp values (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
insert into emp values (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
insert into emp values (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
insert into emp values (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
insert into emp values (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
insert into emp values (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
insert into emp values (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
insert into emp values (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
insert into emp values (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
insert into emp values (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
insert into emp values (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);
insert into emp values (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
insert into emp values (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
insert into emp values (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
commit;

SKEWNESS_POP and SKEWNESS_SAMP as Aggregate Functions

The following query uses the SKEWNESS_POP aggregate function to display the distribution of the data in the LEFT_SKEW, NORMAL and RIGHT_SKEW columns.

select skewness_pop(left_skew) as left_skew,
       skewness_pop(normal) as normal,
       skewness_pop(right_skew) as right_skew
from   t1;

 LEFT_SKEW     NORMAL RIGHT_SKEW
---------- ---------- ----------
-5.1049315  -.0077114 6.91847714

SQL>

As expected the LEFT_SKEW column returns a negative value, the NORMAL column returns a near-zero value and the RIGHT_SKEW column returns a positive value.

The SKEWNESS_POP function uses a sample size of 100% of the rows, which can represent an overhead for large data sets. In contrast the SKEWNESS_SAMP function uses a smaller sample size, making it more efficient for large data sets, whilst still returning representative results.

select skewness_samp(left_skew) as left_skew,
       skewness_samp(normal) as normal,
       skewness_samp(right_skew) as right_skew
from   t1;

 LEFT_SKEW     NORMAL RIGHT_SKEW
---------- ---------- ----------
-5.1056973 -.00771256 6.91951511

SQL>

Using DISTINCT or UNIQUE keywords mean only unique values in the expression are used for the calculation. The ALL keyword is that same as the default action.

select skewness_samp(distinct left_skew) as skew_distinct,
       skewness_samp(unique left_skew) as skew_unique,
       skewness_samp(all left_skew) as skew_all,
       skewness_samp(left_skew) as skew
from   t1;

SKEW_DISTINCT SKEW_UNIQUE   SKEW_ALL       SKEW
------------- ----------- ---------- ----------
 -0.370901916 -0.370901916 -5.11615858 -5.11615858

SQL>

The DISTINCT, UNIQUE and ALL keywords are also available for the analytic functions.

SKEWNESS_POP Analytic Function

Using an empty OVER clause turns the SKEWNESS_POP function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition. The following query uses the SKEWNESS_POP analytic function to display the skewness of the data in the SAL column, as well as all the original data.

select empno,
       ename,
       deptno,
       sal,
       round(skewness_pop(sal) over (),2) as sal_skew 
from   emp;

     EMPNO ENAME          DEPTNO        SAL   SAL_SKEW
---------- ---------- ---------- ---------- ----------
      7369 SMITH              20        800       1.04
      7499 ALLEN              30       1600       1.04
      7521 WARD               30       1250       1.04
      7566 JONES              20       2975       1.04
      7654 MARTIN             30       1250       1.04
      7698 BLAKE              30       2850       1.04
      7782 CLARK              10       2450       1.04
      7788 SCOTT              20       3000       1.04
      7839 KING               10       5000       1.04
      7844 TURNER             30       1500       1.04
      7876 ADAMS              20       1100       1.04
      7900 JAMES              30        950       1.04
      7902 FORD               20       3000       1.04
      7934 MILLER             10       1300       1.04

SQL>

Adding the partitioning clause allows us to display the salary skew per department, along with the employee data for each department.

select empno,
       ename,
       deptno,
       sal,
       round(skewness_pop(sal) over (partition by deptno),2) as sal_skew_by_dept
from   emp;

     EMPNO ENAME          DEPTNO        SAL SAL_SKEW_BY_DEPT
---------- ---------- ---------- ---------- ----------------
      7782 CLARK              10       2450              .43
      7839 KING               10       5000              .43
      7934 MILLER             10       1300              .43
      7566 JONES              20       2975            -0.44
      7902 FORD               20       3000            -0.44
      7876 ADAMS              20       1100            -0.44
      7369 SMITH              20        800            -0.44
      7788 SCOTT              20       3000            -0.44
      7521 WARD               30       1250             1.33
      7844 TURNER             30       1500             1.33
      7499 ALLEN              30       1600             1.33
      7900 JAMES              30        950             1.33
      7698 BLAKE              30       2850             1.33
      7654 MARTIN             30       1250             1.33

SQL>

SKEWNESS_SAMP Analytic Function

Using an empty OVER clause turns the SKEWNESS_SAMP function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition. The following query uses the SKEWNESS_SAMP analytic function to display the skewness of the data in the SAL column, as well as all the original data.

select empno,
       ename,
       deptno,
       sal,
       round(skewness_samp(sal) over (),2) as sal_skew 
from   emp;

     EMPNO ENAME          DEPTNO        SAL   SAL_SKEW
---------- ---------- ---------- ---------- ----------
      7369 SMITH              20        800       1.17
      7499 ALLEN              30       1600       1.17
      7521 WARD               30       1250       1.17
      7566 JONES              20       2975       1.17
      7654 MARTIN             30       1250       1.17
      7698 BLAKE              30       2850       1.17
      7782 CLARK              10       2450       1.17
      7788 SCOTT              20       3000       1.17
      7839 KING               10       5000       1.17
      7844 TURNER             30       1500       1.17
      7876 ADAMS              20       1100       1.17
      7900 JAMES              30        950       1.17
      7902 FORD               20       3000       1.17
      7934 MILLER             10       1300       1.17

SQL>

Adding the partitioning clause allows us to display the salary skew per department, along with the employee data for each department.

select empno,
       ename,
       deptno,
       sal,
       round(skewness_samp(sal) over (partition by deptno),2) as sal_skew_by_dept
from   emp;

     EMPNO ENAME          DEPTNO        SAL SAL_SKEW_BY_DEPT
---------- ---------- ---------- ---------- ----------------
      7782 CLARK              10       2450             1.04
      7839 KING               10       5000             1.04
      7934 MILLER             10       1300             1.04
      7566 JONES              20       2975            -0.65
      7902 FORD               20       3000            -0.65
      7876 ADAMS              20       1100            -0.65
      7369 SMITH              20        800            -0.65
      7788 SCOTT              20       3000            -0.65
      7521 WARD               30       1250             1.82
      7844 TURNER             30       1500             1.82
      7499 ALLEN              30       1600             1.82
      7900 JAMES              30        950             1.82
      7698 BLAKE              30       2850             1.82
      7654 MARTIN             30       1250             1.82

SQL>

Quick Links

The "*" indicates the function supports the full analytic syntax, including the windowing clause.

AVG *	BIT_AND_AGG *	BIT_OR_AGG *	BIT_XOR_AGG *	CHECKSUM *
CLUSTER_DETAILS	CLUSTER_DISTANCE	CLUSTER_ID	CLUSTER_SET	CORR *
COUNT *	COVAR_POP *	COVAR_SAMP *	CUME_DIST	DENSE_RANK
FEATURE_DETAILS	FEATURE_ID	FEATURE_SET	FEATURE_VALUE	FIRST
FIRST_VALUE *	KURTOSIS_POP *	KURTOSIS_SAMP *	LAG	LAST
LAST_VALUE *	LEAD	LISTAGG	MATCH_RECOGNIZE	MAX *
MEDIAN	MIN *	NTH_VALUE *	NTILE	PERCENT_RANK
PERCENTILE_CONT	PERCENTILE_DISC	PREDICTION	PREDICTION_COST	PREDICTION
PREDICTION_COST	PREDICTION_DETAILS	PREDICTION_PROBABILITY	PREDICTION_SET	RANK
RATIO_TO_REPORT	REGR_ (Linear Regression) Functions *	ROW_NUMBER	SKEWNESS_POP *	SKEWNESS_SAMP *
STDDEV *	STDDEV_POP *	STDDEV_SAMP *	SUM *	VAR_POP *
VAR_SAMP *	VARIANCE *	String Aggregation	Top-N Queries

For more information see:

Hope this helps. Regards Tim...

Back to the Top.