8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
SKEWNESS_POP and SKEWNESS_SAMP Analytic Functions
This article gives an overview of the SKEWNESS_POP
and SKEWNESS_SAMP
analytic functions. If you are new to analytic functions you should probably read this introduction to analytic functions first.
- Basics
- Setup
- SKEWNESS_POP and SKEWNESS_SAMP as Aggregate Functions
- SKEWNESS_POP Analytic Function
- SKEWNESS_SAMP Analytic Function
- Quick Links
Related articles.
Basics
You can read a basic introduction to skewness here.
The SKEWNESS_POP
and SKEWNESS_SAMP
aggregate and analytic functions were added in Oracle 21c to measure asymmetry, or skew, in the distribution of data. In both cases they return a numeric value with the following meaning.
- Negative : Left skew. The data is mostly on the right, but the long tail is on the left, so this is considered left skew or left tailed.
- Zero : Data has a normal distribution. Most of the values cluster around the mean, with fewer at the tails. Zero is also returned if the data set has less than 3 rows.
- Positive : Right skew. The data is mostly on the left, but the long tail is on the right, so this is considered right skew or right tailed.
- Null : Null values in the expression are ignored, and the function will return null if all rows have a null value for the expression.
The SKEWNESS_POP
and SKEWNESS_SAMP
functions will not return the same results, but both should be representative of the distribution of data. The larger the data set, the more similar their results will be.
Setup
We need some data with various distribution patterns to measure using the SKEWNESS_POP
and SKEWNESS_SAMP
aggregate functions. The following table contains columns that represent a left skew, normal distribution and a right skew.
-- drop table t1 purge; create table t1 ( id number generated always as identity, left_skew number, normal number, right_skew number ); insert into t1 (left_skew, normal, right_skew) select case when level > 9800 then 1 else dbms_random.value(400, 500) end, dbms_random.normal, case when level < 200 then dbms_random.value(400, 500) else 1 end from dual connect by level <= 10000; commit;
We will use the following table to demonstrate the use of the SKEWNESS_POP
and SKEWNESS_SAMP
analytic functions. The sample size is too small, but the EMP
table will be familiar to people practicing analytic function using the articles on this site.
create table emp ( empno number(4) constraint pk_emp primary key, ename varchar2(10), job varchar2(9), mgr number(4), hiredate date, sal number(7,2), comm number(7,2), deptno number(2) ); insert into emp values (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20); insert into emp values (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30); insert into emp values (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30); insert into emp values (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20); insert into emp values (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30); insert into emp values (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30); insert into emp values (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10); insert into emp values (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20); insert into emp values (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10); insert into emp values (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30); insert into emp values (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20); insert into emp values (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30); insert into emp values (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20); insert into emp values (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10); commit;
SKEWNESS_POP and SKEWNESS_SAMP as Aggregate Functions
The following query uses the SKEWNESS_POP
aggregate function to display the distribution of the data in the LEFT_SKEW
, NORMAL
and RIGHT_SKEW
columns.
select skewness_pop(left_skew) as left_skew, skewness_pop(normal) as normal, skewness_pop(right_skew) as right_skew from t1; LEFT_SKEW NORMAL RIGHT_SKEW ---------- ---------- ---------- -5.1049315 -.0077114 6.91847714 SQL>
As expected the LEFT_SKEW
column returns a negative value, the NORMAL
column returns a near-zero value and the RIGHT_SKEW
column returns a positive value.
The SKEWNESS_POP
function uses a sample size of 100% of the rows, which can represent an overhead for large data sets. In contrast the SKEWNESS_SAMP
function uses a smaller sample size, making it more efficient for large data sets, whilst still returning representative results.
select skewness_samp(left_skew) as left_skew, skewness_samp(normal) as normal, skewness_samp(right_skew) as right_skew from t1; LEFT_SKEW NORMAL RIGHT_SKEW ---------- ---------- ---------- -5.1056973 -.00771256 6.91951511 SQL>
Using DISTINCT
or UNIQUE
keywords mean only unique values in the expression are used for the calculation. The ALL
keyword is that same as the default action.
select skewness_samp(distinct left_skew) as skew_distinct, skewness_samp(unique left_skew) as skew_unique, skewness_samp(all left_skew) as skew_all, skewness_samp(left_skew) as skew from t1; SKEW_DISTINCT SKEW_UNIQUE SKEW_ALL SKEW ------------- ----------- ---------- ---------- -0.370901916 -0.370901916 -5.11615858 -5.11615858 SQL>
The DISTINCT
, UNIQUE
and ALL
keywords are also available for the analytic functions.
SKEWNESS_POP Analytic Function
Using an empty OVER
clause turns the SKEWNESS_POP
function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition. The following query uses the SKEWNESS_POP
analytic function to display the skewness of the data in the SAL
column, as well as all the original data.
select empno, ename, deptno, sal, round(skewness_pop(sal) over (),2) as sal_skew from emp; EMPNO ENAME DEPTNO SAL SAL_SKEW ---------- ---------- ---------- ---------- ---------- 7369 SMITH 20 800 1.04 7499 ALLEN 30 1600 1.04 7521 WARD 30 1250 1.04 7566 JONES 20 2975 1.04 7654 MARTIN 30 1250 1.04 7698 BLAKE 30 2850 1.04 7782 CLARK 10 2450 1.04 7788 SCOTT 20 3000 1.04 7839 KING 10 5000 1.04 7844 TURNER 30 1500 1.04 7876 ADAMS 20 1100 1.04 7900 JAMES 30 950 1.04 7902 FORD 20 3000 1.04 7934 MILLER 10 1300 1.04 SQL>
Adding the partitioning clause allows us to display the salary skew per department, along with the employee data for each department.
select empno, ename, deptno, sal, round(skewness_pop(sal) over (partition by deptno),2) as sal_skew_by_dept from emp; EMPNO ENAME DEPTNO SAL SAL_SKEW_BY_DEPT ---------- ---------- ---------- ---------- ---------------- 7782 CLARK 10 2450 .43 7839 KING 10 5000 .43 7934 MILLER 10 1300 .43 7566 JONES 20 2975 -0.44 7902 FORD 20 3000 -0.44 7876 ADAMS 20 1100 -0.44 7369 SMITH 20 800 -0.44 7788 SCOTT 20 3000 -0.44 7521 WARD 30 1250 1.33 7844 TURNER 30 1500 1.33 7499 ALLEN 30 1600 1.33 7900 JAMES 30 950 1.33 7698 BLAKE 30 2850 1.33 7654 MARTIN 30 1250 1.33 SQL>
SKEWNESS_SAMP Analytic Function
Using an empty OVER
clause turns the SKEWNESS_SAMP
function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition. The following query uses the SKEWNESS_SAMP
analytic function to display the skewness of the data in the SAL
column, as well as all the original data.
select empno, ename, deptno, sal, round(skewness_samp(sal) over (),2) as sal_skew from emp; EMPNO ENAME DEPTNO SAL SAL_SKEW ---------- ---------- ---------- ---------- ---------- 7369 SMITH 20 800 1.17 7499 ALLEN 30 1600 1.17 7521 WARD 30 1250 1.17 7566 JONES 20 2975 1.17 7654 MARTIN 30 1250 1.17 7698 BLAKE 30 2850 1.17 7782 CLARK 10 2450 1.17 7788 SCOTT 20 3000 1.17 7839 KING 10 5000 1.17 7844 TURNER 30 1500 1.17 7876 ADAMS 20 1100 1.17 7900 JAMES 30 950 1.17 7902 FORD 20 3000 1.17 7934 MILLER 10 1300 1.17 SQL>
Adding the partitioning clause allows us to display the salary skew per department, along with the employee data for each department.
select empno, ename, deptno, sal, round(skewness_samp(sal) over (partition by deptno),2) as sal_skew_by_dept from emp; EMPNO ENAME DEPTNO SAL SAL_SKEW_BY_DEPT ---------- ---------- ---------- ---------- ---------------- 7782 CLARK 10 2450 1.04 7839 KING 10 5000 1.04 7934 MILLER 10 1300 1.04 7566 JONES 20 2975 -0.65 7902 FORD 20 3000 -0.65 7876 ADAMS 20 1100 -0.65 7369 SMITH 20 800 -0.65 7788 SCOTT 20 3000 -0.65 7521 WARD 30 1250 1.82 7844 TURNER 30 1500 1.82 7499 ALLEN 30 1600 1.82 7900 JAMES 30 950 1.82 7698 BLAKE 30 2850 1.82 7654 MARTIN 30 1250 1.82 SQL>
Quick Links
The "*" indicates the function supports the full analytic syntax, including the windowing clause.
For more information see:
Hope this helps. Regards Tim...