8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Home » Articles » 19c » Here

LISTAGG DISTINCT in Oracle Database 19c

The LISTAGG function was introduced in Oracle 11gR2 to make string aggregation simpler. In Oracle 12cR2 it was extended to include overflow error handling. Oracle 19c includes the ability to remove duplicates from the LISTAGG results by including the DEFAULT keyword.

Related articles.

Setup

The examples in this article use the following table.

-- DROP TABLE EMP PURGE;

CREATE TABLE EMP (
  EMPNO NUMBER(4) CONSTRAINT PK_EMP PRIMARY KEY,
  ENAME VARCHAR2(10),
  JOB VARCHAR2(9),
  MGR NUMBER(4),
  HIREDATE DATE,
  SAL NUMBER(7,2),
  COMM NUMBER(7,2),
  DEPTNO NUMBER(2)
);

INSERT INTO EMP VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
INSERT INTO EMP VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
INSERT INTO EMP VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
INSERT INTO EMP VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
INSERT INTO EMP VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
INSERT INTO EMP VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
INSERT INTO EMP VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
INSERT INTO EMP VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
INSERT INTO EMP VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
INSERT INTO EMP VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
INSERT INTO EMP VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);
INSERT INTO EMP VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
INSERT INTO EMP VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
INSERT INTO EMP VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
COMMIT;

The Problem

The default action of the LISTAGG function is shown below.

COLUMN employees FORMAT A40

SELECT deptno, LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
FROM   emp
GROUP BY deptno
ORDER BY deptno;

    DEPTNO EMPLOYEES
---------- ----------------------------------------
        10 CLARK,KING,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

Let's add some extra people called "MILLER" into department 10, to give us duplicates in the aggregated list.

INSERT INTO emp VALUES (9998,'MILLER','ANALYST',7782,to_date('23-1-1982','dd-mm-yyyy'),1600,NULL,10);
INSERT INTO emp VALUES (9999,'MILLER','MANADER',7782,to_date('23-1-1982','dd-mm-yyyy'),1500,NULL,10);
COMMIT;

As expected, we now see multiple entries for the name "MILLER" in department 10.

COLUMN employees FORMAT A40

SELECT deptno, LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
FROM   emp
GROUP BY deptno
ORDER BY deptno;

    DEPTNO EMPLOYEES
---------- ----------------------------------------
        10 CLARK,KING,MILLER,MILLER,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

If that's what we are expecting, great. If we want to remove duplicates, what do we do?

The Solution : Pre-19c

We could solve this in a number of ways. In the following example we use the ROW_NUMBER analytic function to remove any duplicates, then use the conventional LISTAGG function to aggregate the data.

COLUMN employees FORMAT A40

SELECT e2.deptno, LISTAGG(e2.ename, ',') WITHIN GROUP (ORDER BY e2.ename) AS employees
FROM   (SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY e.deptno, e.ename ORDER BY e.empno) AS myrank
        FROM   emp e) e2
WHERE  e2.myrank = 1
GROUP BY e2.deptno
ORDER BY e2.deptno;

    DEPTNO EMPLOYEES
---------- ----------------------------------------
        10 CLARK,KING,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

Alternatively we could use DISTINCT in an inline view to remove the duplicate rows, then use the conventional LISTAGG function call to aggregate the data.

COLUMN employees FORMAT A40

SELECT e2.deptno, LISTAGG(e2.ename, ',') WITHIN GROUP (ORDER BY e2.ename) AS employees
FROM   (SELECT DISTINCT e.deptno, e.ename
        FROM   emp e) e2
GROUP BY e2.deptno
ORDER BY e2.deptno;

    DEPTNO EMPLOYEES
---------- ----------------------------------------
        10 CLARK,KING,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

The Solution : 19c Onward

Oracle 19c introduced a simpler solution. We can now include the DISTINCT keyword directly in the LISTAGG function call.

COLUMN employees FORMAT A40

SELECT deptno, LISTAGG(DISTINCT ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
FROM   emp
GROUP BY deptno
ORDER BY deptno;

    DEPTNO EMPLOYEES
---------- ----------------------------------------
        10 CLARK,KING,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

The default functionality is to include all results, which we can express explicitly using the ALL keyword.

SELECT deptno, LISTAGG(ALL ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
FROM   emp
GROUP BY deptno
ORDER BY deptno;

    DEPTNO EMPLOYEES
---------- ----------------------------------------
        10 CLARK,KING,MILLER,MILLER,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

Quick Links

The "*" indicates the function supports the full analytic syntax, including the windowing clause.

AVG * CLUSTER_DETAILS CLUSTER_DISTANCE CLUSTER_ID CLUSTER_PROBABILITY
CLUSTER_SET CORR * COUNT * COVAR_POP * COVAR_SAMP *
CUME_DIST DENSE_RANK FEATURE_DETAILS FEATURE_ID FEATURE_SET
FEATURE_VALUE FIRST FIRST_VALUE * LAG LAST
LAST_VALUE * LEAD LISTAGG MATCH_RECOGNIZE MAX *
MEDIAN MIN * NTH_VALUE * NTILE PERCENT_RANK
PERCENTILE_CONT PERCENTILE_DISC PREDICTION PREDICTION_COST PREDICTION_DETAILS
PREDICTION_PROBABILITY PREDICTION_SET RANK RATIO_TO_REPORT REGR_ (Linear Regression) Functions *
ROW_NUMBER STDDEV * STDDEV_POP * STDDEV_SAMP * SUM *
VAR_POP * VAR_SAMP * VARIANCE * String Aggregation Top-N Queries

For more information see:

Hope this helps. Regards Tim...

Back to the Top.