Forums | Aggregator | Books | Industry News | Firefox Plugins | Social | Links

Comments

String Aggregation Techniques - Several methods of combining multiple rows of data into a single row using aggregate functions.



Lim said...

This is a good article, I currently use something similar to the 'Generic Function using Ref Cursor' solution.

But now that my eyes have been opened to other possiblities which is faster/better? or is there no difference?

Tim... said...

I suggest trying them all and picking the one you prefer. It's worth doing some peformance tests also.

I prefer the user-defined aggregate function, but that just me preference, it's not the law :)

Cheers

Tim...


Moore said...

Thanks,
This solves the problem I am having now with a query select my client wants

Laurent Schneider said...

I thought you could be intersted in my solution in 10gR2 with xquery

http://laurentschneider.blogspot.com/2005/09/stragg-in-10gr2.html


Petr Novotny said...

Tried with 9.2.0.5. With a small EMP table of 10 entries, it worked fine. However, when I added 1 million entries into EMP, both the string_agg() method and the row_number()/sys_path_connect_by() methods crash the shared server (alert log says "found dead shared server", client says "ORA-03113").

Opened on MetaLink, let's see if it can be fixed.

Petr Novotny said...

I just don't want to leave here unfinished threads. The problem I mentioned turned out to be a case of a known (non-)bug 1539879. The CONNECT BY query is simply too TEMP-heavy and my TEMP tablespace was not set up safely.

Mikhail Waxman said...

Very nice article! For a long time I tried to find better solution string aggegation. I decide immediatly use it. SPECIFIC FUNCTION I'm using for a long time. But this is not I want. More flexible GENERIC FUNCTION USING REF CURSOR. So I made this work after 3 hours. I'm working with ORACLE9. There are 2 problem I discover:
1.JOIN ON doesn't work properly in CURSOR(SELECT...FROM table1 JOIN table2 ON...). So instead of JOIN i use ORACLE7 rules for join tables.
2. In the function CONCATENATE_LIST missing CLOSE p_cursor statement after END LOOP;. Without fixing this problem query ORACLE complain TOO MANY OPENED CURSORS.
But anyway think You for nice article.

aketi said...

anoter SQL

create table KeyList(
Key char(3),
Val char(2),
primary key (Key,Val));

insert into KeyList values('AAA','12');
insert into KeyList values('AAA','2A');
insert into KeyList values('AAA','32');
insert into KeyList values('AAA','44');
insert into KeyList values('BBB','11');
insert into KeyList values('BBB','33');
insert into KeyList values('BBB','8S');
commit;

col ConcatVal for a20

select Key,
substr(
replace(
replace(XMLAgg(XMLElement("dummy",Val) order by Val),''),
'',',')
,2) as ConcatVal
from KeyList
group by Key;


select Key,max(SubStr(sys_connect_by_path(Val,','),2)) as ConcatVal
from (select Key,Val,
Lag(RowID) over(partition by Key order by Val) as LagRowID
from KeyList)
start with LagRowID is null
connect by Prior RowID = LagRowID
group by Key
order by Key;


http://forums.oracle.com/forums/thread.jspa?messageID=1111377

Guy said...

Hi,
I' ve been compiling the string_agg function and the associated types.
It works great.
Thank you very much
Regards
Guy

Josh said...

Just so you know, in the STRING_AGG example, even though you have declared g_string to be 32K, you have also declared the return type to be VARCHAR2. If your aggregation ever exceeds 4K, you'll get ORA-06502 PL/SQL: numeric or value error: character string buffer too small. Also, because g_string is (potentially) so big, if you use STRING_AGG more than twice in the same select clause, you might (depending on the db block size) run into error ORA-01467: sort key too long.

The only way I can see to allow strings larger than 4K is to declare the returnValue parameter of ODCIAggregateTerminate and the return type of STRING_AGG to be CLOB.

Daniel said...

I had to add
CLOSE p_cursor;
right before the return statement of the concatenate_list function to avoid 'too many open cursors' error messages.

Sam said...

Anyone know of a way to avoid the sort key too long error when using string_agg more than twice in the same query (8kb block size, 9i)?

Sam said...

To answer my own question, changing g_string to:
g_string varchar2(4000);
seems to do the trick.

Duke Ganote said...

I'm playing with the MODEL clause:

select deptno, csv
from (
select *
from
( select deptno, empno
, row_number()
OVER (order by deptno, empno ) as r#
, row_number()
OVER (partition by deptno order by empno ) as p#
from scott.emp )
model
dimension by ( r# )
measures ( empno, deptno, p#
, cast( null as varchar2(150) ) as csv )
rules automatic order
(
CSV[r#] =
CASE WHEN csv[cv()+1] IS NULL -- end of table
OR deptno[cv()+1] <> deptno[cv()] -- end of partition
THEN TO_CHAR(empno[cv()])
ELSE csv[cv()+1]||','||empno[cv()]
END
)
)
where p# = 1
/
DEPTNO CSV
---------- -------------------------------------
10 7934,7839,7782
20 7902,7876,7788,7566,7369
30 7900,7844,7698,7654,7521,7499

http://tinyurl.com/2c92qo

Richie said...

Hi,

does anyone know how to write the same sql using ROW_NUMBER() and SYS_CONNECT_BY_PATH functions.... but when I have 3 columns/fields..
so like, Deptno, ename and ecountry ?

mik said...

The fourth example can be simplified to:

SELECT deptno,
LTRIM(MAX(SYS_CONNECT_BY_PATH(ename,','))
KEEP (DENSE_RANK LAST ORDER BY curr),',') AS employees
FROM (SELECT deptno,
ename,
ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ename) AS curr
FROM emp)
GROUP BY deptno
CONNECT BY curr - 1 = PRIOR curr AND deptno = PRIOR deptno
START WITH curr = 1;

mik said...

... and with model clause it can be done like this:

select deptno, csv employees from
(
select *
from
( select deptno, ename
, row_number() OVER (partition by deptno order by ename) as r#
from emp )
model
partition by ( deptno )
dimension by ( r# )
measures ( ename, cast( null as varchar2(4000) ) csv )
rules automatic order
(
csv[r#] =
case when csv[cv()+1] is null
then ename[cv()]
else ename[cv()]||','||csv[cv()+1]
end
)
)
where r# = 1
order by deptno
/
DEPTNO EMPLOYEES
------ ------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

mik said...

... or like this:

select deptno, csv employees from
(
select *
from
( select deptno, ename
, row_number() over (partition by deptno order by ename) as r#
, count(*) over (partition by deptno) as c#
from emp )
model
partition by ( deptno )
dimension by ( r# )
measures ( ename, c#, cast( null as varchar2(4000) ) csv )
rules
(
csv[1] = ename[cv()],
csv[r#>1] = csv[cv()-1]||','||ename[cv()]
)
)
where r# = c#
order by deptno
/
DEPTNO EMPLOYEES
------ ------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

Gordon said...

I really like the query using the CONNECT clauses, since I often do not have write access in databases.

However, Oracle documentation for the CONNECT clause specifically says that it should not be used with the GROUP BY clause.

Does anyone know if the documentation is being too restrictive? Or, does using CONNECT with the GROUP BY cause poor performance? Or, is there sometimes the possibility of a database error in this case?

Just as a note . . . I am getting lousy performance on largish data and when playing around with the original query, I did get Oracle database errors.





Laurent Schneider said...

fresh solution with XSLT
http://laurentschneider.com/wordpress/2008/12/old-fashion-listagg.html

Riikou said...

Congratulations and Thank you

OT said...

The string_agg functions works perfectly! Thanks a lot!

DanC said...

Here is a modified STRING_AGG that does a distinct and sort on the results.

CREATE OR REPLACE TYPE t_string_agg2 AS OBJECT
(
g_resultlist SYS.ODCIVARCHAR2LIST,
g_string VARCHAR2(32767),

STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT t_string_agg2)
RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateIterate(self IN OUT t_string_agg2,
value IN VARCHAR2 )
RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateTerminate(self IN t_string_agg2,
returnValue OUT VARCHAR2,
flags IN NUMBER)
RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateMerge(self IN OUT t_string_agg2,
ctx2 IN t_string_agg2)
RETURN NUMBER
);


CREATE OR REPLACE TYPE BODY t_string_agg2 IS
STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT t_string_agg2)
RETURN NUMBER IS
BEGIN
sctx := t_string_agg2(SYS.ODCIVARCHAR2LIST(NULL),NULL);
RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateIterate(self IN OUT t_string_agg2,
value IN VARCHAR2 )
RETURN NUMBER IS
BEGIN
SELF.g_resultlist.EXTEND;
SELF.g_resultlist(g_resultlist.LAST) := value;
SELF.g_string := self.g_string || ',' || value;
RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateTerminate(self IN t_string_agg2,
returnValue OUT VARCHAR2,
flags IN NUMBER)
RETURN NUMBER IS
type tt_varchar2_list is table of varchar2(32000) index by varchar2(32000);
t_varchar2_list tt_varchar2_list;
id2 pls_integer;
BEGIN
returnvalue := NULL;

for i in self.g_resultlist.FIRST..self.g_resultlist.LAST loop
if self.g_resultlist(i) is not null then
t_varchar2_list(self.g_resultlist(i)) := self.g_resultlist(i);
end if;
end loop;

id2 := t_varchar2_list.FIRST;
WHILE id2 IS NOT NULL LOOP
returnvalue := returnvalue || ',' || t_varchar2_list(id2);
id2 := t_varchar2_list.NEXT(id2);
END LOOP;

returnValue := RTRIM(LTRIM(returnValue, ','), ',');

RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateMerge(self IN OUT t_string_agg2,
ctx2 IN t_string_agg2)
RETURN NUMBER IS
BEGIN
SELF.g_string := SELF.g_string || ',' || ctx2.g_string;
RETURN ODCIConst.Success;
END;
END;

CREATE OR REPLACE FUNCTION string_agg_sort (p_input VARCHAR2)
RETURN VARCHAR2
PARALLEL_ENABLE AGGREGATE USING t_string_agg2;

NItin Roman said...

Excellent Article !!!

Solved problem of row to column display.

andrey said...

The link to asktom is broken.

Tim... said...

Hi.

Fixed now. Thanks for the heads-up.

Cheers

Tim...

Frank Kulash said...

As clearly stated, the ROWN_NUMBER/SYS_CONNECT_BY_PATH solution is for Oracle 9. In Oracle 10 (and up) it can be simplified using the CONNECT_BY_ISLEAF pseudo-column:
[pre]
SELECT deptno,
LTRIM (SYS_CONNECT_BY_PATH(ename,','), ',') AS employees
FROM (SELECT deptno,
ename,
ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ename) AS curr
FROM emp)
WHERE CONNECT_BY_ISLEAF = 1
CONNECT BY CURR - 1 = PRIOR curr AND deptno = PRIOR deptno
START WITH curr = 1;
[/pre]

Kevin said...

Hai

This is a good post.
I used the COLLECT function in 10g for aggregation. It does the right thing but includes duplicates of the column data as well. How do I eliminate the duplicates??

Thanks

Tim... said...

Hi.

The method itself doesn't introduce duplicates, so your query must be supplying duplicates. You should alter your query to remove duplicates, which will then be removed from the concatenated list.

Note. It's a better idea to use the LISTAGG or WM_CONACAT functions really.

Cheers

Tim...

Theo said...

Thank you! Was looking for something like this for days. Oracle's COLLECT function has troublesome like hell! Since I'm only using 10g2, WM_CONCAT was the way forward. Thanks.

romar said...

Great article. Thanks a lot!

moutaye said...

Brilliant article. Many Thanks

OracleUser said...

Thank you.

Kim Berg Hansen said...

Very handy to have a compilation of string agg techniques - thanks :-)

On the Oracle technet forums I have seen another nice way from Solomon Yakobson:

SELECT
deptno,
rtrim(xmlagg(xmlelement(e,ename,',').extract('//text()') order by ename),',') AS employees
FROM scott.emp
GROUP BY deptno;

Drew said...

I used the LISTAGG and that worked perfectly. Thank You so much

Dan J said...

For large data sets, I found the hierarchical or cursor queries don't cut it. Stuck in 11gR1 (no listagg), I found that wm_concat can be used as an analytic function to sort the elements, courtesy of Dagon - http://www.tek-tips.com/viewthread.cfm?qid=1629662

Norm said...

Excellent article! Thanks!

Sunil said...

I want to AVERAGE the each column in given string. please suggest which change I have to make in above code.

Tim... said...

Hi.

I don't understand what you are asking for. Average of strings?

Please ask the question in the forum on this site and post the relevant CREATE TABLE and INSERT statements to build an example test case.

Cheers

Tim...

evald said...

big problems when you have huge text...

Sasha said...

Very nice! Thank you.

Lauri said...

Very good post Tim!!

Vijay Jadhav said...

Thank you very much. It helped me.
Prepared Statement Error: Table './oraclebasecms/cms_page_comment_uuids' is marked as crashed and should be repaired