8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
APEX_DATA_PARSER : Convert simple CSV, JSON, XML and XLSX data to rows and columns.
The APEX_DATA_PARSER
package allows you to easily convert simple CSV, JSON, XML and XLSX formatted data into rows and columns from SQL. You must have APEX 19.1 or later installed in the database for this package to be available, but you can use the package independently of APEX.
Related articles.
Introduction
The APEX_DATA_PARSER
package makes it easy to display CSV, JSON, XML and XLSX formatted data as rows and columns from SQL. It's focused on basic tabular data, not complex document structures, so don't expect this to be a generic parser you will use for all documents.
The package contains a number of routines, but you will mostly use the PARSE
pipelined table function, which returns rows of type WWV_FLOW_T_PARSER_ROW
. There is a LINE_NUMBER
column, then columns named COL001
to COL300
, so there is a maximum of 300 columns in the data.
Setup
The examples in this article use files placed in a directory on the database server. The files are accessed using an Oracle directory object called TEMP_DIR
. We create a directory object and grant access to is from our test user.
CREATE OR REPLACE DIRECTORY TEMP_DIR AS '/tmp/'; GRANT READ, WRITE ON DIRECTORY TEMP_DIR TO testuser1;
The following files are present in the "/tmp" directory on the database server. Their contents are dumps of the data in the EMP
table, in the specified format.
The examples below will load the data from these files into a BLOB using the following function.
CREATE OR REPLACE FUNCTION file_to_blob (p_dir IN VARCHAR2, p_filename IN VARCHAR2) RETURN BLOB AS l_bfile BFILE; l_blob BLOB; l_dest_offset INTEGER := 1; l_src_offset INTEGER := 1; BEGIN l_bfile := BFILENAME(p_dir, p_filename); DBMS_LOB.fileopen(l_bfile, DBMS_LOB.file_readonly); DBMS_LOB.createtemporary(l_blob, FALSE); IF DBMS_LOB.getlength(l_bfile) > 0 THEN DBMS_LOB.loadblobfromfile ( dest_lob => l_blob, src_bfile => l_bfile, amount => DBMS_LOB.lobmaxsize, dest_offset => l_dest_offset, src_offset => l_src_offset); END IF; DBMS_LOB.fileclose(l_bfile); RETURN l_blob; END file_to_blob; /
The "emp.csv" file contains the following data.
"EMPNO","ENAME","JOB","MGR","HIREDATE","SAL","COMM","DEPTNO" 7369,"SMITH","CLERK",7902,17-DEC-80,800,,20 7499,"ALLEN","SALESMAN",7698,20-FEB-81,1600,300,30 7521,"WARD","SALESMAN",7698,22-FEB-81,1250,500,30 7566,"JONES","MANAGER",7839,02-APR-81,2975,,20 7654,"MARTIN","SALESMAN",7698,28-SEP-81,1250,1400,30 7698,"BLAKE","MANAGER",7839,01-MAY-81,2850,,30 7782,"CLARK","MANAGER",7839,09-JUN-81,2450,,10 7788,"SCOTT","ANALYST",7566,19-APR-87,3000,,20 7839,"KING","PRESIDENT",,17-NOV-81,5000,,10 7844,"TURNER","SALESMAN",7698,08-SEP-81,1500,0,30 7876,"ADAMS","CLERK",7788,23-MAY-87,1100,,20 7900,"JAMES","CLERK",7698,03-DEC-81,950,,30 7902,"FORD","ANALYST",7566,03-DEC-81,3000,,20 7934,"MILLER","CLERK",7782,23-JAN-82,1300,,10
The "emp.json" file contains the following data.
{"results":[{"columns":[{"name":"EMPNO","type":"NUMBER"},{"name":"ENAME","type":"VARCHAR2"},{"name":"JOB","type":"VARCHAR2"},{"name":"MGR","type":"NUMBER"},{"name":"HIREDATE","type":"DATE"},{"name":"SAL","type":"NUMBER"},{"name":"COMM","type":"NUMBER"},{"name":"DEPTNO","type":"NUMBER"}], "items": [ {"empno":7369,"ename":"SMITH","job":"CLERK","mgr":7902,"hiredate":"17-DEC-80","sal":800,"deptno":20} ,{"empno":7499,"ename":"ALLEN","job":"SALESMAN","mgr":7698,"hiredate":"20-FEB-81","sal":1600,"comm":300,"deptno":30} ,{"empno":7521,"ename":"WARD","job":"SALESMAN","mgr":7698,"hiredate":"22-FEB-81","sal":1250,"comm":500,"deptno":30} ,{"empno":7566,"ename":"JONES","job":"MANAGER","mgr":7839,"hiredate":"02-APR-81","sal":2975,"deptno":20} ,{"empno":7654,"ename":"MARTIN","job":"SALESMAN","mgr":7698,"hiredate":"28-SEP-81","sal":1250,"comm":1400,"deptno":30} ,{"empno":7698,"ename":"BLAKE","job":"MANAGER","mgr":7839,"hiredate":"01-MAY-81","sal":2850,"deptno":30} ,{"empno":7782,"ename":"CLARK","job":"MANAGER","mgr":7839,"hiredate":"09-JUN-81","sal":2450,"deptno":10} ,{"empno":7788,"ename":"SCOTT","job":"ANALYST","mgr":7566,"hiredate":"19-APR-87","sal":3000,"deptno":20} ,{"empno":7839,"ename":"KING","job":"PRESIDENT","hiredate":"17-NOV-81","sal":5000,"deptno":10} ,{"empno":7844,"ename":"TURNER","job":"SALESMAN","mgr":7698,"hiredate":"08-SEP-81","sal":1500,"comm":0,"deptno":30} ,{"empno":7876,"ename":"ADAMS","job":"CLERK","mgr":7788,"hiredate":"23-MAY-87","sal":1100,"deptno":20} ,{"empno":7900,"ename":"JAMES","job":"CLERK","mgr":7698,"hiredate":"03-DEC-81","sal":950,"deptno":30} ,{"empno":7902,"ename":"FORD","job":"ANALYST","mgr":7566,"hiredate":"03-DEC-81","sal":3000,"deptno":20} ,{"empno":7934,"ename":"MILLER","job":"CLERK","mgr":7782,"hiredate":"23-JAN-82","sal":1300,"deptno":10} ]}]}
The "emp.xml" file contains the following data.
<?xml version='1.0' encoding='UTF-8' ?> <RESULTS> <ROW><EMPNO>7369</EMPNO><ENAME>SMITH</ENAME><JOB>CLERK</JOB><MGR>7902</MGR><HIREDATE>17-DEC-80</HIREDATE><SAL>800</SAL><COMM></COMM><DEPTNO>20</DEPTNO></ROW> <ROW><EMPNO>7499</EMPNO><ENAME>ALLEN</ENAME><JOB>SALESMAN</JOB><MGR>7698</MGR><HIREDATE>20-FEB-81</HIREDATE><SAL>1600</SAL><COMM>300</COMM><DEPTNO>30</DEPTNO></ROW> <ROW><EMPNO>7521</EMPNO><ENAME>WARD</ENAME><JOB>SALESMAN</JOB><MGR>7698</MGR><HIREDATE>22-FEB-81</HIREDATE><SAL>1250</SAL><COMM>500</COMM><DEPTNO>30</DEPTNO></ROW> <ROW><EMPNO>7566</EMPNO><ENAME>JONES</ENAME><JOB>MANAGER</JOB><MGR>7839</MGR><HIREDATE>02-APR-81</HIREDATE><SAL>2975</SAL><COMM></COMM><DEPTNO>20</DEPTNO></ROW> <ROW><EMPNO>7654</EMPNO><ENAME>MARTIN</ENAME><JOB>SALESMAN</JOB><MGR>7698</MGR><HIREDATE>28-SEP-81</HIREDATE><SAL>1250</SAL><COMM>1400</COMM><DEPTNO>30</DEPTNO></ROW> <ROW><EMPNO>7698</EMPNO><ENAME>BLAKE</ENAME><JOB>MANAGER</JOB><MGR>7839</MGR><HIREDATE>01-MAY-81</HIREDATE><SAL>2850</SAL><COMM></COMM><DEPTNO>30</DEPTNO></ROW> <ROW><EMPNO>7782</EMPNO><ENAME>CLARK</ENAME><JOB>MANAGER</JOB><MGR>7839</MGR><HIREDATE>09-JUN-81</HIREDATE><SAL>2450</SAL><COMM></COMM><DEPTNO>10</DEPTNO></ROW> <ROW><EMPNO>7788</EMPNO><ENAME>SCOTT</ENAME><JOB>ANALYST</JOB><MGR>7566</MGR><HIREDATE>19-APR-87</HIREDATE><SAL>3000</SAL><COMM></COMM><DEPTNO>20</DEPTNO></ROW> <ROW><EMPNO>7839</EMPNO><ENAME>KING</ENAME><JOB>PRESIDENT</JOB><MGR></MGR><HIREDATE>17-NOV-81</HIREDATE><SAL>5000</SAL><COMM></COMM><DEPTNO>10</DEPTNO></ROW> <ROW><EMPNO>7844</EMPNO><ENAME>TURNER</ENAME><JOB>SALESMAN</JOB><MGR>7698</MGR><HIREDATE>08-SEP-81</HIREDATE><SAL>1500</SAL><COMM>0</COMM><DEPTNO>30</DEPTNO></ROW> <ROW><EMPNO>7876</EMPNO><ENAME>ADAMS</ENAME><JOB>CLERK</JOB><MGR>7788</MGR><HIREDATE>23-MAY-87</HIREDATE><SAL>1100</SAL><COMM></COMM><DEPTNO>20</DEPTNO></ROW> <ROW><EMPNO>7900</EMPNO><ENAME>JAMES</ENAME><JOB>CLERK</JOB><MGR>7698</MGR><HIREDATE>03-DEC-81</HIREDATE><SAL>950</SAL><COMM></COMM><DEPTNO>30</DEPTNO></ROW> <ROW><EMPNO>7902</EMPNO><ENAME>FORD</ENAME><JOB>ANALYST</JOB><MGR>7566</MGR><HIREDATE>03-DEC-81</HIREDATE><SAL>3000</SAL><COMM></COMM><DEPTNO>20</DEPTNO></ROW> <ROW><EMPNO>7934</EMPNO><ENAME>MILLER</ENAME><JOB>CLERK</JOB><MGR>7782</MGR><HIREDATE>23-JAN-82</HIREDATE><SAL>1300</SAL><COMM></COMM><DEPTNO>10</DEPTNO></ROW> </RESULTS>
The "emp.xslx" file contains the CSV data from the "emp.scv" loaded into Excel and saved as an XSLX file. There is only a single worksheet, and it's called "emp sheet".
All the queries below use the following column formats in SQLcl/SQL*Plus unless otherwise stated.
COLUMN COL001 FORMAT A10 COLUMN COL002 FORMAT A10 COLUMN COL003 FORMAT A10 COLUMN COL004 FORMAT A10 COLUMN COL005 FORMAT A10 COLUMN COL006 FORMAT A10 COLUMN COL007 FORMAT A10 COLUMN COL008 FORMAT A10 SET LINESIZE 110
CSV Data
In the following example we use the FILE_TO_BLOB
function to load the contents of the "emp.csv" file into a BLOB, so we can pass it into the PARSE
function as the P_CONTENT
parameter. We also pass in the file name as the P_FILE_NAME
parameter. The type of data to load can be inferred from the file extension. Alternatively we could use the P_FILE_TYPE
or P_FILE_PROFILE
parameters to indicate the type of data, but using the file name is simple. The select list includes the line number and the first eight columns.
SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008 FROM TABLE( APEX_DATA_PARSER.parse( p_content => file_to_blob('TEMP_DIR', 'emp.csv'), p_file_name => 'emp.csv' ) ); LINE_NUMBER COL001 COL002 COL003 COL004 COL005 COL006 COL007 COL008 ----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- 1 EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO 2 7369 SMITH CLERK 7902 17-DEC-80 800 20 3 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 4 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 5 7566 JONES MANAGER 7839 02-APR-81 2975 20 6 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 7 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 8 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 9 7788 SCOTT ANALYST 7566 19-APR-87 3000 20 10 7839 KING PRESIDENT 17-NOV-81 5000 10 11 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 12 7876 ADAMS CLERK 7788 23-MAY-87 1100 20 13 7900 JAMES CLERK 7698 03-DEC-81 950 30 14 7902 FORD ANALYST 7566 03-DEC-81 3000 20 15 7934 MILLER CLERK 7782 23-JAN-82 1300 10 15 rows selected. SQL>
For CSV data, by default the first row contains the column headers, then we get the 14 rows of data we would expect. We could remove the column headers row by adding the "P_ADD_HEADERS_ROW => 'N'"
parameter into the function call.
JSON Data
In the following example we use the "emp.json" file. It's similar to the CSV example, but we need to use the P_ROW_SELECTOR
parameter to show where the array of row data can be found, which in this case is the "results.items" item. For JSON data, the column names aren't included in the output by default, so we use the "P_ADD_HEADERS_ROW => 'Y'"
parameter to include them.
SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008 FROM TABLE( APEX_DATA_PARSER.parse( p_content => file_to_blob('TEMP_DIR', 'emp.json'), p_file_name => 'emp.json', p_row_selector => 'results.items', p_add_headers_row => 'Y' ) ); LINE_NUMBER COL001 COL002 COL003 COL004 COL005 COL006 COL007 COL008 ----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- JOB MGR SAL COMM EMPNO ENAME DEPTNO HIREDATE 1 CLERK 7902 800 7369 SMITH 20 17-DEC-80 2 SALESMAN 7698 1600 300 7499 ALLEN 30 20-FEB-81 3 SALESMAN 7698 1250 500 7521 WARD 30 22-FEB-81 4 MANAGER 7839 2975 7566 JONES 20 02-APR-81 5 SALESMAN 7698 1250 1400 7654 MARTIN 30 28-SEP-81 6 MANAGER 7839 2850 7698 BLAKE 30 01-MAY-81 7 MANAGER 7839 2450 7782 CLARK 10 09-JUN-81 8 ANALYST 7566 3000 7788 SCOTT 20 19-APR-87 9 PRESIDENT 5000 7839 KING 10 17-NOV-81 10 SALESMAN 7698 1500 0 7844 TURNER 30 08-SEP-81 11 CLERK 7788 1100 7876 ADAMS 20 23-MAY-87 12 CLERK 7698 950 7900 JAMES 30 03-DEC-81 13 ANALYST 7566 3000 7902 FORD 20 03-DEC-81 14 CLERK 7782 1300 7934 MILLER 10 23-JAN-82 15 rows selected. SQL>
Notice the column order is not what we might have expected.
XML Data
In the following example we use the "emp.xml" file. It's similar to the CSV example, but we need to use the P_ROW_SELECTOR
parameter show where the array of row data can be found, which in this case is the "/RESULTS/ROW" tag. For XML data, column names aren't included in the output by default, so we use the "P_ADD_HEADERS_ROW => 'Y'"
parameter to include them.
SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008 FROM TABLE( APEX_DATA_PARSER.parse( p_content => file_to_blob('TEMP_DIR', 'emp.xml'), p_file_name => 'emp.xml', p_row_selector => '/RESULTS/ROW', p_add_headers_row => 'Y' ) ); LINE_NUMBER COL001 COL002 COL003 COL004 COL005 COL006 COL007 COL008 ----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO 1 7369 SMITH CLERK 7902 17-DEC-80 800 20 2 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 3 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 4 7566 JONES MANAGER 7839 02-APR-81 2975 20 5 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 6 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 7 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 8 7788 SCOTT ANALYST 7566 19-APR-87 3000 20 9 7839 KING PRESIDENT 17-NOV-81 5000 10 10 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 11 7876 ADAMS CLERK 7788 23-MAY-87 1100 20 12 7900 JAMES CLERK 7698 03-DEC-81 950 30 13 7902 FORD ANALYST 7566 03-DEC-81 3000 20 14 7934 MILLER CLERK 7782 23-JAN-82 1300 10 15 rows selected. SQL>
XLSX Data
In the following example we use the "emp.xslx" file. If we are interested in the first worksheet, we can keep things really simple and use it in a similar way to the CSV example.
SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008 FROM TABLE( APEX_DATA_PARSER.parse( p_content => file_to_blob('TEMP_DIR', 'emp.xlsx'), p_file_name => 'emp.xlsx' ) ); LINE_NUMBER COL001 COL002 COL003 COL004 COL005 COL006 COL007 COL007 COL008 ----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- 1 EMPNO ENAME JOB MGR HIREDATE SAL COMM COMM DEPTNO 2 7369 SMITH CLERK 7902 1980-12-17 800 20 3 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 300 30 4 7521 WARD SALESMAN 7698 1981-02-22 1250 500 500 30 5 7566 JONES MANAGER 7839 1981-04-02 2975 20 6 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 1400 30 7 7698 BLAKE MANAGER 7839 1981-05-01 2850 30 8 7782 CLARK MANAGER 7839 1981-06-09 2450 10 9 7788 SCOTT ANALYST 7566 1987-04-19 3000 20 10 7839 KING PRESIDENT 1981-11-17 5000 10 11 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 0 30 12 7876 ADAMS CLERK 7788 1987-05-23 1100 20 13 7900 JAMES CLERK 7698 1981-12-03 950 30 14 7902 FORD ANALYST 7566 1981-12-03 3000 20 15 7934 MILLER CLERK 7782 1982-01-23 1300 10 15 rows selected. SQL>
If there are multiple worksheets, we use the P_XLSX_SHEET_NAME
parameter to decide which sheet we want to use. We list the worksheets in the document using the GET_XLSX_WORKSHEETS
function.
COLUMN sheet_display_name FORMAT A30 COLUMN sheet_file_name FORMAT A30 SELECT sheet_display_name, sheet_file_name FROM TABLE( APEX_DATA_PARSER.get_xlsx_worksheets( p_content => file_to_blob('TEMP_DIR', 'emp.xlsx') ) ); SHEET_DISPLAY_NAME SHEET_FILE_NAME ------------------------------ ------------------------------ emp sheet sheet1.xml SQL>
We then use the SHEET_FILE_NAME
value in the parse call.
SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008 FROM TABLE( APEX_DATA_PARSER.parse( p_content => file_to_blob('TEMP_DIR', 'emp.xlsx'), p_file_name => 'emp.xlsx', p_xlsx_sheet_name => 'sheet1.xml' ) ); LINE_NUMBER COL001 COL002 COL003 COL004 COL005 COL006 COL007 COL007 COL008 ----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- 1 EMPNO ENAME JOB MGR HIREDATE SAL COMM COMM DEPTNO 2 7369 SMITH CLERK 7902 1980-12-17 800 20 3 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 300 30 4 7521 WARD SALESMAN 7698 1981-02-22 1250 500 500 30 5 7566 JONES MANAGER 7839 1981-04-02 2975 20 6 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 1400 30 7 7698 BLAKE MANAGER 7839 1981-05-01 2850 30 8 7782 CLARK MANAGER 7839 1981-06-09 2450 10 9 7788 SCOTT ANALYST 7566 1987-04-19 3000 20 10 7839 KING PRESIDENT 1981-11-17 5000 10 11 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 0 30 12 7876 ADAMS CLERK 7788 1987-05-23 1100 20 13 7900 JAMES CLERK 7698 1981-12-03 950 30 14 7902 FORD ANALYST 7566 1981-12-03 3000 20 15 7934 MILLER CLERK 7782 1982-01-23 1300 10 15 rows selected. SQL>
For more information see:
Hope this helps. Regards Tim...