8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Home » Articles » Misc » Here

APEX_DATA_PARSER : Convert simple CSV, JSON, XML and XLSX data to rows and columns.

The APEX_DATA_PARSER package allows you to easily convert simple CSV, JSON, XML and XLSX formatted data into rows and columns from SQL. You must have APEX 19.1 or later installed in the database for this package to be available, but you can use the package independently of APEX.

Related articles.

Introduction

The APEX_DATA_PARSER package makes it easy to display CSV, JSON, XML and XLSX formatted data as rows and columns from SQL. It's focused on basic tabular data, not complex document structures, so don't expect this to be a generic parser you will use for all documents.

The package contains a number of routines, but you will mostly use the PARSE pipelined table function, which returns rows of type WWV_FLOW_T_PARSER_ROW. There is a LINE_NUMBER column, then columns named COL001 to COL300, so there is a maximum of 300 columns in the data.

Setup

The examples in this article use files placed in a directory on the database server. The files are accessed using an Oracle directory object called TEMP_DIR. We create a directory object and grant access to is from our test user.

CREATE OR REPLACE DIRECTORY TEMP_DIR AS '/tmp/';
GRANT READ, WRITE ON DIRECTORY TEMP_DIR TO testuser1;

The following files are present in the "/tmp" directory on the database server. Their contents are dumps of the data in the EMP table, in the specified format.

The examples below will load the data from these files into a BLOB using the following function.

CREATE OR REPLACE FUNCTION file_to_blob (p_dir       IN  VARCHAR2,
                                         p_filename  IN  VARCHAR2)
  RETURN BLOB
AS
  l_bfile  BFILE;
  l_blob   BLOB;

  l_dest_offset INTEGER := 1;
  l_src_offset  INTEGER := 1;
BEGIN
  l_bfile := BFILENAME(p_dir, p_filename);
  DBMS_LOB.fileopen(l_bfile, DBMS_LOB.file_readonly);
  DBMS_LOB.createtemporary(l_blob, FALSE);
  IF DBMS_LOB.getlength(l_bfile) > 0 THEN
    DBMS_LOB.loadblobfromfile (
      dest_lob    => l_blob,
      src_bfile   => l_bfile,
      amount      => DBMS_LOB.lobmaxsize,
      dest_offset => l_dest_offset,
      src_offset  => l_src_offset);
  END IF;
  DBMS_LOB.fileclose(l_bfile);
  RETURN l_blob;
END file_to_blob;
/

The "emp.csv" file contains the following data.

"EMPNO","ENAME","JOB","MGR","HIREDATE","SAL","COMM","DEPTNO"
7369,"SMITH","CLERK",7902,17-DEC-80,800,,20
7499,"ALLEN","SALESMAN",7698,20-FEB-81,1600,300,30
7521,"WARD","SALESMAN",7698,22-FEB-81,1250,500,30
7566,"JONES","MANAGER",7839,02-APR-81,2975,,20
7654,"MARTIN","SALESMAN",7698,28-SEP-81,1250,1400,30
7698,"BLAKE","MANAGER",7839,01-MAY-81,2850,,30
7782,"CLARK","MANAGER",7839,09-JUN-81,2450,,10
7788,"SCOTT","ANALYST",7566,19-APR-87,3000,,20
7839,"KING","PRESIDENT",,17-NOV-81,5000,,10
7844,"TURNER","SALESMAN",7698,08-SEP-81,1500,0,30
7876,"ADAMS","CLERK",7788,23-MAY-87,1100,,20
7900,"JAMES","CLERK",7698,03-DEC-81,950,,30
7902,"FORD","ANALYST",7566,03-DEC-81,3000,,20
7934,"MILLER","CLERK",7782,23-JAN-82,1300,,10

The "emp.json" file contains the following data.

{"results":[{"columns":[{"name":"EMPNO","type":"NUMBER"},{"name":"ENAME","type":"VARCHAR2"},{"name":"JOB","type":"VARCHAR2"},{"name":"MGR","type":"NUMBER"},{"name":"HIREDATE","type":"DATE"},{"name":"SAL","type":"NUMBER"},{"name":"COMM","type":"NUMBER"},{"name":"DEPTNO","type":"NUMBER"}],
"items": [
{"empno":7369,"ename":"SMITH","job":"CLERK","mgr":7902,"hiredate":"17-DEC-80","sal":800,"deptno":20}
,{"empno":7499,"ename":"ALLEN","job":"SALESMAN","mgr":7698,"hiredate":"20-FEB-81","sal":1600,"comm":300,"deptno":30}
,{"empno":7521,"ename":"WARD","job":"SALESMAN","mgr":7698,"hiredate":"22-FEB-81","sal":1250,"comm":500,"deptno":30}
,{"empno":7566,"ename":"JONES","job":"MANAGER","mgr":7839,"hiredate":"02-APR-81","sal":2975,"deptno":20}
,{"empno":7654,"ename":"MARTIN","job":"SALESMAN","mgr":7698,"hiredate":"28-SEP-81","sal":1250,"comm":1400,"deptno":30}
,{"empno":7698,"ename":"BLAKE","job":"MANAGER","mgr":7839,"hiredate":"01-MAY-81","sal":2850,"deptno":30}
,{"empno":7782,"ename":"CLARK","job":"MANAGER","mgr":7839,"hiredate":"09-JUN-81","sal":2450,"deptno":10}
,{"empno":7788,"ename":"SCOTT","job":"ANALYST","mgr":7566,"hiredate":"19-APR-87","sal":3000,"deptno":20}
,{"empno":7839,"ename":"KING","job":"PRESIDENT","hiredate":"17-NOV-81","sal":5000,"deptno":10}
,{"empno":7844,"ename":"TURNER","job":"SALESMAN","mgr":7698,"hiredate":"08-SEP-81","sal":1500,"comm":0,"deptno":30}
,{"empno":7876,"ename":"ADAMS","job":"CLERK","mgr":7788,"hiredate":"23-MAY-87","sal":1100,"deptno":20}
,{"empno":7900,"ename":"JAMES","job":"CLERK","mgr":7698,"hiredate":"03-DEC-81","sal":950,"deptno":30}
,{"empno":7902,"ename":"FORD","job":"ANALYST","mgr":7566,"hiredate":"03-DEC-81","sal":3000,"deptno":20}
,{"empno":7934,"ename":"MILLER","job":"CLERK","mgr":7782,"hiredate":"23-JAN-82","sal":1300,"deptno":10}
]}]}

The "emp.xml" file contains the following data.

<?xml version='1.0'  encoding='UTF-8' ?>
<RESULTS>
  <ROW><EMPNO>7369</EMPNO><ENAME>SMITH</ENAME><JOB>CLERK</JOB><MGR>7902</MGR><HIREDATE>17-DEC-80</HIREDATE><SAL>800</SAL><COMM></COMM><DEPTNO>20</DEPTNO></ROW>
  <ROW><EMPNO>7499</EMPNO><ENAME>ALLEN</ENAME><JOB>SALESMAN</JOB><MGR>7698</MGR><HIREDATE>20-FEB-81</HIREDATE><SAL>1600</SAL><COMM>300</COMM><DEPTNO>30</DEPTNO></ROW>
  <ROW><EMPNO>7521</EMPNO><ENAME>WARD</ENAME><JOB>SALESMAN</JOB><MGR>7698</MGR><HIREDATE>22-FEB-81</HIREDATE><SAL>1250</SAL><COMM>500</COMM><DEPTNO>30</DEPTNO></ROW>
  <ROW><EMPNO>7566</EMPNO><ENAME>JONES</ENAME><JOB>MANAGER</JOB><MGR>7839</MGR><HIREDATE>02-APR-81</HIREDATE><SAL>2975</SAL><COMM></COMM><DEPTNO>20</DEPTNO></ROW>
  <ROW><EMPNO>7654</EMPNO><ENAME>MARTIN</ENAME><JOB>SALESMAN</JOB><MGR>7698</MGR><HIREDATE>28-SEP-81</HIREDATE><SAL>1250</SAL><COMM>1400</COMM><DEPTNO>30</DEPTNO></ROW>
  <ROW><EMPNO>7698</EMPNO><ENAME>BLAKE</ENAME><JOB>MANAGER</JOB><MGR>7839</MGR><HIREDATE>01-MAY-81</HIREDATE><SAL>2850</SAL><COMM></COMM><DEPTNO>30</DEPTNO></ROW>
  <ROW><EMPNO>7782</EMPNO><ENAME>CLARK</ENAME><JOB>MANAGER</JOB><MGR>7839</MGR><HIREDATE>09-JUN-81</HIREDATE><SAL>2450</SAL><COMM></COMM><DEPTNO>10</DEPTNO></ROW>
  <ROW><EMPNO>7788</EMPNO><ENAME>SCOTT</ENAME><JOB>ANALYST</JOB><MGR>7566</MGR><HIREDATE>19-APR-87</HIREDATE><SAL>3000</SAL><COMM></COMM><DEPTNO>20</DEPTNO></ROW>
  <ROW><EMPNO>7839</EMPNO><ENAME>KING</ENAME><JOB>PRESIDENT</JOB><MGR></MGR><HIREDATE>17-NOV-81</HIREDATE><SAL>5000</SAL><COMM></COMM><DEPTNO>10</DEPTNO></ROW>
  <ROW><EMPNO>7844</EMPNO><ENAME>TURNER</ENAME><JOB>SALESMAN</JOB><MGR>7698</MGR><HIREDATE>08-SEP-81</HIREDATE><SAL>1500</SAL><COMM>0</COMM><DEPTNO>30</DEPTNO></ROW>
  <ROW><EMPNO>7876</EMPNO><ENAME>ADAMS</ENAME><JOB>CLERK</JOB><MGR>7788</MGR><HIREDATE>23-MAY-87</HIREDATE><SAL>1100</SAL><COMM></COMM><DEPTNO>20</DEPTNO></ROW>
  <ROW><EMPNO>7900</EMPNO><ENAME>JAMES</ENAME><JOB>CLERK</JOB><MGR>7698</MGR><HIREDATE>03-DEC-81</HIREDATE><SAL>950</SAL><COMM></COMM><DEPTNO>30</DEPTNO></ROW>
  <ROW><EMPNO>7902</EMPNO><ENAME>FORD</ENAME><JOB>ANALYST</JOB><MGR>7566</MGR><HIREDATE>03-DEC-81</HIREDATE><SAL>3000</SAL><COMM></COMM><DEPTNO>20</DEPTNO></ROW>
  <ROW><EMPNO>7934</EMPNO><ENAME>MILLER</ENAME><JOB>CLERK</JOB><MGR>7782</MGR><HIREDATE>23-JAN-82</HIREDATE><SAL>1300</SAL><COMM></COMM><DEPTNO>10</DEPTNO></ROW>
</RESULTS>

The "emp.xslx" file contains the CSV data from the "emp.scv" loaded into Excel and saved as an XSLX file. There is only a single worksheet, and it's called "emp sheet".

All the queries below use the following column formats in SQLcl/SQL*Plus unless otherwise stated.

COLUMN COL001 FORMAT A10
COLUMN COL002 FORMAT A10
COLUMN COL003 FORMAT A10
COLUMN COL004 FORMAT A10
COLUMN COL005 FORMAT A10
COLUMN COL006 FORMAT A10
COLUMN COL007 FORMAT A10
COLUMN COL008 FORMAT A10
SET LINESIZE 110

CSV Data

In the following example we use the FILE_TO_BLOB function to load the contents of the "emp.csv" file into a BLOB, so we can pass it into the PARSE function as the P_CONTENT parameter. We also pass in the file name as the P_FILE_NAME parameter. The type of data to load can be inferred from the file extension. Alternatively we could use the P_FILE_TYPE or P_FILE_PROFILE parameters to indicate the type of data, but using the file name is simple. The select list includes the line number and the first eight columns.

SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008
FROM   TABLE(
         APEX_DATA_PARSER.parse(
           p_content   => file_to_blob('TEMP_DIR', 'emp.csv'),
           p_file_name => 'emp.csv'
         )
       );

LINE_NUMBER COL001     COL002     COL003     COL004     COL005     COL006     COL007     COL008
----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
          1 EMPNO      ENAME      JOB        MGR        HIREDATE   SAL        COMM       DEPTNO
          2 7369       SMITH      CLERK      7902       17-DEC-80  800                   20
          3 7499       ALLEN      SALESMAN   7698       20-FEB-81  1600       300        30
          4 7521       WARD       SALESMAN   7698       22-FEB-81  1250       500        30
          5 7566       JONES      MANAGER    7839       02-APR-81  2975                  20
          6 7654       MARTIN     SALESMAN   7698       28-SEP-81  1250       1400       30
          7 7698       BLAKE      MANAGER    7839       01-MAY-81  2850                  30
          8 7782       CLARK      MANAGER    7839       09-JUN-81  2450                  10
          9 7788       SCOTT      ANALYST    7566       19-APR-87  3000                  20
         10 7839       KING       PRESIDENT             17-NOV-81  5000                  10
         11 7844       TURNER     SALESMAN   7698       08-SEP-81  1500       0          30
         12 7876       ADAMS      CLERK      7788       23-MAY-87  1100                  20
         13 7900       JAMES      CLERK      7698       03-DEC-81  950                   30
         14 7902       FORD       ANALYST    7566       03-DEC-81  3000                  20
         15 7934       MILLER     CLERK      7782       23-JAN-82  1300                  10

15 rows selected.

SQL>

For CSV data, by default the first row contains the column headers, then we get the 14 rows of data we would expect. We could remove the column headers row by adding the "P_ADD_HEADERS_ROW => 'N'" parameter into the function call.

JSON Data

In the following example we use the "emp.json" file. It's similar to the CSV example, but we need to use the P_ROW_SELECTOR parameter to show where the array of row data can be found, which in this case is the "results.items" item. For JSON data, the column names aren't included in the output by default, so we use the "P_ADD_HEADERS_ROW => 'Y'" parameter to include them.

SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008
FROM   TABLE(
         APEX_DATA_PARSER.parse(
           p_content         => file_to_blob('TEMP_DIR', 'emp.json'),
           p_file_name       => 'emp.json',
           p_row_selector    => 'results.items',
           p_add_headers_row => 'Y'
         )
       );

LINE_NUMBER COL001     COL002     COL003     COL004     COL005     COL006     COL007     COL008
----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
            JOB        MGR        SAL        COMM       EMPNO      ENAME      DEPTNO     HIREDATE
          1 CLERK      7902       800                   7369       SMITH      20         17-DEC-80
          2 SALESMAN   7698       1600       300        7499       ALLEN      30         20-FEB-81
          3 SALESMAN   7698       1250       500        7521       WARD       30         22-FEB-81
          4 MANAGER    7839       2975                  7566       JONES      20         02-APR-81
          5 SALESMAN   7698       1250       1400       7654       MARTIN     30         28-SEP-81
          6 MANAGER    7839       2850                  7698       BLAKE      30         01-MAY-81
          7 MANAGER    7839       2450                  7782       CLARK      10         09-JUN-81
          8 ANALYST    7566       3000                  7788       SCOTT      20         19-APR-87
          9 PRESIDENT             5000                  7839       KING       10         17-NOV-81
         10 SALESMAN   7698       1500       0          7844       TURNER     30         08-SEP-81
         11 CLERK      7788       1100                  7876       ADAMS      20         23-MAY-87
         12 CLERK      7698       950                   7900       JAMES      30         03-DEC-81
         13 ANALYST    7566       3000                  7902       FORD       20         03-DEC-81
         14 CLERK      7782       1300                  7934       MILLER     10         23-JAN-82

15 rows selected.

SQL>

Notice the column order is not what we might have expected.

XML Data

In the following example we use the "emp.xml" file. It's similar to the CSV example, but we need to use the P_ROW_SELECTOR parameter show where the array of row data can be found, which in this case is the "/RESULTS/ROW" tag. For XML data, column names aren't included in the output by default, so we use the "P_ADD_HEADERS_ROW => 'Y'" parameter to include them.

SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008
FROM   TABLE(
         APEX_DATA_PARSER.parse(
           p_content         => file_to_blob('TEMP_DIR', 'emp.xml'),
           p_file_name       => 'emp.xml',
           p_row_selector    => '/RESULTS/ROW',
           p_add_headers_row => 'Y'
         )
       );

LINE_NUMBER COL001     COL002     COL003     COL004     COL005     COL006     COL007     COL008
----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
            EMPNO      ENAME      JOB        MGR        HIREDATE   SAL        COMM       DEPTNO
          1 7369       SMITH      CLERK      7902       17-DEC-80  800                   20
          2 7499       ALLEN      SALESMAN   7698       20-FEB-81  1600       300        30
          3 7521       WARD       SALESMAN   7698       22-FEB-81  1250       500        30
          4 7566       JONES      MANAGER    7839       02-APR-81  2975                  20
          5 7654       MARTIN     SALESMAN   7698       28-SEP-81  1250       1400       30
          6 7698       BLAKE      MANAGER    7839       01-MAY-81  2850                  30
          7 7782       CLARK      MANAGER    7839       09-JUN-81  2450                  10
          8 7788       SCOTT      ANALYST    7566       19-APR-87  3000                  20
          9 7839       KING       PRESIDENT             17-NOV-81  5000                  10
         10 7844       TURNER     SALESMAN   7698       08-SEP-81  1500       0          30
         11 7876       ADAMS      CLERK      7788       23-MAY-87  1100                  20
         12 7900       JAMES      CLERK      7698       03-DEC-81  950                   30
         13 7902       FORD       ANALYST    7566       03-DEC-81  3000                  20
         14 7934       MILLER     CLERK      7782       23-JAN-82  1300                  10

15 rows selected.

SQL>

XLSX Data

In the following example we use the "emp.xslx" file. If we are interested in the first worksheet, we can keep things really simple and use it in a similar way to the CSV example.

SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008
FROM   TABLE(
         APEX_DATA_PARSER.parse(
           p_content         => file_to_blob('TEMP_DIR', 'emp.xlsx'),
           p_file_name       => 'emp.xlsx'
         )
       );

LINE_NUMBER COL001     COL002     COL003     COL004     COL005     COL006     COL007     COL007     COL008
----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
          1 EMPNO      ENAME      JOB        MGR        HIREDATE   SAL        COMM       COMM       DEPTNO
          2 7369       SMITH      CLERK      7902       1980-12-17 800                              20
          3 7499       ALLEN      SALESMAN   7698       1981-02-20 1600       300        300        30
          4 7521       WARD       SALESMAN   7698       1981-02-22 1250       500        500        30
          5 7566       JONES      MANAGER    7839       1981-04-02 2975                             20
          6 7654       MARTIN     SALESMAN   7698       1981-09-28 1250       1400       1400       30
          7 7698       BLAKE      MANAGER    7839       1981-05-01 2850                             30
          8 7782       CLARK      MANAGER    7839       1981-06-09 2450                             10
          9 7788       SCOTT      ANALYST    7566       1987-04-19 3000                             20
         10 7839       KING       PRESIDENT             1981-11-17 5000                             10
         11 7844       TURNER     SALESMAN   7698       1981-09-08 1500       0          0          30
         12 7876       ADAMS      CLERK      7788       1987-05-23 1100                             20
         13 7900       JAMES      CLERK      7698       1981-12-03 950                              30
         14 7902       FORD       ANALYST    7566       1981-12-03 3000                             20
         15 7934       MILLER     CLERK      7782       1982-01-23 1300                             10

15 rows selected.

SQL>

If there are multiple worksheets, we use the P_XLSX_SHEET_NAME parameter to decide which sheet we want to use. We list the worksheets in the document using the GET_XLSX_WORKSHEETS function.

COLUMN sheet_display_name FORMAT A30
COLUMN sheet_file_name FORMAT A30

SELECT sheet_display_name, sheet_file_name
FROM   TABLE(
         APEX_DATA_PARSER.get_xlsx_worksheets(
           p_content => file_to_blob('TEMP_DIR', 'emp.xlsx')
         )
       );

SHEET_DISPLAY_NAME             SHEET_FILE_NAME
------------------------------ ------------------------------
emp sheet                      sheet1.xml

SQL>

We then use the SHEET_FILE_NAME value in the parse call.

SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008
FROM   TABLE(
         APEX_DATA_PARSER.parse(
           p_content         => file_to_blob('TEMP_DIR', 'emp.xlsx'),
           p_file_name       => 'emp.xlsx',
           p_xlsx_sheet_name => 'sheet1.xml'
         )
       );

LINE_NUMBER COL001     COL002     COL003     COL004     COL005     COL006     COL007     COL007     COL008
----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
          1 EMPNO      ENAME      JOB        MGR        HIREDATE   SAL        COMM       COMM       DEPTNO
          2 7369       SMITH      CLERK      7902       1980-12-17 800                              20
          3 7499       ALLEN      SALESMAN   7698       1981-02-20 1600       300        300        30
          4 7521       WARD       SALESMAN   7698       1981-02-22 1250       500        500        30
          5 7566       JONES      MANAGER    7839       1981-04-02 2975                             20
          6 7654       MARTIN     SALESMAN   7698       1981-09-28 1250       1400       1400       30
          7 7698       BLAKE      MANAGER    7839       1981-05-01 2850                             30
          8 7782       CLARK      MANAGER    7839       1981-06-09 2450                             10
          9 7788       SCOTT      ANALYST    7566       1987-04-19 3000                             20
         10 7839       KING       PRESIDENT             1981-11-17 5000                             10
         11 7844       TURNER     SALESMAN   7698       1981-09-08 1500       0          0          30
         12 7876       ADAMS      CLERK      7788       1987-05-23 1100                             20
         13 7900       JAMES      CLERK      7698       1981-12-03 950                              30
         14 7902       FORD       ANALYST    7566       1981-12-03 3000                             20
         15 7934       MILLER     CLERK      7782       1982-01-23 1300                             10

15 rows selected.

SQL>

For more information see:

Hope this helps. Regards Tim...

Back to the Top.