8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Home » Articles » 10g » Here

CTX_DOC.MARKUP - Converting Documents to Highlighted HTML using Oracle Text

The CTX_DOC.MARKUP procedure converts formatted documents stored in BLOBs into HTML with highlighted search terms. The following example stores two word documents in a table, then retrieves and reformats the documents using Oracle Text functionality.

First, we must create a table containing a BLOB column to hold the documents and the converted output. The MARKUP procedure can convert and highlight the documents in memory, or using a table. In this example we will use the table option, hence the presence of the markup table.

DROP TABLE my_docs;
DROP SEQUENCE my_docs_seq;
DROP PROCEDURE load_file_to_my_docs;
DROP TABLE my_markup_table;

CREATE TABLE my_docs (
  id    NUMBER(10)     NOT NULL,
  name  VARCHAR2(200)  NOT NULL,
  doc   BLOB           NOT NULL
);

ALTER TABLE my_docs ADD (
  CONSTRAINT my_docs_pk PRIMARY KEY (id)
);

CREATE SEQUENCE my_docs_seq;

CREATE TABLE my_markup_table (
  query_id 	NUMBER,
  document 	CLOB
);

CREATE OR REPLACE DIRECTORY documents AS 'C:\temp';

Create a procedure to load the documents into the table.

CREATE OR REPLACE PROCEDURE load_file_to_my_docs (p_file_name  IN  my_docs.name%TYPE) AS
  l_bfile      BFILE;
  l_blob       BLOB;

  l_dest_offset INTEGER := 1;
  l_src_offset  INTEGER := 1;
BEGIN
  INSERT INTO my_docs (id, name, doc)
  VALUES (my_docs_seq.NEXTVAL, p_file_name, empty_blob())
  RETURN doc INTO l_blob;

  l_bfile := BFILENAME('DOCUMENTS', p_file_name);
  DBMS_LOB.fileopen(v_bfile, Dbms_Lob.File_Readonly);
  -- loadfromfile deprecated.
  -- DBMS_LOB.loadfromfile(l_blob, l_bfile, DBMS_LOB.getlength(l_bfile));
  DBMS_LOB.loadblobfromfile (
    dest_lob    => l_blob,
    src_bfile   => l_bfile,
    amount      => DBMS_LOB.lobmaxsize,
    dest_offset => l_dest_offset,
    src_offset  => l_src_offset);

  DBMS_LOB.fileclose(l_bfile);

  COMMIT;
END;
/

Place the cat_and_apple.doc and dog_and_banana.doc files into the appropriate load directory specified by the Oracle directory object, then call the load procedure.

EXEC load_file_to_my_docs('cat_and_apple.doc');
EXEC load_file_to_my_docs('dog_and_banana.doc');

Create the Oracle Text index and gather the statistics on the table.

CREATE INDEX my_docs_doc_idx ON my_docs(doc) INDEXTYPE IS CTXSYS.CONTEXT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'MY_DOCS', cascade=>TRUE);

The code below retrieves all documents that match out search criteria by using the CONTAINS function. In this case we are lookign for documents containing the words "dog" and "banana". The matching documents are then converted to HTML by the MARKUP procedure and placed in the markup table.

TRUNCATE TABLE my_markup_table;

DECLARE
  l_keywords  VARCHAR2(20) := 'dog AND banana';
  l_query_id  NUMBER := 1;
BEGIN
  FOR cur_rec IN (SELECT SCORE(1) score, id, name
                  FROM   my_docs
                  WHERE  CONTAINS(doc, l_keywords, 1) > 0
                  ORDER BY SCORE(1) DESC)
  LOOP
    -- Generate HTML version of document with
    -- highlighted search terms.
    CTX_DOC.markup( 
      index_name => 'my_docs_doc_idx', 
      textkey    => TO_CHAR(cur_rec.id), 
      text_query => l_keywords,
      restab     => 'my_markup_table', 
      query_id   => l_query_id, 
      plaintext  => FALSE,
      tagset     => 'HTML_NAVIGATE');
      
    l_query_id := l_query_id + 1;
  END LOOP;
END;
/

Querying the markup table reveals that the correct document was found the search terms have been highlighted as expected.

SET LONG 1000
SELECT * FROM my_markup_table;

  QUERY_ID DOCUMENT
---------- --------------------------------------------------------------------------------
         1 <html><body><p/>
           <br /><font face="Times New Roman" size="3">This is a document that contains the
            words <A NAME=ctx1><B>dog</B></A> <A HREF=#ctx2>></A> and <A HREF=#ctx1><
           </A> <A NAME=ctx2><B>banana</B></A>.</font>
           <br/><br/><br/></body></html>


1 row selected.

SQL>

For more information see:

Hope this helps. Regards Tim...

Back to the Top.