CTX_DOC.MARKUP - Converting Documents to Highlighted HTML using Oracle Text
The CTX_DOC.MARKUP procedure converts formatted documents stored in BLOBs into HTML with highlighted search terms. The following example stores two word documents in a table, then retrieves and reformats the documents using Oracle Text functionality.
First, we must create a table containing a BLOB column to hold the documents and the converted output. The MARKUP procedure can convert and highlight the documents in memory, or using a table. In this example we will use the table option, hence the presence of the markup table.
DROP TABLE my_docs; DROP SEQUENCE my_docs_seq; DROP PROCEDURE load_file_to_my_docs; DROP TABLE my_markup_table; CREATE TABLE my_docs ( id NUMBER(10) NOT NULL, name VARCHAR2(200) NOT NULL, doc BLOB NOT NULL ); ALTER TABLE my_docs ADD ( CONSTRAINT my_docs_pk PRIMARY KEY (id) ); CREATE SEQUENCE my_docs_seq; CREATE TABLE my_markup_table ( query_id NUMBER, document CLOB ); CREATE OR REPLACE DIRECTORY documents AS 'C:\temp';
Create a procedure to load the documents into the table.
CREATE OR REPLACE PROCEDURE load_file_to_my_docs (p_file_name IN my_docs.name%TYPE) AS
v_bfile BFILE;
v_blob BLOB;
BEGIN
INSERT INTO my_docs (id, name, doc)
VALUES (my_docs_seq.NEXTVAL, p_file_name, empty_blob())
RETURN doc INTO v_blob;
v_bfile := BFILENAME('DOCUMENTS', p_file_name);
Dbms_Lob.Fileopen(v_bfile, Dbms_Lob.File_Readonly);
Dbms_Lob.Loadfromfile(v_blob, v_bfile, Dbms_Lob.Getlength(v_bfile));
Dbms_Lob.Fileclose(v_bfile);
COMMIT;
END;
/
Place the cat_and_apple.doc and dog_and_banana.doc files into the appropriate load directory specified by the Oracle directory object, then call the load procedure.
EXEC load_file_to_my_docs('cat_and_apple.doc');
EXEC load_file_to_my_docs('dog_and_banana.doc');
Create the Oracle Text index and gather the statistics on the table.
CREATE INDEX my_docs_doc_idx ON my_docs(doc) INDEXTYPE IS CTXSYS.CONTEXT; EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'MY_DOCS', cascade=>TRUE);
The code below retrieves all documents that match out search criteria by using the CONTAINS function. In this case we are lookign for documents containing the words "dog" and "banana". The matching documents are then converted to HTML by the MARKUP procedure and placed in the markup table.
TRUNCATE TABLE my_markup_table;
DECLARE
l_keywords VARCHAR2(20) := 'dog AND banana';
l_query_id NUMBER := 1;
BEGIN
FOR cur_rec IN (SELECT SCORE(1) score, id, name
FROM my_docs
WHERE CONTAINS(doc, l_keywords, 1) > 0
ORDER BY SCORE(1) DESC)
LOOP
-- Generate HTML version of document with
-- highlighted search terms.
CTX_DOC.markup(
index_name => 'my_docs_doc_idx',
textkey => TO_CHAR(cur_rec.id),
text_query => l_keywords,
restab => 'my_markup_table',
query_id => l_query_id,
plaintext => FALSE,
tagset => 'HTML_NAVIGATE');
l_query_id := l_query_id + 1;
END LOOP;
END;
/
Querying the markup table reveals that the correct document was found the search terms have been highlighted as expected.
SET LONG 1000
SELECT * FROM my_markup_table;
QUERY_ID DOCUMENT
---------- --------------------------------------------------------------------------------
1 <html><body><p/>
<br /><font face="Times New Roman" size="3">This is a document that contains the
words <A NAME=ctx1><B>dog</B></A> <A HREF=#ctx2>></A> and <A HREF=#ctx1><
</A> <A NAME=ctx2><B>banana</B></A>.</font>
<br/><br/><br/></body></html>
1 row selected.
SQL>
For more information see:
Hope this helps. Regards Tim...
![]() |

