|Full Text Indexing using Oracle Text - Efficiently query free text and produce document classification applications using Oracle Text.|
Moloy Bagchi said...This is an excellent and short article about Full Text serch in Oracle 9i. Can you please write a similar article for 10g.
dnunknown said...Did not understand CLOB datatypes with indexes, now I get it, thanks
hamilton said...thanks ,
i'll work hard
Peter Berka said...Thank you for the Examples.
I've just compared the CTXCAT and the CONTEXT Index with standard character search with LIKE and wildcards in terms of Sizes and Performance.
Database Release is Oracle 184.108.40.206.0.
I populated the Table my_items with 15 Millions records, that gives a size of 932 MB for the data and 237 MB for the Primary Key.
Standard text search with wildcards took 34 seconds without any index:
SELECT id, price, name
WHERE description like '%iption (851222%';
(Cost: 11421, Bytes: 39.000.000, Cardinality: 750)
With a normal index on column description which is another 625 MB in Size the Query took 31 seconds,
of course only after changing parameter 'optimizer_index_cost_adj' to a very low value AND using the INDEX-Hint.
(Cost: 792, Bytes: 20.250.000, Cardinality: 750)
The CTXCAT-Index took a very long time to create, uses a vast temp segment and reaches finally a total size of 5,2 GB!
The Query took 0.3 seconds, however ONLY if you're searching for whole Words.
An expression like
SELECT id, price, name,
WHERE CATSEARCH(description, 'Descripti*', 'price BETWEEN 1 AND 5')> 0;
take 21 seconds, whereas a wildcard on the beginning of the phrase gets no results.
Finally the CONTEXT-Index was 2,4 GB in Size, the Query (with wildcards on both ends) took 18 seconds.
Text search with wildcards, which means Character by Character comparison, has always been hard work for every database, no matter which indexing technology are in use.
For my test case, the best result was searching via CONTEXT Index, but you have to deal with the Index Size and - more important - with sync'ing and optimizing this kind of Index fairly regularily.
There are some issues for tuning those CONTEXT Indexes, i.e. partitioning, custom lexer, using the keep pool and so on.
Not surprisingly the best would be if we can get rid of the wildcard at the beginning of the search string.
When I try this, I got query answer times of 0.01 to 0.30 seconds, even with the standard index.