8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
One True Lookup Table (OTLT)
The One True Lookup Table (OTLT) design, also know as "one lookup table to rule them all", is a classic anti-pattern that looks so sensible at first glance, that you will defend your decision to use it, even as your database performance takes a nose dive, your data integrity is compromised, and your developers stick pins into a voodoo doll in your likeness!
What is One True Lookup Table (OTLT)?
Most systems have lots of basic lookup tables that look similar to this.
-- DROP TABLE order_types PURGE; CREATE TABLE order_types ( order_type_id NUMBER NOT NULL, order_type_code VARCHAR2(10) NOT NULL, order_type_desc VARCHAR2(100) NOT NULL, CONSTRAINT order_types_pk PRIMARY KEY (id), CONSTRAINT order_types_uk UNIQUE (code) ); -- DROP TABLE order_statuses PURGE; CREATE TABLE order_statuses ( order_status_id NUMBER NOT NULL, order_status_code VARCHAR2(10) NOT NULL, order_status_desc VARCHAR2(100) NOT NULL, CONSTRAINT order_statuses_pk PRIMARY KEY (id), CONSTRAINT order_statuses_uk UNIQUE (code) );
Having many similar tables seems like a messy solution to some designers, so they feel the need to consolidate them into a single table, under the guise of making the data model simpler, and reducing the amount of maintenance screens. This consolidation results in the One True Lookup Table (OTLT) anti-pattern. The starting point for our shared lookup table might look like this.
-- DROP TABLE lookups PURGE; CREATE TABLE lookups ( lookup_id NUMBER NOT NULL, lookup_key VARCHAR2(100) NOT NULL, lookup_value VARCHAR2(500) NOT NULL, CONSTRAINT lookups_pk PRIMARY KEY (lookup_id), CONSTRAINT lookups_uk UNIQUE (lookup_key) );
You will need a way of distinguishing between the types of lookup data, so we might add a lookup type or domain column. Something to remember is we might have the same key for multiple lookup types, so we have to include the lookup type in the unique key. This might give us something like this.
-- DROP TABLE lookups PURGE; CREATE TABLE lookups ( lookup_id NUMBER NOT NULL, lookup_type VARCHAR2(20) NOT NULL, lookup_key VARCHAR2(100) NOT NULL, lookup_value VARCHAR2(500) NOT NULL, CONSTRAINT lookups_pk PRIMARY KEY (lookup_id), CONSTRAINT lookups_uk UNIQUE (lookup_type, lookup_key) );
The lookup type is not very descriptive, so it might be better to have a proper description kept somewhere.
-- DROP TABLE lookups PURGE; -- DROP TABLE lookup_types PURGE; CREATE TABLE lookup_types ( lookup_type_id NUMBER NOT NULL, lookup_type_code VARCHAR2(20) NOT NULL, lookup_description VARCHAR2(100) NOT NULL, CONSTRAINT lookup_types_pk PRIMARY KEY (lookup_type_id), CONSTRAINT lookup_types_uk UNIQUE (lookup_type_code) ); CREATE TABLE lookups ( lookup_id NUMBER NOT NULL, lookup_type_code VARCHAR2(20) NOT NULL, lookup_key VARCHAR2(100) NOT NULL, lookup_value VARCHAR2(500) NOT NULL, CONSTRAINT lookups_pk PRIMARY KEY (lookup_id), CONSTRAINT lookups_uk UNIQUE (lookup_type_code, lookup_key), CONSTRAINT lookup_lookup_type_fk FOREIGN KEY (lookup_type_code) REFERENCES lookup_types (lookup_type_code) );
Note. I would not normally foreign key to a unique key like this, as things change over time, but it makes the SQL even tougher if we don't.
We pat ourselves on the back at our wonderful solution...
The problem with this solution is it is dealing with a single datatype for the key and the value. What happens if we need a variety of key-value pairs from a data type perspective? Either we ignore the data type variations and incur additional data type conversions, or we add more optional columns of differing types, like NUMBER, DATE and even string length variants. We tell yourself it's fine, as it doesn't waste a lot of space to have empty columns. The problem now is we have made an important bit of information optional, when it really shouldn't be.
CREATE TABLE lookups ( lookup_id NUMBER NOT NULL, lookup_type_code VARCHAR2(20) NOT NULL, lookup_key VARCHAR2(100) NOT NULL, lookup_string_value VARCHAR2(500), lookup_short_string_value VARCHAR2(100), lookup_long_string_value VARCHAR2(4000), lookup_number_value NUMBER, lookup_date_value DATE, lookup_ts_value TIMESTAMP, CONSTRAINT lookups_pk PRIMARY KEY (lookup_id), CONSTRAINT lookups_uk UNIQUE (lookup_type_code, lookup_key), CONSTRAINT lookup_lookup_type_fk FOREIGN KEY (lookup_type_code) REFERENCES lookup_types (lookup_type_code) );
Then a couple of our basic lookups need a few additional fields to be kept with them. That means splitting them out of the generic lookup table, and revisiting all bits of the application code that reference it, or adding some additional columns to the lookup table, maybe as "flex fields", which can be used to store different things. Invariably we pick the "easy option" and end up with something like this.
CREATE TABLE lookups ( lookup_id NUMBER NOT NULL, lookup_type_code VARCHAR2(20) NOT NULL, lookup_key VARCHAR2(100) NOT NULL, lookup_string_value VARCHAR2(500), lookup_short_string_value VARCHAR2(100), lookup_long_string_value VARCHAR2(4000), lookup_number_value NUMBER, lookup_date_value DATE, lookup_ts_value TIMESTAMP, flex_field_1 VARCHAR2(4000), flex_field_2 VARCHAR2(4000), flex_field_3 VARCHAR2(4000), flex_field_4 VARCHAR2(4000), flex_field_5 VARCHAR2(4000), CONSTRAINT lookups_pk PRIMARY KEY (lookup_id), CONSTRAINT lookups_uk UNIQUE (lookup_type_code, lookup_key), CONSTRAINT lookup_lookup_type_fk FOREIGN KEY (lookup_type_code) REFERENCES lookup_types (lookup_type_code) );
Before we know it, our "elegant solution" to save on reference tables becomes a complete disaster area.
So What's The Problem?
There are so many problems with this approach. Here are a few.
- It makes the SQL look ugly (see below).
- Many statements will require multiple joins to the lookup table. The extra join columns make the statements look bigger and scarier. There will be the same number of joins when using separate lookup tables, but those joins will be simpler.
- Multiple references to the same table can make it hard to determine what is happening in the execution plan, as you will see those repeated references there, and have to refer to the predicates to understand the context of table reference. If you were using separate lookup tables, it would be clear which table you were referring to at any point of the execution plan.
- You can't foreign key to this type of table. Technically you can if you are willing to put both columns (lookup_type_code and lookup_key) in the table, but you won't because it is ugly. This means there is a good chance your data integrity will be compromised over time. It's really easy to foreign key to individual lookup tables, and therefore protect your data.
- It's hard to control the contents of the table. It's a shared resource, so check constraints and triggers are problematic. If you need users to have different privileges, depending on which lookup they are dealing with, things are going to get messy. That would be really easy with separate lookup tables.
- If you need to make a change for one reference type, like extending the size of the key or value, it affects all reference data. Using separate lookup tables isolates the change.
- Over time, many reference tables take on additional data. To model that you would need to either split out that reference data from this shared lookup table, or start adding optional columns to cope with the "one-off" issues. A change like this is really simple for separate lookup tables.
- Data types matter. You should always use the correct data type, as it will reduce the number of data type conversions needed. Implicit data type conversions are bugs waiting to happen!
- Performance can be a problem with the OTLT approach as it's hard for the optimizer to make sound judgements about the data. The optimizer cares about cardinality, but it may be hard to make that decision if you are dealing with a large number of rows, most of which are irrelevant in any one specific context. The optimizer also cares about high/low values, but these are not be relevant to any one lookup, but shared. We've also mentioned you probably won't foreign key to this data, which will reduce the amount of information the optimizer has when making its decision. You may have artificially made columns optional, that are actually mandatory, a key must have a value, but which column? I think you get the message.
Ugly SQL?
We are probably going to join to this table multiple times from queries. Here is an example using ANSI and Non-ANSI join syntax.
-- ANSI Join SELECT o.order_id, o.order_type_code, ot.lookup_value AS order_type_description, o.order_status_code, os.lookup_value AS order_status_description FROM orders o JOIN lookups ot ON o.order_type_code = ot.lookup_key AND ot.lookup_type_code = 'ORDTYP' JOIN lookups os ON o.order_status_code = os.lookup_key AND os.lookup_type_code = 'ORDSTAT' -- Non-ANSI Join SELECT o.order_id, o.order_type_code, ot.lookup_value AS order_type_description, o.order_status_code, os.lookup_value AS order_status_description FROM orders o, lookups ot, lookups os WHERE o.order_type_code = ot.lookup_key AND ot.lookup_type_code = 'ORDTYP' AND o.order_status_code = os.lookup_key AND os.lookup_type_code = 'ORDSTAT'
What we can see is every reference to the lookup table requires two columns, and we are going to see the same table repeated in the execution plan multiple times. The table name and the column names don't give us any clue about why we are using this table.
I've seen examples of this where the number of references to the lookup table were in double figures.
Alternatively, we can do the job properly and things look clean, with correct table and column references.
-- ANSI Join SELECT o.order_id, ot.order_type_code, ot.order_type_desc, os.order_status_code, os.order_status_desc FROM orders o JOIN order_types ot ON o.order_type_id = ot.order_type_id JOIN order_statuses os ON o.order_status_id = order_status_id -- Non-ANSI Join SELECT o.order_id, ot.order_type_code, ot.order_type_desc, os.order_status_code, os.order_status_desc FROM orders o, order_types ot, order_statuses os WHERE o.order_type_id = ot.order_type_id AND o.order_status_id = os.order_status_id
Conclusion
The One True Lookup Tables (OTLT) approach seems to make sense at first. When designing a logical model it looks kind-of nice. Once you get down to implementing it with some real code, it starts to go wrong, and as the system evolves over time, that's just going to get worse.
Some people claim reasonable results where they stick to basic lookups, but I feel they've just not reached the pain point yet.
Please don't do this!
Hope this helps. Regards Tim...