8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
Histogram Enhancements in Oracle Database 12c Release 1 (12.1)
Data skew in a column can make it difficult for the optimizer to accurately estimate the cardinality of an operation. Without a histogram it will assume an even distribution amongst the distinct values for the column. Take the case of a Yes/No flag for active records in a table. There may be 1 million rows in the table with only 100 being marked as active. The optimizer would assume half are marked as "Yes" and half as "No", which would be a really bad assumption in this case. Histograms describe the distribution of data in a column, which helps identify data skew and allows the optimizer to make a better decision.
In previous releases, two types of histograms were supported, "frequency" and "height-balanced" histograms. Oracle database 12c introduces two new types of histogram, a "top frequency" and "hybrid" histogram. This article presents an example of each type of histogram.
- Frequency Histograms (Pre-12c)
- Height-Balanced Histograms (Pre-12c)
- Top Frequency Histograms (12c)
- Hybrid Histograms (12c)
- Histogram Creation
- VARCHAR2 Columns
Related articles.
- Cost-Based Optimizer (CBO) And Database Statistics
- Statistics Collection Enhancements in Oracle Database 12c Release 1 (12.1)
Frequency Histograms (Pre-12c)
A frequency histogram is created when the number of distinct values (NDV) for the column is less than or equal to the total number of histogram buckets, which defaults to 254. This means each distinct value can be represented by its own bucket. The following code creates and populates a table. Half of the rows have a RECORD_TYPE
value of 0, with the other half being randomly assigned a value from 1-9, giving a total of 10 distinct values. Once populated, the data is queried using a filter on the skewed column, then statistics are gathered.
DROP TABLE tab1 PURGE; CREATE TABLE tab1 AS SELECT level AS id, CASE WHEN MOD(level, 2) = 0 THEN 0 ELSE TRUNC(DBMS_RANDOM.value(1,10)) END AS record_type, 'Description for ' || level AS description FROM dual CONNECT BY level <= 10000; COMMIT; ALTER TABLE tab1 ADD CONSTRAINT tab1_pk PRIMARY KEY (id); CREATE INDEX tab1_record_type_idx ON tab1(record_type); -- Query table referencing skewed column to trigger automatic -- histogram creation the next time statistics are gathered. SELECT COUNT(*) FROM tab1 WHERE record_type = 0; COUNT(*) ---------- 5000 SQL> -- Default statistics gathered. EXEC DBMS_STATS.gather_table_stats(USER, 'TAB1');
We can see the presence of the frequency histogram using the USER_TAB_COLUMNS
view.
COLUMN column_name FORMAT A20 SELECT column_id, column_name, histogram FROM user_tab_columns WHERE table_name = 'TAB1' ORDER BY column_id; COLUMN_ID COLUMN_NAME HISTOGRAM ---------- -------------------- --------------- 1 ID NONE 2 RECORD_TYPE FREQUENCY 3 DESCRIPTION NONE SQL>
The USER_TAB_HISTOGRAMS
view is used to display information about the contents of histograms. The ENDPOINT_VALUE
column represents the highest distinct column value in the bucket and the ENDPOINT_NUMBER
is a cumulative frequency of rows in the bucket. If we want to get the actual frequency of the specific column value, we need to subtract the previous cumulative value. The following query does that using the LAG
analytic function.
SELECT endpoint_value, endpoint_number, endpoint_number - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_value, endpoint_number) AS frequency FROM user_tab_histograms WHERE table_name = 'TAB1' AND column_name = 'RECORD_TYPE' ORDER BY endpoint_value; ENDPOINT_VALUE ENDPOINT_NUMBER FREQUENCY -------------- --------------- ---------- 0 5000 5000 1 5535 535 2 6104 569 3 6679 575 4 7243 564 5 7773 530 6 8329 556 7 8889 560 8 9469 580 9 10000 531 SQL>
Height-Balanced Histograms (Pre-12c)
In previous releases, when there were more distinct column values than the number of histogram buckets, a height-balanced histogram was created. In this case the histogram buckets are compressed so each bucket represents a range of values, which should contain approximately the same frequency of rows.
In Oracle database 12c, height-balanced histograms are only created if sampling is explicitly used during statistics collection. If there is no explicit sampling, Oracle will perform a full table scan and build a hybrid histogram, or possibly a top frequency histogram depending on the circumstances.
The following table has half the rows with a RECORD_TYPE
of 0 and the other half each having a unique value. The total number of distinct values is 5001, which is far in excess of the default number of histogram buckets of 254. Once populated, the data is queried using a filter on the skewed column, then statistics are gathered using an explicit sample size.
DROP TABLE tab1 PURGE; CREATE TABLE tab1 AS SELECT level AS id, CASE WHEN MOD(level,2) = 0 THEN 0 ELSE level END AS record_type, 'Description for ' || level AS description FROM dual CONNECT BY level <= 10000; COMMIT; ALTER TABLE tab1 ADD CONSTRAINT tab1_pk PRIMARY KEY (id); CREATE INDEX tab1_record_type_idx ON tab1(record_type); -- Query table referencing skewed column to trigger automatic -- histogram creation the next time statistics are gathered. SELECT COUNT(*) FROM tab1 WHERE record_type = 0; COUNT(*) ---------- 5000 SQL> -- Statistics gathered. Use explicit sampling to force height-balanced histogram. EXEC DBMS_STATS.gather_table_stats(USER, 'TAB1', estimate_percent=>10);
We can see the presence of the height-balanced histogram using the USER_TAB_COLUMNS
view.
COLUMN column_name FORMAT A20 SELECT column_id, column_name, histogram FROM user_tab_columns WHERE table_name = 'TAB1' ORDER BY column_id; COLUMN_ID COLUMN_NAME HISTOGRAM ---------- -------------------- --------------- 1 ID NONE 2 RECORD_TYPE HEIGHT BALANCED 3 DESCRIPTION NONE SQL>
Since there are not enough buckets to represent all the distinct values, the buckets are compressed to represent ranges of values. The ENDPOINT_VALUE
represents the highest value in the bucket and the ENDPOINT_NUMBER
represents the cumulative frequency. The following query displays the range, number of values represented and frequency associated with each bucket. The range is displayed as "<=" because popular values can span buckets in high-balanced histograms. Remember, this is sample data.
COLUMN range FORMAT A20 SELECT '<=' || endpoint_value AS range, endpoint_value - (LAG(endpoint_value, 1, -1) OVER (ORDER BY endpoint_value, endpoint_number)+1) + 1 AS vals_in_range, endpoint_number - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_value, endpoint_number) AS frequency FROM user_tab_histograms WHERE table_name = 'TAB1' AND column_name = 'RECORD_TYPE' ORDER BY endpoint_value; RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=0 1 125 <=57 57 1 <=143 86 1 <=209 66 1 <=281 72 1 <=359 78 1 <=443 84 1 <=515 72 1 <=579 64 1 <=679 100 1 <=749 70 1 RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=831 82 1 <=887 56 1 <=963 76 1 <=1051 88 1 <=1119 68 1 <=1201 82 1 <=1265 64 1 <=1343 78 1 <=1413 70 1 <=1481 68 1 <=1571 90 1 RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=1647 76 1 <=1735 88 1 <=1813 78 1 <=1875 62 1 <=1943 68 1 <=2029 86 1 <=2113 84 1 <=2181 68 1 <=2263 82 1 <=2335 72 1 <=2415 80 1 RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=2523 108 1 <=2615 92 1 <=2675 60 1 <=2753 78 1 <=2821 68 1 <=2899 78 1 <=2971 72 1 <=3051 80 1 <=3119 68 1 <=3189 70 1 <=3273 84 1 RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=3359 86 1 <=3445 86 1 <=3503 58 1 <=3587 84 1 <=3679 92 1 <=3755 76 1 <=3835 80 1 <=3911 76 1 <=3977 66 1 <=4051 74 1 <=4159 108 1 RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=4253 94 1 <=4355 102 1 <=4467 112 1 <=4567 100 1 <=4633 66 1 <=4709 76 1 <=4779 70 1 <=4869 90 1 <=4947 78 1 <=5025 78 1 <=5101 76 1 RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=5193 92 1 <=5281 88 1 <=5363 82 1 <=5443 80 1 <=5529 86 1 <=5621 92 1 <=5711 90 1 <=5767 56 1 <=5841 74 1 <=5909 68 1 <=5991 82 1 RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=6057 66 1 <=6115 58 1 <=6201 86 1 <=6273 72 1 <=6363 90 1 <=6439 76 1 <=6521 82 1 <=6605 84 1 <=6667 62 1 <=6733 66 1 <=6811 78 1 RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=6895 84 1 <=6961 66 1 <=7041 80 1 <=7113 72 1 <=7191 78 1 <=7263 72 1 <=7333 70 1 <=7429 96 1 <=7511 82 1 <=7567 56 1 <=7651 84 1 RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=7719 68 1 <=7805 86 1 <=7877 72 1 <=7977 100 1 <=8057 80 1 <=8115 58 1 <=8177 62 1 <=8259 82 1 <=8345 86 1 <=8433 88 1 <=8499 66 1 RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=8571 72 1 <=8645 74 1 <=8707 62 1 <=8793 86 1 <=8875 82 1 <=8949 74 1 <=9023 74 1 <=9085 62 1 <=9147 62 1 <=9227 80 1 <=9309 82 1 RANGE VALS_IN_RANGE FREQUENCY -------------------- ------------- ---------- <=9385 76 1 <=9479 94 1 <=9553 74 1 <=9641 88 1 <=9709 68 1 <=9781 72 1 <=9849 68 1 <=9933 84 1 <=9999 66 1 SQL>
Top Frequency Histograms (12c)
Top-frequency histograms are a variant on the frequency histogram, where the histogram only focuses on the popular values, ignoring the less popular values as statistically insignificant. For a top-frequency histogram to be useful, the number of distinct popular values must be less than or equal to the number of histogram buckets, while the non-popular values are significantly non-popular in comparison to the popular values.
The following table contains 9990 rows that are randomly assigned values 1-9, so there are approximately 1100 rows for each value. The remaining 10 rows each have a unique value. The METHOD_OPT
parameter is used to limit the bucket size to 10 during the statistics creation.
DROP TABLE tab1 PURGE; CREATE TABLE tab1 AS SELECT level AS id, CASE WHEN level <= 9990 THEN TRUNC(DBMS_RANDOM.value(1,10)) ELSE level END AS record_type, 'Description for ' || level AS description FROM dual CONNECT BY level <= 10000; COMMIT; ALTER TABLE tab1 ADD CONSTRAINT tab1_pk PRIMARY KEY (id); CREATE INDEX tab1_record_type_idx ON tab1(record_type); -- Statistics gathered. EXEC DBMS_STATS.gather_table_stats(USER, 'TAB1', method_opt => 'FOR COLUMNS RECORD_TYPE SIZE 10');
We can see the presence of the top frequency histogram using the USER_TAB_COLUMNS
view.
COLUMN column_name FORMAT A20 SELECT column_id, column_name, histogram FROM user_tab_columns WHERE table_name = 'TAB1' ORDER BY column_id; COLUMN_ID COLUMN_NAME HISTOGRAM ---------- -------------------- --------------- 1 ID NONE 2 RECORD_TYPE TOP-FREQUENCY 3 DESCRIPTION NONE SQL>
The following query displays the ENDPOINT_VALUE
, which is the value represented by each bucket, the ENDPOINT_NUMBER
, which is a cumulative sum of the rows represented and frequency associated with each bucket, which is calculated by subtracting the previous row's ENDPOINT_NUMBER
value. Notice the popular values are represented like a normal frequency histogram, but most of the less popular values are missing.
SELECT endpoint_value, endpoint_number, endpoint_number - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_value, endpoint_number) AS frequency FROM user_tab_histograms WHERE table_name = 'TAB1' AND column_name = 'RECORD_TYPE' ORDER BY endpoint_value; ENDPOINT_VALUE ENDPOINT_NUMBER FREQUENCY -------------- --------------- ---------- 1 1152 1152 2 2267 1115 3 3337 1070 4 4463 1126 5 5596 1133 6 6726 1130 7 7824 1098 8 8890 1066 9 9990 1100 10000 9991 1 SQL>
Hybrid Histograms (12c)
Hybrid histograms are like a combination of frequency and height-balanced histograms. In the majority of cases, 12c uses hybrid histograms in place of height-balanced histograms. Unlike height-balanced histograms, a single endpoint value cannot span buckets. In addition to the highest value in the bucket, the histogram stores the amount of times the highest value is represented in the bucket, giving an accurate idea of its popularity, as well as giving an indication of the popularity of the other endpoints in the bucket.
The following table contains 5000 rows that are randomly assigned values 1-99, so there are approximately 50 rows for each popular value. The remaining 5000 rows each have a unique value. Default statistics are gathered after the table is queried using a reference to the skewed column.
DROP TABLE tab1 PURGE; CREATE TABLE tab1 AS SELECT level AS id, CASE WHEN MOD(level,2) = 0 THEN TRUNC(DBMS_RANDOM.value(1,100)) ELSE level END AS record_type, 'Description for ' || level AS description FROM dual CONNECT BY level <= 10000; COMMIT; ALTER TABLE tab1 ADD CONSTRAINT tab1_pk PRIMARY KEY (id); CREATE INDEX tab1_record_type_idx ON tab1(record_type); -- Query table referencing skewed column to trigger -- histogram creation during next stats gathering. SELECT COUNT(*) FROM tab1 WHERE record_type = 1; COUNT(*) ---------- 54 SQL> -- Statistics gathered. EXEC DBMS_STATS.gather_table_stats(USER, 'TAB1');
We can see the presence of the hybrid histogram using the USER_TAB_COLUMNS
view.
COLUMN column_name FORMAT A20 SELECT column_id, column_name, histogram FROM user_tab_columns WHERE table_name = 'TAB1' ORDER BY column_id; COLUMN_ID COLUMN_NAME HISTOGRAM ---------- -------------------- --------------- 1 ID NONE 2 RECORD_TYPE HYBRID 3 DESCRIPTION NONE SQL>
The ENDPOINT_VALUE
represents the highest value in the bucket and the ENDPOINT_NUMBER
represents the cumulative frequency. The following query displays the range, number of values represented and frequency associated with each bucket. It also includes the ENDPOINT_REPEAT_COUNT
column.
COLUMN range FORMAT A20 SELECT (LAG(endpoint_value, 1, 0) OVER (ORDER BY endpoint_value, endpoint_number)+1) || '-' || endpoint_value AS range, endpoint_value - (LAG(endpoint_value, 1, 0) OVER (ORDER BY endpoint_value, endpoint_number)+1) + 1 AS vals_in_range, endpoint_number - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_value, endpoint_number) AS frequency, endpoint_repeat_count FROM user_tab_histograms WHERE table_name = 'TAB1' AND column_name = 'RECORD_TYPE' ORDER BY endpoint_value; RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 1-1 1 54 54 2-2 1 56 56 3-3 1 51 51 4-4 1 44 44 5-5 1 48 48 6-6 1 54 54 7-7 1 54 54 8-8 1 44 44 9-9 1 62 62 10-10 1 41 41 11-11 1 61 61 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 12-12 1 47 47 13-13 1 56 56 14-14 1 57 57 15-15 1 63 63 16-16 1 60 60 17-17 1 72 72 18-18 1 51 51 19-19 1 46 46 20-20 1 48 48 21-21 1 43 43 22-22 1 61 61 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 23-23 1 47 47 24-24 1 57 57 25-25 1 53 53 26-26 1 43 43 27-27 1 48 48 28-28 1 60 60 29-29 1 54 54 30-30 1 49 49 31-31 1 44 44 32-32 1 46 46 33-33 1 43 43 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 34-34 1 59 59 35-35 1 53 53 36-36 1 41 41 37-37 1 39 39 38-38 1 49 49 39-39 1 54 54 40-40 1 46 46 41-41 1 39 39 42-42 1 66 66 43-43 1 50 50 44-44 1 69 69 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 45-45 1 48 48 46-46 1 41 41 47-47 1 54 54 48-48 1 53 53 49-49 1 52 52 50-50 1 60 60 51-51 1 46 46 52-52 1 47 47 53-53 1 46 46 54-54 1 48 48 55-55 1 55 55 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 56-56 1 40 40 57-57 1 60 60 58-58 1 62 62 59-59 1 42 42 60-61 2 73 45 62-62 1 45 45 63-63 1 49 49 64-64 1 46 46 65-65 1 52 52 66-66 1 53 53 67-67 1 50 50 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 68-68 1 44 44 69-69 1 55 55 70-70 1 54 54 71-71 1 51 51 72-72 1 54 54 73-73 1 48 48 74-74 1 46 46 75-75 1 52 52 76-76 1 62 62 77-77 1 64 64 78-78 1 60 60 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 79-79 1 57 57 80-80 1 51 51 81-81 1 42 42 82-82 1 35 35 83-83 1 59 59 84-84 1 57 57 85-85 1 49 49 86-86 1 44 44 87-87 1 50 50 88-88 1 56 56 89-89 1 46 46 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 90-90 1 56 56 91-91 1 58 58 92-92 1 43 43 93-93 1 59 59 94-94 1 45 45 95-95 1 46 46 96-96 1 63 63 97-97 1 49 49 98-98 1 36 36 99-99 1 55 55 100-163 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 164-227 64 32 1 228-291 64 32 1 292-355 64 32 1 356-419 64 32 1 420-483 64 32 1 484-547 64 32 1 548-609 62 31 1 610-673 64 32 1 674-737 64 32 1 738-801 64 32 1 802-865 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 866-929 64 32 1 930-993 64 32 1 994-1057 64 32 1 1058-1121 64 32 1 1122-1185 64 32 1 1186-1249 64 32 1 1250-1311 62 31 1 1312-1375 64 32 1 1376-1439 64 32 1 1440-1503 64 32 1 1504-1567 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 1568-1631 64 32 1 1632-1695 64 32 1 1696-1759 64 32 1 1760-1823 64 32 1 1824-1887 64 32 1 1888-1951 64 32 1 1952-2013 62 31 1 2014-2077 64 32 1 2078-2141 64 32 1 2142-2205 64 32 1 2206-2269 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 2270-2333 64 32 1 2334-2397 64 32 1 2398-2461 64 32 1 2462-2525 64 32 1 2526-2589 64 32 1 2590-2651 62 31 1 2652-2715 64 32 1 2716-2779 64 32 1 2780-2843 64 32 1 2844-2907 64 32 1 2908-2971 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 2972-3035 64 32 1 3036-3099 64 32 1 3100-3163 64 32 1 3164-3227 64 32 1 3228-3291 64 32 1 3292-3353 62 31 1 3354-3417 64 32 1 3418-3481 64 32 1 3482-3545 64 32 1 3546-3609 64 32 1 3610-3673 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 3674-3737 64 32 1 3738-3801 64 32 1 3802-3865 64 32 1 3866-3929 64 32 1 3930-3991 62 31 1 3992-4055 64 32 1 4056-4119 64 32 1 4120-4183 64 32 1 4184-4247 64 32 1 4248-4311 64 32 1 4312-4375 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 4376-4439 64 32 1 4440-4503 64 32 1 4504-4567 64 32 1 4568-4631 64 32 1 4632-4693 62 31 1 4694-4757 64 32 1 4758-4821 64 32 1 4822-4885 64 32 1 4886-4949 64 32 1 4950-5013 64 32 1 5014-5077 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 5078-5141 64 32 1 5142-5205 64 32 1 5206-5269 64 32 1 5270-5333 64 32 1 5334-5395 62 31 1 5396-5459 64 32 1 5460-5523 64 32 1 5524-5587 64 32 1 5588-5651 64 32 1 5652-5715 64 32 1 5716-5779 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 5780-5843 64 32 1 5844-5907 64 32 1 5908-5971 64 32 1 5972-6033 62 31 1 6034-6097 64 32 1 6098-6161 64 32 1 6162-6225 64 32 1 6226-6289 64 32 1 6290-6353 64 32 1 6354-6417 64 32 1 6418-6481 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 6482-6545 64 32 1 6546-6609 64 32 1 6610-6673 64 32 1 6674-6735 62 31 1 6736-6799 64 32 1 6800-6863 64 32 1 6864-6927 64 32 1 6928-6991 64 32 1 6992-7055 64 32 1 7056-7119 64 32 1 7120-7183 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 7184-7247 64 32 1 7248-7311 64 32 1 7312-7373 62 31 1 7374-7437 64 32 1 7438-7501 64 32 1 7502-7565 64 32 1 7566-7629 64 32 1 7630-7693 64 32 1 7694-7757 64 32 1 7758-7821 64 32 1 7822-7885 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 7886-7949 64 32 1 7950-8013 64 32 1 8014-8075 62 31 1 8076-8139 64 32 1 8140-8203 64 32 1 8204-8267 64 32 1 8268-8331 64 32 1 8332-8395 64 32 1 8396-8459 64 32 1 8460-8523 64 32 1 8524-8587 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 8588-8651 64 32 1 8652-8715 64 32 1 8716-8777 62 31 1 8778-8841 64 32 1 8842-8905 64 32 1 8906-8969 64 32 1 8970-9033 64 32 1 9034-9097 64 32 1 9098-9161 64 32 1 9162-9225 64 32 1 9226-9289 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 9290-9353 64 32 1 9354-9415 62 31 1 9416-9479 64 32 1 9480-9543 64 32 1 9544-9607 64 32 1 9608-9671 64 32 1 9672-9735 64 32 1 9736-9799 64 32 1 9800-9863 64 32 1 9864-9927 64 32 1 9928-9991 64 32 1 RANGE VALS_IN_RANGE FREQUENCY ENDPOINT_REPEAT_COUNT -------------------- ------------- ---------- --------------------- 9992-9999 8 4 1 SQL>
The ENDPOINT_REPEAT_COUNT
column represents the number of times the highest value in the bucket was repeated. Not surprisingly, where a bucket represents a single value the ENDPOINT_REPEAT_COUNT
matches the frequency. The combination of the ENDPOINT_REPEAT_COUNT
and the range of values in the bucket allow the optimizer to calculate the density of the values in the bucket, which allows a greater level of precision while calculating the cardinality.
Histogram Creation
There are several ways that histograms can be created, including the following.
- The default setting of the
METHOD_OPT
parameter isSIZE AUTO
. The predicates used by queries are logged in theSYS.COL_USAGE$
table. The next time statistics are gathered the columns logged in theSYS.COL_USAGE$
table are checked to see if histograms are necessary. - The
METHOD_OPT
parameter can be used to manually set the columns which require histograms and the bucket size. For example, "METHOD_OPT=>'FOR COLUMNS RECORD_TYPE SIZE 10'
"
VARCHAR2 Columns
Histograms for VARCHAR2
columns are essentially the same as for numeric types with one notable exception. Prior to 12c, if the first 32 bytes of a string are identical, they will be seen as the same value and placed in the same bucket. From 12c onward the first 64 bytes are considered.
For more information see:
- Histograms
- Cost-Based Optimizer (CBO) And Database Statistics
- Statistics Collection Enhancements in Oracle Database 12c Release 1 (12.1)
Hope this helps. Regards Tim...