Questions about the concept of Index [Fast] Full scans

All posts relating to Oracle PL/SQL development.

Moderator: Tim...

Questions about the concept of Index [Fast] Full scans

Postby dariyoosh » Thu May 23, 2013 8:12 am

Hello Tim,

Oracle Version: 11gR2 (11.2.0.1.0) - 64bit
OS: Linux Fedora Core 17 (x86_64)

Currently, I'm reading the online Database Concepts book in order to understand better how things work in oracle. At the time of writing I'm reading Chapter 3: 3 Indexes and Index-Organized Tables. There are some concepts which I think I need your help to understand and I appreciate if you could kindly give me a hand.

As I understand a B-Tree structure (index) is stored in RAM as a sorted data structure which makes searching much faster.

I'm not sure to understand properly the definition of "Full Index Scan" and "Fast Full Index Scan".

According to the documentation, here is how a "Full Index Scan" has been defined

In a full index scan, the database reads the entire index in order. A full index scan is available if a predicate (WHERE clause) in the SQL statement references a column in the index, and in some circumstances when no predicate is specified. A full scan can eliminate sorting because the data is ordered by index key.

Suppose that an application runs the following query:

Code: Select all
SELECT department_id, last_name, salary
FROM   employees
WHERE  salary > 5000
ORDER BY department_id, last_name;


Also assume that department_id, last_name, and salary are a composite key in an index. Oracle Database performs a full scan of the index, reading it in sorted order (ordered by department ID and last name) and filtering on the salary attribute. In this way, the database scans a set of data smaller than the employees table, which contains more columns than are included in
the query, and avoids sorting the data.


I'm going to write down here, what I understand from the above definition and would like to ask you to kindly read my description in order to see whether my understanding of Index Full Scan is correct.

Description 1:
-----------------------------------
What I understand from the above description is that given a composite index formed by (department_id, last_name, salary), oracle looks to the query and sees that the WHERE clause includes the salary which is one component of the given composite index. So a Index Full Scan is to be done. There is however an ORDER BY clause in the above SQL query (ORDER BY department_id, salary)

this means that the entire index is to be read in ascending order for both department_id and salary.

Consequently here is how I think oracle proceeds:

Step1: starts from the Root of the B-Tree index.

Step2: Because of ORDER BY department_id, last_name in the above SQL query oracle chooses the left-most branch block followed by the left-most leaf block where smallest index data (key, rowId) are stored, I say smallest from an data order point of view (Likewise it would have been Right-most if the ORDER BY clause in the query had been ORDER BY department_id DESC, salary DESC). This takes one logical I/O operation for reaching the root of the B-Tree index and one or several logical I/O operations (depending on the height of the B-Tree) for reaching the left-most and down-most (decision) branch block (so the next level will be the leafs level).

Step3: Again, after one logical I/O operation, oracle arrives finally at the left-most leaf block in the Tree.

Step4: The leaf blocks are double linked (so depending to the current position we may be able to go either to left or right). We're at the left-most leaf block. Each leaf-block is read by a single logical I/O one by one from the left-most leaf block to the right-most leaf-block so N logical I/O operations assuming that there are N leafs block, one for reading the entire of each leaf block.

NOTE1: Because the data in each leaf block is already sorted, there is no need to perform an ORDER BY. Therefore the only use of the ORDER BY clause in the above SQL query (ORDER BY department_id, last_name) is to tell oracle whether the scan is to be started from the left-most leaf block towards the right-most or vice versa. Once this decision has been made, oracle doesn't apply the ORDER BY to the result of the above SQL query as data in the leaf blocks are already sorted. The data is read leaf block by leaf block from left to right and every (key, rowId) where the salary component of the composite key satisfies the predicate (WHERE salary > 5000) is returned.

NOTE2: For a Index Full Scan, all columns of the index must be present in the SELECT statement, otherwise there won't be a Full Index scan

Question:
Do you think that my above mentioned description (Description1) about the Index Full Scan is correct?


Later, I will add to this topic the Index Fast Full Scan, but for now I prefer to concentrate only on the first part.


Thank you very much for your help.

Regards,
Dariyoosh
User avatar
dariyoosh
Member
 
Posts: 24
Joined: Tue Aug 11, 2009 11:18 am

Re: Questions about the concept of Index [Fast] Full scans

Postby Tim... » Thu May 23, 2013 9:49 am

Hi.

You've got a mix of stuff here. Some correct and some pretty wrong. :) Let's break this down.

1) Indexes are physical access structures that are stored in blocks on disk. Just like table data blocks, these blocks can be read from disk into the buffer cache to make access faster, but ultimately, the index is stored on disk.

2) A full index scan involves the the database traversing the index tree from start to finish, so the process is ordered. The links between the leaf blocks allow them to do this quicker than doing a full traverse, but the result is pretty similar.

3) A fast full scan involves pulling back all the blocks of the index, but in any order. This is similar to a full table scan of a table. The order of the blocks is not important, so they can be read from disk (or memory) in any order.

4) In some situations, the use of a specific index operation may implicitly cause data to be ordered, but Oracle will only guarantee the order of results from a query of the query includes an ORDER BY clause. Oracle often change algorithms between versions of the database and sometimes this affects implicit ordering. In Oracle 10g, the GROUP BY processing was changed, which removed the implicit ordering, causing lots of confusion. If data is to be ordered, it should *always* have an ORDER BY. It says so in the docs.

5) The ORDER BY does not determine how an index is accessed. It is a separate operation once the data has been retrieved. It is possible the optimizer may pick a different access path as a result of an ORDER BY being present, but you should really think of this as being a separate step once the final result set has been retrieved. Notice the sort operation in the execution plan

Code: Select all
select * from destination order by id;

        ID     STATUS DESCRIPTION
---------- ---------- ------------------------------------------------------------
         6         10 Description of level 6
         7         20 Description of level 7
         8         10 Description of level 8
         9         20 Description of level 9
        10         10 Description of level 10

5 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1083546623

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |    10 |   290 |     4  (25)| 00:00:01 |
|   1 |  SORT ORDER BY     |             |    10 |   290 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DESTINATION |    10 |   290 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------


6) Prior to 9i, Oracle could only use an index if the leading edge of the index was used. So for an index on col1+col2+col3, the index could be used if col1 was referenced in the WHERE clause. If the where clause only contained col2+col3, it would not get used. In 9i oracle introduced index-skip-scanning, to allow indexes to be used, even when they did not hit the leading edge. Although this is now possible, it is rare you see skip scanning in real systems, because it is very costly.

http://www.oracle-base.com/articles/9i/ ... anning.php

With this in mind, you statements about the columns in the index affecting the decision on full or fast scans doesn't really make sense, unless I've misunderstood the point you are trying to make. :)

Cheers

Tim...
Tim...
Oracle ACE Director
Oracle ACE of the Year 2006 - Oracle Magazine Editors Choice Awards
OakTable Member
OCP DBA 7.3, 8, 8i, 9i, 10g, 11g
OCP Advanced PL/SQL Developer
Oracle Database: SQL Certified Expert
My website: http://www.oracle-base.com
My blog: http://www.oracle-base.com/blog
Tim...
Site Admin
 
Posts: 17937
Joined: Mon Nov 01, 2004 5:56 pm
Location: England, UK

Re: Questions about the concept of Index [Fast] Full scans

Postby dariyoosh » Sun May 26, 2013 8:50 pm

Dear Tim,

First of all I would like to thank you very much for your help and the time you spent to give me this detailed and informative description of the problem.


Tim... wrote:1) Indexes are physical access structures that are stored in blocks on disk. Just like table data blocks, these blocks can be read from disk into the buffer cache to make access faster, but ultimately, the index is stored on disk.

Very important point that I didn't know, thanks for this remark. So this means that if a DML (INSERT, UPDATE, DELETE) changes the table, the index will also be changed accordingly and physically on the disk, am I right?

Tim... wrote:3) A fast full scan involves pulling back all the blocks of the index, but in any order. This is similar to a full table scan of a table. The order of the blocks is not important, so they can be read from disk (or memory) in any order.

4) In some situations, the use of a specific index operation may implicitly cause data to be ordered, but Oracle will only guarantee the order of results from a query of the query includes an ORDER BY clause. Oracle often change algorithms between versions of the database and sometimes this affects implicit ordering. In Oracle 10g, the GROUP BY processing was changed, which removed the implicit ordering, causing lots of confusion. If data is to be ordered, it should *always* have an ORDER BY. It says so in the docs.

Ok, it seems to me that here we have the same principle as the Heap Organized tables
http://docs.oracle.com/cd/E11882_01/ser ... m#autoId23
oracle online documentation Database Concept ... wrote:
By default, a table is organized as a heap, which means that the database places rows where they fit best rather than in a user-specified order. Thus, a heap-organized table is an unordered collection of rows. As users add rows, the database places the rows in the first available free space in the data segment. Rows are not guaranteed to be retrieved in the order in which they were inserted.

So as I understand, the same is true for the rows in the index blocks, no ORDER BY clause = no order guarantee for how index leaf blocks are returned (but the data in each leaf block always remains sorted), is that right?

Tim... wrote:2) A full index scan involves the the database traversing the index tree from start to finish, so the process is ordered. The links between the leaf blocks allow them to do this quicker than doing a full traverse, but the result is pretty similar.

Well, this is precisely what is confusing for me. I cannot understand how the process is ordered because according to your notes 3) and 4) we say that the storage is like a heap organized table and we cannot rely on the order of the index as the data blocks may be returned in any order. So how oracle says in the documentation that
. . . A full scan can eliminate sorting because the data is ordered by index key

Ok, I agree that the data is sorted by the index but as you said the leaf blocks may be returned in any order if the ORDER BY clause is absent. So if the ORDER BY is really required to guarantee the final result, I don't understand the online documentation argument indicating that this could be useful as we don't need to sort the data any more, when we put ORDER BY in a SQL query we sort the resultset according to that ORDER BY clause.

Could you kindly elaborate more on this?

Thanks in advance, 8)
(And in particular thanks a lot for the link of your article about Index Skip Scanning, I did the exercise in the article and I saw how oracle behaved differently)

Regards,
Dariyoosh
User avatar
dariyoosh
Member
 
Posts: 24
Joined: Tue Aug 11, 2009 11:18 am

Re: Questions about the concept of Index [Fast] Full scans

Postby Tim... » Mon May 27, 2013 6:57 am

Hi.

1) Yes. All DML against the table requires maintenance of the index blocks, which will eventually be persisted to disk. Remember, Oracle tries to do most work in memory, with a record of the work written to the redo logs in case the instance crashes. It will flush the changed blocks from memory to disk when needed.

3 & 4) The conclusion is correct, but the way you state it may suggest you've missed the point. Indexes are fundamentally different to heap organized tables. The data in a heap organized table is not ordered at all. Even within blocks you can get random ordering. Added to this, during a full table scan, the blocks can be returned in a random order. This is fundamentally different to an b*tree index, where the data within the index blocks is *always* ordered. Having said that, a fast full index scan can retrieve the leaf blocks in any order.

2) No. The storage is not like a heap table, except that the index is stored as blocks on disk. The contents of the blocks is very different. The leaf blocks include a forward and backward pointer, forming a linked list of index blocks. These forward and backwards pointers don't point randomly to the previous and next blocks on disk, but to the previous and next blocks in the order, so they may be scattered all over the disk. The pointers allow the index to be scanned block-to-block, without fully traversing the tree. If you are doing a full scan you fins the start of the index by traversing it, then follow the leaf block pointers to read from the start to the end. Since the pointers are maintained to keep the correct order, the operation is in the correct order...

You: "Ok, I agree that the data is sorted by the index but as you said the leaf blocks may be returned in any order if the ORDER BY clause is absent. So if the ORDER BY is really required to guarantee the final result, I don't understand the online documentation argument indicating that this could be useful as we don't need to sort the data any more, when we put ORDER BY in a SQL query we sort the resultset according to that ORDER BY clause."


No!!!! Read what I said in point.

Me: The ORDER BY does not determine how an index is accessed. It is a separate operation once the data has been retrieved. It is possible the optimizer may pick a different access path as a result of an ORDER BY being present, but you should really think of this as being a separate step once the final result set has been retrieved.


The optimizer makes a choice between full table scan, index range scan, index full scan index fast full scan etc. based on what it thinks will perform best for the query in question. This happens regardless of the presence of an ORDER BY. Adding the ORDER BY will guarantee the data is returned in the correct order. The leaf blocks will be returned to the database in whatever order is dictated by the access path. The order that data is presented to you may be totally different based on the ORDER BY. The optimizer may decide a quick route to get the data is using an index full scan, then totally order it a different way because of an ORDER BY.

I think the thing you have to remember is this. Ultimately the data resides in the table. All queries could be serviced using a full table scan of the table blocks. The whole point of the indexes is to reduce the amount of work done to get the data. This may be using the index to identify the relevant table blocks to read, or actually reading the data from the index blocks themselves if they contain all the necessary columns. But always, this is only done if it makes getting the data faster than accessing it directly from the table.

Cheers

Tim...
Tim...
Oracle ACE Director
Oracle ACE of the Year 2006 - Oracle Magazine Editors Choice Awards
OakTable Member
OCP DBA 7.3, 8, 8i, 9i, 10g, 11g
OCP Advanced PL/SQL Developer
Oracle Database: SQL Certified Expert
My website: http://www.oracle-base.com
My blog: http://www.oracle-base.com/blog
Tim...
Site Admin
 
Posts: 17937
Joined: Mon Nov 01, 2004 5:56 pm
Location: England, UK

Re: Questions about the concept of Index [Fast] Full scans

Postby dariyoosh » Mon May 27, 2013 8:26 am

Hello,


Thanks again for these clarifications

Tim... wrote:The optimizer makes a choice between full table scan, index range scan, index full scan index fast full scan etc. based on what it thinks will perform best for the query in question. This happens regardless of the presence of an ORDER BY. Adding the ORDER BY will guarantee the data is returned in the correct order. The leaf blocks will be returned to the database in whatever order is dictated by the access path. The order that data is presented to you may be totally different based on the ORDER BY. The optimizer may decide a quick route to get the data is using an index full scan, then totally order it a different way because of an ORDER BY.

Ok, I understand, unlike what I was thinking by reading the example on the online documentation the decision of doing an Index Full Scan or an Index Fast Full scan is not dependant on the fact that I put an ORDER BY clause in my SQL query or not. This is something that the optimizer decides based on the context and when it seems to be appropriate (allowing to go faster and improve the performance). This is what led me to error at the beginning: I was thinking that writing an ORDER BY dictates oracle to do an Index Full scan (the internal ordered process, for me, was something related to the ORDER BY present in the SQL query), but now thanks to your description, I know that it is not true and they are two separate concepts. Besides no matter according to what order the leaf blocks are returned, the data inside each leaf block is always ordered.

Tim... wrote:2) No. The storage is not like a heap table, except that the index is stored as blocks on disk. The contents of the blocks is very different. The leaf blocks include a forward and backward pointer, forming a linked list of index blocks. These forward and backwards pointers don't point randomly to the previous and next blocks on disk, but to the previous and next blocks in the order, so they may be scattered all over the disk. The pointers allow the index to be scanned block-to-block, without fully traversing the tree. If you are doing a full scan you fins the start of the index by traversing it, then follow the leaf block pointers to read from the start to the end. Since the pointers are maintained to keep the correct order, the operation is in the correct order...


Ok, just an example, let's say that we have an Articles table including article reference (id), constructor, label, etc. Suppose that a B-Tree index has been created in oracle based on the articles' reference (id). On a piece of paper we can show this for example as bellow (well the structure bellow is just an example):

Code: Select all
                         The root of the
                               index
                       +++++++++++++++++++++
           _________   | id-01 . . . id-09 | ___________
          |            +++++++++++++++++++++            |
          |                                             |
          |                                             |
          |                                             |
          |                                             |
          v                                             v
   Branch block 1                                Branch block 2
+++++++++++++++++++++                         ++++++++++++++++++++                     
| id-01 . . . id-06 | ___________             | id-07 . . . id-09|
+++++++++++++++++++++            |            ++++++++++++++++++++
          |                      |                      |               
          |                      |                      |
          |                      |                      |
          |                      |                      |               
          v                      v                      v
++++++++++++++++++++   ++++++++++++++++++++   ++++++++++++++++++++
|(id-01, rowid-001)|   |(id-04, rowid-017)|   |(id-07, rowid-017)|
|         ^        |   |         ^        |   |         ^        |
|         |        |   |         |        |   |         |        |
|         v        |   |         v        |   |         v        |
|(id-02, rowid-005)|<->|(id-05, rowid-083)|<->|(id-08, rowid-083)|
|         ^        |   |         ^        |   |         ^        |
|         |        |   |         |        |   |         |        |
|         v        |   |         v        |   |         v        |
|(id-03, rowid-012)|   |(id-06, rowid-065)|   |(id-09, rowid-065)|                 
++++++++++++++++++++   ++++++++++++++++++++   ++++++++++++++++++++
   Leaf Block 1             Leaf Block 2            Leaf Block 3


Each leaf block is a pair of (key, rowid) for example (id-01, rowid-001), (id-02, rowid-005), etc. The elements inside each leaf block are ordered and double linked so we can go forward or backward within each leaf block. Likewise leaf blocks themselves are double linked so it is also possible to go forward or backward (depending on how index is being scanned) between each pair of leaf blocks.

This data structure is ordered, so from a data ordering point of view the smallest elements go towards bigger elements from left side to the right side of the tree. Now if you ask me to do an Index Full Scan of this tree by looking at this tree on a piece of paper (so I say a logical reading on a piece of paper and not how it is happening physically in memory) I say:

1) We start at the Root block
2) Then I go to Branch block 1
3) Then I go to Leaf Block 1 this is where I can start reading from the smallest data (from data ordering point of view upon which the index was created). From left towards right I read each element inside the Leaf Block 1. Thanks to the pointers I can continue reading the ordered elements of the Leaf Block 2 and likewise the Leaf Block 3 until the whole index has been scanned.

Fact:
- The data inside each leaf block are ordered,
- In this example Leaf blocks were also read according to the correct order

for example if we assume that
Code: Select all
SELECT *
FROM Articles;

does an Index Full scan based on the articles id column then I receive the data from the smallest id to the biggest one according to the index order. So that's why oracle says that an Index Full Scan can eliminate the need for sorting, the order of the index scan gives me all articles' id as an ordered sequence.

Now the very same schema showed above on a piece of paper can exist on the disk, except that thanks to the pointers between Leaf Blocks and elements inside each leaf block, and the fact that these pointers are correctly/permanently/intelligently maintained by oracle we can have exactly the same logic in terms of reading the tree, although the blocks may be scattered all over the disk. But the final result is the same: an ordered structure (from a logical/data order point of view)

Am I correct? 8)


Regards,
Dariyoosh
User avatar
dariyoosh
Member
 
Posts: 24
Joined: Tue Aug 11, 2009 11:18 am

Re: Questions about the concept of Index [Fast] Full scans

Postby Tim... » Mon May 27, 2013 9:13 am

Hi.

You are nearly correct. There is one point I need to pick up on...

The query,

Code: Select all
SELECT *
FROM Articles;


Is unlikely to do an index full scan as you are requesting all the data from the table, so it is unlikely to give any advantage using an index. In fact, it will add to the load as you will have to read all the leaf blocks of the index, plus all the table blocks, rather than just all the table blocks. Remember, it is all about performance.

So where would the full index scan (fast or not) be useful? Let's say your table had 50 columns, but you had a concatenated index on id+code. These could use the full index scan.

Code: Select all
SELECT id, code FROM articles;
SELECT id, code FROM articles ORDER BY id, code;


The first could use either fast full index scan or full index scan. No order is required so there is no advantage wither way. Although not guaranteed, the second would probably be more efficient to use the ful index scan as the subsequent sort operation to support the order by would not be necessary.

This would not use an index because it would increase the total amount of I/O necessary to get the data.

Code: Select all
SELECT * FROM articles;


Remember: Indexes are all about performance. If you are asking for a large percentage of the rows in a table, an index will probably not be advantageous. If you are asking for a small number of rows, or a subset of the columns that match an index structure, then an index is a good choice.

Cheers

Tim...
Tim...
Oracle ACE Director
Oracle ACE of the Year 2006 - Oracle Magazine Editors Choice Awards
OakTable Member
OCP DBA 7.3, 8, 8i, 9i, 10g, 11g
OCP Advanced PL/SQL Developer
Oracle Database: SQL Certified Expert
My website: http://www.oracle-base.com
My blog: http://www.oracle-base.com/blog
Tim...
Site Admin
 
Posts: 17937
Joined: Mon Nov 01, 2004 5:56 pm
Location: England, UK

Re: Questions about the concept of Index [Fast] Full scans

Postby dariyoosh » Mon May 27, 2013 9:42 am

Dear Tim,

I really would like to thank you for your patience, I think I made you really tired by my numerous questions. Yet it was quite helpful and allowed me to understand important topics that I was struggling to understand.

Thank you very much for your kind help and your time.

Have a nice day! And whenever you write a book about performance for beginners tell me and I'll be the first person to purchase it :D


Regards,
Dariyoosh
User avatar
dariyoosh
Member
 
Posts: 24
Joined: Tue Aug 11, 2009 11:18 am

Re: Questions about the concept of Index [Fast] Full scans

Postby Tim... » Mon May 27, 2013 9:53 am

Hi.

By that time you would no longer be a beginner. :)

Cheers

Tim...
Tim...
Oracle ACE Director
Oracle ACE of the Year 2006 - Oracle Magazine Editors Choice Awards
OakTable Member
OCP DBA 7.3, 8, 8i, 9i, 10g, 11g
OCP Advanced PL/SQL Developer
Oracle Database: SQL Certified Expert
My website: http://www.oracle-base.com
My blog: http://www.oracle-base.com/blog
Tim...
Site Admin
 
Posts: 17937
Joined: Mon Nov 01, 2004 5:56 pm
Location: England, UK


Return to Oracle SQL and PL/SQL Development

Who is online

Users browsing this forum: No registered users and 4 guests

cron