The human being is the only animal that…

Last night, I decided to re-read Stumbling on Happiness, a book I'd discovered a few years ago and was (then) delighted with. 

I chose that over one of my (back then) favorite books of fiction, because I'd been thinking yesterday and how odd it is that lots of left-leaning humans are all upset about climate change and really pissed at their elected officials about their non-action on this literally world-changing issue at a time when radical action is necessary - yet they don't take radical action in their own lives.

It's pretty clear that politicians will not change direction (will not override the influence of the source of their funding), until their constituents demonstrate a deep desire for change, backed up by action.

Anyway, there I was wondering once again about humans and why we behave the way we do. And so I sought out some answers in SoH. After all, the renowned Malcolm Gladwell, author of The Tipping Point, says right on the cover: 
"If you have even the slightest curiosity about the human condition, you ought to read it. Trust me."
OK, so fine. If a person says "trust me", usually you want to run in the other direction. But hey....

So I started reading and soon found Daniel Gilbert talking about psychologists are expected sometime in their career to finish The Sentence that starts with "The human being is the only animal that..." and now it was his turn.

Exciting! And then he finished the sentence:
"The human being is the only animal that thinks about the future."
And then you know what I did?

I stopped the reading book - and tossed it into the recycle bin. Yep, I threw the book away. That's how much Gilbert disgusted me, right then and there.

Why? Because of all the things we know about the world and the way it "works", the one thing we can never know is what another animal - even another human - is actually, truly thinking

All we can know, all we can see, all we can measure, and then draw conclusions from, is how an animal manifests their thinking into the world.

Gilbert cites as one "proof" of his Sentence that squirrels will, ahem, squirrel away nuts in advance of winter even in places where they will then find, winter after winter, that nuts or other food remain abundant. 

Go, Gilbert, go! Apply a human frame of judgement onto other animals, sure, why not? Why not assume that means that squirrels don't think about the future, rather than saying: "Maybe they do think about the future and know that they cannot trust what the future will bring, because they are not willing to destroy forests to build houses to hide them from the vagaries of the future."

So I threw out the book, but that got me thinking about The Sentence. I thought I would offer my own variations on that statement and invite others to do the same. Here goes...

The human being is the only animal that:
  • creates garbage, including vast "islands" of plastic in the middle of our oceans
  • causes the extinction of entire species, year in and year out
  • poisons water, the source of all life on this planet
  • learns multiple languages
  • holds it in
And just to pre-empt some typical responses:

The human being is the not only animal that:
  • creates art - lots of birds do, too. Just check out nests of bowerbirds.
  • has a sense of right and wrong - black bears do, too. Just check out Among the Bears. Seriously: READ THIS BOOK.
  • uses tools - birds, chimpanzees and others repurpose stones, branches, etc. as tools
  • is altruistic - again, black bears, and even more so ants. Many species of ants are way more altruistic than humans.
So what can you think of that only a human does? And please don't tell me about your belief about internal states of mind. That's just an opinion. Tell me about what humans do.



Quiz Night

How many rows will you return from a single table query if you include the predicate


        rownum > 2

in the where clause.

Warning: this IS a catch question

To make it easier and avoid ambiguity, you may assume the table is the standard SCOTT.EMP table.

 

Part 2:

This posting was prompted by noticing a note that Dominic Brooks posted a few months ago.

Can you supply a workaround for the little oddity he’s described.


fsfreeze in Linux

The fsfreeze command, is used to suspend and resume access to a file system. This allows consistent snapshots to be taken of the filesystem. fsfreeze supports Ext3/4, ReiserFS, JFS and XFS.

A filesystem can be frozen using following command:

# /sbin/fsfreeze -u /data

Now if you are writing to this filesystem, the process/command will be stuck. For example, following command will be stuck in D (UNINTERUPTEBLE_SLEEP) state:

# echo “testing” > /data/file

Only after the filesystem is unfreezed using the following command, can it continue:

# /sbin/fsfreeze -u /data

As per the fsfreeze main page, “fsfreeze is unnecessary for device-mapper devices. The device-mapper (and LVM) automatically freezes filesystem on the device when a snapshot creation is requested.”

fsfreeze is provided by the util-linux package in RHEL systems. Along with userspace support, fsfreeze also requires kernel support.

For example, in the following case, fsfreeze was used in the ext4 filesystem of an AWS CentOS node:

# fsfreeze -f /mysql
fsfreeze: /mysql: freeze failed: Operation not supported

From strace we found that ioctl is returning EOPNOTSUPP:

fstat(3, {st_dev=makedev(253, 0), st_ino=2, st_mode=S_IFDIR|0755,
st_nlink=4, st_uid=3076, st_gid=1119, st_blksize=4096, st_blocks=8,
st_size=4096, st_atime=2014/05/20-10:58:56,
st_mtime=2014/11/17-01:39:36, st_ctime=2014/11/17-01:39:36}) = 0
ioctl(3, 0xc0045877, 0) = -1 EOPNOTSUPP (Operation not
supported)

From latest upstream kernel source:

static int ioctl_fsfreeze(struct file *filp)
{
struct super_block *sb = file_inode(filp)->i_sb;if (!capable(CAP_SYS_ADMIN))
return -EPERM;

/* If filesystem doesn’t support freeze feature, return. */
if (sb->s_op->freeze_fs == NULL)
return -EOPNOTSUPP;

/* Freeze */
return freeze_super(sb);
}

EOPNOTSUPP is returned when a filesystem does not support the feature.

On testing to freeze ext4 in CentOs with AWS community AMI, fsfreeze worked fine.

This means that the issue was specific to the kernel of the system. It was found that AMI used to build the system was having a customized kernel without fsfreeze support.

12c shows patch information in alert log

I just noticed on a pretty recent installtion (12.1.0.2) of a database shows information on applied patches in the alert log during instance startup. Usually one had to run opatch to get this kind of information and I think it is neat to have this in the alert log aswell now.

===========================================================
Dumping current patch information
===========================================================
Patch Id: 19769480
Patch Description: Database Patch Set Update : 12.1.0.2.2 (19769480)
Patch Apply Time: 2015-05-14 10:15:32 GMT+02:00
Bugs Fixed: 14643995,16359751,16870214,17835294,18250893,18288842,18354830,
18436647,18456643,18610915,18618122,18674024,18674047,18791688,18845653,
18849537,18885870,18921743,18948177,18952989,18964939,18964978,18967382,
18988834,18990693,19001359,19001390,19016730,19018206,19022470,19024808,
19028800,19044962,19048007,19050649,19052488,19054077,19058490,19065556,
19067244,19068610,19068970,19074147,19075256,19076343,19077215,19124589,
19134173,19143550,19149990,19154375,19155797,19157754,19174430,19174521,
19174942,19176223,19176326,19178851,19180770,19185876,19189317,19189525,
19195895,19197175,19248799,19279273,19280225,19289642,19303936,19304354,
19309466,19329654,19371175,19382851,19390567,19409212,19430401,19434529,
19439759,19440586,19468347,19501299,19518079,19520602,19532017,19561643,
19577410,19597439,19676905,19706965,19708632,19723336,19769480,20074391,
20284155

Patch Id: 20299023
Patch Description: Database Patch Set Update : 12.1.0.2.3 (20299023)
Patch Apply Time: 2015-05-14 10:15:40 GMT+02:00
Bugs Fixed: 16619249,17274537,18202441,18306996,18417036,18607546,18681056,
18856999,18909599,18940497,19012119,19018447,19023822,19035573,19065677,
19081128,19183343,19238590,19272708,19291380,19315691,19335438,19358317,
19385656,19393542,19487147,19512341,19524384,19536415,19547370,19597583,
19606174,19619732,19627012,19637186,19644859,19649152,19658708,19670108,
19684504,19687159,19730508,19791377,19805359,19841800,19865345,19873610,
19896336,19915271,19928926,19978542,20235511,20347562,20348653,20425790,
20440930

Patch Id: 20415564
Patch Description: Database PSU 12.1.0.2.3, Oracle JavaVM Component (Apr2015)
Patch Apply Time: 2015-05-14 10:17:58 GMT+02:00
Bugs Fixed: 19153980,19223010,19231857,19245191,19699946,19855285,19877336,
19895326,19895362,19909862,20408829,20408866,20415564
===========================================================

I do not know when this started but I have not seen this in 12.1.0.1 installations.

Ingest a single table from Microsoft SQL Server Data into Hadoop

Introduction

This blog describes the best-practice approach in regards to the data ingestion from SQL Server into Hadoop. The case scenario is described as under:

  • Single table ingestion (no joins)
  • No partitioning
  • Complete data ingestion (trash old and replace new)
  • Data stored in Parquet format

Pre-requisites

This example has been tested using the following versions:

  • Hadoop 2.5.0-cdh5.3.0
  • Hive 0.13.1-cdh5.3.0
  • Sqoop 1.4.5-cdh5.3.0
  • Oozie client build version: 4.0.0-cdh5.3.0

Process Flow Diagram

process_flow1

Configuration

  • Create the following directory/file structure (one per data ingestion process). For a new ingestion program please adjust the directory/file names as per requirements. Make sure to replace the
    tag with your table name
<table_name>_ingest
+ hive-<table_name>
create-schema.hql
+ oozie-properties
<table_name>.properties
+ oozie-<table_name>-ingest
+ lib
kite-data-core.jar
kite-data-mapreduce.jar
sqljdbc4.jar
coordinator.xml
impala_metadata.sh
workflow.xml
  • The ingestion process is invoked using an oozie workflow. The workflow invokes all steps necessary for data ingestion including pre-processing, ingestion using sqoop and post-processing.
oozie-<table_name>-ingest
This directory stores all files that are required by the oozie workflow engine. These files should be stored in HDFS for proper functioning of oozie
oozie-properties
This directory stores the <table_name>.properties. This file stores the oozie variables such as database users, name node details etc. used by the oozie process at runtime.
hive-<table_name>
This directory stores a file called create-schema.hql  which contains the schema definition of the HIVE tables. This file is required to be run in HIVE only once.
  • Configure files under oozie-<table_name>-ingest
1.   Download kite-data-core.jar and kite-data-mapreduce.jar files from http://mvnrepository.com/artifact/org.kitesdk
2.  Download sqljdbc4.jar from https://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx
3.  Configure coordinator.xml. Copy and paste the following XML.
<coordinator-app name=”<table_name>-ingest-coordinator” frequency=”${freq}” start=”${startTime}” end=”${endTime}” timezone=”UTC” xmlns=”uri:oozie:coordinator:0.2″>
<action>
<workflow>
<app-path>${workflowRoot}/workflow.xml</app-path>
<configuration>
<property>
<name>partition_name</name>
<value>${coord:formatTime(coord:nominalTime(), ‘YYYY-MM-dd’)}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>

4.  Configure workflow.xml. This workflow has three actions:

a) mv-data-to-old – Deletes old data before refreshing new
b) sqoop-ingest-<table_name> – Sqoop action to fetch table from SQL Server
c) invalidate-impala-metadata – Revalidate Impala data after each refresh
Copy and paste the following XML.
<workflow-app name=”<table_name>-ingest” xmlns=”uri:oozie:workflow:0.2″><start to=”mv-data-to-old” /><action name=”mv-data-to-old”>
<fs>
<delete path=’${sqoop_directory}/<table_name>/*.parquet’ />
<delete path=’${sqoop_directory}/<table_name>/.metadata’ />
</fs><ok to=”sqoop-ingest-<table_name>”/>
<error to=”kill”/>
</action><action name=”sqoop-ingest-<table_name>”>
<sqoop xmlns=”uri:oozie:sqoop-action:0.3″>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path=”${nameNode}/user/${wf:user()}/_sqoop/*” />
</prepare><configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration><arg>import</arg>
<arg>–connect</arg>
<arg>${db_string}</arg>
<arg>–table</arg>
<arg>${db_table}</arg>
<arg>–columns</arg>
<arg>${db_columns}</arg>
<arg>–username</arg>
<arg>${db_username}</arg>
<arg>–password</arg>
<arg>${db_password}</arg>
<arg>–split-by</arg>
<arg>${db_table_pk}</arg>
<arg>–target-dir</arg>
<arg>${sqoop_directory}/<table_name></arg>
<arg>–as-parquetfile</arg>
<arg>–compress</arg>
<arg>–compression-codec</arg>
<arg>org.apache.hadoop.io.compress.SnappyCodec</arg>
</sqoop><ok to=”invalidate-impala-metadata”/>
<error to=”kill”/>
</action><action name=”invalidate-impala-metadata”>
<shell xmlns=”uri:oozie:shell-action:0.1″>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node><configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${impalaFileName}</exec>
<file>${impalaFilePath}</file>
</shell>
<ok to=”fini”/>
<error to=”kill”/>
</action>
<kill name=”kill”>
<message>Workflow failed with error message ${wf:errorMessage(wf:lastErrorNode())}</message>
</kill><end name=”fini” /></workflow-app>

5. Configure impala_metadata.sh. This file will execute commands to revalidate impala metadata after each restore. Copy and paste the following data.

#!/bin/bash
export PYTHON_EGG_CACHE=./myeggs
impala-shell -i <hive_server> -q “invalidate metadata <hive_db_name>.<hive_table_name>”
  • Configure files under oozie-properties. Create file oozie.properties with contents as under. Edit the parameters as per requirements.
# Coordinator schedulings
freq=480
startTime=2015-04-28T14:00Z
endTime=2029-03-05T06:00Z
jobTracker=<jobtracker>
nameNode=hdfs://<namenode>
queueName=<queue_name>
rootDir=${nameNode}/user//oozie
workflowRoot=${rootDir}/<table_name>-ingest
oozie.use.system.libpath=true
oozie.coord.application.path=${workflowRoot}/coordinator.xml
# Sqoop settings
sqoop_directory=${nameNode}/data/sqoop
# Hive/Impala Settings
hive_db_name=<hive_db_name>
impalaFileName=impala_metadata.sh
impalaFilePath=/user/oozie/<table_name>-ingest/impala_metadata.sh
#impala_metadata.sh
# MS SQL Server settings
db_string=jdbc:sqlserver://;databaseName=<sql_server_db_name>
db_username=<sql_server_username>
db_password=<sql_server_password>
db_table=<table_name>
db_columns=<columns>
  • Configure files under hive-<table_name>. Create a new file create-schema.hql with contents as under.
DROP TABLE IF EXISTS ;CREATE EXTERNAL TABLE ()
STORED AS PARQUET
LOCATION ‘hdfs:///data/sqoop/<table_name>';

Deployment

  • Create new directory in HDFS and copy files
$ hadoop fs -mkdir /user/<user>/oozie/<table_name>-ingest
$ hadoop fs -copyFromLocal <directory>/<table_name>/oozie-<table_name>-ingest/lib /user/<user>/oozie/ <table_name>-ingest
$ hadoop fs -copyFromLocal <directory>/<table_name>/oozie-<table_name>-ingest/ coordinator.xml /user/<user>/oozie/ <table_name>-ingest
$ hadoop fs -copyFromLocal <directory>/<table_name>/oozie-<table_name>-ingest/ impala_metadata.sh /user/<user>/oozie/<table_name>-ingest
$ hadoop fs -copyFromLocal <directory>/<table_name>/oozie-<table_name>-ingest/ workflow.xml /user/<user>/oozie/ <table_name>-ingest
  • Create new directory in HDFS for storing data files
$ hadoop fs -mkdir /user/SA.HadoopPipeline/oozie/<table_name>-ingest
$ hadoop fs -mkdir /data/sqoop/<table_name>
  • Now we are ready to select data in HIVE. Go to URL http://<hive_server>:8888/beeswax/#query.
a. Choose existing database on left or create new.
b. Paste contents of create-schema.hql in Query window and click Execute.
c. You should now have an external table in HIVE pointing to data in hdfs://<namenode>/data/sqoop/<table_name>
  • Create Oozie job
a. Choose existing database on left or create new.
$ oozie job -run -config /home/<user>/<<directory>/<table_name>/oozie-properties/oozie.properties

Validation and Error Handling

  • At this point an oozie job should be created. To validate the oozie job creation open URL http://<hue_server>:8888/oozie/list_oozie_coordinators. Expected output as under. In case of error please review the logs for recent runs.
 oozie1
  • To validate the oozie job is running open URL http://<hue_server>:8888/oozie/list_oozie_workflows/ . Expected output as under. In case of error please review the logs for recent runs.
 oozie2
  • To validate data in HDFS execute the following command. You should see a file with *.metadata extension and a number of files with *.parquet extension.
$ hadoop fs -ls /data/sqoop/<table_name>/
  • Now we are ready to select data in HIVE or Impala.
    For HIVE go to URL http://<hue_server>:8888/beeswax/#query
    For Impala go to URL http://<hue_server>:8888/impala
    Choose the newly created database on left and execute the following SQL – select * from <hive_table_name> limit 10
    You should see the the data being outputted from the newly ingested data.

Instance stats

While reading a posting by Martin Bach on a new buffering option for 12c I was prompted to take a look at another of his posts on the instance activity stats, which reminded me that the CLASS column on v$statname is a bit flag, which we can dissect using the bitand() function to pick out the statistics that belong to multiple classes. I’ve got 2 or 3 little scripts that do this one, for example, picks out all the statistics relating to RAC, another is just a cross-tab of the class values used and there breakdown by class.  Originally this latter script used the “diagonal” method decode() then sum() – but when the 11g pivot() option appeared I used it as an experiment on pivoting.

This is the script as it now stands, with the output from 12.1.0.2




select
        *
from    (
        select
                st.class,
                pwr.class_id,
                case bitand(st.class, pwr.expn)
                        when 0 then to_number(null)
                               else 1
                end     class_flag
        from
                v$statname      st,
                (select
                        level                   class_id,
                        power(2,level - 1)      expn
                from
                        dual
                connect by level <= 8
                )       pwr
        where
                bitand(class,pwr.expn) = pwr.expn
        )
pivot   (
                sum(class_flag)
        for     class_id in (
                        1 as EndUser,
                        2 as Redo,
                        3 as Enqueue,
                        4 as Cache,
                        5 as OS,
                        6 as RAC,
                        7 as SQL,
                        8 as Debug
                )
        )
order by
        class
;


     CLASS    ENDUSER       REDO    ENQUEUE      CACHE         OS        RAC        SQL      DEBUG
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
         1        130
         2                    68
         4                                9
         8                                         151
        16                                                     16
        32                                                                35
        33          3                                                      3
        34                     1                                           1
        40                                          53                    53
        64                                                                          130
        72                                          15                               15
       128                                                                                     565
       192                                                                            2          2

13 rows selected.

The titles given to the columns come from Martin’s blog, but the definitive set is in the Oracle documentation in the reference manual for v$sysstat. (I’ve changed the first class from “User” to “EndUser” because of a reserved word problem, and I abbreviated the “RAC” class for tidiness.) It’s interesting to note how many of the RAC statistics are also about the Cache layer.


Troubleshooting Target Status Availability Issues

If you’ve been using EM12c (or any of its precursors for that matter), you’d know that it can sometimes be problematic to troubleshoot an availability issue for targets.  You can see they might be up (hopefully!), down, pending, unreachable or showing a metric collection error, but understanding what’s causing that particular status (and indeed why it can sometimes be wrong) can be difficult at times.

Thankfully there’s a couple of Information Publisher reports in EM12c from 12.1.0.3 up (a patch is available to get the functionality in EM 12.1.0.2 see patch# 16457404) to help with troubleshooting these target availability issues.  The two reports are called “Target Status Diagnostics Report: Agent-based targets” and “Target Status Diagnostics Report: Repository-based targets”.  The agent-based targets report is used for diagnosing issues with databases, listeners or ASM environments, while the repository-based targets report is used for clustered targets – the cluster itself, clustered databases, or clustered ASM.  The reports can be useful for issues such as these:

  • Listener status down, pending, metric error, or agent unreachable
  • Database instance status down, pending, metric error, or agent unreachable
  • Database system status down, pending, metric error, or agent unreachable
  • Cluster database status down, pending , or metric error
  • Cluster ASM status down, pending, or metric error
  • ASM Instance status down, pending , metric error, or agent unreachable
  • Automatic Storage Management status down, pending, metric error or  agent unreachable
  • Agent status Agent Unreachable (Under Migration), Agent Unreachable (Cannot Write to File System), Agent Unreachable (Collections Disabled), Agent Unreachable (Disk Full), Agent Unreachable (Post Blackout), Agent Unreachable (Communication Broken), Unreachable (Agent Misconfigured), or Agent Unreachable
  • Agent status Agent Blocked (Blocked Manually), Agent Blocked (Plug-in Mismatch), or Agent Blocked (Bounce Counter Mismatch)
  • Agent status Status Pending (Target Addition in Progress), Status Pending (Post Blackout), Status Pending (Post Metric Error), or Agent status Status Pending

Let’s take a look at an environment where my single instance test database is showing a status of Agent Unreachable, and use the “Target Status Diagnostics Report: Agent-based targets” report to identify why.  Here’s a short video I produced that shows you how to do that:

Hope that helps someone with diagnosing these sorts of target availability issues!

Parallel Query

According to the Oracle Database VLDB and Partitioning Guide (10g version and 11g version):

A SELECT statement can be executed in parallel only if the following conditions are satisfied:

  • The query includes a parallel hint specification (PARALLEL or PARALLEL_INDEX) or the schema objects referred to in the query have a PARALLEL declaration associated with them.
  • At least one table specified in the query requires one of the following:
    • A full table scan
    • An index range scan spanning multiple partitions
  • No scalar subqueries are in the SELECT list.

Note, particularly, that last restriction. I was looking at a query recently that seemed to be breaking this rule, and after examining the 10053 trace file for a while, I decided that I would construct a simplified model of the client’s query to demonstrate how the manuals can tell you the truth while being completely deceptive or (conversely) be wrong while still giving the correct impression. So here’s a query, with execution plan from 11.2.0.4:

select
        /*+ parallel(t1 2) */
        d1.small_vc,
        t1.r1,
        t2.n21,
        t2.v21,
        t3.v31,
        (select max(v1) from ref1 where n1 = t2.n21)    ref_t2,
        (select max(v1) from ref2 where n1 = t1.r1)     ref_t1,
        t1.padding
from
        driver          d1,
        t1, t2, t3
where
        d1.n1 = 1
and     t1.n1 = d1.id
and     t1.n2 = 10
and     t1.n3 = 10
and     t2.id = t1.r2
and     t3.id = t1.r3
;

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |   100 | 15700 |  1340   (3)| 00:00:07 |        |      |            |
|   1 |  SORT AGGREGATE              |          |     1 |    10 |            |          |        |      |            |
|   2 |   TABLE ACCESS BY INDEX ROWID| REF1     |     1 |    10 |     2   (0)| 00:00:01 |        |      |            |
|*  3 |    INDEX UNIQUE SCAN         | R1_PK    |     1 |       |     1   (0)| 00:00:01 |        |      |            |
|   4 |  SORT AGGREGATE              |          |     1 |    10 |            |          |        |      |            |
|   5 |   TABLE ACCESS BY INDEX ROWID| REF2     |     1 |    10 |     2   (0)| 00:00:01 |        |      |            |
|*  6 |    INDEX UNIQUE SCAN         | R2_PK    |     1 |       |     1   (0)| 00:00:01 |        |      |            |
|   7 |  PX COORDINATOR              |          |       |       |            |          |        |      |            |
|   8 |   PX SEND QC (RANDOM)        | :TQ10003 |   100 | 15700 |  1340   (3)| 00:00:07 |  Q1,03 | P->S | QC (RAND)  |
|*  9 |    HASH JOIN                 |          |   100 | 15700 |  1340   (3)| 00:00:07 |  Q1,03 | PCWP |            |
|* 10 |     HASH JOIN                |          |   100 | 14700 |  1317   (3)| 00:00:07 |  Q1,03 | PCWP |            |
|* 11 |      HASH JOIN               |          |   100 | 13300 |  1294   (3)| 00:00:07 |  Q1,03 | PCWP |            |
|  12 |       BUFFER SORT            |          |       |       |            |          |  Q1,03 | PCWC |            |
|  13 |        PX RECEIVE            |          |   100 |  1300 |     4   (0)| 00:00:01 |  Q1,03 | PCWP |            |
|  14 |         PX SEND BROADCAST    | :TQ10000 |   100 |  1300 |     4   (0)| 00:00:01 |        | S->P | BROADCAST  |
|* 15 |          TABLE ACCESS FULL   | DRIVER   |   100 |  1300 |     4   (0)| 00:00:01 |        |      |            |
|  16 |       PX BLOCK ITERATOR      |          |   100 | 12000 |  1290   (3)| 00:00:07 |  Q1,03 | PCWC |            |
|* 17 |        TABLE ACCESS FULL     | T1       |   100 | 12000 |  1290   (3)| 00:00:07 |  Q1,03 | PCWP |            |
|  18 |      BUFFER SORT             |          |       |       |            |          |  Q1,03 | PCWC |            |
|  19 |       PX RECEIVE             |          | 10000 |   136K|    23   (5)| 00:00:01 |  Q1,03 | PCWP |            |
|  20 |        PX SEND BROADCAST     | :TQ10001 | 10000 |   136K|    23   (5)| 00:00:01 |        | S->P | BROADCAST  |
|  21 |         TABLE ACCESS FULL    | T2       | 10000 |   136K|    23   (5)| 00:00:01 |        |      |            |
|  22 |     BUFFER SORT              |          |       |       |            |          |  Q1,03 | PCWC |            |
|  23 |      PX RECEIVE              |          | 10000 |    97K|    23   (5)| 00:00:01 |  Q1,03 | PCWP |            |
|  24 |       PX SEND BROADCAST      | :TQ10002 | 10000 |    97K|    23   (5)| 00:00:01 |        | S->P | BROADCAST  |
|  25 |        TABLE ACCESS FULL     | T3       | 10000 |    97K|    23   (5)| 00:00:01 |        |      |            |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"=:B1)
   6 - access("N1"=:B1)
   9 - access("T3"."ID"="T1"."R3")
  10 - access("T2"."ID"="T1"."R2")
  11 - access("T1"."N1"="D1"."ID")
  15 - filter("D1"."N1"=1)
  17 - filter("T1"."N2"=10 AND "T1"."N3"=10)

Thanks to my hint the query has been given a parallel execution plan – and a check of v$pq_tqstat after running the query showed that it had run parallel. Note, however, where the PX SEND QC and PX COORDINATOR operations appear – lines 7 and 8; and above those lines we see the two scalar subqueries.

We run the basic select statement as a parallel query, but the query co-ordinator serialises on the scalar subqueries in the select list.  Is the manual right, but deceptive, or wrong but giving the right impression ?  Serialising on the scalar subqueries can have a huge impact on the performance and effectively make the query behave like a serial query.

You may recall that an example of this type of behaviour, and its side effects when the scalar subqueries executed as independently as parallel queries, showed up some time ago. At the time I said I would follow up with a note about the change in behaviour in 12c; this seems to be an appropriate moment to show the 12c plan(s), first the default:


----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |   100 | 19100 |  1364   (3)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR              |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)        | :TQ10005 |   100 | 19100 |  1364   (3)| 00:00:01 |  Q1,05 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED        |          |   100 | 19100 |  1364   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|*  4 |     HASH JOIN OUTER          |          |   100 | 18100 |  1340   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|*  5 |      HASH JOIN               |          |   100 | 16400 |  1335   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|*  6 |       HASH JOIN OUTER        |          |   100 | 15000 |  1311   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|*  7 |        HASH JOIN             |          |   100 | 13300 |  1306   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|   8 |         PX RECEIVE           |          |   100 |  1300 |     4   (0)| 00:00:01 |  Q1,05 | PCWP |            |
|   9 |          PX SEND BROADCAST   | :TQ10000 |   100 |  1300 |     4   (0)| 00:00:01 |  Q1,00 | S->P | BROADCAST  |
|  10 |           PX SELECTOR        |          |       |       |            |          |  Q1,00 | SCWC |            |
|* 11 |            TABLE ACCESS FULL | DRIVER   |   100 |  1300 |     4   (0)| 00:00:01 |  Q1,00 | SCWP |            |
|  12 |         PX BLOCK ITERATOR    |          |   100 | 12000 |  1302   (3)| 00:00:01 |  Q1,05 | PCWC |            |
|* 13 |          TABLE ACCESS FULL   | T1       |   100 | 12000 |  1302   (3)| 00:00:01 |  Q1,05 | PCWP |            |
|  14 |        PX RECEIVE            |          |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,05 | PCWP |            |
|  15 |         PX SEND BROADCAST    | :TQ10001 |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,01 | S->P | BROADCAST  |
|  16 |          PX SELECTOR         |          |       |       |            |          |  Q1,01 | SCWC |            |
|  17 |           VIEW               | VW_SSQ_1 |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,01 | SCWC |            |
|  18 |            HASH GROUP BY     |          |  1000 | 10000 |     5  (20)| 00:00:01 |  Q1,01 | SCWC |            |
|  19 |             TABLE ACCESS FULL| REF2     |  1000 | 10000 |     4   (0)| 00:00:01 |  Q1,01 | SCWP |            |
|  20 |       PX RECEIVE             |          | 10000 |   136K|    24   (5)| 00:00:01 |  Q1,05 | PCWP |            |
|  21 |        PX SEND BROADCAST     | :TQ10002 | 10000 |   136K|    24   (5)| 00:00:01 |  Q1,02 | S->P | BROADCAST  |
|  22 |         PX SELECTOR          |          |       |       |            |          |  Q1,02 | SCWC |            |
|  23 |          TABLE ACCESS FULL   | T2       | 10000 |   136K|    24   (5)| 00:00:01 |  Q1,02 | SCWP |            |
|  24 |      PX RECEIVE              |          |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,05 | PCWP |            |
|  25 |       PX SEND BROADCAST      | :TQ10003 |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,03 | S->P | BROADCAST  |
|  26 |        PX SELECTOR           |          |       |       |            |          |  Q1,03 | SCWC |            |
|  27 |         VIEW                 | VW_SSQ_2 |  1000 | 17000 |     5  (20)| 00:00:01 |  Q1,03 | SCWC |            |
|  28 |          HASH GROUP BY       |          |  1000 | 10000 |     5  (20)| 00:00:01 |  Q1,03 | SCWC |            |
|  29 |           TABLE ACCESS FULL  | REF1     |  1000 | 10000 |     4   (0)| 00:00:01 |  Q1,03 | SCWP |            |
|  30 |     PX RECEIVE               |          | 10000 |    97K|    24   (5)| 00:00:01 |  Q1,05 | PCWP |            |
|  31 |      PX SEND BROADCAST       | :TQ10004 | 10000 |    97K|    24   (5)| 00:00:01 |  Q1,04 | S->P | BROADCAST  |
|  32 |       PX SELECTOR            |          |       |       |            |          |  Q1,04 | SCWC |            |
|  33 |        TABLE ACCESS FULL     | T3       | 10000 |    97K|    24   (5)| 00:00:01 |  Q1,04 | SCWP |            |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T3"."ID"="T1"."R3")
   4 - access("ITEM_2"(+)="T2"."N21")
   5 - access("T2"."ID"="T1"."R2")
   6 - access("ITEM_1"(+)="T1"."R1")
   7 - access("T1"."N1"="D1"."ID")
  11 - filter("D1"."N1"=1)
  13 - filter("T1"."N2"=10 AND "T1"."N3"=10)

The first thing to note is the location of the PX SEND QC and PX COORDINATOR operations – right at the top of the plan: there’s no serialisation at the query coordinator. THen we spot the views at operations 17 and 27 – VW_SSQ_1, VW_SSQ_2 (would SSQ be “scalar subquery”, perhaps). The optimimzer has unnested the scalar subqueries out of the select list into the join. When a scalar subquery in the select list returns no data it’s value is deemed to be NULL, so the joins (operations 4 and 6) appear as outer joins.

You’ll notice that there are a lot of PX SELECTOR operations – each feeding a PX SEND BROADCAST operations that reports its “IN-OUT” column as S->P (i.e. serial to parallel). Historically a serial to parallel operation started with query coordinator doing the serial bit, but in 12c the optimizer can dictate that one of the PX slaves should take on that task (see Randolf Geist’s post here). Again my code to report v$pq_tqstat confirmed this behaviour in a way that we shall see shortly.

This type of unnesting is a feature of 12c, and in some cases will be very effective. It is a cost-based decision, though, and the optimizer can make mistakes; fortunately we can control the feature. We could simply set the optimizer_features_enable back to 11.2.0.4 (perhaps through the hint) and this would take us back to the original plan, but this isn’t the best option in this case. There is a hidden parameter _optimizer_unnest_scalar_sq enabling the feature so we could, in principle, set the parameter to false; a more appropriate strategy would simply be to tell the optimizer that it should not unnest the subqueries. In my case I could put the /*+ no_unnest */ hint into both the subqueries or use the qb_name() hint to give the two subquery blocks names, and then used the no_unnest hint with the “@my_qb_name” format at the top of the main query. Here’s the execution plan I get whether I use the hidden parameter or the no_unnest mechanim:

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |       |       |  1554 (100)|          |        |      |            |
|   1 |  PX COORDINATOR                 |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)           | :TQ10003 |   100 | 15700 |  1354   (3)| 00:00:01 |  Q1,03 | P->S | QC (RAND)  |
|   3 |    EXPRESSION EVALUATION        |          |       |       |            |          |  Q1,03 | PCWC |            |
|*  4 |     HASH JOIN                   |          |   100 | 15700 |  1354   (3)| 00:00:01 |  Q1,03 | PCWP |            |
|*  5 |      HASH JOIN                  |          |   100 | 14700 |  1330   (3)| 00:00:01 |  Q1,03 | PCWP |            |
|*  6 |       HASH JOIN                 |          |   100 | 13300 |  1306   (3)| 00:00:01 |  Q1,03 | PCWP |            |
|   7 |        BUFFER SORT              |          |       |       |            |          |  Q1,03 | PCWC |            |
|   8 |         PX RECEIVE              |          |   100 |  1300 |     4   (0)| 00:00:01 |  Q1,03 | PCWP |            |
|   9 |          PX SEND BROADCAST      | :TQ10000 |   100 |  1300 |     4   (0)| 00:00:01 |        | S->P | BROADCAST  |
|* 10 |           TABLE ACCESS FULL     | DRIVER   |   100 |  1300 |     4   (0)| 00:00:01 |        |      |            |
|  11 |        PX BLOCK ITERATOR        |          |   100 | 12000 |  1302   (3)| 00:00:01 |  Q1,03 | PCWC |            |
|* 12 |         TABLE ACCESS FULL       | T1       |   100 | 12000 |  1302   (3)| 00:00:01 |  Q1,03 | PCWP |            |
|  13 |       BUFFER SORT               |          |       |       |            |          |  Q1,03 | PCWC |            |
|  14 |        PX RECEIVE               |          | 10000 |   136K|    24   (5)| 00:00:01 |  Q1,03 | PCWP |            |
|  15 |         PX SEND BROADCAST       | :TQ10001 | 10000 |   136K|    24   (5)| 00:00:01 |        | S->P | BROADCAST  |
|  16 |          TABLE ACCESS FULL      | T2       | 10000 |   136K|    24   (5)| 00:00:01 |        |      |            |
|  17 |      BUFFER SORT                |          |       |       |            |          |  Q1,03 | PCWC |            |
|  18 |       PX RECEIVE                |          | 10000 |    97K|    24   (5)| 00:00:01 |  Q1,03 | PCWP |            |
|  19 |        PX SEND BROADCAST        | :TQ10002 | 10000 |    97K|    24   (5)| 00:00:01 |        | S->P | BROADCAST  |
|  20 |         TABLE ACCESS FULL       | T3       | 10000 |    97K|    24   (5)| 00:00:01 |        |      |            |
|  21 |     SORT AGGREGATE              |          |     1 |    10 |            |          |        |      |            |
|  22 |      TABLE ACCESS BY INDEX ROWID| REF1     |     1 |    10 |     2   (0)| 00:00:01 |        |      |            |
|* 23 |       INDEX UNIQUE SCAN         | R1_PK    |     1 |       |     1   (0)| 00:00:01 |        |      |            |
|  24 |     SORT AGGREGATE              |          |     1 |    10 |            |          |        |      |            |
|  25 |      TABLE ACCESS BY INDEX ROWID| REF2     |     1 |    10 |     2   (0)| 00:00:01 |        |      |            |
|* 26 |       INDEX UNIQUE SCAN         | R2_PK    |     1 |       |     1   (0)| 00:00:01 |        |      |            |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T3"."ID"="T1"."R3")
   5 - access("T2"."ID"="T1"."R2")
   6 - access("T1"."N1"="D1"."ID")
  10 - filter("D1"."N1"=1)
  12 - access(:Z>=:Z AND :Z<=:Z)
       filter(("T1"."N2"=10 AND "T1"."N3"=10))
  23 - access("N1"=:B1)
  26 - access("N1"=:B1)

Note particularly that the PX SEND QC and PX COORDINATOR operations are operations 2 and 1,, and we have a new operator EXPRESSSION EVALUATION at operation 3. This has three child operations – the basic select starting at operation 4, and the two scalar subqueries starting at lines 21 and 24. We are operating the scalar subqueries as correlated subqueries, but we don’t leave all the work to the query coordinator – each slave is running its own subqueries before forwarding the final result to the coordinator. There is a little side effect that goes with this change – the “serial to parallel” operations are now, as they always used to be, driven by the query co-ordinator, the PC SELECTOR operations have disappeared.

And finally

Just to finish off, let’s take a look at the results from v$pq_tqstat in 12.1.0.2. First from the default plan with the PX SELECTOR operations. Remember that this turned into a five table join where two of the “tables” where non-correlated aggregate queries against the reference tables.


DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- ---------- ---------- -----------
         1          0 Producer               1 P002                   200       2428          0          0           0
                                             1 P003                     0         48          0          0           0
                      Consumer               1 P000                   100       1238         59         27           0
                                             1 P001                   100       1238         41         24           0

                    1 Producer               1 P002                  2000      23830          0          0           0
                                             1 P003                     0         48          0          0           0
                      Consumer               1 P000                  1000      11939         57         26           0
                                             1 P001                  1000      11939         41         24           0

                    2 Producer               1 P002                     0         48          0          0           0
                                             1 P003                 20000     339732          0          0           0
                      Consumer               1 P000                 10000     169890         49         22           0
                                             1 P001                 10000     169890         31         21           0

                    3 Producer               1 P002                     0         48          0          0           0
                                             1 P003                  2000      23830          0          0           0
                      Consumer               1 P000                  1000      11939         58         26           0
                                             1 P001                  1000      11939         38         23           0

                    4 Producer               1 P002                     0         48          0          0           0
                                             1 P003                 20000     239986          0          0           0
                      Consumer               1 P000                 10000     120017         50         22           0
                                             1 P001                 10000     120017         34         21           0

                    5 Producer               1 P000                     1        169          0          0           0
                                             1 P001                     1        169          1          0           0
                      Consumer               1 QC                       2        338          3          0           0

As you read down the table queues you can see that in the first five table queues (0 – 4) we seem to operate parallel to parallel, but only one of the two producers (p002 and p003) produces any data at each state. A more traditional plan would show QC as the single producer in each of these stages.

Now with with scalar subquery unnesting blocked – the plan with the three table join and EXPRESSION EVALUATION, we see the more traditional serial to parallel – the product is QC in all three of the first table queues (the full scan and broadcast of tables t1, t2, and t3).

DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- ---------- ---------- -----------
         1          0 Producer               1 QC                     200       1726          0          0           0
                      Consumer               1 P000                   100       1614         28         15           0
                                             1 P001                   100       1614         34         13           0

                    1 Producer               1 QC                   20000     339732          0          0           0
                      Consumer               1 P000                 10000     169866         19         10           0
                                             1 P001                 10000     169866         25          8           0

                    2 Producer               1 QC                   20000     239986          0          0           0
                      Consumer               1 P000                 10000     119993         23         11           0
                                             1 P001                 10000     119993         31         11           0

                    3 Producer               1 P000                     1        155          1          0           0
                                             1 P001                     1        155          0          0           0
                      Consumer               1 QC                       2        310          3          1           0

It’s an interesting point that this last set of results is identical to the set produced in 11g – you can’t tell from v$pq_tqstat whether the parallel slaves or the query co-ordinator executed the subqueries – you have to look at the output from SQL trace (or similar) to see the individual Rowsource Executions Statistics for the slaves and coordinator to see which process actually ran the subqueries.


🎂 10 years 🎂

Thanks to all my readers for being so faithful :)
I’ll post a new solution to calculate factorial.

This is the one I posted 10 years ago :
http://laurentschneider.com/wordpress/2005/05/recursive-sql.html

I also used it in the obfuscation contest


with function f (x number) return number 
is 
begin 
  return case x when 1 then x else x*f(x-1) end; 
end;
select value(t), f(value(t)) 
from table(sys.odcinumberlist(4,6))t

  VALUE(T) F(VALUE(T))
---------- -----------
         4          24
         6         720

It is neither quicker nor shorter than the one I posted in 2005, but it could not have worked in 2005

Efficiency

Here’s a question to which I don’t know the answer, and for which I don’t think anyone is likely to need the answer; but it might be an entertaining little puzzle for thr curious.

Assume you’re doing a full tablescan against a simple heap table of no more than 255 columns (and not using RAC, Exadata, In-memory, etc. etc. etc.), and the query is something like:


select  {columns 200 to 250}
from    t1
where   column_255 = {constant}
;

To test the predicate Oracle has to count its way along each row column by column to find column 255. Will it:

  1. transfer columns 200 to 250 to local memory as it goes, then check column_255 — which seems to be a waste of CPU if the predicate doesn’t evaluate to TRUE
  2. evaluate the predicate, then walk the row again to transfer columns 200 to 250 to local memory — which seems to be a waste of CPU if the predicate evaluates to TRUE
  3. one or other of the above depending on which columns are needed, how long they are, and the selectivity of the predicate

How would you test your hypothesis ?