Webinar Followup (Nov. 12) – In Search of Plan Stability – Part 2

Sorry for the delay in getting this posted, but thanks to everyone who attended my November 12th webinar entitled In Search of Plan Stability - Part 2. You can download the presentation materials from these links:

Presentation PDF
Recording







Thanks!

Oracle Database 12c : EXPAND_SQL_TEXT, APPROX_COUNT_DISTINCT, Session Sequences and Temporary Undo

While I was away on the OTN APAC tour, I wasn’t really able to sink my teeth into anything, so inspired by Connor McDonald‘s session in Perth, I decided to write up a few bits and pieces in my free moments in the hotels and planes (when I had enough elbow room to type).

Here is what I’ve neatened up so far.

I’ve got a couple of other things that are part done, which will no doubt be coming out over this week…

Cheers

Tim…


Oracle Database 12c : EXPAND_SQL_TEXT, APPROX_COUNT_DISTINCT, Session Sequences and Temporary Undo was first posted on November 23, 2014 at 2:21 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

OTN APAC tour 2014 blog posts compilation

So much has been written about all the exciting events of the 2014 OTN Asia-Pacific tour. It just feels right to compile all of them into one link list. I am sorry if I missed a post, please ping me and I promise I will add and fix that.

General posts before the tour

singapore supertrees and MBS from satay by the bay bathroomStop 0 – Singapore

Stop 1 – Perth, Australia

OTN APAC Shanghai posterStop 2 – Shanghai, China

Stop 3 – Tokyo, Japan

OOW swim beach balls in BeiijingStop 4 – Beijing, China

Stop 5 – Bangkok, Thailand

Stop 6 – Auckland, New Zealand

post tour

OTN APAC 2014 : Summary

As you will have seen from my last post, I’m back in Birmingham after completing the OTN APAC 2014 tour. Here is a list of all the posts I wrote over the course of the tour.

At just short of 3 weeks, that was a really long tour, most of which seemed to be in planes. If someone could hurry up and invent a teleport system I would be most grateful!

I feel like the tour went really well. There are a lot of user groups in the Asia Pacific region (see here), whose membership numbers vary considerably. For some of them, overseas speakers are essential to draw people to the events. For others, it is a way of encouraging the members to interact with the English speaking community. Oracle is an American company, so being able to speak English is a distinct advantage if you plan to use Oracle technologies. The bigger user groups, like those in China and Japan, recognise this and encourage people to come to the English sessions.

I’m in a bit of a daze now. I think it’s going to take me a few days to feel vaguely normal again. Back to work tomorrow. :)

Now for the thank you list…

  • Big thanks go out to all the Oracle user groups, companies and individuals that helped make this tour happen (AUSOUG WA, SHOUG, Insight Technology Inc.ACOUG, OUGTH,  NZOUG).
  • Thanks to Francisco for his work in bringing this about.
  • Thanks to all the attendees. Without your support, none of these events would happen.
  • Thanks to all the other speakers for putting up with me.
  • Many thanks to Bjoern for being my PA during the start of the tour. I hardly had to do anything. He planned the lot for me. :)
  • Last, but not least, thanks to the Oracle ACE Program for getting us ACEDs to these events. Your assistance is very welcome!

I hope I get to see you all again soon!

Cheers

Tim…


OTN APAC 2014 : Summary was first posted on November 23, 2014 at 10:46 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

OTN APAC 2014 : Auckland to Birmingham

Everything went smoothly at Auckland airport. Boarding an A380 takes quite a bit of time and I was near the front of the plane, so I was in the last group to board. Fortunately there was loads of room for hand luggage and the seat was pretty good. The people too my right were a couple, so they were leaning together, which gave me more elbow room. The extra room and a plug socket meant I was able to have the laptop open for most of the 4 hour flight.

At Melbourne we had to get off the plane, go through security, then wait to board the same plane in the same seat. My neighbours were also flying through to Dubai, so I had the same folks next to me for that leg of the flight. No matter how good the conditions are in economy class, and they were good on this Emirates flight, sitting in a plane for 14 hours is hell. Eventually I get cabin fever and have a desperate desire to get out. I have thoughts of faking a heart attack and forcing an early landing, just to get out. It’s really not cool. With a little bit of sleep, some re-watching of films and a lot of pacing up and down by the toilets I managed to get through it. The only bit of drama was when I was leaving the plane I realised I had left some things behind. I was allowed back on the plane to get them, so all was good!

After a totally uneventful 2+ hour layover in Dubai, it was back on to a 8 hour flight to Birmingham. By this time in the journey I’m usually kind-of frantic. Not so much thing time. I think my brain was mush by now. This flight didn’t feel so long, but I don;t remember too much about it, apart from ripping one of my finger nails and needing a plaster. I don;t think I slept much. I think I was just kind-of dazed.

I arrived in Birmingham in the early afternoon. After getting my luggage is was a relatively short taxi ride home. At one point we got in a traffic jam and I wanted to die, but then the traffic opened up and I got home. Once home I got a drink, put some washing on, had a bath, then went to bed.It was about 14:00.

Next thing I know, I was having a dream about something biting my finger. I shook my hand to get it off, and knocked my tablet off the bedside table. Freshly awake, I looked at the clock and thought, damn it. It’s 5 o’clock and I’ve slept for only 3 hours. I got up and drove to a store and it was closed. I checked the clock again and I has actually slept 15 hours. It was 5am, not 5pm. I guess I was pretty tired… :)

So now I’m back and I think I’m OK. I have that “aeroplane nose”, which I’m hoping is not a cold coming, and I also have a foggy head, which is I guess a jet-lag thing. I’ve got today off as it’s Sunday, but it’s back to work tomorrow…

My next post will be a wrap-up post…

Cheers

Tim…


OTN APAC 2014 : Auckland to Birmingham was first posted on November 23, 2014 at 9:44 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

VirtualBox 4.3.20

Just got home to find VirtualBox 4.3.20 has been released. I’ve just installed it on MaC (Yosemite), Fedora 20 and Windows 7 with no dramas so far.

The downloads and changelog are in the usual places.

Cheers

Tim…


VirtualBox 4.3.20 was first posted on November 22, 2014 at 4:11 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

New leafmisscount Option in Flex Cluster 12c

In the Oracle RAC previous version we have worked with the option “misscount” of crsctl, which applies for all the servers because there was only one kind of servers, but in the new version of Oracle (12c) we have a new kind of RAC configuration, the “Flex Cluster”. With Flex Cluster we have two kinds of servers “Hub nodes” and “Leaf Nodes”, the Hub Nodes have the same concept of the normal nodes in the previous versions, however the Leaf Nodes are different. I will let Oracle Documentation define these servers:

Hub Nodes are similar to Oracle Grid Infrastructure nodes in an Oracle Clusterware standard Cluster configuration: they are tightly connected, and have direct access to shared storage. In an Oracle Flex Cluster configuration, shared storage can be provisioned to Leaf Nodes independent of the Oracle Grid Infrastructure.

Leaf Nodes are different from standard Oracle Grid Infrastructure nodes, in that they do not require direct access to shared storage, but instead request data through Hub Nodes. Hub Nodes can run in an Oracle Flex Cluster configuration without having any Leaf Nodes as cluster member nodes, but Leaf Nodes must be members of a cluster that includes at least one Hub Node.

Reference: https://docs.oracle.com/database/121/CWADD/bigcluster.htm#CWADD92560

With new kind of servers, we also have a new option to configure the maximum time that a Leaf node can be tolerated with heartbeat issues. This crsctl option is “leafmisscount” and it’s exactly the same than the “misscount” for Hub Nodes, but this is for Leaf Nodes.

Let’s see how does it work:

 My Flex Cluster

Configuration

[root@dg1 ~]# crsctl check cluster -all

**************************************************************

dg1:

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

**************************************************************

dg2:

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

**************************************************************

dg3:

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

**************************************************************

[root@dg1 ~]#

 

[root@dg1 ~]# crsctl get node role status -all

Node ‘dg1′ active role is ‘hub’

Node ‘dg2′ active role is ‘hub’

Node ‘dg3′ active role is ‘leaf’

[root@dg1 ~]#

Checking the current value:

[grid@dg1 ~]$ crsctl get css leafmisscount

CRS-4678: Successful get leafmisscount 30 for Cluster Synchronization Services.

[grid@dg1 ~]$

How does it work with 30 secs:

[grid@dg2 ~]$ date

Fri Nov 14 16:03:54 EST 2014

–Stop connectivity between Leaf Node
and Hub Nodes

[root@dg3 ~]# date

Fri Nov 14 16:04:01 EST 2014

[root@dg3 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

[root@dg3 ~]# date

Fri Nov 14 16:04:29 EST 2014

[root@dg3 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager

[root@dg3 ~]#

Now let’s increase the value to 600 sec:

[grid@dg1 ~]$ crsctl set css leafmisscount 600

CRS-4684: Successful set of parameter leafmisscount to 600 for Cluster Synchronization Services.

[grid@dg1 ~]$

Let’s confirm the value:

[grid@dg2 ~]$ crsctl get css leafmisscount

CRS-4678: Successful get leafmisscount 600 for Cluster Synchronization Services.

[grid@dg2 ~]$

The maximum value we can specify is 600:

[grid@dg1 ~]$ crsctl set css leafmisscount 3600

CRS-1671: The value for parameter leafmisscount is outside the allowed range of 1 to 600

[grid@dg1 ~]$

Let’s see how does it work with 600:

[root@dg3 ~]# date

Fri Nov 14 16:19:54 EST 2014

–Stop connectivity between leaf
node and hub nodes

[root@dg3 ~]# date

Fri Nov 14 16:24:01 EST 2014

[root@dg3 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

[root@dg3 ~]# date

Fri Nov 14 16:25:18 EST 2014

[root@dg3 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

[root@dg3 ~]# date

Fri Nov 14 16:27:30 EST 2014

[root@dg3 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

[root@dg3 ~]# date

Fri Nov 14 16:28:08 EST 2014

[root@dg3 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

[root@dg3 ~]# date

Fri Nov 14 16:29:04 EST 2014

[root@dg3 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

[root@dg3 ~]# date

Fri Nov 14 16:29:50 EST 2014

[root@dg3 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

[root@dg3 ~]# date

Fri Nov 14 16:30:10 EST 2014

[root@dg3 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager

[root@dg3 ~]#

Challenge Me! Capturing SQL Server Data with PowerShell

LaerteHello PowerShell lovers! If you are reading this blog post, then you are likely interested in attending my session at SQLBits (well, so far it is an abstract. Hopefully I will be selected!)

Any data professional who is responsible for performance tuning, has their way of gathering data to troubleshoot and solve issues. But what if I told you there was a different way, a better way, to collect the data you need without consuming additional SQL Server resources?

Attend my session and learn how to leverage PowerShell to collect data from Perfmon, DMVs, and more. This session will feature demos where attendees will learn about some .Net classes to control timer events and the PowerShell Register-Objectevent cmdlet.

Attendees are encouraged to send their challenges to me by email at laertesqldba@outlook.com. I will review their scripts and select some of them to demonstrate in the session by offering an alternative using PowerShell and the event  subsystem.

I will split your challenges into three categories:

  1. Perfmon Counters
  2. DMVs, DMFs, etc.
  3. Other

Using one challenge from each category, here are examples of the kinds challenges I will demonstrate:

 

Example 1: Perfmon Counters

Hello Laerte,

My name is James Tiberius Kirk, I am Captain of the starship USS Enterprise and we are testing a new warp drive and our SQL Server´s servers are very, very busy with that. Our DBAs need a way to collect some performance counters, in an interval of time of 15 minutes and the data stored in a repository for further analysis. This process need to be done remotely and we cannot use any kind of SQL Server resources for that. We don’t want to use any kind of GUI for that as well. The performance counters are:SQLServer: Buffer Manager: Buffer cache hit ratio

SQLServer: Buffer Manager: Page life expectancy
SQLServer: SQL Statistics: Batch Requests/Sec
SQLServer: SQL Statistics: SQL Compilations/Sec
SQLServer: SQL Statistics: SQL Re-Compilations/SecCan you help us?  Thanks!


Example 2: DMVs, DMFs, etc.

Hello Laerte,

My name is Anakyn Skywalker, (a.k.a. Darth Vader, the most powerful Sith Master ever.) We are building a new Death Star and our SQL Server´s servers are extremely busy with inserts and updates. We need to capture and store the fragmentation data from one table called BlowUpThePlanet in a server called DeathStarProd3 Database VaderIsTheBest. As I told you before and I will not say again, it is a very busy server and because of that, the interval of gathering needs to be hourly. I am awaiting your help today. *FORCE CHOKING*


Example 3: Other

Hello Laerte,

My Name is Neo. Everything you know, or you think know, actually does not exist—but we can talk about that later.  We are trying to inject a new code in the Matrix (don’t worry about that right now either) and I need to capture some data. I will send to you the TSQL and all information that I need. Can you help me?  Thanks.

TSQL
Insert into XXXX Select XXXX
Interval – 10 seconds
Server Name – XXX
Database Name – YYY

 

Please send your challenges to laertesqldba@outlook.com. I will select three of them, one from each category and I will demonstrate how to do them using PowerShell without using any kind of SQL Server resources, after  I explain the Timer class and the register-objectevent cmdlet.

That is it! And remember: Always… if it is PowerCool, it is PowerShell.

KeepAlive socket in 12c listener

A not uncommon issue with firewalls and listeners are timeouts. Your production database may be behind a firewall, you may connect from a remote location, even your Windows workstation may have some firewall activated, possibly you use ssh tunnels or TCPS. All those occasionally lead to timeouts and connection abortion, for instance ORA-03113 end-of-file on communication channel, ORA-03135: connection lost contact, TNS-12547 Lost contact.

The good news is that Oracle 12c now implements Socket Options (see man setsockopt), as documented in Net admin new features and more nicely in Note 1591874.1
dcd visualized

I made until now a positive experience with this keepalive behavior, especially with SSL listener. The default value for SQLNET.EXPIRE_TIME is 0, so you must set it to a non-zero value first, the recommended value is 10 (minutes).

Parallel Costs

While creating a POC of a SQL rewrite recently I received a little surprise as I switched my query from serial execution to parallel execution and saw the optimizer’s estimated cost increase dramatically. I’ll explain why in a moment, but it made me think it might be worth setting up a very simple demonstration of the anomaly. I created a table t1 by copying view all_source – which happened to give me a table with about 100,000 rows and 1117 blocks – and then ran the query ‘select max(line) from t1;’ repeating the query with a /*+ parallel(t1 2) */ hint. From 11.2.0.4 here are the two execution plans I got:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |   153   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 99173 |   387K|   153   (4)| 00:00:01 |
---------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     4 |  1010   (1)| 00:00:05 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     4 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     4 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     4 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          | 99173 |   387K|  1010   (1)| 00:00:05 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T1       | 99173 |   387K|  1010   (1)| 00:00:05 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------

It seems that the optimizer thinks that running the query parallel 2 will take five times as long as running it serially ! Is it, perhaps, some cunning fix to the optimizer that is trying to allow for process startup time for the parallel execution slaves ? Is it a bug ? No – it’s just that I hadn’t got around to setting my system stats and they were complete garbage thanks to various other tests I had been running over the previous couple of weeks. Critically, I had not adjusted the “parallel-oriented” statistics to be consistent with the “serial-oriented” stats.

Here, from the 10053 trace file for the parallel run, is the section on the Single Table Access costing, together with the system stats and base statistical information:


SYSTEM STATISTICS INFORMATION
-----------------------------
  Using WORKLOAD Stats
  CPUSPEED: 1110 millions instructions/sec
  SREADTIM: 4.540000 milliseconds
  MREADTIM: 12.440000 millisecons
  MBRC:     21 blocks
  MAXTHR:   3000000 bytes/sec
  SLAVETHR: 1000000 bytes/sec

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 99173  #Blks:  1117  AvgRowLen:  76.00  ChainCnt:  0.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 99173.000000  Rounded: 99173  Computed: 99173.00  Non Adjusted: 99173.00
  Access Path: TableScan
    Cost:  151.13  Resp: 1010.06  Degree: 0
      Cost_io: 147.00  Cost_cpu: 20826330
      Resp_io: 1007.76  Resp_cpu: 11570183

I’m going to walk through the optimizer’s calculations that got the serial I/O cost (cost_io: 147.00) and the parallel I/O cost (Resp_io: 1007.76), but before I do that I’ll point out how inconsistent some of the system stat are. The multiblock read time (mreadtim) is 12.44 milliseconds, to read an average of 21 blocks (mbrc) which, converting to bytes per second means (21 * 8192) * 1000/12.44 = 13,828,938 bytes per second; but the I/O rate for a single parallel execution slave (slavethr) is only 1,000,000 bytes per second – which means a single serial session can (apparently) tablescan nearly 14 times as fast as an individual parallel execution slave. It’s not surprising that somehow the optimizer thinks a serial tablescan will be faster than parallel 2) – but let’s check exactly how the arithmetic falls out.

Serial:

  • #Blks: 1117, MBRC = 21 => read requests = 1117/21 = 53.19
  • sreadtim = 4.54 milliseconds, mreadtim = 12.44 milliseconds = 2.74 * sreadtim
  • Cost = 53.19 * 2.74 + 1 (_tablescan_cost_plus_one = true) = 146.74 — which looks close enough.

Parallel:

  • #Blks: 1117, block size = 8KB => total I/O requirement = 9,150,464 bytes
  • slavethr: 1,000,000 bytes/sec, degree 2 => total throughput 2,000,000 bytes/sec => elapsed I/O time = 4.575232 seconds
  • sreadtim = 4.54 milliseconds => cost = 4.575232 / 0.00454 = 1007.76 QED.

Two final thoughts:

First, if I increase the degree of parallelism to 3 the cost drops to 673 (671.84 plus a bit of CPU); if I increase the degree any further the cost doesn’t drop any further – that’s because the maxthr (maximum throughput) is 3,000,000. The optimizer uses the smaller of maxthr and (degree * slavethr) in the parallel arithmetic.

Secondly, and most importantly, the way the optimizer behaves will be affected by the various parameters relating to parallelism, so here’s a list of the relevant settings on the instance I was using when creating this example:


SQL> show parameter parallel

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
parallel_adaptive_multi_user         boolean     TRUE
parallel_automatic_tuning            boolean     FALSE
parallel_degree_limit                string      CPU
parallel_degree_policy               string      MANUAL
parallel_execution_message_size      integer     16384
parallel_force_local                 boolean     FALSE
parallel_instance_group              string
parallel_io_cap_enabled              boolean     FALSE
parallel_max_servers                 integer     80
parallel_min_percent                 integer     0
parallel_min_servers                 integer     0
parallel_min_time_threshold          string      AUTO
parallel_server                      boolean     FALSE
parallel_server_instances            integer     1
parallel_servers_target              integer     32
parallel_threads_per_cpu             integer     2
recovery_parallelism                 integer     0

Note, particularly, that I have not enabled parallel_automatic_tuning.

For further details on parallel costs and the cost/time equivalence that the optimizer uses in recent versions, here’s a link to an article by Randolf Geist.