Why the execution plan changed? Case study
Overview
At some
point in time the execution plan of the SQL_ID etp65bryqtd2d changed.
The new execution plan drops on performance from an Avg. of 0.01 seconds per
execution to an Avg. of 1.5 seconds per execution.
This
increase on elapsed time per execution provoke that the application can’t
process work orders fast enough, the work queue started to drastically grow
leading the application team to escalate a severity 1 issue.
Analysis
Execution
plan changes
The execution
plan changes only when a parse is performed, this because the
_optim_bind_peeking parameter is on default value “TRUE”. A hard parse is
required due to cursor invalidation in the shared pool; a cursor can be
invalidated for some reasons but the most happening is, object change (Index
rebuild, DBMS_STATS, etc.).
Rolling
Cursor Invalidation
Starting
with Oracle10g cursors are marked for rolling invalidation instead of
marked INVALID immediately. On the next execution of the query the Oracle
server will generate a random number between 0 and the value of the
_optimizer_invalidation_period parameter, which has a default value of 18000,
and the cursor will remain valid for this number of seconds. Upon every
following execution Oracle will check if this random selected timeout has
expired. If that is the case then the cursor will be hard parsed again.
Things will
change without notice! Cursor invalidation has changed in the past and the
reason for this change is that previously all dependent cursors were
invalidated immediately after gathering new object statistics. This massive
invalidation might cause a serious degradation in performance right after
statistics gathering due to a high number of hard parses. Rolling cursor
invalidation spreads the invalidation of cursors out over a longer period of
time thereby avoiding the performance degradation caused by hard parsing the
invalidated cursors.
Reference:
Metalink note:557661.1
Bind
variable peeking
The query
optimizer peeks at the values of user-defined bind variables on the first
invocation of a cursor. This feature lets the optimizer determine the
selectivity of any WHERE clause condition, as well as if literals
have been used instead of bind variables. On subsequent invocations of the
cursor, no peeking takes place, and the cursor is shared, based on the
standard cursor-sharing criteria, even if subsequent invocations use
different bind values.
In this
case we can clearly see that the elapsed time per execution is not changing
because of the bind peeking; when Oracle peeks a bad execution plan, all
elapsed times are bad.
CBO
trace event 10053
Behind the
scenes, what is the CBO doing when it comes to how it comes up with an
execution plan? This is where the 10053 trace event comes to play. Other tools
or settings show us WHAT the CBO comes up with; the 10053 setting tells us HOW
the CBO came to its decision (the final execution plan).
conn
/ as sysdba
oradebug
setmypid
oradebug
unlimit
oradebug
event 10053 trace name context forever, level 1
...your
statement here...
oradebug
event 10053 trace name context off
oradebug
tracefile_name
We have
performed this CBO trace when the query was performing bad and when it was
performing well. With this method we have 2 trace files to compare and find out
what is making the CBO to take such bad plan.
Notice that
this method will produce a trace file only in the hard parse stage. So, in
order to force the CBO to hard parse the SQL we have run it with a comment
hint.
variable N1 VARCHAR2(128)
exec :N1 :='125';
select /* comment*/...
After
comparing both trace files we can clearly see when the CBO chose the good plan:
***********************
Best so
far: Table#: 1 cost: 7.3522 card: 3973.0000 bytes: 218515
Table#: 7 cost: 16.2850
card: 9.0178 bytes: 612
Table#: 2 cost: 21.6884
card: 33.7888 bytes: 3094
Table#: 5 cost: 48.9215
card: 3565.0298 bytes: 406410
Table#: 4 cost: 1475.7155 card: 3565.4846 bytes: 556140
Table#: 0 cost: 1483.2886 card: 3559.8976 bytes: 644360
Table#: 6 cost: 2195.7103 card: 3599.4022 bytes: 770186
Table#: 3 cost: 2197.9000 card: 3599.4022 bytes: 820572
***********************
In the other hand, the trace of the bad
plan is having a huge cost:
***********************
Best so far:
Table#: 1 cost: 7.3510 card: 3928.0000 bytes: 216040
Table#: 7 cost: 16.2836
card: 8.9798 bytes: 612
Table#: 2 cost: 21.6870
card: 32.6678 bytes: 3003
Table#: 5 cost: 153.8561 card: 20168.2755 bytes: 2299152
Table#: 4 cost: 8225.5481 card: 20170.8485 bytes: 3146676
Table#: 0 cost: 8233.2391 card: 20139.2414 bytes: 3645159
Table#: 6 cost: 12263.4248 card: 20362.7290 bytes: 4357682
Table#: 3 cost: 12266.6610 card: 20362.7290 bytes: 4642764
***********************
In the trace files we can clearly see that cost of table#5
messed up our total cost. Why is that?
***************
Now joining: TABLE5[TL2]#5
***************
NL Join
Outer table: Card: 32.67 Cost: 21.69
Resp: 21.69 Degree: 1 Bytes: 91
Inner table: TABLE5 Alias: TL2
Access Path:
TableScan
Index: TABLE5_IDX3
resc_io:
20.00 resc_cpu: 360839
ix_sel:
3.9216e-04 ix_sel_with_filters:
3.9216e-04
NL Join: Cost: 153.86 Resp: 153.86
Degree: 1
Cost_io:
152.55 Cost_cpu: 18392646
Resp_io:
152.55 Resp_cpu: 18392646
****** finished trying bitmap/domain indexes ******
Best NL cost: 153.86
resc: 153.86 resc_io: 152.55
resc_cpu: 18392646
resp: 153.86 resp_io: 152.55
resp_cpu: 18392646
Join
Card: 20168.28 = outer (32.67) * inner
(1574304.00) * sel (3.9216e-04)
Join Card
- Rounded: 20168 Computed: 20168.28
Index selectivity on TABLE5_IDX3 is very low (3.9216e-04) compared with the good
plan (6.7020e-05).
Index
selectivity
After looking into the Index selectivity, the question arises; why the
Index selectivity is low?
B*TREE Indexes improve the performance of queries that select a small
percentage of rows from a table. As a general guideline, we should create
indexes on tables that are often queried for less than 15% of the table's
rows. This value may be higher in situations where all data can be
retrieved from an index, or where the indexed columns can be used for joining
to other tables.
The ratio of the number of distinct values in the indexed
column / columns to the number of records in the table represents the
selectivity of an index. The ideal selectivity is 1. Such selectivity
can be reached only by unique indexes on NOT NULL columns.
Selectivity = Distinct Values / Total Number Rows
With this measure we can correlate the importance of the statistics in
the objects. If we gather statistics only on the table, the total number of
rows will change; leading to drastically change the Index selectivity. In the
same manner, if we only gather statistics on the Index, the number of distinct
values will change and the selectivity will dramatically change.
Apart of that, the accuracy of the statistics will also play an important
role. The accurate the statistics, the accurate the Index selectivity.
Suggestions
– How to fix this issue?
The
suggestions came from the percept that Index selectivity is playing a bad game
on this particular SQL_ID.
• As table and its indexes have a close
relationship on how the CBO calculates the cost, we must gather statistics on
the table and its indexes in the same time. This will avoid the Index
selectivity to drastically change.
• When an index is rebuilt, we must
gather statistics of the table too.
• Review the accuracy of the
statistics. In some cases AUTO_SAMPLE_SIZE is not giving a good sample.
Some test
on this SQL_ID have proven that gathering statistics to the TABLE5 table and its
indexes with an estimate percent of 50%, plus invalidating the cursor immediately
solved the issue.
The CBO was
able to pick the good execution plan because the index selectivity was good
enough.
Thanks,
Alfred
Labels: 10053, bind variable, CBO, cursor, DBMS_STATS, execution, Index selectivity, oradebug, Performance, plan, SQL ID, trace, Tuning