The /*+Go-Faster*/ Oracle Blog

Oracle 19c Real-Time and High-Frequency Automatic Statistics Collection

I gave this presentation at the UKOUG Techfest 19 conference.  This video was produced as a part of the preparation for that session.  The slide deck is also available on my website.

It takes a look at the pros and cons of these new 19c features, which are only available on Engineered Systems.  Both features aim to address the challenge of querying data that has been significantly updated before the statistics maintenance window has had a chance to run again.
  • Real-Time Statistics uses table monitoring to augment existing statistics with simple corrections
  • High-Frequency Automatic Optimizer Statistics is an extra statistics maintenance window that runs regularly to update the stalest statistics.
As your statistics change, so SQL execution plans, and therefore application performance, may also change. DBAs and developers need to be aware of the implications.
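Real-Time Statistics leave a footprint in the data dictionary: rows are flagged in the NOTES column of the statistics views. Here is a minimal check, a sketch assuming a 19c database where the feature is active:
SELECT table_name, num_rows, notes, last_analyzed
FROM user_tab_statistics
WHERE notes = 'STATS_ON_CONVENTIONAL_DML';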

Extended Column Group Statistics, Composite Index Statistics, Histograms and an EDB360 Enhancement to Detect the Coincidence

In this post:
  • A simple demonstration to show the behaviour of extended statistics and how they can be disabled by the presence of histograms.  None of this is new; there are many other blog posts on this topic, and I provide links to some of them.
  • I have added an enhancement to the EDB360 utility to detect histograms on columns in extended statistics.

Introduction

'Extended statistics were introduced in Oracle 11g to allow statistics to be gathered on groups of columns, to highlight the relationship between them, or on expressions. Oracle 11gR2 makes the process of gathering extended statistics for column groups easier'. [Tim Hall: https://oracle-base.com/articles/11g/extended-statistics-enhancements-11gr2]

Example 1: Cardinality from Extended Statistics

Without extended statistics, Oracle will simply multiply the individual column selectivities together.  Here is a simple example.  I will create a table with 10000 rows, where two columns always hold the same value, one of 100 distinct values, so they correlate perfectly.  I will gather statistics, but no histograms.
create table t
(k number
,a number
,b number
,x varchar2(1000)
);

insert into t
with n as (select rownum n from dual connect by level <= 100)
select rownum, a.n, a.n, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
from n a, n b
order by a.n, b.n;

exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE 1');
I will deliberately disable optimizer feedback so that Oracle cannot learn from experience about the cardinality misestimates.
alter session set statistics_level=ALL;
alter session set "_optimizer_use_feedback"=FALSE;

select count(*) from t where a = 42 and b=42;

COUNT(*)
----------
100
Oracle estimates that it will get 1 row but actually gets 100.
It estimates 1 row because 1/100 * 1/100 * 10000 rows = 1.
select * from table(dbms_xplan.display_cursor(null,null,format=>'ADVANCED +ALLSTATS LAST, IOSTATS -PROJECTION -OUTLINE'));

Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 26 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 1 | 26 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------
Now I will create extended statistics on the column group.  I can do that in one of two ways:
  • either by explicitly creating the extension definition and then gathering statistics on it:
SELECT dbms_stats.create_extended_stats(ownname=>user, tabname=>'t', extension=>'(a,b)')
FROM dual;
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE 1');
  • Or, I can create extended statistics directly, in one go, by specifying them in the METHOD_OPT parameter.
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE 1, FOR COLUMNS SIZE 1 (A,B)');
Now Oracle correctly estimates that the same query will fetch 100 rows because it directly knows the cardinality for the two columns in the query.
Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------

Example 2: Cardinality from Index Statistics

I can get exactly the same effect by creating an index on the two columns.
drop table t purge;

create table t
(k number
,a number
,b number
,x varchar2(1000)
);

insert into t
with n as (select rownum n from dual connect by level <= 100)
select rownum, a.n, a.n, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
from n a, n b
order by a.n, b.n;

create index t_ab on t(a,b) compress;
This time I have not collected any statistics on the table.  Statistics are automatically collected on the index when it is built.  I have used a hint to stop the query from using the index to look up the rows; nonetheless, Oracle has correctly estimated that it will get 100 rows because it has used the number of distinct keys from the index statistics.
SQL_ID  711banpfgfa18, child number 0
-------------------------------------
select /*+FULL(t)*/ count(*) from t where a = 42 and b=42

Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 74 |
| 1 | SORT AGGREGATE | | 1 | 1 | 26 | | | 1 |00:00:00.01 | 74 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 2600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 74 |
---------------------------------------------------------------------------------------------------------------------
Note that there is nothing in the execution plan to indicate that the index statistics were used to estimate the number of rows returned!
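The source of the estimate is, however, visible in the index statistics: the optimizer can divide NUM_ROWS by DISTINCT_KEYS (10000/100 = 100). A quick check:
SELECT index_name, num_rows, distinct_keys
FROM user_indexes
WHERE table_name = 'T';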

Example 3: Histograms Disable the Use of Extended Statistics

There have long been blogs that refer to behaviour that Oracle has documented as Bug 6972291: Column group selectivity is not used when there is a histogram on one column:
'As of 10.2.0.4 CBO can use the selectivity of column groups but this option is disabled if there is a histogram defined on any of the columns of the column group.
Note:  This fix is disabled by default. To enable the fix set "_fix_control"="6972291:ON"
When ENABLED the code will use multi-column stats regardless of whether there is a histogram on one of the columns or not.  When DISABLED (default) CBO will not use multi-column stats if there is a histogram on one of the columns in the column group.'
  • Christian Antognini, 2014: https://antognini.ch/2014/02/extension-bypassed-because-of-missing-histogram/
  • Jonathan Lewis, 2012: https://jonathanlewis.wordpress.com/2012/04/11/extended-stats/
    • Maria Colgan also commented: 'This … was a deliberate design decision to prevent over-estimations when one of the values supplied is ‘out of range’. We can’t ignore the ‘out of range’ scenario just because we have a column group. Extended statistics do not contain the min, max values for the column group so we rely on the individual column statistics to check for ‘out of range’ scenarios like yours. When one of the columns is ‘out of range’, we revert back to the column statistics because we know it is going to generate a lower selectivity range and if one of the columns is ‘out of range’ then the number of rows returned will be lower or none at all, as in your example'
In this example, I explicitly create a histogram on one of the columns in my extended statistics.  However, in the real world that can happen automatically if the application references one column and not another.
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 100 A, FOR COLUMNS SIZE 1 B (A,B)');
My cardinality estimate goes back to 1 because Oracle does not use the extended statistics in the presence of a histogram on any of the constituent columns.  Exactly the same happens if the number of distinct values on the combination of columns comes from composite index statistics.  A histogram similarly disables their use.
SQL_ID  8trj2kacqhm6f, child number 1
-------------------------------------
select count(*) from t where a = 42 and b=42

Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 1 | 6 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------
This is likely to happen in real-life systems because histograms can be automatically created when statistics are gathered.
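A quick way to check whether statistics gathering has quietly introduced histograms on the members of a column group is to query the column statistics, for example:
SELECT column_name, num_distinct, num_buckets, histogram
FROM user_tab_col_statistics
WHERE table_name = 'T'
ORDER BY column_name;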

Out-of-Range Predicates

If you have one or more predicates on columns that are part of an extended statistics column group, and a predicate goes out of range when compared to the column statistics, then Oracle still doesn't use the extended statistics (see also https://jonathanlewis.wordpress.com/2012/04/11/extended-stats/), irrespective of whether there is a histogram, and irrespective of whether fix control 6972291 is set.
The extended histogram uses a virtual column whose value is derived from SYS_OP_COMBINED_HASH().  You can see this in the default data value for the column.  Therefore the optimizer cannot use the minimum/maximum value (see also https://jonathanlewis.wordpress.com/2018/08/02/extended-histograms-2/).
Instead, Oracle applies linear decay to the density of the column predicates; if there is a frequency or top-frequency histogram, it takes half the density of the lowest-frequency bucket and applies linear decay to that.
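You can see the SYS_OP_COMBINED_HASH() expression in the DATA_DEFAULT of the hidden virtual column that implements the extension, for example:
SELECT column_name, data_default
FROM user_tab_cols
WHERE table_name = 'T'
AND hidden_column = 'YES';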

Example 4: Extended Histograms

This time I will create a histogram on my extended statistics as well as histograms on the underlying columns.
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 100 A B (A,B)');
I am back to getting the correct cardinality estimate.
SQL_ID  8trj2kacqhm6f, child number 0
-------------------------------------
select count(*) from t where a = 42 and b=42

Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------
This, too, is something that has been blogged about previously.

Threats

In this blog, Jonathan Lewis comments (https://jonathanlewis.wordpress.com/2012/04/11/extended-stats/) on certain weaknesses in the implementations.  He also references other bloggers.
Either creating or removing histograms on columns in either extended statistics or composite indexes may result in execution plans changing, because whether the optimizer can use those column group statistics may change.  This could happen automatically when gathering statistics, as data skew and predicate usage change.
If I drop a composite index, maybe because it is not used, or because it is redundant as a subset of another index, then I should replace it with extended statistics on the same set of columns.
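A sketch of that replacement, using the T_AB index from Example 2:
DROP INDEX t_ab;
SELECT dbms_stats.create_extended_stats(ownname=>user, tabname=>'T', extension=>'(a,b)')
FROM dual;
exec dbms_stats.gather_table_stats(user,'T');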

Detecting the Problem

I have added a report to section 3c of EDB360 to detect the problem.  The SQL query is shown below.  It will report on histograms on columns in either:
  • composite indexes where there are no extended column group statistics, or
  • extended column group statistics for which there are no histograms.

3c.25. Columns with Histograms in Extended Statistics (DBA_STAT_EXTENSIONS)

 # Table Table       Object    Index/Extension Name           Num      Num     Extension                       Column       Col Num  Col Num Col
   Owner Name        Type                                     Distinct Buckets                                 Name         Distinct Buckets Histogram
-- ----- ----------- --------- ------------------------------ -------- ------- ------------------------------- ------------ -------- ------- ---------
 1 HR    JOB_HISTORY Index     JHIST_EMP_ID_ST_DATE_PK              10         ("EMPLOYEE_ID","START_DATE")    EMPLOYEE_ID         7       7 FREQUENCY
 2 OE    INVENTORIES Index     INVENTORY_IX                       1112         ("WAREHOUSE_ID","PRODUCT_ID")   PRODUCT_ID        208     208 FREQUENCY
 3 OE    ORDER_ITEMS Index     ORDER_ITEMS_PK                      665         ("ORDER_ID","LINE_ITEM_ID")     ORDER_ID          105     105 FREQUENCY
 4 OE    ORDER_ITEMS Index     ORDER_ITEMS_UK                      665         ("ORDER_ID","PRODUCT_ID")       ORDER_ID          105     105 FREQUENCY
 5 OE    ORDER_ITEMS Index     ORDER_ITEMS_UK                      665         ("ORDER_ID","PRODUCT_ID")       PRODUCT_ID        185     185 FREQUENCY
 6 SCOTT T           Extension SYS_STUNA$6DVXJXTP05EH56DTIR0X      100       1 ("A","B")                       A                 100     100 FREQUENCY
 7 SCOTT T           Extension SYS_STUNA$6DVXJXTP05EH56DTIR0X      100       1 ("A","B")                       B                 100     100 FREQUENCY
 8 SOE   ORDERS      Index     ORD_WAREHOUSE_IX                  10270         ("WAREHOUSE_ID","ORDER_STATUS") ORDER_STATUS       10      10 FREQUENCY
 9 SOE   ORDER_ITEMS Index     ORDER_ITEMS_PK                 13758515         ("ORDER_ID","LINE_ITEM_ID")     LINE_ITEM_ID        7       7 FREQUENCY

Just because something is reported by this test does not necessarily mean that you need to change anything.
  • Provided fix control 6972291 is not enabled, should you wish to drop or alter any reported index, you at least know that it cannot be used to provide column group statistics, though you would still need to consider SQL that might use the index directly.
  • You might choose to add column group histograms, and sometimes that will involve adding the column group statistics first.  However, the number of distinct values on the column group will usually be higher than on the individual columns and can easily be greater than the number of buckets you can have in a frequency histogram.  In such cases, from 12c, you may end up with either a top-frequency or a hybrid histogram.
  • Or you might choose to remove the histograms from the individual columns so that the column group statistics are used.
  • Or you might choose to enforce the status quo by setting table statistics preferences that preserve currently existing histograms and do not introduce currently non-existent ones.
Whatever you choose to do regarding statistics and histogram collection, I would certainly recommend doing so declaratively, by defining a table statistics preference.  For example, here I will preserve the histograms on the columns in the column group, but I will also build a histogram on the column group:
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 A B (A,B)');
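The same METHOD_OPT can be stored as a table statistics preference, so that every subsequent gather behaves the same way; a sketch:
exec dbms_stats.set_table_prefs(user,'T','METHOD_OPT','FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 A B (A,B)');
exec dbms_stats.gather_table_stats(user,'T');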
  • Or, you might even enable the fix_control. You can also do that at session level or even statement level (but beware of disabling any other fix controls that may be set). 
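At session level, that is the underscore parameter quoted in the bug note above:
ALTER SESSION SET "_fix_control" = '6972291:ON';
The statement-level equivalent uses the OPT_PARAM hint: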
SQL_ID  16judk2v0uf7w, child number 0
-------------------------------------
select /*+FULL(t) OPT_PARAM('_fix_control','6972291:on')*/ count(*)
from t where a = 42 and b=42

Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------

Outline Data
-------------

/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
OPT_PARAM('_fix_control' '6972291:1')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1""T"@"SEL$1")
END_OUTLINE_DATA
*/

EDB360 Test Query

This is the SQL query that produces the report in EDB360.
WITH i as ( /*composite indexes*/
SELECT i.table_owner, i.table_name, i.owner index_owner, i.index_name, i.distinct_keys
, '('||(LISTAGG('"'||c.column_name||'"',',') WITHIN GROUP (order by c.column_position))||')' column_list
FROM dba_indexes i
, dba_ind_columns c
WHERE i.table_owner = c.table_owner
AND i.table_name = c.table_name
AND i.owner = c.index_owner
AND i.index_name = c.index_name
AND i.table_name NOT LIKE 'BIN$%'
AND i.table_owner NOT IN ('ANONYMOUS','APEX_030200','APEX_040000','APEX_040200','APEX_SSO','APPQOSSYS','CTXSYS','DBSNMP','DIP','EXFSYS','FLOWS_FILES','MDSYS','OLAPSYS','ORACLE_OCM','ORDDATA','ORDPLUGINS','ORDSYS','OUTLN','OWBSYS')
AND i.table_owner NOT IN ('SI_INFORMTN_SCHEMA','SQLTXADMIN','SQLTXPLAIN','SYS','SYSMAN','SYSTEM','TRCANLZR','WMSYS','XDB','XS$NULL','PERFSTAT','STDBYPERF','MGDSYS','OJVMSYS')
GROUP BY i.table_owner, i.table_name, i.owner, i.index_name, i.distinct_keys
HAVING COUNT(*) > 1 /*index with more than one column*/
), e as ( /*extended stats*/
SELECT e.owner, e.table_name, e.extension_name
, CAST(e.extension AS VARCHAR(1000)) extension
, se.histogram, se.num_buckets, se.num_distinct
FROM dba_stat_extensions e
, dba_tab_col_statistics se
WHERE e.creator = 'USER'
AND se.owner = e.owner
AND se.table_name = e.table_name
AND se.column_name = e.extension_name
AND e.table_name NOT LIKE 'BIN$%'
AND e.owner NOT IN ('ANONYMOUS','APEX_030200','APEX_040000','APEX_040200','APEX_SSO','APPQOSSYS','CTXSYS','DBSNMP','DIP','EXFSYS','FLOWS_FILES','MDSYS','OLAPSYS','ORACLE_OCM','ORDDATA','ORDPLUGINS','ORDSYS','OUTLN','OWBSYS')
AND e.owner NOT IN ('SI_INFORMTN_SCHEMA','SQLTXADMIN','SQLTXPLAIN','SYS','SYSMAN','SYSTEM','TRCANLZR','WMSYS','XDB','XS$NULL','PERFSTAT','STDBYPERF','MGDSYS','OJVMSYS')
)
SELECT e.owner, e.table_name
, 'Extension' object_type
, e.extension_name object_name, e.num_distinct, e.num_buckets, e.extension
, sc.column_name
, sc.num_distinct col_num_distinct
, sc.num_buckets col_num_buckets
, sc.histogram col_histogram
FROM e
, dba_tab_col_statistics sc
WHERE e.histogram = 'NONE'
AND e.extension LIKE '%"'||sc.column_name||'"%'
AND sc.owner = e.owner
AND sc.table_name = e.table_name
AND sc.histogram != 'NONE'
AND sc.num_buckets > 1 /*histogram on column*/
AND e.num_buckets = 1 /*no histogram on extended stats*/
UNION ALL
SELECT /*+ NO_MERGE */ /* 3c.25 */
i.table_owner, i.table_name
, 'Index' object_type
, i.index_name object_name, i.distinct_keys, TO_NUMBER(null), i.column_list
, sc.column_name
, sc.num_distinct col_num_distinct
, sc.num_buckets col_num_buckets
, sc.histogram col_histogram
FROM i
, dba_ind_columns ic
, dba_tab_col_statistics sc
WHERE ic.table_owner = i.table_owner
AND ic.table_name = i.table_name
AND ic.index_owner = i.index_owner
AND ic.index_name = i.index_name
AND sc.owner = i.table_owner
AND sc.table_name = ic.table_name
AND sc.column_name = ic.column_name
AND sc.histogram != 'NONE'
AND sc.num_buckets > 1 /*histogram on column*/
AND NOT EXISTS( /*report index if no extension*/
SELECT 'x'
FROM e
WHERE e.owner = i.table_owner
AND e.table_name = i.table_name
AND e.extension = i.column_list)
ORDER BY 1,2,3,4;

Sparse Indexing

$
0
0
This is the first of two blog posts that discuss sparse and partial indexing.

Problem Statement

It is not an uncommon requirement to find rows that match a rare value in a column with a small number of distinct values; that is, the distribution of values is skewed.  A typical example is a status column, where an application processes newer rows that are a relatively small proportion of the table because, over time, the majority of rows have been processed and are at the final status.
An index is effective at finding the rare values, but it is usually more efficient to scan the table for the common values.  A histogram would almost certainly be required on such a column.  However, if you build an index on the column, you have to index all the rows.  The index is, therefore, larger and requires more maintenance.  Could we not index just the rare values that we want the index to find?
  • Oracle does not index null values. If we could engineer that the common value was null, then the index would only contain the rare values.  This is sometimes called sparse indexing and is discussed in this blog.
  • Or we could separate the rare and common values into different index partitions, and build only the index partition(s) for the rare values.  This is called partial indexing and is discussed in the next blog.
As usual, this is not a new subject and other people have written extensively on these subjects, and I will provide links.  However, I want to draw some of the issues together.

Sparse Indexing

The ideas discussed in this section are based on the principle that Oracle indexes do not include rows where all of the key values are null.

Store Null Values in the Database?

One option is to engineer the application to use null as the common status value.  However, this means that the column in question has to be nullable, and you may require different logic because the comparison to null is always false.
CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1)
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
);

INSERT /*+APPEND*/ INTO t
SELECT rownum
, CASE WHEN rownum<=1e6-42 THEN NULL /*common status*/
       WHEN rownum<=1e6-10 THEN 'A'
       ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp') /*other column*/
FROM dual
CONNECT BY level <= 1e6;

CREATE INDEX t_status ON t (status);
exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');
I have created a test table with 1000000 rows. 10 rows have status R, and 32 rows have status A. The rest have status NULL. I have indexed the status column, and also created a histogram on it when I collected statistics.
SELECT status, COUNT(*)
FROM t
GROUP BY status
/

S COUNT(*)
- ----------
999958
R 10
A 32
I can see from the statistics that I have 1000000 rows in the primary key index, but only 42 rows in the status index because it only contains the not null values. Therefore, it is much smaller, having only a single leaf block, whereas the primary key index has 1875 leaf blocks.
SELECT index_name, num_rows, leaf_blocks FROM user_indexes WHERE table_name = 'T';

INDEX_NAME NUM_ROWS LEAF_BLOCKS
---------- ---------- -----------
T_PK 1000000 1875
T_STATUS 42 1
There are some problems with this approach.

Not All Index Columns are Null 
If any of the index columns are not null, then there is an entry in the index for the row, and there is no saving of space. It is not uncommon to add additional columns to such an index, either for additional filtering, or to avoid accessing the table by satisfying the query from the index.
CREATE INDEX t_status2 ON t (status,other);
SELECT index_name, num_rows, leaf_blocks FROM user_indexes WHERE table_name = 'T' ORDER BY 1;

INDEX_NAME NUM_ROWS LEAF_BLOCKS
---------- ---------- -----------
T_PK 1000000 1875
T_STATUS 42 1
T_STATUS2 1000000 9081

Null Logic
If, for example, I want to find the rows that do not have status A, then a simple inequality does not find the null statuses because comparison to null is always false.
SELECT status, COUNT(*)
FROM t
WHERE status != 'A'
GROUP BY status
/

S COUNT(*)
- ----------
R 10
Instead, I would have to explicitly code for the null values.
SELECT status, COUNT(*)
FROM t
WHERE status != 'A' OR status IS NULL
GROUP BY status
/

S COUNT(*)
- ----------
999958
R 10
This additional complexity is certainly one reason why developers shy away from this approach in custom applications. It is almost impossible to retrofit it into an existing or packaged application. 

Function-Based Indexes 

It is possible to build an index on a function, such that the function evaluates to null for the common values. This time my test table still has 1,000,000 rows, but the status column is now not nullable.
CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1) NOT NULL
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
)
/
INSERT /*+APPEND*/ INTO t
SELECT rownum
, CASE WHEN rownum<=1e6-42 THEN 'C'
       WHEN rownum<=1e6-10 THEN 'A'
       ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;
exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');
10 rows have status R, and 32 rows have status A. The rest have status C.
SELECT status, COUNT(*)
FROM t
GROUP BY status
/
S COUNT(*)
- ----------
R 10
C 999958
A 32
I will build a simple index on status, and a second index on a function of status that decodes the common status C back to NULL:
CREATE INDEX t_status ON t (status);
CREATE INDEX t_status_fn ON t (DECODE(status,'C',NULL,status));
As before, with the null column, the function-based index has only a single leaf block; the other indexes are much larger because they contain all 1 million rows.
SELECT index_name, index_type, num_rows, leaf_blocks 
from user_indexes WHERE table_name = 'T' ORDER BY 1;

INDEX_NAME INDEX_TYPE NUM_ROWS LEAF_BLOCKS
------------ --------------------------- ---------- -----------
T_PK NORMAL 1000000 1875
T_STATUS NORMAL 1000000 1812
T_STATUS_FN FUNCTION-BASED NORMAL 42 1
If I query the table for the common status, Oracle quite reasonably full scans the table.
SELECT COUNT(other) FROM t WHERE status='C';

COUNT(OTHER)
------------
999958

Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2446 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
|* 2 | TABLE ACCESS FULL| T | 999K| 52M| 2446 (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS"='C')
If I query for the rare status value, it will use the normal index to look that up.
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 1997248105
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 550 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS | 10 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("STATUS"='R')
Now I will make that index invisible, and the optimizer can only choose to full scan the table. It cannot use the function-based index because the query does not match the function.
ALTER INDEX t_status INVISIBLE;
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2445 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
|* 2 | TABLE ACCESS FULL| T | 10 | 550 | 2445 (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS"='R')
Instead, I must change the query to reference the function in the function-based index, and then the optimizer chooses the function-based index, even if I make the normal index visible again. Note that the function is shown in the access operation in the predicate section.
ALTER INDEX t_status VISIBLE;
SELECT COUNT(other) FROM t WHERE DECODE(status,'C',null,status)='R';

COUNT(OTHER)
------------
10

Plan hash value: 2511618215
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 21 | 1155 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS_FN | 21 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(DECODE("STATUS",'C',NULL,"STATUS")='R')

Invisible Virtual Columns 

Function-based indexes are implemented using a hidden virtual column. You can even reference that virtual column in a query. However, the name of the column is system-generated, so you may not want to include it in your application.
SELECT * FROM user_stat_extensions WHERE table_name = 'T';

TABLE_NAME EXTENSION_NAME EXTENSION CREATO DRO
---------- --------------- ---------------------------------------- ------ ---
T SYS_NC00004$ (DECODE("STATUS",'C',NULL,"STATUS")) SYSTEM NO

SELECT SYS_NC00004$, COUNT(*) FROM t group by SYS_NC00004$;

S COUNT(*)
- ----------
999958
R 10
A 32
Instead, you could create a virtual column and then index it. The resulting index is still function-based because it references the function inside the virtual column. From Oracle 12c, it is also possible to make a column invisible. I would recommend doing so in case you have any insert statements without explicit column lists; otherwise, you might get ORA-00947: not enough values.
ALTER TABLE t ADD virtual_status VARCHAR2(1) INVISIBLE
GENERATED ALWAYS AS (DECODE(status,'C',null,status));
CREATE INDEX t_status_virtual ON t (virtual_status);

SELECT index_name, index_type, num_rows, leaf_blocks FROM user_indexes WHERE table_name = 'T' ORDER BY 1;

INDEX_NAME INDEX_TYPE NUM_ROWS LEAF_BLOCKS
---------------- --------------------------- ---------- -----------
T_PK NORMAL 1000000 1875
T_STATUS NORMAL 1000000 1812
T_STATUS_VIRTUAL FUNCTION-BASED NORMAL 42 1
The only difference between this and the previous function-based index example is that now you can control the name of the virtual column, and you can easily reference it in the application.
If you have only ever referenced the virtual column in the application, and never the function, then it is also easy to change the function, although you would have to rebuild the index.
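A sketch of such a change, not run as part of this demonstration, assuming a hypothetical second common status 'X' that should also be excluded; one simple way is to drop and recreate both the index and the virtual column:
DROP INDEX t_status_virtual;
ALTER TABLE t DROP COLUMN virtual_status;
ALTER TABLE t ADD virtual_status VARCHAR2(1) INVISIBLE
GENERATED ALWAYS AS (DECODE(status,'C',NULL,'X',NULL,status));
CREATE INDEX t_status_virtual ON t (virtual_status);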
SELECT COUNT(other) FROM t WHERE virtual_status='R';

COUNT(OTHER)
------------
10

Plan hash value: 3855131553
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 21 | 1155 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS_VIRTUAL | 21 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("VIRTUAL_STATUS"='R')
If you have already created function-based indexes and referenced the function in the application you can replace them with an index on a named virtual column and the index will still be used.
SELECT COUNT(other) FROM t WHERE DECODE(status,'C',null,status)='R';

COUNT(OTHER)
------------
10

Plan hash value: 3855131553
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 21 | 1155 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS_VIRTUAL | 21 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T"."VIRTUAL_STATUS"='R')

Conclusion 

A function-based index, preferably on an explicitly named and created virtual column, will permit you to build an index on just the rare values in a column. Making the virtual column invisible will prevent errors during insert statements without explicit column lists. However, you will still need to alter the application SQL to reference either the virtual column or the function that generates it.

Partial Indexing

$
0
0
This is the second of two blog posts that discuss sparse and partial indexing.

Problem Statement

(This is the same problem statement as for sparse indexing.)
It is not an uncommon requirement to find rows that match a rare value in a column with a small number of distinct values; that is, the distribution of values is skewed.  A typical example is a status column, where an application processes newer rows that are a relatively small proportion of the table because, over time, the majority of rows have been processed and are at the final status.
An index is effective at finding the rare values, but it is usually more efficient to scan the table for the common values.  A histogram would almost certainly be required on such a column.  However, if you build an index on the column, you have to index all the rows.  The index is, therefore, larger and requires more maintenance.  Could we not index just the rare values that we want the index to find?
  • Oracle does not index null values. If we could engineer that the common value was null, then the index would only contain the rare values.  This is sometimes called sparse indexing and was discussed in the previous blog.
  • Or we could separate the rare and common values into different index partitions, and build only the index partition(s) for the rare values.  This is called partial indexing and is discussed in this blog.
As usual, this is not a new subject and other people have written extensively on these subjects, and I will provide links.  However, I want to draw some of the issues together.

Partition Table and Locally Partitioned Partial Index 

I could partition the table on the status column. Here, I have used list partitioning because the common status sorts between the two rare statuses, so I only need two partitions, not three. From Oracle 12.1, I can specify INDEXING ON and OFF on the table and on certain partitions, so that later I can build partial local indexes only on some partitions.
CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1) NOT NULL
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
) INDEXING OFF
PARTITION BY LIST (status)
(PARTITION t_status_rare VALUES ('R','A') INDEXING ON
,PARTITION t_status_common VALUES (DEFAULT)
) ENABLE ROW MOVEMENT
/
INSERT /*+APPEND*/ INTO t --(key, status)
SELECT rownum
, CASE WHEN rownum<=1e6-1000 THEN 'C'
       WHEN rownum<=1e6-10 THEN 'A'
       ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;
exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');
Here Oracle eliminated the common status partition and only scanned the rare status partition (partition 1). Note that I don't even have an index at this point.  So simply partitioning the table can be effective.
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 4 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 58 | | | | |
| 2 | PARTITION LIST SINGLE| | 10 | 580 | 4 (0)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | T | 10 | 580 | 4 (0)| 00:00:01 | 1 | 1 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("STATUS"='R')

However, now when the application updates the status from R (rare) to C (common), the row must be moved between partitions. It is necessary to enable row movement on the table; otherwise, an error will be generated. There is additional overhead in moving the row: it is effectively deleted from one partition and inserted into the other.
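For example, this update moves a row from the rare partition to the common one; without ENABLE ROW MOVEMENT it would fail with ORA-14402 (a sketch, rolled back so the demonstration data is unchanged):
UPDATE t SET status = 'C' WHERE key = 1e6; --this row currently has status 'R'
ROLLBACK;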
In this test, I have increased the frequency of one of the rare statuses. Otherwise, the optimizer determines that it is cheaper just to scan the table partition than use the index!
SELECT status, COUNT(*)
FROM t
GROUP BY status
/
S COUNT(*)
- ----------
R 10
A 990
C 999000
Note that I have already specified INDEXING OFF on the table and INDEXING ON on the rare statuses partition. Now I can just build a locally partitioned partial index.
CREATE INDEX t_status ON t(status) LOCAL INDEXING PARTIAL;
Note that only partition T_STATUS_RARE is physically built, and it only contains a single extent. Partition T_STATUS_COMMON exists, is unusable and the segment has not been physically built. It contains no rows and no leaf blocks.
SELECT partition_name, status, num_rows, leaf_blocks
from user_ind_partitions where index_name = 'T_STATUS';

PARTITION_NAME STATUS NUM_ROWS LEAF_BLOCKS
-------------------- -------- ---------- -----------
T_STATUS_COMMON UNUSABLE 0 0
T_STATUS_RARE USABLE 1000 2

SELECT segment_name, partition_name, blocks
FROM user_segments WHERE segment_name = 'T_STATUS';

SEGMENT_NAME PARTITION_NAME BLOCKS
------------ -------------------- ----------
T_STATUS T_STATUS_RARE 8

SELECT segment_name, partition_name, segment_type, extent_id, blocks
FROM user_extents WHERE segment_name = 'T_STATUS';

SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE EXTENT_ID BLOCKS
------------ -------------------- ------------------ ---------- ----------
T_STATUS T_STATUS_RARE INDEX PARTITION 0 8
Scans for the common status value can only full scan the table partition because there is no index to use.
SELECT COUNT(other) FROM t WHERE status='C';

COUNT(OTHER)
------------
999000

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2444 (1)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 55 | | | | |
| 2 | PARTITION LIST SINGLE| | 998K| 52M| 2444 (1)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | T | 998K| 52M| 2444 (1)| 00:00:01 | 2 | 2 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("STATUS"='C')
To query the rare value Oracle does use the index on the rare values partition.
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 3051124889
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 2 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 58 | | | | |
| 2 | PARTITION LIST SINGLE | | 10 | 580 | 2 (0)| 00:00:01 | KEY | KEY |
| 3 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T | 10 | 580 | 2 (0)| 00:00:01 | 1 | 1 |
|* 4 | INDEX RANGE SCAN | T_STATUS | 10 | | 1 (0)| 00:00:01 | 1 | 1 |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("STATUS"='R')
However, it is not worth using the index for the slightly more common status A.  Here, Oracle full scans the table partition.
SELECT COUNT(other) FROM t WHERE status='A';

COUNT(OTHER)
------------
990

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 4 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 58 | | | | |
| 2 | PARTITION LIST SINGLE| | 990 | 57420 | 4 (0)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | T | 990 | 57420 | 4 (0)| 00:00:01 | 1 | 1 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("STATUS"='A')
Note that to use partial indexing I also had to partition the table.

Globally Partitioned Index with Zero-Sized Unusable Partitions 

Since Oracle 11.2.0.4, it has been possible to achieve the same effect without partitioning the table, thus avoiding the overhead of row movement.
This technique also worked in earlier versions, but Oracle built a single extent for each unusable partition.
Here, I will recreate a non-partitioned table.
CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1) NOT NULL
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
)
/
INSERT /*+APPEND*/ INTO t
SELECT rownum
, CASE WHEN rownum<=1e6-1000 THEN 'C'
       WHEN rownum<=1e6-10 THEN 'A'
       ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;

exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');

SELECT status, COUNT(*)
FROM t
GROUP BY status
/

S COUNT(*)
- ----------
R 10
C 999000
A 990
It is not possible to create a globally list-partitioned index. Oracle simply does not support it.
CREATE INDEX t_status ON t(status)
GLOBAL PARTITION BY LIST (status)
(PARTITION t_status_rare VALUES ('R','A')
,PARTITION t_status_common VALUES (DEFAULT)
);

GLOBAL PARTITION BY LIST (status)
*
ERROR at line 2:
ORA-14151: invalid table partitioning method
You can create a globally range- or hash-partitioned index.  However, it is unlikely that the hash values of the column will break down conveniently into particular partitions.  In this example, I would still have needed to create 4 hash partitions and still build 2 of them.
WITH x as (
SELECT status, COUNT(*) freq
FROM t
GROUP BY status
) SELECT x.*
, dbms_utility.get_hash_value(status,0,2)
, dbms_utility.get_hash_value(status,0,4)
FROM x
/

S FREQ DBMS_UTILITY.GET_HASH_VALUE(STATUS,0,2) DBMS_UTILITY.GET_HASH_VALUE(STATUS,0,4)
- ---------- --------------------------------------- ---------------------------------------
R 990 1 1
C 1009000 0 0
A 10 0 2
It is easier to create a globally range-partitioned index, although in my example the common status lies between the two rare statuses, so I need to create three partitions.  I will create the index unusable and then rebuild the two rare status partitions.
CREATE INDEX t_status ON t(status)
GLOBAL PARTITION BY RANGE (status)
(PARTITION t_status_rare1 VALUES LESS THAN ('C')
,PARTITION t_status_common VALUES LESS THAN ('D')
,PARTITION t_status_rare2 VALUES LESS THAN (MAXVALUE)
) UNUSABLE;
ALTER INDEX t_status REBUILD PARTITION t_status_rare1;
ALTER INDEX t_status REBUILD PARTITION t_status_rare2;
The index partition for the common status is unusable so Oracle can only full scan the table.
SELECT COUNT(other) FROM t WHERE status='C';

COUNT(OTHER)
------------
999000

Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2445 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
|* 2 | TABLE ACCESS FULL| T | 999K| 52M| 2445 (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS"='C')
However, for the rare statuses, Oracle scans the index and looks up each of the table rows.
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 2558590380
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 55 | | | | |
| 2 | PARTITION RANGE SINGLE | | 10 | 550 | 2 (0)| 00:00:01 | 3 | 3 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 550 | 2 (0)| 00:00:01 | | |
|* 4 | INDEX RANGE SCAN | T_STATUS | 10 | | 1 (0)| 00:00:01 | 3 | 3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("STATUS"='R')

Conclusion 

The advantage of this global partitioning approach is that it does not require any change to application code, and it does not involve partitioning the table. However, you will have to remember not to rebuild the unusable partitions; otherwise, they will have to be maintained as the table changes until you make them unusable again, and they will consume space that you will only get back by recreating the entire index.
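Should a partition be rebuilt by mistake, you can make it unusable again, which drops its segment, for example:
ALTER INDEX t_status MODIFY PARTITION t_status_common UNUSABLE;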
NB: Partitioning is a licenced option that is only available on the Enterprise Edition of the database.

    On-Line Statistics Gathering Disabled by Column Specific METHOD_OPT Table Statistics Preference

    I have come across a quirk whereby the presence of a table statistics preference specifying METHOD_OPT for some named columns disables on-line statistics gathering.  At the very least, this behaviour is not documented.  I have reproduced it in Oracle versions 12.1.0.2 and 19.3.

    Demonstration 

    I will create two identical tables, but on the first table I will specify a table statistics preference to collect a histogram on column C.
    set serveroutput on verify on autotrace off
    CREATE TABLE t1(a number, b varchar2(1000), c number);
    CREATE TABLE t2(a number, b varchar2(1000), c number);
    exec dbms_stats.set_table_prefs(user,'t1','METHOD_OPT','FOR ALL COLUMNS SIZE AUTO FOR COLUMNS SIZE 254 C');
    Then I will truncate each table, delete any statistics (because truncate does not delete statistics) and then populate the table again in direct-path mode.
    TRUNCATE TABLE t1;
    EXEC dbms_stats.delete_table_stats(user,'T1');
    INSERT /*+APPEND*/ INTO t1
    SELECT ROWNUM a, TO_CHAR(TO_DATE(rownum,'J'),'Jsp') b, CEIL(SQRT(rownum)) c
    FROM dual CONNECT BY level <= 1e5;

    TRUNCATE TABLE t2;
    EXEC dbms_stats.delete_table_stats(user,'T2');
    INSERT /*+APPEND*/ INTO t2
    SELECT ROWNUM a, TO_CHAR(TO_DATE(rownum,'J'),'Jsp') b, CEIL(SQRT(rownum)) c
    FROM dual CONNECT BY level <= 1e5;
    COMMIT;
    I expect to get statistics on both tables.
    alter session set nls_date_Format = 'hh24:mi:ss dd/mm/yy';
    column table_name format a10
    column column_name format a11
    SELECT table_name, num_rows, last_analyzed FROM user_tables WHERE table_name LIKE 'T_' ORDER BY 1;
    SELECT table_name, column_name, num_distinct, histogram, num_buckets FROM user_tab_columns WHERE table_name LIKE 'T_' ORDER BY 1,2;
    But I only get table and column statistics on T2, the one without the statistics preference.
    TABLE_NAME   NUM_ROWS LAST_ANALYZED
    ---------- ---------- -----------------
    T1
    T2             100000 10:08:30 15/01/20

    Table Column
    Name  Name   NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
    ----- ------ ------------ --------------- -----------
    T1    A                   NONE
    T1    B                   NONE
    T1    C                   NONE
    T2    A            100000 NONE                      1
    T2    B             98928 NONE                      1
    T2    C               317 NONE                      1
    It appears that I don't get statistics on T1 because I have specified a table statistics preference that is specific to some named columns. The preference doesn't have to specify creating a histogram; it might equally prevent a histogram from being created.
    For example, this preference does not disable on-line statistics collection.
    EXEC dbms_stats.set_table_prefs(user,'t2','METHOD_OPT','FOR ALL COLUMNS SIZE 1');
    But these preferences do disable on-line statistics collection.
    EXEC dbms_stats.set_table_prefs(user,'t2','METHOD_OPT','FOR COLUMNS SIZE 1 B C');
    EXEC dbms_stats.set_table_prefs(user,'t2','METHOD_OPT','FOR COLUMNS SIZE 1 A B C');
    I have not found any other statistics preferences (for other DBMS_STATS parameters) that cause this behaviour.
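    You can check the METHOD_OPT that will actually be used for a table, whether it comes from a preference or the default, with DBMS_STATS.GET_PREFS:
    SELECT dbms_stats.get_prefs('METHOD_OPT',user,'T1') method_opt FROM dual;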

    Conclusion 

    Table preferences are recommended as a method of controlling statistics collection declaratively and consistently. You don't have to hard-code DBMS_STATS parameters into every script that collects statistics ad hoc. Table statistics preferences ensure that every time statistics are collected on a particular table, they are collected consistently, albeit perhaps in a way that is different from the default.
    However, take the example of an ETL process loading data into a data warehouse. If you rely on on-line statistics gathering to collect table statistics as part of the data load, you must now be careful not to disable it with a column-specific METHOD_OPT statistics preference.
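    If you do hit this quirk during a load, there are two obvious options, as a sketch: remove the column-specific preference, or gather statistics explicitly after the load.
    exec dbms_stats.delete_table_prefs(user,'T1','METHOD_OPT');
    exec dbms_stats.gather_table_stats(user,'T1');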

    Analysing Database Time with Active Session History for Statements with On-line Optimizer Statistics Gathering Operations

    I have been looking into the performance of on-line statistics collection. When statistics are collected on-line, there is an extra OPTIMIZER STATISTICS GATHERING operation in the execution plan. However, I have noticed that the presence or absence of this operation does not change the plan hash value. This has consequences when profiling DB time by execution plan line and then describing that line from a captured plan.

    OPTIMIZER STATISTICS GATHERING Operation

    From 12c, statistics are collected on-line during either a create-table-as-select operation or the initial direct-path insert into a new segment.  Below, I have two statements whose execution plans have the same plan hash value but actually differ; the differences are in areas that do not contribute to the plan hash value.
    • The first statement performs online statistics gathering, and so the plan includes the OPTIMIZER STATISTICS GATHERING operation, the second does not.
    • Note also that the statements insert into different tables, and that does not alter the plan hash value either. However, if the data was queried from different tables that would have produced a different plan hash value.
    INSERT /*+APPEND PARALLEL(i)*/ into T2 i SELECT * /*+*/ FROM t1 s

    Plan hash value: 90348617
    ---------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------
    | 0 | INSERT STATEMENT | | | | 178K(100)| | | |
    | 1 | LOAD AS SELECT | T2 | | | | | | |
    | 2 | OPTIMIZER STATISTICS GATHERING | | 100M| 4005M| 178K (1)| 00:00:07 | | |
    | 3 | PARTITION RANGE ALL | | 100M| 4005M| 178K (1)| 00:00:07 | 1 |1048575|
    | 4 | TABLE ACCESS STORAGE FULL | T1 | 100M| 4005M| 178K (1)| 00:00:07 | 1 |1048575|
    ---------------------------------------------------------------------------------------------------------
    INSERT /*+APPEND PARALLEL(i) NO_GATHER_OPTIMIZER_STATISTICS*/ into T3 i
    SELECT /*+*/ * FROM t1 s

    Plan hash value: 90348617
    ----------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------
    | 0 | INSERT STATEMENT | | | | 178K(100)| | | |
    | 1 | LOAD AS SELECT | T3 | | | | | | |
    | 2 | PARTITION RANGE ALL | | 100M| 4005M| 178K (1)| 00:00:07 | 1 |1048575|
    | 3 | TABLE ACCESS STORAGE FULL| T1 | 100M| 4005M| 178K (1)| 00:00:07 | 1 |1048575|
    ----------------------------------------------------------------------------------------------------
    I find that it is often useful to profile database time from DBA_HIST_ACTIVE_SESS_HISTORY (or V$ACTIVE_SESSION_HISTORY) by line in the execution plan, in order to see how much time was consumed by the different operations. I can then join the profile to DBA_HIST_SQL_PLAN (or V$SQL_PLAN) to see the operation for each line. So long as I also join these tables by SQL_ID, the answer I get will be correct, but I may not always get an answer.
    column inst_id heading 'Inst|Id' format 99
    column sql_plan_line_id heading 'SQL Plan|Line ID'
    column sql_plan_hash_value heading 'SQL Plan|Hash Value'
    column ash_secs heading 'ASH|Secs' format 999
    break on sql_id skip 1
    with h as (
    SELECT h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
    , SUM(10) ash_secs
    FROM dba_hist_Active_Sess_history h
    WHERE h.sql_plan_hash_value = 90348617
    AND h.sql_id IN('g7awpb71jbup1','c2dy3rmnqp7d7','drrbxctf8t5nz','7140frhyu42t5')
    GROUP BY h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
    )
    SELECT h.*, p.operation
    FROM h
    LEFT OUTER JOIN dba_hist_sql_plan p
    ON p.dbid = h.dbid
    and p.sql_id = h.sql_id
    AND p.plan_hash_value = h.sql_plan_hash_value
    AND p.id = h.sql_plan_line_id
    ORDER BY 1,2,3
    /
    If the plan was not captured into AWR or is no longer in the library cache, I don't get a description of the operations in the plan.
                    SQL Plan   SQL Plan  ASH
    SQL_ID Line ID Hash Value Secs OPERATION
    ------------- ---------- ---------- ---- --------------------------------
    0s4ruucw2wvsw 0 90348617 4 INSERT STATEMENT
    1 90348617 77 LOAD AS SELECT
    2 90348617 25 OPTIMIZER STATISTICS GATHERING
    3 90348617 11 PARTITION RANGE
    4 90348617 24 TABLE ACCESS

    33x8fjppwh095 0 90348617 2 INSERT STATEMENT
    1 90348617 89 LOAD AS SELECT
    2 90348617 10 PARTITION RANGE
    3 90348617 20 TABLE ACCESS

    7140frhyu42t5 0 90348617 1
    1 90348617 83
    2 90348617 8
    3 90348617 28

    9vky53vhy5740 0 90348617 3
    1 90348617 89
    2 90348617 23
    3 90348617 9
    4 90348617 22
    Normally, I would look for another SQL_ID that produced the same plan hash value. However, for an execution plan that only sometimes includes on-line statistics gathering, the operations may not match correctly because the OPTIMIZER STATISTICS GATHERING operation changes the line IDs.
    WITH h as (
    SELECT h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
    , SUM(10) ash_secs
    FROM dba_hist_Active_Sess_history h
    WHERE h.sql_plan_hash_value = 90348617
    AND h.sql_id IN('g7awpb71jbup1','c2dy3rmnqp7d7','drrbxctf8t5nz','7140frhyu42t5')
    GROUP BY h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
    ), p as (
    SELECT DISTINCT dbid, plan_hash_value, id, operation
    from dba_hist_sql_plan

    )
    SELECT h.*, p.operation
    FROM h
    LEFT OUTER JOIN p
    ON p.dbid = h.dbid
    AND p.plan_hash_value = h.sql_plan_hash_value
    AND p.id = h.sql_plan_line_id
    ORDER BY 1,2,3
    /
    If I just join the ASH profile to a distinct list of ID and operation for the same plan hash value but matching any SQL_ID, I can get duplicate rows returned, starting at the line with the OPTIMIZER STATISTICS GATHERING operation because I have different plans with the same plan hash value.
                               SQL Plan   SQL Plan  ASH
    DBID SQL_ID Line ID Hash Value Secs OPERATION
    ---------- ------------- ---------- ---------- ---- ------------------------------
    1278460406 7140frhyu42t5 1 90348617 80 LOAD AS SELECT
    1278460406 2 90348617 10 OPTIMIZER STATISTICS GATHERING
    1278460406 2 90348617 10 PARTITION RANGE
    1278460406 3 90348617 30 PARTITION RANGE
    1278460406 3 90348617 30 TABLE ACCESS
    ...
    To mitigate this problem, in the following SQL Query, I check that the maximum plan line ID for which I have ASH data matches the maximum line ID (i.e. the number of lines) in any alternative plan with the same hash value.
    WITH h as (
    SELECT h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
    , SUM(10) ash_secs
    FROM dba_hist_Active_Sess_history h
    WHERE h.sql_plan_hash_value = 90348617
    AND h.sql_id IN('g7awpb71jbup1','c2dy3rmnqp7d7','drrbxctf8t5nz','7140frhyu42t5')
    GROUP BY h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_value
    ), x as (
    SELECT h.*
    , MAX(sql_plan_line_id) OVER (PARTITION BY h.dbid, h.sql_id) plan_lines
    , p1.operation
    FROM h
    LEFT OUTER JOIN dba_hist_sql_plan p1
    ON p1.dbid = h.dbid
    AND p1.sql_id = h.sql_id
    AND p1.plan_hash_value = h.sql_plan_hash_value
    AND p1.id = h.sql_plan_line_id
    )
    SELECT x.*
    , (SELECT p2.operation
    FROM dba_hist_sql_plan p2
    WHERE p2.dbid = x.dbid
    AND p2.plan_hash_value = x.sql_plan_hash_value
    AND p2.id = x.sql_plan_line_id
    AND p2.sql_id IN(
    SELECT p.sql_id
    FROM dba_hist_sql_plan p
    WHERE p.dbid = x.dbid
    AND p.plan_hash_value = x.sql_plan_hash_value
    GROUP BY p.dbid, p.sql_id
    HAVING MAX(p.id) = x.plan_lines)
    AND rownum = 1) operation2
    FROM x
    ORDER BY 1,2,3
    /
    Now, I get an operation description for every line ID (if the same plan was gathered for a different SQL_ID).
                               SQL Plan   SQL Plan  ASH
    DBID SQL_ID Line ID Hash Value Secs PLAN_LINES OPERATION OPERATION2
    ---------- ------------- ---------- ---------- ---- ---------- -------------------------------- ------------------------------
    1278460406 7140frhyu42t5 1 90348617 80 3 LOAD AS SELECT
    1278460406 2 90348617 10 3 PARTITION RANGE
    1278460406 3 90348617 30 3 TABLE ACCESS

    1278460406 c2dy3rmnqp7d7 1 90348617 520 4 LOAD AS SELECT LOAD AS SELECT
    1278460406 2 90348617 100 4 OPTIMIZER STATISTICS GATHERING OPTIMIZER STATISTICS GATHERING
    1278460406 3 90348617 80 4 PARTITION RANGE PARTITION RANGE
    1278460406 4 90348617 280 4 TABLE ACCESS TABLE ACCESS
    1278460406 90348617 30 4

    1278460406 drrbxctf8t5nz 1 90348617 100 4 LOAD AS SELECT
    1278460406 2 90348617 10 4 OPTIMIZER STATISTICS GATHERING
    1278460406 3 90348617 10 4 PARTITION RANGE
    1278460406 4 90348617 50 4 TABLE ACCESS

    1278460406 g7awpb71jbup1 1 90348617 540 3 LOAD AS SELECT LOAD AS SELECT
    1278460406 2 90348617 60 3 PARTITION RANGE PARTITION RANGE
    1278460406 3 90348617 90 3 TABLE ACCESS TABLE ACCESS
    1278460406 90348617 20 3
    However, this approach, while better, is still not perfect. I may not have sufficient DB time for the last line in the execution plan to be sampled, and therefore I may not choose a valid alternative plan.

    Autonomous & Cloud Databases

    Automatic on-line statistics gathering is becoming a more common occurrence.
    • In the Autonomous Data Warehouse, Oracle has set _optimizer_gather_stats_on_load_all=TRUE, so statistics are collected on every direct-path insert. 
    • From 19c, on Engineered Systems (both in the cloud and on-premises), Real-Time statistics are collected during conventional DML (on inserts, updates and some deletes), also using the OPTIMIZER STATISTICS GATHERING operation. Again, the presence or absence of this operation does not affect the execution plan hash value.
    SQL_ID  f0fsghg088k3q, child number 0
    -------------------------------------
    INSERT INTO t2 SELECT * FROM t1

    Plan hash value: 589593414
    ---------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------
    | 0 | INSERT STATEMENT | | | | 1879 (100)| | | |
    | 1 | LOAD TABLE CONVENTIONAL | T2 | | | | | | |
    | 2 | OPTIMIZER STATISTICS GATHERING | | 1000K| 40M| 1879 (1)| 00:00:01 | | |
    | 3 | PARTITION RANGE ALL | | 1000K| 40M| 1879 (1)| 00:00:01 | 1 |1048575|
    | 4 | TABLE ACCESS STORAGE FULL | T1 | 1000K| 40M| 1879 (1)| 00:00:01 | 1 |1048575|
    ---------------------------------------------------------------------------------------------------------
    SQL_ID  360pwsfmdkxf4, child number 0
    -------------------------------------
    INSERT /*+NO_GATHER_OPTIMIZER_STATISTICS*/ INTO t3 SELECT * FROM t1

    Plan hash value: 589593414
    ----------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------
    | 0 | INSERT STATEMENT | | | | 1879 (100)| | | |
    | 1 | LOAD TABLE CONVENTIONAL | T3 | | | | | | |
    | 2 | PARTITION RANGE ALL | | 1000K| 40M| 1879 (1)| 00:00:01 | 1 |1048575|
    | 3 | TABLE ACCESS STORAGE FULL| T1 | 1000K| 40M| 1879 (1)| 00:00:01 | 1 |1048575|
    ----------------------------------------------------------------------------------------------------

    Online Statistics Collection during Bulk Loads on Partitioned Tables


    Introduction

    One of the enhancements to statistics collection and management in Oracle 12c was the ability of the database to collect statistics automatically, during either a create-table-as-select operation or the initial insert into a freshly created or freshly truncated table, provided that insert is done in direct-path mode (i.e. using the APPEND hint).
    When that occurs, there is an additional operation in the execution plan: OPTIMIZER STATISTICS GATHERING.
    ----------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------------------------
    | 0 | INSERT STATEMENT | | | | 495K(100)| |
    | 1 | LOAD AS SELECT | | | | | |
    | 2 | OPTIMIZER STATISTICS GATHERING | | 70M| 11G| 495K (2)| 00:00:20 |
    | 3 | TABLE ACCESS FULL | XXXXXXXXXXXXXXX | 70M| 11G| 495K (2)| 00:00:20 |
    ----------------------------------------------------------------------------------------------------
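    As an aside, in recent releases you can see whether statistics came from this mechanism. This is a sketch based on the 19c data dictionary, where I believe the NOTES column of the statistics views is set to STATS_ON_LOAD for statistics gathered on load; check the view description in your version.
    SELECT table_name, num_rows, last_analyzed, notes
    FROM   user_tab_statistics
    WHERE  notes = 'STATS_ON_LOAD';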
    The motivation for this blog was encountering a bulk insert into a partitioned table where the statistics gathering operation consumed a very significant amount of time. Partitioning gives you more things to consider.

    A Simple Test

    I created a simple test that compares the time taken by online statistics collection on partitioned and non-partitioned tables, with the explicit collection of statistics using DBMS_STATS. I have four tables with the same structure.
    • T1: Not partitioned. Data will be copied from this table to each of the others. 
    • T2: Partitioned. Online statistics only. 
    • T3: Partitioned. Explicitly gathered statistics. 
    • T4: Partitioned. Explicitly gathered incremental statistics.
    CREATE TABLE T1 (a number, b varchar2(1000), c number) NOLOGGING;
    CREATE TABLE T2 (a number, b varchar2(1000), c number)
    PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;
    CREATE TABLE T3 (a number, b varchar2(1000), c number)
    PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;
    CREATE TABLE T4 (a number, b varchar2(1000), c number)
    PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;
    I loaded 100 million rows into each in direct-path mode. The partitioned tables end up with 100 partitions, each with 1 million rows. I have also suppressed redo logging during the direct-path insert by creating the tables with the NOLOGGING attribute.
    EXEC dbms_stats.set_table_prefs(user,'T3','INCREMENTAL','FALSE');
    EXEC dbms_stats.set_table_prefs(user,'T4','INCREMENTAL','TRUE');
    The following set of tests will be run for different combinations of:
    • Parallel hint on query, or not 
    • Parallel hint on insert, or not 
    • Table parallelism specified 
    • Parallel DML enabled or disabled at session level 
    • Column-specific METHOD_OPT table preference specified or not. 
    I enabled SQL trace, from which I was able to obtain the elapsed times of the various statements; the time spent on online statistics gathering can be determined from the timings on the OPTIMIZER STATISTICS GATHERING operation in the execution plan in the trace.
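    For completeness, this is roughly how I would enable the trace for each test session; a sketch using DBMS_MONITOR (the tracefile identifier is just to make the trace files easy to find):
    ALTER SESSION SET tracefile_identifier = 'ONLINE_STATS_TEST';
    EXEC dbms_monitor.session_trace_enable(waits=>TRUE, binds=>FALSE);
    -- run the test statements here
    EXEC dbms_monitor.session_trace_disable;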
    TRUNCATE TABLE T2;
    TRUNCATE TABLE T3;
    EXEC dbms_stats.delete_table_stats(user,'T2');
    EXEC dbms_stats.delete_table_stats(user,'T3');
    EXEC dbms_stats.delete_table_stats(user,'T4');

    INSERT /*+APPEND &inshint*/ into T2 i SELECT * /*+&selhint*/ from t1 s;
    INSERT /*+APPEND &inshint NO_GATHER_OPTIMIZER_STATISTICS*/ into T3 i SELECT /*+&selhint*/ * from t1 s;
    INSERT /*+APPEND &inshint NO_GATHER_OPTIMIZER_STATISTICS*/ into T4 i SELECT /*+&selhint*/ * from t1 s;
    commit;
    EXEC dbms_stats.gather_table_stats(user,'T3');
    EXEC dbms_stats.gather_table_stats(user,'T4');

    Quirks

    It was while building this test that I discovered a couple of quirks:

    What Statistics Are Normally Collected by Online Statistics Gathering? 

    After just the initial insert, I can see that I have table statistics on T1 and T2, but not on T3 and T4.
    SELECT table_name, num_rows, last_analyzed from user_tables where table_name LIKE 'T_' order by 1; 

    TABLE_NAME NUM_ROWS LAST_ANALYZED
    ---------- ---------- -----------------
    T1 10000000 14:07:36 16/01/20
    T2 10000000 14:07:36 16/01/20
    T3
    T4
    I also have column statistics on T1 and T2, but no histograms.
    break on table_name skip 1
    SELECT table_name, column_name, num_distinct, global_stats, histogram, num_buckets, last_analyzed
    FROM user_tab_columns where table_name like 'T_' order by 1,2;

    TABLE_NAME COLUMN_NAME NUM_DISTINCT GLO HISTOGRAM NUM_BUCKETS LAST_ANALYZED
    ---------- ------------ ------------ --- --------------- ----------- -----------------
    T1 A 10000 YES NONE 1 14:06:58 16/01/20
    B 10000 YES NONE 1 14:06:58 16/01/20
    C 100 YES NONE 1 14:06:58 16/01/20

    T2 A 10000 YES NONE 1 14:07:11 16/01/20
    B 10000 YES NONE 1 14:07:11 16/01/20
    C 100 YES NONE 1 14:07:11 16/01/20

    T3 A NO NONE
    B NO NONE
    C NO NONE

    T4 A NO NONE
    B NO NONE
    C NO NONE
    However, I do not have any partition statistics (I have only shown the first and last partition of each table in this report).
    break on table_name skip 1
    SELECT table_name, partition_position, partition_name, num_rows, last_analyzed
    FROM user_tab_partitions WHERE table_name like 'T_' ORDER BY 1,2 nulls first;

    TABLE_NAME PARTITION_POSITION PARTITION_NAME       NUM_ROWS LAST_ANALYZED
    ---------- ------------------ -------------------- -------- -----------------
    T2                          1 T_PART
                              100 SYS_P20008

    T3                          1 T_PART
                              100 SYS_P20107

    T4                          1 T_PART
                              100 SYS_P20206
    Online optimizer statistics gathering only collects statistics at table level but not partition or sub-partition level. Histograms are not collected.
    From Oracle 18c, there are two undocumented parameters that modify this behaviour. Both default to false. Interestingly, both are enabled in the Oracle Autonomous Data Warehouse.
    • If _optimizer_gather_stats_on_load_hist=TRUE, histograms are collected on all columns during online statistics collection. 
    • If _optimizer_gather_stats_on_load_all=TRUE statistics are collected online during every direct-path insert, not just the first one into a segment. 
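    For example, on a test system you could experiment with them at session level. As with any undocumented parameter, this is a sketch for a sandbox, not something to set in production without Oracle Support's advice:
    ALTER SESSION SET "_optimizer_gather_stats_on_load_hist" = TRUE;
    ALTER SESSION SET "_optimizer_gather_stats_on_load_all"  = TRUE;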

    Do I Need Partition Statistics?

    Statistics will be collected on partitions that do not have them when the automatic statistics collection job runs in the next database maintenance window. The question is whether you can manage without them until then.
    "The optimizer will use global or table level statistics if one or more of your queries touches two or more partitions. The optimizer will use partition level statistics if your queries do partition elimination, such that only one partition is necessary to answer each query. If your queries touch two or more partitions the optimizer will use a combination of global and partition level statistics."
     – Oracle The Data Warehouse Insider Blog: Managing Optimizer Statistics in an Oracle Database 11g - Maria Colgan
    It will depend upon the nature of the SQL in the application. If the optimizer does some partition elimination, and the data is not uniformly distributed across the partitions, then partition statistics are likely to be beneficial. If there is no partition elimination, then you might question whether partitioning (or at least the current partitioning strategy) is appropriate!
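    If you cannot wait for the maintenance window, one option is to top up the table-level statistics collected online with just the partition-level statistics. A sketch using the documented GRANULARITY parameter, which should leave the table-level statistics in place:
    EXEC dbms_stats.gather_table_stats(user, 'T2', granularity=>'PARTITION');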

    What is the Fastest Way to Collect Statistics on Partitioned Tables?

    Let's look at how long it takes to insert data into, and then subsequently collect statistics on, the tables in my example. This test was run on Oracle 19c on one compute node of a virtualised Exadata X4 machine with 16 CPUs.  The following table shows the elapsed time, and the total DB time including all parallel server processes, for each operation.
    Each column of the table is one test scenario:
    1. Serial Insert & Statistics
    2. Parallel Insert & Statistics
    3. Parallel SQL & Statistics
    4. Parallel DML, Insert, Select & Statistics
    5. Parallel DML, SQL & Statistics
    6. Parallel Tables
    7. Parallel Tables & DML
    8. Parallel Tables, DML & Method Opt

    Option            1          2           3          4           5          6        7        8
    Table             NOPARALLEL NOPARALLEL  NOPARALLEL NOPARALLEL  NOPARALLEL PARALLEL PARALLEL PARALLEL
    Insert Hint       -          PARALLEL(i) -          PARALLEL(i) -          -        -        -
    Select Hint       -          PARALLEL(s) PARALLEL   PARALLEL(s) PARALLEL   -        -        -
    Parallel DML      DISABLE    DISABLE     DISABLE    ENABLE      ENABLE     DISABLE  ENABLE   ENABLE
    Stats Degree      none       DEFAULT     DEFAULT    DEFAULT     DEFAULT    none     none     none
    Method Opt        none       none        none       none        none       none     none     ...FOR COLUMNS SIZE 1 A

    Elapsed Time (s)                        1        2        3        4        5        6        7        8
    T2 Insert (Online Stats Gathering) 172.46   160.86   121.61   108.29    60.31   194.47    23.57    20.57
       of which OPTIMIZER STATISTICS
       GATHERING                        82.71    55.59    55.90        -        -        -        -        -
    T3 Insert (NO_GATHER_OPTIMIZER_
       STATISTICS)                     125.40   156.36   124.18    20.62    29.01   199.20    20.97    21.15
    T3 Explicit Stats                  122.80   146.25    63.23    15.99    24.88    24.58    24.99    24.62
    T4 Insert (NO_GATHER_OPTIMIZER_
       STATISTICS)                     123.18   158.15   147.04    20.44    29.91   204.61    20.65    20.60
    T4 Incremental Explicit Stats       80.51   104.85    46.05    23.42    23.14    23.21    22.60    23.03

    DB Time (s)                             1        2        3        4        5        6        7        8
    T2 Insert (Online Stats Gathering)    174      163      169      359      337      248      366      308
    T3 Insert (NO_GATHER_OPTIMIZER_
       STATISTICS)                        128      193      160      290      211      236      312      326
    T3 Explicit Stats                     122      14663265305262335
    T4 Insert (NO_GATHER_OPTIMIZER_
       STATISTICS)                        126      194      167      295      205      233      304      295
    T4 Incremental Explicit Stats          80      1052281266300179226
    • It is difficult to determine the actual duration of the OPTIMIZER STATISTICS GATHERING operation, short of measuring the effect of disabling it. The time in the above table has been taken from SQL trace files. That duration is always greater than the amount saved by disabling online statistics gathering with the NO_GATHER_OPTIMIZER_STATISTICS hint. However, the amount of time accounted in Active Session History (ASH) for that line in the execution plan is usually less than the elapsed saving. 
      • E.g. for the serial insert, 83s was accounted for OPTIMIZER STATISTICS GATHERING in the trace, while ASH showed only 23s of database time for that line of the plan. Perhaps the only meaningful measurement is that disabling online statistics gathering saved 47s.
    • DML statements, including insert statements in direct-path mode, only actually execute in parallel if parallel DML is enabled. Specifying a degree of parallelism on the table, or a parallel hint, is not enough. Parallel DML should be enabled:
      • either at session level
    ALTER SESSION ENABLE PARALLEL DML;
      • or for the individual statement.
    insert /*+APPEND ENABLE_PARALLEL_DML*/ into T2 SELECT * from t1;
      • Specifying parallel insert with a hint, without enabling parallel DML will not improve performance and can actually degrade it.
      • Specifying parallel query without running the insert in parallel can also degrade performance.
    • Online statistics will be collected in parallel if
      • either the table being queried has a degree of parallelism,
      • or a parallel hint applies to the table being queried, or the entire statement,
      • or parallel DML has been enabled 
    • Where statistics are collected explicitly (i.e. with a call to DBMS_STATS.GATHER_TABLE_STATS) they are collected in parallel if 
      • either the DEGREE is specified (I specified a table statistics preference),
    EXEC dbms_stats.set_table_prefs(user,'T3','DEGREE','DBMS_STATS.DEFAULT_DEGREE');
      • or the table has a degree of parallelism.
    ALTER TABLE T3 PARALLEL;
    • Incremental statistics are generally faster to collect because they calculate table-level statistics from partition-level statistics, saving a second pass through the data.
    • When parallel DML is enabled at session level, I found that the performance of statistics collection also improves.
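    Putting those observations together, the fastest combination in my tests (parallel tables with parallel DML enabled) amounts to something like this sketch:
    ALTER TABLE t1 PARALLEL;
    ALTER TABLE t2 PARALLEL;
    ALTER SESSION ENABLE PARALLEL DML;

    INSERT /*+APPEND*/ INTO t2 SELECT * FROM t1;
    COMMIT;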

    Conclusion

    Overall, the best performance was obtained when the tables were altered to use parallelism, and parallel DML was enabled; then the query, insert and statistics collection are performed in parallel.
    However, the improved performance of parallelism comes at a cost.  It can be a brutal way of bringing more resource to bear on an activity.  A parallel operation can be expected to use more database time across all the parallel server processes than the same operation processed serially.  My best results were obtained by activating all of the CPUs on the server without regard for any other activity.  Too many concurrent parallel operations have the potential to overload a system.  Remember also, that while the parallel attribute remains on the table any subsequent query will also run in parallel.
    Suppressing online statistics collection saves total database time whether working in parallel or not. The saving in elapsed time is reduced when the insert and query are running in parallel.  The time taken to explicitly collect statistics will exceed that saving because it is doing additional work to collect partition statistics not done during online statistics collection.
    Using incremental statistics for partitioned tables will also reduce the total amount of work and database time required to gather statistics, but may not significantly change the elapsed time to collect statistics.
    If you need table statistics but can manage without partition statistics until the next maintenance window, then online statistics collection is very effective. However, I think the general case will be to require partition statistics, so you will probably need to explicitly collect statistics instead.  If you want histograms, then you will also need to explicitly collect statistics.

    Data Warehouse Design: Snowflake Dimensions and Lost Skew Trap

    This post is part of a series that discusses some common issues in data warehouses. Originally written in 2018, but I never got round to publishing it.
    While I was experimenting with the previous query I noticed that the cost of the execution plans didn't change as I changed the COUNTRY_ISO_CODE, yet the data volumes for different countries are very different.
    select c.country_name
    , u.cust_state_province
    , COUNT(*) num_sales
    , SUM(s.amount_sold) total_amount_sold
    from sales s
    , customers u
    , products p
    , times t
    , countries c
    WHERE s.time_id = t.time_id
    AND s.prod_id = p.prod_id
    AND u.cust_id = s.cust_id
    AND u.country_id = c.country_id
    AND c.country_iso_code = '&&iso_country_code'
    AND p.prod_category_id = 205
    and t.fiscal_year = 1999
    GROUP BY c.country_name, u.cust_state_province
    ORDER BY 1,2
    /
    Plan hash value: 3095970037
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | | | 1473 (100)| | | | 45 |00:00:01.77 | 101K| | | |
    | 1 | TEMP TABLE TRANSFORMATION | | 1 | | | | | | | 45 |00:00:01.77 | 101K| | | |
    | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D7C68_A4BC21 | 1 | | | | | | | 0 |00:00:00.13 | 1889 | 1024 | 1024 | |
    | * 3 | HASH JOIN | | 1 | 2413 | 94107 | 418 (1)| 00:00:01 | | | 18520 |00:00:00.10 | 1888 | 1185K| 1185K| 639K (0)|
    | * 4 | TABLE ACCESS FULL | COUNTRIES | 1 | 1 | 18 | 2 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | | | |
    | 5 | TABLE ACCESS FULL | CUSTOMERS | 1 | 55500 | 1138K| 416 (1)| 00:00:01 | | | 55500 |00:00:00.02 | 1521 | | | |
    | 6 | SORT GROUP BY | | 1 | 2359 | 101K| 1055 (1)| 00:00:01 | | | 45 |00:00:01.65 | 99111 | 6144 | 6144 | 6144 (0)|
    | * 7 | HASH JOIN | | 1 | 3597 | 154K| 1054 (1)| 00:00:01 | | | 64818 |00:00:01.58 | 99111 | 2391K| 1595K| 2025K (0)|
    | 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7C68_A4BC21 | 1 | 2413 | 62738 | 5 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | | | |
    | 9 | VIEW | VW_ST_C525CEF3 | 1 | 3597 | 64746 | 1048 (1)| 00:00:01 | | | 64818 |00:00:01.44 | 99111 | | | |
    Note:
    • There are 55500 rows on CUSTOMERS
    • There are 23 rows on COUNTRIES
    • Oracle expects 2413 rows on joining those tables
      • 55500 ÷ 23 = 2413.04, so Oracle assumes the data is evenly distributed between countries, although there are histograms on COUNTRY_ISO_CODE and COUNTRY_ID. 
      • This is sometimes called 'lost skew': the skew of a dimension does not pass into the cardinality calculation on the fact table (the query below demonstrates the skew).
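    A quick way to see the skew the optimizer is losing is simply to profile the foreign key column on the dimension's child table:
    SELECT country_id, COUNT(*) num_customers
    FROM   customers
    GROUP  BY country_id
    ORDER  BY num_customers DESC;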
    If I replace the predicate on COUNTRY_ISO_CODE with a predicate on COUNTRY_ID, then the estimate of the number of rows from CUSTOMERS is correct at 18520 rows. The cost of the star transformation has gone up from 1473 to 6922.
    Plan hash value: 1339390240

    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | | | 6922 (100)| | | | 45 |00:00:01.50 | 97998 | | | |
    | 1 | TEMP TABLE TRANSFORMATION | | 1 | | | | | | | 45 |00:00:01.50 | 97998 | | | |
    | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D7C6A_A4BC21 | 1 | | | | | | | 0 |00:00:00.06 | 1524 | 1024 | 1024 | |
    | 3 | NESTED LOOPS | | 1 | 18520 | 651K| 417 (1)| 00:00:01 | | | 18520 |00:00:00.04 | 1523 | | | |
    | 4 | TABLE ACCESS BY INDEX ROWID | COUNTRIES | 1 | 1 | 15 | 1 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | | | |
    | 5 | INDEX UNIQUE SCAN | COUNTRIES_PK | 1 | 1 | | 0 (0)| | | | 1 |00:00:00.01 | 1 | | | |
    | 6 | TABLE ACCESS FULL | CUSTOMERS | 1 | 18520 | 379K| 416 (1)| 00:00:01 | | | 18520 |00:00:00.03 | 1521 | | | |
    | 7 | SORT GROUP BY | | 1 | 2359 | 101K| 6505 (1)| 00:00:01 | | | 45 |00:00:01.43 | 96473 | 6144 | 6144 | 6144 (0)|
    | 8 | HASH JOIN | | 1 | 82724 | 3554K| 6499 (1)| 00:00:01 | | | 64818 |00:00:01.37 | 96473 | 2391K| 1595K| 2002K (0)|
    | 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7C6A_A4BC21 | 1 | 18520 | 470K| 25 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | | | |
    In fact, I only get the star transformation if I force the issue with a STAR_TRANSFORMATION hint. Otherwise, I get the full scan plan which is much cheaper, but again the cardinality calculation on CUSTOMERS is correct.
    Plan hash value: 3784979335
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | | | 1595 (100)| | | | 45 |00:00:00.54 | 2065 | 472 | | | |
    | 1 | SORT GROUP BY | | 1 | 45 | 3510 | 1595 (3)| 00:00:01 | | | 45 |00:00:00.54 | 2065 | 472 | 6144 | 6144 | 6144 (0)|
    | 2 | HASH JOIN | | 1 | 81133 | 6180K| 1589 (3)| 00:00:01 | | | 64818 |00:00:00.43 | 2065 | 472 | 2337K| 2200K| 2221K (0)|
    | 3 | TABLE ACCESS FULL | CUSTOMERS | 1 | 18520 | 379K| 416 (1)| 00:00:01 | | | 18520 |00:00:00.02 | 1521 | 0 | | | |
    | 4 | HASH JOIN | | 1 | 81133 | 4516K| 1172 (3)| 00:00:01 | | | 110K|00:00:00.35 | 544 | 472 | 2546K| 2546K| 1610K (0)|
    | 5 | TABLE ACCESS FULL | PRODUCTS | 1 | 26 | 208 | 3 (0)| 00:00:01 | | | 26 |00:00:00.01 | 4 | 0 | | | |
    | 6 | HASH JOIN | | 1 | 229K| 10M| 1167 (3)| 00:00:01 | | | 246K|00:00:00.30 | 539 | 472 | 1133K| 1133K| 1698K (0)|
    | 7 | PART JOIN FILTER CREATE | :BF0000 | 1 | 364 | 9828 | 17 (0)| 00:00:01 | | | 364 |00:00:00.01 | 57 | 0 | | | |
    | 8 | NESTED LOOPS | | 1 | 364 | 9828 | 17 (0)| 00:00:01 | | | 364 |00:00:00.01 | 57 | 0 | | | |
    | 9 | TABLE ACCESS BY INDEX ROWID| COUNTRIES | 1 | 1 | 15 | 1 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | 0 | | | |
    | 10 | INDEX UNIQUE SCAN | COUNTRIES_PK | 1 | 1 | | 0 (0)| | | | 1 |00:00:00.01 | 1 | 0 | | | |
    | 11 | TABLE ACCESS FULL | TIMES | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
    | 12 | PARTITION RANGE JOIN-FILTER | | 1 | 918K| 19M| 1142 (3)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.21 | 482 | 472 | | | |
    | 13 | TABLE ACCESS FULL | SALES | 5 | 918K| 19M| 1142 (3)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.20 | 482 | 472 | | | |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    Oracle 19c: Automatic Indexing. Part 1. Introduction

    This is the first of a two-part post that looks at the Automatic Indexing feature introduced in Oracle 19c, available on engineered systems only. Initially, I simply wanted to see what it does and to understand how it worked.
    Next, I wanted to see how good it is. I created a test based on Dominic Giles' Swingbench Sales Order Entry benchmark. Having dropped the secondary indexes (the ones not involved in key constraints), I wanted to see which indexes Automatic Indexing would recreate, and whether that would reinstate the original performance.

    References and Acknowledgements 

    This blog is not intended to provide a comprehensive description of Automatic Indexing.  I explain some things as I go along, but I have referenced the sources that I found helpful.
    The Oracle 19c documentation is not particularly verbose. Automatic Indexing is introduced in New Database Features Guide: Big Data & Data Warehousing: Automatic Indexing.
    "The automatic indexing feature automates index management tasks, such as creating, rebuilding, and dropping indexes in an Oracle Database based on changes in the application workload. This feature improves database performance by managing indexes automatically in an Oracle Database."
    However, there is more information in the Database Administrator's Guide at Managing Auto Indexes:
    "The automatic indexing feature automates the index management tasks in an Oracle database. Automatic indexing automatically creates, rebuilds, and drops indexes in a database based on the changes in application workload, thus improving database performance. The automatically managed indexes are known as auto indexes.
    Index structures are an essential feature to database performance. Indexes are critical for OLTP applications, which use large data sets and run millions of SQL statements a day. Indexes are also critical for data warehousing applications, which typically query a relatively small amount of data from very large tables. If you do not update the indexes whenever there are changes in the application workload, the existing indexes can cause the database performance to deteriorate considerably. 
    Automatic indexing improves database performance by managing indexes automatically and dynamically in an Oracle database based on changes in the application workload." 
    Maria Colgan (Master Product Manager for Oracle Database) has blogged and presented on this feature.
    Automatic Indexing is certainly intended for use in the Autonomous Database, but also for other 19c Exadata databases. These presentations also make it clear that Automatic Indexing is intended for OLTP as well as Warehouse and Analytic databases. Some of the examples refer to packaged applications (an unnamed Accounts Receivable system, and a PeopleSoft ERP system).
    I found a number of other valuable resources that helped me to get it going, monitor it, and to begin to understand what was going on behind the scenes.

    How does Automatic Indexing Work?

    Automatic Indexing is an expert system, implemented as two automatic background scheduler tasks. By default, both run every 15 minutes.
    • Auto STS Capture Task captures workload into a SQL tuning set SYS_AUTO_STS. This process runs regardless of the Automatic Indexing configuration. It has a maximum runtime of 15 minutes. 
    • Auto Index Task runs if AUTO_INDEX_MODE is not OFF. It has a maximum runtime of 1 hour. This process creates automatic indexes. Initially, they are invisible and unusable. It checks whether the optimizer will use them. If so, it rebuilds them as usable invisible indexes and checks for improved performance before making them visible. It may also make them invisible again later.
    Indexes created by Automatic Indexing are created with the AUTO option, and are identified in ALL_INDEXES with the AUTO attribute.  Automatic indexes will be dropped if they haven't been used for longer than a specified retention period (default 373 days). Optionally, manually created indexes can be considered by Automatic Indexes, and can also be dropped after a separately specified retention period.
    This creates a feedback loop where indexes are created and dropped in response to changing load on the database while assuring that the newly created indexes will be used and will improve performance and that any indexes that are dropped were not being used.
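    Since the AUTO attribute is exposed in the dictionary, a simple query shows what Automatic Indexing has created so far:
    SELECT owner, table_name, index_name, visibility, status
    FROM   all_indexes
    WHERE  auto = 'YES'
    ORDER  BY owner, table_name, index_name;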
    Automatic Indexing is only available on engineered (Exadata) systems (see Database Licensing Information User Manual, 1.3 Permitted Features, Options, and Management Packs by Oracle Database Offering, Performance). This includes Oracle Database Enterprise Edition on Engineered Systems, Oracle Database Exadata Cloud Service, and Oracle's Autonomous databases. Automation of index creation and removal is an important part of the 'self-driving' aspiration for the Autonomous database, where it will do 'database management by using machine learning and automation to eliminate human labor, human error, and manual tuning'.
    (In 20c, there are two additional automatic tasks to flush and purge the SQL Tuning Sets). 

    Oracle 19c: Automatic Indexing. Part 2. Testing Automatic Indexing with Swingbench

    This is the second of a two-part post that looks at the Automatic Indexing feature introduced in Oracle 19c.
    I have used Dominic Giles' Swingbench utility to create a realistic and repeatable OLTP load test using the Sales Order Entry (SOE) benchmark.  This post explains how I set up and ran the test, and the results I obtained.

    Installation & Setup of Swingbench

    I have tested Automatic Indexing on an Exadata X4 running Oracle 19.3.1.0.0, and I have used the results from that system in this blog.  I have also successfully tested it on 19.6 and 20.2 running in Oracle VirtualBox VMs (built with Frits Hoogland's vagrant-builder) and have enabled Exadata features by setting _exadata_feature_on = TRUE.  Of course, I could never recommend setting this on anything other than a play database, but it does show the feature could work on any database platform.
    alter system set "_exadata_feature_on"=true scope=spfile;
    shutdown immediate;
    startup;
    Swingbench requires Java 8 in a Java virtual machine.
    yum install java
    Then, it is simply a matter of downloading and unzipping the distribution.
    curl http://www.dominicgiles.com/swingbench/swingbench261082.zip -o swingbench.zip
    unzip swingbench.zip
    To assist with monitoring the test and capturing SQL and metrics, I set the AWR snapshot frequency to 15 minutes.
    execute dbms_workload_repository.modify_snapshot_settings(interval => 15);
    I have created a dedicated tablespace for the SOE schema
    CREATE TABLESPACE SOE DATAFILE SIZE 10M AUTOEXTEND ON NEXT 1M;
    The SOE schema is built with the oewizard utility. I am creating all the indexes, and not using any partitioning.
    cd ~/swingbench/bin
    ./oewizard -cs //enkx4c02-scan/swingbench_dmk -dt thin -dba "sys as sysdba" -dbap welcome1 -ts SOE -u soe -p soe -create -allindexes -nopart -cl -v

    Test 1: Baseline Test

    The Swingbench SOE benchmark has 9 tables with 27 indexes. 15 of those indexes are on primary key or referential integrity constraints.
    Table                      Index                                     Cons
    Owner TABLE_NAME Owner INDEX_NAME UNIQUENES Type STATUS VISIBILIT AUT INDEX_KEYS
    ----- -------------------- ----- ------------------------- --------- ---- -------- --------- --- ----------------------------
    SOE ADDRESSES SOE ADDRESS_CUST_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ADDRESS_PK UNIQUE P VALID VISIBLE NO ADDRESS_ID

    SOE CARD_DETAILS SOE CARDDETAILS_CUST_IX NONUNIQUE VALID VISIBLE NO CUSTOMER_ID
    SOE CARD_DETAILS_PK UNIQUE P VALID VISIBLE NO CARD_ID

    SOE CUSTOMERS SOE CUST_EMAIL_IX NONUNIQUE VALID VISIBLE NO CUST_EMAIL
    SOE CUSTOMERS_PK UNIQUE P VALID VISIBLE NO CUSTOMER_ID
    SOE CUST_FUNC_LOWER_NAME_IX NONUNIQUE VALID VISIBLE NO SYS_NC00017$,SYS_NC00018$
    SOE CUST_DOB_IX NONUNIQUE VALID VISIBLE NO DOB
    SOE CUST_ACCOUNT_MANAGER_IX NONUNIQUE VALID VISIBLE NO ACCOUNT_MGR_ID

    SOE INVENTORIES SOE INV_WAREHOUSE_IX NONUNIQUE R VALID VISIBLE NO WAREHOUSE_ID
    SOE INV_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE INVENTORY_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,WAREHOUSE_ID

    SOE ORDERS SOE ORD_WAREHOUSE_IX NONUNIQUE VALID VISIBLE NO WAREHOUSE_ID,ORDER_STATUS
    SOE ORDER_PK UNIQUE P VALID VISIBLE NO ORDER_ID
    SOE ORD_SALES_REP_IX NONUNIQUE VALID VISIBLE NO SALES_REP_ID
    SOE ORD_CUSTOMER_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ORD_ORDER_DATE_IX NONUNIQUE VALID VISIBLE NO ORDER_DATE

    SOE ORDER_ITEMS SOE ITEM_ORDER_IX NONUNIQUE R VALID VISIBLE NO ORDER_ID
    SOE ITEM_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE ORDER_ITEMS_PK UNIQUE P VALID VISIBLE NO ORDER_ID,LINE_ITEM_ID

    SOE PRODUCT_DESCRIPTIONS SOE PRD_DESC_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,LANGUAGE_ID
    SOE PROD_NAME_IX NONUNIQUE VALID VISIBLE NO TRANSLATED_NAME

    SOE PRODUCT_INFORMATION SOE PROD_SUPPLIER_IX NONUNIQUE VALID VISIBLE NO SUPPLIER_ID
    SOE PRODUCT_INFORMATION_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID
    SOE PROD_CATEGORY_IX NONUNIQUE VALID VISIBLE NO CATEGORY_ID

    SOE WAREHOUSES SOE WAREHOUSES_PK UNIQUE P VALID VISIBLE NO WAREHOUSE_ID
    SOE WHS_LOCATION_IX NONUNIQUE VALID VISIBLE NO LOCATION_ID
    At this stage, Automatic Indexing is off. If you rebuild the SOE schema having previously run Automatic Indexing, remember to disable the feature, otherwise, it might act on the basis of previous activity. It is administered via the DBMS_AUTO_INDEX package.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','OFF');
    I ran Swingbench using the character mode charbench front end.  Each test runs for an hour.
    ./charbench -c ../configs/SOE_Client_Side.xml -cs //enkx4c02-scan/swingbench_dmk -dt thin -u soe -p soe -rt 01:00 -v

    Author : Dominic Giles
    Version : 2.6.0.1082

    Results will be written to results.xml.
    Hit Return to Terminate Run...

    Time Users TPM TPS

    12:54:54 PM 0 58500 869
    Completed Run.
    The results are written to an XML file, from which a formatted report can be produced using Result2Pdf. I run this on Windows.
    >results2pdf -c results00001.xml

    >java -cp ../launcher LauncherBootstrap -executablename results2pdf results2pdf -c results.xml
    Application : Results2Pdf
    Author : Dominic Giles
    Version : 2.6.0.1076
    Success : Pdf file null was created from results.xml results file.
    The report gives average response times for the 9 different transactions and an overall average number of transactions per second.

    Results

    This test is my baseline.
    Transaction                 Average Response (ms)
                                 1: Delivered Indexes
    Update Customer Details                      1.18
    Browse Products                              2.03
    Browse Orders                                2.38
    Customer Registration                        3.50
    Order Products                               5.67
    Warehouse Query                              6.20
    Process Orders                              13.42
    Warehouse Activity Query                    14.89
    Sales Rep Query                             31.76
    TPS                                       1060.81

    Test 2: Drop Secondary Indexes

    In many applications, developers and DBAs add indexes to resolve performance problems. It is easy to add indexes, but harder to know whether and where they are used, and therefore when it is safe to remove or change an existing index. Indexes have an overhead in terms of taking up space in the database and maintenance during DML operations.
    Automatic indexing is designed to take on this challenge. Oracle has provided a procedure to drop secondary indexes, DBMS_AUTO_INDEX.DROP_SECONDARY_INDEXES.
    DROP_SECONDARY_INDEXES doesn't check the status of foreign key constraints. Foreign key columns should be indexed to avoid TM locking when the parent record's primary key is updated or the parent record is deleted. However, the index is not needed if the foreign key constraint is not validated. You might make a constraint disabled, not validated, but reliable because you want to take advantage of foreign key join elimination. In that case, the index would not be necessary, but it would not be dropped by this procedure.
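    For illustration, such a constraint might be declared like this (a sketch with hypothetical table and constraint names):
    ALTER TABLE order_items
      MODIFY CONSTRAINT order_items_orders_fk RELY DISABLE NOVALIDATE;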
    EXEC DBMS_AUTO_INDEX.drop_secondary_indexes('SOE','');
    When this is run on the SOE schema, I am left with 15 indexes that are either unique or on foreign key columns.
    Table                      Index                                     Cons
    Owner TABLE_NAME Owner INDEX_NAME UNIQUENES Type STATUS VISIBILIT AUT INDEX_KEYS
    ----- -------------------- ----- ------------------------- --------- ---- -------- --------- --- ----------------------------
    SOE ADDRESSES SOE ADDRESS_PK UNIQUE P VALID VISIBLE NO ADDRESS_ID
    SOE ADDRESS_CUST_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID

    SOE CARD_DETAILS SOE CARD_DETAILS_PK UNIQUE P VALID VISIBLE NO CARD_ID

    SOE CUSTOMERS SOE CUSTOMERS_PK UNIQUE P VALID VISIBLE NO CUSTOMER_ID

    SOE INVENTORIES SOE INV_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE INV_WAREHOUSE_IX NONUNIQUE R VALID VISIBLE NO WAREHOUSE_ID
    SOE INVENTORY_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,WAREHOUSE_ID

    SOE ORDERS SOE ORD_CUSTOMER_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ORDER_PK UNIQUE P VALID VISIBLE NO ORDER_ID

    SOE ORDER_ITEMS SOE ITEM_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE ORDER_ITEMS_PK UNIQUE P VALID VISIBLE NO ORDER_ID,LINE_ITEM_ID
    SOE ITEM_ORDER_IX NONUNIQUE R VALID VISIBLE NO ORDER_ID

    SOE PRODUCT_DESCRIPTIONS SOE PRD_DESC_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,LANGUAGE_ID

    SOE PRODUCT_INFORMATION SOE PRODUCT_INFORMATION_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID

    SOE WAREHOUSES SOE WAREHOUSES_PK UNIQUE P VALID VISIBLE NO WAREHOUSE_ID

    Results

    Unsurprisingly, the effect on Swingbench is to severely degrade performance.
                                Average Response (ms)
    Transaction                 1: Delivered  2: Drop Secondary
                                     Indexes            Indexes
    Update Customer Details             1.18               3.30
    Browse Products                     2.03             409.21
    Browse Orders                       2.38               2.05
    Customer Registration               3.50              78.51
    Order Products                      5.67              40.97
    Warehouse Query                     6.20               2.82
    Process Orders                     13.42             247.80
    Warehouse Activity Query           14.89             274.19
    Sales Rep Query                    31.76             268.51
    TPS                              1060.81              81.30

    Enabling Automatic Indexing

    There are several configuration settings that are made via the DBMS_AUTO_INDEX.CONFIGURE procedure.
    • I have created a tablespace AUTO_INDEXES_TS and configured Automatic Indexing to create its indexes there.  It is permitted to use 100% of that tablespace.
    CREATE TABLESPACE AUTO_INDEXES_TS DATAFILE SIZE 10M AUTOEXTEND ON NEXT 1M;
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_DEFAULT_TABLESPACE','AUTO_INDEXES_TS');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_SPACE_BUDGET','100');
    • Automatic indexes will be retained until they have not been used for 7 days (the default is 373 days).  This unrealistically low value is so that I can test that they will be dropped later.
    • Manual indexes, the ones created when Swingbench was installed, are not deleted.  
    • The automatic indexing logs, visible in the various DBA_AUTO_INDEX% views, are retained for 7 days.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_RETENTION_FOR_AUTO','7');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_RETENTION_FOR_MANUAL','');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_REPORT_RETENTION','7');
    • Automatic indexing is configured only to apply to the SOE schema.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_SCHEMA', 'SOE', allow => TRUE);
    • Finally, I enable Automatic Indexing and permit it to create indexes.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT');
    You can validate the current parameters by querying DBA_AUTO_INDEX_CONFIG.  This view is based on smb$config.  There are other hidden and undocumented parameters visible in smb$config.
                                      Auto Index Config
    Modified
    PARAMETER_NAME PARAMETER_VALUE LAST_MODIFIED By
    -------------------------------- ------------------------------ ----------------------------- ----------
    AUTO_INDEX_COMPRESSION OFF 27-MAR-20 07.42.36.000000 AM SYSTEM
    AUTO_INDEX_DEFAULT_TABLESPACE AUTO_INDEXES_TS 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_MODE IMPLEMENT 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_REPORT_RETENTION 7 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_RETENTION_FOR_AUTO 7 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_RETENTION_FOR_MANUAL 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_SCHEMA schema IN (SOE) 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_SPACE_BUDGET 100 27-MAR-20 10.28.24.000000 AM SYSTEM
    DBA_AUTOTASK_SCHEDULE_CONTROL shows the two scheduled automatic tasks that form Automatic Indexing.  The Auto Index Task runs when Automatic Indexing is enabled in either implement or report only mode. The Auto SQL Tuning Set (STS) Capture Task runs from when Automatic Indexing is first enabled, but it is not stopped when Automatic Indexing is disabled. Both jobs run every 15 minutes.
               Task                                                      Max Run       Elapsed
    DBID ID TASK_NAME STATUS INTERVAL Time ENABL Time LAST_SCHEDULE_TIME
    ---------- ---- -------------------------------- ---------- -------- ------- ----- ------- --------------------------------
    1400798553 3 Auto Index Task SUCCEEDED 900 3600 TRUE 3 17-MAR-20 03.18.26.997 PM -05:00
    1400798553 5 Auto STS Capture Task SUCCEEDED 900 900 TRUE 0 17-MAR-20 03.17.31.051 PM -05:00

    Test 3: Creating Automatic Indexes

    When I ran Swingbench again the poor performance continued until halfway through the test when Automatic Indexing decided to create some indexes and make them visible. There was a step improvement in performance, although it was nowhere near the 1000 TPS that we started with!
    At the end of the test, there are 5 new indexes, 3 of which are visible, 2 are invisible.
    Table                      Index                                     Cons
    Owner TABLE_NAME Owner INDEX_NAME UNIQUENES Type STATUS VISIBILIT AUT INDEX_KEYS
    ----- -------------------- ----- ------------------------- --------- ---- ------------ --------- --- -------------------------------------------------
    SOE ADDRESSES SOE ADDRESS_CUST_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ADDRESS_PK UNIQUE P VALID VISIBLE NO ADDRESS_ID

    SOE CARD_DETAILS SOE CARD_DETAILS_PK UNIQUE P VALID VISIBLE NO CARD_ID
    SOE SYS_AI_dt4w4vr174j9m NONUNIQUE VALID VISIBLE YES CUSTOMER_ID <-reinstated secondary

    SOE CUSTOMERS SOE CUSTOMERS_PK UNIQUE P VALID VISIBLE NO CUSTOMER_ID

    SOE INVENTORIES SOE INVENTORY_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,WAREHOUSE_ID
    SOE INV_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE INV_WAREHOUSE_IX NONUNIQUE R VALID VISIBLE NO WAREHOUSE_ID

    SOE ORDERS SOE ORDER_PK UNIQUE P VALID VISIBLE NO ORDER_ID
    SOE ORD_CUSTOMER_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE SYS_AI_3z00frhp9vd91 NONUNIQUE VALID VISIBLE YES WAREHOUSE_ID <-original also order_status
    SOE SYS_AI_gbwwy984mc1ft NONUNIQUE VALID VISIBLE YES SALES_REP_ID <-reinstated secondary

    SOE ORDER_ITEMS SOE ORDER_ITEMS_PK UNIQUE P VALID VISIBLE NO ORDER_ID,LINE_ITEM_ID
    SOE ITEM_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE ITEM_ORDER_IX NONUNIQUE R VALID VISIBLE NO ORDER_ID

    SOE PRODUCT_DESCRIPTIONS SOE PRD_DESC_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,LANGUAGE_ID
    SOE SYS_AI_20tjdcuwznyhx NONUNIQUE VALID INVISIBLE YES PRODUCT_ID <-redundant

    SOE PRODUCT_INFORMATION SOE PRODUCT_INFORMATION_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID
    SOE SYS_AI_b9k5zyq0mjwf5 NONUNIQUE VALID INVISIBLE YES CATEGORY_ID <-reinstated invisible secondary

    SOE WAREHOUSES SOE WAREHOUSES_PK UNIQUE P VALID VISIBLE NO WAREHOUSE_ID
    The names of the indexes are determined by applying the SYS_OP_COMBINED_HASH function to the table owner, table name, and indexed column list.
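    The exact recipe is undocumented, but as a sketch of the idea, the 13-character suffix looks like a 64-bit hash rendered in base 32. The argument format passed to SYS_OP_COMBINED_HASH and the encoding alphabet below are my assumptions, not a specification:
    set serveroutput on
    DECLARE
      l_hash     NUMBER;
      l_suffix   VARCHAR2(30);
      l_alphabet CONSTANT VARCHAR2(32) := '0123456789abcdefghijklmnopqrstuv';
    BEGIN
      -- hypothetical argument format: owner, table name, indexed column list
      SELECT sys_op_combined_hash('SOE','ORDERS','"WAREHOUSE_ID"')
      INTO   l_hash
      FROM   dual;
      -- encode the hash in base 32 (assumed alphabet: 0-9 then a-v)
      WHILE l_hash > 0 LOOP
        l_suffix := SUBSTR(l_alphabet, MOD(l_hash,32)+1, 1)||l_suffix;
        l_hash   := TRUNC(l_hash/32);
      END LOOP;
      dbms_output.put_line('SYS_AI_'||l_suffix);
    END;
    /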
    The various dictionary views reveal some of what has happened.
    Note that my invisible indexes are usable.  Automatic indexes start out as unusable and invisible.  Here, Automatic Indexing has rebuilt them as usable, but they are still invisible because they do not reduce logical I/O.  So, I am still bearing the overhead of maintaining them during DML.
    DBA_AUTO_INDEX_STATISTICS reports a summary of the automatic indexing task, confirming the number of indexes built. 
    Tue Mar 17                                                                          page    1
    Auto Index Statistics

    EXECUTION_NAME STAT_NAME VALUE
    -------------------------- ----------------------------- ----------
    SYS_AI_2020-03-17/15:48:28 Space used in bytes 129105920
    SQL plan baselines created 2
    Index candidates 5
    Indexes created (visible) 3
    Indexes created (invisible) 2
    Improvement percentage 88.92
    SQL statements verified 10
    SQL statements improved 4
    SQL statements managed by SPM 2
    DBA_AUTO_INDEX_SQL_ACTIONS shows the SQL commands issued against the SQL tuning set SYS_AUTO_STS.  Automatic Indexing only uses this one tuning set and keeps adding statements to it.  Even if I drop and recreate the SOE schema, the SQL tuning set remains.
    Tue Mar 17                                                                                                                             page    1
    Auto Index SQL Actions

    SQL Plan
    EXECUTION_NAME ACTION_ID SQL_ID Hash Value COMMAND
    -------------------------- ---------- ------------- ---------- ------------------------------
    STATEMENT START_TIME END_TIME ERROR#
    -------------------------------------------------------------------------------- ------------------- ------------------- ----------
    SYS_AI_2020-03-17/15:48:28 12 dy8cxyd3mv1as 2679498789 DISALLOW AUTO INDEX FOR SQL
    declare 15:50:01 17.03.2020 15:50:02 17.03.2020 0
    load_cnt pls_integer;
    begin
    load_cnt := dbms_spm_internal.load_plans_from_sqlset('SYS_AUTO_STS','S
    YS','sql_id = ''dy8cxyd3mv1as''','NO','YES',1000,FALSE,'SYS',FALSE,TRUE); end;

    SYS_AI_2020-03-17/15:48:28 11 dunt7pwuax92s 1878158884 DISALLOW AUTO INDEX FOR SQL
    declare 15:50:01 17.03.2020 15:50:01 17.03.2020 0
    load_cnt pls_integer;
    begin
    load_cnt := dbms_spm_internal.load_plans_from_sqlset('SYS_AUTO_STS','S
    YS','sql_id = ''dunt7pwuax92s''','NO','YES',1000,FALSE,'SYS',FALSE,TRUE); end;

    Initially, the automatic indexes are created unusable and invisible.  Later, the indexes are rebuilt as usable (though still invisible) if they are judged to be beneficial.
    Fri Mar 20                                                                                                                                   page    1
    Auto Index Indexing Actions

    Action Index Table
    EXECUTION_NAME ID INDEX_NAME Owner TABLE_NAME Owner COMMAND
    -------------------------- ------ ------------------------- ----- -------------------- ----- ------------------------------
    STATEMENT START_TIME END_TIME Error#
    -------------------------------------------------------------------------------- ------------------- ------------------- ------
    SYS_AI_2020-03-20/13:56:03 5 SYS_AI_3z00frhp9vd91 SOE ORDERS SOE CREATE INDEX
    CREATE INDEX "SOE"."SYS_AI_3z00frhp9vd91" ON "SOE"."ORDERS"("WAREHOUSE_ID") TA 13:56:08 20.03.2020 13:56:08 20.03.2020 0
    BLESPACE "AUTO_INDEXES_TS" UNUSABLE INVISIBLE AUTO COMPRESS ADVANCED LOW ONLINE

    SYS_AI_2020-03-20/13:56:03 6 SYS_AI_20tjdcuwznyhx SOE PRODUCT_DESCRIPTIONS CREATE INDEX
    CREATE INDEX "SOE"."SYS_AI_20tjdcuwznyhx" ON "SOE"."PRODUCT_DESCRIPTIONS"("PRO 13:56:08 20.03.2020 13:56:08 20.03.2020 0
    DUCT_ID") TABLESPACE "AUTO_INDEXES_TS" UNUSABLE INVISIBLE AUTO COMPRESS ADVANCED
    LOW ONLINE
    DBA_AUTO_INDEX_VERIFICATIONS reports on the tests made on statements before and after the index changes.  You can see that some have improved and some have regressed.
    Tue Mar 17                                                                                        page    1
    Auto Index Verifications

    Original Auto Index Original Auto Index
    EXECUTION_NAME SQL_ID Plan Hash Plan Hash Buffer Gets Buffer Gets STATUS
    -------------------------- ------------- ---------- ---------- ----------- ----------- ------------
    SYS_AI_2020-03-17/15:48:28 0sh0fn7r21020 3619984409 3900469033 37784 130 IMPROVED
    SYS_AI_2020-03-17/16:18:29 3900469033 3900469033 1316 135 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 200mw76ta6n1r 2844209861 2671811931 37769 3555 IMPROVED
    SYS_AI_2020-03-17/16:18:29 2671811931 2671811931 3278 3596 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 28tr1bjf4t2uh 2692802960 3836151239 37764 3238 IMPROVED
    SYS_AI_2020-03-17/16:18:29 3836151239 3836151239 3272 3442 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 9dt3dqym1tqzw 3954032495 1068597273 46 4 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 a90pbxt8zukdr 1513149408 3900469033 67 1 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 amaapqt3p9qd0 2597291669 1494990609 14645 23 IMPROVED
    SYS_AI_2020-03-17/16:18:29 1494990609 1494990609 3 23 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 b4p66t3uznnuc 3551246360 463531433 4038 4406 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 dunt7pwuax92s 1878158884 2671811931 13 2965 REGRESSED

    SYS_AI_2020-03-17/15:48:28 dy8cxyd3mv1as 2679498789 2126884530 155 298 REGRESSED

    SYS_AI_2020-03-17/15:48:28 g1znkya370htg 3571181773 896069541 74 42 UNCHANGED

    This testing mechanism generally prevents Automatic Indexing from creating indexes that are not used.  However, Richard Foote has found an exception where the number of buffer gets goes down, but the optimizer cost goes up.
    The decision by the tuning advisor to propose an index is determined by optimizer cost, and the decision to use a valid, visible index is also determined by optimizer cost.  So I find it slightly incongruous that the decision whether to make a candidate index visible, and therefore available to the application, is determined by logical I/O, CPU consumption, and elapsed time, but not at all by optimizer cost.

    Results

    The entirety of this test was run with the automatically created indexes in place.
                                Average Response (ms)
    Transaction                 1: Delivered  2: Drop Secondary  3: Automatic
                                     Indexes            Indexes      Indexing
    Update Customer Details             1.18               3.30          3.32
    Browse Products                     2.03             409.21        478.52
    Browse Orders                       2.38               2.05          2.01
    Customer Registration               3.50              78.51          5.91
    Order Products                      5.67              40.97         50.34
    Warehouse Query                     6.20               2.82          2.85
    Process Orders                     13.42             247.80          5.39
    Warehouse Activity Query           14.89             274.19         11.43
    Sales Rep Query                    31.76             268.51         14.45
    TPS                              1060.81              81.30        137.40

    Comparison with No Secondary Indexes

    I have used the execution statistics in DBA_HIST_SQLSTAT for statements captured by AWR during each test and compared the execution plans and average elapsed time for each.
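    The comparison query is straightforward; a sketch, assuming each test sits in a known range of AWR snapshots (the substitution variables are placeholders for the snapshot boundaries):
    WITH t2 AS (
     SELECT sql_id, plan_hash_value, MAX(optimizer_cost) opt_cost
     ,      SUM(executions_delta) execs
     ,      SUM(elapsed_time_delta)/1e6 ela_secs
     FROM   dba_hist_sqlstat
     WHERE  snap_id BETWEEN &test2_begin_snap AND &test2_end_snap
     GROUP  BY sql_id, plan_hash_value
    ), t3 AS (
     SELECT sql_id, plan_hash_value, MAX(optimizer_cost) opt_cost
     ,      SUM(executions_delta) execs
     ,      SUM(elapsed_time_delta)/1e6 ela_secs
     FROM   dba_hist_sqlstat
     WHERE  snap_id BETWEEN &test3_begin_snap AND &test3_end_snap
     GROUP  BY sql_id, plan_hash_value
    )
    SELECT t2.sql_id
    ,      t2.plan_hash_value, t2.execs, t2.ela_secs/NULLIF(t2.execs,0) avg_ela_2
    ,      t3.plan_hash_value, t3.execs, t3.ela_secs/NULLIF(t3.execs,0) avg_ela_3
    FROM   t2 JOIN t3 ON t3.sql_id = t2.sql_id
    ORDER  BY t2.ela_secs DESC;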
    Where the plans change, they change for the better, so Automatic Indexing is doing its job.
                              Average                                                              Average       %    %Num
    Test SQL Plan Opt. Num Elapsed Elapsed Test SQL Plan Opt. Num Elapsed Elapsed Time Execs
    ID SQL_ID Hash Value Cost Execs Time Time ? ID SQL_ID Hash Value Cost Execs Time Time Diff Diff
    ---- ------------- ---------- -------- -------- --------- -------- - ---- ------------- ---------- -------- -------- --------- -------- ------- -------
    2 g9wsbkb2jag3j 1005345217 229 15028 6108.13 .4064 = 3 g9wsbkb2jag3j 1005345217 229 31523 15046.99 .4773 17 110
    2 34mt4skacwwwd 235854103 73 7547 295.26 .0391 = 3 34mt4skacwwwd 235854103 73 15601 766.31 .0491 26 107
    2 g1znkya370htg 3571181773 45 224885 59.99 .0003 = 3 g1znkya370htg 3571181773 45 470063 111.48 .0002 -11 109
    2 djj5txv2dzwb6 3241608609 1 263982 38.19 .0001 = 3 djj5txv2dzwb6 3241608609 1 550563 77.90 .0001 -2 109
    2 09pzy8x10gjkg 0 1 139639 24.96 .0002 = 3 09pzy8x10gjkg 0 1 292179 53.20 .0002 2 109
    2 200mw76ta6n1r 2844209861 10151 1514 405.73 .2680 ! 3 200mw76ta6n1r 2671811931 3257 3268 46.67 .0143 -95 116
    2 a6hdpzrqqhc7d 0 1 70858 26.01 .0004 = 3 a6hdpzrqqhc7d 0 1 148211 41.21 .0003 -24 109
    2 28tr1bjf4t2uh 2692802960 10140 1575 430.72 .2735 ! 3 28tr1bjf4t2uh 3836151239 3245 3118 35.58 .0114 -96 98
    2 982zxphp8ht6c 1666523684 2 407633 14.10 .0000 = 3 982zxphp8ht6c 1666523684 2 849104 30.01 .0000 2 108
    2 csasr8ct2051v 900611645 3 263976 13.77 .0001 = 3 csasr8ct2051v 900611645 3 550572 29.06 .0001 1 109
    2 0sh0fn7r21020 3619984409 15124 3019 747.00 .2474 ! 3 0sh0fn7r21020 3900469033 4695 5030 25.60 .0051 -98 67
    2 0sh0fn7r21020 3619984409 15124 3019 747.00 .2474 ! 3 0sh0fn7r21020 2629004565 14875 1208 6.04 .0050 -98 -60
    2 5g00dq4fxwnsw 2141863993 3 95832 7.13 .0001 = 3 5g00dq4fxwnsw 2141863993 3 292176 21.40 .0001 -2 205
    2 2yp5w5a36s5xv 1628223527 3 48610 5.50 .0001 = 3 2yp5w5a36s5xv 1628223527 3 148215 12.84 .0001 -23 205
    2 4a7nqf7k0ztyc 0 1 30356 6.03 .0002 = 3 4a7nqf7k0ztyc 0 1 63339 12.33 .0002 -2 109
    2 49d9qhgsr8w9h 0 1 20825 3.40 .0002 = 3 49d9qhgsr8w9h 0 1 63339 10.44 .0002 1 204
    2 8uk8bquk453q8 3072215225 2 48612 5.61 .0001 = 3 8uk8bquk453q8 3072215225 2 134571 8.51 .0001 -45 177
    2 cr72yp489p3jw 0 1 20824 2.57 .0001 = 3 cr72yp489p3jw 0 1 44297 6.97 .0002 27 113
    2 g3kf1ppky3627 2480532011 8 67021 3.00 .0000 = 3 g3kf1ppky3627 2480532011 6 143326 6.57 .0000 2 114
    2 0t61wk161zz87 1544532951 2 20823 2.26 .0001 = 3 0t61wk161zz87 1544532951 2 13799 1.64 .0001 9 -34
    2 amaapqt3p9qd0 2597291669 4276 75096 5348.00 .0712 ! 3 amaapqt3p9qd0 1494990609 7 34857 1.40 .0000 -100 -54
    2 8xqdxjkbt9ghg 0 1 5681 1.93 .0003 = 3 8xqdxjkbt9ghg 0 1 4129 1.34 .0003 -4 -27
    2 6k3uuf3g8pwh6 1628223527 3 5167 1.43 .0003 = 3 6k3uuf3g8pwh6 1628223527 3 3527 1.13 .0003 16 -32
    2 a9cv97h3dazfh 1197098199 3 11144 1.48 .0001 = 3 a9cv97h3dazfh 1197098199 3 7665 1.09 .0001 7 -31
    2 0c11vprf4881w 856749079 6 11370 .85 .0001 = 3 0c11vprf4881w 856749079 7 10487 .85 .0001 9 -8
    2 3rxkss61q68su 1322380957 5 4821 .31 .0001 = 3 3rxkss61q68su 1322380957 5 9281 .64 .0001 8 93
    2 9v9ky32fg9hy7 104664550 2 4140 .61 .0001 = 3 9v9ky32fg9hy7 104664550 2 4121 .55 .0001 -11 -0
    2 4abyshv6jmtdk 140963536 123 15 .05 .0036 = 3 4abyshv6jmtdk 140963536 123 20 .08 .0039 9 33

    Comparison with Delivered Indexes

    However, if we compare the delivered indexes against just the primary indexes and those created by Automatic Indexing, a number of statements have degraded, one particularly severely.
                                                      Average                                                              Average       %    %Num
    Test SQL Plan Opt. Num Elapsed Elapsed Test SQL Plan Opt. Num Elapsed Elapsed Time Execs
    ID SQL_ID Hash Value Cost Execs Time Time ? ID SQL_ID Hash Value Cost Execs Time Time Diff Diff
    ---- ------------- ---------- -------- -------- --------- -------- - ---- ------------- ---------- -------- -------- --------- -------- ------- -------
    1 g9wsbkb2jag3j 574689976 5 148925 9.33 .0001 ! 3 g9wsbkb2jag3j 1005345217 229 31523 15046.99 .4773 761882 -79
    1 34mt4skacwwwd 235854103 74 90568 2884.16 .0318 = 3 34mt4skacwwwd 235854103 73 15601 766.31 .0491 54 -83
    1 g1znkya370htg 124060720 26 2725529 331.81 .0001 ! 3 g1znkya370htg 3571181773 45 470063 111.48 .0002 95 -83
    1 djj5txv2dzwb6 3241608609 1 3179667 435.37 .0001 = 3 djj5txv2dzwb6 3241608609 1 550563 77.90 .0001 3 -83
    1 09pzy8x10gjkg 0 1 1687520 285.05 .0002 = 3 09pzy8x10gjkg 0 1 292179 53.20 .0002 8 -83
    1 200mw76ta6n1r 1448083145 1437 18129 367.09 .0202 ! 3 200mw76ta6n1r 2671811931 3257 3268 46.67 .0143 -29 -82
    1 a6hdpzrqqhc7d 0 1 857616 244.55 .0003 = 3 a6hdpzrqqhc7d 0 1 148211 41.21 .0003 -2 -83
    1 28tr1bjf4t2uh 2220165490 1425 17921 167.57 .0094 ! 3 28tr1bjf4t2uh 3836151239 3245 3118 35.58 .0114 22 -83
    1 982zxphp8ht6c 1666523684 2 4903566 171.31 .0000 = 3 982zxphp8ht6c 1666523684 2 849104 30.01 .0000 1 -83
    1 csasr8ct2051v 900611645 3 3179610 159.11 .0001 = 3 csasr8ct2051v 900611645 3 550572 29.06 .0001 5 -83
    1 0sh0fn7r21020 1055577880 1258 36654 175.46 .0048 ! 3 0sh0fn7r21020 3900469033 4695 5030 25.60 .0051 6 -86
    1 5g00dq4fxwnsw 2141863993 3 1687532 120.78 .0001 = 3 5g00dq4fxwnsw 2141863993 3 292176 21.40 .0001 2 -83
    1 2yp5w5a36s5xv 1628223527 3 857624 114.81 .0001 = 3 2yp5w5a36s5xv 1628223527 3 148215 12.84 .0001 -35 -83
    1 4a7nqf7k0ztyc 0 1 363873 109.76 .0003 = 3 4a7nqf7k0ztyc 0 1 63339 12.33 .0002 -35 -83
    1 49d9qhgsr8w9h 0 1 363871 55.61 .0002 = 3 49d9qhgsr8w9h 0 1 63339 10.44 .0002 8 -83
    1 8uk8bquk453q8 3072215225 2 857622 51.75 .0001 = 3 8uk8bquk453q8 3072215225 2 134571 8.51 .0001 5 -84
    1 cr72yp489p3jw 0 1 363878 52.63 .0001 = 3 cr72yp489p3jw 0 1 44297 6.97 .0002 9 -88
    1 g3kf1ppky3627 2480532011 8 1180857 51.46 .0000 = 3 g3kf1ppky3627 2480532011 6 143326 6.57 .0000 5 -88
    1 0sh0fn7r21020 1055577880 1258 36654 175.46 .0048 ! 3 0sh0fn7r21020 2629004565 14875 1208 6.04 .0050 4 -97
    1 0t61wk161zz87 1544532951 2 363871 37.74 .0001 = 3 0t61wk161zz87 1544532951 2 13799 1.64 .0001 14 -96
    1 amaapqt3p9qd0 3722429161 8 908901 32.04 .0000 ! 3 amaapqt3p9qd0 1494990609 7 34857 1.40 .0000 14 -96
    1 8xqdxjkbt9ghg 0 1 69829 14.61 .0002 = 3 8xqdxjkbt9ghg 0 1 4129 1.34 .0003 56 -94
    1 6k3uuf3g8pwh6 1628223527 3 90569 28.00 .0003 = 3 6k3uuf3g8pwh6 1628223527 3 3527 1.13 .0003 4 -96
    1 a9cv97h3dazfh 1197098199 3 147637 18.88 .0001 = 3 a9cv97h3dazfh 1197098199 3 7665 1.09 .0001 11 -95
    1 0c11vprf4881w 856749079 8 223512 15.24 .0001 = 3 0c11vprf4881w 856749079 7 10487 .85 .0001 19 -95
    1 3rxkss61q68su 1322380957 5 176508 11.20 .0001 = 3 3rxkss61q68su 1322380957 5 9281 .64 .0001 9 -95
    1 9v9ky32fg9hy7 104664550 2 43191 2.69 .0001 = 3 9v9ky32fg9hy7 104664550 2 4121 .55 .0001 113 -90
    1 4h624tuydrjnh 3828985807 3 62578 4.69 .0001 = 3 4h624tuydrjnh 3828985807 3 4131 .46 .0001 50 -93
    1 95hgbb2kkcvvg 3419397814 12934 1 4.09 4.0858 !
    1 3gs4005kgkhxu 296924608 6423 1 4.05 4.0539 !

    Test 4: Manual Tuning

    Then I looked at whether I could get back to the original performance by manually tuning the top SQL statements rather than reinstating all the indexes that I had dropped.  I found I needed to create just four more indexes.  
    The first two are reinstated indexes that were originally part of the SOE schema but were dropped as secondary indexes.  
    CREATE INDEX SOE.CUST_FUNC_LOWER_NAME_IX 
    ON SOE.CUSTOMERS (LOWER(CUST_LAST_NAME), LOWER(CUST_FIRST_NAME))
    TABLESPACE SOE PARALLEL 8
    /
    CREATE INDEX SOE.PROD_CATEGORY_IX ON SOE.PRODUCT_INFORMATION (CATEGORY_ID)
    TABLESPACE SOE PARALLEL 8
    /
    The other two are new indexes that were not originally present.
    CREATE INDEX SOE.DMK_ORDER_STATUS ON SOE.ORDERS (ORDER_STATUS) 
    TABLESPACE SOE PARALLEL 8
    /
    CREATE INDEX SOE.DMK_WAREHOUSE_ORDER_DATE ON SOE.ORDERS (WAREHOUSE_ID, ORDER_DATE)
    TABLESPACE SOE PARALLEL 8
    /

    Results

    I now have 22 visible indexes instead of the original 27, and the performance is better than with the delivered indexes.
                             Average Response (ms)
                          1: Delivered  2: Drop Secondary  3: Automatic     4: Manual
Transaction                    Indexes            Indexes      Indexing        Tuning
------------------------  ------------  -----------------  ------------  ------------
Update Customer Details           1.18               3.30          3.32          3.51
Browse Products                   2.03             409.21        478.52          1.93
Browse Orders                     2.38               2.05          2.01          2.12
Customer Registration             3.50              78.51          5.91          5.92
Order Products                    5.67              40.97         50.34          1.99
Warehouse Query                   6.20               2.82          2.85          3.00
Process Orders                   13.42             247.80          5.39          4.95
Warehouse Activity Query         14.89             274.19         11.43         20.29
Sales Rep Query                  31.76             268.51         14.45          3.74
TPS                            1060.81              81.30        137.40       1166.49

    Comparison with Delivered Indexes

    We can see from the SQL statistics comparison that most of the original plans have been reinstated, and elsewhere there are both improvements and regressions.
                                                               Average                                                              Average       %    %Num
    Test SQL Plan Opt. Num Elapsed Elapsed Test SQL Plan Opt. Num Elapsed Elapsed Time Execs
    ID SQL_ID Hash Value Cost Execs Time Time ? ID SQL_ID Hash Value Cost Execs Time Time Diff Diff
    ---- ------------- ---------- -------- -------- --------- -------- - ---- ------------- ---------- -------- -------- --------- -------- ------- -------
    1 djj5txv2dzwb6 3241608609 1 3179667 435.37 .0001 = 4 djj5txv2dzwb6 3241608609 1 3787684 533.89 .0001 3 19
    1 g1znkya370htg 124060720 26 2725529 331.81 .0001 ! 4 g1znkya370htg 684158979 19 3250699 491.48 .0002 24 19
    1 28tr1bjf4t2uh 2220165490 1425 17921 167.57 .0094 ! 4 28tr1bjf4t2uh 3836151239 6155 21756 435.75 .0200 114 21
    1 09pzy8x10gjkg 0 1 1687520 285.05 .0002 = 4 09pzy8x10gjkg 0 1 2011130 357.96 .0002 5 19
    1 a6hdpzrqqhc7d 0 1 857616 244.55 .0003 = 4 a6hdpzrqqhc7d 0 1 1021001 304.66 .0003 5 19
    1 982zxphp8ht6c 1666523684 2 4903566 171.31 .0000 = 4 982zxphp8ht6c 1666523684 2 5846476 215.40 .0000 5 19
    1 csasr8ct2051v 900611645 3 3179610 159.11 .0001 = 4 csasr8ct2051v 900611645 3 3787526 197.96 .0001 4 19
    1 0sh0fn7r21020 1055577880 1258 36654 175.46 .0048 ! 4 0sh0fn7r21020 3900469033 11026 43379 195.10 .0045 -6 18
    1 5g00dq4fxwnsw 2141863993 3 1687532 120.78 .0001 = 4 5g00dq4fxwnsw 2141863993 3 2011090 148.96 .0001 3 19
    1 2yp5w5a36s5xv 1628223527 3 857624 114.81 .0001 = 4 2yp5w5a36s5xv 1628223527 3 1020995 115.62 .0001 -15 19
    1 4a7nqf7k0ztyc 0 1 363873 109.76 .0003 = 4 4a7nqf7k0ztyc 0 1 432444 95.85 .0002 -27 19
    1 200mw76ta6n1r 1448083145 1437 18129 367.09 .0202 ! 4 200mw76ta6n1r 437111724 371 21657 72.86 .0034 -83 19
    1 49d9qhgsr8w9h 0 1 363871 55.61 .0002 = 4 49d9qhgsr8w9h 0 1 432448 67.47 .0002 2 19
    1 g3kf1ppky3627 2480532011 8 1180857 51.46 .0000 = 4 g3kf1ppky3627 2480532011 6 1406867 67.09 .0000 9 19
    1 cr72yp489p3jw 0 1 363878 52.63 .0001 = 4 cr72yp489p3jw 0 1 432449 64.74 .0001 4 19
    1 8uk8bquk453q8 3072215225 2 857622 51.75 .0001 = 4 8uk8bquk453q8 3072215225 2 1020941 63.69 .0001 3 19
    1 34mt4skacwwwd 235854103 74 90568 2884.16 .0318 ! 4 34mt4skacwwwd 1567979920 74 108274 48.63 .0004 -99 20
    1 0t61wk161zz87 1544532951 2 363871 37.74 .0001 = 4 0t61wk161zz87 1544532951 2 432449 46.49 .0001 4 19
    1 8xqdxjkbt9ghg 0 1 69829 14.61 .0002 = 4 8xqdxjkbt9ghg 0 1 195205 41.44 .0002 1 180
    1 amaapqt3p9qd0 3722429161 8 908901 32.04 .0000 ! 4 amaapqt3p9qd0 1494990609 5 1082090 39.24 .0000 3 19
    1 a9cv97h3dazfh 1197098199 3 147637 18.88 .0001 = 4 a9cv97h3dazfh 1197098199 3 269481 35.83 .0001 4 83
    1 3rxkss61q68su 1322380957 5 176508 11.20 .0001 = 4 3rxkss61q68su 1322380957 5 293179 32.62 .0001 75 66
    1 6k3uuf3g8pwh6 1628223527 3 90569 28.00 .0003 = 4 6k3uuf3g8pwh6 1628223527 3 98133 20.24 .0002 -33 8
    1 0c11vprf4881w 856749079 8 223512 15.24 .0001 = 4 0c11vprf4881w 856749079 6 213021 17.96 .0001 24 -5
    1 g9wsbkb2jag3j 574689976 5 148925 9.33 .0001 = 4 g9wsbkb2jag3j 574689976 7 54410 4.41 .0001 29 -63

    Test 5: Managing Manual Indexing

Finally, in this test, I started with all the delivered SOE indexes and configured Automatic Indexing to consider dropping both automatic and manual indexes that have not been used for an hour (the default retention is 373 days; I have set this absurdly low value just to demonstrate the behaviour of the feature).  Automatic Indexing was initially running in report-only mode when I started Swingbench.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_DEFAULT_TABLESPACE','AUTO_INDEXES_TS');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_SPACE_BUDGET','100');

    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_COMPRESSION','OFF');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_RETENTION_FOR_AUTO','.041666');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_RETENTION_FOR_MANUAL','.041666');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_REPORT_RETENTION','1');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_SCHEMA', 'SOE', allow => TRUE);

    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','REPORT ONLY');
    After half an hour I switched to 'implement' mode.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT');
    Very quickly (because I had previously run this test and the statements were already in the SQL Tuning set) I was left with just 17 indexes.
    Table                      Index                                     Cons
    Owner TABLE_NAME Owner INDEX_NAME UNIQUENES Type STATUS VISIBILIT AUT INDEX_KEYS
    ----- -------------------- ----- ------------------------- --------- ---- ------------ --------- --- -------------------------------------------------
    SOE ADDRESSES SOE ADDRESS_CUST_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ADDRESS_PK UNIQUE P VALID VISIBLE NO ADDRESS_ID

    SOE CARD_DETAILS SOE CARDDETAILS_CUST_IX NONUNIQUE VALID VISIBLE NO CUSTOMER_ID
    SOE CARD_DETAILS_PK UNIQUE P VALID VISIBLE NO CARD_ID

    SOE CUSTOMERS SOE CUSTOMERS_PK UNIQUE P VALID VISIBLE NO CUSTOMER_ID
    SOE CUST_FUNC_LOWER_NAME_IX NONUNIQUE VALID VISIBLE NO SYS_NC00017$,SYS_NC00018$

    SOE INVENTORIES SOE INVENTORY_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,WAREHOUSE_ID

    SOE ORDERS SOE ORDER_PK UNIQUE P VALID VISIBLE NO ORDER_ID
    SOE ORD_CUSTOMER_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ORD_SALES_REP_IX NONUNIQUE VALID VISIBLE NO SALES_REP_ID
    SOE ORD_WAREHOUSE_IX NONUNIQUE VALID VISIBLE NO WAREHOUSE_ID,ORDER_STATUS

    SOE ORDER_ITEMS SOE ITEM_ORDER_IX NONUNIQUE R VALID VISIBLE NO ORDER_ID
    SOE ORDER_ITEMS_PK UNIQUE P VALID VISIBLE NO ORDER_ID,LINE_ITEM_ID

    SOE PRODUCT_DESCRIPTIONS SOE PRD_DESC_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,LANGUAGE_ID

    SOE PRODUCT_INFORMATION SOE PRODUCT_INFORMATION_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID
    SOE PROD_CATEGORY_IX NONUNIQUE VALID VISIBLE NO CATEGORY_ID

    SOE WAREHOUSES SOE WAREHOUSES_PK UNIQUE P VALID VISIBLE NO WAREHOUSE_ID

    17 rows selected.

The automatic indexing activity reports only cover actions on automatic indexes.  They do not report decisions to drop, or not to drop, manual indexes.  I only know the indexes have gone because I manually compared the remaining indexes with the initial set.
    It has left 5 secondary indexes, but it has removed 3 of the 6 indexes on foreign keys that DROP_SECONDARY_INDEXES left intact.
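The index listing above was produced with a query like this sketch.  In 19c, DBA_INDEXES has an AUTO column that distinguishes automatic from manual indexes; the full report also joins to DBA_CONSTRAINTS and DBA_IND_COLUMNS for the constraint type and key columns.

SELECT i.table_name, i.index_name, i.uniqueness, i.status, i.visibility, i.auto
FROM   dba_indexes i
WHERE  i.owner = 'SOE'
ORDER BY i.table_name, i.index_name;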
    We can see from the performance chart that there is a significant drop in the performance of the test after about 30 minutes when the automatic indexing job dropped the indexes.

    Conclusion

    Automatic Indexing does what it claims, but I think it doesn't go far enough when it comes to identifying new indexes.  In particular, it did not recreate the function-based index (on the lower-case customer names) that makes the most significant difference in performance to Swingbench.
    Oracle makes bold claims for improvements in performance via automatically created indexes.  However, my experience across the SOE benchmark as a whole was that I saw only modest performance gains relative to the point where I dropped the secondary indexes.  The performance of the SQL statements that made use of the automatic indexes certainly did improve, and significantly.  Automatic Indexing generally doesn't create indexes that are not used, but Richard Foote has shown that there are exceptions where the number of buffer gets goes down but the optimizer cost goes up.
    As Tim Hall says, you have to be 'particularly brave' to DROP_SECONDARY_INDEXES.  My experience was that doing so significantly degraded performance, and then Automatic Indexing did not fully mitigate that.  You will be left trying to work out which indexes you have to put back yourself.
    In the current release, I think allowing Automatic Indexing to remove manual indexes would be extremely dangerous.  You wouldn't know when manual indexes, including those on foreign keys, were removed and again you could be left dealing with performance issues.  If, as you should, you use foreign keys to enforce referential integrity you could get TM locking issues.
I think the SOE benchmark is a fair test of Automatic Indexing.  My manual tuning, which not only restored the original performance but improved upon it, was not significantly different from anything I have seen on typical ERP or other OLTP systems.  It was limited to adding indexes, and I still ended up with fewer indexes.
It is possible to rebuild, coalesce or shrink automatic indexes; however, you cannot drop or otherwise alter them.  The procedure DROP_AUTO_INDEXES in DBMS_AUTO_INDEX is not documented and does not currently work (in 19.3-20.2).  I think it would be very difficult to let Automatic Indexing do some of the work and then do some manual tuning alongside it; you would just get in each other's way.  The activity reports and the index verification information may be a useful source of information during manual tuning, but that is using the feature as just another tuning advisor.  Automatic Indexing is clearly intended to be an autonomous feature: either you turn it on and let it do its thing, or not.
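For example, these maintenance operations are permitted on an automatic index (the index name below is made up for illustration), whereas an attempt to drop the same index raises an error.

ALTER INDEX soe."SYS_AI_fvk5t8wb3vgmw" REBUILD ONLINE;
ALTER INDEX soe."SYS_AI_fvk5t8wb3vgmw" COALESCE;
ALTER INDEX soe."SYS_AI_fvk5t8wb3vgmw" SHRINK SPACE;

DROP INDEX soe."SYS_AI_fvk5t8wb3vgmw";  -- raises an error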
Added 6.5.2020: Richard Foote has blogged on this point since I first wrote this article.
To be fair, this is the initial release (though testing on 20c on a virtual machine produced the same behaviour), and like other Oracle database features before it, it will mature with time.  However, at the moment, I think we are a long way from being able to just turn it on and walk away.

    Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 1. Upload to Object Storage

    This blog is the first in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.
    Last year I wrote a blog titled Reading the Active Session History Compressed Export File in eDB360/SQLd360 as an External Table.  I set myself the challenge of doing the same thing with an Autonomous database.  I would imagine that these are commonly used Oracle Cloud operations, yet I found the documentation was spread over a number of places, and it took me a while to get it right. So, I hope you find this series helpful.

    Install OCI

I could upload my data file directly into an object storage bucket through the browser, but that would mean first copying it to a Windows desktop, which is not a good option for very large files.  Instead, I am going to install the OCI Command Line Interface onto the Linux VM where my data file resides (see OCI CLI Quickstart Guide).
    I am installing this into the oracle user on a Linux VM where the Oracle database has previously been installed, so I just accepted all the defaults.
    bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"

    Set up Token-Based Authentication for OCI

I couldn't get the instructions for generating a token without a browser to work.  Instead, I installed the OCI CLI on a Windows machine, generated a token there, and transferred it to my Linux VM (see Token-based Authentication for the CLI).
    C:\Users\david.kurtz>oci session authenticate
    Enter a region (e.g. ap-mumbai-1, ap-seoul-1, ap-sydney-1, ap-tokyo-1, ca-toronto-1, eu-frankfurt-1, eu-zurich-1, sa-saopaulo-1, uk-london-1, us-ashburn-1, us-gov-ashburn-1, us-gov-chicago-1, us-gov-phoenix-1, us-langley-1, us-luke-1, us-phoenix-1): uk-london-1
    Please switch to newly opened browser window to log in!
    Completed browser authentication process!
    Config written to: C:\Users\david.kurtz\.oci\config

    Try out your newly created session credentials with the following example command:

    oci iam region list --config-file C:\Users\david.kurtz\.oci\config --profile DEFAULT --auth security_token
    If I run the suggested example command, I get this response with the list of OCI regions.
    {
    "data": [

    {
    "key": "LHR",
    "name": "uk-london-1"
    },

    ]
    }

    Export OCI Profile

    Now I can export the profile to a zip file
    C:\Users\david.kurtz>oci session export --profile DEFAULT --output-file DEFAULT
    File DEFAULT.zip already exists, do you want to overwrite it? [y/N]: y
    Exporting profile: DEFAULT from config file: C:\Users\david.kurtz\.oci\config
    Export file written to: C:\Users\david.kurtz\DEFAULT.zip

    Import OCI Profile

    I can transfer this zip file to my Linux VM and import it.
    [oracle@oracle-database .oci]$ oci session import --session-archive ./DEFAULT.zip --force
    Config already contains a profile with the same name as the archived profile: DEFAULT. Provide an alternative name for the imported profile: myprofile
    Imported profile myprofile written to: /home/oracle/.oci/config

    Try out your newly imported session credentials with the following example command:

    oci iam region list --config-file /home/oracle/.oci/config --profile myprofile --auth security_token
    I can test it by again getting the list of OCI regions.

    Upload a File

    I have created a bucket on OCI.
I could upload the file through the OCI web interface, but I want to use the command line from my Linux VM.
    [oracle@oracle-database ~]$ oci os object put --bucket-name bucket-20200505-1552 --file /media/sf_temp/dba_hist_active_sess_history.txt.gz --disable-parallel-uploads --config-file /home/oracle/.oci/config --profile myprofile --auth security_token
    Upload ID: 1ad452f7-ab49-a24b-2fe9-f55f565cdf40
    Split file into 2 parts for upload.
    Uploading object [####################################] 100%
    {
    "etag": "66681c40-4e11-4b73-baf9-cc1e4c3ebd5f",
    "last-modified": "Wed, 06 May 2020 15:17:03 GMT",
    "opc-multipart-md5": "MFdfU7vGZlJ5Mb4nopxtpw==-2"
    }
    I can see the file in the bucket via the web interface, and I can see that the size and the MD5 checksum are both correct.
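I can also verify the upload from the command line.  The object head command returns the object metadata, including the size and MD5 checksum (this reuses the same bucket, profile and authentication options as the upload above):

[oracle@oracle-database ~]$ oci os object head --bucket-name bucket-20200505-1552 --name dba_hist_active_sess_history.txt.gz --config-file /home/oracle/.oci/config --profile myprofile --auth security_token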
    In the next post, I will explain how to read the file from Object Storage using an External Table.

    Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 2. Reading from Object Storage with an External Table

    This blog is the second in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.

    Create A Credential 

    First, I need to create a credential that the database will use to connect to the OCI Object Storage. This is not the same as the credential that the OCI CLI used to connect.
    In the OCI interface navigate to Identity ➧ Users ➧ User Details, and create an Authentication Token.
    It is important to copy the token at this point because you will not see it again.
    Now you can put the token into a database credential.
    connect admin/Password2020@gofaster1b_tp 
    BEGIN
    DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'MY_BUCKET',
    username=> 'oraclecloud1@go-faster.co.uk',
password=> 'K7xfi-mG<1Z:dq#88;1m'
);
END;
/
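I can check that the credential was created and is enabled (USER_CREDENTIALS is the schema-level counterpart of the DBA_CREDENTIALS view that I query later in this series):

SELECT credential_name, username, enabled
FROM   user_credentials;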
Note: The visibility of the bucket that I created earlier is private by default, so I can only access it with an authenticated user.  If I were to create a credential for an unauthenticated user, it could only access a public bucket.  Otherwise, I would obtain an error such as the following:
    ORA-29913: error in executing ODCIEXTTABLEOPEN callout
    ORA-20404: Object not found -
    https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-202005
    05-1552/o/dba_hist_active_sess_history.txt.gz
    ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 964
    ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD_INTERNAL", line 3891
    ORA-06512: at line 1

    Create an External Table

In my blog Reading the Active Session History Compressed Export File in eDB360/SQLd360 as an External Table, I showed how to create an external table to read a compressed file.  Now I am going to do the same thing, except that this time I am going to read the file from OCI Object Storage into an external table created with DBMS_CLOUD.
    • I have to provide a list of columns in the external table and a list of fields in the flat file.
• N.B. Some column names end in a # symbol.  These must be put in double-quotes in the field list, though this is not needed in the column list.
• The Access Parameters section of the ORACLE_LOADER access driver that I used to create the external table becomes the contents of the format parameter.  I have created a JSON object to hold the various parameters.  The parameters are not exactly the same; in fact, I have added some.  See also DBMS_CLOUD Package Format Options.
    DROP TABLE ash_hist PURGE;
    BEGIN
    DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name =>'ASH_HIST',
    credential_name =>'MY_BUCKET',
    file_uri_list =>'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz',
    format => json_object('blankasnull' value 'true'
    ,'compression' value 'gzip'
    ,'dateformat' value 'YYYY-MM-DD/HH24:mi:ss'
    ,'timestampformat' value 'YYYY-MM-DD/HH24:mi:ss.ff'
    ,'delimiter' value '<,>'
    ,'ignoreblanklines' value 'true'
    ,'rejectlimit' value '10'
    ,'removequotes' value 'true'
    ,'trimspaces' value 'lrtrim'
    ),
    column_list => 'SNAP_ID NUMBER
    ,DBID NUMBER
    ,INSTANCE_NUMBER NUMBER
    ,SAMPLE_ID NUMBER
    ,SAMPLE_TIME TIMESTAMP(3)
    ,SESSION_ID NUMBER
    ,SESSION_SERIAL# NUMBER
    ,SESSION_TYPE VARCHAR2(10)
    ,FLAGS NUMBER
    ,USER_ID NUMBER
    -----------------------------------------
    ,SQL_ID VARCHAR2(13)
    ,IS_SQLID_CURRENT VARCHAR2(1)
    ,SQL_CHILD_NUMBER NUMBER
    ,SQL_OPCODE NUMBER
    ,SQL_OPNAME VARCHAR2(64)
    ,FORCE_MATCHING_SIGNATURE NUMBER
    ,TOP_LEVEL_SQL_ID VARCHAR2(13)
    ,TOP_LEVEL_SQL_OPCODE NUMBER
    ,SQL_PLAN_HASH_VALUE NUMBER
    ,SQL_FULL_PLAN_HASH_VALUE NUMBER
    -----------------------------------------
    ,SQL_ADAPTIVE_PLAN_RESOLVED NUMBER
    ,SQL_PLAN_LINE_ID NUMBER
    ,SQL_PLAN_OPERATION VARCHAR2(64)
    ,SQL_PLAN_OPTIONS VARCHAR2(64)
    ,SQL_EXEC_ID NUMBER
    ,SQL_EXEC_START DATE
    ,PLSQL_ENTRY_OBJECT_ID NUMBER
    ,PLSQL_ENTRY_SUBPROGRAM_ID NUMBER
    ,PLSQL_OBJECT_ID NUMBER
    ,PLSQL_SUBPROGRAM_ID NUMBER
    -----------------------------------------
    ,QC_INSTANCE_ID NUMBER
    ,QC_SESSION_ID NUMBER
    ,QC_SESSION_SERIAL# NUMBER
    ,PX_FLAGS NUMBER
    ,EVENT VARCHAR2(64)
    ,EVENT_ID NUMBER
    ,SEQ# NUMBER
    ,P1TEXT VARCHAR2(64)
    ,P1 NUMBER
    ,P2TEXT VARCHAR2(64)
    -----------------------------------------
    ,P2 NUMBER
    ,P3TEXT VARCHAR2(64)
    ,P3 NUMBER
    ,WAIT_CLASS VARCHAR2(64)
    ,WAIT_CLASS_ID NUMBER
    ,WAIT_TIME NUMBER
    ,SESSION_STATE VARCHAR2(7)
    ,TIME_WAITED NUMBER
    ,BLOCKING_SESSION_STATUS VARCHAR2(11)
    ,BLOCKING_SESSION NUMBER
    -----------------------------------------
    ,BLOCKING_SESSION_SERIAL# NUMBER
    ,BLOCKING_INST_ID NUMBER
    ,BLOCKING_HANGCHAIN_INFO VARCHAR2(1)
    ,CURRENT_OBJ# NUMBER
    ,CURRENT_FILE# NUMBER
    ,CURRENT_BLOCK# NUMBER
    ,CURRENT_ROW# NUMBER
    ,TOP_LEVEL_CALL# NUMBER
    ,TOP_LEVEL_CALL_NAME VARCHAR2(64)
    ,CONSUMER_GROUP_ID NUMBER
    -----------------------------------------
    ,XID RAW(8)
    ,REMOTE_INSTANCE# NUMBER
    ,TIME_MODEL NUMBER
    ,IN_CONNECTION_MGMT VARCHAR2(1)
    ,IN_PARSE VARCHAR2(1)
    ,IN_HARD_PARSE VARCHAR2(1)
    ,IN_SQL_EXECUTION VARCHAR2(1)
    ,IN_PLSQL_EXECUTION VARCHAR2(1)
    ,IN_PLSQL_RPC VARCHAR2(1)
    ,IN_PLSQL_COMPILATION VARCHAR2(1)
    -----------------------------------------
    ,IN_JAVA_EXECUTION VARCHAR2(1)
    ,IN_BIND VARCHAR2(1)
    ,IN_CURSOR_CLOSE VARCHAR2(1)
    ,IN_SEQUENCE_LOAD VARCHAR2(1)
    ,IN_INMEMORY_QUERY VARCHAR2(1) /*added 12.1*/
    ,IN_INMEMORY_POPULATE VARCHAR2(1) /*added 12.1*/
    ,IN_INMEMORY_PREPOPULATE VARCHAR2(1) /*added 12.1*/
    ,IN_INMEMORY_REPOPULATE VARCHAR2(1) /*added 12.1*/
    ,IN_INMEMORY_TREPOPULATE VARCHAR2(1) /*added 12.1*/
    ,CAPTURE_OVERHEAD VARCHAR2(1)
    -----------------------------------------
    ,REPLAY_OVERHEAD VARCHAR2(1)
    ,IS_CAPTURED VARCHAR2(1)
    ,IS_REPLAYED VARCHAR2(1)
    ,SERVICE_HASH NUMBER
    ,PROGRAM VARCHAR2(64)
    ,MODULE VARCHAR2(64)
    ,ACTION VARCHAR2(64)
    ,CLIENT_ID VARCHAR2(64)
    ,MACHINE VARCHAR2(64)
    ,PORT NUMBER
    -----------------------------------------
    ,ECID VARCHAR2(64)
    ,DBREPLAY_FILE_ID NUMBER /*added 12.1*/
    ,DBREPLAY_CALL_COUNTER NUMBER /*added 12.1*/
    ,TM_DELTA_TIME NUMBER
    ,TM_DELTA_CPU_TIME NUMBER
    ,TM_DELTA_DB_TIME NUMBER
    ,DELTA_TIME NUMBER
    ,DELTA_READ_IO_REQUESTS NUMBER
    ,DELTA_WRITE_IO_REQUESTS NUMBER
    ,DELTA_READ_IO_BYTES NUMBER
    -----------------------------------------
    ,DELTA_WRITE_IO_BYTES NUMBER
    ,DELTA_INTERCONNECT_IO_BYTES NUMBER
    ,PGA_ALLOCATED NUMBER
    ,TEMP_SPACE_ALLOCATED NUMBER
    ,DBOP_NAME VARCHAR2(64) /*added 12.1*/
    ,DBOP_EXEC_ID NUMBER /*added 12.1*/
    ,CON_DBID NUMBER /*added 12.1*/
    ,CON_ID NUMBER /*added 12.1*/'
    -----------------------------------------
    ,field_list=>'SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID,SAMPLE_TIME ,SESSION_ID,"SESSION_SERIAL#",SESSION_TYPE,FLAGS,USER_ID
    ,SQL_ID,IS_SQLID_CURRENT,SQL_CHILD_NUMBER,SQL_OPCODE,SQL_OPNAME,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE,SQL_PLAN_HASH_VALUE,SQL_FULL_PLAN_HASH_VALUE
    ,SQL_ADAPTIVE_PLAN_RESOLVED,SQL_PLAN_LINE_ID,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS,SQL_EXEC_ID,SQL_EXEC_START ,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
    ,QC_INSTANCE_ID,QC_SESSION_ID,"QC_SESSION_SERIAL#",PX_FLAGS,EVENT,EVENT_ID,"SEQ#",P1TEXT,P1,P2TEXT
    ,P2,P3TEXT,P3,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME,SESSION_STATE,TIME_WAITED,BLOCKING_SESSION_STATUS,BLOCKING_SESSION
    ,"BLOCKING_SESSION_SERIAL#",BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO,"CURRENT_OBJ#","CURRENT_FILE#","CURRENT_BLOCK#","CURRENT_ROW#","TOP_LEVEL_CALL#",TOP_LEVEL_CALL_NAME,CONSUMER_GROUP_ID
    ,XID,"REMOTE_INSTANCE#",TIME_MODEL,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
    ,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD,IN_INMEMORY_QUERY,IN_INMEMORY_POPULATE,IN_INMEMORY_PREPOPULATE,IN_INMEMORY_REPOPULATE,IN_INMEMORY_TREPOPULATE,CAPTURE_OVERHEAD
    ,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT
    ,ECID,DBREPLAY_FILE_ID,DBREPLAY_CALL_COUNTER,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME,DELTA_TIME,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS,DELTA_READ_IO_BYTES
    ,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED,DBOP_NAME,DBOP_EXEC_ID,CON_DBID,CON_ID'
    );
    END;
    /
This file contains 1.4M rows in a 200MB compressed file; uncompressed it would be 4.6GB. It takes about 81 seconds to perform a full scan on it.
    set autotrace on timi on pages 99 lines 160
    break on report
    compute sum of ash_secs on report
    column event format a40
    column min(sample_time) format a22
    column max(sample_time) format a22
    select event, sum(10) ash_Secs, min(sample_time), max(sample_time)
    from ash_hist
    --where rownum <= 1000
    group by event
    order by ash_Secs desc
    ;
    EVENT ASH_SECS MIN(SAMPLE_TIME) MAX(SAMPLE_TIME)
    ---------------------------------------- ---------- ---------------------- ----------------------
    10304530 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    direct path read 3258500 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    SQL*Net more data to client 269220 22-MAR-20 10.00.31.205 07-APR-20 22.59.30.275
    direct path write temp 32400 22-MAR-20 11.39.53.996 07-APR-20 21.43.47.329
    gc cr block busy 24930 22-MAR-20 10.51.33.189 07-APR-20 22.56.56.804

    latch: gc element 10 30-MAR-20 18.42.51.748 30-MAR-20 18.42.51.748
    ----------
    sum 14093050

    86 rows selected.

    Elapsed: 00:01:21.17
    We can see from the plan that it full scanned the external table in parallel.
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 4220750095

    ------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
    ------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 8344K| 374M| 1417 (33)| 00:00:01 | | | |
    | 1 | PX COORDINATOR | | | | | | | | |
    | 2 | PX SEND QC (ORDER) | :TQ10002 | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,02 | P->S | QC (ORDER) |
    | 3 | SORT ORDER BY | | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,02 | PCWP | |
    | 4 | PX RECEIVE | | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,02 | PCWP | |
    | 5 | PX SEND RANGE | :TQ10001 | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,01 | P->P | RANGE |
    | 6 | HASH GROUP BY | | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,01 | PCWP | |
    | 7 | PX RECEIVE | | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,01 | PCWP | |
    | 8 | PX SEND HASH | :TQ10000 | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,00 | P->P | HASH |
    | 9 | HASH GROUP BY | | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,00 | PCWP | |
    | 10 | PX BLOCK ITERATOR | | 8344K| 374M| 1089 (13)| 00:00:01 | Q1,00 | PCWC | |
    | 11 | EXTERNAL TABLE ACCESS FULL| ASH_HIST | 8344K| 374M| 1089 (13)| 00:00:01 | Q1,00 | PCWP | |
    ------------------------------------------------------------------------------------------------------------------------------

    Note
    -----
    - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit


    Statistics
    ----------------------------------------------------------
    2617 recursive calls
    3 db block gets
    2751 consistent gets
    0 physical reads
    728 redo size
    5428 bytes sent via SQL*Net to client
    602 bytes received via SQL*Net from client
    7 SQL*Net roundtrips to/from client
    346 sorts (memory)
    0 sorts (disk)
    86 rows processed
    In the next post, I will explain how to copy the data directly from Object Storage into a regular table.

    Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 3. Copying data from Object Storage to a Regular Table

    This blog is the third in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.

    Copy Data into Table 

    Alternatively, we can copy the data into a normal table. The table needs to be created in advance. This time, I am going to run the copy as user SOE rather than ADMIN.  I need to:
    • Grant connect and resource privilege and quota on the data tablespace.
    • Grant execute on DBMS_CLOUD to SOE, so it can execute the command.
    • Grant READ and WRITE access on the DATA_PUMP_DIR directory – the log and bad files created by this process are written to this database directory.
    connect admin/Password2020!@gofaster1b_tp 
    CREATE USER soe IDENTIFIED BY Password2020;
    GRANT CONNECT, RESOURCE TO soe;
    GRANT EXECUTE ON DBMS_CLOUD TO soe;
    GRANT READ, WRITE ON DIRECTORY data_pump_dir TO soe;
    ALTER USER soe QUOTA UNLIMITED ON data;
    I am now going to switch to user SOE and create my table.
    connect soe/Password2020@gofaster1b_tp
    Drop table soe.ash_hist purge;
    CREATE TABLE soe.ASH_HIST
    ( SNAP_ID NUMBER,
    DBID NUMBER,
    INSTANCE_NUMBER NUMBER,
    SAMPLE_ID NUMBER,
    SAMPLE_TIME TIMESTAMP (3),
    -- SAMPLE_TIME_UTC TIMESTAMP (3),
    -- USECS_PER_ROW NUMBER,
    SESSION_ID NUMBER,
    SESSION_SERIAL# NUMBER,
    SESSION_TYPE VARCHAR2(10),
    FLAGS NUMBER,
    USER_ID NUMBER,
    -----------------------------------------
    SQL_ID VARCHAR2(13),
    IS_SQLID_CURRENT VARCHAR2(1),
    SQL_CHILD_NUMBER NUMBER,
    SQL_OPCODE NUMBER,
    SQL_OPNAME VARCHAR2(64),
    FORCE_MATCHING_SIGNATURE NUMBER,
    TOP_LEVEL_SQL_ID VARCHAR2(13),
    TOP_LEVEL_SQL_OPCODE NUMBER,
    SQL_PLAN_HASH_VALUE NUMBER,
    SQL_FULL_PLAN_HASH_VALUE NUMBER,
    -----------------------------------------
    SQL_ADAPTIVE_PLAN_RESOLVED NUMBER,
    SQL_PLAN_LINE_ID NUMBER,
    SQL_PLAN_OPERATION VARCHAR2(64),
    SQL_PLAN_OPTIONS VARCHAR2(64),
    SQL_EXEC_ID NUMBER,
    SQL_EXEC_START DATE,
    PLSQL_ENTRY_OBJECT_ID NUMBER,
    PLSQL_ENTRY_SUBPROGRAM_ID NUMBER,
    PLSQL_OBJECT_ID NUMBER,
    PLSQL_SUBPROGRAM_ID NUMBER,
    -----------------------------------------
    QC_INSTANCE_ID NUMBER,
    QC_SESSION_ID NUMBER,
    QC_SESSION_SERIAL# NUMBER,
    PX_FLAGS NUMBER,
    EVENT VARCHAR2(64),
    EVENT_ID NUMBER,
    SEQ# NUMBER,
    P1TEXT VARCHAR2(64),
    P1 NUMBER,
    P2TEXT VARCHAR2(64),
    -----------------------------------------
    P2 NUMBER,
    P3TEXT VARCHAR2(64),
    P3 NUMBER,
    WAIT_CLASS VARCHAR2(64),
    WAIT_CLASS_ID NUMBER,
    WAIT_TIME NUMBER,
    SESSION_STATE VARCHAR2(7),
    TIME_WAITED NUMBER,
    BLOCKING_SESSION_STATUS VARCHAR2(11),
    BLOCKING_SESSION NUMBER,
    -----------------------------------------
    BLOCKING_SESSION_SERIAL# NUMBER,
    BLOCKING_INST_ID NUMBER,
    BLOCKING_HANGCHAIN_INFO VARCHAR2(1),
    CURRENT_OBJ# NUMBER,
    CURRENT_FILE# NUMBER,
    CURRENT_BLOCK# NUMBER,
    CURRENT_ROW# NUMBER,
    TOP_LEVEL_CALL# NUMBER,
    TOP_LEVEL_CALL_NAME VARCHAR2(64),
    CONSUMER_GROUP_ID NUMBER,
    -----------------------------------------
    XID RAW(8),
    REMOTE_INSTANCE# NUMBER,
    TIME_MODEL NUMBER,
    IN_CONNECTION_MGMT VARCHAR2(1),
    IN_PARSE VARCHAR2(1),
    IN_HARD_PARSE VARCHAR2(1),
    IN_SQL_EXECUTION VARCHAR2(1),
    IN_PLSQL_EXECUTION VARCHAR2(1),
    IN_PLSQL_RPC VARCHAR2(1),
    IN_PLSQL_COMPILATION VARCHAR2(1),
    -----------------------------------------
    IN_JAVA_EXECUTION VARCHAR2(1),
    IN_BIND VARCHAR2(1),
    IN_CURSOR_CLOSE VARCHAR2(1),
    IN_SEQUENCE_LOAD VARCHAR2(1),
    IN_INMEMORY_QUERY VARCHAR2(1),
    IN_INMEMORY_POPULATE VARCHAR2(1),
    IN_INMEMORY_PREPOPULATE VARCHAR2(1),
    IN_INMEMORY_REPOPULATE VARCHAR2(1),
    IN_INMEMORY_TREPOPULATE VARCHAR2(1),
    -- IN_TABLESPACE_ENCRYPTION VARCHAR2(1),
    CAPTURE_OVERHEAD VARCHAR2(1),
    -----------------------------------------
    REPLAY_OVERHEAD VARCHAR2(1),
    IS_CAPTURED VARCHAR2(1),
    IS_REPLAYED VARCHAR2(1),
    -- IS_REPLAY_SYNC_TOKEN_HOLDER VARCHAR2(1),
    SERVICE_HASH NUMBER,
    PROGRAM VARCHAR2(64),
    MODULE VARCHAR2(64),
    ACTION VARCHAR2(64),
    CLIENT_ID VARCHAR2(64),
    MACHINE VARCHAR2(64),
    PORT NUMBER,
    -----------------------------------------
    ECID VARCHAR2(64),
    DBREPLAY_FILE_ID NUMBER,
    DBREPLAY_CALL_COUNTER NUMBER,
    TM_DELTA_TIME NUMBER,
    TM_DELTA_CPU_TIME NUMBER,
    TM_DELTA_DB_TIME NUMBER,
    DELTA_TIME NUMBER,
    DELTA_READ_IO_REQUESTS NUMBER,
    DELTA_WRITE_IO_REQUESTS NUMBER,
    DELTA_READ_IO_BYTES NUMBER,
    -----------------------------------------
    DELTA_WRITE_IO_BYTES NUMBER,
    DELTA_INTERCONNECT_IO_BYTES NUMBER,
    PGA_ALLOCATED NUMBER,
    TEMP_SPACE_ALLOCATED NUMBER,
    DBOP_NAME VARCHAR2(64),
    DBOP_EXEC_ID NUMBER,
    CON_DBID NUMBER,
    CON_ID NUMBER,
    -----------------------------------------
    CONSTRAINT ash_hist_pk PRIMARY KEY (dbid, instance_number, snap_id, sample_id, session_id)
    )
    COMPRESS FOR QUERY LOW
    /
    As Autonomous Databases run on Exadata, I have also specified Hybrid Columnar Compression (HCC) for this table.
    Credentials are specific to the database user.  I have to create an additional credential, for the same cloud user, but owned by SOE.
    ALTER SESSION SET nls_date_Format='hh24:mi:ss dd.mm.yyyy';
    set serveroutput on timi on
    BEGIN
    DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'SOE_BUCKET',
    username=> 'oraclecloud1@go-faster.co.uk',
    password=> 'K7xfi-mG<1Z:dq#88;1m'
    );
    END;
    /
    column owner format a10
    column credential_name format a20
    column comments format a80
    column username format a40
    SELECT * FROM dba_credentials;

    OWNER CREDENTIAL_NAME USERNAME WINDOWS_DOMAIN
    ---------- -------------------- ---------------------------------------- ------------------------------
    COMMENTS ENABL
    -------------------------------------------------------------------------------- -----
    ADMIN MY_BUCKET oraclecloud1@go-faster.co.uk
    {"comments":"Created via DBMS_CLOUD.create_credential"} TRUE

    SOE SOE_BUCKET oraclecloud1@go-faster.co.uk
    {"comments":"Created via DBMS_CLOUD.create_credential"} TRUE

The COPY_DATA procedure is similar to CREATE_EXTERNAL_TABLE described in the previous post, but it doesn't have a column list.  The field names must match the column names.  It is sensitive to field names with a trailing #; these must be enclosed in double-quotes.
    TRUNCATE TABLE soe.ash_hist;
    DECLARE
    l_operation_id NUMBER;
    BEGIN
    DBMS_CLOUD.COPY_DATA(
    table_name =>'ASH_HIST',
    credential_name =>'SOE_BUCKET',
    file_uri_list =>'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz',
    schema_name => 'SOE',
    format => json_object('blankasnull' value 'true'
    ,'compression' value 'gzip'
    ,'dateformat' value 'YYYY-MM-DD/HH24:mi:ss'
    ,'timestampformat' value 'YYYY-MM-DD/HH24:mi:ss.ff'
    ,'delimiter' value '<,>'
    ,'ignoreblanklines' value 'true'
    ,'rejectlimit' value '10'
    ,'removequotes' value 'true'
    ,'trimspaces' value 'lrtrim'
    ),
    field_list=>'SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID,SAMPLE_TIME ,SESSION_ID,"SESSION_SERIAL#",SESSION_TYPE,FLAGS,USER_ID
    ,SQL_ID,IS_SQLID_CURRENT,SQL_CHILD_NUMBER,SQL_OPCODE,SQL_OPNAME,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE,SQL_PLAN_HASH_VALUE,SQL_FULL_PLAN_HASH_VALUE
    ,SQL_ADAPTIVE_PLAN_RESOLVED,SQL_PLAN_LINE_ID,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS,SQL_EXEC_ID,SQL_EXEC_START,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
    ,QC_INSTANCE_ID,QC_SESSION_ID,"QC_SESSION_SERIAL#",PX_FLAGS,EVENT,EVENT_ID,"SEQ#",P1TEXT,P1,P2TEXT
    ,P2,P3TEXT,P3,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME,SESSION_STATE,TIME_WAITED,BLOCKING_SESSION_STATUS,BLOCKING_SESSION
    ,"BLOCKING_SESSION_SERIAL#",BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO,"CURRENT_OBJ#","CURRENT_FILE#","CURRENT_BLOCK#","CURRENT_ROW#","TOP_LEVEL_CALL#",TOP_LEVEL_CALL_NAME,CONSUMER_GROUP_ID
    ,XID,"REMOTE_INSTANCE#",TIME_MODEL,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
    ,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD,IN_INMEMORY_QUERY,IN_INMEMORY_POPULATE,IN_INMEMORY_PREPOPULATE,IN_INMEMORY_REPOPULATE,IN_INMEMORY_TREPOPULATE,CAPTURE_OVERHEAD
    ,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT
    ,ECID,DBREPLAY_FILE_ID,DBREPLAY_CALL_COUNTER,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME,DELTA_TIME,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS,DELTA_READ_IO_BYTES
    ,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED,DBOP_NAME,DBOP_EXEC_ID,CON_DBID,CON_ID',
    operation_id=>l_operation_id
    );
    dbms_output.put_line('Operation ID:'||l_operation_id||' finished successfully');
    EXCEPTION WHEN OTHERS THEN
    dbms_output.put_line('Operation ID:'||l_operation_id||' raised an error');
    RAISE;
    END;
    /

The data copy takes slightly longer than the query on the external table.
    Operation ID:31 finished successfully

    PL/SQL procedure successfully completed.

    Elapsed: 00:02:01.11
    The status of the copy operation is reported in USER_LOAD_OPERATIONS.  This includes the number of rows loaded and the names of external tables that are created for the log and bad files.
    set lines 120
    column type format a10
    column file_uri_list format a64
    column start_time format a32
    column update_time format a32
    column owner_name format a10
    column table_name format a10
    column partition_name format a10
    column subpartition_name format a10
    column logfile_table format a15
    column badfile_table format a15
    column tempext_table format a30
    select * from user_load_operations where id = &operation_id;

    ID TYPE SID SERIAL# START_TIME UPDATE_TIME STATUS
    ---------- ---------- ---------- ---------- -------------------------------- -------------------------------- ---------
    OWNER_NAME TABLE_NAME PARTITION_ SUBPARTITI FILE_URI_LIST ROWS_LOADED
    ---------- ---------- ---------- ---------- ---------------------------------------------------------------- -----------
    LOGFILE_TABLE BADFILE_TABLE TEMPEXT_TABLE
    --------------- --------------- ------------------------------
    31 COPY 19965 44088 07-MAY-20 17.03.20.328263 +01:00 07-MAY-20 17.05.36.157680 +01:00 COMPLETED
    SOE ASH_HIST https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu 1409305
    /b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz
    COPY$31_LOG COPY$31_BAD COPY$Y2R021UKPJ5F75JCMSKL

    An external table is temporarily created by the COPY_DATA procedure but is then dropped before the procedure completes.  The bad file is empty because the copy operation succeeded without error, but we can query the copy log.
    select * from COPY$31_LOG;

    RECORD
    ------------------------------------------------------------------------------------------------------------------------
    LOG file opened at 05/07/20 16:03:21

    Total Number of Files=1

    Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz

    Log File: COPY$31_105537.log

    LOG file opened at 05/07/20 16:03:21

    Total Number of Files=1

    Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz

    Log File: COPY$31_105537.log

    LOG file opened at 05/07/20 16:03:21

    KUP-05014: Warning: Intra source concurrency disabled because the URLs specified for the Cloud Service map to compressed data.

    Bad File: COPY$31_105537.bad

    Field Definitions for table COPY$Y2R021UKPJ5F75JCMSKL
    Record format DELIMITED BY
    Data in file has same endianness as the platform
    Rows with all null fields are accepted
    Table level NULLIF (Field = BLANKS)
    Fields in Data Source:

    SNAP_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    DBID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    INSTANCE_NUMBER CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    SAMPLE_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    SAMPLE_TIME CHAR (255)
    Date datatype TIMESTAMP, date mask YYYY-MM-DD/HH24:mi:ss.ff
    Terminated by "<,>"
    Trim whitespace from left and right

    CON_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right

    Date Cache Statistics for table COPY$Y2R021UKPJ5F75JCMSKL
    Date conversion cache disabled due to overflow (default size: 1000)

    365 rows selected.
These files are written to the DATA_PUMP_DIR database directory.  We don't have access to the database file system in an Autonomous Database, so Oracle has provided the LIST_FILES table function in DBMS_CLOUD so that we can see what files are in a directory.
    Set pages 99 lines 150
    Column object_name format a32
    Column created format a32
    Column last_modified format a32
    Column checksum format a20
    SELECT * FROM DBMS_CLOUD.LIST_FILES('DATA_PUMP_DIR');

    OBJECT_NAME BYTES CHECKSUM CREATED LAST_MODIFIED
    -------------------------------- ---------- -------------------- -------------------------------- --------------------------------

    COPY$31_dflt.log 0 07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
    COPY$31_dflt.bad 0 07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
    COPY$31_105537.log 13591 07-MAY-20 16.03.21.000000 +00:00 07-MAY-20 16.05.35.000000 +00:00

Statistics are automatically collected on the table by the copy process because it was done in direct-path mode.  We can see that the number of rows in the statistics corresponds with the number of rows loaded by the COPY_DATA procedure.
    Set pages 99 lines 140
    Column owner format a10
    Column IM_STAT_UPDATE_TIME format a30
    Select *
    from all_tab_statistics
    Where table_name = 'ASH_HIST';

    OWNER TABLE_NAME PARTITION_ PARTITION_POSITION SUBPARTITI SUBPARTITION_POSITION OBJECT_TYPE NUM_ROWS BLOCKS EMPTY_BLOCKS
    ---------- ---------- ---------- ------------------ ---------- --------------------- ------------ ---------- ---------- ------------
    AVG_SPACE CHAIN_CNT AVG_ROW_LEN AVG_SPACE_FREELIST_BLOCKS NUM_FREELIST_BLOCKS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO IM_IMCU_COUNT
    ---------- ---------- ----------- ------------------------- ------------------- ----------------- ------------------- -------------
    IM_BLOCK_COUNT IM_STAT_UPDATE_TIME SCAN_RATE SAMPLE_SIZE LAST_ANALYZED GLO USE STATT STALE_S SCOPE
    -------------- ------------------------------ ---------- ----------- ------------------- --- --- ----- ------- -------
    SOE ASH_HIST TABLE 1409305 19426 0
    0 0 486 0 0
    1409305 15:16:14 07.05.2020 YES NO NO SHARED

I can confirm that the data is compressed because the compression type of every sampled row is type 8 (HCC QUERY LOW).  See also DBMS_COMPRESSION Compression Types.
    WITH x AS (
    select dbms_compression.get_compression_type('SOE', 'ASH_HIST', rowid) ctype
    from soe.ash_hist sample (.1))
    Select ctype, count(*) From x group by ctype;

    CTYPE COUNT(*)
    ---------- ----------
    8 14097
    I can find this SQL Statement in the Performance Hub. 
    INSERT /*+ append enable_parallel_dml */ INTO "SOE"."ASH_HIST" SELECT * FROM COPY$Y2R021UKPJ5F75JCMSKL
    Therefore, the data was queried from the temporary external table into the permanent table, in direct path mode and in parallel.
I can also look at the OCI Performance Hub and see that most of the time was spent on CPU.  I can see the SQL_ID of the insert statement and the call to the DBMS_CLOUD procedure.
    I can drill in further to the exact SQL statement.
    When I query the table I get exactly the same data as previously with the external table.
    set autotrace on timi on lines 180 trimspool on
    break on report
    compute sum of ash_secs on report
    column min(sample_time) format a22
    column max(sample_time) format a22
    select event, sum(10) ash_Secs, min(sample_time), max(sample_time)
    from soe.ash_hist
    group by event
    order by ash_Secs desc
    ;

    EVENT ASH_SECS MIN(SAMPLE_TIME) MAX(SAMPLE_TIME)
    ---------------------------------------------------------------- ---------- ---------------------- ----------------------
    10304530 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    direct path read 3258500 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    SQL*Net more data to client 269220 22-MAR-20 10.00.31.205 07-APR-20 22.59.30.275
    direct path write temp 32400 22-MAR-20 11.39.53.996 07-APR-20 21.43.47.329
    gc cr block busy 24930 22-MAR-20 10.51.33.189 07-APR-20 22.56.56.804

    latch free 10 28-MAR-20 20.26.11.307 28-MAR-20 20.26.11.307
    ----------
    sum 14093050

    86 rows selected.

    Elapsed: 00:00:00.62

    I can see that the execution plan is now a single serial full scan of the table.
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1336681691

    ----------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 84 | 1428 | 1848 (9)| 00:00:01 |
    | 1 | SORT ORDER BY | | 84 | 1428 | 1848 (9)| 00:00:01 |
    | 2 | HASH GROUP BY | | 84 | 1428 | 1848 (9)| 00:00:01 |
    | 3 | TABLE ACCESS STORAGE FULL| ASH_HIST | 1409K| 22M| 1753 (4)| 00:00:01 |
    ----------------------------------------------------------------------------------------


    Statistics
    ----------------------------------------------------------
    11 recursive calls
    13 db block gets
    19255 consistent gets
    19247 physical reads
    2436 redo size
    5428 bytes sent via SQL*Net to client
    602 bytes received via SQL*Net from client
    7 SQL*Net roundtrips to/from client
    1 sorts (memory)
    0 sorts (disk)
    86 rows processed

    Oracle 19c: Real-Time Statistics & High-Frequency Statistics Collection

    The video of this recent presentation, given as a part of the Oracle Groundbreakers EMEA Tour 2020, is now available.
Keeping object statistics up to date is critical to Oracle database performance and stability. Both of these features aim to address the challenge of using data that has been significantly updated before the statistics maintenance window has run again. The features are only available on engineered systems, and so are certainly targeted at the Autonomous Database.
• Real-time Statistics augment existing statistics.  However, they are not quite as real-time as the name suggests.  To keep the implementation lightweight, they use the table monitoring mechanism, which limits the information that can be collected.
• High-Frequency Automatic Optimizer Statistics Collection is effectively a never-ending statistics maintenance window.  As your data and statistics change, so there are opportunities for SQL execution plans, and therefore application performance, to change.  DBAs and developers need to be aware of the implications.

    Oracle 19c: Adventures with Automatic Indexing

    The video of this recent presentation, given as a part of the Oracle Groundbreakers EMEA Tour 2020, is now available.
Automatic Indexing is one of the much-heralded features of Oracle 19c, but it is only available on Engineered Systems: that is, in the Autonomous Database (which is built on Exadata) and on other Exadata platforms. This presentation shares some initial experiences with the feature, based on testing it in conjunction with Swingbench, and discusses how well it performed.








    Retrofitting Partitioning into an Existing Application: 1. Introduction

    This post is the first in a series about the partitioning of database objects.
    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

Introduction

    Over the years I have seen and read many presentations and articles on the subject of partitioning database tables and indexes. Most explain how partitioning works. Many explain the options for the developer and discuss how to design your application to be able to make effective use of partitioning.

    However, my experience comes from working with packaged applications or applications that are already in production where all the design decisions have been taken. Often, I am faced with performance or scalability problems, and sometimes I have to consider whether partitioning is an effective option.

    In this series of posts, I am going to look at the thought process behind deciding whether you can retrofit partitioning into an existing application. The task often falls to the DBA but also requires input from application developers and administrators.  I realise that I am going to say many of the same things that you can find in other articles, but I will be approaching them from a slightly different point of view.

    The motivation is always the same: improved performance with, if possible, reduced overheads.
• The fastest way to do anything is not to do it at all.
    • In general, Oracle inserts data into the first available space.  Any piece of data could be anywhere in a table.  However, partitioning creates a relationship between the physical location of a piece of data and the logical value of that data. This dictates into which partition data is inserted.
    • Thus, the optimizer can discard partitions from a query, without the overhead of scanning them, where it can determine that no data of interest resides.  This is called partition elimination or partition pruning.  
    • If you aren't achieving elimination, then there is probably no benefit to the partitioning.  In fact, it might increase your overheads as you probe every partition.
    The following diagram was taken from the Oracle documentation.  The table has been partitioned into monthly partitions.  If I am looking for March data, then I don't need to inspect the January and February partitions.  However, if the table had not been partitioned I would have to scan the whole segment.  If the query was using a locally partitioned index, then I would only probe the partition for March.
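
You can verify that a query is actually eliminating partitions by checking the Pstart/Pstop columns of the execution plan. A minimal sketch, with an illustrative monthly interval-partitioned table (all names invented for the example):

CREATE TABLE sales_by_month (sales_date DATE, amount NUMBER)
PARTITION BY RANGE (sales_date) INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
(PARTITION p_base VALUES LESS THAN (TO_DATE('2020-02-01','YYYY-MM-DD')));

EXPLAIN PLAN FOR
SELECT SUM(amount) FROM sales_by_month
WHERE  sales_date >= TO_DATE('2020-03-01','YYYY-MM-DD')
AND    sales_date <  TO_DATE('2020-04-01','YYYY-MM-DD');

SELECT * FROM TABLE(dbms_xplan.display);

A PARTITION RANGE SINGLE operation, with matching Pstart/Pstop values, confirms that only the March partition would be scanned.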

    Whose job is it?

    • Designing partitioning into an application during development is a job for the application architect/developers.
    • Retrofitting partitioning into an existing (or a packaged 3rd party) application usually falls to the DBA.
    In my opinion, in order to be successful, the developers and the DBAs need to work together.  Bear in mind also:
• Partitioning is a licensed option available on Enterprise Edition only.  That means you have to pay for it.  So, if you are not getting an improvement in performance (or a reduction in resource consumption) then you have to question whether it is worth it.
    • Check your application vendor's support policy.  Sometimes vendors do not support customer partitioning at all (e.g. Oracle's own E-Business Suite).  Or, they may permit it, but it remains the customer's responsibility to support it (e.g. PeopleSoft - yes, also owned by Oracle).
    • There is also an ongoing cost of ownership.  You have to look after your partitioning.  If you partition by date or something that changes over time (like employee ID), then periodically you will add new partitions, and then also possibly compress and/or remove old partitions.  If you rebuild or change a table or index, you have to remember that it is partitioned when you create the DDL.


    Retrofitting Partitioning into an Existing Application: 2. What kinds of partitioning can you do?

    This post is part of a series about the partitioning of database objects.

    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

1-Dimensional Partitioning

    Oracle supports three forms of partitioning:

    • Range: a non-inclusive upper limit is defined for each partition.  Any row where the partition key value is higher than this limit is placed in a subsequent partition.  Implicitly the minimum value is the upper limit of the preceding partition.
    • List: specific values are placed in specific partitions.
    • Hash: the value of the partitioning key is passed to a hash function.  The output of the hash function determines the partition.
    Interval partitioning is a form of range partitioning where Oracle calculates the partition boundaries mathematically, so you don't have to.   Therefore, it only works with numeric, date and timestamp fields.
For each partitioning type, the DDL is shown with the resulting partitions in USER_TAB_PARTITIONS.

Range:
CREATE TABLE t_r
(a NUMBER
,b NUMBER
,CONSTRAINT t_r_pk PRIMARY KEY(a)
)
PARTITION BY RANGE (b)
(PARTITION VALUES LESS THAN (10)
,PARTITION VALUES LESS THAN (20)
,PARTITION VALUES LESS THAN (MAXVALUE));

Table Part Partition High       Num
Name   Pos Name      Value     Rows
----- ---- --------- -------- -----
T_R      1 SYS_P539        10  1000
         2 SYS_P540        20  1000

List:
CREATE TABLE t_l
(a NUMBER, b NUMBER
,CONSTRAINT t_l_pk PRIMARY KEY(a)
)
PARTITION BY LIST (b)
(PARTITION VALUES (1,2,3)
,PARTITION VALUES (4,5,6)
,PARTITION VALUES (DEFAULT));

Table Part Partition High       Num
Name   Pos Name      Value     Rows
----- ---- --------- -------- -----
T_L      1 SYS_P542  1, 2, 3    300
         2 SYS_P543  4, 5, 6    300
         3 SYS_P544  DEFAULT   9400

Hash:
CREATE TABLE t_h
(a NUMBER, b NUMBER
,CONSTRAINT t_h_pk PRIMARY KEY(a)
)
PARTITION BY HASH (b)
PARTITIONS 4;

Table Part Partition High       Num
Name   Pos Name      Value     Rows
----- ---- --------- -------- -----
T_H      1 SYS_P545            2000
         2 SYS_P546            2900
         3 SYS_P547            2400
         4 SYS_P548            2700

Interval:
CREATE TABLE t_i
(a NUMBER, b NUMBER
,CONSTRAINT t_i_pk PRIMARY KEY(a)
)
PARTITION BY RANGE (b)
INTERVAL (10)
(PARTITION VALUES LESS THAN (10));

Table Part Partition High       Num
Name   Pos Name      Value     Rows
----- ---- --------- -------- -----
T_I      1 SYS_P549        10  1000
         2 SYS_P550        20  1000
         3 SYS_P551        30  1000
         4 SYS_P552        40  1000
         5 SYS_P553        50  1000
         6 SYS_P554        60  1000
         7 SYS_P555        70  1000
         8 SYS_P556        80  1000
         9 SYS_P557        90  1000
        10 SYS_P558       100  1000

    2-Dimensional (Composite) Partitioning

Oracle can partition independently on two columns (or groups of columns).  This is called composite partitioning.  It is easy to think of this as partitioning in two dimensions.  Again, this diagram is taken from Oracle's documentation.

Composite partitioning can combine any form of partitioning with any form of sub-partitioning, except that you cannot sub-partition by interval.
Each combination below shows the DDL for the partitioning type (first) and sub-partitioning type (second).

Range partition, range sub-partition (t_rr):
CREATE TABLE t_rr
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_rr_pk PRIMARY KEY(a))
PARTITION BY RANGE (b)
SUBPARTITION BY RANGE (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_10 VALUES LESS THAN (10)
,SUBPARTITION s_20 VALUES LESS THAN (20)
,SUBPARTITION s_mx VALUES LESS THAN (MAXVALUE))
(PARTITION VALUES LESS THAN (10)
,PARTITION VALUES LESS THAN (20)
,PARTITION VALUES LESS THAN (MAXVALUE));

Range partition, list sub-partition (t_rl):
CREATE TABLE t_rl
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_rl_pk PRIMARY KEY(a))
PARTITION BY RANGE (b)
SUBPARTITION BY LIST (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_1 VALUES (1,2,3)
,SUBPARTITION s_2 VALUES (4,5,6)
,SUBPARTITION s_mx VALUES (DEFAULT))
(PARTITION VALUES LESS THAN (10)
,PARTITION VALUES LESS THAN (20)
,PARTITION VALUES LESS THAN (MAXVALUE));

Range partition, hash sub-partition (t_rh):
CREATE TABLE t_rh
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_rh_pk PRIMARY KEY(a))
PARTITION BY RANGE (b)
SUBPARTITION BY HASH (c)
SUBPARTITIONS 4
(PARTITION VALUES LESS THAN (10)
,PARTITION VALUES LESS THAN (20)
,PARTITION VALUES LESS THAN (MAXVALUE));

Range partition, interval sub-partition: not supported.  The DDL raises ORA-14179: An unsupported partitioning method was specified in this context.

List partition, range sub-partition (t_lr):
CREATE TABLE t_lr
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_lr_pk PRIMARY KEY(a))
PARTITION BY LIST (b)
SUBPARTITION BY RANGE (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_10 VALUES LESS THAN (10)
,SUBPARTITION s_20 VALUES LESS THAN (20)
,SUBPARTITION s_mx VALUES LESS THAN (MAXVALUE))
(PARTITION VALUES (1,2,3)
,PARTITION VALUES (4,5,6)
,PARTITION VALUES (DEFAULT));

List partition, list sub-partition (t_ll):
CREATE TABLE t_ll
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_ll_pk PRIMARY KEY(a))
PARTITION BY LIST (b)
SUBPARTITION BY LIST (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_1 VALUES (1,2,3)
,SUBPARTITION s_2 VALUES (4,5,6)
,SUBPARTITION s_mx VALUES (DEFAULT))
(PARTITION VALUES (1,2,3)
,PARTITION VALUES (4,5,6)
,PARTITION VALUES (DEFAULT));

List partition, hash sub-partition (t_lh):
CREATE TABLE t_lh
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_lh_pk PRIMARY KEY(a))
PARTITION BY LIST (b)
SUBPARTITION BY HASH (c)
SUBPARTITIONS 4
(PARTITION VALUES (1,2,3)
,PARTITION VALUES (4,5,6)
,PARTITION VALUES (DEFAULT));

Hash partition, range sub-partition (t_hr):
CREATE TABLE t_hr
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_hr_pk PRIMARY KEY(a))
PARTITION BY HASH (b)
SUBPARTITION BY RANGE (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_10 VALUES LESS THAN (10)
,SUBPARTITION s_20 VALUES LESS THAN (20)
,SUBPARTITION s_mx VALUES LESS THAN (MAXVALUE))
PARTITIONS 4;

Hash partition, list sub-partition (t_hl):
CREATE TABLE t_hl
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_hl_pk PRIMARY KEY(a))
PARTITION BY HASH (b)
SUBPARTITION BY LIST (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_1 VALUES (1,2,3)
,SUBPARTITION s_2 VALUES (4,5,6)
,SUBPARTITION s_mx VALUES (DEFAULT))
PARTITIONS 4;

Hash partition, hash sub-partition (t_hh):
CREATE TABLE t_hh
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_hh_pk PRIMARY KEY(a))
PARTITION BY HASH (b)
SUBPARTITION BY HASH (c)
SUBPARTITIONS 4
PARTITIONS 4;

Interval partition, range sub-partition (t_ir):
CREATE TABLE t_ir
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_ir_pk PRIMARY KEY(a))
PARTITION BY RANGE (b) INTERVAL (10)
SUBPARTITION BY RANGE (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_10 VALUES LESS THAN (10)
,SUBPARTITION s_20 VALUES LESS THAN (20)
,SUBPARTITION s_mx VALUES LESS THAN (MAXVALUE))
(PARTITION VALUES LESS THAN (10));

Interval partition, list sub-partition (t_il):
CREATE TABLE t_il
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_il_pk PRIMARY KEY(a))
PARTITION BY RANGE (b) INTERVAL (10)
SUBPARTITION BY LIST (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_1 VALUES (1,2,3)
,SUBPARTITION s_2 VALUES (4,5,6)
,SUBPARTITION s_mx VALUES (DEFAULT))
(PARTITION VALUES LESS THAN (10));

Interval partition, hash sub-partition (t_ih):
CREATE TABLE t_ih
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_ih_pk PRIMARY KEY(a))
PARTITION BY RANGE (b)
INTERVAL (10)
SUBPARTITION BY HASH (c)
SUBPARTITIONS 4
(PARTITION VALUES LESS THAN (10));

    Sub-partition templates simplify the DDL, otherwise, you have to specify the sub-partitions for each partition.  As you do not specify all the partitions when interval partitioning, you effectively have to use templates to sub-partition interval partitions.  Otherwise, the automatically added partitions will not be sub-partitioned.  
    In some cases, involving hash partitioning, the database is sensitive to the order of partition and sub-partitions clauses in the DDL.
    Partitions are given system-generated names unless names are specified.  Explicitly specified interval partitions have to be explicitly named.  Subpartition names in subpartition templates are only used when the partition is explicitly named, otherwise, the subpartition has an entirely system-generated name. 
It can be helpful to explicitly specify partition and sub-partition names.  It has no impact on performance, but it can help administration, e.g. reporting space usage by partition.  It can also be helpful later when partitions are dropped, split or merged during archiving or ILM.
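
If you are left with system-generated names, as you inevitably are with interval partitioning, you can rename partitions after the event.  A minimal sketch against the t_i table above (the new names are illustrative; the extended PARTITION FOR syntax identifies an interval partition by a value that falls into it):

REM Rename the interval partition containing the value 25
ALTER TABLE t_i RENAME PARTITION FOR (25) TO t_i_p30;

REM Or draft rename statements from the data dictionary for review
SELECT 'ALTER TABLE t_i RENAME PARTITION '||partition_name
     ||' TO t_i_p'||partition_position||';'
FROM   user_tab_partitions
WHERE  table_name = 'T_I';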

    Retrofitting Partitioning into an Existing Application: 3. Scripting & Archiving

    This post is part of a series about the partitioning of database objects.

    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

Scripting

    If you introduce partitioning, you need to look after it.
• It is common to partition a table by date, or another column that is a proxy for the date, such as an accounting period.  Often that implies a regular but relatively infrequent maintenance activity, perhaps only annual.
    • You are likely to have to add, remove and possibly compress partitions. There may be groups of tables that have to be similarly partitioned.  You can easily end up in a hellish world of manual scripting.
• This can make interval partitioning attractive because Oracle automatically creates the partitions on demand.  However, you are still responsible for any subsequent compressing and purging.
    • Interval partitions (other than the ones you explicitly specify, and the whole point is that you need only specify the first one in the range) will be given system-generated names. 
    • On the other hand, explicit partition names, with a consistent naming convention, can be very helpful when you come to partition-wise operations during archive/purge operations. 
    • If you are going to manage partition DDL scripts manually, then you need strict version control. 
For PeopleSoft, I created a utility to generate partition DDL from the PeopleSoft metadata.  It was only worth my while doing this because I was solving the same challenge when partitioning different PeopleSoft products at many different customers.  It is unlikely that you will be willing to put that sort of investment into a utility for a single implementation of an application.
• Manual scripting opens the possibility for manual errors to creep in.
• Generating DDL guarantees a degree of consistency (see the sketch below).
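
Even without a full metadata-driven utility, generating routine maintenance DDL from the data dictionary is safer than writing it by hand.  A minimal sketch that drafts DROP PARTITION statements for the oldest partitions of a table (the table name and retention rule are illustrative, and the output should always be reviewed before execution):

SELECT 'ALTER TABLE '||table_name||' DROP PARTITION '||partition_name
     ||' UPDATE INDEXES;' AS ddl
FROM   user_tab_partitions
WHERE  table_name = 'MY_PARTITIONED_TABLE'
AND    partition_position <= 2
ORDER  BY partition_position;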

    Archiving

    Never let the archiving tail wag the performance dog.  
    You pay for and implement partitioning for the benefit of the application users.  Archiving is frequently done for much the same reasons.  Partitioning can make archiving much easier if you can archive whole partitions at a time.  However, making the archiving experience better is not the same as making the user experience better.
Where you have partitioned by time, it is frequently the case that you can also archive by time, and you have a rolling window of partitions that you add and remove, and sometimes compress or merge, on a regular basis.  It may be that the partitioning design that is best for application performance will also lend itself to partition-wise archiving.  Partition-wise archiving is attractive, but not at the expense of application performance.
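
Archiving a whole partition is then essentially a metadata operation.  A minimal sketch of a partition exchange (names are illustrative; CREATE TABLE ... FOR EXCHANGE is 12.2 syntax, and the archive table must match the partitioned table's structure):

REM Create an empty, structurally matching, non-partitioned archive table
CREATE TABLE my_table_arch FOR EXCHANGE WITH TABLE my_table;

REM Swap the oldest partition's segment with the empty archive table
ALTER TABLE my_table EXCHANGE PARTITION p_2017
WITH TABLE my_table_arch INCLUDING INDEXES WITHOUT VALIDATION;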

    In the next posts, I will look at some real-life examples of how partitioning was introduced into an application.

    Retrofitting Partitioning into Existing Applications: Example 1. General Ledger

    This post is part of a series about the partitioning of database objects.

    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

If you were designing an application to use partitioning, you would write the code to reference the column by which the data was partitioned so that the database does partition elimination.  However, with a pre-existing or 3rd-party application, you have to look at how the application queries the data and match the partitioning to that.
I am going to look at a number of cases from real life, and discuss the thought process behind partitioning decisions.  These examples happen to come from PeopleSoft ERP systems, but that does not make them unusual.  PeopleSoft is just another packaged application.  In every case, it is necessary to have some application knowledge when deciding whether and how to introduce partitioning.

    General Ledger

    GL is an example of where OLTP and DW activities clash on the same table.  GL is a data warehouse of transactional information about a business.  The rationale for partitioning ledger data is a very typical example of partitioning for SQL query performance.
Dimensions                     Attributes
------------------------------ ----------------
BUSINESS_UNIT                  POSTED_TOTAL_AMT
LEDGER                         POSTED_BASE_AMT
ACCOUNT                        POSTED_TRANS_AMT
DEPTID
OPERATING_UNIT
PRODUCT
AFFILIATE
CHARTFIELD1/2/3
PROJECT_ID
BOOK_CODE
FISCAL_YEAR/ACCOUNTING_PERIOD
CURRENCY_CD/BASE_CURRENCY
…and others

    You can think of it as a star-schema.  The ledger table is the fact table.  Dimensions are generated from standing data in the application. The reports typically slice and dice that data by time, and various dimensions.  The exact dimensions vary from business to business, and from time to time. 

In PeopleSoft, you can optionally configure summary ledger tables that are pre-aggregations of ledger data by a limited set of dimensions.  These are generated by batch processes.  However, it is not a commonly used feature because it introduces latency: a change cannot be reported from the summary ledgers until the refresh process has run.
    Business transactions post continuously to the ledger.  Meanwhile, the accountants also want to query ledger data.  Especially at month-end, they want to post adjustments and see the consequences immediately.
Here is a typical query from the PeopleSoft GL Reporting tool (nVision).  The queries vary widely, but some elements are always present.
    SELECT L.TREE_NODE_NUM,L2.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
    FROM PS_LEDGER A
    , PSTREESELECT05 L1
    , PSTREESELECT10 L
    , PSTREESELECT10 L2
    WHERE A.LEDGER='ACTUALS'
    AND A.FISCAL_YEAR=2020
    AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 11
    AND L1.SELECTOR_NUM=30982 AND A.BUSINESS_UNIT=L1.RANGE_FROM_05
    AND L.SELECTOR_NUM=30985 AND A.CHARTFIELD1=L.RANGE_FROM_10
    AND L2.SELECTOR_NUM=30984 AND A.ACCOUNT=L2.RANGE_FROM_10
    AND A.CURRENCY_CD='GBP'
    GROUP BY L.TREE_NODE_NUM,L2.TREE_NODE_NUM
    • Queries are always on a particular ledger or group of ledgers.
      • You can have different ledgers for different accounting standards or reporting requirements.
      • Sometimes you can have adjustment ledgers – that are usually much smaller than the actuals ledgers – and they are aggregated with the main ledger.
      • In the latest version of the application, the budget ledger can be stored in the same table rather than a separate table.  Budget data has a different shape to actuals data and is created up to a year earlier.  It is generally much smaller and has a different usage profile.
• So, there is always an equality criterion or an IN-list criterion on LEDGER.
    • Queries are always for a particular fiscal year.  This year, last year, sometimes the year before.  Therefore, there is always an equality criterion on FISCAL_YEAR.
    • Queries may be for a particular period, in which case there is a single-period equality criterion.  Alternatively, they are for the year-to-date, in which case there is a BETWEEN 1 AND current period criterion.  Sometimes for a particular quarter.  It is common to see queries on the same year-to-date period in the previous fiscal year.
    • Queries always specify the reporting currency.  Therefore, there is always a criterion on CURRENCY_CD, although many multi-national customers only have single currency ledgers, so the criterion may not be selective.
    • There will be varying criteria on other dimension columns on LEDGER by joining to the PSTREESELECT dimension tables.

    What should I partition by?

We have seen the shape of the SQL, so we know which columns are candidate partitioning keys because we have seen which columns have criteria.  LEDGER is a candidate:
                                Cum.
LEDGER          NUM_ROWS      %      %
---------- ------------- ------ ------
XXXXCORE     759,496,900   43.9   43.9
CORE         533,320,425   30.8   74.7
XXXXGAAP     152,563,325    8.8   83.5
GAAP_ADJ      74,371,775    4.3   87.8
ZZZZ_CORE     34,251,514    2.0   89.8
C_XXCORE      29,569,381    1.7   91.5
           -------------
sum        1,731,153,467
    FISCAL_YEAR is an obvious choice.  
    Fiscal
      Year      NUM_ROWS      %
---------- ------------- ------
      2016           121
      2017            32
      2018   510,168,673   29.5
      2019   574,615,980   33.2
      2020   646,336,579   37.3
      2021        32,082
           -------------
sum        1,731,153,467
Most companies have monthly accounting periods (although some use other frequencies).  Then we have 12 accounting periods, plus brought forward (0), carry forward (998), and adjustments (999).
    Fiscal Accounting                   Cum.
      Year     Period   NUM_ROWS      %      %
---------- ---------- ---------- ------ ------
      2020          0   66237947    3.8   37.3
                    1   42865339    2.5   33.5
                    2   47042492    2.7   31.0
                    3   53680915    3.1   28.3
                    4   50113011    2.9   25.2
                    5   44700409    2.6   22.3
                    6   54983221    3.2   19.7
                    7   51982401    3.0   16.6
                    8   44851506    2.6   13.6
                    9   56528783    3.3   11.0
                   10   52266343    3.0    7.7
                   11   70541810    4.1    4.7
                   12   10542380     .6     .6
                  999         22     .0     .0
           ********** ----------
sum                     646336579
CURRENCY_CD is usually not a candidate because most companies report in a single currency, so all the rows have the same currency.  But even then, each ledger is in a particular currency, so it is usually more effective to partition by LEDGER.
It is very tempting to interval partition on FISCAL_YEAR and then range or list sub-partition on ACCOUNTING_PERIOD into 14 sub-partitions each year.  Then Oracle will automatically add the range partitions for each FISCAL_YEAR.
    CREATE TABLE ps_ledger (...)
    PARTITION BY RANGE (fiscal_year) INTERVAL (1)
    SUBPARTITION BY RANGE (accounting_period)
    SUBPARTITION TEMPLATE
    (SUBPARTITION p00 VALUES LESS THAN (1)
    ,SUBPARTITION p01 VALUES LESS THAN (2)
    ...
    ,SUBPARTITION p12 VALUES LESS THAN (13)
    ,SUBPARTITION pxx VALUES LESS THAN (MAXVALUE))
    (PARTITION VALUES LESS THAN (2019));
    However, I would counsel against this.  You can only partition in two dimensions, and LEDGER is a very attractive column.
    Instead, you can partition in one dimension on the combination of two (or more) columns.  I would range partition on the combination of FISCAL_YEAR and ACCOUNTING_PERIOD.
    CREATE TABLE ps_ledger (...)
    PARTITION BY RANGE (fiscal_year,accounting_period)
    (PARTITION ledger_2017 VALUES LESS THAN (2018,0)
    ,PARTITION ledger_2018_bf VALUES LESS THAN (2018,1)
    ,PARTITION ledger_2018_p01 VALUES LESS THAN (2018,2)

    ,PARTITION ledger_2021_cf VALUES LESS THAN (2022,0)
    );
    • The application never uses ACCOUNTING_PERIOD without also using FISCAL_YEAR.  Sometimes it uses FISCAL_YEAR without ACCOUNTING_PERIOD.
    • Partition elimination does work with multi-column partitions.
      • If you only specify a criterion on FISCAL_YEAR in a query you will still get partition elimination.
  • If you only specify a criterion on ACCOUNTING_PERIOD, you will not get partition elimination (see the sketch after this list).
    • You cannot interval partition on multiple columns.  Therefore, you have to manage the annual addition of new partitions yourself.
    • Also, you cannot get partition change tracking for materialized view refresh to work with multi-column partitioning.
    • This leaves sub-partitioning to be used on a different column.
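
You can demonstrate the elimination behaviour in the execution plan.  A minimal sketch, assuming the multi-column range-partitioned ledger table above:

EXPLAIN PLAN FOR
SELECT SUM(posted_total_amt) FROM ps_ledger
WHERE  fiscal_year = 2020;

SELECT * FROM TABLE(dbms_xplan.display);

With a criterion on FISCAL_YEAR alone, expect a PARTITION RANGE ITERATOR operation with a bounded Pstart/Pstop range.  Repeat with a criterion on ACCOUNTING_PERIOD alone and you should see PARTITION RANGE ALL, i.e. no elimination.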

    Should I create a MAXVALUE partition?

    • I deliberately haven't specified a MAXVALUE partition.  There are arguments for and against this.
  • The argument against MAXVALUE is that you might forget to add the new partition for the new year; then all the data for the next fiscal year goes into the same partition, and over time the performance of the reports gradually decays.  By the time the performance issue is diagnosed, several months of data may have piled up.  Then you need to split the partition into several partitions (or exchange it out, add the new partitions, and reinsert the data) - see the split example after this list.  So not having a MAXVALUE partition forces the annual maintenance activity to be put in the diary; otherwise, the application will error when it tries to insert data for a FISCAL_YEAR for which there is currently no partition.
    • Now that budget data is kept in the LEDGER table, you have to do this before the budget ledger data is created, which is up to a year ahead of actuals data, so the risk of business interruption is minimal.
  • In favour of a MAXVALUE partition is that it prevents the error from occurring, but it risks the maintenance being forgotten or deferred for operational reasons.
      • Of course, a MAXVALUE partition can be added at any time!
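
If data has already accumulated in a MAXVALUE partition, it can be split retrospectively.  A minimal sketch (the partition names are illustrative):

ALTER TABLE ps_ledger SPLIT PARTITION ledger_max
AT (2022,0)
INTO (PARTITION ledger_2021_cf, PARTITION ledger_max)
UPDATE INDEXES;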

    Should I Sub-partition?

    It depends on the data.
    • The ledger table is a big table, and the LEDGER column is usually a selective low cardinality column.  So, it is a good candidate for sub-partitioning.  A single value list sub-partition for each of the largest actuals and budget ledgers, a default sub-partition for all other values.
• This is not the case in summary ledger tables that are usually built on a single ledger.  So they are usually range partitioned on FISCAL_YEAR and ACCOUNTING_PERIOD, and can then be sub-partitioned on a different dimension column.
    You can use a template if you want the same sub-partitions for every accounting period.
    If you use interval partitioning, you have to use a subpartition template if you want to composite partition.
    CREATE TABLE ps_ledger (…)
    PARTITION BY RANGE (fiscal_year,accounting_period) INTERVAL (1)
    SUBPARTITION BY LIST (ledger)
    SUBPARTITION TEMPLATE
(SUBPARTITION l_xxx VALUES ('XXX')
,SUBPARTITION l_yyy VALUES ('YYY')

,SUBPARTITION l_z_others VALUES (DEFAULT))
    (PARTITION VALUES LESS THAN (2019));
    Sometimes, companies change their use of ledgers, in which case the sub-partitions need to reflect that.  You can still use the template to specify whatever is the currently required sub-partitioning.  If you ever recreate the table you end up explicitly specifying sub-partitions for every other partition.  The DDL becomes very verbose.  Although with deferred segment creation it wouldn't really matter if you had empty sub-partitions that had not been physically created for accounting periods where a ledger was not used.  
    However, if I want to specify different tablespaces, no free space allowance, compression etc on certain partitions, then I need to use explicit partition and subpartition clauses, or come along afterwards and alter and rebuild them.  
    I think explicit partition and subpartition names are administratively helpful when it comes to reporting on partition space usage, and when you archive/purge data by exchanging or dropping a partition.
    CREATE TABLE ps_ledger (…)
    PARTITION BY RANGE (fiscal_year,accounting_period)
    SUBPARTITION BY LIST (ledger)
    (PARTITION ledger_2018 VALUES LESS THAN (2019,0) PCTFREE 0 COMPRESS
    (SUBPARTITION ledger_2018_xxx VALUES ('XXX')
    ,SUBPARTITION ledger_2018_yyy VALUES ('YYY')
    ,SUBPARTITION ledger_2018_z_others VALUES (DEFAULT)
    )
    ,PARTITION ledger_2019_bf VALUES LESS THAN (2019,1) PCTFREE 0 COMPRESS
    (SUBPARTITION ledger_2019_bf_xxx VALUES ('XXX')
    ,SUBPARTITION ledger_2019_bf_yyy VALUES ('YYY')
    ,SUBPARTITION ledger_2019_bf_z_others VALUES (DEFAULT)
    )

    ;

    Indexing

    Indexes can be partitioned or not independently of the table.  
    • Local indexes are partitioned in the same way as the table they are built on.  Therefore, there is a 1:1 relationship of table partition/sub-partition to index partition/sub-partition.
    • Global indexes are not partitioned the same way as the table.  You can have
      • Global partitioned indexes
      • Global non-partitioned indexes
    Local indexes are easier to build and maintain.  When you do a partition operation on a table partition (add, drop, merge, split or truncate) the same operation is applied to local indexes.  However, if you do an operation on a table partition, any global index will become unusable, unless the DDL is done with the UPDATE INDEXES clause.  Using this option, when you drop a partition, all the corresponding rows are deleted from the index.  The benefit is that the indexes do not become unusable (in which case they would have to be rebuilt), but dropping the table partition takes longer because the rows have to be deleted from the index (effectively a DML operation).
As a general rule, indexes that contain the partitioning key, where at least the first partitioning key column is near the front of the index (I usually reckon within the first three key columns), should be locally partitioned unless there is a reason not to.
    With the general ledger, I tend to create pairs of local indexes that match the reporting analysis criteria.  
    • One of each of the pair of indexes leads on LEDGER, FISCAL_YEAR, ACCOUNTING_PERIOD and then the other dimension columns.  This supports single period queries.
    • The other index leads on LEDGER, FISCAL_YEAR, then the other dimension columns and finally ACCOUNTING_PERIOD is last because we are interested in a range of periods.
To support single period queries:

CREATE INDEX psgledger ON ps_ledger
(ledger
,fiscal_year
,accounting_period
,business_unit
,account
,project_id
,book_code
) LOCAL;

To support year-to-date queries:

CREATE INDEX pshledger ON ps_ledger
(ledger
,fiscal_year
,business_unit
,account
,project_id
,book_code
,accounting_period
) LOCAL;
The unique index on the ledger table does include the partitioning keys, but FISCAL_YEAR and ACCOUNTING_PERIOD are the last 2 of 25 columns.  This index is really there to support queries from the on-line application and batch processes that post to the ledger.  So a query on, say, BUSINESS_UNIT would have to probe every partition.  Therefore, I generally don't partition this index.  It would be reasonable to globally partition it on LEDGER only.
    CREATE UNIQUE INDEX ps_ledger ON ps_ledger
    (business_unit,ledger,account,altacct,deptid
    ,operating_unit,product,fund_code,class_fld,program_code
    ,budget_ref,affiliate,affiliate_intra1,affiliate_intra2,chartfield1
    ,chartfield2,chartfield3,project_id,book_code,gl_adjust_type
    ,date_code,currency_cd,statistics_code,fiscal_year,accounting_period
    )…

    Archiving

Taken together, FISCAL_YEAR and ACCOUNTING_PERIOD are effectively a proxy for the date of the accounting period.  So we add new partitions over time, and can compress and eventually drop the old ones.
Once an accounting period has been closed it will not be written to again (or at least not much and not often), so it can then be compressed.  It can't be compressed before that because the application is still applying ordinary DML (unless the Advanced Compression option has been licensed).  This applies to both conventional dictionary compression and Hybrid Columnar Compression on Exadata.
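
Compressing a closed period is then a (sub-)partition-level rebuild.  A minimal sketch, reusing the sub-partition names from the DDL above (on Exadata you might specify a Hybrid Columnar Compression level such as COMPRESS FOR QUERY LOW instead):

ALTER TABLE ps_ledger MOVE SUBPARTITION ledger_2019_bf_xxx
PCTFREE 0 COMPRESS UPDATE INDEXES;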
Most reports are on the current and previous fiscal years.  Earlier years are candidates to be purged or archived by dropping or exchanging partitions.  Because you have global indexes, partitions should be dropped with the UPDATE INDEXES clause:
    ALTER TABLE ps_ledger DROP PARTITION ledger_2017 UPDATE INDEXES;

    Retrofitting Partitioning into Existing Applications: Example 2. Payroll

    This post is part of a series about the partitioning of database objects.

    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

Partitioning Payroll

    Range and List Partitioning brings similar data together and therefore keeps dissimilar data apart. This has implications for read-consistency as well as improving query performance by partition elimination. 
Hash Partitioning spreads rows roughly evenly across a number of partitions.  This can be used to mitigate contention problems.  It is recommended that the number of hash partitions be an integral power of 2 (i.e. 2, 4, 8, 16, etc.) because the partition is determined from a number of bits of the hash value, so the data distributes more evenly across the partitions.
Payroll calculation involves lots of computation per employee.  There isn't much opportunity for database parallelism.  The PeopleSoft Global Payroll (GP) calculation process works through employees in a sequential fashion.  Each payroll process only consumes a single processor at any one time.  In order to bring more resources to bear on the payroll, and therefore process it in less time, multiple payroll calculation processes are run concurrently, each one working on a distinct set of data.  In GP, the sets of data are ranges of employee IDs.  Each set is called a 'stream'.  The payroll processes are then configured to process a specific stream.  Most of the SQLs therefore have EMPLID BETWEEN criteria.
    DELETE /*GPPCANCL_D_ERNDALL*/ 
    FROM PS_GP_RSLT_ERN_DED
    WHERE EMPLID BETWEEN :1 AND :2
    AND CAL_RUN_ID=:3
    Typically, large companies run payroll calculation processes several times per pay period. Partly to see what the payroll value is in advance, and partly to see the effect of changes before the final payroll that is actually used to pay employees. Each concurrent payroll calculation process inserts data into result tables, also concurrently. So it is common for data blocks in result tables to contain data from many different streams. When the payroll is recalculated, results from previous payrolls are deleted (by statements such as the one above), also concurrently. You now have different transactions deleting different rows from the same data block. There is never any row-level locking because each row is only in scope for one and only one process. However, each delete transaction comes from a different process that created a different database session, that started at a slightly different time and therefore has a different System Change/Commit Number (SCN). Therefore, each payroll process needs its own read-consistent version of every data block that it reads, recovered back to its own SCN. So if I have 10 streams, I am likely to need 10 copies of every data block of every payroll-related table in the buffer cache. 
     The result is that the payroll runtime degrades very significantly with the number of concurrent processes to the extent that it quickly becomes worse than running a single process because the database
    1. spends a huge amount of time on read-consistency,
    2. is more likely to run out of buffer cache, so blocks are aged out, reloaded, and may have to be recovered back to the desired SCN again. 
However, if one can align the partitioning with the processing, then this behaviour can be eliminated.  If the payroll result tables (and some of the other application tables) are each range partitioned on EMPLID such that there is a 1:1 relationship of payroll stream to partition, then this problem does not occur: each stream references a single partition of each table, and each data block will only contain rows for one stream, so it can only ever have a single transaction.  Thus there is no requirement to produce a consistent version of a block.  The database only needs a single copy of each data block in memory.  The result is almost 100% scalability of payroll processing until, eventually, the file system cannot cope with the redo generation.
This approach relies absolutely on the application processing ranges of employees specified with BETWEEN criteria, and on those criteria mapping to one partition.  When implemented, the result is single range partition queries (a sketch of the partitioning follows the example).
    WHERE EMPLID BETWEEN :1 AND :2 
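
A minimal sketch of what stream-aligned partitioning might look like (the boundary values are illustrative, and CALC_RSLT_VAL is an invented column standing in for the many real result columns):

CREATE TABLE ps_gp_rslt_ern_ded
(emplid        VARCHAR2(11) NOT NULL
,cal_run_id    VARCHAR2(18) NOT NULL
,calc_rslt_val NUMBER                 /* illustrative result column */
)
PARTITION BY RANGE (emplid)
(PARTITION strm01 VALUES LESS THAN ('10000')
,PARTITION strm02 VALUES LESS THAN ('20000')
,PARTITION strm03 VALUES LESS THAN ('30000')
,PARTITION strm04 VALUES LESS THAN (MAXVALUE));

The stream that processes EMPLID BETWEEN '10000' AND '19999' then only ever touches partition STRM02, so its blocks are never shared with another stream's transactions.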
Both partitioning and application configuration had to change and meet somewhere in the middle.  The number of streams is limited by the hardware, usually the number of CPUs.  The streams are calculated to be of a size such that all of them take about the same time to process (which is not the same as containing the same number of employees).  It is necessary to allow for new employees being given new sequential employee IDs.  Therefore, there is also a need to periodically rebalance the streams as employees are hired and leave.  This becomes an annual process that is combined with archiving.
Some customers have avoided the annual rebalancing by reversing the sequentially generated employee ID before it is used, but you have to do this when the system is first implemented, and only if new employee IDs can be allocated.
However, this technique depends upon the application.  When I looked at PeopleSoft's North American Payroll (which is a completely different product), this approach did not work.  It does use multiple concurrent processes, but the employees are grouped logically by other business attributes.  We still see the read-consistency problems, but we can't resolve them with range partitioning.  So you see that understanding both partitioning and the application is essential.

    Sub-partitioning 

The results of each pay period accumulate over time.  In GP, each pay period has a calendar ID.  It is a character string, defined in the application.  So the larger payroll result tables can be sub-partitioned on CAL_RUN_ID.
When I first worked on Global Payroll, it was often run on Oracle 8i, where we only had hash sub-partitioning.  I could use dbms_utility.get_hash_value() to predict which hash partition a string value falls into (see also http://www.jlcomp.demon.co.uk/2d_parts.html from 1999).  I could therefore adjust the calendar ID values to manipulate which partition they fall into.
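
Today you could make a similar prediction with the documented ORA_HASH function.  A minimal sketch; it assumes (and you should verify on your own version) that for a power-of-two number of hash partitions, ORA_HASH with max_bucket set to partitions - 1 matches Oracle's hash partition assignment (the calendar ID value is illustrative):

REM Predict which of 8 hash sub-partitions a calendar ID would fall into
SELECT 'GBR2020M01' AS cal_run_id
,      ORA_HASH('GBR2020M01', 7) + 1 AS predicted_subpartition_position
FROM   dual;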
Today, I list sub-partition the tables on CAL_RUN_ID.  Most companies create and follow a naming convention for their calendar IDs, so the list sub-partitions can be created in advance, and it is simply a matter of listing the calendar(s) that go into each partition.  In some cases, for large companies, I have created a list sub-partition for each pay period.