The /*+Go-Faster*/ Oracle Blog

Oracle 19c Real-Time and High-Frequency Automatic Statistics Collection

I gave this presentation at the UKOUG Techfest 19 conference.  This video was produced as a part of the preparation for that session.  The slide deck is also available on my website.

It takes a look at the pros and cons of these new 19c features, which are only available on Engineered Systems.  Both features aim to address the challenge of querying data that has been significantly updated before the statistics maintenance window has had a chance to run again.
  • Real-Time Statistics uses table monitoring to augment existing statistics with simple corrections
  • High-Frequency Automatic Optimizer Statistics is an extra statistics maintenance window that runs regularly to update the stalest statistics.
As your statistics change, so SQL execution plans, and therefore application performance, may also change. DBAs and developers need to be aware of the implications.
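Real-Time Statistics leave a footprint in the data dictionary: rows are flagged in the NOTES column of the statistics views. Here is a minimal check, a sketch assuming a 19c database where the feature is active:
SELECT table_name, num_rows, notes, last_analyzed
FROM user_tab_statistics
WHERE notes = 'STATS_ON_CONVENTIONAL_DML';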

Extended Column Group Statistics, Composite Index Statistics, Histograms and an EDB360 Enhancement to Detect the Coincidence

In this post:
  • A simple demonstration to show the behaviour of extended statistics and how they can be disabled by the presence of histograms.  None of this is new; there are many other blog posts on this topic, and I provide links to some of them.
  • I have added an enhancement to the EDB360 utility to detect histograms on columns in extended statistics.

Introduction

'Extended statistics were introduced in Oracle 11g to allow statistics to be gathered on groups of columns, to highlight the relationship between them, or on expressions. Oracle 11gR2 makes the process of gathering extended statistics for column groups easier'. [Tim Hall: https://oracle-base.com/articles/11g/extended-statistics-enhancements-11gr2]

Example 1: Cardinality from Extended Statistics

Without extended statistics, Oracle will simply multiply the individual column selectivities together.  Here is a simple example.  I will create a table with 10000 rows, where two columns always hold the same value, one of 100 distinct values, so they correlate perfectly.  I will gather statistics, but no histograms.
create table t
(k number
,a number
,b number
,x varchar2(1000)
);

insert into t
with n as (select rownum n from dual connect by level <= 100)
select rownum, a.n, a.n, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
from n a, n b
order by a.n, b.n;

exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE 1');
I will deliberately disable optimizer feedback so that Oracle cannot learn from experience about the cardinality misestimates.
alter session set statistics_level=ALL;
alter session set "_optimizer_use_feedback"=FALSE;

select count(*) from t where a = 42 and b=42;

COUNT(*)
----------
100
Oracle estimates that it will get 1 row but actually gets 100.
It estimates 1 row because 1/100 * 1/100 * 10000 rows = 1.
select * from table(dbms_xplan.display_cursor(null,null,format=>'ADVANCED +ALLSTATS LAST, IOSTATS -PROJECTION -OUTLINE'));

Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 26 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 1 | 26 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------
Now I will create extended statistics on the column group.  I can do that in one of two ways:
  • either by explicitly creating the extension definition and then gathering statistics on it:
SELECT dbms_stats.create_extended_stats(ownname=>user, tabname=>'t', extension=>'(a,b)')
FROM dual;
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE 1');
  • Or, I can create extended statistics directly, in one go, by specifying them in the METHOD_OPT parameter.
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE 1, FOR COLUMNS SIZE 1 (A,B)');
Now Oracle correctly estimates that the same query will fetch 100 rows because it directly knows the cardinality for the two columns in the query.
Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------

Example 2: Cardinality from Index Statistics

I can get exactly the same effect by creating an index on the two columns.
drop table t purge;

create table t
(k number
,a number
,b number
,x varchar2(1000)
);

insert into t
with n as (select rownum n from dual connect by level <= 100)
select rownum, a.n, a.n, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
from n a, n b
order by a.n, b.n;

create index t_ab on t(a,b) compress;
This time I have not collected any statistics on the table.  Statistics are automatically collected on the index when it is built.  I have used a hint to stop the query from using the index to look up the rows; nonetheless, Oracle has correctly estimated that it will get 100 rows because it has used the number of distinct keys from the index statistics.
SQL_ID  711banpfgfa18, child number 0
-------------------------------------
select /*+FULL(t)*/ count(*) from t where a = 42 and b=42

Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 74 |
| 1 | SORT AGGREGATE | | 1 | 1 | 26 | | | 1 |00:00:00.01 | 74 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 2600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 74 |
---------------------------------------------------------------------------------------------------------------------
Note that there is nothing in the execution plan to indicate that the index statistics were used to estimate the number of rows returned!
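The source of the estimate is, however, visible in the index statistics: the optimizer can divide NUM_ROWS by DISTINCT_KEYS (10000/100 = 100). A quick check:
SELECT index_name, num_rows, distinct_keys
FROM user_indexes
WHERE table_name = 'T';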

Example 3: Histograms Disable the Use of Extended Statistics

There have long been blogs that refer to behaviour that Oracle has documented as Bug 6972291: Column group selectivity is not used when there is a histogram on one column:
'As of 10.2.0.4 CBO can use the selectivity of column groups but this option is disabled if there is a histogram defined on any of the columns of the column group.
Note:  This fix is disabled by default. To enable the fix set "_fix_control"="6972291:ON"
When ENABLED the code will use multi-column stats regardless of whether there is a histogram on one of the columns or not.  When DISABLED (default) CBO will not use multi-column stats if there is a histogram on one of the columns in the column group.'
  • Christian Antognini, 2014: https://antognini.ch/2014/02/extension-bypassed-because-of-missing-histogram/
  • Jonathan Lewis, 2012: https://jonathanlewis.wordpress.com/2012/04/11/extended-stats/
    • Maria Colgan also commented: 'This … was a deliberate design decision to prevent over-estimations when one of the values supplied is ‘out of range’. We can’t ignore the ‘out of range’ scenario just because we have a column group. Extended statistics do not contain the min, max values for the column group so we rely on the individual column statistics to check for ‘out of range’ scenarios like yours. When one of the columns is ‘out of range’, we revert back to the column statistics because we know it is going to generate a lower selectivity range and if one of the columns is ‘out of range’ then the number of rows returned will be lower or none at all, as in your example'
In this example, I explicitly create a histogram on one of the columns in my extended statistics.  However, in the real world that can happen automatically if the application references one column and not another.
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 100 A, FOR COLUMNS SIZE 1 B (A,B)');
My cardinality estimate goes back to 1 because Oracle does not use the extended statistics in the presence of a histogram on any of the constituent columns.  Exactly the same happens if the number of distinct values on the combination of columns comes from composite index statistics.  A histogram similarly disables their use.
SQL_ID  8trj2kacqhm6f, child number 1
-------------------------------------
select count(*) from t where a = 42 and b=42

Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 1 | 6 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------
This is likely to happen in real-life systems because histograms can be automatically created when statistics are gathered.
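A quick way to check whether statistics gathering has quietly introduced histograms on the members of a column group is to query the column statistics, for example:
SELECT column_name, num_distinct, num_buckets, histogram
FROM user_tab_col_statistics
WHERE table_name = 'T'
ORDER BY column_name;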

Out-of-Range Predicates

If you have one or more predicates on columns that are part of an extended statistics column group, and a predicate goes out of range when compared to the column statistics, then Oracle still doesn't use the extended statistics (see also https://jonathanlewis.wordpress.com/2012/04/11/extended-stats/), irrespective of whether there is a histogram, and irrespective of whether fix control 6972291 is set.
The extended histogram uses a virtual column whose value is derived from SYS_OP_COMBINED_HASH().  You can see this in the default data value for the column.  Therefore the optimizer cannot use the minimum/maximum value (see also https://jonathanlewis.wordpress.com/2018/08/02/extended-histograms-2/).
Instead, Oracle applies linear decay to the density of the column predicates; if there is a frequency or top-frequency histogram, it takes half the density of the lowest-frequency bucket and applies linear decay to that.
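You can see the SYS_OP_COMBINED_HASH() expression in the DATA_DEFAULT of the hidden virtual column that implements the extension, for example:
SELECT column_name, data_default
FROM user_tab_cols
WHERE table_name = 'T'
AND hidden_column = 'YES';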

Example 4: Extended Histograms

This time I will create a histogram on my extended statistics as well as histograms on the underlying columns.
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 100 A B (A,B)');
I am back to getting the correct cardinality estimate.
SQL_ID  8trj2kacqhm6f, child number 0
-------------------------------------
select count(*) from t where a = 42 and b=42

Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------
This, too, is something that has been blogged about previously.

Threats

In this blog, Jonathan Lewis comments (https://jonathanlewis.wordpress.com/2012/04/11/extended-stats/) on certain weaknesses in the implementations.  He also references other bloggers.
Either creating or removing histograms on columns in either extended statistics or composite indexes may result in execution plans changing, because whether the optimizer can use those column group statistics may change.  This could happen automatically when gathering statistics, as data skew and predicate usage change.
If I drop a composite index, maybe because it is not used, or because it is redundant as a subset of another index, then I should replace it with extended statistics on the same set of columns.
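A sketch of that replacement, using the T_AB index from Example 2:
DROP INDEX t_ab;
SELECT dbms_stats.create_extended_stats(ownname=>user, tabname=>'T', extension=>'(a,b)')
FROM dual;
exec dbms_stats.gather_table_stats(user,'T');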

Detecting the Problem

I have added a report to section 3c of EDB360 to detect the problem.  The SQL query is shown below.  It will report on histograms on columns in either:
  • composite indexes where there are no extended column group statistics, or
  • extended column group statistics for which there are no histograms.

3c.25. Columns with Histograms in Extended Statistics (DBA_STAT_EXTENSIONS)

 # Table Table       Object    Index/Extension Name           Num      Num     Extension                       Column       Col Num  Col Num Col
   Owner Name        Type                                     Distinct Buckets                                 Name         Distinct Buckets Histogram
-- ----- ----------- --------- ------------------------------ -------- ------- ------------------------------- ------------ -------- ------- ---------
 1 HR    JOB_HISTORY Index     JHIST_EMP_ID_ST_DATE_PK              10         ("EMPLOYEE_ID","START_DATE")    EMPLOYEE_ID         7       7 FREQUENCY
 2 OE    INVENTORIES Index     INVENTORY_IX                       1112         ("WAREHOUSE_ID","PRODUCT_ID")   PRODUCT_ID        208     208 FREQUENCY
 3 OE    ORDER_ITEMS Index     ORDER_ITEMS_PK                      665         ("ORDER_ID","LINE_ITEM_ID")     ORDER_ID          105     105 FREQUENCY
 4 OE    ORDER_ITEMS Index     ORDER_ITEMS_UK                      665         ("ORDER_ID","PRODUCT_ID")       ORDER_ID          105     105 FREQUENCY
 5 OE    ORDER_ITEMS Index     ORDER_ITEMS_UK                      665         ("ORDER_ID","PRODUCT_ID")       PRODUCT_ID        185     185 FREQUENCY
 6 SCOTT T           Extension SYS_STUNA$6DVXJXTP05EH56DTIR0X      100       1 ("A","B")                       A                 100     100 FREQUENCY
 7 SCOTT T           Extension SYS_STUNA$6DVXJXTP05EH56DTIR0X      100       1 ("A","B")                       B                 100     100 FREQUENCY
 8 SOE   ORDERS      Index     ORD_WAREHOUSE_IX                  10270         ("WAREHOUSE_ID","ORDER_STATUS") ORDER_STATUS       10      10 FREQUENCY
 9 SOE   ORDER_ITEMS Index     ORDER_ITEMS_PK                 13758515         ("ORDER_ID","LINE_ITEM_ID")     LINE_ITEM_ID        7       7 FREQUENCY

Just because something is reported by this test does not necessarily mean that you need to change anything.
  • Provided fix control 6972291 is not enabled, should you wish to drop or alter any reported index, you at least know that it cannot be used to provide column group statistics, though you would still need to consider SQL that might use the index directly.
  • You might choose to add column group histograms, and sometimes that will involve adding the column group statistics first.  However, the number of distinct values on the column group will usually be higher than on the individual columns and can easily be greater than the number of buckets you can have in a frequency histogram.  In such cases, from 12c, you may end up with either a top-frequency or a hybrid histogram.
  • Or you might choose to remove the histograms from the individual columns so that the column group statistics are used.
  • Or you might choose to enforce the status quo by setting table statistics preferences that preserve currently existing histograms and do not introduce currently non-existent ones.
Whatever you choose to do regarding statistics and histogram collection, I would certainly recommend doing so declaratively, by defining a table statistics preference.  For example, here I will preserve the histograms on the columns in the column group, but I will also build a histogram on the column group:
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 A B (A,B)');
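The same METHOD_OPT can be stored as a table statistics preference, so that every subsequent gather behaves the same way; a sketch:
exec dbms_stats.set_table_prefs(user,'T','METHOD_OPT','FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 A B (A,B)');
exec dbms_stats.gather_table_stats(user,'T');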
  • Or, you might even enable the fix_control. You can also do that at session level or even statement level (but beware of disabling any other fix controls that may be set). 
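At session level, that is the underscore parameter quoted in the bug note above:
ALTER SESSION SET "_fix_control" = '6972291:ON';
The statement-level equivalent uses the OPT_PARAM hint: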
SQL_ID  16judk2v0uf7w, child number 0
-------------------------------------
select /*+FULL(t) OPT_PARAM('_fix_control','6972291:on')*/ count(*)
from t where a = 42 and b=42

Plan hash value: 1071362934
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 21 (100)| | 1 |00:00:00.01 | 73 |
| 1 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 73 |
|* 2 | TABLE ACCESS FULL| T | 1 | 100 | 600 | 21 (0)| 00:00:01 | 100 |00:00:00.01 | 73 |
---------------------------------------------------------------------------------------------------------------------

Outline Data
-------------

/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
OPT_PARAM('_fix_control' '6972291:1')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1""T"@"SEL$1")
END_OUTLINE_DATA
*/

EDB360 Test Query

This is the SQL query that produces the report in EDB360.
WITH i as ( /*composite indexes*/
SELECT i.table_owner, i.table_name, i.owner index_owner, i.index_name, i.distinct_keys
, '('||(LISTAGG('"'||c.column_name||'"',',') WITHIN GROUP (order by c.column_position))||')' column_list
FROM dba_indexes i
, dba_ind_columns c
WHERE i.table_owner = c.table_owner
AND i.table_name = c.table_name
AND i.owner = c.index_owner
AND i.index_name = c.index_name
AND i.table_name NOT LIKE 'BIN$%'
AND i.table_owner NOT IN ('ANONYMOUS','APEX_030200','APEX_040000','APEX_040200','APEX_SSO','APPQOSSYS','CTXSYS','DBSNMP','DIP','EXFSYS','FLOWS_FILES','MDSYS','OLAPSYS','ORACLE_OCM','ORDDATA','ORDPLUGINS','ORDSYS','OUTLN','OWBSYS')
AND i.table_owner NOT IN ('SI_INFORMTN_SCHEMA','SQLTXADMIN','SQLTXPLAIN','SYS','SYSMAN','SYSTEM','TRCANLZR','WMSYS','XDB','XS$NULL','PERFSTAT','STDBYPERF','MGDSYS','OJVMSYS')
GROUP BY i.table_owner, i.table_name, i.owner, i.index_name, i.distinct_keys
HAVING COUNT(*) > 1 /*index with more than one column*/
), e as ( /*extended stats*/
SELECT e.owner, e.table_name, e.extension_name
, CAST(e.extension AS VARCHAR(1000)) extension
, se.histogram, se.num_buckets, se.num_distinct
FROM dba_stat_extensions e
, dba_tab_col_statistics se
WHERE e.creator = 'USER'
AND se.owner = e.owner
AND se.table_name = e.table_name
AND se.column_name = e.extension_name
AND e.table_name NOT LIKE 'BIN$%'
AND e.owner NOT IN ('ANONYMOUS','APEX_030200','APEX_040000','APEX_040200','APEX_SSO','APPQOSSYS','CTXSYS','DBSNMP','DIP','EXFSYS','FLOWS_FILES','MDSYS','OLAPSYS','ORACLE_OCM','ORDDATA','ORDPLUGINS','ORDSYS','OUTLN','OWBSYS')
AND e.owner NOT IN ('SI_INFORMTN_SCHEMA','SQLTXADMIN','SQLTXPLAIN','SYS','SYSMAN','SYSTEM','TRCANLZR','WMSYS','XDB','XS$NULL','PERFSTAT','STDBYPERF','MGDSYS','OJVMSYS')
)
SELECT e.owner, e.table_name
, 'Extension' object_type
, e.extension_name object_name, e.num_distinct, e.num_buckets, e.extension
, sc.column_name
, sc.num_distinct col_num_distinct
, sc.num_buckets col_num_buckets
, sc.histogram col_histogram
FROM e
, dba_tab_col_statistics sc
WHERE e.histogram = 'NONE'
AND e.extension LIKE '%"'||sc.column_name||'"%'
AND sc.owner = e.owner
AND sc.table_name = e.table_name
AND sc.histogram != 'NONE'
AND sc.num_buckets > 1 /*histogram on column*/
AND e.num_buckets = 1 /*no histogram on extended stats*/
UNION ALL
SELECT /*+ NO_MERGE */ /* 3c.25 */
i.table_owner, i.table_name
, 'Index' object_type
, i.index_name object_name, i.distinct_keys, TO_NUMBER(null), i.column_list
, sc.column_name
, sc.num_distinct col_num_distinct
, sc.num_buckets col_num_buckets
, sc.histogram col_histogram
FROM i
, dba_ind_columns ic
, dba_tab_col_statistics sc
WHERE ic.table_owner = i.table_owner
AND ic.table_name = i.table_name
AND ic.index_owner = i.index_owner
AND ic.index_name = i.index_name
AND sc.owner = i.table_owner
AND sc.table_name = ic.table_name
AND sc.column_name = ic.column_name
AND sc.histogram != 'NONE'
AND sc.num_buckets > 1 /*histogram on column*/
AND NOT EXISTS( /*report index if no extension*/
SELECT 'x'
FROM e
WHERE e.owner = i.table_owner
AND e.table_name = i.table_name
AND e.extension = i.column_list)
ORDER BY 1,2,3,4;

Sparse Indexing

$
0
0
This is the first of two blog posts that discuss sparse and partial indexing.

Problem Statement

It is not an uncommon requirement to find rows that match a rare value in a column with a small number of distinct values; that is, the distribution of values is skewed.  A typical example is a status column, where an application processes newer rows that are a relatively small proportion of the table because, over time, the majority of rows have been processed and are at the final status.
An index is effective at finding the rare values, but it is usually more efficient to scan the table for the common values.  A histogram would almost certainly be required on such a column.  However, if you build an index on the column, you have to index all the rows.  The index is, therefore, larger and requires more maintenance.  Could we not index just the rare values that we want the index to find?
  • Oracle does not index null values. If we could engineer that the common value was null, then the index would only contain the rare values.  This is sometimes called sparse indexing and is discussed in this blog.
  • Or we could separate the rare and common values into different index partitions, and build only the index partition(s) for the rare values.  This is called partial indexing and is discussed in the next blog.
As usual, this is not a new subject and other people have written extensively on these subjects, and I will provide links.  However, I want to draw some of the issues together.

Sparse Indexing

The ideas discussed in this section are based on the principle that Oracle indexes do not include rows where all of the key values are null.

Store Null Values in the Database?

One option is to engineer the application to use null as the common status value.  However, this means that the column in question has to be nullable, and you may require different logic because the comparison to null is always false.
CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1)
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
);

INSERT /*+APPEND*/ INTO t
SELECT rownum
, CASE WHEN rownum<=1e6-42 THEN NULL /*common status*/
       WHEN rownum<=1e6-10 THEN 'A'
       ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp') /*other column*/
FROM dual
CONNECT BY level <= 1e6;

CREATE INDEX t_status ON t (status);
exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');
I have created a test table with 1000000 rows. 10 rows have status R, and 32 rows have status A. The rest have status NULL. I have indexed the status column, and also created a histogram on it when I collected statistics.
SELECT status, COUNT(*)
FROM t
GROUP BY status
/

S COUNT(*)
- ----------
999958
R 10
A 32
I can see from the statistics that I have 1000000 rows in the primary key index, but only 42 rows in the status index because it only contains the not null values. Therefore, it is much smaller, having only a single leaf block, whereas the primary key index has 1875 leaf blocks.
SELECT index_name, num_rows, leaf_blocks FROM user_indexes WHERE table_name = 'T';

INDEX_NAME NUM_ROWS LEAF_BLOCKS
---------- ---------- -----------
T_PK 1000000 1875
T_STATUS 42 1
There are some problems with this approach.

Not All Index Columns are Null 
If any of the index columns are not null, then there is an entry in the index for the row, and there is no saving of space. It is not uncommon to add additional columns to such an index, either for additional filtering, or to avoid accessing the table by satisfying the query from the index.
CREATE INDEX t_status2 ON t (status,other);
SELECT index_name, num_rows, leaf_blocks FROM user_indexes WHERE table_name = 'T' ORDER BY 1;

INDEX_NAME NUM_ROWS LEAF_BLOCKS
---------- ---------- -----------
T_PK 1000000 1875
T_STATUS 42 1
T_STATUS2 1000000 9081

Null Logic
If, for example, I want to find the rows that do not have status A, then a simple inequality does not find the null statuses because comparison to null is always false.
SELECT status, COUNT(*)
FROM t
WHERE status != 'A'
GROUP BY status
/

S COUNT(*)
- ----------
R 10
Instead, I would have to explicitly code for the null values.
SELECT status, COUNT(*)
FROM t
WHERE status != 'A' OR status IS NULL
GROUP BY status
/

S COUNT(*)
- ----------
999958
R 10
This additional complexity is certainly one reason why developers shy away from this approach in custom applications. It is almost impossible to retrofit it into an existing or packaged application. 

Function-Based Indexes 

It is possible to build an index on a function, such that the function evaluates to null for the common values. This time my test table still has 1,000,000 rows, but the status column is now not nullable.
CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1) NOT NULL
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
)
/
INSERT /*+APPEND*/ INTO t
SELECT rownum
, CASE WHEN rownum<=1e6-42 THEN 'C'
       WHEN rownum<=1e6-10 THEN 'A'
       ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;
exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');
10 rows have status R, and 32 rows have status A. The rest have status C.
SELECT status, COUNT(*)
FROM t
GROUP BY status
/
S COUNT(*)
- ----------
R 10
C 999958
A 32
I will build a simple index on status, and a second index on a function of status that decodes the common status C back to NULL:
CREATE INDEX t_status ON t (status);
CREATE INDEX t_status_fn ON t (DECODE(status,'C',NULL,status));
As before, with the null column, the function-based index has only a single leaf block; the other indexes are much larger because they contain all 1 million rows.
SELECT index_name, index_type, num_rows, leaf_blocks 
from user_indexes WHERE table_name = 'T' ORDER BY 1;

INDEX_NAME INDEX_TYPE NUM_ROWS LEAF_BLOCKS
------------ --------------------------- ---------- -----------
T_PK NORMAL 1000000 1875
T_STATUS NORMAL 1000000 1812
T_STATUS_FN FUNCTION-BASED NORMAL 42 1
If I query the table for the common status, Oracle quite reasonably full scans the table.
SELECT COUNT(other) FROM t WHERE status='C';

COUNT(OTHER)
------------
999958

Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2446 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
|* 2 | TABLE ACCESS FULL| T | 999K| 52M| 2446 (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS"='C')
If I query for the rare status value, it will use the normal index to look that up.
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 1997248105
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 550 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS | 10 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("STATUS"='R')
Now I will make that index invisible, and the optimizer can only choose to full scan the table. It cannot use the function-based index because the query does not match the function.
ALTER INDEX t_status INVISIBLE;
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2445 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
|* 2 | TABLE ACCESS FULL| T | 10 | 550 | 2445 (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS"='R')
Instead, I must change the query to reference the function in the function-based index, and then the optimizer chooses the function-based index, even if I make the normal index visible again. Note that the function is shown in the access operation in the predicate section.
ALTER INDEX t_status VISIBLE;
SELECT COUNT(other) FROM t WHERE DECODE(status,'C',null,status)='R';

COUNT(OTHER)
------------
10

Plan hash value: 2511618215
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 21 | 1155 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS_FN | 21 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(DECODE("STATUS",'C',NULL,"STATUS")='R')

Invisible Virtual Columns 

Function-based indexes are implemented using a hidden virtual column. You can even reference that virtual column in a query. However, the name of the column is system-generated, so you may not want to include it in your application.
SELECT * FROM user_stat_extensions WHERE table_name = 'T';

TABLE_NAME EXTENSION_NAME EXTENSION CREATO DRO
---------- --------------- ---------------------------------------- ------ ---
T SYS_NC00004$ (DECODE("STATUS",'C',NULL,"STATUS")) SYSTEM NO

SELECT SYS_NC00004$, COUNT(*) FROM t group by SYS_NC00004$;

S COUNT(*)
- ----------
999958
R 10
A 32
Instead, you could create a virtual column and then index it. The resulting index is still function-based because it references the function inside the virtual column. From Oracle 12c, it is also possible to make a column invisible. I would recommend doing so in case you have any insert statements without explicit column lists; otherwise, you might get ORA-00947: not enough values.
ALTER TABLE t ADD virtual_status VARCHAR2(1) INVISIBLE
GENERATED ALWAYS AS (DECODE(status,'C',null,status));
CREATE INDEX t_status_virtual ON t (virtual_status);

SELECT index_name, index_type, num_rows, leaf_blocks FROM user_indexes WHERE table_name = 'T' ORDER BY 1;

INDEX_NAME INDEX_TYPE NUM_ROWS LEAF_BLOCKS
---------------- --------------------------- ---------- -----------
T_PK NORMAL 1000000 1875
T_STATUS NORMAL 1000000 1812
T_STATUS_VIRTUAL FUNCTION-BASED NORMAL 42 1
The only difference between this and the previous function-based index example is that now you can control the name of the virtual column, and you can easily reference it in the application.
If you have only ever referenced the virtual column in the application, and never the function, then it is also easy to change the function, although you would have to rebuild the index.
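A sketch of such a change, not run as part of this demonstration, assuming a hypothetical second common status 'X' that should also be excluded; one simple way is to drop and recreate both the index and the virtual column:
DROP INDEX t_status_virtual;
ALTER TABLE t DROP COLUMN virtual_status;
ALTER TABLE t ADD virtual_status VARCHAR2(1) INVISIBLE
GENERATED ALWAYS AS (DECODE(status,'C',NULL,'X',NULL,status));
CREATE INDEX t_status_virtual ON t (virtual_status);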
SELECT COUNT(other) FROM t WHERE virtual_status='R';

COUNT(OTHER)
------------
10

Plan hash value: 3855131553
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 21 | 1155 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS_VIRTUAL | 21 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("VIRTUAL_STATUS"='R')
If you have already created function-based indexes and referenced the function in the application you can replace them with an index on a named virtual column and the index will still be used.
SELECT COUNT(other) FROM t WHERE DECODE(status,'C',null,status)='R';

COUNT(OTHER)
------------
10

Plan hash value: 3855131553
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 21 | 1155 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_STATUS_VIRTUAL | 21 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T"."VIRTUAL_STATUS"='R')

Conclusion 

A function-based index, preferably on an explicitly named and created virtual column, will permit you to build an index on just the rare values in a column. Making the virtual column invisible will prevent errors during insert statements without explicit column lists. However, you will still need to alter the application SQL to reference either the virtual column or the function that generates it.

Partial Indexing

$
0
0
This is the second of two blog posts that discuss sparse and partial indexing.

Problem Statement

(This is the same problem statement as for sparse indexing.)
It is not an uncommon requirement to find rows that match a rare value in a column with a small number of distinct values; that is, the distribution of values is skewed.  A typical example is a status column, where an application processes newer rows that are a relatively small proportion of the table because, over time, the majority of rows have been processed and are at the final status.
An index is effective at finding the rare values, but it is usually more efficient to scan the table for the common values.  A histogram would almost certainly be required on such a column.  However, if you build an index on the column, you have to index all the rows.  The index is, therefore, larger and requires more maintenance.  Could we not index just the rare values that we want the index to find?
  • Oracle does not index null values. If we could engineer that the common value was null, then the index would only contain the rare values.  This is sometimes called sparse indexing and was discussed in the previous blog.
  • Or we could separate the rare and common values into different index partitions, and build only the index partition(s) for the rare values.  This is called partial indexing and is discussed in this blog.
As usual, this is not a new subject and other people have written extensively on these subjects, and I will provide links.  However, I want to draw some of the issues together.

Partition Table and Locally Partitioned Partial Index 

I could partition the table on the status column. Here, I have used list partitioning because the common status sorts between the two rare statuses, so I only need two partitions, not three. From Oracle 12.1, I can specify INDEXING ON and OFF on the table and on certain partitions, so that later I can build partial local indexes only on some partitions.
CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1) NOT NULL
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
) INDEXING OFF
PARTITION BY LIST (status)
(PARTITION t_status_rare VALUES ('R','A') INDEXING ON
,PARTITION t_status_common VALUES (DEFAULT)
) ENABLE ROW MOVEMENT
/
INSERT /*+APPEND*/ INTO t --(key, status)
SELECT rownum
, CASE WHEN rownum<=1e6-1000 THEN 'C'
       WHEN rownum<=1e6-10 THEN 'A'
       ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;
exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');
Here Oracle eliminated the common status partition and only scanned the rare status partition (partition 1). Note that I don't even have an index at this point.  So simply partitioning the table can be effective.
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 4 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 58 | | | | |
| 2 | PARTITION LIST SINGLE| | 10 | 580 | 4 (0)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | T | 10 | 580 | 4 (0)| 00:00:01 | 1 | 1 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("STATUS"='R')

However, now when the application updates the status from R (rare) to C (common), the row must be moved between partitions. It is necessary to enable row movement on the table; otherwise, an error will be generated. There is additional overhead in moving the row: it is effectively deleted from one partition and inserted into the other.
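For example, this update moves a row from the rare partition to the common one; without ENABLE ROW MOVEMENT it would fail with ORA-14402 (a sketch, rolled back so the demonstration data is unchanged):
UPDATE t SET status = 'C' WHERE key = 1e6; --this row currently has status 'R'
ROLLBACK;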
In this test, I have increased the frequency of one of the rare statuses. Otherwise, the optimizer determines that it is cheaper just to scan the table partition than use the index!
SELECT status, COUNT(*)
FROM t
GROUP BY status
/
S COUNT(*)
- ----------
R 10
A 990
C 999000
Note that I have already specified INDEXING OFF on the table and INDEXING ON on the rare statuses partition. Now I can just build a locally partitioned partial index.
CREATE INDEX t_status ON t(status) LOCAL INDEXING PARTIAL;
Note that only partition T_STATUS_RARE is physically built, and it only contains a single extent. Partition T_STATUS_COMMON exists, is unusable and the segment has not been physically built. It contains no rows and no leaf blocks.
SELECT partition_name, status, num_rows, leaf_blocks
from user_ind_partitions where index_name = 'T_STATUS';

PARTITION_NAME STATUS NUM_ROWS LEAF_BLOCKS
-------------------- -------- ---------- -----------
T_STATUS_COMMON UNUSABLE 0 0
T_STATUS_RARE USABLE 1000 2

SELECT segment_name, partition_name, blocks
FROM user_segments WHERE segment_name = 'T_STATUS';

SEGMENT_NAME PARTITION_NAME BLOCKS
------------ -------------------- ----------
T_STATUS T_STATUS_RARE 8

SELECT segment_name, partition_name, segment_type, extent_id, blocks
FROM user_extents WHERE segment_name = 'T_STATUS';

SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE EXTENT_ID BLOCKS
------------ -------------------- ------------------ ---------- ----------
T_STATUS T_STATUS_RARE INDEX PARTITION 0 8
Scans for the common status value can only full scan the table partition because there is no index to use.
SELECT COUNT(other) FROM t WHERE status='C';

COUNT(OTHER)
------------
999000

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2444 (1)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 55 | | | | |
| 2 | PARTITION LIST SINGLE| | 998K| 52M| 2444 (1)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | T | 998K| 52M| 2444 (1)| 00:00:01 | 2 | 2 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("STATUS"='C')
To query the rare value Oracle does use the index on the rare values partition.
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 3051124889
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 2 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 58 | | | | |
| 2 | PARTITION LIST SINGLE | | 10 | 580 | 2 (0)| 00:00:01 | KEY | KEY |
| 3 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T | 10 | 580 | 2 (0)| 00:00:01 | 1 | 1 |
|* 4 | INDEX RANGE SCAN | T_STATUS | 10 | | 1 (0)| 00:00:01 | 1 | 1 |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("STATUS"='R')
However, it is not worth using the index for the slightly more common status A.  Here, Oracle full scans the table partition.
SELECT COUNT(other) FROM t WHERE status='A';

COUNT(OTHER)
------------
990

Plan hash value: 2831600127
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 4 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 58 | | | | |
| 2 | PARTITION LIST SINGLE| | 990 | 57420 | 4 (0)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | T | 990 | 57420 | 4 (0)| 00:00:01 | 1 | 1 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("STATUS"='A')
Note that to use partial indexing I also had to partition the table.

Globally Partitioned Index with Zero-Sized Unusable Partitions 

Since Oracle 11.2.0.4, it has been possible to achieve the same effect without partitioning the table, thus avoiding the overhead of row movement.
This technique also worked in earlier versions, but Oracle built a single extent for each unusable partition.
Here, I will recreate a non-partitioned table.
CREATE TABLE t 
(key NUMBER NOT NULL
,status VARCHAR2(1) NOT NULL
,other VARCHAR2(1000)
,CONSTRAINT t_pk PRIMARY KEY(key)
)
/
INSERT /*+APPEND*/ INTO t
SELECT rownum
, CASE WHEN rownum<=1e6-1000 THEN 'C'
       WHEN rownum<=1e6-10 THEN 'A'
       ELSE 'R' END
, TO_CHAR(TO_DATE(rownum,'J'),'Jsp')
FROM dual
CONNECT BY level <= 1e6;

exec sys.dbms_stats.gather_table_stats(user,'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 status');

SELECT status, COUNT(*)
FROM t
GROUP BY status
/

S COUNT(*)
- ----------
R 10
C 999000
A 990
It is not possible to create a globally list-partitioned index. Oracle simply does not support it.
CREATE INDEX t_status ON t(status)
GLOBAL PARTITION BY LIST (status)
(PARTITION t_status_rare VALUES ('R','A')
,PARTITION t_status_common VALUES (DEFAULT)
);

GLOBAL PARTITION BY LIST (status)
*
ERROR at line 2:
ORA-14151: invalid table partitioning method
You can create a globally range- or hash-partitioned index.  However, it is unlikely that the hash values of the column will break down conveniently into particular partitions.  In this example, I would still have needed to create 4 hash partitions and still build 2 of them.
WITH x as (
SELECT status, COUNT(*) freq
FROM t
GROUP BY status
) SELECT x.*
, dbms_utility.get_hash_value(status,0,2)
, dbms_utility.get_hash_value(status,0,4)
FROM x
/

S FREQ DBMS_UTILITY.GET_HASH_VALUE(STATUS,0,2) DBMS_UTILITY.GET_HASH_VALUE(STATUS,0,4)
- ---------- --------------------------------------- ---------------------------------------
R 990 1 1
C 1009000 0 0
A 10 0 2
It is easier to create a globally range-partitioned index, although in my example the common status lies between the two rare statuses, so I need to create three partitions.  I will create the index unusable and then rebuild the two rare status partitions.
CREATE INDEX t_status ON t(status)
GLOBAL PARTITION BY RANGE (status)
(PARTITION t_status_rare1 VALUES LESS THAN ('C')
,PARTITION t_status_common VALUES LESS THAN ('D')
,PARTITION t_status_rare2 VALUES LESS THAN (MAXVALUE)
) UNUSABLE;
ALTER INDEX t_status REBUILD PARTITION t_status_rare1;
ALTER INDEX t_status REBUILD PARTITION t_status_rare2;
The index partition for the common status is unusable so Oracle can only full scan the table.
SELECT COUNT(other) FROM t WHERE status='C';

COUNT(OTHER)
------------
999000

Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2445 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 55 | | |
|* 2 | TABLE ACCESS FULL| T | 999K| 52M| 2445 (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS"='C')
However, for the rare statuses, Oracle scans the index and looks up each of the table rows.
SELECT COUNT(other) FROM t WHERE status='R';

COUNT(OTHER)
------------
10

Plan hash value: 2558590380
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 55 | 2 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 55 | | | | |
| 2 | PARTITION RANGE SINGLE | | 10 | 550 | 2 (0)| 00:00:01 | 3 | 3 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 550 | 2 (0)| 00:00:01 | | |
|* 4 | INDEX RANGE SCAN | T_STATUS | 10 | | 1 (0)| 00:00:01 | 3 | 3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("STATUS"='R')

Conclusion 

The advantage of this global partitioning approach is that it does not require any change to application code, and it does not involve partitioning the table. However, you will have to remember not to rebuild the unusable partitions; otherwise, they will have to be maintained as the table changes until you make them unusable again, and they will consume space that you will only get back by recreating the entire index.
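Should a partition be rebuilt by mistake, you can make it unusable again, which drops its segment, for example:
ALTER INDEX t_status MODIFY PARTITION t_status_common UNUSABLE;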
NB: Partitioning is a licenced option that is only available on the Enterprise Edition of the database.

    On-Line Statistics Gathering Disabled by Column Specific METHOD_OPT Table Statistics Preference

    I have come across a quirk whereby the presence of a table statistics preference specifying METHOD_OPT for some named columns disables on-line statistics gathering.  At the very least, this behaviour is not documented.  I have reproduced it in Oracle versions 12.1.0.2 and 19.3.

    Demonstration 

    I will create two identical tables, but on the first table I will specify a table statistics preference to collect a histogram on column C.
    set serveroutput on verify on autotrace off
    CREATE TABLE t1(a number, b varchar2(1000), c number);
    CREATE TABLE t2(a number, b varchar2(1000), c number);
    exec dbms_stats.set_table_prefs(user,'t1','METHOD_OPT','FOR ALL COLUMNS SIZE AUTO FOR COLUMNS SIZE 254 C');
    Then I will truncate each table, delete any statistics (because truncate does not delete statistics) and then populate the table again in direct-path mode.
    TRUNCATE TABLE t1;
    EXEC dbms_stats.delete_table_stats(user,'T1');
    INSERT /*+APPEND*/ INTO t1
    SELECT ROWNUM a, TO_CHAR(TO_DATE(rownum,'J'),'Jsp') b, CEIL(SQRT(rownum)) c
    FROM dual CONNECT BY level <= 1e5;

    TRUNCATE TABLE t2;
    EXEC dbms_stats.delete_table_stats(user,'T2');
    INSERT /*+APPEND*/ INTO t2
    SELECT ROWNUM a, TO_CHAR(TO_DATE(rownum,'J'),'Jsp') b, CEIL(SQRT(rownum)) c
    FROM dual CONNECT BY level <= 1e5;
    COMMIT;
    I expect to get statistics on both tables.
    alter session set nls_date_Format = 'hh24:mi:ss dd/mm/yy';
    column table_name format a10
    column column_name format a11
    SELECT table_name, num_rows, last_analyzed FROM user_tables WHERE table_name LIKE 'T_' ORDER BY 1;
    SELECT table_name, column_name, num_distinct, histogram, num_buckets FROM user_tab_columns WHERE table_name LIKE 'T_' ORDER BY 1,2;
    But I only get table and column statistics on T2, the one without the statistics preference.
    TABLE_NAME   NUM_ROWS LAST_ANALYZED
    ---------- ---------- -----------------
    T1
    T2             100000 10:08:30 15/01/20

    Table Column
    Name  Name   NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
    ----- ------ ------------ --------------- -----------
    T1    A                   NONE
    T1    B                   NONE
    T1    C                   NONE
    T2    A            100000 NONE                      1
    T2    B             98928 NONE                      1
    T2    C               317 NONE                      1
    It appears that I don't get statistics on T1 because I have specified a table statistics preference that is specific to some named columns. The preference doesn't have to specify creating a histogram; it might equally prevent a histogram from being created.
    For example, this preference does not disable on-line statistics collection.
    EXEC dbms_stats.set_table_prefs(user,'t2','METHOD_OPT','FOR ALL COLUMNS SIZE 1');
    But these preferences do disable on-line statistics collection.
    EXEC dbms_stats.set_table_prefs(user,'t2','METHOD_OPT','FOR COLUMNS SIZE 1 B C');
    EXEC dbms_stats.set_table_prefs(user,'t2','METHOD_OPT','FOR COLUMNS SIZE 1 A B C');
    I have not found any other statistics preferences (for other DBMS_STATS parameters) that cause this behaviour.
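    You can check the METHOD_OPT that will actually be used for a table, whether it comes from a preference or the default, with DBMS_STATS.GET_PREFS:
    SELECT dbms_stats.get_prefs('METHOD_OPT',user,'T1') method_opt FROM dual;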

    Conclusion 

    Table preferences are recommended as a method of controlling statistics collection declaratively and consistently. You don't have to hard-code DBMS_STATS parameters into every script that collects statistics ad hoc. Table statistics preferences ensure that every time statistics are collected on a particular table, they are collected consistently, albeit perhaps in a way that is different from the default.
    However, take the example of an ETL process loading data into a data warehouse. If you rely on on-line statistics gathering to collect table statistics as part of the data load, you must now be careful not to disable it with a column-specific METHOD_OPT statistics preference.
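    If you do hit this quirk during a load, there are two obvious options, as a sketch: remove the column-specific preference, or gather statistics explicitly after the load.
    exec dbms_stats.delete_table_prefs(user,'T1','METHOD_OPT');
    exec dbms_stats.gather_table_stats(user,'T1');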

    Analysing Database Time with Active Session History for Statements with On-line Optimizer Statistics Gathering Operations

    I have been looking into the performance of on-line statistics collection. When statistics are collected on-line, there is an extra OPTIMIZER STATISTICS GATHERING operation in the execution plan. However, I have noticed that the presence or absence of this operation does not change the plan hash value. This has consequences when profiling DB time by execution plan line and then describing that line from a captured plan.

    OPTIMIZER STATISTICS GATHERING Operation

    From 12c, statistics are collected on-line during either a create-table-as-select operation or the initial direct-path insert into a new segment.  Below, I have two statements whose execution plans have the same plan hash value but actually differ; the differences are in areas that do not contribute to the plan hash value.
    • The first statement performs online statistics gathering, and so the plan includes the OPTIMIZER STATISTICS GATHERING operation, the second does not.
    • Note also that the statements insert into different tables, and that does not alter the plan hash value either. However, if the data was queried from different tables that would have produced a different plan hash value.
    INSERT /*+APPEND PARALLEL(i)*/ into T2 i SELECT * /*+*/ FROM t1 s

    Plan hash value: 90348617
    ---------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------
    | 0 | INSERT STATEMENT | | | | 178K(100)| | | |
    | 1 | LOAD AS SELECT | T2 | | | | | | |
    | 2 | OPTIMIZER STATISTICS GATHERING | | 100M| 4005M| 178K (1)| 00:00:07 | | |
    | 3 | PARTITION RANGE ALL | | 100M| 4005M| 178K (1)| 00:00:07 | 1 |1048575|
    | 4 | TABLE ACCESS STORAGE FULL | T1 | 100M| 4005M| 178K (1)| 00:00:07 | 1 |1048575|
    ---------------------------------------------------------------------------------------------------------
    INSERT /*+APPEND PARALLEL(i) NO_GATHER_OPTIMIZER_STATISTICS*/ into T3 i
    SELECT /*+*/ * FROM t1 s

    Plan hash value: 90348617
    ----------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------
    | 0 | INSERT STATEMENT | | | | 178K(100)| | | |
    | 1 | LOAD AS SELECT | T3 | | | | | | |
    | 2 | PARTITION RANGE ALL | | 100M| 4005M| 178K (1)| 00:00:07 | 1 |1048575|
    | 3 | TABLE ACCESS STORAGE FULL| T1 | 100M| 4005M| 178K (1)| 00:00:07 | 1 |1048575|
    ----------------------------------------------------------------------------------------------------
    I find that it is often useful to profile database time from DBA_HIST_ACTIVE_SESS_HISTORY (or V$ACTIVE_SESSION_HISTORY) by line in the execution plan, in order to see how much time was consumed by the different operations. I can then join the profile to DBA_HIST_SQL_PLAN (or V$SQL_PLAN) to see the operation for each line. So long as I also join these tables by SQL_ID, the answer I get will be correct, but I may not always get an answer.
    column inst_id heading 'Inst|Id' format 99
    column sql_plan_line_id heading 'SQL Plan|Line ID'
    column sql_plan_hash_value heading 'SQL Plan|Hash Value'
    column ash_secs heading 'ASH|Secs' format 999
    break on sql_id skip 1
    with h as (
    SELECT h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
    , SUM(10) ash_secs
    FROM dba_hist_Active_Sess_history h
    WHERE h.sql_plan_hash_value = 90348617
    AND h.sql_id IN('g7awpb71jbup1','c2dy3rmnqp7d7','drrbxctf8t5nz','7140frhyu42t5')
    GROUP BY h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
    )
    SELECT h.*, p.operation
    FROM h
    LEFT OUTER JOIN dba_hist_sql_plan p
    ON p.dbid = h.dbid
    and p.sql_id = h.sql_id
    AND p.plan_hash_value = h.sql_plan_hash_value
    AND p.id = h.sql_plan_line_id
    ORDER BY 1,2,3
    /
    If the plan was not captured into AWR or is no longer in the library cache, I don't get a description of the operations in the plan.
                    SQL Plan   SQL Plan  ASH
    SQL_ID Line ID Hash Value Secs OPERATION
    ------------- ---------- ---------- ---- --------------------------------
    0s4ruucw2wvsw 0 90348617 4 INSERT STATEMENT
    1 90348617 77 LOAD AS SELECT
    2 90348617 25 OPTIMIZER STATISTICS GATHERING
    3 90348617 11 PARTITION RANGE
    4 90348617 24 TABLE ACCESS

    33x8fjppwh095 0 90348617 2 INSERT STATEMENT
    1 90348617 89 LOAD AS SELECT
    2 90348617 10 PARTITION RANGE
    3 90348617 20 TABLE ACCESS

    7140frhyu42t5 0 90348617 1
    1 90348617 83
    2 90348617 8
    3 90348617 28

    9vky53vhy5740 0 90348617 3
    1 90348617 89
    2 90348617 23
    3 90348617 9
    4 90348617 22
    Normally, I would look for another SQL_ID that produced the same plan hash value. However, for an execution plan that only sometimes includes on-line statistics gathering, the operations may not match correctly because the OPTIMIZER STATISTICS GATHERING operation changes the line IDs.
    WITH h as (
    SELECT h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
    , SUM(10) ash_secs
    FROM dba_hist_Active_Sess_history h
    WHERE h.sql_plan_hash_value = 90348617
    AND h.sql_id IN('g7awpb71jbup1','c2dy3rmnqp7d7','drrbxctf8t5nz','7140frhyu42t5')
    GROUP BY h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
    ), p as (
    SELECT DISTINCT dbid, plan_hash_value, id, operation
    from dba_hist_sql_plan

    )
    SELECT h.*, p.operation
    FROM h
    LEFT OUTER JOIN p
    ON p.dbid = h.dbid
    AND p.plan_hash_value = h.sql_plan_hash_value
    AND p.id = h.sql_plan_line_id
    ORDER BY 1,2,3
    /
    If I just join the ASH profile to a distinct list of ID and operation for the same plan hash value but matching any SQL_ID, I can get duplicate rows returned, starting at the line with the OPTIMIZER STATISTICS GATHERING operation because I have different plans with the same plan hash value.
                               SQL Plan   SQL Plan  ASH
    DBID SQL_ID Line ID Hash Value Secs OPERATION
    ---------- ------------- ---------- ---------- ---- ------------------------------
    1278460406 7140frhyu42t5 1 90348617 80 LOAD AS SELECT
    1278460406 2 90348617 10 OPTIMIZER STATISTICS GATHERING
    1278460406 2 90348617 10 PARTITION RANGE
    1278460406 3 90348617 30 PARTITION RANGE
    1278460406 3 90348617 30 TABLE ACCESS
    ...
    To mitigate this problem, in the following SQL Query, I check that the maximum plan line ID for which I have ASH data matches the maximum line ID (i.e. the number of lines) in any alternative plan with the same hash value.
    WITH h as (
    SELECT h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_Value
    , SUM(10) ash_secs
    FROM dba_hist_Active_Sess_history h
    WHERE h.sql_plan_hash_value = 90348617
    AND h.sql_id IN('g7awpb71jbup1','c2dy3rmnqp7d7','drrbxctf8t5nz','7140frhyu42t5')
    GROUP BY h.dbid, h.sql_id, h.sql_plan_line_id, h.sql_plan_hash_value
    ), x as (
    SELECT h.*
    , MAX(sql_plan_line_id) OVER (PARTITION BY h.dbid, h.sql_id) plan_lines
    , p1.operation
    FROM h
    LEFT OUTER JOIN dba_hist_sql_plan p1
    ON p1.dbid = h.dbid
    AND p1.sql_id = h.sql_id
    AND p1.plan_hash_value = h.sql_plan_hash_value
    AND p1.id = h.sql_plan_line_id
    )
    SELECT x.*
    , (SELECT p2.operation
    FROM dba_hist_sql_plan p2
    WHERE p2.dbid = x.dbid
    AND p2.plan_hash_value = x.sql_plan_hash_value
    AND p2.id = x.sql_plan_line_id
    AND p2.sql_id IN(
    SELECT p.sql_id
    FROM dba_hist_sql_plan p
    WHERE p.dbid = x.dbid
    AND p.plan_hash_value = x.sql_plan_hash_value
    GROUP BY p.dbid, p.sql_id
    HAVING MAX(p.id) = x.plan_lines)
    AND rownum = 1) operation2
    FROM x
    ORDER BY 1,2,3
    /
    Now, I get an operation description for every line ID (if the same plan was gathered for a different SQL_ID).
                               SQL Plan   SQL Plan  ASH
    DBID SQL_ID Line ID Hash Value Secs PLAN_LINES OPERATION OPERATION2
    ---------- ------------- ---------- ---------- ---- ---------- -------------------------------- ------------------------------
    1278460406 7140frhyu42t5 1 90348617 80 3 LOAD AS SELECT
    1278460406 2 90348617 10 3 PARTITION RANGE
    1278460406 3 90348617 30 3 TABLE ACCESS

    1278460406 c2dy3rmnqp7d7 1 90348617 520 4 LOAD AS SELECT LOAD AS SELECT
    1278460406 2 90348617 100 4 OPTIMIZER STATISTICS GATHERING OPTIMIZER STATISTICS GATHERING
    1278460406 3 90348617 80 4 PARTITION RANGE PARTITION RANGE
    1278460406 4 90348617 280 4 TABLE ACCESS TABLE ACCESS
    1278460406 90348617 30 4

    1278460406 drrbxctf8t5nz 1 90348617 100 4 LOAD AS SELECT
    1278460406 2 90348617 10 4 OPTIMIZER STATISTICS GATHERING
    1278460406 3 90348617 10 4 PARTITION RANGE
    1278460406 4 90348617 50 4 TABLE ACCESS

    1278460406 g7awpb71jbup1 1 90348617 540 3 LOAD AS SELECT LOAD AS SELECT
    1278460406 2 90348617 60 3 PARTITION RANGE PARTITION RANGE
    1278460406 3 90348617 90 3 TABLE ACCESS TABLE ACCESS
    1278460406 90348617 20 3
    However, this approach, while better, is still not perfect. I may not have sufficient DB time for the last line in the execution plan to be sampled, and therefore I may not choose a valid alternative plan.

    Autonomous & Cloud Databases

    Automatic on-line statistics gathering is becoming a more common occurrence.
    • In the Autonomous Data Warehouse, Oracle has set _optimizer_gather_stats_on_load_all=TRUE, so statistics are collected on every direct-path insert. 
    • From 19c, on Engineered Systems (both in the cloud and on-premises), Real-Time statistics are collected during conventional DML (on inserts, updates and some deletes), also using the OPTIMIZER STATISTICS GATHERING operation. Again, the presence or absence of this operation does not affect the execution plan hash value.
    SQL_ID  f0fsghg088k3q, child number 0
    -------------------------------------
    INSERT INTO t2 SELECT * FROM t1

    Plan hash value: 589593414
    ---------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------
    | 0 | INSERT STATEMENT | | | | 1879 (100)| | | |
    | 1 | LOAD TABLE CONVENTIONAL | T2 | | | | | | |
    | 2 | OPTIMIZER STATISTICS GATHERING | | 1000K| 40M| 1879 (1)| 00:00:01 | | |
    | 3 | PARTITION RANGE ALL | | 1000K| 40M| 1879 (1)| 00:00:01 | 1 |1048575|
    | 4 | TABLE ACCESS STORAGE FULL | T1 | 1000K| 40M| 1879 (1)| 00:00:01 | 1 |1048575|
    ---------------------------------------------------------------------------------------------------------
    SQL_ID  360pwsfmdkxf4, child number 0
    -------------------------------------
    INSERT /*+NO_GATHER_OPTIMIZER_STATISTICS*/ INTO t3 SELECT * FROM t1

    Plan hash value: 589593414
    ----------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------
    | 0 | INSERT STATEMENT | | | | 1879 (100)| | | |
    | 1 | LOAD TABLE CONVENTIONAL | T3 | | | | | | |
    | 2 | PARTITION RANGE ALL | | 1000K| 40M| 1879 (1)| 00:00:01 | 1 |1048575|
    | 3 | TABLE ACCESS STORAGE FULL| T1 | 1000K| 40M| 1879 (1)| 00:00:01 | 1 |1048575|
    ----------------------------------------------------------------------------------------------------

    Online Statistics Collection during Bulk Loads on Partitioned Tables


    Introduction

    One of the enhancements to statistics collection and management in Oracle 12c was the ability of the database to collect statistics automatically, during either a create-table-as-select operation or the initial insert into a freshly created or freshly truncated table, provided that insert is done in direct-path mode (i.e. using the APPEND hint).
    When that occurs, there is an additional operation in the execution plan: OPTIMIZER STATISTICS GATHERING.
    ----------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------------------------
    | 0 | INSERT STATEMENT | | | | 495K(100)| |
    | 1 | LOAD AS SELECT | | | | | |
    | 2 | OPTIMIZER STATISTICS GATHERING | | 70M| 11G| 495K (2)| 00:00:20 |
    | 3 | TABLE ACCESS FULL | XXXXXXXXXXXXXXX | 70M| 11G| 495K (2)| 00:00:20 |
    ----------------------------------------------------------------------------------------------------
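    As an aside, in recent releases you can see whether statistics came from this mechanism. This is a sketch based on the 19c data dictionary, where I believe the NOTES column of the statistics views is set to STATS_ON_LOAD for statistics gathered on load; check the view description in your version.
    SELECT table_name, num_rows, last_analyzed, notes
    FROM   user_tab_statistics
    WHERE  notes = 'STATS_ON_LOAD';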
    The motivation for this blog was encountering a bulk insert into a partitioned table where the statistics gathering operation consumed a very significant amount of time. Partitioning gives you more things to consider.

    A Simple Test

    I created a simple test that compares the time taken by online statistics collection on partitioned and non-partitioned tables, with the explicit collection of statistics using DBMS_STATS. I have four tables with the same structure.
    • T1: Not partitioned. Data will be copied from this table to each of the others. 
    • T2: Partitioned. Online statistics only. 
    • T3: Partitioned. Explicitly gathered statistics. 
    • T4: Partitioned. Explicitly gathered incremental statistics.
    CREATE TABLE T1 (a number, b varchar2(1000), c number) NOLOGGING;
    CREATE TABLE T2 (a number, b varchar2(1000), c number)
    PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;
    CREATE TABLE T3 (a number, b varchar2(1000), c number)
    PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;
    CREATE TABLE T4 (a number, b varchar2(1000), c number)
    PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;
    I loaded 100 million rows into each in direct-path mode. The partitioned tables end up with 100 partitions, each with 1 million rows. I have also suppressed redo logging during the direct-path insert by creating the tables with the NOLOGGING attribute.
    EXEC dbms_stats.set_table_prefs(user,'T3','INCREMENTAL','FALSE');
    EXEC dbms_stats.set_table_prefs(user,'T4','INCREMENTAL','TRUE');
    The following set of tests will be run for different combinations of:
    • Parallel hint on query, or not 
    • Parallel hint on insert, or not 
    • Table parallelism specified 
    • Parallel DML enabled or disabled at session level 
    • Column-specific METHOD_OPT table preference specified or not. 
    I enabled SQL trace, from which I was able to obtain the elapsed times of the various statements; the time spent on online statistics gathering can be determined from the timings on the OPTIMIZER STATISTICS GATHERING operation in the execution plan in the trace.
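    For completeness, this is roughly how I would enable the trace for each test session; a sketch using DBMS_MONITOR (the tracefile identifier is just to make the trace files easy to find):
    ALTER SESSION SET tracefile_identifier = 'ONLINE_STATS_TEST';
    EXEC dbms_monitor.session_trace_enable(waits=>TRUE, binds=>FALSE);
    -- run the test statements here
    EXEC dbms_monitor.session_trace_disable;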
    TRUNCATE TABLE T2;
    TRUNCATE TABLE T3;
    EXEC dbms_stats.delete_table_stats(user,'T2');
    EXEC dbms_stats.delete_table_stats(user,'T3');
    EXEC dbms_stats.delete_table_stats(user,'T4');

    INSERT /*+APPEND &inshint*/ into T2 i SELECT * /*+&selhint*/ from t1 s;
    INSERT /*+APPEND &inshint NO_GATHER_OPTIMIZER_STATISTICS*/ into T3 i SELECT /*+&selhint*/ * from t1 s;
    INSERT /*+APPEND &inshint NO_GATHER_OPTIMIZER_STATISTICS*/ into T4 i SELECT /*+&selhint*/ * from t1 s;
    commit;
    EXEC dbms_stats.gather_table_stats(user,'T3');
    EXEC dbms_stats.gather_table_stats(user,'T4');

    Quirks

    It was while building this test that I discovered a couple of quirks:

    What Statistics Are Normally Collected by Online Statistics Gathering? 

    After just the initial insert, I can see that I have table statistics on T1 and T2, but not on T3 and T4.
    SELECT table_name, num_rows, last_analyzed from user_tables where table_name LIKE 'T_' order by 1; 

    TABLE_NAME NUM_ROWS LAST_ANALYZED
    ---------- ---------- -----------------
    T1 10000000 14:07:36 16/01/20
    T2 10000000 14:07:36 16/01/20
    T3
    T4
    I also have column statistics on T1 and T2, but no histograms.
    break on table_name skip 1
    SELECT table_name, column_name, num_distinct, global_stats, histogram, num_buckets, last_analyzed
    FROM user_tab_columns where table_name like 'T_' order by 1,2;

    TABLE_NAME COLUMN_NAME NUM_DISTINCT GLO HISTOGRAM NUM_BUCKETS LAST_ANALYZED
    ---------- ------------ ------------ --- --------------- ----------- -----------------
    T1 A 10000 YES NONE 1 14:06:58 16/01/20
    B 10000 YES NONE 1 14:06:58 16/01/20
    C 100 YES NONE 1 14:06:58 16/01/20

    T2 A 10000 YES NONE 1 14:07:11 16/01/20
    B 10000 YES NONE 1 14:07:11 16/01/20
    C 100 YES NONE 1 14:07:11 16/01/20

    T3 A NO NONE
    B NO NONE
    C NO NONE

    T4 A NO NONE
    B NO NONE
    C NO NONE
    However, I do not have any partition statistics (I have only shown the first and last partition of each table in this report).
    break on table_name skip 1
    SELECT table_name, partition_position, partition_name, num_rows, last_analyzed
    FROM user_tab_partitions WHERE table_name like 'T_' ORDER BY 1,2 nulls first;

    TABLE_NAME PARTITION_POSITION PARTITION_NAME       NUM_ROWS LAST_ANALYZED
    ---------- ------------------ -------------------- -------- -----------------
    T2                          1 T_PART
                              100 SYS_P20008

    T3                          1 T_PART
                              100 SYS_P20107

    T4                          1 T_PART
                              100 SYS_P20206
    Online optimizer statistics gathering only collects statistics at table level but not partition or sub-partition level. Histograms are not collected.
    From Oracle 18c, there are two undocumented parameters that modify this behaviour. Both default to false. Interestingly, both are enabled in the Oracle Autonomous Data Warehouse.
    • If _optimizer_gather_stats_on_load_hist=TRUE, histograms are collected on all columns during online statistics collection. 
    • If _optimizer_gather_stats_on_load_all=TRUE statistics are collected online during every direct-path insert, not just the first one into a segment. 
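    For example, on a test system you could experiment with them at session level. As with any undocumented parameter, this is a sketch for a sandbox, not something to set in production without Oracle Support's advice:
    ALTER SESSION SET "_optimizer_gather_stats_on_load_hist" = TRUE;
    ALTER SESSION SET "_optimizer_gather_stats_on_load_all"  = TRUE;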

    Do I Need Partition Statistics?

    Statistics will be collected on partitions that do not have them when the automatic statistics collection job runs in the next database maintenance window. The question is whether you can manage without them until then.
    "The optimizer will use global or table level statistics if one or more of your queries touches two or more partitions. The optimizer will use partition level statistics if your queries do partition elimination, such that only one partition is necessary to answer each query. If your queries touch two or more partitions the optimizer will use a combination of global and partition level statistics."
     – Oracle The Data Warehouse Insider Blog: Managing Optimizer Statistics in an Oracle Database 11g - Maria Colgan
    It will depend upon the nature of the SQL in the application. If the optimizer does some partition elimination, and the data is not uniformly distributed across the partitions, then partition statistics are likely to be beneficial. If there is no partition elimination, then you might question whether partitioning (or at least the current partitioning strategy) is appropriate!
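    If you cannot wait for the maintenance window, one option is to top up the table-level statistics collected online with just the partition-level statistics. A sketch using the documented GRANULARITY parameter, which should leave the table-level statistics in place:
    EXEC dbms_stats.gather_table_stats(user, 'T2', granularity=>'PARTITION');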

    What is the Fastest Way to Collect Statistics on Partitioned Tables?

    Let's look at how long it takes to insert data into, and then subsequently collect statistics on, the tables in my example. This test was run on Oracle 19c on one compute node of a virtualised Exadata X4 machine with 16 CPUs.  The following table shows the elapsed time, and the total DB time including all parallel server processes, for each operation.
    Each column of the table is one test scenario:
    1. Serial Insert & Statistics
    2. Parallel Insert & Statistics
    3. Parallel SQL & Statistics
    4. Parallel DML, Insert, Select & Statistics
    5. Parallel DML, SQL & Statistics
    6. Parallel Tables
    7. Parallel Tables & DML
    8. Parallel Tables, DML & Method Opt

    Option            1          2           3          4           5          6        7        8
    Table             NOPARALLEL NOPARALLEL  NOPARALLEL NOPARALLEL  NOPARALLEL PARALLEL PARALLEL PARALLEL
    Insert Hint       -          PARALLEL(i) -          PARALLEL(i) -          -        -        -
    Select Hint       -          PARALLEL(s) PARALLEL   PARALLEL(s) PARALLEL   -        -        -
    Parallel DML      DISABLE    DISABLE     DISABLE    ENABLE      ENABLE     DISABLE  ENABLE   ENABLE
    Stats Degree      none       DEFAULT     DEFAULT    DEFAULT     DEFAULT    none     none     none
    Method Opt        none       none        none       none        none       none     none     ...FOR COLUMNS SIZE 1 A

    Elapsed Time (s)                        1        2        3        4        5        6        7        8
    T2 Insert (Online Stats Gathering) 172.46   160.86   121.61   108.29    60.31   194.47    23.57    20.57
       of which OPTIMIZER STATISTICS
       GATHERING                        82.71    55.59    55.90        -        -        -        -        -
    T3 Insert (NO_GATHER_OPTIMIZER_
       STATISTICS)                     125.40   156.36   124.18    20.62    29.01   199.20    20.97    21.15
    T3 Explicit Stats                  122.80   146.25    63.23    15.99    24.88    24.58    24.99    24.62
    T4 Insert (NO_GATHER_OPTIMIZER_
       STATISTICS)                     123.18   158.15   147.04    20.44    29.91   204.61    20.65    20.60
    T4 Incremental Explicit Stats       80.51   104.85    46.05    23.42    23.14    23.21    22.60    23.03

    DB Time (s)                             1        2        3        4        5        6        7        8
    T2 Insert (Online Stats Gathering)    174      163      169      359      337      248      366      308
    T3 Insert (NO_GATHER_OPTIMIZER_
       STATISTICS)                        128      193      160      290      211      236      312      326
    T3 Explicit Stats                     122      14663265305262335
    T4 Insert (NO_GATHER_OPTIMIZER_
       STATISTICS)                        126      194      167      295      205      233      304      295
    T4 Incremental Explicit Stats          80      1052281266300179226
    • It is difficult to determine the actual duration of the OPTIMIZER STATISTICS GATHERING operation, short of measuring the effect of disabling it. The time in the above table has been taken from SQL trace files. That duration is always greater than the amount saved by disabling online statistics gathering with the NO_GATHER_OPTIMIZER_STATISTICS hint. However, the amount of time accounted in Active Session History (ASH) for that line in the execution plan is usually less than the elapsed saving. 
      • E.g. for the serial insert, 83s was accounted for OPTIMIZER STATISTICS GATHERING in the trace, while ASH showed only 23s of database time for that line of the plan. Perhaps the only meaningful measurement is that disabling online statistics gathering saved 47s.
    • DML statements, including insert statements in direct-path mode, only actually execute in parallel if parallel DML is enabled. Specifying a degree of parallelism on the table, or a parallel hint, is not enough. Parallel DML should be enabled:
      • either at session level
    ALTER SESSION ENABLE PARALLEL DML;
      • or for the individual statement.
    insert /*+APPEND ENABLE_PARALLEL_DML*/ into T2 SELECT * from t1;
      • Specifying parallel insert with a hint, without enabling parallel DML will not improve performance and can actually degrade it.
      • Specifying parallel query without running the insert in parallel can also degrade performance.
    • Online statistics will be collected in parallel if
      • either the table being queried has a degree of parallelism,
      • or a parallel hint applies to the table being queried, or the entire statement,
      • or parallel DML has been enabled 
    • Where statistics are collected explicitly (i.e. with a call to DBMS_STATS.GATHER_TABLE_STATS) they are collected in parallel if 
      • either the DEGREE is specified (I specified a table statistics preference),
    EXEC dbms_stats.set_table_prefs(user,'T3','DEGREE','DBMS_STATS.DEFAULT_DEGREE');
      • or the table has a degree of parallelism.
    ALTER TABLE T3 PARALLEL;
    • Incremental statistics are generally faster to collect because they calculate table-level statistics from partition-level statistics, saving a second pass through the data.
    • When parallel DML is enabled at session level, I found that the performance of statistics collection also improves.
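    Putting those observations together, the fastest combination in my tests (parallel tables with parallel DML enabled) amounts to something like this sketch:
    ALTER TABLE t1 PARALLEL;
    ALTER TABLE t2 PARALLEL;
    ALTER SESSION ENABLE PARALLEL DML;

    INSERT /*+APPEND*/ INTO t2 SELECT * FROM t1;
    COMMIT;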

    Conclusion

    Overall, the best performance was obtained when the tables were altered to use parallelism, and parallel DML was enabled; then the query, insert and statistics collection are performed in parallel.
    However, the improved performance of parallelism comes at a cost.  It can be a brutal way of bringing more resource to bear on an activity.  A parallel operation can be expected to use more database time across all the parallel server processes than the same operation processed serially.  My best results were obtained by activating all of the CPUs on the server without regard for any other activity.  Too many concurrent parallel operations have the potential to overload a system.  Remember also, that while the parallel attribute remains on the table any subsequent query will also run in parallel.
    Suppressing online statistics collection saves total database time whether working in parallel or not. The saving in elapsed time is reduced when the insert and query are running in parallel.  The time taken to explicitly collect statistics will exceed that saving because it is doing additional work to collect partition statistics not done during online statistics collection.
    Using incremental statistics for partitioned tables will also reduce the total amount of work and database time required to gather statistics, but may not significantly change the elapsed time to collect statistics.
    If you need table statistics but can manage without partition statistics until the next maintenance window, then online statistics collection is very effective. However, I think the general case will be to require partition statistics, so you will probably need to explicitly collect statistics instead.  If you want histograms, then you will also need to explicitly collect statistics.

    Data Warehouse Design: Snowflake Dimensions and Lost Skew Trap

    This post is part of a series that discusses some common issues in data warehouses. Originally written in 2018, but I never got round to publishing it.
    While I was experimenting with the previous query I noticed that the cost of the execution plans didn't change as I changed the COUNTRY_ISO_CODE, yet the data volumes for different countries are very different.
    select c.country_name
    , u.cust_state_province
    , COUNT(*) num_sales
    , SUM(s.amount_sold) total_amount_sold
    from sales s
    , customers u
    , products p
    , times t
    , countries c
    WHERE s.time_id = t.time_id
    AND s.prod_id = p.prod_id
    AND u.cust_id = s.cust_id
    AND u.country_id = c.country_id
    AND c.country_iso_code = '&&iso_country_code'
    AND p.prod_category_id = 205
    and t.fiscal_year = 1999
    GROUP BY c.country_name, u.cust_state_province
    ORDER BY 1,2
    /
    Plan hash value: 3095970037
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | | | 1473 (100)| | | | 45 |00:00:01.77 | 101K| | | |
    | 1 | TEMP TABLE TRANSFORMATION | | 1 | | | | | | | 45 |00:00:01.77 | 101K| | | |
    | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D7C68_A4BC21 | 1 | | | | | | | 0 |00:00:00.13 | 1889 | 1024 | 1024 | |
    | * 3 | HASH JOIN | | 1 | 2413 | 94107 | 418 (1)| 00:00:01 | | | 18520 |00:00:00.10 | 1888 | 1185K| 1185K| 639K (0)|
    | * 4 | TABLE ACCESS FULL | COUNTRIES | 1 | 1 | 18 | 2 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | | | |
    | 5 | TABLE ACCESS FULL | CUSTOMERS | 1 | 55500 | 1138K| 416 (1)| 00:00:01 | | | 55500 |00:00:00.02 | 1521 | | | |
    | 6 | SORT GROUP BY | | 1 | 2359 | 101K| 1055 (1)| 00:00:01 | | | 45 |00:00:01.65 | 99111 | 6144 | 6144 | 6144 (0)|
    | * 7 | HASH JOIN | | 1 | 3597 | 154K| 1054 (1)| 00:00:01 | | | 64818 |00:00:01.58 | 99111 | 2391K| 1595K| 2025K (0)|
    | 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7C68_A4BC21 | 1 | 2413 | 62738 | 5 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | | | |
    | 9 | VIEW | VW_ST_C525CEF3 | 1 | 3597 | 64746 | 1048 (1)| 00:00:01 | | | 64818 |00:00:01.44 | 99111 | | | |
    Note:
    • There are 55500 rows on CUSTOMERS
    • There are 23 rows on COUNTRIES
    • Oracle expects 2413 rows on joining those tables
      • 55500 ÷ 23 = 2413.04, so Oracle assumes the data is evenly distributed between countries, although there are histograms on COUNTRY_ISO_CODE and COUNTRY_ID. 
      • This is sometimes called 'lost skew': the skew of a dimension does not pass into the cardinality calculation on the fact table (the query below demonstrates the skew).
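    A quick way to see the skew the optimizer is losing is simply to profile the foreign key column on the dimension's child table:
    SELECT country_id, COUNT(*) num_customers
    FROM   customers
    GROUP  BY country_id
    ORDER  BY num_customers DESC;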
    If I replace the predicate on COUNTRY_ISO_CODE with a predicate on COUNTRY_ID, then the estimate of the number of rows from CUSTOMERS is correct at 18520 rows. The cost of the star transformation has gone up from 1473 to 6922.
    Plan hash value: 1339390240

    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | | | 6922 (100)| | | | 45 |00:00:01.50 | 97998 | | | |
    | 1 | TEMP TABLE TRANSFORMATION | | 1 | | | | | | | 45 |00:00:01.50 | 97998 | | | |
    | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D7C6A_A4BC21 | 1 | | | | | | | 0 |00:00:00.06 | 1524 | 1024 | 1024 | |
    | 3 | NESTED LOOPS | | 1 | 18520 | 651K| 417 (1)| 00:00:01 | | | 18520 |00:00:00.04 | 1523 | | | |
    | 4 | TABLE ACCESS BY INDEX ROWID | COUNTRIES | 1 | 1 | 15 | 1 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | | | |
    | 5 | INDEX UNIQUE SCAN | COUNTRIES_PK | 1 | 1 | | 0 (0)| | | | 1 |00:00:00.01 | 1 | | | |
    | 6 | TABLE ACCESS FULL | CUSTOMERS | 1 | 18520 | 379K| 416 (1)| 00:00:01 | | | 18520 |00:00:00.03 | 1521 | | | |
    | 7 | SORT GROUP BY | | 1 | 2359 | 101K| 6505 (1)| 00:00:01 | | | 45 |00:00:01.43 | 96473 | 6144 | 6144 | 6144 (0)|
    | 8 | HASH JOIN | | 1 | 82724 | 3554K| 6499 (1)| 00:00:01 | | | 64818 |00:00:01.37 | 96473 | 2391K| 1595K| 2002K (0)|
    | 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7C6A_A4BC21 | 1 | 18520 | 470K| 25 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | | | |
    In fact, I only get the star transformation if I force the issue with a STAR_TRANSFORMATION hint. Otherwise, I get the full scan plan which is much cheaper, but again the cardinality calculation on CUSTOMERS is correct.
    Plan hash value: 3784979335
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | | | 1595 (100)| | | | 45 |00:00:00.54 | 2065 | 472 | | | |
    | 1 | SORT GROUP BY | | 1 | 45 | 3510 | 1595 (3)| 00:00:01 | | | 45 |00:00:00.54 | 2065 | 472 | 6144 | 6144 | 6144 (0)|
    | 2 | HASH JOIN | | 1 | 81133 | 6180K| 1589 (3)| 00:00:01 | | | 64818 |00:00:00.43 | 2065 | 472 | 2337K| 2200K| 2221K (0)|
    | 3 | TABLE ACCESS FULL | CUSTOMERS | 1 | 18520 | 379K| 416 (1)| 00:00:01 | | | 18520 |00:00:00.02 | 1521 | 0 | | | |
    | 4 | HASH JOIN | | 1 | 81133 | 4516K| 1172 (3)| 00:00:01 | | | 110K|00:00:00.35 | 544 | 472 | 2546K| 2546K| 1610K (0)|
    | 5 | TABLE ACCESS FULL | PRODUCTS | 1 | 26 | 208 | 3 (0)| 00:00:01 | | | 26 |00:00:00.01 | 4 | 0 | | | |
    | 6 | HASH JOIN | | 1 | 229K| 10M| 1167 (3)| 00:00:01 | | | 246K|00:00:00.30 | 539 | 472 | 1133K| 1133K| 1698K (0)|
    | 7 | PART JOIN FILTER CREATE | :BF0000 | 1 | 364 | 9828 | 17 (0)| 00:00:01 | | | 364 |00:00:00.01 | 57 | 0 | | | |
    | 8 | NESTED LOOPS | | 1 | 364 | 9828 | 17 (0)| 00:00:01 | | | 364 |00:00:00.01 | 57 | 0 | | | |
    | 9 | TABLE ACCESS BY INDEX ROWID| COUNTRIES | 1 | 1 | 15 | 1 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | 0 | | | |
    | 10 | INDEX UNIQUE SCAN | COUNTRIES_PK | 1 | 1 | | 0 (0)| | | | 1 |00:00:00.01 | 1 | 0 | | | |
    | 11 | TABLE ACCESS FULL | TIMES | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
    | 12 | PARTITION RANGE JOIN-FILTER | | 1 | 918K| 19M| 1142 (3)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.21 | 482 | 472 | | | |
    | 13 | TABLE ACCESS FULL | SALES | 5 | 918K| 19M| 1142 (3)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.20 | 482 | 472 | | | |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    Oracle 19c: Automatic Indexing. Part 1. Introduction

    This is the first of a two-part post that looks at the Automatic Indexing feature introduced in Oracle 19c, available on engineered systems only. Initially, I simply wanted to see what it does and to understand how it worked.
    Next, I wanted to see how good it is. I created a test based on Dominic Giles' Swingbench Sales Order Entry benchmark. Having dropped the secondary indexes (the ones not involved in key constraints), I wanted to see which indexes Automatic Indexing would recreate, and whether that would reinstate the original performance.

    References and Acknowledgements 

    This blog is not intended to provide a comprehensive description of Automatic Indexing.  I explain some things as I go along, but I have referenced the sources that I found helpful.
    The Oracle 19c documentation is not particularly verbose. Automatic Indexing is introduced in New Database Features Guide: Big Data & Data Warehousing: Automatic Indexing.
    "The automatic indexing feature automates index management tasks, such as creating, rebuilding, and dropping indexes in an Oracle Database based on changes in the application workload. This feature improves database performance by managing indexes automatically in an Oracle Database."
    However, there is more information in the Database Administrator's Guide at Managing Auto Indexes:
    "The automatic indexing feature automates the index management tasks in an Oracle database. Automatic indexing automatically creates, rebuilds, and drops indexes in a database based on the changes in application workload, thus improving database performance. The automatically managed indexes are known as auto indexes.
    Index structures are an essential feature to database performance. Indexes are critical for OLTP applications, which use large data sets and run millions of SQL statements a day. Indexes are also critical for data warehousing applications, which typically query a relatively small amount of data from very large tables. If you do not update the indexes whenever there are changes in the application workload, the existing indexes can cause the database performance to deteriorate considerably. 
    Automatic indexing improves database performance by managing indexes automatically and dynamically in an Oracle database based on changes in the application workload." 
    Maria Colgan (Master Product Manager for Oracle Database) has blogged and presented on this feature.
    Automatic Indexing is certainly intended for use in the Autonomous Database, but also for other 19c Exadata databases. These presentations also make it clear that Automatic Indexing is intended for OLTP as well as Warehouse and Analytic databases. Some of the examples refer to packaged applications (an unnamed Accounts Receivable system, and a PeopleSoft ERP system).
    I found a number of other valuable resources that helped me to get it going, monitor it, and to begin to understand what was going on behind the scenes.

    How does Automatic Indexing Work?

    Automatic Indexing is an expert system, implemented as two automatic background scheduler tasks. By default, both run every 15 minutes.
    • Auto STS Capture Task captures workload into a SQL tuning set SYS_AUTO_STS. This process runs regardless of the Automatic Indexing configuration. It has a maximum runtime of 15 minutes. 
    • Auto Index Task runs if AUTO_INDEX_MODE is not OFF. It has a maximum runtime of 1 hour. This process creates automatic indexes. Initially, they are invisible and unusable. It checks whether the optimizer will use them. If so, it rebuilds them as usable invisible indexes and checks for improved performance before making them visible. It may also make them invisible again later.
    Indexes created by Automatic Indexing are created with the AUTO option, and are identified in ALL_INDEXES with the AUTO attribute.  Automatic indexes will be dropped if they haven't been used for longer than a specified retention period (default 373 days). Optionally, manually created indexes can be considered by Automatic Indexes, and can also be dropped after a separately specified retention period.
    This creates a feedback loop where indexes are created and dropped in response to changing load on the database while assuring that the newly created indexes will be used and will improve performance and that any indexes that are dropped were not being used.
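    Since the AUTO attribute is exposed in the dictionary, a simple query shows what Automatic Indexing has created so far:
    SELECT owner, table_name, index_name, visibility, status
    FROM   all_indexes
    WHERE  auto = 'YES'
    ORDER  BY owner, table_name, index_name;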
    Automatic Indexing is only available on engineered (Exadata) systems (see Database Licensing Information User Manual, 1.3 Permitted Features, Options, and Management Packs by Oracle Database Offering, Performance). This includes Oracle Database Enterprise Edition on Engineered Systems, Oracle Database Exadata Cloud Service, and Oracle's Autonomous databases. Automation of index creation and removal is an important part of the 'self-driving' aspiration for the Autonomous database, where it will do 'database management by using machine learning and automation to eliminate human labor, human error, and manual tuning'.
    (In 20c, there are two additional automatic tasks to flush and purge the SQL Tuning Sets). 

    Oracle 19c: Automatic Indexing. Part 2. Testing Automatic Indexing with Swingbench

    This is the second of a two-part post that looks at the Automatic Indexing feature introduced in Oracle 19c.
    I have used Dominic Giles' Swingbench utility to create a realistic and repeatable OLTP load test using the Sales Order Entry (SOE) benchmark.  This post explains how I set up and ran the test, and the results I obtained.

    Installation & Setup of Swingbench

    I have tested Automatic Indexing on an Exadata X4 running Oracle 19.3.1.0.0, and I have used the results from that system in this blog.  I have also successfully tested it on 19.6 and 20.2 running in Oracle VirtualBox VMs (built with Frits Hoogland's vagrant-builder) and have enabled Exadata features by setting _exadata_feature_on = TRUE.  Of course, I could never recommend setting this on anything other than a play database, but it does show the feature could work on any database platform.
    alter system set "_exadata_feature_on"=true scope=spfile;
    shutdown immediate;
    startup;
    Swingbench requires Java 8 in a Java virtual machine.
    yum install java
    Then, it is simply a matter of downloading and unzipping the distribution.
    curl http://www.dominicgiles.com/swingbench/swingbench261082.zip -o swingbench.zip
    unzip swingbench.zip
    To assist with monitoring the test and capturing SQL and metrics, I set the AWR snapshot frequency to 15 minutes.
    execute dbms_workload_repository.modify_snapshot_settings(interval => 15);
    I have created a dedicated tablespace for the SOE schema
    CREATE TABLESPACE SOE DATAFILE SIZE 10M AUTOEXTEND ON NEXT 1M;
    The SOE schema is built with the oewizard utility. I am creating all the indexes, and not using any partitioning.
    cd ~/swingbench/bin
    ./oewizard -cs //enkx4c02-scan/swingbench_dmk -dt thin -dba "sys as sysdba" -dbap welcome1 -ts SOE -u soe -p soe -create -allindexes -nopart -cl -v

    Test 1: Baseline Test

    The Swingbench SOE benchmark has 9 tables with 27 indexes. 15 of those indexes are on primary key or referential integrity constraints.
    Table                      Index                                     Cons
    Owner TABLE_NAME Owner INDEX_NAME UNIQUENES Type STATUS VISIBILIT AUT INDEX_KEYS
    ----- -------------------- ----- ------------------------- --------- ---- -------- --------- --- ----------------------------
    SOE ADDRESSES SOE ADDRESS_CUST_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ADDRESS_PK UNIQUE P VALID VISIBLE NO ADDRESS_ID

    SOE CARD_DETAILS SOE CARDDETAILS_CUST_IX NONUNIQUE VALID VISIBLE NO CUSTOMER_ID
    SOE CARD_DETAILS_PK UNIQUE P VALID VISIBLE NO CARD_ID

    SOE CUSTOMERS SOE CUST_EMAIL_IX NONUNIQUE VALID VISIBLE NO CUST_EMAIL
    SOE CUSTOMERS_PK UNIQUE P VALID VISIBLE NO CUSTOMER_ID
    SOE CUST_FUNC_LOWER_NAME_IX NONUNIQUE VALID VISIBLE NO SYS_NC00017$,SYS_NC00018$
    SOE CUST_DOB_IX NONUNIQUE VALID VISIBLE NO DOB
    SOE CUST_ACCOUNT_MANAGER_IX NONUNIQUE VALID VISIBLE NO ACCOUNT_MGR_ID

    SOE INVENTORIES SOE INV_WAREHOUSE_IX NONUNIQUE R VALID VISIBLE NO WAREHOUSE_ID
    SOE INV_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE INVENTORY_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,WAREHOUSE_ID

    SOE ORDERS SOE ORD_WAREHOUSE_IX NONUNIQUE VALID VISIBLE NO WAREHOUSE_ID,ORDER_STATUS
    SOE ORDER_PK UNIQUE P VALID VISIBLE NO ORDER_ID
    SOE ORD_SALES_REP_IX NONUNIQUE VALID VISIBLE NO SALES_REP_ID
    SOE ORD_CUSTOMER_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ORD_ORDER_DATE_IX NONUNIQUE VALID VISIBLE NO ORDER_DATE

    SOE ORDER_ITEMS SOE ITEM_ORDER_IX NONUNIQUE R VALID VISIBLE NO ORDER_ID
    SOE ITEM_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE ORDER_ITEMS_PK UNIQUE P VALID VISIBLE NO ORDER_ID,LINE_ITEM_ID

    SOE PRODUCT_DESCRIPTIONS SOE PRD_DESC_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,LANGUAGE_ID
    SOE PROD_NAME_IX NONUNIQUE VALID VISIBLE NO TRANSLATED_NAME

    SOE PRODUCT_INFORMATION SOE PROD_SUPPLIER_IX NONUNIQUE VALID VISIBLE NO SUPPLIER_ID
    SOE PRODUCT_INFORMATION_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID
    SOE PROD_CATEGORY_IX NONUNIQUE VALID VISIBLE NO CATEGORY_ID

    SOE WAREHOUSES SOE WAREHOUSES_PK UNIQUE P VALID VISIBLE NO WAREHOUSE_ID
    SOE WHS_LOCATION_IX NONUNIQUE VALID VISIBLE NO LOCATION_ID
    At this stage, Automatic Indexing is off. If you rebuild the SOE schema having previously run Automatic Indexing, remember to disable the feature, otherwise, it might act on the basis of previous activity. It is administered via the DBMS_AUTO_INDEX package.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','OFF');
    I ran Swingbench using the character mode charbench front end.  Each test runs for an hour.
    ./charbench -c ../configs/SOE_Client_Side.xml -cs //enkx4c02-scan/swingbench_dmk -dt thin -u soe -p soe -rt 01:00 -v

    Author : Dominic Giles
    Version : 2.6.0.1082

    Results will be written to results.xml.
    Hit Return to Terminate Run...

    Time Users TPM TPS

    12:54:54 PM 0 58500 869
    Completed Run.
    The results are written to an XML file, from which a formatted report can be produced using Result2Pdf. I run this on Windows.
    >results2pdf -c results00001.xml

    >java -cp ../launcher LauncherBootstrap -executablename results2pdf results2pdf -c results.xml
    Application : Results2Pdf
    Author : Dominic Giles
    Version : 2.6.0.1076
    Success : Pdf file null was created from results.xml results file.
    The report gives average response times for the 9 different transactions and an overall average number of transactions per second.

    Results

    This test is my baseline.
    Transaction                 Average Response (ms)
                                 1: Delivered Indexes
    Update Customer Details                      1.18
    Browse Products                              2.03
    Browse Orders                                2.38
    Customer Registration                        3.50
    Order Products                               5.67
    Warehouse Query                              6.20
    Process Orders                              13.42
    Warehouse Activity Query                    14.89
    Sales Rep Query                             31.76
    TPS                                       1060.81

    Test 2: Drop Secondary Indexes

    In many applications, developers and DBAs add indexes to resolve performance problems. It is easy to add indexes, but harder to know whether and where they are used, and therefore when it is safe to remove or change an existing index. Indexes have an overhead in terms of taking up space in the database and maintenance during DML operations.
    Automatic indexing is designed to take on this challenge. Oracle has provided a procedure to drop secondary indexes, DBMS_AUTO_INDEX.DROP_SECONDARY_INDEXES.
    DROP_SECONDARY_INDEXES doesn't check the status of foreign key constraints. Foreign key columns should be indexed to avoid TM locking when the parent record's primary key is updated or the parent record is deleted. However, the index is not needed if the foreign key constraint is not validated. You might make a constraint disabled, not validated, but reliable because you want to take advantage of foreign key join elimination. In that case, the index would not be necessary, but it would not be dropped by this procedure.
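    For illustration, such a constraint might be declared like this (a sketch with hypothetical table and constraint names):
    ALTER TABLE order_items
      MODIFY CONSTRAINT order_items_orders_fk RELY DISABLE NOVALIDATE;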
    EXEC DBMS_AUTO_INDEX.drop_secondary_indexes('SOE','');
    When this is run on the SOE schema, I am left with 15 indexes that are either unique or on foreign key columns.
    Table                      Index                                     Cons
    Owner TABLE_NAME Owner INDEX_NAME UNIQUENES Type STATUS VISIBILIT AUT INDEX_KEYS
    ----- -------------------- ----- ------------------------- --------- ---- -------- --------- --- ----------------------------
    SOE ADDRESSES SOE ADDRESS_PK UNIQUE P VALID VISIBLE NO ADDRESS_ID
    SOE ADDRESS_CUST_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID

    SOE CARD_DETAILS SOE CARD_DETAILS_PK UNIQUE P VALID VISIBLE NO CARD_ID

    SOE CUSTOMERS SOE CUSTOMERS_PK UNIQUE P VALID VISIBLE NO CUSTOMER_ID

    SOE INVENTORIES SOE INV_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE INV_WAREHOUSE_IX NONUNIQUE R VALID VISIBLE NO WAREHOUSE_ID
    SOE INVENTORY_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,WAREHOUSE_ID

    SOE ORDERS SOE ORD_CUSTOMER_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ORDER_PK UNIQUE P VALID VISIBLE NO ORDER_ID

    SOE ORDER_ITEMS SOE ITEM_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE ORDER_ITEMS_PK UNIQUE P VALID VISIBLE NO ORDER_ID,LINE_ITEM_ID
    SOE ITEM_ORDER_IX NONUNIQUE R VALID VISIBLE NO ORDER_ID

    SOE PRODUCT_DESCRIPTIONS SOE PRD_DESC_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,LANGUAGE_ID

    SOE PRODUCT_INFORMATION SOE PRODUCT_INFORMATION_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID

    SOE WAREHOUSES SOE WAREHOUSES_PK UNIQUE P VALID VISIBLE NO WAREHOUSE_ID

    Results

    Unsurprisingly, the effect on Swingbench is to severely degrade performance.
                                Average Response (ms)
    Transaction                 1: Delivered  2: Drop Secondary
                                     Indexes            Indexes
    Update Customer Details             1.18               3.30
    Browse Products                     2.03             409.21
    Browse Orders                       2.38               2.05
    Customer Registration               3.50              78.51
    Order Products                      5.67              40.97
    Warehouse Query                     6.20               2.82
    Process Orders                     13.42             247.80
    Warehouse Activity Query           14.89             274.19
    Sales Rep Query                    31.76             268.51
    TPS                              1060.81              81.30

    Enabling Automatic Indexing

    There are several configuration settings that are made via the DBMS_AUTO_INDEX.CONFIGURE procedure.
    • I have created a tablespace AUTO_INDEXES_TS and configured Automatic Indexing to create its indexes there.  It is permitted to use 100% of that tablespace.
    CREATE TABLESPACE AUTO_INDEXES_TS DATAFILE SIZE 10M AUTOEXTEND ON NEXT 1M;
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_DEFAULT_TABLESPACE','AUTO_INDEXES_TS');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_SPACE_BUDGET','100');
    • Automatic indexes will be retained until they have not been used for 7 days (the default is 373 days).  This unrealistically low value is so that I can test that they will be dropped later.
    • Manual indexes, the ones created when Swingbench was installed, are not deleted.  
    • The automatic indexing logs, visible in the various DBA_AUTO_INDEX% views, are retained for 7 days.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_RETENTION_FOR_AUTO','7');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_RETENTION_FOR_MANUAL','');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_REPORT_RETENTION','7');
    • Automatic indexing is configured only to apply to the SOE schema.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_SCHEMA', 'SOE', allow => TRUE);
    • Finally, I enable Automatic Indexing and permit it to create indexes.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT');
    You can validate the current parameters by querying DBA_AUTO_INDEX_CONFIG.  This view is based on smb$config.  There are other hidden and undocumented parameters visible in smb$config.
                                      Auto Index Config
    Modified
    PARAMETER_NAME PARAMETER_VALUE LAST_MODIFIED By
    -------------------------------- ------------------------------ ----------------------------- ----------
    AUTO_INDEX_COMPRESSION OFF 27-MAR-20 07.42.36.000000 AM SYSTEM
    AUTO_INDEX_DEFAULT_TABLESPACE AUTO_INDEXES_TS 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_MODE IMPLEMENT 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_REPORT_RETENTION 7 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_RETENTION_FOR_AUTO 7 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_RETENTION_FOR_MANUAL 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_SCHEMA schema IN (SOE) 27-MAR-20 10.28.24.000000 AM SYSTEM
    AUTO_INDEX_SPACE_BUDGET 100 27-MAR-20 10.28.24.000000 AM SYSTEM
    DBA_AUTOTASK_SCHEDULE_CONTROL shows the two scheduled automatic tasks that form Automatic Indexing.  The Auto Index Task runs when Automatic Indexing is enabled in either implement or report only mode. The Auto SQL Tuning Set (STS) Capture Task runs from when Automatic Indexing is first enabled, but it is not stopped when Automatic Indexing is disabled. Both jobs run every 15 minutes.
               Task                                                      Max Run       Elapsed
    DBID ID TASK_NAME STATUS INTERVAL Time ENABL Time LAST_SCHEDULE_TIME
    ---------- ---- -------------------------------- ---------- -------- ------- ----- ------- --------------------------------
    1400798553 3 Auto Index Task SUCCEEDED 900 3600 TRUE 3 17-MAR-20 03.18.26.997 PM -05:00
    1400798553 5 Auto STS Capture Task SUCCEEDED 900 900 TRUE 0 17-MAR-20 03.17.31.051 PM -05:00

    Test 3: Creating Automatic Indexes

    When I ran Swingbench again the poor performance continued until halfway through the test when Automatic Indexing decided to create some indexes and make them visible. There was a step improvement in performance, although it was nowhere near the 1000 TPS that we started with!
    At the end of the test, there are 5 new indexes, 3 of which are visible, 2 are invisible.
    Table                      Index                                     Cons
    Owner TABLE_NAME Owner INDEX_NAME UNIQUENES Type STATUS VISIBILIT AUT INDEX_KEYS
    ----- -------------------- ----- ------------------------- --------- ---- ------------ --------- --- -------------------------------------------------
    SOE ADDRESSES SOE ADDRESS_CUST_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ADDRESS_PK UNIQUE P VALID VISIBLE NO ADDRESS_ID

    SOE CARD_DETAILS SOE CARD_DETAILS_PK UNIQUE P VALID VISIBLE NO CARD_ID
    SOE SYS_AI_dt4w4vr174j9m NONUNIQUE VALID VISIBLE YES CUSTOMER_ID <-reinstated secondary

    SOE CUSTOMERS SOE CUSTOMERS_PK UNIQUE P VALID VISIBLE NO CUSTOMER_ID

    SOE INVENTORIES SOE INVENTORY_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,WAREHOUSE_ID
    SOE INV_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE INV_WAREHOUSE_IX NONUNIQUE R VALID VISIBLE NO WAREHOUSE_ID

    SOE ORDERS SOE ORDER_PK UNIQUE P VALID VISIBLE NO ORDER_ID
    SOE ORD_CUSTOMER_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE SYS_AI_3z00frhp9vd91 NONUNIQUE VALID VISIBLE YES WAREHOUSE_ID <-original also order_status
    SOE SYS_AI_gbwwy984mc1ft NONUNIQUE VALID VISIBLE YES SALES_REP_ID <-reinstated secondary

    SOE ORDER_ITEMS SOE ORDER_ITEMS_PK UNIQUE P VALID VISIBLE NO ORDER_ID,LINE_ITEM_ID
    SOE ITEM_PRODUCT_IX NONUNIQUE R VALID VISIBLE NO PRODUCT_ID
    SOE ITEM_ORDER_IX NONUNIQUE R VALID VISIBLE NO ORDER_ID

    SOE PRODUCT_DESCRIPTIONS SOE PRD_DESC_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,LANGUAGE_ID
    SOE SYS_AI_20tjdcuwznyhx NONUNIQUE VALID INVISIBLE YES PRODUCT_ID <-redundant

    SOE PRODUCT_INFORMATION SOE PRODUCT_INFORMATION_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID
    SOE SYS_AI_b9k5zyq0mjwf5 NONUNIQUE VALID INVISIBLE YES CATEGORY_ID <-reinstated invisible secondary

    SOE WAREHOUSES SOE WAREHOUSES_PK UNIQUE P VALID VISIBLE NO WAREHOUSE_ID
    The names of the indexes are determined by applying the SYS_OP_COMBINED_HASH function to the table owner, table name, and indexed column list.
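    The exact recipe is undocumented, but as a sketch of the idea, the 13-character suffix looks like a 64-bit hash rendered in base 32. The argument format passed to SYS_OP_COMBINED_HASH and the encoding alphabet below are my assumptions, not a specification:
    set serveroutput on
    DECLARE
      l_hash     NUMBER;
      l_suffix   VARCHAR2(30);
      l_alphabet CONSTANT VARCHAR2(32) := '0123456789abcdefghijklmnopqrstuv';
    BEGIN
      -- hypothetical argument format: owner, table name, indexed column list
      SELECT sys_op_combined_hash('SOE','ORDERS','"WAREHOUSE_ID"')
      INTO   l_hash
      FROM   dual;
      -- encode the hash in base 32 (assumed alphabet: 0-9 then a-v)
      WHILE l_hash > 0 LOOP
        l_suffix := SUBSTR(l_alphabet, MOD(l_hash,32)+1, 1)||l_suffix;
        l_hash   := TRUNC(l_hash/32);
      END LOOP;
      dbms_output.put_line('SYS_AI_'||l_suffix);
    END;
    /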
    The various dictionary views reveal some of what has happened.
    Note that my invisible indexes are usable.  Automatic indexes start out as unusable and invisible.  Here, Automatic Indexing has rebuilt them as usable, but they are still invisible because they do not reduce logical I/O.  So, I am still bearing the overhead of maintaining them during DML.
    DBA_AUTO_INDEX_STATISTICS reports a summary of the automatic indexing task, confirming the number of indexes built. 
    Tue Mar 17                                                                          page    1
    Auto Index Statistics

    EXECUTION_NAME STAT_NAME VALUE
    -------------------------- ----------------------------- ----------
    SYS_AI_2020-03-17/15:48:28 Space used in bytes 129105920
    SQL plan baselines created 2
    Index candidates 5
    Indexes created (visible) 3
    Indexes created (invisible) 2
    Improvement percentage 88.92
    SQL statements verified 10
    SQL statements improved 4
    SQL statements managed by SPM 2
    DBA_AUTO_INDEX_SQL_ACTIONS shows the SQL commands issued against the SQL tuning set SYS_AUTO_STS.  Automatic Indexing only uses this one tuning set and keeps adding statements to it.  Even if I drop and recreate the SOE schema, the SQL tuning set remains.
    Tue Mar 17                                                                                                                             page    1
    Auto Index SQL Actions

    SQL Plan
    EXECUTION_NAME ACTION_ID SQL_ID Hash Value COMMAND
    -------------------------- ---------- ------------- ---------- ------------------------------
    STATEMENT START_TIME END_TIME ERROR#
    -------------------------------------------------------------------------------- ------------------- ------------------- ----------
    SYS_AI_2020-03-17/15:48:28 12 dy8cxyd3mv1as 2679498789 DISALLOW AUTO INDEX FOR SQL
    declare 15:50:01 17.03.2020 15:50:02 17.03.2020 0
    load_cnt pls_integer;
    begin
    load_cnt := dbms_spm_internal.load_plans_from_sqlset('SYS_AUTO_STS','S
    YS','sql_id = ''dy8cxyd3mv1as''','NO','YES',1000,FALSE,'SYS',FALSE,TRUE); end;

    SYS_AI_2020-03-17/15:48:28 11 dunt7pwuax92s 1878158884 DISALLOW AUTO INDEX FOR SQL
    declare 15:50:01 17.03.2020 15:50:01 17.03.2020 0
    load_cnt pls_integer;
    begin
    load_cnt := dbms_spm_internal.load_plans_from_sqlset('SYS_AUTO_STS','S
    YS','sql_id = ''dunt7pwuax92s''','NO','YES',1000,FALSE,'SYS',FALSE,TRUE); end;

    Initially, the automatic indexes are created unusable and invisible.  Later, the indexes are rebuilt as usable (though still invisible) if they are judged to be beneficial.
    Fri Mar 20                                                                                                                                   page    1
    Auto Index Indexing Actions

    Action Index Table
    EXECUTION_NAME ID INDEX_NAME Owner TABLE_NAME Owner COMMAND
    -------------------------- ------ ------------------------- ----- -------------------- ----- ------------------------------
    STATEMENT START_TIME END_TIME Error#
    -------------------------------------------------------------------------------- ------------------- ------------------- ------
    SYS_AI_2020-03-20/13:56:03 5 SYS_AI_3z00frhp9vd91 SOE ORDERS SOE CREATE INDEX
    CREATE INDEX "SOE"."SYS_AI_3z00frhp9vd91" ON "SOE"."ORDERS"("WAREHOUSE_ID") TA 13:56:08 20.03.2020 13:56:08 20.03.2020 0
    BLESPACE "AUTO_INDEXES_TS" UNUSABLE INVISIBLE AUTO COMPRESS ADVANCED LOW ONLINE

    SYS_AI_2020-03-20/13:56:03 6 SYS_AI_20tjdcuwznyhx SOE PRODUCT_DESCRIPTIONS CREATE INDEX
    CREATE INDEX "SOE"."SYS_AI_20tjdcuwznyhx" ON "SOE"."PRODUCT_DESCRIPTIONS"("PRO 13:56:08 20.03.2020 13:56:08 20.03.2020 0
    DUCT_ID") TABLESPACE "AUTO_INDEXES_TS" UNUSABLE INVISIBLE AUTO COMPRESS ADVANCED
    LOW ONLINE
    DBA_AUTO_INDEX_VERIFICATIONS reports on the tests made on statements before and after the index changes.  You can see that some have improved and some have regressed.
    Tue Mar 17                                                                                        page    1
    Auto Index Verifications

    Original Auto Index Original Auto Index
    EXECUTION_NAME SQL_ID Plan Hash Plan Hash Buffer Gets Buffer Gets STATUS
    -------------------------- ------------- ---------- ---------- ----------- ----------- ------------
    SYS_AI_2020-03-17/15:48:28 0sh0fn7r21020 3619984409 3900469033 37784 130 IMPROVED
    SYS_AI_2020-03-17/16:18:29 3900469033 3900469033 1316 135 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 200mw76ta6n1r 2844209861 2671811931 37769 3555 IMPROVED
    SYS_AI_2020-03-17/16:18:29 2671811931 2671811931 3278 3596 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 28tr1bjf4t2uh 2692802960 3836151239 37764 3238 IMPROVED
    SYS_AI_2020-03-17/16:18:29 3836151239 3836151239 3272 3442 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 9dt3dqym1tqzw 3954032495 1068597273 46 4 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 a90pbxt8zukdr 1513149408 3900469033 67 1 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 amaapqt3p9qd0 2597291669 1494990609 14645 23 IMPROVED
    SYS_AI_2020-03-17/16:18:29 1494990609 1494990609 3 23 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 b4p66t3uznnuc 3551246360 463531433 4038 4406 UNCHANGED

    SYS_AI_2020-03-17/15:48:28 dunt7pwuax92s 1878158884 2671811931 13 2965 REGRESSED

    SYS_AI_2020-03-17/15:48:28 dy8cxyd3mv1as 2679498789 2126884530 155 298 REGRESSED

    SYS_AI_2020-03-17/15:48:28 g1znkya370htg 3571181773 896069541 74 42 UNCHANGED

    This testing mechanism generally prevents Automatic Indexing from creating indexes that are not used.  However, Richard Foote has found an exception where the number of buffer gets goes down, but the optimizer cost goes up.
    The decision by the tuning advisor to propose an index is determined by optimizer cost, and the decision to use a valid, visible index is also determined by optimizer cost.  So I find it slightly incongruous that the decision whether to make a candidate index visible, and therefore available to the application, is determined by logical I/O, CPU consumption, and elapsed time, but not at all by optimizer cost.

    Results

    The entirety of this test was run with the automatically created indexes in place.
                                Average Response (ms)
    Transaction                 1: Delivered  2: Drop Secondary  3: Automatic
                                     Indexes            Indexes      Indexing
    Update Customer Details             1.18               3.30          3.32
    Browse Products                     2.03             409.21        478.52
    Browse Orders                       2.38               2.05          2.01
    Customer Registration               3.50              78.51          5.91
    Order Products                      5.67              40.97         50.34
    Warehouse Query                     6.20               2.82          2.85
    Process Orders                     13.42             247.80          5.39
    Warehouse Activity Query           14.89             274.19         11.43
    Sales Rep Query                    31.76             268.51         14.45
    TPS                              1060.81              81.30        137.40

    Comparison with No Secondary Indexes

    I have used the execution statistics in DBA_HIST_SQLSTAT for statements captured by AWR during each test and compared the execution plans and average elapsed time for each.
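    The comparison query is straightforward; a sketch, assuming each test sits in a known range of AWR snapshots (the substitution variables are placeholders for the snapshot boundaries):
    WITH t2 AS (
     SELECT sql_id, plan_hash_value, MAX(optimizer_cost) opt_cost
     ,      SUM(executions_delta) execs
     ,      SUM(elapsed_time_delta)/1e6 ela_secs
     FROM   dba_hist_sqlstat
     WHERE  snap_id BETWEEN &test2_begin_snap AND &test2_end_snap
     GROUP  BY sql_id, plan_hash_value
    ), t3 AS (
     SELECT sql_id, plan_hash_value, MAX(optimizer_cost) opt_cost
     ,      SUM(executions_delta) execs
     ,      SUM(elapsed_time_delta)/1e6 ela_secs
     FROM   dba_hist_sqlstat
     WHERE  snap_id BETWEEN &test3_begin_snap AND &test3_end_snap
     GROUP  BY sql_id, plan_hash_value
    )
    SELECT t2.sql_id
    ,      t2.plan_hash_value, t2.execs, t2.ela_secs/NULLIF(t2.execs,0) avg_ela_2
    ,      t3.plan_hash_value, t3.execs, t3.ela_secs/NULLIF(t3.execs,0) avg_ela_3
    FROM   t2 JOIN t3 ON t3.sql_id = t2.sql_id
    ORDER  BY t2.ela_secs DESC;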
    Where the plans change, they change for the better, so Automatic Indexing is doing its job.
                              Average                                                              Average       %    %Num
    Test SQL Plan Opt. Num Elapsed Elapsed Test SQL Plan Opt. Num Elapsed Elapsed Time Execs
    ID SQL_ID Hash Value Cost Execs Time Time ? ID SQL_ID Hash Value Cost Execs Time Time Diff Diff
    ---- ------------- ---------- -------- -------- --------- -------- - ---- ------------- ---------- -------- -------- --------- -------- ------- -------
    2 g9wsbkb2jag3j 1005345217 229 15028 6108.13 .4064 = 3 g9wsbkb2jag3j 1005345217 229 31523 15046.99 .4773 17 110
    2 34mt4skacwwwd 235854103 73 7547 295.26 .0391 = 3 34mt4skacwwwd 235854103 73 15601 766.31 .0491 26 107
    2 g1znkya370htg 3571181773 45 224885 59.99 .0003 = 3 g1znkya370htg 3571181773 45 470063 111.48 .0002 -11 109
    2 djj5txv2dzwb6 3241608609 1 263982 38.19 .0001 = 3 djj5txv2dzwb6 3241608609 1 550563 77.90 .0001 -2 109
    2 09pzy8x10gjkg 0 1 139639 24.96 .0002 = 3 09pzy8x10gjkg 0 1 292179 53.20 .0002 2 109
    2 200mw76ta6n1r 2844209861 10151 1514 405.73 .2680 ! 3 200mw76ta6n1r 2671811931 3257 3268 46.67 .0143 -95 116
    2 a6hdpzrqqhc7d 0 1 70858 26.01 .0004 = 3 a6hdpzrqqhc7d 0 1 148211 41.21 .0003 -24 109
    2 28tr1bjf4t2uh 2692802960 10140 1575 430.72 .2735 ! 3 28tr1bjf4t2uh 3836151239 3245 3118 35.58 .0114 -96 98
    2 982zxphp8ht6c 1666523684 2 407633 14.10 .0000 = 3 982zxphp8ht6c 1666523684 2 849104 30.01 .0000 2 108
    2 csasr8ct2051v 900611645 3 263976 13.77 .0001 = 3 csasr8ct2051v 900611645 3 550572 29.06 .0001 1 109
    2 0sh0fn7r21020 3619984409 15124 3019 747.00 .2474 ! 3 0sh0fn7r21020 3900469033 4695 5030 25.60 .0051 -98 67
    2 0sh0fn7r21020 3619984409 15124 3019 747.00 .2474 ! 3 0sh0fn7r21020 2629004565 14875 1208 6.04 .0050 -98 -60
    2 5g00dq4fxwnsw 2141863993 3 95832 7.13 .0001 = 3 5g00dq4fxwnsw 2141863993 3 292176 21.40 .0001 -2 205
    2 2yp5w5a36s5xv 1628223527 3 48610 5.50 .0001 = 3 2yp5w5a36s5xv 1628223527 3 148215 12.84 .0001 -23 205
    2 4a7nqf7k0ztyc 0 1 30356 6.03 .0002 = 3 4a7nqf7k0ztyc 0 1 63339 12.33 .0002 -2 109
    2 49d9qhgsr8w9h 0 1 20825 3.40 .0002 = 3 49d9qhgsr8w9h 0 1 63339 10.44 .0002 1 204
    2 8uk8bquk453q8 3072215225 2 48612 5.61 .0001 = 3 8uk8bquk453q8 3072215225 2 134571 8.51 .0001 -45 177
    2 cr72yp489p3jw 0 1 20824 2.57 .0001 = 3 cr72yp489p3jw 0 1 44297 6.97 .0002 27 113
    2 g3kf1ppky3627 2480532011 8 67021 3.00 .0000 = 3 g3kf1ppky3627 2480532011 6 143326 6.57 .0000 2 114
    2 0t61wk161zz87 1544532951 2 20823 2.26 .0001 = 3 0t61wk161zz87 1544532951 2 13799 1.64 .0001 9 -34
    2 amaapqt3p9qd0 2597291669 4276 75096 5348.00 .0712 ! 3 amaapqt3p9qd0 1494990609 7 34857 1.40 .0000 -100 -54
    2 8xqdxjkbt9ghg 0 1 5681 1.93 .0003 = 3 8xqdxjkbt9ghg 0 1 4129 1.34 .0003 -4 -27
    2 6k3uuf3g8pwh6 1628223527 3 5167 1.43 .0003 = 3 6k3uuf3g8pwh6 1628223527 3 3527 1.13 .0003 16 -32
    2 a9cv97h3dazfh 1197098199 3 11144 1.48 .0001 = 3 a9cv97h3dazfh 1197098199 3 7665 1.09 .0001 7 -31
    2 0c11vprf4881w 856749079 6 11370 .85 .0001 = 3 0c11vprf4881w 856749079 7 10487 .85 .0001 9 -8
    2 3rxkss61q68su 1322380957 5 4821 .31 .0001 = 3 3rxkss61q68su 1322380957 5 9281 .64 .0001 8 93
    2 9v9ky32fg9hy7 104664550 2 4140 .61 .0001 = 3 9v9ky32fg9hy7 104664550 2 4121 .55 .0001 -11 -0
    2 4abyshv6jmtdk 140963536 123 15 .05 .0036 = 3 4abyshv6jmtdk 140963536 123 20 .08 .0039 9 33

    Comparison with Delivered Indexes

    However, if we compare the delivered indexes against just the primary indexes and those created by Automatic Indexing, a number of statements have degraded, one particularly severely.
                                                      Average                                                              Average       %    %Num
    Test SQL Plan Opt. Num Elapsed Elapsed Test SQL Plan Opt. Num Elapsed Elapsed Time Execs
    ID SQL_ID Hash Value Cost Execs Time Time ? ID SQL_ID Hash Value Cost Execs Time Time Diff Diff
    ---- ------------- ---------- -------- -------- --------- -------- - ---- ------------- ---------- -------- -------- --------- -------- ------- -------
    1 g9wsbkb2jag3j 574689976 5 148925 9.33 .0001 ! 3 g9wsbkb2jag3j 1005345217 229 31523 15046.99 .4773 761882 -79
    1 34mt4skacwwwd 235854103 74 90568 2884.16 .0318 = 3 34mt4skacwwwd 235854103 73 15601 766.31 .0491 54 -83
    1 g1znkya370htg 124060720 26 2725529 331.81 .0001 ! 3 g1znkya370htg 3571181773 45 470063 111.48 .0002 95 -83
    1 djj5txv2dzwb6 3241608609 1 3179667 435.37 .0001 = 3 djj5txv2dzwb6 3241608609 1 550563 77.90 .0001 3 -83
    1 09pzy8x10gjkg 0 1 1687520 285.05 .0002 = 3 09pzy8x10gjkg 0 1 292179 53.20 .0002 8 -83
    1 200mw76ta6n1r 1448083145 1437 18129 367.09 .0202 ! 3 200mw76ta6n1r 2671811931 3257 3268 46.67 .0143 -29 -82
    1 a6hdpzrqqhc7d 0 1 857616 244.55 .0003 = 3 a6hdpzrqqhc7d 0 1 148211 41.21 .0003 -2 -83
    1 28tr1bjf4t2uh 2220165490 1425 17921 167.57 .0094 ! 3 28tr1bjf4t2uh 3836151239 3245 3118 35.58 .0114 22 -83
    1 982zxphp8ht6c 1666523684 2 4903566 171.31 .0000 = 3 982zxphp8ht6c 1666523684 2 849104 30.01 .0000 1 -83
    1 csasr8ct2051v 900611645 3 3179610 159.11 .0001 = 3 csasr8ct2051v 900611645 3 550572 29.06 .0001 5 -83
    1 0sh0fn7r21020 1055577880 1258 36654 175.46 .0048 ! 3 0sh0fn7r21020 3900469033 4695 5030 25.60 .0051 6 -86
    1 5g00dq4fxwnsw 2141863993 3 1687532 120.78 .0001 = 3 5g00dq4fxwnsw 2141863993 3 292176 21.40 .0001 2 -83
    1 2yp5w5a36s5xv 1628223527 3 857624 114.81 .0001 = 3 2yp5w5a36s5xv 1628223527 3 148215 12.84 .0001 -35 -83
    1 4a7nqf7k0ztyc 0 1 363873 109.76 .0003 = 3 4a7nqf7k0ztyc 0 1 63339 12.33 .0002 -35 -83
    1 49d9qhgsr8w9h 0 1 363871 55.61 .0002 = 3 49d9qhgsr8w9h 0 1 63339 10.44 .0002 8 -83
    1 8uk8bquk453q8 3072215225 2 857622 51.75 .0001 = 3 8uk8bquk453q8 3072215225 2 134571 8.51 .0001 5 -84
    1 cr72yp489p3jw 0 1 363878 52.63 .0001 = 3 cr72yp489p3jw 0 1 44297 6.97 .0002 9 -88
    1 g3kf1ppky3627 2480532011 8 1180857 51.46 .0000 = 3 g3kf1ppky3627 2480532011 6 143326 6.57 .0000 5 -88
    1 0sh0fn7r21020 1055577880 1258 36654 175.46 .0048 ! 3 0sh0fn7r21020 2629004565 14875 1208 6.04 .0050 4 -97
    1 0t61wk161zz87 1544532951 2 363871 37.74 .0001 = 3 0t61wk161zz87 1544532951 2 13799 1.64 .0001 14 -96
    1 amaapqt3p9qd0 3722429161 8 908901 32.04 .0000 ! 3 amaapqt3p9qd0 1494990609 7 34857 1.40 .0000 14 -96
    1 8xqdxjkbt9ghg 0 1 69829 14.61 .0002 = 3 8xqdxjkbt9ghg 0 1 4129 1.34 .0003 56 -94
    1 6k3uuf3g8pwh6 1628223527 3 90569 28.00 .0003 = 3 6k3uuf3g8pwh6 1628223527 3 3527 1.13 .0003 4 -96
    1 a9cv97h3dazfh 1197098199 3 147637 18.88 .0001 = 3 a9cv97h3dazfh 1197098199 3 7665 1.09 .0001 11 -95
    1 0c11vprf4881w 856749079 8 223512 15.24 .0001 = 3 0c11vprf4881w 856749079 7 10487 .85 .0001 19 -95
    1 3rxkss61q68su 1322380957 5 176508 11.20 .0001 = 3 3rxkss61q68su 1322380957 5 9281 .64 .0001 9 -95
    1 9v9ky32fg9hy7 104664550 2 43191 2.69 .0001 = 3 9v9ky32fg9hy7 104664550 2 4121 .55 .0001 113 -90
    1 4h624tuydrjnh 3828985807 3 62578 4.69 .0001 = 3 4h624tuydrjnh 3828985807 3 4131 .46 .0001 50 -93
    1 95hgbb2kkcvvg 3419397814 12934 1 4.09 4.0858 !
    1 3gs4005kgkhxu 296924608 6423 1 4.05 4.0539 !

    Test 4: Manual Tuning

    Then I looked at whether I could get back to the original performance by manually tuning the top SQL statements rather than reinstating all the indexes that I had dropped.  I found I needed to create just four more indexes.  
    The first two are reinstated indexes that were originally part of the SOE schema but were dropped as secondary indexes.  
    CREATE INDEX SOE.CUST_FUNC_LOWER_NAME_IX 
    ON SOE.CUSTOMERS (LOWER(CUST_LAST_NAME), LOWER(CUST_FIRST_NAME))
    TABLESPACE SOE PARALLEL 8
    /
    CREATE INDEX SOE.PROD_CATEGORY_IX ON SOE.PRODUCT_INFORMATION (CATEGORY_ID)
    TABLESPACE SOE PARALLEL 8
    /
    The other two are new indexes that were not originally present.
    CREATE INDEX SOE.DMK_ORDER_STATUS ON SOE.ORDERS (ORDER_STATUS) 
    TABLESPACE SOE PARALLEL 8
    /
    CREATE INDEX SOE.DMK_WAREHOUSE_ORDER_DATE ON SOE.ORDERS (WAREHOUSE_ID, ORDER_DATE)
    TABLESPACE SOE PARALLEL 8
    /

    Results

    I now have 22 visible indexes instead of the original 27, and the performance is better than with the delivered indexes.
                             Average Response (ms)
                          1: Delivered  2: Drop Secondary  3: Automatic     4: Manual
Transaction                    Indexes            Indexes      Indexing        Tuning
------------------------  ------------  -----------------  ------------  ------------
Update Customer Details           1.18               3.30          3.32          3.51
Browse Products                   2.03             409.21        478.52          1.93
Browse Orders                     2.38               2.05          2.01          2.12
Customer Registration             3.50              78.51          5.91          5.92
Order Products                    5.67              40.97         50.34          1.99
Warehouse Query                   6.20               2.82          2.85          3.00
Process Orders                   13.42             247.80          5.39          4.95
Warehouse Activity Query         14.89             274.19         11.43         20.29
Sales Rep Query                  31.76             268.51         14.45          3.74
TPS                            1060.81              81.30        137.40       1166.49

    Comparison with Delivered Indexes

    We can see from the SQL statistics comparison that most of the original plans have been reinstated, and elsewhere there are both improvements and regressions.
                                                               Average                                                              Average       %    %Num
    Test SQL Plan Opt. Num Elapsed Elapsed Test SQL Plan Opt. Num Elapsed Elapsed Time Execs
    ID SQL_ID Hash Value Cost Execs Time Time ? ID SQL_ID Hash Value Cost Execs Time Time Diff Diff
    ---- ------------- ---------- -------- -------- --------- -------- - ---- ------------- ---------- -------- -------- --------- -------- ------- -------
    1 djj5txv2dzwb6 3241608609 1 3179667 435.37 .0001 = 4 djj5txv2dzwb6 3241608609 1 3787684 533.89 .0001 3 19
    1 g1znkya370htg 124060720 26 2725529 331.81 .0001 ! 4 g1znkya370htg 684158979 19 3250699 491.48 .0002 24 19
    1 28tr1bjf4t2uh 2220165490 1425 17921 167.57 .0094 ! 4 28tr1bjf4t2uh 3836151239 6155 21756 435.75 .0200 114 21
    1 09pzy8x10gjkg 0 1 1687520 285.05 .0002 = 4 09pzy8x10gjkg 0 1 2011130 357.96 .0002 5 19
    1 a6hdpzrqqhc7d 0 1 857616 244.55 .0003 = 4 a6hdpzrqqhc7d 0 1 1021001 304.66 .0003 5 19
    1 982zxphp8ht6c 1666523684 2 4903566 171.31 .0000 = 4 982zxphp8ht6c 1666523684 2 5846476 215.40 .0000 5 19
    1 csasr8ct2051v 900611645 3 3179610 159.11 .0001 = 4 csasr8ct2051v 900611645 3 3787526 197.96 .0001 4 19
    1 0sh0fn7r21020 1055577880 1258 36654 175.46 .0048 ! 4 0sh0fn7r21020 3900469033 11026 43379 195.10 .0045 -6 18
    1 5g00dq4fxwnsw 2141863993 3 1687532 120.78 .0001 = 4 5g00dq4fxwnsw 2141863993 3 2011090 148.96 .0001 3 19
    1 2yp5w5a36s5xv 1628223527 3 857624 114.81 .0001 = 4 2yp5w5a36s5xv 1628223527 3 1020995 115.62 .0001 -15 19
    1 4a7nqf7k0ztyc 0 1 363873 109.76 .0003 = 4 4a7nqf7k0ztyc 0 1 432444 95.85 .0002 -27 19
    1 200mw76ta6n1r 1448083145 1437 18129 367.09 .0202 ! 4 200mw76ta6n1r 437111724 371 21657 72.86 .0034 -83 19
    1 49d9qhgsr8w9h 0 1 363871 55.61 .0002 = 4 49d9qhgsr8w9h 0 1 432448 67.47 .0002 2 19
    1 g3kf1ppky3627 2480532011 8 1180857 51.46 .0000 = 4 g3kf1ppky3627 2480532011 6 1406867 67.09 .0000 9 19
    1 cr72yp489p3jw 0 1 363878 52.63 .0001 = 4 cr72yp489p3jw 0 1 432449 64.74 .0001 4 19
    1 8uk8bquk453q8 3072215225 2 857622 51.75 .0001 = 4 8uk8bquk453q8 3072215225 2 1020941 63.69 .0001 3 19
    1 34mt4skacwwwd 235854103 74 90568 2884.16 .0318 ! 4 34mt4skacwwwd 1567979920 74 108274 48.63 .0004 -99 20
    1 0t61wk161zz87 1544532951 2 363871 37.74 .0001 = 4 0t61wk161zz87 1544532951 2 432449 46.49 .0001 4 19
    1 8xqdxjkbt9ghg 0 1 69829 14.61 .0002 = 4 8xqdxjkbt9ghg 0 1 195205 41.44 .0002 1 180
    1 amaapqt3p9qd0 3722429161 8 908901 32.04 .0000 ! 4 amaapqt3p9qd0 1494990609 5 1082090 39.24 .0000 3 19
    1 a9cv97h3dazfh 1197098199 3 147637 18.88 .0001 = 4 a9cv97h3dazfh 1197098199 3 269481 35.83 .0001 4 83
    1 3rxkss61q68su 1322380957 5 176508 11.20 .0001 = 4 3rxkss61q68su 1322380957 5 293179 32.62 .0001 75 66
    1 6k3uuf3g8pwh6 1628223527 3 90569 28.00 .0003 = 4 6k3uuf3g8pwh6 1628223527 3 98133 20.24 .0002 -33 8
    1 0c11vprf4881w 856749079 8 223512 15.24 .0001 = 4 0c11vprf4881w 856749079 6 213021 17.96 .0001 24 -5
    1 g9wsbkb2jag3j 574689976 5 148925 9.33 .0001 = 4 g9wsbkb2jag3j 574689976 7 54410 4.41 .0001 29 -63

    Test 5: Managing Manual Indexing

Finally, in this test, I started with all the delivered SOE indexes and configured Automatic Indexing to consider dropping both automatic and manual indexes that have not been used for an hour (the default retention is 373 days; I have set this absurdly low value just to demonstrate the behaviour of the feature).  Automatic Indexing was initially running in report-only mode when I started Swingbench.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_DEFAULT_TABLESPACE','AUTO_INDEXES_TS');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_SPACE_BUDGET','100');

    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_COMPRESSION','OFF');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_RETENTION_FOR_AUTO','.041666');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_RETENTION_FOR_MANUAL','.041666');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_REPORT_RETENTION','1');
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_SCHEMA', 'SOE', allow => TRUE);

    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','REPORT ONLY');
    After half an hour I switched to 'implement' mode.
    EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT');
    Very quickly (because I had previously run this test and the statements were already in the SQL Tuning set) I was left with just 17 indexes.
    Table                      Index                                     Cons
    Owner TABLE_NAME Owner INDEX_NAME UNIQUENES Type STATUS VISIBILIT AUT INDEX_KEYS
    ----- -------------------- ----- ------------------------- --------- ---- ------------ --------- --- -------------------------------------------------
    SOE ADDRESSES SOE ADDRESS_CUST_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ADDRESS_PK UNIQUE P VALID VISIBLE NO ADDRESS_ID

    SOE CARD_DETAILS SOE CARDDETAILS_CUST_IX NONUNIQUE VALID VISIBLE NO CUSTOMER_ID
    SOE CARD_DETAILS_PK UNIQUE P VALID VISIBLE NO CARD_ID

    SOE CUSTOMERS SOE CUSTOMERS_PK UNIQUE P VALID VISIBLE NO CUSTOMER_ID
    SOE CUST_FUNC_LOWER_NAME_IX NONUNIQUE VALID VISIBLE NO SYS_NC00017$,SYS_NC00018$

    SOE INVENTORIES SOE INVENTORY_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,WAREHOUSE_ID

    SOE ORDERS SOE ORDER_PK UNIQUE P VALID VISIBLE NO ORDER_ID
    SOE ORD_CUSTOMER_IX NONUNIQUE R VALID VISIBLE NO CUSTOMER_ID
    SOE ORD_SALES_REP_IX NONUNIQUE VALID VISIBLE NO SALES_REP_ID
    SOE ORD_WAREHOUSE_IX NONUNIQUE VALID VISIBLE NO WAREHOUSE_ID,ORDER_STATUS

    SOE ORDER_ITEMS SOE ITEM_ORDER_IX NONUNIQUE R VALID VISIBLE NO ORDER_ID
    SOE ORDER_ITEMS_PK UNIQUE P VALID VISIBLE NO ORDER_ID,LINE_ITEM_ID

    SOE PRODUCT_DESCRIPTIONS SOE PRD_DESC_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID,LANGUAGE_ID

    SOE PRODUCT_INFORMATION SOE PRODUCT_INFORMATION_PK UNIQUE P VALID VISIBLE NO PRODUCT_ID
    SOE PROD_CATEGORY_IX NONUNIQUE VALID VISIBLE NO CATEGORY_ID

    SOE WAREHOUSES SOE WAREHOUSES_PK UNIQUE P VALID VISIBLE NO WAREHOUSE_ID

    17 rows selected.

The automatic indexing activity reports only cover actions on automatic indexes.  They do not report decisions to drop, or not to drop, manual indexes.  I only know the indexes have gone because I manually compared the remaining indexes with the initial set.
    It has left 5 secondary indexes, but it has removed 3 of the 6 indexes on foreign keys that DROP_SECONDARY_INDEXES left intact.
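The index listing above was produced with a query like this sketch.  In 19c, DBA_INDEXES has an AUTO column that distinguishes automatic from manual indexes; the full report also joins to DBA_CONSTRAINTS and DBA_IND_COLUMNS for the constraint type and key columns.

SELECT i.table_name, i.index_name, i.uniqueness, i.status, i.visibility, i.auto
FROM   dba_indexes i
WHERE  i.owner = 'SOE'
ORDER BY i.table_name, i.index_name;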
    We can see from the performance chart that there is a significant drop in the performance of the test after about 30 minutes when the automatic indexing job dropped the indexes.

    Conclusion

    Automatic Indexing does what it claims, but I think it doesn't go far enough when it comes to identifying new indexes.  In particular, it did not recreate the function-based index (on the lower-case customer names) that makes the most significant difference in performance to Swingbench.
    Oracle makes bold claims for improvements in performance via automatically created indexes.  However, my experience across the SOE benchmark as a whole was that I saw only modest performance gains relative to the point where I dropped the secondary indexes.  The performance of the SQL statements that made use of the automatic indexes certainly did improve, and significantly.  Automatic Indexing generally doesn't create indexes that are not used, but Richard Foote has shown that there are exceptions where the number of buffer gets goes down but the optimizer cost goes up.
    As Tim Hall says, you have to be 'particularly brave' to DROP_SECONDARY_INDEXES.  My experience was that doing so significantly degraded performance, and then Automatic Indexing did not fully mitigate that.  You will be left trying to work out which indexes you have to put back yourself.
    In the current release, I think allowing Automatic Indexing to remove manual indexes would be extremely dangerous.  You wouldn't know when manual indexes, including those on foreign keys, were removed and again you could be left dealing with performance issues.  If, as you should, you use foreign keys to enforce referential integrity you could get TM locking issues.
I think the SOE benchmark is a fair test of Automatic Indexing.  My manual tuning, which not only restored the original performance but improved upon it, was not significantly different from anything I have seen on typical ERP or other OLTP systems.  It was limited to adding indexes, and I still ended up with fewer indexes.
It is possible to rebuild, coalesce or shrink automatic indexes; however, you cannot drop or otherwise alter them.  The procedure DROP_AUTO_INDEXES in DBMS_AUTO_INDEX is not documented and does not currently work (in 19.3-20.2).  I think it would be very difficult to let Automatic Indexing do some of the work and then do some manual tuning alongside it; you would just get in each other's way.  The activity reports and the index verification information may be a useful source of information during manual tuning, but that is using the feature as just another tuning advisor.  Automatic Indexing is clearly intended to be an autonomous feature: either you turn it on and let it do its thing, or not.
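For example, these maintenance operations are permitted on an automatic index (the index name below is made up for illustration), whereas an attempt to drop the same index raises an error.

ALTER INDEX soe."SYS_AI_fvk5t8wb3vgmw" REBUILD ONLINE;
ALTER INDEX soe."SYS_AI_fvk5t8wb3vgmw" COALESCE;
ALTER INDEX soe."SYS_AI_fvk5t8wb3vgmw" SHRINK SPACE;

DROP INDEX soe."SYS_AI_fvk5t8wb3vgmw";  -- raises an error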
Added 6.5.2020: Richard Foote has blogged on this point since I first wrote this article.
To be fair, this is the initial release (though testing on 20c on a virtual machine produced the same behaviour), and like other Oracle database features before it, it will mature with time.  However, at the moment, I think we are a long way from being able to just turn it on and walk away.

    Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 1. Upload to Object Storage

    This blog is the first in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.
    Last year I wrote a blog titled Reading the Active Session History Compressed Export File in eDB360/SQLd360 as an External Table.  I set myself the challenge of doing the same thing with an Autonomous database.  I would imagine that these are commonly used Oracle Cloud operations, yet I found the documentation was spread over a number of places, and it took me a while to get it right. So, I hope you find this series helpful.

    Install OCI

I could upload my data file directly into an object storage bucket through the browser, but that would mean first copying it to a Windows desktop, which is not a good option for very large files.  Instead, I am going to install the OCI Command Line Interface onto the Linux VM where my data file resides (see OCI CLI Quickstart Guide).
    I am installing this into the oracle user on a Linux VM where the Oracle database has previously been installed, so I just accepted all the defaults.
    bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"

    Set up Token-Based Authentication for OCI

I couldn't get the instructions for generating a token without a browser to work.  Instead, I installed the OCI CLI on a Windows machine, generated a token there, and transferred it to my Linux VM (see Token-based Authentication for the CLI).
    C:\Users\david.kurtz>oci session authenticate
    Enter a region (e.g. ap-mumbai-1, ap-seoul-1, ap-sydney-1, ap-tokyo-1, ca-toronto-1, eu-frankfurt-1, eu-zurich-1, sa-saopaulo-1, uk-london-1, us-ashburn-1, us-gov-ashburn-1, us-gov-chicago-1, us-gov-phoenix-1, us-langley-1, us-luke-1, us-phoenix-1): uk-london-1
    Please switch to newly opened browser window to log in!
    Completed browser authentication process!
    Config written to: C:\Users\david.kurtz\.oci\config

    Try out your newly created session credentials with the following example command:

    oci iam region list --config-file C:\Users\david.kurtz\.oci\config --profile DEFAULT --auth security_token
    If I run the suggested example command, I get this response with the list of OCI regions.
    {
    "data": [

    {
    "key": "LHR",
    "name": "uk-london-1"
    },

    ]
    }

    Export OCI Profile

    Now I can export the profile to a zip file
    C:\Users\david.kurtz>oci session export --profile DEFAULT --output-file DEFAULT
    File DEFAULT.zip already exists, do you want to overwrite it? [y/N]: y
    Exporting profile: DEFAULT from config file: C:\Users\david.kurtz\.oci\config
    Export file written to: C:\Users\david.kurtz\DEFAULT.zip

    Import OCI Profile

    I can transfer this zip file to my Linux VM and import it.
    [oracle@oracle-database .oci]$ oci session import --session-archive ./DEFAULT.zip --force
    Config already contains a profile with the same name as the archived profile: DEFAULT. Provide an alternative name for the imported profile: myprofile
    Imported profile myprofile written to: /home/oracle/.oci/config

    Try out your newly imported session credentials with the following example command:

    oci iam region list --config-file /home/oracle/.oci/config --profile myprofile --auth security_token
    I can test it by again getting the list of OCI regions.

    Upload a File

    I have created a bucket on OCI.
I could upload the file through the OCI web interface, but I want to use the command line from my Linux VM.
    [oracle@oracle-database ~]$ oci os object put --bucket-name bucket-20200505-1552 --file /media/sf_temp/dba_hist_active_sess_history.txt.gz --disable-parallel-uploads --config-file /home/oracle/.oci/config --profile myprofile --auth security_token
    Upload ID: 1ad452f7-ab49-a24b-2fe9-f55f565cdf40
    Split file into 2 parts for upload.
    Uploading object [####################################] 100%
    {
    "etag": "66681c40-4e11-4b73-baf9-cc1e4c3ebd5f",
    "last-modified": "Wed, 06 May 2020 15:17:03 GMT",
    "opc-multipart-md5": "MFdfU7vGZlJ5Mb4nopxtpw==-2"
    }
    I can see the file in the bucket via the web interface, and I can see that the size and the MD5 checksum are both correct.
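I can also verify the upload from the command line.  The object head command returns the object metadata, including the size and MD5 checksum (this reuses the same bucket, profile and authentication options as the upload above):

[oracle@oracle-database ~]$ oci os object head --bucket-name bucket-20200505-1552 --name dba_hist_active_sess_history.txt.gz --config-file /home/oracle/.oci/config --profile myprofile --auth security_token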
    In the next post, I will explain how to read the file from Object Storage using an External Table.

    Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 2. Reading from Object Storage with an External Table

    This blog is the second in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.

    Create A Credential 

    First, I need to create a credential that the database will use to connect to the OCI Object Storage. This is not the same as the credential that the OCI CLI used to connect.
    In the OCI interface navigate to Identity ➧ Users ➧ User Details, and create an Authentication Token.
    It is important to copy the token at this point because you will not see it again.
    Now you can put the token into a database credential.
    connect admin/Password2020@gofaster1b_tp 
    BEGIN
    DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'MY_BUCKET',
    username=> 'oraclecloud1@go-faster.co.uk',
password=> 'K7xfi-mG<1Z:dq#88;1m'
);
END;
/
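I can check that the credential was created and is enabled (USER_CREDENTIALS is the schema-level counterpart of the DBA_CREDENTIALS view that I query later in this series):

SELECT credential_name, username, enabled
FROM   user_credentials;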
Note: The visibility of the bucket that I created earlier is private by default, so I can only access it with an authenticated user.  If I were to create a credential for an unauthenticated user, it could only access a public bucket.  Otherwise, I would obtain an error such as the following:
    ORA-29913: error in executing ODCIEXTTABLEOPEN callout
    ORA-20404: Object not found -
    https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-202005
    05-1552/o/dba_hist_active_sess_history.txt.gz
    ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 964
    ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD_INTERNAL", line 3891
    ORA-06512: at line 1

    Create an External Table

In my blog Reading the Active Session History Compressed Export File in eDB360/SQLd360 as an External Table, I showed how to create an external table to read a compressed file.  Now I am going to do the same thing, except that this time I am going to read the file from OCI Object Storage into an external table created with DBMS_CLOUD.
    • I have to provide a list of columns in the external table and a list of fields in the flat file.
• N.B. Some column names end in a # symbol.  These must be put in double-quotes in the field list, though this is not needed in the column list.
• The Access Parameters section of the ORACLE_LOADER access driver that I used to create the external table becomes the contents of the format parameter.  I have created a JSON object to hold the various parameters.  The parameters are not exactly the same; in fact, I have added some.  See also DBMS_CLOUD Package Format Options.
    DROP TABLE ash_hist PURGE;
    BEGIN
    DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name =>'ASH_HIST',
    credential_name =>'MY_BUCKET',
    file_uri_list =>'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz',
    format => json_object('blankasnull' value 'true'
    ,'compression' value 'gzip'
    ,'dateformat' value 'YYYY-MM-DD/HH24:mi:ss'
    ,'timestampformat' value 'YYYY-MM-DD/HH24:mi:ss.ff'
    ,'delimiter' value '<,>'
    ,'ignoreblanklines' value 'true'
    ,'rejectlimit' value '10'
    ,'removequotes' value 'true'
    ,'trimspaces' value 'lrtrim'
    ),
    column_list => 'SNAP_ID NUMBER
    ,DBID NUMBER
    ,INSTANCE_NUMBER NUMBER
    ,SAMPLE_ID NUMBER
    ,SAMPLE_TIME TIMESTAMP(3)
    ,SESSION_ID NUMBER
    ,SESSION_SERIAL# NUMBER
    ,SESSION_TYPE VARCHAR2(10)
    ,FLAGS NUMBER
    ,USER_ID NUMBER
    -----------------------------------------
    ,SQL_ID VARCHAR2(13)
    ,IS_SQLID_CURRENT VARCHAR2(1)
    ,SQL_CHILD_NUMBER NUMBER
    ,SQL_OPCODE NUMBER
    ,SQL_OPNAME VARCHAR2(64)
    ,FORCE_MATCHING_SIGNATURE NUMBER
    ,TOP_LEVEL_SQL_ID VARCHAR2(13)
    ,TOP_LEVEL_SQL_OPCODE NUMBER
    ,SQL_PLAN_HASH_VALUE NUMBER
    ,SQL_FULL_PLAN_HASH_VALUE NUMBER
    -----------------------------------------
    ,SQL_ADAPTIVE_PLAN_RESOLVED NUMBER
    ,SQL_PLAN_LINE_ID NUMBER
    ,SQL_PLAN_OPERATION VARCHAR2(64)
    ,SQL_PLAN_OPTIONS VARCHAR2(64)
    ,SQL_EXEC_ID NUMBER
    ,SQL_EXEC_START DATE
    ,PLSQL_ENTRY_OBJECT_ID NUMBER
    ,PLSQL_ENTRY_SUBPROGRAM_ID NUMBER
    ,PLSQL_OBJECT_ID NUMBER
    ,PLSQL_SUBPROGRAM_ID NUMBER
    -----------------------------------------
    ,QC_INSTANCE_ID NUMBER
    ,QC_SESSION_ID NUMBER
    ,QC_SESSION_SERIAL# NUMBER
    ,PX_FLAGS NUMBER
    ,EVENT VARCHAR2(64)
    ,EVENT_ID NUMBER
    ,SEQ# NUMBER
    ,P1TEXT VARCHAR2(64)
    ,P1 NUMBER
    ,P2TEXT VARCHAR2(64)
    -----------------------------------------
    ,P2 NUMBER
    ,P3TEXT VARCHAR2(64)
    ,P3 NUMBER
    ,WAIT_CLASS VARCHAR2(64)
    ,WAIT_CLASS_ID NUMBER
    ,WAIT_TIME NUMBER
    ,SESSION_STATE VARCHAR2(7)
    ,TIME_WAITED NUMBER
    ,BLOCKING_SESSION_STATUS VARCHAR2(11)
    ,BLOCKING_SESSION NUMBER
    -----------------------------------------
    ,BLOCKING_SESSION_SERIAL# NUMBER
    ,BLOCKING_INST_ID NUMBER
    ,BLOCKING_HANGCHAIN_INFO VARCHAR2(1)
    ,CURRENT_OBJ# NUMBER
    ,CURRENT_FILE# NUMBER
    ,CURRENT_BLOCK# NUMBER
    ,CURRENT_ROW# NUMBER
    ,TOP_LEVEL_CALL# NUMBER
    ,TOP_LEVEL_CALL_NAME VARCHAR2(64)
    ,CONSUMER_GROUP_ID NUMBER
    -----------------------------------------
    ,XID RAW(8)
    ,REMOTE_INSTANCE# NUMBER
    ,TIME_MODEL NUMBER
    ,IN_CONNECTION_MGMT VARCHAR2(1)
    ,IN_PARSE VARCHAR2(1)
    ,IN_HARD_PARSE VARCHAR2(1)
    ,IN_SQL_EXECUTION VARCHAR2(1)
    ,IN_PLSQL_EXECUTION VARCHAR2(1)
    ,IN_PLSQL_RPC VARCHAR2(1)
    ,IN_PLSQL_COMPILATION VARCHAR2(1)
    -----------------------------------------
    ,IN_JAVA_EXECUTION VARCHAR2(1)
    ,IN_BIND VARCHAR2(1)
    ,IN_CURSOR_CLOSE VARCHAR2(1)
    ,IN_SEQUENCE_LOAD VARCHAR2(1)
    ,IN_INMEMORY_QUERY VARCHAR2(1) /*added 12.1*/
    ,IN_INMEMORY_POPULATE VARCHAR2(1) /*added 12.1*/
    ,IN_INMEMORY_PREPOPULATE VARCHAR2(1) /*added 12.1*/
    ,IN_INMEMORY_REPOPULATE VARCHAR2(1) /*added 12.1*/
    ,IN_INMEMORY_TREPOPULATE VARCHAR2(1) /*added 12.1*/
    ,CAPTURE_OVERHEAD VARCHAR2(1)
    -----------------------------------------
    ,REPLAY_OVERHEAD VARCHAR2(1)
    ,IS_CAPTURED VARCHAR2(1)
    ,IS_REPLAYED VARCHAR2(1)
    ,SERVICE_HASH NUMBER
    ,PROGRAM VARCHAR2(64)
    ,MODULE VARCHAR2(64)
    ,ACTION VARCHAR2(64)
    ,CLIENT_ID VARCHAR2(64)
    ,MACHINE VARCHAR2(64)
    ,PORT NUMBER
    -----------------------------------------
    ,ECID VARCHAR2(64)
    ,DBREPLAY_FILE_ID NUMBER /*added 12.1*/
    ,DBREPLAY_CALL_COUNTER NUMBER /*added 12.1*/
    ,TM_DELTA_TIME NUMBER
    ,TM_DELTA_CPU_TIME NUMBER
    ,TM_DELTA_DB_TIME NUMBER
    ,DELTA_TIME NUMBER
    ,DELTA_READ_IO_REQUESTS NUMBER
    ,DELTA_WRITE_IO_REQUESTS NUMBER
    ,DELTA_READ_IO_BYTES NUMBER
    -----------------------------------------
    ,DELTA_WRITE_IO_BYTES NUMBER
    ,DELTA_INTERCONNECT_IO_BYTES NUMBER
    ,PGA_ALLOCATED NUMBER
    ,TEMP_SPACE_ALLOCATED NUMBER
    ,DBOP_NAME VARCHAR2(64) /*added 12.1*/
    ,DBOP_EXEC_ID NUMBER /*added 12.1*/
    ,CON_DBID NUMBER /*added 12.1*/
    ,CON_ID NUMBER /*added 12.1*/'
    -----------------------------------------
    ,field_list=>'SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID,SAMPLE_TIME ,SESSION_ID,"SESSION_SERIAL#",SESSION_TYPE,FLAGS,USER_ID
    ,SQL_ID,IS_SQLID_CURRENT,SQL_CHILD_NUMBER,SQL_OPCODE,SQL_OPNAME,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE,SQL_PLAN_HASH_VALUE,SQL_FULL_PLAN_HASH_VALUE
    ,SQL_ADAPTIVE_PLAN_RESOLVED,SQL_PLAN_LINE_ID,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS,SQL_EXEC_ID,SQL_EXEC_START ,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
    ,QC_INSTANCE_ID,QC_SESSION_ID,"QC_SESSION_SERIAL#",PX_FLAGS,EVENT,EVENT_ID,"SEQ#",P1TEXT,P1,P2TEXT
    ,P2,P3TEXT,P3,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME,SESSION_STATE,TIME_WAITED,BLOCKING_SESSION_STATUS,BLOCKING_SESSION
    ,"BLOCKING_SESSION_SERIAL#",BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO,"CURRENT_OBJ#","CURRENT_FILE#","CURRENT_BLOCK#","CURRENT_ROW#","TOP_LEVEL_CALL#",TOP_LEVEL_CALL_NAME,CONSUMER_GROUP_ID
    ,XID,"REMOTE_INSTANCE#",TIME_MODEL,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
    ,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD,IN_INMEMORY_QUERY,IN_INMEMORY_POPULATE,IN_INMEMORY_PREPOPULATE,IN_INMEMORY_REPOPULATE,IN_INMEMORY_TREPOPULATE,CAPTURE_OVERHEAD
    ,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT
    ,ECID,DBREPLAY_FILE_ID,DBREPLAY_CALL_COUNTER,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME,DELTA_TIME,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS,DELTA_READ_IO_BYTES
    ,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED,DBOP_NAME,DBOP_EXEC_ID,CON_DBID,CON_ID'
    );
    END;
    /
This file contains 1.4M rows in a 200MB compressed file; uncompressed it would be 4.6GB. It takes about 81 seconds to perform a full scan on it.
    set autotrace on timi on pages 99 lines 160
    break on report
    compute sum of ash_secs on report
    column event format a40
    column min(sample_time) format a22
    column max(sample_time) format a22
    select event, sum(10) ash_Secs, min(sample_time), max(sample_time)
    from ash_hist
    --where rownum <= 1000
    group by event
    order by ash_Secs desc
    ;
    EVENT ASH_SECS MIN(SAMPLE_TIME) MAX(SAMPLE_TIME)
    ---------------------------------------- ---------- ---------------------- ----------------------
    10304530 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    direct path read 3258500 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    SQL*Net more data to client 269220 22-MAR-20 10.00.31.205 07-APR-20 22.59.30.275
    direct path write temp 32400 22-MAR-20 11.39.53.996 07-APR-20 21.43.47.329
    gc cr block busy 24930 22-MAR-20 10.51.33.189 07-APR-20 22.56.56.804

    latch: gc element 10 30-MAR-20 18.42.51.748 30-MAR-20 18.42.51.748
    ----------
    sum 14093050

    86 rows selected.

    Elapsed: 00:01:21.17
    We can see from the plan that it full scanned the external table in parallel.
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 4220750095

    ------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
    ------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 8344K| 374M| 1417 (33)| 00:00:01 | | | |
    | 1 | PX COORDINATOR | | | | | | | | |
    | 2 | PX SEND QC (ORDER) | :TQ10002 | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,02 | P->S | QC (ORDER) |
    | 3 | SORT ORDER BY | | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,02 | PCWP | |
    | 4 | PX RECEIVE | | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,02 | PCWP | |
    | 5 | PX SEND RANGE | :TQ10001 | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,01 | P->P | RANGE |
    | 6 | HASH GROUP BY | | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,01 | PCWP | |
    | 7 | PX RECEIVE | | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,01 | PCWP | |
    | 8 | PX SEND HASH | :TQ10000 | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,00 | P->P | HASH |
    | 9 | HASH GROUP BY | | 8344K| 374M| 1417 (33)| 00:00:01 | Q1,00 | PCWP | |
    | 10 | PX BLOCK ITERATOR | | 8344K| 374M| 1089 (13)| 00:00:01 | Q1,00 | PCWC | |
    | 11 | EXTERNAL TABLE ACCESS FULL| ASH_HIST | 8344K| 374M| 1089 (13)| 00:00:01 | Q1,00 | PCWP | |
    ------------------------------------------------------------------------------------------------------------------------------

    Note
    -----
    - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit


    Statistics
    ----------------------------------------------------------
    2617 recursive calls
    3 db block gets
    2751 consistent gets
    0 physical reads
    728 redo size
    5428 bytes sent via SQL*Net to client
    602 bytes received via SQL*Net from client
    7 SQL*Net roundtrips to/from client
    346 sorts (memory)
    0 sorts (disk)
    86 rows processed
    In the next post, I will explain how to copy the data directly from Object Storage into a regular table.

    Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 3. Copying data from Object Storage to a Regular Table

    This blog is the third in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.

    Copy Data into Table 

    Alternatively, we can copy the data into a normal table. The table needs to be created in advance. This time, I am going to run the copy as user SOE rather than ADMIN.  I need to:
    • Grant connect and resource privilege and quota on the data tablespace.
    • Grant execute on DBMS_CLOUD to SOE, so it can execute the command.
    • Grant READ and WRITE access on the DATA_PUMP_DIR directory – the log and bad files created by this process are written to this database directory.
    connect admin/Password2020!@gofaster1b_tp 
    CREATE USER soe IDENTIFIED BY Password2020;
    GRANT CONNECT, RESOURCE TO soe;
    GRANT EXECUTE ON DBMS_CLOUD TO soe;
    GRANT READ, WRITE ON DIRECTORY data_pump_dir TO soe;
    ALTER USER soe QUOTA UNLIMITED ON data;
    I am now going to switch to user SOE and create my table.
    connect soe/Password2020@gofaster1b_tp
    Drop table soe.ash_hist purge;
    CREATE TABLE soe.ASH_HIST
    ( SNAP_ID NUMBER,
    DBID NUMBER,
    INSTANCE_NUMBER NUMBER,
    SAMPLE_ID NUMBER,
    SAMPLE_TIME TIMESTAMP (3),
    -- SAMPLE_TIME_UTC TIMESTAMP (3),
    -- USECS_PER_ROW NUMBER,
    SESSION_ID NUMBER,
    SESSION_SERIAL# NUMBER,
    SESSION_TYPE VARCHAR2(10),
    FLAGS NUMBER,
    USER_ID NUMBER,
    -----------------------------------------
    SQL_ID VARCHAR2(13),
    IS_SQLID_CURRENT VARCHAR2(1),
    SQL_CHILD_NUMBER NUMBER,
    SQL_OPCODE NUMBER,
    SQL_OPNAME VARCHAR2(64),
    FORCE_MATCHING_SIGNATURE NUMBER,
    TOP_LEVEL_SQL_ID VARCHAR2(13),
    TOP_LEVEL_SQL_OPCODE NUMBER,
    SQL_PLAN_HASH_VALUE NUMBER,
    SQL_FULL_PLAN_HASH_VALUE NUMBER,
    -----------------------------------------
    SQL_ADAPTIVE_PLAN_RESOLVED NUMBER,
    SQL_PLAN_LINE_ID NUMBER,
    SQL_PLAN_OPERATION VARCHAR2(64),
    SQL_PLAN_OPTIONS VARCHAR2(64),
    SQL_EXEC_ID NUMBER,
    SQL_EXEC_START DATE,
    PLSQL_ENTRY_OBJECT_ID NUMBER,
    PLSQL_ENTRY_SUBPROGRAM_ID NUMBER,
    PLSQL_OBJECT_ID NUMBER,
    PLSQL_SUBPROGRAM_ID NUMBER,
    -----------------------------------------
    QC_INSTANCE_ID NUMBER,
    QC_SESSION_ID NUMBER,
    QC_SESSION_SERIAL# NUMBER,
    PX_FLAGS NUMBER,
    EVENT VARCHAR2(64),
    EVENT_ID NUMBER,
    SEQ# NUMBER,
    P1TEXT VARCHAR2(64),
    P1 NUMBER,
    P2TEXT VARCHAR2(64),
    -----------------------------------------
    P2 NUMBER,
    P3TEXT VARCHAR2(64),
    P3 NUMBER,
    WAIT_CLASS VARCHAR2(64),
    WAIT_CLASS_ID NUMBER,
    WAIT_TIME NUMBER,
    SESSION_STATE VARCHAR2(7),
    TIME_WAITED NUMBER,
    BLOCKING_SESSION_STATUS VARCHAR2(11),
    BLOCKING_SESSION NUMBER,
    -----------------------------------------
    BLOCKING_SESSION_SERIAL# NUMBER,
    BLOCKING_INST_ID NUMBER,
    BLOCKING_HANGCHAIN_INFO VARCHAR2(1),
    CURRENT_OBJ# NUMBER,
    CURRENT_FILE# NUMBER,
    CURRENT_BLOCK# NUMBER,
    CURRENT_ROW# NUMBER,
    TOP_LEVEL_CALL# NUMBER,
    TOP_LEVEL_CALL_NAME VARCHAR2(64),
    CONSUMER_GROUP_ID NUMBER,
    -----------------------------------------
    XID RAW(8),
    REMOTE_INSTANCE# NUMBER,
    TIME_MODEL NUMBER,
    IN_CONNECTION_MGMT VARCHAR2(1),
    IN_PARSE VARCHAR2(1),
    IN_HARD_PARSE VARCHAR2(1),
    IN_SQL_EXECUTION VARCHAR2(1),
    IN_PLSQL_EXECUTION VARCHAR2(1),
    IN_PLSQL_RPC VARCHAR2(1),
    IN_PLSQL_COMPILATION VARCHAR2(1),
    -----------------------------------------
    IN_JAVA_EXECUTION VARCHAR2(1),
    IN_BIND VARCHAR2(1),
    IN_CURSOR_CLOSE VARCHAR2(1),
    IN_SEQUENCE_LOAD VARCHAR2(1),
    IN_INMEMORY_QUERY VARCHAR2(1),
    IN_INMEMORY_POPULATE VARCHAR2(1),
    IN_INMEMORY_PREPOPULATE VARCHAR2(1),
    IN_INMEMORY_REPOPULATE VARCHAR2(1),
    IN_INMEMORY_TREPOPULATE VARCHAR2(1),
    -- IN_TABLESPACE_ENCRYPTION VARCHAR2(1),
    CAPTURE_OVERHEAD VARCHAR2(1),
    -----------------------------------------
    REPLAY_OVERHEAD VARCHAR2(1),
    IS_CAPTURED VARCHAR2(1),
    IS_REPLAYED VARCHAR2(1),
    -- IS_REPLAY_SYNC_TOKEN_HOLDER VARCHAR2(1),
    SERVICE_HASH NUMBER,
    PROGRAM VARCHAR2(64),
    MODULE VARCHAR2(64),
    ACTION VARCHAR2(64),
    CLIENT_ID VARCHAR2(64),
    MACHINE VARCHAR2(64),
    PORT NUMBER,
    -----------------------------------------
    ECID VARCHAR2(64),
    DBREPLAY_FILE_ID NUMBER,
    DBREPLAY_CALL_COUNTER NUMBER,
    TM_DELTA_TIME NUMBER,
    TM_DELTA_CPU_TIME NUMBER,
    TM_DELTA_DB_TIME NUMBER,
    DELTA_TIME NUMBER,
    DELTA_READ_IO_REQUESTS NUMBER,
    DELTA_WRITE_IO_REQUESTS NUMBER,
    DELTA_READ_IO_BYTES NUMBER,
    -----------------------------------------
    DELTA_WRITE_IO_BYTES NUMBER,
    DELTA_INTERCONNECT_IO_BYTES NUMBER,
    PGA_ALLOCATED NUMBER,
    TEMP_SPACE_ALLOCATED NUMBER,
    DBOP_NAME VARCHAR2(64),
    DBOP_EXEC_ID NUMBER,
    CON_DBID NUMBER,
    CON_ID NUMBER,
    -----------------------------------------
    CONSTRAINT ash_hist_pk PRIMARY KEY (dbid, instance_number, snap_id, sample_id, session_id)
    )
    COMPRESS FOR QUERY LOW
    /
    As Autonomous Databases run on Exadata, I have also specified Hybrid Columnar Compression (HCC) for this table.
    Credentials are specific to the database user.  I have to create an additional credential, for the same cloud user, but owned by SOE.
    ALTER SESSION SET nls_date_Format='hh24:mi:ss dd.mm.yyyy';
    set serveroutput on timi on
    BEGIN
    DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'SOE_BUCKET',
    username=> 'oraclecloud1@go-faster.co.uk',
    password=> 'K7xfi-mG<1Z:dq#88;1m'
    );
    END;
    /
    column owner format a10
    column credential_name format a20
    column comments format a80
    column username format a40
    SELECT * FROM dba_credentials;

    OWNER CREDENTIAL_NAME USERNAME WINDOWS_DOMAIN
    ---------- -------------------- ---------------------------------------- ------------------------------
    COMMENTS ENABL
    -------------------------------------------------------------------------------- -----
    ADMIN MY_BUCKET oraclecloud1@go-faster.co.uk
    {"comments":"Created via DBMS_CLOUD.create_credential"} TRUE

    SOE SOE_BUCKET oraclecloud1@go-faster.co.uk
    {"comments":"Created via DBMS_CLOUD.create_credential"} TRUE

The COPY_DATA procedure is similar to CREATE_EXTERNAL_TABLE described in the previous post, but it doesn't have a column list.  The field names must match the column names.  It is sensitive to field names with a trailing #; these must be enclosed in double-quotes.
    TRUNCATE TABLE soe.ash_hist;
    DECLARE
    l_operation_id NUMBER;
    BEGIN
    DBMS_CLOUD.COPY_DATA(
    table_name =>'ASH_HIST',
    credential_name =>'SOE_BUCKET',
    file_uri_list =>'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz',
    schema_name => 'SOE',
    format => json_object('blankasnull' value 'true'
    ,'compression' value 'gzip'
    ,'dateformat' value 'YYYY-MM-DD/HH24:mi:ss'
    ,'timestampformat' value 'YYYY-MM-DD/HH24:mi:ss.ff'
    ,'delimiter' value '<,>'
    ,'ignoreblanklines' value 'true'
    ,'rejectlimit' value '10'
    ,'removequotes' value 'true'
    ,'trimspaces' value 'lrtrim'
    ),
    field_list=>'SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID,SAMPLE_TIME ,SESSION_ID,"SESSION_SERIAL#",SESSION_TYPE,FLAGS,USER_ID
    ,SQL_ID,IS_SQLID_CURRENT,SQL_CHILD_NUMBER,SQL_OPCODE,SQL_OPNAME,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE,SQL_PLAN_HASH_VALUE,SQL_FULL_PLAN_HASH_VALUE
    ,SQL_ADAPTIVE_PLAN_RESOLVED,SQL_PLAN_LINE_ID,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS,SQL_EXEC_ID,SQL_EXEC_START,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
    ,QC_INSTANCE_ID,QC_SESSION_ID,"QC_SESSION_SERIAL#",PX_FLAGS,EVENT,EVENT_ID,"SEQ#",P1TEXT,P1,P2TEXT
    ,P2,P3TEXT,P3,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME,SESSION_STATE,TIME_WAITED,BLOCKING_SESSION_STATUS,BLOCKING_SESSION
    ,"BLOCKING_SESSION_SERIAL#",BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO,"CURRENT_OBJ#","CURRENT_FILE#","CURRENT_BLOCK#","CURRENT_ROW#","TOP_LEVEL_CALL#",TOP_LEVEL_CALL_NAME,CONSUMER_GROUP_ID
    ,XID,"REMOTE_INSTANCE#",TIME_MODEL,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
    ,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD,IN_INMEMORY_QUERY,IN_INMEMORY_POPULATE,IN_INMEMORY_PREPOPULATE,IN_INMEMORY_REPOPULATE,IN_INMEMORY_TREPOPULATE,CAPTURE_OVERHEAD
    ,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT
    ,ECID,DBREPLAY_FILE_ID,DBREPLAY_CALL_COUNTER,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME,DELTA_TIME,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS,DELTA_READ_IO_BYTES
    ,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED,DBOP_NAME,DBOP_EXEC_ID,CON_DBID,CON_ID',
    operation_id=>l_operation_id
    );
    dbms_output.put_line('Operation ID:'||l_operation_id||' finished successfully');
    EXCEPTION WHEN OTHERS THEN
    dbms_output.put_line('Operation ID:'||l_operation_id||' raised an error');
    RAISE;
    END;
    /

The data copy takes slightly longer than the query on the external table.
    Operation ID:31 finished successfully

    PL/SQL procedure successfully completed.

    Elapsed: 00:02:01.11
    The status of the copy operation is reported in USER_LOAD_OPERATIONS.  This includes the number of rows loaded and the names of external tables that are created for the log and bad files.
    set lines 120
    column type format a10
    column file_uri_list format a64
    column start_time format a32
    column update_time format a32
    column owner_name format a10
    column table_name format a10
    column partition_name format a10
    column subpartition_name format a10
    column logfile_table format a15
    column badfile_table format a15
    column tempext_table format a30
    select * from user_load_operations where id = &operation_id;

    ID TYPE SID SERIAL# START_TIME UPDATE_TIME STATUS
    ---------- ---------- ---------- ---------- -------------------------------- -------------------------------- ---------
    OWNER_NAME TABLE_NAME PARTITION_ SUBPARTITI FILE_URI_LIST ROWS_LOADED
    ---------- ---------- ---------- ---------- ---------------------------------------------------------------- -----------
    LOGFILE_TABLE BADFILE_TABLE TEMPEXT_TABLE
    --------------- --------------- ------------------------------
    31 COPY 19965 44088 07-MAY-20 17.03.20.328263 +01:00 07-MAY-20 17.05.36.157680 +01:00 COMPLETED
    SOE ASH_HIST https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu 1409305
    /b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz
    COPY$31_LOG COPY$31_BAD COPY$Y2R021UKPJ5F75JCMSKL

    An external table is temporarily created by the COPY_DATA procedure but is then dropped before the procedure completes.  The bad file is empty because the copy operation succeeded without error, but we can query the copy log.
    select * from COPY$31_LOG;

    RECORD
    ------------------------------------------------------------------------------------------------------------------------
    LOG file opened at 05/07/20 16:03:21

    Total Number of Files=1

    Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz

    Log File: COPY$31_105537.log

    LOG file opened at 05/07/20 16:03:21

    Total Number of Files=1

    Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz

    Log File: COPY$31_105537.log

    LOG file opened at 05/07/20 16:03:21

    KUP-05014: Warning: Intra source concurrency disabled because the URLs specified for the Cloud Service map to compressed data.

    Bad File: COPY$31_105537.bad

    Field Definitions for table COPY$Y2R021UKPJ5F75JCMSKL
    Record format DELIMITED BY
    Data in file has same endianness as the platform
    Rows with all null fields are accepted
    Table level NULLIF (Field = BLANKS)
    Fields in Data Source:

    SNAP_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    DBID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    INSTANCE_NUMBER CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    SAMPLE_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    SAMPLE_TIME CHAR (255)
    Date datatype TIMESTAMP, date mask YYYY-MM-DD/HH24:mi:ss.ff
    Terminated by "<,>"
    Trim whitespace from left and right

    CON_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right

    Date Cache Statistics for table COPY$Y2R021UKPJ5F75JCMSKL
    Date conversion cache disabled due to overflow (default size: 1000)

    365 rows selected.
These files are written to the DATA_PUMP_DIR database directory.  We don't have access to the database file system in an Autonomous Database, so Oracle has provided the LIST_FILES table function in DBMS_CLOUD so that we can see what files are in a directory.
    Set pages 99 lines 150
    Column object_name format a32
    Column created format a32
    Column last_modified format a32
    Column checksum format a20
    SELECT * FROM DBMS_CLOUD.LIST_FILES('DATA_PUMP_DIR');

    OBJECT_NAME BYTES CHECKSUM CREATED LAST_MODIFIED
    -------------------------------- ---------- -------------------- -------------------------------- --------------------------------

    COPY$31_dflt.log 0 07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
    COPY$31_dflt.bad 0 07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
    COPY$31_105537.log 13591 07-MAY-20 16.03.21.000000 +00:00 07-MAY-20 16.05.35.000000 +00:00

Statistics are automatically collected on the table by the copy process because it was done in direct-path mode.  We can see that the number of rows in the statistics corresponds with the number of rows loaded by the COPY_DATA procedure.
    Set pages 99 lines 140
    Column owner format a10
    Column IM_STAT_UPDATE_TIME format a30
    Select *
    from all_tab_statistics
    Where table_name = 'ASH_HIST';

    OWNER TABLE_NAME PARTITION_ PARTITION_POSITION SUBPARTITI SUBPARTITION_POSITION OBJECT_TYPE NUM_ROWS BLOCKS EMPTY_BLOCKS
    ---------- ---------- ---------- ------------------ ---------- --------------------- ------------ ---------- ---------- ------------
    AVG_SPACE CHAIN_CNT AVG_ROW_LEN AVG_SPACE_FREELIST_BLOCKS NUM_FREELIST_BLOCKS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO IM_IMCU_COUNT
    ---------- ---------- ----------- ------------------------- ------------------- ----------------- ------------------- -------------
    IM_BLOCK_COUNT IM_STAT_UPDATE_TIME SCAN_RATE SAMPLE_SIZE LAST_ANALYZED GLO USE STATT STALE_S SCOPE
    -------------- ------------------------------ ---------- ----------- ------------------- --- --- ----- ------- -------
    SOE ASH_HIST TABLE 1409305 19426 0
    0 0 486 0 0
    1409305 15:16:14 07.05.2020 YES NO NO SHARED

I can confirm that the data is compressed because the compression type of every sampled row is type 8 (HCC QUERY LOW).  See also DBMS_COMPRESSION Compression Types.
    WITH x AS (
    select dbms_compression.get_compression_type('SOE', 'ASH_HIST', rowid) ctype
    from soe.ash_hist sample (.1))
    Select ctype, count(*) From x group by ctype;

    CTYPE COUNT(*)
    ---------- ----------
    8 14097
    I can find this SQL Statement in the Performance Hub. 
    INSERT /*+ append enable_parallel_dml */ INTO "SOE"."ASH_HIST" SELECT * FROM COPY$Y2R021UKPJ5F75JCMSKL
    Therefore, the data was queried from the temporary external table into the permanent table, in direct path mode and in parallel.
I can also look at the OCI Performance Hub and see that most of the time was spent on CPU.  I can see the SQL_ID of the insert statement and the call to the DBMS_CLOUD procedure.
    I can drill in further to the exact SQL statement.
    When I query the table I get exactly the same data as previously with the external table.
    set autotrace on timi on lines 180 trimspool on
    break on report
    compute sum of ash_secs on report
    column min(sample_time) format a22
    column max(sample_time) format a22
    select event, sum(10) ash_Secs, min(sample_time), max(sample_time)
    from soe.ash_hist
    group by event
    order by ash_Secs desc
    ;

    EVENT ASH_SECS MIN(SAMPLE_TIME) MAX(SAMPLE_TIME)
    ---------------------------------------------------------------- ---------- ---------------------- ----------------------
    10304530 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    direct path read 3258500 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    SQL*Net more data to client 269220 22-MAR-20 10.00.31.205 07-APR-20 22.59.30.275
    direct path write temp 32400 22-MAR-20 11.39.53.996 07-APR-20 21.43.47.329
    gc cr block busy 24930 22-MAR-20 10.51.33.189 07-APR-20 22.56.56.804

    latch free 10 28-MAR-20 20.26.11.307 28-MAR-20 20.26.11.307
    ----------
    sum 14093050

    86 rows selected.

    Elapsed: 00:00:00.62

    I can see that the execution plan is now a single serial full scan of the table.
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1336681691

    ----------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 84 | 1428 | 1848 (9)| 00:00:01 |
    | 1 | SORT ORDER BY | | 84 | 1428 | 1848 (9)| 00:00:01 |
    | 2 | HASH GROUP BY | | 84 | 1428 | 1848 (9)| 00:00:01 |
    | 3 | TABLE ACCESS STORAGE FULL| ASH_HIST | 1409K| 22M| 1753 (4)| 00:00:01 |
    ----------------------------------------------------------------------------------------


    Statistics
    ----------------------------------------------------------
    11 recursive calls
    13 db block gets
    19255 consistent gets
    19247 physical reads
    2436 redo size
    5428 bytes sent via SQL*Net to client
    602 bytes received via SQL*Net from client
    7 SQL*Net roundtrips to/from client
    1 sorts (memory)
    0 sorts (disk)
    86 rows processed

    Oracle 19c: Real-Time Statistics & High-Frequency Statistics Collection

    The video of this recent presentation, given as a part of the Oracle Groundbreakers EMEA Tour 2020, is now available.
Keeping object statistics up to date is critical to Oracle database performance and stability. Both of these features aim to address the challenge of using data that has been significantly updated before the statistics maintenance window has run again. The features are only available on engineered systems, and so are certainly targeted at the Autonomous Database.
• Real-time Statistics augment existing statistics.  However, they are not quite as real-time as the name suggests.  To keep the implementation lightweight, they use the table monitoring mechanism, which limits the information that can be collected.
• High-Frequency Automatic Optimizer Statistics Collection is effectively a never-ending statistics maintenance window.  As your data and statistics change, so there are opportunities for SQL execution plans, and therefore application performance, to change.  DBAs and developers need to be aware of the implications.

    Oracle 19c: Adventures with Automatic Indexing

    The video of this recent presentation, given as a part of the Oracle Groundbreakers EMEA Tour 2020, is now available.
Automatic Indexing is one of the much-heralded features of Oracle 19c, but it is only available on Engineered Systems: that is, in the Autonomous Database (which is built on Exadata) and on other Exadata platforms. This presentation shares some initial experiences with the feature, based on testing it in conjunction with Swingbench, and discusses how well it performed.








    Retrofitting Partitioning into an Existing Application: 1. Introduction

    This post is the first in a series about the partitioning of database objects.
    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

Introduction

    Over the years I have seen and read many presentations and articles on the subject of partitioning database tables and indexes. Most explain how partitioning works. Many explain the options for the developer and discuss how to design your application to be able to make effective use of partitioning.

    However, my experience comes from working with packaged applications or applications that are already in production where all the design decisions have been taken. Often, I am faced with performance or scalability problems, and sometimes I have to consider whether partitioning is an effective option.

    In this series of posts, I am going to look at the thought process behind deciding whether you can retrofit partitioning into an existing application. The task often falls to the DBA but also requires input from application developers and administrators.  I realise that I am going to say many of the same things that you can find in other articles, but I will be approaching them from a slightly different point of view.

    The motivation is always the same: improved performance with, if possible, reduced overheads.
• The fastest way to do anything is not to do it at all.
    • In general, Oracle inserts data into the first available space.  Any piece of data could be anywhere in a table.  However, partitioning creates a relationship between the physical location of a piece of data and the logical value of that data. This dictates into which partition data is inserted.
    • Thus, the optimizer can discard partitions from a query, without the overhead of scanning them, where it can determine that no data of interest resides.  This is called partition elimination or partition pruning.  
    • If you aren't achieving elimination, then there is probably no benefit to the partitioning.  In fact, it might increase your overheads as you probe every partition.
    The following diagram was taken from the Oracle documentation.  The table has been partitioned into monthly partitions.  If I am looking for March data, then I don't need to inspect the January and February partitions.  However, if the table had not been partitioned I would have to scan the whole segment.  If the query was using a locally partitioned index, then I would only probe the partition for March.
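
You can verify that a query is actually eliminating partitions by checking the Pstart/Pstop columns of the execution plan. A minimal sketch, with an illustrative monthly interval-partitioned table (all names invented for the example):

CREATE TABLE sales_by_month (sales_date DATE, amount NUMBER)
PARTITION BY RANGE (sales_date) INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
(PARTITION p_base VALUES LESS THAN (TO_DATE('2020-02-01','YYYY-MM-DD')));

EXPLAIN PLAN FOR
SELECT SUM(amount) FROM sales_by_month
WHERE  sales_date >= TO_DATE('2020-03-01','YYYY-MM-DD')
AND    sales_date <  TO_DATE('2020-04-01','YYYY-MM-DD');

SELECT * FROM TABLE(dbms_xplan.display);

A PARTITION RANGE SINGLE operation, with matching Pstart/Pstop values, confirms that only the March partition would be scanned.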

    Whose job is it?

    • Designing partitioning into an application during development is a job for the application architect/developers.
    • Retrofitting partitioning into an existing (or a packaged 3rd party) application usually falls to the DBA.
    In my opinion, in order to be successful, the developers and the DBAs need to work together.  Bear in mind also:
• Partitioning is a licensed option available on Enterprise Edition only.  That means you have to pay for it.  So, if you are not getting an improvement in performance (or a reduction in resource consumption) then you have to question whether it is worth it.
    • Check your application vendor's support policy.  Sometimes vendors do not support customer partitioning at all (e.g. Oracle's own E-Business Suite).  Or, they may permit it, but it remains the customer's responsibility to support it (e.g. PeopleSoft - yes, also owned by Oracle).
    • There is also an ongoing cost of ownership.  You have to look after your partitioning.  If you partition by date or something that changes over time (like employee ID), then periodically you will add new partitions, and then also possibly compress and/or remove old partitions.  If you rebuild or change a table or index, you have to remember that it is partitioned when you create the DDL.


    Retrofitting Partitioning into an Existing Application: 2. What kinds of partitioning can you do?

    This post is part of a series about the partitioning of database objects.

    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

1-Dimensional Partitioning

    Oracle supports three forms of partitioning:

    • Range: a non-inclusive upper limit is defined for each partition.  Any row where the partition key value is higher than this limit is placed in a subsequent partition.  Implicitly the minimum value is the upper limit of the preceding partition.
    • List: specific values are placed in specific partitions.
    • Hash: the value of the partitioning key is passed to a hash function.  The output of the hash function determines the partition.
    Interval partitioning is a form of range partitioning where Oracle calculates the partition boundaries mathematically, so you don't have to.   Therefore, it only works with numeric, date and timestamp fields.
For each partitioning type, the DDL is shown with the resulting partitions in USER_TAB_PARTITIONS.

Range:
CREATE TABLE t_r
(a NUMBER
,b NUMBER
,CONSTRAINT t_r_pk PRIMARY KEY(a)
)
PARTITION BY RANGE (b)
(PARTITION VALUES LESS THAN (10)
,PARTITION VALUES LESS THAN (20)
,PARTITION VALUES LESS THAN (MAXVALUE));

Table Part Partition High       Num
Name   Pos Name      Value     Rows
----- ---- --------- -------- -----
T_R      1 SYS_P539        10  1000
         2 SYS_P540        20  1000

List:
CREATE TABLE t_l
(a NUMBER, b NUMBER
,CONSTRAINT t_l_pk PRIMARY KEY(a)
)
PARTITION BY LIST (b)
(PARTITION VALUES (1,2,3)
,PARTITION VALUES (4,5,6)
,PARTITION VALUES (DEFAULT));

Table Part Partition High       Num
Name   Pos Name      Value     Rows
----- ---- --------- -------- -----
T_L      1 SYS_P542  1, 2, 3    300
         2 SYS_P543  4, 5, 6    300
         3 SYS_P544  DEFAULT   9400

Hash:
CREATE TABLE t_h
(a NUMBER, b NUMBER
,CONSTRAINT t_h_pk PRIMARY KEY(a)
)
PARTITION BY HASH (b)
PARTITIONS 4;

Table Part Partition High       Num
Name   Pos Name      Value     Rows
----- ---- --------- -------- -----
T_H      1 SYS_P545            2000
         2 SYS_P546            2900
         3 SYS_P547            2400
         4 SYS_P548            2700

Interval:
CREATE TABLE t_i
(a NUMBER, b NUMBER
,CONSTRAINT t_i_pk PRIMARY KEY(a)
)
PARTITION BY RANGE (b)
INTERVAL (10)
(PARTITION VALUES LESS THAN (10));

Table Part Partition High       Num
Name   Pos Name      Value     Rows
----- ---- --------- -------- -----
T_I      1 SYS_P549        10  1000
         2 SYS_P550        20  1000
         3 SYS_P551        30  1000
         4 SYS_P552        40  1000
         5 SYS_P553        50  1000
         6 SYS_P554        60  1000
         7 SYS_P555        70  1000
         8 SYS_P556        80  1000
         9 SYS_P557        90  1000
        10 SYS_P558       100  1000

    2-Dimensional (Composite) Partitioning

Oracle can partition independently on two columns (or groups of columns).  This is called composite partitioning.  It is easy to think of this as partitioning in two dimensions.  Again, this diagram is taken from Oracle's documentation.

Composite partitioning can combine any form of partitioning with any form of sub-partitioning, except that you cannot sub-partition by interval.
Each combination below shows the DDL for the partitioning type (first) and sub-partitioning type (second).

Range partition, range sub-partition (t_rr):
CREATE TABLE t_rr
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_rr_pk PRIMARY KEY(a))
PARTITION BY RANGE (b)
SUBPARTITION BY RANGE (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_10 VALUES LESS THAN (10)
,SUBPARTITION s_20 VALUES LESS THAN (20)
,SUBPARTITION s_mx VALUES LESS THAN (MAXVALUE))
(PARTITION VALUES LESS THAN (10)
,PARTITION VALUES LESS THAN (20)
,PARTITION VALUES LESS THAN (MAXVALUE));

Range partition, list sub-partition (t_rl):
CREATE TABLE t_rl
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_rl_pk PRIMARY KEY(a))
PARTITION BY RANGE (b)
SUBPARTITION BY LIST (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_1 VALUES (1,2,3)
,SUBPARTITION s_2 VALUES (4,5,6)
,SUBPARTITION s_mx VALUES (DEFAULT))
(PARTITION VALUES LESS THAN (10)
,PARTITION VALUES LESS THAN (20)
,PARTITION VALUES LESS THAN (MAXVALUE));

Range partition, hash sub-partition (t_rh):
CREATE TABLE t_rh
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_rh_pk PRIMARY KEY(a))
PARTITION BY RANGE (b)
SUBPARTITION BY HASH (c)
SUBPARTITIONS 4
(PARTITION VALUES LESS THAN (10)
,PARTITION VALUES LESS THAN (20)
,PARTITION VALUES LESS THAN (MAXVALUE));

Range partition, interval sub-partition: not supported.  The DDL raises ORA-14179: An unsupported partitioning method was specified in this context.

List partition, range sub-partition (t_lr):
CREATE TABLE t_lr
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_lr_pk PRIMARY KEY(a))
PARTITION BY LIST (b)
SUBPARTITION BY RANGE (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_10 VALUES LESS THAN (10)
,SUBPARTITION s_20 VALUES LESS THAN (20)
,SUBPARTITION s_mx VALUES LESS THAN (MAXVALUE))
(PARTITION VALUES (1,2,3)
,PARTITION VALUES (4,5,6)
,PARTITION VALUES (DEFAULT));

List partition, list sub-partition (t_ll):
CREATE TABLE t_ll
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_ll_pk PRIMARY KEY(a))
PARTITION BY LIST (b)
SUBPARTITION BY LIST (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_1 VALUES (1,2,3)
,SUBPARTITION s_2 VALUES (4,5,6)
,SUBPARTITION s_mx VALUES (DEFAULT))
(PARTITION VALUES (1,2,3)
,PARTITION VALUES (4,5,6)
,PARTITION VALUES (DEFAULT));

List partition, hash sub-partition (t_lh):
CREATE TABLE t_lh
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_lh_pk PRIMARY KEY(a))
PARTITION BY LIST (b)
SUBPARTITION BY HASH (c)
SUBPARTITIONS 4
(PARTITION VALUES (1,2,3)
,PARTITION VALUES (4,5,6)
,PARTITION VALUES (DEFAULT));

Hash partition, range sub-partition (t_hr):
CREATE TABLE t_hr
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_hr_pk PRIMARY KEY(a))
PARTITION BY HASH (b)
SUBPARTITION BY RANGE (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_10 VALUES LESS THAN (10)
,SUBPARTITION s_20 VALUES LESS THAN (20)
,SUBPARTITION s_mx VALUES LESS THAN (MAXVALUE))
PARTITIONS 4;

Hash partition, list sub-partition (t_hl):
CREATE TABLE t_hl
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_hl_pk PRIMARY KEY(a))
PARTITION BY HASH (b)
SUBPARTITION BY LIST (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_1 VALUES (1,2,3)
,SUBPARTITION s_2 VALUES (4,5,6)
,SUBPARTITION s_mx VALUES (DEFAULT))
PARTITIONS 4;

Hash partition, hash sub-partition (t_hh):
CREATE TABLE t_hh
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_hh_pk PRIMARY KEY(a))
PARTITION BY HASH (b)
SUBPARTITION BY HASH (c)
SUBPARTITIONS 4
PARTITIONS 4;

Interval partition, range sub-partition (t_ir):
CREATE TABLE t_ir
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_ir_pk PRIMARY KEY(a))
PARTITION BY RANGE (b) INTERVAL (10)
SUBPARTITION BY RANGE (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_10 VALUES LESS THAN (10)
,SUBPARTITION s_20 VALUES LESS THAN (20)
,SUBPARTITION s_mx VALUES LESS THAN (MAXVALUE))
(PARTITION VALUES LESS THAN (10));

Interval partition, list sub-partition (t_il):
CREATE TABLE t_il
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_il_pk PRIMARY KEY(a))
PARTITION BY RANGE (b) INTERVAL (10)
SUBPARTITION BY LIST (c)
SUBPARTITION TEMPLATE
(SUBPARTITION s_1 VALUES (1,2,3)
,SUBPARTITION s_2 VALUES (4,5,6)
,SUBPARTITION s_mx VALUES (DEFAULT))
(PARTITION VALUES LESS THAN (10));

Interval partition, hash sub-partition (t_ih):
CREATE TABLE t_ih
(a NUMBER, b NUMBER, c NUMBER
,CONSTRAINT t_ih_pk PRIMARY KEY(a))
PARTITION BY RANGE (b)
INTERVAL (10)
SUBPARTITION BY HASH (c)
SUBPARTITIONS 4
(PARTITION VALUES LESS THAN (10));

    Sub-partition templates simplify the DDL, otherwise, you have to specify the sub-partitions for each partition.  As you do not specify all the partitions when interval partitioning, you effectively have to use templates to sub-partition interval partitions.  Otherwise, the automatically added partitions will not be sub-partitioned.  
    In some cases, involving hash partitioning, the database is sensitive to the order of partition and sub-partitions clauses in the DDL.
    Partitions are given system-generated names unless names are specified.  Explicitly specified interval partitions have to be explicitly named.  Subpartition names in subpartition templates are only used when the partition is explicitly named, otherwise, the subpartition has an entirely system-generated name. 
It can be helpful to explicitly specify partition and sub-partition names.  It has no impact on performance, but it can help administration, e.g. reporting space usage by partition.  It can also be helpful later when partitions are dropped, split or merged during archiving or ILM.
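
If you are left with system-generated names, as you inevitably are with interval partitioning, you can rename partitions after the event.  A minimal sketch against the t_i table above (the new names are illustrative; the extended PARTITION FOR syntax identifies an interval partition by a value that falls into it):

REM Rename the interval partition containing the value 25
ALTER TABLE t_i RENAME PARTITION FOR (25) TO t_i_p30;

REM Or draft rename statements from the data dictionary for review
SELECT 'ALTER TABLE t_i RENAME PARTITION '||partition_name
     ||' TO t_i_p'||partition_position||';'
FROM   user_tab_partitions
WHERE  table_name = 'T_I';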

    Retrofitting Partitioning into an Existing Application: 3. Scripting & Archiving

    This post is part of a series about the partitioning of database objects.

    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

Scripting

    If you introduce partitioning, you need to look after it.
• It is common to partition a table by date, or another column that is a proxy for the date, such as an accounting period.  Often that implies a regular but relatively infrequent maintenance activity, perhaps only annual.
    • You are likely to have to add, remove and possibly compress partitions. There may be groups of tables that have to be similarly partitioned.  You can easily end up in a hellish world of manual scripting.
• This can make interval partitioning attractive because Oracle automatically creates the partitions on demand.  However, you are still responsible for any subsequent compressing and purging.
    • Interval partitions (other than the ones you explicitly specify, and the whole point is that you need only specify the first one in the range) will be given system-generated names. 
    • On the other hand, explicit partition names, with a consistent naming convention, can be very helpful when you come to partition-wise operations during archive/purge operations. 
    • If you are going to manage partition DDL scripts manually, then you need strict version control. 
For PeopleSoft, I created a utility to generate partition DDL from the PeopleSoft metadata.  It was only worth my while doing this because I was solving the same challenge when partitioning different PeopleSoft products at many different customers.  It is unlikely that you will be willing to put that sort of investment into a utility for a single implementation of an application.
• Manual scripting opens the possibility for manual errors to creep in.
• Generating DDL guarantees a degree of consistency (see the sketch below).
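
Even without a full metadata-driven utility, generating routine maintenance DDL from the data dictionary is safer than writing it by hand.  A minimal sketch that drafts DROP PARTITION statements for the oldest partitions of a table (the table name and retention rule are illustrative, and the output should always be reviewed before execution):

SELECT 'ALTER TABLE '||table_name||' DROP PARTITION '||partition_name
     ||' UPDATE INDEXES;' AS ddl
FROM   user_tab_partitions
WHERE  table_name = 'MY_PARTITIONED_TABLE'
AND    partition_position <= 2
ORDER  BY partition_position;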

    Archiving

    Never let the archiving tail wag the performance dog.  
    You pay for and implement partitioning for the benefit of the application users.  Archiving is frequently done for much the same reasons.  Partitioning can make archiving much easier if you can archive whole partitions at a time.  However, making the archiving experience better is not the same as making the user experience better.
Where you have partitioned by time, it is frequently the case that you can also archive by time, and you have a rolling window of partitions that you add and remove, and sometimes compress or merge, on a regular basis.  It may be that the partitioning design that is best for application performance will also lend itself to partition-wise archiving.  Partition-wise archiving is attractive, but not at the expense of application performance.
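
Archiving a whole partition is then essentially a metadata operation.  A minimal sketch of a partition exchange (names are illustrative; CREATE TABLE ... FOR EXCHANGE is 12.2 syntax, and the archive table must match the partitioned table's structure):

REM Create an empty, structurally matching, non-partitioned archive table
CREATE TABLE my_table_arch FOR EXCHANGE WITH TABLE my_table;

REM Swap the oldest partition's segment with the empty archive table
ALTER TABLE my_table EXCHANGE PARTITION p_2017
WITH TABLE my_table_arch INCLUDING INDEXES WITHOUT VALIDATION;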

    In the next posts, I will look at some real-life examples of how partitioning was introduced into an application.

    Retrofitting Partitioning into Existing Applications: Example 1. General Ledger

    This post is part of a series about the partitioning of database objects.

    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

If you were designing an application to use partitioning, you would write the code to reference the column by which the data was partitioned so that the database does partition elimination.  However, with a pre-existing or 3rd-party application, you have to look at how the application queries the data and match the partitioning to that.
I am going to look at a number of cases from real life, and discuss the thought process behind partitioning decisions.  These examples happen to come from PeopleSoft ERP systems, but that does not make them unusual.  PeopleSoft is just another packaged application.  In every case, it is necessary to have some application knowledge when deciding whether and how to introduce partitioning.

    General Ledger

    GL is an example of where OLTP and DW activities clash on the same table.  GL is a data warehouse of transactional information about a business.  The rationale for partitioning ledger data is a very typical example of partitioning for SQL query performance.
Dimensions                     Attributes
------------------------------ ----------------
BUSINESS_UNIT                  POSTED_TOTAL_AMT
LEDGER                         POSTED_BASE_AMT
ACCOUNT                        POSTED_TRANS_AMT
DEPTID
OPERATING_UNIT
PRODUCT
AFFILIATE
CHARTFIELD1/2/3
PROJECT_ID
BOOK_CODE
FISCAL_YEAR/ACCOUNTING_PERIOD
CURRENCY_CD/BASE_CURRENCY
…and others

    You can think of it as a star-schema.  The ledger table is the fact table.  Dimensions are generated from standing data in the application. The reports typically slice and dice that data by time, and various dimensions.  The exact dimensions vary from business to business, and from time to time. 

In PeopleSoft, you can optionally configure summary ledger tables that are pre-aggregations of ledger data by a limited set of dimensions.  These are generated by batch processes.  However, it is not a commonly used feature because it introduces latency: a change cannot be reported from the summary ledgers until the refresh process has run.
    Business transactions post continuously to the ledger.  Meanwhile, the accountants also want to query ledger data.  Especially at month-end, they want to post adjustments and see the consequences immediately.
Here is a typical query from the PeopleSoft GL Reporting tool (nVision).  The queries vary widely, but some elements are always present.
    SELECT L.TREE_NODE_NUM,L2.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
    FROM PS_LEDGER A
    , PSTREESELECT05 L1
    , PSTREESELECT10 L
    , PSTREESELECT10 L2
    WHERE A.LEDGER='ACTUALS'
    AND A.FISCAL_YEAR=2020
    AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 11
    AND L1.SELECTOR_NUM=30982 AND A.BUSINESS_UNIT=L1.RANGE_FROM_05
    AND L.SELECTOR_NUM=30985 AND A.CHARTFIELD1=L.RANGE_FROM_10
    AND L2.SELECTOR_NUM=30984 AND A.ACCOUNT=L2.RANGE_FROM_10
    AND A.CURRENCY_CD='GBP'
    GROUP BY L.TREE_NODE_NUM,L2.TREE_NODE_NUM
    • Queries are always on a particular ledger or group of ledgers.
      • You can have different ledgers for different accounting standards or reporting requirements.
      • Sometimes you can have adjustment ledgers – that are usually much smaller than the actuals ledgers – and they are aggregated with the main ledger.
      • In the latest version of the application, the budget ledger can be stored in the same table rather than a separate table.  Budget data has a different shape to actuals data and is created up to a year earlier.  It is generally much smaller and has a different usage profile.
• So, there is always an equality criterion or an IN-list criterion on LEDGER.
    • Queries are always for a particular fiscal year.  This year, last year, sometimes the year before.  Therefore, there is always an equality criterion on FISCAL_YEAR.
    • Queries may be for a particular period, in which case there is a single-period equality criterion.  Alternatively, they are for the year-to-date, in which case there is a BETWEEN 1 AND current period criterion.  Sometimes for a particular quarter.  It is common to see queries on the same year-to-date period in the previous fiscal year.
    • Queries always specify the reporting currency.  Therefore, there is always a criterion on CURRENCY_CD, although many multi-national customers only have single currency ledgers, so the criterion may not be selective.
    • There will be varying criteria on other dimension columns on LEDGER by joining to the PSTREESELECT dimension tables.

    What should I partition by?

We have seen the shape of the SQL, so we know which columns are candidate partitioning keys because we have seen which columns have criteria.  LEDGER is a candidate:
                                Cum.
LEDGER          NUM_ROWS      %      %
---------- ------------- ------ ------
XXXXCORE     759,496,900   43.9   43.9
CORE         533,320,425   30.8   74.7
XXXXGAAP     152,563,325    8.8   83.5
GAAP_ADJ      74,371,775    4.3   87.8
ZZZZ_CORE     34,251,514    2.0   89.8
C_XXCORE      29,569,381    1.7   91.5
           -------------
sum        1,731,153,467
    FISCAL_YEAR is an obvious choice.  
    Fiscal
      Year      NUM_ROWS      %
---------- ------------- ------
      2016           121
      2017            32
      2018   510,168,673   29.5
      2019   574,615,980   33.2
      2020   646,336,579   37.3
      2021        32,082
           -------------
sum        1,731,153,467
Most companies have monthly accounting periods (although some use other frequencies).  Then we have 12 accounting periods, plus brought forward (0), carry forward (998), and adjustments (999).
    Fiscal Accounting                   Cum.
      Year     Period   NUM_ROWS      %      %
---------- ---------- ---------- ------ ------
      2020          0   66237947    3.8   37.3
                    1   42865339    2.5   33.5
                    2   47042492    2.7   31.0
                    3   53680915    3.1   28.3
                    4   50113011    2.9   25.2
                    5   44700409    2.6   22.3
                    6   54983221    3.2   19.7
                    7   51982401    3.0   16.6
                    8   44851506    2.6   13.6
                    9   56528783    3.3   11.0
                   10   52266343    3.0    7.7
                   11   70541810    4.1    4.7
                   12   10542380     .6     .6
                  999         22     .0     .0
           ********** ----------
sum                     646336579
CURRENCY_CD is usually not a candidate because most companies report in a single currency, so all the rows have the same currency.  But even then, each ledger is in a particular currency, so it is usually more effective to partition by LEDGER.
It is very tempting to interval partition on FISCAL_YEAR and then range or list sub-partition on ACCOUNTING_PERIOD into 14 sub-partitions each year.  Then Oracle will automatically add the range partitions for each FISCAL_YEAR.
    CREATE TABLE ps_ledger (...)
    PARTITION BY RANGE (fiscal_year) INTERVAL (1)
    SUBPARTITION BY RANGE (accounting_period)
    SUBPARTITION TEMPLATE
    (SUBPARTITION p00 VALUES LESS THAN (1)
    ,SUBPARTITION p01 VALUES LESS THAN (2)
    ...
    ,SUBPARTITION p12 VALUES LESS THAN (13)
    ,SUBPARTITION pxx VALUES LESS THAN (MAXVALUE))
    (PARTITION VALUES LESS THAN (2019));
    However, I would counsel against this.  You can only partition in two dimensions, and LEDGER is a very attractive column.
    Instead, you can partition in one dimension on the combination of two (or more) columns.  I would range partition on the combination of FISCAL_YEAR and ACCOUNTING_PERIOD.
    CREATE TABLE ps_ledger (...)
    PARTITION BY RANGE (fiscal_year,accounting_period)
    (PARTITION ledger_2017 VALUES LESS THAN (2018,0)
    ,PARTITION ledger_2018_bf VALUES LESS THAN (2018,1)
    ,PARTITION ledger_2018_p01 VALUES LESS THAN (2018,2)

    ,PARTITION ledger_2021_cf VALUES LESS THAN (2022,0)
    );
    • The application never uses ACCOUNTING_PERIOD without also using FISCAL_YEAR.  Sometimes it uses FISCAL_YEAR without ACCOUNTING_PERIOD.
    • Partition elimination does work with multi-column partitions.
      • If you only specify a criterion on FISCAL_YEAR in a query you will still get partition elimination.
  • If you only specify a criterion on ACCOUNTING_PERIOD, you will not get partition elimination (see the sketch after this list).
    • You cannot interval partition on multiple columns.  Therefore, you have to manage the annual addition of new partitions yourself.
    • Also, you cannot get partition change tracking for materialized view refresh to work with multi-column partitioning.
    • This leaves sub-partitioning to be used on a different column.
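
You can demonstrate the elimination behaviour in the execution plan.  A minimal sketch, assuming the multi-column range-partitioned ledger table above:

EXPLAIN PLAN FOR
SELECT SUM(posted_total_amt) FROM ps_ledger
WHERE  fiscal_year = 2020;

SELECT * FROM TABLE(dbms_xplan.display);

With a criterion on FISCAL_YEAR alone, expect a PARTITION RANGE ITERATOR operation with a bounded Pstart/Pstop range.  Repeat with a criterion on ACCOUNTING_PERIOD alone and you should see PARTITION RANGE ALL, i.e. no elimination.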

    Should I create a MAXVALUE partition?

    • I deliberately haven't specified a MAXVALUE partition.  There are arguments for and against this.
  • The argument against MAXVALUE is that you might forget to add the new partition for the new year; then all the data for the next fiscal year goes into the same partition, and over time the performance of the reports gradually decays.  By the time the performance issue is diagnosed, several months of data may have piled up.  Then you need to split the partition into several partitions (or exchange it out, add the new partitions, and reinsert the data) - see the split example after this list.  So not having a MAXVALUE partition forces the annual maintenance activity to be put in the diary; otherwise, the application will error when it tries to insert data for a FISCAL_YEAR for which there is currently no partition.
    • Now that budget data is kept in the LEDGER table, you have to do this before the budget ledger data is created, which is up to a year ahead of actuals data, so the risk of business interruption is minimal.
  • In favour of a MAXVALUE partition is that it prevents the error from occurring, but it risks the maintenance being forgotten or deferred for operational reasons.
      • Of course, a MAXVALUE partition can be added at any time!
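
If data has already accumulated in a MAXVALUE partition, it can be split retrospectively.  A minimal sketch (the partition names are illustrative):

ALTER TABLE ps_ledger SPLIT PARTITION ledger_max
AT (2022,0)
INTO (PARTITION ledger_2021_cf, PARTITION ledger_max)
UPDATE INDEXES;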

    Should I Sub-partition?

    It depends on the data.
    • The ledger table is a big table, and the LEDGER column is usually a selective low cardinality column.  So, it is a good candidate for sub-partitioning.  A single value list sub-partition for each of the largest actuals and budget ledgers, a default sub-partition for all other values.
• This is not the case in summary ledger tables that are usually built on a single ledger.  So they are usually range partitioned on FISCAL_YEAR and ACCOUNTING_PERIOD, and can then be sub-partitioned on a different dimension column.
    You can use a template if you want the same sub-partitions for every accounting period.
    If you use interval partitioning, you have to use a subpartition template if you want to composite partition.
    CREATE TABLE ps_ledger (…)
    PARTITION BY RANGE (fiscal_year,accounting_period) INTERVAL (1)
    SUBPARTITION BY LIST (ledger)
    SUBPARTITION TEMPLATE
(SUBPARTITION l_xxx VALUES ('XXX')
,SUBPARTITION l_yyy VALUES ('YYY')

,SUBPARTITION l_z_others VALUES (DEFAULT))
    (PARTITION VALUES LESS THAN (2019));
    Sometimes, companies change their use of ledgers, in which case the sub-partitions need to reflect that.  You can still use the template to specify whatever is the currently required sub-partitioning.  If you ever recreate the table you end up explicitly specifying sub-partitions for every other partition.  The DDL becomes very verbose.  Although with deferred segment creation it wouldn't really matter if you had empty sub-partitions that had not been physically created for accounting periods where a ledger was not used.  
    However, if I want to specify different tablespaces, no free space allowance, compression etc on certain partitions, then I need to use explicit partition and subpartition clauses, or come along afterwards and alter and rebuild them.  
    I think explicit partition and subpartition names are administratively helpful when it comes to reporting on partition space usage, and when you archive/purge data by exchanging or dropping a partition.
    CREATE TABLE ps_ledger (…)
    PARTITION BY RANGE (fiscal_year,accounting_period)
    SUBPARTITION BY LIST (ledger)
    (PARTITION ledger_2018 VALUES LESS THAN (2019,0) PCTFREE 0 COMPRESS
    (SUBPARTITION ledger_2018_xxx VALUES ('XXX')
    ,SUBPARTITION ledger_2018_yyy VALUES ('YYY')
    ,SUBPARTITION ledger_2018_z_others VALUES (DEFAULT)
    )
    ,PARTITION ledger_2019_bf VALUES LESS THAN (2019,1) PCTFREE 0 COMPRESS
    (SUBPARTITION ledger_2019_bf_xxx VALUES ('XXX')
    ,SUBPARTITION ledger_2019_bf_yyy VALUES ('YYY')
    ,SUBPARTITION ledger_2019_bf_z_others VALUES (DEFAULT)
    )

    ;

    Indexing

    Indexes can be partitioned or not independently of the table.  
    • Local indexes are partitioned in the same way as the table they are built on.  Therefore, there is a 1:1 relationship of table partition/sub-partition to index partition/sub-partition.
    • Global indexes are not partitioned the same way as the table.  You can have
      • Global partitioned indexes
      • Global non-partitioned indexes
    Local indexes are easier to build and maintain.  When you do a partition operation on a table partition (add, drop, merge, split or truncate) the same operation is applied to local indexes.  However, if you do an operation on a table partition, any global index will become unusable, unless the DDL is done with the UPDATE INDEXES clause.  Using this option, when you drop a partition, all the corresponding rows are deleted from the index.  The benefit is that the indexes do not become unusable (in which case they would have to be rebuilt), but dropping the table partition takes longer because the rows have to be deleted from the index (effectively a DML operation).
As a general rule, indexes that contain the partitioning key, where at least the first partitioning key column is near the front of the index (I usually reckon within the first three key columns), should be locally partitioned unless there is a reason not to.
    With the general ledger, I tend to create pairs of local indexes that match the reporting analysis criteria.  
    • One of each of the pair of indexes leads on LEDGER, FISCAL_YEAR, ACCOUNTING_PERIOD and then the other dimension columns.  This supports single period queries.
    • The other index leads on LEDGER, FISCAL_YEAR, then the other dimension columns and finally ACCOUNTING_PERIOD is last because we are interested in a range of periods.
To support single period queries:

CREATE INDEX psgledger ON ps_ledger
(ledger
,fiscal_year
,accounting_period
,business_unit
,account
,project_id
,book_code
) LOCAL;

To support year-to-date queries:

CREATE INDEX pshledger ON ps_ledger
(ledger
,fiscal_year
,business_unit
,account
,project_id
,book_code
,accounting_period
) LOCAL;
The unique index on the ledger table does include the partitioning keys, but FISCAL_YEAR and ACCOUNTING_PERIOD are the last 2 of 25 columns.  This index is really there to support queries from the on-line application and batch processes that post to the ledger.  So a query on, say, BUSINESS_UNIT would have to probe every partition.  Therefore, I generally don't partition this index.  It would be reasonable to globally partition it on LEDGER only.
    CREATE UNIQUE INDEX ps_ledger ON ps_ledger
    (business_unit,ledger,account,altacct,deptid
    ,operating_unit,product,fund_code,class_fld,program_code
    ,budget_ref,affiliate,affiliate_intra1,affiliate_intra2,chartfield1
    ,chartfield2,chartfield3,project_id,book_code,gl_adjust_type
    ,date_code,currency_cd,statistics_code,fiscal_year,accounting_period
    )…

    Archiving

Taken together, FISCAL_YEAR and ACCOUNTING_PERIOD are effectively a proxy for the date of the accounting period.  So we add new partitions over time, and can compress and eventually drop the old ones.
Once an accounting period has been closed it will not be written to again (or at least not much and not often), so it can then be compressed.  It can't be compressed before that because the application is still applying ordinary DML (unless the Advanced Compression option has been licensed).  This applies to both conventional dictionary compression and Hybrid Columnar Compression on Exadata.
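
Compressing a closed period is then a (sub-)partition-level rebuild.  A minimal sketch, reusing the sub-partition names from the DDL above (on Exadata you might specify a Hybrid Columnar Compression level such as COMPRESS FOR QUERY LOW instead):

ALTER TABLE ps_ledger MOVE SUBPARTITION ledger_2019_bf_xxx
PCTFREE 0 COMPRESS UPDATE INDEXES;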
Most reports are on the current and previous fiscal years.  Earlier years are candidates to be purged or archived by dropping or exchanging partitions.  Because you have global indexes, partitions should be dropped with the UPDATE INDEXES clause:
    ALTER TABLE ps_ledger DROP PARTITION ledger_2017 UPDATE INDEXES;

    Retrofitting Partitioning into Existing Applications: Example 2. Payroll

    This post is part of a series about the partitioning of database objects.

    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

Partitioning Payroll

    Range and List Partitioning brings similar data together and therefore keeps dissimilar data apart. This has implications for read-consistency as well as improving query performance by partition elimination. 
Hash Partitioning spreads rows roughly evenly across a number of partitions.  This can be used to mitigate contention problems.  It is recommended that the number of hash partitions be an integral power of 2 (i.e. 2, 4, 8, 16, etc.) because the partition is determined from a number of bits of the hash value, so the data distributes more evenly across the partitions.
Payroll calculation involves lots of computation per employee.  There isn't much opportunity for database parallelism.  The PeopleSoft Global Payroll (GP) calculation process works through employees in a sequential fashion.  Each payroll process only consumes a single processor at any one time.  In order to bring more resources to bear on the payroll, and therefore process it in less time, multiple payroll calculation processes are run concurrently, each one working on a distinct set of data.  In GP, the sets of data are ranges of employee IDs.  Each set is called a 'stream'.  The payroll processes are then configured to process a specific stream.  Most of the SQLs therefore have EMPLID BETWEEN criteria.
    DELETE /*GPPCANCL_D_ERNDALL*/ 
    FROM PS_GP_RSLT_ERN_DED
    WHERE EMPLID BETWEEN :1 AND :2
    AND CAL_RUN_ID=:3
    Typically, large companies run payroll calculation processes several times per pay period. Partly to see what the payroll value is in advance, and partly to see the effect of changes before the final payroll that is actually used to pay employees. Each concurrent payroll calculation process inserts data into result tables, also concurrently. So it is common for data blocks in result tables to contain data from many different streams. When the payroll is recalculated, results from previous payrolls are deleted (by statements such as the one above), also concurrently. You now have different transactions deleting different rows from the same data block. There is never any row-level locking because each row is only in scope for one and only one process. However, each delete transaction comes from a different process that created a different database session, that started at a slightly different time and therefore has a different System Change/Commit Number (SCN). Therefore, each payroll process needs its own read-consistent version of every data block that it reads, recovered back to its own SCN. So if I have 10 streams, I am likely to need 10 copies of every data block of every payroll-related table in the buffer cache. 
     The result is that the payroll runtime degrades very significantly with the number of concurrent processes to the extent that it quickly becomes worse than running a single process because the database
    1. spends a huge amount of time on read-consistency,
    2. is more likely to run out of buffer cache, so blocks are aged out, reloaded, and may have to be recovered back to the desired SCN again. 
However, if one can align the partitioning with the processing, then this behaviour can be eliminated.  If the payroll result tables (and some of the other application tables) are each range partitioned on EMPLID such that there is a 1:1 relationship of payroll stream to partition, then this problem does not occur: each stream references a single partition of each table, and each data block will only contain rows for one stream, so it can only ever have a single transaction.  Thus there is no requirement to produce a consistent version of a block.  The database only needs a single copy of each data block in memory.  The result is almost 100% scalability of payroll processing until, eventually, the file system cannot cope with the redo generation.
This approach relies absolutely on the application processing ranges of employees specified with BETWEEN criteria, and on those criteria mapping to one partition.  When implemented, the result is single range partition queries (a sketch of the partitioning follows the example).
    WHERE EMPLID BETWEEN :1 AND :2 
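
A minimal sketch of what stream-aligned partitioning might look like (the boundary values are illustrative, and CALC_RSLT_VAL is an invented column standing in for the many real result columns):

CREATE TABLE ps_gp_rslt_ern_ded
(emplid        VARCHAR2(11) NOT NULL
,cal_run_id    VARCHAR2(18) NOT NULL
,calc_rslt_val NUMBER                 /* illustrative result column */
)
PARTITION BY RANGE (emplid)
(PARTITION strm01 VALUES LESS THAN ('10000')
,PARTITION strm02 VALUES LESS THAN ('20000')
,PARTITION strm03 VALUES LESS THAN ('30000')
,PARTITION strm04 VALUES LESS THAN (MAXVALUE));

The stream that processes EMPLID BETWEEN '10000' AND '19999' then only ever touches partition STRM02, so its blocks are never shared with another stream's transactions.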
Both partitioning and application configuration had to change and meet somewhere in the middle.  The number of streams is limited by the hardware, usually the number of CPUs.  The streams are calculated to be of a size such that all of them take about the same time to process (which is not the same as containing the same number of employees).  It is necessary to allow for new employees being given new sequential employee IDs.  Therefore, there is also a need to periodically rebalance the streams as employees are hired and leave.  This becomes an annual process that is combined with archiving.
Some customers have avoided the annual rebalancing by reversing the sequentially generated employee ID before it is used, but you have to do this when the system is first implemented, and only if new employee IDs can be allocated.
However, this technique depends upon the application.  When I looked at PeopleSoft's North American Payroll (which is a completely different product), this approach did not work.  It does use multiple concurrent processes, but the employees are grouped logically by other business attributes.  We still see the read-consistency problems, but we can't resolve them with range partitioning.  So you see that understanding both partitioning and the application is essential.

    Sub-partitioning 

The results of each pay period accumulate over time.  In GP, each pay period has a calendar ID.  It is a character string, defined in the application.  So the larger payroll result tables can be sub-partitioned on CAL_RUN_ID.
When I first worked on Global Payroll, it was often run on Oracle 8i, where we only had hash sub-partitioning.  I could use dbms_utility.get_hash_value() to predict which hash partition a string value falls into (see also http://www.jlcomp.demon.co.uk/2d_parts.html from 1999).  I could therefore adjust the calendar ID values to manipulate which partition they fall into.
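
Today you could make a similar prediction with the documented ORA_HASH function.  A minimal sketch; it assumes (and you should verify on your own version) that for a power-of-two number of hash partitions, ORA_HASH with max_bucket set to partitions - 1 matches Oracle's hash partition assignment (the calendar ID value is illustrative):

REM Predict which of 8 hash sub-partitions a calendar ID would fall into
SELECT 'GBR2020M01' AS cal_run_id
,      ORA_HASH('GBR2020M01', 7) + 1 AS predicted_subpartition_position
FROM   dual;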
Today, I list sub-partition the tables on CAL_RUN_ID.  Most companies create and follow a naming convention for their calendar IDs, so the list sub-partitions can be created in advance, and it is simply a matter of listing the calendar(s) that go into each partition.  In some cases, for large companies, I have created a list sub-partition for each pay period.