
Retrofitting Partitioning into Existing Applications: Example 3. Workflow: Separate Active and Inactive Rows, and Partial Indexing.

This post is part of a series about the partitioning of database objects.

  1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
  2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
  3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

    Workflow

    Workflow is an example of a case where you have a roughly constant volume of active data and an ever-growing quantity of historical, inactive data that builds up until it is archived.  Workflow requests are created, worked and closed.

    The PeopleSoft worklist table has four statuses:

    INSTSTATUS  Description
    ----------  -----------
    0           Available
    1           Selected
    2           Worked
    3           Cancelled
    Over time the majority of rows in the table end up with status 2 as they are worked and closed, and a few end up being cancelled. These rows are now inactive. All the workflow activity focuses on statuses 0 and 1. Partitioning can be used to separate the active rows from the inactive. I chose to range partition the worklist table by status, creating a partition of active worklist rows where the status is less than 2, and a partition of inactive rows where status is greater than or equal to 2. I could also have used list partitioning to create the same effect.
    CREATE TABLE PSWORKLIST (
     BUSPROCNAME VARCHAR2(30) NOT NULL,
     ACTIVITYNAME VARCHAR2(30) NOT NULL,
     EVENTNAME VARCHAR2(30) NOT NULL,
     WORKLISTNAME VARCHAR2(30) NOT NULL,
    …
     OPRID VARCHAR2(30) NOT NULL,
    …
     INSTSTATUS SMALLINT NOT NULL
    )
    PARTITION BY RANGE (INSTSTATUS)
    (PARTITION WL_OPEN VALUES LESS THAN (2) PCTFREE 20,
     PARTITION WL_CLOSED VALUES LESS THAN (MAXVALUE) PCTFREE 0
    )
    ENABLE ROW MOVEMENT
    /
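    For comparison, here is a minimal sketch of the list-partitioned equivalent I could have used instead (the column list is elided as above, and the partition names are kept the same):
    CREATE TABLE PSWORKLIST (
    …
     INSTSTATUS SMALLINT NOT NULL
    )
    PARTITION BY LIST (INSTSTATUS)
    (PARTITION WL_OPEN VALUES (0, 1) PCTFREE 20,
     PARTITION WL_CLOSED VALUES (DEFAULT) PCTFREE 0
    )
    ENABLE ROW MOVEMENT
    /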
    Operators query their worklist queue by their operator ID and the open request statuses; therefore, there is an index to support this query.  This index can be locally partitioned, i.e. partitioned on INSTSTATUS like the table.
    The optimizer prunes the partition containing closed worklist requests because it knows open requests cannot be found there, and it queries only the open partition.
    The open partition remains small because, as worklist rows are updated to the closed status, they are moved to the closed partition.  Therefore, row movement must be enabled on the table.  Thus, queries for open worklist requests remain efficient because the partition they scan stays small.
    There is an additional overhead in moving the rows between partitions as the status is updated to closed, but this is outweighed by the saving of only looking for open records in the open partition.
    Additional free space is specified on the open partition because that is where all the application update activity occurs.  Conversely, no free space is required for the closed partition because, after rows move there, they are not updated again until they are purged.
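    For example, here is a hypothetical status update that closes a request (the predicate values are invented).  Because INSTSTATUS is the partition key, the updated row migrates from WL_OPEN to WL_CLOSED; without ENABLE ROW MOVEMENT this statement would fail with ORA-14402.
    UPDATE psworklist
    SET inststatus = 2            /* worked: row moves to WL_CLOSED */
    WHERE oprid = 'OPRID042'      /* hypothetical operator */
    AND inststatus = 1;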
    From Oracle 12c, it is also possible to partially index a partitioned table.  You can choose which partitions are built in a local index by marking indexing on or off on the table partitions.  In this example, it is only necessary to index the open workflow records.  The application never queries the closed ones by operator ID, so indexing can be disabled on the closed partition, saving space and index maintenance overhead.
    ALTER TABLE PSWORKLIST MODIFY PARTITION WL_OPEN INDEXING ON;
    ALTER TABLE PSWORKLIST MODIFY PARTITION WL_CLOSED INDEXING OFF;

    CREATE INDEX PSBPSWORKLIST ON PSWORKLIST (OPRID, INSTSTATUS)
    LOCAL
    INDEXING PARTIAL
    /
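    As a quick check (assuming the table is in the current schema), the indexing attributes can be confirmed in the data dictionary; the INDEXING columns appear in these views from 12c.
    SELECT partition_name, indexing FROM user_tab_partitions WHERE table_name = 'PSWORKLIST';
    SELECT index_name, indexing FROM user_indexes WHERE table_name = 'PSWORKLIST';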
    Here is my worklist table with two partitions, and some sample data.  You can see over 90% of the rows are in the closed partition.
    SELECT table_name, partition_name, num_rows, blocks
    FROM dba_tab_statistics
    WHERE table_name = 'PSWORKLIST'
    ORDER BY partition_name nulls first
    /
    TABLE_NAME         PARTITION_NAME                   NUM_ROWS     BLOCKS
    ------------------ ------------------------------ ---------- ----------
    PSWORKLIST                                             100000       2711
    PSWORKLIST         WL_CLOSED                            90742       2430
    PSWORKLIST         WL_OPEN                               9258        281
    There is a notional entry in DBA_IND_STATISTICS for the index partition on the closed table partition, but it reports that it holds no rows and consumes no blocks.  The index-level statistics for PSBPSWORKLIST are an estimate of the totals for the index as if all partitions had been indexed (although, in fact, the B-tree level would still have been 1 had I built the whole index in my test case).
    SELECT index_name, partition_name, num_rows, blevel, leaf_blocks
    FROM dba_ind_statistics
    WHERE table_name = 'PSWORKLIST'
    ORDER BY index_name, partition_name nulls first
    /
    INDEX_NAME         PARTITION_NAME                   NUM_ROWS     BLEVEL LEAF_BLOCKS
    ------------------ ------------------------------ ---------- ---------- -----------
    PSBPSWORKLIST                                          100000          2         318
                       WL_CLOSED                               0          0           0
                       WL_OPEN                              9258          1          27

    PS_PSWORKLIST                                          100000          2         814
    The index segment for the closed partition does not physically exist, so it is not reported in DBA_SEGMENTS.
    SELECT segment_type, segment_name, partition_name, blocks
    FROM dba_segments
    WHERE segment_name like 'PS_PSWORKLIST'
    ORDER BY segment_name
    /
    SEGMENT_TYPE       SEGMENT_NAME                   PARTITION_NAME                     BLOCKS
    ------------------ ------------------------------ ------------------------------ ----------
    INDEX PARTITION    PSBPSWORKLIST                  WL_OPEN                               128
    INDEX              PS_PSWORKLIST                                                       896
    When active worklist requests are queried, the index may be used.
    SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus IN(1);

    Plan hash value: 3105966310
    ----------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 9 | 1953 | 11 (0)| 00:00:01 | | |
    | 1 | PARTITION RANGE SINGLE | | 9 | 1953 | 11 (0)| 00:00:01 | 1 | 1 |
    | 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PSWORKLIST | 9 | 1953 | 11 (0)| 00:00:01 | 1 | 1 |
    |* 3 | INDEX RANGE SCAN | PSBPSWORKLIST | 9 | | 1 (0)| 00:00:01 | 1 | 1 |
    ----------------------------------------------------------------------------------------------------------------------------
    However, the optimizer may still judge that it is easier to full scan the small table partition.
    SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus IN(0,1);

    Plan hash value: 1913856494
    -----------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 9 | 1953 | 86 (0)| 00:00:01 | | |
    | 1 | PARTITION RANGE INLIST| | 9 | 1953 | 86 (0)| 00:00:01 |KEY(I) |KEY(I) |
    |* 2 | TABLE ACCESS FULL | PSWORKLIST | 9 | 1953 | 86 (0)| 00:00:01 |KEY(I) |KEY(I) |
    -----------------------------------------------------------------------------------------------------
    A query on closed requests can only full scan the unindexed partition.
    SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus = 2;

    Plan hash value: 597831193
    -----------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 86 | 18662 | 718 (1)| 00:00:01 | | |
    | 1 | PARTITION RANGE SINGLE| | 86 | 18662 | 718 (1)| 00:00:01 | 2 | 2 |
    |* 2 | TABLE ACCESS FULL | PSWORKLIST | 86 | 18662 | 718 (1)| 00:00:01 | 2 | 2 |
    -----------------------------------------------------------------------------------------------------
    A query across both partitions may choose to use the index where it is available and full scan where it is not.
    SELECT * FROM psworklist WHERE oprid = 'OPRID042';

    Plan hash value: 3927567812
    ------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 101 | 21917 | 730 (1)| 00:00:01 | | |
    | 1 | VIEW | VW_TE_2 | 101 | 58580 | 730 (1)| 00:00:01 | | |
    | 2 | UNION-ALL | | | | | | | |
    | 3 | PARTITION RANGE SINGLE | | 10 | 2170 | 12 (0)| 00:00:01 | 1 | 1 |
    | 4 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PSWORKLIST | 10 | 2170 | 12 (0)| 00:00:01 | 1 | 1 |
    |* 5 | INDEX RANGE SCAN | PSBPSWORKLIST | 10 | | 2 (0)| 00:00:01 | 1 | 1 |
    | 6 | PARTITION RANGE SINGLE | | 91 | 19747 | 718 (1)| 00:00:01 | 2 | 2 |
    |* 7 | TABLE ACCESS FULL | PSWORKLIST | 91 | 19747 | 718 (1)| 00:00:01 | 2 | 2 |
    ------------------------------------------------------------------------------------------------------------------------------
    In this case, I cannot locally partition the unique index because the partitioning column does not appear in it, so it must remain a global non-partitioned index.
    CREATE UNIQUE  INDEX PS_PSWORKLIST ON PSWORKLIST (BUSPROCNAME,
    ACTIVITYNAME,
    EVENTNAME,
    WORKLISTNAME,
    INSTANCEID)
    /

    Conclusion

    • Make sure you understand what your application is doing. 
    • Match the partitioning to the way the application accesses data so that the application queries prune partitions. Even if that means that it is harder to archive data. 
    • If you are not getting partition elimination, you probably should not be partitioning. 
    • Range and list partitioning keep similar data values together, so it follows that dissimilar data values are kept apart in different segments. That can avoid I/O during scans, and if it keeps transactions apart it can also avoid read-consistency overheads. 
    • Hash partitioning spreads data out across segments and can be used to avoid some forms of contention.
    • Partitioning can separate data with different usage profiles, such as active rows from inactive rows. They might then have different indexing requirements. 
    • Partial indexing of partitioned tables allows you to choose which partitions should be built in a locally partitioned index.

    Retrofitting Partitioning into Existing Applications: Conclusion

    This is the last post in a series about the partitioning of database objects.

    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

    Conclusion

    The decisions you make when you introduce partitioning into an existing application are similar to those you make when you design an application and its partitioning together.  The essential difference is that, by the time the application has been built, you probably can't change it.  So partitioning either works with the application as it is, or it doesn't.
    • Make sure you understand what your application is doing. 
    • Match the partitioning to the way the application accesses data so that the application queries prune partitions. Even if that means that it is harder to archive data later on.
    • If you are not getting partition elimination, you probably should not be partitioning. 
    • Range and list partitioning keep similar data values together, so it follows that dissimilar data values are kept apart in different segments. That can avoid I/O during scans, and if it keeps transactions apart it can also avoid read-consistency overheads. 
    • Hash partitioning spreads data out across segments and can be used to avoid some forms of contention.
    • Partitioning can separate data with different usage profiles, such as active rows from inactive rows. They might then have different indexing requirements. 
    • Partial indexing of partitioned tables allows you to choose which partitions should be built in a locally partitioned index.

    Oracle 12.2: New Statistic Preference PREFERENCE_OVERRIDES_PARAMETER


    Introduction

    There is a new statistics preference, PREFERENCE_OVERRIDES_PARAMETER, available from Oracle 12.2.  It allows the DBA to override any parameters specified when gathering statistics in favour of whatever statistics preferences are defined.  The preference can be specified at database level, at table level, or both.

    From the introduction of the cost-based optimizer in Oracle 7, we all had to write scripts to collect statistics.  The automatic statistics collection job introduced in Oracle 10g, run in the regularly scheduled maintenance window, was supposed to supersede that.  Yet it is still not uncommon to find systems that rely on custom scripts to collect object statistics.  Sometimes, commands to collect statistics are embedded in applications.

    It remains perfectly reasonable to collect statistics on certain objects at exactly the times that are most appropriate: for example, just after a table has been populated, or refreshing the statistics on a very large table at a quiet time.

    Since 11g, Oracle has provided global and table statistics preferences that specify how statistics are to be collected.  This declarative method is generally recommended instead of specifying parameters on calls to DBMS_STATS.  The main advantage is consistency: when statistics are collected on a table, they are always collected in the same way, including by the maintenance window job.  However, if scripts and programs still specify parameters when they call DBMS_STATS, those parameters will override the preferences.
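    For example, a minimal sketch (T1 is just an illustrative table): define the preference once, and every subsequent gather on that table honours it, whoever initiates it.

    exec dbms_stats.set_table_prefs(user,'T1','METHOD_OPT','FOR ALL COLUMNS SIZE AUTO');
    exec dbms_stats.gather_table_stats(user,'T1');  -- uses the table preference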

    Two scenarios in which enabling PREFERENCE_OVERRIDES_PARAMETER would be advantageous come immediately to mind.

    • The hash-based algorithm, introduced in 12c, to calculate the number of distinct values on a column only applies if the ESTIMATE_PERCENT parameter is AUTO_SAMPLE_SIZE, which is the default (see How does AUTO_SAMPLE_SIZE work in Oracle Database 12c? by Nigel Bayliss).  This new algorithm produces more accurate answers than even a large sample size, and much more quickly because there is no need to sort the sampled data for each column.  Therefore, ESTIMATE_PERCENT should no longer be specified.
    • I have found applications that specify a METHOD_OPT parameter that collects histograms that do not make sense.  For example, PeopleSoft used to use FOR ALL INDEXED COLUMNS SIZE 1 in the code that collected statistics.  That collects column minimum and maximum values only on indexed columns; the column statistics on any unindexed columns are simply not updated.  If scripts collect histograms that should be retained, then those METHOD_OPTs should be defined as table preferences.

    See also Overriding DBMS_STATS Parameter Settings by Maria Colgan.

    Demonstration

    The default value of PREFERENCE_OVERRIDES_PARAMETER is FALSE, which preserves the status quo: parameters override preferences.

    exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','FALSE');

    I am going to create two tables with 50,000 rows each. 

    DROP TABLE t1 PURGE;
    DROP TABLE t2 PURGE;
    CREATE TABLE t1 AS SELECT * FROM all_objects WHERE rownum <= 50000;
    CREATE UNIQUE INDEX t1_idx ON t1 (owner, object_type, object_name, subobject_name);
    CREATE TABLE t2 AS SELECT /*+NO_GATHER_OPTIMIZER_STATISTICS*/ * FROM all_objects WHERE rownum <= 50000;
    CREATE UNIQUE INDEX t2_idx ON t2 (owner, object_type, object_name, subobject_name);
    @tstats
    REM tstats.sql
    set pages 99 lines 200 trimspool on autotrace off
    column table_name format a10
    column column_name format a30
    column PREFERENCE_OVERRIDES_PARAMETER format a30
    break on report
    alter session set nls_date_Format = 'hh24:mi:ss dd/mm/yyyy';
    spool tstats
    SELECT DBMS_STATS.GET_PREFS('PREFERENCE_OVERRIDES_PARAMETER') AS PREFERENCE_OVERRIDES_PARAMETER
    FROM dual;
    SELECT table_name, sample_size, num_rows, last_analyzed
    FROM user_tables
    WHERE table_name in('T1','T2')
    ORDER BY 1;

    break on table_name skip 1
    SELECT table_name, column_name, num_distinct, num_Buckets, histogram, last_analyzed
    FROM user_tab_columns
    WHERE table_name in('T1','T2')
    order by 1,2;
    spool off

    Real-time statistics were collected on T1.  Note that I have column statistics on T1, but no histograms.  I suppressed statistics collection on T2. 

    If I had run this test on an autonomous database, then I would also have had histograms, because _optimizer_gather_stats_on_load_hist=TRUE on that platform.

    PREFERENCE_OVERRIDES_PARAMETER
    ------------------------------
    FALSE

    TABLE_NAME SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
    ---------- ----------- ---------- -------------------
    T1 50000 50000 15:33:06 13/11/2020
    T2

    TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM LAST_ANALYZED
    ---------- ------------------------------ ------------ ----------- --------------- -------------------
    T1 APPLICATION 1 1 NONE 15:33:06 13/11/2020
    CREATED 520 1 NONE 15:33:06 13/11/2020
    CREATED_APPID 0 0 NONE 15:33:06 13/11/2020
    CREATED_VSNID 0 0 NONE 15:33:06 13/11/2020
    DATA_OBJECT_ID 343 1 NONE 15:33:06 13/11/2020
    DEFAULT_COLLATION 1 1 NONE 15:33:06 13/11/2020
    DUPLICATED 1 1 NONE 15:33:06 13/11/2020
    EDITIONABLE 1 1 NONE 15:33:06 13/11/2020
    EDITION_NAME 0 0 NONE 15:33:06 13/11/2020
    GENERATED 2 1 NONE 15:33:06 13/11/2020
    LAST_DDL_TIME 736 1 NONE 15:33:06 13/11/2020
    MODIFIED_APPID 0 0 NONE 15:33:06 13/11/2020
    MODIFIED_VSNID 0 0 NONE 15:33:06 13/11/2020
    NAMESPACE 7 1 NONE 15:33:06 13/11/2020
    OBJECT_ID 50000 1 NONE 15:33:06 13/11/2020
    OBJECT_NAME 45268 1 NONE 15:33:06 13/11/2020
    OBJECT_TYPE 24 1 NONE 15:33:06 13/11/2020
    ORACLE_MAINTAINED 2 1 NONE 15:33:06 13/11/2020
    OWNER 8 1 NONE 15:33:06 13/11/2020
    SECONDARY 1 1 NONE 15:33:06 13/11/2020
    SHARDED 1 1 NONE 15:33:06 13/11/2020
    SHARING 4 1 NONE 15:33:06 13/11/2020
    STATUS 1 1 NONE 15:33:06 13/11/2020
    SUBOBJECT_NAME 77 1 NONE 15:33:06 13/11/2020
    TEMPORARY 2 1 NONE 15:33:06 13/11/2020
    TIMESTAMP 602 1 NONE 15:33:06 13/11/2020

    T2 APPLICATION NONE
    CREATED NONE
    CREATED_APPID NONE
    CREATED_VSNID NONE
    DATA_OBJECT_ID NONE
    DEFAULT_COLLATION NONE
    DUPLICATED NONE
    EDITIONABLE NONE
    EDITION_NAME NONE
    GENERATED NONE
    LAST_DDL_TIME NONE
    MODIFIED_APPID NONE
    MODIFIED_VSNID NONE
    NAMESPACE NONE
    OBJECT_ID NONE
    OBJECT_NAME NONE
    OBJECT_TYPE NONE
    ORACLE_MAINTAINED NONE
    OWNER NONE
    SECONDARY NONE
    SHARDED NONE
    SHARING NONE
    STATUS NONE
    SUBOBJECT_NAME NONE
    TEMPORARY NONE
    TIMESTAMP NONE

    I will now gather statistics on both tables with an explicit sample size and a METHOD_OPT that does not collect statistics on unindexed columns.

    EXEC dbms_stats.gather_table_stats(user,'T1',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
    EXEC dbms_stats.gather_table_stats(user,'T2',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
    @tstats
    • I can see that I got sample sizes close to 5000 for each table, and the number of rows is 10 times larger, so it was a 10% sample size.
    • There is a unique index on each table on 4 columns.  Only the column statistics for those 4 columns were updated.  On T1, we can see the column statistics were collected at different times.
    TABLE_NAME SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
    ---------- ----------- ---------- -------------------
    T1 4937 49370 15:33:46 13/11/2020
    T2 5013 50130 15:33:47 13/11/2020


    TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM LAST_ANALYZED
    ---------- ------------------------------ ------------ ----------- --------------- -------------------
    T1 APPLICATION 1 1 NONE 15:33:06 13/11/2020
    CREATED 520 1 NONE 15:33:06 13/11/2020
    CREATED_APPID 0 0 NONE 15:33:06 13/11/2020
    CREATED_VSNID 0 0 NONE 15:33:06 13/11/2020
    DATA_OBJECT_ID 343 1 NONE 15:33:06 13/11/2020
    DEFAULT_COLLATION 1 1 NONE 15:33:06 13/11/2020
    DUPLICATED 1 1 NONE 15:33:06 13/11/2020
    EDITIONABLE 1 1 NONE 15:33:06 13/11/2020
    EDITION_NAME 0 0 NONE 15:33:06 13/11/2020
    GENERATED 2 1 NONE 15:33:06 13/11/2020
    LAST_DDL_TIME 736 1 NONE 15:33:06 13/11/2020
    MODIFIED_APPID 0 0 NONE 15:33:06 13/11/2020
    MODIFIED_VSNID 0 0 NONE 15:33:06 13/11/2020
    NAMESPACE 7 1 NONE 15:33:06 13/11/2020
    OBJECT_ID 50000 1 NONE 15:33:06 13/11/2020
    OBJECT_NAME 40572 1 NONE 15:33:46 13/11/2020
    OBJECT_TYPE 18 10 HEIGHT BALANCED 15:33:46 13/11/2020
    ORACLE_MAINTAINED 2 1 NONE 15:33:06 13/11/2020
    OWNER 7 7 FREQUENCY 15:33:46 13/11/2020
    SECONDARY 1 1 NONE 15:33:06 13/11/2020
    SHARDED 1 1 NONE 15:33:06 13/11/2020
    SHARING 4 1 NONE 15:33:06 13/11/2020
    STATUS 1 1 NONE 15:33:06 13/11/2020
    SUBOBJECT_NAME 8 1 NONE 15:33:46 13/11/2020
    TEMPORARY 2 1 NONE 15:33:06 13/11/2020
    TIMESTAMP 602 1 NONE 15:33:06 13/11/2020

    T2 APPLICATION NONE
    CREATED NONE
    CREATED_APPID NONE
    CREATED_VSNID NONE
    DATA_OBJECT_ID NONE
    DEFAULT_COLLATION NONE
    DUPLICATED NONE
    EDITIONABLE NONE
    EDITION_NAME NONE
    GENERATED NONE
    LAST_DDL_TIME NONE
    MODIFIED_APPID NONE
    MODIFIED_VSNID NONE
    NAMESPACE NONE
    OBJECT_ID NONE
    OBJECT_NAME 43403 1 NONE 15:33:47 13/11/2020
    OBJECT_TYPE 14 10 HEIGHT BALANCED 15:33:47 13/11/2020
    ORACLE_MAINTAINED NONE
    OWNER 7 7 FREQUENCY 15:33:47 13/11/2020
    SECONDARY NONE
    SHARDED NONE
    SHARING NONE
    STATUS NONE
    SUBOBJECT_NAME 10 1 NONE 15:33:47 13/11/2020
    TEMPORARY NONE
    TIMESTAMP NONE

    Now I am going to enable PREFERENCE_OVERRIDES_PARAMETER at database level, but I am also going to disable it for T1, so the parameters still override the preferences for that table.  I would like histograms on the indexed columns of T2, so I am going to specify a METHOD_OPT table preference.  If I mix FOR ALL COLUMNS and FOR ALL INDEXED COLUMNS, whichever is specified first is overridden by the second, so instead I must explicitly list the indexed columns.

    connect / as sysdba
    exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','TRUE');
    connect scott/tiger
    exec dbms_stats.set_table_prefs(user,'T1','PREFERENCE_OVERRIDES_PARAMETER','FALSE');
    exec dbms_stats.set_table_prefs(user,'T2','METHOD_OPT'
    ,'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 10 owner, object_type, object_name, subobject_name');

    EXEC dbms_stats.gather_table_stats(user,'T1',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
    EXEC dbms_stats.gather_table_stats(user,'T2',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
    @tstats

    • For T2, the sample size and the number of rows are both 50000.  The new NDV algorithm has produced the correct answer.  I didn't get any histograms other than on the columns I specified because there isn't any table usage yet.
    • T1 used the 10% sample size and again only the statistics on the indexed columns were updated.
    • I didn't get a histogram on T1 on OBJECT_NAME and SUBOBJECT_NAME, and I got a height-balanced histogram on OBJECT_TYPE because there were more distinct values than buckets specified.  
    • However, on T2, Oracle collected hybrid histograms on OBJECT_NAME and SUBOBJECT_NAME and a top-frequency histogram on OBJECT_TYPE.  These histogram types are only gathered if ESTIMATE_PERCENT is AUTO_SAMPLE_SIZE.
    PREFERENCE_OVERRIDES_PARAMETER
    ------------------------------
    TRUE


    TABLE_NAME SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
    ---------- ----------- ---------- -------------------
    T1 5000 50000 15:56:58 13/11/2020
    T2 50000 50000 15:56:59 13/11/2020


    TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM LAST_ANALYZED
    ---------- ------------------------------ ------------ ----------- --------------- -------------------
    T1 APPLICATION 1 1 NONE 15:33:06 13/11/2020
    CREATED 520 1 NONE 15:33:06 13/11/2020
    CREATED_APPID 0 0 NONE 15:33:06 13/11/2020
    CREATED_VSNID 0 0 NONE 15:33:06 13/11/2020
    DATA_OBJECT_ID 343 1 NONE 15:33:06 13/11/2020
    DEFAULT_COLLATION 1 1 NONE 15:33:06 13/11/2020
    DUPLICATED 1 1 NONE 15:33:06 13/11/2020
    EDITIONABLE 1 1 NONE 15:33:06 13/11/2020
    EDITION_NAME 0 0 NONE 15:33:06 13/11/2020
    GENERATED 2 1 NONE 15:33:06 13/11/2020
    LAST_DDL_TIME 736 1 NONE 15:33:06 13/11/2020
    MODIFIED_APPID 0 0 NONE 15:33:06 13/11/2020
    MODIFIED_VSNID 0 0 NONE 15:33:06 13/11/2020
    NAMESPACE 7 1 NONE 15:33:06 13/11/2020
    OBJECT_ID 50000 1 NONE 15:33:06 13/11/2020
    OBJECT_NAME 42274 1 NONE 15:56:58 13/11/2020
    OBJECT_TYPE 19 10 HEIGHT BALANCED 15:56:58 13/11/2020
    ORACLE_MAINTAINED 2 1 NONE 15:33:06 13/11/2020
    OWNER 8 8 FREQUENCY 15:56:58 13/11/2020
    SECONDARY 1 1 NONE 15:33:06 13/11/2020
    SHARDED 1 1 NONE 15:33:06 13/11/2020
    SHARING 4 1 NONE 15:33:06 13/11/2020
    STATUS 1 1 NONE 15:33:06 13/11/2020
    SUBOBJECT_NAME 14 1 NONE 15:56:58 13/11/2020
    TEMPORARY 2 1 NONE 15:33:06 13/11/2020
    TIMESTAMP 602 1 NONE 15:33:06 13/11/2020

    T2 APPLICATION 1 1 NONE 15:56:59 13/11/2020
    CREATED 520 1 NONE 15:56:59 13/11/2020
    CREATED_APPID 0 0 NONE 15:56:59 13/11/2020
    CREATED_VSNID 0 0 NONE 15:56:59 13/11/2020
    DATA_OBJECT_ID 343 1 NONE 15:56:59 13/11/2020
    DEFAULT_COLLATION 1 1 NONE 15:56:59 13/11/2020
    DUPLICATED 1 1 NONE 15:56:59 13/11/2020
    EDITIONABLE 1 1 NONE 15:56:59 13/11/2020
    EDITION_NAME 0 0 NONE 15:56:59 13/11/2020
    GENERATED 2 1 NONE 15:56:59 13/11/2020
    LAST_DDL_TIME 736 1 NONE 15:56:59 13/11/2020
    MODIFIED_APPID 0 0 NONE 15:56:59 13/11/2020
    MODIFIED_VSNID 0 0 NONE 15:56:59 13/11/2020
    NAMESPACE 7 1 NONE 15:56:59 13/11/2020
    OBJECT_ID 50000 1 NONE 15:56:59 13/11/2020
    OBJECT_NAME 45268 10 HYBRID 15:56:59 13/11/2020
    OBJECT_TYPE 24 10 TOP-FREQUENCY 15:56:59 13/11/2020
    ORACLE_MAINTAINED 2 1 NONE 15:56:59 13/11/2020
    OWNER 8 8 FREQUENCY 15:56:59 13/11/2020
    SECONDARY 1 1 NONE 15:56:59 13/11/2020
    SHARDED 1 1 NONE 15:56:59 13/11/2020
    SHARING 4 1 NONE 15:56:59 13/11/2020
    STATUS 1 1 NONE 15:56:59 13/11/2020
    SUBOBJECT_NAME 77 10 HYBRID 15:56:59 13/11/2020
    TEMPORARY 2 1 NONE 15:56:59 13/11/2020
    TIMESTAMP 602 1 NONE 15:56:59 13/11/2020

    Conclusion

    I think there is a strong case for enabling PREFERENCE_OVERRIDES_PARAMETER at database level on all databases from 12.2.

    exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','TRUE');

    If your application explicitly collects statistics, or if you have legacy scripts that collect statistics, and they explicitly specify GATHER_TABLE_STATS parameters, then setting this preference will revert them to the defaults.  This is particularly valuable in the case of ESTIMATE_PERCENT: it will default to AUTO_SAMPLE_SIZE, so you will get improved row count estimates from the new 12c algorithm with less work.

    If you don't have this problem in the first place, then enabling PREFERENCE_OVERRIDES_PARAMETER at database level will prevent it from developing in the future!

    If you have tables with requirements for particular statistics collections (e.g. METHOD_OPT, GRANULARITY, etc.) and you don't wish to simply use the defaults, then these variations should be implemented as table statistics preferences.  If for some reason that is not possible, PREFERENCE_OVERRIDES_PARAMETER can be disabled again at table level, also with a table statistics preference.
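    For example, a minimal sketch of that table-level opt-out (T1 standing in for the exceptional table): the override applies everywhere except where it is explicitly switched off.

    exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','TRUE');
    exec dbms_stats.set_table_prefs(user,'T1','PREFERENCE_OVERRIDES_PARAMETER','FALSE');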

    Partition Change Tracking During Materialized View Refresh and Query Rewrite

    This article discusses the interplay of Partitioning, Partition Change Tracking and Query Rewrite in relation to Materialized Views.

    Introduction

    In the Oracle database, materialized views can be used to create pre-generated reporting tables.  A view of the data, based on a SQL query, is materialized into a table.  That query may restrict the rows and columns, and may aggregate the data.  An application can reference the materialized view directly, or the Oracle database can 'rewrite' SQL queries on the original tables that are similar to the query in a materialized view to use that materialized view instead.
    By default, QUERY_REWRITE_INTEGRITY is enforced, which means query rewrite works only with materialized views that are up to date (i.e. the underlying data hasn't changed since the materialized view was last refreshed).  This note deals with that scenario.  Optionally, rewrite integrity can be configured to allow rewrite against stale materialized views (this is called 'stale tolerated'); it can be set at system or session level.
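    For example, to allow rewrite against stale materialized views for the current session only:

    ALTER SESSION SET query_rewrite_integrity = stale_tolerated;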
    Partition Change Tracking (PCT) is 'a way of tracking the staleness of a materialized view on the partition and subpartition level'.  If both the materialized view and at least one underlying table in the view are similarly partitioned, then Oracle can determine the relationship between partitions and subpartitions in the underlying table and those in the materialized view.  The database can track not just whether any partition in the underlying tables has been updated since the last refresh of the materialized view, but which ones. During SQL parse, if after partition pruning of the query on the underlying tables, none of the remaining partitions are stale then the query can still be rewritten.  Also, it is possible to refresh just the stale partitions in the materialized view, those that correspond to the underlying partitions that have been updated since the last refresh.
    Query rewrite is a cost-based SQL transformation; it will only occur if the optimizer calculates that the rewritten query has a lower cost.  If I refresh the materialized view in non-atomic mode, the materialized view is truncated and repopulated in direct-path mode, so the data can be compressed (either with basic compression or, on an engineered platform, with Hybrid Columnar Compression) without the need for the Advanced Compression licence.  This further reduces the size and cost of accessing the materialized view and increases the likelihood of query rewrite.
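    A minimal sketch of such a non-atomic complete refresh (MV_LEDGER_2020 is one of the materialized views created in the demonstrations below):

    -- method=>'C' requests a complete refresh; atomic_refresh=>FALSE truncates
    -- the MV and repopulates it in direct-path mode, so basic compression applies.
    exec dbms_mview.refresh(list=>'MV_LEDGER_2020', method=>'C', atomic_refresh=>FALSE);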
    I have written a series of blogs about retrofitting partitioning into existing applications.  One of my examples was based on PeopleSoft General Ledger reporting, in which I discussed options for partitioning the ledger such that there is a different partition for each accounting period.  Once an accounting period is closed, the application generally doesn't change it further.  It should be possible to create partitioned materialized views on the ledger table to support GL reporting using query rewrite.  As the application continues to insert data into the partition for the current accounting period, that partition will quickly become stale, and queries on that partition won't be rewritten.  However, it is common for customers to run suites of reports overnight, and those could run after a materialized view refresh and make good use of query rewrite.
    However, as I modelled this, I ran into a few problems that reveal some of the behaviour of PCT, query rewrite and materialized view refresh.  I have created a number of test scripts that illustrate various scenarios that I will describe below.  The full scripts are available on Github.

    Documented Preconditions and Limitations

    Oracle's documentation sets out a number of preconditions for PCT.
    • Partitioned tables must use either range, list or composite partitioning with range or list as the top-level partitioning strategy. - Therefore, hash partitioning is not supported.  What about interval partitioning?  See demonstration 3.
    • The top-level partition key must consist of only a single column. - If, as I proposed, the ledger table is range partitioned on the combination of FISCAL_YEAR and ACCOUNTING_PERIOD, then PCT will not work (see demonstration 1: Multi-column composite partitioning).  So, are other partitioning strategies viable?
    • The materialized view must contain either the partition key column, a partition marker, ROWID, or a join-dependent expression of the detail table (see the sketch after this list).
    • If you use a GROUP BY clause, the partition key column or the partition marker or ROWID or join dependent expression must be present in the GROUP BY clause.
    Note that, while partition change tracking tracks the staleness on a partition and subpartition level (for composite partitioned tables), the level of granularity for PCT refresh is only the top-level partitioning strategy. Consequently, any change to data in one of the subpartitions of a composite partitioned-table will only mark the single impacted subpartition as stale and have the rest of the table available for rewrite, but the PCT refresh will refresh the whole partition that contains the impacted subpartition.
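    As a hypothetical sketch of the partition-marker option mentioned above (the MV name is invented): DBMS_MVIEW.PMARKER identifies the detail-table partition of each row, so putting it in both the select list and the GROUP BY clause satisfies the last two requirements without selecting the partition key itself.

    CREATE MATERIALIZED VIEW mv_ledger_pmarker
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT DBMS_MVIEW.PMARKER(l.rowid) pmarker
    , l.business_unit
    , SUM(l.posted_total_amt) posted_total_amt
    FROM ps_ledger l
    WHERE l.ledger = 'ACTUALS'
    GROUP BY DBMS_MVIEW.PMARKER(l.rowid), l.business_unit
    /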

    Demonstrations

    In each of the following demonstrations, I will create a copy of the PeopleSoft Financials General Ledger table PS_LEDGER, populate it with random data to simulate 2½ years of actuals and 4 years of budget data.  The table will be partitioned differently in each demonstration.  I will also create one or two materialized views that will also be partitioned.  Then I will add data for another accounting period and look at how the materialized view refresh and query rewrite behave when one partition is stale.
    The tests have been run on Oracle 19.9.  Query rewrite is enabled, and rewrite integrity is enforced.
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    query_rewrite_enabled                string      TRUE
    query_rewrite_integrity              string      enforced


    Demonstration 1: Multi-column composite partitioning

    I will start with my usual composite partitioning of the ledger table on the combination of FISCAL_YEAR and ACCOUNTING_PERIOD to permit sub-partitioning on LEDGER.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL
    ,ledger VARCHAR2(10) NOT NULL
    ,account VARCHAR2(10) NOT NULL
    …
    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR,ACCOUNTING_PERIOD)
    SUBPARTITION BY LIST (LEDGER)

    SUBPARTITION TEMPLATE
    (SUBPARTITION actuals VALUES ('ACTUALS')
    ,SUBPARTITION budget VALUES ('BUDGET'))
    (PARTITION ledger_2018 VALUES LESS THAN (2019,0) PCTFREE 0 COMPRESS
    --
    ,PARTITION ledger_2019_bf VALUES LESS THAN (2019,1) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019_01 VALUES LESS THAN (2019,2) PCTFREE 0 COMPRESS

    ,PARTITION ledger_2019_12 VALUES LESS THAN (2019,13) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019_cf VALUES LESS THAN (2020,0) PCTFREE 0 COMPRESS
    --
    ,PARTITION ledger_2020_bf VALUES LESS THAN (2020,1)
    ,PARTITION ledger_2020_01 VALUES LESS THAN (2020,2)

    ,PARTITION ledger_2020_12 VALUES LESS THAN (2020,13)
    ,PARTITION ledger_2020_cf VALUES LESS THAN (2021,0)
    --
    ,PARTITION ledger_2021_bf VALUES LESS THAN (2021,1)
    ,PARTITION ledger_2021_01 VALUES LESS THAN (2021,2)

    ,PARTITION ledger_2021_12 VALUES LESS THAN (2021,13)
    ,PARTITION ledger_2021_cf VALUES LESS THAN (2022,0)
    )
    ENABLE ROW MOVEMENT
    NOPARALLEL NOLOGGING
    /
    @treeselectors
    @popledger
    I will also create the tree selector tables used as dimension tables in the nVision General Ledger reports.
    REM treeselectors.sql 
    CREATE TABLE PSTREESELECT05
    (SELECTOR_NUM INTEGER NOT NULL,
    TREE_NODE_NUM INTEGER NOT NULL,
    RANGE_FROM_05 VARCHAR2(05) NOT NULL,
    RANGE_TO_05 VARCHAR2(05) NOT NULL)
    PARTITION BY RANGE (SELECTOR_NUM) INTERVAL (1)
    (PARTITION pstreeselector VALUES LESS THAN (2))
    NOPARALLEL NOLOGGING;
    CREATE UNIQUE INDEX PS_PSTREESELECT05 ON PSTREESELECT05 (SELECTOR_NUM, TREE_NODE_NUM, RANGE_FROM_05);

    CREATE TABLE PSTREESELECT10
    (SELECTOR_NUM INTEGER NOT NULL,
    TREE_NODE_NUM INTEGER NOT NULL,
    RANGE_FROM_10 VARCHAR2(10) NOT NULL,
    RANGE_TO_10 VARCHAR2(10) NOT NULL)
    PARTITION BY RANGE (SELECTOR_NUM) INTERVAL (1)
    (PARTITION pstreeselector VALUES LESS THAN (2))
    NOPARALLEL NOLOGGING;
    CREATE UNIQUE INDEX PS_PSTREESELECT10 ON PSTREESELECT10 (SELECTOR_NUM, TREE_NODE_NUM, RANGE_FROM_10);

    exec dbms_stats.set_table_prefs('SCOTT','PSTREESELECT05','GRANULARITY','ALL');
    exec dbms_stats.set_table_prefs('SCOTT','PSTREESELECT10','GRANULARITY','ALL');
    exec dbms_stats.set_table_prefs('SCOTT','PSTREESELECT05','METHOD_OPT'-
    ,'FOR ALL COLUMNS SIZE 1, FOR COLUMNS SELECTOR_NUM, (SELECTOR_NUM, TREE_NODE_NUM) SIZE 254');
    exec dbms_stats.set_table_prefs('SCOTT','PSTREESELECT10','METHOD_OPT'-
    ,'FOR ALL COLUMNS SIZE 1, FOR COLUMNS SELECTOR_NUM, (SELECTOR_NUM, TREE_NODE_NUM) SIZE 254');
    And then I will populate and collect statistics on the ledger with randomised, but skewed, data to simulate 
    • actuals data from fiscal year 2018 to period 6 of 2020
    • budget data from fiscal year 2018 to 2021 that is 10% of the size of the actuals data. 
    Some typical indexes will be built on the ledger table. 
    The tree selector tables will be populated with data corresponding to the ledger data:
    • the business unit tree will have both business units,
    • the account tree will have 25% of the 999 accounts,
    • the chartfield tree will have 10% of the 999 chartfields. 
    Statistics preferences will be defined so that statistics will be collected at all table, partition and subpartition levels on all these tables. There will only be histograms on a few low cardinality columns.
    REM popledger.sql
    set autotrace off echo on pages 99 lines 200 trimspool on
    truncate table ps_ledger;
    exec dbms_stats.set_table_prefs('SCOTT','PS_LEDGER','METHOD_OPT'-
    ,'FOR ALL COLUMNS SIZE 1, FOR COLUMNS FISCAL_YEAR, ACCOUNTING_PERIOD, LEDGER, BUSINESS_UNIT SIZE 254');
    exec dbms_stats.set_table_prefs('SCOTT','PS_LEDGER','GRANULARITY','ALL');
    ALTER TABLE PS_LEDGER PARALLEL 8 NOLOGGING;

    CREATE /*UNIQUE*/ INDEX ps_ledger ON ps_ledger
    (business_unit, ledger, account, deptid
    ,product, fund_code, class_fld, affiliate
    ,chartfield2, project_id, book_code, gl_adjust_type
    ,currency_cd, statistics_code, fiscal_year, accounting_period
    ) COMPRESS 2 PARALLEL
    /
    INSERT /*+APPEND PARALLEL ENABLE_PARALLEL_DML NO_GATHER_OPTIMIZER_STATISTICS*//*IGNORE_ROW_ON_DUPKEY_INDEX(PS_LEDGER)*/
    INTO ps_ledger
    with n as (
    SELECT rownum n from dual connect by level <= 1e2
    ), fy as (
    SELECT 2017+rownum fiscal_year FROM dual CONNECT BY level <= 4
    ), ap as (
    SELECT FLOOR(dbms_random.value(0,13)) accounting_period FROM dual connect by level <= 998
    UNION ALL SELECT 998 FROM DUAL CONNECT BY LEVEL <= 1
    UNION ALL SELECT 999 FROM DUAL CONNECT BY LEVEL <= 1
    ), l as (
    SELECT 'ACTUALS' ledger FROM DUAL CONNECT BY LEVEL <= 10
    UNION ALL SELECT 'BUDGET' FROM DUAL
    )
    select 'BU'||LTRIM(TO_CHAR(CASE WHEN dbms_random.value <= .9 THEN 1 ELSE 2 END,'000')) business_unit
    , l.ledger
    , 'ACC'||LTRIM(TO_CHAR(999*SQRT(dbms_random.value),'000')) account
    , 'ALTACCT'||LTRIM(TO_CHAR(999*dbms_random.value,'000')) altacct
    , 'DEPT'||LTRIM(TO_CHAR(9999*dbms_random.value,'0000')) deptid
    , 'OPUNIT'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) operating_unit
    , 'P'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) product
    , 'FUND'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) fund_code
    , 'CLAS'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) class_fld
    , 'PROD'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) program_code
    , '' budget_ref
    , 'AF'||LTRIM(TO_CHAR(999*dbms_random.value,'000')) affiliate
    , 'AFI'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) affiliate_intra1
    , 'AFI'||LTRIM(TO_CHAR( 9999*dbms_random.value,'0000')) affiliate_intra2
    , 'CF'||LTRIM(TO_CHAR( 999*SQRT(dbms_random.value),'000')) chartfield1
    , 'CF'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) chartfield2
    , 'CF'||LTRIM(TO_CHAR( 9999*dbms_random.value,'0000')) chartfield3
    , 'PRJ'||LTRIM(TO_CHAR(9999*dbms_random.value,'0000')) project_id
    , 'BK'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) book_code
    , 'GL'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) gl_adjust_type
    , 'GBP' currency_cd
    , '' statistics_code
    , fy.fiscal_year
    , ap.accounting_period
    , dbms_random.value(0,1e6) posted_total_amt
    , 0 posted_base_amt
    , 0 posted_tran_amt
    , 'GBP' base_currency
    , SYSDATE dttm_stamp_sec
    , 0 process_instance
    FROM fy,ap, l, n
    WHERE l.ledger = 'BUDGET' or (fy.fiscal_year < 2020 or (fy.fiscal_year = 2020 AND ap.accounting_period <= 6))
    /
    commit;
    exec dbms_stats.gather_table_stats('SCOTT','PS_LEDGER');

    CREATE INDEX psxledger ON ps_ledger
    (ledger, fiscal_year, accounting_period, business_unit, account, chartfield1
    ) LOCAL COMPRESS 4 PARALLEL
    /
    CREATE INDEX psyledger ON ps_ledger
    (ledger, fiscal_year, business_unit, account, chartfield1, accounting_period
    ) LOCAL COMPRESS 3 PARALLEL
    /
    ALTER INDEX ps_ledger NOPARALLEL;
    ALTER INDEX psxledger NOPARALLEL;
    ALTER INDEX psyledger NOPARALLEL;

    TRUNCATE TABLE PSTREESELECT05;
    TRUNCATE TABLE PSTREESELECT10;
    INSERT INTO PSTREESELECT05
    WITH x as (SELECT DISTINCT business_unit FROM ps_ledger)
    , y as (SELECT 30982, FLOOR(DBMS_RANDOM.value(1,1e10)) tree_node_num, business_unit FROM x)
    select y.*, business_unit FROM y
    /
    INSERT INTO PSTREESELECT10
    WITH x as (SELECT DISTINCT account FROM ps_ledger)
    , y as (SELECT 30984, FLOOR(DBMS_RANDOM.value(1,1e10)) tree_node_num, account FROM x)
    select y.*, account FROM y
    where mod(tree_node_num,100)<25
    /
    INSERT INTO PSTREESELECT10
    WITH x as (SELECT DISTINCT chartfield1 FROM ps_ledger)
    , y as (SELECT 30985, FLOOR(DBMS_RANDOM.value(1,1e10)) tree_node_num, chartfield1 FROM x)
    select y.*, chartfield1 FROM y
    where mod(tree_node_num,100)<10
    /
    Per complete fiscal year, there are 1,000,000 actuals rows and 100,000 budget rows
    LEDGER     FISCAL_YEAR   COUNT(*) MAX(ACCOUNTING_PERIOD)
    ---------- ----------- ---------- ----------------------
    ACTUALS 2018 1000000 999
    2019 1000000 999
    2020 538408 6

    BUDGET 2018 100000 999
    2019 100000 999
    2020 100000 999
    2021 100000 999

    ********** ----------
    sum 2938408
    There are about 77K rows per accounting period, with just 1,000 rows in each of the special periods 998 (adjustments) and 999 (carry forward).
    LEDGER     FISCAL_YEAR ACCOUNTING_PERIOD   COUNT(*)
    ---------- ----------- ----------------- ----------

    ACTUALS 2019 0 76841
    1 76410
    2 76867
    3 77088
    4 77740
    5 77010
    6 76650
    7 76553
    8 76923
    9 76586
    10 76276
    11 76943
    12 76113
    998 1000
    999 1000

    ********** *********** ----------
    sum 1000000

    ACTUALS 2020 0 77308
    1 76696
    2 76944
    3 77227
    4 76944
    5 76524
    6 76765

    ********** *********** ----------
    sum 538408
    I will create two MVs, each containing data for a single fiscal year: one for 2019 and one for 2020.  I will range partition the MVs only on ACCOUNTING_PERIOD; there is no need to partition them on FISCAL_YEAR since each contains a single year.
    CREATE MATERIALIZED VIEW mv_ledger_2019
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2020
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
    @@mvpop
    @@mvsql
    @@pop2020m7
    @@mvsql
    @@mvtrc
    @@mvvol
    @@mvsql
    @@mvcap
    The materialized views are populated on creation, but I will explicitly collect statistics on them.
    SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

    ALTER MATERIALIZED VIEW mv_ledger_2019 NOPARALLEL;
    exec dbms_stats.set_table_prefs('SCOTT','MV_LEDGER_2019','METHOD_OPT',-
    'FOR ALL COLUMNS SIZE 1, FOR COLUMNS FISCAL_YEAR, ACCOUNTING_PERIOD, BUSINESS_UNIT SIZE 254');
    exec dbms_stats.set_table_prefs('SCOTT','MV_LEDGER_2019','GRANULARITY','ALL');

    ALTER MATERIALIZED VIEW mv_ledger_2020 NOPARALLEL;
    exec dbms_stats.set_table_prefs('SCOTT','MV_LEDGER_2020','METHOD_OPT',-
    'FOR ALL COLUMNS SIZE 1, FOR COLUMNS FISCAL_YEAR, ACCOUNTING_PERIOD, BUSINESS_UNIT SIZE 254');
    exec dbms_stats.set_table_prefs('SCOTT','MV_LEDGER_2020','GRANULARITY','ALL');

    exec dbms_stats.gather_table_stats('SCOTT','MV_LEDGER_2019');
    exec dbms_stats.gather_table_stats('SCOTT','MV_LEDGER_2020');
    Although I can do a full refresh of the MV, I cannot do a PCT refresh.
    BEGIN dbms_mview.refresh(list=>'MV_LEDGER_2020',method=>'P',atomic_refresh=>FALSE); END;

    *
    ERROR at line 1:
    ORA-12047: PCT FAST REFRESH cannot be used for materialized view "SCOTT"."MV_LEDGER_2020"
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3020
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2432
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 88
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 253
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2413
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2976
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3263
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3295
    ORA-06512: at "SYS.DBMS_SNAPSHOT", line 16
    ORA-06512: at line 1
    I can use EXPLAIN_MVIEW to check the capabilities of the MV.
    REM mvcap.sql
    create table MV_CAPABILITIES_TABLE
    (
    statement_id varchar(30) ,
    mvowner varchar(30) ,
    mvname varchar(30) ,
    capability_name varchar(30) ,
    possible character(1) ,
    related_text varchar(2000) ,
    related_num number ,
    msgno integer ,
    msgtxt varchar(2000) ,
    seq number
    ) ;

    truncate table MV_CAPABILITIES_TABLE;
    EXECUTE DBMS_MVIEW.EXPLAIN_MVIEW ('SCOTT.MV_LEDGER_2019');
    EXECUTE DBMS_MVIEW.EXPLAIN_MVIEW ('SCOTT.MV_LEDGER_2020');
    break on mvname skip 1
    column rel_text format a20
    column msgtxt format a60
    SELECT mvname, capability_name, possible, SUBSTR(related_text,1,20) AS rel_text, SUBSTR(msgtxt,1,60) AS msgtxt
    FROM MV_CAPABILITIES_TABLE
    WHERE mvname like 'MV_LEDGER_20%'
    ORDER BY mvname, seq;
    EXPLAIN_MVIEW reports that general query rewrite is available but PCT and PCT query rewrite are not. Per the manual, Oracle simply cannot do a PCT refresh if the table has multi-column partitioning.
    CAPABILITY_NAME                P REL_TEXT             MSGTXT
    ------------------------------ - -------------------- ------------------------------------------------------------
    PCT N
    REFRESH_COMPLETE Y
    REFRESH_FAST N
    REWRITE Y
    PCT_TABLE N PS_LEDGER PCT not supported with multi-column partition key
    REFRESH_FAST_AFTER_INSERT N SCOTT.PS_LEDGER the detail table does not have a materialized view log
    REFRESH_FAST_AFTER_ONETAB_DML N POSTED_TOTAL_AMT SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ONETAB_DML N see the reason why REFRESH_FAST_AFTER_INSERT is disabled
    REFRESH_FAST_AFTER_ONETAB_DML N COUNT(*) is not present in the select list
    REFRESH_FAST_AFTER_ONETAB_DML N SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ANY_DML N see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
    REFRESH_FAST_PCT N PCT is not possible on any of the detail tables in the mater
    REWRITE_FULL_TEXT_MATCH Y
    REWRITE_PARTIAL_TEXT_MATCH Y
    REWRITE_GENERAL Y
    REWRITE_PCT N general rewrite is not possible or PCT is not possible on an
    PCT_TABLE_REWRITE N PS_LEDGER PCT not supported with multi-column partition key
    At the moment, the materialized views are up to date.
    SELECT L.TREE_NODE_NUM,L2.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
    FROM PS_LEDGER A
    , PSTREESELECT05 L1
    , PSTREESELECT10 L
    , PSTREESELECT10 L2
    WHERE A.LEDGER='ACTUALS'
    AND A.FISCAL_YEAR=2020
    AND (A.ACCOUNTING_PERIOD BETWEEN 1 AND 6)
    AND L1.SELECTOR_NUM=30982 AND A.BUSINESS_UNIT=L1.RANGE_FROM_05
    AND L.SELECTOR_NUM=30985 AND A.CHARTFIELD1=L.RANGE_FROM_10
    AND L2.SELECTOR_NUM=30984 AND A.ACCOUNT=L2.RANGE_FROM_10
    AND A.CURRENCY_CD='GBP'
    GROUP BY L.TREE_NODE_NUM,L2.TREE_NODE_NUM
    /
    And I get MV rewrite because the MV is up to date. Note that Oracle only probed partitions 2 to 7, so it correctly pruned partitions.
    Plan hash value: 3290858815
    --------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    --------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 5573 | 239K| 276 (3)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 5573 | 239K| 276 (3)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 5573 | 239K| 275 (3)| 00:00:01 | | |
    | 3 | JOIN FILTER CREATE | :BF0000 | 2 | 22 | 1 (0)| 00:00:01 | | |
    |* 4 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 5 | VIEW | VW_GBC_17 | 5573 | 179K| 274 (3)| 00:00:01 | | |
    | 6 | HASH GROUP BY | | 5573 | 364K| 274 (3)| 00:00:01 | | |
    | 7 | JOIN FILTER USE | :BF0000 | 5573 | 364K| 273 (2)| 00:00:01 | | |
    |* 8 | HASH JOIN | | 5573 | 364K| 273 (2)| 00:00:01 | | |
    |* 9 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 239 | 4541 | 2 (0)| 00:00:01 | | |
    |* 10 | HASH JOIN | | 23295 | 1091K| 270 (2)| 00:00:01 | | |
    |* 11 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 77 | 1386 | 2 (0)| 00:00:01 | | |
    | 12 | PARTITION RANGE ITERATOR | | 301K| 8827K| 267 (2)| 00:00:01 | 2 | 7 |
    |* 13 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 301K| 8827K| 267 (2)| 00:00:01 | 2 | 7 |
    --------------------------------------------------------------------------------------------------------------------------
    Now I will add more random data for the financial year 2020, accounting period 7. So there have been changes to just one partition.
    REM pop2020m7.sql
    insert into ps_ledger
    with n as (
    SELECT rownum n from dual connect by level <= 1e6/13
    )
    select 'BU'||LTRIM(TO_CHAR(CASE WHEN dbms_random.value <= .9 THEN 1 ELSE 2 END,'000')) business_unit
    , 'ACTUALS' ledger
    , 'ACC'||LTRIM(TO_CHAR(999*SQRT(dbms_random.value),'000')) account
    , 'ALTACCT'||LTRIM(TO_CHAR(999*dbms_random.value,'000')) altacct
    , 'DEPT'||LTRIM(TO_CHAR(9999*dbms_random.value,'0000')) deptid
    , 'OPUNIT'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) operating_unit
    , 'P'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) product
    , 'FUND'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) fund_code
    , 'CLAS'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) class_fld
    , 'PROD'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) program_code
    , '' budget_ref
    , 'AF'||LTRIM(TO_CHAR(999*dbms_random.value,'000')) affiliate
    , 'AFI'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) affiliate_intra1
    , 'AFI'||LTRIM(TO_CHAR( 9999*dbms_random.value,'0000')) affiliate_intra2
    , 'CF'||LTRIM(TO_CHAR( 999*SQRT(dbms_random.value),'000')) chartfield1
    , 'CF'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) chartfield2
    , 'CF'||LTRIM(TO_CHAR( 9999*dbms_random.value,'0000')) chartfield3
    , 'PRJ'||LTRIM(TO_CHAR(9999*dbms_random.value,'0000')) project_id
    , 'BK'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) book_code
    , 'GL'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) gl_adjust_type
    , 'GBP' currency_cd
    , '' statistics_code
    , 2020 fiscal_year
    , 7 accounting_period
    , dbms_random.value(0,1e6) posted_total_amt
    , 0 posted_base_amt
    , 0 posted_tran_amt
    , 'GBP' base_currency
    , SYSDATE dttm_stamp_sec
    , 0 process_instance
    FROM n
    /
    set lines 200 pages 999 autotrace off
    commit;
    column owner format a10
    column table_name format a15
    column mview_name format a15
    column detailobj_owner format a10 heading 'Detailobj|Owner'
    column detailobj_name format a15
    column detailobj_alias format a20
    column detail_partition_name format a20
    column detail_subpartition_name format a20
    column parent_table_partition format a20
    select * from user_mview_detail_relations;
    select * from user_mview_detail_partition;
    select * from user_mview_detail_subpartition where freshness != 'FRESH';
    SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;
    /
    As soon as I have committed the insert, both MVs need to be refreshed, even though none of the data queried by MV_LEDGER_2019 has changed.  USER_MVIEW_DETAIL_RELATIONS reports that PCT is not applicable, and no individual partitions are listed as stale.
    MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
    --------------- ------------------- -------- -------------------
    MV_LEDGER_2019 NEEDS_COMPILE COMPLETE NEEDS_COMPILE
    MV_LEDGER_2020 NEEDS_COMPILE COMPLETE NEEDS_COMPILE

                               Detailobj
    OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
    ---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
    SCOTT      MV_LEDGER_2019  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            N                       86                        0
    SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            N                       86                        0
    I no longer get Query Rewrite for either fiscal year.
    SELECT L.TREE_NODE_NUM,L2.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
    FROM PS_LEDGER A
    , PSTREESELECT05 L1
    , PSTREESELECT10 L
    , PSTREESELECT10 L2
    WHERE A.LEDGER='ACTUALS'
    AND A.FISCAL_YEAR=2019
    AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 6
    AND L1.SELECTOR_NUM=30982 AND A.BUSINESS_UNIT=L1.RANGE_FROM_05
    AND L.SELECTOR_NUM=30985 AND A.CHARTFIELD1=L.RANGE_FROM_10
    AND L2.SELECTOR_NUM=30984 AND A.ACCOUNT=L2.RANGE_FROM_10
    AND A.CURRENCY_CD='GBP'
    GROUP BY L.TREE_NODE_NUM,L2.TREE_NODE_NUM
    /

    Plan hash value: 346876754
    -----------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 492 | 45756 | 2036 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 492 | 45756 | 2036 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 492 | 45756 | 2035 (1)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 239 | 4541 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 2055 | 148K| 2033 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 154 | 4466 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 77 | 1386 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 77 | 1386 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE ITERATOR| | 26686 | 1172K| 2030 (1)| 00:00:01 | 3 | 8 |
    | 10 | PARTITION LIST SINGLE | | 26686 | 1172K| 2030 (1)| 00:00:01 | 1 | 1 |
    |* 11 | TABLE ACCESS FULL | PS_LEDGER | 26686 | 1172K| 2030 (1)| 00:00:01 | KEY | KEY |
    -----------------------------------------------------------------------------------------------------------------
Without PCT, I cannot do a partial refresh of a partitioned materialized view, and I will not get query rewrite once a single partition in the underlying table has changed, whether or not that partition is needed by the query.
So, is there a different partitioning strategy that will permit PCT to work effectively?


    Demonstration 2: Simple 1-Dimensional Range Partitioning 

Let's start with a simple range-partitioned example: one partition per fiscal year.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR)
    (PARTITION ledger_2018 VALUES LESS THAN (2019) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019 VALUES LESS THAN (2020) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2020 VALUES LESS THAN (2021) PCTFREE 10 NOCOMPRESS
    ,PARTITION ledger_2021 VALUES LESS THAN (2022) PCTFREE 10 NOCOMPRESS)
    ENABLE ROW MOVEMENT
    NOPARALLEL NOLOGGING
    /
    @treeselectors
    @popledger
    Now I am going to build a materialized view to summarise the ledger data by BUSINESS_UNIT, ACCOUNT and CHARTFIELD1, and of course by FISCAL_YEAR and ACCOUNTING_PERIOD.
    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (FISCAL_YEAR)
    (PARTITION ledger_2019 VALUES LESS THAN (2020)
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ) PCTFREE 0 COMPRESS NOPARALLEL
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, ledger, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year >= 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, ledger, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
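The mvpop.sql script is not reproduced here. A minimal sketch of the kind of dictionary query that can produce such a report (partition level only; the real script evidently also covers subpartitions and totals):
SELECT p.table_name, p.partition_position, p.partition_name,
       p.num_rows, p.blocks,
       ROUND(p.num_rows/NULLIF(p.blocks,0),1) rows_per_block,
       p.compression, p.compress_for
FROM   user_tab_partitions p
WHERE  p.table_name IN ('PS_LEDGER','MV_LEDGER_2020')
ORDER  BY p.table_name, p.partition_position;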
    I can see the MV has partitions for 2019 and 2020 populated, and they contain fewer rows than the original.
                                          Sub-                                              Rows
                Part                      Part                                                per
TABLE_NAME       Pos PARTITION_NAME        Pos SUBPARTITION_NAME         NUM_ROWS BLOCKS  Block COMPRESS COMPRESS_FOR
--------------- ---- -------------------- ---- ------------------------- -------- ------ ------ -------- -------------------
MV_LEDGER_2020     1 LEDGER_2019                                                                ENABLED  BASIC
                   2 LEDGER_2020                                                                ENABLED  BASIC
                                                                          1456077   4864  299.4

PS_LEDGER          1 LEDGER_2018                                          1100000  17893   61.5 ENABLED  BASIC
                   2 LEDGER_2019                                          1100000  17892   61.5 ENABLED  BASIC
                   3 LEDGER_2020                                           637915  16456   38.8 DISABLED
                   4 LEDGER_2021                                           100000   2559   39.1 DISABLED
                                                                          2937915  54800   53.6
    When I query 2018 ledger data, for which there is no materialized view, the execution plan shows that Oracle full scanned only the first partition of the PS_LEDGER table that contains the 2018 data. It eliminated the other partitions.
    Plan hash value: 1780139226
    ---------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 822 | 76446 | 4883 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 822 | 76446 | 4883 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 822 | 76446 | 4882 (1)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 228 | 4332 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 3601 | 260K| 4880 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 180 | 5220 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 90 | 1620 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 90 | 1620 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE| | 39970 | 1756K| 4877 (1)| 00:00:01 | 1 | 1 |
    |* 10 | TABLE ACCESS FULL | PS_LEDGER | 39970 | 1756K| 4877 (1)| 00:00:01 | 1 | 1 |
    ---------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("A"."ACCOUNTING_PERIOD"<=6 AND "A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2018 AND
    "A"."ACCOUNTING_PERIOD">=1 AND "A"."CURRENCY_CD"='GBP')
When I query the 2020 data, Oracle rewrites the query to use the second partition of the materialized view. Again, it queries only a single partition.
    Plan hash value: 4006930814
    ----------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1088 | 88128 | 674 (2)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1088 | 88128 | 674 (2)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 1088 | 88128 | 673 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 228 | 4332 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 4767 | 288K| 671 (2)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 180 | 5220 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 90 | 1620 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 90 | 1620 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 52909 | 1705K| 668 (2)| 00:00:01 | 2 | 2 |
    |* 10 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 52909 | 1705K| 668 (2)| 00:00:01 | 2 | 2 |
    ----------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2020"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2020"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2020"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("MV_LEDGER_2020"."ACCOUNTING_PERIOD"<=6 AND "MV_LEDGER_2020"."FISCAL_YEAR"=2020 AND
    "MV_LEDGER_2020"."ACCOUNTING_PERIOD">=1)
    Now I am going to simulate running financial processing for period 7 in fiscal year 2020, by inserting data into PS_LEDGER for that period.
    @pop2020m7.sql
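pop2020m7.sql is not reproduced in full; it presumably generates random data for the new period in the same way as the initial load. A minimal, hypothetical sketch that has the same effect by cloning the period 6 rows:
REM pop2020m7.sql (sketch only)
DECLARE
  CURSOR c IS
    SELECT * FROM ps_ledger
    WHERE  fiscal_year = 2020 AND accounting_period = 6;
  r c%ROWTYPE;
BEGIN
  OPEN c;
  LOOP
    FETCH c INTO r;
    EXIT WHEN c%NOTFOUND;
    r.accounting_period := 7;                        -- move the cloned row to period 7
    r.posted_total_amt  := dbms_random.value(0,1e6); -- give it a new random amount
    INSERT INTO ps_ledger VALUES r;
  END LOOP;
  CLOSE c;
  COMMIT;  -- staleness is only reported once the transaction is committed
END;
/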
The materialized view staleness and compile state on USER_MVIEWS change to NEEDS_COMPILE when the insert into PS_LEDGER is committed.
• USER_MVIEW_DETAIL_RELATIONS shows that one tracked partition is stale and three are still fresh.
• USER_MVIEW_DETAIL_PARTITION shows the tracking status of each source partition: the LEDGER_2020 partition on PS_LEDGER is stale, but the others are still fresh.
22:00:01 SQL> SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
--------------- ------------------- -------- -------------------
MV_LEDGER_2020  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE

22:00:01 SQL> select * from user_mview_detail_relations;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                        3                        1

22:00:01 SQL> select * from user_mview_detail_partition;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAIL_PARTITION_NAM DETAIL_PARTITION_POSITION FRESHNE LAST_REFRESH_TIME
---------- --------------- ---------- --------------- -------------------- ------------------------- ------- -------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2018                                  1 FRESH   21:59:41 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2019                                  2 FRESH   21:59:41 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2020                                  3 STALE   21:59:41 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2021                                  4 FRESH   21:59:41 15/11/2020
The query on 2019 is still rewritten because the 2019 partition is fresh.
    Plan hash value: 4006930814
    ----------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1088 | 88128 | 674 (2)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1088 | 88128 | 674 (2)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 1088 | 88128 | 673 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 228 | 4332 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 4767 | 288K| 671 (2)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 180 | 5220 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 90 | 1620 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 90 | 1620 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 52909 | 1705K| 668 (2)| 00:00:01 | 1 | 1 |
    |* 10 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 52909 | 1705K| 668 (2)| 00:00:01 | 1 | 1 |
    ----------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2020"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2020"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2020"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("MV_LEDGER_2020"."ACCOUNTING_PERIOD"<=6 AND "MV_LEDGER_2020"."FISCAL_YEAR"=2019 AND
    "MV_LEDGER_2020"."ACCOUNTING_PERIOD">=1
    )
    But we no longer get rewrite on the 2020 partition because it is stale. The query stays on PS_LEDGER.
    Plan hash value: 1780139226

    ---------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 477 | 44361 | 4483 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 477 | 44361 | 4483 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 477 | 44361 | 4482 (1)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 228 | 4332 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 2090 | 151K| 4479 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 180 | 5220 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 90 | 1620 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 90 | 1620 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE| | 23179 | 1018K| 4476 (1)| 00:00:01 | 3 | 3 |
    |* 10 | TABLE ACCESS FULL | PS_LEDGER | 23179 | 1018K| 4476 (1)| 00:00:01 | 3 | 3 |
    ---------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("A"."ACCOUNTING_PERIOD"<=6 AND "A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND
    "A"."ACCOUNTING_PERIOD">=1
    AND "A"."CURRENCY_CD"='GBP')
So now I have to refresh the view. I am going to use:
• method=>'P' to indicate that the refresh should use PCT;
• atomic_refresh=>FALSE because I want Oracle to truncate the stale partition and repopulate it in direct-path mode so that the data is compressed (I am not licensed for Advanced Compression).
I am also going to trace the refresh process so that I can see what actually happened. I will give the trace file an identifying suffix to make it easier to find, and I can query the trace file name from v$diag_info.
I need to collect statistics myself, or they won't be updated.
    REM mvtrc.sql
    disconnect
    connect scott/tiger@oracle_pdb

    column name format a20
    column value format a70
    alter session set tracefile_identifier=PCT;
    select * from v$diag_info where name like '%Trace%';

    alter session set sql_trace = true;
    exec dbms_mview.refresh(list=>'MV_LEDGER_2019',method=>'P',atomic_refresh=>FALSE);
    exec dbms_mview.refresh(list=>'MV_LEDGER_2020',method=>'P',atomic_refresh=>FALSE);

    alter session set sql_trace = false;
exec dbms_stats.gather_table_stats(user,'MV_LEDGER_2019');
exec dbms_stats.gather_table_stats(user,'MV_LEDGER_2020');
v$diag_info indicates the trace file:
   INST_ID NAME                 VALUE                                                                      CON_ID
---------- -------------------- ---------------------------------------------------------------------- ----------
         1 Diag Trace           /u01/app/oracle/diag/rdbms/oracle/oracle/trace                                   0
         1 Default Trace File   /u01/app/oracle/diag/rdbms/oracle/oracle/trace/oracle_ora_7802_PCT.trc           0
    I can see the total number of rows in MV_LEDGER_2020 has gone up from 1455085 to 1528980, reflecting the rows I inserted.
                                          Sub-                                              Rows
                Part                      Part                                                per
TABLE_NAME       Pos PARTITION_NAME        Pos SUBPARTITION_NAME         NUM_ROWS BLOCKS  Block COMPRESS COMPRESS_FOR
--------------- ---- -------------------- ---- ------------------------- -------- ------ ------ -------- ------------------------------
MV_LEDGER_2020     1 LEDGER_2019                                           946825   3173  298.4 ENABLED  BASIC
                   2 LEDGER_2020                                           582155   1926  302.3 ENABLED  BASIC
                                                                          1528980   5099  299.9

PS_LEDGER          1 LEDGER_2018                                          1100000  17893   61.5 ENABLED  BASIC
                   2 LEDGER_2019                                          1100000  17892   61.5 ENABLED  BASIC
                   3 LEDGER_2020                                           637915  16456   38.8 DISABLED
                   4 LEDGER_2021                                           100000   2559   39.1 DISABLED
                                                                          2937915  54800   53.6
I am just going to pick out the statements from the trace that alter the materialized view. I can see that the LEDGER_2020 partition was truncated, and the data for the stale ledger partition was then reinserted in direct-path mode, so it will have been compressed. The statistics confirm this: the number of rows per block is still around 300.
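Rather than opening the trace in an editor, it can also be read back through the database; a sketch, assuming Oracle 12.2 or later and the trace file name reported by v$diag_info above:
SELECT payload
FROM   v$diag_trace_file_contents
WHERE  trace_filename = 'oracle_ora_7802_PCT.trc'
AND    payload LIKE '%MV_REFRESH%';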

    /* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION LEDGER_2020

    /* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2020" PARTITION ( LEDGER_2020 ) ("BUSINESS_UNIT"
    ,"LEDGER", "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */
    "PS_LEDGER"."BUSINESS_UNIT", "PS_LEDGER"."LEDGER" , "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" P0,
    "PS_LEDGER"."ACCOUNTING_PERIOD" ,SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR">=2019
    AND "PS_LEDGER"."LEDGER"='ACTUALS' AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( ( "PS_LEDGER"."FISCAL_YEAR">= 2020 ) ) )
    AND ( ( ( "PS_LEDGER"."FISCAL_YEAR"< 2021 ) ) )
    ) ) ) GROUP BY "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."LEDGER"
    ,"PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"
I can use DBMS_MVIEW.EXPLAIN_MVIEW to check the capabilities of MV_LEDGER_2020. PCT is enabled for both refresh and rewrite.
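This report can be produced with a short script; a minimal sketch, assuming MV_CAPABILITIES_TABLE has been created with the utlxmv.sql script shipped with the database:
@?/rdbms/admin/utlxmv.sql

exec dbms_mview.explain_mview('MV_LEDGER_2020')

column capability_name format a30
column p format a1
column rel_text format a20
column msgtxt format a60
SELECT capability_name, possible p, related_text rel_text, msgtxt
FROM   mv_capabilities_table
ORDER  BY seq;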
CAPABILITY_NAME                P REL_TEXT             MSGTXT
------------------------------ - -------------------- ------------------------------------------------------------
PCT                            Y
REFRESH_COMPLETE               Y
REFRESH_FAST                   Y
REWRITE                        Y
PCT_TABLE                      Y PS_LEDGER
REFRESH_FAST_AFTER_INSERT      N SCOTT.PS_LEDGER      the detail table does not have a materialized view log
REFRESH_FAST_AFTER_ONETAB_DML  N POSTED_TOTAL_AMT     SUM(expr) without COUNT(expr)
REFRESH_FAST_AFTER_ONETAB_DML  N                      see the reason why REFRESH_FAST_AFTER_INSERT is disabled
REFRESH_FAST_AFTER_ONETAB_DML  N                      COUNT(*) is not present in the select list
REFRESH_FAST_AFTER_ONETAB_DML  N                      SUM(expr) without COUNT(expr)
REFRESH_FAST_AFTER_ANY_DML     N                      see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
REFRESH_FAST_PCT               Y
REWRITE_FULL_TEXT_MATCH        Y
REWRITE_PARTIAL_TEXT_MATCH     Y
REWRITE_GENERAL                Y
REWRITE_PCT                    Y
PCT_TABLE_REWRITE              Y PS_LEDGER
I can see PCT has worked:
• I still get query rewrite for the partitions that are still fresh.
• The refresh process refreshes only the stale partitions.
However, I have to regenerate the materialized view for a whole fiscal year when I have changed only one accounting period. Could I organise it to refresh just a single accounting period?


    Demonstration 3: Interval Partitioning 

    This time I am going to use interval partitioning. I have explicitly specified the partitions for previous years because I don't want to allow any free space in the blocks, but the current and future partitions will be created automatically.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR) INTERVAL (1)
    (PARTITION ledger_2018 VALUES LESS THAN (2019) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019 VALUES LESS THAN (2020) PCTFREE 0 COMPRESS)
    ENABLE ROW MOVEMENT NOLOGGING
    /
    @treeselectors
    @popledger
    I will similarly create a single materialized view with interval partitioning per fiscal year and populate it for 2019 onwards.
    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (FISCAL_YEAR) INTERVAL (1)
    (PARTITION ledger_2019 VALUES LESS THAN (2020)
    ) PCTFREE 0 COMPRESS
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year >= 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @@mvpop
    @@mvvol
    @@mvsql
I get exactly the same behaviour as in the previous demonstration. The only difference is that the new partitions have system-generated names; as before, just one of them is identified as stale.
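The fiscal year covered by each system-generated partition can be checked from its high value:
set long 80
column high_value format a40
SELECT partition_name, partition_position, high_value
FROM   user_tab_partitions
WHERE  table_name = 'PS_LEDGER'
ORDER  BY partition_position;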
    @pop2020m7.sql
23:25:42 SQL> SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
--------------- ------------------- -------- -------------------
MV_LEDGER_2020  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE

23:25:42 SQL> select * from user_mview_detail_relations;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                        3                        1

23:25:42 SQL> select * from user_mview_detail_partition;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAIL_PARTITION_NAM DETAIL_PARTITION_POSITION FRESHNE LAST_REFRESH_TIME
---------- --------------- ---------- --------------- -------------------- ------------------------- ------- -------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2018                                  1 FRESH   23:25:21 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2019                                  2 FRESH   23:25:21 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       SYS_P981                                     3 STALE   23:25:21 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       SYS_P982                                     4 FRESH   23:25:21 15/11/2020
However, when I look at the trace of the refresh, I see that it truncated and repopulated the partitions for both 2020 and 2021, even though I didn't change any data in the 2021 partition and it is listed as fresh.

/* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION SYS_P987

/* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION SYS_P986

/* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ FIRST WHEN ( ( ( ( "P0">= 2020 ) ) ) AND ( ( ( "P0"< 2021 )
) ) ) THEN INTO "SCOTT"."MV_LEDGER_2020" PARTITION (SYS_P986) ("BUSINESS_UNIT", "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR",
"ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") WHEN ( ( ( ( "P0">= 2021 ) ) ) AND ( ( ( "P0"< 2022 ) ) ) ) THEN INTO
"SCOTT"."MV_LEDGER_2020" PARTITION (SYS_P987) ("BUSINESS_UNIT", "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD",
    "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" , "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" ,
    "PS_LEDGER"."FISCAL_YEAR" P0, "PS_LEDGER"."ACCOUNTING_PERIOD" ,SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE
    ("PS_LEDGER"."FISCAL_YEAR">=2019 AND "PS_LEDGER"."LEDGER"='ACTUALS' AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( (
    "PS_LEDGER"."FISCAL_YEAR">= 2020 ) ) ) AND ( ( ( "PS_LEDGER"."FISCAL_YEAR"< 2022 ) ) ) ) ) ) GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"
    In practice, in this particular case, it won't make a huge difference because there is no actuals data in 2021. The partition for 2021 has been created in the data dictionary, but due to deferred segment creation, it has not been physically created because there is no data in it. However, if I had updated data in 2019, then it would have truncated and repopulated two partitions (2019 and 2020). 
    Interval partitioning is a form of range partitioning, so it is expected that PCT still works. However, I have no explanation as to why the partition following the stale partition was also refreshed. This might be a bug.
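One way to confirm that the 2021 partition exists only in the data dictionary is to outer-join USER_TAB_PARTITIONS to USER_SEGMENTS for the materialized view; a sketch:
SELECT p.partition_name,
       NVL2(s.partition_name, 'SEGMENT CREATED', 'NO SEGMENT') segment_status
FROM   user_tab_partitions p
       LEFT OUTER JOIN user_segments s
         ON  s.segment_name   = p.table_name
         AND s.partition_name = p.partition_name
WHERE  p.table_name = 'MV_LEDGER_2020'
ORDER  BY p.partition_position;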

    Demonstration 4: Composite (Range-List) Partitioning 

This time I am going to create a composite-partitioned table. It will have the same range partitioning on FISCAL_YEAR, but I will then list-subpartition it by ACCOUNTING_PERIOD with 14 periods per fiscal year. I will use a template so that each partition has the same subpartitions.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION BY LIST (ACCOUNTING_PERIOD)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ap_bf VALUES (0)
    ,SUBPARTITION ap_01 VALUES (1)
    ,SUBPARTITION ap_02 VALUES (2)
    ,SUBPARTITION ap_03 VALUES (3)
    ,SUBPARTITION ap_04 VALUES (4)
    ,SUBPARTITION ap_05 VALUES (5)
    ,SUBPARTITION ap_06 VALUES (6)
    ,SUBPARTITION ap_07 VALUES (7)
    ,SUBPARTITION ap_08 VALUES (8)
    ,SUBPARTITION ap_09 VALUES (9)
    ,SUBPARTITION ap_10 VALUES (10)
    ,SUBPARTITION ap_11 VALUES (11)
    ,SUBPARTITION ap_12 VALUES (12)
    ,SUBPARTITION ap_cf VALUES (DEFAULT))
    (PARTITION ledger_2018 VALUES LESS THAN (2019) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019 VALUES LESS THAN (2020) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ,PARTITION ledger_2021 VALUES LESS THAN (2022)
    ) ENABLE ROW MOVEMENT NOPARALLEL NOLOGGING
    /
    @treeselectors
    @popledger
I will similarly partition the materialized view.
    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION BY LIST (ACCOUNTING_PERIOD)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ap_bf VALUES (0)
    ,SUBPARTITION ap_01 VALUES (1)
    ,SUBPARTITION ap_02 VALUES (2)
    ,SUBPARTITION ap_03 VALUES (3)
    ,SUBPARTITION ap_04 VALUES (4)
    ,SUBPARTITION ap_05 VALUES (5)
    ,SUBPARTITION ap_06 VALUES (6)
    ,SUBPARTITION ap_07 VALUES (7)
    ,SUBPARTITION ap_08 VALUES (8)
    ,SUBPARTITION ap_09 VALUES (9)
    ,SUBPARTITION ap_10 VALUES (10)
    ,SUBPARTITION ap_11 VALUES (11)
    ,SUBPARTITION ap_12 VALUES (12)
    ,SUBPARTITION ap_cf VALUES (DEFAULT))
    (PARTITION ledger_2019 VALUES LESS THAN (2020)
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ) PCTFREE 0 COMPRESS PARALLEL
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year >= 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
PCT does work properly. USER_MVIEW_DETAIL_RELATIONS reports that one of the 56 tracked (sub)partitions is stale, USER_MVIEW_DETAIL_PARTITION returns no rows because tracking is now at subpartition level, and USER_MVIEW_DETAIL_SUBPARTITION correctly identifies the stale subpartition. However, as expected, the materialized view refresh truncates and repopulates the whole partition, not the subpartition, so we are still processing a whole fiscal year.
    @pop2020m7.sql
17:40:03 SQL> SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
--------------- ------------------- -------- -------------------
MV_LEDGER_2020  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE

17:40:03 SQL> select * from user_mview_detail_relations;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                       55                        1

17:40:10 SQL> select * from user_mview_detail_partition;

no rows selected

17:40:10 SQL> select * from user_mview_detail_subpartition where freshness != 'FRESH';

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAIL_PARTITION_NAM DETAIL_SUBPARTITION_ DETAIL_SUBPARTITION_POSITION FRESH
---------- --------------- ---------- --------------- -------------------- -------------------- ---------------------------- -----
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2020          LEDGER_2020_AP_07                               8 STALE
If I query periods 1 to 6 in 2020 using BETWEEN, the predicate is expanded to the two inequalities that can be seen in the predicate section below. These subpartitions are up to date, and Oracle performs query rewrite.
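For reference, the test query is the same as in the first demonstration, but for fiscal year 2020 (the selector numbers are specific to my test run):
SELECT l.tree_node_num, l2.tree_node_num, SUM(a.posted_total_amt)
FROM   ps_ledger a, pstreeselect05 l1, pstreeselect10 l, pstreeselect10 l2
WHERE  a.ledger = 'ACTUALS'
AND    a.fiscal_year = 2020
AND    a.accounting_period BETWEEN 1 AND 6
AND    l1.selector_num = 30982 AND a.business_unit = l1.range_from_05
AND    l.selector_num  = 30985 AND a.chartfield1   = l.range_from_10
AND    l2.selector_num = 30984 AND a.account       = l2.range_from_10
AND    a.currency_cd = 'GBP'
GROUP BY l.tree_node_num, l2.tree_node_num
/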
    Plan hash value: 1400212726
    -------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
    -------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 12260 | 969K| | 664 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 12260 | 969K| 1128K| 664 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 12260 | 969K| | 428 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 270 | 5130 | | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 45363 | 2746K| | 425 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 182 | 5278 | | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 91 | 1638 | | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 91 | 1638 | | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 497K| 15M| | 421 (1)| 00:00:01 | 2 | 2 |
    | 10 | PARTITION LIST ITERATOR | | 497K| 15M| | 421 (1)| 00:00:01 | KEY | KEY |
    |* 11 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 497K| 15M| | 421 (1)| 00:00:01 | 15 | 28 |
    -------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2020"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2020"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2020"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    11 - filter("MV_LEDGER_2020"."ACCOUNTING_PERIOD">=1 AND "MV_LEDGER_2020"."ACCOUNTING_PERIOD"<=6 AND
    "MV_LEDGER_2020"."FISCAL_YEAR"=2020)
But having created data for period 7 in fiscal year 2020, that subpartition is stale, and Oracle leaves the query on that period, as submitted, to run against PS_LEDGER.
    Plan hash value: 3964652976
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 92 | 7 (15)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1 | 92 | 7 (15)| 00:00:01 | | |
    |- * 2 | HASH JOIN | | 1 | 92 | 6 (0)| 00:00:01 | | |
    | 3 | NESTED LOOPS | | 1 | 92 | 6 (0)| 00:00:01 | | |
    |- 4 | STATISTICS COLLECTOR | | | | | | | |
    |- * 5 | HASH JOIN | | 1 | 73 | 5 (0)| 00:00:01 | | |
    | 6 | NESTED LOOPS | | 1 | 73 | 5 (0)| 00:00:01 | | |
    |- 7 | STATISTICS COLLECTOR | | | | | | | |
    |- * 8 | HASH JOIN | | 1 | 55 | 4 (0)| 00:00:01 | | |
    | 9 | NESTED LOOPS | | 1 | 55 | 4 (0)| 00:00:01 | | |
    |- 10 | STATISTICS COLLECTOR | | | | | | | |
    | 11 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 3 | 3 |
    | 12 | PARTITION LIST SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | KEY | KEY |
    | * 13 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PS_LEDGER | 1 | 44 | 3 (0)| 00:00:01 | 36 | 36 |
    | * 14 | INDEX RANGE SCAN | PSXLEDGER | 1 | | 2 (0)| 00:00:01 | 36 | 36 |
    | * 15 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    |- * 16 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    | * 17 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    |- * 18 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    | * 19 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    |- * 20 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    ---------------------------------------------------------------------------------------------------------------------------------------------


    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    5 - access("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    8 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    13 - filter("A"."CURRENCY_CD"='GBP')
    14 - access("A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND "A"."ACCOUNTING_PERIOD"=7)
    15 - access("L1"."SELECTOR_NUM"=30982 AND "A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    filter("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    16 - access("L1"."SELECTOR_NUM"=30982)
    17 - access("L"."SELECTOR_NUM"=30985 AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    filter("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    18 - access("L"."SELECTOR_NUM"=30985)
    19 - access("L2"."SELECTOR_NUM"=30984 AND "A"."ACCOUNT"="L2"."RANGE_FROM_10")
    filter("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    20 - access("L2"."SELECTOR_NUM"=30984)
So PCT also controls query rewrite correctly with list subpartitioning. Again, when I look at the trace of the refresh of the stale partition, the entire 2020 partition was truncated and repopulated in direct-path mode. There is no accounting-period criterion on the insert statement.

    /* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION LEDGER_2020

    /* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2020" PARTITION ( LEDGER_2020 ) ("BUSINESS_UNIT",
    "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" ,
    "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" P0, "PS_LEDGER"."ACCOUNTING_PERIOD" ,
    SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2020 AND "PS_LEDGER"."LEDGER"='ACTUALS'
    AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( ( "PS_LEDGER"."FISCAL_YEAR"< 2021 ) ) ) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"

    Demonstration 5: Composite (Range-Range) Partitioning

I am still composite-partitioning the ledger table and materialized view in this test. It will have the same range partitioning on FISCAL_YEAR, but this time I will range-subpartition it by ACCOUNTING_PERIOD with 14 periods per fiscal year. I will use a template so that each partition has the same subpartitions.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION BY RANGE (ACCOUNTING_PERIOD)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ap_bf VALUES LESS THAN (1)
    ,SUBPARTITION ap_01 VALUES LESS THAN (2)
    ,SUBPARTITION ap_02 VALUES LESS THAN (3)
    ,SUBPARTITION ap_03 VALUES LESS THAN (4)
    ,SUBPARTITION ap_04 VALUES LESS THAN (5)
    ,SUBPARTITION ap_05 VALUES LESS THAN (6)
    ,SUBPARTITION ap_06 VALUES LESS THAN (7)
    ,SUBPARTITION ap_07 VALUES LESS THAN (8)
    ,SUBPARTITION ap_08 VALUES LESS THAN (9)
    ,SUBPARTITION ap_09 VALUES LESS THAN (10)
    ,SUBPARTITION ap_10 VALUES LESS THAN (11)
    ,SUBPARTITION ap_11 VALUES LESS THAN (12)
    ,SUBPARTITION ap_12 VALUES LESS THAN (13)
    ,SUBPARTITION ap_cf VALUES LESS THAN (MAXVALUE))
    (PARTITION ledger_2018 VALUES LESS THAN (2019) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019 VALUES LESS THAN (2020) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ,PARTITION ledger_2021 VALUES LESS THAN (2022)
    )
    ENABLE ROW MOVEMENT NOLOGGING
    /
    @treeselectors
    @popledger
This time I will create one materialized view with two range partitions for the two fiscal years.
    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION BY RANGE (ACCOUNTING_PERIOD)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ap_bf VALUES LESS THAN (1)
    ,SUBPARTITION ap_01 VALUES LESS THAN (2)
    ,SUBPARTITION ap_02 VALUES LESS THAN (3)
    ,SUBPARTITION ap_03 VALUES LESS THAN (4)
    ,SUBPARTITION ap_04 VALUES LESS THAN (5)
    ,SUBPARTITION ap_05 VALUES LESS THAN (6)
    ,SUBPARTITION ap_06 VALUES LESS THAN (7)
    ,SUBPARTITION ap_07 VALUES LESS THAN (8)
    ,SUBPARTITION ap_08 VALUES LESS THAN (9)
    ,SUBPARTITION ap_09 VALUES LESS THAN (10)
    ,SUBPARTITION ap_10 VALUES LESS THAN (11)
    ,SUBPARTITION ap_11 VALUES LESS THAN (12)
    ,SUBPARTITION ap_12 VALUES LESS THAN (13)
    ,SUBPARTITION ap_cf VALUES LESS THAN (MAXVALUE))
    (PARTITION ledger_2019 VALUES LESS THAN (2020)
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ) PCTFREE 0 COMPRESS PARALLEL
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year >= 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
After inserting and committing data for period 7 of fiscal year 2020, USER_MVIEW_DETAIL_SUBPARTITION correctly identifies the one stale subpartition, and USER_MVIEW_DETAIL_RELATIONS reports that one of the 56 tracked subpartitions is stale.
    @pop2020m7.sql
19:09:50 SQL> SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
--------------- ------------------- -------- -------------------
MV_LEDGER_2020  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE

19:09:50 SQL> select * from user_mview_detail_relations;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                       55                        1

19:09:56 SQL> select * from user_mview_detail_subpartition where freshness != 'FRESH';

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAIL_PARTITION_NAM DETAIL_SUBPARTITION_ DETAIL_SUBPARTITION_POSITION FRESH
---------- --------------- ---------- --------------- -------------------- -------------------- ---------------------------- -----
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2020          LEDGER_2020_AP_07                               8 STALE
    Query rewrite continues to work on the fresh partitions.
    Plan hash value: 589110139
    -------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
    -------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 13427 | 1062K| | 683 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 13427 | 1062K| 1232K| 683 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 13427 | 1062K| | 427 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 257 | 4883 | | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 52141 | 3156K| | 424 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 210 | 6090 | | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 105 | 1890 | | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 105 | 1890 | | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 496K| 15M| | 420 (1)| 00:00:01 | 2 | 2 |
    | 10 | PARTITION RANGE ITERATOR | | 496K| 15M| | 420 (1)| 00:00:01 | 2 | 7 |
    |* 11 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 496K| 15M| | 420 (1)| 00:00:01 | 15 | 28 |
    -------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("MV_LEDGER_2020"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2020"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2020"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    11 - filter("MV_LEDGER_2020"."ACCOUNTING_PERIOD"<=6 AND "MV_LEDGER_2020"."FISCAL_YEAR"=2020)
PCT correctly identifies the stale subpartition in this query on period 7 alone, and prevents query rewrite.
    Plan hash value: 1321682226
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 92 | 7 (15)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1 | 92 | 7 (15)| 00:00:01 | | |
    |- * 2 | HASH JOIN | | 1 | 92 | 6 (0)| 00:00:01 | | |
    | 3 | NESTED LOOPS | | 1 | 92 | 6 (0)| 00:00:01 | | |
    |- 4 | STATISTICS COLLECTOR | | | | | | | |
    |- * 5 | HASH JOIN | | 1 | 73 | 5 (0)| 00:00:01 | | |
    | 6 | NESTED LOOPS | | 1 | 73 | 5 (0)| 00:00:01 | | |
    |- 7 | STATISTICS COLLECTOR | | | | | | | |
    |- * 8 | HASH JOIN | | 1 | 55 | 4 (0)| 00:00:01 | | |
    | 9 | NESTED LOOPS | | 1 | 55 | 4 (0)| 00:00:01 | | |
    |- 10 | STATISTICS COLLECTOR | | | | | | | |
    | 11 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 3 | 3 |
    | 12 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 8 | 8 |
    | * 13 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PS_LEDGER | 1 | 44 | 3 (0)| 00:00:01 | 36 | 36 |
    | * 14 | INDEX RANGE SCAN | PSXLEDGER | 1 | | 2 (0)| 00:00:01 | 36 | 36 |
    | * 15 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    |- * 16 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    | * 17 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    |- * 18 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    | * 19 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    |- * 20 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    ---------------------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    5 - access("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    8 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    13 - filter("A"."CURRENCY_CD"='GBP')
    14 - access("A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND "A"."ACCOUNTING_PERIOD"=7)
    15 - access("L1"."SELECTOR_NUM"=30982 AND "A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    filter("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    16 - access("L1"."SELECTOR_NUM"=30982)
    17 - access("L"."SELECTOR_NUM"=30985 AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    filter("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    18 - access("L"."SELECTOR_NUM"=30985)
    19 - access("L2"."SELECTOR_NUM"=30984 AND "A"."ACCOUNT"="L2"."RANGE_FROM_10")
    filter("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    20 - access("L2"."SELECTOR_NUM"=30984)
Query rewrite is prevented if a stale (sub)partition is not pruned from the query. It is all or nothing: the query is not split so that periods 1 to 6 are rewritten to use the materialized view while period 7 is read from the underlying table (see the EXPLAIN_REWRITE sketch after this plan).
    Plan hash value: 3827045647
    ------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 561 | 52173 | 3670 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 561 | 52173 | 3670 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 561 | 52173 | 3669 (1)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 227 | 4313 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 2468 | 178K| 3667 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 210 | 6090 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 105 | 1890 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 105 | 1890 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 23486 | 1032K| 3664 (1)| 00:00:01 | 3 | 3 |
    | 10 | PARTITION RANGE ITERATOR| | 23486 | 1032K| 3664 (1)| 00:00:01 | 2 | 8 |
    |* 11 | TABLE ACCESS FULL | PS_LEDGER | 23486 | 1032K| 3664 (1)| 00:00:01 | 29 | 42 |
    ------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    11 - filter("A"."ACCOUNTING_PERIOD"<=7 AND "A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND
    "A"."CURRENCY_CD"='GBP')
Again, the materialized view refresh process truncates and repopulates the whole partition, not the subpartition, so we are still processing a whole fiscal year.

    /* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION LEDGER_2020

    /* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2020" PARTITION ( LEDGER_2020 ) ("BUSINESS_UNIT",
    "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" ,
    "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" P0, "PS_LEDGER"."ACCOUNTING_PERIOD" ,
    SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2020 AND "PS_LEDGER"."LEDGER"='ACTUALS'
    AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( ( "PS_LEDGER"."FISCAL_YEAR"< 2021 ) ) ) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"

    Demonstration 6: Mismatching Partitioning

In this example, the ledger table is still composite-partitioned: the same range partitioning on FISCAL_YEAR, range-subpartitioned by ACCOUNTING_PERIOD with 14 periods per fiscal year, using a template so that each partition has the same subpartitions. This time, however, the materialized views will be partitioned differently.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION BY RANGE (ACCOUNTING_PERIOD)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ap_bf VALUES LESS THAN (1)
    ,SUBPARTITION ap_01 VALUES LESS THAN (2)
    ,SUBPARTITION ap_02 VALUES LESS THAN (3)
    ,SUBPARTITION ap_03 VALUES LESS THAN (4)
    ,SUBPARTITION ap_04 VALUES LESS THAN (5)
    ,SUBPARTITION ap_05 VALUES LESS THAN (6)
    ,SUBPARTITION ap_06 VALUES LESS THAN (7)
    ,SUBPARTITION ap_07 VALUES LESS THAN (8)
    ,SUBPARTITION ap_08 VALUES LESS THAN (9)
    ,SUBPARTITION ap_09 VALUES LESS THAN (10)
    ,SUBPARTITION ap_10 VALUES LESS THAN (11)
    ,SUBPARTITION ap_11 VALUES LESS THAN (12)
    ,SUBPARTITION ap_12 VALUES LESS THAN (13)
    ,SUBPARTITION ap_cf VALUES LESS THAN (MAXVALUE))
    (PARTITION ledger_2018 VALUES LESS THAN (2019) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019 VALUES LESS THAN (2020) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ,PARTITION ledger_2021 VALUES LESS THAN (2022)
    ) ENABLE ROW MOVEMENT NOLOGGING
    /
    @treeselectors
    @popledger
I will create two materialized views, one for 2019 and one for 2020. I will range-partition each MV by ACCOUNTING_PERIOD only, because each contains just a single fiscal year.
    CREATE MATERIALIZED VIEW mv_ledger_2019
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /

    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS PARALLEL
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2020
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
USER_MVIEW_DETAIL_RELATIONS reports that PCT does apply to these materialized views. USER_MVIEW_DETAIL_SUBPARTITION correctly identifies the one subpartition into which new data was added as stale, but it is reported as stale for both materialized views, even though we can see that it is not needed by MV_LEDGER_2019.
    @pop2020m7.sql
23:57:09 SQL> SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
--------------- ------------------- -------- -------------------
MV_LEDGER_2019  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE
MV_LEDGER_2020  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE

Elapsed: 00:00:00.00
23:57:09 SQL> select * from user_mview_detail_relations;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
SCOTT      MV_LEDGER_2019  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                       55                        1
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                       55                        1

Elapsed: 00:00:13.46
23:57:23 SQL> select * from user_mview_detail_partition;

no rows selected

Elapsed: 00:00:00.00
23:57:23 SQL> select * from user_mview_detail_subpartition where freshness != 'FRESH';

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAIL_PARTITION_NAM DETAIL_SUBPARTITION_ DETAIL_SUBPARTITION_POSITION FRESH
---------- --------------- ---------- --------------- -------------------- -------------------- ---------------------------- -----
SCOTT      MV_LEDGER_2019  SCOTT      PS_LEDGER       LEDGER_2020          LEDGER_2020_AP_07                               8 STALE
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2020          LEDGER_2020_AP_07                               8 STALE
The query on 2019 continues to be rewritten to use MV_LEDGER_2019, even though the MV needs compilation.
    Plan hash value: 1498194812
    ----------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1703 | 128K| 421 (2)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1703 | 128K| 421 (2)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 1703 | 128K| 420 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 238 | 4522 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 7156 | 405K| 418 (2)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 208 | 6032 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 104 | 1872 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 104 | 1872 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE ITERATOR | | 68804 | 1948K| 415 (2)| 00:00:01 | 2 | 7 |
    |* 10 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2019 | 68804 | 1948K| 415 (2)| 00:00:01 | 2 | 7 |
    ----------------------------------------------------------------------------------------------------------------------
    ….
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2019"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2019"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2019"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("MV_LEDGER_2019"."ACCOUNTING_PERIOD"<=6)
Queries on periods 1 to 6 in 2020 are also rewritten.
    Plan hash value: 3016493666
    ------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
    ------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 12328 | 927K| | 653 (2)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 12328 | 927K| 1080K| 653 (2)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 12328 | 927K| | 429 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 238 | 4522 | | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 51748 | 2931K| | 427 (2)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 208 | 6032 | | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 104 | 1872 | | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 104 | 1872 | | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE ITERATOR | | 496K| 13M| | 423 (2)| 00:00:01 | 2 | 7 |
    |* 10 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 496K| 13M| | 423 (2)| 00:00:01 | 2 | 7 |
    ------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2020"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2020"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2020"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("MV_LEDGER_2020"."ACCOUNTING_PERIOD"<=6)
Quite correctly, the query on period 7 of 2020 is not rewritten because the underlying subpartition is stale.
    Plan hash value: 1321682226
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 92 | 7 (15)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1 | 92 | 7 (15)| 00:00:01 | | |
    |- * 2 | HASH JOIN | | 1 | 92 | 6 (0)| 00:00:01 | | |
    | 3 | NESTED LOOPS | | 1 | 92 | 6 (0)| 00:00:01 | | |
    |- 4 | STATISTICS COLLECTOR | | | | | | | |
    |- * 5 | HASH JOIN | | 1 | 73 | 5 (0)| 00:00:01 | | |
    | 6 | NESTED LOOPS | | 1 | 73 | 5 (0)| 00:00:01 | | |
    |- 7 | STATISTICS COLLECTOR | | | | | | | |
    |- * 8 | HASH JOIN | | 1 | 55 | 4 (0)| 00:00:01 | | |
    | 9 | NESTED LOOPS | | 1 | 55 | 4 (0)| 00:00:01 | | |
    |- 10 | STATISTICS COLLECTOR | | | | | | | |
    | 11 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 3 | 3 |
    | 12 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 8 | 8 |
    | * 13 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PS_LEDGER | 1 | 44 | 3 (0)| 00:00:01 | 36 | 36 |
    | * 14 | INDEX RANGE SCAN | PSXLEDGER | 1 | | 2 (0)| 00:00:01 | 36 | 36 |
    | * 15 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    |- * 16 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    | * 17 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    |- * 18 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    | * 19 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    |- * 20 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    ---------------------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    5 - access("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    8 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    13 - filter("A"."CURRENCY_CD"='GBP')
    14 - access("A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND "A"."ACCOUNTING_PERIOD"=7)
    15 - access("L1"."SELECTOR_NUM"=30982 AND "A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    filter("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    16 - access("L1"."SELECTOR_NUM"=30982)
    17 - access("L"."SELECTOR_NUM"=30985 AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    filter("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    18 - access("L"."SELECTOR_NUM"=30985)
    19 - access("L2"."SELECTOR_NUM"=30984 AND "A"."ACCOUNT"="L2"."RANGE_FROM_10")
    filter("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    20 - access("L2"."SELECTOR_NUM"=30984)
Both MVs are compressed after the initial creation. Note the sizes of the partitions for fiscal year 2020: about 256 blocks and 284 rows per block.
TABLE_NAME      Part Pos PARTITION_NAME       Sub Pos SUBPARTITION_NAME         NUM_ROWS BLOCKS Rows/Block COMPRESS COMPRESS_FOR
--------------- -------- -------------------- ------- ------------------------- -------- ------ ---------- -------- ----------------
    MV_LEDGER_2019 1 AP_BF 72886 252 289.2 ENABLED BASIC
    2 AP_01 72925 252 289.4 ENABLED BASIC
    3 AP_02 72736 251 289.8 ENABLED BASIC
    4 AP_03 72745 251 289.8 ENABLED BASIC
    5 AP_04 72649 251 289.4 ENABLED BASIC
    6 AP_05 71947 249 288.9 ENABLED BASIC
    7 AP_06 72903 252 289.3 ENABLED BASIC
    8 AP_07 72510 250 290.0 ENABLED BASIC
    9 AP_08 72520 251 288.9 ENABLED BASIC
    10 AP_09 72965 252 289.5 ENABLED BASIC
    11 AP_10 72209 250 288.8 ENABLED BASIC
    12 AP_11 72647 251 289.4 ENABLED BASIC
    13 AP_12 73121 253 289.0 ENABLED BASIC
    14 AP_CF 1999 25 80.0 ENABLED BASIC
    946762 3290 287.8

    MV_LEDGER_2020 1 AP_BF 72475 256 283.1 ENABLED BASIC
    2 AP_01 72981 256 285.1 ENABLED BASIC
    3 AP_02 72726 256 284.1 ENABLED BASIC
    4 AP_03 72844 256 284.5 ENABLED BASIC
    5 AP_04 72709 256 284.0 ENABLED BASIC
    6 AP_05 72535 256 283.3 ENABLED BASIC
    7 AP_06 72419 256 282.9 ENABLED BASIC
    8 AP_07 0 0 ENABLED BASIC
    9 AP_08 0 0 ENABLED BASIC
    10 AP_09 0 0 ENABLED BASIC
    11 AP_10 0 0 ENABLED BASIC
    12 AP_11 0 0 ENABLED BASIC
    13 AP_12 0 0 ENABLED BASIC
    14 AP_CF 0 0 ENABLED BASIC
    508689 1792 283.9
Let's look at the trace of the refresh processes. Both materialized views were marked as NEEDS_COMPILE, so both were refreshed. However, the trace shows that the refresh has changed from truncate to delete, and the insert is not done in direct-path mode. Both refreshes tried to process fiscal year 2020 because a 2020 subpartition had been changed, so the refresh of MV_LEDGER_2019 didn't actually change any data: no rows were deleted, and none were inserted.
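The refreshes in this series are requested with DBMS_MVIEW.REFRESH, along these lines (a minimal sketch, not the exact script used). atomic_refresh=>FALSE normally permits truncate plus direct-path insert; the trace below shows that, despite this, the refresh has fallen back to an atomic delete and conventional insert.
BEGIN
  DBMS_MVIEW.REFRESH(list           => 'MV_LEDGER_2019,MV_LEDGER_2020'
                    ,method         => 'P'    -- 'P' requests a PCT refresh
                    ,atomic_refresh => FALSE); -- FALSE allows truncate + direct-path insert
END;
/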

    /* MV_REFRESH (DEL) */ DELETE FROM "SCOTT"."MV_LEDGER_2019" WHERE ( ( ( (2020 <= "FISCAL_YEAR" AND "FISCAL_YEAR"< 2021) )) )

    /* MV_REFRESH (INS) */ INSERT /*+ BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2019" ("BUSINESS_UNIT", "ACCOUNT", "CHARTFIELD1",
    "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" , "PS_LEDGER"."ACCOUNT" ,
    "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" , "PS_LEDGER"."ACCOUNTING_PERIOD" , SUM("PS_LEDGER"."POSTED_TOTAL_AMT")
    FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2019 AND "PS_LEDGER"."LEDGER"='ACTUALS' AND "PS_LEDGER"."CURRENCY_CD"='GBP')
    AND ( ( ( (2020 <= "PS_LEDGER"."FISCAL_YEAR" AND "PS_LEDGER"."FISCAL_YEAR"< 2021) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"



    /* MV_REFRESH (DEL) */ DELETE FROM "SCOTT"."MV_LEDGER_2020" WHERE ( ( ( (2020 <= "FISCAL_YEAR" AND "FISCAL_YEAR"< 2021) )) )

    /* MV_REFRESH (INS) */ INSERT /*+ BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2020" ("BUSINESS_UNIT", "ACCOUNT", "CHARTFIELD1",
    "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" , "PS_LEDGER"."ACCOUNT" ,
    "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" , "PS_LEDGER"."ACCOUNTING_PERIOD" , SUM("PS_LEDGER"."POSTED_TOTAL_AMT")
    FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2020 AND "PS_LEDGER"."LEDGER"='ACTUALS' AND "PS_LEDGER"."CURRENCY_CD"='GBP')
    AND ( ( ( (2020 <= "PS_LEDGER"."FISCAL_YEAR" AND "PS_LEDGER"."FISCAL_YEAR"< 2021) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"
However, the 2020 materialized view has grown from 256 to 384 blocks per period, and dropped from about 285 to 189 rows per block. The data is no longer compressed because it was not inserted in direct-path mode, even though there was still a commit between the delete and insert statements.
TABLE_NAME      Part Pos PARTITION_NAME       Sub Pos SUBPARTITION_NAME         NUM_ROWS BLOCKS Rows/Block COMPRESS COMPRESS_FOR
--------------- -------- -------------------- ------- ------------------------- -------- ------ ---------- -------- ----------------
    MV_LEDGER_2019 1 AP_BF 72886 252 289.2 ENABLED BASIC
    2 AP_01 72925 252 289.4 ENABLED BASIC
    3 AP_02 72736 251 289.8 ENABLED BASIC
    4 AP_03 72745 251 289.8 ENABLED BASIC
    5 AP_04 72649 251 289.4 ENABLED BASIC
    6 AP_05 71947 249 288.9 ENABLED BASIC
    7 AP_06 72903 252 289.3 ENABLED BASIC
    8 AP_07 72510 250 290.0 ENABLED BASIC
    9 AP_08 72520 251 288.9 ENABLED BASIC
    10 AP_09 72965 252 289.5 ENABLED BASIC
    11 AP_10 72209 250 288.8 ENABLED BASIC
    12 AP_11 72647 251 289.4 ENABLED BASIC
    13 AP_12 73121 253 289.0 ENABLED BASIC
    14 AP_CF 1999 25 80.0 ENABLED BASIC
    946762 3290 287.8

    MV_LEDGER_2020 1 AP_BF 72475 384 188.7 ENABLED BASIC
    2 AP_01 72981 384 190.1 ENABLED BASIC
    3 AP_02 72726 384 189.4 ENABLED BASIC
    4 AP_03 72844 384 189.7 ENABLED BASIC
    5 AP_04 72709 384 189.3 ENABLED BASIC
    6 AP_05 72535 384 188.9 ENABLED BASIC
    7 AP_06 72419 384 188.6 ENABLED BASIC
    8 AP_07 72795 1006 72.4 ENABLED BASIC
    9 AP_08 0 0 ENABLED BASIC
    10 AP_09 0 0 ENABLED BASIC
    11 AP_10 0 0 ENABLED BASIC
    12 AP_11 0 0 ENABLED BASIC
    13 AP_12 0 0 ENABLED BASIC
    14 AP_CF 0 0 ENABLED BASIC
    581484 3694 157.4
MV_CAPABILITIES reports that PCT is available, and it is. It correctly identified the stale partitions that prevent rewrite, as the report below shows.
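A report like the one below can be generated with DBMS_MVIEW.EXPLAIN_MVIEW; a minimal sketch, assuming MV_CAPABILITIES_TABLE has already been created in this schema by running utlxmv.sql:
EXEC DBMS_MVIEW.EXPLAIN_MVIEW('MV_LEDGER_2019');
EXEC DBMS_MVIEW.EXPLAIN_MVIEW('MV_LEDGER_2020');

-- P is the POSSIBLE flag; REL_TEXT is RELATED_TEXT.
SELECT mvname, capability_name, possible, related_text, msgtxt
FROM   mv_capabilities_table
ORDER BY mvname, seq;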
    MVNAME                         CAPABILITY_NAME                P REL_TEXT             MSGTXT
    ------------------------------ ------------------------------ - -------------------- ------------------------------------------------------------
    MV_LEDGER_2019 PCT Y
    REFRESH_COMPLETE Y
    REFRESH_FAST Y
    REWRITE Y
    PCT_TABLE Y PS_LEDGER
    REFRESH_FAST_AFTER_INSERT N SCOTT.PS_LEDGER the detail table does not have a materialized view log
    REFRESH_FAST_AFTER_ONETAB_DML N POSTED_TOTAL_AMT SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ONETAB_DML N see the reason why REFRESH_FAST_AFTER_INSERT is disabled
    REFRESH_FAST_AFTER_ONETAB_DML N COUNT(*) is not present in the select list
    REFRESH_FAST_AFTER_ONETAB_DML N SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ANY_DML N see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
    REFRESH_FAST_PCT Y
    REWRITE_FULL_TEXT_MATCH Y
    REWRITE_PARTIAL_TEXT_MATCH Y
    REWRITE_GENERAL Y
    REWRITE_PCT Y
    PCT_TABLE_REWRITE Y PS_LEDGER

    MV_LEDGER_2020 PCT Y
    REFRESH_COMPLETE Y
    REFRESH_FAST Y
    REWRITE Y
    PCT_TABLE Y PS_LEDGER
    REFRESH_FAST_AFTER_INSERT N SCOTT.PS_LEDGER the detail table does not have a materialized view log
    REFRESH_FAST_AFTER_ONETAB_DML N POSTED_TOTAL_AMT SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ONETAB_DML N see the reason why REFRESH_FAST_AFTER_INSERT is disabled
    REFRESH_FAST_AFTER_ONETAB_DML N COUNT(*) is not present in the select list
    REFRESH_FAST_AFTER_ONETAB_DML N SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ANY_DML N see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
    REFRESH_FAST_PCT Y
    REWRITE_FULL_TEXT_MATCH Y
    REWRITE_PARTIAL_TEXT_MATCH Y
    REWRITE_GENERAL Y
    REWRITE_PCT Y
    PCT_TABLE_REWRITE Y PS_LEDGER
Mismatched partitioning caused the non-atomic refresh to fall back to atomic mode, and so the data was no longer compressed.
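If you are caught by this, the affected partitions can be re-compressed after the refresh. A minimal sketch, using the partition identified in the listing above:
-- Rebuild the partition so its rows are stored with BASIC compression again;
-- UPDATE INDEXES keeps any dependent indexes usable.
ALTER TABLE mv_ledger_2020 MOVE PARTITION ap_07 COMPRESS UPDATE INDEXES;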

     

    Demonstration 7: Partition on Accounting Period, Subpartition on Fiscal Year!

This final example still composite partitions the ledger table, but now I will swap the partitioning and sub-partitioning columns. I will range partition on ACCOUNTING_PERIOD into 14 partitions and subpartition on FISCAL_YEAR. The intention is to demonstrate that partition elimination will still work correctly, and that I will only have to refresh a single accounting period.
    However, you will see that there are some problems, and I can't work around all of them. 
    I will use a template so that each accounting period partition will have the same fiscal year subpartitions.
I will still only range partition the MV on accounting period. We don't need to partition it on FISCAL_YEAR, since each MV only contains a single year.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    SUBPARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ledger_2018 VALUES LESS THAN (2019)
    ,SUBPARTITION ledger_2019 VALUES LESS THAN (2020)
    ,SUBPARTITION ledger_2020 VALUES LESS THAN (2021)
    ,SUBPARTITION ledger_2021 VALUES LESS THAN (2022))
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE))
    ENABLE ROW MOVEMENT NOLOGGING
    /
I can't specify physical attributes on subpartitions in the CREATE TABLE DDL, only on partitions, so I have to come along afterwards and alter the sub-partitions. I am going to do that before I populate the data, so that it is compressed on load, rather than loading it and rebuilding it afterwards.
set serveroutput on
DECLARE
  l_sql CLOB;
BEGIN
  FOR i IN (
    select *
    from   user_tab_subpartitions
    where  table_name = 'PS_LEDGER'
    and    subpartition_name like 'AP%LEDGER%201%'
    and    (compression = 'DISABLED' OR pct_free > 0)
    order by table_name, partition_position, subpartition_position
  ) LOOP
    l_sql := 'ALTER TABLE '||i.table_name||' MOVE SUBPARTITION '||i.subpartition_name||' COMPRESS UPDATE INDEXES';
    dbms_output.put_line(l_sql);
    EXECUTE IMMEDIATE l_sql;
  END LOOP;
END;
/
    @treeselectors
    @popledger
TABLE_NAME      Part Pos PARTITION_NAME       Sub Pos SUBPARTITION_NAME         NUM_ROWS BLOCKS Rows/Block COMPRESS COMPRESS_FOR
--------------- -------- -------------------- ------- ------------------------- -------- ------ ---------- -------- ------------
    PS_LEDGER 1 AP_BF 261458 5147 50.8 NONE
    1 1 AP_BF_LEDGER_2018 84565 1372 61.6 ENABLED BASIC
    1 2 AP_BF_LEDGER_2019 84519 1371 61.6 ENABLED BASIC
    1 3 AP_BF_LEDGER_2020 84673 2193 38.6 DISABLED
    1 4 AP_BF_LEDGER_2021 7701 211 36.5 DISABLED
    2 AP_01 261108 5174 50.5 NONE
    2 1 AP_01_LEDGER_2018 84268 1368 61.6 ENABLED BASIC
    2 2 AP_01_LEDGER_2019 84233 1366 61.7 ENABLED BASIC
    2 3 AP_01_LEDGER_2020 84831 2224 38.1 DISABLED
    2 4 AP_01_LEDGER_2021 7776 216 36.0 DISABLED
    3 AP_02 261174 5172 50.5 NONE
    3 1 AP_02_LEDGER_2018 84372 1369 61.6 ENABLED BASIC
    3 2 AP_02_LEDGER_2019 84444 1370 61.6 ENABLED BASIC
    3 3 AP_02_LEDGER_2020 84596 2218 38.1 DISABLED
    3 4 AP_02_LEDGER_2021 7762 215 36.1 DISABLED
    4 AP_03 259982 5149 50.5 NONE
    4 1 AP_03_LEDGER_2018 84105 1364 61.7 ENABLED BASIC
    4 2 AP_03_LEDGER_2019 83820 1360 61.6 ENABLED BASIC
    4 3 AP_03_LEDGER_2020 84284 2210 38.1 DISABLED
    4 4 AP_03_LEDGER_2021 7773 215 36.2 DISABLED
    5 AP_04 261376 5177 50.5 NONE
    5 1 AP_04_LEDGER_2018 84378 1369 61.6 ENABLED BASIC
    5 2 AP_04_LEDGER_2019 84649 1374 61.6 ENABLED BASIC
    5 3 AP_04_LEDGER_2020 84652 2220 38.1 DISABLED
    5 4 AP_04_LEDGER_2021 7697 214 36.0 DISABLED
    6 AP_05 261772 5180 50.5 NONE
    6 1 AP_05_LEDGER_2018 84984 1378 61.7 ENABLED BASIC
    6 2 AP_05_LEDGER_2019 84656 1374 61.6 ENABLED BASIC
    6 3 AP_05_LEDGER_2020 84507 2216 38.1 DISABLED
    6 4 AP_05_LEDGER_2021 7625 212 36.0 DISABLED
    7 AP_06 260581 5165 50.5 NONE
    7 1 AP_06_LEDGER_2018 83994 1363 61.6 ENABLED BASIC
    7 2 AP_06_LEDGER_2019 84150 1366 61.6 ENABLED BASIC
    7 3 AP_06_LEDGER_2020 84729 2222 38.1 DISABLED
    7 4 AP_06_LEDGER_2021 7708 214 36.0 DISABLED
    8 AP_07 184118 3163 58.2 NONE
    8 1 AP_07_LEDGER_2018 84863 1377 61.6 ENABLED BASIC
    8 2 AP_07_LEDGER_2019 84155 1366 61.6 ENABLED BASIC
    8 3 AP_07_LEDGER_2020 7587 211 36.0 DISABLED
    8 4 AP_07_LEDGER_2021 7513 209 35.9 DISABLED
    9 AP_08 184619 3173 58.2 NONE
    9 1 AP_08_LEDGER_2018 84547 1372 61.6 ENABLED BASIC
    9 2 AP_08_LEDGER_2019 84775 1376 61.6 ENABLED BASIC
    9 3 AP_08_LEDGER_2020 7662 213 36.0 DISABLED
    9 4 AP_08_LEDGER_2021 7635 212 36.0 DISABLED
    10 AP_09 184375 3168 58.2 NONE
    10 1 AP_09_LEDGER_2018 84407 1370 61.6 ENABLED BASIC
    10 2 AP_09_LEDGER_2019 84645 1373 61.6 ENABLED BASIC
    10 3 AP_09_LEDGER_2020 7570 210 36.0 DISABLED
    10 4 AP_09_LEDGER_2021 7753 215 36.1 DISABLED
    11 AP_10 184327 3166 58.2 NONE
    11 1 AP_10_LEDGER_2018 84300 1368 61.6 ENABLED BASIC
    11 2 AP_10_LEDGER_2019 84738 1374 61.7 ENABLED BASIC
    11 3 AP_10_LEDGER_2020 7656 212 36.1 DISABLED
    11 4 AP_10_LEDGER_2021 7633 212 36.0 DISABLED
    12 AP_11 184489 3167 58.3 NONE
    12 1 AP_11_LEDGER_2018 84406 1369 61.7 ENABLED BASIC
    12 2 AP_11_LEDGER_2019 84861 1376 61.7 ENABLED BASIC
    12 3 AP_11_LEDGER_2020 7700 213 36.2 DISABLED
    12 4 AP_11_LEDGER_2021 7522 209 36.0 DISABLED
    13 AP_12 184244 3168 58.2 NONE
    13 1 AP_12_LEDGER_2018 84611 1373 61.6 ENABLED BASIC
    13 2 AP_12_LEDGER_2019 84155 1365 61.7 ENABLED BASIC
    13 3 AP_12_LEDGER_2020 7776 216 36.0 DISABLED
    13 4 AP_12_LEDGER_2021 7702 214 36.0 DISABLED
    14 AP_CF 4800 154 31.2 NONE
    14 1 AP_CF_LEDGER_2018 2200 53 41.5 ENABLED BASIC
    14 2 AP_CF_LEDGER_2019 2200 53 41.5 ENABLED BASIC
    14 3 AP_CF_LEDGER_2020 200 24 8.3 DISABLED
    14 4 AP_CF_LEDGER_2021 200 24 8.3 DISABLED
    2938423 55323 53.1
If I query periods 1-6 in 2018, I get correct partition elimination. Oracle inspects 6 partitions, and 1 sub-partition within each. So swapping the composite partitioning types and columns should not affect query performance.
    Plan hash value: 2690363151
    -----------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 717 | 66681 | 2244 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 717 | 66681 | 2244 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 717 | 66681 | 2243 (1)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 258 | 4902 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 2776 | 200K| 2241 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 208 | 6032 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 104 | 1872 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 104 | 1872 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE ITERATOR| | 26693 | 1173K| 2238 (1)| 00:00:01 | 2 | 7 |
    | 10 | PARTITION RANGE SINGLE | | 26693 | 1173K| 2238 (1)| 00:00:01 | 1 | 1 |
    |* 11 | TABLE ACCESS FULL | PS_LEDGER | 26693 | 1173K| 2238 (1)| 00:00:01 | | |
    -----------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    11 - filter("A"."ACCOUNTING_PERIOD"<=6 AND "A"."FISCAL_YEAR"=2018 AND "A"."LEDGER"='ACTUALS' AND
    "A"."CURRENCY_CD"='GBP')
The materialized views are created as before, but are now range partitioned only on ACCOUNTING_PERIOD.
CREATE MATERIALIZED VIEW mv_ledger_2019
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /

    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS PARALLEL
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2020
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
USER_MVIEW_DETAIL_SUBPARTITION correctly identifies the one stale sub-partition, but USER_MVIEW_DETAIL_PARTITION reports that one whole range partition is stale.
    @pop2020m7.sql
    MVIEW_NAME STALENESS LAST_REF COMPILE_STATE
    --------------- ------------------- -------- -------------------
    MV_LEDGER_2019 NEEDS_COMPILE COMPLETE NEEDS_COMPILE
    MV_LEDGER_2020 NEEDS_COMPILE COMPLETE NEEDS_COMPILE

    01:02:53 SQL> select * from user_mview_detail_relations;

OWNER      MVIEW_NAME      Detailobj Owner  DETAILOBJ_NAME  DETAILOBJ  DETAILOBJ_ALIAS  D  NUM_FRESH_PCT_PARTITIONS  NUM_STALE_PCT_PARTITIONS
    ---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
    SCOTT MV_LEDGER_2019 SCOTT PS_LEDGER TABLE PS_LEDGER Y 55 1
    SCOTT MV_LEDGER_2020 SCOTT PS_LEDGER TABLE PS_LEDGER Y 55 1

    01:03:06 SQL> select * from user_mview_detail_subpartition where freshness != 'FRESH';

OWNER      MVIEW_NAME      Detailobj Owner  DETAILOBJ_NAME  DETAIL_PARTITION_NAME  DETAIL_SUBPARTITION_NAME  DETAIL_SUBPARTITION_POSITION  FRESHNESS
    ---------- --------------- ---------- --------------- -------------------- -------------------- ---------------------------- -----
    SCOTT MV_LEDGER_2019 SCOTT PS_LEDGER AP_07 AP_07_LEDGER_2020 3 STALE
    SCOTT MV_LEDGER_2020 SCOTT PS_LEDGER AP_07 AP_07_LEDGER_2020 3 STALE
I get query rewrite as you would expect, and as seen in demo 5. Fiscal year 2019, period 7 still rewrites because its partition is not stale.
    Plan hash value: 387550712
    ----------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1967 | 147K| 76 (3)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1967 | 147K| 76 (3)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 1967 | 147K| 75 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 258 | 4902 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 7576 | 429K| 73 (2)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 208 | 6032 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 104 | 1872 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 104 | 1872 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 72486 | 2052K| 70 (2)| 00:00:01 | 8 | 8 |
    |* 10 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2019 | 72486 | 2052K| 70 (2)| 00:00:01 | 8 | 8 |
    ----------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2019"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2019"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2019"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("MV_LEDGER_2019"."ACCOUNTING_PERIOD"=7)
    Fiscal year 2020 period 7 doesn't rewrite, because the subpartition is stale.
    Plan hash value: 1321682226
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 92 | 7 (15)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1 | 92 | 7 (15)| 00:00:01 | | |
    |- * 2 | HASH JOIN | | 1 | 92 | 6 (0)| 00:00:01 | | |
    | 3 | NESTED LOOPS | | 1 | 92 | 6 (0)| 00:00:01 | | |
    |- 4 | STATISTICS COLLECTOR | | | | | | | |
    |- * 5 | HASH JOIN | | 1 | 73 | 5 (0)| 00:00:01 | | |
    | 6 | NESTED LOOPS | | 1 | 73 | 5 (0)| 00:00:01 | | |
    |- 7 | STATISTICS COLLECTOR | | | | | | | |
    |- * 8 | HASH JOIN | | 1 | 55 | 4 (0)| 00:00:01 | | |
    | 9 | NESTED LOOPS | | 1 | 55 | 4 (0)| 00:00:01 | | |
    |- 10 | STATISTICS COLLECTOR | | | | | | | |
    | 11 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 8 | 8 |
    | 12 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 3 | 3 |
    | * 13 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PS_LEDGER | 1 | 44 | 3 (0)| 00:00:01 | 31 | 31 |
    | * 14 | INDEX RANGE SCAN | PSXLEDGER | 1 | | 2 (0)| 00:00:01 | 31 | 31 |
    | * 15 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    |- * 16 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    | * 17 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    |- * 18 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    | * 19 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    |- * 20 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    ---------------------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    5 - access("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    8 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    13 - filter("A"."CURRENCY_CD"='GBP')
    14 - access("A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND "A"."ACCOUNTING_PERIOD"=7)
    15 - access("L1"."SELECTOR_NUM"=30982 AND "A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    filter("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    16 - access("L1"."SELECTOR_NUM"=30982)
    17 - access("L"."SELECTOR_NUM"=30985 AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    filter("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    18 - access("L"."SELECTOR_NUM"=30985)
    19 - access("L2"."SELECTOR_NUM"=30984 AND "A"."ACCOUNT"="L2"."RANGE_FROM_10")
    filter("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    20 - access("L2"."SELECTOR_NUM"=30984)
As we have already seen, refresh processes all subpartitions of a partition. So, not surprisingly, the refresh process truncates the partition for period 7 in both the 2019 and 2020 MVs, even though only the 2020 data was affected: because period 7 was stale in one fiscal year, it reprocessed all fiscal years. We would have had the same problem if I had composite partitioned the materialized views to match the table; it would still have truncated and reprocessed all fiscal years for period 7.

    /* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2019"TRUNCATE PARTITION AP_07 UPDATE GLOBAL INDEXES

    /* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2019"PARTITION ( AP_07 ) ("BUSINESS_UNIT",
    "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" ,
    "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" , "PS_LEDGER"."ACCOUNTING_PERIOD" P0,
    SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2019 AND "PS_LEDGER"."LEDGER"='ACTUALS'
    AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( ( "PS_LEDGER"."ACCOUNTING_PERIOD">= 7 ) ) )
    AND ( ( ( "PS_LEDGER"."ACCOUNTING_PERIOD"< 8 ) ) ) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"

    /* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION AP_07 UPDATE GLOBAL INDEXES

    /* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2020"PARTITION ( AP_07 ) ("BUSINESS_UNIT",
    "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" ,
    "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" , "PS_LEDGER"."ACCOUNTING_PERIOD" P0,
    SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2020 AND "PS_LEDGER"."LEDGER"='ACTUALS'
    AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( ( "PS_LEDGER"."ACCOUNTING_PERIOD">= 7 ) ) )
    AND ( ( ( "PS_LEDGER"."ACCOUNTING_PERIOD"< 8 ) ) ) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"
    Partition pruning still worked correctly after swapping the partitioning and sub-partitioning columns. 
    It also correctly controlled query rewrite. 
However, the PCT refresh processed all years for the single accounting period, rather than all accounting periods for the single year. That is less work if you have fewer fiscal years than accounting periods; generally, I see systems containing only 3 to 6 fiscal years of data. However, it also refreshed MVs that didn't need to be refreshed.
Swapping the partitioning columns has also made the management of the partitions in the ledger table much more complicated.
• I can't interval sub-partition, so I can't automatically add partitions for future fiscal years on demand. Instead, I will have to add a new fiscal year subpartition to each of the 14 range partitions (see the sketch after this list).
    • I can't specify storage options or compression attributes on sub-partitions in the create table DDL command, so I have to come along afterwards with PL/SQL to alter the sub-partitions. 
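To illustrate the first point, adding fiscal year 2022 would require a statement like the following for every one of the 14 range partitions. This is a sketch; the subpartition name follows the template naming used above.
-- Repeat for each range partition, AP_BF through AP_CF.
ALTER TABLE ps_ledger MODIFY PARTITION ap_01
  ADD SUBPARTITION ap_01_ledger_2022 VALUES LESS THAN (2023);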
    On balance, I don't think I would choose to implement this.

    Conclusion

PCT does track individually stale partitions and subpartitions, but the subsequent refresh appears only to be done by partition. If one subpartition is stale, then the entire partition is refreshed. If you use composite partitioning, then you may have to accept reprocessing more data than is absolutely necessary, rather than adopt a less effective partitioning strategy.
    The subpartition key should be subordinate to the partition key. In the ledger example that I have used, I think it is better to partition by fiscal year and subpartition by accounting period (demonstration 5) than vice versa (demonstration 7). 
PCT doesn't work when there are multiple partitioning key columns. So you need to find a single partition key column, used by the application, that is sufficiently selective to restrict the number of partitions being refreshed.
The partitioning of the table and of the materialized view must be the same type of partitioning, and on the same column. Otherwise, while PCT may still work, the refresh process may not be able to populate the materialized view in direct-path mode, and it may not be possible to maintain compressed materialized views.
There is a balance to be struck. On the one hand, application performance can be improved by partitioning application tables so that partition elimination is effective, but that partitioning strategy may not work with PCT. On the other, reporting performance can be improved by maintaining fresh pre-aggregated data in materialized views, and PCT can help to keep those materialized views fresh with less overhead.

    First Steps in Spatial Data

    This is the introductory blog post in a series about using Spatial Data in the Oracle database.

Caveat: Spatial Data has been a part of the Oracle database since at least version 8i.  I have been aware of it for many years but have never previously used it myself.  Recently, I had some spare time and decided to experiment with it.  These blogs document my first steps.  I have spent a lot of time reading the documentation and using Google to find other people's blogs.  Where I found useful material, I have provided links to it.  It is likely that more experienced developers can point out my mistakes and better methods of achieving results, in which case I will gladly publish comments and make corrections to my material.

    Index

    1. Loading GPX data into XML data types
    2. Convert GPX Track to a Spatial Line Geometry

    Problem Statement

    A map reading stop!
When I am not working with Oracle databases, I am a keen cyclist, and I ride with a touring club.  I have also always enjoyed maps, having been taught to read Ordnance Survey maps at school.  It is no surprise, therefore, that I lead rides for my cycling club.  We used to use (and you can still buy) paper maps.  By 2005 I was starting to use a GPS.  Initially, I recorded rides as tracks on a PDA.  By 2012, I was regularly using an Android tablet on my handlebar bag for navigation.  The market has caught up, and people now attach their phones to their handlebars or have dedicated bike computers with GPS and Bluetooth links to their phones.  The cycling club website includes a library of the routes of previous rides; however, you can only search it by the structured data held for each ride.  So, for example, I can only find rides in the Chilterns if that word appears in the description.  I cannot do a spatial search.

I have also started to use Strava, an internet service for tracking exercise.  It is mainly used by cyclists and runners.  Activities can be recorded on a phone or other device and then uploaded, compared and analysed.  Every time I go out on the bike, I upload the activity.  I have also uploaded my back catalogue of GPS data.  As a result of the Coronavirus lockdowns, I bought an indoor trainer that I use with Zwift, which also posts data to Strava.  My most recent toy is a heart rate monitor; both Strava and Zwift capture data from that too.  Strava will let you see a certain amount of analysis of your activities and how you compare to other people, and more if you pay for their subscription service.  They will also allow you to export and download all of your data as a set of structured data in CSV files, together with the GPX files and photographs that you uploaded.

    I thought it would be interesting to try to analyse and interrogate that data.  Typical questions might include:

    1. I ride up Swain's Lane in Highgate most days.  How long do I take, and am I getting faster or slower?
    2. I want to go for a ride in the Chilterns, so I would like to see tracks of previous rides to get some route ideas.

So I am going to upload my Strava data into an Oracle database, load the GPS tracks currently in GPX files into the database, convert them to Spatial geometries, and then process them.  To answer the first question, I will need to provide a working definition of Swain's Lane.  For the second, I need definitions of various areas.  For example, I will take the Chilterns to be the area designated by Natural England as an Area of Outstanding Natural Beauty, so I will need to import a definition of that and other areas from published data.

The following series of blogs illustrates how I dealt with these and other challenges.

    Spatial Data 1: Loading GPX data into XML data types


    This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava.

In these posts, I have only shown extracts of some of the scripts I have written.  The full files are available on GitHub.

    Upload and Expand Strava Bulk Export

Strava will bulk export all your data to a zipped folder.  It contains various CSV files.  I am interested in activities.csv, which contains a row for each activity, with various pieces of data including the name of the data file that can be found in the /activities directory.  That file will usually be a .gpx file, or it may be zipped as a .gpx.gz file.  GPX is an XML schema that contains sets of longitude/latitude coordinates and may contain other attributes.

    The first job is to upload the Strava export .zip file to somewhere accessible to the database server (in my case /vagrant) and to expand it (to /tmp/strava/).

    cd /vagrant
    mkdir /tmp/strava
    unzip /vagrant/export_1679301.zip -d /tmp/strava

    Create Strava Schema 

    I need to create a new database schema to hold the various objects I will create, and I have to give it certain privileges.
    connect / as sysdba
    create user strava identified by strava;
    grant connect, resource to strava;
    grant create view to strava;
    grant select_catalog_role to strava;
    grant XDBADMIN to STRAVA;
    grant alter session to STRAVA;
    alter user strava quota unlimited on users;
    alter user strava default tablespace users;

    GRANT CREATE ANY DIRECTORY TO strava;
    CREATE OR REPLACE DIRECTORY strava as '/tmp/strava';
    CREATE OR REPLACE DIRECTORY activities as '/tmp/strava/activities';
    CREATE OR REPLACE DIRECTORY exec_dir AS '/usr/bin';

    GRANT READ, EXECUTE ON DIRECTORY exec_dir TO strava;
    GRANT READ, EXECUTE ON DIRECTORY strava TO strava;
    GRANT READ ON DIRECTORY activities TO strava;
    • I need to create database directories for both the CSV files in /tmp/strava and the various GPX files in the /tmp/strava/activities sub-directory.  I will need read privilege on both directories, and also execute privilege on the strava directory so that I can use a pre-processor script.
    • The exec_dir directory points to /usr/bin where the zip executables are located.  I need read and execute privilege on this so I can read directly from zipped files.
    • XDBADMIN: "Allows the grantee to register an XML schema globally, as opposed to registering it for use or access only by its owner. It also lets the grantee bypass access control list (ACL) checks when accessing Oracle XML DB Repository".

    Import CSV file via an External Table

I will start by creating an external table to read the Strava activities.csv file, and then copy it into a database table.  This is a simple comma-separated values file.  The activity date, name and description are enclosed in double-quotes.
The first problem that I encountered was that some of the descriptions I typed into Strava contain newline characters, and the external table interprets them as the end of the record even though these characters are inside the double-quotes.
    4380927517,"23 Nov 2020, 18:03:54",Zwift Crash Recovery,Virtual Ride,"Zwift Crash Recovery
    1. recover fit file per https://zwiftinsider.com/retrieve-lost-ride/,
    2. fix corrupt .fit file with https://www.fitfiletools.com",1648,13.48,,false,Other,activities/4682540615.gpx.gz,,10.0,1648.0,1648.0,13480.2001953125,13.199999809265137,
    8.179733276367188,91.0,36.20000076293945,12.600000381469727,69.5999984741211,7.099999904632568,0.40652215480804443,,,84.0,62.1943244934082,
    ,,,150.66201782226562,276.8444519042969,,,,,,,,,,,,158.0,1649.0,,,0.0,,1.0,,,,,,,,,,,,,,,,4907360.0,,,,,,,,,,,
As Chris Saxon points out on AskTom, it is necessary to pre-process the records to replace the newline characters with something else.  I found this awk script to process the records, so I put it into a shell script, nlfix.sh, made it executable, and invoked it as a pre-processor in the external table definition.
    #nlfix.sh
/usr/bin/gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' $*

    nlfix.sh
    • Note the full path for gawk is specified.
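The pre-processor must also be executable by the OS user that runs the database.  A quick way to test it standalone (a sketch, assuming the script and CSV file are both in /tmp/strava):
# Make the script executable, then check the first few fixed records.
chmod +x /tmp/strava/nlfix.sh
/tmp/strava/nlfix.sh /tmp/strava/activities.csv | head -3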
    A database directory is needed for the location of the pre-processor scripts and it is necessary to grant read and execute privileges on it.  I simply put the pre-processor in the same directory as the CSV file so I could use the same strava directory I created earlier.
    GRANT READ, EXECUTE ON DIRECTORY strava TO strava;
    Now I can define an external table that will read the activities.csv file. 
    CREATE TABLE strava.activities_ext
    (Activity_ID NUMBER
    ,Activity_Date DATE
    ,Activity_Name VARCHAR2(100)
    ,Activity_Type VARCHAR2(15)
    ,Activity_Description VARCHAR2(200)
    ,Elapsed_Time NUMBER
    ,Distance_km NUMBER
    …)
    ORGANIZATION EXTERNAL
    (TYPE ORACLE_LOADER
    DEFAULT DIRECTORY strava
    ACCESS PARAMETERS
    (RECORDS DELIMITED BY newline
    SKIP 1
    DISABLE_DIRECTORY_LINK_CHECK
    PREPROCESSOR strava:'nlfix.sh'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' RTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    NULLIF = BLANKS
    (Activity_ID,Activity_Date date "DD Mon yyyy,HH24:mi:ss"
    ,Activity_Name,Activity_Type,Activity_Description
    ,Elapsed_Time,Distance_km
    …))
    LOCATION ('activities.csv')
    ) REJECT LIMIT 5
    /
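Before copying the data, a simple query confirms that the access driver can parse the file.  This is just a sanity check; any rejected rows are written to the ORACLE_LOADER log and bad files in the same directory.
-- If rows are silently missing, check the .log and .bad files in /tmp/strava.
SELECT COUNT(*), MIN(activity_date), MAX(activity_date)
FROM   strava.activities_ext;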

    Import Activities

    Now I can simply copy from the external table to a regular table.  I have omitted a lot of columns that Strava does not populate (at least not in my export) but that appear in the CSV file.
    rem 1b_create_activities_ext.sql
    spool 1b_create_activities_ext

    CREATE TABLE strava.activities AS
    select ACTIVITY_ID,ACTIVITY_DATE,ACTIVITY_NAME,ACTIVITY_TYPE,ACTIVITY_DESCRIPTION,
    ELAPSED_TIME,DISTANCE_KM,RELATIVE_EFFORT,COMMUTE_CHAR,ACTIVITY_GEAR,
    FILENAME,
    ATHLETE_WEIGHT,BIKE_WEIGHT,ELAPSED_TIME2,MOVING_TIME,DISTANCE_M,MAX_SPEED,AVERAGE_SPEED,
    ELEVATION_GAIN,ELEVATION_LOSS,ELEVATION_LOW,ELEVATION_HIGH,MAX_GRADE,AVERAGE_GRADE,
    --AVERAGE_POSITIVE_GRADE,AVERAGE_NEGATIVE_GRADE,
    MAX_CADENCE,AVERAGE_CADENCE,
    --MAX_HEART_RATE,
    AVERAGE_HEART_RATE,
    --MAX_WATTS,
    AVERAGE_WATTS,CALORIES,
    --MAX_TEMPERATURE,AVERAGE_TEMPERATURE,
    RELATIVE_EFFORT2,
    TOTAL_WORK,
    --NUMBER_OF_RUNS,
    --UPHILL_TIME,DOWNHILL_TIME,OTHER_TIME,
    PERCEIVED_EXERTION,
    --TYPE,
    --START_TIME,
    WEIGHTED_AVERAGE_POWER,POWER_COUNT,
    PREFER_PERCEIVED_EXERTION,PERCEIVED_RELATIVE_EFFORT,
    COMMUTE,
    --TOTAL_WEIGHT_LIFTED,
    FROM_UPLOAD,
    GRADE_ADJUSTED_DISTANCE,
    --WEATHER_OBSERVATION_TIME,WEATHER_CONDITION,
    --WEATHER_TEMPERATURE,APPARENT_TEMPERATURE,
    --DEWPOINT,HUMIDITY,WEATHER_PRESSURE,
    --WIND_SPEED,WIND_GUST,WIND_BEARING,
    --PRECIPITATION_INTENSITY,
    --SUNRISE_TIME,SUNSET_TIME,MOON_PHASE,
    BIKE
    --GEAR,
    --PRECIPITATION_PROBABILITY,PRECIPITATION_TYPE,
    --CLOUD_COVER,WEATHER_VISIBILITY,UV_INDEX,WEATHER_OZONE,
    --JUMP_COUNT,TOTAL_GRIT,AVG_FLOW,
    --FLAGGED
    FROM strava.activities_ext
    /

    ALTER TABLE activities ADD CONSTRAINT activities_pk PRIMARY KEY (activity_id);

    ALTER TABLE activities ADD (gpx XMLTYPE) XMLTYPE COLUMN gpx STORE AS SECUREFILE BINARY XML (CACHE DISABLE STORAGE IN ROW);
ALTER TABLE activities ADD (geom mdsys.sdo_geometry);
ALTER TABLE activities ADD (geom_27700 mdsys.sdo_geometry);
ALTER TABLE activities ADD (mbr mdsys.sdo_geometry);
    ALTER TABLE activities ADD (xmlns VARCHAR2(128));
    ALTER TABLE activities ADD (num_pts INTEGER DEFAULT 0);

    Spool off
    • I have specified a primary key on activity_id and made a number of other columns not nullable.
    • I have added a new XMLTYPE column GPX into which I will load the GPS data in the .gpx files.  

    FIT files

Some applications, such as Garmin and Rouvy, generate compressed .fit files, and Strava exports them again (apparently if it can't convert them, although it can convert the .fit files from Zwift to .gpx).  These are binary files and, since I only have a few of them, I converted them to .gpx files using GPSBabel on my laptop, and then re-uploaded the .gpx files.
    for %i in (*.fit.gz) do "C:\Program Files\GnuWin\bin\gzip" -fd %i
    for %i in (*.fit) do "C:\Program Files (x86)\GPSBabel\GPSBabel.exe" -i garmin_fit -f "%i" -o gpx -F "%~ni".gpx
    I then update the file name in the activities table.
    UPDATE activities
    SET filename = REPLACE(filename,'.fit.gz','.gpx')
    WHERE filename like '%.fit.gz'
    /

    Compress GPX files (optional)

    Some of the GPX files in the Strava export are compressed and some are not.  There is no obvious reason why.  To minimise the space I can gzip the GPX files.
    gzip -9v /tmp/strava/activities/*.gpx
    If I do compress any .gpx files, then I also need to update the file names in the activities table.
UPDATE activities
SET filename = filename||'.gz'
WHERE filename like '%.gpx'
/

    Load the GPX files into the XML data type.

    The next stage is to load each of the GPX files into the activities table.  
create or replace package body strava_pkg as
k_module CONSTANT VARCHAR2(48) := $$PLSQL_UNIT;

----------------------------------------------------------------------------------------------------
function getClobDocument
(p_directory IN VARCHAR2
,p_filename  IN VARCHAR2
,p_charset   IN VARCHAR2 DEFAULT NULL
) return CLOB deterministic
is
  l_module     VARCHAR2(64);
  l_action     VARCHAR2(64);

  v_filename   VARCHAR2(128);
  v_directory  VARCHAR2(128);
  v_file       bfile;
  v_unzipped   blob := empty_blob();

  v_Content    CLOB := '';
  v_src_offset number := 1;
  v_dst_offset number := 1;
  v_charset_id number := 0;
  v_lang_ctx   number := DBMS_LOB.default_lang_ctx;
  v_warning    number;

  e_22288 EXCEPTION; --file or LOB operation FILEOPEN failed
  PRAGMA EXCEPTION_INIT(e_22288, -22288);
BEGIN
  dbms_application_info.read_module(module_name=>l_module
                                   ,action_name=>l_action);
  dbms_application_info.set_module(module_name=>k_module
                                  ,action_name=>'getClobDocument');

  IF p_charset IS NOT NULL THEN
    v_charset_id := NLS_CHARSET_ID(p_charset);
  END IF;

  v_filename  := REGEXP_SUBSTR(p_filename,'[^\/]+',1,2);
  v_directory := REGEXP_SUBSTR(p_filename,'[^\/]+',1,1);

  IF v_directory IS NOT NULL AND v_filename IS NULL THEN /*if only one parameter is passed, it is actually a filename*/
    v_filename  := v_directory;
    v_directory := '';
  END IF;

  IF p_directory IS NOT NULL THEN
    v_directory := p_directory;
  END IF;

  v_file := bfilename(UPPER(v_directory),v_filename);

  BEGIN
    DBMS_LOB.fileopen(v_file, DBMS_LOB.file_readonly);
  EXCEPTION
    WHEN VALUE_ERROR OR e_22288 THEN
      dbms_output.put_line('Can''t open:'||v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
      v_content := '';
      dbms_application_info.set_module(module_name=>l_module
                                      ,action_name=>l_action);
      RETURN v_content;
  END;

  IF v_filename LIKE '%.gz' THEN --gunzip the file, then convert the BLOB to a CLOB
    v_unzipped := utl_compress.lz_uncompress(v_file);
    dbms_lob.converttoclob(
      dest_lob     => v_content,
      src_blob     => v_unzipped,
      amount       => DBMS_LOB.LOBMAXSIZE,
      dest_offset  => v_dst_offset,
      src_offset   => v_src_offset,
      blob_csid    => dbms_lob.default_csid,
      lang_context => v_lang_ctx,
      warning      => v_warning);
  ELSE --load the uncompressed file straight into the CLOB
    DBMS_LOB.LOADCLOBFROMFILE(v_content,
      src_bfile    => v_file,
      amount       => DBMS_LOB.LOBMAXSIZE,
      src_offset   => v_src_offset,
      dest_offset  => v_dst_offset,
      bfile_csid   => v_charset_id,
      lang_context => v_lang_ctx,
      warning      => v_warning);
  END IF;

  dbms_output.put_line(v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
  DBMS_LOB.fileclose(v_file);

  dbms_application_info.set_module(module_name=>l_module
                                  ,action_name=>l_action);

  RETURN v_content;
EXCEPTION
  WHEN OTHERS THEN
    dbms_output.put_line(v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
    DBMS_LOB.fileclose(v_file);
    dbms_application_info.set_module(module_name=>l_module
                                    ,action_name=>l_action);
    RAISE;
end getClobDocument;
----------------------------------------------------------------------------------------------------

END strava_pkg;
/

I can simply query the contents of the uncompressed GPX file in SQL by calling the function.  In this case, the zipped .gpx file is 65KB, but it decompresses to 1.2MB.
    Set long 1000 lines 200 pages 99 serveroutput on
    Column filename format a30
    Column gpx format a100
    select activity_id, filename
    , getClobDocument('',filename) gpx
    from activities
    where filename like '%.gpx%'
and activity_id = 4468006769
    order by 1
    /


    ACTIVITY_ID FILENAME GPX
    ----------- ------------------------------ ----------------------------------------------------------------------------------------------------
    4468006769 activities/4468006769.gpx.gz <?xml version="1.0" encoding="UTF-8"?>
    <gpx creator="StravaGPX Android" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLoc
    ation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin
    .com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.gar
    min.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd
    " version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlsch
    emas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">
    <metadata>
    <time>2020-12-13T14:31:13Z</time>
    </metadata>
    <trk>
    <name>Loop</name>
    <type>1</type>
    <trkseg>
    <trkpt lat="51.5296380" lon="-0.1875360">
    <ele>30.6</ele>
    <time>2020-12-13T14:31:13Z</time>
    <extensions>
    <gpxtpx:TrackPointExtension>
    <gpxtpx:hr>57</gpxtpx:hr>
    </gpxtpx:TrackPointExtension>
    </extensions>
    </trkpt>


    activities/4468006769.gpx.gz - 1286238
    Elapsed: 00:00:00.14
    I can load the .gpx files into the GPX column of the activities table with a simple update statement.  The CLOB returned from the function is converted to an XML with XMLTYPE.
    UPDATE activities
    SET gpx = XMLTYPE(getClobDocument('ACTIVITIES',filename))
    WHERE filename like '%.gpx%'
    /
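A quick check that the update loaded a document for every GPX file (a minimal sketch):
-- Every row with a .gpx filename should now have a non-null GPX column.
SELECT COUNT(*) total, COUNT(gpx) loaded
FROM   activities
WHERE  filename like '%.gpx%';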
    I can now query back the same GPX from the database.
    Set long 1100 lines 200 pages 99 serveroutput on
    select activity_id, filename, gpx
    from activities
    where filename like '%.gpx%'
and activity_id = 4468006769
    order by 1
    /


    ACTIVITY_ID FILENAME GPX
    ----------- ------------------------------ ----------------------------------------------------------------------------------------------------
    4468006769 activities/4468006769.gpx.gz <?xml version="1.0" encoding="US-ASCII"?>
    <gpx creator="StravaGPX Android" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLoc
    ation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin
    .com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.gar
    min.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd
    " version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlsch
    emas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">
    <metadata>
    <time>2020-12-13T14:31:13Z</time>
    </metadata>
    <trk>
    <name>Loop</name>
    <type>1</type>
    <trkseg>
    <trkpt lat="51.5296380" lon="-0.1875360">
    <ele>30.6</ele>
    <time>2020-12-13T14:31:13Z</time>
    <extensions>
    <gpxtpx:TrackPointExtension>
    <gpxtpx:hr>57</gpxtpx:hr>
    </gpxtpx:TrackPointExtension>
    </extensions>
    </trkpt>
    <trkpt lat="51.5296350" lon="-0.1875340">

    Spatial Data 2: Convert GPX Track to a Spatial Line Geometry


    This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava.

    Having loaded my GPS tracks from GPX files into an XML type column, the next stage is to extract the track points and create a spatial geometry column.  

    Defining Spatial Geometries

Spatial objects are generically referred to as geometries.  When you define one, you have to specify what kind of geometry it is and what coordinate system you are using.  Later, when you compare geometries with each other, they have to use the same coordinate system; otherwise, Oracle will raise an error.  Fortunately, Oracle can convert between coordinate systems.

Various coordinate systems are used for geographical data; they are identified by EPSG Geodetic Parameter Dataset codes.  Oracle supports many coordinate systems.  As well as older definitions, it also has current definitions where the EPSG code matches the Spatial Reference ID (SDO_SRID).  They can be queried from SDO_COORD_REF_SYS.

I will use two different coordinate systems during this series of blogs:

    Set lines 150 pages 99
    Column coord_ref_sys_name format a35
    Column legacy_cs_bounds format a110
    select srid, coord_ref_sys_name, coord_ref_sys_kind, legacy_cs_bounds
    from SDO_COORD_REF_SYS where srid IN(4326, 27700)
    /

    SRID COORD_REF_SYS_NAME COORD_REF_SYS_KIND
    ---------- ----------------------------------- ------------------------
    LEGACY_CS_BOUNDS(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINATES)
    --------------------------------------------------------------------------------------------------------------
    4326 WGS 84 GEOGRAPHIC2D
    SDO_GEOMETRY(2003, 4326, NULL, SDO_ELEM_INFO_ARRAY(1, 1003, 3), SDO_ORDINATE_ARRAY(-180, -90, 180, 90))

    27700 OSGB 1936 / British National Grid PROJECTED

    • "The World Geodetic System (WGS) is a standard for use in cartography, geodesy, and satellite navigation including GPS". The latest revision is WGS 84 (also known as WGS 1984, EPSG:4326). It is the reference coordinate system used by the Global Positioning System (GPS).  Where I am dealing with longitude and latitude, specified in degrees, especially from GPS data, I need to tell Oracle that it is WGS84 by specifying SDO_SRID of 4326.
    • Later on, I will also be using data for Great Britain available from the Ordnance Survey that uses the Ordnance Survey National Grid (also known as British National Grid) reference system.  That requires SDO_SRID to be set to 27700.
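Converting between the two systems is straightforward with SDO_CS.TRANSFORM.  A minimal sketch, using the coordinates from the GPX sample shown later in this post:
-- Transform a WGS84 (SRID 4326) point to British National Grid (SRID 27700).
SELECT sdo_cs.transform(
         sdo_geometry(2001, 4326,
                      sdo_point_type(-0.1875360, 51.5296380, NULL),
                      NULL, NULL),
         27700) bng_point
FROM dual;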


    Creating Spatial Points

    I have found it useful to create a packaged function to convert longitude and latitude to a spatial data point.  It is a useful shorthand that I use in various places.

create or replace package body strava_pkg as
k_module CONSTANT VARCHAR2(48) := $$PLSQL_UNIT;

----------------------------------------------------------------------------------------------------
function make_point
(longitude in number
,latitude  in number
) return sdo_geometry deterministic is
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
  l_geom   sdo_geometry;
begin
  dbms_application_info.read_module(module_name=>l_module
                                   ,action_name=>l_action);
  dbms_application_info.set_module(module_name=>k_module
                                  ,action_name=>'make_point');

  if longitude is not null and latitude is not null then
    l_geom := sdo_geometry(
                2001, 4326,
                sdo_point_type(longitude, latitude, null),
                null, null);
  end if; --otherwise l_geom remains null

  --restore the caller's module and action before returning
  dbms_application_info.set_module(module_name=>l_module
                                  ,action_name=>l_action);
  return l_geom;
end make_point;
----------------------------------------------------------------------------------------------------
END strava_pkg;
/

    strava_pkg.sql

    There are two parameters to SDO_GEOMETRY that I always have to specify.

• The first parameter, SDO_GTYPE, describes the nature of the spatial geometry being defined.  Here it is 2001.  The 2 indicates that it is a 2-dimensional geometry, and the 1 indicates that it is a single point.  See SDO_GEOMETRY Object Type.
    • The second parameter, SDO_SRID, defines the coordinate system that I discussed above.  4326 indicates that I am working with longitude and latitude.
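For example, a quick check of the function from SQL (the coordinates are illustrative):

select strava_pkg.make_point(-0.1875, 51.5296) loc from dual
/

Given the definition above, this should return SDO_GEOMETRY(2001, 4326, SDO_POINT_TYPE(-.1875, 51.5296, NULL), NULL, NULL).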

    XML Namespace

    GPS data is often held in GPX or GPS Exchange Format.  This is an XML schema.  GPX has been the de-facto XML standard for the lightweight interchange of GPS data since the initial GPX 1.0 release in 2002.  The GPX 1.1 schema was released in 2004 (see https://www.topografix.com/gpx.asp).  

    Garmin has created an extension schema that holds additional athlete training information such as heart rate.

I can extract individual track points from a GPX with SQL using the extract() and extractvalue() functions.  However, I have GPX tracks that use both versions of the Topografix GPX schema (it depends upon which piece of software emitted the GPX file), and some that also use the Garmin extensions.

    Therefore, I need to register all three schemas with Oracle.  I can download the schema files with wget.

    cd /tmp/strava
    wget http://www.topografix.com/GPX/1/0/gpx.xsd --output-document=gpx0.xsd
    wget http://www.topografix.com/GPX/1/1/gpx.xsd
    wget https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd

Then I can register the files.

    delete from plan_table WHERE statement_id = 'XSD';
    insert into plan_table (statement_id, plan_id, object_name, object_alias)
    values ('XSD', 1, 'gpx0.xsd', 'http://www.topografix.com/GPX/1/0/gpx.xsd');
    insert into plan_table (statement_id, plan_id, object_name, object_alias)
    values ('XSD', 2, 'gpx.xsd', 'http://www.topografix.com/GPX/1/1/gpx.xsd');
    insert into plan_table (statement_id, plan_id, object_name, object_alias)
    values ('XSD', 3, 'TrackPointExtensionv1.xsd', 'https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd');

    DECLARE
    xmlSchema xmlType;
    res boolean;
    BEGIN
    FOR i IN (
    SELECT object_alias schemaURL
    , object_name schemaDoc
    FROM plan_table
    WHERE statement_id = 'XSD'
    ORDER BY plan_id
    ) LOOP
    --read xsd file
    xmlSchema := XMLTYPE(getCLOBDocument('STRAVA',i.schemaDoc,'AL32UTF8'));
    --if already exists delete XSD
    if (dbms_xdb.existsResource(i.schemaDoc)) then
    dbms_xdb.deleteResource(i.schemaDoc);
    end if;
    --create resource from XSD
    res := dbms_xdb.createResource(i.schemaDoc,xmlSchema);

    -- Delete existing schema
    dbms_xmlschema.deleteSchema(
    i.schemaURL
    );
    -- Now reregister the schema
    dbms_xmlschema.registerSchema(
    i.schemaURL,
    xmlSchema,
    TRUE,TRUE,FALSE,FALSE
    );
    END LOOP;
    End;
    /
    3a_register_xml_schema.sql

    Then I can query the registered schemas.

    Set pages 99 lines 160
    Column schema_url format a60
    Column qual_schema_url format a105
    select schema_url, local, hier_type, binary, qual_schema_url
    from user_xml_schemas
    /

    SCHEMA_URL LOC HIER_TYPE BIN
    ------------------------------------------------------------ --- ----------- ---
    QUAL_SCHEMA_URL
    ---------------------------------------------------------------------------------------------------------
    https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd YES CONTENTS NO
    http://xmlns.oracle.com/xdb/schemas/STRAVA/https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd

    http://www.topografix.com/GPX/1/0/gpx.xsd YES CONTENTS NO
    http://xmlns.oracle.com/xdb/schemas/STRAVA/www.topografix.com/GPX/1/0/gpx.xsd

    http://www.topografix.com/GPX/1/1/gpx.xsd YES CONTENTS NO
    http://xmlns.oracle.com/xdb/schemas/STRAVA/www.topografix.com/GPX/1/1/gpx.xsd

    Extracting GPS Track Points from GPX

    A GPS track is a list of points specifying at least time, longitude, latitude and often elevation.  I can extract all the points in a GPX as a set of rows.  However, I must specify the correct namespace for the specific GPX.

    Column time_string format a20
    SELECT g.activity_id
    , EXTRACTVALUE(VALUE(t), 'trkpt/time') time_string
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/ele')) ele
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/extensions/gpxtpx:TrackPointExtension/gpxtpx:hr'
    ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"')) hr
    FROM activities g,
    TABLE(XMLSEQUENCE(extract(g.gpx,'/gpx/trk/trkseg/trkpt'
    ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"'
    ))) t
    Where activity_id IN(4468006769)
    And rownum <= 10
    /

    Activity
    ID TIME_STRING LAT LNG ELE HR
    ---------- -------------------- ------------- ------------- ------- ----
    4468006769 2020-12-13T14:31:13Z 51.52963800 -.18753600 30.6 57
    2020-12-13T14:31:14Z 51.52963500 -.18753400 30.6 57
    2020-12-13T14:31:15Z 51.52964100 -.18753100 30.6 57
    2020-12-13T14:31:16Z 51.52964000 -.18752900 30.6 57
    2020-12-13T14:31:17Z 51.52963600 -.18752700 30.6 57
    2020-12-13T14:31:18Z 51.52963200 -.18752700 30.6 57
    2020-12-13T14:31:19Z 51.52962900 -.18752800 30.6 57
    2020-12-13T14:31:20Z 51.52962800 -.18752800 30.6 57
    2020-12-13T14:31:21Z 51.52962800 -.18752900 30.6 57
    2020-12-13T14:31:22Z 51.52962800 -.18753000 30.6 57

    I can use this approach to extract all the points from a GPS track and create a spatial line geometry.  I have put the whole process into a packaged procedure strava_pkg.load_activity.

First, I need to work out which version of the Topografix schema is in use, so I try extracting an attribute of the root gpx element (the version attribute, in the snippet below) with each namespace and see which one is not null.


    IF l_num_rows > 0 THEN
    UPDATE activities
    SET gpx = XMLTYPE(l_gpx), geom = null, geom_27700 = null, num_pts = 0, xmlns = NULL
    WHERE activity_id = p_activity_id
    RETURNING extractvalue(gpx,'/gpx/@version', 'xmlns="http://www.topografix.com/GPX/1/0"')
    , extractvalue(gpx,'/gpx/@version', 'xmlns="http://www.topografix.com/GPX/1/1"')
    INTO l_xmlns0, l_xmlns1;
    l_num_rows := SQL%rowcount;
    END IF;

Now I can extract all the points in a GPX as a set of rows and put them into a spatial geometry.  Each track point has two ordinates, so I turn each row into two rows, one per ordinate; note that longitude is listed before latitude for each point.  I convert the rows into a list using multiset() and finally cast that as a spatial ordinate array.

    Note that the SDO_GTYPE is 2002 (rather than 2001) because it is a line (rather than a single point) on a two-dimensional coordinate system.

      BEGIN
    UPDATE activities a
    SET geom = mdsys.sdo_geometry(2002,4326,null,mdsys.sdo_elem_info_array(1,2,1),
    cast(multiset(
    select CASE n.rn WHEN 1 THEN pt.lng WHEN 2 THEN pt.lat END ord
    from (
    SELECT rownum rn
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) as lng
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) as lat
    FROM TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt', 'xmlns="http://www.topografix.com/GPX/1/1"'))) t
    ) pt,
    (select 1 rn from dual union all select 2 from dual) n
    order by pt.rn, n.rn
    ) AS mdsys.sdo_ordinate_array))
    , xmlns = 'xmlns="http://www.topografix.com/GPX/1/1"'
    WHERE a.gpx IS NOT NULL
    And activity_id = p_activity_id;
    l_num_rows := SQL%rowcount;
    EXCEPTION
    WHEN e_13034 OR e_29877 THEN
    dbms_output.put_line('Exception:'||sqlerrm);
    l_num_rows := 0;
    END;

    I have found it helpful to simplify the line geometry with sdo_util.simplify(). It removes some of the noise in the GPS data and has resolved problems with calculating the length of lines that intersect with areas.

      BEGIN
    UPDATE activities
    SET geom = sdo_util.simplify(geom,1)
    WHERE geom IS NOT NULL
    And activity_id = p_activity_id;
    l_num_rows := SQL%rowcount;
    EXCEPTION
    WHEN e_13034 THEN
    dbms_output.put_line('Exception:'||sqlerrm);
    END;

    There are a few other fields I also update at this point.  You will see me use them later.

    • NUM_PTS is the number of points in the line geometry.  
    • GEOM_27700 is the result of converting the line to British National Grid reference coordinates.  This helps when comparing it to British boundary data obtained from the Ordnance Survey or other government agencies.
• MBR is the minimum bounding rectangle for the line.  This is generated to improve the performance of some spatial queries.  I have found that some of the spatial operators that calculate intersections between geometries are quite slow and CPU-intensive when applied to GPS tracks and boundary data that both have lots of points.  SDO_GEOM.SDO_MBR simply returns 4 ordinates that define the bounding rectangle.  This can be used as a cheap pre-filter to find geometries that might match before doing a proper comparison.

      UPDATE activities 
    SET num_pts = SDO_UTIL.GETNUMVERTICES(geom)
    , geom_27700 = sdo_cs.transform(geom,27700)
    , mbr = sdo_geom.sdo_mbr(geom)
    WHERE geom IS NOT NULL
    And activity_id = p_activity_id
    RETURNING num_pts INTO l_num_pts;
    dbms_output.put_line('Activity ID:'||p_activity_id||', '||l_num_pts||' points');

    Now I can load each GPX and process it into a spatial geometry in one step.  I can process all of the activities in a simple loop.

    set serveroutput on timi on
    exec strava_pkg.load_activity(4468006769);
    Loading Activity: 4468006769
    ACTIVITIES/4468006769.gpx.gz - 1286238 bytes
    xmlns 1=StravaGPX Android
    Activity ID:4468006769, 998 points

    PL/SQL procedure successfully completed.

    Elapsed: 00:00:01.41

    Now my Strava activities are all in spatial geometries and I can start to do some spatial processing.

    Spatial Data 3. Analyse a track in proximity to a GPS route

    This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava.

    Swain's Lane, Highgate
    Now I have loaded some data, I am going to start to do something useful with it.  I go out on my bike most mornings, and I usually ride up Swain's Lane in Highgate three times.  How long did each one take?  Over time, have I got faster or slower?

    I need a definition of Swain's Lane that I can compare to.  I will start by drawing a route with my favourite GPS software.  A route is just a sequence of route points.  I can then export that as a GPX file.

    <?xml version="1.0" encoding="UTF-8"?>
    <gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="ViewRanger - //www.viewranger.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
    <rte>
    <name><![CDATA[Swain's World]]></name>
    <rtept lat="51.569613039632" lon="-0.14770468632509"></rtept>
    <rtept lat="51.569407978151" lon="-0.14832964102552"></rtept>
    <rtept lat="51.567090552402" lon="-0.14674177328872"></rtept>
    <rtept lat="51.567080548869" lon="-0.14592101733016"></rtept>
    <rtept lat="51.569618041121" lon="-0.14773419062425"></rtept>
    </rte>
    </gpx>

    Geometries Table

    I will load the GPX route into a table much as I did with the track files. 
    drop table my_geometries purge;

create table my_geometries
    (geom_id NUMBER NOT NULL
    ,descr VARCHAR2(64)
    ,gpx XMLTYPE
    ,geom mdsys.sdo_geometry
    ,geom_27700 mdsys.sdo_geometry
    ,mbr mdsys.sdo_geometry
    ,constraint my_geometries_pk PRIMARY KEY (geom_id)
    )
    XMLTYPE COLUMN gpx STORE AS SECUREFILE BINARY XML (CACHE DISABLE STORAGE IN ROW)
    /
    The difference is that I have a series of route points instead of track points, so the paths in extract() and extractvalue() are slightly different.
    delete from my_geometries where geom_id = 2;
    INSERT INTO my_geometries (geom_id, descr, gpx)
    VALUES (2,'Swains World Route', XMLTYPE(strava_pkg.getClobDocument('STRAVA','swainsworldroute.gpx')));

    UPDATE my_geometries
    SET geom = mdsys.sdo_geometry(2002,4326,null,mdsys.sdo_elem_info_array(1,2,1),
    cast(multiset(
    select CASE n.rn WHEN 1 THEN pt.lng WHEN 2 THEN pt.lat END ord
    from (
    SELECT /*+MATERIALIZE*/ rownum rn
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lon')) as lng
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lat')) as lat
    FROM my_geometries g,
    TABLE(XMLSEQUENCE(extract(g.gpx,'/gpx/rte/rtept','xmlns="http://www.topografix.com/GPX/1/1"'))) t
    where g.geom_id = 2
    ) pt,
    (select 1 rn from dual union all select 2 from dual) n
    order by pt.rn, n.rn
    ) AS mdsys.sdo_ordinate_array))
    WHERE gpx IS NOT NULL
    AND geom IS NULL
    /
    UPDATE my_geometries
    SET mbr = sdo_geom.sdo_mbr(geom)
    , geom_27700 = sdo_cs.transform(geom,27700)
    /

    Commit;
    Set pages 99 lines 180
    Select geom_id, descr, gpx, geom
    from my_geometries
    where geom_id = 2;

    GEOM_ID DESCR
    ---------- ----------------------------------------------------------------
    GPX
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    GEOM(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINATES)
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2 Swains World Route
    <?xml version="1.0" encoding="US-ASCII"?>
    <gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="ViewRanger - //www.viewranger.com" xml
    SDO_GEOMETRY(2002, 4326, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(-.14651114, 51.5670769, -.14649237, 51.567298, -.1465782, 51.567563, -.14680618, 51.5680165, -.14697
    516, 51.5682533, -.14754379, 51.5688701, -.14807219, 51.5694887))

    I am going to build spatial indexes on the geometry columns, so I need to define the upper and lower bound values on the coordinates.
    delete from user_sdo_geom_metadata where table_name = 'MY_GEOMETRIES';
    insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
    values (
    'MY_GEOMETRIES' , 'GEOM_27700',
    sdo_dim_array(
    sdo_dim_element('Easting',-1000000,1500000,0.05),
    sdo_dim_element('Northing', -500000,2000000,0.05)),
    27700);
    insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
    values (
    'MY_GEOMETRIES' , 'GEOM',
    sdo_dim_array(
    sdo_dim_element('Longitude',-180,180,0.05),
sdo_dim_element('Latitude',-90,90,0.05)),
    4326);
    insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
    values (
    'MY_GEOMETRIES' , 'MBR',
    sdo_dim_array(
    sdo_dim_element('Longitude',-180,180,0.05),
sdo_dim_element('Latitude',-90,90,0.05)),
    4326);
    commit;

    CREATE INDEX my_geometries_geom ON my_geometries (geom) INDEXTYPE IS MDSYS.SPATIAL_INDEX_v2;
    CREATE INDEX my_geometries_geom_27700 ON my_geometries (geom_27700) INDEXTYPE IS MDSYS.SPATIAL_INDEX_v2;
    CREATE INDEX my_geometries_mbr ON my_geometries (mbr) INDEXTYPE IS MDSYS.SPATIAL_INDEX_v2;

    Compare Geometries

Now I can compare my Swain's Lane geometry to my activity geometries.  Let's start by looking for rides in December 2020 that went up Swain's Lane.
    Column activity_id heading 'Activity|ID'
    Column activity_name format a30
    Column geom_relate heading 'geom|relate' format a6
    With a as (
    SELECT a.activity_id, a.activity_date, a.activity_name
    , SDO_GEOM.RELATE(a.geom,'anyinteract',g.geom,25) geom_relate
    FROM activities a
    , my_geometries g
    WHERE a.activity_type = 'Ride'
    --And a.activity_id IN(4468006769)
    And a.activity_date >= TO_DATE('01122020','DDMMYYYY')
    and g.geom_id = 2 /*Swains World Route*/
    )
    Select *
    From a
    Where geom_relate = 'TRUE'
    Order by activity_date
    /

Where there is a relation between the two geometries, I have a hit.
      Activity                                                    geom
    ID ACTIVITY_DATE ACTIVITY_NAME relate
    ---------- ------------------- ------------------------------ ------
    4419821750 08:44:45 02.12.2020 Loop TRUE
    4428307816 10:49:25 04.12.2020 Loop TRUE
    4431920358 09:41:13 05.12.2020 Loop TRUE

    4528825613 09:39:38 28.12.2020 Loop TRUE
    4534027888 11:29:45 29.12.2020 Loop TRUE
    4538488655 09:57:55 30.12.2020 Loop TRUE

    25 rows selected.

    Analyse Individual Efforts

    Now I want to analyse each of my trips up Swain's Lane on a particular day.  I am going to work with the GPX rather than the spatial geometry because I am interested also in time, elevation and heart rate data that is not stored in the spatial geometry.
    Also, you can't use analytic functions on spatial geometries.
    with x as (
    SELECT activity_id
    , TO_DATE(EXTRACTVALUE(VALUE(t), 'trkpt/time'),'YYYY-MM-DD"T"HH24:MI:SS"Z"') time
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
    FROM activities a,
    TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
    WHERE a.activity_id IN(4468006769)
    ), y as (
    select x.*, strava_pkg.make_point(lng,lat) loc
    from x
    )
    select lag(loc,1) over (partition by activity_id order by time) last_loc
    from y
    /

    select lag(loc,1) over (partition by activity_id order by time) last_loc
    *
    ERROR at line 13:
    ORA-22901: cannot compare VARRAY or LOB attributes of an object type

    Instead, I will have to apply analytic functions to the values extracted from the GPX and then create a spatial point.  Thus I will be able to calculate the length of each individual trip by aggregating the distance between each pair of points.
    The following query splits out each trip up Swain's Lane in a particular activity and shows the distance, duration, and metrics about elevation, gradient, and heart rate. 
    alter session set statistics_level=ALL;
    alter session set nls_date_Format = 'hh24:mi:ss dd.mm.yyyy';
    break on activity_id skip 1
    compute sum of sum_dist on activity_id
    compute sum of num_pt on activity_id
    compute sum of sum_secs on activity_id
    Set lines 180 pages 50 timi on
    Column activity_id heading 'Activity|ID'
    Column activity_name format a15
    column time format a20
    column lat format 999.99999999
    column lng format 999.99999999
    column ele format 9999.9
    column hr format 999
    column sdo_relate format a10
    column num_pts heading 'Num|Pts' format 99999
    column sum_dist heading 'Dist.|(km)' format 999.999
    column sum_secs heading 'Secs' format 9999
    column avg_speed heading 'Avg|Speed|(kmph)' format 99.9
    column ele_gain heading 'Ele|Gain|(m)' format 9999.9
    column ele_loss heading 'Ele|Loss|(m)' format 9999.9
    column avg_grade heading 'Avg|Grade|%' format 99.9
    column min_ele heading 'Min|Ele|(m)' format 999.9
    column max_ele heading 'Max|Ele|(m)' format 999.9
    column avg_hr heading 'Avg|HR' format 999
    column max_hr heading 'Max|HR' format 999
    WITH geo as ( /*route geometry to compare to*/
select /*+MATERIALIZE*/ g.*, 25 tol
    , sdo_geom.sdo_length(geom, unit=>'unit=m') geom_length
    from my_geometries g
    where geom_id = 2 /*Swains World Route*/
    ), a as ( /*extract all points in activity*/
    SELECT a.activity_id, g.geom g_geom, g.tol, g.geom_length
    , TO_DATE(EXTRACTVALUE(VALUE(t), 'trkpt/time'),'YYYY-MM-DD"T"HH24:MI:SS"Z"') time
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/ele')) ele
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/extensions/gpxtpx:TrackPointExtension/gpxtpx:hr'
    ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"')) hr
    FROM activities a,
    geo g,
    TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt'
    ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"'))) t
    Where a.activity_id IN(4468006769)
    and SDO_GEOM.RELATE(a.geom,'anyinteract',g.geom,g.tol) = 'TRUE' /*activity has relation to reference geometry*/
    ), b as ( /*smooth elevation*/
    Select a.*
    , avg(ele) over (partition by activity_id order by time rows between 2 preceding and 2 following) avg_ele
    From a
    ), c as ( /*last point*/
    Select b.*
    , row_number() over (partition by activity_id order by time) seq
    , lag(time,1) over (partition by activity_id order by time) last_time
    , lag(lat,1) over (partition by activity_id order by time) last_lat
    , lag(lng,1) over (partition by activity_id order by time) last_lng
    --, lag(ele,1) over (partition by activity_id order by time) last_ele
    , lag(avg_ele,1) over (partition by activity_id order by time) last_avg_ele
    From b
    ), d as ( /*make points*/
    SELECT c.*
    , strava_pkg.make_point(lng,lat) loc
    , strava_pkg.make_point(last_lng,last_lat) last_loc
    FROM c
), e as ( /*distance from previous point, and proximity to the route*/
    select d.*
    , 86400*(time-last_time) secs
    , avg_ele-last_avg_ele ele_diff
    , sdo_geom.sdo_distance(loc,last_loc,0.05,'unit=m') dist
    , SDO_GEOM.RELATE(loc,'anyinteract', g_geom, tol) sdo_relate
    FROM d
    ), f as (
    select e.*
    , CASE WHEN sdo_relate != lag(sdo_relate,1) over (partition by activity_id order by time) THEN 1 END sdo_diff
    from e
    ), g as (
    select f.*
    , SUM(sdo_diff) over (partition by activity_id order by time range between unbounded preceding and current row) sdo_seq
    from f
    where sdo_relate = 'TRUE'
    )
    select activity_id, min(time), max(time)
    , sum(dist)/1000 sum_dist
    , sum(secs) sum_secs
    , 3.6*sum(dist)/sum(secs) avg_speed
    , sum(greatest(0,ele_diff)) ele_gain
    , sum(least(0,ele_diff)) ele_loss
    , 100*sum(ele_diff*dist)/sum(dist*dist) avg_grade
    , min(ele) min_ele
    , max(ele) max_ele
    , sum(hr*secs)/sum(secs) avg_Hr
    , max(hr) max_hr
    , count(*) num_pts
    from g
    group by activity_id, sdo_seq, g.geom_length
    having sum(dist)>= g.geom_length/2 /*make sure line we find is longer than half route to prevent fragmentation*/
    order by 2
    /
    select * from table(dbms_xplan.display_cursor(null,null,'ADVANCED +IOSTATS -PROJECTION +ADAPTIVE'))
    /
    4a_1swains.sql
    • In subquery a, I compare the geometry of the activity with the geometry of Swain's Lane using sdo_geom.relate() to confirm that the activity includes Swain's Lane, but then I extract all the points in the activity GPX.
    • GPS is optimised for horizontal accuracy.  Even so, the tolerance for determining whether the track is close to the route has to be set to 25m to allow for noise in the data (Swain's Lane is tree-lined, and has walls on both sides, that both attenuate the GPS signal).  GPS elevation data is notorious for being noisy even under good conditions; you can see this in the variation of height gained on each ascent.  Sub-query b calculates an average elevation across 5 track points (up to +/-2 points).  
• I need to compare each point in the track to the previous point so I can do some calculations and determine when the track comes into proximity with the Swain's Lane route.  Subquery c uses analytic functions to determine the previous point.  It is not possible to apply an analytic function to a geometry.
    • Subquery e determines whether a track point is in proximity to the route.  The tolerance, 25m, is set in subquery geo.  Then subquery f flags where the track point is in proximity to the route and the previous one was not.  Finally, subquery g maintains a running total of the number of times the track has gone close enough to the route.  That becomes a sequence number for each ascent of Swain's Lane by which I can group the subsequent analytics.
                                                                         Avg     Ele     Ele   Avg    Min    Max
    Activity Dist. Speed Gain Loss Grade Ele Ele Avg Max Num
    ID MIN(TIME) MAX(TIME) (km) Secs (kmph) (m) (m) % (m) (m) HR HR Pts
    ---------- ------------------- ------------------- -------- ----- ------ ------- ------- ----- ------ ------ ---- ---- -----
    4468006769 14:55:51 13.12.2020 14:58:17 13.12.2020 .372 147 9.1 36.1 .0 8.6 86.8 122.7 141 153 147
    15:08:13 13.12.2020 15:10:28 13.12.2020 .374 136 9.9 36.2 .0 8.4 86.8 122.8 147 155 136
    15:22:49 13.12.2020 15:25:18 13.12.2020 .369 150 8.9 36.2 .0 8.2 86.8 122.7 147 155 150
    ********** -------- -----
    sum 1.116 433
    On my laptop, this query takes about 10s, of which about 8s is spent on the window sort for the analytic functions, and 2s is spent working out whether the track points are in proximity to the route.
    Plan hash value: 3042349692

    ---------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 2 | | | | 5147 (100)| | 6 |00:00:20.94 | 392 |
    | 1 | SORT ORDER BY | | 2 | 1 | 104 | | 5147 (1)| 00:00:01 | 6 |00:00:20.94 | 392 |
    |* 2 | FILTER | | 2 | | | | | | 6 |00:00:20.94 | 392 |
    | 3 | HASH GROUP BY | | 2 | 1 | 104 | | 5147 (1)| 00:00:01 | 6 |00:00:20.94 | 392 |
    | 4 | VIEW | | 2 | 8168 | 829K| | 5144 (1)| 00:00:01 | 866 |00:00:20.93 | 392 |
    | 5 | WINDOW SORT | | 2 | 8168 | 16M| 21M| 5144 (1)| 00:00:01 | 866 |00:00:20.93 | 392 |
    |* 6 | VIEW | | 2 | 8168 | 16M| | 1569 (1)| 00:00:01 | 866 |00:00:20.93 | 392 |
    | 7 | WINDOW SORT | | 2 | 8168 | 1403K| 1688K| 1569 (1)| 00:00:01 | 10208 |00:00:09.63 | 392 |
    | 8 | VIEW | | 2 | 8168 | 1403K| | 1252 (1)| 00:00:01 | 10208 |00:00:06.22 | 392 |
    | 9 | WINDOW SORT | | 2 | 8168 | 1021K| 1248K| 1252 (1)| 00:00:01 | 10208 |00:00:06.20 | 392 |
    | 10 | VIEW | | 2 | 8168 | 1021K| | 1016 (1)| 00:00:01 | 10208 |00:00:06.05 | 392 |
    | 11 | WINDOW SORT | | 2 | 8168 | 4546K| 5040K| 1016 (1)| 00:00:01 | 10208 |00:00:00.76 | 392 |
    | 12 | NESTED LOOPS | | 2 | 8168 | 4546K| | 31 (0)| 00:00:01 | 10208 |00:00:00.41 | 392 |
    | 13 | NESTED LOOPS | | 2 | 1 | 560 | | 2 (0)| 00:00:01 | 2 |00:00:00.03 | 104 |
    | 14 | TABLE ACCESS BY INDEX ROWID| MY_GEOMETRIES | 2 | 1 | 112 | | 1 (0)| 00:00:01 | 2 |00:00:00.01 | 4 |
    |* 15 | INDEX UNIQUE SCAN | MY_GEOMETRIES_PK | 2 | 1 | | | 0 (0)| | 2 |00:00:00.01 | 2 |
    |* 16 | TABLE ACCESS BY INDEX ROWID| ACTIVITIES | 2 | 1 | 448 | | 1 (0)| 00:00:01 | 2 |00:00:00.03 | 100 |
    |* 17 | INDEX UNIQUE SCAN | ACTIVITIES_PK | 2 | 1 | | | 0 (0)| | 2 |00:00:00.01 | 4 |
    | 18 | XPATH EVALUATION | | 2 | | | | | | 10208 |00:00:00.37 | 288 |
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - filter(SUM("DIST")>="G"."GEOM_LENGTH"/2)
    6 - filter("SDO_RELATE"='TRUE')
    15 - access("GEOM_ID"=2)
    16 - filter(("A"."ACTIVITY_TYPE"='Ride' AND "SDO_GEOM"."RELATE"("A"."GEOM",'anyinteract',"G"."GEOM",25)='TRUE'))
    17 - access("A"."ACTIVITY_ID"=4468006769)
    I can apply this approach to all my trips up Swain's Lane.  However, I have logged 1115 ascents, and if I attempt to process them in a single SQL query I will have to do some very large window sorts that will spill out of memory (at least they will on my machine).  Instead, it is faster to process each activity separately in a PL/SQL loop (see 4b_allswains2.sql).
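A minimal sketch of such a loop, assuming a hypothetical procedure strava_pkg.swains_analysis that runs the analysis query above for a single activity and inserts the results into a table (the published 4b_allswains2.sql differs in detail):

BEGIN
FOR i IN (
SELECT a.activity_id
FROM activities a
WHERE a.activity_type = 'Ride'
ORDER BY a.activity_date
) LOOP
strava_pkg.swains_analysis(i.activity_id); --hypothetical single-activity version of the query above
COMMIT; --commit the results of each activity as it is processed
END LOOP;
END;
/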
I now have a table containing all of my ascents of Swain's Lane, and I can see if I am getting faster or slower.  I simply dumped the data into Excel with SQL Developer.
    Unfortunately, I have discovered that I am not going faster!


    Spatial Data 4: Obtaining Geographical Data


    This blog is part of a series about my first steps in using Spatial Data in the Oracle database.  I am using the GPS data from my cycling activities collected by Strava. All of my files are available on GitHub.

The next stage is to use my Strava data as a resource for ride planning.  For example, if I want to go for a ride in the Chilterns this weekend, I want to look at previous rides in the Chilterns to see where I have gone.  This presents a number of challenges that I will cover over the next few blogs.

    • I need a working definition of the Chilterns.  
    • I need to identify which activities entered the area defined as being the Chilterns.  

    More generically, I might be interested in any area in any country.  I need to be able to search for areas by name, then identify the activities that passed through these areas.

    Geographical Areas

The world is divided up into 206 sovereignties (including independent and leased areas), and those are then subdivided.  Let's take the United Kingdom as an example:

    United Kingdom

    .England
    .Northern Ireland
    .Scotland
    .Wales

    .Guernsey
    ..Alderney
    ..Guernsey
    ..Herm
    ..Sark
    .Isle of Man
    .Jersey

    .Anguilla
    .Bermuda
    .Cayman Islands
    .Dhekelia Sovereign Base Area
    .Falkland Islands
    .Gibraltar
    .British Indian Ocean Territory
    ..Diego Garcia Naval Support Facility
    .Montserrat
    .Pitcairn Islands
    .South Georgia and the Islands
    ..South Georgia
    ..South Sandwich Islands
    .Saint Helena
    ..Ascension
    ..Saint Helena
    ..Tristan da Cunha
    .Turks and Caicos Islands
    .British Virgin Islands
.Akrotiri Sovereign Base Area

    • The United Kingdom consists of the 4 'home' countries.
      • These are divided down into counties, authorities, districts, boroughs, wards and parishes.
    • Guernsey, Jersey and the Isle of Man are "Crown Dependencies".
    • There are 14 dependent territories
      • Some of these are broken down further into separate islands.

    I need enough areas to allow me to effectively search areas by name and then determine which activities are in which areas.

To return to the original question, the Chiltern Hills are not a government administrative area but are designated as an Area of Outstanding Natural Beauty (AONB).  As they are a useful shorthand for some of the areas where I regularly cycle, I have included them in the hierarchy.

    Loading Spatial Data from Esri Shapefile

Lots of geographical data is publicly available from a variety of organisations and governments in the form of shapefiles.  This is "Esri's somewhat open, hybrid vector data format using SHP, SHX and DBF files. Originally invented in the early 1990s, it is still commonly used as a widely supported interchange format".  Oracle provides a Java shapefile converter that transforms shapefiles into database tables.


Shapefiles are zip archives that contain a number of files; the first three below are always present:

    • .shp - the main file that contains the geometry itself,
    • .shx - an index file,
    • .dbf - a DBase file containing other attributes to describe the spatial data.  When you load the shapefile, the DBF file is loaded into all the other columns in the table.  This file can be opened with Microsoft Excel so you can see the data,
    • .prj - contains the projection description of the data,
• .csv - the same data as in the .dbf file, but as a comma-separated data file,
    • .cfg - the code page of the data in the .dbf file.

A little searching with Google turned up a number of useful sources of publicly available spatial data (although most of it requires a licence for commercial use).

    Most of the shapefiles provide data in latitude/longitude in WGS84 that corresponds to SRID 4326.  However, the data from the UK government and the Ordnance Survey uses the British National Grid (BNG) GCS_OSGB_1936.  This corresponds to SRID 27700 (see Convert GPX Track to a Spatial Line Geometry).

By default, the shapefile converter creates geometries in the coordinate system provided by the shapefile.  It is possible to specify a different coordinate system at load time; however, converting the data significantly slows the load process (in my experience, by a factor of approximately 5).

    The spatial data is loaded into a geometry column called geom by default.  However, the column name can be specified.

Later, when it comes to comparing spatial data, you can only compare geometries that have the same SRID.  Therefore, it is important to know the coordinate system of the data with which you are dealing.  My convention is to put WGS84 (SRID 4326) data into columns called geom, and British National Grid data into columns called geom_27700.  I load data in the coordinate system of the shapefile.  Later on, I may add additional columns and copy and convert the data, as sketched below.
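For example, assuming a hypothetical table os_boundaries loaded from an Ordnance Survey shapefile with its geometry in a geom_27700 column (SRID 27700), I can add and populate a WGS84 column afterwards:

alter table os_boundaries add (geom mdsys.sdo_geometry);

update os_boundaries
set geom = sdo_cs.transform(geom_27700, 4326)
where geom_27700 is not null;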

    I have written a simple shell script (load_shapes.sh) to call the java shapefile converter, including controlling the SRID and the name of the table and the geometry column.

#load_shapes.sh
function echodo { echo "$*"; eval "$*"; } #echo the command, then execute it (helper assumed)

function shp_load {
echo $0:$*
#derive directory, base file name, table, SRID and geometry column from the arguments
#(this derivation is assumed; the published script on GitHub may differ)
dir=$(dirname $1)
base=$(basename $1 .shp)
table=${2:-$base}
srid=${3:-4326}
col=${4:-geom}

cd $dir
pwd
export clpath=$ORACLE_HOME/suptools/tfa/release/tfa_home/jlib/ojdbc5.jar:$ORACLE_HOME/md/jlib/sdoutl.jar:$ORACLE_HOME/md/jlib/sdoapi.jar
echodo "java -cp $clpath oracle.spatial.util.SampleShapefileToJGeomFeature -h oracle-database.local -p 1521 -sn oracle_pdb -u strava -d strava -t $table -f $base -r $srid -g ${col}"
}

    clear
    #set -x

    shp_load /tmp/strava/ne_10m_admin_0_sovereignty.shp
    shp_load /tmp/strava/ne_10m_admin_0_map_units
    shp_load /tmp/strava/ne_10m_admin_0_map_subunits

    I can now load each shapefile into a separate table.  

    Merging Shapefile Data into a Single Set of Data

    The various tables created by loading shapefiles will each have their own structures determined by what was put into the shapefile. Ultimately, I am going to load them all into a single table with which I will work.  

Areas form a hierarchy, represented in this table by a linked list from area code and number to parent area code and number.  Foreign key constraints ensure the parent values are valid.  There are also check constraints to prevent an area from being its own parent.

    REM my_areas_ddl.sql

    CREATE TABLE my_areas
    (area_Code varchar2(4) NOT NULL
    ,area_number integer NOT NULL
    ,uqid varchar2(20) NOT NULL

    ,area_level integer NOT NULL
    ,parent_area_code varchar2(4)
    ,parent_area_number integer
    ,parent_uqid varchar2(20)
    ,name varchar2(40)
    ,suffix varchar2(20)
    ,iso_code3 varchar2(3)

    ,num_children integer
    ,matchable integer default 1

    ,geom mdsys.sdo_geometry
    ,geom_27700 mdsys.sdo_geometry
    ,mbr mdsys.sdo_geometry
    ,constraint my_areas_pk primary key (area_code, area_number)
    ,constraint my_areas_uqid unique (uqid)
    ,constraint my_areas_rfk_area_code foreign key (parent_area_code, parent_area_number) references my_areas (area_code, area_number)
    ,constraint my_areas_rfk_uqid foreign key (parent_uqid) references my_areas (uqid)
    ,constraint my_areas_fk_area_code foreign key (area_code) references my_area_codes (area_code)
    ,constraint my_areas_check_parent_area_code CHECK (area_code != parent_area_code OR area_number != parent_area_number)
    ,constraint my_areas_check_parent_uqid CHECK (uqid != parent_uqid)
    )
    /
    --alter table my_areas modify matchable default 1;
    Alter table my_areas add constraint my_areas_uq_iso_code3 unique (iso_code3);
    Create index my_areas_rfk_uqid on my_areas(parent_uqid);
    Create index my_areas_rfk_area_code on my_areas (parent_area_code, parent_area_number);

    I have created scripts to populate data in the my_areas table from the Natural Earth data, and from the data for each country.  Different scripts are needed for each shapefile.

    • load_countries.sql - to load Natural Earth data
    • load_uk.sql - to load Ordnance Survey data of Great Britain.  This includes some DML to work out which wards and parishes are in which districts and boroughs and update the hierarchy accordingly.
    • load_XXX.sql, - load administrative areas for a country where XXX is the 3-letter ISO code for that country (eg. load_FRA.sql for France).
    • fix_names.sql - to simplify names stripping common suffixes such as a county, district, authority, ward etc.
  • fix_my_areas.sql - script to collect statistics, count the children of each area, look for areas that are children of another area with the same name, and simplify areas with more than 10,000 points.

    Spatial Data 5: Searching For Geometries That Intersect Other Geometries


    This blog is part of a series about my first steps in using Spatial Data in the Oracle database.  I am using the GPS data from my cycling activities collected by Strava. All of my files are available on GitHub.

    I have loaded basic data for all countries, and detailed data for the UK and other countries where I have recorded activities.  The next step is to determine which activities pass through which areas.  Generically, the question is simply whether one geometry intersects with another.   I can test this in SQL with the sdo_geom.relate() function.

    WHERE SDO_GEOM.RELATE(a.geom,'anyinteract',m.geom) = 'TRUE'

    However, working out whether an activity, with several thousand points, is within an area defined with several thousand points can be CPU intensive and time-consuming.  Larger areas such as UK counties average over 20,000 points. 

I have 60,000 defined areas, of which over 20,000 are for the UK.  I have 2,700 activities recorded on Strava, with an average of 2,700 points each, but some have over 10,000 points.  It isn't viable to compare every activity with every area.  Comparing these large geometries can take a significant time: too long to run the spatial queries every time I want to interrogate the data, and too long for an online application.
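For reference, these point counts can be derived with sdo_util.getnumvertices().  A sketch against the tables used in this series:

select area_code, count(*) num_areas
, round(avg(sdo_util.getnumvertices(geom))) avg_pts
from my_areas
where geom is not null
group by area_code
order by 1
/

select count(*) num_activities
, round(avg(num_pts)) avg_pts, max(num_pts) max_pts
from activities
where num_pts > 0
/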

    Pre-processing Geometry Intersections

However, the data, once loaded, is static.  Definitions of areas can change, but rarely.  Activities do not change.  Therefore, I have decided to pre-process the data to produce a table of matching activities and areas.

    CREATE TABLE activity_areas
    (activity_id NUMBER NOT NULL
    ,area_code VARCHAR2(4) NOT NULL
    ,area_number NUMBER NOT NULL
    ,geom_length NUMBER
    ,CONSTRAINT ACTIVITY_AREAS_PK PRIMARY KEY (activity_id, area_code, area_number)
    ,CONSTRAINT ACTIVITY_AREAS_FK FOREIGN KEY (activity_id) REFERENCES ACTIVITIES (activity_id)
    ,CONSTRAINT ACTIVITY_AREAS_FK2 FOREIGN KEY (area_code, area_number)
    REFERENCES MY_AREAS (area_code, area_number)
    );

    Recursive Search

    I have written the search as a PL/SQL procedure to search areas that match a particular activity.

    • I pass the ID of the activity to be processed to the procedure.
    • I can specify the area code and number, or the parent area code and number, at which to search through the areas.  I usually leave them to default to null so the search starts with areas at the root of the hierarchy that therefore have no parents (i.e. sovereign countries).  
    • The procedure then calls itself recursively for each area that it finds matches the activity, to search its children.  This way, I limit the total number of comparisons required.  
• For every area and activity, I have calculated the minimum bounding rectangle using sdo_geom.sdo_mbr() and stored it in another geometry column on the same row.  This geometry contains just 5 points (the last point is the same as the first to close the rectangle).  I can compare two rectangles very quickly, and if they don't overlap then there is no need to see if the actual geometries do.  This approach filters out geometries that cannot match, so that fewer geometries have to be compared in full, thus significantly improving the performance of the search.
    AND SDO_GEOM.RELATE(a.mbr,'anyinteract',m.mbr) = 'TRUE'

• I have found that it is necessary to place the MBR comparison before the GEOM comparison in the predicate clauses.


    PROCEDURE activity_area_search
    (p_activity_id INTEGER
    ,p_area_code my_areas.area_code%TYPE DEFAULT NULL
    ,p_area_number my_areas.area_number%TYPE DEFAULT NULL
    ,p_query_type VARCHAR2 DEFAULT 'P'
    ,p_level INTEGER DEFAULT 0
    ) IS
    BEGIN
    FOR i IN(
    SELECT m.*
    , CASE WHEN m.geom_27700 IS NOT NULL THEN sdo_geom.sdo_length(SDO_GEOM.sdo_intersection(m.geom_27700,a.geom_27700,5), unit=>'unit=km')
    WHEN m.geom IS NOT NULL THEN sdo_geom.sdo_length(SDO_GEOM.sdo_intersection(m.geom,a.geom,5), unit=>'unit=km')
    END geom_length
    , (SELECT MIN(m2.area_level) FROM my_areas m2
    WHERE m2.parent_area_code = m.area_code AND m2.parent_area_number = m.area_number) min_child_level
    FROM my_areas m
    , activities a
    WHERE ( (p_query_type = 'P' AND parent_area_code = p_area_code AND parent_area_number = p_area_number)
    OR (p_query_type = 'A' AND area_code = p_area_code AND area_number = p_area_number)
    OR (p_query_type = 'A' AND p_area_number IS NULL AND area_code = p_area_code)
    OR (p_area_code IS NULL AND p_area_number IS NULL AND parent_area_code IS NULL AND parent_area_number IS NULL))
    AND a.activity_id = p_activity_id
    and SDO_GEOM.RELATE(a.mbr,'anyinteract',m.mbr) = 'TRUE'
    and SDO_GEOM.RELATE(a.geom,'anyinteract',m.geom) = 'TRUE'
    ) LOOP
    IF i.area_level>0 OR i.num_children IS NULL THEN
    BEGIN
    INSERT INTO activity_areas
    (activity_id, area_code, area_number, geom_length)
    VALUES
    (p_activity_id, i.area_code, i.area_number, i.geom_length);
    EXCEPTION
    WHEN dup_val_on_index THEN
    UPDATE activity_areas
    SET geom_length = i.geom_length
    WHERE activity_id = p_activity_id
    AND area_code = i.area_code
    AND area_number = i.area_number;
    END;
    END IF;

    IF i.num_children > 0 THEN
    strava_pkg.activity_area_search(p_activity_id, i.area_code, i.area_number, 'P', p_level+1);
    END IF;
    END LOOP;

    END activity_area_search;

The search can process a single activity by calling the procedure.  An activity that matched just 10 areas took just 6 seconds to process.  However, it does not scale linearly: activities that match over 100 areas can take 6 minutes or more.
    SQL> exec strava_pkg.activity_area_search(4372796838);
    Searching 4372796838:-
    Found SOV-1159320701:United Kingdom, 2.895 km
    .Searching 4372796838:SOV-1159320701
    .Found GEOU-1159320743:England, 2.851 km
    ..Searching 4372796838:GEOU-1159320743
    ..Found GLA-117537:Greater London, 2.851 km
    ...Searching 4372796838:GLA-117537
    ...Found LBO-50724:City of Westminster, 1.732 km
    ....Searching 4372796838:LBO-50724
    ....Found LBW-117484:Abbey Road, 1.435 km
    ....Found LBW-50639:Maida Vale, 0.298 km
    ....Done 4372796838:LBO-50724: 0.415 secs).
    ...Found LBO-50632:Camden, 1.119 km
    ....Searching 4372796838:LBO-50632
    ....Found LBW-117286:Kilburn, 0.273 km
    ....Found LBW-117288:Swiss Cottage, 1.033 km
    ....Found LBW-117287:West Hampstead, 0.084 km
    ....Done 4372796838:LBO-50632: 0.521 secs).
    ...Done 4372796838:GLA-117537: 3.368 secs).
    ..Done 4372796838:GEOU-1159320743: 4.372 secs).
    .Done 4372796838:SOV-1159320701: 4.750 secs).
    Done 4372796838:-: 5.532 secs).

    PL/SQL procedure successfully completed.
    Since I load Strava activities from the bulk download, I also process them in bulk.
    --process unmatched activities
    set pages 99 lines 180 timi on serveroutput on
    column activity_name format a60
    BEGIN
    FOR i IN (
    SELECT a.activity_id, activity_date, activity_name
    , distance_km, num_pts, ROUND(num_pts/NULLIF(distance_km,0),0) ppkm
    FROM activities a
    WHERE activity_id NOT IN (SELECT DISTINCT activity_id FROM activity_areas)
    AND num_pts>0
    ) LOOP
    strava_pkg.activity_area_search(i.activity_id);
    commit;
    END LOOP;
    END;
    /
Matching 2,700 activities produced 71,628 rows in activity_areas, covering 5,620 distinct areas.  In the next article, I will demonstrate how to text-search the areas to find matching activities.

    Spatial Data 6: Text Searching Areas by their Name, and the Names of Parent Areas


    This blog is part of a series about my first steps in using Spatial Data in the Oracle database.  I am using the GPS data from my cycling activities collected by Strava. All of my files are available on GitHub.

    Now I have loaded all the areas, I want to be able to search for them by name.  I am going to create an Oracle Text Index, but I need to index more than just the name of each area.  I must index the full hierarchy of each area so I can search on combinations of names in different types of areas.  For example, I might search for a village and county (e.g. Streatley and Berkshire), to distinguish it from a village of the same name in a different county (e.g. Streatley in Bedfordshire).

    I can generate the full hierarchy of an area with a PL/SQL function (strava_pkg.name_heirarchy_fn) by navigating up the linked list and discarding repeated names.  I could make that available in a virtual column.  However, I cannot build a text index on a function or a virtual column.  

    Text Index Option 1: Store Hierarchy on Table, and Create a Multi-Column Text Index

I could store the hierarchy of an area on the my_areas table, generating it with the PL/SQL function strava_pkg.name_heirarchy_fn.

    DECLARE
    l_clob CLOB;
    l_my_areas my_areas%ROWTYPE;
    BEGIN
    select m.*
    into l_my_areas
    FROM my_areas m
    WHERE area_code = 'CPC'
    And area_number = '40307';

    dbms_output.put_line(strava_pkg.name_heirarchy_fn(l_my_areas.area_code,l_my_areas.area_number));
    dbms_output.put_line(strava_pkg.name_heirarchy_fn(l_my_areas.parent_area_code,l_my_areas.parent_area_number));
    END;
    /

If I pass the code and number for a particular area, I get its full hierarchy, including its name.  I can see that the parish of Streatley is in the Unitary Authority of West Berkshire, which is in England, and England is in the United Kingdom.  If I pass the code and number of its parent, I just get the hierarchy starting at its parent.

    Streatley, West Berkshire, England, United Kingdom
    West Berkshire, England, United Kingdom

I can store the hierarchy on my_areas, though I have to store the results in a temporary table rather than update it directly; otherwise, I get a mutating table error.

    ALTER TABLE my_areas add name_heirarchy VARCHAR(4000)
    /
    CREATE GLOBAL TEMPORARY TABLE my_areas_temp ON COMMIT PRESERVE ROWS AS
    SELECT area_code, area_number, strava_pkg.name_heirarchy_fn(parent_area_code,parent_area_number) name_heirarchy
    FROM my_areas WHERE parent_area_code IS NOT NULL AND parent_area_number IS NOT NULL
    /
    MERGE INTO my_areas u
    USING (SELECT * FROM my_areas_temp) s
    ON (u.area_code = s.area_code AND u.area_number = s.area_number)
    WHEN MATCHED THEN UPDATE
    SET u.name_heirarchy = s.name_heirarchy
    /

Then I can create a multi-column text index on the name and name_heirarchy columns.

    begin
    ctx_ddl.create_preference('my_areas_lexer', 'BASIC_LEXER');
    ctx_ddl.set_attribute('my_areas_lexer', 'mixed_case', 'NO');
    ctx_ddl.create_preference('my_areas_datastore', 'MULTI_COLUMN_DATASTORE');
    ctx_ddl.set_attribute('my_areas_datastore', 'columns', 'name, name_heirarchy');
    end;
    /
    CREATE INDEX my_areas_name_txtidx ON my_areas (name) INDEXTYPE IS ctxsys.context
    PARAMETERS ('datastore my_areas_datastore lexer my_areas_lexer sync(on commit)');

    The index will sync if I have cause to update the hierarchy.
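For example (a sketch, reusing the Streatley row from earlier), a change to an indexed column is picked up when the transaction commits:

update my_areas set name = 'Streatley'
where area_code = 'CPC' and area_number = 40307;
commit;  --the sync(on commit) clause refreshes the index entry for this row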

    Text Index Option 2: Index a user_datastore based on the result of a PL/SQL function

    Alternatively, I can build a text index on a combination of data from various sources by creating a PL/SQL procedure that combines the data and returns the string to be indexed.  

    I have created a procedure (strava_pkg.name_heirarchy_txtidx) that returns a string containing the hierarchy of a given area, and then I will create a text index on that.  The format of the parameters must be exactly as follows: 

    • The rowid of the row being indexed is passed to the procedure; 
    • The string to be indexed is passed back as a CLOB parameter.
    See also: Oracle Text Indexing Elements: USER_DATASTORE Attributes


    PROCEDURE name_heirarchy_txtidx
    (p_rowid in rowid
    ,p_dataout IN OUT NOCOPY CLOB
    ) IS
    l_count INTEGER := 0;
    BEGIN
    FOR i IN (
    SELECT area_code, area_number, name, matchable
    FROM my_areas m
    START WITH rowid = p_rowid
    CONNECT BY NOCYCLE prior m.parent_area_code = m.area_code
    AND prior m.parent_area_number = m.area_number
    ) LOOP
    IF i.matchable >= 1 THEN
    l_count := l_count + 1;
    IF l_count > 1 THEN
    p_dataout := p_dataout ||', '|| i.name;
    ELSE
    p_dataout := i.name;
    END IF;
    END IF;
    END LOOP;
    END name_heirarchy_txtidx;

    As an example, if I pass a particular rowid to the procedure, I obtain the full hierarchy of areas as before.

    set serveroutput on
    DECLARE
    l_rowid ROWID;
    l_clob CLOB;
    BEGIN
    select rowid
    into l_rowid
    FROM my_areas m
    WHERE area_code = 'CPC'
    And area_number = '40307';

    strava_pkg.name_heirarchy_txtidx(l_rowid, l_clob);
    dbms_output.put_line(l_clob);
    END;
    /

    Streatley, West Berkshire, England, United Kingdom

    PL/SQL procedure successfully completed.

The procedure is referenced as an attribute of a user datastore; I can then build a text index on the user datastore.

    BEGIN
    ctx_ddl.create_preference('my_areas_lexer', 'BASIC_LEXER');
    ctx_ddl.set_attribute('my_areas_lexer', 'mixed_case', 'NO');
    ctx_ddl.create_preference('my_areas_datastore', 'user_datastore');
    ctx_ddl.set_attribute('my_areas_datastore', 'procedure', 'strava_pkg.name_heirarchy_txtidx');
    ctx_ddl.set_attribute('my_areas_datastore', 'output_type', 'CLOB');
    END;
    /

    CREATE INDEX my_areas_name_txtidx on my_areas (name) INDEXTYPE IS ctxsys.context
    PARAMETERS ('datastore my_areas_datastore lexer my_areas_lexer');
    I have not been able to combine a multi-column datastore with a user datastore.

    Text Search examples

    Both options produce an index that I can use in the same way.  I can search for a particular name, for example, the village of Streatley.

    SELECT score(1), area_Code, area_number, name, suffix, name_heirarchy
    FROM my_areas m
    WHERE CONTAINS(name,'streatley',1)>0
    /

    I get the two Streatleys, one in Berkshire, and the other in Bedfordshire.  

      SCORE(1) AREA AREA_NUMBER NAME                 SUFFIX     NAME_HEIRARCHY
    ---------- ---- ----------- -------------------- ---------- ------------------------------------------------------------
    16 CPC 41076 Streatley CP Streatley, Central Bedfordshire, England, United Kingdom
    16 CPC 40307 Streatley CP Streatley, West Berkshire, England, United Kingdom

    As I have indexed the full hierarchy, I can be more precise and search for both the village and the county, even though they are two different rows in the my_areas table.

    SELECT score(1), area_Code, area_number, name, suffix, name_heirarchy
    FROM my_areas m
    WHERE CONTAINS(name,'streatley and berks%',1)>0
    /

    Now I just get one result.  The Streatley in Berkshire.

      SCORE(1) AREA AREA_NUMBER NAME                 SUFFIX     NAME_HEIRARCHY
    ---------- ---- ----------- -------------------- ---------- ------------------------------------------------------------
    11 CPC 40307 Streatley CP Streatley, West Berkshire, England, United Kingdom

    Searching For the Top of Hierarchies

My search query works satisfactorily if it identifies areas with no children, but suppose I search for something higher up the hierarchy, like Berkshire?

    SELECT score(1), area_Code, area_number, name, suffix, name_heirarchy
    FROM my_areas m
    WHERE CONTAINS(name,'berkshire',1)>0
    /

I get 184 areas of different types within the areas called Berkshire, because the name of the parent area appears in the hierarchy of all its children and so is returned by the text index.

               Area        Area
    SCORE(1) Code Number NAME SUFFIX NAME_HEIRARCHY
    ---------- ---- ----------- ------------------------- ---------- -----------------------------------------------------------
    11 UTA 101678 Windsor and Maidenhead (B) Windsor and Maidenhead, Berkshire, England, United Kingdom
    11 UTA 101680 Wokingham (B) Wokingham, Berkshire, England, United Kingdom
    11 UTA 101681 Reading (B) Reading, Berkshire, England, United Kingdom
    11 UTA 101685 West Berkshire West Berkshire, England, United Kingdom
    11 UTW 40258 Norreys Ward Norreys, Wokingham, Berkshire, England, United Kingdom
    11 UTW 40261 Barkham Ward Barkham, Wokingham, Berkshire, England, United Kingdom

However, I am just interested in the highest points in each part of the hierarchy I have identified, so I exclude any result whose parent is also in the result set.

    WITH x AS (
    SELECT area_code, area_number, parent_area_code, parent_area_number, name, name_heirarchy
    FROM my_areas m
    WHERE CONTAINS(name,'berkshire',1)>0
    ) SELECT * FROM x WHERE NOT EXISTS (
    SELECT 'x' FROM x x1
    WHERE x1.area_code = x.parent_area_code
    AND x1.area_number = x.parent_area_number
    )
    /

    In this case, I still get two results because the boundaries of the unitary authority of West Berkshire are not entirely within the ceremonial county of Berkshire (some parts of Hungerford and Lambourne were exchanged with Wiltshire in 1990), hence I could not make Berkshire the parent of West Berkshire.

    Area        Area
    Code Number SCORE NAME_HEIRARCHY
    ---- ----------- ---------- ------------------------------------------------------------
    UTA 101685 11 West Berkshire, England, United Kingdom
    CCTY 7 11 Berkshire, England, United Kingdom

    Text Searching for Activities that pass through Areas

    It is a simple extension to join the pre-processed areas through which activities pass to the areas found by the text search, and then exclude areas whose parent was also found in the same activity.

    WITH x AS (
    SELECT aa.activity_id, m.area_code, m.area_number, m.parent_area_code, m.parent_area_number, m.name, m.name_heirarchy
    FROM my_areas m, activity_areas aa
    WHERE m.area_Code = aa.area_code
    AND m.area_number = aa.area_number
    AND CONTAINS(name,'berkshire',1)>0
    )
    SELECT a.activity_id, a.activity_date, a.activity_name, a.activity_type, a.distance_km
    , x.area_Code, x.area_number, x.name, x.name_heirarchy
    FROM x, activities a
    WHERE x.activity_id = a.activity_id
    AND a.activity_date between TO_DATE('01022019','DDMMYYYY') and TO_DATE('28022019','DDMMYYYY')
    AND NOT EXISTS (
    SELECT 'x' FROM x x1
    WHERE x1.area_code = x.parent_area_code
    AND x1.area_number = x.parent_area_number
    AND x1.activity_id = x.activity_id)
    ORDER BY a.activity_date
    /

    Now I can see the rides in Berkshire in February 2019.  I get two rows returned for the ride that was in both Berkshire and West Berkshire.  

      Activity Activity                                                Activity Distance Area   Area
    ID Date ACTIVITY_NAME Type (km) Code Number NAME NAME_HEIRARCHY
    ---------- --------- --------------------------------------------- -------- -------- ---- ------ --------------- -------------------------
    2156308823 17-FEB-19 MV - Aldworth, CLCTC Aldworth-Reading Ride 120.86 CCTY 7 Berkshire England, United Kingdom
    2156308823 17-FEB-19 MV - Aldworth, CLCTC Aldworth-Reading Ride 120.86 UTA 101685 West Berkshire England, United Kingdom
    2172794879 24-FEB-19 MV - Maidenhead Ride 48.14 CCTY 7 Berkshire England, United Kingdom
    2173048214 24-FEB-19 CLCTC: Maidenhead - Turville Heath Ride 53.15 CCTY 7 Berkshire England, United Kingdom
    2173048406 24-FEB-19 Maidenhead - Burnham Beeches - West Drayton Ride 27.92 CCTY 7 Berkshire England, United Kingdom

    References

    I found these references useful while creating the Text index:

    Clashing SQL Profiles - Exact Matching Profiles Take Precedence Over Force Matching Profiles


Sometimes, you reach a point in performance tuning where you use a SQL Baseline, SQL Patch, or SQL Profile to stabilise an execution plan.  These methods all effectively inject a hint or set of hints into a statement to produce the desired execution plan.  Baselines and Patches will only exactly match a SQL ID and therefore a SQL statement.  However, a SQL Profile can optionally do force matching, so that it applies to "all SQL statements that have the same text after the literal values in the WHERE clause have been replaced by bind variables. This setting may be useful for applications that use only literal values because it enables SQL with text differing only in its literal values to share a SQL profile. If both literal values and bind variables are in the SQL text, or if force_match is set to false (default), then the literal values in the WHERE clause are not replaced by bind variables." [Oracle Database SQL Tuning Guide]
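As a quick illustration (a hypothetical sketch; SQLTEXT_TO_SIGNATURE works on the statement text alone, so the objects need not exist), two statements that differ only in a literal value have different exact matching signatures but the same force matching signature:
set serveroutput on
DECLARE
e1 NUMBER; e2 NUMBER; f1 NUMBER; f2 NUMBER;
BEGIN
-- exact matching signatures: different because the literals differ
e1 := dbms_sqltune.sqltext_to_signature('SELECT * FROM t WHERE a = 42', FALSE);
e2 := dbms_sqltune.sqltext_to_signature('SELECT * FROM t WHERE a = 54', FALSE);
-- force matching signatures: equal because the literals are normalised out
f1 := dbms_sqltune.sqltext_to_signature('SELECT * FROM t WHERE a = 42', TRUE);
f2 := dbms_sqltune.sqltext_to_signature('SELECT * FROM t WHERE a = 54', TRUE);
dbms_output.put_line('exact: '||e1||' / '||e2);
dbms_output.put_line('force: '||f1||' / '||f2);
END;
/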

    I often work with PeopleSoft, whose batch processes often dynamically generate SQL with literal values.  Therefore, I usually create force matching profiles when I need to control an execution plan.  However, sometimes I come across situations where some exact matching (i.e. not force matching) profiles have been created (often by production DBAs using the tuning advisor) on different statements that have the same force matching signature, and then maybe a force matching profile has also been applied.

    Note: SQL Profiles require the Tuning Pack licence.
    Where both exact and force matching profiles apply to a SQL statement, the exact matching profile will take precedence over the force matching profile, and even if disabled it will prevent the force matching profile from being applied.
I will demonstrate this with a simple test.  I will create a table with a couple of indexes, collect statistics, and generate an execution plan for a query.  I am using the EXPLAIN PLAN FOR command to force the statement to be parsed every time.
    CREATE TABLE t (a not null, b) AS 
    SELECT rownum, ceil(sqrt(rownum)) FROM dual CONNECT BY LEVEL <= 100;
    CREATE UNIQUE INDEX t_idx on t(a);
    CREATE INDEX t_idx2 on t(b,a);
    EXEC dbms_stats.gather_table_stats(user,'T');

    Without Any SQL Profiles

    Without any profiles in place, I get a skip scan of T_IDX2, and there is no note in the execution plan.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 42;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 3418618943
    ---------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ---------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 1 (0)| 00:00:01 |
    |* 1 | INDEX SKIP SCAN | T_IDX2 | 1 | 6 | 1 (0)| 00:00:01 |
    ---------------------------------------------------------------------------

    Outline Data
    -------------
    /*+
    BEGIN_OUTLINE_DATA
    INDEX_SS(@"SEL$1""T"@"SEL$1" ("T"."B""T"."A"))
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    1 - access("A"=42)
    filter("A"=42)

    Force Matching Profile

Now I will create a force matching SQL profile that forces a full table scan.  The profile is created on a statement whose text is the same except that the literal value is different (it is 54 instead of 42).
    DECLARE
    signature INTEGER;
    sql_txt CLOB;
    h SYS.SQLPROF_ATTR;
    BEGIN
    sql_txt := q'[
    SELECT * FROM t WHERE a = 54
    ]';
    h := SYS.SQLPROF_ATTR(
    q'[BEGIN_OUTLINE_DATA]',
    q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
    q'[FULL(@"SEL$1""T"@"SEL$1")]',
    q'[END_OUTLINE_DATA]');
    signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
    DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
    sql_text => sql_txt,
    profile => h,
    name => 'clashing_profile_test_force',
    category => 'DEFAULT',
    validate => TRUE,
    replace => TRUE,
    force_match => TRUE
    );
    END;
    /
    I only have a force-matching profile. 
Execution plan with force matching profile (full scan)

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    clashing_profile_test_force DEFAULT 11431056000319719221 27-JUL-21 01.35.43.854691 PM
    SELECT * FROM t WHERE a = 54
    27-JUL-21 01.35.43.000000 PM MANUAL ENABLED YES
The execution plan uses the full scan specified by the profile.  There is a note confirming that the profile was matched and used, and the FULL hint is listed in the hint report.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 42;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    ----------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    ----------------------------------------------------------------------------------

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    FULL(@"SEL$1""T"@"SEL$1")
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - storage("A"=42)
    filter("A"=42)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "clashing_profile_test_force" used for this statement

    Exact Matching Profile 

I will now add an exact matching profile on the original statement (with the literal value 42) that forces the use of the unique index.
    DECLARE
    signature INTEGER;
    sql_txt CLOB;
    h SYS.SQLPROF_ATTR;
    BEGIN
    sql_txt := q'[
    SELECT * FROM t WHERE a = 42
    ]';
    h := SYS.SQLPROF_ATTR(
    q'[BEGIN_OUTLINE_DATA]',
    q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
    q'[INDEX(@"SEL$1""T"@"SEL$1" ("T"."A"))]',
    q'[END_OUTLINE_DATA]');
    signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
    DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
    sql_text => sql_txt,
    profile => h,
    name => 'clashing_profile_test_exact',
    category => 'DEFAULT',
    validate => TRUE,
    replace => TRUE,
    force_match => FALSE
    );
    END;
    /
I can see that I now have two SQL Profiles: one force matching, and one exact matching.
Execution plan with exact matching profile (unique index lookup)

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    clashing_profile_test_exact DEFAULT 14843900676141266266 27-JUL-21 01.35.46.825697 PM
    SELECT * FROM t WHERE a = 42
    27-JUL-21 01.35.46.000000 PM MANUAL ENABLED NO

    clashing_profile_test_force DEFAULT 11431056000319719221 27-JUL-21 01.35.43.854691 PM
    SELECT * FROM t WHERE a = 54
    27-JUL-21 01.35.43.000000 PM MANUAL ENABLED YES
The execution plan has changed to the unique index scan.  The index hint from the profile appears in the hint report.  The note at the bottom of the plan shows that the exact matching profile has been used, taking precedence over the force matching profile.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 42;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 2929955852

    -------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    -------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 1 (0)| 00:00:01 |
    | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 6 | 1 (0)| 00:00:01 |
    |* 2 | INDEX UNIQUE SCAN | T_IDX | 1 | | 0 (0)| 00:00:01 |
    -------------------------------------------------------------------------------------

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    INDEX_RS_ASC(@"SEL$1""T"@"SEL$1" ("T"."A"))
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("A"=42)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - INDEX(@"SEL$1""T"@"SEL$1" ("T"."A"))

    Note
    -----
    - SQL profile "clashing_profile_test_exact" used for this statement

    Different Query

If I run the query with a different literal value, the plan changes back to the full scan, and the note reports that the force matching profile was used.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 54;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    ----------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    ----------------------------------------------------------------------------------

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    FULL(@"SEL$1""T"@"SEL$1")
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - storage("A"=54)
    filter("A"=54)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "clashing_profile_test_force" used for this statement

    Disable Exact Matching SQL Profile

    I will now disable the exact matching profile.
    exec dbms_sqltune.alter_sql_profile(name=>'clashing_profile_test_exact', attribute_name=>'STATUS',value=>'DISABLED');
    SELECT * FROM dba_sql_profiles where name like 'clashing%';

    Disable Exact Profile - Execution plan with no profile (skip scan) - Odd

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    clashing_profile_test_exact DEFAULT 14843900676141266266 27-JUL-21 01.35.46.825697 PM
    SELECT * FROM t WHERE a = 42
    27-JUL-21 01.35.52.000000 PM MANUAL DISABLED NO

    clashing_profile_test_force DEFAULT 11431056000319719221 27-JUL-21 01.35.43.854691 PM
    SELECT * FROM t WHERE a = 54
    27-JUL-21 01.35.43.000000 PM MANUAL ENABLED YES
I expected the statement to fall back to the force matching profile, but instead it goes back to the original skip scan plan with no profile at all.  So the disabled exact matching profile prevents the force matching profile from matching the statement, and yet it is not applied to the statement either!  There is no note in the execution plan and no hint report.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 42;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 3418618943

    ---------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ---------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 1 (0)| 00:00:01 |
    |* 1 | INDEX SKIP SCAN | T_IDX2 | 1 | 6 | 1 (0)| 00:00:01 |
    ---------------------------------------------------------------------------

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    INDEX_SS(@"SEL$1""T"@"SEL$1" ("T"."B""T"."A"))
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - access("A"=42)
    filter("A"=42)

    Alter Category of Exact Matching SQL Profile

    I could have dropped the SQL Profile, but I might want to retain it for documentation and in case I need to reinstate it. So instead I will move it to a different category.
    exec dbms_sqltune.alter_sql_profile(name=>'clashing_profile_test_exact', attribute_name=>'CATEGORY',value=>'DO_NOT_USE');
    SELECT * FROM dba_sql_profiles where name like 'clashing%';

    Change Category of Exact Profile - Execution plan with force matching profile (full scan)

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    clashing_profile_test_exact DO_NOT_USE 14843900676141266266 27-JUL-21 02.57.11.343291 PM
    SELECT * FROM t WHERE a = 42
    27-JUL-21 02.57.19.000000 PM MANUAL DISABLED NO

    clashing_profile_test_force DEFAULT 11431056000319719221 27-JUL-21 02.57.08.390801 PM
    SELECT * FROM t WHERE a = 54
    27-JUL-21 02.57.08.000000 PM MANUAL ENABLED YES
And now the execution plan goes back to the force matching profile and the full scan.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 42;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    ----------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    ----------------------------------------------------------------------------------

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    FULL(@"SEL$1""T"@"SEL$1")
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - storage("A"=42)
    filter("A"=42)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "clashing_profile_test_force" used for this statement

    Conclusion

An exact matching profile will be matched to a SQL statement before a force matching profile, even if it is disabled, in which case neither profile will be applied.
If you have exact matching SQL profiles that provide the same hints to produce the same execution plan on various similar SQL statements with the same force matching signature (i.e. they differ only in their literal values), and you wish to replace them with a single force matching profile, then rather than disable the exact matching profiles you should either drop them or, if you prefer to retain them for documentation, alter them to a different category.
• The scripts used in this blog to demonstrate this behaviour are available on GitHub.  They were run on Oracle 19.9 for this post.
• The script disabled_profiles_category.sql moves all disabled profiles from the category DEFAULT to DO_NOT_USE; a sketch of the idea follows this list.
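A minimal sketch of the idea behind that script (an illustration only, assuming you simply want to move every disabled profile out of the DEFAULT category; the real script may differ):
BEGIN
FOR i IN (
SELECT name FROM dba_sql_profiles
WHERE status = 'DISABLED' AND category = 'DEFAULT'
) LOOP
-- a disabled exact matching profile can still block a force matching one,
-- so move it out of the way rather than leave it disabled in DEFAULT
dbms_sqltune.alter_sql_profile(name=>i.name, attribute_name=>'CATEGORY', value=>'DO_NOT_USE');
END LOOP;
END;
/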
    In a subsequent post, I will show how to detect conflicting SQL profiles.

    Detecting Clashing SQL Profiles


In my last post, I discussed the possible undesirable consequences of force and exact matching SQL profiles on statements with the same force matching signature.  The question is: how do you detect such profiles?

    I have created three profiles on very similar SQL statements that only differ in the literal value of a predicate.  One of them is force matching, the others are exact matching.  The signature reported by DBA_SQL_PROFILES is the force matching signature for force matching profiles, and the exact matching signature for exact matching profiles.

    select * from dba_sql_profiles;

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    my_sql_profile_force DEFAULT 11431056000319719221 16:09:33 01/08/2021
    SELECT * FROM t WHERE a = 54
    16:09:33 01/08/2021 MANUAL ENABLED YES

    my_sql_profile_24 DEFAULT 12140764948557749245 16:09:33 01/08/2021
    SELECT * FROM t
    WHERE a = 24
    16:09:33 01/08/2021 MANUAL ENABLED NO

    my_sql_profile_42 DEFAULT 14843900676141266266 16:09:33 01/08/2021
    SELECT * FROM t WHERE a = 42
    16:09:33 01/08/2021 MANUAL ENABLED NO
In order to be able to compare the profiles, I need to calculate the force matching signature for the exact matching profiles using DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE.  I can't pass the Boolean constant TRUE as a parameter in SQL, so instead I have used a PL/SQL function in a WITH clause.
    REM dup_sql_profiles1.sql
    WITH function sig(p_sql_text CLOB, p_number INTEGER) RETURN NUMBER IS
    l_sig NUMBER;
    BEGIN
    IF p_number > 0 THEN
    l_sig := dbms_sqltune.sqltext_to_signature(p_sql_text,TRUE);
    ELSIF p_number = 0 THEN
    l_sig := dbms_sqltune.sqltext_to_signature(p_sql_text,FALSE);
    END IF;
    RETURN l_sig;
    END;
    x as (
    select CASE WHEN force_matching = 'NO' THEN signature ELSE sig(sql_text, 0) END exact_sig
    , CASE WHEN force_matching = 'YES' THEN signature ELSE sig(sql_text, 1) END force_sig
    , p.*
    from dba_sql_profiles p
    where (status = 'ENABLED' or force_matching = 'NO')
    ), y as (
    select x.*
    , row_number() over (partition by category, force_sig order by force_matching desc, exact_sig nulls first) profile#
    , count(*) over (partition by category, force_sig) num_profiles
    from x
    )
    select profile#, num_profiles, force_sig, exact_sig, name, created, category, status, force_matching, sql_text
    from y
    where num_profiles > 1
    order by force_sig, force_matching desc, exact_sig
    /
     We can see these three profiles are grouped together.  The force matching signature calculated on the exact matching profiles is the same as the signature on the force matching profile.  Now I can start to make some decisions about whether I should retain the exact matching profiles or remove them and just use the force matching profile.
    Prof   Num        Force Matching        Exact Matching
    # Profs Signature Signature NAME CREATED CATEGORY STATUS FOR
    ---- ----- --------------------- --------------------- ------------------------------ ---------------------------- -------------------- -------- ---
    SQL_TEXT
    ----------------------------------------------------------------------------------------------------------------------------------------------------
    1 3 11431056000319719221 my_sql_profile_force 16:35:36 01/08/2021 DEFAULT ENABLED YES

    SELECT * FROM t WHERE a = 54

    2 3 12140764948557749245 my_sql_profile_24 16:35:36 01/08/2021 DEFAULT ENABLED NO

    SELECT * FROM t
    WHERE a = 24

    3 3 14843900676141266266 my_sql_profile_42 16:35:36 01/08/2021 DEFAULT ENABLED NO

    SELECT * FROM t WHERE a = 42
    The SQL statements in this example are absurdly simple.  In real life that is rarely the case.  Sometimes it can be a struggle to see where two complex statements differ.
    In the next query, I compare enabled force matching SQL profiles to any exact matching profiles in the same category with the same force matching signature.  The full query is on GitHub.
    REM dup_sql_profiles2.sql
    WITH function sig(p_sql_text CLOB, p_number INTEGER) RETURN NUMBER IS

    END sig;
    function norm(p_queryin CLOB) RETURN CLOB IS

    END norm;
    function str_diff(p_str1 CLOB, p_str2 CLOB) RETURN NUMBER IS

    END str_diff;
    x as (
    select CASE WHEN force_matching = 'NO' THEN signature ELSE sig(sql_text, 0) END exact_sig
    , CASE WHEN force_matching = 'YES' THEN signature ELSE sig(sql_text, 1) END force_sig
    , p.*
    from dba_sql_profiles p
    ), y as (
    select f.force_matching, f.force_sig, f.name force_name, f.created force_created, f.status force_status
    , e.force_matching exact_matching, e.exact_sig, e.name exact_name
    , e.created exact_created, e.status exact_status, e.category
    , norm(e.sql_text) esql_text, norm(f.sql_text) fsql_text
    from x e
    , x f
    where f.force_matching = 'YES'
    and e.force_matching = 'NO'
    and e.force_sig = f.force_sig
    and e.category = f.category
    and e.name != f.name
    and f.status = 'ENABLED'
    ), z as (
    select y.*
    , str_diff(fsql_Text, esql_text) diff_len
    from y
    )
    select force_matching, force_Sig, force_name, force_created, force_status
    , exact_matching, exact_sig, exact_name, exact_Created, exact_status
    , substr(fsql_text,1,diff_len) common_text
    , substr(fsql_text,diff_len+1) fdiff_text, substr(esql_text,diff_len+1) ediff_text
    from z
    order by force_sig
    /
    I have shown the common part of both statements, from the start to the first difference, and then also how the rest of each statement continues.
It is not enough to simply compare two statements character by character.  Both the force and exact matching signatures are "calculated on the normalized SQL text. The normalization includes the removal of white space and the uppercasing of all non-literal strings".  However, neither the normalised SQL nor the normalisation mechanism is exposed by Oracle.  Therefore, in this query, I have included my own rudimentary normalisation function (based on an idea from AskTOM), which I apply first, and a string comparison function.  You can see that normalisation has eliminated the line feed from the statement in my_sql_profile_24.
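For illustration only, the bodies elided from the query above might look roughly like this sketch (the real functions in the GitHub script may differ; note that naively uppercasing the whole statement also uppercases literals, which Oracle's own normalisation does not do):
function norm(p_queryin CLOB) RETURN CLOB IS
BEGIN
-- collapse runs of whitespace into single spaces, trim, and uppercase
-- (NB: unlike Oracle's normalisation, this also uppercases literals)
RETURN UPPER(TRIM(REGEXP_REPLACE(p_queryin,'\s+',' ')));
END norm;
function str_diff(p_str1 CLOB, p_str2 CLOB) RETURN NUMBER IS
l_len NUMBER := LEAST(NVL(LENGTH(p_str1),0),NVL(LENGTH(p_str2),0));
BEGIN
-- return the length of the common prefix of the two strings
FOR i IN 1..l_len LOOP
IF SUBSTR(p_str1,i,1) != SUBSTR(p_str2,i,1) THEN
RETURN i-1;
END IF;
END LOOP;
RETURN l_len;
END str_diff;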
    Now I can see my two exact matching profiles match my force matching profile.  I can see the common part of the SQL up to the literal value, and the different parts of the text are just the literal value.  
               Force Matching Force                          Force                        Force               Exact Matching Exact                          Exact                        Exact
    FOR Signature Name Created Date Status EXA Signature Name Created Date Status
    --- --------------------- ------------------------------ ---------------------------- -------- --- --------------------- ------------------------------ ---------------------------- --------
    Common Text
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Force Text Exact Text
    --------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------
    YES 11431056000319719221 my_sql_profile_force 16:35:36 01/08/2021 ENABLED NO 12140764948557749245 my_sql_profile_24 16:35:36 01/08/2021 ENABLED
    SELECT * FROM T WHERE A =
    54 24

    ENABLED NO 14843900676141266266 my_sql_profile_42 16:35:36 01/08/2021 ENABLED
    SELECT * FROM T WHERE A =
    54
    Both the queries mentioned in this blog are available on GitHub.

    Alter SQL Profiles from Exact to Force Matching


    You can use DBMS_SQLTUNE.ALTER_SQL_PROFILE to change the status, name, description, or category of a SQL profile, but you can't alter it from exact to force matching.  Instead, you would have to recreate it.  That is easy if you have the script that you used to create it in the first place.  There is another way.

    Oracle support note How to Move SQL Profiles from One Database to Another (Including to Higher Versions) (Doc ID 457531.1) describes a process to export SQL profiles to a staging table that can be imported into another database.  This provides an opportunity to alter a profile by updating the data in the staging table.  There are two columns in the staging table that have to be updated.

    • SQLFLAGS must be updated from 0 (indicating an exact match profile) to 1 (indicating a force match profile)
    • SIGNATURE must be recalculated as a force matching signature using DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE.

    Demonstration

    I am going to create a small table with a unique index.  

    CREATE TABLE t (a not null, b) AS 
    SELECT rownum, ceil(sqrt(rownum)) FROM dual connect by level <= 100;
    create unique index t_idx on t(a);
    exec dbms_stats.gather_table_stats(user,'T');

    ttitle off
    select * from dba_sql_profiles where name like 'my%sql_profile%';
    explain plan for SELECT * FROM t WHERE a = 42;
    ttitle 'Default Execution plan without profiles (index scan)'
    select * from table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Without any SQL profiles, when I query by the unique key I get a unique index scan.

    Plan hash value: 2929955852

    -------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    -------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 1 (0)| 00:00:01 |
    | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 6 | 1 (0)| 00:00:01 |
    |* 2 | INDEX UNIQUE SCAN | T_IDX | 1 | | 0 (0)| 00:00:01 |
    -------------------------------------------------------------------------------------

    Now I am going to create two SQL profiles.  I have deliberately put the same SQL text into both SQL Profiles.

    • my_sql_profile is exact matching
    • my_sql_profile_force is force matching.

    DECLARE
    signature INTEGER;
    sql_txt CLOB;
    h SYS.SQLPROF_ATTR;
    BEGIN
    sql_txt := q'[
    SELECT * FROM t WHERE a = 54
    ]';
    h := SYS.SQLPROF_ATTR(
    q'[BEGIN_OUTLINE_DATA]',
    q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
    q'[FULL(@"SEL$1""T"@"SEL$1")]',
    q'[END_OUTLINE_DATA]');
    signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
    DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
    sql_text => sql_txt,
    profile => h,
    name => 'my_sql_profile',
    category => 'DEFAULT',
    validate => TRUE,
    replace => TRUE,
    force_match => FALSE
    );
    END;
    /

    DECLARE
    signature INTEGER;
    sql_txt CLOB;
    h SYS.SQLPROF_ATTR;
    BEGIN
    sql_txt := q'[
    SELECT * FROM t WHERE a = 54
    ]';
    h := SYS.SQLPROF_ATTR(
    q'[BEGIN_OUTLINE_DATA]',
    q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
    q'[FULL(@"SEL$1""T"@"SEL$1")]',
    q'[END_OUTLINE_DATA]');
    signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
    DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
    sql_text => sql_txt,
    profile => h,
    name => 'my_sql_profile_force',
    category => 'DEFAULT',
    validate => TRUE,
    replace => TRUE,
    force_match => TRUE
    );
    END;
    /
    ttitle off
    select * from dba_sql_profiles where name like 'my%sql_profile%';


    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    my_sql_profile DEFAULT 9394869341287877934 31-JUL-21 10.47.34.243454
    SELECT * FROM t WHERE a = 54
    31-JUL-21 10.47.34.000000 MANUAL ENABLED NO

    my_sql_profile_force DEFAULT 11431056000319719221 31-JUL-21 10.47.34.502721
    SELECT * FROM t WHERE a = 54
    31-JUL-21 10.47.34.000000 MANUAL ENABLED YES

    The force match profile works if the literal value is different from that in the profiles.

    explain plan for SELECT * FROM t WHERE a = 42;
    select * from table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873
    --------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    --------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    --------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("A"=42)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "my_sql_profile_force" used for this statement

The exact match profile takes precedence over the force match profile.

    explain plan for SELECT * FROM t WHERE a = 54;
    select * from table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    --------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    --------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    --------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("A"=54)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "my_sql_profile" used for this statement

    I am now going to follow the process to export the SQL Profiles to a staging table, and subsequently reimport them.

    exec DBMS_SQLTUNE.CREATE_STGTAB_SQLPROF(table_name=>'STAGE',schema_name=>user);
    exec DBMS_SQLTUNE.PACK_STGTAB_SQLPROF (staging_table_name =>'STAGE',profile_name=>'my_sql_profile');
    exec DBMS_SQLTUNE.PACK_STGTAB_SQLPROF (staging_table_name =>'STAGE',profile_name=>'my_sql_profile_force');

    There is a row in the staging table for each profile and you can see the differences between them.

    select signature, sql_handle, obj_name, obj_type, sql_text, sqlflags from STAGE;

    SIGNATURE SQL_HANDLE OBJ_NAME
    --------------------- ------------------------------ ---------------------------------------------------------------------
    OBJ_TYPE SQL_TEXT SQLFLAGS
    ------------------------------ -------------------------------------------------------------------------------- ----------
    9394869341287877934 SQL_826147e3c6ac0d2e my_sql_profile
    SQL_PROFILE 0
    SELECT * FROM t WHERE a = 54

    11431056000319719221 SQL_9ea344de32a78735 my_sql_profile_force
    SQL_PROFILE 1
    SELECT * FROM t WHERE a = 54

    I will update the staging table using this PL/SQL loop (because SQL doesn't recognise TRUE as a boolean constant).

    DECLARE
    l_sig INTEGER;
    BEGIN
    FOR i IN (
    SELECT rowid, stage.* FROM stage WHERE sqlflags = 0 FOR UPDATE
    ) LOOP
    l_sig := dbms_sqltune.sqltext_to_signature(i.sql_text,TRUE);
    UPDATE stage
    SET signature = l_sig
    , sqlflags = 1
    WHERE sqlflags = 0
    AND rowid = i.rowid;
    END LOOP;
    END;
    /

    And now the profiles look the same.

    select signature, sql_handle, obj_name, obj_type, sql_text, sqlflags from STAGE;

    SIGNATURE SQL_HANDLE OBJ_NAME
    --------------------- ------------------------------ ---------------------------------------------------------------------
    OBJ_TYPE SQL_TEXT SQLFLAGS
    ------------------------------ -------------------------------------------------------------------------------- ----------
    11431056000319719221 SQL_826147e3c6ac0d2e my_sql_profile
    SQL_PROFILE 1
    SELECT * FROM t WHERE a = 54

    11431056000319719221 SQL_9ea344de32a78735 my_sql_profile_force
    SQL_PROFILE 1
    SELECT * FROM t WHERE a = 54

But I can't just reimport my_sql_profile from the staging table to replace the one in the database, because I will get ORA-13841: SQL profile named my_sql_profile already exists for a different signature/category pair.  To avoid this error, I must either drop the profile or rename it.

    I am going to rename the existing exact matching profile, and also disable it and move it to another category to stop it from matching my statement in preference to the force matching profile (see previous post Clashing SQL Profiles - Exact Matching Profiles Take Precedence Over Force Matching Profiles), and thus I can go back to it later if needed.

    I will drop my example force matching profile.  I no longer need that.

    Then, I can reimport the profile from the staging table.

    exec dbms_sqltune.alter_sql_profile(name=>'my_sql_profile', attribute_name=>'NAME',value=>'my_old_sql_profile');
    exec dbms_sqltune.alter_sql_profile(name=>'my_old_sql_profile', attribute_name=>'CATEGORY',value=>'DO_NOT_USE');
    exec dbms_sqltune.alter_sql_profile(name=>'my_old_sql_profile', attribute_name=>'STATUS',value=>'DISABLED');
    exec dbms_sqltune.drop_sql_profile('my_sql_profile_force',TRUE);
    EXEC DBMS_SQLTUNE.UNPACK_STGTAB_SQLPROF(profile_name => 'my_sql_profile', replace => TRUE, staging_table_name => 'STAGE');

    I can see in the SQL profile table that my SQL profile is now force matching, and it has a different signature to the old one that is exact matching.

    ttitle off
    select * from dba_sql_profiles where name like 'my%sql_profile%';

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    my_old_sql_profile DO_NOT_USE 9394869341287877934 31-JUL-21 10.54.58.694037
    SELECT * FROM t WHERE a = 54
    31-JUL-21 10.55.00.000000 MANUAL DISABLED NO

    my_sql_profile DEFAULT 11431056000319719221 31-JUL-21 10.55.01.005377
    SELECT * FROM t WHERE a = 54
    31-JUL-21 10.55.01.000000 MANUAL ENABLED YES

    Both my queries now match the new force matching version of the profile.

    explain plan for SELECT * FROM t WHERE a = 42;
    ttitle 'Execution plan with force match profile (full scan)'
    select * from table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    --------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    --------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    --------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("A"=42)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "my_sql_profile" used for this statement

    explain plan for SELECT * FROM t WHERE a = 54;
    select * from table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    --------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    --------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    --------------------------------------------------------------------------


    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("A"=54)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "my_sql_profile" used for this statement
The script used for this demonstration is available on GitHub.

    Obtaining Trace Files without Access to the Database Server


    Why Trace? 

    For many years, I used database SQL Trace to investigate SQL performance problems. I would trace a process, obtain the trace file, profile it (with Oracle's TKPROF or another profiling tool such as the Method R profiler, TVD$XTAT, or OraSRP), and analyse the profile. 
    Active Session History (ASH) was introduced in Oracle 10g.  Today, it is usually where I start to investigate performance problems. It has the advantage that it is always on, and I can just query ASH data from the Automatic Workload Repository (AWR). However, ASH is only available on Enterprise Edition and requires the Diagnostics Pack licence. 
Sometimes, even if available, ASH isn't enough. ASH is based on sampling database activity, while trace is a record of all the SQL activity in a session. Some short-lived behaviour that doesn't generate many samples is difficult to investigate with ASH. Sometimes, it is necessary to dig deeper and use SQL trace.
On occasion, you might want to generate other forms of trace.  For example, an optimizer trace (event 10053) in order to understand how an execution plan was arrived at.
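For reference, a sketch of one way to enable SQL trace for the current session (the trace file identifier is arbitrary, and DBMS_MONITOR offers several variants):
alter session set tracefile_identifier = 'my_test';
-- include wait events and bind variable values in the trace
exec dbms_monitor.session_trace_enable(waits=>TRUE, binds=>TRUE);
-- ...run the statements of interest, then switch tracing off...
exec dbms_monitor.session_trace_disable;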

    Where is my Trace File? 

A trend that I have observed over the years is that it is becoming ever more difficult to get hold of the trace files. If you are not the production DBA, you are unlikely to get access to the database server. Frequently, I find that pre-production performance test databases, which are often clones of the production database, are treated as production systems. After all, they contain production data. The move to the cloud has accelerated that trend. On some cloud services, you have no access to the database server at all!
    In the past, I have blogged about using an external table from which the trace file can be queried, a variation of a theme others had also written about. It required certain privileges, a new external table was required for each trace file, and you had to know the name of the trace file, and on which RAC instance it was located. 
However, from version 12.2, it is much easier.  Oracle has provided some new views that report which trace files are available and expose their contents.

    Where Is This Session Writing Its Trace File?

    The Automatic Diagnostic Repository (ADR) was first documented in 11g. The view V$DIAG_INFO was introduced in 12c, from which you can query the state of the ADR. This includes the various directory paths to which files are written and the name of the current trace file.
    select dbid, con_dbid, name from v$database;
    column inst_id format 99 heading 'Inst|ID'
    column con_id format 99 heading 'Con|ID'
    column name format a22
    column value format a95
    select * from v$diag_info;

    Inst Con
    ID NAME VALUE ID
    ---- ---------------------- ----------------------------------------------------------------------------------------------- ---
    1 Diag Enabled TRUE 0
    1 ADR Base /opt/oracle/psft/db/oracle-server 0
    1 ADR Home /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM 0
    1 Diag Trace /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/trace 0
    1 Diag Alert /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/alert 0
    1 Diag Incident /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/incident 0
    1 Diag Cdump /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/cdump 0
    1 Health Monitor /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/hm 0
    1 Default Trace File /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/trace/CDBHCM_ora_27009_unnest.trc 0
    1 Active Problem Count 0 0
    1 Active Incident Count 0 0
    1 ORACLE_HOME /opt/oracle/psft/db/oracle-server/19.3.0.0 0

    What files have been written? 

The available files are reported by V$DIAG_TRACE_FILE.
    column adr_home format a60
    column trace_filename format a40
    column change_time format a32
    column modify_time format a32
    column con_id format 999
    select *
    from v$DIAG_TRACE_FILE
    where adr_home = '&adr_Home'
    order by modify_time
    /

    ADR_HOME TRACE_FILENAME CHANGE_TIME MODIFY_TIME CON_ID
    ------------------------------------------------------------ ---------------------------------------- -------------------------------- -------------------------------- ------

    /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM CDBHCM_ora_27674_no_unnest.trc 13-SEP-22 02.06.10.000 PM +00:00 13-SEP-22 02.06.10.000 PM +00:00 3
    /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM CDBHCM_ora_27674_unnest.trc 13-SEP-22 02.06.11.000 PM +00:00 13-SEP-22 02.06.11.000 PM +00:00 3

    What is in the file? 

I can then extract the contents of the file from V$DIAG_TRACE_FILE_CONTENTS. Each line of the trace is returned as a separate row.
This script spools the contents of the current trace file from SQL*Plus locally to a file of the same name. It stores the name of the ADR home, its file path, and the trace file name in SQL*Plus substitution variables, and then uses these to query the trace file contents.
    I can generate a trace and then run this script to extract it locally.
    REM spooltrc.sql

    clear screen
    set heading on pages 99 lines 180 verify off echo off trimspool on termout on feedback off
    column value format a95
    column value new_value adr_home heading 'ADR Home'
    select value from v$diag_info where name = 'ADR Home';
    column value new_value diag_trace heading 'Diag Trace'
    select value from v$diag_info where name = 'Diag Trace';
    column value new_value trace_filename heading 'Trace File'
    select SUBSTR(value,2+LENGTH('&diag_trace')) value from v$diag_info where name = 'Default Trace File'
    /
    column adr_home format a60
    column trace_filename format a40
    column change_time format a32
    column modify_time format a32
    column con_id format 999
    select *
    from v$DIAG_TRACE_FILE
    where adr_home = '&adr_home'
    and trace_filename = '&trace_filename'
    /
    set head off pages 0 lines 5000 verify off echo off timi off termout off feedback off long 5000
    spool &trace_filename
    select payload
    from v$diag_trace_file_contents
    where adr_home = '&adr_home'
    and trace_filename = '&trace_filename'
    order by line_number
    /
    spool off
    set head on pages 99 lines 180 verify on echo on termout on feedback on
The spooltrc.sql script is available on GitHub.  In a subsequent blog, I will demonstrate how to use it.
    The payload is a VARCHAR2 column, so it is easy to search one or several trace files for specific text. This is useful if you are having trouble identifying the trace file of interest. 
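For example, a sketch of such a search might look like this (the error text searched for is arbitrary):
column trace_filename format a40
select distinct adr_home, trace_filename
from v$diag_trace_file_contents
where payload like '%ORA-01555%'
/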
    See also:

    No Execution Plan Survives Contact with the Optimizer Untransformed

    One of the benefits of attending Oracle conferences is that by listening and talking to other people I get a different perspective on things. Sometimes, something gives me an idea or reminds me of the importance of something that I don't use often enough. I was talking with Neil Chandler about SQL Query Transformation. We came up with a variation of a well known quote:
"No execution plan survives contact with the optimizer untransformed."
It isn't completely accurate.  Not every query gets transformed, but transformation occurs commonly, it made a good title, and you are reading this blog!
    During SQL parse, the optimizer can transform a SQL query into another SQL query that is functionally identical but that results in an execution plan with a lower cost (and therefore should execute more quickly). Sometimes, multiple transformations can be applied to a single statement. 
    The Oracle documentation describes various forms of transformation. You can see in the execution plan that something has happened, but you can't see the transformed SQL statement directly. However, it can be obtained from the optimizer trace that can be enabled by setting event 10053.

    Demonstration 

I am going to take a simple SQL query and trace it twice.
• For the first execution, a NO_UNNEST hint is used to prevent the optimizer from unnesting the sub-query.
    set pages 99 lines 200 autotrace off
    alter session set tracefile_identifier='no_unnest';
    alter session set events '10053 trace name context forever, level 1';
    select emplid, name, effdt, last_name
    from ps_names x
    where x.last_name = 'Smith'
    and x.name_type = 'PRI'
    and x.effdt = (
    SELECT /*+NO_UNNEST*/ MAX(x1.effdt)
    FROM ps_names x1
    WHERE x1.emplid = x.emplid
    AND x1.name_type = x.name_type
    AND x1.effdt <= SYSDATE)
    /
    alter session set events '10053 trace name context off';
    @spooltrc
    • For the second execution, an UNNEST hint is used to force the optimizer to unnest the sub-query.
    alter session set tracefile_identifier='unnest';
    alter session set events '10053 trace name context forever, level 1';
    select emplid, name, effdt, last_name
    from ps_names x
    where x.last_name = 'Smith'
    and x.name_type = 'PRI'
    and x.effdt = (
    SELECT /*+UNNEST*/ MAX(x1.effdt)
    FROM ps_names x1
    WHERE x1.emplid = x.emplid
    AND x1.name_type = x.name_type
    AND x1.effdt <= SYSDATE)
    /
    alter session set events '10053 trace name context off';
    @spooltrc
    This is the execution plan from the first trace file for the statement with the NO_UNNEST hint. The select query blocks are simply numbered sequentially and thus are called SEL$1 and SEL$2. SEL$2 is the sub-query that references PS_NAMES with the row source alias X1. No query transformation has occurred.
    -------------------------------------------------------+-----------------------------------+
    | Id | Operation | Name | Rows | Bytes | Cost | Time |
    -------------------------------------------------------+-----------------------------------+
    | 0 | SELECT STATEMENT | | | | 122 | |
    | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PS_NAMES| 1 | 44 | 120 | 00:00:02 |
    | 2 | INDEX SKIP SCAN | PS_NAMES| 11 | | 112 | 00:00:02 |
    | 3 | SORT AGGREGATE | | 1 | 21 | | |
    | 4 | FIRST ROW | | 1 | 21 | 2 | 00:00:01 |
    | 5 | INDEX RANGE SCAN (MIN/MAX) | PS_NAMES| 1 | 21 | 2 | 00:00:01 |
    -------------------------------------------------------+-----------------------------------+
    Query Block Name / Object Alias (identified by operation id):
    ------------------------------------------------------------
    1 - SEL$1 / "X"@"SEL$1"
    2 - SEL$1 / "X"@"SEL$1"
    3 - SEL$2
    5 - SEL$2 / "X1"@"SEL$2"
    ------------------------------------------------------------
    Predicate Information:
    ----------------------
    1 - filter("X"."LAST_NAME"='Smith')
    2 - access("X"."NAME_TYPE"='PRI')
    2 - filter(("X"."NAME_TYPE"='PRI' AND "X"."EFFDT"=))
    5 - access("X1"."EMPLID"=:B1 AND "X1"."NAME_TYPE"=:B2 AND "X1"."EFFDT"<=SYSDATE@!)
    Now, let's look at the optimizer trace file for the statement with the UNNEST hint. First, we can see the statement as submitted with its SQL_ID.
    Trace file /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/trace/CDBHCM_ora_21909_unnest.trc
    Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
    Version 19.7.0.0.0

    ----- Current SQL Statement for this session (sql_id=7r3mwa86fma5t) -----
    select emplid, name, effdt, last_name
    from ps_names x
    where x.last_name = 'Smith'
    and x.name_type = 'PRI'
    and x.effdt = (
    SELECT /*+UNNEST*/ MAX(x1.effdt)
    FROM ps_names x1
    WHERE x1.emplid = x.emplid
    AND x1.name_type = x.name_type
    AND x1.effdt <= SYSDATE)

    Later in the trace, we can see the fully expanded SQL statement preceded by the 'UNPARSED QUERY IS' message. 
    • All the SQL language keywords have been forced into upper case.
    • All the object and column names have been made upper case to match the objects.
    • Every column and table is double-quoted which makes them case sensitive.    
    • The columns all have row source aliases. 
    • The row sources (tables in this case) are fully qualified.
    • Only the literal 'Smith' is in mixed case.
Various unparsed queries may appear in the trace as the optimizer tries and costs different transformations.  These are not nicely formatted; the expanded statements are just a long string of text.  The first one is the expanded form of the untransformed statement.
    Stmt: ******* UNPARSED QUERY IS *******
    SELECT "X"."EMPLID""EMPLID","X"."NAME""NAME","X"."EFFDT""EFFDT","X"."LAST_NAME""LAST_NAME" FROM "SYSADM"."PS_NAMES"
    "X" WHERE "X"."LAST_NAME"='Smith' AND "X"."NAME_TYPE"='PRI' AND "X"."EFFDT"= (SELECT /*+ UNNEST */ MAX("X1"."EFFDT")
    "MAX(X1.EFFDT)" FROM "SYSADM"."PS_NAMES""X1" WHERE "X1"."EMPLID"="X"."EMPLID" AND "X1"."NAME_TYPE"="X"."NAME_TYPE" AND
    "X1"."EFFDT"<=SYSDATE@!)
    Here the sub-query has been transformed into an in-line view.  I have reformatted it to make it easier to read.
    CVM:   Merging complex view SEL$683B0107 (#2) into SEL$C772B8D1 (#1).
    qbcp:******* UNPARSED QUERY IS *******
    SELECT "X"."EMPLID""EMPLID","X"."NAME""NAME","X"."EFFDT""EFFDT","X"."LAST_NAME""LAST_NAME"
    FROM (SELECT /*+ UNNEST */ MAX("X1"."EFFDT") "MAX(X1.EFFDT)","X1"."EMPLID""ITEM_0","X1"."NAME_TYPE""ITEM_1"
    FROM "SYSADM"."PS_NAMES""X1"
    WHERE "X1"."EFFDT"<=SYSDATE@!
    GROUP BY "X1"."EMPLID","X1"."NAME_TYPE") "VW_SQ_1"
    ,"SYSADM"."PS_NAMES""X"
    WHERE "X"."LAST_NAME"='Smith'
    AND "X"."NAME_TYPE"='PRI'
    AND "X"."EFFDT"="VW_SQ_1"."MAX(X1.EFFDT)"
    AND "VW_SQ_1"."ITEM_0"="X"."EMPLID"
    AND "VW_SQ_1"."ITEM_1"="X"."NAME_TYPE"
    This is the final form of the statement that was executed and that produced the execution plan.  The in-line view has been merged into the parent query.  There will only be a final query section if any transformations have occurred. Again, I have reformatted it here to make it easier to read.  
    Final query after transformations:******* UNPARSED QUERY IS *******
SELECT /*+ UNNEST */ "X"."EMPLID" "EMPLID","X"."NAME" "NAME","X"."EFFDT" "EFFDT",'Smith' "LAST_NAME"
FROM "SYSADM"."PS_NAMES" "X1"
,"SYSADM"."PS_NAMES" "X"
    WHERE "X"."LAST_NAME"='Smith'
    AND "X"."NAME_TYPE"='PRI'
    AND "X1"."EMPLID"="X"."EMPLID"
    AND "X1"."NAME_TYPE"="X"."NAME_TYPE"
    AND "X1"."EFFDT"<=SYSDATE@!
    AND "X1"."NAME_TYPE"='PRI'
    GROUP BY "X1"."NAME_TYPE","X".ROWID,"X"."EFFDT","X"."NAME","X"."EMPLID"
    HAVING "X"."EFFDT"=MAX("X1"."EFFDT")

    • PS_NAMES X1 has been moved from the subquery into the main from clause. Instead of a correlated subquery, we now have a two-table join. 
    • The query is grouped by the ROWID on row source X and the other selected columns. 
• Instead of joining the tables on NAME_TYPE, the literal criterion has been duplicated on X1.
    • A having clause is used to join X.EFFDT to the maximum value of X1.EFFDT. 
    • Instead of selecting LAST_NAME from X, the literal value in the predicate has been put in the select clause. 
    If we look at the execution plan for the unnested statement we can see X and X1 are now in query block SEL$841DDE77 that has been unnested and merged.

    ----- Explain Plan Dump -----

------------------------------------------+-----------------------------------+
| Id  | Operation             | Name      | Rows  | Bytes | Cost  | Time      |
------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT      |           |       |       |   139 |           |
| 1   |  FILTER               |           |       |       |       |           |
| 2   |   SORT GROUP BY       |           |     1 |    77 |   139 |  00:00:02 |
| 3   |    NESTED LOOPS       |           |     3 |   231 |   138 |  00:00:02 |
| 4   |     TABLE ACCESS FULL | PS_NAMES  |     2 |   112 |   136 |  00:00:02 |
| 5   |     INDEX RANGE SCAN  | PS_NAMES  |     1 |    21 |     1 |  00:00:01 |
------------------------------------------+-----------------------------------+
    Query Block Name / Object Alias (identified by operation id):
    ------------------------------------------------------------
    1 - SEL$841DDE77
    4 - SEL$841DDE77 / "X"@"SEL$1"
    5 - SEL$841DDE77 / "X1"@"SEL$2"
    ------------------------------------------------------------
    Predicate Information:
    ----------------------
    1 - filter("EFFDT"=MAX("X1"."EFFDT"))
    4 - filter(("X"."LAST_NAME"='Smith' AND "X"."NAME_TYPE"='PRI'))
    5 - access("X1"."EMPLID"="X"."EMPLID" AND "X1"."NAME_TYPE"='PRI' AND "X1"."EFFDT"<=SYSDATE@!)

The new query block name is a hash value based on the names of the query blocks that went into the transformation.  The presence of such a block name is an indication that query transformation has occurred.  The query block name is stable, and it is referenced in the outline of hints.
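Because the name is stable, it can be targeted directly with hints.  Here is a minimal sketch, reconstructing the original statement from the unparsed form above and forcing the full scan by query block name (the hint text is taken from the outline data shown below):
SELECT /*+ FULL(@"SEL$841DDE77" "X"@"SEL$1") */ x.emplid, x.name, x.effdt, x.last_name
FROM   sysadm.ps_names x
WHERE  x.last_name = 'Smith'
AND    x.name_type = 'PRI'
AND    x.effdt = (SELECT /*+ UNNEST */ MAX(x1.effdt)
                  FROM   sysadm.ps_names x1
                  WHERE  x1.emplid    = x.emplid
                  AND    x1.name_type = x.name_type
                  AND    x1.effdt    <= SYSDATE);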
    "A question that we could ask about the incomprehensible query block names that Oracle generates is: 'are they deterministic?'– is it possible for the same query to give you the same plan while generating different query block names on different versions of Oracle (or different days of the week). The answer is (or should be) no; when Oracle generates a query block name (after supplying the initial defaults of sel$1, sel$2 etc.) it applies a hashing function to the query block names that have gone INTO a transformation to generate the name that it will use for the block that comes OUT of the transformation." - Jonathan Lewis: Query Blocks and Inline Views 
    As Jonathan points out "the 'Outline Data' section of the report tells us that query block" in my example SEL$841DDE77 "is an 'outline_leaf', in other words, it is a 'final' query block that has actually been subject to independent optimization". We can also see other query block names referenced in OUTLINE hints.
      Outline Data:
    /*+
    BEGIN_OUTLINE_DATA

    OUTLINE_LEAF(@"SEL$841DDE77")
    MERGE(@"SEL$683B0107">"SEL$C772B8D1")
    OUTLINE(@"SEL$C772B8D1")
    UNNEST(@"SEL$2")
    OUTLINE(@"SEL$683B0107")
    OUTLINE(@"SEL$7511BFD2")
    OUTLINE(@"SEL$2")
    OUTLINE(@"SEL$1")
    FULL(@"SEL$841DDE77""X"@"SEL$1")
    INDEX(@"SEL$841DDE77""X1"@"SEL$2" ("PS_NAMES"."EMPLID""PS_NAMES"."NAME_TYPE""PS_NAMES"."EFFDT"))
    LEADING(@"SEL$841DDE77""X"@"SEL$1""X1"@"SEL$2")
    USE_NL(@"SEL$841DDE77""X1"@"SEL$2")
    END_OUTLINE_DATA
    */
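The same query block, alias and outline sections can also be obtained for a cached cursor without a 10053 trace, using DBMS_XPLAN with the ADVANCED format option.  A minimal sketch (substitute a real SQL_ID):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id => '&sql_id', cursor_child_no => NULL, format => 'ADVANCED'));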
We can see these query block names being generated in the trace as a number of transformations are applied, each with a brief description of the transformation.
    Registered qb: SEL$683B0107 0xfc6e3030 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
    Registered qb: SEL$7511BFD2 0xfc6c5c68 (VIEW ADDED SEL$1)
    Registered qb: SEL$C772B8D1 0xfc6c5c68 (SUBQUERY UNNEST SEL$7511BFD2; SEL$2)
    Registered qb: SEL$841DDE77 0xfc6d91e0 (VIEW MERGE SEL$C772B8D1; SEL$683B0107; SEL$C772B8D1)

    There is no BITOR() in Oracle SQL

    In Oracle SQL, I can do a bitwise AND of two numbers, but there is no equivalent function to do a bitwise OR.  However, it turns out to be really easy to do using BITAND().
    I was manipulating some trace values where each binary digit, or bit, corresponds to a different function.  I wanted to ensure certain attributes were set.  So, I wanted to do a bitwise OR between the current flag value and the value of the bits I wanted to set. 
    In bitwise OR, if either or both bits are set, then the answer is 1.  It is like addition, but when both the bits are 1, the answer is 1 rather than 2.  I can add the bits up and then deduct BITAND().  Thus:
BITOR  0  1        +  0  1        BITAND  0  1
  0    0  1   =    0  0  1   -      0    0  0
  1    1  1        1  1  2          1    0  1
    Or I could write it as 
    BITOR(x,y)  =  x + y - BITAND(x,y)

    Here is a simple example with two decimal numbers expressed in binary.  The results of AND and OR operations are below, with their decimal values.

     27 = 00011011
    42 = 00101010

    AND = 00001010 = 10
    OR = 00111011 = 59
    I can then write a simple SQL expression to calculate this, and perhaps put it into a PL/SQL function thus:
    WITH FUNCTION bitor(p1 INTEGER, p2 INTEGER) RETURN INTEGER IS
    BEGIN
    RETURN p1+p2-bitand(p1,p2);
    END;
    SELECT BITAND(27,42)
    , 27+42-BITAND(27,42)
    , bitor(27,42)
    FROM DUAL
    /

BITAND(27,42) 27+42-BITAND(27,42) BITOR(27,42)
------------- ------------------- ------------
           10                  59           59
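If the function is needed permanently, rather than inline in a single statement, the same expression can go into a standalone function.  A minimal sketch:
CREATE OR REPLACE FUNCTION bitor(p1 INTEGER, p2 INTEGER) RETURN INTEGER
DETERMINISTIC IS
BEGIN
  RETURN p1 + p2 - BITAND(p1, p2); -- OR = sum of the bits minus the carry (the AND)
END bitor;
/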

Optimizer Panel Group @ UKOUG Breakthrough'22 Conference, Thursday 1st December, 4pm.

    Come to the Optimizer Panel Group session at the UKOUG Breakthrough'22 conference in Birmingham.  Reply to this tweet to submit a question in advance.

    Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 3. Copying data from Object Storage to a Regular Table

    This blog is the third in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.

    Copy Data into Table 

    Alternatively, we can copy the data into a normal table. The table needs to be created in advance. This time, I am going to run the copy as user SOE rather than ADMIN.  I need to:
• Grant CONNECT and RESOURCE privileges, and quota on the data tablespace.
    • Grant execute on DBMS_CLOUD to SOE, so it can execute the command.
    • Grant READ and WRITE access on the DATA_PUMP_DIR directory – the log and bad files created by this process are written to this database directory.
    connect admin/Password2020!@gofaster1b_tp 
    CREATE USER soe IDENTIFIED BY Password2020;
    GRANT CONNECT, RESOURCE TO soe;
    GRANT EXECUTE ON DBMS_CLOUD TO soe;
    GRANT READ, WRITE ON DIRECTORY data_pump_dir TO soe;
    ALTER USER soe QUOTA UNLIMITED ON data;
    I am now going to switch to user SOE and create my table.
    connect soe/Password2020@gofaster1b_tp
    Drop table soe.ash_hist purge;
    CREATE TABLE soe.ASH_HIST
    ( SNAP_ID NUMBER,
    DBID NUMBER,
    INSTANCE_NUMBER NUMBER,
    SAMPLE_ID NUMBER,
    SAMPLE_TIME TIMESTAMP (3),
    -- SAMPLE_TIME_UTC TIMESTAMP (3),
    -- USECS_PER_ROW NUMBER,
    SESSION_ID NUMBER,
    SESSION_SERIAL# NUMBER,
    SESSION_TYPE VARCHAR2(10),
    FLAGS NUMBER,
    USER_ID NUMBER,
    -----------------------------------------
    SQL_ID VARCHAR2(13),
    IS_SQLID_CURRENT VARCHAR2(1),
    SQL_CHILD_NUMBER NUMBER,
    SQL_OPCODE NUMBER,
    SQL_OPNAME VARCHAR2(64),
    FORCE_MATCHING_SIGNATURE NUMBER,
    TOP_LEVEL_SQL_ID VARCHAR2(13),
    TOP_LEVEL_SQL_OPCODE NUMBER,
    SQL_PLAN_HASH_VALUE NUMBER,
    SQL_FULL_PLAN_HASH_VALUE NUMBER,
    -----------------------------------------
    SQL_ADAPTIVE_PLAN_RESOLVED NUMBER,
    SQL_PLAN_LINE_ID NUMBER,
    SQL_PLAN_OPERATION VARCHAR2(64),
    SQL_PLAN_OPTIONS VARCHAR2(64),
    SQL_EXEC_ID NUMBER,
    SQL_EXEC_START DATE,
    PLSQL_ENTRY_OBJECT_ID NUMBER,
    PLSQL_ENTRY_SUBPROGRAM_ID NUMBER,
    PLSQL_OBJECT_ID NUMBER,
    PLSQL_SUBPROGRAM_ID NUMBER,
    -----------------------------------------
    QC_INSTANCE_ID NUMBER,
    QC_SESSION_ID NUMBER,
    QC_SESSION_SERIAL# NUMBER,
    PX_FLAGS NUMBER,
    EVENT VARCHAR2(64),
    EVENT_ID NUMBER,
    SEQ# NUMBER,
    P1TEXT VARCHAR2(64),
    P1 NUMBER,
    P2TEXT VARCHAR2(64),
    -----------------------------------------
    P2 NUMBER,
    P3TEXT VARCHAR2(64),
    P3 NUMBER,
    WAIT_CLASS VARCHAR2(64),
    WAIT_CLASS_ID NUMBER,
    WAIT_TIME NUMBER,
    SESSION_STATE VARCHAR2(7),
    TIME_WAITED NUMBER,
    BLOCKING_SESSION_STATUS VARCHAR2(11),
    BLOCKING_SESSION NUMBER,
    -----------------------------------------
    BLOCKING_SESSION_SERIAL# NUMBER,
    BLOCKING_INST_ID NUMBER,
    BLOCKING_HANGCHAIN_INFO VARCHAR2(1),
    CURRENT_OBJ# NUMBER,
    CURRENT_FILE# NUMBER,
    CURRENT_BLOCK# NUMBER,
    CURRENT_ROW# NUMBER,
    TOP_LEVEL_CALL# NUMBER,
    TOP_LEVEL_CALL_NAME VARCHAR2(64),
    CONSUMER_GROUP_ID NUMBER,
    -----------------------------------------
    XID RAW(8),
    REMOTE_INSTANCE# NUMBER,
    TIME_MODEL NUMBER,
    IN_CONNECTION_MGMT VARCHAR2(1),
    IN_PARSE VARCHAR2(1),
    IN_HARD_PARSE VARCHAR2(1),
    IN_SQL_EXECUTION VARCHAR2(1),
    IN_PLSQL_EXECUTION VARCHAR2(1),
    IN_PLSQL_RPC VARCHAR2(1),
    IN_PLSQL_COMPILATION VARCHAR2(1),
    -----------------------------------------
    IN_JAVA_EXECUTION VARCHAR2(1),
    IN_BIND VARCHAR2(1),
    IN_CURSOR_CLOSE VARCHAR2(1),
    IN_SEQUENCE_LOAD VARCHAR2(1),
    IN_INMEMORY_QUERY VARCHAR2(1),
    IN_INMEMORY_POPULATE VARCHAR2(1),
    IN_INMEMORY_PREPOPULATE VARCHAR2(1),
    IN_INMEMORY_REPOPULATE VARCHAR2(1),
    IN_INMEMORY_TREPOPULATE VARCHAR2(1),
    -- IN_TABLESPACE_ENCRYPTION VARCHAR2(1),
    CAPTURE_OVERHEAD VARCHAR2(1),
    -----------------------------------------
    REPLAY_OVERHEAD VARCHAR2(1),
    IS_CAPTURED VARCHAR2(1),
    IS_REPLAYED VARCHAR2(1),
    -- IS_REPLAY_SYNC_TOKEN_HOLDER VARCHAR2(1),
    SERVICE_HASH NUMBER,
    PROGRAM VARCHAR2(64),
    MODULE VARCHAR2(64),
    ACTION VARCHAR2(64),
    CLIENT_ID VARCHAR2(64),
    MACHINE VARCHAR2(64),
    PORT NUMBER,
    -----------------------------------------
    ECID VARCHAR2(64),
    DBREPLAY_FILE_ID NUMBER,
    DBREPLAY_CALL_COUNTER NUMBER,
    TM_DELTA_TIME NUMBER,
    TM_DELTA_CPU_TIME NUMBER,
    TM_DELTA_DB_TIME NUMBER,
    DELTA_TIME NUMBER,
    DELTA_READ_IO_REQUESTS NUMBER,
    DELTA_WRITE_IO_REQUESTS NUMBER,
    DELTA_READ_IO_BYTES NUMBER,
    -----------------------------------------
    DELTA_WRITE_IO_BYTES NUMBER,
    DELTA_INTERCONNECT_IO_BYTES NUMBER,
    PGA_ALLOCATED NUMBER,
    TEMP_SPACE_ALLOCATED NUMBER,
    DBOP_NAME VARCHAR2(64),
    DBOP_EXEC_ID NUMBER,
    CON_DBID NUMBER,
    CON_ID NUMBER,
    -----------------------------------------
    CONSTRAINT ash_hist_pk PRIMARY KEY (dbid, instance_number, snap_id, sample_id, session_id)
    )
    COMPRESS FOR QUERY LOW
    /
    As Autonomous Databases run on Exadata, I have also specified Hybrid Columnar Compression (HCC) for this table.
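The compression attributes declared on the table can be confirmed from the data dictionary; a quick sketch:
SELECT table_name, compression, compress_for
FROM   all_tables
WHERE  owner = 'SOE'
AND    table_name = 'ASH_HIST';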
    Credentials are specific to the database user.  I have to create an additional credential, for the same cloud user, but owned by SOE.
    ALTER SESSION SET nls_date_Format='hh24:mi:ss dd.mm.yyyy';
    set serveroutput on timi on
    BEGIN
    DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'SOE_BUCKET',
    username=> 'oraclecloud1@go-faster.co.uk',
    password=> 'K7xfi-mG<1Z:dq#88;1m'
    );
    END;
    /
    column owner format a10
    column credential_name format a20
    column comments format a80
    column username format a40
    SELECT * FROM dba_credentials;

    OWNER CREDENTIAL_NAME USERNAME WINDOWS_DOMAIN
    ---------- -------------------- ---------------------------------------- ------------------------------
    COMMENTS ENABL
    -------------------------------------------------------------------------------- -----
    ADMIN MY_BUCKET oraclecloud1@go-faster.co.uk
    {"comments":"Created via DBMS_CLOUD.create_credential"} TRUE

    SOE SOE_BUCKET oraclecloud1@go-faster.co.uk
    {"comments":"Created via DBMS_CLOUD.create_credential"} TRUE

The COPY_DATA procedure is similar to CREATE_EXTERNAL_TABLE described in the previous post, but it doesn't have a column list. The field names must match the column names. It is sensitive to field names with a trailing #; these must be enclosed in double-quotes.
    TRUNCATE TABLE soe.ash_hist;
    DECLARE
    l_operation_id NUMBER;
    BEGIN
    DBMS_CLOUD.COPY_DATA(
    table_name =>'ASH_HIST',
    credential_name =>'SOE_BUCKET',
    file_uri_list =>'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz',
    schema_name => 'SOE',
    format => json_object('blankasnull' value 'true'
    ,'compression' value 'gzip'
    ,'dateformat' value 'YYYY-MM-DD/HH24:mi:ss'
    ,'timestampformat' value 'YYYY-MM-DD/HH24:mi:ss.ff'
    ,'delimiter' value '<,>'
    ,'ignoreblanklines' value 'true'
    ,'rejectlimit' value '10'
    ,'removequotes' value 'true'
    ,'trimspaces' value 'lrtrim'
    ),
    field_list=>'SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID,SAMPLE_TIME ,SESSION_ID,"SESSION_SERIAL#",SESSION_TYPE,FLAGS,USER_ID
    ,SQL_ID,IS_SQLID_CURRENT,SQL_CHILD_NUMBER,SQL_OPCODE,SQL_OPNAME,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE,SQL_PLAN_HASH_VALUE,SQL_FULL_PLAN_HASH_VALUE
    ,SQL_ADAPTIVE_PLAN_RESOLVED,SQL_PLAN_LINE_ID,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS,SQL_EXEC_ID,SQL_EXEC_START,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
    ,QC_INSTANCE_ID,QC_SESSION_ID,"QC_SESSION_SERIAL#",PX_FLAGS,EVENT,EVENT_ID,"SEQ#",P1TEXT,P1,P2TEXT
    ,P2,P3TEXT,P3,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME,SESSION_STATE,TIME_WAITED,BLOCKING_SESSION_STATUS,BLOCKING_SESSION
    ,"BLOCKING_SESSION_SERIAL#",BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO,"CURRENT_OBJ#","CURRENT_FILE#","CURRENT_BLOCK#","CURRENT_ROW#","TOP_LEVEL_CALL#",TOP_LEVEL_CALL_NAME,CONSUMER_GROUP_ID
    ,XID,"REMOTE_INSTANCE#",TIME_MODEL,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
    ,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD,IN_INMEMORY_QUERY,IN_INMEMORY_POPULATE,IN_INMEMORY_PREPOPULATE,IN_INMEMORY_REPOPULATE,IN_INMEMORY_TREPOPULATE,CAPTURE_OVERHEAD
    ,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT
    ,ECID,DBREPLAY_FILE_ID,DBREPLAY_CALL_COUNTER,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME,DELTA_TIME,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS,DELTA_READ_IO_BYTES
    ,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED,DBOP_NAME,DBOP_EXEC_ID,CON_DBID,CON_ID',
    operation_id=>l_operation_id
    );
    dbms_output.put_line('Operation ID:'||l_operation_id||' finished successfully');
    EXCEPTION WHEN OTHERS THEN
    dbms_output.put_line('Operation ID:'||l_operation_id||' raised an error');
    RAISE;
    END;
    /

The data copy takes slightly longer than the query on the external table.
    Operation ID:31 finished successfully

    PL/SQL procedure successfully completed.

    Elapsed: 00:02:01.11
    The status of the copy operation is reported in USER_LOAD_OPERATIONS.  This includes the number of rows loaded and the names of external tables that are created for the log and bad files.
    set lines 120
    column type format a10
    column file_uri_list format a64
    column start_time format a32
    column update_time format a32
    column owner_name format a10
    column table_name format a10
    column partition_name format a10
    column subpartition_name format a10
    column logfile_table format a15
    column badfile_table format a15
    column tempext_table format a30
    select * from user_load_operations where id = &operation_id;

    ID TYPE SID SERIAL# START_TIME UPDATE_TIME STATUS
    ---------- ---------- ---------- ---------- -------------------------------- -------------------------------- ---------
    OWNER_NAME TABLE_NAME PARTITION_ SUBPARTITI FILE_URI_LIST ROWS_LOADED
    ---------- ---------- ---------- ---------- ---------------------------------------------------------------- -----------
    LOGFILE_TABLE BADFILE_TABLE TEMPEXT_TABLE
    --------------- --------------- ------------------------------
    31 COPY 19965 44088 07-MAY-20 17.03.20.328263 +01:00 07-MAY-20 17.05.36.157680 +01:00 COMPLETED
    SOE ASH_HIST https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu 1409305
    /b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz
    COPY$31_LOG COPY$31_BAD COPY$Y2R021UKPJ5F75JCMSKL

    An external table is temporarily created by the COPY_DATA procedure but is then dropped before the procedure completes.  The bad file is empty because the copy operation succeeded without error, but we can query the copy log.
    select * from COPY$31_LOG;

    RECORD
    ------------------------------------------------------------------------------------------------------------------------
    LOG file opened at 05/07/20 16:03:21

    Total Number of Files=1

    Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz

    Log File: COPY$31_105537.log

    LOG file opened at 05/07/20 16:03:21

    Total Number of Files=1

    Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz

    Log File: COPY$31_105537.log

    LOG file opened at 05/07/20 16:03:21

    KUP-05014: Warning: Intra source concurrency disabled because the URLs specified for the Cloud Service map to compressed data.

    Bad File: COPY$31_105537.bad

    Field Definitions for table COPY$Y2R021UKPJ5F75JCMSKL
    Record format DELIMITED BY
    Data in file has same endianness as the platform
    Rows with all null fields are accepted
    Table level NULLIF (Field = BLANKS)
    Fields in Data Source:

    SNAP_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    DBID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    INSTANCE_NUMBER CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    SAMPLE_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    SAMPLE_TIME CHAR (255)
    Date datatype TIMESTAMP, date mask YYYY-MM-DD/HH24:mi:ss.ff
    Terminated by "<,>"
    Trim whitespace from left and right

    CON_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right

    Date Cache Statistics for table COPY$Y2R021UKPJ5F75JCMSKL
    Date conversion cache disabled due to overflow (default size: 1000)

    365 rows selected.
These files are written to the DATA_PUMP_DIR database directory.  We don't have access to the database file system in Autonomous, so Oracle has provided the LIST_FILES table function in DBMS_CLOUD so that we can see what files are in a directory.
    Set pages 99 lines 150
    Column object_name format a32
    Column created format a32
    Column last_modified format a32
    Column checksum format a20
    SELECT * FROM DBMS_CLOUD.LIST_FILES('DATA_PUMP_DIR');

    OBJECT_NAME BYTES CHECKSUM CREATED LAST_MODIFIED
    -------------------------------- ---------- -------------------- -------------------------------- --------------------------------

    COPY$31_dflt.log 0 07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
    COPY$31_dflt.bad 0 07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
    COPY$31_105537.log 13591 07-MAY-20 16.03.21.000000 +00:00 07-MAY-20 16.05.35.000000 +00:00

Statistics are automatically collected on the table by the copy process because it was done in direct-path mode.  We can see that the number of rows in the statistics corresponds to the number of rows loaded by the COPY_DATA procedure.
    Set pages 99 lines 140
    Column owner format a10
    Column IM_STAT_UPDATE_TIME format a30
    Select *
    from all_tab_statistics
    Where table_name = 'ASH_HIST';

    OWNER TABLE_NAME PARTITION_ PARTITION_POSITION SUBPARTITI SUBPARTITION_POSITION OBJECT_TYPE NUM_ROWS BLOCKS EMPTY_BLOCKS
    ---------- ---------- ---------- ------------------ ---------- --------------------- ------------ ---------- ---------- ------------
    AVG_SPACE CHAIN_CNT AVG_ROW_LEN AVG_SPACE_FREELIST_BLOCKS NUM_FREELIST_BLOCKS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO IM_IMCU_COUNT
    ---------- ---------- ----------- ------------------------- ------------------- ----------------- ------------------- -------------
    IM_BLOCK_COUNT IM_STAT_UPDATE_TIME SCAN_RATE SAMPLE_SIZE LAST_ANALYZED GLO USE STATT STALE_S SCOPE
    -------------- ------------------------------ ---------- ----------- ------------------- --- --- ----- ------- -------
    SOE ASH_HIST TABLE 1409305 19426 0
    0 0 486 0 0
    1409305 15:16:14 07.05.2020 YES NO NO SHARED

I can confirm that the data is compressed because the compression type of every sampled row is type 8 (HCC QUERY LOW).  See also DBMS_COMPRESSION Compression Types.
    WITH x AS (
    select dbms_compression.get_compression_type('SOE', 'ASH_HIST', rowid) ctype
    from soe.ash_hist sample (.1))
    Select ctype, count(*) From x group by ctype;

    CTYPE COUNT(*)
    ---------- ----------
    8 14097
    I can find this SQL Statement in the Performance Hub. 
    INSERT /*+ append enable_parallel_dml */ INTO "SOE"."ASH_HIST" SELECT * FROM COPY$Y2R021UKPJ5F75JCMSKL
Therefore, the data was inserted into the permanent table from the temporary external table, in direct-path mode and in parallel.
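If you don't have access to the Performance Hub, the same recursive statement can be found in the shared pool.  A sketch; the generated external table name varies, so match on the COPY$ prefix:
SELECT sql_id, executions, rows_processed, sql_text
FROM   v$sql
WHERE  sql_text LIKE 'INSERT /*+ append enable_parallel_dml */%COPY$%';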
I can also look at the OCI Performance Hub and see that most of the time was spent on CPU.  I can see the SQL_ID of the insert statement and the call to the DBMS_CLOUD procedure.
    I can drill in further to the exact SQL statement.
    When I query the table I get exactly the same data as previously with the external table.
    set autotrace on timi on lines 180 trimspool on
    break on report
    compute sum of ash_secs on report
    column min(sample_time) format a22
    column max(sample_time) format a22
    select event, sum(10) ash_Secs, min(sample_time), max(sample_time)
    from soe.ash_hist
    group by event
    order by ash_Secs desc
    ;

    EVENT ASH_SECS MIN(SAMPLE_TIME) MAX(SAMPLE_TIME)
    ---------------------------------------------------------------- ---------- ---------------------- ----------------------
    10304530 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    direct path read 3258500 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    SQL*Net more data to client 269220 22-MAR-20 10.00.31.205 07-APR-20 22.59.30.275
    direct path write temp 32400 22-MAR-20 11.39.53.996 07-APR-20 21.43.47.329
    gc cr block busy 24930 22-MAR-20 10.51.33.189 07-APR-20 22.56.56.804

    latch free 10 28-MAR-20 20.26.11.307 28-MAR-20 20.26.11.307
    ----------
    sum 14093050

    86 rows selected.

    Elapsed: 00:00:00.62

    I can see that the execution plan is now a single serial full scan of the table.
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1336681691

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    84 |  1428 |   1848  (9)| 00:00:01 |
|   1 |  SORT ORDER BY              |          |    84 |  1428 |   1848  (9)| 00:00:01 |
|   2 |   HASH GROUP BY             |          |    84 |  1428 |   1848  (9)| 00:00:01 |
|   3 |    TABLE ACCESS STORAGE FULL| ASH_HIST |  1409K|    22M|   1753  (4)| 00:00:01 |
---------------------------------------------------------------------------------------


    Statistics
    ----------------------------------------------------------
    11 recursive calls
    13 db block gets
    19255 consistent gets
    19247 physical reads
    2436 redo size
    5428 bytes sent via SQL*Net to client
    602 bytes received via SQL*Net from client
    7 SQL*Net roundtrips to/from client
    1 sorts (memory)
    0 sorts (disk)
    86 rows processed

    In the Cloud, Performance is Instrumented as Cost

About 5 years ago, I was at a conference where someone put this statement up in a PowerPoint slide.  (I would like to be able to credit the author correctly, but I can't remember who it was.)  We all looked at it, thought about it, and said 'yes, of course' to ourselves.  However, as a consultant who specialises in performance optimisation, it is only recently that I have started to have conversations with clients that reflect that idea.

    In the good old/bad old days of 'on premises'

It is not that long ago that the only option for procuring new hardware was to go through a sizing exercise that involved guessing how much you needed, allowing for future growth in data and processing volumes, then deciding how much you were actually willing to pay, purchasing it, and finally wheeling it into your data centre and hoping for the best.

It was then normal to want to get the best possible performance out of whatever system was installed on that hardware.  It would inevitably slow down over time.  Eventually, after the hardware purchase had been fully depreciated, you would have to start the whole cycle again and replace it with newer hardware.

Similarly, Oracle licensing.  You would have to license Oracle for all your CPUs (there are a few exceptions where you can associate specific CPUs with specific VMs and only license Oracle for the CPUs in those VMs).  You would also have to decide which Oracle features you licensed.  Standard or Enterprise Edition?  Diagnostics? Tuning? RAC? Partitioning? Compression? In-Memory?

    "You are gonna need a bigger boat"

    Then when you encountered performance problems you did the best you could with what you had. As a consultant, there was rarely any point in saying to a customer that they had run out of resource and they needed more.  The answer was usually along the lines of 'we have spent our money on that, and it has to last for five years, we have no additional budget and it has to work'. So you got on with finding the rabbit in the hat.

    Instead of purchasing hardware as a capital expense, in the cloud you rent hardware as an operational expense.

You can bring your own Oracle licence (BYOL), and then you have exactly what you were previously licensed for.  "At a high level, one Oracle Processor License maps to two OCPUs."

With Oracle's cloud licensing there are still lots of choices to make, not just how many CPUs and how much memory.  You can choose Infrastructure as a Service (IAAS), where you rent the server and install and license Oracle on it just as you did on-premises.  You can choose different storage systems with different I/O profiles.  There are different levels of PAAS that have different database features.  You can go all the way up to Extreme Performance on Exadata.  All of these choices have a cost consequence.  Oracle provides a Cloud cost estimator tool (other consultancies have produced their own versions).  These tools make the link between these choices and their costs very clear.

    "You can have as much performance as you are willing to pay for"

I have been working with a customer who is moving a PeopleSoft system from Supercluster on-premises to Exadata Cloud-at-Customer (so it is physically on-site, but in all other respects it is in the cloud).  They are not bringing their own licence (BYOL). Instead, they are on a tariff of US$1.3441/OCPU/hr; we have found it easier to talk about US$1000/OCPU/month (US$1.3441 × 24 hours × 31 days ≈ US$1000).

Just as you would with an on-premises system, they went through a sizing exercise that predicted they needed 6 OCPUs on each of 2 RAC nodes during the day, and 10 at night.

    It has been very helpful to have a clear quantitative definition of acceptable performance for the critical part of the system, the overnight reporting batch.  "The reports need to be available to users by the start of the working day in continental Europe, at 8am CET", which is 2am EST.  There is no benefit in providing additional resources to allow the batch to finish any earlier.  Instead, we only need to provide as much as is necessary to reliably meet the target.

A performance tuning/testing exercise quickly showed that fewer than the predicted number of CPUs were actually needed: 2-4 OCPUs/node during the day is looking comfortable.  The new Exadata has fewer but much faster CPUs.  As we adjusted the application configuration to match, we found we were able to reduce the number of OCPUs.

If we hadn't already been using the base-level In Memory feature on Supercluster, then to complete the overnight batch in time for the start of the European working day, we would probably have needed 10 OCPUs/node.  The base-level In Memory option brought that down to around 7.  This shows the huge value of the careful use of database features and techniques to reduce CPU overhead.

We are not using BYOL, so we can use fully featured In Memory with a larger store.  Increasing the In Memory store from 16GB to 40GB per node saved another OCPU, but cost nothing.  If we had been using BYOL, we would have had to pay additionally for fully featured In Memory.  I doubt the marginal benefit would have justified the cost.

    The customer has been considering switching on the extra OCPUs overnight to facilitate the batch.  Doing so costs $1.33/hour, and at the end of the month, they get an invoice from Oracle.  That has concentrated minds and changed behaviours.  The customer understands that there is a real $ cost/saving to their business decisions.

    One day I was asked: "What happens if we reduce the number of CPUs from 6 to 4?"

Essentially, the batch will take longer.  We are already using the database resource manager to prioritise processes when all the CPU is in use.  The resource manager plan has been built to reflect the business priorities, and so keeps it fair for all users.  For example, it ensures that users of the online part of the application get CPU in preference to batch processes; this is important for users in Asia who are online when the batch runs overnight in North America.  We also use the resource plan to impose different parallel query limits on different groups of processes.   If we are going to vary the number of CPUs, we will have to switch between different resource manager plans with different limits.  We will also have to reduce the number of reports that can be concurrently executed by the application, so some application configuration has to go hand in hand with the database configuration.
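As a sketch of what that involves (all object names here are hypothetical, not the customer's actual configuration), a plan directive can cap the parallel degree for a batch consumer group, and the active plan can be switched as the OCPU count changes:
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(plan => 'NIGHT_BATCH_PLAN', comment => 'Overnight batch plan');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(consumer_group => 'NVISION_BATCH', comment => 'nVision report processes');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'NIGHT_BATCH_PLAN', group_or_subplan => 'NVISION_BATCH',
    comment => 'Batch below online users, limited parallelism',
    mgmt_p2 => 70, parallel_degree_limit_p1 => 4);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'NIGHT_BATCH_PLAN', group_or_subplan => 'OTHER_GROUPS',
    comment => 'Everything else, including online users', mgmt_p1 => 100);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA;
END;
/
-- Switch plans when the OCPU count changes:
ALTER SYSTEM SET resource_manager_plan = 'NIGHT_BATCH_PLAN';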

    Effective caching by the database meant we already did relatively little physical I/O during the reporting.  Most of the time was already spent on CPU.  Use of In Memory further reduced physical I/O, and now nearly all the time is spent on CPU, but it also reduced the overall CPU consumption and therefore response time.

When we did vary the number of CPUs, we were not surprised to observe, from the Active Session History (ASH), that the total amount of database time spent on CPU by the nVision reporting processes is roughly constant (indicated by the blue area in the charts below).  If we reduce the number of concurrent processes, then the batch simply runs for longer.
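For reference, that observation came from the sort of ASH query sketched below (the module predicate is an assumption for illustration; each AWR ASH sample represents roughly 10 seconds of database time):
SELECT TRUNC(sample_time) run_date
,      SUM(10) ash_secs
FROM   dba_hist_active_sess_history
WHERE  session_state = 'ON CPU'
AND    module LIKE 'RPTBOOK%' /* nVision report books - an assumption */
GROUP BY TRUNC(sample_time)
ORDER BY 1;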


    There is no question that effective design and tuning are as important as they ever were.  The laws of physics are the same in the cloud as they are in your own data centre.  We worked hard to get the reporting to this level of performance and down to this CPU usage. 
    The difference is that now you can measure exactly how much that effort is saving you on your cloud subscription, and you can choose to spend more or less on that cloud subscription in order to achieve your business objectives.

    Determining the benefit to the business, in terms of the quantity and cost of users' time, remains as difficult as ever.  However, it was not a major consideration in this example because this all happens before the users are at work.

    In the cloud, you can have as much performance as you are willing to pay for!
