
Retrofitting Partitioning into Existing Applications: Example 3. Workflow: Separate Active and Inactive Rows, and Partial Indexing.

This post is part of a series about the partitioning of database objects.

  1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
  2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
  3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

    Workflow

    Workflow is an example of a case where you have a roughly constant volume of active data and an ever-growing quantity of historical, inactive data that builds up until it is archived.  Workflow requests are created, worked and closed.

    The PeopleSoft worklist table has four statuses:

    INSTSTATUS  Description
    ----------  -----------
    0           Available
    1           Selected
    2           Worked
    3           Cancelled
    Over time the majority of rows in the table end up with status 2 as they are worked and closed, and a few end up being cancelled. These rows are now inactive. All the workflow activity focuses on statuses 0 and 1. Partitioning can be used to separate the active rows from the inactive. I chose to range partition the worklist table by status, creating a partition of active worklist rows where the status is less than 2, and a partition of inactive rows where status is greater than or equal to 2. I could also have used list partitioning to create the same effect.
    CREATE TABLE PSWORKLIST (
     BUSPROCNAME VARCHAR2(30) NOT NULL,
     ACTIVITYNAME VARCHAR2(30) NOT NULL,
     EVENTNAME VARCHAR2(30) NOT NULL,
     WORKLISTNAME VARCHAR2(30) NOT NULL,
    …
     OPRID VARCHAR2(30) NOT NULL,
    …
     INSTSTATUS SMALLINT NOT NULL
    )
    PARTITION BY RANGE (INSTSTATUS)
    (PARTITION WL_OPEN VALUES LESS THAN (2) PCTFREE 20,
     PARTITION WL_CLOSED VALUES LESS THAN (MAXVALUE) PCTFREE 0
    )
    ENABLE ROW MOVEMENT
    /
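    For comparison, here is a minimal sketch of the list-partitioned equivalent I could have used instead (the column list is elided as above, and the partition names are kept the same):
    CREATE TABLE PSWORKLIST (
    …
     INSTSTATUS SMALLINT NOT NULL
    )
    PARTITION BY LIST (INSTSTATUS)
    (PARTITION WL_OPEN VALUES (0, 1) PCTFREE 20,
     PARTITION WL_CLOSED VALUES (DEFAULT) PCTFREE 0
    )
    ENABLE ROW MOVEMENT
    /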
    Operators query their worklist queue by their operator ID and the open request statuses; therefore, there is an index to support this query.  This index can be locally partitioned, i.e. partitioned on INSTSTATUS like the table.
    The optimizer prunes the partition containing closed worklist requests because it knows open requests cannot be found there, and it queries only the open partition.
    The open partition remains small because, as worklist rows are updated to the closed status, they are moved to the closed partition.  Therefore, row movement must be enabled on the table.  Thus, queries for open worklist requests remain efficient because the partition they scan stays small.
    There is an additional overhead in moving the rows between partitions as the status is updated to closed, but this is outweighed by the saving of only looking for open records in the open partition.
    Additional free space is specified on the open partition because that is where all the application update activity occurs.  Conversely, no free space is required for the closed partition because, after rows move there, they are not updated again until they are purged.
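    For example, here is a hypothetical status update that closes a request (the predicate values are invented).  Because INSTSTATUS is the partition key, the updated row migrates from WL_OPEN to WL_CLOSED; without ENABLE ROW MOVEMENT this statement would fail with ORA-14402.
    UPDATE psworklist
    SET inststatus = 2            /* worked: row moves to WL_CLOSED */
    WHERE oprid = 'OPRID042'      /* hypothetical operator */
    AND inststatus = 1;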
    From Oracle 12c, it is also possible to partially index a partitioned table.  You can choose which partitions are built in a local index by marking indexing on or off on the table partitions.  In this example, it is only necessary to index the open workflow records.  The application never queries the closed ones by operator ID, so indexing can be disabled on the closed partition, saving space and index maintenance overhead.
    ALTER TABLE PSWORKLIST MODIFY PARTITION WL_OPEN INDEXING ON;
    ALTER TABLE PSWORKLIST MODIFY PARTITION WL_CLOSED INDEXING OFF;

    CREATE INDEX PSBPSWORKLIST ON PSWORKLIST (OPRID, INSTSTATUS)
    LOCAL
    INDEXING PARTIAL
    /
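    As a quick check (assuming the table is in the current schema), the indexing attributes can be confirmed in the data dictionary; the INDEXING columns appear in these views from 12c.
    SELECT partition_name, indexing FROM user_tab_partitions WHERE table_name = 'PSWORKLIST';
    SELECT index_name, indexing FROM user_indexes WHERE table_name = 'PSWORKLIST';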
    Here is my worklist table with two partitions, and some sample data.  You can see over 90% of the rows are in the closed partition.
    SELECT table_name, partition_name, num_rows, blocks
    FROM dba_tab_statistics
    WHERE table_name = 'PSWORKLIST'
    ORDER BY partition_name nulls first
    /
    TABLE_NAME         PARTITION_NAME                   NUM_ROWS     BLOCKS
    ------------------ ------------------------------ ---------- ----------
    PSWORKLIST                                             100000       2711
    PSWORKLIST         WL_CLOSED                            90742       2430
    PSWORKLIST         WL_OPEN                               9258        281
    There is a notional entry in DBA_IND_STATISTICS for the index partition on the closed table partition, but it reports that it holds no rows and consumes no blocks.  The index-level statistics for PSBPSWORKLIST are an estimate of the totals for the index as if all partitions had been indexed (although, in fact, the B-tree level would still have been 1 had I built the whole index in my test case).
    SELECT index_name, partition_name, num_rows, blevel, leaf_blocks
    FROM dba_ind_statistics
    WHERE table_name = 'PSWORKLIST'
    ORDER BY index_name, partition_name nulls first
    /
    INDEX_NAME         PARTITION_NAME                   NUM_ROWS     BLEVEL LEAF_BLOCKS
    ------------------ ------------------------------ ---------- ---------- -----------
    PSBPSWORKLIST                                          100000          2         318
                       WL_CLOSED                               0          0           0
                       WL_OPEN                              9258          1          27

    PS_PSWORKLIST                                          100000          2         814
    The index segment for the closed partition does not physically exist, so it is not reported in DBA_SEGMENTS.
    SELECT segment_type, segment_name, partition_name, blocks
    FROM dba_segments
    WHERE segment_name like 'PS_PSWORKLIST'
    ORDER BY segment_name
    /
    SEGMENT_TYPE       SEGMENT_NAME                   PARTITION_NAME                     BLOCKS
    ------------------ ------------------------------ ------------------------------ ----------
    INDEX PARTITION    PSBPSWORKLIST                  WL_OPEN                               128
    INDEX              PS_PSWORKLIST                                                       896
    When active worklist requests are queried, the index may be used.
    SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus IN(1);

    Plan hash value: 3105966310
    ----------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 9 | 1953 | 11 (0)| 00:00:01 | | |
    | 1 | PARTITION RANGE SINGLE | | 9 | 1953 | 11 (0)| 00:00:01 | 1 | 1 |
    | 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PSWORKLIST | 9 | 1953 | 11 (0)| 00:00:01 | 1 | 1 |
    |* 3 | INDEX RANGE SCAN | PSBPSWORKLIST | 9 | | 1 (0)| 00:00:01 | 1 | 1 |
    ----------------------------------------------------------------------------------------------------------------------------
    However, the optimizer may still judge that it is easier to full scan the small table partition.
    SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus IN(0,1);

    Plan hash value: 1913856494
    -----------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 9 | 1953 | 86 (0)| 00:00:01 | | |
    | 1 | PARTITION RANGE INLIST| | 9 | 1953 | 86 (0)| 00:00:01 |KEY(I) |KEY(I) |
    |* 2 | TABLE ACCESS FULL | PSWORKLIST | 9 | 1953 | 86 (0)| 00:00:01 |KEY(I) |KEY(I) |
    -----------------------------------------------------------------------------------------------------
    A query on closed requests can only full scan the unindexed partition.
    SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus = 2;

    Plan hash value: 597831193
    -----------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 86 | 18662 | 718 (1)| 00:00:01 | | |
    | 1 | PARTITION RANGE SINGLE| | 86 | 18662 | 718 (1)| 00:00:01 | 2 | 2 |
    |* 2 | TABLE ACCESS FULL | PSWORKLIST | 86 | 18662 | 718 (1)| 00:00:01 | 2 | 2 |
    -----------------------------------------------------------------------------------------------------
    A query across both partitions may choose to use the index where it is available and full scan where it is not.
    SELECT * FROM psworklist WHERE oprid = 'OPRID042';

    Plan hash value: 3927567812
    ------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 101 | 21917 | 730 (1)| 00:00:01 | | |
    | 1 | VIEW | VW_TE_2 | 101 | 58580 | 730 (1)| 00:00:01 | | |
    | 2 | UNION-ALL | | | | | | | |
    | 3 | PARTITION RANGE SINGLE | | 10 | 2170 | 12 (0)| 00:00:01 | 1 | 1 |
    | 4 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PSWORKLIST | 10 | 2170 | 12 (0)| 00:00:01 | 1 | 1 |
    |* 5 | INDEX RANGE SCAN | PSBPSWORKLIST | 10 | | 2 (0)| 00:00:01 | 1 | 1 |
    | 6 | PARTITION RANGE SINGLE | | 91 | 19747 | 718 (1)| 00:00:01 | 2 | 2 |
    |* 7 | TABLE ACCESS FULL | PSWORKLIST | 91 | 19747 | 718 (1)| 00:00:01 | 2 | 2 |
    ------------------------------------------------------------------------------------------------------------------------------
    In this case, I cannot locally partition the unique index because the partitioning column does not appear in it, so it must remain a global non-partitioned index.
    CREATE UNIQUE  INDEX PS_PSWORKLIST ON PSWORKLIST (BUSPROCNAME,
    ACTIVITYNAME,
    EVENTNAME,
    WORKLISTNAME,
    INSTANCEID)
    /

    Conclusion

    • Make sure you understand what your application is doing. 
    • Match the partitioning to the way the application accesses data so that the application queries prune partitions. Even if that means that it is harder to archive data. 
    • If you are not getting partition elimination, you probably should not be partitioning. 
    • Range and list partitioning keep similar data values together, so it follows that dissimilar data values are kept apart in different segments. That can avoid I/O during scans, and if it keeps transactions apart it can also avoid read-consistency overheads. 
    • Hash partitioning spreads data out across segments and can be used to avoid some forms of contention.
    • Partitioning can separate data with different usage profiles, such as active rows from inactive rows. They might then have different indexing requirements. 
    • Partial indexing of partitioned tables allows you to choose which partitions should be built in a locally partitioned index.

    Retrofitting Partitioning into Existing Applications: Conclusion

    This is the last post in a series about the partitioning of database objects.

    1. General Ledger reporting: Typical example of partitioning for data warehouse-style queries
    2. Payroll: Avoiding the need for read-consistency in a typical transaction processing system.
    3. Workflow: Separate active and inactive rows, and partial indexing.
  • Conclusion

    Conclusion

    The decisions you make when you introduce partitioning into an existing application are similar to those you make when you design an application and its partitioning together.  The essential difference is that, by the time the application has been built, you probably can't change it.  So partitioning either works with the application as it is, or it doesn't.
    • Make sure you understand what your application is doing. 
    • Match the partitioning to the way the application accesses data so that the application queries prune partitions. Even if that means that it is harder to archive data later on.
    • If you are not getting partition elimination, you probably should not be partitioning. 
    • Range and list partitioning keep similar data values together, so it follows that dissimilar data values are kept apart in different segments. That can avoid I/O during scans, and if it keeps transactions apart it can also avoid read-consistency overheads. 
    • Hash partitioning spreads data out across segments and can be used to avoid some forms of contention.
    • Partitioning can separate data with different usage profiles, such as active rows from inactive rows. They might then have different indexing requirements. 
    • Partial indexing of partitioned tables allows you to choose which partitions should be built in a locally partitioned index.

    Oracle 12.2: New Statistic Preference PREFERENCE_OVERRIDES_PARAMETER


    Introduction

    There is a new statistics preference, PREFERENCE_OVERRIDES_PARAMETER, available from Oracle 12.2.  It allows the DBA to override any parameters specified when gathering statistics in favour of whatever statistics preferences are defined.  The preference can be specified at database level, at table level, or both.

    From the introduction of the cost-based optimizer in Oracle 7, we all had to write scripts to collect statistics.  The automatic statistics collection job introduced in Oracle 10g, run in the regularly scheduled maintenance window, was supposed to supersede that.  Yet it is still not uncommon to find systems that rely on custom scripts to collect object statistics.  Sometimes, commands to collect statistics are embedded in applications.

    It remains perfectly reasonable to collect statistics on certain objects at exactly the times that are most appropriate: for example, just after a table has been populated, or refreshing the statistics on a very large table at a quiet time.

    Since 11g, Oracle has provided global and table statistics preferences that specify how statistics are to be collected.  This declarative method is generally recommended instead of specifying parameters on calls to DBMS_STATS.  The main advantage is consistency: when statistics are collected on a table, they are always collected in the same way, including by the maintenance window job.  However, if scripts and programs still specify parameters when they call DBMS_STATS, those parameters will override the preferences.
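    For example, a minimal sketch (T1 is just an illustrative table): define the preference once, and every subsequent gather on that table honours it, whoever initiates it.

    exec dbms_stats.set_table_prefs(user,'T1','METHOD_OPT','FOR ALL COLUMNS SIZE AUTO');
    exec dbms_stats.gather_table_stats(user,'T1');  -- uses the table preference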

    Two scenarios in which enabling PREFERENCE_OVERRIDES_PARAMETER would be advantageous come immediately to mind.

    • The hash-based algorithm, introduced in 12c, to calculate the number of distinct values on a column only applies if the ESTIMATE_PERCENT parameter is AUTO_SAMPLE_SIZE, which is the default (see How does AUTO_SAMPLE_SIZE work in Oracle Database 12c? by Nigel Bayliss).  This new algorithm produces more accurate answers than even a large sample size, and much more quickly because there is no need to sort the sampled data for each column.  Therefore, ESTIMATE_PERCENT should no longer be specified.
    • I have found applications that specify a METHOD_OPT parameter that collects histograms that do not make sense.  For example, PeopleSoft used to use FOR ALL INDEXED COLUMNS SIZE 1 in the code that collected statistics.  That collects column minimum and maximum values only on indexed columns; the column statistics on any unindexed columns are simply not updated.  If scripts collect histograms that should be retained, then those METHOD_OPTs should be defined as table preferences.

    See also Overriding DBMS_STATS Parameter Settings by Maria Colgan.

    Demonstration

    The default value of PREFERENCE_OVERRIDES_PARAMETER is FALSE, which preserves the status quo: parameters override preferences.

    exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','FALSE');

    I am going to create two tables with 50,000 rows each. 

    DROP TABLE t1 PURGE;
    DROP TABLE t2 PURGE;
    CREATE TABLE t1 AS SELECT * FROM all_objects WHERE rownum <= 50000;
    CREATE UNIQUE INDEX t1_idx ON t1 (owner, object_type, object_name, subobject_name);
    CREATE TABLE t2 AS SELECT /*+NO_GATHER_OPTIMIZER_STATISTICS*/ * FROM all_objects WHERE rownum <= 50000;
    CREATE UNIQUE INDEX t2_idx ON t2 (owner, object_type, object_name, subobject_name);
    @tstats
    REM tstats.sql
    set pages 99 lines 200 trimspool on autotrace off
    column table_name format a10
    column column_name format a30
    column PREFERENCE_OVERRIDES_PARAMETER format a30
    break on report
    alter session set nls_date_Format = 'hh24:mi:ss dd/mm/yyyy';
    spool tstats
    SELECT DBMS_STATS.GET_PREFS('PREFERENCE_OVERRIDES_PARAMETER') AS PREFERENCE_OVERRIDES_PARAMETER
    FROM dual;
    SELECT table_name, sample_size, num_rows, last_analyzed
    FROM user_tables
    WHERE table_name in('T1','T2')
    ORDER BY 1;

    break on table_name skip 1
    SELECT table_name, column_name, num_distinct, num_Buckets, histogram, last_analyzed
    FROM user_tab_columns
    WHERE table_name in('T1','T2')
    order by 1,2;
    spool off

    Real-time statistics were collected on T1.  Note that I have column statistics on T1, but no histograms.  I suppressed statistics collection on T2. 

    If I had run this test on an autonomous database, then I would also have had histograms, because _optimizer_gather_stats_on_load_hist=TRUE on that platform.

    PREFERENCE_OVERRIDES_PARAMETER
    ------------------------------
    FALSE

    TABLE_NAME SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
    ---------- ----------- ---------- -------------------
    T1 50000 50000 15:33:06 13/11/2020
    T2

    TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM LAST_ANALYZED
    ---------- ------------------------------ ------------ ----------- --------------- -------------------
    T1 APPLICATION 1 1 NONE 15:33:06 13/11/2020
    CREATED 520 1 NONE 15:33:06 13/11/2020
    CREATED_APPID 0 0 NONE 15:33:06 13/11/2020
    CREATED_VSNID 0 0 NONE 15:33:06 13/11/2020
    DATA_OBJECT_ID 343 1 NONE 15:33:06 13/11/2020
    DEFAULT_COLLATION 1 1 NONE 15:33:06 13/11/2020
    DUPLICATED 1 1 NONE 15:33:06 13/11/2020
    EDITIONABLE 1 1 NONE 15:33:06 13/11/2020
    EDITION_NAME 0 0 NONE 15:33:06 13/11/2020
    GENERATED 2 1 NONE 15:33:06 13/11/2020
    LAST_DDL_TIME 736 1 NONE 15:33:06 13/11/2020
    MODIFIED_APPID 0 0 NONE 15:33:06 13/11/2020
    MODIFIED_VSNID 0 0 NONE 15:33:06 13/11/2020
    NAMESPACE 7 1 NONE 15:33:06 13/11/2020
    OBJECT_ID 50000 1 NONE 15:33:06 13/11/2020
    OBJECT_NAME 45268 1 NONE 15:33:06 13/11/2020
    OBJECT_TYPE 24 1 NONE 15:33:06 13/11/2020
    ORACLE_MAINTAINED 2 1 NONE 15:33:06 13/11/2020
    OWNER 8 1 NONE 15:33:06 13/11/2020
    SECONDARY 1 1 NONE 15:33:06 13/11/2020
    SHARDED 1 1 NONE 15:33:06 13/11/2020
    SHARING 4 1 NONE 15:33:06 13/11/2020
    STATUS 1 1 NONE 15:33:06 13/11/2020
    SUBOBJECT_NAME 77 1 NONE 15:33:06 13/11/2020
    TEMPORARY 2 1 NONE 15:33:06 13/11/2020
    TIMESTAMP 602 1 NONE 15:33:06 13/11/2020

    T2 APPLICATION NONE
    CREATED NONE
    CREATED_APPID NONE
    CREATED_VSNID NONE
    DATA_OBJECT_ID NONE
    DEFAULT_COLLATION NONE
    DUPLICATED NONE
    EDITIONABLE NONE
    EDITION_NAME NONE
    GENERATED NONE
    LAST_DDL_TIME NONE
    MODIFIED_APPID NONE
    MODIFIED_VSNID NONE
    NAMESPACE NONE
    OBJECT_ID NONE
    OBJECT_NAME NONE
    OBJECT_TYPE NONE
    ORACLE_MAINTAINED NONE
    OWNER NONE
    SECONDARY NONE
    SHARDED NONE
    SHARING NONE
    STATUS NONE
    SUBOBJECT_NAME NONE
    TEMPORARY NONE
    TIMESTAMP NONE

    I will now gather statistics on both tables with an explicit sample size and a METHOD_OPT that does not collect statistics on unindexed columns.

    EXEC dbms_stats.gather_table_stats(user,'T1',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
    EXEC dbms_stats.gather_table_stats(user,'T2',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
    @tstats
    • I can see that I got sample sizes close to 5000 for each table, and the number of rows is 10 times larger, so it was a 10% sample size.
    • There is a unique index on each table on 4 columns.  Only the column statistics for those 4 columns were updated.  On T1, we can see the column statistics were collected at different times.
    TABLE_NAME SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
    ---------- ----------- ---------- -------------------
    T1 4937 49370 15:33:46 13/11/2020
    T2 5013 50130 15:33:47 13/11/2020


    TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM LAST_ANALYZED
    ---------- ------------------------------ ------------ ----------- --------------- -------------------
    T1 APPLICATION 1 1 NONE 15:33:06 13/11/2020
    CREATED 520 1 NONE 15:33:06 13/11/2020
    CREATED_APPID 0 0 NONE 15:33:06 13/11/2020
    CREATED_VSNID 0 0 NONE 15:33:06 13/11/2020
    DATA_OBJECT_ID 343 1 NONE 15:33:06 13/11/2020
    DEFAULT_COLLATION 1 1 NONE 15:33:06 13/11/2020
    DUPLICATED 1 1 NONE 15:33:06 13/11/2020
    EDITIONABLE 1 1 NONE 15:33:06 13/11/2020
    EDITION_NAME 0 0 NONE 15:33:06 13/11/2020
    GENERATED 2 1 NONE 15:33:06 13/11/2020
    LAST_DDL_TIME 736 1 NONE 15:33:06 13/11/2020
    MODIFIED_APPID 0 0 NONE 15:33:06 13/11/2020
    MODIFIED_VSNID 0 0 NONE 15:33:06 13/11/2020
    NAMESPACE 7 1 NONE 15:33:06 13/11/2020
    OBJECT_ID 50000 1 NONE 15:33:06 13/11/2020
    OBJECT_NAME 40572 1 NONE 15:33:46 13/11/2020
    OBJECT_TYPE 18 10 HEIGHT BALANCED 15:33:46 13/11/2020
    ORACLE_MAINTAINED 2 1 NONE 15:33:06 13/11/2020
    OWNER 7 7 FREQUENCY 15:33:46 13/11/2020
    SECONDARY 1 1 NONE 15:33:06 13/11/2020
    SHARDED 1 1 NONE 15:33:06 13/11/2020
    SHARING 4 1 NONE 15:33:06 13/11/2020
    STATUS 1 1 NONE 15:33:06 13/11/2020
    SUBOBJECT_NAME 8 1 NONE 15:33:46 13/11/2020
    TEMPORARY 2 1 NONE 15:33:06 13/11/2020
    TIMESTAMP 602 1 NONE 15:33:06 13/11/2020

    T2 APPLICATION NONE
    CREATED NONE
    CREATED_APPID NONE
    CREATED_VSNID NONE
    DATA_OBJECT_ID NONE
    DEFAULT_COLLATION NONE
    DUPLICATED NONE
    EDITIONABLE NONE
    EDITION_NAME NONE
    GENERATED NONE
    LAST_DDL_TIME NONE
    MODIFIED_APPID NONE
    MODIFIED_VSNID NONE
    NAMESPACE NONE
    OBJECT_ID NONE
    OBJECT_NAME 43403 1 NONE 15:33:47 13/11/2020
    OBJECT_TYPE 14 10 HEIGHT BALANCED 15:33:47 13/11/2020
    ORACLE_MAINTAINED NONE
    OWNER 7 7 FREQUENCY 15:33:47 13/11/2020
    SECONDARY NONE
    SHARDED NONE
    SHARING NONE
    STATUS NONE
    SUBOBJECT_NAME 10 1 NONE 15:33:47 13/11/2020
    TEMPORARY NONE
    TIMESTAMP NONE

    Now I am going to enable PREFERENCE_OVERRIDES_PARAMETER at database level, but I am also going to disable it for T1, so the parameters still override the preferences for that table.  I would like histograms on the indexed columns of T2, so I am going to specify a METHOD_OPT table preference.  If I mix FOR ALL COLUMNS and FOR ALL INDEXED COLUMNS, whichever is specified first is overridden by the second, so instead I must explicitly list the indexed columns.

    connect / as sysdba
    exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','TRUE');
    connect scott/tiger
    exec dbms_stats.set_table_prefs(user,'T1','PREFERENCE_OVERRIDES_PARAMETER','FALSE');
    exec dbms_stats.set_table_prefs(user,'T2','METHOD_OPT'
    ,'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 10 owner, object_type, object_name, subobject_name');

    EXEC dbms_stats.gather_table_stats(user,'T1',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
    EXEC dbms_stats.gather_table_stats(user,'T2',estimate_percent=>10,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');
    @tstats

    • For T2, the sample size and the number of rows are both 50000.  The new NDV algorithm has produced the correct answer.  I didn't get any histograms other than on the columns I specified because there isn't any table usage yet.
    • T1 used the 10% sample size and again only the statistics on the indexed columns were updated.
    • I didn't get a histogram on T1 on OBJECT_NAME and SUBOBJECT_NAME, and I got a height-balanced histogram on OBJECT_TYPE because there were more distinct values than buckets specified.  
    • However, on T2, Oracle collected hybrid histograms on OBJECT_NAME and SUBOBJECT_NAME and a top-frequency histogram on OBJECT_TYPE.  These histogram types are only gathered if ESTIMATE_PERCENT is AUTO_SAMPLE_SIZE.
    PREFERENCE_OVERRIDES_PARAMETER
    ------------------------------
    TRUE


    TABLE_NAME SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
    ---------- ----------- ---------- -------------------
    T1 5000 50000 15:56:58 13/11/2020
    T2 50000 50000 15:56:59 13/11/2020


    TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM LAST_ANALYZED
    ---------- ------------------------------ ------------ ----------- --------------- -------------------
    T1 APPLICATION 1 1 NONE 15:33:06 13/11/2020
    CREATED 520 1 NONE 15:33:06 13/11/2020
    CREATED_APPID 0 0 NONE 15:33:06 13/11/2020
    CREATED_VSNID 0 0 NONE 15:33:06 13/11/2020
    DATA_OBJECT_ID 343 1 NONE 15:33:06 13/11/2020
    DEFAULT_COLLATION 1 1 NONE 15:33:06 13/11/2020
    DUPLICATED 1 1 NONE 15:33:06 13/11/2020
    EDITIONABLE 1 1 NONE 15:33:06 13/11/2020
    EDITION_NAME 0 0 NONE 15:33:06 13/11/2020
    GENERATED 2 1 NONE 15:33:06 13/11/2020
    LAST_DDL_TIME 736 1 NONE 15:33:06 13/11/2020
    MODIFIED_APPID 0 0 NONE 15:33:06 13/11/2020
    MODIFIED_VSNID 0 0 NONE 15:33:06 13/11/2020
    NAMESPACE 7 1 NONE 15:33:06 13/11/2020
    OBJECT_ID 50000 1 NONE 15:33:06 13/11/2020
    OBJECT_NAME 42274 1 NONE 15:56:58 13/11/2020
    OBJECT_TYPE 19 10 HEIGHT BALANCED 15:56:58 13/11/2020
    ORACLE_MAINTAINED 2 1 NONE 15:33:06 13/11/2020
    OWNER 8 8 FREQUENCY 15:56:58 13/11/2020
    SECONDARY 1 1 NONE 15:33:06 13/11/2020
    SHARDED 1 1 NONE 15:33:06 13/11/2020
    SHARING 4 1 NONE 15:33:06 13/11/2020
    STATUS 1 1 NONE 15:33:06 13/11/2020
    SUBOBJECT_NAME 14 1 NONE 15:56:58 13/11/2020
    TEMPORARY 2 1 NONE 15:33:06 13/11/2020
    TIMESTAMP 602 1 NONE 15:33:06 13/11/2020

    T2 APPLICATION 1 1 NONE 15:56:59 13/11/2020
    CREATED 520 1 NONE 15:56:59 13/11/2020
    CREATED_APPID 0 0 NONE 15:56:59 13/11/2020
    CREATED_VSNID 0 0 NONE 15:56:59 13/11/2020
    DATA_OBJECT_ID 343 1 NONE 15:56:59 13/11/2020
    DEFAULT_COLLATION 1 1 NONE 15:56:59 13/11/2020
    DUPLICATED 1 1 NONE 15:56:59 13/11/2020
    EDITIONABLE 1 1 NONE 15:56:59 13/11/2020
    EDITION_NAME 0 0 NONE 15:56:59 13/11/2020
    GENERATED 2 1 NONE 15:56:59 13/11/2020
    LAST_DDL_TIME 736 1 NONE 15:56:59 13/11/2020
    MODIFIED_APPID 0 0 NONE 15:56:59 13/11/2020
    MODIFIED_VSNID 0 0 NONE 15:56:59 13/11/2020
    NAMESPACE 7 1 NONE 15:56:59 13/11/2020
    OBJECT_ID 50000 1 NONE 15:56:59 13/11/2020
    OBJECT_NAME 45268 10 HYBRID 15:56:59 13/11/2020
    OBJECT_TYPE 24 10 TOP-FREQUENCY 15:56:59 13/11/2020
    ORACLE_MAINTAINED 2 1 NONE 15:56:59 13/11/2020
    OWNER 8 8 FREQUENCY 15:56:59 13/11/2020
    SECONDARY 1 1 NONE 15:56:59 13/11/2020
    SHARDED 1 1 NONE 15:56:59 13/11/2020
    SHARING 4 1 NONE 15:56:59 13/11/2020
    STATUS 1 1 NONE 15:56:59 13/11/2020
    SUBOBJECT_NAME 77 10 HYBRID 15:56:59 13/11/2020
    TEMPORARY 2 1 NONE 15:56:59 13/11/2020
    TIMESTAMP 602 1 NONE 15:56:59 13/11/2020

    Conclusion

    I think there is a strong case for enabling PREFERENCE_OVERRIDES_PARAMETER at database level on all databases from 12.2.

    exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','TRUE');

    If your application explicitly collects statistics, or if you have legacy scripts that collect statistics, and they explicitly specify GATHER_TABLE_STATS parameters, then setting this preference will revert them to the defaults.  This is particularly valuable in the case of ESTIMATE_PERCENT: it will default to AUTO_SAMPLE_SIZE, so you will get improved row count estimates from the new 12c algorithm with less work.

    If you don't have this problem in the first place, then enabling PREFERENCE_OVERRIDES_PARAMETER at database level will prevent it from developing in the future!

    If you have tables with requirements for particular statistics collections (e.g. METHOD_OPT, GRANULARITY, etc.) and you don't wish to simply use the defaults, then these variations should be implemented as table statistics preferences.  If for some reason that is not possible, PREFERENCE_OVERRIDES_PARAMETER can be disabled again at table level, also with a table statistics preference.
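    For example, a minimal sketch of that table-level opt-out (T1 standing in for the exceptional table): the override applies everywhere except where it is explicitly switched off.

    exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','TRUE');
    exec dbms_stats.set_table_prefs(user,'T1','PREFERENCE_OVERRIDES_PARAMETER','FALSE');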

    Partition Change Tracking During Materialized View Refresh and Query Rewrite

    This article discusses the interplay of Partitioning, Partition Change Tracking and Query Rewrite in relation to Materialized Views.

    Introduction

    In the Oracle database, materialized views can be used to create pre-generated reporting tables.  A view of the data, based on a SQL query, is materialized into a table.  That query may restrict the rows and columns, and may aggregate the data.  An application can reference the materialized view directly, or the Oracle database can 'rewrite' SQL queries on the original tables that are similar to the query in a materialized view to use that materialized view instead.
    By default, QUERY_REWRITE_INTEGRITY is enforced, which means query rewrite works only with materialized views that are up to date (i.e. the underlying data hasn't changed since the materialized view was last refreshed).  This note deals with that scenario.  Optionally, rewrite integrity can be configured to allow rewrite against stale materialized views (this is called 'stale tolerated'); it can be set at system or session level.
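    For example, to allow rewrite against stale materialized views for the current session only:

    ALTER SESSION SET query_rewrite_integrity = stale_tolerated;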
    Partition Change Tracking (PCT) is 'a way of tracking the staleness of a materialized view on the partition and subpartition level'.  If both the materialized view and at least one underlying table in the view are similarly partitioned, then Oracle can determine the relationship between partitions and subpartitions in the underlying table and those in the materialized view.  The database can track not just whether any partition in the underlying tables has been updated since the last refresh of the materialized view, but which ones. During SQL parse, if after partition pruning of the query on the underlying tables, none of the remaining partitions are stale then the query can still be rewritten.  Also, it is possible to refresh just the stale partitions in the materialized view, those that correspond to the underlying partitions that have been updated since the last refresh.
    Query rewrite is a cost-based SQL transformation; it will only occur if the optimizer calculates that the rewritten query has a lower cost.  If I refresh the materialized view in non-atomic mode, the materialized view is truncated and repopulated in direct-path mode, so the data can be compressed (either with basic compression or, on an engineered platform, with Hybrid Columnar Compression) without the need for the Advanced Compression licence.  This further reduces the size and cost of accessing the materialized view and increases the likelihood of query rewrite.
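    A minimal sketch of such a non-atomic complete refresh (MV_LEDGER_2020 is one of the materialized views created in the demonstrations below):

    -- method=>'C' requests a complete refresh; atomic_refresh=>FALSE truncates
    -- the MV and repopulates it in direct-path mode, so basic compression applies.
    exec dbms_mview.refresh(list=>'MV_LEDGER_2020', method=>'C', atomic_refresh=>FALSE);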
    I have written a series of blogs about retrofitting partitioning into existing applications.  One of my examples was based on PeopleSoft General Ledger reporting, in which I discussed options for partitioning the ledger such that there is a different partition for each accounting period.  Once an accounting period is closed, the application generally doesn't change it further.  It should be possible to create partitioned materialized views on the ledger table to support GL reporting using query rewrite.  As the application continues to insert data into the partition for the current accounting period, that partition will quickly become stale, and queries on that partition won't be rewritten.  However, it is common for customers to run suites of reports overnight, and those could run after a materialized view refresh and make good use of query rewrite.
    However, as I modelled this, I ran into a few problems that reveal some of the behaviour of PCT, query rewrite and materialized view refresh.  I have created a number of test scripts that illustrate various scenarios that I will describe below.  The full scripts are available on Github.

    Documented Preconditions and Limitations

    Oracle's documentation sets out a number of preconditions for PCT.
    • Partitioned tables must use either range, list or composite partitioning with range or list as the top-level partitioning strategy. - Therefore, hash partitioning is not supported.  What about interval partitioning?  See demonstration 3.
    • The top-level partition key must consist of only a single column. - If, as I proposed, the ledger table is range partitioned on the combination of FISCAL_YEAR and ACCOUNTING_PERIOD, then PCT will not work (see demonstration 1: Multi-column composite partitioning).  So, are other partitioning strategies viable?
    • The materialized view must contain either the partition key column, a partition marker, ROWID, or a join-dependent expression of the detail table (see the sketch after this list).
    • If you use a GROUP BY clause, the partition key column or the partition marker or ROWID or join dependent expression must be present in the GROUP BY clause.
    Note that, while partition change tracking tracks the staleness on a partition and subpartition level (for composite partitioned tables), the level of granularity for PCT refresh is only the top-level partitioning strategy. Consequently, any change to data in one of the subpartitions of a composite partitioned-table will only mark the single impacted subpartition as stale and have the rest of the table available for rewrite, but the PCT refresh will refresh the whole partition that contains the impacted subpartition.
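    As a hypothetical sketch of the partition-marker option mentioned above (the MV name is invented): DBMS_MVIEW.PMARKER identifies the detail-table partition of each row, so putting it in both the select list and the GROUP BY clause satisfies the last two requirements without selecting the partition key itself.

    CREATE MATERIALIZED VIEW mv_ledger_pmarker
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT DBMS_MVIEW.PMARKER(l.rowid) pmarker
    , l.business_unit
    , SUM(l.posted_total_amt) posted_total_amt
    FROM ps_ledger l
    WHERE l.ledger = 'ACTUALS'
    GROUP BY DBMS_MVIEW.PMARKER(l.rowid), l.business_unit
    /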

    Demonstrations

    In each of the following demonstrations, I will create a copy of the PeopleSoft Financials General Ledger table PS_LEDGER, populate it with random data to simulate 2½ years of actuals and 4 years of budget data.  The table will be partitioned differently in each demonstration.  I will also create one or two materialized views that will also be partitioned.  Then I will add data for another accounting period and look at how the materialized view refresh and query rewrite behave when one partition is stale.
    The tests have been run on Oracle 19.9.  Query rewrite is enabled, and rewrite integrity is enforced.
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    query_rewrite_enabled                string      TRUE
    query_rewrite_integrity              string      enforced


    Demonstration 1: Multi-column composite partitioning

    I will start with my usual composite partitioning of the ledger table on the combination of FISCAL_YEAR and ACCOUNTING_PERIOD to permit sub-partitioning on LEDGER.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL
    ,ledger VARCHAR2(10) NOT NULL
    ,account VARCHAR2(10) NOT NULL
    …
    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR,ACCOUNTING_PERIOD)
    SUBPARTITION BY LIST (LEDGER)

    SUBPARTITION TEMPLATE
    (SUBPARTITION actuals VALUES ('ACTUALS')
    ,SUBPARTITION budget VALUES ('BUDGET'))
    (PARTITION ledger_2018 VALUES LESS THAN (2019,0) PCTFREE 0 COMPRESS
    --
    ,PARTITION ledger_2019_bf VALUES LESS THAN (2019,1) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019_01 VALUES LESS THAN (2019,2) PCTFREE 0 COMPRESS

    ,PARTITION ledger_2019_12 VALUES LESS THAN (2019,13) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019_cf VALUES LESS THAN (2020,0) PCTFREE 0 COMPRESS
    --
    ,PARTITION ledger_2020_bf VALUES LESS THAN (2020,1)
    ,PARTITION ledger_2020_01 VALUES LESS THAN (2020,2)

    ,PARTITION ledger_2020_12 VALUES LESS THAN (2020,13)
    ,PARTITION ledger_2020_cf VALUES LESS THAN (2021,0)
    --
    ,PARTITION ledger_2021_bf VALUES LESS THAN (2021,1)
    ,PARTITION ledger_2021_01 VALUES LESS THAN (2021,2)

    ,PARTITION ledger_2021_12 VALUES LESS THAN (2021,13)
    ,PARTITION ledger_2021_cf VALUES LESS THAN (2022,0)
    )
    ENABLE ROW MOVEMENT
    NOPARALLEL NOLOGGING
    /
    @treeselectors
    @popledger
    I will also create the tree selector tables used as dimension tables in the nVision General Ledger reports.
    REM treeselectors.sql 
    CREATE TABLE PSTREESELECT05
    (SELECTOR_NUM INTEGER NOT NULL,
    TREE_NODE_NUM INTEGER NOT NULL,
    RANGE_FROM_05 VARCHAR2(05) NOT NULL,
    RANGE_TO_05 VARCHAR2(05) NOT NULL)
    PARTITION BY RANGE (SELECTOR_NUM) INTERVAL (1)
    (PARTITION pstreeselector VALUES LESS THAN (2))
    NOPARALLEL NOLOGGING;
    CREATE UNIQUE INDEX PS_PSTREESELECT05 ON PSTREESELECT05 (SELECTOR_NUM, TREE_NODE_NUM, RANGE_FROM_05);

    CREATE TABLE PSTREESELECT10
    (SELECTOR_NUM INTEGER NOT NULL,
    TREE_NODE_NUM INTEGER NOT NULL,
    RANGE_FROM_10 VARCHAR2(10) NOT NULL,
    RANGE_TO_10 VARCHAR2(10) NOT NULL)
    PARTITION BY RANGE (SELECTOR_NUM) INTERVAL (1)
    (PARTITION pstreeselector VALUES LESS THAN (2))
    NOPARALLEL NOLOGGING;
    CREATE UNIQUE INDEX PS_PSTREESELECT10 ON PSTREESELECT10 (SELECTOR_NUM, TREE_NODE_NUM, RANGE_FROM_10);

    exec dbms_stats.set_table_prefs('SCOTT','PSTREESELECT05','GRANULARITY','ALL');
    exec dbms_stats.set_table_prefs('SCOTT','PSTREESELECT10','GRANULARITY','ALL');
    exec dbms_stats.set_table_prefs('SCOTT','PSTREESELECT05','METHOD_OPT'-
    ,'FOR ALL COLUMNS SIZE 1, FOR COLUMNS SELECTOR_NUM, (SELECTOR_NUM, TREE_NODE_NUM) SIZE 254');
    exec dbms_stats.set_table_prefs('SCOTT','PSTREESELECT10','METHOD_OPT'-
    ,'FOR ALL COLUMNS SIZE 1, FOR COLUMNS SELECTOR_NUM, (SELECTOR_NUM, TREE_NODE_NUM) SIZE 254');
    And then I will populate and collect statistics on the ledger with randomised, but skewed, data to simulate 
    • actuals data from fiscal year 2018 to period 6 of 2020
    • budget data from fiscal year 2018 to 2021 that is 10% of the size of the actuals data. 
    Some typical indexes will be built on the ledger table. 
    The tree selector tables will be populated with data corresponding to the ledger data:
    • the business unit tree will have both business units,
    • the account tree will have 25% of the 999 accounts,
    • the chartfield tree will have 10% of the 999 chartfields. 
    Statistics preferences will be defined so that statistics will be collected at all table, partition and subpartition levels on all these tables. There will only be histograms on a few low cardinality columns.
    REM popledger.sql
    set autotrace off echo on pages 99 lines 200 trimspool on
    truncate table ps_ledger;
    exec dbms_stats.set_table_prefs('SCOTT','PS_LEDGER','METHOD_OPT'-
    ,'FOR ALL COLUMNS SIZE 1, FOR COLUMNS FISCAL_YEAR, ACCOUNTING_PERIOD, LEDGER, BUSINESS_UNIT SIZE 254');
    exec dbms_stats.set_table_prefs('SCOTT','PS_LEDGER','GRANULARITY','ALL');
    ALTER TABLE PS_LEDGER PARALLEL 8 NOLOGGING;

    CREATE /*UNIQUE*/ INDEX ps_ledger ON ps_ledger
    (business_unit, ledger, account, deptid
    ,product, fund_code, class_fld, affiliate
    ,chartfield2, project_id, book_code, gl_adjust_type
    ,currency_cd, statistics_code, fiscal_year, accounting_period
    ) COMPRESS 2 PARALLEL
    /
    INSERT /*+APPEND PARALLEL ENABLE_PARALLEL_DML NO_GATHER_OPTIMIZER_STATISTICS*//*IGNORE_ROW_ON_DUPKEY_INDEX(PS_LEDGER)*/
    INTO ps_ledger
    with n as (
    SELECT rownum n from dual connect by level <= 1e2
    ), fy as (
    SELECT 2017+rownum fiscal_year FROM dual CONNECT BY level <= 4
    ), ap as (
    SELECT FLOOR(dbms_random.value(0,13)) accounting_period FROM dual connect by level <= 998
    UNION ALL SELECT 998 FROM DUAL CONNECT BY LEVEL <= 1
    UNION ALL SELECT 999 FROM DUAL CONNECT BY LEVEL <= 1
    ), l as (
    SELECT 'ACTUALS' ledger FROM DUAL CONNECT BY LEVEL <= 10
    UNION ALL SELECT 'BUDGET' FROM DUAL
    )
    select 'BU'||LTRIM(TO_CHAR(CASE WHEN dbms_random.value <= .9 THEN 1 ELSE 2 END,'000')) business_unit
    , l.ledger
    , 'ACC'||LTRIM(TO_CHAR(999*SQRT(dbms_random.value),'000')) account
    , 'ALTACCT'||LTRIM(TO_CHAR(999*dbms_random.value,'000')) altacct
    , 'DEPT'||LTRIM(TO_CHAR(9999*dbms_random.value,'0000')) deptid
    , 'OPUNIT'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) operating_unit
    , 'P'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) product
    , 'FUND'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) fund_code
    , 'CLAS'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) class_fld
    , 'PROD'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) program_code
    , '' budget_ref
    , 'AF'||LTRIM(TO_CHAR(999*dbms_random.value,'000')) affiliate
    , 'AFI'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) affiliate_intra1
    , 'AFI'||LTRIM(TO_CHAR( 9999*dbms_random.value,'0000')) affiliate_intra2
    , 'CF'||LTRIM(TO_CHAR( 999*SQRT(dbms_random.value),'000')) chartfield1
    , 'CF'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) chartfield2
    , 'CF'||LTRIM(TO_CHAR( 9999*dbms_random.value,'0000')) chartfield3
    , 'PRJ'||LTRIM(TO_CHAR(9999*dbms_random.value,'0000')) project_id
    , 'BK'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) book_code
    , 'GL'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) gl_adjust_type
    , 'GBP' currency_cd
    , '' statistics_code
    , fy.fiscal_year
    , ap.accounting_period
    , dbms_random.value(0,1e6) posted_total_amt
    , 0 posted_base_amt
    , 0 posted_tran_amt
    , 'GBP' base_currency
    , SYSDATE dttm_stamp_sec
    , 0 process_instance
    FROM fy,ap, l, n
    WHERE l.ledger = 'BUDGET' or (fy.fiscal_year < 2020 or (fy.fiscal_year = 2020 AND ap.accounting_period <= 6))
    /
    commit;
    exec dbms_stats.gather_table_stats('SCOTT','PS_LEDGER');

    CREATE INDEX psxledger ON ps_ledger
    (ledger, fiscal_year, accounting_period, business_unit, account, chartfield1
    ) LOCAL COMPRESS 4 PARALLEL
    /
    CREATE INDEX psyledger ON ps_ledger
    (ledger, fiscal_year, business_unit, account, chartfield1, accounting_period
    ) LOCAL COMPRESS 3 PARALLEL
    /
    ALTER INDEX ps_ledger NOPARALLEL;
    ALTER INDEX psxledger NOPARALLEL;
    ALTER INDEX psyledger NOPARALLEL;

    TRUNCATE TABLE PSTREESELECT05;
    TRUNCATE TABLE PSTREESELECT10;
    INSERT INTO PSTREESELECT05
    WITH x as (SELECT DISTINCT business_unit FROM ps_ledger)
    , y as (SELECT 30982, FLOOR(DBMS_RANDOM.value(1,1e10)) tree_node_num, business_unit FROM x)
    select y.*, business_unit FROM y
    /
    INSERT INTO PSTREESELECT10
    WITH x as (SELECT DISTINCT account FROM ps_ledger)
    , y as (SELECT 30984, FLOOR(DBMS_RANDOM.value(1,1e10)) tree_node_num, account FROM x)
    select y.*, account FROM y
    where mod(tree_node_num,100)<25
    /
    INSERT INTO PSTREESELECT10
    WITH x as (SELECT DISTINCT chartfield1 FROM ps_ledger)
    , y as (SELECT 30985, FLOOR(DBMS_RANDOM.value(1,1e10)) tree_node_num, chartfield1 FROM x)
    select y.*, chartfield1 FROM y
    where mod(tree_node_num,100)<10
    /
    Per complete fiscal year, there are 1,000,000 actuals rows and 100,000 budget rows
    LEDGER     FISCAL_YEAR   COUNT(*) MAX(ACCOUNTING_PERIOD)
    ---------- ----------- ---------- ----------------------
    ACTUALS 2018 1000000 999
    2019 1000000 999
    2020 538408 6

    BUDGET 2018 100000 999
    2019 100000 999
    2020 100000 999
    2021 100000 999

    ********** ----------
    sum 2938408
    There are about 77K rows per accounting period, with just 1,000 rows in each of the special periods 998 (adjustments) and 999 (carry forward).
    LEDGER     FISCAL_YEAR ACCOUNTING_PERIOD   COUNT(*)
    ---------- ----------- ----------------- ----------

    ACTUALS 2019 0 76841
    1 76410
    2 76867
    3 77088
    4 77740
    5 77010
    6 76650
    7 76553
    8 76923
    9 76586
    10 76276
    11 76943
    12 76113
    998 1000
    999 1000

    ********** *********** ----------
    sum 1000000

    ACTUALS 2020 0 77308
    1 76696
    2 76944
    3 77227
    4 76944
    5 76524
    6 76765

    ********** *********** ----------
    sum 538408
    I will create two MVs, each containing data for a single fiscal year: one for 2019 and one for 2020.  I will range partition the MVs only on ACCOUNTING_PERIOD; there is no need to partition them on FISCAL_YEAR since each contains a single year.
    CREATE MATERIALIZED VIEW mv_ledger_2019
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2020
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
    @@mvpop
    @@mvsql
    @@pop2020m7
    @@mvsql
    @@mvtrc
    @@mvvol
    @@mvsql
    @@mvcap
    The materialized views are populated on creation, but I will explicitly collect statistics on them.
    SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

    ALTER MATERIALIZED VIEW mv_ledger_2019 NOPARALLEL;
    exec dbms_stats.set_table_prefs('SCOTT','MV_LEDGER_2019','METHOD_OPT',-
    'FOR ALL COLUMNS SIZE 1, FOR COLUMNS FISCAL_YEAR, ACCOUNTING_PERIOD, BUSINESS_UNIT SIZE 254');
    exec dbms_stats.set_table_prefs('SCOTT','MV_LEDGER_2019','GRANULARITY','ALL');

    ALTER MATERIALIZED VIEW mv_ledger_2020 NOPARALLEL;
    exec dbms_stats.set_table_prefs('SCOTT','MV_LEDGER_2020','METHOD_OPT',-
    'FOR ALL COLUMNS SIZE 1, FOR COLUMNS FISCAL_YEAR, ACCOUNTING_PERIOD, BUSINESS_UNIT SIZE 254');
    exec dbms_stats.set_table_prefs('SCOTT','MV_LEDGER_2020','GRANULARITY','ALL');

    exec dbms_stats.gather_table_stats('SCOTT','MV_LEDGER_2019');
    exec dbms_stats.gather_table_stats('SCOTT','MV_LEDGER_2020');
    Although I can do a full refresh of the MV, I cannot do a PCT refresh.
    BEGIN dbms_mview.refresh(list=>'MV_LEDGER_2020',method=>'P',atomic_refresh=>FALSE); END;

    *
    ERROR at line 1:
    ORA-12047: PCT FAST REFRESH cannot be used for materialized view "SCOTT"."MV_LEDGER_2020"
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3020
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2432
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 88
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 253
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2413
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2976
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3263
    ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3295
    ORA-06512: at "SYS.DBMS_SNAPSHOT", line 16
    ORA-06512: at line 1
    I can use EXPLAIN_MVIEW to check the capabilities of the MV.
    REM mvcap.sql
    create table MV_CAPABILITIES_TABLE
    (
    statement_id varchar(30) ,
    mvowner varchar(30) ,
    mvname varchar(30) ,
    capability_name varchar(30) ,
    possible character(1) ,
    related_text varchar(2000) ,
    related_num number ,
    msgno integer ,
    msgtxt varchar(2000) ,
    seq number
    ) ;

    truncate table MV_CAPABILITIES_TABLE;
    EXECUTE DBMS_MVIEW.EXPLAIN_MVIEW ('SCOTT.MV_LEDGER_2019');
    EXECUTE DBMS_MVIEW.EXPLAIN_MVIEW ('SCOTT.MV_LEDGER_2020');
    break on mvname skip 1
    column rel_text format a20
    column msgtxt format a60
    SELECT mvname, capability_name, possible, SUBSTR(related_text,1,20) AS rel_text, SUBSTR(msgtxt,1,60) AS msgtxt
    FROM MV_CAPABILITIES_TABLE
    WHERE mvname like 'MV_LEDGER_20%'
    ORDER BY mvname, seq;
    EXPLAIN_MVIEW reports that general query rewrite is available but PCT and PCT query rewrite are not. Per the manual, Oracle simply cannot do a PCT refresh if the table has multi-column partitioning.
    CAPABILITY_NAME                P REL_TEXT             MSGTXT
    ------------------------------ - -------------------- ------------------------------------------------------------
    PCT N
    REFRESH_COMPLETE Y
    REFRESH_FAST N
    REWRITE Y
    PCT_TABLE N PS_LEDGER PCT not supported with multi-column partition key
    REFRESH_FAST_AFTER_INSERT N SCOTT.PS_LEDGER the detail table does not have a materialized view log
    REFRESH_FAST_AFTER_ONETAB_DML N POSTED_TOTAL_AMT SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ONETAB_DML N see the reason why REFRESH_FAST_AFTER_INSERT is disabled
    REFRESH_FAST_AFTER_ONETAB_DML N COUNT(*) is not present in the select list
    REFRESH_FAST_AFTER_ONETAB_DML N SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ANY_DML N see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
    REFRESH_FAST_PCT N PCT is not possible on any of the detail tables in the mater
    REWRITE_FULL_TEXT_MATCH Y
    REWRITE_PARTIAL_TEXT_MATCH Y
    REWRITE_GENERAL Y
    REWRITE_PCT N general rewrite is not possible or PCT is not possible on an
    PCT_TABLE_REWRITE N PS_LEDGER PCT not supported with multi-column partition key
    At the moment, the materialized views are up to date.
    SELECT L.TREE_NODE_NUM,L2.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
    FROM PS_LEDGER A
    , PSTREESELECT05 L1
    , PSTREESELECT10 L
    , PSTREESELECT10 L2
    WHERE A.LEDGER='ACTUALS'
    AND A.FISCAL_YEAR=2020
    AND (A.ACCOUNTING_PERIOD BETWEEN 1 AND 6)
    AND L1.SELECTOR_NUM=30982 AND A.BUSINESS_UNIT=L1.RANGE_FROM_05
    AND L.SELECTOR_NUM=30985 AND A.CHARTFIELD1=L.RANGE_FROM_10
    AND L2.SELECTOR_NUM=30984 AND A.ACCOUNT=L2.RANGE_FROM_10
    AND A.CURRENCY_CD='GBP'
    GROUP BY L.TREE_NODE_NUM,L2.TREE_NODE_NUM
    /
    And I get MV rewrite because the MV is up to date. Note that Oracle only probed partitions 2 to 7, so it correctly pruned partitions.
    Plan hash value: 3290858815
    --------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    --------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 5573 | 239K| 276 (3)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 5573 | 239K| 276 (3)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 5573 | 239K| 275 (3)| 00:00:01 | | |
    | 3 | JOIN FILTER CREATE | :BF0000 | 2 | 22 | 1 (0)| 00:00:01 | | |
    |* 4 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 5 | VIEW | VW_GBC_17 | 5573 | 179K| 274 (3)| 00:00:01 | | |
    | 6 | HASH GROUP BY | | 5573 | 364K| 274 (3)| 00:00:01 | | |
    | 7 | JOIN FILTER USE | :BF0000 | 5573 | 364K| 273 (2)| 00:00:01 | | |
    |* 8 | HASH JOIN | | 5573 | 364K| 273 (2)| 00:00:01 | | |
    |* 9 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 239 | 4541 | 2 (0)| 00:00:01 | | |
    |* 10 | HASH JOIN | | 23295 | 1091K| 270 (2)| 00:00:01 | | |
    |* 11 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 77 | 1386 | 2 (0)| 00:00:01 | | |
    | 12 | PARTITION RANGE ITERATOR | | 301K| 8827K| 267 (2)| 00:00:01 | 2 | 7 |
    |* 13 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 301K| 8827K| 267 (2)| 00:00:01 | 2 | 7 |
    --------------------------------------------------------------------------------------------------------------------------
    Now I will add more random data for the financial year 2020, accounting period 7. So there have been changes to just one partition.
    REM pop2020m7.sql
    insert into ps_ledger
    with n as (
    SELECT rownum n from dual connect by level <= 1e6/13
    )
    select 'BU'||LTRIM(TO_CHAR(CASE WHEN dbms_random.value <= .9 THEN 1 ELSE 2 END,'000')) business_unit
    , 'ACTUALS' ledger
    , 'ACC'||LTRIM(TO_CHAR(999*SQRT(dbms_random.value),'000')) account
    , 'ALTACCT'||LTRIM(TO_CHAR(999*dbms_random.value,'000')) altacct
    , 'DEPT'||LTRIM(TO_CHAR(9999*dbms_random.value,'0000')) deptid
    , 'OPUNIT'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) operating_unit
    , 'P'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) product
    , 'FUND'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) fund_code
    , 'CLAS'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) class_fld
    , 'PROD'||LTRIM(TO_CHAR(9*dbms_random.value,'0')) program_code
    , '' budget_ref
    , 'AF'||LTRIM(TO_CHAR(999*dbms_random.value,'000')) affiliate
    , 'AFI'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) affiliate_intra1
    , 'AFI'||LTRIM(TO_CHAR( 9999*dbms_random.value,'0000')) affiliate_intra2
    , 'CF'||LTRIM(TO_CHAR( 999*SQRT(dbms_random.value),'000')) chartfield1
    , 'CF'||LTRIM(TO_CHAR(99999*dbms_random.value,'00000')) chartfield2
    , 'CF'||LTRIM(TO_CHAR( 9999*dbms_random.value,'0000')) chartfield3
    , 'PRJ'||LTRIM(TO_CHAR(9999*dbms_random.value,'0000')) project_id
    , 'BK'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) book_code
    , 'GL'||LTRIM(TO_CHAR(99*dbms_random.value,'00')) gl_adjust_type
    , 'GBP' currency_cd
    , '' statistics_code
    , 2020 fiscal_year
    , 7 accounting_period
    , dbms_random.value(0,1e6) posted_total_amt
    , 0 posted_base_amt
    , 0 posted_tran_amt
    , 'GBP' base_currency
    , SYSDATE dttm_stamp_sec
    , 0 process_instance
    FROM n
    /
    set lines 200 pages 999 autotrace off
    commit;
    column owner format a10
    column table_name format a15
    column mview_name format a15
    column detailobj_owner format a10 heading 'Detailobj|Owner'
    column detailobj_name format a15
    column detailobj_alias format a20
    column detail_partition_name format a20
    column detail_subpartition_name format a20
    column parent_table_partition format a20
    select * from user_mview_detail_relations;
    select * from user_mview_detail_partition;
    select * from user_mview_detail_subpartition where freshness != 'FRESH';
    SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;
    /
    As soon as I have committed the insert, both MVs need to be refreshed, even though none of the data queried by MV_LEDGER_2019 has changed.  USER_MVIEW_DETAIL_RELATIONS reports that PCT is not applicable, and no individual partitions are listed as stale.
    MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
    --------------- ------------------- -------- -------------------
    MV_LEDGER_2019 NEEDS_COMPILE COMPLETE NEEDS_COMPILE
    MV_LEDGER_2020 NEEDS_COMPILE COMPLETE NEEDS_COMPILE

                               Detailobj
    OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
    ---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
    SCOTT      MV_LEDGER_2019  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            N                       86                        0
    SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            N                       86                        0
    I no longer get Query Rewrite for either fiscal year.
    SELECT L.TREE_NODE_NUM,L2.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT)
    FROM PS_LEDGER A
    , PSTREESELECT05 L1
    , PSTREESELECT10 L
    , PSTREESELECT10 L2
    WHERE A.LEDGER='ACTUALS'
    AND A.FISCAL_YEAR=2019
    AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 6
    AND L1.SELECTOR_NUM=30982 AND A.BUSINESS_UNIT=L1.RANGE_FROM_05
    AND L.SELECTOR_NUM=30985 AND A.CHARTFIELD1=L.RANGE_FROM_10
    AND L2.SELECTOR_NUM=30984 AND A.ACCOUNT=L2.RANGE_FROM_10
    AND A.CURRENCY_CD='GBP'
    GROUP BY L.TREE_NODE_NUM,L2.TREE_NODE_NUM
    /

    Plan hash value: 346876754
    -----------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 492 | 45756 | 2036 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 492 | 45756 | 2036 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 492 | 45756 | 2035 (1)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 239 | 4541 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 2055 | 148K| 2033 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 154 | 4466 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 77 | 1386 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 77 | 1386 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE ITERATOR| | 26686 | 1172K| 2030 (1)| 00:00:01 | 3 | 8 |
    | 10 | PARTITION LIST SINGLE | | 26686 | 1172K| 2030 (1)| 00:00:01 | 1 | 1 |
    |* 11 | TABLE ACCESS FULL | PS_LEDGER | 26686 | 1172K| 2030 (1)| 00:00:01 | KEY | KEY |
    -----------------------------------------------------------------------------------------------------------------
Without PCT, I cannot do a partial refresh of a partitioned materialized view, and I will not get query rewrite once a single partition in the underlying table has changed, whether or not that partition is needed by the query.
So, is there a different partitioning strategy that will permit PCT to work effectively?


    Demonstration 2: Simple 1-Dimensional Range Partitioning 

Let's start with a simple range-partitioned example: one partition per fiscal year.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR)
    (PARTITION ledger_2018 VALUES LESS THAN (2019) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019 VALUES LESS THAN (2020) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2020 VALUES LESS THAN (2021) PCTFREE 10 NOCOMPRESS
    ,PARTITION ledger_2021 VALUES LESS THAN (2022) PCTFREE 10 NOCOMPRESS)
    ENABLE ROW MOVEMENT
    NOPARALLEL NOLOGGING
    /
    @treeselectors
    @popledger
    Now I am going to build a materialized view to summarise the ledger data by BUSINESS_UNIT, ACCOUNT and CHARTFIELD1, and of course by FISCAL_YEAR and ACCOUNTING_PERIOD.
    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (FISCAL_YEAR)
    (PARTITION ledger_2019 VALUES LESS THAN (2020)
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ) PCTFREE 0 COMPRESS NOPARALLEL
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, ledger, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year >= 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, ledger, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
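The mvpop.sql script is not reproduced here. A minimal sketch of the kind of dictionary query that can produce such a report (partition level only; the real script evidently also covers subpartitions and totals):
SELECT p.table_name, p.partition_position, p.partition_name,
       p.num_rows, p.blocks,
       ROUND(p.num_rows/NULLIF(p.blocks,0),1) rows_per_block,
       p.compression, p.compress_for
FROM   user_tab_partitions p
WHERE  p.table_name IN ('PS_LEDGER','MV_LEDGER_2020')
ORDER  BY p.table_name, p.partition_position;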
    I can see the MV has partitions for 2019 and 2020 populated, and they contain fewer rows than the original.
                                          Sub-                                              Rows
                Part                      Part                                                per
TABLE_NAME       Pos PARTITION_NAME        Pos SUBPARTITION_NAME         NUM_ROWS BLOCKS  Block COMPRESS COMPRESS_FOR
--------------- ---- -------------------- ---- ------------------------- -------- ------ ------ -------- -------------------
MV_LEDGER_2020     1 LEDGER_2019                                                                ENABLED  BASIC
                   2 LEDGER_2020                                                                ENABLED  BASIC
                                                                          1456077   4864  299.4

PS_LEDGER          1 LEDGER_2018                                          1100000  17893   61.5 ENABLED  BASIC
                   2 LEDGER_2019                                          1100000  17892   61.5 ENABLED  BASIC
                   3 LEDGER_2020                                           637915  16456   38.8 DISABLED
                   4 LEDGER_2021                                           100000   2559   39.1 DISABLED
                                                                          2937915  54800   53.6
    When I query 2018 ledger data, for which there is no materialized view, the execution plan shows that Oracle full scanned only the first partition of the PS_LEDGER table that contains the 2018 data. It eliminated the other partitions.
    Plan hash value: 1780139226
    ---------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 822 | 76446 | 4883 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 822 | 76446 | 4883 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 822 | 76446 | 4882 (1)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 228 | 4332 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 3601 | 260K| 4880 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 180 | 5220 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 90 | 1620 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 90 | 1620 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE| | 39970 | 1756K| 4877 (1)| 00:00:01 | 1 | 1 |
    |* 10 | TABLE ACCESS FULL | PS_LEDGER | 39970 | 1756K| 4877 (1)| 00:00:01 | 1 | 1 |
    ---------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("A"."ACCOUNTING_PERIOD"<=6 AND "A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2018 AND
    "A"."ACCOUNTING_PERIOD">=1 AND "A"."CURRENCY_CD"='GBP')
When I query the 2020 data, Oracle rewrites the query to use the second partition of the materialized view. Again, it queries only a single partition.
    Plan hash value: 4006930814
    ----------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1088 | 88128 | 674 (2)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1088 | 88128 | 674 (2)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 1088 | 88128 | 673 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 228 | 4332 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 4767 | 288K| 671 (2)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 180 | 5220 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 90 | 1620 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 90 | 1620 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 52909 | 1705K| 668 (2)| 00:00:01 | 2 | 2 |
    |* 10 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 52909 | 1705K| 668 (2)| 00:00:01 | 2 | 2 |
    ----------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2020"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2020"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2020"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("MV_LEDGER_2020"."ACCOUNTING_PERIOD"<=6 AND "MV_LEDGER_2020"."FISCAL_YEAR"=2020 AND
    "MV_LEDGER_2020"."ACCOUNTING_PERIOD">=1)
    Now I am going to simulate running financial processing for period 7 in fiscal year 2020, by inserting data into PS_LEDGER for that period.
    @pop2020m7.sql
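pop2020m7.sql is not reproduced in full; it presumably generates random data for the new period in the same way as the initial load. A minimal, hypothetical sketch that has the same effect by cloning the period 6 rows:
REM pop2020m7.sql (sketch only)
DECLARE
  CURSOR c IS
    SELECT * FROM ps_ledger
    WHERE  fiscal_year = 2020 AND accounting_period = 6;
  r c%ROWTYPE;
BEGIN
  OPEN c;
  LOOP
    FETCH c INTO r;
    EXIT WHEN c%NOTFOUND;
    r.accounting_period := 7;                        -- move the cloned row to period 7
    r.posted_total_amt  := dbms_random.value(0,1e6); -- give it a new random amount
    INSERT INTO ps_ledger VALUES r;
  END LOOP;
  CLOSE c;
  COMMIT;  -- staleness is only reported once the transaction is committed
END;
/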
The materialized view staleness and compile state on USER_MVIEWS change to NEEDS_COMPILE when the insert into PS_LEDGER is committed.
• USER_MVIEW_DETAIL_RELATIONS shows that one tracked partition is stale and three are still fresh.
• USER_MVIEW_DETAIL_PARTITION shows the tracking status of each source partition: the LEDGER_2020 partition on PS_LEDGER is stale, but the others are still fresh.
22:00:01 SQL> SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
--------------- ------------------- -------- -------------------
MV_LEDGER_2020  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE

22:00:01 SQL> select * from user_mview_detail_relations;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                        3                        1

22:00:01 SQL> select * from user_mview_detail_partition;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAIL_PARTITION_NAM DETAIL_PARTITION_POSITION FRESHNE LAST_REFRESH_TIME
---------- --------------- ---------- --------------- -------------------- ------------------------- ------- -------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2018                                  1 FRESH   21:59:41 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2019                                  2 FRESH   21:59:41 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2020                                  3 STALE   21:59:41 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2021                                  4 FRESH   21:59:41 15/11/2020
The query on 2019 is still rewritten because the 2019 partition is fresh.
    Plan hash value: 4006930814
    ----------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1088 | 88128 | 674 (2)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1088 | 88128 | 674 (2)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 1088 | 88128 | 673 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 228 | 4332 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 4767 | 288K| 671 (2)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 180 | 5220 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 90 | 1620 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 90 | 1620 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 52909 | 1705K| 668 (2)| 00:00:01 | 1 | 1 |
    |* 10 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 52909 | 1705K| 668 (2)| 00:00:01 | 1 | 1 |
    ----------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2020"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2020"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2020"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("MV_LEDGER_2020"."ACCOUNTING_PERIOD"<=6 AND "MV_LEDGER_2020"."FISCAL_YEAR"=2019 AND
    "MV_LEDGER_2020"."ACCOUNTING_PERIOD">=1
    )
    But we no longer get rewrite on the 2020 partition because it is stale. The query stays on PS_LEDGER.
    Plan hash value: 1780139226

    ---------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 477 | 44361 | 4483 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 477 | 44361 | 4483 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 477 | 44361 | 4482 (1)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 228 | 4332 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 2090 | 151K| 4479 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 180 | 5220 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 90 | 1620 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 90 | 1620 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE| | 23179 | 1018K| 4476 (1)| 00:00:01 | 3 | 3 |
    |* 10 | TABLE ACCESS FULL | PS_LEDGER | 23179 | 1018K| 4476 (1)| 00:00:01 | 3 | 3 |
    ---------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("A"."ACCOUNTING_PERIOD"<=6 AND "A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND
    "A"."ACCOUNTING_PERIOD">=1
    AND "A"."CURRENCY_CD"='GBP')
So now I have to refresh the view. I am going to use:
• method=>'P' to indicate that the refresh should use PCT;
• atomic_refresh=>FALSE because I want Oracle to truncate the stale partition and repopulate it in direct-path mode so that the data is compressed (I am not licensed for Advanced Compression).
I am also going to trace the refresh process so that I can see what actually happened. I will give the trace file an identifying suffix to make it easier to find, and I can query the trace file name from v$diag_info.
I need to collect statistics myself, or they won't be updated.
    REM mvtrc.sql
    disconnect
    connect scott/tiger@oracle_pdb

    column name format a20
    column value format a70
    alter session set tracefile_identifier=PCT;
    select * from v$diag_info where name like '%Trace%';

    alter session set sql_trace = true;
    exec dbms_mview.refresh(list=>'MV_LEDGER_2019',method=>'P',atomic_refresh=>FALSE);
    exec dbms_mview.refresh(list=>'MV_LEDGER_2020',method=>'P',atomic_refresh=>FALSE);

    alter session set sql_trace = false;
exec dbms_stats.gather_table_stats(user,'MV_LEDGER_2019');
exec dbms_stats.gather_table_stats(user,'MV_LEDGER_2020');
v$diag_info indicates the trace file:
   INST_ID NAME                 VALUE                                                                      CON_ID
---------- -------------------- ---------------------------------------------------------------------- ----------
         1 Diag Trace           /u01/app/oracle/diag/rdbms/oracle/oracle/trace                                   0
         1 Default Trace File   /u01/app/oracle/diag/rdbms/oracle/oracle/trace/oracle_ora_7802_PCT.trc           0
    I can see the total number of rows in MV_LEDGER_2020 has gone up from 1455085 to 1528980, reflecting the rows I inserted.
                                          Sub-                                              Rows
                Part                      Part                                                per
TABLE_NAME       Pos PARTITION_NAME        Pos SUBPARTITION_NAME         NUM_ROWS BLOCKS  Block COMPRESS COMPRESS_FOR
--------------- ---- -------------------- ---- ------------------------- -------- ------ ------ -------- ------------------------------
MV_LEDGER_2020     1 LEDGER_2019                                           946825   3173  298.4 ENABLED  BASIC
                   2 LEDGER_2020                                           582155   1926  302.3 ENABLED  BASIC
                                                                          1528980   5099  299.9

PS_LEDGER          1 LEDGER_2018                                          1100000  17893   61.5 ENABLED  BASIC
                   2 LEDGER_2019                                          1100000  17892   61.5 ENABLED  BASIC
                   3 LEDGER_2020                                           637915  16456   38.8 DISABLED
                   4 LEDGER_2021                                           100000   2559   39.1 DISABLED
                                                                          2937915  54800   53.6
I am just going to pick out the statements from the trace that alter the materialized view. I can see that the LEDGER_2020 partition was truncated, and the data for the stale ledger partition was then reinserted in direct-path mode, so it will have been compressed. The statistics confirm this: the number of rows per block is still around 300.
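Rather than opening the trace in an editor, it can also be read back through the database; a sketch, assuming Oracle 12.2 or later and the trace file name reported by v$diag_info above:
SELECT payload
FROM   v$diag_trace_file_contents
WHERE  trace_filename = 'oracle_ora_7802_PCT.trc'
AND    payload LIKE '%MV_REFRESH%';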

    /* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION LEDGER_2020

    /* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2020" PARTITION ( LEDGER_2020 ) ("BUSINESS_UNIT"
    ,"LEDGER", "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */
    "PS_LEDGER"."BUSINESS_UNIT", "PS_LEDGER"."LEDGER" , "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" P0,
    "PS_LEDGER"."ACCOUNTING_PERIOD" ,SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR">=2019
    AND "PS_LEDGER"."LEDGER"='ACTUALS' AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( ( "PS_LEDGER"."FISCAL_YEAR">= 2020 ) ) )
    AND ( ( ( "PS_LEDGER"."FISCAL_YEAR"< 2021 ) ) )
    ) ) ) GROUP BY "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."LEDGER"
    ,"PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"
I can use DBMS_MVIEW.EXPLAIN_MVIEW to check the capabilities of MV_LEDGER_2020. PCT is enabled for both refresh and rewrite.
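This report can be produced with a short script; a minimal sketch, assuming MV_CAPABILITIES_TABLE has been created with the utlxmv.sql script shipped with the database:
@?/rdbms/admin/utlxmv.sql

exec dbms_mview.explain_mview('MV_LEDGER_2020')

column capability_name format a30
column p format a1
column rel_text format a20
column msgtxt format a60
SELECT capability_name, possible p, related_text rel_text, msgtxt
FROM   mv_capabilities_table
ORDER  BY seq;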
CAPABILITY_NAME                P REL_TEXT             MSGTXT
------------------------------ - -------------------- ------------------------------------------------------------
PCT                            Y
REFRESH_COMPLETE               Y
REFRESH_FAST                   Y
REWRITE                        Y
PCT_TABLE                      Y PS_LEDGER
REFRESH_FAST_AFTER_INSERT      N SCOTT.PS_LEDGER      the detail table does not have a materialized view log
REFRESH_FAST_AFTER_ONETAB_DML  N POSTED_TOTAL_AMT     SUM(expr) without COUNT(expr)
REFRESH_FAST_AFTER_ONETAB_DML  N                      see the reason why REFRESH_FAST_AFTER_INSERT is disabled
REFRESH_FAST_AFTER_ONETAB_DML  N                      COUNT(*) is not present in the select list
REFRESH_FAST_AFTER_ONETAB_DML  N                      SUM(expr) without COUNT(expr)
REFRESH_FAST_AFTER_ANY_DML     N                      see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
REFRESH_FAST_PCT               Y
REWRITE_FULL_TEXT_MATCH        Y
REWRITE_PARTIAL_TEXT_MATCH     Y
REWRITE_GENERAL                Y
REWRITE_PCT                    Y
PCT_TABLE_REWRITE              Y PS_LEDGER
I can see PCT has worked:
• I still get query rewrite for the partitions that are still fresh.
• The refresh process refreshes only the stale partitions.
However, I have to regenerate the materialized view for a whole fiscal year when I have changed only one accounting period. Could I organise it to refresh just a single accounting period?


    Demonstration 3: Interval Partitioning 

    This time I am going to use interval partitioning. I have explicitly specified the partitions for previous years because I don't want to allow any free space in the blocks, but the current and future partitions will be created automatically.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR) INTERVAL (1)
    (PARTITION ledger_2018 VALUES LESS THAN (2019) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019 VALUES LESS THAN (2020) PCTFREE 0 COMPRESS)
    ENABLE ROW MOVEMENT NOLOGGING
    /
    @treeselectors
    @popledger
    I will similarly create a single materialized view with interval partitioning per fiscal year and populate it for 2019 onwards.
    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (FISCAL_YEAR) INTERVAL (1)
    (PARTITION ledger_2019 VALUES LESS THAN (2020)
    ) PCTFREE 0 COMPRESS
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year >= 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @@mvpop
    @@mvvol
    @@mvsql
I get exactly the same behaviour as in the previous demonstration. The only difference is that the new partitions have system-generated names; as before, just one of them is identified as stale.
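The fiscal year covered by each system-generated partition can be checked from its high value:
set long 80
column high_value format a40
SELECT partition_name, partition_position, high_value
FROM   user_tab_partitions
WHERE  table_name = 'PS_LEDGER'
ORDER  BY partition_position;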
    @pop2020m7.sql
23:25:42 SQL> SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
--------------- ------------------- -------- -------------------
MV_LEDGER_2020  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE

23:25:42 SQL> select * from user_mview_detail_relations;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                        3                        1

23:25:42 SQL> select * from user_mview_detail_partition;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAIL_PARTITION_NAM DETAIL_PARTITION_POSITION FRESHNE LAST_REFRESH_TIME
---------- --------------- ---------- --------------- -------------------- ------------------------- ------- -------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2018                                  1 FRESH   23:25:21 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2019                                  2 FRESH   23:25:21 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       SYS_P981                                     3 STALE   23:25:21 15/11/2020
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       SYS_P982                                     4 FRESH   23:25:21 15/11/2020
However, when I look at the trace of the refresh, I see that it truncated and repopulated the partitions for both 2020 and 2021, even though I didn't change any data in the 2021 partition and it is listed as fresh.

/* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION SYS_P987

/* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION SYS_P986

/* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ FIRST WHEN ( ( ( ( "P0">= 2020 ) ) ) AND ( ( ( "P0"< 2021 )
) ) ) THEN INTO "SCOTT"."MV_LEDGER_2020" PARTITION (SYS_P986) ("BUSINESS_UNIT", "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR",
"ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") WHEN ( ( ( ( "P0">= 2021 ) ) ) AND ( ( ( "P0"< 2022 ) ) ) ) THEN INTO
"SCOTT"."MV_LEDGER_2020" PARTITION (SYS_P987) ("BUSINESS_UNIT", "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD",
    "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" , "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" ,
    "PS_LEDGER"."FISCAL_YEAR" P0, "PS_LEDGER"."ACCOUNTING_PERIOD" ,SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE
    ("PS_LEDGER"."FISCAL_YEAR">=2019 AND "PS_LEDGER"."LEDGER"='ACTUALS' AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( (
    "PS_LEDGER"."FISCAL_YEAR">= 2020 ) ) ) AND ( ( ( "PS_LEDGER"."FISCAL_YEAR"< 2022 ) ) ) ) ) ) GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"
    In practice, in this particular case, it won't make a huge difference because there is no actuals data in 2021. The partition for 2021 has been created in the data dictionary, but due to deferred segment creation, it has not been physically created because there is no data in it. However, if I had updated data in 2019, then it would have truncated and repopulated two partitions (2019 and 2020). 
    Interval partitioning is a form of range partitioning, so it is expected that PCT still works. However, I have no explanation as to why the partition following the stale partition was also refreshed. This might be a bug.
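One way to confirm that the 2021 partition exists only in the data dictionary is to outer-join USER_TAB_PARTITIONS to USER_SEGMENTS for the materialized view; a sketch:
SELECT p.partition_name,
       NVL2(s.partition_name, 'SEGMENT CREATED', 'NO SEGMENT') segment_status
FROM   user_tab_partitions p
       LEFT OUTER JOIN user_segments s
         ON  s.segment_name   = p.table_name
         AND s.partition_name = p.partition_name
WHERE  p.table_name = 'MV_LEDGER_2020'
ORDER  BY p.partition_position;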

    Demonstration 4: Composite (Range-List) Partitioning 

This time I am going to create a composite-partitioned table. It will have the same range partitioning on FISCAL_YEAR, but I will then list-subpartition it by ACCOUNTING_PERIOD with 14 periods per fiscal year. I will use a template so that each partition has the same subpartitions.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION BY LIST (ACCOUNTING_PERIOD)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ap_bf VALUES (0)
    ,SUBPARTITION ap_01 VALUES (1)
    ,SUBPARTITION ap_02 VALUES (2)
    ,SUBPARTITION ap_03 VALUES (3)
    ,SUBPARTITION ap_04 VALUES (4)
    ,SUBPARTITION ap_05 VALUES (5)
    ,SUBPARTITION ap_06 VALUES (6)
    ,SUBPARTITION ap_07 VALUES (7)
    ,SUBPARTITION ap_08 VALUES (8)
    ,SUBPARTITION ap_09 VALUES (9)
    ,SUBPARTITION ap_10 VALUES (10)
    ,SUBPARTITION ap_11 VALUES (11)
    ,SUBPARTITION ap_12 VALUES (12)
    ,SUBPARTITION ap_cf VALUES (DEFAULT))
    (PARTITION ledger_2018 VALUES LESS THAN (2019) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019 VALUES LESS THAN (2020) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ,PARTITION ledger_2021 VALUES LESS THAN (2022)
    ) ENABLE ROW MOVEMENT NOPARALLEL NOLOGGING
    /
    @treeselectors
    @popledger
I will similarly partition the materialized view.
    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION BY LIST (ACCOUNTING_PERIOD)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ap_bf VALUES (0)
    ,SUBPARTITION ap_01 VALUES (1)
    ,SUBPARTITION ap_02 VALUES (2)
    ,SUBPARTITION ap_03 VALUES (3)
    ,SUBPARTITION ap_04 VALUES (4)
    ,SUBPARTITION ap_05 VALUES (5)
    ,SUBPARTITION ap_06 VALUES (6)
    ,SUBPARTITION ap_07 VALUES (7)
    ,SUBPARTITION ap_08 VALUES (8)
    ,SUBPARTITION ap_09 VALUES (9)
    ,SUBPARTITION ap_10 VALUES (10)
    ,SUBPARTITION ap_11 VALUES (11)
    ,SUBPARTITION ap_12 VALUES (12)
    ,SUBPARTITION ap_cf VALUES (DEFAULT))
    (PARTITION ledger_2019 VALUES LESS THAN (2020)
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ) PCTFREE 0 COMPRESS PARALLEL
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year >= 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
PCT does work properly. USER_MVIEW_DETAIL_RELATIONS reports that one of the 56 tracked (sub)partitions is stale, USER_MVIEW_DETAIL_PARTITION returns no rows because tracking is now at subpartition level, and USER_MVIEW_DETAIL_SUBPARTITION correctly identifies the stale subpartition. However, as expected, the materialized view refresh truncates and repopulates the whole partition, not the subpartition, so we are still processing a whole fiscal year.
    @pop2020m7.sql
17:40:03 SQL> SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
--------------- ------------------- -------- -------------------
MV_LEDGER_2020  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE

17:40:03 SQL> select * from user_mview_detail_relations;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                       55                        1

17:40:10 SQL> select * from user_mview_detail_partition;

no rows selected

17:40:10 SQL> select * from user_mview_detail_subpartition where freshness != 'FRESH';

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAIL_PARTITION_NAM DETAIL_SUBPARTITION_ DETAIL_SUBPARTITION_POSITION FRESH
---------- --------------- ---------- --------------- -------------------- -------------------- ---------------------------- -----
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2020          LEDGER_2020_AP_07                               8 STALE
If I query periods 1 to 6 in 2020 using BETWEEN, the predicate is expanded to the two inequalities that can be seen in the predicate section below. These subpartitions are up to date, and Oracle performs query rewrite.
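For reference, the test query is the same as in the first demonstration, but for fiscal year 2020 (the selector numbers are specific to my test run):
SELECT l.tree_node_num, l2.tree_node_num, SUM(a.posted_total_amt)
FROM   ps_ledger a, pstreeselect05 l1, pstreeselect10 l, pstreeselect10 l2
WHERE  a.ledger = 'ACTUALS'
AND    a.fiscal_year = 2020
AND    a.accounting_period BETWEEN 1 AND 6
AND    l1.selector_num = 30982 AND a.business_unit = l1.range_from_05
AND    l.selector_num  = 30985 AND a.chartfield1   = l.range_from_10
AND    l2.selector_num = 30984 AND a.account       = l2.range_from_10
AND    a.currency_cd = 'GBP'
GROUP BY l.tree_node_num, l2.tree_node_num
/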
    Plan hash value: 1400212726
    -------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
    -------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 12260 | 969K| | 664 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 12260 | 969K| 1128K| 664 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 12260 | 969K| | 428 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 270 | 5130 | | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 45363 | 2746K| | 425 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 182 | 5278 | | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 91 | 1638 | | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 91 | 1638 | | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 497K| 15M| | 421 (1)| 00:00:01 | 2 | 2 |
    | 10 | PARTITION LIST ITERATOR | | 497K| 15M| | 421 (1)| 00:00:01 | KEY | KEY |
    |* 11 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 497K| 15M| | 421 (1)| 00:00:01 | 15 | 28 |
    -------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2020"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2020"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2020"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    11 - filter("MV_LEDGER_2020"."ACCOUNTING_PERIOD">=1 AND "MV_LEDGER_2020"."ACCOUNTING_PERIOD"<=6 AND
    "MV_LEDGER_2020"."FISCAL_YEAR"=2020)
But having created data for period 7 in fiscal year 2020, that subpartition is stale, and Oracle leaves the query on that period, as submitted, to run against PS_LEDGER.
    Plan hash value: 3964652976
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 92 | 7 (15)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1 | 92 | 7 (15)| 00:00:01 | | |
    |- * 2 | HASH JOIN | | 1 | 92 | 6 (0)| 00:00:01 | | |
    | 3 | NESTED LOOPS | | 1 | 92 | 6 (0)| 00:00:01 | | |
    |- 4 | STATISTICS COLLECTOR | | | | | | | |
    |- * 5 | HASH JOIN | | 1 | 73 | 5 (0)| 00:00:01 | | |
    | 6 | NESTED LOOPS | | 1 | 73 | 5 (0)| 00:00:01 | | |
    |- 7 | STATISTICS COLLECTOR | | | | | | | |
    |- * 8 | HASH JOIN | | 1 | 55 | 4 (0)| 00:00:01 | | |
    | 9 | NESTED LOOPS | | 1 | 55 | 4 (0)| 00:00:01 | | |
    |- 10 | STATISTICS COLLECTOR | | | | | | | |
    | 11 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 3 | 3 |
    | 12 | PARTITION LIST SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | KEY | KEY |
    | * 13 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PS_LEDGER | 1 | 44 | 3 (0)| 00:00:01 | 36 | 36 |
    | * 14 | INDEX RANGE SCAN | PSXLEDGER | 1 | | 2 (0)| 00:00:01 | 36 | 36 |
    | * 15 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    |- * 16 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    | * 17 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    |- * 18 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    | * 19 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    |- * 20 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    ---------------------------------------------------------------------------------------------------------------------------------------------


    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    5 - access("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    8 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    13 - filter("A"."CURRENCY_CD"='GBP')
    14 - access("A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND "A"."ACCOUNTING_PERIOD"=7)
    15 - access("L1"."SELECTOR_NUM"=30982 AND "A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    filter("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    16 - access("L1"."SELECTOR_NUM"=30982)
    17 - access("L"."SELECTOR_NUM"=30985 AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    filter("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    18 - access("L"."SELECTOR_NUM"=30985)
    19 - access("L2"."SELECTOR_NUM"=30984 AND "A"."ACCOUNT"="L2"."RANGE_FROM_10")
    filter("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    20 - access("L2"."SELECTOR_NUM"=30984)
So PCT also controls query rewrite correctly with list subpartitioning. Again, when I look at the trace of the refresh of the stale partition, the entire 2020 partition was truncated and repopulated in direct-path mode. There is no accounting-period criterion on the insert statement.

    /* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION LEDGER_2020

    /* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2020" PARTITION ( LEDGER_2020 ) ("BUSINESS_UNIT",
    "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" ,
    "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" P0, "PS_LEDGER"."ACCOUNTING_PERIOD" ,
    SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2020 AND "PS_LEDGER"."LEDGER"='ACTUALS'
    AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( ( "PS_LEDGER"."FISCAL_YEAR"< 2021 ) ) ) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"

    Demonstration 5: Composite (Range-Range) Partitioning

I am still composite-partitioning the ledger table and materialized view in this test. It will have the same range partitioning on FISCAL_YEAR, but this time I will range-subpartition it by ACCOUNTING_PERIOD with 14 periods per fiscal year. I will use a template so that each partition has the same subpartitions.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION BY RANGE (ACCOUNTING_PERIOD)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ap_bf VALUES LESS THAN (1)
    ,SUBPARTITION ap_01 VALUES LESS THAN (2)
    ,SUBPARTITION ap_02 VALUES LESS THAN (3)
    ,SUBPARTITION ap_03 VALUES LESS THAN (4)
    ,SUBPARTITION ap_04 VALUES LESS THAN (5)
    ,SUBPARTITION ap_05 VALUES LESS THAN (6)
    ,SUBPARTITION ap_06 VALUES LESS THAN (7)
    ,SUBPARTITION ap_07 VALUES LESS THAN (8)
    ,SUBPARTITION ap_08 VALUES LESS THAN (9)
    ,SUBPARTITION ap_09 VALUES LESS THAN (10)
    ,SUBPARTITION ap_10 VALUES LESS THAN (11)
    ,SUBPARTITION ap_11 VALUES LESS THAN (12)
    ,SUBPARTITION ap_12 VALUES LESS THAN (13)
    ,SUBPARTITION ap_cf VALUES LESS THAN (MAXVALUE))
    (PARTITION ledger_2018 VALUES LESS THAN (2019) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019 VALUES LESS THAN (2020) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ,PARTITION ledger_2021 VALUES LESS THAN (2022)
    )
    ENABLE ROW MOVEMENT NOLOGGING
    /
    @treeselectors
    @popledger
This time I will create one materialized view with two range partitions for the two fiscal years.
    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION BY RANGE (ACCOUNTING_PERIOD)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ap_bf VALUES LESS THAN (1)
    ,SUBPARTITION ap_01 VALUES LESS THAN (2)
    ,SUBPARTITION ap_02 VALUES LESS THAN (3)
    ,SUBPARTITION ap_03 VALUES LESS THAN (4)
    ,SUBPARTITION ap_04 VALUES LESS THAN (5)
    ,SUBPARTITION ap_05 VALUES LESS THAN (6)
    ,SUBPARTITION ap_06 VALUES LESS THAN (7)
    ,SUBPARTITION ap_07 VALUES LESS THAN (8)
    ,SUBPARTITION ap_08 VALUES LESS THAN (9)
    ,SUBPARTITION ap_09 VALUES LESS THAN (10)
    ,SUBPARTITION ap_10 VALUES LESS THAN (11)
    ,SUBPARTITION ap_11 VALUES LESS THAN (12)
    ,SUBPARTITION ap_12 VALUES LESS THAN (13)
    ,SUBPARTITION ap_cf VALUES LESS THAN (MAXVALUE))
    (PARTITION ledger_2019 VALUES LESS THAN (2020)
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ) PCTFREE 0 COMPRESS PARALLEL
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year >= 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
After inserting and committing data for period 7 of fiscal year 2020, USER_MVIEW_DETAIL_SUBPARTITION correctly identifies the one stale subpartition, and USER_MVIEW_DETAIL_RELATIONS reports that one of the 56 tracked subpartitions is stale.
    @pop2020m7.sql
19:09:50 SQL> SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
--------------- ------------------- -------- -------------------
MV_LEDGER_2020  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE

19:09:50 SQL> select * from user_mview_detail_relations;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                       55                        1

19:09:56 SQL> select * from user_mview_detail_subpartition where freshness != 'FRESH';

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAIL_PARTITION_NAM DETAIL_SUBPARTITION_ DETAIL_SUBPARTITION_POSITION FRESH
---------- --------------- ---------- --------------- -------------------- -------------------- ---------------------------- -----
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2020          LEDGER_2020_AP_07                               8 STALE
    Query rewrite continues to work on the fresh partitions.
    Plan hash value: 589110139
    -------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
    -------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 13427 | 1062K| | 683 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 13427 | 1062K| 1232K| 683 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 13427 | 1062K| | 427 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 257 | 4883 | | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 52141 | 3156K| | 424 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 210 | 6090 | | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 105 | 1890 | | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 105 | 1890 | | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 496K| 15M| | 420 (1)| 00:00:01 | 2 | 2 |
    | 10 | PARTITION RANGE ITERATOR | | 496K| 15M| | 420 (1)| 00:00:01 | 2 | 7 |
    |* 11 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 496K| 15M| | 420 (1)| 00:00:01 | 15 | 28 |
    -------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("MV_LEDGER_2020"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2020"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2020"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    11 - filter("MV_LEDGER_2020"."ACCOUNTING_PERIOD"<=6 AND "MV_LEDGER_2020"."FISCAL_YEAR"=2020)
PCT correctly identifies the stale subpartition in this query on period 7 alone, and prevents query rewrite.
    Plan hash value: 1321682226
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 92 | 7 (15)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1 | 92 | 7 (15)| 00:00:01 | | |
    |- * 2 | HASH JOIN | | 1 | 92 | 6 (0)| 00:00:01 | | |
    | 3 | NESTED LOOPS | | 1 | 92 | 6 (0)| 00:00:01 | | |
    |- 4 | STATISTICS COLLECTOR | | | | | | | |
    |- * 5 | HASH JOIN | | 1 | 73 | 5 (0)| 00:00:01 | | |
    | 6 | NESTED LOOPS | | 1 | 73 | 5 (0)| 00:00:01 | | |
    |- 7 | STATISTICS COLLECTOR | | | | | | | |
    |- * 8 | HASH JOIN | | 1 | 55 | 4 (0)| 00:00:01 | | |
    | 9 | NESTED LOOPS | | 1 | 55 | 4 (0)| 00:00:01 | | |
    |- 10 | STATISTICS COLLECTOR | | | | | | | |
    | 11 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 3 | 3 |
    | 12 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 8 | 8 |
    | * 13 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PS_LEDGER | 1 | 44 | 3 (0)| 00:00:01 | 36 | 36 |
    | * 14 | INDEX RANGE SCAN | PSXLEDGER | 1 | | 2 (0)| 00:00:01 | 36 | 36 |
    | * 15 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    |- * 16 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    | * 17 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    |- * 18 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    | * 19 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    |- * 20 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    ---------------------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    5 - access("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    8 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    13 - filter("A"."CURRENCY_CD"='GBP')
    14 - access("A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND "A"."ACCOUNTING_PERIOD"=7)
    15 - access("L1"."SELECTOR_NUM"=30982 AND "A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    filter("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    16 - access("L1"."SELECTOR_NUM"=30982)
    17 - access("L"."SELECTOR_NUM"=30985 AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    filter("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    18 - access("L"."SELECTOR_NUM"=30985)
    19 - access("L2"."SELECTOR_NUM"=30984 AND "A"."ACCOUNT"="L2"."RANGE_FROM_10")
    filter("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    20 - access("L2"."SELECTOR_NUM"=30984)
Query rewrite is prevented if a stale (sub)partition is not pruned from the query. It is all or nothing: the query is not split so that periods 1 to 6 are rewritten to use the materialized view while period 7 is read from the underlying table (see the EXPLAIN_REWRITE sketch after this plan).
    Plan hash value: 3827045647
    ------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 561 | 52173 | 3670 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 561 | 52173 | 3670 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 561 | 52173 | 3669 (1)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 227 | 4313 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 2468 | 178K| 3667 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 210 | 6090 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 105 | 1890 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 105 | 1890 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 23486 | 1032K| 3664 (1)| 00:00:01 | 3 | 3 |
    | 10 | PARTITION RANGE ITERATOR| | 23486 | 1032K| 3664 (1)| 00:00:01 | 2 | 8 |
    |* 11 | TABLE ACCESS FULL | PS_LEDGER | 23486 | 1032K| 3664 (1)| 00:00:01 | 29 | 42 |
    ------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    11 - filter("A"."ACCOUNTING_PERIOD"<=7 AND "A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND
    "A"."CURRENCY_CD"='GBP')
Again, the materialized view refresh process truncates and repopulates the whole partition, not the subpartition, so we are still processing a whole fiscal year.

    /* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION LEDGER_2020

    /* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2020" PARTITION ( LEDGER_2020 ) ("BUSINESS_UNIT",
    "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" ,
    "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" P0, "PS_LEDGER"."ACCOUNTING_PERIOD" ,
    SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2020 AND "PS_LEDGER"."LEDGER"='ACTUALS'
    AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( ( "PS_LEDGER"."FISCAL_YEAR"< 2021 ) ) ) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"

    Demonstration 6: Mismatching Partitioning

In this example, the ledger table is still composite-partitioned: the same range partitioning on FISCAL_YEAR, range-subpartitioned by ACCOUNTING_PERIOD with 14 periods per fiscal year, using a template so that each partition has the same subpartitions. This time, however, the materialized views will be partitioned differently.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION BY RANGE (ACCOUNTING_PERIOD)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ap_bf VALUES LESS THAN (1)
    ,SUBPARTITION ap_01 VALUES LESS THAN (2)
    ,SUBPARTITION ap_02 VALUES LESS THAN (3)
    ,SUBPARTITION ap_03 VALUES LESS THAN (4)
    ,SUBPARTITION ap_04 VALUES LESS THAN (5)
    ,SUBPARTITION ap_05 VALUES LESS THAN (6)
    ,SUBPARTITION ap_06 VALUES LESS THAN (7)
    ,SUBPARTITION ap_07 VALUES LESS THAN (8)
    ,SUBPARTITION ap_08 VALUES LESS THAN (9)
    ,SUBPARTITION ap_09 VALUES LESS THAN (10)
    ,SUBPARTITION ap_10 VALUES LESS THAN (11)
    ,SUBPARTITION ap_11 VALUES LESS THAN (12)
    ,SUBPARTITION ap_12 VALUES LESS THAN (13)
    ,SUBPARTITION ap_cf VALUES LESS THAN (MAXVALUE))
    (PARTITION ledger_2018 VALUES LESS THAN (2019) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2019 VALUES LESS THAN (2020) PCTFREE 0 COMPRESS
    ,PARTITION ledger_2020 VALUES LESS THAN (2021)
    ,PARTITION ledger_2021 VALUES LESS THAN (2022)
    ) ENABLE ROW MOVEMENT NOLOGGING
    /
    @treeselectors
    @popledger
I will create two materialized views, one for 2019 and one for 2020. I will range-partition each MV by ACCOUNTING_PERIOD only, because each contains just a single fiscal year.
    CREATE MATERIALIZED VIEW mv_ledger_2019
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /

    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS PARALLEL
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2020
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
USER_MVIEW_DETAIL_RELATIONS reports that PCT does apply to these materialized views. USER_MVIEW_DETAIL_SUBPARTITION correctly identifies the one subpartition into which new data was added as stale, but it is reported as stale for both materialized views, even though we can see that it is not needed by MV_LEDGER_2019.
    @pop2020m7.sql
23:57:09 SQL> SELECT MVIEW_NAME, STALENESS, LAST_REFRESH_TYPE, COMPILE_STATE FROM USER_MVIEWS ORDER BY MVIEW_NAME;

MVIEW_NAME      STALENESS           LAST_REF COMPILE_STATE
--------------- ------------------- -------- -------------------
MV_LEDGER_2019  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE
MV_LEDGER_2020  NEEDS_COMPILE       COMPLETE NEEDS_COMPILE

Elapsed: 00:00:00.00
23:57:09 SQL> select * from user_mview_detail_relations;

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAILOBJ DETAILOBJ_ALIAS      D NUM_FRESH_PCT_PARTITIONS NUM_STALE_PCT_PARTITIONS
---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
SCOTT      MV_LEDGER_2019  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                       55                        1
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       TABLE     PS_LEDGER            Y                       55                        1

Elapsed: 00:00:13.46
23:57:23 SQL> select * from user_mview_detail_partition;

no rows selected

Elapsed: 00:00:00.00
23:57:23 SQL> select * from user_mview_detail_subpartition where freshness != 'FRESH';

                           Detailobj
OWNER      MVIEW_NAME      Owner      DETAILOBJ_NAME  DETAIL_PARTITION_NAM DETAIL_SUBPARTITION_ DETAIL_SUBPARTITION_POSITION FRESH
---------- --------------- ---------- --------------- -------------------- -------------------- ---------------------------- -----
SCOTT      MV_LEDGER_2019  SCOTT      PS_LEDGER       LEDGER_2020          LEDGER_2020_AP_07                               8 STALE
SCOTT      MV_LEDGER_2020  SCOTT      PS_LEDGER       LEDGER_2020          LEDGER_2020_AP_07                               8 STALE
The query on 2019 continues to be rewritten to use MV_LEDGER_2019, even though the MV needs compilation.
    Plan hash value: 1498194812
    ----------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1703 | 128K| 421 (2)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1703 | 128K| 421 (2)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 1703 | 128K| 420 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 238 | 4522 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 7156 | 405K| 418 (2)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 208 | 6032 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 104 | 1872 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 104 | 1872 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE ITERATOR | | 68804 | 1948K| 415 (2)| 00:00:01 | 2 | 7 |
    |* 10 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2019 | 68804 | 1948K| 415 (2)| 00:00:01 | 2 | 7 |
    ----------------------------------------------------------------------------------------------------------------------
    ….
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2019"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2019"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2019"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("MV_LEDGER_2019"."ACCOUNTING_PERIOD"<=6)
Queries on periods 1 to 6 in 2020 are also rewritten.
    Plan hash value: 3016493666
    ------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
    ------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 12328 | 927K| | 653 (2)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 12328 | 927K| 1080K| 653 (2)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 12328 | 927K| | 429 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 238 | 4522 | | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 51748 | 2931K| | 427 (2)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 208 | 6032 | | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 104 | 1872 | | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 104 | 1872 | | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE ITERATOR | | 496K| 13M| | 423 (2)| 00:00:01 | 2 | 7 |
    |* 10 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2020 | 496K| 13M| | 423 (2)| 00:00:01 | 2 | 7 |
    ------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2020"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2020"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2020"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("MV_LEDGER_2020"."ACCOUNTING_PERIOD"<=6)
Quite correctly, the query on period 7 of 2020 is not rewritten because the underlying subpartition is stale.
    Plan hash value: 1321682226
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 92 | 7 (15)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1 | 92 | 7 (15)| 00:00:01 | | |
    |- * 2 | HASH JOIN | | 1 | 92 | 6 (0)| 00:00:01 | | |
    | 3 | NESTED LOOPS | | 1 | 92 | 6 (0)| 00:00:01 | | |
    |- 4 | STATISTICS COLLECTOR | | | | | | | |
    |- * 5 | HASH JOIN | | 1 | 73 | 5 (0)| 00:00:01 | | |
    | 6 | NESTED LOOPS | | 1 | 73 | 5 (0)| 00:00:01 | | |
    |- 7 | STATISTICS COLLECTOR | | | | | | | |
    |- * 8 | HASH JOIN | | 1 | 55 | 4 (0)| 00:00:01 | | |
    | 9 | NESTED LOOPS | | 1 | 55 | 4 (0)| 00:00:01 | | |
    |- 10 | STATISTICS COLLECTOR | | | | | | | |
    | 11 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 3 | 3 |
    | 12 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 8 | 8 |
    | * 13 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PS_LEDGER | 1 | 44 | 3 (0)| 00:00:01 | 36 | 36 |
    | * 14 | INDEX RANGE SCAN | PSXLEDGER | 1 | | 2 (0)| 00:00:01 | 36 | 36 |
    | * 15 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    |- * 16 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    | * 17 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    |- * 18 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    | * 19 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    |- * 20 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    ---------------------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    5 - access("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    8 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    13 - filter("A"."CURRENCY_CD"='GBP')
    14 - access("A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND "A"."ACCOUNTING_PERIOD"=7)
    15 - access("L1"."SELECTOR_NUM"=30982 AND "A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    filter("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    16 - access("L1"."SELECTOR_NUM"=30982)
    17 - access("L"."SELECTOR_NUM"=30985 AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    filter("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    18 - access("L"."SELECTOR_NUM"=30985)
    19 - access("L2"."SELECTOR_NUM"=30984 AND "A"."ACCOUNT"="L2"."RANGE_FROM_10")
    filter("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    20 - access("L2"."SELECTOR_NUM"=30984)
Both MVs are compressed after the initial creation. Note the sizes of the partitions for fiscal year 2020: about 256 blocks and 284 rows per block.
TABLE_NAME      Part Pos PARTITION_NAME       Sub Pos SUBPARTITION_NAME         NUM_ROWS BLOCKS Rows/Block COMPRESS COMPRESS_FOR
--------------- -------- -------------------- ------- ------------------------- -------- ------ ---------- -------- ----------------
    MV_LEDGER_2019 1 AP_BF 72886 252 289.2 ENABLED BASIC
    2 AP_01 72925 252 289.4 ENABLED BASIC
    3 AP_02 72736 251 289.8 ENABLED BASIC
    4 AP_03 72745 251 289.8 ENABLED BASIC
    5 AP_04 72649 251 289.4 ENABLED BASIC
    6 AP_05 71947 249 288.9 ENABLED BASIC
    7 AP_06 72903 252 289.3 ENABLED BASIC
    8 AP_07 72510 250 290.0 ENABLED BASIC
    9 AP_08 72520 251 288.9 ENABLED BASIC
    10 AP_09 72965 252 289.5 ENABLED BASIC
    11 AP_10 72209 250 288.8 ENABLED BASIC
    12 AP_11 72647 251 289.4 ENABLED BASIC
    13 AP_12 73121 253 289.0 ENABLED BASIC
    14 AP_CF 1999 25 80.0 ENABLED BASIC
    946762 3290 287.8

    MV_LEDGER_2020 1 AP_BF 72475 256 283.1 ENABLED BASIC
    2 AP_01 72981 256 285.1 ENABLED BASIC
    3 AP_02 72726 256 284.1 ENABLED BASIC
    4 AP_03 72844 256 284.5 ENABLED BASIC
    5 AP_04 72709 256 284.0 ENABLED BASIC
    6 AP_05 72535 256 283.3 ENABLED BASIC
    7 AP_06 72419 256 282.9 ENABLED BASIC
    8 AP_07 0 0 ENABLED BASIC
    9 AP_08 0 0 ENABLED BASIC
    10 AP_09 0 0 ENABLED BASIC
    11 AP_10 0 0 ENABLED BASIC
    12 AP_11 0 0 ENABLED BASIC
    13 AP_12 0 0 ENABLED BASIC
    14 AP_CF 0 0 ENABLED BASIC
    508689 1792 283.9
Let's look at the trace of the refresh processes. Both materialized views were marked as NEEDS_COMPILE, so both were refreshed. However, the trace shows that the refresh has changed from truncate to delete, and the insert is not done in direct-path mode. Both refreshes tried to process fiscal year 2020 because a 2020 subpartition had been changed, so the refresh of MV_LEDGER_2019 didn't actually change any data: no rows were deleted, and none were inserted.
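The refreshes in this series are requested with DBMS_MVIEW.REFRESH, along these lines (a minimal sketch, not the exact script used). atomic_refresh=>FALSE normally permits truncate plus direct-path insert; the trace below shows that, despite this, the refresh has fallen back to an atomic delete and conventional insert.
BEGIN
  DBMS_MVIEW.REFRESH(list           => 'MV_LEDGER_2019,MV_LEDGER_2020'
                    ,method         => 'P'    -- 'P' requests a PCT refresh
                    ,atomic_refresh => FALSE); -- FALSE allows truncate + direct-path insert
END;
/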

    /* MV_REFRESH (DEL) */ DELETE FROM "SCOTT"."MV_LEDGER_2019" WHERE ( ( ( (2020 <= "FISCAL_YEAR" AND "FISCAL_YEAR"< 2021) )) )

    /* MV_REFRESH (INS) */ INSERT /*+ BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2019" ("BUSINESS_UNIT", "ACCOUNT", "CHARTFIELD1",
    "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" , "PS_LEDGER"."ACCOUNT" ,
    "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" , "PS_LEDGER"."ACCOUNTING_PERIOD" , SUM("PS_LEDGER"."POSTED_TOTAL_AMT")
    FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2019 AND "PS_LEDGER"."LEDGER"='ACTUALS' AND "PS_LEDGER"."CURRENCY_CD"='GBP')
    AND ( ( ( (2020 <= "PS_LEDGER"."FISCAL_YEAR" AND "PS_LEDGER"."FISCAL_YEAR"< 2021) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"



    /* MV_REFRESH (DEL) */ DELETE FROM "SCOTT"."MV_LEDGER_2020" WHERE ( ( ( (2020 <= "FISCAL_YEAR" AND "FISCAL_YEAR"< 2021) )) )

    /* MV_REFRESH (INS) */ INSERT /*+ BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2020" ("BUSINESS_UNIT", "ACCOUNT", "CHARTFIELD1",
    "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" , "PS_LEDGER"."ACCOUNT" ,
    "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" , "PS_LEDGER"."ACCOUNTING_PERIOD" , SUM("PS_LEDGER"."POSTED_TOTAL_AMT")
    FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2020 AND "PS_LEDGER"."LEDGER"='ACTUALS' AND "PS_LEDGER"."CURRENCY_CD"='GBP')
    AND ( ( ( (2020 <= "PS_LEDGER"."FISCAL_YEAR" AND "PS_LEDGER"."FISCAL_YEAR"< 2021) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"
However, the 2020 materialized view has grown from 256 to 384 blocks per period, and dropped from about 285 to 189 rows per block. The data is no longer compressed because it was not inserted in direct-path mode, even though there was still a commit between the delete and insert statements.
TABLE_NAME      Part Pos PARTITION_NAME       Sub Pos SUBPARTITION_NAME         NUM_ROWS BLOCKS Rows/Block COMPRESS COMPRESS_FOR
--------------- -------- -------------------- ------- ------------------------- -------- ------ ---------- -------- ----------------
    MV_LEDGER_2019 1 AP_BF 72886 252 289.2 ENABLED BASIC
    2 AP_01 72925 252 289.4 ENABLED BASIC
    3 AP_02 72736 251 289.8 ENABLED BASIC
    4 AP_03 72745 251 289.8 ENABLED BASIC
    5 AP_04 72649 251 289.4 ENABLED BASIC
    6 AP_05 71947 249 288.9 ENABLED BASIC
    7 AP_06 72903 252 289.3 ENABLED BASIC
    8 AP_07 72510 250 290.0 ENABLED BASIC
    9 AP_08 72520 251 288.9 ENABLED BASIC
    10 AP_09 72965 252 289.5 ENABLED BASIC
    11 AP_10 72209 250 288.8 ENABLED BASIC
    12 AP_11 72647 251 289.4 ENABLED BASIC
    13 AP_12 73121 253 289.0 ENABLED BASIC
    14 AP_CF 1999 25 80.0 ENABLED BASIC
    946762 3290 287.8

    MV_LEDGER_2020 1 AP_BF 72475 384 188.7 ENABLED BASIC
    2 AP_01 72981 384 190.1 ENABLED BASIC
    3 AP_02 72726 384 189.4 ENABLED BASIC
    4 AP_03 72844 384 189.7 ENABLED BASIC
    5 AP_04 72709 384 189.3 ENABLED BASIC
    6 AP_05 72535 384 188.9 ENABLED BASIC
    7 AP_06 72419 384 188.6 ENABLED BASIC
    8 AP_07 72795 1006 72.4 ENABLED BASIC
    9 AP_08 0 0 ENABLED BASIC
    10 AP_09 0 0 ENABLED BASIC
    11 AP_10 0 0 ENABLED BASIC
    12 AP_11 0 0 ENABLED BASIC
    13 AP_12 0 0 ENABLED BASIC
    14 AP_CF 0 0 ENABLED BASIC
    581484 3694 157.4
MV_CAPABILITIES reports that PCT is available, and it is. It correctly identified the stale partitions that prevent rewrite, as the report below shows.
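A report like the one below can be generated with DBMS_MVIEW.EXPLAIN_MVIEW; a minimal sketch, assuming MV_CAPABILITIES_TABLE has already been created in this schema by running utlxmv.sql:
EXEC DBMS_MVIEW.EXPLAIN_MVIEW('MV_LEDGER_2019');
EXEC DBMS_MVIEW.EXPLAIN_MVIEW('MV_LEDGER_2020');

-- P is the POSSIBLE flag; REL_TEXT is RELATED_TEXT.
SELECT mvname, capability_name, possible, related_text, msgtxt
FROM   mv_capabilities_table
ORDER BY mvname, seq;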
    MVNAME                         CAPABILITY_NAME                P REL_TEXT             MSGTXT
    ------------------------------ ------------------------------ - -------------------- ------------------------------------------------------------
    MV_LEDGER_2019 PCT Y
    REFRESH_COMPLETE Y
    REFRESH_FAST Y
    REWRITE Y
    PCT_TABLE Y PS_LEDGER
    REFRESH_FAST_AFTER_INSERT N SCOTT.PS_LEDGER the detail table does not have a materialized view log
    REFRESH_FAST_AFTER_ONETAB_DML N POSTED_TOTAL_AMT SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ONETAB_DML N see the reason why REFRESH_FAST_AFTER_INSERT is disabled
    REFRESH_FAST_AFTER_ONETAB_DML N COUNT(*) is not present in the select list
    REFRESH_FAST_AFTER_ONETAB_DML N SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ANY_DML N see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
    REFRESH_FAST_PCT Y
    REWRITE_FULL_TEXT_MATCH Y
    REWRITE_PARTIAL_TEXT_MATCH Y
    REWRITE_GENERAL Y
    REWRITE_PCT Y
    PCT_TABLE_REWRITE Y PS_LEDGER

    MV_LEDGER_2020 PCT Y
    REFRESH_COMPLETE Y
    REFRESH_FAST Y
    REWRITE Y
    PCT_TABLE Y PS_LEDGER
    REFRESH_FAST_AFTER_INSERT N SCOTT.PS_LEDGER the detail table does not have a materialized view log
    REFRESH_FAST_AFTER_ONETAB_DML N POSTED_TOTAL_AMT SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ONETAB_DML N see the reason why REFRESH_FAST_AFTER_INSERT is disabled
    REFRESH_FAST_AFTER_ONETAB_DML N COUNT(*) is not present in the select list
    REFRESH_FAST_AFTER_ONETAB_DML N SUM(expr) without COUNT(expr)
    REFRESH_FAST_AFTER_ANY_DML N see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
    REFRESH_FAST_PCT Y
    REWRITE_FULL_TEXT_MATCH Y
    REWRITE_PARTIAL_TEXT_MATCH Y
    REWRITE_GENERAL Y
    REWRITE_PCT Y
    PCT_TABLE_REWRITE Y PS_LEDGER
Mismatched partitioning caused the non-atomic refresh to fall back to atomic mode, and so the data was no longer compressed.
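If you are caught by this, the affected partitions can be re-compressed after the refresh. A minimal sketch, using the partition identified in the listing above:
-- Rebuild the partition so its rows are stored with BASIC compression again;
-- UPDATE INDEXES keeps any dependent indexes usable.
ALTER TABLE mv_ledger_2020 MOVE PARTITION ap_07 COMPRESS UPDATE INDEXES;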

     

    Demonstration 7: Partition on Accounting Period, Subpartition on Fiscal Year!

This final example still composite partitions the ledger table, but now I will swap the partitioning and sub-partitioning columns. I will range partition on ACCOUNTING_PERIOD into 14 partitions and subpartition on FISCAL_YEAR. The intention is to demonstrate that partition elimination will still work correctly, and that I will only have to refresh a single accounting period.
    However, you will see that there are some problems, and I can't work around all of them. 
    I will use a template so that each accounting period partition will have the same fiscal year subpartitions.
I will still only range partition the MV on accounting period. We don't need to partition it on FISCAL_YEAR, since each MV only contains a single year.
    CREATE TABLE ps_ledger
    (business_unit VARCHAR2(5) NOT NULL

    ) PCTFREE 10 PCTUSED 80
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    SUBPARTITION BY RANGE (FISCAL_YEAR)
    SUBPARTITION TEMPLATE
    (SUBPARTITION ledger_2018 VALUES LESS THAN (2019)
    ,SUBPARTITION ledger_2019 VALUES LESS THAN (2020)
    ,SUBPARTITION ledger_2020 VALUES LESS THAN (2021)
    ,SUBPARTITION ledger_2021 VALUES LESS THAN (2022))
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE))
    ENABLE ROW MOVEMENT NOLOGGING
    /
I can't specify physical attributes on subpartitions in the CREATE TABLE DDL, only on partitions, so I have to come along afterwards and alter the sub-partitions. I am going to do that before I populate the data, so that it is compressed on load, rather than loading it and rebuilding it afterwards.
set serveroutput on
DECLARE
  l_sql CLOB;
BEGIN
  FOR i IN (
    select *
    from   user_tab_subpartitions
    where  table_name = 'PS_LEDGER'
    and    subpartition_name like 'AP%LEDGER%201%'
    and    (compression = 'DISABLED' OR pct_free > 0)
    order by table_name, partition_position, subpartition_position
  ) LOOP
    l_sql := 'ALTER TABLE '||i.table_name||' MOVE SUBPARTITION '||i.subpartition_name||' COMPRESS UPDATE INDEXES';
    dbms_output.put_line(l_sql);
    EXECUTE IMMEDIATE l_sql;
  END LOOP;
END;
/
    @treeselectors
    @popledger
TABLE_NAME      Part Pos PARTITION_NAME       Sub Pos SUBPARTITION_NAME         NUM_ROWS BLOCKS Rows/Block COMPRESS COMPRESS_FOR
--------------- -------- -------------------- ------- ------------------------- -------- ------ ---------- -------- ------------
    PS_LEDGER 1 AP_BF 261458 5147 50.8 NONE
    1 1 AP_BF_LEDGER_2018 84565 1372 61.6 ENABLED BASIC
    1 2 AP_BF_LEDGER_2019 84519 1371 61.6 ENABLED BASIC
    1 3 AP_BF_LEDGER_2020 84673 2193 38.6 DISABLED
    1 4 AP_BF_LEDGER_2021 7701 211 36.5 DISABLED
    2 AP_01 261108 5174 50.5 NONE
    2 1 AP_01_LEDGER_2018 84268 1368 61.6 ENABLED BASIC
    2 2 AP_01_LEDGER_2019 84233 1366 61.7 ENABLED BASIC
    2 3 AP_01_LEDGER_2020 84831 2224 38.1 DISABLED
    2 4 AP_01_LEDGER_2021 7776 216 36.0 DISABLED
    3 AP_02 261174 5172 50.5 NONE
    3 1 AP_02_LEDGER_2018 84372 1369 61.6 ENABLED BASIC
    3 2 AP_02_LEDGER_2019 84444 1370 61.6 ENABLED BASIC
    3 3 AP_02_LEDGER_2020 84596 2218 38.1 DISABLED
    3 4 AP_02_LEDGER_2021 7762 215 36.1 DISABLED
    4 AP_03 259982 5149 50.5 NONE
    4 1 AP_03_LEDGER_2018 84105 1364 61.7 ENABLED BASIC
    4 2 AP_03_LEDGER_2019 83820 1360 61.6 ENABLED BASIC
    4 3 AP_03_LEDGER_2020 84284 2210 38.1 DISABLED
    4 4 AP_03_LEDGER_2021 7773 215 36.2 DISABLED
    5 AP_04 261376 5177 50.5 NONE
    5 1 AP_04_LEDGER_2018 84378 1369 61.6 ENABLED BASIC
    5 2 AP_04_LEDGER_2019 84649 1374 61.6 ENABLED BASIC
    5 3 AP_04_LEDGER_2020 84652 2220 38.1 DISABLED
    5 4 AP_04_LEDGER_2021 7697 214 36.0 DISABLED
    6 AP_05 261772 5180 50.5 NONE
    6 1 AP_05_LEDGER_2018 84984 1378 61.7 ENABLED BASIC
    6 2 AP_05_LEDGER_2019 84656 1374 61.6 ENABLED BASIC
    6 3 AP_05_LEDGER_2020 84507 2216 38.1 DISABLED
    6 4 AP_05_LEDGER_2021 7625 212 36.0 DISABLED
    7 AP_06 260581 5165 50.5 NONE
    7 1 AP_06_LEDGER_2018 83994 1363 61.6 ENABLED BASIC
    7 2 AP_06_LEDGER_2019 84150 1366 61.6 ENABLED BASIC
    7 3 AP_06_LEDGER_2020 84729 2222 38.1 DISABLED
    7 4 AP_06_LEDGER_2021 7708 214 36.0 DISABLED
    8 AP_07 184118 3163 58.2 NONE
    8 1 AP_07_LEDGER_2018 84863 1377 61.6 ENABLED BASIC
    8 2 AP_07_LEDGER_2019 84155 1366 61.6 ENABLED BASIC
    8 3 AP_07_LEDGER_2020 7587 211 36.0 DISABLED
    8 4 AP_07_LEDGER_2021 7513 209 35.9 DISABLED
    9 AP_08 184619 3173 58.2 NONE
    9 1 AP_08_LEDGER_2018 84547 1372 61.6 ENABLED BASIC
    9 2 AP_08_LEDGER_2019 84775 1376 61.6 ENABLED BASIC
    9 3 AP_08_LEDGER_2020 7662 213 36.0 DISABLED
    9 4 AP_08_LEDGER_2021 7635 212 36.0 DISABLED
    10 AP_09 184375 3168 58.2 NONE
    10 1 AP_09_LEDGER_2018 84407 1370 61.6 ENABLED BASIC
    10 2 AP_09_LEDGER_2019 84645 1373 61.6 ENABLED BASIC
    10 3 AP_09_LEDGER_2020 7570 210 36.0 DISABLED
    10 4 AP_09_LEDGER_2021 7753 215 36.1 DISABLED
    11 AP_10 184327 3166 58.2 NONE
    11 1 AP_10_LEDGER_2018 84300 1368 61.6 ENABLED BASIC
    11 2 AP_10_LEDGER_2019 84738 1374 61.7 ENABLED BASIC
    11 3 AP_10_LEDGER_2020 7656 212 36.1 DISABLED
    11 4 AP_10_LEDGER_2021 7633 212 36.0 DISABLED
    12 AP_11 184489 3167 58.3 NONE
    12 1 AP_11_LEDGER_2018 84406 1369 61.7 ENABLED BASIC
    12 2 AP_11_LEDGER_2019 84861 1376 61.7 ENABLED BASIC
    12 3 AP_11_LEDGER_2020 7700 213 36.2 DISABLED
    12 4 AP_11_LEDGER_2021 7522 209 36.0 DISABLED
    13 AP_12 184244 3168 58.2 NONE
    13 1 AP_12_LEDGER_2018 84611 1373 61.6 ENABLED BASIC
    13 2 AP_12_LEDGER_2019 84155 1365 61.7 ENABLED BASIC
    13 3 AP_12_LEDGER_2020 7776 216 36.0 DISABLED
    13 4 AP_12_LEDGER_2021 7702 214 36.0 DISABLED
    14 AP_CF 4800 154 31.2 NONE
    14 1 AP_CF_LEDGER_2018 2200 53 41.5 ENABLED BASIC
    14 2 AP_CF_LEDGER_2019 2200 53 41.5 ENABLED BASIC
    14 3 AP_CF_LEDGER_2020 200 24 8.3 DISABLED
    14 4 AP_CF_LEDGER_2021 200 24 8.3 DISABLED
    2938423 55323 53.1
If I query periods 1-6 in 2018, I get correct partition elimination. Oracle inspects 6 partitions, and 1 sub-partition within each. So swapping the composite partitioning types and columns should not affect query performance.
    Plan hash value: 2690363151
    -----------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 717 | 66681 | 2244 (1)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 717 | 66681 | 2244 (1)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 717 | 66681 | 2243 (1)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 258 | 4902 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 2776 | 200K| 2241 (1)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 208 | 6032 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 104 | 1872 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 104 | 1872 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE ITERATOR| | 26693 | 1173K| 2238 (1)| 00:00:01 | 2 | 7 |
    | 10 | PARTITION RANGE SINGLE | | 26693 | 1173K| 2238 (1)| 00:00:01 | 1 | 1 |
    |* 11 | TABLE ACCESS FULL | PS_LEDGER | 26693 | 1173K| 2238 (1)| 00:00:01 | | |
    -----------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    11 - filter("A"."ACCOUNTING_PERIOD"<=6 AND "A"."FISCAL_YEAR"=2018 AND "A"."LEDGER"='ACTUALS' AND
    "A"."CURRENCY_CD"='GBP')
The materialized views are created as before, but are now range partitioned only on ACCOUNTING_PERIOD.
CREATE MATERIALIZED VIEW mv_ledger_2019
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2019
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /

    CREATE MATERIALIZED VIEW mv_ledger_2020
    PARTITION BY RANGE (ACCOUNTING_PERIOD)
    (PARTITION ap_bf VALUES LESS THAN (1)
    ,PARTITION ap_01 VALUES LESS THAN (2)
    ,PARTITION ap_02 VALUES LESS THAN (3)
    ,PARTITION ap_03 VALUES LESS THAN (4)
    ,PARTITION ap_04 VALUES LESS THAN (5)
    ,PARTITION ap_05 VALUES LESS THAN (6)
    ,PARTITION ap_06 VALUES LESS THAN (7)
    ,PARTITION ap_07 VALUES LESS THAN (8)
    ,PARTITION ap_08 VALUES LESS THAN (9)
    ,PARTITION ap_09 VALUES LESS THAN (10)
    ,PARTITION ap_10 VALUES LESS THAN (11)
    ,PARTITION ap_11 VALUES LESS THAN (12)
    ,PARTITION ap_12 VALUES LESS THAN (13)
    ,PARTITION ap_cf VALUES LESS THAN (MAXVALUE)
    ) PCTFREE 0 COMPRESS PARALLEL
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE AS
    SELECT business_unit, account, chartfield1, fiscal_year, accounting_period,
    sum(posted_total_amt) posted_total_amt
    FROM ps_ledger
    WHERE fiscal_year = 2020
    AND ledger = 'ACTUALS'
    AND currency_cd = 'GBP'
    GROUP BY business_unit, account, chartfield1, fiscal_year, accounting_period
    /
    @mvpop
USER_MVIEW_DETAIL_SUBPARTITION correctly identifies the one stale sub-partition, but USER_MVIEW_DETAIL_PARTITION reports that one whole range partition is stale.
    @pop2020m7.sql
    MVIEW_NAME STALENESS LAST_REF COMPILE_STATE
    --------------- ------------------- -------- -------------------
    MV_LEDGER_2019 NEEDS_COMPILE COMPLETE NEEDS_COMPILE
    MV_LEDGER_2020 NEEDS_COMPILE COMPLETE NEEDS_COMPILE

    01:02:53 SQL> select * from user_mview_detail_relations;

OWNER      MVIEW_NAME      Detailobj Owner  DETAILOBJ_NAME  DETAILOBJ  DETAILOBJ_ALIAS  D  NUM_FRESH_PCT_PARTITIONS  NUM_STALE_PCT_PARTITIONS
    ---------- --------------- ---------- --------------- --------- -------------------- - ------------------------ ------------------------
    SCOTT MV_LEDGER_2019 SCOTT PS_LEDGER TABLE PS_LEDGER Y 55 1
    SCOTT MV_LEDGER_2020 SCOTT PS_LEDGER TABLE PS_LEDGER Y 55 1

    01:03:06 SQL> select * from user_mview_detail_subpartition where freshness != 'FRESH';

OWNER      MVIEW_NAME      Detailobj Owner  DETAILOBJ_NAME  DETAIL_PARTITION_NAME  DETAIL_SUBPARTITION_NAME  DETAIL_SUBPARTITION_POSITION  FRESHNESS
    ---------- --------------- ---------- --------------- -------------------- -------------------- ---------------------------- -----
    SCOTT MV_LEDGER_2019 SCOTT PS_LEDGER AP_07 AP_07_LEDGER_2020 3 STALE
    SCOTT MV_LEDGER_2020 SCOTT PS_LEDGER AP_07 AP_07_LEDGER_2020 3 STALE
I get query rewrite as you would expect, and as seen in demo 5. Fiscal year 2019, period 7 still rewrites because its partition is not stale.
    Plan hash value: 387550712
    ----------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ----------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1967 | 147K| 76 (3)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1967 | 147K| 76 (3)| 00:00:01 | | |
    |* 2 | HASH JOIN | | 1967 | 147K| 75 (2)| 00:00:01 | | |
    |* 3 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 258 | 4902 | 2 (0)| 00:00:01 | | |
    |* 4 | HASH JOIN | | 7576 | 429K| 73 (2)| 00:00:01 | | |
    | 5 | MERGE JOIN CARTESIAN | | 208 | 6032 | 3 (0)| 00:00:01 | | |
    |* 6 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 2 | 22 | 1 (0)| 00:00:01 | | |
    | 7 | BUFFER SORT | | 104 | 1872 | 2 (0)| 00:00:01 | | |
    |* 8 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 104 | 1872 | 1 (0)| 00:00:01 | | |
    | 9 | PARTITION RANGE SINGLE | | 72486 | 2052K| 70 (2)| 00:00:01 | 8 | 8 |
    |* 10 | MAT_VIEW REWRITE ACCESS FULL| MV_LEDGER_2019 | 72486 | 2052K| 70 (2)| 00:00:01 | 8 | 8 |
    ----------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    2 - access("MV_LEDGER_2019"."ACCOUNT"="L2"."RANGE_FROM_10")
    3 - access("L2"."SELECTOR_NUM"=30984)
    4 - access("MV_LEDGER_2019"."BUSINESS_UNIT"="L1"."RANGE_FROM_05" AND
    "MV_LEDGER_2019"."CHARTFIELD1"="L"."RANGE_FROM_10")
    6 - access("L1"."SELECTOR_NUM"=30982)
    8 - access("L"."SELECTOR_NUM"=30985)
    10 - filter("MV_LEDGER_2019"."ACCOUNTING_PERIOD"=7)
    Fiscal year 2020 period 7 doesn't rewrite, because the subpartition is stale.
    Plan hash value: 1321682226
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 92 | 7 (15)| 00:00:01 | | |
    | 1 | HASH GROUP BY | | 1 | 92 | 7 (15)| 00:00:01 | | |
    |- * 2 | HASH JOIN | | 1 | 92 | 6 (0)| 00:00:01 | | |
    | 3 | NESTED LOOPS | | 1 | 92 | 6 (0)| 00:00:01 | | |
    |- 4 | STATISTICS COLLECTOR | | | | | | | |
    |- * 5 | HASH JOIN | | 1 | 73 | 5 (0)| 00:00:01 | | |
    | 6 | NESTED LOOPS | | 1 | 73 | 5 (0)| 00:00:01 | | |
    |- 7 | STATISTICS COLLECTOR | | | | | | | |
    |- * 8 | HASH JOIN | | 1 | 55 | 4 (0)| 00:00:01 | | |
    | 9 | NESTED LOOPS | | 1 | 55 | 4 (0)| 00:00:01 | | |
    |- 10 | STATISTICS COLLECTOR | | | | | | | |
    | 11 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 8 | 8 |
    | 12 | PARTITION RANGE SINGLE | | 1 | 44 | 3 (0)| 00:00:01 | 3 | 3 |
    | * 13 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PS_LEDGER | 1 | 44 | 3 (0)| 00:00:01 | 31 | 31 |
    | * 14 | INDEX RANGE SCAN | PSXLEDGER | 1 | | 2 (0)| 00:00:01 | 31 | 31 |
    | * 15 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    |- * 16 | INDEX RANGE SCAN | PS_PSTREESELECT05 | 1 | 11 | 1 (0)| 00:00:01 | | |
    | * 17 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    |- * 18 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 18 | 1 (0)| 00:00:01 | | |
    | * 19 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    |- * 20 | INDEX RANGE SCAN | PS_PSTREESELECT10 | 1 | 19 | 1 (0)| 00:00:01 | | |
    ---------------------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    5 - access("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    8 - access("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    13 - filter("A"."CURRENCY_CD"='GBP')
    14 - access("A"."LEDGER"='ACTUALS' AND "A"."FISCAL_YEAR"=2020 AND "A"."ACCOUNTING_PERIOD"=7)
    15 - access("L1"."SELECTOR_NUM"=30982 AND "A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    filter("A"."BUSINESS_UNIT"="L1"."RANGE_FROM_05")
    16 - access("L1"."SELECTOR_NUM"=30982)
    17 - access("L"."SELECTOR_NUM"=30985 AND "A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    filter("A"."CHARTFIELD1"="L"."RANGE_FROM_10")
    18 - access("L"."SELECTOR_NUM"=30985)
    19 - access("L2"."SELECTOR_NUM"=30984 AND "A"."ACCOUNT"="L2"."RANGE_FROM_10")
    filter("A"."ACCOUNT"="L2"."RANGE_FROM_10")
    20 - access("L2"."SELECTOR_NUM"=30984)
As we have already seen, refresh processes all subpartitions of a partition. So, not surprisingly, the refresh process truncates the partition for period 7 in both the 2019 and 2020 MVs, even though only the 2020 data was affected: because period 7 was stale in one fiscal year, it reprocessed all fiscal years. We would have had the same problem if I had composite partitioned the materialized views to match the table; it would still have truncated and reprocessed all fiscal years for period 7.

    /* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2019"TRUNCATE PARTITION AP_07 UPDATE GLOBAL INDEXES

    /* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2019"PARTITION ( AP_07 ) ("BUSINESS_UNIT",
    "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" ,
    "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" , "PS_LEDGER"."ACCOUNTING_PERIOD" P0,
    SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2019 AND "PS_LEDGER"."LEDGER"='ACTUALS'
    AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( ( "PS_LEDGER"."ACCOUNTING_PERIOD">= 7 ) ) )
    AND ( ( ( "PS_LEDGER"."ACCOUNTING_PERIOD"< 8 ) ) ) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"

    /* MV_REFRESH (ATB) */ ALTER TABLE "SCOTT"."MV_LEDGER_2020" TRUNCATE PARTITION AP_07 UPDATE GLOBAL INDEXES

    /* MV_REFRESH (INS) */ INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MV_LEDGER_2020"PARTITION ( AP_07 ) ("BUSINESS_UNIT",
    "ACCOUNT", "CHARTFIELD1", "FISCAL_YEAR", "ACCOUNTING_PERIOD", "POSTED_TOTAL_AMT") SELECT /*+ X_DYN_PRUNE */ "PS_LEDGER"."BUSINESS_UNIT" ,
    "PS_LEDGER"."ACCOUNT" , "PS_LEDGER"."CHARTFIELD1" , "PS_LEDGER"."FISCAL_YEAR" , "PS_LEDGER"."ACCOUNTING_PERIOD" P0,
    SUM("PS_LEDGER"."POSTED_TOTAL_AMT") FROM "PS_LEDGER""PS_LEDGER" WHERE ("PS_LEDGER"."FISCAL_YEAR"=2020 AND "PS_LEDGER"."LEDGER"='ACTUALS'
    AND "PS_LEDGER"."CURRENCY_CD"='GBP') AND ( ( ( ( ( ( "PS_LEDGER"."ACCOUNTING_PERIOD">= 7 ) ) )
    AND ( ( ( "PS_LEDGER"."ACCOUNTING_PERIOD"< 8 ) ) ) ) ) )GROUP BY
    "PS_LEDGER"."BUSINESS_UNIT","PS_LEDGER"."ACCOUNT","PS_LEDGER"."CHARTFIELD1","PS_LEDGER"."FISCAL_YEAR","PS_LEDGER"."ACCOUNTING_PERIOD"
    Partition pruning still worked correctly after swapping the partitioning and sub-partitioning columns. 
    It also correctly controlled query rewrite. 
However, the PCT refresh processed all years for the single accounting period, rather than all accounting periods for the single year. That is less work if you have fewer fiscal years than accounting periods; generally, I see systems containing only 3 to 6 fiscal years of data. However, it also refreshed MVs that didn't need to be refreshed.
Swapping the partitioning columns has also made the management of the partitions in the ledger table much more complicated.
• I can't interval sub-partition, so I can't automatically add partitions for future fiscal years on demand. Instead, I will have to add a new fiscal year subpartition to each of the 14 range partitions (see the sketch after this list).
    • I can't specify storage options or compression attributes on sub-partitions in the create table DDL command, so I have to come along afterwards with PL/SQL to alter the sub-partitions. 
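To illustrate the first point, adding fiscal year 2022 would require a statement like the following for every one of the 14 range partitions. This is a sketch; the subpartition name follows the template naming used above.
-- Repeat for each range partition, AP_BF through AP_CF.
ALTER TABLE ps_ledger MODIFY PARTITION ap_01
  ADD SUBPARTITION ap_01_ledger_2022 VALUES LESS THAN (2023);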
    On balance, I don't think I would choose to implement this.

    Conclusion

PCT does track individually stale partitions and subpartitions, but the subsequent refresh appears only to be done by partition. If one subpartition is stale, then the entire partition is refreshed. If you use composite partitioning, then you may have to accept reprocessing more data than is absolutely necessary, rather than adopt a less effective partitioning strategy.
    The subpartition key should be subordinate to the partition key. In the ledger example that I have used, I think it is better to partition by fiscal year and subpartition by accounting period (demonstration 5) than vice versa (demonstration 7). 
PCT doesn't work when there are multiple partitioning key columns. So you need to find a single partition key column, used by the application, that is sufficiently selective to restrict the number of partitions being refreshed.
The partitioning of the table and of the materialized view must be the same type of partitioning, and on the same column. Otherwise, while PCT may still work, the refresh process may not be able to populate the materialized view in direct-path mode, and it may not be possible to maintain compressed materialized views.
There is a balance to be struck. On the one hand, application performance can be improved by partitioning application tables so that partition elimination is effective, but that partitioning strategy may not work with PCT. On the other, reporting performance can be improved by maintaining fresh pre-aggregated data in materialized views, and PCT can help to keep those materialized views fresh with less overhead.

    First Steps in Spatial Data

    This is the introductory blog post in a series about using Spatial Data in the Oracle database.

Caveat: Spatial Data has been a part of the Oracle database since at least version 8i.  I have been aware of it for many years but have never previously used it myself.  Recently, I had some spare time and decided to experiment with it.  These blogs document my first steps.  I have spent a lot of time reading the documentation and using Google to find other people's blogs.  Where I found useful material, I have provided links to it.  It is likely that more experienced developers can point out my mistakes and better methods of achieving results, in which case I will gladly publish comments and make corrections to my material.

    Index

    1. Loading GPX data into XML data types
    2. Convert GPX Track to a Spatial Line Geometry

    Problem Statement

    A map reading stop!
When I am not working with Oracle databases, I am a keen cyclist, and I ride with a touring club.  I have also always enjoyed maps, having been taught to read Ordnance Survey maps at school.  It is no surprise, therefore, that I lead rides for my cycling club.  We used to use (and you can still buy) paper maps.  By 2005 I was starting to use a GPS.  Initially, I recorded rides as tracks on a PDA.  By 2012, I was regularly using an Android tablet on my handlebar bag for navigation.  The market has caught up, and people now attach their phones to their handlebars or have dedicated bike computers with GPS and Bluetooth links to their phones.  The cycling club website includes a library of the routes of previous rides; however, you can only search it by the structured data held for each ride.  So, for example, I can only find rides in the Chilterns if that word appears in the description.  I cannot do a spatial search.

I have also started to use Strava, an internet service for tracking exercise.  It is mainly used by cyclists and runners.  Activities can be recorded on a phone or other device and then uploaded, compared and analysed.  Every time I go out on the bike, I upload the activity.  I have also uploaded my back catalogue of GPS data.  As a result of the Coronavirus lockdowns, I bought an indoor trainer that I use with Zwift, which also posts data to Strava.  My most recent toy is a heart rate monitor; both Strava and Zwift capture data from that too.  Strava will let you see a certain amount of analysis of your activities and how you compare to other people, and more if you pay for their subscription service.  They will also allow you to export and download all of your data as a set of structured data in CSV files, together with the GPX files and photographs that you uploaded.

    I thought it would be interesting to try to analyse and interrogate that data.  Typical questions might include:

    1. I ride up Swain's Lane in Highgate most days.  How long do I take, and am I getting faster or slower?
    2. I want to go for a ride in the Chilterns, so I would like to see tracks of previous rides to get some route ideas.

So I am going to upload my Strava data into an Oracle database, load the GPS tracks currently in GPX files into the database, convert them to Spatial geometries, and then process them.  To answer the first question, I will need to provide a working definition of Swain's Lane.  For the second, I need definitions of various areas.  For example, I will take the Chilterns to be the area designated by Natural England as an Area of Outstanding Natural Beauty, so I will need to import a definition of that and other areas from published data.

The following series of blogs illustrates how I dealt with these and other challenges.

    Spatial Data 1: Loading GPX data into XML data types


    This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava.

In these posts, I have only shown extracts of some of the scripts I have written.  The full files are available on GitHub.

    Upload and Expand Strava Bulk Export

Strava will bulk export all your data to a zipped folder.  It contains various CSV files.  I am interested in activities.csv, which contains a row for each activity, with various pieces of data including the name of the data file that can be found in the /activities directory.  That file will usually be a .gpx file, or it may be zipped as a .gpx.gz file.  GPX is an XML schema that contains sets of longitude/latitude coordinates and may contain other attributes.

    The first job is to upload the Strava export .zip file to somewhere accessible to the database server (in my case /vagrant) and to expand it (to /tmp/strava/).

    cd /vagrant
    mkdir /tmp/strava
    unzip /vagrant/export_1679301.zip -d /tmp/strava

    Create Strava Schema 

    I need to create a new database schema to hold the various objects I will create, and I have to give it certain privileges.
    connect / as sysdba
    create user strava identified by strava;
    grant connect, resource to strava;
    grant create view to strava;
    grant select_catalog_role to strava;
    grant XDBADMIN to STRAVA;
    grant alter session to STRAVA;
    alter user strava quota unlimited on users;
    alter user strava default tablespace users;

    GRANT CREATE ANY DIRECTORY TO strava;
    CREATE OR REPLACE DIRECTORY strava as '/tmp/strava';
    CREATE OR REPLACE DIRECTORY activities as '/tmp/strava/activities';
    CREATE OR REPLACE DIRECTORY exec_dir AS '/usr/bin';

    GRANT READ, EXECUTE ON DIRECTORY exec_dir TO strava;
    GRANT READ, EXECUTE ON DIRECTORY strava TO strava;
    GRANT READ ON DIRECTORY activities TO strava;
    • I need to create database directories for both the CSV files in /tmp/strava and the various GPX files in the /tmp/strava/activities sub-directory.  I will need read privilege on both directories, and also execute privilege on the strava directory so that I can use a pre-processor script.
    • The exec_dir directory points to /usr/bin where the zip executables are located.  I need read and execute privilege on this so I can read directly from zipped files.
    • XDBADMIN: "Allows the grantee to register an XML schema globally, as opposed to registering it for use or access only by its owner. It also lets the grantee bypass access control list (ACL) checks when accessing Oracle XML DB Repository".

    Import CSV file via an External Table

I will start by creating an external table to read the Strava activities.csv file, and then copy it into a database table.  This is a simple comma-separated values file.  The activity date, name and description are enclosed in double-quotes.
The first problem that I encountered was that some of the descriptions I typed into Strava contain newline characters, and the external table interprets them as the end of the record even though these characters are inside the double-quotes.
    4380927517,"23 Nov 2020, 18:03:54",Zwift Crash Recovery,Virtual Ride,"Zwift Crash Recovery
    1. recover fit file per https://zwiftinsider.com/retrieve-lost-ride/,
    2. fix corrupt .fit file with https://www.fitfiletools.com",1648,13.48,,false,Other,activities/4682540615.gpx.gz,,10.0,1648.0,1648.0,13480.2001953125,13.199999809265137,
    8.179733276367188,91.0,36.20000076293945,12.600000381469727,69.5999984741211,7.099999904632568,0.40652215480804443,,,84.0,62.1943244934082,
    ,,,150.66201782226562,276.8444519042969,,,,,,,,,,,,158.0,1649.0,,,0.0,,1.0,,,,,,,,,,,,,,,,4907360.0,,,,,,,,,,,
As Chris Saxon points out on AskTom, it is necessary to pre-process the records to replace the newline characters with something else.  I found this awk script to process the records, so I put it into a shell script, nlfix.sh, made it executable, and invoked it as a pre-processor in the external table definition.
    #nlfix.sh
/usr/bin/gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' $*

    nlfix.sh
    • Note the full path for gawk is specified.
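The pre-processor must also be executable by the OS user that runs the database.  A quick way to test it standalone (a sketch, assuming the script and CSV file are both in /tmp/strava):
# Make the script executable, then check the first few fixed records.
chmod +x /tmp/strava/nlfix.sh
/tmp/strava/nlfix.sh /tmp/strava/activities.csv | head -3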
    A database directory is needed for the location of the pre-processor scripts and it is necessary to grant read and execute privileges on it.  I simply put the pre-processor in the same directory as the CSV file so I could use the same strava directory I created earlier.
    GRANT READ, EXECUTE ON DIRECTORY strava TO strava;
    Now I can define an external table that will read the activities.csv file. 
    CREATE TABLE strava.activities_ext
    (Activity_ID NUMBER
    ,Activity_Date DATE
    ,Activity_Name VARCHAR2(100)
    ,Activity_Type VARCHAR2(15)
    ,Activity_Description VARCHAR2(200)
    ,Elapsed_Time NUMBER
    ,Distance_km NUMBER
    …)
    ORGANIZATION EXTERNAL
    (TYPE ORACLE_LOADER
    DEFAULT DIRECTORY strava
    ACCESS PARAMETERS
    (RECORDS DELIMITED BY newline
    SKIP 1
    DISABLE_DIRECTORY_LINK_CHECK
    PREPROCESSOR strava:'nlfix.sh'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' RTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    NULLIF = BLANKS
    (Activity_ID,Activity_Date date "DD Mon yyyy,HH24:mi:ss"
    ,Activity_Name,Activity_Type,Activity_Description
    ,Elapsed_Time,Distance_km
    …))
    LOCATION ('activities.csv')
    ) REJECT LIMIT 5
    /
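Before copying the data, a simple query confirms that the access driver can parse the file.  This is just a sanity check; any rejected rows are written to the ORACLE_LOADER log and bad files in the same directory.
-- If rows are silently missing, check the .log and .bad files in /tmp/strava.
SELECT COUNT(*), MIN(activity_date), MAX(activity_date)
FROM   strava.activities_ext;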

    Import Activities

    Now I can simply copy from the external table to a regular table.  I have omitted a lot of columns that Strava does not populate (at least not in my export) but that appear in the CSV file.
    rem 1b_create_activities_ext.sql
    spool 1b_create_activities_ext

    CREATE TABLE strava.activities AS
    select ACTIVITY_ID,ACTIVITY_DATE,ACTIVITY_NAME,ACTIVITY_TYPE,ACTIVITY_DESCRIPTION,
    ELAPSED_TIME,DISTANCE_KM,RELATIVE_EFFORT,COMMUTE_CHAR,ACTIVITY_GEAR,
    FILENAME,
    ATHLETE_WEIGHT,BIKE_WEIGHT,ELAPSED_TIME2,MOVING_TIME,DISTANCE_M,MAX_SPEED,AVERAGE_SPEED,
    ELEVATION_GAIN,ELEVATION_LOSS,ELEVATION_LOW,ELEVATION_HIGH,MAX_GRADE,AVERAGE_GRADE,
    --AVERAGE_POSITIVE_GRADE,AVERAGE_NEGATIVE_GRADE,
    MAX_CADENCE,AVERAGE_CADENCE,
    --MAX_HEART_RATE,
    AVERAGE_HEART_RATE,
    --MAX_WATTS,
    AVERAGE_WATTS,CALORIES,
    --MAX_TEMPERATURE,AVERAGE_TEMPERATURE,
    RELATIVE_EFFORT2,
    TOTAL_WORK,
    --NUMBER_OF_RUNS,
    --UPHILL_TIME,DOWNHILL_TIME,OTHER_TIME,
    PERCEIVED_EXERTION,
    --TYPE,
    --START_TIME,
    WEIGHTED_AVERAGE_POWER,POWER_COUNT,
    PREFER_PERCEIVED_EXERTION,PERCEIVED_RELATIVE_EFFORT,
    COMMUTE,
    --TOTAL_WEIGHT_LIFTED,
    FROM_UPLOAD,
    GRADE_ADJUSTED_DISTANCE,
    --WEATHER_OBSERVATION_TIME,WEATHER_CONDITION,
    --WEATHER_TEMPERATURE,APPARENT_TEMPERATURE,
    --DEWPOINT,HUMIDITY,WEATHER_PRESSURE,
    --WIND_SPEED,WIND_GUST,WIND_BEARING,
    --PRECIPITATION_INTENSITY,
    --SUNRISE_TIME,SUNSET_TIME,MOON_PHASE,
    BIKE
    --GEAR,
    --PRECIPITATION_PROBABILITY,PRECIPITATION_TYPE,
    --CLOUD_COVER,WEATHER_VISIBILITY,UV_INDEX,WEATHER_OZONE,
    --JUMP_COUNT,TOTAL_GRIT,AVG_FLOW,
    --FLAGGED
    FROM strava.activities_ext
    /

    ALTER TABLE activities ADD CONSTRAINT activities_pk PRIMARY KEY (activity_id);

    ALTER TABLE activities ADD (gpx XMLTYPE) XMLTYPE COLUMN gpx STORE AS SECUREFILE BINARY XML (CACHE DISABLE STORAGE IN ROW);
ALTER TABLE activities ADD (geom mdsys.sdo_geometry);
ALTER TABLE activities ADD (geom_27700 mdsys.sdo_geometry);
ALTER TABLE activities ADD (mbr mdsys.sdo_geometry);
    ALTER TABLE activities ADD (xmlns VARCHAR2(128));
    ALTER TABLE activities ADD (num_pts INTEGER DEFAULT 0);

    Spool off
    • I have specified a primary key on activity_id and made a number of other columns not nullable.
    • I have added a new XMLTYPE column GPX into which I will load the GPS data in the .gpx files.  

    FIT files

Some applications, such as Garmin and Rouvy, generate compressed .fit files, and Strava exports them again (apparently if it can't convert them, although it can convert the .fit files from Zwift to .gpx).  These are binary files and, since I only have a few of them, I converted them to .gpx files using GPSBabel on my laptop, and then re-uploaded the .gpx files.
    for %i in (*.fit.gz) do "C:\Program Files\GnuWin\bin\gzip" -fd %i
    for %i in (*.fit) do "C:\Program Files (x86)\GPSBabel\GPSBabel.exe" -i garmin_fit -f "%i" -o gpx -F "%~ni".gpx
    I then update the file name in the activities table.
    UPDATE activities
    SET filename = REPLACE(filename,'.fit.gz','.gpx')
    WHERE filename like '%.fit.gz'
    /

    Compress GPX files (optional)

    Some of the GPX files in the Strava export are compressed and some are not.  There is no obvious reason why.  To minimise the space I can gzip the GPX files.
    gzip -9v /tmp/strava/activities/*.gpx
    If I do compress any .gpx files, then I also need to update the file names in the activities table.
UPDATE activities
SET filename = filename||'.gz'
WHERE filename like '%.gpx'
/

    Load the GPX files into the XML data type.

    The next stage is to load each of the GPX files into the activities table.  
create or replace package body strava_pkg as
k_module CONSTANT VARCHAR2(48) := $$PLSQL_UNIT;

----------------------------------------------------------------------------------------------------
function getClobDocument
(p_directory IN VARCHAR2
,p_filename  IN VARCHAR2
,p_charset   IN VARCHAR2 DEFAULT NULL
) return CLOB deterministic
is
  l_module     VARCHAR2(64);
  l_action     VARCHAR2(64);

  v_filename   VARCHAR2(128);
  v_directory  VARCHAR2(128);
  v_file       bfile;
  v_unzipped   blob := empty_blob();

  v_Content    CLOB := '';
  v_src_offset number := 1;
  v_dst_offset number := 1;
  v_charset_id number := 0;
  v_lang_ctx   number := DBMS_LOB.default_lang_ctx;
  v_warning    number;

  e_22288 EXCEPTION; --file or LOB operation FILEOPEN failed
  PRAGMA EXCEPTION_INIT(e_22288, -22288);
BEGIN
  dbms_application_info.read_module(module_name=>l_module
                                   ,action_name=>l_action);
  dbms_application_info.set_module(module_name=>k_module
                                  ,action_name=>'getClobDocument');

  IF p_charset IS NOT NULL THEN
    v_charset_id := NLS_CHARSET_ID(p_charset);
  END IF;

  v_filename  := REGEXP_SUBSTR(p_filename,'[^\/]+',1,2);
  v_directory := REGEXP_SUBSTR(p_filename,'[^\/]+',1,1);

  IF v_directory IS NOT NULL AND v_filename IS NULL THEN /*if only one parameter is passed, it is actually a filename*/
    v_filename  := v_directory;
    v_directory := '';
  END IF;

  IF p_directory IS NOT NULL THEN
    v_directory := p_directory;
  END IF;

  v_file := bfilename(UPPER(v_directory),v_filename);

  BEGIN
    DBMS_LOB.fileopen(v_file, DBMS_LOB.file_readonly);
  EXCEPTION
    WHEN VALUE_ERROR OR e_22288 THEN
      dbms_output.put_line('Can''t open:'||v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
      v_content := '';
      dbms_application_info.set_module(module_name=>l_module
                                      ,action_name=>l_action);
      RETURN v_content;
  END;

  IF v_filename LIKE '%.gz' THEN --gunzip the file, then convert the BLOB to a CLOB
    v_unzipped := utl_compress.lz_uncompress(v_file);
    dbms_lob.converttoclob(
      dest_lob     => v_content,
      src_blob     => v_unzipped,
      amount       => DBMS_LOB.LOBMAXSIZE,
      dest_offset  => v_dst_offset,
      src_offset   => v_src_offset,
      blob_csid    => dbms_lob.default_csid,
      lang_context => v_lang_ctx,
      warning      => v_warning);
  ELSE --load the uncompressed file straight into the CLOB
    DBMS_LOB.LOADCLOBFROMFILE(v_content,
      src_bfile    => v_file,
      amount       => DBMS_LOB.LOBMAXSIZE,
      src_offset   => v_src_offset,
      dest_offset  => v_dst_offset,
      bfile_csid   => v_charset_id,
      lang_context => v_lang_ctx,
      warning      => v_warning);
  END IF;

  dbms_output.put_line(v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
  DBMS_LOB.fileclose(v_file);

  dbms_application_info.set_module(module_name=>l_module
                                  ,action_name=>l_action);

  RETURN v_content;
EXCEPTION
  WHEN OTHERS THEN
    dbms_output.put_line(v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
    DBMS_LOB.fileclose(v_file);
    dbms_application_info.set_module(module_name=>l_module
                                    ,action_name=>l_action);
    RAISE;
end getClobDocument;
----------------------------------------------------------------------------------------------------

END strava_pkg;
/

I can simply query the contents of the uncompressed GPX file in SQL by calling the function.  In this case, the zipped .gpx file is 65KB, but it decompresses to 1.2MB.
    Set long 1000 lines 200 pages 99 serveroutput on
    Column filename format a30
    Column gpx format a100
    select activity_id, filename
    , getClobDocument('',filename) gpx
    from activities
    where filename like '%.gpx%'
and activity_id = 4468006769
    order by 1
    /


    ACTIVITY_ID FILENAME GPX
    ----------- ------------------------------ ----------------------------------------------------------------------------------------------------
    4468006769 activities/4468006769.gpx.gz <?xml version="1.0" encoding="UTF-8"?>
    <gpx creator="StravaGPX Android" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLoc
    ation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin
    .com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.gar
    min.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd
    " version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlsch
    emas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">
    <metadata>
    <time>2020-12-13T14:31:13Z</time>
    </metadata>
    <trk>
    <name>Loop</name>
    <type>1</type>
    <trkseg>
    <trkpt lat="51.5296380" lon="-0.1875360">
    <ele>30.6</ele>
    <time>2020-12-13T14:31:13Z</time>
    <extensions>
    <gpxtpx:TrackPointExtension>
    <gpxtpx:hr>57</gpxtpx:hr>
    </gpxtpx:TrackPointExtension>
    </extensions>
    </trkpt>


    activities/4468006769.gpx.gz - 1286238
    Elapsed: 00:00:00.14
    I can load the .gpx files into the GPX column of the activities table with a simple update statement.  The CLOB returned from the function is converted to an XML with XMLTYPE.
    UPDATE activities
    SET gpx = XMLTYPE(getClobDocument('ACTIVITIES',filename))
    WHERE filename like '%.gpx%'
    /
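A quick check that the update loaded a document for every GPX file (a minimal sketch):
-- Every row with a .gpx filename should now have a non-null GPX column.
SELECT COUNT(*) total, COUNT(gpx) loaded
FROM   activities
WHERE  filename like '%.gpx%';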
    I can now query back the same GPX from the database.
    Set long 1100 lines 200 pages 99 serveroutput on
    select activity_id, filename, gpx
    from activities
    where filename like '%.gpx%'
and activity_id = 4468006769
    order by 1
    /


    ACTIVITY_ID FILENAME GPX
    ----------- ------------------------------ ----------------------------------------------------------------------------------------------------
    4468006769 activities/4468006769.gpx.gz <?xml version="1.0" encoding="US-ASCII"?>
    <gpx creator="StravaGPX Android" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLoc
    ation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin
    .com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.gar
    min.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd
    " version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlsch
    emas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">
    <metadata>
    <time>2020-12-13T14:31:13Z</time>
    </metadata>
    <trk>
    <name>Loop</name>
    <type>1</type>
    <trkseg>
    <trkpt lat="51.5296380" lon="-0.1875360">
    <ele>30.6</ele>
    <time>2020-12-13T14:31:13Z</time>
    <extensions>
    <gpxtpx:TrackPointExtension>
    <gpxtpx:hr>57</gpxtpx:hr>
    </gpxtpx:TrackPointExtension>
    </extensions>
    </trkpt>
    <trkpt lat="51.5296350" lon="-0.1875340">

    Spatial Data 2: Convert GPX Track to a Spatial Line Geometry


    This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava.

    Having loaded my GPS tracks from GPX files into an XML type column, the next stage is to extract the track points and create a spatial geometry column.  

    Defining Spatial Geometries

Spatial objects are generically referred to as geometries.  When you define one, you have to specify what kind of geometry it is and what coordinate system you are using.  Later, when you compare geometries with each other, they have to use the same coordinate system; otherwise, Oracle will raise an error.  Fortunately, Oracle can convert between coordinate systems.

Various coordinate systems are used for geographical data; they are identified by EPSG Geodetic Parameter Dataset codes.  Oracle supports many coordinate systems.  As well as older definitions, it also has current definitions where the EPSG code matches the Spatial Reference ID (SDO_SRID).  They can be queried from SDO_COORD_REF_SYS.

I will use two different coordinate systems during this series of blogs:

    Set lines 150 pages 99
    Column coord_ref_sys_name format a35
    Column legacy_cs_bounds format a110
    select srid, coord_ref_sys_name, coord_ref_sys_kind, legacy_cs_bounds
    from SDO_COORD_REF_SYS where srid IN(4326, 27700)
    /

    SRID COORD_REF_SYS_NAME COORD_REF_SYS_KIND
    ---------- ----------------------------------- ------------------------
    LEGACY_CS_BOUNDS(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINATES)
    --------------------------------------------------------------------------------------------------------------
    4326 WGS 84 GEOGRAPHIC2D
    SDO_GEOMETRY(2003, 4326, NULL, SDO_ELEM_INFO_ARRAY(1, 1003, 3), SDO_ORDINATE_ARRAY(-180, -90, 180, 90))

    27700 OSGB 1936 / British National Grid PROJECTED

    • "The World Geodetic System (WGS) is a standard for use in cartography, geodesy, and satellite navigation including GPS". The latest revision is WGS 84 (also known as WGS 1984, EPSG:4326). It is the reference coordinate system used by the Global Positioning System (GPS).  Where I am dealing with longitude and latitude, specified in degrees, especially from GPS data, I need to tell Oracle that it is WGS84 by specifying SDO_SRID of 4326.
    • Later on, I will also be using data for Great Britain available from the Ordnance Survey that uses the Ordnance Survey National Grid (also known as British National Grid) reference system.  That requires SDO_SRID to be set to 27700.
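Converting between the two systems is straightforward with SDO_CS.TRANSFORM.  A minimal sketch, using the coordinates from the GPX sample shown later in this post:
-- Transform a WGS84 (SRID 4326) point to British National Grid (SRID 27700).
SELECT sdo_cs.transform(
         sdo_geometry(2001, 4326,
                      sdo_point_type(-0.1875360, 51.5296380, NULL),
                      NULL, NULL),
         27700) bng_point
FROM dual;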


    Creating Spatial Points

    I have found it useful to create a packaged function to convert longitude and latitude to a spatial data point.  It is a useful shorthand that I use in various places.

create or replace package body strava_pkg as
k_module CONSTANT VARCHAR2(48) := $$PLSQL_UNIT;

----------------------------------------------------------------------------------------------------
function make_point
(longitude in number
,latitude  in number
) return sdo_geometry deterministic is
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
  l_geom   sdo_geometry;
begin
  dbms_application_info.read_module(module_name=>l_module
                                   ,action_name=>l_action);
  dbms_application_info.set_module(module_name=>k_module
                                  ,action_name=>'make_point');

  if longitude is not null and latitude is not null then
    l_geom := sdo_geometry(
                2001, 4326,
                sdo_point_type(longitude, latitude, null),
                null, null);
  end if; --otherwise l_geom remains null

  --restore the caller's module and action before returning
  dbms_application_info.set_module(module_name=>l_module
                                  ,action_name=>l_action);
  return l_geom;
end make_point;
----------------------------------------------------------------------------------------------------
END strava_pkg;
/

    strava_pkg.sql

    There are two parameters to SDO_GEOMETRY that I always have to specify.

• The first parameter, SDO_GTYPE, describes the nature of the spatial geometry being defined.  Here it is 2001.  The 2 indicates that it is a 2-dimensional geometry, and the 1 indicates that it is a single point.  See SDO_GEOMETRY Object Type.
    • The second parameter, SDO_SRID, defines the coordinate system that I discussed above.  4326 indicates that I am working with longitude and latitude.
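For example, a quick check of the function from SQL (the coordinates are illustrative):

select strava_pkg.make_point(-0.1875, 51.5296) loc from dual
/

Given the definition above, this should return SDO_GEOMETRY(2001, 4326, SDO_POINT_TYPE(-.1875, 51.5296, NULL), NULL, NULL).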

    XML Namespace

    GPS data is often held in GPX or GPS Exchange Format.  This is an XML schema.  GPX has been the de-facto XML standard for the lightweight interchange of GPS data since the initial GPX 1.0 release in 2002.  The GPX 1.1 schema was released in 2004 (see https://www.topografix.com/gpx.asp).  

    Garmin has created an extension schema that holds additional athlete training information such as heart rate.

I can extract individual track points from a GPX with SQL using the extract() and extractvalue() functions.  However, I have GPX tracks that use both versions of the Topografix GPX schema (it depends upon which piece of software emitted the GPX file), and some that also use the Garmin extensions.

    Therefore, I need to register all three schemas with Oracle.  I can download the schema files with wget.

    cd /tmp/strava
    wget http://www.topografix.com/GPX/1/0/gpx.xsd --output-document=gpx0.xsd
    wget http://www.topografix.com/GPX/1/1/gpx.xsd
    wget https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd

Then I can register the files.

    delete from plan_table WHERE statement_id = 'XSD';
    insert into plan_table (statement_id, plan_id, object_name, object_alias)
    values ('XSD', 1, 'gpx0.xsd', 'http://www.topografix.com/GPX/1/0/gpx.xsd');
    insert into plan_table (statement_id, plan_id, object_name, object_alias)
    values ('XSD', 2, 'gpx.xsd', 'http://www.topografix.com/GPX/1/1/gpx.xsd');
    insert into plan_table (statement_id, plan_id, object_name, object_alias)
    values ('XSD', 3, 'TrackPointExtensionv1.xsd', 'https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd');

    DECLARE
    xmlSchema xmlType;
    res boolean;
    BEGIN
    FOR i IN (
    SELECT object_alias schemaURL
    , object_name schemaDoc
    FROM plan_table
    WHERE statement_id = 'XSD'
    ORDER BY plan_id
    ) LOOP
    --read xsd file
    xmlSchema := XMLTYPE(getCLOBDocument('STRAVA',i.schemaDoc,'AL32UTF8'));
    --if already exists delete XSD
    if (dbms_xdb.existsResource(i.schemaDoc)) then
    dbms_xdb.deleteResource(i.schemaDoc);
    end if;
    --create resource from XSD
    res := dbms_xdb.createResource(i.schemaDoc,xmlSchema);

    -- Delete existing schema
    dbms_xmlschema.deleteSchema(
    i.schemaURL
    );
    -- Now reregister the schema
    dbms_xmlschema.registerSchema(
    i.schemaURL,
    xmlSchema,
    TRUE,TRUE,FALSE,FALSE
    );
    END LOOP;
    End;
    /
    3a_register_xml_schema.sql

    Then I can query the registered schemas.

    Set pages 99 lines 160
    Column schema_url format a60
    Column qual_schema_url format a105
    select schema_url, local, hier_type, binary, qual_schema_url
    from user_xml_schemas
    /

    SCHEMA_URL LOC HIER_TYPE BIN
    ------------------------------------------------------------ --- ----------- ---
    QUAL_SCHEMA_URL
    ---------------------------------------------------------------------------------------------------------
    https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd YES CONTENTS NO
    http://xmlns.oracle.com/xdb/schemas/STRAVA/https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd

    http://www.topografix.com/GPX/1/0/gpx.xsd YES CONTENTS NO
    http://xmlns.oracle.com/xdb/schemas/STRAVA/www.topografix.com/GPX/1/0/gpx.xsd

    http://www.topografix.com/GPX/1/1/gpx.xsd YES CONTENTS NO
    http://xmlns.oracle.com/xdb/schemas/STRAVA/www.topografix.com/GPX/1/1/gpx.xsd

    Extracting GPS Track Points from GPX

    A GPS track is a list of points specifying at least time, longitude, latitude and often elevation.  I can extract all the points in a GPX as a set of rows.  However, I must specify the correct namespace for the specific GPX.

    Column time_string format a20
    SELECT g.activity_id
    , EXTRACTVALUE(VALUE(t), 'trkpt/time') time_string
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/ele')) ele
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/extensions/gpxtpx:TrackPointExtension/gpxtpx:hr'
    ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"')) hr
    FROM activities g,
    TABLE(XMLSEQUENCE(extract(g.gpx,'/gpx/trk/trkseg/trkpt'
    ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"'
    ))) t
    Where activity_id IN(4468006769)
    And rownum <= 10
    /

    Activity
    ID TIME_STRING LAT LNG ELE HR
    ---------- -------------------- ------------- ------------- ------- ----
    4468006769 2020-12-13T14:31:13Z 51.52963800 -.18753600 30.6 57
    2020-12-13T14:31:14Z 51.52963500 -.18753400 30.6 57
    2020-12-13T14:31:15Z 51.52964100 -.18753100 30.6 57
    2020-12-13T14:31:16Z 51.52964000 -.18752900 30.6 57
    2020-12-13T14:31:17Z 51.52963600 -.18752700 30.6 57
    2020-12-13T14:31:18Z 51.52963200 -.18752700 30.6 57
    2020-12-13T14:31:19Z 51.52962900 -.18752800 30.6 57
    2020-12-13T14:31:20Z 51.52962800 -.18752800 30.6 57
    2020-12-13T14:31:21Z 51.52962800 -.18752900 30.6 57
    2020-12-13T14:31:22Z 51.52962800 -.18753000 30.6 57

    I can use this approach to extract all the points from a GPS track and create a spatial line geometry.  I have put the whole process into a packaged procedure strava_pkg.load_activity.

First, I need to work out which version of the Topografix schema is in use, so I try extracting an attribute of the root gpx element (the version attribute, in the snippet below) with each namespace and see which one is not null.


    IF l_num_rows > 0 THEN
    UPDATE activities
    SET gpx = XMLTYPE(l_gpx), geom = null, geom_27700 = null, num_pts = 0, xmlns = NULL
    WHERE activity_id = p_activity_id
    RETURNING extractvalue(gpx,'/gpx/@version', 'xmlns="http://www.topografix.com/GPX/1/0"')
    , extractvalue(gpx,'/gpx/@version', 'xmlns="http://www.topografix.com/GPX/1/1"')
    INTO l_xmlns0, l_xmlns1;
    l_num_rows := SQL%rowcount;
    END IF;

Now I can extract all the points in a GPX as a set of rows and put them into a spatial geometry.  Each track point has two ordinates, so I turn each row into two rows, one per ordinate; note that longitude is listed before latitude for each point.  I convert the rows into a list using multiset() and finally cast that as a spatial ordinate array.

    Note that the SDO_GTYPE is 2002 (rather than 2001) because it is a line (rather than a single point) on a two-dimensional coordinate system.

      BEGIN
    UPDATE activities a
    SET geom = mdsys.sdo_geometry(2002,4326,null,mdsys.sdo_elem_info_array(1,2,1),
    cast(multiset(
    select CASE n.rn WHEN 1 THEN pt.lng WHEN 2 THEN pt.lat END ord
    from (
    SELECT rownum rn
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) as lng
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) as lat
    FROM TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt', 'xmlns="http://www.topografix.com/GPX/1/1"'))) t
    ) pt,
    (select 1 rn from dual union all select 2 from dual) n
    order by pt.rn, n.rn
    ) AS mdsys.sdo_ordinate_array))
    , xmlns = 'xmlns="http://www.topografix.com/GPX/1/1"'
    WHERE a.gpx IS NOT NULL
    And activity_id = p_activity_id;
    l_num_rows := SQL%rowcount;
    EXCEPTION
    WHEN e_13034 OR e_29877 THEN
    dbms_output.put_line('Exception:'||sqlerrm);
    l_num_rows := 0;
    END;

    I have found it helpful to simplify the line geometry with sdo_util.simplify(). It removes some of the noise in the GPS data and has resolved problems with calculating the length of lines that intersect with areas.

      BEGIN
    UPDATE activities
    SET geom = sdo_util.simplify(geom,1)
    WHERE geom IS NOT NULL
    And activity_id = p_activity_id;
    l_num_rows := SQL%rowcount;
    EXCEPTION
    WHEN e_13034 THEN
    dbms_output.put_line('Exception:'||sqlerrm);
    END;

    There are a few other fields I also update at this point.  You will see me use them later.

    • NUM_PTS is the number of points in the line geometry.  
    • GEOM_27700 is the result of converting the line to British National Grid reference coordinates.  This helps when comparing it to British boundary data obtained from the Ordnance Survey or other government agencies.
• MBR is the minimum bounding rectangle for the line.  This is generated to improve the performance of some spatial queries.  I have found that some of the spatial operators that calculate intersections between geometries are quite slow and CPU-intensive when applied to GPS tracks and boundary data that both have lots of points.  SDO_GEOM.SDO_MBR simply returns 4 ordinates that define the bounding rectangle.  This can be used as a cheap pre-filter to find geometries that might match before doing a proper comparison.

      UPDATE activities 
    SET num_pts = SDO_UTIL.GETNUMVERTICES(geom)
    , geom_27700 = sdo_cs.transform(geom,27700)
    , mbr = sdo_geom.sdo_mbr(geom)
    WHERE geom IS NOT NULL
    And activity_id = p_activity_id
    RETURNING num_pts INTO l_num_pts;
    dbms_output.put_line('Activity ID:'||p_activity_id||', '||l_num_pts||' points');

    Now I can load each GPX and process it into a spatial geometry in one step.  I can process all of the activities in a simple loop.

    set serveroutput on timi on
    exec strava_pkg.load_activity(4468006769);
    Loading Activity: 4468006769
    ACTIVITIES/4468006769.gpx.gz - 1286238 bytes
    xmlns 1=StravaGPX Android
    Activity ID:4468006769, 998 points

    PL/SQL procedure successfully completed.

    Elapsed: 00:00:01.41

    Now my Strava activities are all in spatial geometries and I can start to do some spatial processing.

    Spatial Data 3. Analyse a track in proximity to a GPS route

    This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava.

    Swain's Lane, Highgate
    Now I have loaded some data, I am going to start to do something useful with it.  I go out on my bike most mornings, and I usually ride up Swain's Lane in Highgate three times.  How long did each one take?  Over time, have I got faster or slower?

    I need a definition of Swain's Lane that I can compare to.  I will start by drawing a route with my favourite GPS software.  A route is just a sequence of route points.  I can then export that as a GPX file.

    <?xml version="1.0" encoding="UTF-8"?>
    <gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="ViewRanger - //www.viewranger.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
    <rte>
    <name><![CDATA[Swain's World]]></name>
    <rtept lat="51.569613039632" lon="-0.14770468632509"></rtept>
    <rtept lat="51.569407978151" lon="-0.14832964102552"></rtept>
    <rtept lat="51.567090552402" lon="-0.14674177328872"></rtept>
    <rtept lat="51.567080548869" lon="-0.14592101733016"></rtept>
    <rtept lat="51.569618041121" lon="-0.14773419062425"></rtept>
    </rte>
    </gpx>

    Geometries Table

    I will load the GPX route into a table much as I did with the track files. 
    drop table my_geometries purge;

create table my_geometries
    (geom_id NUMBER NOT NULL
    ,descr VARCHAR2(64)
    ,gpx XMLTYPE
    ,geom mdsys.sdo_geometry
    ,geom_27700 mdsys.sdo_geometry
    ,mbr mdsys.sdo_geometry
    ,constraint my_geometries_pk PRIMARY KEY (geom_id)
    )
    XMLTYPE COLUMN gpx STORE AS SECUREFILE BINARY XML (CACHE DISABLE STORAGE IN ROW)
    /
    The difference is that I have a series of route points instead of track points, so the paths in extract() and extractvalue() are slightly different.
    delete from my_geometries where geom_id = 2;
    INSERT INTO my_geometries (geom_id, descr, gpx)
    VALUES (2,'Swains World Route', XMLTYPE(strava_pkg.getClobDocument('STRAVA','swainsworldroute.gpx')));

    UPDATE my_geometries
    SET geom = mdsys.sdo_geometry(2002,4326,null,mdsys.sdo_elem_info_array(1,2,1),
    cast(multiset(
    select CASE n.rn WHEN 1 THEN pt.lng WHEN 2 THEN pt.lat END ord
    from (
    SELECT /*+MATERIALIZE*/ rownum rn
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lon')) as lng
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lat')) as lat
    FROM my_geometries g,
    TABLE(XMLSEQUENCE(extract(g.gpx,'/gpx/rte/rtept','xmlns="http://www.topografix.com/GPX/1/1"'))) t
    where g.geom_id = 2
    ) pt,
    (select 1 rn from dual union all select 2 from dual) n
    order by pt.rn, n.rn
    ) AS mdsys.sdo_ordinate_array))
    WHERE gpx IS NOT NULL
    AND geom IS NULL
    /
    UPDATE my_geometries
    SET mbr = sdo_geom.sdo_mbr(geom)
    , geom_27700 = sdo_cs.transform(geom,27700)
    /

    Commit;
    Set pages 99 lines 180
    Select geom_id, descr, gpx, geom
    from my_geometries
    where geom_id = 2;

    GEOM_ID DESCR
    ---------- ----------------------------------------------------------------
    GPX
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    GEOM(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINATES)
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2 Swains World Route
    <?xml version="1.0" encoding="US-ASCII"?>
    <gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="ViewRanger - //www.viewranger.com" xml
    SDO_GEOMETRY(2002, 4326, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(-.14651114, 51.5670769, -.14649237, 51.567298, -.1465782, 51.567563, -.14680618, 51.5680165, -.14697
    516, 51.5682533, -.14754379, 51.5688701, -.14807219, 51.5694887))

    I am going to build spatial indexes on the geometry columns, so I need to define the upper and lower bound values on the coordinates.
    delete from user_sdo_geom_metadata where table_name = 'MY_GEOMETRIES';
    insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
    values (
    'MY_GEOMETRIES' , 'GEOM_27700',
    sdo_dim_array(
    sdo_dim_element('Easting',-1000000,1500000,0.05),
    sdo_dim_element('Northing', -500000,2000000,0.05)),
    27700);
    insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
    values (
    'MY_GEOMETRIES' , 'GEOM',
    sdo_dim_array(
    sdo_dim_element('Longitude',-180,180,0.05),
sdo_dim_element('Latitude',-90,90,0.05)),
    4326);
    insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
    values (
    'MY_GEOMETRIES' , 'MBR',
    sdo_dim_array(
    sdo_dim_element('Longitude',-180,180,0.05),
sdo_dim_element('Latitude',-90,90,0.05)),
    4326);
    commit;

    CREATE INDEX my_geometries_geom ON my_geometries (geom) INDEXTYPE IS MDSYS.SPATIAL_INDEX_v2;
    CREATE INDEX my_geometries_geom_27700 ON my_geometries (geom_27700) INDEXTYPE IS MDSYS.SPATIAL_INDEX_v2;
    CREATE INDEX my_geometries_mbr ON my_geometries (mbr) INDEXTYPE IS MDSYS.SPATIAL_INDEX_v2;

    Compare Geometries

Now I can compare my Swain's Lane geometry to my activity geometries.  Let's start by looking for rides in December 2020 that went up Swain's Lane.
    Column activity_id heading 'Activity|ID'
    Column activity_name format a30
    Column geom_relate heading 'geom|relate' format a6
    With a as (
    SELECT a.activity_id, a.activity_date, a.activity_name
    , SDO_GEOM.RELATE(a.geom,'anyinteract',g.geom,25) geom_relate
    FROM activities a
    , my_geometries g
    WHERE a.activity_type = 'Ride'
    --And a.activity_id IN(4468006769)
    And a.activity_date >= TO_DATE('01122020','DDMMYYYY')
    and g.geom_id = 2 /*Swains World Route*/
    )
    Select *
    From a
    Where geom_relate = 'TRUE'
    Order by activity_date
    /

Where there is a relation between the two geometries, I have a hit.
      Activity                                                    geom
    ID ACTIVITY_DATE ACTIVITY_NAME relate
    ---------- ------------------- ------------------------------ ------
    4419821750 08:44:45 02.12.2020 Loop TRUE
    4428307816 10:49:25 04.12.2020 Loop TRUE
    4431920358 09:41:13 05.12.2020 Loop TRUE

    4528825613 09:39:38 28.12.2020 Loop TRUE
    4534027888 11:29:45 29.12.2020 Loop TRUE
    4538488655 09:57:55 30.12.2020 Loop TRUE

    25 rows selected.

    Analyse Individual Efforts

    Now I want to analyse each of my trips up Swain's Lane on a particular day.  I am going to work with the GPX rather than the spatial geometry because I am interested also in time, elevation and heart rate data that is not stored in the spatial geometry.
    Also, you can't use analytic functions on spatial geometries.
    with x as (
    SELECT activity_id
    , TO_DATE(EXTRACTVALUE(VALUE(t), 'trkpt/time'),'YYYY-MM-DD"T"HH24:MI:SS"Z"') time
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
    FROM activities a,
    TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
    WHERE a.activity_id IN(4468006769)
    ), y as (
    select x.*, strava_pkg.make_point(lng,lat) loc
    from x
    )
    select lag(loc,1) over (partition by activity_id order by time) last_loc
    from y
    /

    select lag(loc,1) over (partition by activity_id order by time) last_loc
    *
    ERROR at line 13:
    ORA-22901: cannot compare VARRAY or LOB attributes of an object type

    Instead, I will have to apply analytic functions to the values extracted from the GPX and then create a spatial point.  Thus I will be able to calculate the length of each individual trip by aggregating the distance between each pair of points.
    The following query splits out each trip up Swain's Lane in a particular activity and shows the distance, duration, and metrics about elevation, gradient, and heart rate. 
    alter session set statistics_level=ALL;
    alter session set nls_date_Format = 'hh24:mi:ss dd.mm.yyyy';
    break on activity_id skip 1
    compute sum of sum_dist on activity_id
    compute sum of num_pt on activity_id
    compute sum of sum_secs on activity_id
    Set lines 180 pages 50 timi on
    Column activity_id heading 'Activity|ID'
    Column activity_name format a15
    column time format a20
    column lat format 999.99999999
    column lng format 999.99999999
    column ele format 9999.9
    column hr format 999
    column sdo_relate format a10
    column num_pts heading 'Num|Pts' format 99999
    column sum_dist heading 'Dist.|(km)' format 999.999
    column sum_secs heading 'Secs' format 9999
    column avg_speed heading 'Avg|Speed|(kmph)' format 99.9
    column ele_gain heading 'Ele|Gain|(m)' format 9999.9
    column ele_loss heading 'Ele|Loss|(m)' format 9999.9
    column avg_grade heading 'Avg|Grade|%' format 99.9
    column min_ele heading 'Min|Ele|(m)' format 999.9
    column max_ele heading 'Max|Ele|(m)' format 999.9
    column avg_hr heading 'Avg|HR' format 999
    column max_hr heading 'Max|HR' format 999
    WITH geo as ( /*route geometry to compare to*/
select /*+MATERIALIZE*/ g.*, 25 tol
    , sdo_geom.sdo_length(geom, unit=>'unit=m') geom_length
    from my_geometries g
    where geom_id = 2 /*Swains World Route*/
    ), a as ( /*extract all points in activity*/
    SELECT a.activity_id, g.geom g_geom, g.tol, g.geom_length
    , TO_DATE(EXTRACTVALUE(VALUE(t), 'trkpt/time'),'YYYY-MM-DD"T"HH24:MI:SS"Z"') time
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/ele')) ele
    , TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/extensions/gpxtpx:TrackPointExtension/gpxtpx:hr'
    ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"')) hr
    FROM activities a,
    geo g,
    TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt'
    ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"'))) t
    Where a.activity_id IN(4468006769)
    and SDO_GEOM.RELATE(a.geom,'anyinteract',g.geom,g.tol) = 'TRUE' /*activity has relation to reference geometry*/
    ), b as ( /*smooth elevation*/
    Select a.*
    , avg(ele) over (partition by activity_id order by time rows between 2 preceding and 2 following) avg_ele
    From a
    ), c as ( /*last point*/
    Select b.*
    , row_number() over (partition by activity_id order by time) seq
    , lag(time,1) over (partition by activity_id order by time) last_time
    , lag(lat,1) over (partition by activity_id order by time) last_lat
    , lag(lng,1) over (partition by activity_id order by time) last_lng
    --, lag(ele,1) over (partition by activity_id order by time) last_ele
    , lag(avg_ele,1) over (partition by activity_id order by time) last_avg_ele
    From b
    ), d as ( /*make points*/
    SELECT c.*
    , strava_pkg.make_point(lng,lat) loc
    , strava_pkg.make_point(last_lng,last_lat) last_loc
    FROM c
), e as ( /*distance from previous point, and proximity to the route*/
    select d.*
    , 86400*(time-last_time) secs
    , avg_ele-last_avg_ele ele_diff
    , sdo_geom.sdo_distance(loc,last_loc,0.05,'unit=m') dist
    , SDO_GEOM.RELATE(loc,'anyinteract', g_geom, tol) sdo_relate
    FROM d
    ), f as (
    select e.*
    , CASE WHEN sdo_relate != lag(sdo_relate,1) over (partition by activity_id order by time) THEN 1 END sdo_diff
    from e
    ), g as (
    select f.*
    , SUM(sdo_diff) over (partition by activity_id order by time range between unbounded preceding and current row) sdo_seq
    from f
    where sdo_relate = 'TRUE'
    )
    select activity_id, min(time), max(time)
    , sum(dist)/1000 sum_dist
    , sum(secs) sum_secs
    , 3.6*sum(dist)/sum(secs) avg_speed
    , sum(greatest(0,ele_diff)) ele_gain
    , sum(least(0,ele_diff)) ele_loss
    , 100*sum(ele_diff*dist)/sum(dist*dist) avg_grade
    , min(ele) min_ele
    , max(ele) max_ele
    , sum(hr*secs)/sum(secs) avg_Hr
    , max(hr) max_hr
    , count(*) num_pts
    from g
    group by activity_id, sdo_seq, g.geom_length
    having sum(dist)>= g.geom_length/2 /*make sure line we find is longer than half route to prevent fragmentation*/
    order by 2
    /
    select * from table(dbms_xplan.display_cursor(null,null,'ADVANCED +IOSTATS -PROJECTION +ADAPTIVE'))
    /
    4a_1swains.sql
    • In subquery a, I compare the geometry of the activity with the geometry of Swain's Lane using sdo_geom.relate() to confirm that the activity includes Swain's Lane, but then I extract all the points in the activity GPX.
    • GPS is optimised for horizontal accuracy.  Even so, the tolerance for determining whether the track is close to the route has to be set to 25m to allow for noise in the data (Swain's Lane is tree-lined, and has walls on both sides, that both attenuate the GPS signal).  GPS elevation data is notorious for being noisy even under good conditions; you can see this in the variation of height gained on each ascent.  Sub-query b calculates an average elevation across 5 track points (up to +/-2 points).  
• I need to compare each point in the track to the previous point so I can do some calculations and determine when the track comes into proximity with the Swain's Lane route.  Subquery c uses analytic functions to determine the previous point.  It is not possible to apply an analytic function to a geometry.
    • Subquery e determines whether a track point is in proximity to the route.  The tolerance, 25m, is set in subquery geo.  Then subquery f flags where the track point is in proximity to the route and the previous one was not.  Finally, subquery g maintains a running total of the number of times the track has gone close enough to the route.  That becomes a sequence number for each ascent of Swain's Lane by which I can group the subsequent analytics.
                                                                         Avg     Ele     Ele   Avg    Min    Max
    Activity Dist. Speed Gain Loss Grade Ele Ele Avg Max Num
    ID MIN(TIME) MAX(TIME) (km) Secs (kmph) (m) (m) % (m) (m) HR HR Pts
    ---------- ------------------- ------------------- -------- ----- ------ ------- ------- ----- ------ ------ ---- ---- -----
    4468006769 14:55:51 13.12.2020 14:58:17 13.12.2020 .372 147 9.1 36.1 .0 8.6 86.8 122.7 141 153 147
    15:08:13 13.12.2020 15:10:28 13.12.2020 .374 136 9.9 36.2 .0 8.4 86.8 122.8 147 155 136
    15:22:49 13.12.2020 15:25:18 13.12.2020 .369 150 8.9 36.2 .0 8.2 86.8 122.7 147 155 150
    ********** -------- -----
    sum 1.116 433
    On my laptop, this query takes about 10s, of which about 8s is spent on the window sort for the analytic functions, and 2s is spent working out whether the track points are in proximity to the route.
    Plan hash value: 3042349692

    ---------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 2 | | | | 5147 (100)| | 6 |00:00:20.94 | 392 |
    | 1 | SORT ORDER BY | | 2 | 1 | 104 | | 5147 (1)| 00:00:01 | 6 |00:00:20.94 | 392 |
    |* 2 | FILTER | | 2 | | | | | | 6 |00:00:20.94 | 392 |
    | 3 | HASH GROUP BY | | 2 | 1 | 104 | | 5147 (1)| 00:00:01 | 6 |00:00:20.94 | 392 |
    | 4 | VIEW | | 2 | 8168 | 829K| | 5144 (1)| 00:00:01 | 866 |00:00:20.93 | 392 |
    | 5 | WINDOW SORT | | 2 | 8168 | 16M| 21M| 5144 (1)| 00:00:01 | 866 |00:00:20.93 | 392 |
    |* 6 | VIEW | | 2 | 8168 | 16M| | 1569 (1)| 00:00:01 | 866 |00:00:20.93 | 392 |
    | 7 | WINDOW SORT | | 2 | 8168 | 1403K| 1688K| 1569 (1)| 00:00:01 | 10208 |00:00:09.63 | 392 |
    | 8 | VIEW | | 2 | 8168 | 1403K| | 1252 (1)| 00:00:01 | 10208 |00:00:06.22 | 392 |
    | 9 | WINDOW SORT | | 2 | 8168 | 1021K| 1248K| 1252 (1)| 00:00:01 | 10208 |00:00:06.20 | 392 |
    | 10 | VIEW | | 2 | 8168 | 1021K| | 1016 (1)| 00:00:01 | 10208 |00:00:06.05 | 392 |
    | 11 | WINDOW SORT | | 2 | 8168 | 4546K| 5040K| 1016 (1)| 00:00:01 | 10208 |00:00:00.76 | 392 |
    | 12 | NESTED LOOPS | | 2 | 8168 | 4546K| | 31 (0)| 00:00:01 | 10208 |00:00:00.41 | 392 |
    | 13 | NESTED LOOPS | | 2 | 1 | 560 | | 2 (0)| 00:00:01 | 2 |00:00:00.03 | 104 |
    | 14 | TABLE ACCESS BY INDEX ROWID| MY_GEOMETRIES | 2 | 1 | 112 | | 1 (0)| 00:00:01 | 2 |00:00:00.01 | 4 |
    |* 15 | INDEX UNIQUE SCAN | MY_GEOMETRIES_PK | 2 | 1 | | | 0 (0)| | 2 |00:00:00.01 | 2 |
    |* 16 | TABLE ACCESS BY INDEX ROWID| ACTIVITIES | 2 | 1 | 448 | | 1 (0)| 00:00:01 | 2 |00:00:00.03 | 100 |
    |* 17 | INDEX UNIQUE SCAN | ACTIVITIES_PK | 2 | 1 | | | 0 (0)| | 2 |00:00:00.01 | 4 |
    | 18 | XPATH EVALUATION | | 2 | | | | | | 10208 |00:00:00.37 | 288 |
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - filter(SUM("DIST")>="G"."GEOM_LENGTH"/2)
    6 - filter("SDO_RELATE"='TRUE')
    15 - access("GEOM_ID"=2)
    16 - filter(("A"."ACTIVITY_TYPE"='Ride' AND "SDO_GEOM"."RELATE"("A"."GEOM",'anyinteract',"G"."GEOM",25)='TRUE'))
    17 - access("A"."ACTIVITY_ID"=4468006769)
    I can apply this approach to all my trips up Swain's Lane.  However, I have logged 1115 ascents, and if I attempt to process them in a single SQL query I will have to do some very large window sorts that will spill out of memory (at least they will on my machine).  Instead, it is faster to process each activity separately in a PL/SQL loop (see 4b_allswains2.sql).
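A minimal sketch of such a loop, assuming a hypothetical procedure strava_pkg.swains_analysis that runs the analysis query above for a single activity and inserts the results into a table (the published 4b_allswains2.sql differs in detail):

BEGIN
FOR i IN (
SELECT a.activity_id
FROM activities a
WHERE a.activity_type = 'Ride'
ORDER BY a.activity_date
) LOOP
strava_pkg.swains_analysis(i.activity_id); --hypothetical single-activity version of the query above
COMMIT; --commit the results of each activity as it is processed
END LOOP;
END;
/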
I now have a table containing all of my ascents of Swain's Lane, and I can see if I am getting faster or slower.  I simply dumped the data into Excel with SQL Developer.
    Unfortunately, I have discovered that I am not going faster!


    Spatial Data 4: Obtaining Geographical Data


    This blog is part of a series about my first steps in using Spatial Data in the Oracle database.  I am using the GPS data from my cycling activities collected by Strava. All of my files are available on GitHub.

The next stage is to use my Strava data as a resource for ride planning.  For example, if I want to go for a ride in the Chilterns this weekend, I want to look at previous rides in the Chilterns to see where I have gone.  This presents a number of challenges that I will cover over the next few blogs.

    • I need a working definition of the Chilterns.  
    • I need to identify which activities entered the area defined as being the Chilterns.  

    More generically, I might be interested in any area in any country.  I need to be able to search for areas by name, then identify the activities that passed through these areas.

    Geographical Areas

The world is divided up into 206 sovereignties (including independent and leased areas), and those are then subdivided.  Let's take the United Kingdom as an example:

    United Kingdom

    .England
    .Northern Ireland
    .Scotland
    .Wales

    .Guernsey
    ..Alderney
    ..Guernsey
    ..Herm
    ..Sark
    .Isle of Man
    .Jersey

    .Anguilla
    .Bermuda
    .Cayman Islands
    .Dhekelia Sovereign Base Area
    .Falkland Islands
    .Gibraltar
    .British Indian Ocean Territory
    ..Diego Garcia Naval Support Facility
    .Montserrat
    .Pitcairn Islands
    .South Georgia and the Islands
    ..South Georgia
    ..South Sandwich Islands
    .Saint Helena
    ..Ascension
    ..Saint Helena
    ..Tristan da Cunha
    .Turks and Caicos Islands
    .British Virgin Islands
.Akrotiri Sovereign Base Area

    • The United Kingdom consists of the 4 'home' countries.
      • These are divided down into counties, authorities, districts, boroughs, wards and parishes.
    • Guernsey, Jersey and the Isle of Man are "Crown Dependencies".
    • There are 14 dependent territories
      • Some of these are broken down further into separate islands.

    I need enough areas to allow me to effectively search areas by name and then determine which activities are in which areas.

To return to the original question, the Chiltern Hills are not a government administrative area but are designated as an Area of Outstanding Natural Beauty (AONB).  As they are a useful shorthand for some of the areas where I regularly cycle, I have included them in the hierarchy.

    Loading Spatial Data from Esri Shapefile

Lots of geographical data is publicly available from a variety of organisations and governments in the form of shapefiles.  This is "Esri's somewhat open, hybrid vector data format using SHP, SHX and DBF files. Originally invented in the early 1990s, it is still commonly used as a widely supported interchange format".  Oracle provides a Java shapefile converter that transforms shapefiles into database tables.


Shapefiles are zip archives that contain a number of files; the first three below are always present:

    • .shp - the main file that contains the geometry itself,
    • .shx - an index file,
    • .dbf - a DBase file containing other attributes to describe the spatial data.  When you load the shapefile, the DBF file is loaded into all the other columns in the table.  This file can be opened with Microsoft Excel so you can see the data,
    • .prj - contains the projection description of the data,
• .csv - the same data as in the .dbf file, but as a comma-separated data file,
    • .cfg - the code page of the data in the .dbf file.

A little searching with Google turned up a number of useful sources of publicly available spatial data (although most of it requires a licence for commercial use).

    Most of the shapefiles provide data in latitude/longitude in WGS84 that corresponds to SRID 4326.  However, the data from the UK government and the Ordnance Survey uses the British National Grid (BNG) GCS_OSGB_1936.  This corresponds to SRID 27700 (see Convert GPX Track to a Spatial Line Geometry).

By default, the shapefile converter creates geometries in the coordinate system provided by the shapefile.  It is possible to specify a different coordinate system at load time; however, converting the data significantly slows the load process (in my experience, by a factor of approximately 5).

    The spatial data is loaded into a geometry column called geom by default.  However, the column name can be specified.

Later, when it comes to comparing spatial data, you can only compare geometries that have the same SRID.  Therefore, it is important to know the coordinate system of the data with which you are dealing.  My convention is to put WGS84 (SRID 4326) data into columns called geom, and British National Grid data into columns called geom_27700.  I load data in the coordinate system of the shapefile.  Later on, I may add additional columns and copy and convert the data, as sketched below.
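For example, assuming a hypothetical table os_boundaries loaded from an Ordnance Survey shapefile with its geometry in a geom_27700 column (SRID 27700), I can add and populate a WGS84 column afterwards:

alter table os_boundaries add (geom mdsys.sdo_geometry);

update os_boundaries
set geom = sdo_cs.transform(geom_27700, 4326)
where geom_27700 is not null;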

    I have written a simple shell script (load_shapes.sh) to call the java shapefile converter, including controlling the SRID and the name of the table and the geometry column.

#load_shapes.sh
function echodo { echo "$*"; eval "$*"; } #echo the command, then execute it (helper assumed)

function shp_load {
echo $0:$*
#derive directory, base file name, table, SRID and geometry column from the arguments
#(this derivation is assumed; the published script on GitHub may differ)
dir=$(dirname $1)
base=$(basename $1 .shp)
table=${2:-$base}
srid=${3:-4326}
col=${4:-geom}

cd $dir
pwd
export clpath=$ORACLE_HOME/suptools/tfa/release/tfa_home/jlib/ojdbc5.jar:$ORACLE_HOME/md/jlib/sdoutl.jar:$ORACLE_HOME/md/jlib/sdoapi.jar
echodo "java -cp $clpath oracle.spatial.util.SampleShapefileToJGeomFeature -h oracle-database.local -p 1521 -sn oracle_pdb -u strava -d strava -t $table -f $base -r $srid -g ${col}"
}

    clear
    #set -x

    shp_load /tmp/strava/ne_10m_admin_0_sovereignty.shp
    shp_load /tmp/strava/ne_10m_admin_0_map_units
    shp_load /tmp/strava/ne_10m_admin_0_map_subunits

    I can now load each shapefile into a separate table.  

    Merging Shapefile Data into a Single Set of Data

    The various tables created by loading shapefiles will each have their own structures determined by what was put into the shapefile. Ultimately, I am going to load them all into a single table with which I will work.  

Areas form a hierarchy, represented in this table by a linked list from area code and number to parent area code and number.  Foreign key constraints ensure the parent values are valid.  There are also check constraints to prevent an area from being its own parent.

    REM my_areas_ddl.sql

    CREATE TABLE my_areas
    (area_Code varchar2(4) NOT NULL
    ,area_number integer NOT NULL
    ,uqid varchar2(20) NOT NULL

    ,area_level integer NOT NULL
    ,parent_area_code varchar2(4)
    ,parent_area_number integer
    ,parent_uqid varchar2(20)
    ,name varchar2(40)
    ,suffix varchar2(20)
    ,iso_code3 varchar2(3)

    ,num_children integer
    ,matchable integer default 1

    ,geom mdsys.sdo_geometry
    ,geom_27700 mdsys.sdo_geometry
    ,mbr mdsys.sdo_geometry
    ,constraint my_areas_pk primary key (area_code, area_number)
    ,constraint my_areas_uqid unique (uqid)
    ,constraint my_areas_rfk_area_code foreign key (parent_area_code, parent_area_number) references my_areas (area_code, area_number)
    ,constraint my_areas_rfk_uqid foreign key (parent_uqid) references my_areas (uqid)
    ,constraint my_areas_fk_area_code foreign key (area_code) references my_area_codes (area_code)
    ,constraint my_areas_check_parent_area_code CHECK (area_code != parent_area_code OR area_number != parent_area_number)
    ,constraint my_areas_check_parent_uqid CHECK (uqid != parent_uqid)
    )
    /
    --alter table my_areas modify matchable default 1;
    Alter table my_areas add constraint my_areas_uq_iso_code3 unique (iso_code3);
    Create index my_areas_rfk_uqid on my_areas(parent_uqid);
    Create index my_areas_rfk_area_code on my_areas (parent_area_code, parent_area_number);

    I have created scripts to populate data in the my_areas table from the Natural Earth data, and from the data for each country.  Different scripts are needed for each shapefile.

    • load_countries.sql - to load Natural Earth data
    • load_uk.sql - to load Ordnance Survey data of Great Britain.  This includes some DML to work out which wards and parishes are in which districts and boroughs and update the hierarchy accordingly.
    • load_XXX.sql, - load administrative areas for a country where XXX is the 3-letter ISO code for that country (eg. load_FRA.sql for France).
    • fix_names.sql - to simplify names stripping common suffixes such as a county, district, authority, ward etc.
  • fix_my_areas.sql - script to collect statistics, count the children of each area, look for areas that are children of another area with the same name, and simplify areas with more than 10,000 points.

    Spatial Data 5: Searching For Geometries That Intersect Other Geometries


    This blog is part of a series about my first steps in using Spatial Data in the Oracle database.  I am using the GPS data from my cycling activities collected by Strava. All of my files are available on GitHub.

    I have loaded basic data for all countries, and detailed data for the UK and other countries where I have recorded activities.  The next step is to determine which activities pass through which areas.  Generically, the question is simply whether one geometry intersects with another.   I can test this in SQL with the sdo_geom.relate() function.

    WHERE SDO_GEOM.RELATE(a.geom,'anyinteract',m.geom) = 'TRUE'

    However, working out whether an activity, with several thousand points, is within an area defined with several thousand points can be CPU intensive and time-consuming.  Larger areas such as UK counties average over 20,000 points. 

I have 60,000 defined areas, of which over 20,000 are for the UK.  I have 2,700 activities recorded on Strava, with an average of 2,700 points each, but some have over 10,000 points.  It isn't viable to compare every activity with every area.  Comparing these large geometries can take a significant time: too long to run the spatial queries every time I want to interrogate the data, and too long for an online application.
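For reference, these point counts can be derived with sdo_util.getnumvertices().  A sketch against the tables used in this series:

select area_code, count(*) num_areas
, round(avg(sdo_util.getnumvertices(geom))) avg_pts
from my_areas
where geom is not null
group by area_code
order by 1
/

select count(*) num_activities
, round(avg(num_pts)) avg_pts, max(num_pts) max_pts
from activities
where num_pts > 0
/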

    Pre-processing Geometry Intersections

However, the data, once loaded, is static.  Definitions of areas can change, but rarely.  Activities do not change.  Therefore, I have decided to pre-process the data to produce a table of matching activities and areas.

    CREATE TABLE activity_areas
    (activity_id NUMBER NOT NULL
    ,area_code VARCHAR2(4) NOT NULL
    ,area_number NUMBER NOT NULL
    ,geom_length NUMBER
    ,CONSTRAINT ACTIVITY_AREAS_PK PRIMARY KEY (activity_id, area_code, area_number)
    ,CONSTRAINT ACTIVITY_AREAS_FK FOREIGN KEY (activity_id) REFERENCES ACTIVITIES (activity_id)
    ,CONSTRAINT ACTIVITY_AREAS_FK2 FOREIGN KEY (area_code, area_number)
    REFERENCES MY_AREAS (area_code, area_number)
    );

    Recursive Search

    I have written the search as a PL/SQL procedure to search areas that match a particular activity.

    • I pass the ID of the activity to be processed to the procedure.
    • I can specify the area code and number, or the parent area code and number, at which to search through the areas.  I usually leave them to default to null so the search starts with areas at the root of the hierarchy that therefore have no parents (i.e. sovereign countries).  
    • The procedure then calls itself recursively for each area that it finds matches the activity, to search its children.  This way, I limit the total number of comparisons required.  
• For every area and activity, I have calculated the minimum bounding rectangle using sdo_geom.sdo_mbr() and stored it in another geometry column on the same row.  This geometry contains just 5 points (the last point is the same as the first to close the rectangle).  I can compare two rectangles very quickly, and if they don't overlap then there is no need to see if the actual geometries do.  This approach filters out geometries that cannot match, so that fewer geometries have to be compared in full, thus significantly improving the performance of the search.
    AND SDO_GEOM.RELATE(a.mbr,'anyinteract',m.mbr) = 'TRUE'

• I have found that it is necessary to place the MBR comparison before the GEOM comparison in the predicate clauses.


    PROCEDURE activity_area_search
    (p_activity_id INTEGER
    ,p_area_code my_areas.area_code%TYPE DEFAULT NULL
    ,p_area_number my_areas.area_number%TYPE DEFAULT NULL
    ,p_query_type VARCHAR2 DEFAULT 'P'
    ,p_level INTEGER DEFAULT 0
    ) IS
    BEGIN
    FOR i IN(
    SELECT m.*
    , CASE WHEN m.geom_27700 IS NOT NULL THEN sdo_geom.sdo_length(SDO_GEOM.sdo_intersection(m.geom_27700,a.geom_27700,5), unit=>'unit=km')
    WHEN m.geom IS NOT NULL THEN sdo_geom.sdo_length(SDO_GEOM.sdo_intersection(m.geom,a.geom,5), unit=>'unit=km')
    END geom_length
    , (SELECT MIN(m2.area_level) FROM my_areas m2
    WHERE m2.parent_area_code = m.area_code AND m2.parent_area_number = m.area_number) min_child_level
    FROM my_areas m
    , activities a
    WHERE ( (p_query_type = 'P' AND parent_area_code = p_area_code AND parent_area_number = p_area_number)
    OR (p_query_type = 'A' AND area_code = p_area_code AND area_number = p_area_number)
    OR (p_query_type = 'A' AND p_area_number IS NULL AND area_code = p_area_code)
    OR (p_area_code IS NULL AND p_area_number IS NULL AND parent_area_code IS NULL AND parent_area_number IS NULL))
    AND a.activity_id = p_activity_id
    and SDO_GEOM.RELATE(a.mbr,'anyinteract',m.mbr) = 'TRUE'
    and SDO_GEOM.RELATE(a.geom,'anyinteract',m.geom) = 'TRUE'
    ) LOOP
    IF i.area_level>0 OR i.num_children IS NULL THEN
    BEGIN
    INSERT INTO activity_areas
    (activity_id, area_code, area_number, geom_length)
    VALUES
    (p_activity_id, i.area_code, i.area_number, i.geom_length);
    EXCEPTION
    WHEN dup_val_on_index THEN
    UPDATE activity_areas
    SET geom_length = i.geom_length
    WHERE activity_id = p_activity_id
    AND area_code = i.area_code
    AND area_number = i.area_number;
    END;
    END IF;

    IF i.num_children > 0 THEN
    strava_pkg.activity_area_search(p_activity_id, i.area_code, i.area_number, 'P', p_level+1);
    END IF;
    END LOOP;

    END activity_area_search;

The search can process a single activity by calling the procedure.  An activity that matched just 10 areas took just 6 seconds to process.  However, it does not scale linearly: activities that match over 100 areas can take 6 minutes or more.
    SQL> exec strava_pkg.activity_area_search(4372796838);
    Searching 4372796838:-
    Found SOV-1159320701:United Kingdom, 2.895 km
    .Searching 4372796838:SOV-1159320701
    .Found GEOU-1159320743:England, 2.851 km
    ..Searching 4372796838:GEOU-1159320743
    ..Found GLA-117537:Greater London, 2.851 km
    ...Searching 4372796838:GLA-117537
    ...Found LBO-50724:City of Westminster, 1.732 km
    ....Searching 4372796838:LBO-50724
    ....Found LBW-117484:Abbey Road, 1.435 km
    ....Found LBW-50639:Maida Vale, 0.298 km
    ....Done 4372796838:LBO-50724: 0.415 secs).
    ...Found LBO-50632:Camden, 1.119 km
    ....Searching 4372796838:LBO-50632
    ....Found LBW-117286:Kilburn, 0.273 km
    ....Found LBW-117288:Swiss Cottage, 1.033 km
    ....Found LBW-117287:West Hampstead, 0.084 km
    ....Done 4372796838:LBO-50632: 0.521 secs).
    ...Done 4372796838:GLA-117537: 3.368 secs).
    ..Done 4372796838:GEOU-1159320743: 4.372 secs).
    .Done 4372796838:SOV-1159320701: 4.750 secs).
    Done 4372796838:-: 5.532 secs).

    PL/SQL procedure successfully completed.
    Since I load Strava activities from the bulk download, I also process them in bulk.
    --process unmatched activities
    set pages 99 lines 180 timi on serveroutput on
    column activity_name format a60
    BEGIN
    FOR i IN (
    SELECT a.activity_id, activity_date, activity_name
    , distance_km, num_pts, ROUND(num_pts/NULLIF(distance_km,0),0) ppkm
    FROM activities a
    WHERE activity_id NOT IN (SELECT DISTINCT activity_id FROM activity_areas)
    AND num_pts>0
    ) LOOP
    strava_pkg.activity_area_search(i.activity_id);
    commit;
    END LOOP;
    END;
    /
Matching 2,700 activities produced 71,628 rows in activity_areas, covering 5,620 distinct areas.  In the next article, I will demonstrate how to text-search the areas to find matching activities.

    Spatial Data 6: Text Searching Areas by their Name, and the Names of Parent Areas


    This blog is part of a series about my first steps in using Spatial Data in the Oracle database.  I am using the GPS data from my cycling activities collected by Strava. All of my files are available on GitHub.

    Now I have loaded all the areas, I want to be able to search for them by name.  I am going to create an Oracle Text Index, but I need to index more than just the name of each area.  I must index the full hierarchy of each area so I can search on combinations of names in different types of areas.  For example, I might search for a village and county (e.g. Streatley and Berkshire), to distinguish it from a village of the same name in a different county (e.g. Streatley in Bedfordshire).

    I can generate the full hierarchy of an area with a PL/SQL function (strava_pkg.name_heirarchy_fn) by navigating up the linked list and discarding repeated names.  I could make that available in a virtual column.  However, I cannot build a text index on a function or a virtual column.  

    Text Index Option 1: Store Hierarchy on Table, and Create a Multi-Column Text Index

I could store the hierarchy of an area on the my_areas table, generating it with the PL/SQL function strava_pkg.name_heirarchy_fn.

    DECLARE
    l_clob CLOB;
    l_my_areas my_areas%ROWTYPE;
    BEGIN
    select m.*
    into l_my_areas
    FROM my_areas m
    WHERE area_code = 'CPC'
    And area_number = '40307';

    dbms_output.put_line(strava_pkg.name_heirarchy_fn(l_my_areas.area_code,l_my_areas.area_number));
    dbms_output.put_line(strava_pkg.name_heirarchy_fn(l_my_areas.parent_area_code,l_my_areas.parent_area_number));
    END;
    /

If I pass the code and number for a particular area, I get its full hierarchy, including its name.  I can see that the parish of Streatley is in the Unitary Authority of West Berkshire, which is in England, and England is in the United Kingdom.  If I pass the code and number of its parent, I just get the hierarchy starting at its parent.

    Streatley, West Berkshire, England, United Kingdom
    West Berkshire, England, United Kingdom

I can store the hierarchy on my_areas, though I have to store the results in a temporary table rather than update it directly; otherwise, I get a mutating table error.

    ALTER TABLE my_areas add name_heirarchy VARCHAR(4000)
    /
    CREATE GLOBAL TEMPORARY TABLE my_areas_temp ON COMMIT PRESERVE ROWS AS
    SELECT area_code, area_number, strava_pkg.name_heirarchy_fn(parent_area_code,parent_area_number) name_heirarchy
    FROM my_areas WHERE parent_area_code IS NOT NULL AND parent_area_number IS NOT NULL
    /
    MERGE INTO my_areas u
    USING (SELECT * FROM my_areas_temp) s
    ON (u.area_code = s.area_code AND u.area_number = s.area_number)
    WHEN MATCHED THEN UPDATE
    SET u.name_heirarchy = s.name_heirarchy
    /

Then I can create a multi-column text index on the name and name_heirarchy columns.

    begin
    ctx_ddl.create_preference('my_areas_lexer', 'BASIC_LEXER');
    ctx_ddl.set_attribute('my_areas_lexer', 'mixed_case', 'NO');
    ctx_ddl.create_preference('my_areas_datastore', 'MULTI_COLUMN_DATASTORE');
    ctx_ddl.set_attribute('my_areas_datastore', 'columns', 'name, name_heirarchy');
    end;
    /
    CREATE INDEX my_areas_name_txtidx ON my_areas (name) INDEXTYPE IS ctxsys.context
    PARAMETERS ('datastore my_areas_datastore lexer my_areas_lexer sync(on commit)');

    The index will sync if I have cause to update the hierarchy.
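For example (a sketch, reusing the Streatley row from earlier), a change to an indexed column is picked up when the transaction commits:

update my_areas set name = 'Streatley'
where area_code = 'CPC' and area_number = 40307;
commit;  --the sync(on commit) clause refreshes the index entry for this row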

    Text Index Option 2: Index a user_datastore based on the result of a PL/SQL function

    Alternatively, I can build a text index on a combination of data from various sources by creating a PL/SQL procedure that combines the data and returns the string to be indexed.  

    I have created a procedure (strava_pkg.name_heirarchy_txtidx) that returns a string containing the hierarchy of a given area, and then I will create a text index on that.  The format of the parameters must be exactly as follows: 

    • The rowid of the row being indexed is passed to the procedure; 
    • The string to be indexed is passed back as a CLOB parameter.
    See also: Oracle Text Indexing Elements: USER_DATASTORE Attributes


    PROCEDURE name_heirarchy_txtidx
    (p_rowid in rowid
    ,p_dataout IN OUT NOCOPY CLOB
    ) IS
    l_count INTEGER := 0;
    BEGIN
    FOR i IN (
    SELECT area_code, area_number, name, matchable
    FROM my_areas m
    START WITH rowid = p_rowid
    CONNECT BY NOCYCLE prior m.parent_area_code = m.area_code
    AND prior m.parent_area_number = m.area_number
    ) LOOP
    IF i.matchable >= 1 THEN
    l_count := l_count + 1;
    IF l_count > 1 THEN
    p_dataout := p_dataout ||', '|| i.name;
    ELSE
    p_dataout := i.name;
    END IF;
    END IF;
    END LOOP;
    END name_heirarchy_txtidx;

    As an example, if I pass a particular rowid to the procedure, I obtain the full hierarchy of areas as before.

    set serveroutput on
    DECLARE
    l_rowid ROWID;
    l_clob CLOB;
    BEGIN
    select rowid
    into l_rowid
    FROM my_areas m
    WHERE area_code = 'CPC'
    And area_number = '40307';

    strava_pkg.name_heirarchy_txtidx(l_rowid, l_clob);
    dbms_output.put_line(l_clob);
    END;
    /

    Streatley, West Berkshire, England, United Kingdom

    PL/SQL procedure successfully completed.

The procedure is referenced as an attribute of a user datastore; I can then build a text index on the user datastore.

    BEGIN
    ctx_ddl.create_preference('my_areas_lexer', 'BASIC_LEXER');
    ctx_ddl.set_attribute('my_areas_lexer', 'mixed_case', 'NO');
    ctx_ddl.create_preference('my_areas_datastore', 'user_datastore');
    ctx_ddl.set_attribute('my_areas_datastore', 'procedure', 'strava_pkg.name_heirarchy_txtidx');
    ctx_ddl.set_attribute('my_areas_datastore', 'output_type', 'CLOB');
    END;
    /

    CREATE INDEX my_areas_name_txtidx on my_areas (name) INDEXTYPE IS ctxsys.context
    PARAMETERS ('datastore my_areas_datastore lexer my_areas_lexer');
    I have not been able to combine a multi-column datastore with a user datastore.

    Text Search examples

    Both options produce an index that I can use in the same way.  I can search for a particular name, for example, the village of Streatley.

    SELECT score(1), area_Code, area_number, name, suffix, name_heirarchy
    FROM my_areas m
    WHERE CONTAINS(name,'streatley',1)>0
    /

    I get the two Streatleys, one in Berkshire, and the other in Bedfordshire.  

      SCORE(1) AREA AREA_NUMBER NAME                 SUFFIX     NAME_HEIRARCHY
    ---------- ---- ----------- -------------------- ---------- ------------------------------------------------------------
    16 CPC 41076 Streatley CP Streatley, Central Bedfordshire, England, United Kingdom
    16 CPC 40307 Streatley CP Streatley, West Berkshire, England, United Kingdom

    As I have indexed the full hierarchy, I can be more precise and search for both the village and the county, even though they are two different rows in the my_areas table.

    SELECT score(1), area_Code, area_number, name, suffix, name_heirarchy
    FROM my_areas m
    WHERE CONTAINS(name,'streatley and berks%',1)>0
    /

    Now I just get one result.  The Streatley in Berkshire.

      SCORE(1) AREA AREA_NUMBER NAME                 SUFFIX     NAME_HEIRARCHY
    ---------- ---- ----------- -------------------- ---------- ------------------------------------------------------------
    11 CPC 40307 Streatley CP Streatley, West Berkshire, England, United Kingdom

    Searching For the Top of Hierarchies

My search query works satisfactorily if it identifies areas with no children, but suppose I search for something higher up the hierarchy, like Berkshire?

    SELECT score(1), area_Code, area_number, name, suffix, name_heirarchy
    FROM my_areas m
    WHERE CONTAINS(name,'berkshire',1)>0
    /

I get 184 areas of different types within the areas called Berkshire, because the name of the parent area appears in the hierarchy of all its children and so is returned by the text index.

               Area        Area
    SCORE(1) Code Number NAME SUFFIX NAME_HEIRARCHY
    ---------- ---- ----------- ------------------------- ---------- -----------------------------------------------------------
    11 UTA 101678 Windsor and Maidenhead (B) Windsor and Maidenhead, Berkshire, England, United Kingdom
    11 UTA 101680 Wokingham (B) Wokingham, Berkshire, England, United Kingdom
    11 UTA 101681 Reading (B) Reading, Berkshire, England, United Kingdom
    11 UTA 101685 West Berkshire West Berkshire, England, United Kingdom
    11 UTW 40258 Norreys Ward Norreys, Wokingham, Berkshire, England, United Kingdom
    11 UTW 40261 Barkham Ward Barkham, Wokingham, Berkshire, England, United Kingdom

However, I am just interested in the highest points in each part of the hierarchy I have identified, so I exclude any result whose parent is also in the result set.

    WITH x AS (
    SELECT area_code, area_number, parent_area_code, parent_area_number, name, name_heirarchy
    FROM my_areas m
    WHERE CONTAINS(name,'berkshire',1)>0
    ) SELECT * FROM x WHERE NOT EXISTS (
    SELECT 'x' FROM x x1
    WHERE x1.area_code = x.parent_area_code
    AND x1.area_number = x.parent_area_number
    )
    /

    In this case, I still get two results because the boundaries of the unitary authority of West Berkshire are not entirely within the ceremonial county of Berkshire (some parts of Hungerford and Lambourne were exchanged with Wiltshire in 1990), hence I could not make Berkshire the parent of West Berkshire.

    Area        Area
    Code Number SCORE NAME_HEIRARCHY
    ---- ----------- ---------- ------------------------------------------------------------
    UTA 101685 11 West Berkshire, England, United Kingdom
    CCTY 7 11 Berkshire, England, United Kingdom

    Text Searching for Activities that pass through Areas

    It is a simple extension to join the pre-processed areas through which activities pass to the areas found by the text search, and then exclude areas whose parent was also found in the same activity.

    WITH x AS (
    SELECT aa.activity_id, m.area_code, m.area_number, m.parent_area_code, m.parent_area_number, m.name, m.name_heirarchy
    FROM my_areas m, activity_areas aa
    WHERE m.area_Code = aa.area_code
    AND m.area_number = aa.area_number
    AND CONTAINS(name,'berkshire',1)>0
    )
    SELECT a.activity_id, a.activity_date, a.activity_name, a.activity_type, a.distance_km
    , x.area_Code, x.area_number, x.name, x.name_heirarchy
    FROM x, activities a
    WHERE x.activity_id = a.activity_id
    AND a.activity_date between TO_DATE('01022019','DDMMYYYY') and TO_DATE('28022019','DDMMYYYY')
    AND NOT EXISTS (
    SELECT 'x' FROM x x1
    WHERE x1.area_code = x.parent_area_code
    AND x1.area_number = x.parent_area_number
    AND x1.activity_id = x.activity_id)
    ORDER BY a.activity_date
    /

    Now I can see the rides in Berkshire in February 2019.  I get two rows returned for the ride that was in both Berkshire and West Berkshire.  

      Activity Activity                                                Activity Distance Area   Area
    ID Date ACTIVITY_NAME Type (km) Code Number NAME NAME_HEIRARCHY
    ---------- --------- --------------------------------------------- -------- -------- ---- ------ --------------- -------------------------
    2156308823 17-FEB-19 MV - Aldworth, CLCTC Aldworth-Reading Ride 120.86 CCTY 7 Berkshire England, United Kingdom
    2156308823 17-FEB-19 MV - Aldworth, CLCTC Aldworth-Reading Ride 120.86 UTA 101685 West Berkshire England, United Kingdom
    2172794879 24-FEB-19 MV - Maidenhead Ride 48.14 CCTY 7 Berkshire England, United Kingdom
    2173048214 24-FEB-19 CLCTC: Maidenhead - Turville Heath Ride 53.15 CCTY 7 Berkshire England, United Kingdom
    2173048406 24-FEB-19 Maidenhead - Burnham Beeches - West Drayton Ride 27.92 CCTY 7 Berkshire England, United Kingdom

    References

    I found these references useful while creating the Text index:

    Clashing SQL Profiles - Exact Matching Profiles Take Precedence Over Force Matching Profiles


Sometimes, you reach a point in performance tuning where you use a SQL Baseline, SQL Patch, or SQL Profile to stabilise an execution plan.  These methods all effectively inject a hint or set of hints into a statement to produce the desired execution plan.  Baselines and Patches will only exactly match a SQL ID and therefore a SQL statement.  However, a SQL Profile can optionally do force matching, so that it applies to "all SQL statements that have the same text after the literal values in the WHERE clause have been replaced by bind variables. This setting may be useful for applications that use only literal values because it enables SQL with text differing only in its literal values to share a SQL profile. If both literal values and bind variables are in the SQL text, or if force_match is set to false (default), then the literal values in the WHERE clause are not replaced by bind variables." [Oracle Database SQL Tuning Guide]
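As a quick illustration (a hypothetical sketch; SQLTEXT_TO_SIGNATURE works on the statement text alone, so the objects need not exist), two statements that differ only in a literal value have different exact matching signatures but the same force matching signature:
set serveroutput on
DECLARE
e1 NUMBER; e2 NUMBER; f1 NUMBER; f2 NUMBER;
BEGIN
-- exact matching signatures: different because the literals differ
e1 := dbms_sqltune.sqltext_to_signature('SELECT * FROM t WHERE a = 42', FALSE);
e2 := dbms_sqltune.sqltext_to_signature('SELECT * FROM t WHERE a = 54', FALSE);
-- force matching signatures: equal because the literals are normalised out
f1 := dbms_sqltune.sqltext_to_signature('SELECT * FROM t WHERE a = 42', TRUE);
f2 := dbms_sqltune.sqltext_to_signature('SELECT * FROM t WHERE a = 54', TRUE);
dbms_output.put_line('exact: '||e1||' / '||e2);
dbms_output.put_line('force: '||f1||' / '||f2);
END;
/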

    I often work with PeopleSoft, whose batch processes often dynamically generate SQL with literal values.  Therefore, I usually create force matching profiles when I need to control an execution plan.  However, sometimes I come across situations where some exact matching (i.e. not force matching) profiles have been created (often by production DBAs using the tuning advisor) on different statements that have the same force matching signature, and then maybe a force matching profile has also been applied.

    Note: SQL Profiles require the Tuning Pack licence.
    Where both exact and force matching profiles apply to a SQL statement, the exact matching profile will take precedence over the force matching profile, and even if disabled it will prevent the force matching profile from being applied.
I will demonstrate this with a simple test.  I will create a table with a couple of indexes, collect statistics, and generate an execution plan for a query.  I am using the EXPLAIN PLAN FOR command to force the statement to be parsed every time.
    CREATE TABLE t (a not null, b) AS 
    SELECT rownum, ceil(sqrt(rownum)) FROM dual CONNECT BY LEVEL <= 100;
    CREATE UNIQUE INDEX t_idx on t(a);
    CREATE INDEX t_idx2 on t(b,a);
    EXEC dbms_stats.gather_table_stats(user,'T');

    Without Any SQL Profiles

    Without any profiles in place, I get a skip scan of T_IDX2, and there is no note in the execution plan.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 42;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 3418618943
    ---------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ---------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 1 (0)| 00:00:01 |
    |* 1 | INDEX SKIP SCAN | T_IDX2 | 1 | 6 | 1 (0)| 00:00:01 |
    ---------------------------------------------------------------------------

    Outline Data
    -------------
    /*+
    BEGIN_OUTLINE_DATA
    INDEX_SS(@"SEL$1""T"@"SEL$1" ("T"."B""T"."A"))
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    1 - access("A"=42)
    filter("A"=42)

    Force Matching Profile

Now I will create a force matching SQL profile that forces a full table scan.  The profile is created on a statement whose text is the same except that the literal value is different (it is 54 instead of 42).
    DECLARE
    signature INTEGER;
    sql_txt CLOB;
    h SYS.SQLPROF_ATTR;
    BEGIN
    sql_txt := q'[
    SELECT * FROM t WHERE a = 54
    ]';
    h := SYS.SQLPROF_ATTR(
    q'[BEGIN_OUTLINE_DATA]',
    q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
    q'[FULL(@"SEL$1""T"@"SEL$1")]',
    q'[END_OUTLINE_DATA]');
    signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
    DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
    sql_text => sql_txt,
    profile => h,
    name => 'clashing_profile_test_force',
    category => 'DEFAULT',
    validate => TRUE,
    replace => TRUE,
    force_match => TRUE
    );
    END;
    /
    I only have a force-matching profile. 
Execution plan with force matching profile (full scan)

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    clashing_profile_test_force DEFAULT 11431056000319719221 27-JUL-21 01.35.43.854691 PM
    SELECT * FROM t WHERE a = 54
    27-JUL-21 01.35.43.000000 PM MANUAL ENABLED YES
The execution plan uses the full scan specified by the profile.  There is a note confirming that the profile was matched and used, and the FULL hint is listed in the hint report.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 42;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    ----------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    ----------------------------------------------------------------------------------

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    FULL(@"SEL$1""T"@"SEL$1")
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - storage("A"=42)
    filter("A"=42)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "clashing_profile_test_force" used for this statement

    Exact Matching Profile 

I will now add an exact matching profile on the original statement (with the literal value 42) that forces the use of the unique index.
    DECLARE
    signature INTEGER;
    sql_txt CLOB;
    h SYS.SQLPROF_ATTR;
    BEGIN
    sql_txt := q'[
    SELECT * FROM t WHERE a = 42
    ]';
    h := SYS.SQLPROF_ATTR(
    q'[BEGIN_OUTLINE_DATA]',
    q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
    q'[INDEX(@"SEL$1""T"@"SEL$1" ("T"."A"))]',
    q'[END_OUTLINE_DATA]');
    signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
    DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
    sql_text => sql_txt,
    profile => h,
    name => 'clashing_profile_test_exact',
    category => 'DEFAULT',
    validate => TRUE,
    replace => TRUE,
    force_match => FALSE
    );
    END;
    /
I can see that I now have two SQL Profiles: one force matching, and one exact matching.
Execution plan with exact matching profile (unique index lookup)

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    clashing_profile_test_exact DEFAULT 14843900676141266266 27-JUL-21 01.35.46.825697 PM
    SELECT * FROM t WHERE a = 42
    27-JUL-21 01.35.46.000000 PM MANUAL ENABLED NO

    clashing_profile_test_force DEFAULT 11431056000319719221 27-JUL-21 01.35.43.854691 PM
    SELECT * FROM t WHERE a = 54
    27-JUL-21 01.35.43.000000 PM MANUAL ENABLED YES
The execution plan has changed to the unique index scan.  The index hint from the profile appears in the hint report.  The note at the bottom of the plan shows that the exact matching profile has been used, taking precedence over the force matching profile.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 42;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 2929955852

    -------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    -------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 1 (0)| 00:00:01 |
    | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 6 | 1 (0)| 00:00:01 |
    |* 2 | INDEX UNIQUE SCAN | T_IDX | 1 | | 0 (0)| 00:00:01 |
    -------------------------------------------------------------------------------------

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    INDEX_RS_ASC(@"SEL$1""T"@"SEL$1" ("T"."A"))
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("A"=42)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - INDEX(@"SEL$1""T"@"SEL$1" ("T"."A"))

    Note
    -----
    - SQL profile "clashing_profile_test_exact" used for this statement

    Different Query

If I run the query with a different literal value, the plan changes back to the full scan, and the note reports that the force matching profile was used.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 54;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    ----------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    ----------------------------------------------------------------------------------

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    FULL(@"SEL$1""T"@"SEL$1")
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - storage("A"=54)
    filter("A"=54)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "clashing_profile_test_force" used for this statement

    Disable Exact Matching SQL Profile

    I will now disable the exact matching profile.
    exec dbms_sqltune.alter_sql_profile(name=>'clashing_profile_test_exact', attribute_name=>'STATUS',value=>'DISABLED');
    SELECT * FROM dba_sql_profiles where name like 'clashing%';

    Disable Exact Profile - Execution plan with no profile (skip scan) - Odd

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    clashing_profile_test_exact DEFAULT 14843900676141266266 27-JUL-21 01.35.46.825697 PM
    SELECT * FROM t WHERE a = 42
    27-JUL-21 01.35.52.000000 PM MANUAL DISABLED NO

    clashing_profile_test_force DEFAULT 11431056000319719221 27-JUL-21 01.35.43.854691 PM
    SELECT * FROM t WHERE a = 54
    27-JUL-21 01.35.43.000000 PM MANUAL ENABLED YES
I expected the statement to fall back to the force matching profile, but instead it goes back to the original skip scan plan with no profile at all.  So the disabled exact matching profile prevents the force matching profile from matching the statement, and yet it is not applied to the statement either!  There is no note in the execution plan and no hint report.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 42;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 3418618943

    ---------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ---------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 1 (0)| 00:00:01 |
    |* 1 | INDEX SKIP SCAN | T_IDX2 | 1 | 6 | 1 (0)| 00:00:01 |
    ---------------------------------------------------------------------------

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    INDEX_SS(@"SEL$1""T"@"SEL$1" ("T"."B""T"."A"))
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - access("A"=42)
    filter("A"=42)

    Alter Category of Exact Matching SQL Profile

    I could have dropped the SQL Profile, but I might want to retain it for documentation and in case I need to reinstate it. So instead I will move it to a different category.
    exec dbms_sqltune.alter_sql_profile(name=>'clashing_profile_test_exact', attribute_name=>'CATEGORY',value=>'DO_NOT_USE');
    SELECT * FROM dba_sql_profiles where name like 'clashing%';

    Change Category of Exact Profile - Execution plan with force matching profile (full scan)

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    clashing_profile_test_exact DO_NOT_USE 14843900676141266266 27-JUL-21 02.57.11.343291 PM
    SELECT * FROM t WHERE a = 42
    27-JUL-21 02.57.19.000000 PM MANUAL DISABLED NO

    clashing_profile_test_force DEFAULT 11431056000319719221 27-JUL-21 02.57.08.390801 PM
    SELECT * FROM t WHERE a = 54
    27-JUL-21 02.57.08.000000 PM MANUAL ENABLED YES
And now the execution plan goes back to the force matching profile and the full scan.
    EXPLAIN PLAN FOR SELECT * FROM t WHERE a = 42;
    SELECT * FROM table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    ----------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    ----------------------------------------------------------------------------------

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    FULL(@"SEL$1""T"@"SEL$1")
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    DB_VERSION('19.1.0')
    OPTIMIZER_FEATURES_ENABLE('19.1.0')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - storage("A"=42)
    filter("A"=42)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "clashing_profile_test_force" used for this statement

    Conclusion

An exact matching profile will be matched to a SQL statement before a force matching profile, even if it is disabled, in which case neither profile will be applied.
If you have exact matching SQL profiles that provide the same hints to produce the same execution plan on various similar SQL statements with the same force matching signature (i.e. they differ only in their literal values), and you wish to replace them with a single force matching profile, then rather than disable the exact matching profiles you should either drop them or, if you prefer to retain them for documentation, alter them to a different category.
• The scripts used in this blog to demonstrate this behaviour are available on GitHub.  They were run on Oracle 19.9 for this post.
• The script disabled_profiles_category.sql moves all disabled profiles from the category DEFAULT to DO_NOT_USE; a sketch of the idea follows this list.
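A minimal sketch of the idea behind that script (an illustration only, assuming you simply want to move every disabled profile out of the DEFAULT category; the real script may differ):
BEGIN
FOR i IN (
SELECT name FROM dba_sql_profiles
WHERE status = 'DISABLED' AND category = 'DEFAULT'
) LOOP
-- a disabled exact matching profile can still block a force matching one,
-- so move it out of the way rather than leave it disabled in DEFAULT
dbms_sqltune.alter_sql_profile(name=>i.name, attribute_name=>'CATEGORY', value=>'DO_NOT_USE');
END LOOP;
END;
/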
    In a subsequent post, I will show how to detect conflicting SQL profiles.

    Detecting Clashing SQL Profiles


In my last post, I discussed the possible undesirable consequences of force and exact matching SQL profiles on statements with the same force matching signature.  The question is: how do you detect such profiles?

    I have created three profiles on very similar SQL statements that only differ in the literal value of a predicate.  One of them is force matching, the others are exact matching.  The signature reported by DBA_SQL_PROFILES is the force matching signature for force matching profiles, and the exact matching signature for exact matching profiles.

    select * from dba_sql_profiles;

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    my_sql_profile_force DEFAULT 11431056000319719221 16:09:33 01/08/2021
    SELECT * FROM t WHERE a = 54
    16:09:33 01/08/2021 MANUAL ENABLED YES

    my_sql_profile_24 DEFAULT 12140764948557749245 16:09:33 01/08/2021
    SELECT * FROM t
    WHERE a = 24
    16:09:33 01/08/2021 MANUAL ENABLED NO

    my_sql_profile_42 DEFAULT 14843900676141266266 16:09:33 01/08/2021
    SELECT * FROM t WHERE a = 42
    16:09:33 01/08/2021 MANUAL ENABLED NO
In order to be able to compare the profiles, I need to calculate the force matching signature for the exact matching profiles using DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE.  I can't pass the Boolean constant TRUE as a parameter in SQL, so instead I have used a PL/SQL function in a WITH clause.
    REM dup_sql_profiles1.sql
    WITH function sig(p_sql_text CLOB, p_number INTEGER) RETURN NUMBER IS
    l_sig NUMBER;
    BEGIN
    IF p_number > 0 THEN
    l_sig := dbms_sqltune.sqltext_to_signature(p_sql_text,TRUE);
    ELSIF p_number = 0 THEN
    l_sig := dbms_sqltune.sqltext_to_signature(p_sql_text,FALSE);
    END IF;
    RETURN l_sig;
    END;
    x as (
    select CASE WHEN force_matching = 'NO' THEN signature ELSE sig(sql_text, 0) END exact_sig
    , CASE WHEN force_matching = 'YES' THEN signature ELSE sig(sql_text, 1) END force_sig
    , p.*
    from dba_sql_profiles p
    where (status = 'ENABLED' or force_matching = 'NO')
    ), y as (
    select x.*
    , row_number() over (partition by category, force_sig order by force_matching desc, exact_sig nulls first) profile#
    , count(*) over (partition by category, force_sig) num_profiles
    from x
    )
    select profile#, num_profiles, force_sig, exact_sig, name, created, category, status, force_matching, sql_text
    from y
    where num_profiles > 1
    order by force_sig, force_matching desc, exact_sig
    /
     We can see these three profiles are grouped together.  The force matching signature calculated on the exact matching profiles is the same as the signature on the force matching profile.  Now I can start to make some decisions about whether I should retain the exact matching profiles or remove them and just use the force matching profile.
    Prof   Num        Force Matching        Exact Matching
    # Profs Signature Signature NAME CREATED CATEGORY STATUS FOR
    ---- ----- --------------------- --------------------- ------------------------------ ---------------------------- -------------------- -------- ---
    SQL_TEXT
    ----------------------------------------------------------------------------------------------------------------------------------------------------
    1 3 11431056000319719221 my_sql_profile_force 16:35:36 01/08/2021 DEFAULT ENABLED YES

    SELECT * FROM t WHERE a = 54

    2 3 12140764948557749245 my_sql_profile_24 16:35:36 01/08/2021 DEFAULT ENABLED NO

    SELECT * FROM t
    WHERE a = 24

    3 3 14843900676141266266 my_sql_profile_42 16:35:36 01/08/2021 DEFAULT ENABLED NO

    SELECT * FROM t WHERE a = 42
    The SQL statements in this example are absurdly simple.  In real life that is rarely the case.  Sometimes it can be a struggle to see where two complex statements differ.
    In the next query, I compare enabled force matching SQL profiles to any exact matching profiles in the same category with the same force matching signature.  The full query is on GitHub.
    REM dup_sql_profiles2.sql
    WITH function sig(p_sql_text CLOB, p_number INTEGER) RETURN NUMBER IS

    END sig;
    function norm(p_queryin CLOB) RETURN CLOB IS

    END norm;
    function str_diff(p_str1 CLOB, p_str2 CLOB) RETURN NUMBER IS

    END str_diff;
    x as (
    select CASE WHEN force_matching = 'NO' THEN signature ELSE sig(sql_text, 0) END exact_sig
    , CASE WHEN force_matching = 'YES' THEN signature ELSE sig(sql_text, 1) END force_sig
    , p.*
    from dba_sql_profiles p
    ), y as (
    select f.force_matching, f.force_sig, f.name force_name, f.created force_created, f.status force_status
    , e.force_matching exact_matching, e.exact_sig, e.name exact_name
    , e.created exact_created, e.status exact_status, e.category
    , norm(e.sql_text) esql_text, norm(f.sql_text) fsql_text
    from x e
    , x f
    where f.force_matching = 'YES'
    and e.force_matching = 'NO'
    and e.force_sig = f.force_sig
    and e.category = f.category
    and e.name != f.name
    and f.status = 'ENABLED'
    ), z as (
    select y.*
    , str_diff(fsql_Text, esql_text) diff_len
    from y
    )
    select force_matching, force_Sig, force_name, force_created, force_status
    , exact_matching, exact_sig, exact_name, exact_Created, exact_status
    , substr(fsql_text,1,diff_len) common_text
    , substr(fsql_text,diff_len+1) fdiff_text, substr(esql_text,diff_len+1) ediff_text
    from z
    order by force_sig
    /
    I have shown the common part of both statements, from the start to the first difference, and then also how the rest of each statement continues.
It is not enough to simply compare two statements character by character.  Both the force and exact matching signatures are "calculated on the normalized SQL text. The normalization includes the removal of white space and the uppercasing of all non-literal strings".  However, neither the normalised SQL nor the normalisation mechanism is exposed by Oracle.  Therefore, in this query, I have included my own rudimentary normalisation function (based on an idea from AskTOM), which I apply first, and a string comparison function.  You can see that normalisation has eliminated the line feed from the statement in my_sql_profile_24.
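For illustration only, the bodies elided from the query above might look roughly like this sketch (the real functions in the GitHub script may differ; note that naively uppercasing the whole statement also uppercases literals, which Oracle's own normalisation does not do):
function norm(p_queryin CLOB) RETURN CLOB IS
BEGIN
-- collapse runs of whitespace into single spaces, trim, and uppercase
-- (NB: unlike Oracle's normalisation, this also uppercases literals)
RETURN UPPER(TRIM(REGEXP_REPLACE(p_queryin,'\s+',' ')));
END norm;
function str_diff(p_str1 CLOB, p_str2 CLOB) RETURN NUMBER IS
l_len NUMBER := LEAST(NVL(LENGTH(p_str1),0),NVL(LENGTH(p_str2),0));
BEGIN
-- return the length of the common prefix of the two strings
FOR i IN 1..l_len LOOP
IF SUBSTR(p_str1,i,1) != SUBSTR(p_str2,i,1) THEN
RETURN i-1;
END IF;
END LOOP;
RETURN l_len;
END str_diff;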
    Now I can see my two exact matching profiles match my force matching profile.  I can see the common part of the SQL up to the literal value, and the different parts of the text are just the literal value.  
               Force Matching Force                          Force                        Force               Exact Matching Exact                          Exact                        Exact
    FOR Signature Name Created Date Status EXA Signature Name Created Date Status
    --- --------------------- ------------------------------ ---------------------------- -------- --- --------------------- ------------------------------ ---------------------------- --------
    Common Text
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Force Text Exact Text
    --------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------
    YES 11431056000319719221 my_sql_profile_force 16:35:36 01/08/2021 ENABLED NO 12140764948557749245 my_sql_profile_24 16:35:36 01/08/2021 ENABLED
    SELECT * FROM T WHERE A =
    54 24

    ENABLED NO 14843900676141266266 my_sql_profile_42 16:35:36 01/08/2021 ENABLED
    SELECT * FROM T WHERE A =
    54
    Both the queries mentioned in this blog are available on GitHub.

    Alter SQL Profiles from Exact to Force Matching


    You can use DBMS_SQLTUNE.ALTER_SQL_PROFILE to change the status, name, description, or category of a SQL profile, but you can't alter it from exact to force matching.  Instead, you would have to recreate it.  That is easy if you have the script that you used to create it in the first place.  There is another way.

    Oracle support note How to Move SQL Profiles from One Database to Another (Including to Higher Versions) (Doc ID 457531.1) describes a process to export SQL profiles to a staging table that can be imported into another database.  This provides an opportunity to alter a profile by updating the data in the staging table.  There are two columns in the staging table that have to be updated.

    • SQLFLAGS must be updated from 0 (indicating an exact match profile) to 1 (indicating a force match profile)
    • SIGNATURE must be recalculated as a force matching signature using DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE.

    Demonstration

    I am going to create a small table with a unique index.  

    CREATE TABLE t (a not null, b) AS 
    SELECT rownum, ceil(sqrt(rownum)) FROM dual connect by level <= 100;
    create unique index t_idx on t(a);
    exec dbms_stats.gather_table_stats(user,'T');

    ttitle off
    select * from dba_sql_profiles where name like 'my%sql_profile%';
    explain plan for SELECT * FROM t WHERE a = 42;
    ttitle 'Default Execution plan without profiles (index scan)'
    select * from table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Without any SQL profiles, when I query by the unique key I get a unique index scan.

    Plan hash value: 2929955852

    -------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    -------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 1 (0)| 00:00:01 |
    | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 6 | 1 (0)| 00:00:01 |
    |* 2 | INDEX UNIQUE SCAN | T_IDX | 1 | | 0 (0)| 00:00:01 |
    -------------------------------------------------------------------------------------

    Now I am going to create two SQL profiles.  I have deliberately put the same SQL text into both SQL Profiles.

    • my_sql_profile is exact matching
    • my_sql_profile_force is force matching.

    DECLARE
    signature INTEGER;
    sql_txt CLOB;
    h SYS.SQLPROF_ATTR;
    BEGIN
    sql_txt := q'[
    SELECT * FROM t WHERE a = 54
    ]';
    h := SYS.SQLPROF_ATTR(
    q'[BEGIN_OUTLINE_DATA]',
    q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
    q'[FULL(@"SEL$1""T"@"SEL$1")]',
    q'[END_OUTLINE_DATA]');
    signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
    DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
    sql_text => sql_txt,
    profile => h,
    name => 'my_sql_profile',
    category => 'DEFAULT',
    validate => TRUE,
    replace => TRUE,
    force_match => FALSE
    );
    END;
    /

    DECLARE
    signature INTEGER;
    sql_txt CLOB;
    h SYS.SQLPROF_ATTR;
    BEGIN
    sql_txt := q'[
    SELECT * FROM t WHERE a = 54
    ]';
    h := SYS.SQLPROF_ATTR(
    q'[BEGIN_OUTLINE_DATA]',
    q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
    q'[FULL(@"SEL$1""T"@"SEL$1")]',
    q'[END_OUTLINE_DATA]');
    signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
    DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
    sql_text => sql_txt,
    profile => h,
    name => 'my_sql_profile_force',
    category => 'DEFAULT',
    validate => TRUE,
    replace => TRUE,
    force_match => TRUE
    );
    END;
    /
    ttitle off
    select * from dba_sql_profiles where name like 'my%sql_profile%';


    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    my_sql_profile DEFAULT 9394869341287877934 31-JUL-21 10.47.34.243454
    SELECT * FROM t WHERE a = 54
    31-JUL-21 10.47.34.000000 MANUAL ENABLED NO

    my_sql_profile_force DEFAULT 11431056000319719221 31-JUL-21 10.47.34.502721
    SELECT * FROM t WHERE a = 54
    31-JUL-21 10.47.34.000000 MANUAL ENABLED YES

    The force match profile works if the literal value is different from that in the profiles.

    explain plan for SELECT * FROM t WHERE a = 42;
    select * from table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873
    --------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    --------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    --------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("A"=42)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "my_sql_profile_force" used for this statement

The exact match profile takes precedence over the force match profile.

    explain plan for SELECT * FROM t WHERE a = 54;
    select * from table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    --------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    --------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    --------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("A"=54)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "my_sql_profile" used for this statement

    I am now going to follow the process to export the SQL Profiles to a staging table, and subsequently reimport them.

    exec DBMS_SQLTUNE.CREATE_STGTAB_SQLPROF(table_name=>'STAGE',schema_name=>user);
    exec DBMS_SQLTUNE.PACK_STGTAB_SQLPROF (staging_table_name =>'STAGE',profile_name=>'my_sql_profile');
    exec DBMS_SQLTUNE.PACK_STGTAB_SQLPROF (staging_table_name =>'STAGE',profile_name=>'my_sql_profile_force');

    There is a row in the staging table for each profile and you can see the differences between them.

    select signature, sql_handle, obj_name, obj_type, sql_text, sqlflags from STAGE;

    SIGNATURE SQL_HANDLE OBJ_NAME
    --------------------- ------------------------------ ---------------------------------------------------------------------
    OBJ_TYPE SQL_TEXT SQLFLAGS
    ------------------------------ -------------------------------------------------------------------------------- ----------
    9394869341287877934 SQL_826147e3c6ac0d2e my_sql_profile
    SQL_PROFILE 0
    SELECT * FROM t WHERE a = 54

    11431056000319719221 SQL_9ea344de32a78735 my_sql_profile_force
    SQL_PROFILE 1
    SELECT * FROM t WHERE a = 54

    I will update the staging table using this PL/SQL loop (because SQL doesn't recognise TRUE as a boolean constant).

    DECLARE
    l_sig INTEGER;
    BEGIN
    FOR i IN (
    SELECT rowid, stage.* FROM stage WHERE sqlflags = 0 FOR UPDATE
    ) LOOP
    l_sig := dbms_sqltune.sqltext_to_signature(i.sql_text,TRUE);
    UPDATE stage
    SET signature = l_sig
    , sqlflags = 1
    WHERE sqlflags = 0
    AND rowid = i.rowid;
    END LOOP;
    END;
    /

    And now the profiles look the same.

    select signature, sql_handle, obj_name, obj_type, sql_text, sqlflags from STAGE;

    SIGNATURE SQL_HANDLE OBJ_NAME
    --------------------- ------------------------------ ---------------------------------------------------------------------
    OBJ_TYPE SQL_TEXT SQLFLAGS
    ------------------------------ -------------------------------------------------------------------------------- ----------
    11431056000319719221 SQL_826147e3c6ac0d2e my_sql_profile
    SQL_PROFILE 1
    SELECT * FROM t WHERE a = 54

    11431056000319719221 SQL_9ea344de32a78735 my_sql_profile_force
    SQL_PROFILE 1
    SELECT * FROM t WHERE a = 54

But I can't just reimport my_sql_profile from the staging table to replace the one in the database, because I will get ORA-13841: SQL profile named my_sql_profile already exists for a different signature/category pair.  To avoid this error, I must either drop the profile or rename it.

    I am going to rename the existing exact matching profile, and also disable it and move it to another category to stop it from matching my statement in preference to the force matching profile (see previous post Clashing SQL Profiles - Exact Matching Profiles Take Precedence Over Force Matching Profiles), and thus I can go back to it later if needed.

    I will drop my example force matching profile.  I no longer need that.

    Then, I can reimport the profile from the staging table.

    exec dbms_sqltune.alter_sql_profile(name=>'my_sql_profile', attribute_name=>'NAME',value=>'my_old_sql_profile');
    exec dbms_sqltune.alter_sql_profile(name=>'my_old_sql_profile', attribute_name=>'CATEGORY',value=>'DO_NOT_USE');
    exec dbms_sqltune.alter_sql_profile(name=>'my_old_sql_profile', attribute_name=>'STATUS',value=>'DISABLED');
    exec dbms_sqltune.drop_sql_profile('my_sql_profile_force',TRUE);
    EXEC DBMS_SQLTUNE.UNPACK_STGTAB_SQLPROF(profile_name => 'my_sql_profile', replace => TRUE, staging_table_name => 'STAGE');

    I can see in the SQL profile table that my SQL profile is now force matching, and it has a different signature to the old one that is exact matching.

    ttitle off
    select * from dba_sql_profiles where name like 'my%sql_profile%';

    NAME CATEGORY SIGNATURE SQL_TEXT CREATED
    ------------------------------ ---------- --------------------- -------------------------------------------------------------------------------- ------------------------------
    LAST_MODIFIED DESCRIPTION TYPE STATUS FOR TASK_ID TASK_EXEC_NAME TASK_OBJ_ID TASK_FND_ID TASK_REC_ID TASK_CON_DBID
    ------------------------------ -------------------- ------- -------- --- ---------- -------------------- ----------- ----------- ----------- -------------
    my_old_sql_profile DO_NOT_USE 9394869341287877934 31-JUL-21 10.54.58.694037
    SELECT * FROM t WHERE a = 54
    31-JUL-21 10.55.00.000000 MANUAL DISABLED NO

    my_sql_profile DEFAULT 11431056000319719221 31-JUL-21 10.55.01.005377
    SELECT * FROM t WHERE a = 54
    31-JUL-21 10.55.01.000000 MANUAL ENABLED YES

    Both my queries now match the new force matching version of the profile.

    explain plan for SELECT * FROM t WHERE a = 42;
    ttitle 'Execution plan with force match profile (full scan)'
    select * from table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    --------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    --------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    --------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("A"=42)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "my_sql_profile" used for this statement

    explain plan for SELECT * FROM t WHERE a = 54;
    select * from table(dbms_xplan.display(null,null,'ADVANCED +ADAPTIVE -PROJECTION'));

    Plan hash value: 1601196873

    --------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    --------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS FULL| T | 1 | 6 | 3 (0)| 00:00:01 |
    --------------------------------------------------------------------------


    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("A"=54)

    Hint Report (identified by operation id / Query Block Name / Object Alias):
    Total hints for statement: 2
    ---------------------------------------------------------------------------

    0 - STATEMENT
    - IGNORE_OPTIM_EMBEDDED_HINTS

    1 - SEL$1 / T@SEL$1
    - FULL(@"SEL$1""T"@"SEL$1")

    Note
    -----
    - SQL profile "my_sql_profile" used for this statement
The script used for this demonstration is available on GitHub.

    Obtaining Trace Files without Access to the Database Server


    Why Trace? 

    For many years, I used database SQL Trace to investigate SQL performance problems. I would trace a process, obtain the trace file, profile it (with Oracle's TKPROF or another profiling tool such as the Method R profiler, TVD$XTAT, or OraSRP), and analyse the profile. 
    Active Session History (ASH) was introduced in Oracle 10g.  Today, it is usually where I start to investigate performance problems. It has the advantage that it is always on, and I can just query ASH data from the Automatic Workload Repository (AWR). However, ASH is only available on Enterprise Edition and requires the Diagnostics Pack licence. 
Sometimes, even if available, ASH isn't enough. ASH is based on sampling database activity, while trace is a record of all the SQL activity in a session. Some short-lived behaviour that doesn't generate many samples is difficult to investigate with ASH. Sometimes, it is necessary to dig deeper and use SQL trace.
On occasion, you might want to generate other forms of trace.  For example, an optimizer trace (event 10053) in order to understand how an execution plan was arrived at.
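For reference, a sketch of one way to enable SQL trace for the current session (the trace file identifier is arbitrary, and DBMS_MONITOR offers several variants):
alter session set tracefile_identifier = 'my_test';
-- include wait events and bind variable values in the trace
exec dbms_monitor.session_trace_enable(waits=>TRUE, binds=>TRUE);
-- ...run the statements of interest, then switch tracing off...
exec dbms_monitor.session_trace_disable;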

    Where is my Trace File? 

A trend that I have observed over the years is that it is becoming ever more difficult to get hold of the trace files. If you are not the production DBA, you are unlikely to get access to the database server. Frequently, I find that pre-production performance test databases, which are often clones of the production database, are treated as production systems. After all, they contain production data. The move to the cloud has accelerated that trend. On some cloud services, you have no access to the database server at all!
    In the past, I have blogged about using an external table from which the trace file can be queried, a variation of a theme others had also written about. It required certain privileges, a new external table was required for each trace file, and you had to know the name of the trace file, and on which RAC instance it was located. 
However, from version 12.2, it is much easier.  Oracle has provided some new views that report which trace files are available and expose their contents.

    Where Is This Session Writing Its Trace File?

    The Automatic Diagnostic Repository (ADR) was first documented in 11g. The view V$DIAG_INFO was introduced in 12c, from which you can query the state of the ADR. This includes the various directory paths to which files are written and the name of the current trace file.
    select dbid, con_dbid, name from v$database;
    column inst_id format 99 heading 'Inst|ID'
    column con_id format 99 heading 'Con|ID'
    column name format a22
    column value format a95
    select * from v$diag_info;

    Inst Con
    ID NAME VALUE ID
    ---- ---------------------- ----------------------------------------------------------------------------------------------- ---
    1 Diag Enabled TRUE 0
    1 ADR Base /opt/oracle/psft/db/oracle-server 0
    1 ADR Home /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM 0
    1 Diag Trace /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/trace 0
    1 Diag Alert /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/alert 0
    1 Diag Incident /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/incident 0
    1 Diag Cdump /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/cdump 0
    1 Health Monitor /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/hm 0
    1 Default Trace File /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/trace/CDBHCM_ora_27009_unnest.trc 0
    1 Active Problem Count 0 0
    1 Active Incident Count 0 0
    1 ORACLE_HOME /opt/oracle/psft/db/oracle-server/19.3.0.0 0

    What files have been written? 

The available files are reported by V$DIAG_TRACE_FILE.
    column adr_home format a60
    column trace_filename format a40
    column change_time format a32
    column modify_time format a32
    column con_id format 999
    select *
    from v$DIAG_TRACE_FILE
    where adr_home = '&adr_Home'
    order by modify_time
    /

    ADR_HOME TRACE_FILENAME CHANGE_TIME MODIFY_TIME CON_ID
    ------------------------------------------------------------ ---------------------------------------- -------------------------------- -------------------------------- ------

    /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM CDBHCM_ora_27674_no_unnest.trc 13-SEP-22 02.06.10.000 PM +00:00 13-SEP-22 02.06.10.000 PM +00:00 3
    /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM CDBHCM_ora_27674_unnest.trc 13-SEP-22 02.06.11.000 PM +00:00 13-SEP-22 02.06.11.000 PM +00:00 3

    What is in the file? 

I can then extract the contents of the file from V$DIAG_TRACE_FILE_CONTENTS. Each line of the trace is returned as a separate row.
This script spools the contents of the current trace file from SQL*Plus locally to a file of the same name. It stores the name of the ADR home, its file path, and the trace file name in SQL*Plus substitution variables, and then uses these to query the trace file contents.
    I can generate a trace and then run this script to extract it locally.
    REM spooltrc.sql

    clear screen
    set heading on pages 99 lines 180 verify off echo off trimspool on termout on feedback off
    column value format a95
    column value new_value adr_home heading 'ADR Home'
    select value from v$diag_info where name = 'ADR Home';
    column value new_value diag_trace heading 'Diag Trace'
    select value from v$diag_info where name = 'Diag Trace';
    column value new_value trace_filename heading 'Trace File'
    select SUBSTR(value,2+LENGTH('&diag_trace')) value from v$diag_info where name = 'Default Trace File'
    /
    column adr_home format a60
    column trace_filename format a40
    column change_time format a32
    column modify_time format a32
    column con_id format 999
    select *
    from v$DIAG_TRACE_FILE
    where adr_home = '&adr_home'
    and trace_filename = '&trace_filename'
    /
    set head off pages 0 lines 5000 verify off echo off timi off termout off feedback off long 5000
    spool &trace_filename
    select payload
    from v$diag_trace_file_contents
    where adr_home = '&adr_home'
    and trace_filename = '&trace_filename'
    order by line_number
    /
    spool off
    set head on pages 99 lines 180 verify on echo on termout on feedback on
The spooltrc.sql script is available on GitHub.  In a subsequent blog, I will demonstrate how to use it.
    The payload is a VARCHAR2 column, so it is easy to search one or several trace files for specific text. This is useful if you are having trouble identifying the trace file of interest. 
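For example, a sketch of such a search might look like this (the error text searched for is arbitrary):
column trace_filename format a40
select distinct adr_home, trace_filename
from v$diag_trace_file_contents
where payload like '%ORA-01555%'
/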
    See also:

    No Execution Plan Survives Contact with the Optimizer Untransformed

    One of the benefits of attending Oracle conferences is that by listening and talking to other people I get a different perspective on things. Sometimes, something gives me an idea or reminds me of the importance of something that I don't use often enough. I was talking with Neil Chandler about SQL Query Transformation. We came up with a variation of a well known quote:
"No execution plan survives contact with the optimizer untransformed."
It isn't completely accurate.  Not every query gets transformed, but transformation occurs commonly, it made a good title, and you are reading this blog!
    During SQL parse, the optimizer can transform a SQL query into another SQL query that is functionally identical but that results in an execution plan with a lower cost (and therefore should execute more quickly). Sometimes, multiple transformations can be applied to a single statement. 
    The Oracle documentation describes various forms of transformation. You can see in the execution plan that something has happened, but you can't see the transformed SQL statement directly. However, it can be obtained from the optimizer trace that can be enabled by setting event 10053.

    Demonstration 

I am going to take a simple SQL query and trace it twice.
• For the first execution, a NO_UNNEST hint is used to prevent the optimizer from unnesting the sub-query.
    set pages 99 lines 200 autotrace off
    alter session set tracefile_identifier='no_unnest';
    alter session set events '10053 trace name context forever, level 1';
    select emplid, name, effdt, last_name
    from ps_names x
    where x.last_name = 'Smith'
    and x.name_type = 'PRI'
    and x.effdt = (
    SELECT /*+NO_UNNEST*/ MAX(x1.effdt)
    FROM ps_names x1
    WHERE x1.emplid = x.emplid
    AND x1.name_type = x.name_type
    AND x1.effdt <= SYSDATE)
    /
    alter session set events '10053 trace name context off';
    @spooltrc
    • For the second execution, an UNNEST hint is used to force the optimizer to unnest the sub-query.
    alter session set tracefile_identifier='unnest';
    alter session set events '10053 trace name context forever, level 1';
    select emplid, name, effdt, last_name
    from ps_names x
    where x.last_name = 'Smith'
    and x.name_type = 'PRI'
    and x.effdt = (
    SELECT /*+UNNEST*/ MAX(x1.effdt)
    FROM ps_names x1
    WHERE x1.emplid = x.emplid
    AND x1.name_type = x.name_type
    AND x1.effdt <= SYSDATE)
    /
    alter session set events '10053 trace name context off';
    @spooltrc
    This is the execution plan from the first trace file for the statement with the NO_UNNEST hint. The select query blocks are simply numbered sequentially and thus are called SEL$1 and SEL$2. SEL$2 is the sub-query that references PS_NAMES with the row source alias X1. No query transformation has occurred.
    -------------------------------------------------------+-----------------------------------+
    | Id | Operation | Name | Rows | Bytes | Cost | Time |
    -------------------------------------------------------+-----------------------------------+
    | 0 | SELECT STATEMENT | | | | 122 | |
    | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PS_NAMES| 1 | 44 | 120 | 00:00:02 |
    | 2 | INDEX SKIP SCAN | PS_NAMES| 11 | | 112 | 00:00:02 |
    | 3 | SORT AGGREGATE | | 1 | 21 | | |
    | 4 | FIRST ROW | | 1 | 21 | 2 | 00:00:01 |
    | 5 | INDEX RANGE SCAN (MIN/MAX) | PS_NAMES| 1 | 21 | 2 | 00:00:01 |
    -------------------------------------------------------+-----------------------------------+
    Query Block Name / Object Alias (identified by operation id):
    ------------------------------------------------------------
    1 - SEL$1 / "X"@"SEL$1"
    2 - SEL$1 / "X"@"SEL$1"
    3 - SEL$2
    5 - SEL$2 / "X1"@"SEL$2"
    ------------------------------------------------------------
    Predicate Information:
    ----------------------
    1 - filter("X"."LAST_NAME"='Smith')
    2 - access("X"."NAME_TYPE"='PRI')
    2 - filter(("X"."NAME_TYPE"='PRI' AND "X"."EFFDT"=))
    5 - access("X1"."EMPLID"=:B1 AND "X1"."NAME_TYPE"=:B2 AND "X1"."EFFDT"<=SYSDATE@!)
    Now, let's look at the optimizer trace file for the statement with the UNNEST hint. First, we can see the statement as submitted with its SQL_ID.
    Trace file /opt/oracle/psft/db/oracle-server/diag/rdbms/cdbhcm/CDBHCM/trace/CDBHCM_ora_21909_unnest.trc
    Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
    Version 19.7.0.0.0

    ----- Current SQL Statement for this session (sql_id=7r3mwa86fma5t) -----
    select emplid, name, effdt, last_name
    from ps_names x
    where x.last_name = 'Smith'
    and x.name_type = 'PRI'
    and x.effdt = (
    SELECT /*+UNNEST*/ MAX(x1.effdt)
    FROM ps_names x1
    WHERE x1.emplid = x.emplid
    AND x1.name_type = x.name_type
    AND x1.effdt <= SYSDATE)

    Later in the trace, we can see the fully expanded SQL statement preceded by the 'UNPARSED QUERY IS' message. 
    • All the SQL language keywords have been forced into upper case.
    • All the object and column names have been made upper case to match the objects.
    • Every column and table is double-quoted which makes them case sensitive.    
    • The columns all have row source aliases. 
    • The row sources (tables in this case) are fully qualified.
    • Only the literal 'Smith' is in mixed case.
Various unparsed queries may appear in the trace as the optimizer tries and costs different transformations.  These are not nicely formatted; the expanded statements are just a long string of text.  The first one is the expanded form of the untransformed statement.
    Stmt: ******* UNPARSED QUERY IS *******
    SELECT "X"."EMPLID""EMPLID","X"."NAME""NAME","X"."EFFDT""EFFDT","X"."LAST_NAME""LAST_NAME" FROM "SYSADM"."PS_NAMES"
    "X" WHERE "X"."LAST_NAME"='Smith' AND "X"."NAME_TYPE"='PRI' AND "X"."EFFDT"= (SELECT /*+ UNNEST */ MAX("X1"."EFFDT")
    "MAX(X1.EFFDT)" FROM "SYSADM"."PS_NAMES""X1" WHERE "X1"."EMPLID"="X"."EMPLID" AND "X1"."NAME_TYPE"="X"."NAME_TYPE" AND
    "X1"."EFFDT"<=SYSDATE@!)
    Here the sub-query has been transformed into an in-line view.  I have reformatted it to make it easier to read.
    CVM:   Merging complex view SEL$683B0107 (#2) into SEL$C772B8D1 (#1).
    qbcp:******* UNPARSED QUERY IS *******
    SELECT "X"."EMPLID""EMPLID","X"."NAME""NAME","X"."EFFDT""EFFDT","X"."LAST_NAME""LAST_NAME"
    FROM (SELECT /*+ UNNEST */ MAX("X1"."EFFDT") "MAX(X1.EFFDT)","X1"."EMPLID""ITEM_0","X1"."NAME_TYPE""ITEM_1"
    FROM "SYSADM"."PS_NAMES""X1"
    WHERE "X1"."EFFDT"<=SYSDATE@!
    GROUP BY "X1"."EMPLID","X1"."NAME_TYPE") "VW_SQ_1"
    ,"SYSADM"."PS_NAMES""X"
    WHERE "X"."LAST_NAME"='Smith'
    AND "X"."NAME_TYPE"='PRI'
    AND "X"."EFFDT"="VW_SQ_1"."MAX(X1.EFFDT)"
    AND "VW_SQ_1"."ITEM_0"="X"."EMPLID"
    AND "VW_SQ_1"."ITEM_1"="X"."NAME_TYPE"
    This is the final form of the statement that was executed and that produced the execution plan.  The in-line view has been merged into the parent query.  There will only be a final query section if any transformations have occurred. Again, I have reformatted it here to make it easier to read.  
    Final query after transformations:******* UNPARSED QUERY IS *******
SELECT /*+ UNNEST */ "X"."EMPLID" "EMPLID","X"."NAME" "NAME","X"."EFFDT" "EFFDT",'Smith' "LAST_NAME"
FROM "SYSADM"."PS_NAMES" "X1"
,"SYSADM"."PS_NAMES" "X"
    WHERE "X"."LAST_NAME"='Smith'
    AND "X"."NAME_TYPE"='PRI'
    AND "X1"."EMPLID"="X"."EMPLID"
    AND "X1"."NAME_TYPE"="X"."NAME_TYPE"
    AND "X1"."EFFDT"<=SYSDATE@!
    AND "X1"."NAME_TYPE"='PRI'
    GROUP BY "X1"."NAME_TYPE","X".ROWID,"X"."EFFDT","X"."NAME","X"."EMPLID"
    HAVING "X"."EFFDT"=MAX("X1"."EFFDT")

    • PS_NAMES X1 has been moved from the subquery into the main from clause. Instead of a correlated subquery, we now have a two-table join. 
    • The query is grouped by the ROWID on row source X and the other selected columns. 
• Instead of joining the tables on NAME_TYPE, the literal criterion has been duplicated on X1.
    • A having clause is used to join X.EFFDT to the maximum value of X1.EFFDT. 
    • Instead of selecting LAST_NAME from X, the literal value in the predicate has been put in the select clause. 
    If we look at the execution plan for the unnested statement we can see X and X1 are now in query block SEL$841DDE77 that has been unnested and merged.

    ----- Explain Plan Dump -----

------------------------------------------+-----------------------------------+
| Id  | Operation             | Name      | Rows  | Bytes | Cost  | Time      |
------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT      |           |       |       |   139 |           |
| 1   |  FILTER               |           |       |       |       |           |
| 2   |   SORT GROUP BY       |           |     1 |    77 |   139 |  00:00:02 |
| 3   |    NESTED LOOPS       |           |     3 |   231 |   138 |  00:00:02 |
| 4   |     TABLE ACCESS FULL | PS_NAMES  |     2 |   112 |   136 |  00:00:02 |
| 5   |     INDEX RANGE SCAN  | PS_NAMES  |     1 |    21 |     1 |  00:00:01 |
------------------------------------------+-----------------------------------+
    Query Block Name / Object Alias (identified by operation id):
    ------------------------------------------------------------
    1 - SEL$841DDE77
    4 - SEL$841DDE77 / "X"@"SEL$1"
    5 - SEL$841DDE77 / "X1"@"SEL$2"
    ------------------------------------------------------------
    Predicate Information:
    ----------------------
    1 - filter("EFFDT"=MAX("X1"."EFFDT"))
    4 - filter(("X"."LAST_NAME"='Smith' AND "X"."NAME_TYPE"='PRI'))
    5 - access("X1"."EMPLID"="X"."EMPLID" AND "X1"."NAME_TYPE"='PRI' AND "X1"."EFFDT"<=SYSDATE@!)

The new query block name is a hash value based on the names of the query blocks that went into the transformation.  The presence of such a block name is an indication that query transformation has occurred.  The query block name is stable, and it is referenced in the outline of hints.
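Because the name is stable, it can be targeted directly with hints.  Here is a minimal sketch, reconstructing the original statement from the unparsed form above and forcing the full scan by query block name (the hint text is taken from the outline data shown below):
SELECT /*+ FULL(@"SEL$841DDE77" "X"@"SEL$1") */ x.emplid, x.name, x.effdt, x.last_name
FROM   sysadm.ps_names x
WHERE  x.last_name = 'Smith'
AND    x.name_type = 'PRI'
AND    x.effdt = (SELECT /*+ UNNEST */ MAX(x1.effdt)
                  FROM   sysadm.ps_names x1
                  WHERE  x1.emplid    = x.emplid
                  AND    x1.name_type = x.name_type
                  AND    x1.effdt    <= SYSDATE);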
    "A question that we could ask about the incomprehensible query block names that Oracle generates is: 'are they deterministic?'– is it possible for the same query to give you the same plan while generating different query block names on different versions of Oracle (or different days of the week). The answer is (or should be) no; when Oracle generates a query block name (after supplying the initial defaults of sel$1, sel$2 etc.) it applies a hashing function to the query block names that have gone INTO a transformation to generate the name that it will use for the block that comes OUT of the transformation." - Jonathan Lewis: Query Blocks and Inline Views 
    As Jonathan points out "the 'Outline Data' section of the report tells us that query block" in my example SEL$841DDE77 "is an 'outline_leaf', in other words, it is a 'final' query block that has actually been subject to independent optimization". We can also see other query block names referenced in OUTLINE hints.
      Outline Data:
    /*+
    BEGIN_OUTLINE_DATA

    OUTLINE_LEAF(@"SEL$841DDE77")
    MERGE(@"SEL$683B0107">"SEL$C772B8D1")
    OUTLINE(@"SEL$C772B8D1")
    UNNEST(@"SEL$2")
    OUTLINE(@"SEL$683B0107")
    OUTLINE(@"SEL$7511BFD2")
    OUTLINE(@"SEL$2")
    OUTLINE(@"SEL$1")
    FULL(@"SEL$841DDE77""X"@"SEL$1")
    INDEX(@"SEL$841DDE77""X1"@"SEL$2" ("PS_NAMES"."EMPLID""PS_NAMES"."NAME_TYPE""PS_NAMES"."EFFDT"))
    LEADING(@"SEL$841DDE77""X"@"SEL$1""X1"@"SEL$2")
    USE_NL(@"SEL$841DDE77""X1"@"SEL$2")
    END_OUTLINE_DATA
    */
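The same query block, alias and outline sections can also be obtained for a cached cursor without a 10053 trace, using DBMS_XPLAN with the ADVANCED format option.  A minimal sketch (substitute a real SQL_ID):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id => '&sql_id', cursor_child_no => NULL, format => 'ADVANCED'));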
We can see these query block names being generated in the trace as a number of transformations are applied, each with a brief description of the transformation.
    Registered qb: SEL$683B0107 0xfc6e3030 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
    Registered qb: SEL$7511BFD2 0xfc6c5c68 (VIEW ADDED SEL$1)
    Registered qb: SEL$C772B8D1 0xfc6c5c68 (SUBQUERY UNNEST SEL$7511BFD2; SEL$2)
    Registered qb: SEL$841DDE77 0xfc6d91e0 (VIEW MERGE SEL$C772B8D1; SEL$683B0107; SEL$C772B8D1)

    There is no BITOR() in Oracle SQL

    In Oracle SQL, I can do a bitwise AND of two numbers, but there is no equivalent function to do a bitwise OR.  However, it turns out to be really easy to do using BITAND().
    I was manipulating some trace values where each binary digit, or bit, corresponds to a different function.  I wanted to ensure certain attributes were set.  So, I wanted to do a bitwise OR between the current flag value and the value of the bits I wanted to set. 
    In bitwise OR, if either or both bits are set, then the answer is 1.  It is like addition, but when both the bits are 1, the answer is 1 rather than 2.  I can add the bits up and then deduct BITAND().  Thus:
BITOR  0  1        +  0  1        BITAND  0  1
  0    0  1   =    0  0  1   -      0    0  0
  1    1  1        1  1  2          1    0  1
    Or I could write it as 
    BITOR(x,y)  =  x + y - BITAND(x,y)

    Here is a simple example with two decimal numbers expressed in binary.  The results of AND and OR operations are below, with their decimal values.

     27 = 00011011
    42 = 00101010

    AND = 00001010 = 10
    OR = 00111011 = 59
    I can then write a simple SQL expression to calculate this, and perhaps put it into a PL/SQL function thus:
    WITH FUNCTION bitor(p1 INTEGER, p2 INTEGER) RETURN INTEGER IS
    BEGIN
    RETURN p1+p2-bitand(p1,p2);
    END;
    SELECT BITAND(27,42)
    , 27+42-BITAND(27,42)
    , bitor(27,42)
    FROM DUAL
    /

BITAND(27,42) 27+42-BITAND(27,42) BITOR(27,42)
------------- ------------------- ------------
           10                  59           59
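If the function is needed permanently, rather than inline in a single statement, the same expression can go into a standalone function.  A minimal sketch:
CREATE OR REPLACE FUNCTION bitor(p1 INTEGER, p2 INTEGER) RETURN INTEGER
DETERMINISTIC IS
BEGIN
  RETURN p1 + p2 - BITAND(p1, p2); -- OR = sum of the bits minus the carry (the AND)
END bitor;
/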

Optimizer Panel Group @ UKOUG Breakthrough'22 Conference, Thursday 1st December, 4pm.

    Come to the Optimizer Panel Group session at the UKOUG Breakthrough'22 conference in Birmingham.  Reply to this tweet to submit a question in advance.

    Loading a Flat File from OCI Object Storage into an Autonomous Database. Part 3. Copying data from Object Storage to a Regular Table

    This blog is the third in a series of three that looks at transferring a file to Oracle Cloud Infrastructure (OCI) Object Storage, and then reading it into the database with an external table or copying it into a regular table.

    Copy Data into Table 

    Alternatively, we can copy the data into a normal table. The table needs to be created in advance. This time, I am going to run the copy as user SOE rather than ADMIN.  I need to:
• Grant CONNECT and RESOURCE privileges, and quota on the data tablespace.
    • Grant execute on DBMS_CLOUD to SOE, so it can execute the command.
    • Grant READ and WRITE access on the DATA_PUMP_DIR directory – the log and bad files created by this process are written to this database directory.
    connect admin/Password2020!@gofaster1b_tp 
    CREATE USER soe IDENTIFIED BY Password2020;
    GRANT CONNECT, RESOURCE TO soe;
    GRANT EXECUTE ON DBMS_CLOUD TO soe;
    GRANT READ, WRITE ON DIRECTORY data_pump_dir TO soe;
    ALTER USER soe QUOTA UNLIMITED ON data;
    I am now going to switch to user SOE and create my table.
    connect soe/Password2020@gofaster1b_tp
    Drop table soe.ash_hist purge;
    CREATE TABLE soe.ASH_HIST
    ( SNAP_ID NUMBER,
    DBID NUMBER,
    INSTANCE_NUMBER NUMBER,
    SAMPLE_ID NUMBER,
    SAMPLE_TIME TIMESTAMP (3),
    -- SAMPLE_TIME_UTC TIMESTAMP (3),
    -- USECS_PER_ROW NUMBER,
    SESSION_ID NUMBER,
    SESSION_SERIAL# NUMBER,
    SESSION_TYPE VARCHAR2(10),
    FLAGS NUMBER,
    USER_ID NUMBER,
    -----------------------------------------
    SQL_ID VARCHAR2(13),
    IS_SQLID_CURRENT VARCHAR2(1),
    SQL_CHILD_NUMBER NUMBER,
    SQL_OPCODE NUMBER,
    SQL_OPNAME VARCHAR2(64),
    FORCE_MATCHING_SIGNATURE NUMBER,
    TOP_LEVEL_SQL_ID VARCHAR2(13),
    TOP_LEVEL_SQL_OPCODE NUMBER,
    SQL_PLAN_HASH_VALUE NUMBER,
    SQL_FULL_PLAN_HASH_VALUE NUMBER,
    -----------------------------------------
    SQL_ADAPTIVE_PLAN_RESOLVED NUMBER,
    SQL_PLAN_LINE_ID NUMBER,
    SQL_PLAN_OPERATION VARCHAR2(64),
    SQL_PLAN_OPTIONS VARCHAR2(64),
    SQL_EXEC_ID NUMBER,
    SQL_EXEC_START DATE,
    PLSQL_ENTRY_OBJECT_ID NUMBER,
    PLSQL_ENTRY_SUBPROGRAM_ID NUMBER,
    PLSQL_OBJECT_ID NUMBER,
    PLSQL_SUBPROGRAM_ID NUMBER,
    -----------------------------------------
    QC_INSTANCE_ID NUMBER,
    QC_SESSION_ID NUMBER,
    QC_SESSION_SERIAL# NUMBER,
    PX_FLAGS NUMBER,
    EVENT VARCHAR2(64),
    EVENT_ID NUMBER,
    SEQ# NUMBER,
    P1TEXT VARCHAR2(64),
    P1 NUMBER,
    P2TEXT VARCHAR2(64),
    -----------------------------------------
    P2 NUMBER,
    P3TEXT VARCHAR2(64),
    P3 NUMBER,
    WAIT_CLASS VARCHAR2(64),
    WAIT_CLASS_ID NUMBER,
    WAIT_TIME NUMBER,
    SESSION_STATE VARCHAR2(7),
    TIME_WAITED NUMBER,
    BLOCKING_SESSION_STATUS VARCHAR2(11),
    BLOCKING_SESSION NUMBER,
    -----------------------------------------
    BLOCKING_SESSION_SERIAL# NUMBER,
    BLOCKING_INST_ID NUMBER,
    BLOCKING_HANGCHAIN_INFO VARCHAR2(1),
    CURRENT_OBJ# NUMBER,
    CURRENT_FILE# NUMBER,
    CURRENT_BLOCK# NUMBER,
    CURRENT_ROW# NUMBER,
    TOP_LEVEL_CALL# NUMBER,
    TOP_LEVEL_CALL_NAME VARCHAR2(64),
    CONSUMER_GROUP_ID NUMBER,
    -----------------------------------------
    XID RAW(8),
    REMOTE_INSTANCE# NUMBER,
    TIME_MODEL NUMBER,
    IN_CONNECTION_MGMT VARCHAR2(1),
    IN_PARSE VARCHAR2(1),
    IN_HARD_PARSE VARCHAR2(1),
    IN_SQL_EXECUTION VARCHAR2(1),
    IN_PLSQL_EXECUTION VARCHAR2(1),
    IN_PLSQL_RPC VARCHAR2(1),
    IN_PLSQL_COMPILATION VARCHAR2(1),
    -----------------------------------------
    IN_JAVA_EXECUTION VARCHAR2(1),
    IN_BIND VARCHAR2(1),
    IN_CURSOR_CLOSE VARCHAR2(1),
    IN_SEQUENCE_LOAD VARCHAR2(1),
    IN_INMEMORY_QUERY VARCHAR2(1),
    IN_INMEMORY_POPULATE VARCHAR2(1),
    IN_INMEMORY_PREPOPULATE VARCHAR2(1),
    IN_INMEMORY_REPOPULATE VARCHAR2(1),
    IN_INMEMORY_TREPOPULATE VARCHAR2(1),
    -- IN_TABLESPACE_ENCRYPTION VARCHAR2(1),
    CAPTURE_OVERHEAD VARCHAR2(1),
    -----------------------------------------
    REPLAY_OVERHEAD VARCHAR2(1),
    IS_CAPTURED VARCHAR2(1),
    IS_REPLAYED VARCHAR2(1),
    -- IS_REPLAY_SYNC_TOKEN_HOLDER VARCHAR2(1),
    SERVICE_HASH NUMBER,
    PROGRAM VARCHAR2(64),
    MODULE VARCHAR2(64),
    ACTION VARCHAR2(64),
    CLIENT_ID VARCHAR2(64),
    MACHINE VARCHAR2(64),
    PORT NUMBER,
    -----------------------------------------
    ECID VARCHAR2(64),
    DBREPLAY_FILE_ID NUMBER,
    DBREPLAY_CALL_COUNTER NUMBER,
    TM_DELTA_TIME NUMBER,
    TM_DELTA_CPU_TIME NUMBER,
    TM_DELTA_DB_TIME NUMBER,
    DELTA_TIME NUMBER,
    DELTA_READ_IO_REQUESTS NUMBER,
    DELTA_WRITE_IO_REQUESTS NUMBER,
    DELTA_READ_IO_BYTES NUMBER,
    -----------------------------------------
    DELTA_WRITE_IO_BYTES NUMBER,
    DELTA_INTERCONNECT_IO_BYTES NUMBER,
    PGA_ALLOCATED NUMBER,
    TEMP_SPACE_ALLOCATED NUMBER,
    DBOP_NAME VARCHAR2(64),
    DBOP_EXEC_ID NUMBER,
    CON_DBID NUMBER,
    CON_ID NUMBER,
    -----------------------------------------
    CONSTRAINT ash_hist_pk PRIMARY KEY (dbid, instance_number, snap_id, sample_id, session_id)
    )
    COMPRESS FOR QUERY LOW
    /
    As Autonomous Databases run on Exadata, I have also specified Hybrid Columnar Compression (HCC) for this table.
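The compression attributes declared on the table can be confirmed from the data dictionary; a quick sketch:
SELECT table_name, compression, compress_for
FROM   all_tables
WHERE  owner = 'SOE'
AND    table_name = 'ASH_HIST';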
    Credentials are specific to the database user.  I have to create an additional credential, for the same cloud user, but owned by SOE.
    ALTER SESSION SET nls_date_Format='hh24:mi:ss dd.mm.yyyy';
    set serveroutput on timi on
    BEGIN
    DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'SOE_BUCKET',
    username=> 'oraclecloud1@go-faster.co.uk',
    password=> 'K7xfi-mG<1Z:dq#88;1m'
    );
    END;
    /
    column owner format a10
    column credential_name format a20
    column comments format a80
    column username format a40
    SELECT * FROM dba_credentials;

    OWNER CREDENTIAL_NAME USERNAME WINDOWS_DOMAIN
    ---------- -------------------- ---------------------------------------- ------------------------------
    COMMENTS ENABL
    -------------------------------------------------------------------------------- -----
    ADMIN MY_BUCKET oraclecloud1@go-faster.co.uk
    {"comments":"Created via DBMS_CLOUD.create_credential"} TRUE

    SOE SOE_BUCKET oraclecloud1@go-faster.co.uk
    {"comments":"Created via DBMS_CLOUD.create_credential"} TRUE

The COPY_DATA procedure is similar to CREATE_EXTERNAL_TABLE described in the previous post, but it doesn't have a column list. The field names must match the column names. It is sensitive to field names with a trailing #; these must be enclosed in double-quotes.
    TRUNCATE TABLE soe.ash_hist;
    DECLARE
    l_operation_id NUMBER;
    BEGIN
    DBMS_CLOUD.COPY_DATA(
    table_name =>'ASH_HIST',
    credential_name =>'SOE_BUCKET',
    file_uri_list =>'https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz',
    schema_name => 'SOE',
    format => json_object('blankasnull' value 'true'
    ,'compression' value 'gzip'
    ,'dateformat' value 'YYYY-MM-DD/HH24:mi:ss'
    ,'timestampformat' value 'YYYY-MM-DD/HH24:mi:ss.ff'
    ,'delimiter' value '<,>'
    ,'ignoreblanklines' value 'true'
    ,'rejectlimit' value '10'
    ,'removequotes' value 'true'
    ,'trimspaces' value 'lrtrim'
    ),
    field_list=>'SNAP_ID,DBID,INSTANCE_NUMBER,SAMPLE_ID,SAMPLE_TIME ,SESSION_ID,"SESSION_SERIAL#",SESSION_TYPE,FLAGS,USER_ID
    ,SQL_ID,IS_SQLID_CURRENT,SQL_CHILD_NUMBER,SQL_OPCODE,SQL_OPNAME,FORCE_MATCHING_SIGNATURE,TOP_LEVEL_SQL_ID,TOP_LEVEL_SQL_OPCODE,SQL_PLAN_HASH_VALUE,SQL_FULL_PLAN_HASH_VALUE
    ,SQL_ADAPTIVE_PLAN_RESOLVED,SQL_PLAN_LINE_ID,SQL_PLAN_OPERATION,SQL_PLAN_OPTIONS,SQL_EXEC_ID,SQL_EXEC_START,PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,PLSQL_OBJECT_ID,PLSQL_SUBPROGRAM_ID
    ,QC_INSTANCE_ID,QC_SESSION_ID,"QC_SESSION_SERIAL#",PX_FLAGS,EVENT,EVENT_ID,"SEQ#",P1TEXT,P1,P2TEXT
    ,P2,P3TEXT,P3,WAIT_CLASS,WAIT_CLASS_ID,WAIT_TIME,SESSION_STATE,TIME_WAITED,BLOCKING_SESSION_STATUS,BLOCKING_SESSION
    ,"BLOCKING_SESSION_SERIAL#",BLOCKING_INST_ID,BLOCKING_HANGCHAIN_INFO,"CURRENT_OBJ#","CURRENT_FILE#","CURRENT_BLOCK#","CURRENT_ROW#","TOP_LEVEL_CALL#",TOP_LEVEL_CALL_NAME,CONSUMER_GROUP_ID
    ,XID,"REMOTE_INSTANCE#",TIME_MODEL,IN_CONNECTION_MGMT,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,IN_PLSQL_RPC,IN_PLSQL_COMPILATION
    ,IN_JAVA_EXECUTION,IN_BIND,IN_CURSOR_CLOSE,IN_SEQUENCE_LOAD,IN_INMEMORY_QUERY,IN_INMEMORY_POPULATE,IN_INMEMORY_PREPOPULATE,IN_INMEMORY_REPOPULATE,IN_INMEMORY_TREPOPULATE,CAPTURE_OVERHEAD
    ,REPLAY_OVERHEAD,IS_CAPTURED,IS_REPLAYED,SERVICE_HASH,PROGRAM,MODULE,ACTION,CLIENT_ID,MACHINE,PORT
    ,ECID,DBREPLAY_FILE_ID,DBREPLAY_CALL_COUNTER,TM_DELTA_TIME,TM_DELTA_CPU_TIME,TM_DELTA_DB_TIME,DELTA_TIME,DELTA_READ_IO_REQUESTS,DELTA_WRITE_IO_REQUESTS,DELTA_READ_IO_BYTES
    ,DELTA_WRITE_IO_BYTES,DELTA_INTERCONNECT_IO_BYTES,PGA_ALLOCATED,TEMP_SPACE_ALLOCATED,DBOP_NAME,DBOP_EXEC_ID,CON_DBID,CON_ID',
    operation_id=>l_operation_id
    );
    dbms_output.put_line('Operation ID:'||l_operation_id||' finished successfully');
    EXCEPTION WHEN OTHERS THEN
    dbms_output.put_line('Operation ID:'||l_operation_id||' raised an error');
    RAISE;
    END;
    /

The data copy takes slightly longer than the query on the external table.
    Operation ID:31 finished successfully

    PL/SQL procedure successfully completed.

    Elapsed: 00:02:01.11
    The status of the copy operation is reported in USER_LOAD_OPERATIONS.  This includes the number of rows loaded and the names of external tables that are created for the log and bad files.
    set lines 120
    column type format a10
    column file_uri_list format a64
    column start_time format a32
    column update_time format a32
    column owner_name format a10
    column table_name format a10
    column partition_name format a10
    column subpartition_name format a10
    column logfile_table format a15
    column badfile_table format a15
    column tempext_table format a30
    select * from user_load_operations where id = &operation_id;

    ID TYPE SID SERIAL# START_TIME UPDATE_TIME STATUS
    ---------- ---------- ---------- ---------- -------------------------------- -------------------------------- ---------
    OWNER_NAME TABLE_NAME PARTITION_ SUBPARTITI FILE_URI_LIST ROWS_LOADED
    ---------- ---------- ---------- ---------- ---------------------------------------------------------------- -----------
    LOGFILE_TABLE BADFILE_TABLE TEMPEXT_TABLE
    --------------- --------------- ------------------------------
    31 COPY 19965 44088 07-MAY-20 17.03.20.328263 +01:00 07-MAY-20 17.05.36.157680 +01:00 COMPLETED
    SOE ASH_HIST https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu 1409305
    /b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz
    COPY$31_LOG COPY$31_BAD COPY$Y2R021UKPJ5F75JCMSKL

    An external table is temporarily created by the COPY_DATA procedure but is then dropped before the procedure completes.  The bad file is empty because the copy operation succeeded without error, but we can query the copy log.
    select * from COPY$31_LOG;

    RECORD
    ------------------------------------------------------------------------------------------------------------------------
    LOG file opened at 05/07/20 16:03:21

    Total Number of Files=1

    Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz

    Log File: COPY$31_105537.log

    LOG file opened at 05/07/20 16:03:21

    Total Number of Files=1

    Data File: https://objectstorage.uk-london-1.oraclecloud.com/n/lrndaxjjgnuu/b/bucket-20200505-1552/o/dba_hist_active_sess_history.txt.gz

    Log File: COPY$31_105537.log

    LOG file opened at 05/07/20 16:03:21

    KUP-05014: Warning: Intra source concurrency disabled because the URLs specified for the Cloud Service map to compressed data.

    Bad File: COPY$31_105537.bad

    Field Definitions for table COPY$Y2R021UKPJ5F75JCMSKL
    Record format DELIMITED BY
    Data in file has same endianness as the platform
    Rows with all null fields are accepted
    Table level NULLIF (Field = BLANKS)
    Fields in Data Source:

    SNAP_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    DBID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    INSTANCE_NUMBER CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    SAMPLE_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right
    SAMPLE_TIME CHAR (255)
    Date datatype TIMESTAMP, date mask YYYY-MM-DD/HH24:mi:ss.ff
    Terminated by "<,>"
    Trim whitespace from left and right

    CON_ID CHAR (255)
    Terminated by "<,>"
    Trim whitespace from left and right

    Date Cache Statistics for table COPY$Y2R021UKPJ5F75JCMSKL
    Date conversion cache disabled due to overflow (default size: 1000)

    365 rows selected.
These files are written to the DATA_PUMP_DIR database directory.  We don't have access to the database file system in Autonomous, so Oracle has provided the LIST_FILES table function in DBMS_CLOUD so that we can see what files are in a directory.
    Set pages 99 lines 150
    Column object_name format a32
    Column created format a32
    Column last_modified format a32
    Column checksum format a20
    SELECT * FROM DBMS_CLOUD.LIST_FILES('DATA_PUMP_DIR');

    OBJECT_NAME BYTES CHECKSUM CREATED LAST_MODIFIED
    -------------------------------- ---------- -------------------- -------------------------------- --------------------------------

    COPY$31_dflt.log 0 07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
    COPY$31_dflt.bad 0 07-MAY-20 16.03.20.000000 +00:00 07-MAY-20 16.03.20.000000 +00:00
    COPY$31_105537.log 13591 07-MAY-20 16.03.21.000000 +00:00 07-MAY-20 16.05.35.000000 +00:00

Statistics are automatically collected on the table by the copy process because it was done in direct-path mode.  We can see that the number of rows in the statistics corresponds to the number of rows loaded by the COPY_DATA procedure.
    Set pages 99 lines 140
    Column owner format a10
    Column IM_STAT_UPDATE_TIME format a30
    Select *
    from all_tab_statistics
    Where table_name = 'ASH_HIST';

    OWNER TABLE_NAME PARTITION_ PARTITION_POSITION SUBPARTITI SUBPARTITION_POSITION OBJECT_TYPE NUM_ROWS BLOCKS EMPTY_BLOCKS
    ---------- ---------- ---------- ------------------ ---------- --------------------- ------------ ---------- ---------- ------------
    AVG_SPACE CHAIN_CNT AVG_ROW_LEN AVG_SPACE_FREELIST_BLOCKS NUM_FREELIST_BLOCKS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO IM_IMCU_COUNT
    ---------- ---------- ----------- ------------------------- ------------------- ----------------- ------------------- -------------
    IM_BLOCK_COUNT IM_STAT_UPDATE_TIME SCAN_RATE SAMPLE_SIZE LAST_ANALYZED GLO USE STATT STALE_S SCOPE
    -------------- ------------------------------ ---------- ----------- ------------------- --- --- ----- ------- -------
    SOE ASH_HIST TABLE 1409305 19426 0
    0 0 486 0 0
    1409305 15:16:14 07.05.2020 YES NO NO SHARED

I can confirm that the data is compressed because the compression type of every sampled row is type 8 (HCC QUERY LOW).  See also DBMS_COMPRESSION Compression Types.
    WITH x AS (
    select dbms_compression.get_compression_type('SOE', 'ASH_HIST', rowid) ctype
    from soe.ash_hist sample (.1))
    Select ctype, count(*) From x group by ctype;

    CTYPE COUNT(*)
    ---------- ----------
    8 14097
    I can find this SQL Statement in the Performance Hub. 
    INSERT /*+ append enable_parallel_dml */ INTO "SOE"."ASH_HIST" SELECT * FROM COPY$Y2R021UKPJ5F75JCMSKL
Therefore, the data was inserted into the permanent table from the temporary external table, in direct-path mode and in parallel.
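If you don't have access to the Performance Hub, the same recursive statement can be found in the shared pool.  A sketch; the generated external table name varies, so match on the COPY$ prefix:
SELECT sql_id, executions, rows_processed, sql_text
FROM   v$sql
WHERE  sql_text LIKE 'INSERT /*+ append enable_parallel_dml */%COPY$%';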
I can also look at the OCI Performance Hub and see that most of the time was spent on CPU.  I can see the SQL_ID of the insert statement and the call to the DBMS_CLOUD procedure.
    I can drill in further to the exact SQL statement.
    When I query the table I get exactly the same data as previously with the external table.
    set autotrace on timi on lines 180 trimspool on
    break on report
    compute sum of ash_secs on report
    column min(sample_time) format a22
    column max(sample_time) format a22
    select event, sum(10) ash_Secs, min(sample_time), max(sample_time)
    from soe.ash_hist
    group by event
    order by ash_Secs desc
    ;

    EVENT ASH_SECS MIN(SAMPLE_TIME) MAX(SAMPLE_TIME)
    ---------------------------------------------------------------- ---------- ---------------------- ----------------------
    10304530 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    direct path read 3258500 22-MAR-20 09.59.51.125 07-APR-20 23.00.30.395
    SQL*Net more data to client 269220 22-MAR-20 10.00.31.205 07-APR-20 22.59.30.275
    direct path write temp 32400 22-MAR-20 11.39.53.996 07-APR-20 21.43.47.329
    gc cr block busy 24930 22-MAR-20 10.51.33.189 07-APR-20 22.56.56.804

    latch free 10 28-MAR-20 20.26.11.307 28-MAR-20 20.26.11.307
    ----------
    sum 14093050

    86 rows selected.

    Elapsed: 00:00:00.62

    I can see that the execution plan is now a single serial full scan of the table.
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1336681691

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    84 |  1428 |   1848  (9)| 00:00:01 |
|   1 |  SORT ORDER BY              |          |    84 |  1428 |   1848  (9)| 00:00:01 |
|   2 |   HASH GROUP BY             |          |    84 |  1428 |   1848  (9)| 00:00:01 |
|   3 |    TABLE ACCESS STORAGE FULL| ASH_HIST |  1409K|    22M|   1753  (4)| 00:00:01 |
---------------------------------------------------------------------------------------


    Statistics
    ----------------------------------------------------------
    11 recursive calls
    13 db block gets
    19255 consistent gets
    19247 physical reads
    2436 redo size
    5428 bytes sent via SQL*Net to client
    602 bytes received via SQL*Net from client
    7 SQL*Net roundtrips to/from client
    1 sorts (memory)
    0 sorts (disk)
    86 rows processed

    In the Cloud, Performance is Instrumented as Cost

About 5 years ago, I was at a conference where someone put this statement up in a PowerPoint slide.  (I would like to be able to credit the author correctly, but I can't remember who it was.)  We all looked at it, thought about it, and said 'yes, of course' to ourselves.  However, as a consultant who specialises in performance optimisation, it is only recently that I have started to have conversations with clients that reflect that idea.

    In the good old/bad old days of 'on premises'

It is not that long ago that the only option for procuring new hardware was to go through a sizing exercise that involved guessing how much you needed, allowing for future growth in data and processing volumes, then deciding how much you were actually willing to pay, purchasing it, and finally wheeling it into your data centre and hoping for the best.

It was then normal to want to get the best possible performance out of whatever system was installed on that hardware.  It would inevitably slow down over time.  Eventually, after the hardware purchase had been fully depreciated, you would have to start the whole cycle again and replace it with newer hardware.

Similarly, Oracle licensing.  You would have to license Oracle for all your CPUs (there are a few exceptions where you can associate specific CPUs with specific VMs and only license Oracle for the CPUs in those VMs).  You would also have to decide which Oracle features you licensed.  Standard or Enterprise Edition?  Diagnostics? Tuning? RAC? Partitioning? Compression? In-Memory?

    "You are gonna need a bigger boat"

    Then when you encountered performance problems you did the best you could with what you had. As a consultant, there was rarely any point in saying to a customer that they had run out of resource and they needed more.  The answer was usually along the lines of 'we have spent our money on that, and it has to last for five years, we have no additional budget and it has to work'. So you got on with finding the rabbit in the hat.

    Instead of purchasing hardware as a capital expense, in the cloud you rent hardware as an operational expense.

You can bring your own Oracle licence (BYOL), and then you have exactly what you were previously licensed for.  "At a high level, one Oracle Processor License maps to two OCPUs."

With Oracle's cloud licensing there are still lots of choices to make, not just how many CPUs and how much memory.  You can choose Infrastructure as a Service (IAAS), where you rent the server and install and license Oracle on it just as you did on-premises.  You can choose different storage systems with different I/O profiles.  There are different levels of PAAS that have different database features.  You can go all the way up to Extreme Performance on Exadata.  All of these choices have a cost consequence.  Oracle provides a Cloud cost estimator tool (other consultancies have produced their own versions).  These tools make the link between these choices and their costs very clear.

    "You can have as much performance as you are willing to pay for"

I have been working with a customer who is moving a PeopleSoft system from Supercluster on-premises to Exadata Cloud-at-Customer (so it is physically on-site, but in all other respects it is in the cloud).  They are not bringing their own licence (BYOL). Instead, they are on a tariff of US$1.3441/OCPU/hr; we have found it easier to talk about US$1000/OCPU/month (US$1.3441 × 24 hours × 31 days ≈ US$1000).

Just as you would with an on-premises system, they went through a sizing exercise that predicted they needed 6 OCPUs on each of 2 RAC nodes during the day, and 10 at night.

    It has been very helpful to have a clear quantitative definition of acceptable performance for the critical part of the system, the overnight reporting batch.  "The reports need to be available to users by the start of the working day in continental Europe, at 8am CET", which is 2am EST.  There is no benefit in providing additional resources to allow the batch to finish any earlier.  Instead, we only need to provide as much as is necessary to reliably meet the target.

A performance tuning/testing exercise quickly showed that fewer than the predicted number of CPUs were actually needed: 2-4 OCPUs/node during the day is looking comfortable.  The new Exadata has fewer but much faster CPUs.  As we adjusted the application configuration to match, we found we were able to reduce the number of OCPUs.

If we hadn't already been using the base-level In Memory feature on Supercluster, then to complete the overnight batch in time for the start of the European working day, we would probably have needed 10 OCPUs/node.  The base-level In Memory option brought that down to around 7.  This shows the huge value of the careful use of database features and techniques to reduce CPU overhead.

We are not using BYOL, so we can use fully featured In Memory with a larger store.  Increasing the In Memory store from 16GB to 40GB per node saved another OCPU, but cost nothing.  If we had been using BYOL, we would have had to pay additionally for fully featured In Memory.  I doubt the marginal benefit would have justified the cost.

    The customer has been considering switching on the extra OCPUs overnight to facilitate the batch.  Doing so costs $1.33/hour, and at the end of the month, they get an invoice from Oracle.  That has concentrated minds and changed behaviours.  The customer understands that there is a real $ cost/saving to their business decisions.

    One day I was asked: "What happens if we reduce the number of CPUs from 6 to 4?"

Essentially, the batch will take longer.  We are already using the database resource manager to prioritise processes when all the CPU is in use.  The resource manager plan has been built to reflect the business priorities, and so keeps it fair for all users.  For example, it ensures that users of the online part of the application get CPU in preference to batch processes; this is important for users in Asia who are online when the batch runs overnight in North America.  We also use the resource plan to impose different parallel query limits on different groups of processes.   If we are going to vary the number of CPUs, we will have to switch between different resource manager plans with different limits.  We will also have to reduce the number of reports that can be concurrently executed by the application, so some application configuration has to go hand in hand with the database configuration.
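As a sketch of what that involves (all object names here are hypothetical, not the customer's actual configuration), a plan directive can cap the parallel degree for a batch consumer group, and the active plan can be switched as the OCPU count changes:
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(plan => 'NIGHT_BATCH_PLAN', comment => 'Overnight batch plan');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(consumer_group => 'NVISION_BATCH', comment => 'nVision report processes');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'NIGHT_BATCH_PLAN', group_or_subplan => 'NVISION_BATCH',
    comment => 'Batch below online users, limited parallelism',
    mgmt_p2 => 70, parallel_degree_limit_p1 => 4);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'NIGHT_BATCH_PLAN', group_or_subplan => 'OTHER_GROUPS',
    comment => 'Everything else, including online users', mgmt_p1 => 100);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA;
END;
/
-- Switch plans when the OCPU count changes:
ALTER SYSTEM SET resource_manager_plan = 'NIGHT_BATCH_PLAN';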

    Effective caching by the database meant we already did relatively little physical I/O during the reporting.  Most of the time was already spent on CPU.  Use of In Memory further reduced physical I/O, and now nearly all the time is spent on CPU, but it also reduced the overall CPU consumption and therefore response time.

When we did vary the number of CPUs, we were not surprised to observe, from the Active Session History (ASH), that the total amount of database time spent on CPU by the nVision reporting processes is roughly constant (indicated by the blue area in the charts below).  If we reduce the number of concurrent processes, then the batch simply runs for longer.
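For reference, that observation came from the sort of ASH query sketched below (the module predicate is an assumption for illustration; each AWR ASH sample represents roughly 10 seconds of database time):
SELECT TRUNC(sample_time) run_date
,      SUM(10) ash_secs
FROM   dba_hist_active_sess_history
WHERE  session_state = 'ON CPU'
AND    module LIKE 'RPTBOOK%' /* nVision report books - an assumption */
GROUP BY TRUNC(sample_time)
ORDER BY 1;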


    There is no question that effective design and tuning are as important as they ever were.  The laws of physics are the same in the cloud as they are in your own data centre.  We worked hard to get the reporting to this level of performance and down to this CPU usage. 
    The difference is that now you can measure exactly how much that effort is saving you on your cloud subscription, and you can choose to spend more or less on that cloud subscription in order to achieve your business objectives.

    Determining the benefit to the business, in terms of the quantity and cost of users' time, remains as difficult as ever.  However, it was not a major consideration in this example because this all happens before the users are at work.

    In the cloud, you can have as much performance as you are willing to pay for!
