Understanding parallel processing and the Siebel upgrade

Introduction

The Siebel upgrade process supports two types of parallel processing.  Both can help decrease the time required to run your production upgrade.  In order to avoid confusion lets start by clarifying the two types of parallel processing.

  • Use of the Oracle keywords Parallel Nologging on index builds by ddlimp.  This type of parallel processing is enabled by selecting the parallel option when prompted by the Siebel upgrade wizard.  This option is only available with Oracle and does not apply to SQL Server or DB2.
  • Running multiple copies of ddlimp to create different sets of indexes at the same time.  This type of parallel processing is available for all databases.  This option is enabled by using the Siebel upgrade tuner to create multiple ddlimp streams that will run at the same time.  We will refer to this type of parallel processing as index build swim lanes in order to avoid confusion.

A key point to keep in mind with all Oracle index builds regardless of the parallel option is the amount of memory available for index builds is very important.  Index builds do two main things in order to build an index: reads and sorts.  If you don’t allocate enough memory then your sorts will require temp file usage and these index builds will be slower.  The amount of memory available for sort operations is controlled by the parameter pga_aggregate_target if you are living in the modern world or sort_area_size if you are still doing things the old-fashioned way.

Oracle Parallel Index Creation

If you specify “Y” when prompted by the upgrade wizard to use the Oracle parallel option, ddlimp will add the keywords “Parallel Nologging” to all of your index builds.  This is actually a mixed blessing since you really only want this option for the large indexes. 

The majority of indexes that are built during upgrep are indexes on EIM tables.  One of the first things upgrep does is to drop all EIM tables and when these tables get dropped so are their indexes.  Specifying Parallel Nologging for these empty EIM tables will make the index builds take longer.  When the Oracle parallel option was first introduced, a number of my colleagues at Siebel reported their index builds taking much longer than expected.  How much longer you ask?  The worst case we have seen was EIM index creation that normally would take around 30 minutes took 8 hours on a 72 processor Sun E10K server.  For this reason, we normally never use this option on development upgrades.

The key to fast index creation on Siebel upgrades is to specify Oracle parallel for the really big tables but not specify it for small tables and empty tables like the EIM tables.  We will cover this technique later when we discuss multiple ddlimp swim lanes.

So how do you know if you should use the parallel option on your Siebel index builds?  We do test index creates before running the upgrade.  To do this we, check our development upgrade logs and find an index created on your largest table.  Then simply create the index with and without the Parallel Nologging option and compare the difference.   It is a good idea to repeat the test several times to ensure the first index build was not biased by not having data loaded into the buffer pool.

If you decide to go with the parallel index option, you need to make sure certain key init.ora parameters are set correctly.  The parameters of concern are:

  • cpu_count – number of cpu’s.  For dual core or hyperthreaded processes this number will normally be the number of cores available.
  • parallel_max_servers – maximum number of parallel processes that can run in the Oracle instance.
  • parallel_threads_per_cpu – Number of threads per cpu.  Normally defaults to 2.

For Siebel upgrades these parameters are important for a number of reasons.  First, when the keyword Parallel is given without a parallel degree (e.g. parallel (16)) then Oracle uses cpu_count * parallel_threads_per_cpu = n parallel threads.  For example, if cpu_count = 16 and parallel_threads_per_cpu = 2 then 16 * 2 = 32 threads will be used.  Not exactly.  If you monitor v$px_process or just use TOAD you will see 64 threads launched (numbered 0 – 63).  Why?  32 threads will scan the table while another 32 threads will sort the results from the scanners.

If parallel_threads_per_cpu is not large enough, then problems can occur.  On one upgrade with both Oracle parallel and multiple ddlimp swim lanes performance was horrible until we realized parallel_threads_per_cpu was not set high enough.  In this case, only one swim lane at a time would run while the other swim lanes waited for enough parallel treads to become available.

Given all of this the correct math to determine the minimum value for parallel_max_servers on your Siebel upgrade is:

(((cpu_count * parallel_threads_per_cpu) * 2 ) * swim_lane_count) + safety_factor

where swim_lane_count is the number of swim lanes that are using the Oracle parallel option and saftey_factor is some number, say 10, so that DB monitoring tools such as Quest won’t alert you that you are close to hitting your max.

Multiple ddlimp Index Create Swim Lanes

The Siebel upgrade includes a utility called  the Upgrade Tuner.  In addition to eliminating zero-row SQL statements this utility also allows you to move certain tables and their corresponding indexes into different ddlimp instances such that will all run at the same time.  As an example we might have four ddlimp swim lanes such as

  • Huge table all by itself
  • Three large tables
  • Ten fairly large tables
  • Everything else including EIM tables and all small tables

The number of ddlimp swim lanes and tables in each depends on how long particular index builds take.  We normally use the standard project management techniques to determine the critical path and move things around so that we minimize the total duration for index builds.  It is important to note that there will be a limit on how many swim lanes you will want to create since these swim lanes will be competing for cpu, memory and disk resources and eventually you will reach a point of diminishing returns with too many swim lanes actually taking longer.

Unfortunately, Siebel does not provide an option to turn the parallel option on or off for a given swim lane.  You either get Oracle Parallel Nologging for all swim lanes or no swim lanes.  In the example above we would turn the parallel option off for swim lane number 4.  Please contact us if you would like to know how you selectively turn parallel on or off for a given swim lane.
 

 

Copyright Ponder Professional Services, Inc. 2006