Practical Apps DBA

Oracle Apps DBA’s Blog

Archive for the ‘Performance Tuning’ Category

Perf Tune

How did we solve our “log file sync” ?

Posted by Navdeep Saini on April 24, 2008

It was one of those intriguing days when for almost 2-3 days users/developers/managers has been bugging me for “possible” slowness in our EPR instance. Ofcourse I checked everything: no long running queries eating up resources, no sudden spurt of sessions, nothing in AWR report to show off, nothing to show that there is slowness in the database !!!

I do not normaly loose sleep over performance issues, but it was irritating when apps support guys woke me up in middle of night for two consecutive nights. One batch job (conc. request) very critical to business, was running 5 hours behind schedule. Obvious place to look for was dbconsole “top activity” and tracing the conc. request. The trace file showed that request was just running crazy (tail -f of trace) and dbconsole did not show any particular activity which might be causing the slowness in the database. I tkprofed (sys=no, sort=exeela,prsela,fchela) the trace and found nothing wrong with the sqls that were reported. No wait events like “db file sequential/scattered read”, apart from few occurances of “log file sync” event. Looking more carefuly in the “top sessions” in dbconsole it occured to me that indeed “log file sync” was reported occasionally but only for few seconds.

Observing more keenly I also found that trace of the sessions (tail -f of trace) was freezing occasionaly for few second and showing “log file sync” after it freezes. So it seems “log file sync” is my area of concern, albeit it was very miniscule in the overall database scenario.

As per metalink standard Note:34592.1 on “log file sync”:

The log file sync wait may be broken down into the following components:

    1. Wakeup LGWR if idle
    2. LGWR gathers the redo to be written and issue the I/O
    3. Time for the log write I/O to complete
    4. LGWR I/O post processing
    5. LGWR posting the foreground/user session that the write has completed
    6. Foreground/user session wakeup

Tuning advice based on log file sync component breakdown above:

Steps 2 and 3 are accumulated in the “redo write time” statistic. (i.e. as found under STATISICS section of Statspack and AWR)
Step 3 is the “log file parallel write” wait event. (Note 34583.1:”log file parallel write” Reference Note )
Steps 5 and 6 may become very significant as the system load increases. This is because even after the foreground has been posted it may take a some time for the OS to schedule it to run. May require monitoring from O/S level.

As mentioned earlier, my AWR report was not showing “redo write time” and “log file parallel write” to be that significant. As such Load on the machine was “normal” and there was nothing that could point me to a direction in which I could proceed.

The situation was becoming worse and batch job finally finished at 7:30 AM in the morning, 5:30 hours behind schedule. Blood hounds (business/apps support) were out after me :-) and they were smelling blood. But I was not letting it go and turned to my internals knowledge. During my oracle days one of the thing I learned was that we can 10046 (sql trace) trace background sessions also. They will only show wait events in trace files as they normally do not fire any sqls. LGWR was my guy this time and I turned on the trace. VOILA !!! it was showing “log file parallel write” !!

However my hopes were dashed immiediately as it was normal for LGWR to post this event as it writes (IO) to redo logs. Again I was in dark and could not fathom what could be the reason for “log file sync”. One wild idea that was floating in my mind was that it could be that there are suddenly more commits in the sytem. Apps support guys purged almot 800 millions rows in workflow tables, as they fixed some issue with workflow becuase of which they were not able to purge WF tables for almost a year !!!. Could it be that there are now 1/2 the number of rows in WF tables which might have “oiled” the system and now it was “churing” out more !!! There is suddenly more commits in the database and causing the LGWR to lag behind…..

Keen observation is one common concept that many try to overlook and I was the one who was the victim of it that day. Looking at LGWR trace files again, I saw that elapsed time for “log file parallel write” was sometimes spiking. Simple unix sort on it showed that “elapsed time” was sometimes in secs where-as, as per oracle for 20k of redo it should not take more than 15 milliseconds. It was “log file parallel write” , a simple sutpid IO request on redo log files causing this delay. And since LGWR has to write to redo at every commit it was stalling the whole system !!!

Here is portion of lgwr trace showing “log file parallel write” spiking to sometime to 39 secs !!

WAIT #0: nam=’log file parallel write’ ela= 22324424 files=2 blocks=2 requests=2 obj#=-1 tim=13758521131245
WAIT #0: nam=’rdbms ipc message’ ela= 6595 timeout=294 p2=0 p3=0 obj#=-1 tim=13758521138897
WAIT #0: nam=’log file parallel write’ ela= 2343353 files=2 blocks=64 requests=4 obj#=-1 tim=13758521143128
WAIT #0: nam=’rdbms ipc message’ ela= 97448 timeout=290 p2=0 p3=0 obj#=-1 tim=13758521262712
WAIT #0: nam=’log file parallel write’ ela= 1294 files=2 blocks=80 requests=2 obj#=-1 tim=13758521264788
WAIT #0: nam=’rdbms ipc message’ ela= 9538 timeout=280 p2=0 p3=0 obj#=-1 tim=13758521275843
WAIT #0: nam=’log file parallel write’ ela= 909835 files=2 blocks=8 requests=2 obj#=-1 tim=13758521278533

Mount points for redo logs was targeted and I contacted my SAs for DMX system. They immiediately found that out of four fiber adaptors on database machine , which connects to storage array via SAN switch, one was performing little slower. They asked for some more time to investigate. I waited for almost 1 excrutiating hour and finally heard back from them. It was indeed the IO sub-system. One of the port on SAN switch was “dying” causing frequent IO queues. They swapped the port with another one and it fixed it !!!. So it was just one simple bad port on SAN switch which was giving frequent “heart burns” to the database.

Leason learned…on a very very busy system, esp. an OLTP like Oracle APPS, it is absolutely critical that write throughput on redo logs is maintained at every time. No RAID-5, just simple RAID-1, on faster fiber channels, san switches etc..

These screen shots shows pictographically the situation:

This was before SAs fixed IO sub-system. The orange colors is for “commits”. Observe the frequent spikes of Orange color.

This was after SAs fixed IO sub-system. There is no orange color.

Posted in Performance Tuning | 22 Comments »

FND_STATS vs DBMS_STATS

Posted by Navdeep Saini on April 4, 2008

I have been asked this questions many times. Shoud we use fnd_stats or should we use dbms_stats in 11i (or release 12). Most of the Apps DBAs you will face will say use fnd_stats. In fact Oracle also recommends to use fnd_stats in 11i E-Business Environments. But do you know what are the basic difference or you can say advantages or using FND_STATS over DBMS_STATS.
Here I have tried to compare the two…

Posted in Performance Tuning | 7 Comments »

How to force hints using Outlines

Posted by Navdeep Saini on May 18, 2007

Oracle introduced plan stability in form of Outlines in version 8i. However I have always been intrigued as to how to force a hint to a query using outline. This is particularly useful for Apps DBA, as sometimes it is not possible to change the query and put the hint in it, e.g.
If a query is being fired from some executable which cannnot be edited, like INCOIN for Item Import in Oracle Apps.

Also, often, changing sql will lead to invalidations and will require recompilation which means we may need downtime, e.g. if you want to force hint to a query which is being fired from package, you will have to change the package and recompile it and its dependencies.

If you want to force a hint to a query in a running production system and do not have downtime to do it, here is an example how you can achieve it:

Note: Always make sure you get the exact sql from the shared_pool. Outlines are very sensitive and even one blank space will invalidate the outlines.

1. Take the sql from the shared_pool. (Use hash_value in 9i or sql_id in 10g).
e.g. The following sql is in shared_pool:
select /*+ FIRST_ROWS +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2

2. This sql sometimes uses index AD_PARALLEL_UPDATE_UNITS_U1 and you have to force use of AD_PARALLEL_UPDATE_UNITS_N1 by using hints….

select /*+ index( ad_parallel_update_units AD_PARALLEL_UPDATE_UNITS_N1FIRST_ROWS) +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2

3. Create an outline on the original sql:
CREATE OR REPLACE OUTLINE ADPARALLEL_ORIG
select /*+ FIRST_ROWS +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2;

4. Create an outline for the hinted sql
CREATE OR REPLACE OUTLINE ADPARALLEL_HINT
select /*+ index( ad_parallel_update_units AD_PARALLEL_UPDATE_UNITS_N1) FIRST_ROWS) +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2;

5. Update the OUTLN.OL$HINTS table to replace the outline of orig sql with the “hinted” sql:
UPDATE OUTLN.OL$HINTS
SET OL_NAME=DECODE(OL_NAME,’ADPARALLEL_HINT’,'ADPARALLEL_ORIG’,'ADPARALLEL_ORIG’,'ADPARALLEL_HINT’)
WHERE OL_NAME IN (‘ADPARALLEL_ORIG’,'ADPARALLEL_HINT’);
commit;

6. You can drop the “hinted” sql at this point:
DROP OUTLINE ADPARALLEL_HINT

7. At this point if the sql: select /*+ FIRST_ROWS +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2
is parsed again, it will use the outline and will be forced to use index AD_PARALLEL_UPDATE_UNITS_N1, no matter what. You have to make sure that the cursor for this sql is invalidated and query is parsed again, otherwise database will continue using the old plan, which was cached before until next hard parse of this sql. One way to ensure hard parse is to flush shared pool:
alter system flush shared_pool.
However you will have to be careful as this will invalidate all the cursors and will cause hard parsing for all the sqls that are being fired in the database.

8. You can also monitor if the query is using outline or not by running following sqls:

    1. select name,used from dba_outlines where name=’ADPARALLEL_ORIG’
    The column “USED” will show value of ‘USED’ or ‘UNUSED’ accordingly.

    2. Select sql_text,outline_category from v$sql where hash_value=’Your hash value’ — For 9i
    Select sql_text,outline_category from v$sql where sql_id=’Your sql id’ — For 10g

    The column “OUTLINE_CATEFORY” will show some value e.g. DEFAULT in our case as we created outline in default category.
    If the value is null then outlines are not being used. This is the most fool proof and quick way of checking whether the current sql in the sql area is using the outline or not.

    3. Run event 10053 for the sql. Event 10053 traces the CBO Computations. (refer metalink note#225598.1)

    1. alter session set events =’10053 trace name context forever, level 1;
    2. then run explain for the sql e.g.
    explain plan for
    select /*+ FIRST_ROWS +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2

    A trace will be generated in your udump directory and look for words like outline in it. In 10g you will see something like:
    Outline Data:
    /*+
    BEGIN_OUTLINE_DATA…..

Note: There is a funny problem until 9i. In order to enable usage of outlines in database you have to alter session or system to enable “use_stored_outlines = {TRUE|category|FALSE}” e.g. to Enable usage of “Default outline category” system-wide you will have to :
ater system set use_stored_outlines = true;
However during next restart of database this value will be lost and you will stop using outline that you have created above. One solution is to create system level “on database startup” trigger and run alter system command in it. This problem is not seen in 10g.

References:Note:92202.1

Posted in Performance Tuning | 13 Comments »

Using Hints in SQLs versus Using Outlines

Posted by Navdeep Saini on December 23, 2006

Many times as DBAs or sometimes developer we need to fix a performance issue and mostly the fix is adding hint/s to the sql.

There are two options to put a hint:
1. Put the hint in the sql itself and redeploy the code (works well if it is custom code).
2. Create an outline with hint for the sql (works well for standard code and most of the time when we cannot change the code).

Attached is the document I prepared, that compares both the techniques. It helps in taking a decision as to what approach to use when.
hints-in-sqls-vs-outlines.doc

I will discuss more on creating outlines (esp. in Oracle Apps they are very helpful) in my later posts. Till then keep checking my blog…………….

Posted in Performance Tuning | Leave a Comment »

 
Follow

Get every new post delivered to your Inbox.

Join 28 other followers