Practical Apps DBA

Oracle Apps DBA’s Blog

EBS R12 on IBM P-Series Running RedHat Linux

Posted by Navdeep Saini on April 29, 2008

A few days ago, one of our customers had an interesting question that caught us a little off guard: is EBS R12 certified for RedHat Linux running on the IBM pSeries (IBM POWER-based systems) or not?

Checking Metalink, it looks like it should be just fine: the certification matrix suggests that the Oracle database on Linux running on IBM Power pSeries is a supported combination. However, on a closer look at Metalink Note 341507.1, you will find that nowhere does it mention EBS R11i, R12 or related products. Another look at Metalink Certify makes the distinction clear:

The important thing to note here is that for the Apps Tier on Linux, only x86 and x86-64 platforms are supported, which means only processor chipsets like x86 (Intel), AMD64 and EM64T qualify. Any other processor chip (e.g. the IBM pSeries, which uses the IBM POWER5+ or POWER5 processor) is NOT supported for the APPS TIER, even though it can run Linux (as per the RedHat certification: RedHat Enterprise Linux: Server Version comparison chart).

The gist of the discussion is that on IBM Power with Linux (pSeries, iSeries, System p5 and System i5), Oracle EBS can run only in a split configuration: the database tier can run on IBM Power with RedHat Linux, but the Apps Tier has to run on AIX. If you want to run the Apps Tier on Linux, it has to be on supported chipsets only, e.g. x86 or x86-64.

Posted in Advanced configurations | 5 Comments »

How did we solve our “log file sync”?

Posted by Navdeep Saini on April 24, 2008

It was one of those intriguing days when, for almost 2-3 days, users/developers/managers had been bugging me about "possible" slowness in our ERP instance. Of course I checked everything: no long-running queries eating up resources, no sudden spurt of sessions, nothing in the AWR report to show off, nothing to show that there was any slowness in the database !!!

I do not normally lose sleep over performance issues, but it was irritating when the apps support guys woke me up in the middle of the night for two consecutive nights. One batch job (conc. request), very critical to the business, was running 5 hours behind schedule. The obvious places to look were dbconsole "top activity" and a trace of the conc. request. The trace file showed that the request was just running crazy (tail -f of the trace), and dbconsole did not show any particular activity that might be causing slowness in the database. I tkprofed (sys=no, sort=exeela,prsela,fchela) the trace and found nothing wrong with the SQLs that were reported. No wait events like "db file sequential/scattered read", apart from a few occurrences of the "log file sync" event. Looking more carefully at "top sessions" in dbconsole, it occurred to me that "log file sync" was indeed reported occasionally, but only for a few seconds.
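As an aside, one common way to trace a running conc. request is to attach event 10046 to its database session from another session. A minimal sketch (the sid and serial# are placeholders you would look up in v$session, e.g. by mapping the request's oracle_process_id to v$process):

-- enable level-8 tracing (includes wait events) for the target session
exec sys.dbms_system.set_ev(1234, 56789, 10046, 8, '');
-- and switch it off once done
exec sys.dbms_system.set_ev(1234, 56789, 10046, 0, '');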

Observing more keenly, I also found that the trace of the session (tail -f of the trace) was freezing occasionally for a few seconds and showing "log file sync" after each freeze. So it seemed "log file sync" was my area of concern, albeit a very minuscule one in the overall database picture.

As per the standard Metalink Note 34592.1 on "log file sync":

The log file sync wait may be broken down into the following components:

    1. Wakeup LGWR if idle
    2. LGWR gathers the redo to be written and issues the I/O
    3. Time for the log write I/O to complete
    4. LGWR I/O post processing
    5. LGWR posting the foreground/user session that the write has completed
    6. Foreground/user session wakeup

Tuning advice based on log file sync component breakdown above:

Steps 2 and 3 are accumulated in the "redo write time" statistic (i.e. as found under the STATISTICS section of Statspack and AWR).
Step 3 is the "log file parallel write" wait event (Note 34583.1: "log file parallel write" Reference Note).
Steps 5 and 6 may become very significant as the system load increases, because even after the foreground has been posted it may take some time for the OS to schedule it to run. This may require monitoring at the O/S level.
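For a quick system-wide look at these numbers, v$system_event can be queried directly; a small sketch (time_waited and average_wait are reported in centiseconds):

select event, total_waits,
       time_waited/100 as seconds_waited,
       average_wait*10 as avg_wait_ms
from v$system_event
where event in ('log file sync', 'log file parallel write');

If the average "log file sync" is much higher than the average "log file parallel write", the time is going into steps 1, 5 and 6 rather than the actual redo IO.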

As mentioned earlier, my AWR report was not showing "redo write time" or "log file parallel write" as significant. Load on the machine was "normal", and there was nothing to point me in a direction in which I could proceed.

The situation was becoming worse, and the batch job finally finished at 7:30 AM, 5:30 hours behind schedule. The bloodhounds (business/apps support) were out after me :-) and they were smelling blood. But I was not letting it go, and turned to my internals knowledge. During my Oracle days, one of the things I learned was that we can 10046 (sql trace) background sessions too. They will only show wait events in the trace files, as they normally do not fire any SQLs. LGWR was my guy this time, and I turned on the trace. VOILA !!! It was showing "log file parallel write" !!
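For anyone wanting to repeat this, here is how the LGWR trace can be switched on from SQL*Plus using oradebug (a sketch; the OS pid is a placeholder taken from the first query):

-- find the OS pid of the LGWR process
select p.spid
from v$process p, v$bgprocess b
where b.paddr = p.addr
and b.name = 'LGWR';

-- attach to it and enable level-8 tracing (wait events included)
oradebug setospid 12345
oradebug event 10046 trace name context forever, level 8
-- print the trace file location
oradebug tracefile_name
-- when finished, switch it off
oradebug event 10046 trace name context off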

However, my hopes were dashed immediately, as it is normal for LGWR to post this event when it writes (IO) to the redo logs. Again I was in the dark and could not fathom what the reason for "log file sync" could be. One wild idea floating in my mind was that there were suddenly more commits in the system. The apps support guys had purged almost 800 million rows from the workflow tables, after fixing an issue with workflow because of which they had not been able to purge the WF tables for almost a year !!! Could it be that, with half the number of rows in the WF tables, the system was now "oiled" and "churning" out more work, with suddenly more commits in the database causing LGWR to lag behind…

Keen observation is something many of us overlook, and I was its victim that day. Looking at the LGWR trace files again, I saw that the elapsed time for "log file parallel write" was sometimes spiking. A simple unix sort on it showed that the elapsed time was sometimes in seconds, whereas, per Oracle, 20k of redo should not take more than 15 milliseconds to write. It was "log file parallel write", a simple stupid IO request on the redo log files, causing this delay. And since LGWR has to write to redo at every commit, it was stalling the whole system !!!

Here is a portion of the LGWR trace showing "log file parallel write" spiking, at times up to 39 seconds (ela values are in microseconds, so the first wait below is about 22 seconds):

WAIT #0: nam='log file parallel write' ela= 22324424 files=2 blocks=2 requests=2 obj#=-1 tim=13758521131245
WAIT #0: nam='rdbms ipc message' ela= 6595 timeout=294 p2=0 p3=0 obj#=-1 tim=13758521138897
WAIT #0: nam='log file parallel write' ela= 2343353 files=2 blocks=64 requests=4 obj#=-1 tim=13758521143128
WAIT #0: nam='rdbms ipc message' ela= 97448 timeout=290 p2=0 p3=0 obj#=-1 tim=13758521262712
WAIT #0: nam='log file parallel write' ela= 1294 files=2 blocks=80 requests=2 obj#=-1 tim=13758521264788
WAIT #0: nam='rdbms ipc message' ela= 9538 timeout=280 p2=0 p3=0 obj#=-1 tim=13758521275843
WAIT #0: nam='log file parallel write' ela= 909835 files=2 blocks=8 requests=2 obj#=-1 tim=13758521278533

The mount points for the redo logs were targeted, and I contacted my SAs for the DMX system. They immediately found that of the four fiber adaptors on the database machine, which connect to the storage array via a SAN switch, one was performing a little slower. They asked for some more time to investigate. I waited for almost one excruciating hour and finally heard back from them: it was indeed the IO sub-system. One of the ports on the SAN switch was "dying", causing frequent IO queues. They swapped the port with another one, and that fixed it !!! So it was just one simple bad port on a SAN switch giving frequent "heart burns" to the database.

Lesson learned… on a very, very busy system, esp. an OLTP system like Oracle Apps, it is absolutely critical that write throughput on the redo logs is maintained at all times. No RAID-5, just simple RAID-1, on fast fiber channels, SAN switches etc.

These screenshots show the situation pictorially:

This was before the SAs fixed the IO sub-system. The orange color represents "commits"; observe its frequent spikes.

This was after the SAs fixed the IO sub-system. There is no orange color.

Posted in Performance Tuning | 25 Comments »

Oracle Apps Migration Path from HP-UX PA-RISC to HP-UX Itanium

Posted by Navdeep Saini on April 7, 2008

Most HP shops running Applications are on PA-RISC based systems. HP plans to offer PA-RISC-based servers through 2008, support those systems through 2013, and continue to push Itanium-based servers. Apps customers are asking for possible migration paths to port their applications from PA-RISC to Itanium-based systems. This post gives a detailed summary of how you can achieve this.

For release 11i, Oracle certified HP Itanium under a "split configuration", wherein your database node can run on HP Itanium while the apps tier runs on other supported platforms. Note that as of 11.5.10.2 the Apps Tier is NOT supported on HP-UX Itanium.
Currently Oracle supports the following platforms for its middle tier:

    1. HP Tru64
    2. HP-UX PA-RISC
    3. IBM AIX Based Systems
    4. Linux x86
    5. Solaris Operating System SPARC
    6. Windows 2000/Windows Server 2003 (x86)

However, as of R12, Oracle has certified HP Itanium for the Apps Tier, which means both your db tier and apps tier can run on HP Itanium. Hence, if you are planning to port your applications (both db and apps tiers) to HP-UX Itanium, you will have to upgrade to R12 first.

Here are the summarized steps:

Database Tier: Migrating the database between HP-UX on PA-RISC and HP-UX Itanium (IPF), in either direction, is possible by copying the datafiles. Starting with RDBMS 9.2.0.7 and 10.2.0.2, it is supported to copy the database files (control files, redolog/archivelog files, data files) from HP-UX on PA-RISC to HP-UX Itanium (IPF) and back. While HP-UX on PA-RISC is big endian because of the processor, the Itanium processor adapts to the endianness of the OS and is in this case also big endian. Follow these steps to migrate the db tier (a quick endianness check follows the list):

    1. Patch the target operating system to the required level and make sure it has all the required OS patches and packages (e.g. ar, cc, aCC, make etc.). For more details check: Oracle Applications Release 12.0.4 Installation Manual

    2. Configure the target system as per Oracle installation and configuration guide.

    3. Install the required Oracle RDBMS software (9.2.0.7 or above, 10.2.0.2 or above) on the target platform (Itanium)

    4. Patch the Oracle RDBMS to the required level.

    5. For copying the database (using tar, cpio, dd etc.) from the source to the target system, create the required volume groups, logical volumes, file systems, raw devices etc. on the target system.

    6. Shut down the Oracle database on the source system.

    7. Copy all the required Oracle database files, including control files, data files, undo and redo logs, the Oracle initialization file etc., to the target system.

    8. If required, regenerate the control file (for example when changing the file locations on the target system).

    9. Set up and configure the Oracle listener etc.

    10. Start up the database on the target system.
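As a quick sanity check on the endianness point above (10g onwards), v$transportable_platform records the endian format Oracle assigns to each platform; both HP-UX flavors should report big endian:

select platform_name, endian_format
from v$transportable_platform
where platform_name like 'HP-UX%';

Also, for step 8, running "alter database backup controlfile to trace" on the source system before the shutdown dumps a CREATE CONTROLFILE script to udump, which you can then edit with the new file locations on the target.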

Apps Tier: For the apps tier you will need to install R12 on Itanium (follow Note 402307.1 – Oracle Applications Installation and Upgrade Notes Release 12 (12.0) for HP-UX Itanium) and upgrade to R12 by running the 4440000.drv driver on the HP-UX Itanium platform.

More to follow on R12 upgrade later in my posts.

Posted in Advanced configurations | 11 Comments »

FND_STATS vs DBMS_STATS

Posted by Navdeep Saini on April 4, 2008

I have been asked this question many times: should we use FND_STATS or DBMS_STATS in 11i (or Release 12)? Most Apps DBAs you ask will say use FND_STATS, and in fact Oracle also recommends FND_STATS in 11i E-Business environments. But do you know the basic differences, or rather the advantages, of using FND_STATS over DBMS_STATS?
Here I have tried to compare the two…
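To make the comparison concrete, here is how the two are typically invoked (a sketch; the schema name is just an example):

-- the EBS way: FND_STATS, which is also what the
-- "Gather Schema Statistics" concurrent program calls
exec fnd_stats.gather_schema_statistics('AP');

-- the plain DBMS_STATS equivalent
exec dbms_stats.gather_schema_stats(ownname => 'AP', cascade => TRUE);

Among other things, FND_STATS keeps a history of its runs (so previous statistics can be backed up and restored) and follows EBS-specific conventions, which is why Oracle recommends it over calling DBMS_STATS directly on an E-Business database.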

Posted in Performance Tuning | 12 Comments »

How to recover and open the database if the archive log required for recovery is missing.

Posted by kalpit on April 1, 2008

Last week, I had an interesting recovery scenario at a client site: we had to recover one of our development databases from an old backup.
As part of the recovery process, the restore went fine and we were also able to re-create the controlfile. During recovery, however, the database asked for archive logs. We checked with our Unix team and found out they did not have the required archive logs.

It was critical for us to recover database because of some project deadline.

Error:

SQL> recover database until cancel using backup controlfile;
ORA-00279: change 9867098396261 generated at 03/21/2008 13:37:44 needed for
thread 1
ORA-00289: suggestion : /arcredo/XSCLFY/log1_648355446_2093.arc
ORA-00280: change 9867098396261 for thread 1 is in sequence #2093

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01195: online backup of file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u100/oradata/XSCLFY/SYSTEM01_SCLFY.dbf'
ORA-01112: media recovery not started

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01195: online backup of file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u100/oradata/XSCLFY/SYSTEM01_SCLFY.dbf'

After doing some research, I found that a hidden parameter (_ALLOW_RESETLOGS_CORRUPTION=TRUE) would allow us to open the database even though it was not properly recovered.

We forced the database open by setting _ALLOW_RESETLOGS_CORRUPTION=TRUE. It allowed us to open the database, but the instance crashed immediately after opening. I checked the alert.log and found that we had undo tablespace corruption.

The alert log showed the errors below:

Errors in file /u01/XSCLFYDB/admin/XSCLFY/udump/xsclfy_ora_9225.trc:
ORA-00600: internal error code, arguments: [4194], [17], [9], [], [], [], [], []
Tue Mar 25 12:45:55 2008
Errors in file /u01/XSCLFYDB/admin/XSCLFY/bdump/xsclfy_smon_24975.trc:
ORA-00600: internal error code, arguments: [4193], [53085], [50433], [], [], [], [], []
Doing block recovery for file 433 block 13525
Block recovery from logseq 2, block 31 to scn 9867098416340

To resolve the undo corruption issue, I changed undo_management to "MANUAL" in the init.ora. That allowed us to open the database successfully. Once the database was up and running, I created a new undo tablespace and dropped the old corrupted one, then changed undo_management back to "AUTO" and pointed undo_tablespace to the new undo tablespace.

That resolved our issue, and the database was up and running again without any problem.

_ALLOW_RESETLOGS_CORRUPTION=TRUE allows the database to open without consistency checks. This may result in a corrupted database, so the database should be recreated afterwards.

As per Oracle Metalink, there is no 100% guarantee that setting _ALLOW_RESETLOGS_CORRUPTION=TRUE will open the database. However, once the database is opened, we must immediately rebuild it. A database rebuild means: (1) perform a full database export, (2) create a brand new, separate database, and (3) import the export dump. This option can be tedious and time consuming, but once we successfully open the new database we can expect minimal, or perhaps no, data loss. Before you try this option, ensure that you have a good and valid backup of the current database.

Solution:

1) Set _ALLOW_RESETLOGS_CORRUPTION=TRUE in the init.ora file.
2) Startup mount.
3) Recover database.
4) Alter database open resetlogs.
5) Set undo_management to "MANUAL" in the init.ora file.
6) Startup database.
7) Create a new undo tablespace.
8) Change undo_management to "AUTO" and undo_tablespace to "NewTablespace".
9) Bounce database.
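For what it is worth, here is the same sequence as a SQL*Plus sketch (tablespace names, file names and sizes are hypothetical, and this is for a throwaway or development database only, since the parameter is unsupported):

-- init.ora: _allow_resetlogs_corruption = TRUE
SQL> startup mount
SQL> recover database until cancel using backup controlfile;
SQL> alter database open resetlogs;

-- after the crash, set undo_management = MANUAL in init.ora, then:
SQL> startup
SQL> create undo tablespace UNDOTBS2
  2  datafile '/u100/oradata/XSCLFY/undotbs2_01.dbf' size 2000m;
SQL> drop tablespace UNDOTBS1 including contents and datafiles;

-- finally set undo_management = AUTO and undo_tablespace = UNDOTBS2
-- in init.ora, and bounce:
SQL> shutdown immediate
SQL> startup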

Posted in Backup & Recovery | 42 Comments »

Back with a BANG !!!

Posted by Navdeep Saini on March 27, 2008

OK folks, enough is enough. I am back in the blogosphere and gonna write some really cool stuff.
This time I promise my visitors that you will be dazzled !!!

Hang in there and keep checking my blog.

Posted in Authors | 2 Comments »

I will Write Soon….

Posted by Navdeep Saini on November 3, 2007

Friends, I know I have not been writing lately… it is a combination of laziness and some important engagements.
I will update soon about some exciting new things I have learned during my "exile"…

-nav

Posted in Authors | 6 Comments »

Database 11g announced for July 11, 2007

Posted by Navdeep Saini on June 6, 2007

OK folks, for those of you who were tired of 10g :-) here is the new database release from Oracle.
It is official now that it will be called 11g.

For more details see: Oracle Database 11g Launch – New York City

Posted in Database Stuff | Leave a Comment »

Script to kill Inactive forms sessions

Posted by Navdeep Saini on May 22, 2007

Often, due to user training issues, users leave their forms sessions open and do not end their sessions. This can cause lots of problems, as sometimes even querying in a form places a lock (I know of Order Entry forms which used to behave like that), and as the DBA you end up clearing blocking sessions in the database.

Oracle Apps has timeout profile options (e.g. ICX Session Timeout) and configuration options (Apache timeouts etc.), but most of them do not end the database session. The following script checks the forms sessions in the database and kills the database sessions, and their corresponding forms (f60webmx) processes, which are "INACTIVE" and idle (LAST_CALL_ET) for more than 1 hour.

This works for all types of configurations, whether single node or multi-node. For multi-node, the requirement is to enable remsh (remote shell); otherwise change the script to use ssh (the more secure option). It also creates a log, appends to it, and sends e-mail if there are any sessions that need to be killed.
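The core of the script is a query along these lines (a sketch, not the exact SQL from the script; the program filter is an assumption about how forms sessions appear in v$session):

-- INACTIVE forms sessions idle for more than an hour
select s.sid, s.serial#, p.spid as db_ospid, s.process as client_pid
from v$session s, v$process p
where s.paddr = p.addr
and s.status = 'INACTIVE'
and s.last_call_et > 3600
and s.program like 'f60webmx%';

Each row gives both halves of the kill: sid/serial# for alter system kill session, and the client pid for killing the corresponding f60webmx process on the forms node.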

Click here to download script

Posted in My Scripts | 33 Comments »

How to force hints using Outlines

Posted by Navdeep Saini on May 18, 2007

Oracle introduced plan stability in the form of Outlines in version 8i. However, I have always been intrigued by how to force a hint into a query using an outline. This is particularly useful for an Apps DBA, as sometimes it is not possible to change the query and put the hint in it, e.g.
if the query is being fired from an executable which cannot be edited, like INCOIN for Item Import in Oracle Apps.

Also, changing the SQL will often lead to invalidations and require recompilation, which may mean downtime; e.g. if you want to force a hint into a query fired from a package, you will have to change the package and recompile it and its dependencies.

If you want to force a hint to a query in a running production system and do not have downtime to do it, here is an example how you can achieve it:

Note: Always make sure you get the exact SQL from the shared pool. Outlines are very sensitive, and even one extra blank space will prevent the outline from matching.

1. Take the SQL from the shared pool (use hash_value in 9i or sql_id in 10g).
e.g. the following SQL is in the shared pool:
select /*+ FIRST_ROWS +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2

2. This SQL sometimes uses index AD_PARALLEL_UPDATE_UNITS_U1, and you have to force the use of AD_PARALLEL_UPDATE_UNITS_N1 by using hints….

select /*+ index( ad_parallel_update_units AD_PARALLEL_UPDATE_UNITS_N1) FIRST_ROWS +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2

3. Create an outline on the original sql:
CREATE OR REPLACE OUTLINE ADPARALLEL_ORIG ON
select /*+ FIRST_ROWS +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2;

4. Create an outline for the hinted sql
CREATE OR REPLACE OUTLINE ADPARALLEL_HINT ON
select /*+ index( ad_parallel_update_units AD_PARALLEL_UPDATE_UNITS_N1) FIRST_ROWS +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2;

5. Update the OUTLN.OL$HINTS table to swap the hints of the original SQL with those of the "hinted" SQL:
UPDATE OUTLN.OL$HINTS
SET OL_NAME=DECODE(OL_NAME,'ADPARALLEL_HINT','ADPARALLEL_ORIG','ADPARALLEL_ORIG','ADPARALLEL_HINT')
WHERE OL_NAME IN ('ADPARALLEL_ORIG','ADPARALLEL_HINT');
commit;
(If the two outlines contain a different number of hints, the HINTCOUNT column in OUTLN.OL$ may need to be swapped between the two rows as well.)

6. You can drop the "hinted" outline at this point:
DROP OUTLINE ADPARALLEL_HINT;

7. At this point, if the SQL select /*+ FIRST_ROWS +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2
is parsed again, it will use the outline and will be forced to use index AD_PARALLEL_UPDATE_UNITS_N1, no matter what. You have to make sure that the cursor for this SQL is invalidated and the query is parsed again; otherwise the database will continue using the old plan, cached earlier, until the next hard parse of this SQL. One way to ensure a hard parse is to flush the shared pool:
alter system flush shared_pool;
However, be careful: this will invalidate all cursors and cause hard parsing for all the SQLs being fired in the database.

8. You can also monitor whether the query is using the outline by running the following SQLs:

    1. select name, used from dba_outlines where name='ADPARALLEL_ORIG';
    The column USED will show a value of 'USED' or 'UNUSED' accordingly.

    2. select sql_text, outline_category from v$sql where hash_value='Your hash value'; -- for 9i
    select sql_text, outline_category from v$sql where sql_id='Your sql id'; -- for 10g

    The column OUTLINE_CATEGORY will show a value, e.g. DEFAULT in our case, as we created the outline in the default category.
    If the value is null, then the outline is not being used. This is the most foolproof and quickest way of checking whether the current SQL in the SQL area is using the outline or not.

    3. Run event 10053 for the SQL. Event 10053 traces the CBO computations (refer to Metalink Note 225598.1):

    1. alter session set events '10053 trace name context forever, level 1';
    2. then run explain for the SQL, e.g.
    explain plan for
    select /*+ FIRST_ROWS +*/ * from ad_parallel_update_units where update_id = :b1 and status = :b2

    A trace will be generated in your udump directory; look for words like "outline" in it. In 10g you will see something like:
    Outline Data:
    /*+
    BEGIN_OUTLINE_DATA…..

Note: There is a funny problem up to 9i. In order to enable the usage of outlines in the database, you have to alter session or system to set "use_stored_outlines = {TRUE|category|FALSE}"; e.g. to enable usage of the default outline category system-wide you will have to:
alter system set use_stored_outlines = true;
However, during the next restart of the database this value will be lost, and you will stop using the outlines you created above. One solution is to create a system-level "on database startup" trigger and run the alter system command in it (a sketch follows). This problem is not seen in 10g.
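A minimal sketch of that startup trigger (the trigger name is arbitrary):

create or replace trigger enable_outlines_trg
after startup on database
begin
  -- use_stored_outlines is lost on restart,
  -- so re-enable it on every startup
  execute immediate 'alter system set use_stored_outlines = true';
end;
/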

References: Note 92202.1

Posted in Performance Tuning | 15 Comments »

 