Just Oracle: Database general activities and issues II

Continuing the prevous topic with the same heading

Loss of TEMP Datafile
Symptoms: On large sorts (select distinct, order by, group by, union) that can't be done in memory, the sort will fail with "ORA-01157: cannot identify data file 3 - file not found" if the loss happened in the middle of the sort, or "ORA-01116: error in opening database file 3" if the loss happened before the sort started, along with the file name "ORA-01110: data file 3: '/u03/oradata/PROD/temp_PROD_01.dbf'". Nothing is put into the alert.log file, and no trace files will be generated for this.
You may also need to drop and recreate the TEMP tablespace after a database recovery, which may be indicated by an error message like "ORA-25153: temporary tablespace is empty" when you first log into your application (such as SCT Banner), although you probably won't get that message just logging into sqlplus.
Action: All you need to do is to take the datafile offline and drop and recreate the TEMP tablespace, which can be done either while the database is still running (from sqlplus) or beginning at the mount state (when connected as sysdba). Here, we are using a locally managed temporary tablespace.
$ sqlplus "/ as sysdba"
SQL> alter database datafile '/u03/oradata/PROD/temp_PROD_01.dbf' offline;
Then, if from mount state, do an "alter database open;" here.
SQL> select file_name,bytes/1024 kbytes from dba_temp_files;
Shows the size of the TEMP datafile(s) in Kbytes.
SQL> select initial_extent/1024 kbytes from dba_tablespaces
where tablespace_name = 'TEMP';
Shows the uniform size of the extents in Kbytes.
SQL> drop tablespace temp;
SQL> !rm /u03/oradata/PROD/temp_PROD_01.dbf
SQL> create tablespace temp
tempfile '/u03/oradata/PROD/temp_PROD_01.dbf' size 40064K
extent management local uniform size 640K;

TEMP Datafile Offline

Symptoms: Similar to Loss of TEMP Datafile, except for the message "ORA-00376: file 3 cannot be read at this time", along with the file name message.
Action: You can offline the datafile and drop and recreate the tablespace, as above, or you can do a datafile recovery with the database open and bring the datafile online.
SQL> recover automatic datafile '/u03/oradata/PROD/temp_PROD_01.dbf';
SQL> alter database datafile '/u03/oradata/PROD/temp_PROD_01.dbf' online;
Loss of INACTIVE Online Redo Log Group (which had already been archived)
Symptoms: The database crashes when an attempt is made to access that redo log group during a logfile switch. Current users will be kicked out with "ORA-01092: ORACLE instance terminated. Disconnection forced", and users trying to log on will get "ERROR: ORA-03114: not connected to ORACLE" and "ERROR: ORA-00472: PMON process terminated with error". Nothing will be placed in the alert.log file; however, there will be pmon (process monitor), dbwr (database writer), and lgwr (log writer) trace files generated in the background dump dest directory. Check these from the unix command line, since the database is not up to check them from:
$ grep background_dump_dest /u00/oracle/product/v723/dbs/initPROD.ora
Shows the pathname of the bdump directory.
$ cd /u00/oracle/admin/PROD/bdump
$ ls -ltr *.trc
Get the latest trace files in the bdump directory, such as pmon_13612.trc and lgwr_32306.trc (the last ones listed).
$ cat pmon_13612.trc
Shows "ORA-00470: LGWR process terminated with error" (also in dbwr).
$ cat lgwr_32306.trc
Shows "ORA-00313: open failed for members of log group 3 of thread 1" and messages containing the missing log file names like "ORA-00312: online log 3 thread 1: '/u03/oradata/PROD/log_PROD_3B.rdo'".
Action: You will need to startup the database in the mount state, drop the missing logfile group (and remove the group's member files, if they are not already gone), and add the logfile group back, at which time the database can be opened for use.
$ sqlplus "/ as sysdba"
SQL> startup
Shows "ORA-01081: cannot start already-running ORACLE - shut it down first".
SQL> startup force
Or, you could have done a "shutdown abort" followed by a "startup". Shows "Database mounted." and then "ORA-00313: open failed for members of log group 3 of thread 1" with additional messages containing the log file names.
SQL> select bytes/1024 from v$log where group# = 3;
Shows the size in K of the missing group's members.
SQL> select member from v$logfile where group# = 3;
Shows the member (file) names of the missing group.
SQL> alter database drop logfile group 3;
SQL> alter database add logfile group 3
('/u03/oradata/PROD/log_PROD_3A.rdo',
'/u03/oradata/PROD/log_PROD_3B.rdo') size 500K;
Recreates all member files for that group of the given size.
SQL> alter database open;
If just one member of group is lost, the other member(s) will take over and allow Oracle to keep running with no apparent problem. However, alert_PROD.log will show errors such as "ORA-00313: open failed for members of log group 3 of thread 1" and "ORA-00312: online log 3 thread 1: '/u03/oradata/PROD/log_PROD_3A.rdo'" on each log switch for that group. In this case, drop and recreate the member:
SQL> select * from v$log where group# = 3;
If the status is active or current for that group, do an "alter system switch logfile;"
SQL> alter database drop logfile member
'/u03/oradata/PROD/log_PROD_3A.rdo';
SQL> alter database add logfile member
'/u03/oradata/PROD/log_PROD_3A.rdo' to group 3;

Loss of CURRENT Online Redo Log Group (which needed to be archived)

Symptoms: This behaves like the database freezing when the archivelogs disk volume is filled up; however, doing a "df -k" shows that there is plenty of room for more archivelogs. Current users' sessions will freeze, and users trying to log on will get "ERROR: ORA-00257: archiver error. Connect internal only, until freed.", since the archiver process is still waiting to archive that redo log group after Oracle has cycled through the other groups and needs to use that group again. To verify this conclusion:
$ sqlplus "/ as sysdba"
SQL> select value from v$parameter where name = 'background_dump_dest';
Shows the pathname of the bdump directory.
SQL> !tail -200 /u00/oracle/admin/PROD/bdump/alert_PROD.log
Shows "ORA-00286: No members available, or no member contains valid data", indicating that the redo log group is missing or corrupt, as well as messages such as "ORACLE Instance PROD - Can not allocate log, archival required", "Thread 1 cannot allocate new log, sequence 21", and "All online logs needed archiving".
SQL> !ls -ltr /u00/oracle/admin/PROD/bdump/arch*
Get the latest archiver trace file in the bdump directory, such as arch_22882.trc (the last one listed).
SQL> !cat /u00/oracle/admin/PROD/bdump/arch_22882.trc
Also shows "ORA-00286: No members available, or no member contains valid data".
SQL> shutdown abort
SQL> startup
Shows "ORA-00313: open failed for members of log group 2 of thread 1", or whatever group number has the problem.
Then, if you had tried to just drop and recreate the redo log group from this mount state (as in loss of inactive online redo log group), you would have gotten:
SQL> alter database drop logfile group 2;
Fails with "ORA-00350: log 2 of thread 1 needs to be archived".
Action: This requires an incomplete recovery to just before that group was used, since Oracle can't continue without doing a successful archive, and since you can't drop and recreate the redo log group while the archiver has it marked for archiving. Note that all changes in the lost redo log group will be lost (yuck!). An incomplete time-based recovery is shown below:
SQL> connect / as sysdba
SQL> shutdown abort
At this point, restore ALL datafiles from the last backup, any archivelogs that are needed since that time, AND the lost online redo log group's files (this is the only case in which you would restore those redo logs), but, NOT the control files.
SQL> startup mount
SQL> select group#, sequence#, bytes, first_change#, to_char(first_time,'DD-MON-YY HH24:MI:SS'), status from v$log;
Get the first time for changes in that missing group, subtract 1 second from it, and use that value in the recover command below.
SQL> recover automatic database until time '2005-02-14:12:59:59';
SQL> alter database open resetlogs;
Then, immediately shut down the database and make a BACKUP, since the old backup is unuseable after RESETLOGS is used.

Failure During Hot Backup

If you have a failure while you are doing a hot backup on a tablespace, besides doing any recovery that is needed for the particular failure, you will also need to bring those tablespace datafiles back out of hot backup mode. To do this, while the database is in a mount state, do an "end backup" on each of those datafiles before opening the database. This is available in Oracle 7.2 and above - before that, you would have to restore all the datafiles and do an incomplete recovery to the time before the hot backup was started on that tablespace.
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> select df.name,bk.time from v$datafile df,v$backup bk
where df.file# = bk.file# and bk.status = 'ACTIVE';
Shows the datafiles currently in a hot backup state.
SQL> alter database datafile
'/u03/oradata/PROD/devl_PROD_01.dbf' end backup;
Do an "end backup" on those listed hot backup datafiles.
SQL> alter database open;

Loss of Server Parameter File (spfile)

If your server parameter file (spfile) becomes corrupt, and you haven't been creating a textual init.ora parameter file as a backup, you can pull the parameters from it using the strings command in UNIX to create an init.ora file. You will need to edit the resulting file to get rid of any garbage characters in it (but don't worry about the "*." characters at the beginning of the lines) and make any corrections to it before using it to start your database, but, at least you will have something to go by:
$ cd $ORACLE_HOME/dbs
$ strings spfilePROD.ora >initPROD.ora
If you have been saving off a textual init.ora parameter file as a backup, you can restore that init.ora file to the $ORACLE_HOME/dbs directory in UNIX (or $ORACLE_HOME\database directory in NT). You will need to delete the corrupt spfile before trying to restart your database, since Oracle looks for the spfile first, and the init.ora file last, to use as the parameter file when it starts up the database (or, you could leave the spfile there and use the pfile option in the startup command to point to the init.ora file). Then, once your database is up, you can recreate the spfile using the following (as sysdba):
SQL> create spfile from pfile;

Just Oracle

Friday, January 2, 2009

Database general activities and issues II

No comments: