1. Enabling Archiving
To enable archivelog mode, so that Oracle saves each old redo log to disk after a log switch before overwriting it when the logs cycle back around, perform the following commands. Oracle requires at least two redo logs (log groups) to accommodate archiving: one group can be spooled to the archivelogs directory by the archiver while the log writer is writing the current database changes to the other.
$ vi $ORACLE_HOME/dbs/initPROD.ora
Edit init.ora file to contain the archive parameters (directory name and file format and the start flag):
log_archive_dest = /u01/oradata/PROD/archivelogs/
log_archive_format = arch_PROD_%S.arc
log_archive_start = true
("archive log start" can also be used when connected as sysdba to start the arch process, if you don't want it automatically done through init.ora; but, neither of these turn on archivelog mode - they just start the arch process itself.)
$ mkdir /u01/oradata/PROD/archivelogs
$ sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> startup mount
SQL> alter database archivelog;
Turns on archivelog mode for the database.
SQL> alter database open;
SQL> archive log list
Shows that you are now in archivelog mode, that archiving is enabled, and shows you the current log sequence numbers. You can "alter system switch logfile;" a few times to see archivelog files going into the log_archive_dest directory.
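For example, a quick check (using the log_archive_dest directory set above):
SQL> alter system switch logfile;
SQL> !ls -ltr /u01/oradata/PROD/archivelogs
Each switch should produce another arch_PROD_nnnn.arc file at the end of the listing.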
SQL> select * from v$database;
Also shows that you are in archivelog mode (see the LOG_MODE column).
To disable archivelog mode, similar commands are used when connected as sysdba after shutting down the database, except with the "alter database noarchivelog;" command instead. The log_archive_start parameter in the init.ora file would also need to be set to false to permanently disable the arch process itself.
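A minimal sketch of that disable sequence (same PROD instance as above):
$ sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> startup mount
SQL> alter database noarchivelog;
SQL> alter database open;
SQL> archive log list
Should now show "No Archive Mode".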
2. Moving Datafiles
You can move a datafile to another disk, or change its name (such as when restoring a copy of it from your backup tape to another location after a disk failure, or when moving it to another disk that has more room for it). Make the move after shutting down the database, and use the "alter database rename file" command to tell Oracle where the file has been moved, as in the following example, which moves one of the development datafiles from disk u03 to disk u04:
$ sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> !mv /u03/oradata/PROD/devl_PROD_01.dbf /u04/oradata/PROD
SQL> startup mount
SQL> alter database rename file '/u03/oradata/PROD/devl_PROD_01.dbf'
to '/u04/oradata/PROD/devl_PROD_01.dbf';
SQL> alter database open;
SQL> select * from v$datafile;
Lists the current datafiles, showing your changes.
This can also be done without shutting down the database, by taking the associated tablespace offline first (which prevents others from accessing that tablespace's tables, indexes, and other data):
$ sqlplus "/ as sysdba"
SQL> alter tablespace development offline;
SQL> !mv /u03/oradata/PROD/devl_PROD_01.dbf /u04/oradata/PROD
SQL> alter database rename file '/u03/oradata/PROD/devl_PROD_01.dbf'
to '/u04/oradata/PROD/devl_PROD_01.dbf';
SQL> alter tablespace development online;
SQL> select * from v$datafile;
Finally, the textual control file from the "alter database backup controlfile to trace;" command can be used to change the datafile locations, such as to move all datafiles from one disk to another (u03 to u04 in the example below). The textual control file is created in the database's udump directory, which is found by using either of the following commands:
SQL> select value from v$parameter where name = 'user_dump_dest';
SQL> show parameter user_dump
The file will have a name like /u00/oracle/admin/PROD/udump/prod_ora_16060.trc. (A sample textual control file is shown in the Appendix.)
SQL> connect / as sysdba
SQL> alter database backup controlfile to trace;
SQL> show parameter user_dump
SQL> shutdown immediate
SQL> exit
$ mv /u03/oradata/PROD/*PROD*.dbf /u04/oradata/PROD
$ cd /u00/oracle/admin/PROD/udump
$ ls -ltr *.trc
To find the name of the textual control file just created (the last one in the ls listing, such as prod_ora_16060.trc).
$ vi prod_ora_16060.trc
Edit the textual control file: remove all lines before the STARTUP NOMOUNT line; remove the lines after ALTER DATABASE OPEN (or, if present, after ALTER TABLESPACE TEMP ADD TEMPFILE); change the names in the DATAFILE section to match the new disk names, such as from "u03" to "u04" (DO NOT change the order of the file names!); comment out (put # in front of) the RECOVER command; and, for Oracle 9i and above, change all comment lines to start with dashes. You may also want to change the TEMPFILE location, which is not in the DATAFILE section for Oracle 9i+. The vi commands to do these (for 9i+; except for TEMPFILE) are usually:
:1,/STARTUP NOMOUNT/-1d
:/ALTER TABLESPACE TEMP/+2,$d
:g/^DATAFILE/,/;/s/\/u03/\/u04/
:/RECOVER DATABASE/s/^/# /
:1,$s/^#/--/
:wq
[Be sure to read the NOTES below.]
$ sqlplus "/ as sysdba"
SQL> @prod_ora_16060.trc
Recreates the control files with the new disk volume names.
SQL> select * from v$datafile;
NOTES: Be aware that the 9i "alter database backup controlfile to trace;" command creates a textual control file with two "create controlfile" commands. You will use the first set (the one with NORESETLOGS) and edit out the second set, while you are changing the directory and/or file names in the textual control file. Also, be sure to include the ADD TEMPFILE command that may be generated following the OPEN command - it is easy to overlook and gets edited out accidentally.
3. Adding Redo Log Members
To add another member on another disk drive and separate controller to a redo log group, in order to mirror the logs in the group for media failure protection, you will need to specify the new file name and the current group number, as shown below. The new redo log file will be created the same size as all the other redo log files in the group.
SQL> alter database add logfile member
'/u04/oradata/PROD/log_PROD.rdo' to group 1;
SQL> select * from v$logfile;
Lists the current log files, showing your changes.
To add more members than maxlogmembers to a redo log group, you will need to recreate the control files with that new maximum by creating a textual control file, editing the maxlogmembers parameter in it, and running that SQL when connected as sysdba after shutting down the database:
$ sqlplus "/ as sysdba"
SQL> alter database backup controlfile to trace;
SQL> !ls -ltr /u00/oracle/admin/PROD/udump
To find the name of the textual control file just created.
SQL> !vi /u00/oracle/admin/PROD/udump/prod_ora_16060.trc
Edit the textual control file: remove all lines before the STARTUP NOMOUNT line; remove the lines after ALTER DATABASE OPEN (or, if present, after ALTER TABLESPACE TEMP ADD TEMPFILE); change the maxlogmembers value from the default 2 to 3 or 4; comment out (put # in front of) the RECOVER command; and, for Oracle 9i and above, change all comment lines to start with dashes. The vi commands to do these (for 9i+) are usually:
:1,/STARTUP NOMOUNT/-1d
:/ALTER TABLESPACE TEMP/+2,$d
:/MAXLOGMEMBERS/s/2/3/
:/RECOVER DATABASE/s/^/# /
:1,$s/^#/--/
:wq
SQL> shutdown immediate
SQL> @/u00/oracle/admin/PROD/udump/prod_ora_16060.trc
Recreates the control files with the new maxlogmembers value.
4. Adding Redo Log Groups
Additional redo log groups can be added to lessen the possibility of the database freezing while waiting on archiving to free up the next log group. The new group number along with the list of new redo logs and the allocation size for those member files is specified as shown:
SQL> alter database add logfile group 4
('/u01/oradata/PROD/log_PROD_4A.rdo',
'/u01/oradata/PROD/log_PROD_4B.rdo') size 500K;
SQL> select * from v$logfile;
5. Multiplexing Control Files
Multiplexing the control files involves just shutting down the database, copying the current control file to the other disks (renaming the copies), editing the init.ora file to include the new control file names, and restarting the database. From that point on, all datafile header updates, datafile additions, and migrations will be reflected in all of the identical multiplexed control files, and the remaining control file(s) can be used to bring the database back up if one or more of them are lost.
$ sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> host
$ cp -p /u03/oradata/PROD/ctrl_PROD_01.ctl /u01/oradata/PROD/ctrl_PROD_02.ctl
$ cp -p /u03/oradata/PROD/ctrl_PROD_01.ctl /u00/oradata/PROD/ctrl_PROD_03.ctl
Copies the original control file to the other disks, renaming them to match the control_files parameter.
$ vi $ORACLE_HOME/dbs/initPROD.ora
Edit init.ora to add other control files on separate disks to the control_files parameter, using their full pathnames, separated by commas, such as:
control_files = (/u03/oradata/PROD/ctrl_PROD_01.ctl,
/u01/oradata/PROD/ctrl_PROD_02.ctl,
/u00/oradata/PROD/ctrl_PROD_03.ctl)
$ exit
SQL> startup
SQL> select * from v$controlfile;
Lists the current control files, showing your changes.
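If your database uses an spfile instead of init.ora (see "Other Nightly Processing" below), a sketch of the equivalent change is shown here; since control_files is a static parameter, scope=spfile is required, and the change takes effect on the next startup (file names as above):
SQL> alter system set control_files='/u03/oradata/PROD/ctrl_PROD_01.ctl',
     '/u01/oradata/PROD/ctrl_PROD_02.ctl',
     '/u00/oradata/PROD/ctrl_PROD_03.ctl' scope=spfile;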
Multiplexing Archive Log Files
I haven't done this yet, but multiplexing archive log files is available in Oracle 8i and beyond (see chapter 8 in the Oracle9i Database Administrator's Guide). For earlier versions, you could set up a cron job to run every half-hour or so to copy any archive log files not yet in the mirror directory into it. Make sure that the mirror directory is on a different disk than the archive logs directory; otherwise, you defeat the purpose if the archive logs disk goes out.
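A minimal sketch of that pre-8i cron approach (the mirror directory /u02/oradata/PROD/archmirror is an example name):
# File: mirror_archivelogs.shl (run from cron every half-hour or so)
for FILE in /u01/oradata/PROD/archivelogs/*.arc
do
  if [ ! -f /u02/oradata/PROD/archmirror/`basename $FILE` ]
  then
    cp -p $FILE /u02/oradata/PROD/archmirror
  fi
done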
6. Backups
The DBA's primary job is to make sure that the data is available and accessible to the users during those times that they need it, which means having a complete and well-tested backup and recovery procedure in place and functioning. This section covers which files need to be included in a backup, the types of backups (cold backups and hot backups), and other processing that you could perform during the backup for proactive maintenance.
What To Back Up
The following types of files making up the database and providing recovery capabilities should be backed up during one backup cycle, such as all of them during a cold backup, or all or subsets of them during a hot backup cycle, assuming the entire set is eventually backed up during that cycle (see Oracle9i User-Managed Backup and Recovery Guide, chapter 2):
• Datafiles (for all tablespaces)
• Control Files (binary and textual versions)
• Redo Log Files (cold backups only, not hot backups)
• Archive Log Files (archived redo logs, if archivelog mode is enabled)
• Parameter Files (init.ora; like $ORACLE_HOME/dbs/initPROD.ora)
• Password Files (like $ORACLE_HOME/dbs/orapwdPROD, if used)
The most basic way to get the names of the datafiles, control files, and redo log files through SQL is shown below, which can be used in both the startup mount state and the database open state:
SQL> select name from v$datafile;
SQL> select name from v$tempfile;
SQL> select name from v$controlfile;
SQL> select member from v$logfile;
You can also get the tablespace name with the datafile and temp file names using:
SQL> select tablespace_name,file_name from dba_data_files order by tablespace_name;
SQL> select tablespace_name,file_name from dba_temp_files;
The list of archive log files which are currently on-line cannot be found through SQL. However, you can get the directory containing those archive logs, the format of the file names, and whether or not archiving was turned on in the init.ora parameter file using:
SQL> select name,value from v$parameter where name in ('log_archive_dest',
'log_archive_format','log_archive_start');
These archiving parameters can also be found in a sqlplus session by issuing either of the following commands (but the first one might truncate the dest directory name):
SQL> show parameter archive
SQL> archive log list
To get the list of archive log files for the last 5 days (the date is when Oracle started writing to that redo log, not when it was copied to the archivelogs directory), which may or may not still be on-line, you can query the v$archived_log table in Oracle 8.x and above:
SQL> select name from v$archived_log
where trunc(completion_time) >= trunc(sysdate)-5;
Or, you can query the v$log_history table in Oracle 7.x:
SQL> select archive_name from v$log_history
where trunc(to_date(substr(time,1,8),'MM/DD/YY')) >=
trunc(sysdate)-5;
The name of the parameter file is not kept in the database, but is usually $ORACLE_HOME/dbs/initPROD.ora (substitute your Oracle SID).
Since you can't find the currently existing archive log files, the init.ora file, or the password file through SQL, the best way to get the list of database files to back up (assuming you use a standard naming convention for all files related to a database instance, such as including the Oracle SID in the file names or in one level of the directory hierarchies containing your datafiles, control files, redo log files, and archive log files) is to use the unix find command to get the list of file pathnames containing that Oracle SID, as shown below (redirected to a file named backemup.dat):
$ find / -name '*PROD*' ! -type d 2>/dev/null >backemup.dat
The addendums to this presentation contain the scripts backup_list.shl and backup_list.sql, which are sample unix and SQL scripts used to create the list of files to be backed up, assuming a standard naming convention. More information on naming conventions can be found in papers on Oracle's OFA, which is referred to as either Oracle Flexible Architecture or Optimal Flexible Architecture.
6.1 Cold Backups
Cold backups are done while the database is shut down, usually after a "shutdown immediate" is issued, and contain the most complete backup snapshot, since all database files, including control files and redo log files, are backed up from the same point in time.
During the time that a cold backup is in progress, the database is not available for querying or updating while those files are being copied to tape or to some other disk drive or directory (possibly compressing (zipping) them at the same time). Cold backups can be done with the database either in archivelog mode or in noarchivelog mode.
To perform cold backups, you will first need to get the list of files to be backed up (put into backemup.dat in the examples below), including datafiles, control files, and redo log files. You can get that list either by querying the database itself while it is still up (see "What To Back Up") or by using the unix "find" command to locate files following a standard naming convention, preferably after you have shut down the database (see below), since "find" can then also pick up the archive log files and other database files. Be sure to get a fresh list of files for each backup in order to accommodate structural changes in the database (such as new datafiles) as well as additional archive log files.
Then, you should include the parameter file and password file (if used) in the list. If you are using the "find" command, those files could also be picked up by "find", since they both usually contain the SID name. It would also be a good idea to include a textual control file in the list, which you would create before shutting down the database for the backup, as shown below (which contains the SID name as part of the directory name and, so, could also be picked up by "find"):
$ sqlplus "/ as sysdba"
SQL> alter database backup controlfile to trace;
SQL> shutdown immediate
SQL> exit
$ ls -ltr /u00/oracle/admin/PROD/udump/* | tail -1 | sed 's/^.* \//\//' >>backemup.dat
Puts the name of the textual control file just created in the udump directory (the last one in the ls listing, such as prod_ora_16060.trc) into the list of files to be backed up.
Finally, after shutting down the database, you will need to get the list of the archive log files to back up (the shutdown may cause new archive log files to be written out). You could do this either as part of the "find" command at this point, along with the datafiles, control files, and redo log files, or by just getting the fully-qualified listing of the file pathnames in the archivelogs directory. You could also combine the two and get the list of non-archive files for the backup, followed by the list of archive files (zipped followed by not zipped here), as shown below:
$ find / -name '*PROD*' ! -type d 2>/dev/null | grep -v 'arc$' | grep -v 'gz$' >>backemup.dat
$ ls /u01/oradata/PROD/archivelogs/*.arc.gz >>backemup.dat
$ ls /u01/oradata/PROD/archivelogs/*.arc >>backemup.dat
After the list of files has been created (in backemup.dat) and the database has been shut down, you can do any of the following to back up the files or make copies for the backup. The first two options allow you to start up the database before writing the copied files to tape, which usually results in a shorter down time for the database.
• Copy the files to another staging (backup) directory (such as /u03/oradata/prod1 here) to be written to tape later, and restart the database, or,
• Compress (using GNU Zip here) the files to another staging directory to be written to tape later, and restart the database, or
• Copy the files directly to tape (no staging directory), then, restart the database.
These three options can be performed as shown below (do only one of them):
$ cat backemup.dat | sed 's/\(^.*\)\/\(.*\)$/cp -p \1\/\2 \/u03\/oradata\/prod1\/\2/' | sh
Copies the files to another staging (backup) directory.
$ cat backemup.dat | sed 's/\(^.*\)\/\(.*\)$/gzip -cv1 \1\/\2 >\/u03\/oradata\/prod1\/\2.gz;
touch -r \1\/\2 \/u03\/oradata\/prod1\/\2.gz/' | sh
Compresses (zips) the files to another staging directory (the touch command must be on the same line as the sed command).
$ cat backemup.dat | cpio -ovC64 >/dev/rmt0
Copies the files directly to tape (without using a staging directory).
Then, restart the database after any of the three commands above:
$ sqlplus "/ as sysdba"
SQL> startup
SQL> exit
Be sure to copy the backup disk directory files to tape after you have done the startup for the first two options, such as shown below. (See "Backup/Restore Using Unix cpio Command" for cpio description.)
$ ls /u03/oradata/prod1/* | cpio -ovC64 >/dev/rmt0
6.2 Hot Backups
Hot backups are done while the database is still up and running and being accessed and updated. This is used when continuous 24-hour-per-day / 7-day-per-week operations are required. If you have even a half-hour window when your database can be down, you can probably do a cold backup using the staging directory method above.
Doing hot backups requires that archivelog mode is turned on. It is more taxing on redo logs and archivelogs when the backups are being performed, since entire database blocks are written to the redo logs for the changes that are made, instead of just the individual changes (transaction data).
You must NEVER back up the online redo log files when doing hot backups. Restoring a backup of an online redo log file would cause corruption during database recovery. The online redo log files are the most vulnerable items in the hot backup scheme. However, forcing log switches can catch the pending datafile updates in the redo logs and give you some coverage in this area.
You cannot backup the online control files themselves, but you can backup a special copy of them, either a binary copy or a textual copy, as shown below.
To perform hot backups, you will only back up one tablespace's datafiles at a time. In order to reduce the time that the tablespace is in backup mode, it is best to copy its datafiles to a backup disk directory, instead of to tape at the current time. When all copies have been done, you can then copy that backup disk directory to tape while the database is running normally. You must use the "begin backup" and "end backup" commands, as shown below; otherwise, your backups will be corrupted and useless. Note that the data in the tablespace remains available to the users while in backup mode, and any transactions can still be performed on that data. (See section on Disaster Recovery Scenarios for info on database crashes during hot backups.)
For each tablespace, perform the following steps while connected as sysdba (the DEVELOPMENT tablespace is shown here - change the tablespace name and the datafile names as appropriate for each tablespace backup):
SQL> alter tablespace development begin backup;
SQL> !cp -p /u03/oradata/PROD/devl_PROD_*.dbf /u03/oradata/prod1
Copies the tablespace's datafiles to a backup disk directory (/u03/oradata/prod1 here), using standard naming conventions.
SQL> alter tablespace development end backup;
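If you have many tablespaces, you can generate these begin/copy/end commands from the data dictionary instead of typing them by hand; here is a sketch (spool the output to a script and review it before running it; the staging directory is an example):
SQL> set serveroutput on size 100000
SQL> begin
       for ts in (select tablespace_name from dba_tablespaces
                  where contents = 'PERMANENT') loop
         dbms_output.put_line('alter tablespace ' || ts.tablespace_name || ' begin backup;');
         for df in (select file_name from dba_data_files
                    where tablespace_name = ts.tablespace_name) loop
           dbms_output.put_line('!cp -p ' || df.file_name || ' /u03/oradata/prod1');
         end loop;
         dbms_output.put_line('alter tablespace ' || ts.tablespace_name || ' end backup;');
       end loop;
     end;
     /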
Force a log switch to archive the current log to catch all of the committed transactions in it. Then, copy the archive log files (zipped and not zipped) to your backup disk directory, as shown below:
SQL> alter system switch logfile;
$ ls /u01/oradata/PROD/archivelogs/*.arc.gz >logfile.lst
$ ls /u01/oradata/PROD/archivelogs/*.arc >>logfile.lst
Generates the list of archivelog files to be copied (don't just copy the entire directory - you might catch an archivelog file in mid-creation, but, do keep the archiving process running).
$ sleep 5
Waits for a few seconds to allow current archivelog writes to complete.
$ cat logfile.lst | sed "s/\(.*\/\)\([^\/].*\)/cp -p \1\2 \/u03\/oradata\/prod1\/\2/" >logfile.shl
$ sh logfile.shl
The above commands copy the archivelog files to your backup disk directory.
Create a control file to back up (binary and textual) AFTER all other files have been copied:
SQL> alter database backup controlfile to '/u03/oradata/prod1/controlfile.ctl';
SQL> alter database backup controlfile to trace;
$ ls -ltr /u00/oracle/admin/PROD/udump
To find the name of the textual control file just created (the last one in the ls listing, such as prod_ora_16060.trc).
$ cp -p /u00/oracle/admin/PROD/udump/prod_ora_16060.trc /u03/oradata/prod1
Copies that textual control file to your backup disk directory.
After all datafiles and log files have been copied to your backup disk directory, back up those copies to tape. If you want to zip those files first with a utility such as GNU Zip to back up the compressed versions (but see the pipe alternative below), do that before the cpio copy command, as shown below.
$ cd /u03/oradata/prod1
$ gzip -vN1 *
$ find /u03/oradata/prod1 | cpio -ovC64 >/dev/rmt0
Partial hot backups, in which the tablespaces are backed up at separate times, possibly across several days for the entire set of tablespaces, are possible, but not recommended.
NOTE: Only put one tablespace at a time into hot backup mode, and then for as short a time as possible. Hot backups are hard on your redo logs: entire database blocks are written to them for each change, instead of just the individual changes themselves, so your redo logs tend to fill up fast, resulting in much more frequent log switches while your tablespaces are in hot backup mode. A checksum is written to the beginning and to the end of the database block whenever a change is made. In Oracle7, during recovery, if those two checksums differ (meaning a change occurred in that block while it was being written out during a hot backup), the changed block is retrieved from the archivelogs. In Oracle8 (from what I remember from my Oracle class), those two checksums are compared during the hot backup, and the archivelog block is written out instead of the database block if a change is detected. It is for this reason that entire changed blocks are archived during hot backups, instead of just the changed bytes.
Instead of doing compression after you have copied the files into the backup disk directory, you can use a UNIX pipe (or other operating system equivalent, if available) to compress the files on the fly as you copy or create them, such as in this export example, in which exp writes to the pipe while gzip reads from the pipe (note the double quote/single quote after exp and the single quote/double quote after sysdba in the export (exp) command):
$ mknod /tmp/exp_pipe p
$ gzip -cNf </tmp/exp_pipe >prod.dmp.gz &
$ exp "'/ as sysdba'" file=/tmp/exp_pipe full=y compress=n log=prod.dmp.log
$ rm -f /tmp/exp_pipe
Other Nightly Processing
In addition to backups, other nightly processing can also be done to guard against data loss and allow for proactive problem detection and correction, such as the following, which are described below:
1. Create and keep current copies of your textual control files.
2. Create and keep a current copy of your textual init.ora file if you are using an spfile.
3. Perform full database exports for quick table restores and to check datafile integrity.
4. Generate definitions for all of your tables and indexes.
5. Gather statistics on datafile and index space usage and table extent growth for proactive maintenance.
Keep current copies of the textual control files for each of your database instances, in case the control file is corrupted and you have to rebuild it. For each of the database instances (such as SEED, TRNG, PPRD, and PROD), switch to that instance (use ". oraenv" and enter SEED, or, use "export ORACLE_SID=SEED", to switch to SEED in unix), and enter the following in sqlplus or svrmgr to create the textual control file in the instance's user_dump_dest directory specified in the init.ora file:
SQL> alter database backup controlfile to trace;
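A minimal sketch of automating this for all instances from a nightly cron job (instance names per the example above; sqlplus -s suppresses the banner):
# File: nightly_controlfile_trace.shl
for SID in SEED TRNG PPRD PROD
do
  ORACLE_SID=$SID; export ORACLE_SID
  echo "alter database backup controlfile to trace;" | sqlplus -s "/ as sysdba"
done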
Keep a current copy of the textual init.ora file (initialization parameter file) if you are using an spfile (server parameter file) instead of an init.ora file to hold your initialization parameters. If the binary spfile becomes corrupt, the init.ora file can be used in its place to restart your database (after deleting the spfile, since Oracle looks for the spfile first). To create the init.ora file (which is placed in the $ORACLE_HOME/dbs directory on UNIX systems, or the $ORACLE_HOME\database directory on NT systems):
SQL> create pfile from spfile;
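To check whether your instance actually started from an spfile:
SQL> show parameter spfile
A non-blank value means an spfile is in use; a blank value means the instance was started with an init.ora file.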
Create an export file of the entire database for individual table restores. This will also detect any block corruption in the table data, since a full table scan is performed. Block corruptions are not detected in physical block copies during tape backups, so you wouldn't know that you were propagating bad blocks to all of your backups until a user tried to access data in that block. During export, if a bad data block is encountered, the export will fail.
$ exp "'/ as sysdba'" file=/u03/oradata/prod1.dmp full=y \
log=/u03/oradata/prod1.dmp.log >/dev/null 2>/dev/null
Create an indexfile of that export for table and index definitions. The indexfile can also be used to separate tables into product-based tablespaces, or to split the indexes off into their own tablespace.
$ imp "'/ as sysdba'" file=/u03/oradata/prod1.dmp \
indexfile=/u03/oradata/prod1.idx full=y \
log=/u03/oradata/prod1.idx.log >/dev/null 2>/dev/null
Gather statistics on space usage so that you will know when to reallocate tables to give them larger extents before they run out of extents to allocate (using gurrddl or gurnddl along with export/import), when to add datafiles to tablespaces that are about to fill up (alter tablespace ... add datafile ...), or when to recreate indexes that currently have unused or wasted space due to deletions from their associated tables. Below are some sample commands you can use to see potential space problems for tables and tablespaces and to see if an index has a lot of wasted space (and should be recreated at a time of no activity):
SQL> select segment_name,segment_type,extents,max_extents
from sys.dba_segments where extents + 5 > max_extents
or extents > 50 order by segment_name,segment_type desc;
SQL> col "Free" format 9,999,999,999
SQL> col "Tot Size" format 9,999,999,999
SQL> select tablespace_name,sum(bytes) "Tot Size"
from dba_data_files group by tablespace_name;
SQL> select tablespace_name,sum(bytes) "Free"
from dba_free_space group by tablespace_name;
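The two queries above can also be combined into a single report; a sketch (the outer join keeps tablespaces with no free extents in the listing):
SQL> select d.tablespace_name, sum(d.bytes) "Tot Size", nvl(f.free,0) "Free",
            round(nvl(f.free,0) * 100 / sum(d.bytes), 1) "Pct Free"
     from dba_data_files d,
          (select tablespace_name, sum(bytes) free
           from dba_free_space group by tablespace_name) f
     where d.tablespace_name = f.tablespace_name(+)
     group by d.tablespace_name, f.free;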
For each index, find the number of deleted rows (may rebuild it later):
SQL> validate index posnctl.nhrfinc_key_index;
SQL> select name,del_lf_rows_len from index_stats;
The above pair are used together (validate clears out the index_stats table before putting its stats into it).
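To turn those statistics into a rough wasted-space percentage (the 15-20% rebuild threshold sometimes quoted is a common rule of thumb, not an Oracle-mandated value):
SQL> select name, lf_rows_len, del_lf_rows_len,
            round(del_lf_rows_len * 100 / lf_rows_len, 1) "Pct Deleted"
     from index_stats where lf_rows_len > 0;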
7. Archiving To Tape
The best method for periodically writing the archive log files to tape is to send them to disk first and have a cron job back up those archive logs to tape later on. Never archive directly to tape: that would require a dedicated tape drive just for the archivelogs, and the backup would need to complete before Oracle cycles back to that online redo log (otherwise, the database will "freeze" until that redo log is available for reuse - the same thing that happens if your archivelog directory fills up). If you did have a tape drive that could be dedicated to the archivelogs, they could be backed up more frequently than a normal backup procedure, such as every 10 minutes. In that case, you should only back up the new files which haven't been placed on the tape yet.
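A sketch of that incremental approach, using a timestamp marker file to pick up only the archive logs newer than the last run (file names are examples; create the marker once with touch before the first run):
# File: backup_new_archivelogs.shl (run from cron, e.g., every 10 minutes)
touch /u01/oradata/PROD/.arch_backup.new
find /u01/oradata/PROD/archivelogs -name '*.arc' -newer /u01/oradata/PROD/.arch_backup -print | cpio -ovC64 >/dev/rmt0
mv /u01/oradata/PROD/.arch_backup.new /u01/oradata/PROD/.arch_backup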
8. Disaster Recovery Scenarios
Backups are no good unless you know how to use them to restore your database up to and including the most recent change made to the data. Oracle gives you those up-to-the-instant recovery capabilities, as long as you know how to use them. This section describes Oracle's recovery capabilities and how to use them for general recovery of datafiles, tablespaces, or the entire database, and for specific disaster recovery scenarios, which are variations on the themes of the general recovery procedures.
Things To Check
Messages displayed during database startup will usually indicate what kind of problems the database is experiencing. After you get an error message during your initial startup, you will most likely need to do a shutdown before proceeding with database recovery.
• The Alert Log in the database's bdump directory, named something like /u00/oracle/admin/PROD/bdump/alert_PROD.log (you can find bdump using "select value from v$parameter where name = 'background_dump_dest';" or "show parameter background_dump").
• Other recent trace (.trc) files in the database's bdump (background_dump_dest) and udump (user_dump_dest) directories.
• Oracle processes, using "ps -ef | grep ora" at the unix command line. You should see entries like ora_smon_PROD, ora_pmon_PROD, ora_dbwr_PROD, ora_lgwr_PROD, ora_arch_PROD, and other ora_xxxx_PROD processes for each database instance, and oraclePROD for jobsub and other users.
• The most recent .lis files in the job submission directory (such as /home/jobsub), at the end of the "ls -ltr /home/jobsub" listing, which may indicate why a user's run crashed or why it filled up a table or archive log directory.
What To Restore
Only restore the datafile that has gone bad if you are recovering up to the current time.
Restore ALL datafiles if you are recovering up to an earlier time than now but after those datafiles were backed up (for point-in-time recovery or cancel-based recovery). Since Oracle requires that all datafiles be synchronized at the same SCN (System Change Number), just recovering one datafile to an earlier time would cause that datafile to have a lower SCN than all the other datafiles, which Oracle wouldn't accept.
Restore all archivelog files that will be needed in the recovery (for the time range between the backup and the point-in-time that you are recovering to), if they are not currently available on-line or if those on-line archivelog files have been corrupted.
NEVER restore control files unless all copies are lost. See "Loss of Control Files" for information about restoring control files.
NEVER restore online redo log files from a hot backup (you shouldn't have backed them up in the first place); it is OK to restore them from a cold backup, but ONLY IF you are restoring up to the time of that backup or doing a partial restore where those old logs won't be used during the restore. Restoring a backup of an active redo log file would cause corruption during database recovery. If the redo logs are deleted or damaged, just remove the damaged redo log files and reset the logs on open, and Oracle will recreate the redo log files for you (be sure to back up immediately after any resetlogs command, since Oracle can't recover datafiles restored from a backup made prior to resetting the logs):
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> alter database open resetlogs;
Backup/Restore Using Unix cpio Command
Here is a quick overview of using the unix cpio command for tape backups and restores. The tape drive for our system is /dev/rmt0. (A blocksize of 32K, indicated by C64, is used by these cpio commands. You can change the blocksize of your tape drive in AIX from user root through the smit utility, or by using the chdev command, as in "chdev -l 'rmt0' -a block_size='32768'", which will speed up the writing to that tape drive from the default 512-byte blocksize.)
Saving files to backup:
$ cat backemup.dat | cpio -ovC64 >/dev/rmt0
Listing tape contents:
$ cpio -itvC64 </dev/rmt0
Restoring files to same pathnames:
$ cpio -imvC64 </dev/rmt0
Restoring files individually to new pathnames, entering the new pathname when prompted (also use "u" flag to overwrite newer files if they exist as newpathname; for fullpathname, enclose patterns in double quotes):
$ cpio -irmC64 fullpathname </dev/rmt0
newpathname (or, period (".") for same name, or, enter to skip file)
Disaster Recovery Overview
The first thing you should ALWAYS do when your database crashes is to make a BACKUP of your datafiles, control files, archive log files, and initialization parameters file (init.ora) for your database that crashed, either copying those files to tape or to another directory, or zipping them with something like GNU Zip, or making a tar file of them. DO NOT TRY TO RESTART YOUR DATABASE BEFORE BACKING UP THOSE FILES. Otherwise, you might make the situation worse. If all else fails, the people at SCT or Oracle may be able to get your database going again from that initial backup, even though the crash may have corrupted some files.
After a crash occurs and you've made a backup, do a STARTUP when connected as sysdba. If there are problems encountered during startup, Oracle will display an error message about the first problem, which will usually give you an idea about what type of recovery scenario you will need to apply. In most cases, you will need to do a shutdown after this initial startup attempt before beginning the recovery.
There are three primary recovery options for media failure scenarios:
1) Recover Database
2) Recover Datafile
3) Recover Tablespace
Most of the other recovery scenarios are just variations on these three themes. Recovery steps for specific disaster recovery scenarios are given later on; however, in a generic recovery:
1. The database is shut down or the affected datafile(s) or tablespace is taken offline.
2. The affected datafile(s) is restored from a backup along with the archived redo logs created since that backup.
3. A recover command is issued.
4. The tablespace or datafile(s) is brought back online.
You can see the Oracle9i User-Managed Backup and Recovery Guide for more information on Oracle database disaster recovery.
Basic Recover Database Recovery Option
The Recover Database option is used to recover all datafiles needing recovery, either up to the point of failure, called complete recovery, or up to a point in the past before the failure, called incomplete recovery, which is only available using this option. Recover Database is performed from the MOUNT state, so, the database will have to be SHUT DOWN before using this. All datafiles must be ONLINE to be recovered, in contrast to the other two options where the datafiles are offline in most cases. The basic steps are: restore the datafiles, mount, online the datafiles, recover database, and open, as shown below:
$ cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
Restores the datafiles from backup, such as devl_PROD_01.dbf here.
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> select * from v$datafile;
Shows status of datafiles, indicating those that are offline.
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' online;
Online the offline datafiles shown above, such as devl_PROD_01.dbf here.
SQL> set autorecovery on
Tells recovery not to ask which redo logs to use (in log_archive_dest).
SQL> recover database;
Recovers all datafiles, as needed, up to the current point in time.
SQL> alter database open;
Opens the database for user access.
You may also need to drop and recreate the TEMP tablespace after a database recovery. See Loss of TEMP Datafile on how to do this.
Recover Database can also be used to perform an incomplete recovery up to some point in time before the failure occurred, or before the database was last shut down. To do this, ALL datafiles must be restored from the backup before doing the incomplete recovery (not control files or online redo log files), and ALL datafiles are recovered back to the same point in time - you can't do a partial incomplete recovery.
There are three incomplete recovery options available: time-based, cancel-based, and change-based recovery. Time-based recovery brings the datafiles up to a given date and time. Cancel-based recovery applies entire archive log files and online redo log files one at a time until the word CANCEL is typed in, so you can't use autorecovery with it. Change-based recovery applies all archive log files and online redo log files up to, but NOT including, a given system change number. If you have added datafile(s) since the last backup, and you want to recover up to a point before the datafile(s) were added, you will need to restore the control files and use the "using backup controlfile" option of the recover command (see Loss of Control Files). (I don't know of a way to recover up to a point between datafile additions - it's all or none.) The basic steps are similar to the complete recover database option, except for the incomplete recovery commands, which are shown below (only use one of these):
SQL> recover automatic database until time '2005-02-14:15:45:00';
Automatic is similar to Set Autorecovery On. Use 1 second before the time of failure or of the log to exclude, which, for Oracle 8.x and above, could be found using:
select to_char(a.time,'YYYY-MM-DD:HH24:MI:SS') as time, a.name
from (select first_time as time, substr(name,1,58) as name,
             first_change# as change from v$archived_log
      union
      select first_time, 'Redo Log Group ' || to_char(group#),
             first_change# from v$log) a
order by change;
Or, for Oracle 7.x, could be found using:
select to_char(a.time,'YYYY-MM-DD:HH24:MI:SS') as time, a.name
from (select time, substr(archive_name,1,58) as name,
             low_change# as change from v$log_history
      union
      select first_time, 'Redo Log Group ' || to_char(group#),
             first_change# from v$log) a
order by change;
SQL> recover database until cancel;
Accept logs until you type in CANCEL.
SQL> recover automatic database until change 43047423;
The system change numbers (SCN) contained in each archive log file are shown in Oracle 8.x and above using:
select name,first_change#,next_change# - 1 from v$archived_log;
Or, for Oracle 7.x, using:
select name,low_change#,high_change# from v$log_history;
For the online redo logs, you could use "select max(first_change# - 1) from v$log;" to find the SCN to use for applying all but the current online redo log file.
After an incomplete recovery, you must open the database with the RESETLOGS option, as shown below, and then immediately shut down the database and make a BACKUP, since the old backup is unusable after RESETLOGS is used.
SQL> alter database open resetlogs;
Basic Recover Datafile Recovery Option
The Recover Datafile option is used to recover the specified datafile up to the point of failure, synchronizing it with the other datafiles (complete recovery only). Recover Datafile is performed either from the MOUNT state (after a shutdown), with the datafile ONLINE or OFFLINE, or from the OPEN state, with the datafile OFFLINE. A bad datafile must be taken offline before the database can be opened. Since the SYSTEM tablespace CANNOT be taken offline, you cannot use Recover Datafile from the OPEN state on SYSTEM.
The basic steps for the MOUNT state are: restore the datafile, mount, recover datafile (using automatic here instead of autorecovery), online the datafile if needed, and open, as shown below:
$ cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> recover automatic datafile '/u03/oradata/PROD/devl_PROD_01.dbf';
SQL> select * from v$datafile;
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' online;
SQL> alter database open;
The basic steps for Recover Datafile from the OPEN state (except for SYSTEM) while the rest of the database is still up and running, are: offline the datafile, restore the datafile, recover datafile, and online the datafile, as shown below:
$ sqlplus "/ as sysdba"
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' offline;
SQL> !cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
SQL> recover automatic datafile '/u03/oradata/PROD/devl_PROD_01.dbf';
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' online;
Doing a Recover Datafile from the OPEN state (except for SYSTEM) can also be used to bring up your database for use before doing a recovery. The basic steps are: mount, offline the datafile, open, restore the datafile, recover datafile, and online the datafile, as shown below:
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' offline;
SQL> alter database open;
The database is now available to the users, except for that datafile.
SQL> !cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
SQL> recover automatic datafile '/u03/oradata/PROD/devl_PROD_01.dbf';
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' online;
You can still access data in tables in a tablespace which has an offlined datafile, as long as the data is in one of the other datafiles of the tablespace and the table header is not in the offlined datafile.
Basic Recover Tablespace Recovery Option
The Recover Tablespace option is used to recover all datafiles needing recovery in a tablespace up to the point of failure, synchronizing them with the other datafiles (complete recovery only). Recover Tablespace is performed from the OPEN state after taking the tablespace OFFLINE (which brings its datafiles offline). Since the SYSTEM tablespace CANNOT be taken offline, you cannot use Recover Tablespace on SYSTEM. The basic steps while the database is open are: offline the tablespace, restore the datafiles, recover tablespace, and online the tablespace, as shown below:
$ sqlplus "/ as sysdba"
SQL> alter tablespace development offline immediate;
Using Immediate rolls back currently pending transactions.
SQL> !cp -p /u03/oradata/prod1/devl_PROD* /u03/oradata/PROD
SQL> recover automatic tablespace development;
SQL> alter tablespace development online;
Doing a Recover Tablespace after opening the database can be used to bring up your database for use before doing a recovery. The basic steps are: mount, offline the bad datafiles, open, offline the tablespace, restore the datafiles, recover tablespace, and online the tablespace, as shown below:
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' offline;
SQL> alter database open;
SQL> alter tablespace development offline;
SQL> !cp -p /u03/oradata/prod1/devl_PROD* /u03/oradata/PROD
SQL> recover automatic tablespace development;
SQL> alter tablespace development online;
If a write error occurs on one of the datafiles when you offline a tablespace, use "alter tablespace tsname offline temporary;". Then, you can run a recovery on the datafile. If all datafiles in an offlined tablespace have write errors, use "alter tablespace tsname offline immediate;". Then, you can run a recovery on the tablespace.
9. Archivelogs Disk Volume Filled Up
Symptoms: If the disk containing the directory for the archive log files fills up, Oracle will stop all transactions until the archiver process is able to continue writing to that directory. Doing a "df -k" shows that there is no room left for more archivelogs in that directory's disk. Current users' sessions will freeze, and users trying to log on will get "ERROR: ORA-00257: archiver error. Connect internal only, until freed.", since the archiver process is still waiting to archive that redo log group. (Note: Use the oerr utility in UNIX to get the error description, such as "oerr ora 257".) To verify this conclusion:
$ sqlplus "/ as sysdba"
SQL> select value from v$parameter where name = 'background_dump_dest';
Shows the pathname of the bdump directory.
SQL> !tail -200 /u00/oracle/admin/PROD/bdump/alert_PROD.log
Shows "ORA-00272: error writing archive log", indicating that the redo log group can't be written.
SQL> !ls -ltr /u00/oracle/admin/PROD/bdump/arch*
Get the latest archiver trace file in the bdump directory, such as arch_13106.trc (the last one listed).
SQL> !cat /u00/oracle/admin/PROD/bdump/arch_13106.trc
Also shows "ORA-00272: error writing archive log".
Action: You will need to free up some space on that disk volume for Oracle to continue, either by moving files off of that volume, or, more likely, by deleting old archive log files out of that directory which you have already backed up to tape. You must not delete archive log files that have not been backed up, since you wouldn't be able to recover transactions in those files if your database crashes. The following script deletes the old log files older than the given number of days (24-hour periods). If you back up nightly, 1 day is the smallest number you should enter, or 3 on Mondays if you don't back up on weekends. You have to be logged in as userid oracle or as root to remove the archive log files.
# File: remove_old_logs.shl
echo "You must be logged in as user Oracle to run this script,"
echo "which removes all archivelog files older than X days."
echo "Enter number of days to keep: \c"
read DAYS_KP; export DAYS_KP
find /u01/oradata/PROD/archivelogs -name '*.arc' -mtime +$DAYS_KP -exec rm {} \;
find /u01/oradata/PROD/archivelogs -name '*.arc.gz' -mtime +$DAYS_KP -exec rm {} \;
echo "Results after deletions:"
du -k /u01/oradata/PROD/archivelogs
df -k /u01/oradata/PROD/archivelogs
If you are on a Windows NT-based system (including Windows XP and Windows 2000) or on Windows 2003 (added 6/13/06), I've written a similar script using DOS Batch commands, which you can download by clicking on this: remove_old_logs.bat. This script uses extended DOS commands in Windows NT and Windows 2003, which aren't available in Windows 98 and before. It defaults to Windows 2003, so, if you want to use it on Windows NT, change the osver variable in the script to NT instead of 2003.
If you can't free up any space on the archive log disk, you can redirect the archives to another disk until you are able to free up space, as shown below. This redirection goes away if you restart your database, but you can make it permanent by updating the server parameter file (spfile), also shown below, or the init.ora file before restarting, depending on which one you are using. I haven't tried this, but just saw it on the web in a presentation by Chris Lawson on dbspecialists.com.
$ sqlplus "/ as sysdba"
SQL> alter system archive log start to '/u02/oradata/PROD/archivelogs';
Redirects archiving to the other directory (the pathname shown is just an example) until the instance is restarted.
SQL> alter system set log_archive_dest='/u02/oradata/PROD/archivelogs' scope=spfile;
Records the change in the spfile so that it persists across restarts (if you use init.ora instead, edit its log_archive_dest parameter).
10. Loss of Control Files
Symptoms: May be none until you try to shutdown and startup the database. On shutdown, if the control files were deleted, you would get "ORA-00210: cannot open control file '/u03/oradata/PROD/ctrl_PROD_01.ctl'", or, if the control files were overwritten, you would get "ORA-00201: control file version incompatible with ORACLE version" along with their names. On startup for both cases, you would get "ORA-00205: error in identifying control file '/u03/oradata/PROD/ctrl_PROD_01.ctl', along with "ORA-07366: sfifi: invalid file, file does not have valid header block." if overwritten.
Action: If you have a recent up-to-date textual control file (no added datafiles) from "alter database backup controlfile to trace;" in your user_dump_dest directory, you can just edit out the control file header and run it:
$ sqlplus "/ as sysdba"
SQL> shutdown abort
SQL> !ls -ltr /u00/oracle/admin/PROD/udump/*.trc
Get the latest textual control file in user_dump_dest directory, such as prod_ora_31494.trc (the last one listed).
SQL> !vi /u00/oracle/admin/PROD/udump/prod_ora_31494.trc
Edit the textual control file and remove all lines before the STARTUP NOMOUNT line, remove the lines after ALTER DATABASE OPEN, and, for Oracle 9i and above, change all comment lines to start with dashes. The vi commands to do these (for 9i+) are usually:
:1,/STARTUP NOMOUNT/-1d
:/ALTER DATABASE OPEN/+1,$d
:1,$s/^#/--/
:wq
SQL> @/u00/oracle/admin/PROD/udump/prod_ora_31494.trc
If you don't have a textual control file, you'll need to restore all datafiles and control files, but, not online redo log files, from the last backup and do a recovery "using backup controlfile":
$ sqlplus "/ as sysdba"
SQL> shutdown abort
At this point, restore ALL datafiles AND control files from the last backup, along with any archivelogs that are needed since that time, but, NOT the online redo log files.
SQL> connect / as sysdba
SQL> startup mount
SQL> recover automatic database using backup controlfile;
Then, enter AUTO when prompted with "ORA-00308: cannot open archived log ...".
SQL> alter database open resetlogs;
Immediately shut down the database and make a BACKUP, since the old backup is unusable after RESETLOGS is used.
Note that you can combine the recovery options depending on what's needed. For example, I needed to recover up to a point in time before a datafile was added, so I did the steps above, deleting the added datafile after shutting down the database and substituting the following recover command (you won't enter AUTO as above, and, when it has finished the partial recovery, it will show "Media recovery cancelled."):
SQL> recover automatic database using backup controlfile until time '2001-07-16:13:15:00';