Tuesday, December 9, 2014

How to know at which point you can open the database after a restore of the database data files

During a database recovery and or duplicate there are times when we try to open the database and we seem to get endless errors saying the database needs more recovery or the datafile backup must be restored from an older backup or my personal favorite, ORA-01547 recovery succeeded but open resetlogs would get errors ... These messages are the last thing we want to see during a recovery. What we want to see is the database open cleanly.


Example errors we might get:

RMAN> alter database open; RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of sql statement command at 08/14/2014 22:50:53 ORA-01113: file 1 needs media recovery ORA-01110: data file 1: '+DATA/BRKLYN12/DATAFILE/system.281.855613959' RMAN> alter database open resetlogs; RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of sql statement command at 08/14/2014 22:50:27 ORA-01139: RESETLOGS option only valid after an incomplete database recovery RMAN> recover database until scn 4938436; Starting recover at 14-AUG-14 using channel ORA_DISK_1 using channel ORA_DISK_2 using channel ORA_DISK_3 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of recover command at 08/14/2014 22:53:07 RMAN-06556: datafile 10 must be restored from backup older than SCN 4938436


We can use the following queries to determine when a database can be opened safely.


Listing 1 --------- select hxfil file#, d.name datafile_name , c.name container, fhsta fh_status, fhscn fh_scn, fhrba_seq fh_seq , fhafs fh_abs from x$kcvfh x, v$datafile d, v$containers c where d.file# = x.hxfil and d.con_id = c.con_id order by 5; Listing 2 --------- select CURRENT_SCN, CHECKPOINT_CHANGE#, CONTROLFILE_CHANGE# from v$database;


The x$kcvfh table is a fixed table in Oracle. The table naming stands for Kernel, Cache, recoVery, File, Header. The following describes the columns from the x$kcvfh table in the query under listing 1;

Column
Description
hxfil
Datafile number.
fhsta
Datafile status
0 = consistent, mounted
4 = online fuzzy
64 = fuzzy needs recovery
8192 = internal
8196 = online
*There may be more undocumented statuses.
fhscn
The SCN that the datafile is currently at.
fhrba_seq
The sequence number the datafile is currently at.
Fhabs
Absolute fuzzy SCN of the datafile.



Looking at the output of this query while the database is open; for the first two lines, file#7 and 5, the FH_STATUS is 0 as the datafiles are from the seed PDB, thus would be offline in a consistent state. The FH_SCN and FH_SEQ are for file#7 and 5 are also well behind the other datafiles as they would not have been check pointed since they were taken offline. All the other datafiles, except the CDB system datafile (file# 1), have a FH_STATUS of 4 which is online fuzzy. They all have the same FH_SCN and FH_SEQ, this would be the SCN and sequence number at the time of the last checkpoint. File# 1 is the CDB system datafile and will have a status of 8196 when the system datafile is online.

FILE# DATAFILE_NAME CONTAINER FH_STATUS FH_SCN FH_SEQ FH_ABS ----- -------------------------------------------------------------------------------- --------- ---------- -------- ------ ------ 7 +DATA/BRKLYN12/FF9AE3FCE11017B1E043010B010B757D/DATAFILE/sysaux.279.855530189 PDB$SEED 0 1731888 5 0 5 +DATA/BRKLYN12/FF9AE3FCE11017B1E043010B010B757D/DATAFILE/system.289.855530443 PDB$SEED 0 1731888 5 0 4 +DATA/BRKLYN12/DATAFILE/undotbs1.283.855529987 CDB$ROOT 4 4936562 11 0 6 +DATA/BRKLYN12/DATAFILE/users.285.855529971 CDB$ROOT 4 4936562 11 0 12 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/sysaux.287.855529967 PDB_TEST2 4 4936562 11 0 13 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/users.262.855530555 PDB_TEST2 4 4936562 11 0 1 +DATA/BRKLYN12/DATAFILE/system.280.855529981 CDB$ROOT 8196 4936562 11 0 9 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/sysaux.269.855530391 PDB_TEST1 4 4936562 11 0 10 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/users.261.855530557 PDB_TEST1 4 4936562 11 0 11 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/system.288.855530459 PDB_TEST2 4 4936562 11 0 3 +DATA/BRKLYN12/DATAFILE/sysaux.281.855529969 CDB$ROOT 4 4936562 11 0 8 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/system.274.855530351 PDB_TEST1 4 4936562 11 0


The next query (listing 2) will be used to see the SCN information in the control file. The query shows the current SCN that the database is up to. The CURRENT_SCN will only be shown when the database is open, otherwise it will be null. The CHECKPOINT_CHANGE# is the SCN at the last checkpoint, which will match the datafile FH_SCN in listing 1. The column CONTROLFILE_CHANGE# will show the SCN that was last updated in the control file only and not check pointed to the datafiles.

select CURRENT_SCN,CHECKPOINT_CHANGE#, CONTROLFILE_CHANGE# from v$database; CURRENT_SCN CHECKPOINT_CHANGE# CONTROLFILE_CHANGE# ----------- ------------------ ------------------- 4936698 4936562 4936588


Worked example 1

For the first example I have taken an online backup of a small database while it was under a light load. Let’s imagine that we need to do a database restore and recovery for a particular reason, maybe as part of a cloning exercise or worse a media loss. After having restored the data files from the online backup via the rman restore database command, I can view the output from our queries in listing 1 and listing 2.

FILE# DATAFILE_NAME CONTAINER FH_STATUS FH_SCN FH_SEQ FH_ABS ----- -------------------------------------------------------------------------------- ---------- ---------- -------- ------ ------ 7 +DATA/BRKLYN12/FF9AE3FCE11017B1E043010B010B757D/DATAFILE/sysaux.280.855614283 PDB$SEED 0 1731888 5 0 5 +DATA/BRKLYN12/FF9AE3FCE11017B1E043010B010B757D/DATAFILE/system.285.855614639 PDB$SEED 0 1731888 5 0 4 +DATA/BRKLYN12/DATAFILE/undotbs1.261.855613961 CDB$ROOT 0 4936905 13 0 1 +DATA/BRKLYN12/DATAFILE/system.281.855613959 CDB$ROOT 8192 4936905 13 0 6 +DATA/BRKLYN12/DATAFILE/users.269.855613967 CDB$ROOT 0 4936906 13 0 3 +DATA/BRKLYN12/DATAFILE/sysaux.288.855613959 CDB$ROOT 0 4936906 13 0 12 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/sysaux.274.855613959 PDB_TEST2 0 4936911 13 0 9 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/sysaux.287.855614571 PDB_TEST1 0 4937536 17 0 8 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/system.262.855614537 PDB_TEST1 0 4937655 17 0 11 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/system.283.855614699 PDB_TEST2 0 4938379 17 0 13 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/users.289.855614805 PDB_TEST2 0 4938436 17 0 10 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/users.279.855614813 PDB_TEST1 0 4938840 17 0 12 rows selected. CURRENT_SCN CHECKPOINT_CHANGE# CONTROLFILE_CHANGE# ----------- ------------------ ------------------- 0 4939271 4945162


We can see that we have several different FH_SCN values for the data files. This means that the data files were at different SCN when the backup was taken, which is expected for an online backup. Also note that the FH_STATUS are 0 for the normal data files and 8192 for the container system data file.
We must recover until at least the highest scn in the data file headers for Oracle to be able to safely open the database.

RMAN> recover database until scn 4938840; Starting recover at 14-AUG-14 using channel ORA_DISK_1 using channel ORA_DISK_2 using channel ORA_DISK_3 starting media recovery archived log for thread 1 with sequence 13 is already on disk as file +FRA/BRKLYN12/ARCHIVELOG/2014_08_14/thread_1_seq_13.326.855611653 ... archived log file name=+FRA/BRKLYN12/ARCHIVELOG/2014_08_14/thread_1_seq_15.323.855611897 thread=1 sequence=15 media recovery complete, elapsed time: 00:00:11 Finished recover at 14-AUG-14

I have now started a incomplete recovery.

FILE# DATAFILE_NAME CONTAINER FH_STATUS FH_SCN FH_SEQ FH_ABS ---------- -------------------------------------------------------------------------------- ---------- ---------- --------- -------- ------ 7 +DATA/BRKLYN12/FF9AE3FCE11017B1E043010B010B757D/DATAFILE/sysaux.280.855614283 PDB$SEED 0 1731888 5 0 5 +DATA/BRKLYN12/FF9AE3FCE11017B1E043010B010B757D/DATAFILE/system.285.855614639 PDB$SEED 0 1731888 5 0 4 +DATA/BRKLYN12/DATAFILE/undotbs1.261.855613961 CDB$ROOT 0 4938840 17 0 12 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/sysaux.274.855613959 PDB_TEST2 0 4938840 17 0 6 +DATA/BRKLYN12/DATAFILE/users.269.855613967 CDB$ROOT 0 4938840 17 0 13 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/users.289.855614805 PDB_TEST2 0 4938840 17 0 1 +DATA/BRKLYN12/DATAFILE/system.281.855613959 CDB$ROOT 8192 4938840 17 0 9 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/sysaux.287.855614571 PDB_TEST1 0 4938840 17 0 10 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/users.279.855614813 PDB_TEST1 0 4938840 17 0 11 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/system.283.855614699 PDB_TEST2 0 4938840 17 0 3 +DATA/BRKLYN12/DATAFILE/sysaux.288.855613959 CDB$ROOT 0 4938840 17 0 8 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/system.262.855614537 PDB_TEST1 0 4938840 17 0 12 rows selected. CURRENT_SCN CHECKPOINT_CHANGE# CONTROLFILE_CHANGE# CONTROLFILE_TIME ----------- ------------------ ------------------- -------------------- 0 4939271 4945162 14-AUG-2014 22:25:31

All the FH_SCN are at the same value for the data files (except for the PDB$SEED which is expected as they have always been offline). We are now at a point we can open resetlogs for incomplete recovery.
We know we can’t do a open noresetlogs as we have not done a complete recovery and the data file scn's does not match the controlfiles version of the check point scn.

RMAN> alter database open noresetlogs; RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of sql statement command at 08/14/2014 23:10:59 ORA-01113: file 1 needs media recovery ORA-01110: data file 1: '+DATA/BRKLYN12/DATAFILE/system.281.855613959'

In this example I will straight away open resetlogs to perform an incomplete recovery. I may have just been interested in getting the database open at the first scn possible and not interested in rolling forward.

RMAN> alter database open resetlogs; Statement processed

Instead of opening the database with resetlogs. We could continue to roll forward to a higher scn / sequence log or point in time. In circumstances where we wanted to recovery as much data as possible we could continue to a complete recovery using all the available archivelogs and possible redo logs.

Worked example 2

In this example I have taken a full online backup when the database was under a heavy load. After having restored the data files from the new online backup via the rman restore database command, I can again view the output from our queries in listing 1 and listing 2.

FILE# DATAFILE_NAME CONTAINER FH_STATUS FH_SCN FH_SEQ FH_ABS ---------- -------------------------------------------------------------------------------- ---------- ---------- --------- ------- ------- 7 +DATA/BRKLYN12/FF9AE3FCE11017B1E043010B010B757D/DATAFILE/sysaux.269.856138625 PDB$SEED 0 1731888 5 0 5 +DATA/BRKLYN12/FF9AE3FCE11017B1E043010B010B757D/DATAFILE/system.285.856138625 PDB$SEED 0 1731888 5 0 3 +DATA/BRKLYN12/DATAFILE/sysaux.283.856137879 CDB$ROOT 0 5098077 31 0 6 +DATA/BRKLYN12/DATAFILE/users.280.856137881 CDB$ROOT 0 5098077 31 0 1 +DATA/BRKLYN12/DATAFILE/system.261.856137879 CDB$ROOT 8192 5098077 31 0 4 +DATA/BRKLYN12/DATAFILE/undotbs1.274.856137879 CDB$ROOT 0 5098077 31 0 11 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/system.289.856138253 PDB_TEST2 0 5161440 33 0 13 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/users.287.856138255 PDB_TEST2 64 5161440 33 5161641 12 +DATA/BRKLYN12/FF9AFAB356271C2DE043010B010B8686/DATAFILE/sysaux.279.856138253 PDB_TEST2 0 5161440 33 0 10 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/users.281.856138439 PDB_TEST1 64 5207314 35 5207451 9 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/sysaux.288.856138439 PDB_TEST1 0 5207314 35 0 8 +DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/system.262.856138439 PDB_TEST1 0 5207314 35 0 12 rows selected. CURRENT_SCN CHECKPOINT_CHANGE# CONTROLFILE_CHANGE# CONTROLFILE_TIME ----------- ------------------ ------------------- -------------------- 0 7093713 7119216 21-AUG-2014 00:01:25


After this restore we can see the the FH_STATUS for two of the datafiles (file# 13 and 10) has a value of 64. Which means the data file is still in a fuzzy state. We can try to recover the database to the highest FH_SCN like we did on the previous example...

RMAN> recover database until scn 5207314; media recovery complete, elapsed time: 00:00:09 Finished recover at 21-AUG-14 RMAN> alter database open resetlogs; RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of sql statement command at 08/21/2014 00:23:25 ORA-01194: file 10 needs more recovery to be consistent ORA-01110: data file 10: '+DATA/BRKLYN12/FF9AF3FCBE891BFCE043010B010B3A65/DATAFILE/users.281.856138439'


We get the above error saying the file 10 needs more recovery to be in a consistent state. This is where the FH_ABS column in listing 1 come into play. File 10 needs to be recovered until the absolute fuzzy system change number (5207451). File 13 would also need to be recovered its absolute fuzzy scn (5161641), however as this is a scn before File 13, this has happened already when we did the recover command above.

We will now recover to scn 5207451 and open the database resetlogs.

RMAN> recover database until scn 5207451; Starting recover at 21-AUG-14 using channel ORA_DISK_1 starting media recovery archived log for thread 1 with sequence 35 is already on disk as file +FRA/BRKLYN12/ARCHIVELOG/2014_08_20/thread_1_seq_35.286.856136725 archived log file name=+FRA/BRKLYN12/ARCHIVELOG/2014_08_20/thread_1_seq_35.286.856136725 thread=1 sequence=35 media recovery complete, elapsed time: 00:00:01 Finished recover at 21-AUG-14 RMAN> alter database open resetlogs; Statement processed


******************************************
keywords: recovery scn absolute
******************************************
rdbms version: 12c
******************************************

No comments: