Data Guard 11g’s Automatic Gap Resolution and ORA-16401 Error

A log file gap occurs whenever the primary database continues to commit transactions while the LNS process has stopped transmitting redo to the standby database. This can happen when the network or the standby database is down and your Data Guard protection mode is not Maximum Protection. The primary database’s LGWR process continues writing to the current ORL (online redo log), fills it, and then switches to a new ORL while an archive (ARCH) process archives the completed ORL locally. This cycle can repeat many times on a busy system before the connection between the primary and the standby is restored, resulting in a large log file gap.

Data Guard uses an ARCH process on the primary database to continuously ping the standby database during the outage to determine its status. When communication with the standby is restored, the ARCH ping process queries the standby control file (via its RFS process) to determine the last complete log file that the standby received from the primary database. Data Guard determines which log files are required to resynchronize the standby database and immediately begins transmitting them using additional ARCH processes. At the very next log switch, the LNS process reconnects to the standby database and begins transmitting current redo while the ARCH processes resolve the gap in the background. Once the standby apply process has caught up to the current redo records, it automatically transitions from reading archived redo logs to reading the current SRL (Standby Redo Log).
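The idea of "determining which log files are required" can be illustrated with a small sketch. This is not how Data Guard implements it internally; it is only a toy model that assumes a single redo thread and that we already know which sequence numbers were archived on the primary and which were received on the standby. The function name `missing_sequence_ranges` is hypothetical.

```python
def missing_sequence_ranges(primary_archived, standby_received):
    """Return (low, high) ranges of log sequences that were archived on the
    primary but are absent on the standby. Assumes a single redo thread."""
    missing = sorted(set(primary_archived) - set(standby_received))
    ranges = []
    for seq in missing:
        if ranges and seq == ranges[-1][1] + 1:
            # Sequence is contiguous with the previous one: extend the range.
            ranges[-1] = (ranges[-1][0], seq)
        else:
            # Start a new range.
            ranges.append((seq, seq))
    return ranges


# Illustrative numbers only: the primary archived 180..190 during an outage,
# and the standby is missing 183..188.
primary = range(180, 191)
standby = [180, 181, 182, 189, 190]
print(missing_sequence_ranges(primary, standby))  # [(183, 188)]
```

Each returned pair is a low and high sequence number bounding one gap, which is the set of archived logs the ARCH processes would need to ship.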

The performance of automatic gap resolution is critical. The primary must be able to transmit data at a much faster pace than its normal redo generation rate if the standby is to have any hope of catching up. The Data Guard architecture enables gaps to be resolved quickly using multiple background ARCH processes, while at the same time the LNS process is conducting normal SYNC or ASYNC transmission of the current redo stream.
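To see why the transmit rate has to exceed the redo generation rate, here is a rough back-of-the-envelope model. It assumes constant rates and treats the gap as shrinking at the difference between total transport throughput and redo generation; real systems are burstier, and the function name and figures below are purely illustrative.

```python
import math


def catch_up_seconds(gap_mb, transmit_mb_per_s, redo_mb_per_s):
    """Estimate seconds needed to close a redo gap under constant rates.

    The gap shrinks only by the surplus of transmit rate over the rate at
    which new redo is generated. With no surplus, the standby never catches up.
    """
    surplus = transmit_mb_per_s - redo_mb_per_s
    if surplus <= 0:
        return math.inf
    return gap_mb / surplus


# Hypothetical figures: a 6000 MB gap, 30 MB/s transport, 10 MB/s redo generation.
print(catch_up_seconds(6000, 30, 10))  # 300.0
# If transport can only match redo generation, the gap never closes.
print(catch_up_seconds(6000, 10, 10))  # inf
```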

FAL (Fetch Archive Log) is Data Guard’s capability to fetch missing archived logs. It is only used on a physical standby database. When a physical standby database finds that a log file is missing, it can fetch it from one of the databases (primary or standby) in the Data Guard configuration. This is also referred to as reactive gap resolution. Nowadays, however, most gap requests from a physical or logical standby database can be handled by the ping process of the primary database, as mentioned above.

The FAL_SERVER parameter is defined as a list of TNS entries that exist on the standby server and point to the primary and/or any of the standby databases.

FAL_CLIENT is the TNS entry of the gap-requesting database. The receiver of the request needs it so that the archive process on the FAL server can connect back to the sender of the request. FAL_CLIENT is optional, and Oracle Support recommends not setting it. When it is not set, the DB_UNIQUE_NAME of the sender of the request is matched against that of a LOG_ARCHIVE_DEST_n instead.

However, if you do set FAL_CLIENT in your standby database, you need to make sure the TNS entry you use is the same as the one used in LOG_ARCHIVE_DEST_n of the FAL server. Otherwise you will receive an ORA-16401 error. The following example demonstrates this case.

TNS entry for the primary database:

PSDL1I_sitka=
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = sitka)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = psdl1i.sitka)
)
)

The standby TNS entry:
PSDL1I_sanfords=
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = sanfords)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME=PSD_STANDBY.sanfords)
)
)

Both entries are in the tnsnames.ora file on both database servers, primary and standby. On the standby server there is also a fal_client entry, which points to the same database as the standby TNS entry:

PSDL1I_fal_client =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = sanfords)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME=PSD_STANDBY.sanfords)
)
)

The FAL parameters in the standby database are set as:

fal_client  = PSDL1I_fal_client
fal_server = PSDL1I_sitka

When there is a redo gap, the primary ships the missing log sequence to the destination defined by:

log_archive_dest_2=' service="PSDL1I_sanfords", ASYNC NOAFFIRM db_unique_name="PSD_STANDBY" valid_for=(all_logfiles,primary_role)'

The standby can also do its own gap analysis and request logs from the FAL_SERVER. The FAL server, in this case the primary, will try to honor that request. When the standby attempts to resolve a gap this way, the primary receives a different client name, fal_client=PSDL1I_fal_client, even though PSDL1I_fal_client and PSDL1I_sanfords point to the same standby.

In the standby alert log file we get the following error:

Thu Dec 29 13:40:47 2011
ARC2: Archive log rejected (thread 1 sequence 188) at host 'PSDL1I_sanfords'
FAL[server, ARC2]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance PSDL1I - Archival Error. Archiver continuing.

In the ARCH trace file we see the following error message:

*** 2011-12-29 13:40:47.903
Error 16401 creating standby archive log file at host 'PSDL1I_sanfords'
kcrrwkx: unknown error:16401

To fix this problem, change FAL_CLIENT on the standby to PSDL1I_sanfords, the same TNS entry as the one used in the primary’s LOG_ARCHIVE_DEST_2. The FAL request from the standby then matches the request created by regular LOG_ARCHIVE_DEST_n (LADn) redo shipping, so a second FRB (FAL Request Block) is not created and the ORA-16401 error is avoided.
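The mismatch described above could be caught ahead of time with a simple check of the two parameter values. The sketch below is only an illustration: the helper names `dest_service` and `fal_client_matches` are hypothetical, the parsing handles just the `service=` attribute as it appears in this post, and the exact string comparison is a conservative assumption, since this post does not establish whether Oracle compares the names case-sensitively.

```python
import re


def dest_service(log_archive_dest):
    """Extract the service attribute from a LOG_ARCHIVE_DEST_n value, or None."""
    match = re.search(r'\bservice\s*=\s*"?([^"\s,]+)"?', log_archive_dest, re.IGNORECASE)
    return match.group(1) if match else None


def fal_client_matches(fal_client, log_archive_dest):
    """True if FAL_CLIENT is exactly the service name used by the destination.

    Exact (case-sensitive) comparison is an assumption made for safety.
    """
    return fal_client == dest_service(log_archive_dest)


dest_2 = ('service="PSDL1I_sanfords", ASYNC NOAFFIRM '
          'db_unique_name="PSD_STANDBY" valid_for=(all_logfiles,primary_role)')

print(fal_client_matches("PSDL1I_fal_client", dest_2))  # False: the failing setup
print(fal_client_matches("PSDL1I_sanfords", dest_2))    # True: the corrected setup
```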

About Hong Wang

I am an Oracle DBA and have worked with Oracle databases since version 7.3, in both application development and production support. I have a lot of experience with complicated real-world problems and database projects. This blog serves as a collection of notes from my database studies as well as issues I have encountered and solved. Your comments are welcome.

Discussion

5 thoughts on “Data Guard 11g’s Automatic Gap Resolution and ORA-16401 Error”

  1. Do you know if the TNSNAMES entry comparison between archive_dest and fal_client is case sensitive?

    Posted by Stacy | September 10, 2012, 2:31 pm
