本博文出自51CTO博客客居天涯博主,有任何问题请进入博主页面互动讨论!
博文地址:http://tiany.blog.51cto.com/513694/1617646
Oracle DG(Dataguard)是目前比较常见的数据库HA配置策略。通过实现Physical Standby和Logical Standby,可以实现数据冗余容错机制。防止在主库出现严重故障,不能支持服务的时候,没有快速的后备支持环境。
在DG中,switchover和failover是两个重要的概念,也是DG实现的核心。两者共同点都是Primary和Standby角色切换,差异在于Planned和UnPlanned之分。Switchover关键点在于Planned,这个切换动作是在运维机构规划范围内的动作。比如,进行定期系统软硬件升级、设备维修等动作。而Failover是真正出现严重系统故障,如数据库宕机、软硬件故障导致的Primary不能支持服务,从而进行的切换动作。
根据不同的DG配置,switchover和failover也是有差异的。理论上,Switchover是不会造成数据丢失的,Primary在切换之后也是在DG配置环境中,作为Standby存在的。但是Failover则不同,除了运行在***保护(Maximum Protection)模式下,Primary突发的故障可能引起一部分Redo Log不能及时的传递到Standby端,切换之后很可能有数据损失的情况。更重要的是,Primary端在发生Failover之后,是不能够直接加入回DG配置的!也就是说,Failover之后,Primary实际上就是被“抛出”了DG环境。
那么,有什么方法实现Primary回到原有的环境呢?这个问题的困难在于保持Primary和Standby一致。在正常情况下,Primary和Standby之间是关联同步的,即使发生了Switchover,也在可控情况下。Failover过程中有数据的缺失,还有Primary修复问题。在目前流行版本(11g)中,有三个方法:
- 环境重建:一种最简单的方法就是直接删除原来的Primary库,引用DG重建方法,重新搭建Standby端;
- RMAN备份恢复:如果Primary端保留过一份Failover之前的备份,则可以强制原来的Primary端恢复到进行Failover的时间点,之后作为Standby接收当前Primary的redo log传递,应用后可以跟上进度;
- Flashback Database恢复:Flashback技术是作为传统备份还原技术的补充,提供了更加便捷的恢复策略。使用flashback,可以将数据库恢复到failover之前的时间点。之后的过程和RMAN备份恢复策略相同;
案例分析:
一、在主库端模拟数据库意外宕机
- 7scott@bjdb>conn /as sysdba
- Connected.
- sys@bjdb>alter system switch logfile;
- System altered.
- sys@bjdb>shutdown abort
- ORACLE instance shut down.
二、在备库端
1、查看切换信息
- 5sys@shdb>select name,database_role,switchover_status from v$database;
- NAME DATABASE_ROLE SWITCHOVER_STATUS
- --------- ---------------- --------------------
- TESTDB12 PHYSICAL STANDBY NOT ALLOWED
- 可以看到此时备库处于无法切换状态
2、直接切换
- sys@shdb>alter database commit to switchover to primary;
- alert_log:(告警日志)
- Fatal NI connect error 12514, connecting to:
- (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=shsrv)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=shdb)(CID=(PROGRAM=oracle)(HOST=bjsrv)(USER=oracle))))
- VERSION INFORMATION:
- TNS for Linux: Version 11.2.0.3.0 - Production
- TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
- Time: 04-MAR-2015 21:25:13
- Tracing not turned on.
- Tns error struct:
- ns main err code: 12564
- TNS-12564: TNS:connection refused
- ns secondary err code: 0
- nt main err code: 0
- nt secondary err code: 0
- nt OS err code: 0
- Error 12514 received logging on to the standby
- FAL[client, MRP0]: Error 12514 connecting to shdb for fetching gap sequence
- Wed Mar 04 21:26:00 2015
- alter database commit to switchover to primary
- ALTER DATABASE SWITCHOVER TO PRIMARY (TestDB12)
- Maximum wait for role transition is 15 minutes.
- Switchover: Media recovery is still active
- Database not available for switchover
- End-Of-REDO archived log file has not been recovered
- Database not available for switchover
- End-Of-REDO archived log file has not been recovered
- Database not available for switchover
3、关闭standby MPR进程
- 35sys@shdb>ALTER DATABASE RECOVER managed standby database finish;
- ALTER DATABASE RECOVER managed standby database finish
- Terminal Recovery: request posted (TestDB12)
- Wed Mar 04 21:34:34 2015
- Begin: Standby Redo Logfile archival
- End: Standby Redo Logfile archival
- Terminal Recovery timestamp is '03/04/2015 21:34:34'
- Terminal Recovery: applying standby redo logs.
- Terminal Recovery: thread 1 seq# 34 redo required
- Media Recovery Waiting for thread 1 sequence 34
- Terminal Recovery: End-Of-Redo log allocation
- Terminal Recovery: standby redo logfile 4 created '/dsk4/arch_bj/arch_1_0_820054583.log'
- This standby redo logfile is being created as part of the
- failover operation. This standby redo logfile should be
- deleted after the switchover to primary operation completes.
- Media Recovery Log /dsk4/arch_bj/arch_1_0_820054583.log
- Terminal Recovery: log 4 reserved for thread 1 sequence 34
- Recovery of Online Redo Log: Thread 1 Group 4 Seq 34 Reading mem 0
- Mem# 0: /dsk4/arch_bj/arch_1_0_820054583.log
- Identified End-Of-Redo (failover) for thread 1 sequence 34 at SCN 0xffff.ffffffff
- Incomplete Recovery applied until change 1234252 time 03/04/2015 21:23:43
- MRP0: Media Recovery Complete (TestDB12)
- Terminal Recovery: successful completion
- Wed Mar 04 21:34:35 2015
- ARCH: Archival stopped, error occurred. Will continue retrying
- ORACLE Instance TestDB12 - Archival Error
- ORA-16014: log 4 sequence# 34 not archived, no available destinations
- ORA-00312: online log 4 thread 1: '/dsk4/arch_bj/arch_1_0_820054583.log'
- Forcing ARSCN to IRSCN for TR 0:1234252
- Attempt to set limbo arscn 0:1234252 irscn 0:1234252
- Resetting standby activation ID 2865247982 (0xaac836ee)
- MRP0: Background Media Recovery process shutdown (TestDB12)
- Terminal Recovery: completion detected (TestDB12)
- Completed: ALTER DATABASE RECOVER managed standby database finish
4、切换数据库到Primary
- sys@shdb>select status from v$instance;
- STATUS
- ------------
- OPEN
- sys@shdb>select name,database_role,switchover_status from v$database;
- NAME DATABASE_ROLE SWITCHOVER_STATUS
- --------- ---------------- --------------------
- TESTDB12 PHYSICAL STANDBY TO PRIMARY
- sys@shdb>alter database commit to switchover to primary;
- Database altered.
- sys@shdb>alter database open;
- Database altered.
- 告警日志:
- alter database commit to switchover to primary
- ALTER DATABASE SWITCHOVER TO PRIMARY (TestDB12)
- Maximum wait for role transition is 15 minutes.
- All dispatchers and shared servers shutdown
- CLOSE: killing server sessions.
- CLOSE: all sessions shutdown successfully.
- Wed Mar 04 21:35:47 2015
- SMON: disabling cache recovery
- Backup controlfile written to trace file /u01/app/oracle/diag/rdbms/bjdb/TestDB12/trace/TestDB12_ora_3146.trc
- Standby terminal recovery start SCN: 1234251
- RESETLOGS after incomplete recovery UNTIL CHANGE 1234252
- Online log /dsk2/oradata/bjdb/redo01b.log: Thread 1 Group 1 was previously cleared
- Online log /dsk1/oradata/bjdb/redo01a.log: Thread 1 Group 1 was previously cleared
- Online log /dsk2/oradata/bjdb/redo02b.log: Thread 1 Group 2 was previously cleared
- Online log /dsk1/oradata/bjdb/redo02a.log: Thread 1 Group 2 was previously cleared
- Online log /dsk2/oradata/bjdb/redo03b.log: Thread 1 Group 3 was previously cleared
- Online log /dsk1/oradata/bjdb/redo03a.log: Thread 1 Group 3 was previously cleared
- Standby became primary SCN: 1234250
- Wed Mar 04 21:35:47 2015
- Setting recovery target incarnation to 3
- AUDIT_TRAIL initialization parameter is changed back to its original value as specified in the parameter file.
- Switchover: Complete - Database mounted as primary
- Completed: alter database commit to switchover to primary
三、原主库修复后,开机
- sys@bjdb>startup
- ORACLE instance started.
- Total System Global Area 442601472 bytes
- Fixed Size 2229184 bytes
- Variable Size 281021504 bytes
- Database Buffers 155189248 bytes
- Redo Buffers 4161536 bytes
- Database mounted.
- Database opened.
- sys@bjdb>select name,database_role,switchover_status from v$database;
- NAME DATABASE_ROLE SWITCHOVER_STATUS
- --------- ---------------- --------------------
- TESTDB12 PRIMARY FAILED DESTINATION
现在原来的主库被修复后,整个DataGuara架构已经被破坏了,所以必须把原来的主库构建成新的备库,重新恢复DataGuard的环境。
四、重新构建DataGuard
- 1sys@bjdb>select name,database_role from v$database;
NAME DATABASE_ROLE
-------------------------------------------------- ----------------
TESTDB12 PHYSICAL STANDBY