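This article is reposted from the WeChat official account 「数据和云」; the author is AIQ. To repost it, please contact the 数据和云 official account.
Fault diagnosis
Checking archive log synchronization
1. Check the database status
- select database_role,flashback_on,open_mode,current_scn from v$database;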
- DATABASE_ROLE FLASHBACK_ON OPEN_MODE CURRENT_SCN
- ---------------- ------------------ -------------------- ---------------
- PHYSICAL STANDBY NO READ ONLY WITH APPLY 16657544972059
2. Check the highest archived log sequence received for each thread.
- select thread#,max(sequence#) from v$archived_log group by thread#;
- Primary (production) database:
- SQL> select thread#,max(sequence#) from v$archived_log group by thread#;
- THREAD# MAX(SEQUENCE#)
- ---------- --------------
- 1 136973
- 2 132693
- 4 149599
- 3 133277
- -- Standby (DG) database
- SYS@hisnewdb> select thread#,max(sequence#) from v$archived_log group by thread#;
- THREAD# MAX(SEQUENCE#)
- ---------- --------------
- 1 136973
- 2 132693
- 4 149598
- 3 133277
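- The archived logs from all four nodes are reaching the standby and the sequences line up; only thread 4 is one sequence behind (149598 on the standby versus 149599 on the primary).
3. Check whether a gap exists
- select * from v$archive_gap;
Log apply status
Check the apply lag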
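v$archive_gap reports only the gap that is currently blocking recovery, so an empty result does not by itself prove that redo apply is healthy. As a complementary check, the sketch below looks at the recovery and receiver processes on the standby. It assumes a release where v$managed_standby is still available (newer releases deprecate it in favor of v$dataguard_process). A status of WAIT_FOR_GAP on the MRP0 row indicates that recovery is stalled on a missing log.

```sql
-- Run on the standby. Shows what managed recovery (MRP) and the
-- remote file server processes (RFS) are currently working on.
select process, status, thread#, sequence#, block#, blocks
from   v$managed_standby
where  process like 'MRP%'
   or  process like 'RFS%'
order  by process, thread#;
```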
- select name ,value,time_computed from v$dataguard_stats where rownum<33;
- NAME VALUE TIME_COMPUTED
- -------------------------------- ---------------------------------------------------------------- ------------------------------
- transport lag +11 06:41:27 03/04/2021 16:41:20
- apply lag +11 06:41:27 03/04/2021 16:41:20
- apply finish time +00 04:23:39.868 03/04/2021 16:41:20
- estimated startup time 37 03/04/2021 16:41:20
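- The apply lag shows that redo apply is already 11 days and 6 hours behind.
- apply finish time estimates that applying the outstanding redo will take at least about 4 hours 24 minutes.
Recovery approach
Applying logs
- alter database recover managed standby database cancel; -- stop redo apply
- alter database open read only; -- open the standby read only
- alter database recover managed standby database; -- apply in the foreground
- alter database recover managed standby database disconnect from session; -- apply in the background; running the foreground command above is recommended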
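To track the lag as a single number, for example in a monitoring script, the interval string can be converted to seconds. This is a minimal sketch, assuming VALUE holds a day-to-second string of the form shown above; the column alias lag_seconds is illustrative. For the output above, +11 06:41:27 works out to 11*86400 + 6*3600 + 41*60 + 27 = 974487 seconds.

```sql
-- Run on the standby. VALUE is a day-to-second interval string such as
-- '+11 06:41:27', which to_dsinterval can parse.
select name,
       value,
       extract(day    from to_dsinterval(value)) * 86400
     + extract(hour   from to_dsinterval(value)) * 3600
     + extract(minute from to_dsinterval(value)) * 60
     + extract(second from to_dsinterval(value)) as lag_seconds,
       time_computed
from   v$dataguard_stats
where  name in ('transport lag', 'apply lag');
```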
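When the archived logs are still available or the gap is small
1) The archived logs are still on the primary
Method 1:
First, use the thread# (node) and sequence range identified on the standby to look up the location (name) of the matching archived logs: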
- select name from v$archived_log where sequence# between &1 and &2 and thread# = &3;
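Copy those files to the standby's archive destination (shown by archive log list).
- # 1. asmcmd: its cp command copies freely between ASM storage and the local file system.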
- cp arch*.pdf /home/oracle/1.dbf
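Register the archived logs on the standby with alter database register logfile '<absolute path of the archived log>'; or catalog them in RMAN with catalog start with '';
Apply the logs, then query select * from V$ARCHIVE_GAP; to watch for further gaps. If another one appears, repeat the steps above.
Method 2:
On the standby, set fal_client=${standby's listener service} and fal_server=${primary's listener service}.
Then start redo apply directly and let the standby fetch the missing logs itself.
2) The archived logs are already on the standby
Apply the logs.
When the archived logs have been deleted or the gap is large
Locating the missing archived logs
alert.log:
- Recovery is waiting for the archived log of thread 4, sequence 148164; the gap to be fetched is sequences 148164-148165.
- control_file_record_keep_time sets how many days reusable records are kept in the control file before they can be overwritten. The message below means the archived log records could not be found within that period and suggests a larger value so that gap resolution can locate the missing logs.
- The default is 7 days; the valid range is 0-365 days.
- It covers reusable records such as archived log records and the various backup records.
- It does not cover datafile, tablespace, or redo thread records, which are never reused unless the object is dropped.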
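When the gap spans many files, typing one register statement per file is error-prone. One option is to generate the statements from the primary's v$archived_log, as sketched below. It assumes the files are copied to the same path on the standby (otherwise edit the generated paths first). The substitution variables &thread, &low_seq and &high_seq are illustrative names for the values reported by v$archive_gap.

```sql
-- Run on the primary. &thread, &low_seq and &high_seq are SQL*Plus
-- substitution variables filled in from v$archive_gap on the standby.
-- Assumption: the files land in the same path on the standby.
select 'alter database register logfile ''' || name || ''';' as register_cmd
from   v$archived_log
where  thread# = &thread
and    sequence# between &low_seq and &high_seq
and    standby_dest = 'NO'   -- local copies only, not the remote destination rows
and    deleted = 'NO'        -- skip logs already removed from disk
order  by sequence#;
```

Run the generated statements on the standby after the files have been copied.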
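A minimal sketch of that configuration follows. PRIMARY_TNS and STANDBY_TNS are hypothetical Oracle Net service names, not values from the article, and scope = both assumes the standby uses an spfile.

```sql
-- Run on the standby. PRIMARY_TNS and STANDBY_TNS are hypothetical
-- Oracle Net service names; substitute the aliases from tnsnames.ora.
alter system set fal_server = 'PRIMARY_TNS' scope = both sid = '*';
alter system set fal_client = 'STANDBY_TNS' scope = both sid = '*';

-- Confirm the values (SQL*Plus command).
show parameter fal
```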
- started logmerger process
- Thu Mar 04 16:19:53 2021
- Managed Standby Recovery not using Real Time Apply
- Parallel Media Recovery started with 16 slaves
- Waiting for all non-current ORLs to be archived...
- All non-current ORLs have been archived.
- Media Recovery Waiting for thread 4 sequence 148164
- Fetching gap sequence in thread 4, gap sequence 148164-148165
- Thu Mar 04 16:19:57 2021
- Completed: alter database recover managed standby database disconnect from session
- ----------
- Thu Mar 04 16:21:50 2021
- FAL[client]: Failed to request gap sequence
- GAP - thread 4 sequence 148164-148165
- DBID 3828421454 branch 984679630
- FAL[client]: All defined FAL servers have been attempted.
- ------------------------------------------------------------
- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
- parameter is defined to a value that's sufficiently large
- enough to maintain adequate log switch information to resolve
- archivelog gaps.
- ------------------------------------------------------------
- Thu Mar 04 16:22:25 2021
- RFS[18]: Selected log 29 for thread 4 sequence 149600 dbid -466545842 branch 984679630
- Thu Mar 04 16:22:25 2021
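Following the hint in the alert log, the retention can be inspected and raised as sketched below. The 30-day value is purely illustrative and is not a figure from the article. It is presumably worth checking on both primary and standby, since the primary's control file is where FAL requests look up archived log records.

```sql
-- Show the current retention in days (SQL*Plus command).
show parameter control_file_record_keep_time

-- 30 is an illustrative value; choose one that covers your longest
-- expected standby outage plus the backup retention.
alter system set control_file_record_keep_time = 30 scope = both sid = '*';
```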
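1) Find the current minimum SCN
Compare the datafile checkpoint SCN, the datafile header checkpoint SCN, the SCN of the missing archived log (next_change#, the first change number of the following log file), and the current SCN of the database: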
- select thread#,low_sequence#,high_sequence# from v$archive_gap;
- col datafile_scn for 999999999999999
- col DATAFILE_HEADER_SCN for 999999999999999
- col current_scn for 999999999999999
- col next_change# for 999999999999999
- select ( select min(d.checkpoint_change#) from v$datafile d ) datafile_scn ,
- ( select min(d.checkpoint_change#) from v$datafile_header d ) datafile_header_scn, -- min over every datafile header, not one arbitrary row
- (select current_scn from v$database) current_scn,
- (select next_change# from v$archived_log where thread#=4 and sequence#=148164 and resetlogs_change# = (select d.resetlogs_change# from v$database d ) and rownum=1 ) next_change#
- from dual;
- DATAFILE_SCN DATAFILE_HEADER_SCN CURRENT_SCN NEXT_CHANGE#
- ---------------- ------------------- ---------------- ----------------
- 16657544969028 16657544972060 16657544972059
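Use the smallest of the SCNs above as the starting SCN for the incremental backup.
2) Take an SCN-based incremental backup on the primary
Stop redo apply on the standby: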
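Picking the minimum by eye is easy to get wrong with 14-digit numbers, so the comparison can be left to the database. The sketch below is one way to do it. The alias incr_from_scn is illustrative, and next_change# is deliberately left out because LEAST returns NULL when any argument is NULL, which is the case when the archived log is missing. With the values shown above it yields 16657544969028.

```sql
-- Run on the standby (mounted or open read only).
-- Returns the lowest of the datafile, datafile-header and database SCNs.
select least(
         (select min(checkpoint_change#) from v$datafile),
         (select min(checkpoint_change#) from v$datafile_header),
         (select current_scn from v$database)
       ) as incr_from_scn
from   dual;
```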
- alter database recover managed standby database cancel;
RMAN backup:
- Switch the log file
- Be sure to back up the current control file
- Take the incremental backup from the SCN
- run {
- allocate channel c1 device type disk;
- allocate channel c2 device type disk;
- allocate channel c3 device type disk;
- allocate channel c4 device type disk;
- allocate channel c5 device type disk;
- allocate channel c6 device type disk;
- CONFIGURE DEVICE TYPE DISK PARALLELISM 6 BACKUP TYPE TO BACKUPSET;
- backup as compressed backupset current controlfile for standby format '/home/oracle/backup/backup_ctl_%U.rman';
- backup as compressed backupset incremental from scn 16657544969028 database format '/home/oracle/backup/backup_%d_%s_%c_%U_%T.rman' include
- current controlfile for standby filesperset 10 tag 'forsdb_16657544969028_0304';
- release channel c1 ;
- release channel c2 ;
- release channel c3 ;
- release channel c4 ;
- release channel c5 ;
- release channel c6 ;
- }
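The checklist mentions switching the log but does not show a command. The article does not say which statement was used; one common choice on a RAC primary is sketched here, because it archives the current log of every enabled thread rather than only the local instance's.

```sql
-- Run on the primary before taking the backup.
-- Archives the current online redo log of all enabled threads.
alter system archive log current;
```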
Transfer the backup files to the standby:
- scp -rp /home/oracle/backup host2:/home/oracle
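Media recovery of the standby
- Find the absolute paths of the control files, then shut down the standby
- Start up in nomount
- Restore the control file
- Mount the database
- Recover the datafiles
- Check the RMAN progress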
- select name from v$controlfile;
- shutdown immediate;
- startup nomount;
- rman target / <<eof
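- # backup_ctl_%U.rman is the format mask used for the backup; use the actual piece name created in /home/oracle/backup
- restore standby controlfile from '/home/oracle/backup/backup_ctl_%U.rman';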
- alter database mount;
- eof
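- If the standby controlfile was not backed up separately, try the backup pieces one at a time until the standby controlfile restores:
- restore standby controlfile to '/oradata/hisnewdb/control01.ctl' from '/home/oracle/backup/<some backup piece>';
- If there are too many files, catalog them in RMAN first and then restore the controlfile.
- The standby must be in mount state before the files can be cataloged.
- rman target / <<eof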
- startup mount;
- catalog start with '/home/oracle/backup/';
- list backup of controlfile;
- restore standby controlfile automatic;
- eof
- # Roughly like this. If restore standby controlfile automatic; does not work, use the output of list above to find the backup piece that contains the standby controlfile, then restore it with restore standby controlfile from '';
- catalog start with '/home/oracle/backup/';
- recover database noredo;
Check the progress of the RMAN recovery:
- set line 9999
- select sid,serial#,opname,round(sofar/totalwork*100) completed,trunc(elapsed_seconds/60) elapsed ,trunc(time_remaining/60) remaining,context ,target,sofar,totalwork
- from v$session_longops
- where opname like 'RMAN%' and opname not like '%aggregate%' and totalwork!=0 and sofar<>totalwork;
Applying logs
Check whether standby redo log files exist:
- select * from v$standby_log;
Add standby redo log files:
- -- Add a group with a single member:
- alter database add standby logfile group {group number} '<absolute path of the standby redo log file>';
- -- Add a group with several members:
- alter database add standby logfile group {group number} ('standby redo logs file 1','logfiles2');
Apply the logs:
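Before adding groups, it helps to compare the online and standby redo log layout per thread. Oracle's usual guidance is that standby redo logs should be the same size as the online redo logs, with one more group per thread than the online logs have. The queries below are a sketch for making that comparison; the aliases group_count and size_mb are illustrative.

```sql
-- Online redo logs: number of groups and size per thread.
select thread#, count(*) as group_count, max(bytes)/1024/1024 as size_mb
from   v$log
group  by thread#
order  by thread#;

-- Standby redo logs: number of groups and size per thread.
select thread#, count(*) as group_count, max(bytes)/1024/1024 as size_mb
from   v$standby_log
group  by thread#
order  by thread#;
```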
- alter database recover managed standby database cancel ;
- startup mount;
- alter database open read only;
- select open_mode,database_role,protection_level,protection_mode from v$database ;
- -- Apply logs in the foreground
- alter database recover managed standby database ;
- -- Apply logs in the background with parallel 8
- alter database recover managed standby database parallel 8 disconnect from session;
Checking log apply status
Check the highest applied log sequence for each thread# and compare it with the primary.
- select thread#,max(sequence#) from v$archived_log where applied='YES' group by thread#;
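To see on the standby alone how far apply trails what has been received, the received and applied maxima can be put side by side. This is a sketch; the aliases last_received and last_applied are illustrative, and the resetlogs filter is an added assumption that restricts the result to the current incarnation.

```sql
-- Run on the standby. A widening difference between the two columns
-- means logs are arriving faster than they are being applied.
select thread#,
       max(sequence#) as last_received,
       max(case when applied = 'YES' then sequence# end) as last_applied
from   v$archived_log
where  resetlogs_change# = (select resetlogs_change# from v$database)
group  by thread#
order  by thread#;
```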