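I. Scenario
An engineer at the company ran db.giveget_card.drop() and accidentally dropped a production collection.
Fortunately we take a backup every day, and this is exactly the moment when backups prove their worth, haha...
II. Simulating the failure
Number of documents in the collection at backup time: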
- rs_test01:PRIMARY> use ycsb
- switched to db ycsb
- rs_test01:PRIMARY> db.giveget_card.count();
- 3173391
Before the drop, the collection receives further writes.
- rs_test01:PRIMARY> db.giveget_card.insert({id:1});
- WriteResult({ "nInserted" : 1 })
- rs_test01:PRIMARY> db.giveget_card.insert({id:2});
- WriteResult({ "nInserted" : 1 })
- rs_test01:PRIMARY> db.giveget_card.insert({id:3});
- WriteResult({ "nInserted" : 1 })
- rs_test01:PRIMARY> db.giveget_card.insert({id:4});
- WriteResult({ "nInserted" : 1 })
Other collections are written to as well.
- rs_test01:PRIMARY> db.tab.find();
- { "_id" : ObjectId("59354ba202d9a99ab2f879c6"), "name" : "a" }
- { "_id" : ObjectId("59354ba602d9a99ab2f879c7"), "name" : "b" }
- { "_id" : ObjectId("59354ba802d9a99ab2f879c8"), "name" : "c" }
- { "_id" : ObjectId("59354baa02d9a99ab2f879c9"), "name" : "d" }
After the drop, both this collection and the other collections keep receiving writes.
- rs_test01:PRIMARY> db.giveget_card.find();
- { "_id" : ObjectId("59354c28d905432aeaccd53c"), "id" : 5 }
- { "_id" : ObjectId("59354c2bd905432aeaccd53d"), "id" : 6 }
- rs_test01:PRIMARY> db.tab.find();
- { "_id" : ObjectId("59354ba202d9a99ab2f879c6"), "name" : "a" }
- { "_id" : ObjectId("59354ba602d9a99ab2f879c7"), "name" : "b" }
- { "_id" : ObjectId("59354ba802d9a99ab2f879c8"), "name" : "c" }
- { "_id" : ObjectId("59354baa02d9a99ab2f879c9"), "name" : "d" }
- { "_id" : ObjectId("59354ccfd905432aeaccd542"), "name" : "e" }
- { "_id" : ObjectId("59354cd2d905432aeaccd543"), "name" : "f" }
III. Recovery steps
1. Copy the giveget_card collection's giveget_card.bson and giveget_card.metadata.json files from the backup into /tmp/restore/ycsb (a directory created by hand); ycsb is the database name.
- # cp /data/backup/rs07/ycsb/giveget_card.* /tmp/restore/ycsb
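2. Export the oplog entries written after the backup was taken and up to the accidental drop; they are used to bring the collection up to date.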
- # mongodump --port 2203 -d local -c oplog.rs -q '{"ts" : {$gte : Timestamp(1496664480, 10430), $lte : Timestamp(1496665113, 10430)}}' -o /tmp/oplog
-- The timestamps are the output of a conversion tool (wall-clock time converted to Unix epoch seconds).
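The conversion can also be done with date. The following is a minimal sketch, assuming GNU coreutils date (the -d option) is available; the helper names to_epoch and from_epoch_utc are made up for illustration. The lower bound 1496664480 used in the query corresponds to 2017-06-05 20:08:00 +0800.

```shell
# Assumes GNU date; to_epoch and from_epoch_utc are illustrative helper names.

# Wall-clock time (with an explicit UTC offset) -> Unix epoch seconds.
to_epoch() {
    date -d "$1" +%s
}

# Unix epoch seconds -> UTC wall-clock time.
from_epoch_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

to_epoch '2017-06-05 20:08:00 +0800'   # prints 1496664480
from_epoch_utc 1496665113              # prints 2017-06-05 12:18:33
```

Giving the UTC offset explicitly keeps the result independent of the time zone configured on the machine running the conversion.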
3. Use bsondump to inspect the exported oplog and find the timestamp of the drop operation: 1496665069.
- # bsondump /tmp/oplog/local/oplog.rs.bson
- {"ts":{"$timestamp":{"t":1496664760,"i":1}},"t":{"$numberLong":"12"},"h":{"$numberLong":"7079172056815894727"},"v":2,"op":"i","ns":"ycsb.giveget_card","o":{"_id":{"$oid":"59354ab8c5308d8c7a9da8b5"},"id":1.0}}
- {"ts":{"$timestamp":{"t":1496664762,"i":1}},"t":{"$numberLong":"12"},"h":{"$numberLong":"-1797107728294067016"},"v":2,"op":"i","ns":"ycsb.giveget_card","o":{"_id":{"$oid":"59354abac5308d8c7a9da8b6"},"id":2.0}}
- {"ts":{"$timestamp":{"t":1496664765,"i":1}},"t":{"$numberLong":"12"},"h":{"$numberLong":"8604646791509150392"},"v":2,"op":"i","ns":"ycsb.giveget_card","o":{"_id":{"$oid":"59354abdc5308d8c7a9da8b7"},"id":3.0}}
- {"ts":{"$timestamp":{"t":1496664768,"i":1}},"t":{"$numberLong":"12"},"h":{"$numberLong":"9018614066505371436"},"v":2,"op":"i","ns":"ycsb.giveget_card","o":{"_id":{"$oid":"59354ac0c5308d8c7a9da8b8"},"id":4.0}}
- {"ts":{"$timestamp":{"t":1496664994,"i":1}},"t":{"$numberLong":"12"},"h":{"$numberLong":"-4471524661347063602"},"v":2,"op":"c","ns":"ycsb.$cmd","o":{"create":"tab"}}
- {"ts":{"$timestamp":{"t":1496664994,"i":2}},"t":{"$numberLong":"12"},"h":{"$numberLong":"-4215905958456607246"},"v":2,"op":"i","ns":"ycsb.tab","o":{"_id":{"$oid":"59354ba202d9a99ab2f879c6"},"name":"a"}}
- {"ts":{"$timestamp":{"t":1496664998,"i":1}},"t":{"$numberLong":"12"},"h":{"$numberLong":"6170506962401844481"},"v":2,"op":"i","ns":"ycsb.tab","o":{"_id":{"$oid":"59354ba602d9a99ab2f879c7"},"name":"b"}}
- {"ts":{"$timestamp":{"t":1496665000,"i":1}},"t":{"$numberLong":"12"},"h":{"$numberLong":"-8071456063660489895"},"v":2,"op":"i","ns":"ycsb.tab","o":{"_id":{"$oid":"59354ba802d9a99ab2f879c8"},"name":"c"}}
- {"ts":{"$timestamp":{"t":1496665002,"i":1}},"t":{"$numberLong":"12"},"h":{"$numberLong":"4387884836668659146"},"v":2,"op":"i","ns":"ycsb.tab","o":{"_id":{"$oid":"59354baa02d9a99ab2f879c9"},"name":"d"}}
- {"ts":{"$timestamp":{"t":1496665069,"i":1}},"t":{"$numberLong":"12"},"h":{"$numberLong":"-6913449254950935781"},"v":2,"op":"c","ns":"ycsb.$cmd","o":{"drop":"giveget_card"}}
- 2017-06-05T20:27:25.552+0800 10 objects found
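With only ten entries the drop is easy to spot by eye, but on a busy oplog a filter helps. The sketch below assumes bsondump prints one JSON document per line in the compact form shown above; find_drop_ts is a hypothetical helper name, not part of the MongoDB tools.

```shell
# Print "<seconds>:<increment>" for every oplog entry that drops the given
# collection. Reads bsondump's one-document-per-line output on stdin.
find_drop_ts() {
    coll="$1"
    grep "\"drop\":\"$coll\"" \
      | sed -E 's/.*"\$timestamp":\{"t":([0-9]+),"i":([0-9]+)\}.*/\1:\2/'
}

# Real usage would be:
#   bsondump /tmp/oplog/local/oplog.rs.bson | find_drop_ts giveget_card
# Demonstration on the drop entry copied from the output above:
sample='{"ts":{"$timestamp":{"t":1496665069,"i":1}},"t":{"$numberLong":"12"},"h":{"$numberLong":"-6913449254950935781"},"v":2,"op":"c","ns":"ycsb.$cmd","o":{"drop":"giveget_card"}}'
printf '%s\n' "$sample" | find_drop_ts giveget_card   # prints 1496665069:1
```

The printed value has the same seconds:increment shape that mongorestore accepts for --oplogLimit.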
4. Copy the oplog BSON file into the restore directory under the name oplog.bson.
- # cp /tmp/oplog/local/oplog.rs.bson /tmp/restore/oplog.bson
The restore directory now looks like this:
- # pwd
- /tmp/restore
- # ls
- oplog.bson ycsb
5. All preparation is now complete; restore the data.
- [root@ops-db-test02 restore]# mongorestore --port 2203 --oplogReplay --oplogLimit=1496665069:1 /tmp/restore
- 2017-06-05T20:36:45.361+0800 building a list of dbs and collections to restore from /tmp/restore dir
- 2017-06-05T20:36:45.364+0800 reading metadata for ycsb.giveget_card from /tmp/restore/ycsb/giveget_card.metadata.json
- 2017-06-05T20:36:45.364+0800 restoring ycsb.giveget_card from /tmp/restore/ycsb/giveget_card.bson
- 2017-06-05T20:36:48.362+0800 [........................] ycsb.giveget_card 15.4MB/475MB (3.2%)
- 2017-06-05T20:36:51.362+0800 [#.......................] ycsb.giveget_card 31.1MB/475MB (6.6%)
- 2017-06-05T20:36:54.362+0800 [##......................] ycsb.giveget_card 46.6MB/475MB (9.8%)
- 2017-06-05T20:36:57.362+0800 [###.....................] ycsb.giveget_card 62.1MB/475MB (13.1%)
- 2017-06-05T20:37:00.362+0800 [###.....................] ycsb.giveget_card 76.4MB/475MB (16.1%)
- 2017-06-05T20:37:03.362+0800 [####....................] ycsb.giveget_card 90.7MB/475MB (19.1%)
- 2017-06-05T20:37:06.362+0800 [#####...................] ycsb.giveget_card 105MB/475MB (22.0%)
- 2017-06-05T20:37:09.362+0800 [######..................] ycsb.giveget_card 120MB/475MB (25.2%)
- 2017-06-05T20:37:12.362+0800 [######..................] ycsb.giveget_card 133MB/475MB (28.0%)
- 2017-06-05T20:37:15.362+0800 [#######.................] ycsb.giveget_card 146MB/475MB (30.8%)
- 2017-06-05T20:37:18.363+0800 [########................] ycsb.giveget_card 163MB/475MB (34.3%)
- 2017-06-05T20:37:21.362+0800 [########................] ycsb.giveget_card 178MB/475MB (37.4%)
- 2017-06-05T20:37:24.362+0800 [#########...............] ycsb.giveget_card 196MB/475MB (41.3%)
- 2017-06-05T20:37:27.362+0800 [##########..............] ycsb.giveget_card 214MB/475MB (45.0%)
- 2017-06-05T20:37:30.362+0800 [###########.............] ycsb.giveget_card 231MB/475MB (48.6%)
- 2017-06-05T20:37:33.362+0800 [############............] ycsb.giveget_card 245MB/475MB (51.5%)
- 2017-06-05T20:37:36.362+0800 [#############...........] ycsb.giveget_card 261MB/475MB (54.8%)
- 2017-06-05T20:37:39.362+0800 [##############..........] ycsb.giveget_card 279MB/475MB (58.7%)
- 2017-06-05T20:37:42.362+0800 [###############.........] ycsb.giveget_card 297MB/475MB (62.5%)
- 2017-06-05T20:37:45.362+0800 [###############.........] ycsb.giveget_card 312MB/475MB (65.8%)
- 2017-06-05T20:37:48.362+0800 [################........] ycsb.giveget_card 328MB/475MB (69.0%)
- 2017-06-05T20:37:51.362+0800 [#################.......] ycsb.giveget_card 341MB/475MB (71.8%)
- 2017-06-05T20:37:54.362+0800 [#################.......] ycsb.giveget_card 356MB/475MB (74.9%)
- 2017-06-05T20:37:57.362+0800 [##################......] ycsb.giveget_card 373MB/475MB (78.5%)
- 2017-06-05T20:38:00.362+0800 [###################.....] ycsb.giveget_card 388MB/475MB (81.7%)
- 2017-06-05T20:38:03.362+0800 [####################....] ycsb.giveget_card 405MB/475MB (85.2%)
- 2017-06-05T20:38:06.362+0800 [#####################...] ycsb.giveget_card 419MB/475MB (88.2%)
- 2017-06-05T20:38:09.362+0800 [#####################...] ycsb.giveget_card 434MB/475MB (91.4%)
- 2017-06-05T20:38:12.362+0800 [######################..] ycsb.giveget_card 442MB/475MB (93.1%)
- 2017-06-05T20:38:15.362+0800 [#######################.] ycsb.giveget_card 459MB/475MB (96.6%)
- 2017-06-05T20:38:18.362+0800 [#######################.] ycsb.giveget_card 475MB/475MB (99.9%)
- 2017-06-05T20:38:18.427+0800 [########################] ycsb.giveget_card 475MB/475MB (100.0%)
- 2017-06-05T20:38:18.427+0800 restoring indexes for collection ycsb.giveget_card from metadata
- 2017-06-05T20:38:44.680+0800 finished restoring ycsb.giveget_card (3173391 documents)
- 2017-06-05T20:38:44.680+0800 replaying oplog
- 2017-06-05T20:38:44.739+0800 done
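The value passed to --oplogLimit is the drop entry's own timestamp. According to the mongorestore documentation, entries whose timestamp is newer than or equal to the limit are not applied, so the drop itself is skipped while everything before it is replayed. The comparison can be sketched as follows; replayed_before_limit is a hypothetical helper for illustration, not part of any MongoDB tool.

```shell
# Succeed (exit 0) if an oplog entry with timestamp <t> <i> lies strictly
# before the limit "<limit_t>:<limit_i>", i.e. it would still be replayed.
replayed_before_limit() {
    t="$1"; i="$2"
    limit_t="${3%%:*}"; limit_i="${3##*:}"
    [ "$t" -lt "$limit_t" ] || { [ "$t" -eq "$limit_t" ] && [ "$i" -lt "$limit_i" ]; }
}

# Last insert into tab before the drop: replayed.
replayed_before_limit 1496665002 1 1496665069:1 && echo "replayed"
# The drop entry itself: equal to the limit, therefore skipped.
replayed_before_limit 1496665069 1 1496665069:1 || echo "skipped"
```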
6. Check the result of the restore.
- rs_test01:PRIMARY> db.giveget_card.find({id : {$gte : 1 }});
- { "_id" : ObjectId("59354cb9d905432aeaccd540"), "id" : 5 }
- { "_id" : ObjectId("59354cc0d905432aeaccd541"), "id" : 6 }
- { "_id" : ObjectId("59354ab8c5308d8c7a9da8b5"), "id" : 1 }
- { "_id" : ObjectId("59354abac5308d8c7a9da8b6"), "id" : 2 }
- { "_id" : ObjectId("59354abdc5308d8c7a9da8b7"), "id" : 3 }
- { "_id" : ObjectId("59354ac0c5308d8c7a9da8b8"), "id" : 4 }
The data is the same, but the documents are now stored in a different order than before.
- rs_test01:PRIMARY> db.giveget_card.count();
- 3173397
The resulting count = 3173391 documents from the backup + 6 documents written afterwards.
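Spelled out, the 6 extra documents are the four inserts replayed from the oplog (id 1 to 4) plus the two inserts made after the drop (id 5 and 6). A quick arithmetic check, with variable names chosen only for illustration:

```shell
backup_docs=3173391      # documents restored from the backup
replayed_inserts=4       # id 1..4, replayed from the oplog
post_drop_inserts=2      # id 5 and 6, written after the drop
expected=$((backup_docs + replayed_inserts + post_drop_inserts))
echo "$expected"         # prints 3173397
```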
7. The dumped oplog also contains operations on other collections, so check whether the restore affected any of them.
- rs_test01:PRIMARY> db.tab.find();
- { "_id" : ObjectId("59354ba202d9a99ab2f879c6"), "name" : "a" }
- { "_id" : ObjectId("59354ba602d9a99ab2f879c7"), "name" : "b" }
- { "_id" : ObjectId("59354ba802d9a99ab2f879c8"), "name" : "c" }
- { "_id" : ObjectId("59354baa02d9a99ab2f879c9"), "name" : "d" }
- { "_id" : ObjectId("59354ccfd905432aeaccd542"), "name" : "e" }
- { "_id" : ObjectId("59354cd2d905432aeaccd543"), "name" : "f" }
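-- The data in the tab collection matches the original, so it was not affected; the oplog entries for other collections were replayed without changing anything.
That is the whole procedure for recovering from a backup combined with the oplog.
Backups matter!!! Backups matter!!! Backups matter!!! Important things get said three times~~~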