Restore Hadoop NameNode from Metadata Backup

Question

I am trying to recover NN metadata. I made a backup of the NameNode and JournalNode metadata. It contains edit logs and fsimages.

There are two NNs in my system. I back up the metadata on both NNs (HDFS metadata and QJM metadata) at a regular interval. I want to verify the recovery procedure for the worst case: suppose the metadata on both NNs and the JournalNodes has been completely deleted.

I want to restore NN metadata from backup and start NN. I know that there may be data loss, since the latest changes made after the backup will be absent.

Questions

Do you think such a scenario is possible / feasible?
I ran into some issues related to a txn id mismatch with the committed txn id. Tell me if there is a solution for this.

Steps:

Take a backup of the NN and QJM metadata. Do some operations with hdfs files (create new files).
Stop the NN and JournalNode on both machines.
Remove the metadata from the /data/hdfs and journal directories.
Restore the fsimages from a backup (taken some time ago).
Start NN. It crashes with a few exceptions.

An alternative approach: restore all the edit logs and fsimages into both the hdfs and qjm directories and start the NN. It still does not work.

Both NNs are down, and I cannot bring them up. I do not want to format hdfs, as that will change the cluster id and the backup will no longer be usable.
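The cluster id concern can be checked before starting anything. A minimal bash sketch, assuming each storage directory keeps its identity in a `current/VERSION` properties file with a `clusterID=` line; the function names `cluster_id` and `same_cluster` and any directory paths are hypothetical and not taken from the question:

```shell
#!/usr/bin/env bash
# Print the clusterID recorded in a storage directory's current/VERSION file.
# $1: storage directory (the parent of "current"); the path is an assumption.
cluster_id() {
    sed -n 's/^clusterID=//p' "$1/current/VERSION"
}

# Succeed only if two storage directories carry the same, non-empty clusterID.
# Returns 2 if either VERSION file cannot be read.
same_cluster() {
    local a b
    a=$(cluster_id "$1") || return 2
    b=$(cluster_id "$2") || return 2
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, `same_cluster /backup/nn /data/hdfs/nn` (both paths hypothetical) would tell whether a restored directory still belongs to the cluster the DataNodes were registered with.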

Exceptions

There appears to be a gap in the edit log. We expected txid 71453, but got txid 71466
Client trying to move committed txid backward from 71599 to 71453
recoverUnfinalizedSegments failed for required journal. Decided to synchronize log to startTxId: 71453, but logger 10.204.64.26:8485 had seen txid 71599 committed
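The first exception can be reproduced offline by looking at file names alone. The sketch below assumes the usual naming of `fsimage_<txid>` and `edits_<start>-<end>` with zero-padded transaction ids, and assumes that after loading `fsimage_N` the NameNode looks for edits beginning at N+1. The function name `check_edit_gap` is hypothetical, and bash is required for the `10#` base prefix:

```shell
#!/usr/bin/env bash
# Report the first transaction id that is needed but not covered by any
# finalized edit segment.
# $1: a "current" directory holding fsimage_* and edits_* files.
check_edit_gap() {
    local dir=$1 img next base range start end
    # Newest checkpoint; zero-padded ids make lexical order equal numeric order.
    img=$(ls "$dir" | grep -E '^fsimage_[0-9]+$' | sort | tail -n 1)
    [ -n "$img" ] || { echo "no fsimage found"; return 2; }
    # 10# forces base 10 so the leading zeros are not read as octal.
    next=$(( 10#${img#fsimage_} + 1 ))
    for base in $(ls "$dir" | grep -E '^edits_[0-9]+-[0-9]+$' | sort); do
        range=${base#edits_}
        start=$(( 10#${range%-*} ))
        end=$(( 10#${range#*-} ))
        # Segments entirely covered by the checkpoint are irrelevant.
        [ "$end" -lt "$next" ] && continue
        if [ "$start" -gt "$next" ]; then
            echo "gap: expected txid $next, but got txid $start"
            return 1
        fi
        next=$(( end + 1 ))
    done
    echo "ok: edits continuous through txid $(( next - 1 ))"
}
```

Run against a restored directory, a result such as `gap: expected txid 71453, but got txid 71466` would suggest that the restored image and the surviving edit segments come from different points in time. The check only covers finalized segments and says nothing about the committed txid that the JournalNodes remember.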

Vikas Ranjan Jun 09 '14 at 11:37

3 answers

Venkata karthik · Answer 1 · 2015-04-17T19:30:53+0000

You can run the namenode with the recovery flag enabled. Namenode recovery will take care of corrupt metadata.

./bin/hadoop namenode -recover

secfree · Answer 2 · 2016-01-21T03:31:13+0000

Since the latest FsImage and edits were lost or damaged, you should try to recover the metadata:
./bin/hadoop namenode -recover
Refer: Namenode Recovery Tools for the Hadoop Distributed File System
Since the journal is no longer in sync with the namenode, you must re-initialize it:
./bin/hdfs namenode -initializeSharedEdits
Since the recovered FsImage lacks everything written after the last backup, you should check for and delete the corrupted data:
./bin/hadoop fsck -delete /
If you skip fsck, the namenode may stay stuck in safe mode because too many blocks are never reported.

neeraj · Answer 3 · 2015-07-02T18:03:51+0000

Start all JournalNodes. Make sure you have copied the fsimage, fsimage.md5 and VERSION files. Then run hdfs namenode -initializeSharedEdits -force; it will format only the JournalNodes. Then start the first NameNode. It should work. Let me know if this doesn't work.
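Before re-initializing the shared edits, it may be worth confirming that the copied fsimage still matches its .md5 companion. A small sketch, assuming GNU coreutils `md5sum` is available and that each `fsimage_<txid>.md5` file uses the `<hash> *<filename>` layout that `md5sum -c` understands; the function name `verify_fsimages` is hypothetical:

```shell
#!/usr/bin/env bash
# Check every fsimage_*.md5 file in a directory against the image next to it.
# $1: the "current" directory the backup was copied into.
# Returns 0 if all images match, 1 on any mismatch, 2 if nothing was found.
verify_fsimages() {
    local dir=$1 md5file rc=0
    for md5file in "$dir"/fsimage_*.md5; do
        # An unmatched glob stays literal, so test for existence first.
        [ -e "$md5file" ] || { echo "no fsimage .md5 files in $dir"; return 2; }
        # md5sum -c resolves file names relative to the working directory.
        ( cd "$dir" && md5sum -c --quiet "$(basename "$md5file")" ) || rc=1
    done
    return $rc
}
```

A non-zero result would mean the backup copy itself is damaged, in which case re-initializing the JournalNodes from it is unlikely to help.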
