I am trying to recover NameNode (NN) metadata. I made a backup of the NameNode and JournalNode metadata; it contains edit logs and fsimages.
There are two NNs in my system. I back up the metadata on both NNs (the HDFS metadata and the QJM metadata) at a regular interval. I want to test the recovery procedure for the worst case: suppose the metadata on both NNs and on the JournalNodes has been completely deleted.
I want to restore the NN metadata from the backup and start the NN. I know there may be some data loss, since changes made after the backup will be missing.
Questions
- Is such a scenario possible at all?
- I ran into some issues related to transaction ID (txid) mismatches. Is there a solution for this?
Steps:
- Take a backup of the NN and QJM metadata, then perform some HDFS operations (e.g. create new files).
- Stop the NN and the JournalNode on both machines.
- Remove the metadata from the /data/hdfs and journal directories.
- Restore the fsimages from the backup (taken some time earlier).
- Start the NN. It fails with several exceptions.
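The first exception below ("gap in the edit log") can be checked for before the NN is started. A minimal sketch, assuming the restored `current/` directory uses the standard HDFS file naming (`fsimage_<txid>` for images, `edits_<first>-<last>` for finalized segments, txids zero-padded to 19 digits). The function names and the sample file listing are hypothetical, and in-progress segments (`edits_inprogress_*`) are deliberately ignored:

```python
import re
from typing import List, Optional, Tuple

# Finalized edit segment: edits_<first txid>-<last txid>, each zero-padded to 19 digits.
EDITS_RE = re.compile(r"^edits_(\d{19})-(\d{19})$")
# Checkpoint image: fsimage_<last txid in the image>; ".md5" side files do not match.
FSIMAGE_RE = re.compile(r"^fsimage_(\d{19})$")


def newest_fsimage_txid(names: List[str]) -> Optional[int]:
    """Highest txid covered by any fsimage in the listing, or None if there is none."""
    txids = [int(m.group(1)) for n in names if (m := FSIMAGE_RE.match(n))]
    return max(txids) if txids else None


def find_edit_gaps(names: List[str], start_txid: int) -> List[Tuple[int, int]]:
    """Return (expected, got) txid pairs for every hole in the finalized
    segments that must be replayed from start_txid onward."""
    segments = sorted(
        (int(m.group(1)), int(m.group(2)))
        for n in names
        if (m := EDITS_RE.match(n))
    )
    gaps: List[Tuple[int, int]] = []
    expected = start_txid
    for first, last in segments:
        if last < expected:
            continue  # already covered by the image or an earlier segment
        if first > expected:
            gaps.append((expected, first))  # txids expected..first-1 are missing
        expected = last + 1
    return gaps


# Hypothetical listing of a restored name directory (names built with 19-digit padding).
names = [
    f"fsimage_{71000:019d}",
    f"fsimage_{71000:019d}.md5",
    f"edits_{71001:019d}-{71452:019d}",
    f"edits_{71466:019d}-{71599:019d}",
]
image_txid = newest_fsimage_txid(names)
print(image_txid)                             # 71000
print(find_edit_gaps(names, image_txid + 1))  # [(71453, 71466)]
```

Replay starts at `image_txid + 1` because the txid in an fsimage name is the last transaction already contained in that image.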
Alternative approach: restore all the edit logs and fsimages into both the HDFS and QJM directories and start the NN. This does not work either.
Both NNs are down and I cannot bring them up. I do not want to format HDFS, because that would change the cluster ID and the backup would become unusable.
Exceptions
- There appears to be a gap in the edit log. We expected txid 71453, but got txid 71466
- Client trying to move committed txid backward from 71599 to 71453
- recoverUnfinalizedSegments failed for required journal. Decided to synchronize log to startTxId: 71453, but logger 10.204.64.26:8485 had seen txid 71599 committed
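The second and third exceptions come from the JournalNode side: a JournalNode refuses to let a writer lower the committed txid it has already recorded (here 71599, while the NN sends 71453). The sketch below is a simplified model of that rule, not Hadoop's actual code; the function name and the second JournalNode address are hypothetical:

```python
from typing import Dict, List


def rejecting_journal_nodes(jn_committed: Dict[str, int], client_committed: int) -> List[str]:
    """Return the JournalNodes that would reject a writer whose committed txid
    is lower than the committed txid they have already recorded.

    Simplified model: a JournalNode only accepts a committed txid that is
    greater than or equal to the one it already stores.
    """
    return sorted(addr for addr, txid in jn_committed.items() if client_committed < txid)


# 10.204.64.26:8485 and 71599 come from the error above; the second entry is hypothetical.
jn_committed = {"10.204.64.26:8485": 71599, "jn2.example:8485": 71452}
print(rejecting_journal_nodes(jn_committed, 71453))  # ['10.204.64.26:8485']
```

Any JournalNode in the result has recorded commits beyond what the restored NN metadata contains, which suggests the NN and JournalNode backups do not describe the same point in time.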