How robust is a Reflect backup file? Can it be improved?


This post reproduces an exchange from the forum discussing the robustness of Macrium image files, lightly edited for clarity.

Original post at https://forum.macrium.com/Topic24929.aspx

Are Reflect backup files robust? Why isn’t error correction used?

Macrium backup files have very effective error detection but don’t contain any error correction capability.
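Reflect's on-disk format is proprietary, so purely as a general illustration, here is a minimal Python sketch of block-level error detection: each data block carries its own CRC-32, so corruption can be reliably detected, but nothing in the file can repair it.

```python
import zlib

BLOCK_SIZE = 4096  # illustrative unit; not Reflect's actual block size

def write_blocks(data: bytes) -> list:
    """Split data into blocks, storing a CRC-32 alongside each one."""
    return [(zlib.crc32(data[i:i + BLOCK_SIZE]), data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]

def verify_blocks(blocks) -> list:
    """Return the indices of blocks whose stored CRC no longer matches."""
    return [i for i, (crc, block) in enumerate(blocks)
            if zlib.crc32(block) != crc]

stored = write_blocks(b"x" * 10_000)
crc, block = stored[1]
stored[1] = (crc, b"y" + block[1:])  # corrupt one byte behind the checksum
print(verify_blocks(stored))         # -> [1]: detected, but not repairable
```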

This is a pragmatic design decision, both for Macrium backup files and for most file systems (NTFS neither corrects nor detects sector errors). The most effective place to detect and correct errors is as close as possible to the underlying storage medium. Surface errors tend to occur in bursts, and to prevent a burst from exceeding the maximum number of bit errors the error correction scheme can resolve, each sector's data is interleaved with that of other sectors. This interleaving can only be achieved effectively by the disk firmware, because the physical locality of data is abstracted away by the firmware. Communication protocols such as ADSL and DVB also use interleaving to overcome error bursts.
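As a toy sketch of why interleaving works, the example below weaves several codewords together on the medium, so a burst of consecutive errors lands on at most one symbol per codeword, which a modest per-codeword correction capability could then repair.

```python
DEPTH = 8          # interleave depth: number of codewords woven together
CODEWORD_LEN = 16  # symbols per codeword

# Label every symbol with (codeword, position) so we can trace where it goes.
codewords = [[(cw, i) for i in range(CODEWORD_LEN)] for cw in range(DEPTH)]

# Interleave column by column: symbol i of every codeword is written to
# the medium before symbol i + 1 of any codeword.
stream = [codewords[cw][i] for i in range(CODEWORD_LEN) for cw in range(DEPTH)]

# A surface defect wipes out DEPTH consecutive symbols on the medium...
burst = range(40, 40 + DEPTH)

# ...but after de-interleaving, each codeword has lost at most one symbol,
# which a code correcting a single error per codeword can repair.
errors_per_codeword = [0] * DEPTH
for pos in burst:
    cw, _ = stream[pos]
    errors_per_codeword[cw] += 1
print(errors_per_codeword)  # -> [1, 1, 1, 1, 1, 1, 1, 1]
```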

We have carried out a design study on using Reed-Solomon (RS) coding to increase the robustness of Macrium files. However, it turns out that disk firmware error correction is so effective that, once it fails for a sector, the disk is typically close to general failure, or a significant proportion of it becomes unreadable. In other words, when a disk starts to fail, there is a very significant probability that the whole backup file, or a large proportion of it, will be lost. Further, as previously noted, the lack of storage locality outside the device severely reduces the effectiveness of any OS- or application-level error correction. In summary, adding error correction would make backups and restores more CPU intensive and the files larger, while gaining only an illusory improvement in robustness.
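The details of that study aren't public, and Reed-Solomon itself is more involved, but the failure mode is easy to demonstrate with the simplest erasure code instead: one XOR parity sector per group (the same principle as RAID 5). It rebuilds any one lost sector perfectly, and is defeated the moment a correlated failure removes two.

```python
import functools
import operator

SECTOR = 512

def xor_parity(sectors: list) -> bytes:
    """XOR of all sectors; with it, any ONE missing sector can be rebuilt."""
    return bytes(functools.reduce(operator.xor, col) for col in zip(*sectors))

group = [bytes([i]) * SECTOR for i in range(8)]
parity = xor_parity(group)

# One isolated unreadable sector: recoverable from parity plus survivors.
survivors = group[:3] + group[4:]
rebuilt = xor_parity(survivors + [parity])
assert rebuilt == group[3]

# But failing disks rarely lose one isolated sector. A burst taking out
# group[3] AND group[4] leaves two unknowns and one parity equation:
# no amount of XOR-ing the survivors can separate them, so both are lost.
```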

How many uncorrected bit errors can a backup file experience before its data is inaccessible?

The vast majority of a backup file contains the data from the backed-up system. A bit error in that part of the file will only affect the corresponding part of the restored data. A bit error in the file metadata, however, can have more widespread consequences.

Would disabling compression at backup time reduce the risk?

No; compression may actually reduce the risk: by reducing the number of sectors used to store the file, you reduce the chance that any one of those sectors fails. The file is segmented so that a bit error affects only a single unit of data, not the whole file. And although compression would increase the impact of a single bit error, disk firmware error correction means you never see a single bit error; either the errors are corrected or the whole sector (or sector group) becomes unreadable.
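A back-of-the-envelope sketch makes the first point concrete. The per-sector failure probability below is an assumed figure purely for illustration; the point is only that halving the sector count roughly halves the chance of the file being hit at all.

```python
# p is an assumed per-sector failure probability, for illustration only.
p = 1e-9
uncompressed = 200_000_000  # 512-byte sectors in a hypothetical ~100 GB image
compressed = 100_000_000    # the same image at a 2:1 compression ratio

def p_any_failure(n: int) -> float:
    """Probability that at least one of n sectors fails."""
    return 1 - (1 - p) ** n

print(f"{p_any_failure(uncompressed):.3f}")  # ~0.181
print(f"{p_any_failure(compressed):.3f}")    # ~0.095
```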

What about splitting the backup file across multiple devices?

This would result in a real improvement in robustness against data loss. However, it also represents an inflexible and limited re-invention of RAID. Using a fully fledged RAID implementation (including software solutions such as Storage Spaces) would be a more effective way forward.

TL;DR

Implementing error correction in Macrium backup files would be ineffective because the application layer is too distant from the underlying storage medium.

The most effective method to protect your backup files is replication on more than one device or, better, in more than one location. Error detection remains vital: it provides the alert to switch to the alternative backup copy.
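As a minimal sketch of that strategy (the paths and the hash-on-copy workflow below are illustrative, not a Reflect feature), replicate the backup file to two destinations and use a recorded hash to decide which copy is still trustworthy at restore time:

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large backups don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def first_good_copy(replicas, expected: str):
    """Return the first replica whose hash still verifies, else None."""
    for replica in replicas:
        if replica.exists() and sha256_of(replica) == expected:
            return replica
    return None

# Hypothetical layout: one backup file replicated to two destinations.
source = Path("backup.mrimg")               # assumed to exist
replicas = [Path("D:/backups/backup.mrimg"),
            Path("//nas/backups/backup.mrimg")]

expected = sha256_of(source)                # record the hash at backup time
for replica in replicas:
    shutil.copy2(source, replica)

# At restore time, error detection tells us which copy to trust.
good = first_good_copy(replicas, expected)
print(good or "no intact copy - restore cannot proceed safely")
```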
