Zoned storage and ReFS: future-proofing magnetic and flash
Magnetic and flash storage have long dominated the storage landscape.
Both continue to see innovation in capacity, cost, and performance. Though many have predicted the death of magnetic media, it remains the best choice where cost per unit of storage is the key metric.
In this blog, we discuss some of the characteristics of modern magnetic and flash storage, and how maintaining compatibility with traditional hard disks is increasingly becoming a bottleneck to both performance and innovation.
512 byte random access sectors
The sector is the minimum unit of storage on a hard disk. Each sector, or block, has a fixed size, and data is addressed, read, and written as complete sectors.
Within that constraint, any sector on a disk can be read and written individually. These sectors are almost universally 512 bytes. This reflected the real internal structure of all magnetic disks until Advanced Format (4096-byte sectors) was introduced in 2010. Flash media, however, has never used a 512-byte internal structure.
Despite the very different underlying storage structures, the core protocol for reading from and writing to a hard disk has remained the same. The only innovation has been to abstract away the physical position on the disk with a move from CHS (geometric) to LBA (linear) addressing.
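To make the CHS-to-LBA relationship concrete, here is a minimal sketch of the standard conversion arithmetic. The geometry values (16 heads, 63 sectors per track) are illustrative legacy figures, not properties of any particular drive:

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    # LBA numbers sectors linearly from 0; CHS locates them geometrically.
    # CHS sector numbering is 1-based, hence the "sector - 1".
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

# Illustrative legacy geometry: 16 heads, 63 sectors per track.
print(chs_to_lba(0, 0, 1, 16, 63))   # first sector on the disk -> LBA 0
print(chs_to_lba(2, 3, 10, 16, 63))  # (2*16 + 3)*63 + 9 -> LBA 2214
```

Modern drives report a purely logical geometry, so the linear LBA is the only address the host actually needs.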
The firmware of a modern disk contains sophisticated algorithms to map its internal structure onto this same 512-byte sector protocol, maintaining compatibility with filesystems, boot firmware, and operating systems while keeping write amplification to a minimum [3].
Internal magnetic media structure
As previously noted, the first break from an internal 512-byte structure was a move to larger internal sectors (typically 4 KB) to maintain the efficiency of error correction. More recently, zones of shingled tracks (typically 256 MB) have been introduced on some larger-capacity SMR disks to significantly increase storage density.
Internal flash media structure
Flash media is structured as pages. These vary in size but are always much larger than 512 bytes. Further, a page cannot simply be overwritten: the erase block containing it must first be erased. Erasing takes time, and each block can only withstand a limited number of erase cycles.
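The arithmetic below sketches why this matters for small writes. In the worst case, updating a 512-byte logical sector in place means relocating or rewriting the entire erase block that contains it. The page and block sizes are illustrative assumptions; real devices vary widely:

```python
# Assumed, illustrative flash geometry (real values vary by device).
PAGE_SIZE = 16 * 1024            # bytes per page
PAGES_PER_BLOCK = 256            # pages sharing one erase block
ERASE_BLOCK_SIZE = PAGE_SIZE * PAGES_PER_BLOCK   # 4 MiB here

def worst_case_amplification(logical_write_bytes):
    # Worst case: the whole erase block is read, erased, and rewritten
    # to update just this many logical bytes.
    return ERASE_BLOCK_SIZE / logical_write_bytes

print(worst_case_amplification(512))  # -> 8192.0
```

Firmware mitigates this with remapping and garbage collection, but the mismatch between a 512-byte interface and a multi-megabyte erase unit is what the firmware is constantly working around.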
Aside from cramming more cells onto the silicon, there has been innovation in the number of bits that can be stored in each cell. Unfortunately, this comes at a cost: write durability falls as the number of bits per cell increases. Single-bit SLC cells have the lowest density but the highest durability, whereas 4-bit QLC cells have the reverse characteristics. SLC is typically used in enterprise devices, while higher-density cells are more commonly found in consumer devices.
Traditionally, a device contains only a single cell type. This precludes the use of QLC cells wherever frequent updates are expected, even if only to a small region of the disk.
Zoned storage protocol
As both magnetic and flash storage continue to drive towards larger, faster, and cheaper storage, transparently emulating 512-byte random access sectors is becoming increasingly inefficient. We wrote about a specific case of this here [1].
One cost of 512-byte emulation is that I/O operations are delivered to the firmware without context, giving the firmware limited scope to optimise its internal operations. The solution is to move the management of the underlying storage blocks up the stack to the filesystem, or even into the application for data-intensive cases such as databases. By more closely reflecting the real structure of the disk in the interface, the filesystem can optimise its write patterns around those most efficient for that device.
For magnetic storage, command set extensions have already been standardised (ZBC for SCSI and ZAC for ATA). These allow the zone layout and status to be queried and a zone to be opened and closed. Data can only be written sequentially within a zone, and only while it is open. Host-managed SMR devices implement these commands and have found some penetration in the enterprise and cloud storage space.
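The zone model described above can be sketched in a few lines. This is a hypothetical illustration of the semantics, not a real ZBC/ZAC API: each zone tracks a write pointer, writes are accepted only at that pointer while the zone is open, and a reset rewinds the pointer:

```python
class Zone:
    """Toy model of a host-managed zone: sequential writes only."""

    def __init__(self, start_lba, length_sectors):
        self.start = start_lba
        self.length = length_sectors
        self.write_pointer = start_lba   # next LBA that may be written
        self.is_open = False

    def open_zone(self):
        self.is_open = True

    def write(self, lba, n_sectors):
        if not self.is_open:
            raise IOError("zone is not open")
        if lba != self.write_pointer:
            raise IOError("writes must land exactly at the write pointer")
        if self.write_pointer + n_sectors > self.start + self.length:
            raise IOError("write exceeds zone capacity")
        self.write_pointer += n_sectors

    def reset(self):
        # Resetting a zone rewinds the write pointer, discarding its data.
        self.write_pointer = self.start
        self.is_open = False

# Usage: a zone of 524288 512-byte sectors (256 MB) starting at LBA 0.
z = Zone(start_lba=0, length_sectors=524288)
z.open_zone()
z.write(0, 8)   # accepted: at the write pointer
z.write(8, 8)   # accepted: still sequential
```

A random write anywhere else in the zone is rejected; it is the filesystem's job, not the firmware's, to arrange data so this constraint is cheap to satisfy.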
A similar command set, Zoned Namespaces (ZNS), exists for NVMe devices. An additional advantage for flash storage is that it allows the device to offer heterogeneous storage: for example, an allocation of SLC pages for rapidly changing data and QLC pages for data that rarely changes.
Filesystem support
On the Windows platform, ReFS is the only filesystem that is inherently capable of supporting host-managed, zone-structured devices.
Introduced in 2012 with Windows Server 2012, and later available in Windows 8.1, it headlined with automatic data integrity management, data scrubbing, and integrated RAID features. After failing to gain traction in the consumer space, the ability to create ReFS volumes was removed from the consumer-targeted editions of Windows 10 with the 2017 Fall Creators Update.
For those who want some deeper insight, a presentation by Microsoft's Lee Prewitt is definitely worth a watch [2].
At Macrium, we predict that the joint promise of increased performance and cheaper storage offered by ZNS devices will finally drive widespread adoption of both ReFS and ZNS. We also expect host-managed SMR devices to become more popular, so expect more widespread support for them too, particularly in NAS units where cost per unit of storage is a key metric.
We are committed to future-proofing Macrium Reflect by adding ReFS support to the next major release.
[1] https://blog.macrium.com/macriums-view-on-the-recent-smr-disk-controversy-778ae03b19f8