ZFS Caching

Starting to research building out a ZFS NAS storage server. Below you will find my notes on ZFS caching. This is a work in progress that will be continuously updated. If you have any feedback or questions of your own, please reply in the thread.

Writes

  • ZFS caches writes in RAM first, then flushes them out to disk as transaction groups (TXGs) at set intervals (see the toy sketch below)
    – ZFS is a transactional file system
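
A minimal, purely illustrative Python sketch of that batching idea (the class, the 5-second interval, and the file names are my own assumptions for the example, not ZFS internals):

```python
import time

TXG_INTERVAL_SECONDS = 5  # assumed flush interval, for illustration only

class ToyTxgWriter:
    """Toy model of TXG-style batching: buffer writes in memory, flush them as one group."""

    def __init__(self, path: str):
        self.path = path
        self.buffer = []                        # pending writes living only in RAM
        self.last_flush = time.monotonic()

    def write(self, data: bytes) -> None:
        self.buffer.append(data)                # returns as soon as the data is buffered
        if time.monotonic() - self.last_flush >= TXG_INTERVAL_SECONDS:
            self.flush_txg()

    def flush_txg(self) -> None:
        # The whole batch reaches the data file together, like one transaction group.
        with open(self.path, "ab") as f:
            for data in self.buffer:
                f.write(data)
        self.buffer.clear()
        self.last_flush = time.monotonic()

w = ToyTxgWriter("toy_pool.dat")
for i in range(10):
    w.write(f"record {i}\n".encode())
w.flush_txg()  # anything still buffered at power loss would simply be gone
```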

Asynchronous vs. Synchronous Writes

Asynchronous Writes:

  • Immediately written to RAM cache and reported as completed to the client; only gets written to disk later
  • In the event of power loss, everything that would’ve been included in that TXG is lost

Synchronous Writes:

  • Also written to RAM cache first, but the server does not confirm the write as complete until it has been logged in the ZFS Intent Log (ZIL)
  • By default, the ZIL lives on the storage pool itself, which adds latency; this is especially bad for small random writes, which are the most common workload on a server (the sketch below shows the difference in acknowledgement behavior)
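
As an OS-level analogy (this is not ZFS code; the function names and file names are mine), the difference is whether the application waits for the data to reach stable storage before treating the write as done:

```python
import os

def async_style_write(path: str, data: bytes) -> None:
    """Returns once the data is in a memory buffer; it may hit disk much later."""
    with open(path, "ab") as f:
        f.write(data)             # buffered; a power loss here can lose the data

def sync_style_write(path: str, data: bytes) -> None:
    """Does not return until the OS reports the data is on stable storage."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()                 # push the application buffer to the OS
        os.fsync(f.fileno())      # block until the OS flushes to disk (the slow part)

async_style_write("async.bin", b"fast, but unsafe across a power loss\n")
sync_style_write("sync.bin", b"slower, but durable once acknowledged\n")
```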

SLOG to the Rescue

  • SLOG stands for Separate Log (a dedicated log device)
  • Basically the ZIL on an SSD
  • Primary purpose is data integrity in the event of a power failure
  • Drastically improves small, random IO, but can also improve sequential IO
  • SLOG isn’t a true cache; nothing is directly accessed from it unless in the event of a power failure; it really is a log
    – The reason it improves performance so much is that it allows synchronous write requests to “clear” earlier (a toy sketch of this “log now, flush later” behavior follows below)
  • Prioritize performance at a queue depth of one (single queue depth) when selecting SLOG hardware
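
A compact Python sketch of that behavior, combining the two toy examples above (again, all names and structure are my own assumptions; the “log device” is just a second file):

```python
import os

class ToySlogWriter:
    """Toy model: sync writes are acknowledged once appended to a fast log file;
    the bulk data file is still written later in batches, like TXG flushes."""

    def __init__(self, data_path: str, slog_path: str):
        self.data_path = data_path
        self.slog_path = slog_path
        self.pending = []                           # writes held in RAM until the next flush

    def sync_write(self, data: bytes) -> None:
        with open(self.slog_path, "ab") as log:     # a fast SSD/NVMe device in real life
            log.write(data)
            log.flush()
            os.fsync(log.fileno())                  # durable, so it is safe to acknowledge now
        self.pending.append(data)

    def flush_txg(self) -> None:
        with open(self.data_path, "ab") as f:       # the slow bulk storage
            for data in self.pending:
                f.write(data)
            f.flush()
            os.fsync(f.fileno())
        self.pending.clear()
        open(self.slog_path, "wb").close()          # log entries only matter until the flush succeeds

w = ToySlogWriter("pool.dat", "slog.dat")
w.sync_write(b"acknowledged as soon as it is on the log device\n")
w.flush_txg()
```

The log is only ever read back after a crash, which matches the note above that the SLOG is a log rather than a cache.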

Reads

  • ZFS also reads from RAM cache
  • Adaptive Replacement Cache (ARC) = read cache

ARC

  • Stores the most recently used and most frequently used data in RAM (a simplified sketch follows after this list)
  • Shared across all pools
  • Data also exists on disk
  • The more RAM the better
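
A heavily simplified Python sketch of that “recency plus frequency” idea (the real ARC algorithm also keeps ghost lists and adapts the balance between the two sides; this toy version, with names of my own choosing, only shows the two-list structure):

```python
from collections import OrderedDict

class ToyArc:
    """Toy two-list cache: one list for recently used data, one for frequently used data."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.mru = OrderedDict()                 # blocks seen once recently
        self.mfu = OrderedDict()                 # blocks seen more than once

    def get(self, key):
        if key in self.mfu:                      # frequent hit: refresh its position
            self.mfu.move_to_end(key)
            return self.mfu[key]
        if key in self.mru:                      # second touch: promote to the frequent list
            value = self.mru.pop(key)
            self.mfu[key] = value
            return value
        return None                              # miss: the caller would read from disk

    def put(self, key, value):
        if self.get(key) is not None:            # already cached; get() refreshed/promoted it
            return
        self.mru[key] = value                    # new data enters on the recency side
        self._evict_if_needed()

    def _evict_if_needed(self):
        while len(self.mru) + len(self.mfu) > self.capacity:
            victim = self.mru if self.mru else self.mfu
            victim.popitem(last=False)           # evict the oldest entry

cache = ToyArc(capacity=4)
for block in ["a", "b", "a", "c", "d", "e"]:     # "a" gets promoted; "b" is evicted first
    if cache.get(block) is None:
        cache.put(block, f"data for {block}")
```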

L2ARC

  • Level 2 ARC
  • Like ARC, but on an SSD instead of (much faster) RAM
    – Still better than spinning disks
  • Unlike ARC, assigned to specific zpools

Resources:

ZFS Caching - 45Drives

Cache Specifications/Recommendations:

An L2ARC shouldn’t be bigger than about 5x your ARC size. Your ARC size cannot exceed 7/8 of your system RAM. So for a system with 32GB of RAM, you shouldn’t go any bigger than 120GB. This is why maximizing system RAM is the first priority! (A quick check of this arithmetic follows after the source link.)

Source: Slideshow explaining VDev, zpool, ZIL and L2ARC for noobs! - TrueNAS
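
Applying that rule of thumb directly (plain arithmetic only; the 7/8 and 5x figures come from the quote above, and the helper name is mine):

```python
def max_l2arc_gb(ram_gb: float, arc_fraction: float = 7 / 8, l2arc_multiple: float = 5.0) -> float:
    """Rule of thumb from the quote above: ARC <= 7/8 of RAM, L2ARC <= 5x ARC."""
    return ram_gb * arc_fraction * l2arc_multiple

print(max_l2arc_gb(32))  # 140.0 GB, a bit above the quoted 120 GB, so the exact rule may differ
print(max_l2arc_gb(64))  # 280.0 GB, roughly the ~240 GB figure questioned below
```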

Questions:

  • Where is this number coming from?
  • If I’m limited to such a dinky amount of L2ARC (~240 GB on a 64 GB RAM system), why even bother? I already have a 1TB NVMe drive; are you saying I wouldn’t even be able to take advantage of all the solid-state storage I currently have?

L2ARC Scoping

The drawback to using an L2ARC is that it consumes memory (RAM), i.e. ARC space, to map where the cached data is stored in the L2ARC. Exactly how much? Here’s a formula:

(L2ARC size in kilobytes) / (typical recordsize or volblocksize, in kilobytes) * 70 bytes = ARC header size in RAM

Source: L2ARC Scoping – How much ARC does L2ARC eat on average? - reddit

This formula shows how strongly recordsize and volblocksize determine the ARC overhead of adding an L2ARC; a worked example follows below.
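
A quick worked example of that formula in Python (the 1 TiB L2ARC size and the two block sizes are just sample values I picked for illustration; 128 KiB is the default dataset recordsize, and 16 KiB is a common volblocksize for zvols):

```python
def l2arc_header_overhead_bytes(l2arc_size_kib: float, block_size_kib: float,
                                header_bytes: int = 70) -> float:
    """(L2ARC size in KiB) / (recordsize or volblocksize in KiB) * 70 bytes per header."""
    return l2arc_size_kib / block_size_kib * header_bytes

L2ARC_SIZE_KIB = 1 * 1024 * 1024 * 1024          # 1 TiB of L2ARC, expressed in KiB

for block_kib in (128, 16):
    overhead_gib = l2arc_header_overhead_bytes(L2ARC_SIZE_KIB, block_kib) / 1024**3
    print(f"{block_kib:>3} KiB blocks -> ~{overhead_gib:.2f} GiB of ARC consumed by headers")
```

With 128 KiB records, the headers for 1 TiB of L2ARC cost roughly half a GiB of ARC, but with 16 KiB blocks the same L2ARC costs over 4 GiB, which is why small-block workloads are the ones to watch.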

Questions:

  • This adds another layer of complexity: how do I determine the “best” recordsize and volblocksize?

Additional Questions/Thoughts:

  • As I look more and more into this, I’m wondering if ZFS is even worth it for small-scale storage servers. ZFS seems to have a huge fixed cost that forms a real barrier to entry. For example, my current situation:
    – I currently have a 1TB NVMe drive providing local VM image storage on my Proxmox server. I’m not fully utilizing this storage yet, but to achieve similar performance, my plan was to use this 1TB of NVMe storage as L2ARC and SLOG, and then use spinning rust (HDDs) to provide the bulk storage in the vdevs.
    – To get that same 1TB of storage as L2ARC cache, using the rules of thumb above, I would need roughly 256 GB of RAM! (A quick check of that figure follows below.)
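
Inverting the same 5x-ARC / 7/8-of-RAM rule of thumb quoted earlier (plain arithmetic; the helper name is mine):

```python
def min_ram_gb_for_l2arc(l2arc_gb: float, arc_fraction: float = 7 / 8,
                         l2arc_multiple: float = 5.0) -> float:
    """Smallest RAM that allows a given L2ARC size: RAM >= L2ARC / (5 * 7/8)."""
    return l2arc_gb / (l2arc_multiple * arc_fraction)

print(min_ram_gb_for_l2arc(1000))  # ~228.6 GB, i.e. the next practical RAM size is 256 GB
```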

Resources

  • Best Practices for OpenZFS L2ARC in the Era of NVMe - Ryan McKenzie - iXsystems
    – Fantastic resource. Can be used to attempt to answer the question, “In a world with NVMe drives that are even faster than previous-gen SATA SSDs, should we be willing to take more of a hit in available ARC space in exchange for the much larger capacity made available to us by NVMe drives in the L2ARC?”
    – Also gives an excellent step-by-step overview of how reads and writes work with the ARC