In pursuit of building (or rather, having my local IT guy build me) my first NAS, I’ve sunk my newbie brain as deep as it can go into learning how best to use it once my builder finishes the hardware and OS installs and walks me through the GUI.
Of course, beyond basic storage capacity and drive redundancy to prevent file loss, a NAS (or any server) and its file system (ZFS or Btrfs) is only as useful as its ability to prevent data corruption. Save for the crazy maths (and terms like “pool,” which seems to have multiple meanings in the data storage biz), these articles were helpful for learning about hash functions and the tables of hash codes (“hashes”) they apparently create for each document, photo, audio, or video file: https://en.wikipedia.org/wiki/Hash_function https://en.wikipedia.org/wiki/Checksum
But please help with these questions:
Is a hash code automatically created for every user file (e.g., document, photo, audio, video) the first time it gets written to the NAS? Or do you have to enable some kind of app or NAS utility to generate and assign a hash code to every one of your files?
And where are those codes stored? Inside the file’s own container? Or are all user file hash codes stored someplace else, in a “hash table” and/or on a partition of the RAID drive array?
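To make sure I’m asking the right thing, here is a toy Python sketch of the “do it yourself” version I’m imagining: hash every file on a share and keep the results in a separate “hash table.” All paths and the JSON sidecar file are made up by me for illustration; my reading is that ZFS and Btrfs don’t actually work per file like this, but checksum at the block level and keep the checksums in their own metadata.

```python
# Hypothetical DIY approach: walk a share, hash every file with SHA-256,
# and keep the results in a sidecar JSON "hash table". ZFS/Btrfs instead
# checksum at the block level and store checksums in filesystem metadata,
# not inside the files themselves.
import hashlib
import json
from pathlib import Path

def hash_file(path: Path) -> str:
    """SHA-256 hex digest of one file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(1024 * 1024):
            h.update(chunk)
    return h.hexdigest()

def build_hash_table(share: Path, table_path: Path) -> None:
    """Record a hash for every file under the share (paths are invented)."""
    table = {str(p): hash_file(p) for p in share.rglob("*") if p.is_file()}
    table_path.write_text(json.dumps(table, indent=2))

# build_hash_table(Path("/volume1/media"), Path("/volume1/hashes.json"))
```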
Are these hash codes used by the ZFS and Btrfs file systems for routine data scrubbing?
https://blog.synology.com/how-data-scrubbing-protects-against-data-corruption
https://www.qnap.com/en/how-to/tuto...a-corruption-by-using-data-scrubbing-schedule
https://par.nsf.gov/servlets/purl/10100012
Then, as mentioned in the links above, are these hash codes also typically used for routine RAID scrubbing after data scrubbing?
But for both data and RAID scrubbing, is data integrity ensured by comparing each file’s original hash code (the first one ever created, stored wherever) against the hash code computed from the file as it currently stands? And if the comparison shows the codes differ, then one or more of the file’s bits have flipped, so the system knows the file is corrupt?
If yes, will it then flag me and ask whether I want the system to attempt a repair?
If I say yes, it will then try to overwrite the corrupt file with a good copy reconstructed from the redundant data (e.g., a mirrored copy or a RAID 5 parity rebuild)?
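Here is how I picture that check-and-repair loop, as a rough Python sketch. This is purely my mental model: the paths, the stored hash, and the mirror file are hypothetical, and as I understand it ZFS/Btrfs actually do this per block using their own metadata and redundancy, automatically and without prompting.

```python
# Rough mental model of one file's scrub pass. In reality ZFS/Btrfs verify
# each *block* against a checksum kept in filesystem metadata and repair it
# from parity/mirror data automatically; everything here is hypothetical.
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(1024 * 1024):
            h.update(chunk)
    return h.hexdigest()

def scrub_one(primary: Path, mirror: Path, stored_hash: str) -> str:
    """Compare the current hash to the stored one; repair from mirror if bad."""
    if sha256_of(primary) == stored_hash:
        return "clean"
    # Primary copy is corrupt; verify the redundant copy before trusting it.
    if sha256_of(mirror) == stored_hash:
        shutil.copy2(mirror, primary)  # overwrite the bad copy with the good one
        return "repaired from mirror"
    return "both copies corrupt -- restore from backup"
```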
CAUTION: As RAID scrubbing puts mechanical stress and heat on HDDs, the rule of thumb seems to be to schedule it about once a month, and only when the drives are otherwise idle, so no user-triggered read/write errors can occur. https://arstechnica.com/civis/threa...bad-for-disks-if-done-too-frequently.1413781/
Beyond scrubbing, what else can I and/or ZFS/Btrfs do to combat bit rot?
And to minimize the risk of crashes:
Replace the RAIDed HDD array every 3 (consumer grade) to 5 (enterprise grade) years.
Do not install any software upgrade for the NAS until it’s been around long enough for the NAS brand and the user community forum to declare it bug free.
What else can I do to minimize the risk of crashes?
Finally, when backing up from my (main) NAS to a second (ideally identical?) NAS, Kunzite says here: “…and I'm check summing my backups...”
https://forum.qnap.com/viewtopic.php?t=168535
But hash functions are never perfect: while rare, hash “collisions” are inevitable. https://en.wikipedia.org/wiki/Hash_collision So just as those hash algorithms are used for data and RAID scrubbing, they are evidently also used for checksumming to ensure that data transfers from the NAS to a backup device happen without file corruption.
Apparently, CRC-32 is among the least collision-resistant hash algorithms. https://en.wikipedia.org/wiki/Hash_collision#CRC-32
Thus, for backups from the main NAS to the backup NAS, how much better is the SHA-256 hash function (algorithm) than MD5 at preventing collisions and verifying the data integrity of user files via checksumming, given that it uses twice the number of bits?
But if it’s not much more advantageous even for potentially large audio files ( https://www.hdtracks.com/ ), would SHA-256 be a lot more advantageous than MD5 for checksumming backups of DVD movie rips saved as uncompressed MKV and/or ISO files, since video files are so much bigger than audio files?
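To put numbers on the collision question, here is my own back-of-envelope Python calculation using the standard birthday-bound approximation p ≈ 1 - e^(-n(n-1) / 2^(b+1)) for n files and a b-bit digest. If I have this right, the odds depend on how many files you hash, not on how big each file is.

```python
# Back-of-envelope birthday-bound odds that any two of n files share a digest.
# Collision odds depend on the number of files hashed, not on file size, so
# a 50 GB MKV is no likelier to collide than a 3 MB FLAC.
import math

def collision_probability(n_files: int, bits: int) -> float:
    """Approximate P(at least one collision) among n_files random b-bit hashes."""
    exponent = n_files * (n_files - 1) / 2 / 2**bits
    return -math.expm1(-exponent)  # = 1 - exp(-exponent), accurate when tiny

for name, bits in [("CRC-32", 32), ("MD5", 128), ("SHA-256", 256)]:
    p = collision_probability(1_000_000, bits)
    print(f"{name:>7} ({bits:3d} bits): ~{p:.3e} for 1,000,000 files")
```

If I’ve done that right, CRC-32 is near-certain to collide across a million files, while even MD5’s odds are astronomically small, so the case for SHA-256 over MD5 seems to be about deliberate tampering (MD5 is cryptographically broken) rather than file size or accidental corruption. But corrections welcome.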
And what would be a recommended checksum calculator app? https://www.lifewire.com/what-does-checksum-mean-2625825#toc-checksum-calculators
But if the app reports a checksum mismatch between the file on my main NAS and the copy on my backup NAS, how do I then repair the corrupt file?
Again, by using the file’s original hash code (stored someplace) created the first time it was ever written to the NAS?
If yes, would that app then prompt me to choose whether to have the system repair the file?
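And here is the kind of checksumming I think Kunzite means, sketched in Python: hash each file on the main NAS and its counterpart on the backup NAS, and list the mismatches for re-copying. The directory paths are invented; I believe rsync’s -c (--checksum) option does this same full-file comparison internally.

```python
# Sketch of checksumming a backup: hash each file on the main NAS and its
# counterpart on the backup NAS, and report mismatches. Paths are invented.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(1024 * 1024):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(main: Path, backup: Path) -> list[Path]:
    """Return files whose backup copy is missing or doesn't match the main copy."""
    bad = []
    for src in main.rglob("*"):
        if not src.is_file():
            continue
        dst = backup / src.relative_to(main)
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            bad.append(src)
    return bad

# for f in verify_backup(Path("/mnt/main"), Path("/mnt/backup")):
#     print("re-copy:", f)
```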