Disclaimer/preface: I don't know all that much about MacOS.
I was investigating why my Time Machine incremental backups (and later, borgbackup'ed tmutil localsnapshots) were larger than I expected: around 1-7 GB worth of changes per day.
I noticed this after I had set up Time Machine backups to a TrueNAS CORE machine, along with regular snapshots of the Time Machine ZFS dataset:
# zfs list -t snapshot -o space -r /mnt/vault/delorean/philsnow | head
NAME                                           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
vault/delorean/philsnow@auto-2024-05-04_04-00      -  6.44G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-05_04-00      -  1.55G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-06_04-00      -  2.09G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-07_04-00      -  3.48G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-08_04-00      -  3.05G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-09_04-00      -  2.80G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-10_04-00      -  2.80G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-11_04-00      -  3.12G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-12_04-00      -  1.86G         -       -              -          -
This is showing the space usage of the daily 4am snapshots that were happening on the TrueNAS machine. During at least one of those days, the only machine that was backing up to this Time Machine share was pretty much quiescent, just sitting there locked... so why were there so many bytes worth of changes?
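The human-readable sizes above are rounded. For exact numbers, zfs list has a scripting mode: -H drops the header and separates columns with tabs, and -p prints exact byte counts. Here's a small sketch that summarizes the USED column (the function name snapshot_used_stats is made up for illustration). Keep in mind that a snapshot's USED only counts space unique to that snapshot, so these are per-day churn figures, not a total of everything the snapshots hold.

```shell
# Summarize per-snapshot USED bytes from `zfs list -Hp -o name,used` output on stdin.
# -H: no header, tab-separated columns; -p: exact byte counts.
# snapshot_used_stats is a hypothetical helper name.
snapshot_used_stats() {
  awk -F'\t' '
    NF { n++; sum += $2; if ($2 > max) max = $2 }
    END {
      if (n == 0) { print "no snapshots"; exit }
      # %.0f rather than %d so multi-GB values never overflow an int
      printf "%d snapshots, mean %.0f bytes, max %.0f bytes\n", n, sum / n, max
    }'
}
```

Usage: zfs list -Hp -t snapshot -o name,used -r vault/delorean/philsnow | snapshot_used_stats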
It took me several hours just to figure out how to mount local snapshots (did I mention I don't really know MacOS?). There are a lot of articles about how to do this, but they were mostly written between around 2017 and 2021: after APFS was introduced, but before the whole Signed System Volume (SSV) security model arrived with Big Sur. (I had also guessed that System Integrity Protection (SIP) was newer, but SIP actually dates back to El Capitan in 2015, before APFS.)
tl;dr:
- Whatever terminal app you're using needs to have Full Disk Access enabled.
- The blog posts from before SSV said to run mount_apfs -s <snapshot_name> / /tmp/snapshot, but that fails with either "resource busy", "mount: /tmp/snapshot failed with 77", or "[...] failed with 67". After much guesswork, I found that the / argument has to be /System/Volumes/Data (I think because <snapshot_name> is a snapshot of the APFS volume mounted at /System/Volumes/Data, whereas / is a different APFS volume, and <snapshot_name> is not a snapshot of that volume).
So, as of 2024 and Sonoma, for the internet fossil record, the incantation seems to be:
# mountpoint=/tmp/snapshot
# mkdir -p "$mountpoint"
# tmutil localsnapshot
# snap=$(tmutil listlocalsnapshots / | grep TimeMachine | sort | tail -1)
# mount_apfs -s "$snap" -o ro /System/Volumes/Data "$mountpoint"
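Mounting two consecutive snapshots side by side is just that incantation done twice. A sketch, assuming (as the grep | sort | tail above already does) that Time Machine snapshot names sort chronologically; latest_two and mount_latest_two are hypothetical helper names:

```shell
# Pick the two newest Time Machine snapshots from a
# `tmutil listlocalsnapshots /` listing on stdin, oldest first.
latest_two() {
  grep TimeMachine | sort | tail -n 2
}

# Mount them read-only at /tmp/snapshot_a (older) and /tmp/snapshot_b (newer).
# Needs root and a terminal app with Full Disk Access.
mount_latest_two() {
  local snaps a b
  snaps=$(tmutil listlocalsnapshots / | latest_two)   # list once, so a and b come from the same listing
  a=$(printf '%s\n' "$snaps" | sed -n 1p)
  b=$(printf '%s\n' "$snaps" | sed -n 2p)
  if [ -z "$a" ] || [ -z "$b" ]; then
    echo "need at least two Time Machine local snapshots" >&2
    return 1
  fi
  mkdir -p /tmp/snapshot_a /tmp/snapshot_b
  mount_apfs -s "$a" -o ro /System/Volumes/Data /tmp/snapshot_a &&
    mount_apfs -s "$b" -o ro /System/Volumes/Data /tmp/snapshot_b
}
```

Define these in a root shell (e.g. sudo -s) and run mount_latest_two; umount /tmp/snapshot_a /tmp/snapshot_b cleans up afterwards.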
Hokay, now I can mount a couple of consecutive, "normal Time Machine" snapshots under /tmp/snapshot_{a,b}. So what are the differences between them?
# rsync -HPrl --itemize-changes --dry-run /tmp/snapshot_{a,b}/ 2>&1 | tee /tmp/rsync-diff
This does a recursive dry run of syncing snapshot A onto snapshot B and itemizes the changes: one line per path, prefixed with a short flag string saying what differs (a new file, a different size, a different modification time, that kind of thing).
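Before filtering anything, it can help to see which kinds of change dominate the output. A sketch that counts lines by their flag string (the first whitespace-separated field); tally_itemize is a hypothetical name, and since the flag layout differs between rsync versions, check the --itemize-changes section of man rsync for what each position means on your machine:

```shell
# Count rsync --itemize-changes lines by flag string, most common first.
# Error lines (e.g. "rsync: ...") get counted under their first word too.
tally_itemize() {
  awk 'NF { count[$1]++ } END { for (k in count) print count[k], k }' "$@" |
    sort -rn
}
```

Usage: tally_itemize /tmp/rsync-diff | head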
While looking at the output, one thing jumped out at me: hundreds of .plist files from all over the filesystem showed up as changed between consecutive snapshots. I picked one at random and compared the before and after versions; they seemed to have the same contents, just with the keys in a different order.
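You can run plistutil --sort --print on a .plist file (plistutil comes from libplist, installable via Homebrew; it's not the same tool as macOS's built-in plutil), and --sort makes it print the file's contents with dictionary keys in sorted order. I did that on the two versions of the .plist file I had picked, and the results were identical.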
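The same comparison can be wrapped in a reusable function. One design choice here: diffing two process substitutions also reports "same" if plistutil fails on both files (two empty outputs compare equal), so this sketch captures each printout and reports "error" when canonicalization fails. canon and semantically_same are hypothetical names; canon is a separate function so it can be swapped out.

```shell
# Print a canonical form of a plist: dictionary keys sorted (libplist's plistutil).
canon() { plistutil --sort --print "$1"; }

# Echo "same" if two files match after canonicalization, "different" if not,
# and "error" (exit status 1) if either file can't be canonicalized.
semantically_same() {
  local x y
  x=$(canon "$1") || { echo error; return 1; }
  y=$(canon "$2") || { echo error; return 1; }
  if [ "$x" = "$y" ]; then
    echo same
  else
    echo different
  fi
}
```

Usage: semantically_same /tmp/snapshot_a/path/to/file.plist /tmp/snapshot_b/path/to/file.plist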
Maybe that was a fluke, maybe I happened on an odd case... Let's do science. This takes the list of changed files, filters out the ones where the only apparent change was some timestamp, keeps only the plist files, and then, for each plist file, diffs the canonically ordered printouts from both snapshots, echoing "same" if they are semantically unchanged and "different" otherwise:
$ grep -v -F 'f..T......' /tmp/rsync-diff |   # drop files where only a timestamp changed
  grep -v -F 'd..T......' |                   # same for directories
  grep -v -F 'Operation timed out' |
  grep 'plist$' |
  sed -E 's/^[^ ]+ +//' |                     # strip rsync's itemize flags, keeping only the path
  while IFS= read -r f; do
    echo -n "$f: "
    # diff -q exits 0 if the two sorted printouts match, nonzero otherwise
    if diff -q \
        <(plistutil --sort --print "/tmp/snapshot_a/${f}") \
        <(plistutil --sort --print "/tmp/snapshot_b/${f}") \
        >/dev/null 2>&1; then
      echo "same"
    else
      echo "different"
    fi
  done > /tmp/same-diff
So now /tmp/same-diff has a bunch of lines "<path/to/file.plist>: same" or "<path/to/file.plist>: different". What's the breakdown?
# grep same\$ /tmp/same-diff | wc -l
87
# grep different\$ /tmp/same-diff | wc -l
67
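The two greps can be folded into a single awk pass that also works out each verdict's share. A sketch with a hypothetical tally_verdicts helper, which reads lines of the form "<path>: same" or "<path>: different":

```shell
# Count "same"/"different" verdicts (the last field of each line)
# and print each with its percentage, sorted by verdict name.
tally_verdicts() {
  awk 'NF { n[$NF]++; total++ }
       END { for (v in n) printf "%s %d %.1f%%\n", v, n[v], 100 * n[v] / total }' "$@" |
    sort
}
```

Usage: tally_verdicts /tmp/same-diff. With the counts above it would print "different 67 43.5%" and "same 87 56.5%".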
Over half of them (87 out of 87+67 = 154, about 56%) were semantically the same but bytewise different. Add up all the bytes for the semantically-the-same files and:
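$ awk -F: '/same$/{print $1}' /tmp/same-diff |   # relative paths: run this from inside a mounted snapshot
  while IFS= read -r f; do
    du -sb "$f" 2>/dev/null                     # -b (GNU du): apparent size in bytes
  done |
  awk '{ s += $1 } END { print s }'             # sum in awk: in bash, a while loop at the end of a pipe runs in a subshell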
5898552
That's about 5.6 MiB worth of meaningless changes that get backed up regularly.
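The -b flag is a GNU du option, and BSD-derived du implementations may not support it. A portable sketch that sizes files with POSIX wc -c instead, which, like du -b, counts file length in bytes rather than disk blocks; sum_file_bytes is a hypothetical name, and it reads one path per line on stdin:

```shell
# Sum the sizes in bytes of the files named on stdin, one path per line.
sum_file_bytes() {
  local total=0 f n
  while IFS= read -r f; do
    [ -f "$f" ] || continue     # skip paths that don't exist as regular files
    n=$(wc -c < "$f")           # some wc implementations pad with spaces; $(( )) ignores them
    total=$((total + n))
  done
  echo "$total"
}
```

Usage: awk -F: '/same$/{print "/tmp/snapshot_b/" $1}' /tmp/same-diff | sum_file_bytes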
Is there a knob somewhere in MacOS that tells whatever framework writes these files to spend the extra CPU and always emit plists with their keys in sorted order?