Inside macOS 10.13 High Sierra: APFS benefits end users with space, speed


Comments

  • Reply 41 of 57
    How will APFS work with Time Machine? Hopefully it will be faster, copying over only block-level changes rather than entire files.

    I hope the backup logic will improve too, because as mentioned in another comment, bandwidth to external drives isn't necessarily Thunderbolt 3 to a RAID...

    Moreover, if they can migrate an SSD from HFS+ to APFS, I see no reason why they couldn't migrate any other partition on any other type of drive.
    Fusion Drives included, which are still sold today and will be in the future, as mentioned in another comment.

    The interesting thing is Apple's effort to reduce a user's data footprint. Their entire strategy seems to be to rely on cloud storage, matching, fewer bytes to store videos and pics (H.265), and so on, APFS included. It's as if they either want faster transfers and updates to and from the cloud, and/or want to keep selling devices that don't require ever-increasing amounts of space to those who actively use their iCloud services (with "Optimise Storage" options active, and streaming from either iTunes Match or Apple Music).
    For everybody else, RAID on Thunderbolt 3 :)
  • Reply 42 of 57
    pakitt said:
    [...]
    It's as if they either want faster transfers and updates to and from the cloud, and/or want to keep selling devices that don't require ever-increasing amounts of space to those who actively use their iCloud services (with "Optimise Storage" options active, and streaming from either iTunes Match or Apple Music).
    [...]
    About iCloud, I would really like to see an option between 200GB and 2TB. Why can't they provide tiers in between, such as 500GB or 1TB? Crazy...
  • Reply 43 of 57
    lorin schultz Posts: 2,771 member
    pakitt said:
    For everybody else, RAID on Thunderbolt 3 :)
    Yup. In my world the cloud often isn't a viable option away from home. I'm always either behind a corporate nanny gate or tethered to slow, expensive cell data. Apple's efforts to herd me into the wireless corral are wasted since I can't get there even if I wanted to go.

    I signed up for iTunes Match to make it easier to sync our music library across the six devices in our home. I was taken aback a little when I discovered that it didn't push actual files to the other devices, but rather just placeholders for cloud playback. Obviously the files can be retrieved, but it illustrates Apple's intention that our storage be remote rather than local.

    Getting back to the main topic, I'm cautiously optimistic about this new file system, but I share the concerns of others about how creating a "pointer+difference" file is an advantage over just making an actual copy. Assuming both files continue to be changed, eventually the number of bits written is going to equal (or exceed) what would have been required to just copy the whole file in the first place, so it seems like adding a layer of risk for limited payoff.
  • Reply 44 of 57
    MacPro Posts: 19,727 member
    Adobe CC minor issue after running High Sierra.

    After running 10.13 on my new Mac Pro with a clone of the boot drive and a clone of my external data disks (i.e. a setup identical to my usual Sierra one), I encountered no issues other than those I expected, such as Little Snitch. All my applications ran fine.

    However, after a few days I returned to my normal internal boot and data drives to do some work. On launching PS CC, Adobe CC told me my applications had to be uninstalled and reinstalled. I checked, and the error showed that Adobe had changed privileges on /Library/Application Support/Adobe/AdobePCD. I checked the flags but could see no obvious changes, and since I was in a hurry I followed the instructions to uninstall and reinstall all my CC applications. All was well after that.

    The other thing that was strange but good: I had played extensively with the new Photos in High Sierra and made lots of changes to things such as People, favorites, Memories, etc. These changes were instantly visible on other Macs running High Sierra logged in as me, which was great. I half expected to face a mess going back to Sierra, but no, none of the changes were there and all was as it had been prior to running High Sierra. Apple has obviously kept the databases distinct, making it possible to migrate some but not all of your Macs without Photos-in-the-cloud conflicts. Nice touch.


  • Reply 45 of 57
    MacPro Posts: 19,727 member

    pakitt said:
    For everybody else, RAID on Thunderbolt 3 :)
    Yup. In my world the cloud often isn't a viable option away from home. I'm always either behind a corporate nanny gate or tethered to slow, expensive cell data. Apple's efforts to herd me into the wireless corral are wasted since I can't get there even if I wanted to go.

    I signed up for iTunes Match to make it easier to sync our music library across the six devices in our home. I was taken aback a little when I discovered that it didn't push actual files to the other devices, but rather just placeholders for cloud playback. Obviously the files can be retrieved, but it illustrates Apple's intention that our storage be remote rather than local.

    Getting back to the main topic, I'm cautiously optimistic about this new file system, but I share the concerns of others about how creating a "pointer+difference" file is an advantage over just making an actual copy. Assuming both files continue to be changed, eventually the number of bits written is going to equal (or exceed) what would have been required to just copy the whole file in the first place, so it seems like adding a layer of risk for limited payoff.
    Just picking up on your second paragraph... the nice thing, though, as you point out, is that you can choose to download anything you want locally on another Mac if you need to, say for a trip to the Arctic tundra with no WiFi. It is pretty fast if you have a decent connection. I actually re-downloaded my entire 100+ GB of music not long ago on my main Mac so as to take advantage of the higher quality available since my initial CD uploads years ago. Point being, you are not forced to rely on the cloud, but it is pretty cool for access from Apple hardware with small SSDs, or from an Apple TV, for example.
  • Reply 46 of 57
    dysamoria Posts: 3,430 member
    glindon said:
    You'd think that filesystem access was fully abstracted for applications, handled entirely by the OS, so it's strange that MS Outlook 2016 is having this issue. They must've done something proprietary, which is par for the course for them.
    There have been issues for a long time with case-sensitive volumes not working with some software. Sadly, in my tests APFS rendered most of my third-party Logic plugins incompatible. Now, it's possible that High Sierra itself might have been responsible, but I've also run into issues with plugins on external APFS drives under Sierra.
    I thought that APFS was supposed to be case-preserving, not case-sensitive? This is the superior way to handle case. It should be the default in any file system on a "user friendly" OS. 
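    A small aside with a hedged example: the snippet below is a generic C sketch (not any Apple-specific API) of the behaviour being described. On a case-insensitive but case-preserving volume, which macOS uses by default, two names that differ only in case resolve to the same file, while the directory listing still shows the spelling the file was created with; on a case-sensitive volume the second open() would fail instead.

        #include <stdio.h>
        #include <fcntl.h>
        #include <unistd.h>

        int main(void)
        {
            /* Create a file with a mixed-case name; the volume preserves this spelling. */
            int fd = open("CaseTest.TXT", O_CREAT | O_WRONLY, 0644);
            if (fd < 0) { perror("create"); return 1; }
            write(fd, "hello\n", 6);
            close(fd);

            /* On a case-insensitive volume this finds the same file;
             * on a case-sensitive volume it returns ENOENT instead. */
            fd = open("casetest.txt", O_RDONLY);
            printf(fd >= 0 ? "same file found\n" : "not found (case-sensitive volume)\n");
            if (fd >= 0) close(fd);
            return 0;
        }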
  • Reply 47 of 57
    dysamoria Posts: 3,430 member
    foljs said:
    You're doing it wrong then. "Redundancy" on a single drive is not a replacement for backups (plural: you should not only back up, but back up to more than one location).
    You've never made a copy of a file so that you could make changes to the copy without worrying about screwing something up and accidentally working destructively rather than non-destructively? Are you so accustomed to a nondestructive workflow that it didn't occur to you, or are we just so old that you were never raised outside one?  :p
    You can still do that with APFS. You're not understanding the difference between what the file system does with bits on the physical storage device and what you the user are experiencing. There's a layer of abstraction between the two. The user sees nothing different from before, but the actual data storage methodology in the physical storage device works differently from before. The difference is invisible to you as the user and you keep working the way you always did.

    If you copy a file for the purpose of working on a duplicate (to preserve the original as a "safety copy"), the file system will create a new record in the file system but not change any other bits in the storage device's physical media. When you modify and save your document, this is when the file system changes bits on the device. It only writes bits that differ from the original file. It associates these bits to the new file's file system record. Any bits that are unchanged between the original file and the new duplicate file will continue being read from the same physical location as before. There's no sense in duplicating the unchanged bits on the physical storage device.

    As you change and save the duplicate, the file system will write more data to the device for that duplicate, and it will refer less and less to the original as it is reading from the device (because your changes aren't in the original file).

    If you keep working on your duplicate file, it will eventually change so much that, due to writing changes to the file system, the space the new duplicate consumes on the volume may approach or equal the space consumed by your original. The duplicate file record will no longer refer to file system nodes storing data for the original file because those are irrelevant to your new file. No matter how much you change the duplicate, the original remains unchanged.

    We aren't dealing with real-world objects here. You can treat bits of data composing a file in a molecular manner, rather than the monolithic manner we treat physical objects like pages of paper, paper folders, or paper books.

    If you treated physical objects the same way APFS treats file copies, you'd have a book sitting on your table. When you copy the book, you'd start by placing a new cover/title page on the table next to the original book. As you write new content, you write it on new pages of paper and set them in a pile, under the new cover/title page, on the table next to the existing book. The more pages you change, the larger your stack of pages for the new book becomes.

    If you want to read the original book, you do nothing different from before. If you want to read the new book, you read all the pages in order like before, but you read unmodified pages from the original paper book and the modified pages from the stack you've been piling up on the table next to the original.

    It's messy or impossible to treat physical objects this way, but it is simple and efficient to handle bits and bytes of data in this way.

    The only caveat is file system corruption. If the device suffers corruption to the bits referenced by multiple records, those records will all include bad data and therefore result in the user seeing multiple corrupted files. This is why we do backups to other storage devices. This is why we have always done backups, and should always do backups, to other storage devices.
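    To make the mechanics above concrete, here is a minimal C sketch using the clonefile(2) system call macOS provides (the file names are placeholders, and APFS's real bookkeeping is of course more involved). The "copy" appears instantly because no data blocks are duplicated; blocks only get written when one of the two files is later modified.

        #include <stdio.h>
        #include <sys/clonefile.h>

        int main(void)
        {
            /* Create a copy-on-write clone: the new file shares every data block
             * with the original until either file is changed. */
            if (clonefile("original.psd", "working-copy.psd", 0) != 0) {
                perror("clonefile");   /* fails on volumes without clone support, e.g. HFS+ */
                return 1;
            }

            /* Edits saved to working-copy.psd now write only the changed blocks;
             * original.psd is never touched. */
            return 0;
        }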
    edited June 2017
  • Reply 48 of 57
    tipoo Posts: 1,142 member
    A separate test with an HDD would be cool. I'm wondering if there's a speed regression on them.
  • Reply 49 of 57
    dysamoria said:

    The only caveat is file system corruption. If the device suffers corruption to the bits referenced by multiple records, those records will all include bad data and therefore result in the user seeing multiple corrupted files. This is why we do backups to other storage devices. This is why we have always done backups, and should always do backups, to other storage devices.
    A couple of things should be pointed out. First, this is all on the same volume, so if you have corruption on a volume you usually do not end up with partial corruption; it is usually more widespread (because the actual data on the drive isn't stored linearly, but in blocks of space that the file system manages). Secondly, this isn't "new" technology, in that Git has been doing this for some time. With Git, you can branch, make changes, remove files, etc., and the original branch is safe and secure. This is just like Git, except on a per-file basis; but it is the same mentality.
  • Reply 50 of 57
    lorin schultz Posts: 2,771 member
    dysamoria said:
    As you change and save the duplicate, the file system will write more data to the device for that duplicate, and it will refer less and less to the original as it is reading from the device (because your changes aren't in the original file).

    If you keep working on your duplicate file, it will eventually change so much that, due to writing changes to the file system, the space the new duplicate consumes on the volume may approach or equal the space consumed by your original. The duplicate file record will no longer refer to file system nodes storing data for the original file because those are irrelevant to your new file. No matter how much you change the duplicate, the original remains unchanged.

    Excellent explanation.

    There are still a couple details about which I'm not clear. I'm sure Apple has addressed them, I just don't yet understand them.

    One is what happens if I delete the original. The system would have to do one of the following:

    • rewrite the "common" data to the copy. That means the time that was saved making the copy is now used up rebuilding it. Time wasn't actually saved, the consumption of it was just deferred.
    • keep the original file's data and just dump the filename pointer. The retained copy would now occupy MORE storage space -- the original plus "difference" data -- than it would had it been copied bit-for-bit prior to deleting the original.
    • refer to some library/database that tells it which bits are still relevant to the copy and must be retained, versus which bits can be dumped. Again, it seems like the time saved during the copy operation is expended when the original is deleted. I guess if the original is never deleted the net result is a time saving, but I can see cases in which things that used to be instantaneous will now take longer. It seems like we're just trading off waiting for one thing in exchange for the waiting occurring somewhere else.

    I'm not complaining -- my observations are purely academic -- and I'm sure there are lots of benefits beyond duplicating files. I just think the time issues associated with duplicates will ultimately be a zero-sum game in actual use.
    edited June 2017
  • Reply 51 of 57
    williamlondon Posts: 1,324 member
    dysamoria said:
    As you change and save the duplicate, the file system will write more data to the device for that duplicate, and it will refer less and less to the original as it is reading from the device (because your changes aren't in the original file).

    If you keep working on your duplicate file, it will eventually change so much that, due to writing changes to the file system, the space the new duplicate consumes on the volume may approach or equal the space consumed by your original. The duplicate file record will no longer refer to file system nodes storing data for the original file because those are irrelevant to your new file. No matter how much you change the duplicate, the original remains unchanged.

    Excellent explanation.

    There are still a couple details about which I'm not clear. I'm sure Apple has addressed them, I just don't yet understand them.

    One is what happens if I delete the original. The system would have to do one of the following:

    • rewrite the "common" data to the copy. That means the time that was saved making the copy is now used up rebuilding it. Time wasn't actually saved, the consumption of it was just deferred.
    • keep the original file's data and just dump the filename pointer. The retained copy would now occupy MORE storage space -- the original plus "difference" data -- than it would had it been copied bit-for-bit prior to deleting the original.
    • refer to some library/database that tells it which bits are still relevant to the copy and must be retained, versus which bits can be dumped. Again, it seems like the time saved during the copy operation is expended when the original is deleted. I guess if the original is never deleted the net result is a time saving, but I can see cases in which things that used to be instantaneous will now take longer. It seems like we're just trading off waiting for one thing in exchange for the waiting occurring somewhere else.

    I'm not complaining -- my observations are purely academic -- and I'm sure there are lots of benefits beyond duplicating files. I just think the time issues associated with duplicates will ultimately be a zero-sum game in actual use.
    There are people with PhDs who study this stuff, plus several companies have built fortunes on advanced compression which includes every single scenario you or I as lay people could ever come up with. I know this because I used to work with some of those geniuses for one of those companies.
  • Reply 52 of 57
    fastasleep Posts: 6,417 member
    dysamoria said:
    As you change and save the duplicate, the file system will write more data to the device for that duplicate, and it will refer less and less to the original as it is reading from the device (because your changes aren't in the original file).

    If you keep working on your duplicate file, it will eventually change so much that, due to writing changes to the file system, the space the new duplicate consumes on the volume may approach or equal the space consumed by your original. The duplicate file record will no longer refer to file system nodes storing data for the original file because those are irrelevant to your new file. No matter how much you change the duplicate, the original remains unchanged.

    Excellent explanation.

    There are still a couple details about which I'm not clear. I'm sure Apple has addressed them, I just don't yet understand them.

    One is what happens if I delete the original. The system would have to do one of the following:

    • rewrite the "common" data to the copy. That means the time that was saved making the copy is now used up rebuilding it. Time wasn't actually saved, the consumption of it was just deferred.
    • keep the original file's data and just dump the filename pointer. The retained copy would now occupy MORE storage space -- the original plus "difference" data -- than it would had it been copied bit-for-bit prior to deleting the original.
    • refer to some library/database that tells it which bits are still relevant to the copy and must be retained, versus which bits can be dumped. Again, it seems like the time saved during the copy operation is expended when the original is deleted. I guess if the original is never deleted the net result is a time saving, but I can see cases in which things that used to be instantaneous will now take longer. It seems like we're just trading off waiting for one thing in exchange for the waiting occurring somewhere else.

    I'm not complaining -- my observations are purely academic -- and I'm sure there are lots of benefits beyond duplicating files. I just think the time issues associated with duplicates will ultimately be a zero-sum game in actual use.
    There are people with PhDs who study this stuff, plus several companies have built fortunes on advanced compression which includes every single scenario you or I as lay people could ever come up with. I know this because I used to work with some of those geniuses for one of those companies.
    Pied Piper?
  • Reply 53 of 57
    dysamoria said:
    As you change and save the duplicate, the file system will write more data to the device for that duplicate, and it will refer less and less to the original as it is reading from the device (because your changes aren't in the original file).

    If you keep working on your duplicate file, it will eventually change so much that, due to writing changes to the file system, the space the new duplicate consumes on the volume may approach or equal the space consumed by your original. The duplicate file record will no longer refer to file system nodes storing data for the original file because those are irrelevant to your new file. No matter how much you change the duplicate, the original remains unchanged.

    Excellent explanation.

    There are still a couple details about which I'm not clear. I'm sure Apple has addressed them, I just don't yet understand them.

    One is what happens if I delete the original. The system would have to do one of the following:

    • rewrite the "common" data to the copy. That means the time that was saved making the copy is now used up rebuilding it. Time wasn't actually saved, the consumption of it was just deferred.
    • keep the original file's data and just dump the filename pointer. The retained copy would now occupy MORE storage space -- the original plus "difference" data -- than it would had it been copied bit-for-bit prior to deleting the original.
    • refer to some library/database that tells it which bits are still relevant to the copy and must be retained, versus which bits can be dumped. Again, it seems like the time saved during the copy operation is expended when the original is deleted. I guess if the original is never deleted the net result is a time saving, but I can see cases in which things that used to be instantaneous will now take longer. It seems like we're just trading off waiting for one thing in exchange for the waiting occurring somewhere else.

    I'm not complaining -- my observations are purely academic -- and I'm sure there are lots of benefits beyond duplicating files. I just think the time issues associated with duplicates will ultimately be a zero-sum game in actual use.
    There are people with PhDs who study this stuff, plus several companies have built fortunes on advanced compression which includes every single scenario you or I as lay people could ever come up with. I know this because I used to work with some of those geniuses for one of those companies.
    Pied Piper?
    Damn, wish I'd come up with that, and I'm such a huge Silicon Valley fan!
  • Reply 54 of 57
    dysamoria Posts: 3,430 member
    dysamoria said:
    As you change and save the duplicate, the file system will write more data to the device for that duplicate, and it will refer less and less to the original as it is reading from the device (because your changes aren't in the original file).

    If you keep working on your duplicate file, it will eventually change so much that, due to writing changes to the file system, the space the new duplicate consumes on the volume may approach or equal the space consumed by your original. The duplicate file record will no longer refer to file system nodes storing data for the original file because those are irrelevant to your new file. No matter how much you change the duplicate, the original remains unchanged.

    Excellent explanation.

    There are still a couple details about which I'm not clear. I'm sure Apple has addressed them, I just don't yet understand them.

    One is what happens if I delete the original. The system would have to do one of the following:

    • rewrite the "common" data to the copy. That means the time that was saved making the copy is now used up rebuilding it. Time wasn't actually saved, the consumption of it was just deferred.
    • keep the original file's data and just dump the filename pointer. The retained copy would now occupy MORE storage space -- the original plus "difference" data -- than it would had it been copied bit-for-bit prior to deleting the original.
    • refer to some library/database that tells it which bits are still relevant to the copy and must be retained, versus which bits can be dumped. Again, it seems like the time saved during the copy operation is expended when the original is deleted. I guess if the original is never deleted the net result is a time saving, but I can see cases in which things that used to be instantaneous will now take longer. It seems like we're just trading off waiting for one thing in exchange for the waiting occurring somewhere else.

    I'm not complaining -- my observations are purely academic -- and I'm sure there are lots of benefits beyond duplicating files. I just think the time issues associated with duplicates will ultimately be a zero-sum game in actual use.
    It would seem to me that the database you mention is the file system itself (it's not just a physical storage control program, it's also a record keeper). When the original file record is deleted, the deletion process likely includes marking as "free space" the physical storage spots where that record's unique bits resided. The physical storage locations themselves aren't modified, because those would be writes not worth wasting time on (changing the record is sufficient).
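    To illustrate that record-keeping idea, here is a toy sketch in C (not APFS's actual on-disk format): give each data extent a reference count, let deleting a file simply drop its references, and return an extent to free space only when its count reaches zero. Shared extents survive untouched, so nothing has to be rewritten for the surviving clone.

        #include <stdio.h>

        #define NUM_EXTENTS 4

        static int refcount[NUM_EXTENTS];   /* how many file records reference each extent */

        static void delete_file(const char *name, const int *extents, int n)
        {
            /* Deleting a file only touches the bookkeeping: drop each reference
             * and free an extent only when no other file points at it. */
            for (int i = 0; i < n; i++) {
                if (--refcount[extents[i]] == 0)
                    printf("%s: extent %d freed (no other file uses it)\n", name, extents[i]);
                else
                    printf("%s: extent %d kept (still shared)\n", name, extents[i]);
            }
        }

        int main(void)
        {
            int original[]  = {0, 1, 2};    /* the original file's data extents         */
            int duplicate[] = {0, 1, 3};    /* a clone: shares 0 and 1, wrote its own 3 */

            for (int i = 0; i < 3; i++) {
                refcount[original[i]]++;
                refcount[duplicate[i]]++;
            }

            /* Deleting the original frees only extent 2; the shared extents survive,
             * so the duplicate never needs to be rebuilt. */
            delete_file("original", original, 3);
            return 0;
        }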
    edited June 2017
  • Reply 55 of 57
    I wrote up a little Keynote that gives an idea of how Copy-on-Write works its magic, allowing you to dupe a file quickly by not duping its content, then modify content of either one while leaving the other untouched, or delete either one without having to reconstruct the other:

    https://www.dropbox.com/s/im11mh2yj8jtws7/how%20Copy-on-Write%20works.key?dl=0

    Note: APFS undoubtedly does it somewhat differently in detail, but the concepts are the same.
  • Reply 56 of 57
    tipoo Posts: 1,142 member
    No hard drive conversions yet, eh. I'm curious whether APFS will help or hurt spinning-rust drives, as it seems mostly tailored to NAND.
  • Reply 57 of 57
    "Apple seamlessly migrated the iOS user base to it in iOS 10.3, so we're confident that this will go smooth when it ships in the fall." LOL. Right. 10.13.3 still doesn't install on boot volumes set up in Apple's Disk Utility as RAID0. Apple doesn't even support Apple software from the previous version.