What you need to know: Apple's iCloud Photos and Messages child safety initiatives


Comments

  • Reply 21 of 162
    killroy said:
All the other tech companies have been doing this for over ten years! So why the blow-up when Apple finds a safer way to do it?
Because Apple has actually and demonstrably made real privacy a part of their platform. They have guarded user privacy even when law enforcement came knocking at the door. This is in stark contrast to Android, where surveillance of what you do and where you go (among much else, including picture scans) is how they make the lion's share of their profit, and where sharing with law enforcement and "intelligence" community approaches is all but routine.

I don't per se think this one item from Apple is a bad thing IF Apple puts in their ToS exactly what they're doing and, most important, unequivocally states anti-MEC is all it does, period, stop, nothing else. As another poster said, this could be the tip of a wider wedge. Government will surely keep coming for information on things a million miles below MEC. This will happen, period. If Apple gives in, then they might as well start collecting everything you do and where you go, etc., and auction it off on the real-time electronic exchange for ad data, and increase their revenue by many tens of billions. Why wouldn't they? If they are going to do what the others do, then they might as well make the money.
  • Reply 22 of 162
macplusplus Posts: 2,091 member
That scheme can easily be made into legislation, approved, then applied to citizens' daily lives: a database for drug-related photo hashes, a database for terrorism-related photo hashes, another one for immigration-related photo hashes...
edited August 6
  • Reply 23 of 162
crowley Posts: 9,111 member
macplusplus said:
That scheme can easily be made into legislation, approved, then applied to citizens' daily lives: a database for drug-related photo hashes, a database for terrorism-related photo hashes, another one for immigration-related photo hashes...
You think there are photos of drugs being circulated amongst drug users and suppliers? Or photos of terrorism being circulated amongst terror groups? For the most part that's not really how those groups work. Child abuse networks, however, have been shown many times over to disseminate vast amounts of child abuse media.
  • Reply 24 of 162
IreneW Posts: 277 member
    elijahg said:
    elijahg said:
Remember that 1 in 1 trillion isn't 1 false positive per 1 trillion iCloud accounts - it's 1 per 1 trillion photos. I have 20,000 photos; that brings the chances I have a falsely flagged photo to 1 in 50 million. Not quite such spectacular odds then.
    One in a trillion over 20,000 photos is not 1 in 50 million. It's one in a trillion, 20,000 times. The odds do not decrease per photo, as your photo library increases in size. There is not a 1:1 guarantee of a falsely flagged photo in a trillion-strong photo library.

    And even if it was, one in 50 million is still pretty spectacularly against.
Unfortunately it is - 1 in 1 trillion becomes 2 in 1 trillion with two photos. Or 1 in 500 billion. That then halves again with 4 photos, 1 in 250 billion, and so on. It's little more than simplified fractions. Punch 1,000,000,000,000/20,000 into a scientific calculator and it'll be simplified to 50,000,000/1. The odds do decrease because there is more likelihood you have a matching photo with 2 photos than 1 photo. And yes, statistically speaking, 1 in 1 trillion means that in a trillion-strong library there will be one false match.

    Also, it's massively more likely someone will get their password phished than a hash collision occurring - probably 15-20% of people I know have been "hacked" through phishing. All it takes is a couple of photos to be planted, with a date a few years ago so they aren't at the forefront of someone's library and someone's in very hot water. You claim someone could defend against this in court, but I fail to understand how? "I don't know how they got there" isn't going to wash with too many people. And unfortunately, "good security practices" are practised only by the likes of us anyway, most people use the same password with their date of birth or something equally insecure for everything. 
    1 in 50 million is not the same statistically as one in a trillion tried 20,000 times, no matter how much you want it to be so, I'm afraid. Regardless, your 1 in 50 million is still a very large number.

One in a trillion tried a trillion times does not guarantee a match, although it is likely, as you're saying. There may even be two or three. You're welcome to believe what you want, and you can research it with statisticians if you are so inclined. This is the last I will address this point here.

    And, in regards to the false positive, somebody will look at the image, and say something like: Oh, this is a palm tree. It just coincidentally collides with the hash. All good. Story over.

    In regards to your latter point, this is addressed in the article.
    Sorry, but you _really_ need to brush up your statistics knowledge... @elijahg is absolutely correct (and stating that you refuse to comment on the topic any further doesn't help your case).
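For anyone who wants to check the disputed arithmetic rather than argue it, here is a short Python sketch. It takes Apple's one-in-a-trillion figure at face value as an independent per-photo rate — which is itself the assumption under dispute — and the function name is mine, not Apple's:

```python
import math

# Probability that a library of n photos contains at least one false
# positive, given an independent per-photo false-match rate p.
# Computed as 1 - (1 - p)**n, using log1p/expm1 so tiny p stays accurate.
def prob_at_least_one(p: float, n: int) -> float:
    return -math.expm1(n * math.log1p(-p))

p_photo = 1e-12                              # assumed per-photo rate
lib = prob_at_least_one(p_photo, 20_000)     # 20,000-photo library
print(f"about 1 in {1 / lib:,.0f}")          # roughly 1 in 50 million
```

For n·p this small, 1 - (1 - p)**n is almost exactly n·p, which is why the simple division to 1 in 50 million and the binomial answer agree to many significant figures — both sides of the argument land on essentially the same number here.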
  • Reply 25 of 162
Mike Wuerthele Posts: 6,319 administrator
    IreneW said:
    elijahg said:
    elijahg said:
Remember that 1 in 1 trillion isn't 1 false positive per 1 trillion iCloud accounts - it's 1 per 1 trillion photos. I have 20,000 photos; that brings the chances I have a falsely flagged photo to 1 in 50 million. Not quite such spectacular odds then.
    One in a trillion over 20,000 photos is not 1 in 50 million. It's one in a trillion, 20,000 times. The odds do not decrease per photo, as your photo library increases in size. There is not a 1:1 guarantee of a falsely flagged photo in a trillion-strong photo library.

    And even if it was, one in 50 million is still pretty spectacularly against.
Unfortunately it is - 1 in 1 trillion becomes 2 in 1 trillion with two photos. Or 1 in 500 billion. That then halves again with 4 photos, 1 in 250 billion, and so on. It's little more than simplified fractions. Punch 1,000,000,000,000/20,000 into a scientific calculator and it'll be simplified to 50,000,000/1. The odds do decrease because there is more likelihood you have a matching photo with 2 photos than 1 photo. And yes, statistically speaking, 1 in 1 trillion means that in a trillion-strong library there will be one false match.

    Also, it's massively more likely someone will get their password phished than a hash collision occurring - probably 15-20% of people I know have been "hacked" through phishing. All it takes is a couple of photos to be planted, with a date a few years ago so they aren't at the forefront of someone's library and someone's in very hot water. You claim someone could defend against this in court, but I fail to understand how? "I don't know how they got there" isn't going to wash with too many people. And unfortunately, "good security practices" are practised only by the likes of us anyway, most people use the same password with their date of birth or something equally insecure for everything. 
    1 in 50 million is not the same statistically as one in a trillion tried 20,000 times, no matter how much you want it to be so, I'm afraid. Regardless, your 1 in 50 million is still a very large number.

One in a trillion tried a trillion times does not guarantee a match, although it is likely, as you're saying. There may even be two or three. You're welcome to believe what you want, and you can research it with statisticians if you are so inclined. This is the last I will address this point here.

    And, in regards to the false positive, somebody will look at the image, and say something like: Oh, this is a palm tree. It just coincidentally collides with the hash. All good. Story over.

    In regards to your latter point, this is addressed in the article.
    Sorry, but you _really_ need to brush up your statistics knowledge... @elijahg is absolutely correct (and stating that you refuse to comment on the topic any further doesn't help your case).
    My statistics 201 professor disagrees. Is your argument somehow that one in 50 million for any given user with a 20,000-strong file library is a giant danger?

    Regardless, the larger point stands. As discussed in the article, there won't be a flurry of false matches with the system, nor lives ruined by false matches.
edited August 6
  • Reply 26 of 162
Thank you, Mike, for your effort in your article, even though you are probably wasting your time on this particular forum - all of us (including myself) are usually entrenched in our opinions despite any evidence to the contrary.
  • Reply 27 of 162
cpsro Posts: 2,932 member
    elijahg said:
Remember that 1 in 1 trillion isn't 1 false positive per 1 trillion iCloud accounts - it's 1 per 1 trillion photos. I have 20,000 photos; that brings the chances I have a falsely flagged photo to 1 in 50 million. Not quite such spectacular odds then.
    One in a trillion over 20,000 photos is not 1 in 50 million. It's one in a trillion, 20,000 times. The odds do not decrease per photo, as your photo library increases in size. There is not a 1:1 guarantee of a falsely flagged photo in a trillion-strong photo library.

    And even if it was, one in 50 million is still pretty spectacularly against.
One in 50 million with over 1 billion accounts is about 20 false positives per year.
    But Apple might claim the false positive rate is per account, not per photo*. If it's per account and there are 1 billion accounts, the false positive rate would be about 0.1% per year or one false identification every 1000 years (hah!).
    These statistics are under idealized (perfectly random) circumstances and real data typically don't look random.

    *If the statistics were per photo, the chance of being falsely identified with a library of 1 trillion photos would be about 0.6, using Poisson statistics.
    edited August 6
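cpsro's footnote numbers can be reproduced in a few lines (a sketch under the same idealized assumptions cpsro flags: independence, and a one-in-a-trillion rate treated as per photo):

```python
import math

# Poisson sketch: with per-photo rate p = 1e-12, a trillion-photo library
# has an expected lambda = n * p = 1 false match, so the chance of at
# least one is 1 - exp(-lambda) = 1 - 1/e, roughly 0.632.
p = 1e-12
n = 10**12
p_any = 1 - math.exp(-(n * p))

# The same machinery for the accounts figure: a billion accounts with
# 20,000-photo libraries would see roughly 20 falsely flagged accounts.
flagged = 10**9 * (1 - math.exp(-20_000 * p))
print(round(p_any, 3), round(flagged))
```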
  • Reply 28 of 162
macplusplus Posts: 2,091 member
    crowley said:
That scheme can easily be made into legislation, approved, then applied to citizens' daily lives: a database for drug-related photo hashes, a database for terrorism-related photo hashes, another one for immigration-related photo hashes...
You think there are photos of drugs being circulated amongst drug users and suppliers? Or photos of terrorism being circulated amongst terror groups? For the most part that's not really how those groups work. Child abuse networks, however, have been shown many times over to disseminate vast amounts of child abuse media.
So what? Forget drug-, terrorism-, or immigration-related photos and replace them with CSAM or <insert any database here>.

The point is the statutory nature such a scheme may easily acquire after Apple elaborately implements it and the Government more elaborately makes it into law, for whatever hash database is at their discretion. The citizens won't have any control over these statutory databases and will then have to fight in the courts for civil liberties.
edited August 6
  • Reply 29 of 162
Mike Wuerthele Posts: 6,319 administrator
    cpsro said:
    elijahg said:
Remember that 1 in 1 trillion isn't 1 false positive per 1 trillion iCloud accounts - it's 1 per 1 trillion photos. I have 20,000 photos; that brings the chances I have a falsely flagged photo to 1 in 50 million. Not quite such spectacular odds then.
    One in a trillion over 20,000 photos is not 1 in 50 million. It's one in a trillion, 20,000 times. The odds do not decrease per photo, as your photo library increases in size. There is not a 1:1 guarantee of a falsely flagged photo in a trillion-strong photo library.

    And even if it was, one in 50 million is still pretty spectacularly against.
One in 50 million with over 1 billion accounts is about 20 false positives per year.
    But Apple might claim the false positive rate is per account, not per photo.
    These statistics are, however, under idealized (perfectly random) circumstances and real data typically doesn't look entirely random.
    The false positive rate is probably per account, considering the SHA-256 collision rate is one in 4 * 10^60 processes. 

    That's four novemdecillion hashes. Had to look that one up.
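Wherever the 4 * 10^60 figure comes from, the standard birthday approximation makes the same point about exact hashes: with SHA-256 the realistic false-match risk lives entirely in the perceptual "modified image" matching, never in the cryptographic hash itself. A sketch (the function is illustrative, not from Apple's system):

```python
# Birthday-bound sketch: among n items hashed into 2**bits possible
# values, the chance of ANY two colliding is about n*(n-1) / 2**(bits+1)
# (a good approximation while the result is much smaller than 1).
def birthday_bound(n: int, bits: int) -> float:
    return n * (n - 1) / 2 ** (bits + 1)

# Even a trillion distinct photos give a vanishingly small chance of a
# true SHA-256 collision - on the order of 1e-54.
print(birthday_bound(10**12, 256))
```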
  • Reply 30 of 162
22july2013 Posts: 2,657 member
I'm not saying that I'm for or against this system. I am unbiased for now. I don't have any position yet; I just have some technical and procedural questions.

1. Why would CSAM or Apple use a hash algorithm that has a 1 in 1,000,000,000,000 (a trillion) chance of a mismatch, when the MD5 hashing algorithm, which is already built into macOS, has a 1 in 70,000,000,000,000,000,000,000,000,000 (70 octillion) chance of a mismatch? Yes, I know this is an image hash and not a byte-wise file hash, but even so, why was the image hash algorithm designed with such an amazingly high collision chance? Is it impossible to design an image hash that has an error rate in the octillions? Why did they settle for an error rate as surprisingly large as one in a trillion? I want to know.

2. What happens if the news media releases a picture of a child in a victimized situation but blurs or hides the indecent parts, in order to help the public identify the child, and what if this image gets sent to my iCloud? Is that going to trigger a match? The image has lots in common with the original offending image. Sure, a small part of the image was cropped out, but the CSAM algorithm results in matches even when images are, as they say, "slightly cropped." They said this: "an image that has been slightly cropped or resized should be considered identical to its original and have the same hash." This means it could trigger a match. Am I entitled to know exactly how much of a change will typically cause the match to fail? Or is this something the public is not allowed to learn about?

3. When Apple detects a matching photo hash, how does Apple (or the US government) take into account the location of the culprit when a match occurs? Suppose the culprit who owns the offending photo resides in, say, Russia. What happens then? The US can't put that person on trial (although since 2004 the US has been putting a growing number of foreign terrorists on trial after dragging them into the US, and then using tenuous links like accessing an American bank account as justification for charging them in a US Federal court). About 96% of the people in the world do not live in the US, so does that mean a high percentage of the cases will never go to trial? Does the US government send a list of suspects to Vladimir Putin every week when they find Russians who have these illegal images in their iCloud? Or do Russian culprits get ignored? What about Canadian culprits, since Canada is on good terms with the US? Does the US government notify the Canadian government, or does the US wait until the culprit attempts to cross the border for a vacation? I want to see the list of countries the US government shares its suspect list with. Or is this something the public is not allowed to learn about?

    And now for a couple of hypothetical questions of less urgency but similar importance:

    4. If a friendly country like Canada were to develop its own database, would Apple report Canadian matches to the Canadian authority, or to the US authority, or to both governments? In other words, is Apple treating the US government as the sole arbitrator of this data, or will it support other jurisdictions? 

    5. What happens if an unfriendly country like China builds its own child abuse database, would Apple support that, and then would Apple report to Chinese authorities? And how would Apple know that China hasn't included images of the Tiananmen massacre in its own database?

    And now a final question that comes to my mind:

    Since the crimes Apple is trying to fight occur all over the world, shouldn't the ICC (International Criminal Court) be creating a CSAM database? Frankly, I blame the ICC for not tackling this problem. I'm aware that US Republicans generally oppose the ICC, but Democrats sound much more open to using the ICC. Biden has been silent on the ICC since getting elected, but he has said that his administration: "will support multilateral institutions and put human rights at the center of our efforts to meet the challenges of the 21st century.” Reading between the lines, that sounds like he supports the ICC. And since 70% of Americans support the ICC, according to a poll, maybe this is a popular issue that Biden can hang his hat on.

    My main concern with this whole topic is that there are so many important questions like these that are not being considered.
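On questions 1 and 2, the gap between a cryptographic hash and an image hash is the crux. A toy "average hash" illustrates the trade-off — to be clear, Apple's NeuralHash is a learned model and works nothing like this internally, but the design goal (small edits shouldn't change the hash) is the same, and exactly that goal is what makes collisions enormously more likely than with MD5 or SHA-256:

```python
# Toy "average hash" over an 8x8 grayscale grid: each of the 64 bits
# records whether a cell is brighter than the image mean. Global edits
# like a brightness shift leave every comparison - and so the hash -
# unchanged, which is the whole point of a perceptual hash.
def average_hash(gray_8x8):
    pixels = [px for row in gray_8x8 for px in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for px in pixels:
        bits = (bits << 1) | (1 if px > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

image = [[r * 8 + c for c in range(8)] for r in range(8)]   # stand-in photo
brighter = [[px + 10 for px in row] for row in image]       # mild edit
print(hamming(average_hash(image), average_hash(brighter))) # 0 bits differ
```

A byte-wise hash like MD5 or SHA-256 would change completely under the same one-step brightness edit; tolerating edits and having a tiny collision rate pull in opposite directions.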
  • Reply 31 of 162
DBSync Posts: 3 member
Just wanted to say this is a well-written article, but unfortunately, many people will continue with the slippery-slope argument no matter what you say. What Apple is doing is not novel, but they have found a way to do it with the most privacy. Google and Microsoft don't even pretend to try to preserve your privacy.
  • Reply 32 of 162
Mike Wuerthele Posts: 6,319 administrator
22july2013 said:
I'm not saying that I'm for or against this system. I am unbiased for now. I don't have any position yet; I just have some technical and procedural questions.

1. Why would CSAM or Apple use a hash algorithm that has a 1 in 1,000,000,000,000 (a trillion) chance of a mismatch, when the MD5 hashing algorithm, which is already built into macOS, has a 1 in 70,000,000,000,000,000,000,000,000,000 (70 octillion) chance of a mismatch? Yes, I know this is an image hash and not a byte-wise file hash, but even so, why was the image hash algorithm designed with such an amazingly high collision chance? Is it impossible to design an image hash that has an error rate in the octillions? Why did they settle for an error rate as surprisingly large as one in a trillion? I want to know.

2. What happens if the news media releases a picture of a child in a victimized situation but blurs or hides the indecent parts, in order to help the public identify the child, and what if this image gets sent to my iCloud? Is that going to trigger a match? The image has lots in common with the original offending image. Sure, a small part of the image was cropped out, but the CSAM algorithm results in matches even when images are, as they say, "slightly cropped." They said this: "an image that has been slightly cropped or resized should be considered identical to its original and have the same hash." This means it could trigger a match. Am I entitled to know exactly how much of a change will typically cause the match to fail? Or is this something the public is not allowed to learn about?

3. When Apple detects a matching photo hash, how does Apple (or the US government) take into account the location of the culprit when a match occurs? Suppose the culprit who owns the offending photo resides in, say, Russia. What happens then? The US can't put that person on trial (although since 2004 the US has been putting a growing number of foreign terrorists on trial after dragging them into the US, and then using tenuous links like accessing an American bank account as justification for charging them in a US Federal court). About 96% of the people in the world do not live in the US, so does that mean a high percentage of the cases will never go to trial? Does the US government send a list of suspects to Vladimir Putin every week when they find Russians who have these illegal images in their iCloud? Or do Russian culprits get ignored? What about Canadian culprits, since Canada is on good terms with the US? Does the US government notify the Canadian government, or does the US wait until the culprit attempts to cross the border for a vacation? I want to see the list of countries the US government shares its suspect list with. Or is this something the public is not allowed to learn about?

    And now for a couple of hypothetical questions of less urgency but similar importance:

    4. If a friendly country like Canada were to develop its own database, would Apple report Canadian matches to the Canadian authority, or to the US authority, or to both governments? In other words, is Apple treating the US government as the sole arbitrator of this data, or will it support other jurisdictions? 

    5. What happens if an unfriendly country like China builds its own child abuse database, would Apple support that, and then would Apple report to Chinese authorities? And how would Apple know that China hasn't included images of the Tiananmen massacre in its own database?

    And now a final question that comes to my mind:

    Since the crimes Apple is trying to fight occur all over the world, shouldn't the ICC (International Criminal Court) be creating a CSAM database? Frankly, I blame the ICC for not tackling this problem. I'm aware that US Republicans generally oppose the ICC, but Democrats sound much more open to using the ICC. Biden has been silent on the ICC since getting elected, but he has said that his administration: "will support multilateral institutions and put human rights at the center of our efforts to meet the challenges of the 21st century.” Reading between the lines, that sounds like he supports the ICC. And since 70% of Americans support the ICC, according to a poll, maybe this is a popular issue that Biden can hang his hat on.

    My main concern with this whole topic is that there are so many important questions like these that are not being considered.
1. Based on what I've been told, and what the NCMEC has said, the hash is SHA-256, and the core technology that Microsoft delivered to the NCMEC has been SHA-256-based for some time. I'm sure there's some wobble and decrease in the odds because of the "modified image" provision. The one in a trillion appears to be a case of under-promising to over-deliver.

2. A "slight crop" will be similar, as you say. A blur is a noticeable change to the hash. I'm pretty sure there'll be no transparency on this, as was discussed in the piece.

    3. Existing law prevails where the user is located. The system is US-only. The NCMEC isn't transparent about data sources for the hash database.

    4. No idea.

    5. No idea.

    ICC, probably, but it doesn't look like they are.
edited August 6
  • Reply 33 of 162
cpsro Posts: 2,932 member
    cpsro said:
    elijahg said:
Remember that 1 in 1 trillion isn't 1 false positive per 1 trillion iCloud accounts - it's 1 per 1 trillion photos. I have 20,000 photos; that brings the chances I have a falsely flagged photo to 1 in 50 million. Not quite such spectacular odds then.
    One in a trillion over 20,000 photos is not 1 in 50 million. It's one in a trillion, 20,000 times. The odds do not decrease per photo, as your photo library increases in size. There is not a 1:1 guarantee of a falsely flagged photo in a trillion-strong photo library.

    And even if it was, one in 50 million is still pretty spectacularly against.
One in 50 million with over 1 billion accounts is about 20 false positives per year.
    But Apple might claim the false positive rate is per account, not per photo.
    These statistics are, however, under idealized (perfectly random) circumstances and real data typically doesn't look entirely random.
    The false positive rate is probably per account, considering the SHA-256 collision rate is one in 4 * 10^60 processes. 

    That's four novemdecillion hashes. Had to look that one up.
    Many assumptions are behind the false positive rate calculation Apple provided, and we can't be certain Apple has accounted for everything. For instance, if a given photo can be hashed 1000 different ways, that increases the likelihood of a false match by up to 1000-fold (a Bonferroni correction) and a conservative estimate would use that 1000x factor. Is Apple being conservative in their estimates?
edited August 6
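cpsro's multiple-comparisons point reduces to a union bound. A two-line sketch (the factor of 1000 is cpsro's hypothetical, not a known property of Apple's system):

```python
# Union (Bonferroni-style) bound: if each photo effectively gets k
# independent chances to collide, the per-photo false-match rate is
# inflated by up to a factor of k, capped at certainty.
def adjusted_rate(p: float, k: int) -> float:
    return min(1.0, k * p)

print(adjusted_rate(1e-12, 1000))   # up to 1e-9 per photo
```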
  • Reply 34 of 162
cpsro Posts: 2,932 member
    IreneW said:
    elijahg said:
    elijahg said:
Remember that 1 in 1 trillion isn't 1 false positive per 1 trillion iCloud accounts - it's 1 per 1 trillion photos. I have 20,000 photos; that brings the chances I have a falsely flagged photo to 1 in 50 million. Not quite such spectacular odds then.
    One in a trillion over 20,000 photos is not 1 in 50 million. It's one in a trillion, 20,000 times. The odds do not decrease per photo, as your photo library increases in size. There is not a 1:1 guarantee of a falsely flagged photo in a trillion-strong photo library.

    And even if it was, one in 50 million is still pretty spectacularly against.
Unfortunately it is - 1 in 1 trillion becomes 2 in 1 trillion with two photos. Or 1 in 500 billion. That then halves again with 4 photos, 1 in 250 billion, and so on. It's little more than simplified fractions. Punch 1,000,000,000,000/20,000 into a scientific calculator and it'll be simplified to 50,000,000/1. The odds do decrease because there is more likelihood you have a matching photo with 2 photos than 1 photo. And yes, statistically speaking, 1 in 1 trillion means that in a trillion-strong library there will be one false match.

    Also, it's massively more likely someone will get their password phished than a hash collision occurring - probably 15-20% of people I know have been "hacked" through phishing. All it takes is a couple of photos to be planted, with a date a few years ago so they aren't at the forefront of someone's library and someone's in very hot water. You claim someone could defend against this in court, but I fail to understand how? "I don't know how they got there" isn't going to wash with too many people. And unfortunately, "good security practices" are practised only by the likes of us anyway, most people use the same password with their date of birth or something equally insecure for everything. 
    1 in 50 million is not the same statistically as one in a trillion tried 20,000 times, no matter how much you want it to be so, I'm afraid. Regardless, your 1 in 50 million is still a very large number.

One in a trillion tried a trillion times does not guarantee a match, although it is likely, as you're saying. There may even be two or three. You're welcome to believe what you want, and you can research it with statisticians if you are so inclined. This is the last I will address this point here.

    And, in regards to the false positive, somebody will look at the image, and say something like: Oh, this is a palm tree. It just coincidentally collides with the hash. All good. Story over.

    In regards to your latter point, this is addressed in the article.
    Sorry, but you _really_ need to brush up your statistics knowledge... @elijahg is absolutely correct (and stating that you refuse to comment on the topic any further doesn't help your case).
    My statistics 201 professor disagrees. Is your argument somehow that one in 50 million for any given user with a 20,000-strong file library is a giant danger?

    Regardless, the larger point stands. As discussed in the article, there won't be a flurry of false matches with the system, nor lives ruined by false matches.
I don't believe you consulted a statistician, as one in 50 million across 1 billion accounts means about 20 people would be falsely identified per year. That seems like a lot, and that estimate is based on idealized circumstances. However, as mentioned before, the false positive rate Apple provided is likely per account, not per photo. But I still don't believe you consulted a statistician.
  • Reply 35 of 162
Mike Wuerthele Posts: 6,319 administrator
    cpsro said:
    IreneW said:
    elijahg said:
    elijahg said:
Remember that 1 in 1 trillion isn't 1 false positive per 1 trillion iCloud accounts - it's 1 per 1 trillion photos. I have 20,000 photos; that brings the chances I have a falsely flagged photo to 1 in 50 million. Not quite such spectacular odds then.
    One in a trillion over 20,000 photos is not 1 in 50 million. It's one in a trillion, 20,000 times. The odds do not decrease per photo, as your photo library increases in size. There is not a 1:1 guarantee of a falsely flagged photo in a trillion-strong photo library.

    And even if it was, one in 50 million is still pretty spectacularly against.
Unfortunately it is - 1 in 1 trillion becomes 2 in 1 trillion with two photos. Or 1 in 500 billion. That then halves again with 4 photos, 1 in 250 billion, and so on. It's little more than simplified fractions. Punch 1,000,000,000,000/20,000 into a scientific calculator and it'll be simplified to 50,000,000/1. The odds do decrease because there is more likelihood you have a matching photo with 2 photos than 1 photo. And yes, statistically speaking, 1 in 1 trillion means that in a trillion-strong library there will be one false match.

    Also, it's massively more likely someone will get their password phished than a hash collision occurring - probably 15-20% of people I know have been "hacked" through phishing. All it takes is a couple of photos to be planted, with a date a few years ago so they aren't at the forefront of someone's library and someone's in very hot water. You claim someone could defend against this in court, but I fail to understand how? "I don't know how they got there" isn't going to wash with too many people. And unfortunately, "good security practices" are practised only by the likes of us anyway, most people use the same password with their date of birth or something equally insecure for everything. 
    1 in 50 million is not the same statistically as one in a trillion tried 20,000 times, no matter how much you want it to be so, I'm afraid. Regardless, your 1 in 50 million is still a very large number.

One in a trillion tried a trillion times does not guarantee a match, although it is likely, as you're saying. There may even be two or three. You're welcome to believe what you want, and you can research it with statisticians if you are so inclined. This is the last I will address this point here.

    And, in regards to the false positive, somebody will look at the image, and say something like: Oh, this is a palm tree. It just coincidentally collides with the hash. All good. Story over.

    In regards to your latter point, this is addressed in the article.
    Sorry, but you _really_ need to brush up your statistics knowledge... @elijahg is absolutely correct (and stating that you refuse to comment on the topic any further doesn't help your case).
    My statistics 201 professor disagrees. Is your argument somehow that one in 50 million for any given user with a 20,000-strong file library is a giant danger?

    Regardless, the larger point stands. As discussed in the article, there won't be a flurry of false matches with the system, nor lives ruined by false matches.
I don't believe you consulted a statistician, as one in 50 million across 1 billion accounts means about 20 people would be falsely identified per year. That seems like a lot, and that estimate is based on idealized circumstances. However, as mentioned before, the false positive rate Apple provided is likely per account, not per photo. But I still don't believe you consulted a statistician.
    You're misinterpreting what I said in my response. The "My stat 201 professor" is in reference to "really need to brush up on your statistics knowledge."

    The 1 in 50 million is a number postulated by Elijah, based on a 20K library.
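    For what it's worth, the arithmetic behind the two positions can be checked directly. A minimal sketch, assuming (hypothetically) an independent 1-in-a-trillion false-positive chance per photo - which, as noted above, may not be how Apple's figure is actually defined:

    ```python
    # Chance of at least one false positive in an n-photo library,
    # assuming each photo independently collides with probability p.
    # These numbers are the thread's hypotheticals, not Apple's spec.
    p = 1e-12      # 1 in a trillion, per photo (assumption)
    n = 20_000     # library size from the example above

    at_least_one = 1 - (1 - p) ** n
    print(at_least_one)  # ~2e-08, i.e. roughly 1 in 50 million
    ```

    So "one in a trillion, 20,000 times" and "about 1 in 50 million per 20,000-photo library" are, for small p, nearly the same statement, since 1 - (1 - p)^n is approximately n*p. Neither guarantees a match; it is only an expected rate.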
  • Reply 36 of 162
    22july2013 Posts: 2,657 member
    I'm not saying that I'm for or against this system. I am unbiased for now. I don't have any position yet, I just have some technical and procedural questions.

    1. Why would CSAM or Apple use a hash algorithm that has a 1 in 1,000,000,000,000 (one trillion) chance of a mismatch, when the MD5 hashing algorithm, which is already built into macOS, has a 1 in 70,000,000,000,000,000,000,000,000,000 (70 octillion) chance of a mismatch? Yes, I know this is an image hash and not a byte-wise file hash, but even so, why was the image hash algorithm designed with such an amazingly high collision chance? Is it impossible to design an image hash that has an error rate in the octillions? Why did they settle for an error rate as surprisingly large as one in a trillion? I want to know.

    2. What happens if the news media releases a picture of a child in a victimized situation but blurs or hides the indecent parts, in order to help the public identify the child, and what if this image gets sent to my iCloud? Is that going to trigger a match? The image has lots in common with the original offending image. Sure, a small part of the image was cropped out, but the CSAM algorithm produces matches even when images are, as they say, "slightly cropped." They said this: "an image that has been slightly cropped or resized should be considered identical to its original and have the same hash." This means it could trigger a match. Am I entitled to know exactly how much of a change will typically cause the match to fail? Or is this something the public is not allowed to learn about?

    3. When Apple detects a matching photo hash, how does Apple (or the US government) take into account the location of the culprit when a match occurs? Suppose the culprit who owns the offending photo resides in, say, Russia. What happens then? The US can't put that person on trial (although since 2004 the US has been putting a growing number of foreign terrorists on trial after dragging them into the US, and then using tenuous links like accessing an American bank account as justification for charging them in a US federal court). About 96% of the people in the world do not live in the US, so does that mean a high percentage of the cases will never go to trial? Does the US government send a list of suspects to Vladimir Putin every week when they find Russians who have these illegal images in their iCloud? Or do Russian culprits get ignored? What about Canadian culprits, since Canada is on good terms with the US? Does the US government notify the Canadian government, or does the US wait until the culprit attempts to cross the border for a vacation? I want to see the list of countries that the US government shares its suspect list with. Or is this something the public is not allowed to learn about?

    And now for a couple of hypothetical questions of less urgency but similar importance:

    4. If a friendly country like Canada were to develop its own database, would Apple report Canadian matches to the Canadian authority, or to the US authority, or to both governments? In other words, is Apple treating the US government as the sole arbiter of this data, or will it support other jurisdictions?

    5. What happens if an unfriendly country like China builds its own child abuse database, would Apple support that, and then would Apple report to Chinese authorities? And how would Apple know that China hasn't included images of the Tiananmen massacre in its own database?

    And now a final question that comes to my mind:

    Since the crimes Apple is trying to fight occur all over the world, shouldn't the ICC (International Criminal Court) be creating a CSAM database? Frankly, I blame the ICC for not tackling this problem. I'm aware that US Republicans generally oppose the ICC, but Democrats sound much more open to using the ICC. Biden has been silent on the ICC since getting elected, but he has said that his administration: "will support multilateral institutions and put human rights at the center of our efforts to meet the challenges of the 21st century.” Reading between the lines, that sounds like he supports the ICC. And since 70% of Americans support the ICC, according to a poll, maybe this is a popular issue that Biden can hang his hat on.

    My main concern with this whole topic is that there are so many important questions like these that are not being considered.
    1. Based on what I've been told, and what the NCMEC has said, the hash is SHA-256, and the core technology that Microsoft delivered to the NCMEC has been SHA-256-based for some time. I'm sure there's some wobble and decrease in the odds because of the "modified image" provision. The one in a trillion appears to be a case of under-promising and over-delivering.

    2. A "slight crop" will be similar, as you say. A blur is a noticeable change to the hash. I'm pretty sure there'll be no transparency on this, as was discussed in the piece.

    3. Existing law prevails where the user is located. The system is US-only. The NCMEC isn't transparent about data sources for the hash database.

    4. No idea.

    5. No idea.

    ICC, probably, but it doesn't look like they are.
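    To make the crop/blur discussion above concrete, here is a toy "average hash," the simplest kind of perceptual hash. It is purely illustrative - it is not NeuralHash or PhotoDNA, neither of which is public - but it shows why small pixel-level edits can leave such a hash unchanged, where any byte-wise hash would change completely:

    ```python
    # Minimal perceptual "average hash" sketch (illustrative only; NOT
    # NeuralHash or PhotoDNA). Input: a grayscale image as a 2D list.

    def average_hash(pixels, size=8):
        """Downscale to size x size by block-averaging, then set one bit
        per cell: 1 if the cell is brighter than the overall mean."""
        h, w = len(pixels), len(pixels[0])
        bh, bw = h // size, w // size
        cells = []
        for r in range(size):
            for c in range(size):
                block = [pixels[y][x]
                         for y in range(r * bh, (r + 1) * bh)
                         for x in range(c * bw, (c + 1) * bw)]
                cells.append(sum(block) / len(block))
        mean = sum(cells) / len(cells)
        bits = 0
        for v in cells:
            bits = (bits << 1) | (1 if v > mean else 0)
        return bits

    # A 64x64 gradient "image" and a slightly brightened copy of it.
    original = [[(x + y) % 256 for x in range(64)] for y in range(64)]
    brighter = [[min(255, p + 3) for p in row] for row in original]

    # The perceptual hashes match despite the pixel-level changes.
    print(average_hash(original) == average_hash(brighter))  # True
    ```

    A production perceptual hash is far more elaborate, but the principle is the same: hash the perceptual content rather than the bytes, so that resizing, recompression, or minor edits map to the same (or a nearby) hash.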
    I appreciate that you tried to answer my questions. But even the ones you did answer weren't clear. If you are saying that the NCMEC claims to be using SHA256, then why didn't you provide that information in the article, and how do you get a high error rate of "a trillion" if you are using SHA256? That makes no sense. The way you talk about "slight crops" and "blurs" is not at all clear, and is not justified with any evidence, and is therefore not convincing. Your third answer really doesn't tell me anything. "Existing law prevails." What does that mean? Does it mean that the US government will or will not inform the foreign government? You didn't answer that question. How do I get the answer to my question? Are the answers available or will the government never tell us what they do in these cases?
    edited August 6
  • Reply 37 of 162
    Mike Wuerthele Posts: 6,319 administrator
    Apple says one in a trillion, not us. I have no idea how they came to that conclusion. The SHA-256 point isn't super-relevant to the point of the article in question, but it is relevant to questions in the forum, which is why I brought it up -- there is more to be said about it in future articles as we ask more questions. "Slightly cropped," from your comment, and blurs are discussed in Apple's white paper.

    This article is a basic primer. There are more to come. As it pertains to "not clear" - you know what I do about this.

    There is no simple answer to your follow-on question as it pertains to #3, given treaties and whatnot. Apple has geofencing for all kinds of services, like Apple Music, and it stands to reason that they will here too. For now, this is a US system, run in the US, on US phones. US law prevails. If you're from Xistan and in the US, then US law applies as it pertains to the hashing, regardless of whether Xistan's laws are less restrictive. I'm sure the feds won't tell what they did specifically if the scenario is more complex.

    And on that note, my workday is over, and it's time to do house repairs. I'm not expecting a lot of developments or more information over the weekend on the topic, despite having about 100 emails and calls out this morning and yesterday evening.
  • Reply 38 of 162
    So, I guess just owning an iPhone is considered "probable cause"?
  • Reply 39 of 162
    IreneW Posts: 277 member
    cpsro said:
    elijahg said:
    Remember that 1 in 1 trillion isn't 1 false positive per 1 trillion iCloud accounts - it's 1 per 1 trillion photos. I have 20,000 photos, that brings the chances I have a falsely flagged photo to 1 in 50 million. Not quite such spectacular odds then.
    One in a trillion over 20,000 photos is not 1 in 50 million. It's one in a trillion, 20,000 times. The odds do not decrease per photo, as your photo library increases in size. There is not a 1:1 guarantee of a falsely flagged photo in a trillion-strong photo library.

    And even if it was, one in 50 million is still pretty spectacularly against.
    One in 50 million with over 1 billion accounts is roughly 20 false positives per year.
    But Apple might claim the false positive rate is per account, not per photo.
    These statistics are, however, under idealized (perfectly random) circumstances and real data typically doesn't look entirely random.
    The false positive rate is probably per account, considering the SHA-256 collision rate is one in 4 * 10^60 processes. 

    That's four novemdecillion hashes. Had to look that one up.
    Is it specified anywhere to be SHA-256? Some article mentioned that this was an algorithm developed and donated by Microsoft, which allows changes in pictures while still providing a match.
    If so, the number may be smaller (or bigger), so let's stay with the facts.
  • Reply 40 of 162
    Mike Wuerthele Posts: 6,319 administrator
    The facts are, Apple says one in a trillion, and I don't know if it's per picture or per account (edit - see next page: it's one in a trillion per account). NCMEC says the current algorithm is based on SHA-256, which is the 4*10^60 figure, but it won't be that, given the allowances for modified pictures.

    Somewhere in between the two is the actual number.

    NOW I'm getting up. Talk to you all off and on over the weekend.
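    As a closing footnote on the SHA-256 thread: whatever role SHA-256 plays in NCMEC's pipeline, a plain cryptographic hash on its own cannot tolerate a "slightly cropped" image - changing a single byte of input flips, on average, half of the output bits (the avalanche effect). A quick sketch, with made-up bytes standing in for image data:

    ```python
    import hashlib

    # Two inputs differing by a single appended byte (hypothetical data).
    a = hashlib.sha256(b"the same photo bytes").digest()
    b = hashlib.sha256(b"the same photo bytes.").digest()

    # Count how many of the 256 digest bits differ.
    diff_bits = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    print(diff_bits)  # on the order of 128 of 256 bits differ
    ```

    That unforgiving behavior is what makes collision odds on the scale quoted above possible, and it is also why any "modified image" allowance has to come from a perceptual layer on top, at some cost to those odds.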