Apple using hashes to flag & evaluate emails to hunt down child abuse images

AppleInsider · February 11, 2020 12:59PM

A court filing has revealed how Apple monitors emails passing through its systems for images of child abuse: the iPhone maker keeps a lookout for hashes matching specific photographs and videos, and messages containing them are automatically flagged for inspection.

Like many other tech companies, Apple has systems in place to try to cut down on the amount of illegal traffic passing through its networks. Part of this is its monitoring of messages sent and received by its customers, which includes processes for the detection and reporting of images displaying child abuse.

A search warrant filed in Seattle and found by Forbes reveals that Apple uses automated systems to check messages by hash. The system compares the hashes of files with a database of hashes known to belong to previously reported abuse images. Emails that contain matching files are flagged for inspection.
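The matching step described above can be sketched roughly as follows. This is a minimal illustration only: the article does not say which hash function Apple uses, so SHA-256, the `KNOWN_HASHES` set, and the function names are all assumptions.

```python
import hashlib

# Hypothetical stand-in for a database of hashes of previously reported files.
# SHA-256 is an assumption; the article does not name the hash function.
KNOWN_HASHES = {
    hashlib.sha256(b"previously reported file").hexdigest(),
}

def file_hash(data):
    """Hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()

def flag_attachments(attachments):
    """Return the names of attachments whose hash is in the known set."""
    return [name for name, data in attachments.items()
            if file_hash(data) in KNOWN_HASHES]

# Made-up attachments for the demo.
email_attachments = {
    "holiday.jpg": b"harmless bytes",
    "copy.jpg": b"previously reported file",
}
print(flag_attachments(email_attachments))  # ['copy.jpg']
```

In this sketch the lookup only sees digests, never the image content itself; a flagged name would then be queued for the human review described in the article.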

For flagged emails, a staff member reviews the content of the message and the files for illegal material. If a message is deemed to contain such content, Apple then passes the message along to the authorities.

Apple's process is seemingly more thorough than those of other firms, which typically pass the message over to organizations such as the National Center for Missing and Exploited Children when the automated system flags the message, with little to no manual checking of the content itself.

The details were brought up in the search warrant from comments made by an Apple employee, explaining how the system detected "several images of suspected child pornography" being uploaded by an iCloud user, which prompted an examination of their emails.

Emails containing suspect images are not delivered to their intended recipient, the employee wrote. In the warrant's case, one individual sent eight emails, with seven messages containing 12 images and the other holding another four, all to the same recipient.

As the seven emails were identical in terms of content and files, the employee suspected that either the person "was sending these images to himself and when they didn't deliver he sent them again repeatedly," or the intended recipient told them the messages weren't getting through.

Apple also provided data on the iCloud user, including their name, address, and phone number, though it is unclear if this was included as part of the disclosure to law enforcement. The government later made requests for the user's emails, texts, instant messages, and "all files and other records stored on iCloud."

While some critics may object to the privacy implications of Apple staff inspecting flagged messages, the process of flagging the messages is performed using hashes and is automated, so the content of the images isn't taken into account. The use of humans for inspection helps limit the possibility of false positives in file verification, where a file has the same hash value as one in the existing database but entirely different contents.

The findings cover communications that are not encrypted, namely those that don't go through the end-to-end encryption that has become one of the main elements of the ongoing encryption debate. Just as law enforcement cannot access encrypted content on Apple's products or services, Apple cannot look at that material either.

gatorguy · February 11, 2020 1:07PM

Well of course they are looking at emails. All the big providers utilize machine scanning even if some like to call it "reading our emails".

Some monetize the data in the emails themselves (e.g. Earthlink, Edison and many others) while companies like Apple, Google and Microsoft do not monetize a user's emails but still do "read" them for protecting that user from spam/malware, organizing the person's inboxes, and for legally mandated reasons such as identifying child porn.

edited February 2020

seanismorris · February 11, 2020 1:43PM

Interesting...

I assume this is referring to @icloud.com, @mac.com, or @me.com and probably files uploaded to iCloud. If you have an Apple device you have an Apple email. Apps send emails to the Apple email which is then forwarded to your real email address (for privacy). If you use an email App you might be using Apple’s outgoing mail server and not realize it.

ninjaman · February 11, 2020 2:05PM

So Apple will flag the emails and pass along to authorities but will they unlock the savage's phone for authorities in an effort to uncover a broader distribution ring?

gatorguy · February 11, 2020 2:48PM

NinjaMan said:

So Apple will flag the emails and pass along to authorities but will they unlock the savage's phone for authorities in an effort to uncover a broader distribution ring?

With proper legal orders they'll turn over what they have access to. Bypassing your iPhone lock is not one of those things as of now.

seanismorris · February 11, 2020 2:51PM

NinjaMan said:

So Apple will flag the emails and pass along to authorities but will they unlock the savage's phone for authorities in an effort to uncover a broader distribution ring?

Unlocking a phone requires a backdoor which Apple refuses to do. Apple is assisting law enforcement, but not at the expense of product security.

Email isn’t a very secure method of communication by default. Adding security (encryption) causes complications. For example, ProtonMail uses OpenPGP to encrypt emails in transit. They also encrypt the data at rest. The issues include the inability to encrypt the header (for the OpenPGP standard). Also, having your emails encrypted makes indexing your mailbox difficult, which means limited search functionality. The fact that headers aren’t encrypted means law enforcement can still get a lot of information or at least leads.
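The point about headers can be illustrated with a small sketch using Python's standard `email` package. The addresses, subject, and the ciphertext placeholder are made up, and real OpenPGP mail is usually wrapped in a more elaborate MIME structure than this inline form; the only thing shown here is that the body can be opaque while the headers stay readable.

```python
from email.message import EmailMessage

# Hypothetical message: the body stands in for OpenPGP ciphertext,
# while the headers are ordinary plaintext fields.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Quarterly numbers"
msg.set_content(
    "-----BEGIN PGP MESSAGE-----\n"
    "(ciphertext placeholder)\n"
    "-----END PGP MESSAGE-----\n"
)

# Split the serialized message at the blank line between headers and body.
raw = msg.as_string()
headers, _, body = raw.partition("\n\n")
print(headers)  # sender, recipient, and subject are all visible here
```

Anyone who can read the serialized message still sees who wrote to whom and about what, even though the body is unreadable without the key.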

Bottom line, users have options if increased security is necessary. But, law enforcement tries to suggest there’s easy answers to very complex problems to get what they want.

My stance is an unwillingness to sacrifice the security of everyone, to make it easier to put a few people behind bars. In most cases, they’ll be caught and convicted anyways with a bit of old fashioned police work.

esquirecats · February 11, 2020 3:34PM

I'd argue it's more privacy minded to check for false-positives, thus not automatically handing private details to a 3rd party or law enforcement. Imagine having your details in a database when there was actually no questionable content.

However, hashing is robust enough that I can't be too critical of companies which automatically hand over the email itself. While hash collisions can occur, they are usually the result of deliberately attempting to engineer a collision, and if using a layered approach to hashes, then the chance of an already rare collision becomes incredibly remote. (I.e., when the seed images are hashed two different ways, any positives are checked against a second, more robust hash - in Apple's case this is a pair of human eyes.)
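A rough sketch of the layered idea, under stated assumptions: the two hash functions (a short BLAKE2s digest as the cheap first pass, SHA-256 as the second), the seed data, and the per-stage false-match probabilities are all invented for illustration, and the multiplication only holds if the two checks fail independently.

```python
import hashlib
from fractions import Fraction

def fast_hash(data):
    # First-pass check: a short 64-bit digest (assumed choice).
    return hashlib.blake2s(data, digest_size=8).hexdigest()

def strong_hash(data):
    # Second-pass check, only consulted when the first pass hits.
    return hashlib.sha256(data).hexdigest()

# Hypothetical seed files standing in for the reference database.
SEED = [b"seed image one", b"seed image two"]
FAST = {fast_hash(s) for s in SEED}
STRONG = {strong_hash(s) for s in SEED}

def layered_match(data):
    """A file counts as a match only if both hashes are known."""
    if fast_hash(data) not in FAST:
        return False
    return strong_hash(data) in STRONG

# Made-up false-match probabilities for each stage. If the stages are
# independent, an accidental match on both is the product of the two.
p_first = Fraction(1, 10**9)
p_second = Fraction(1, 10**12)
p_both = p_first * p_second

print(layered_match(b"seed image one"), layered_match(b"unrelated"))
print(p_both)
```

With these invented figures the combined chance is one in 10^21, which is the sense in which an already rare collision becomes far more remote.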

edited February 2020

randominternetperson · February 11, 2020 3:40PM

gatorguy said:

Well of course they are looking at emails. All the big providers utilize machine scanning even if some like to call it "reading our emails".

Some monetize the data in the emails themself (ie Earthlink, Edison and many others) while companies like Apple, Google and Microsoft do not monetize a user's emails but still do "read" them for protecting that user from spam/malware, organizing the person's inboxes, and for legally mandated reasons such as identifying child porn.

I think you're overstating the bolded part. Google absolutely "monetizes a user's emails" and do so with the consent of the users. Parsing gmail messages to influence the sale of ads to each user is monetizing email content, period.

gatorguy · February 11, 2020 3:41PM

randominternetperson said:

gatorguy said:

Well of course they are looking at emails. All the big providers utilize machine scanning even if some like to call it "reading our emails".

Some monetize the data in the emails themself (ie Earthlink, Edison and many others) while companies like Apple, Google and Microsoft do not monetize a user's emails but still do "read" them for protecting that user from spam/malware, organizing the person's inboxes, and for legally mandated reasons such as identifying child porn.

I think you're overstating the bolded part. Google absolutely "monetizes a user's emails" and do so with the consent of the users. Parsing gmail messages to influence the sale of ads to each user is monetizing email content, period.

You are 100% mistaken. Google does not mine and monetize users' emails. That's a practice they stopped a couple of years ago even tho the people using the free version had given consent (?!) at the time.

edited February 2020

randominternetperson · February 11, 2020 3:46PM

gatorguy said:

randominternetperson said:

gatorguy said:

Well of course they are looking at emails. All the big providers utilize machine scanning even if some like to call it "reading our emails".

Some monetize the data in the emails themself (ie Earthlink, Edison and many others) while companies like Apple, Google and Microsoft do not monetize a user's emails but still do "read" them for protecting that user from spam/malware, organizing the person's inboxes, and for legally mandated reasons such as identifying child porn.

I think you're overstating the bolded part. Google absolutely "monetizes a user's emails" and do so with the consent of the users. Parsing gmail messages to influence the sale of ads to each user is monetizing email content, period.

You are 100% mistaken. Google does not monetize users' emails.

I stand corrected. Here's what Google says: "When you open Gmail, you'll see ads that were selected to show you the most useful and relevant ads. The process of selecting and showing personalized ads in Gmail is fully automated. These ads are shown to you based on your online activity while you're signed into Google. We will not scan or read your Gmail messages to show you ads."

Or I should say "Google NO LONGER reads your email to target ads." https://www.bloomberg.com/news/articles/2017-06-23/google-will-stop-reading-your-emails-for-gmail-ads

edited February 2020

gatorguy · February 11, 2020 3:49PM

randominternetperson said:

gatorguy said:

randominternetperson said:

gatorguy said:

Well of course they are looking at emails. All the big providers utilize machine scanning even if some like to call it "reading our emails".

Some monetize the data in the emails themself (ie Earthlink, Edison and many others) while companies like Apple, Google and Microsoft do not monetize a user's emails but still do "read" them for protecting that user from spam/malware, organizing the person's inboxes, and for legally mandated reasons such as identifying child porn.

I think you're overstating the bolded part. Google absolutely "monetizes a user's emails" and do so with the consent of the users. Parsing gmail messages to influence the sale of ads to each user is monetizing email content, period.

You are 100% mistaken. Google does not monetize users' emails.

I stand corrected. Here's what Google says: "When you open Gmail, you'll see ads that were selected to show you the most useful and relevant ads. The process of selecting and showing personalized ads in Gmail is fully automated. These ads are shown to you based on your online activity while you're signed into Google. We will not scan or read your Gmail messages to show you ads."

Or I should say "Google NO LONGER reads your email to target ads." https://www.bloomberg.com/news/articles/2017-06-23/google-will-stop-reading-your-emails-for-gmail-ads

For what it's worth all those ads are shuttled into a "promotions" tab so GMail users never have to see them unless they make the conscious decision to look.

randominternetperson · February 11, 2020 3:53PM

One thing that's interesting about Apple's approach is that it actually tips off the users that someone is on their trail. By not sending the message with the offending/illegal content, they are letting a savvy scumbag know that the jig is up. I'm surprised that the authorities wouldn't prefer a more transparent system. I'm sure they are especially upset about this coming to light. Isn't it trivially easy to alter the hash for an image? Changing a single pixel is enough. I don't know how sophisticated child pornographers are, but I'm sure they are already figuring out how to avoid this net.
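For an ordinary cryptographic hash, the single-pixel concern is easy to demonstrate. The sketch below uses SHA-256 on made-up bytes standing in for image data; it says nothing about which hash Apple actually uses.

```python
import hashlib

# Stand-in for raw image bytes (values are arbitrary).
original = bytes(range(256)) * 4

# Flip a single bit in the first byte, the smallest possible edit.
altered = bytearray(original)
altered[0] ^= 1

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(bytes(altered)).hexdigest()

# Count how many of the 64 hex characters differ between the digests.
differing = sum(x != y for x, y in zip(digest_a, digest_b))
print(digest_a != digest_b, differing)
```

The two digests no longer match, so an exact-match lookup against a list of known digests would miss the altered file.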

zimmie · February 11, 2020 4:43PM

randominternetperson said:

One thing that's interesting about Apple's approach is that it actually tips off the users that someone is on their trail. But not sending the message with the offending/illegal content, they are letting a savvy scumbag know that the jig is up. I'm surprised that the authorities wouldn't prefer a more transparent system. I'm sure they are especially upset about this coming to light. Isn't it trivially easy to alter the hash for an image? Changing a single pixel is enough. I don't know how sophisticated child pornographers are, but I'm sure they are already figuring out how to avoid this net.

The system is a bit more robust than a simple hash like MD5 or SHA1. It's similar to how modern antivirus works, which can identify modified versions of previously-seen malware pretty consistently. These image matching techniques work pretty reliably across scaled images and with a certain amount of noise added (though big changes decrease match confidence).

If you're interested in more information about how this works, the term to look up is CSAM (Child Sexual Abuse Material). That's the technical term which the NCMEC (National Center for Missing & Exploited Children) and others have used for well over a decade. Cloudflare has a neat article up about how this type of image matching works:

https://blog.cloudflare.com/the-csam-scanning-tool/
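To give a feel for why such matching can survive small changes, here is a minimal sketch of an "average hash", one of the simplest perceptual hashes. It is not the algorithm Apple, NCMEC, or Cloudflare use; the 8x8 grid, the noise range, and the threshold are assumptions chosen for the demo.

```python
import random

def average_hash(pixels):
    """pixels: 2D list of grayscale values, assumed already downscaled.
    Each bit is 1 if the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(a, b):
    """Number of positions where the two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def is_match(a, b, threshold=5):
    # Threshold is an arbitrary illustrative value.
    return hamming(a, b) <= threshold

rng = random.Random(0)
# Synthetic 8x8 "image": a horizontal gradient from dark to bright.
image = [[x * 32 for x in range(8)] for _ in range(8)]
# Same image with a little noise added to every pixel.
noisy = [[min(255, max(0, p + rng.randint(-3, 3))) for p in row] for row in image]
# A very different image: the gradient reversed.
inverted = [[255 - p for p in row] for row in image]

h = average_hash(image)
print(hamming(h, average_hash(noisy)))     # 0: the noise flips no bits here
print(hamming(h, average_hash(inverted)))  # 64: every bit differs
```

Because the comparison is a distance rather than an equality test, a lightly altered copy can still fall within the match threshold, while an unrelated image lands far away.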

randominternetperson · February 11, 2020 5:55PM

zimmie said:

randominternetperson said:

One thing that's interesting about Apple's approach is that it actually tips off the users that someone is on their trail. But not sending the message with the offending/illegal content, they are letting a savvy scumbag know that the jig is up. I'm surprised that the authorities wouldn't prefer a more transparent system. I'm sure they are especially upset about this coming to light. Isn't it trivially easy to alter the hash for an image? Changing a single pixel is enough. I don't know how sophisticated child pornographers are, but I'm sure they are already figuring out how to avoid this net.

The system is a bit more robust than a simple hash like MD5 or SHA1. It's similar to how modern antivirus works, which can identify modified versions of previously-seen malware pretty consistently. These image matching techniques work pretty reliably across scaled images and with a certain amount of noise added (though big changes decrease match confidence).

If you're interested in more information about how this works, the term to look up is CSAM (Child Sexual Abuse Material). That's the technical term which the NCMEC (National Center for Missing & Exploited Children) and others have used for well over a decade. Cloudflare has a neat article up about how this type of image matching works:

https://blog.cloudflare.com/the-csam-scanning-tool/

Thank you. Fascinating article and nice explanation of "fuzzy hashes."

strangedays · February 11, 2020 7:05PM

randominternetperson said:

gatorguy said:

randominternetperson said:

gatorguy said:

Well of course they are looking at emails. All the big providers utilize machine scanning even if some like to call it "reading our emails".

Some monetize the data in the emails themself (ie Earthlink, Edison and many others) while companies like Apple, Google and Microsoft do not monetize a user's emails but still do "read" them for protecting that user from spam/malware, organizing the person's inboxes, and for legally mandated reasons such as identifying child porn.

I think you're overstating the bolded part. Google absolutely "monetizes a user's emails" and do so with the consent of the users. Parsing gmail messages to influence the sale of ads to each user is monetizing email content, period.

You are 100% mistaken. Google does not monetize users' emails.

I stand corrected. Here's what Google says: "When you open Gmail, you'll see ads that were selected to show you the most useful and relevant ads. The process of selecting and showing personalized ads in Gmail is fully automated. These ads are shown to you based on your online activity while you're signed into Google. We will not scan or read your Gmail messages to show you ads."

Or I should say "Google NO LONGER reads your email to target ads." https://www.bloomberg.com/news/articles/2017-06-23/google-will-stop-reading-your-emails-for-gmail-ads

Exactly this — for the vast majority of gmail’s existence, they did read emails for the purpose of mining them for targeting advertising. They stopped a couple years ago, but for our resident google spokesperson to not even mention that in his initial post is quite disingenuous. They absolutely did.

gatorguy · February 11, 2020 7:09PM

StrangeDays said:

randominternetperson said:

gatorguy said:

randominternetperson said:

gatorguy said:

Well of course they are looking at emails. All the big providers utilize machine scanning even if some like to call it "reading our emails".

Some monetize the data in the emails themself (ie Earthlink, Edison and many others) while companies like Apple, Google and Microsoft do not monetize a user's emails but still do "read" them for protecting that user from spam/malware, organizing the person's inboxes, and for legally mandated reasons such as identifying child porn.

I think you're overstating the bolded part. Google absolutely "monetizes a user's emails" and do so with the consent of the users. Parsing gmail messages to influence the sale of ads to each user is monetizing email content, period.

You are 100% mistaken. Google does not monetize users' emails.

I stand corrected. Here's what Google says: "When you open Gmail, you'll see ads that were selected to show you the most useful and relevant ads. The process of selecting and showing personalized ads in Gmail is fully automated. These ads are shown to you based on your online activity while you're signed into Google. We will not scan or read your Gmail messages to show you ads."

Or I should say "Google NO LONGER reads your email to target ads." https://www.bloomberg.com/news/articles/2017-06-23/google-will-stop-reading-your-emails-for-gmail-ads

Exactly this — for the vast majority of gmail’s existence, they did read emails for the purpose of mining them for targeting advertising. They stopped a couple years ago, but for our resident google spokesperson to not even mention that in his initial post is quite disingenuous. They absolutely did.

You didn't even read my post did you? Go back and look at it, apparently for the first time. I did mention it.

Casting aspersions on someone without even bothering to do 10 seconds of reading before blubbering is just laziness if not outright dishonesty. Post#8 assuming you care about accuracy, which you really should pay more attention to IMO but perhaps you haven't the time.

edited February 2020

dysamoria · February 12, 2020 2:58AM

seanismorris said:

NinjaMan said:

So Apple will flag the emails and pass along to authorities but will they unlock the savage's phone for authorities in an effort to uncover a broader distribution ring?

Unlocking a phone requires a backdoor which Apple refuses to do. Apple is assisting law enforcement, but not at the expense of product security.

Email isn’t a very secure method of communication by default. Adding security (encryption) causes complications. For example protonmail uses openPGP to encrypt emails in transit. They also encrypt the data at rest. The issues include the inability to encrypt the header (for the openPGP standard). Also, having your emails encrypted makes indexing your mailbox difficult which means limited search functionality. The fact that headers aren’t encrypted means law enforcement can still get a lot of information or at least leads.

Bottom line, users have options if increased security is necessary. But, law enforcement tries to suggest there’s easy answers to very complex problems to get what they want.

My stance is an unwillingness to sacrifice the security of everyone, to make it easier to put a few people behind bars. In most cases, they’ll be caught and convicted anyways with a bit of old fashioned police work.

I wish I could mark your post as liked AND as informative. I hadn’t realized the searching of emails would be an issue with encryption like ProtonMail (a service I only just learned about and I’ve been reading about recently). Surely there’s an encryption method for an email content index that the user can control on their end just like the encryption of the emails themselves...? If the user isn’t signed in to the mail and the mailbox, then they aren’t getting a decrypted index and can’t do searches.

dysamoria · February 12, 2020 3:02AM

EsquireCats said:

I'd argue it's more privacy minded to check for false-positives, thus not automatically handing private details to a 3rd party or law enforcement. Imagine having your details in a database when there was actually no questionable content.

However hashing is reasonably robust that I can't be too critical of companies which automatically hand over the email itself. While hash collisions can occur, it's usually the result of deliberately attempting to engineer a collision, and if using a layered approach to hashes, then the chance of an already rare collision becomes incredibly remote. (I.E. When the seed images are hashed two different ways, then any positives are checked against a second, more robust hash - in Apple's case this is a pair of human eyes.)

Since we don’t actually have real AI, human visual systems are the only proper determining factor. I’ve seen what Google’s idea of “similar images” can be (not even close to being similar)...

blah64 · February 12, 2020 7:30AM

seanismorris said:

Interesting...

I assume this is referring to @icloud.com, @mac.com, or @me.com and probably files uploaded to iCloud. If you have an Apple device you have an Apple email.

This is a false statement. I have various Apple iOS devices, and I do not have any Apple email. I'm not even sure where you get this idea.

I also do not use iCloud. It does take a small conscious effort when you set up a new device, because Apple leads you down a path assuming you're going to use iCloud, but it's absolutely not required. If one chooses to use @icloud.com, @mac.com or @me.com, then certainly they are sending data through standard email, which is NOT end-to-end encrypted, but what you're saying here just isn't true.

Apps send emails to the Apple email which is then forwarded to your real email address (for privacy).

This is completely false as well. Again, I have no idea where you got this idea, and if it were true it would be less private, not more private.

I can guarantee this isn't true because by default I run my devices behind hardware firewalls that allow zero communication with Apple's servers.

blah64 · February 12, 2020 7:45AM

dysamoria said:

seanismorris said:

NinjaMan said:

So Apple will flag the emails and pass along to authorities but will they unlock the savage's phone for authorities in an effort to uncover a broader distribution ring?

Unlocking a phone requires a backdoor which Apple refuses to do. Apple is assisting law enforcement, but not at the expense of product security.

Email isn’t a very secure method of communication by default. Adding security (encryption) causes complications. For example protonmail uses openPGP to encrypt emails in transit. They also encrypt the data at rest. The issues include the inability to encrypt the header (for the openPGP standard). Also, having your emails encrypted makes indexing your mailbox difficult which means limited search functionality. The fact that headers aren’t encrypted means law enforcement can still get a lot of information or at least leads.

Bottom line, users have options if increased security is necessary. But, law enforcement tries to suggest there’s easy answers to very complex problems to get what they want.

My stance is an unwillingness to sacrifice the security of everyone, to make it easier to put a few people behind bars. In most cases, they’ll be caught and convicted anyways with a bit of old fashioned police work.

I wish I could mark your post as liked AND as informative. I hadn’t realized the searching of emails would be an issue with encryption like ProtonMail (a service I only just learned about and I’ve been reading about recently). Surely there’s an encryption method for an email content index that the user can control on their end just like the encryption of the emails themselves...? If the user isn’t signed in to the mail and the mailbox, then they aren’t getting a decrypted index and can’t do searches.

Searching encrypted data is tricky, but you should also take a look at Tutanota.com, another privacy-focused email service, like ProtonMail. Tutanota is located in Germany, which has some of the best privacy laws in the world.

Tutanota has a zero-knowledge end-to-end encrypted email system, and it also allows the account owner to search their emails. It's not enabled by default, because it uses more resources, but it is available. And impressive.
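One general way to make mail searchable without storing readable words is a keyed "blind index", sketched below. This is not a description of how Tutanota or ProtonMail implement search; the key, message contents, and function names are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

def token(key, word):
    """Keyed digest of a word; without the key the token reveals nothing obvious."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def build_index(key, messages):
    """Map each word token to the IDs of the messages containing it."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for word in text.split():
            index[token(key, word)].add(msg_id)
    return index

def search(key, index, word):
    """Look up a word by recomputing its token with the user's key."""
    return sorted(index.get(token(key, word), set()))

# Hypothetical user key and decrypted message bodies, available only
# on the user's own device after sign-in.
user_key = b"secret key held only by the user"
mailbox = {1: "Invoice from bank", 2: "Lunch on Friday"}

index = build_index(user_key, mailbox)
print(search(user_key, index, "invoice"))  # [1]
print(search(user_key, index, "missing"))  # []
```

The stored index holds only tokens, so searching requires the user's key, which fits the idea that a signed-out user has no usable index.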

Honestly, everyone should consider moving their email to one of these services. Maybe not in a single move, but at least set one up and move things like healthcare providers and financial institutions to a private email service. Over time, if it works well for you, then you can move more and more contacts over until you feel comfortable extricating yourself from the legacy services like gmail that have full access to your content. That's so last-decade. ;-)
