Apple doesn’t use your data to train Apple Intelligence; other protections

An Apple research paper explicitly says that the company doesn’t use your data to train Apple Intelligence. This differs from OpenAI’s policy, under which your ChatGPT sessions are used by default to help train its models.

However, Apple says that it does scrape websites for content via Applebot, and website owners must explicitly opt out if they don’t want this to happen.

How generative AI systems are trained

Large language models (LLMs), like those behind ChatGPT and Apple Intelligence, are trained by feeding them large amounts of sample text written by humans.

Some consider that controversial, since copyrighted material is being scanned, and some companies are already using AI-generated content in place of content written by humans. Effectively, writers are seeing their own work used to help train AI systems to replace them. Additionally, many websites contain user-generated content, like comments and forum posts, so it’s not just writers who are having their words used in this way.

OpenAI goes further: by default, it also uses your ChatGPT sessions as additional training material. You can opt out (see below), but most users aren’t aware that it happens, so wouldn’t know to do this.

Apple doesn’t use your data to train Apple Intelligence

A research paper published this week explains how Apple trains its own on-device and server models, and outlines the protections the company has put in place.

One assurance given is that your interactions with Apple Intelligence will not be used as training material:

We do not use our users’ private personal data or user interactions when training our foundation models.

Note that this policy will also apply to ChatGPT usage which is handed off from Siri in iOS 18 and macOS 15.

It does scrape the web, and owners must opt out

Like OpenAI and Google, however, Apple does scrape the web to build its models, and the company takes the same approach of assuming this is OK unless website owners explicitly opt out.

Apple’s web scraper, Applebot, has been doing this for years, to help train Siri and to surface Spotlight suggestions from beyond your own devices. Applebot is now additionally being used to train Apple Intelligence.

We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control.

See below for opt-out instructions.

Additional Apple protections for web content

Apple says that it applies additional protections for web content, intended to ensure it doesn’t inadvertently include personal data, and to filter out potentially offensive material.

We apply filters to remove personally identifiable information like social security and credit card numbers that are publicly available on the Internet. We also filter profanity and other low-quality content to prevent its inclusion in the training corpus.

More generally, Apple says that it works to avoid reflecting biased material in Apple Intelligence output.

We work continuously to avoid perpetuating stereotypes and systemic biases across our AI tools and models.

GIGO (Garbage In, Garbage Out) has been one of the big problems with LLMs. There’s a lot of sexist and racist material on the web, for example, and unless steps are taken to filter it out, AI systems can end up regurgitating it.

Opting out of Apple Intelligence training

Web publishers can opt out by including instructions in a robots.txt file, the same method that websites have long used to control indexing by Google.

Applebot respects standard robots.txt directives in general search crawls that are targeted at Applebot. In this example, Applebot doesn’t try to crawl documents that are under /private/ or /not-allowed/:

User-agent: Applebot
Allow: /
Disallow: /private/

User-agent: *
Disallow: /not-allowed/

Additionally, Apple errs on the side of caution: if a robots.txt file doesn’t mention Applebot but does include instructions directed at Googlebot, Applebot follows the Googlebot instructions.
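
One way to picture that fallback is the Python sketch below. It assumes, based on Apple’s published Applebot guidance, that Googlebot rules are only followed when a robots.txt file never names Applebot. The function names are made up for illustration, and this is not Apple’s code.

```python
def agent_names(robots_txt: str) -> set[str]:
    """Collect the lowercase user-agent tokens named in a robots.txt body."""
    names = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            names.add(value.strip().lower())
    return names


def rules_applebot_would_follow(robots_txt: str) -> str:
    """Return which group's rules apply, under the assumed fallback order."""
    names = agent_names(robots_txt)
    if "applebot" in names:
        return "Applebot"   # explicit Applebot rules take priority
    if "googlebot" in names:
        return "Googlebot"  # assumed fallback when Applebot is not named
    return "*"              # otherwise only the wildcard group is left


if __name__ == "__main__":
    sample = "User-agent: Googlebot\nDisallow: /drafts/\n"
    print(rules_applebot_would_follow(sample))
```

The sketch only decides which group of rules is in play; evaluating the Allow and Disallow lines inside that group is a separate step.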

Note that you can opt out of Apple Intelligence training while still allowing your content to be indexed for Spotlight searches. This is done by directing your robots.txt rules at the Applebot-Extended user agent rather than at Applebot.
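
As a rough illustration of that split, the Python sketch below feeds a two-line robots.txt to the standard library’s urllib.robotparser. The rule set and the /news/story.html path are hypothetical, and Python’s parser only approximates how Apple’s crawler evaluates rules, so treat it as a sanity check and not as a statement of Applebot’s actual behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block use of content for AI training,
# with no rules aimed at the regular Applebot search crawler.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /
"""


def allowed(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report whether the given user agent may fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("Applebot", "Applebot-Extended"):
        print(agent, allowed(agent, "/news/story.html"))
```

With these rules, the regular Applebot agent is still permitted while Applebot-Extended is refused, which matches the split the article describes.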

Opting out of ChatGPT training

As mentioned, ChatGPT won’t be trained on interactions resulting from Siri handoff in iOS 18 and macOS 15. But if you don’t want your current interactions with ChatGPT used as training material either, you can opt out of that too.

iOS app

  • Tap the three-dots menu at top right, then Settings > Data Controls
  • Toggle off “Improve the model for everyone.”

Mac app

  • In the menu bar, ChatGPT > Settings > Data Controls
  • Toggle off “Improve the model for everyone.”

9to5Mac collage: Background from Apple, Apple Logo icon by Icons8. Thanks to Matt for the Applebot-Extended spot.


Author

Ben Lovejoy

Ben Lovejoy is a British technology writer and EU Editor for 9to5Mac. He’s known for his op-eds and diary pieces, exploring his experience of Apple products over time, for a more rounded review. He also writes fiction, with two technothriller novels, a couple of SF shorts and a rom-com!

