A research paper explicitly says that Apple doesn’t use your data to train Apple Intelligence. This differs from OpenAI’s policy, which by default uses your ChatGPT sessions to help train its models.
However, Apple says that it does scrape websites for content via Applebot, and website owners must explicitly opt out if they don’t want this to happen …
How generative AI systems are trained
Large language models (LLMs) like ChatGPT and Apple Intelligence are trained by feeding them large amounts of sample text written by humans.
Some consider this controversial, since copyrighted material is being scanned, and some companies are already using AI-generated content in place of content written by humans. Effectively, writers are seeing their own work used to help train AI systems to replace them. Additionally, many websites contain user-generated content, like comments and forum posts, so it’s not just writers who are having their words used in this way.
OpenAI goes further: by default, it also uses your ChatGPT sessions as additional training material. You can opt out (see below), but most users aren’t aware that this happens, so they wouldn’t know to do so.
Apple doesn’t use your data to train Apple Intelligence
A research paper published this week explains how Apple trains its own on-device and server models, and outlines the protections the company has put in place.
One assurance given is that your interactions with Apple Intelligence will not be used as training material:
We do not use our users’ private personal data or user interactions when training our foundation models.
Note that this policy will also apply to requests that Siri hands off to ChatGPT in iOS 18 and macOS 15.
It does scrape the web, and owners must opt out
Like OpenAI and Google, however, Apple does scrape the web to build its models – and the company also takes the same approach of assuming this is OK unless website owners explicitly opt out.
Apple’s web crawler, Applebot, has already been doing this for years to help train Siri and to surface Spotlight suggestions beyond your devices. Applebot is now additionally being used to train Apple Intelligence.
We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control.
See below for opt-out instructions.
Additional Apple protections for web content
Apple says that it applies additional protections for web content, intended to ensure it doesn’t inadvertently include personal data, and to filter out potentially offensive material.
We apply filters to remove personally identifiable information like social security and credit card numbers that are publicly available on the Internet. We also filter profanity and other low-quality content to prevent its inclusion in the training corpus.
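Apple doesn’t describe how these filters work. As a rough, hypothetical illustration of the general idea (not Apple’s method), a filter might match SSN-shaped strings with a regular expression, and only redact long digit runs that pass the Luhn checksum used by payment card numbers, which cuts down on false positives. Both patterns and the function names `luhn_valid` and `redact_pii` are assumptions made for this sketch.

```python
import re

# Hypothetical patterns for illustration only; Apple has not published its filters.
# US SSN format: 3 digits, 2 digits, 4 digits, separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Card-number candidates: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_pii(text: str) -> str:
    """Replace SSN-shaped strings and Luhn-valid card numbers with placeholders."""
    text = SSN_PATTERN.sub("[SSN]", text)

    def replace_card(match):
        digits = re.sub(r"[ -]", "", match.group())
        # Only redact digit runs that pass the checksum, to limit false positives.
        return "[CARD]" if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(replace_card, text)


if __name__ == "__main__":
    print(redact_pii("SSN 123-45-6789, card 4111 1111 1111 1111"))
    # SSN [SSN], card [CARD]
```

The checksum step matters because plenty of harmless text (order numbers, IDs) consists of long digit runs; a run that fails Luhn, such as `1234 5678 9012 3456`, is left untouched by this sketch.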
More generally, Apple says that it works to avoid reflecting biased material in Apple Intelligence output.
We work continuously to avoid perpetuating stereotypes and systemic biases across our AI tools and models.
GIGO – Garbage In, Garbage Out – has been one of the big problems with LLMs. There’s a lot of sexist and racist material on the web, for example, and without steps to filter it out, AIs can end up regurgitating it.
Opting out of Apple Intelligence training
Web publishers can opt out by including instructions in a robots.txt file – the same method that websites have long used to control indexing by Google.
Applebot respects standard robots.txt directives in general search crawls that are targeted at Applebot. In this example, Applebot doesn’t try to crawl documents that are under /private/ or /not-allowed/:
User-agent: Applebot
Allow: /
Disallow: /private/

User-agent: *
Disallow: /not-allowed/
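Additionally, Apple errs on the side of caution: if a robots.txt file doesn’t mention Applebot but does include instructions for Googlebot, Applebot follows the Googlebot instructions.
Note that you can opt out of Apple Intelligence training while still allowing your content to be indexed for Spotlight searches – this is done by adding a robots.txt group for the Applebot-Extended user agent that disallows your content, while leaving Applebot itself allowed.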
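To see how the two user agents can be treated differently, here is a minimal Python sketch of a robots.txt check. It follows the group-selection and longest-match rules of RFC 9309 in simplified form (no `*` or `$` wildcards in paths, exact user-agent matching), and models Apple’s Googlebot fallback with an optional `fallbacks` parameter. The function names, the `fallbacks` parameter and the sample files are hypothetical, and Applebot’s real parser may interpret files differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """One robots.txt group: the user agents it names and its (allow, path) rules."""
    agents: list[str] = field(default_factory=list)
    rules: list[tuple[bool, str]] = field(default_factory=list)


def parse_robots(text: str) -> list[Group]:
    """Split robots.txt into groups; consecutive User-agent lines share one group."""
    groups: list[Group] = []
    current: Group | None = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if current is None or not last_was_agent:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
            last_was_agent = True
        elif key in ("allow", "disallow") and current is not None:
            if value:  # an empty Disallow allows everything, so it adds no rule
                current.rules.append((key == "allow", value))
            last_was_agent = False
    return groups


def select_rules(groups, agent, fallbacks=()):
    """Rules for the agent, else the first fallback agent found (hypothetical), else '*'."""
    for name in (agent, *fallbacks, "*"):
        matching = [g for g in groups if name.lower() in g.agents]
        if matching:
            # RFC 9309: several groups naming the same agent are combined.
            return [rule for g in matching for rule in g.rules]
    return None


def is_allowed(robots_txt: str, agent: str, path: str, fallbacks=()) -> bool:
    rules = select_rules(parse_robots(robots_txt), agent, fallbacks)
    if rules is None:
        return True  # no group applies, so crawling is allowed
    allowed, best_len = True, -1
    for allow, prefix in rules:
        # The longest matching prefix wins; on a tie, Allow wins.
        if path.startswith(prefix) and (
            len(prefix) > best_len or (len(prefix) == best_len and allow)
        ):
            allowed, best_len = allow, len(prefix)
    return allowed


# Hypothetical file: keep Applebot (Siri/Spotlight) but opt out of AI training.
OPT_OUT = """\
User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Disallow: /
"""

# Hypothetical file that only mentions Googlebot.
GOOGLE_ONLY = """\
User-agent: Googlebot
Disallow: /drafts/
"""

if __name__ == "__main__":
    print(is_allowed(OPT_OUT, "Applebot", "/news/story"))           # True
    print(is_allowed(OPT_OUT, "Applebot-Extended", "/news/story"))  # False
    # Modeling the Googlebot fallback via the hypothetical fallbacks parameter:
    print(is_allowed(GOOGLE_ONLY, "Applebot", "/drafts/x", fallbacks=("Googlebot",)))  # False
```

Because groups are matched on the exact user-agent token in this sketch, the Applebot-Extended group has no effect on ordinary Applebot crawling, which is the separation the opt-out relies on.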
Opting out of ChatGPT training
As mentioned, ChatGPT won’t be trained on interactions resulting from Siri handoff in iOS 18 and macOS 15. But if you don’t want your direct interactions with ChatGPT used as training material either, you can opt out of that too.
iOS app
- Three-dot menu (top right) > Settings > Data Controls
- Toggle off “Improve the model for everyone.”
Mac app
- In the menu bar, ChatGPT > Settings > Data Controls
- Toggle off “Improve the model for everyone.”
9to5Mac collage: Background from Apple, Apple Logo icon by Icons8. Thanks to Matt for the Applebot-Extended spot.