Earlier this week, an investigation detailed that Apple and other tech giants had used YouTube subtitles to train their AI models. The dataset included subtitles from over 170,000 videos by creators such as MKBHD and MrBeast. Apple then used this dataset to train its open-source OpenELM models, which were released back in April.
Apple has now confirmed to 9to5Mac, however, that OpenELM doesn’t power any of its AI or machine learning features – including Apple Intelligence.
Apple responds to YouTube AI training controversy
Apple says that it created the OpenELM model as a way of contributing to the research community and advancing open-source large language model development. In the past, Apple researchers have described OpenELM as a “state-of-the-art open language model.”
According to Apple, OpenELM was created only for research purposes, not to power any of its Apple Intelligence features. The model was published as open source and is widely available, including on Apple’s Machine Learning Research website.
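For readers curious what “widely available” looks like in practice, below is a rough sketch of how one might load a small OpenELM checkpoint through Hugging Face’s transformers library. The apple/OpenELM-270M model ID, the Llama-2 tokenizer pairing (a gated repository that requires separate access), and the generation settings are assumptions drawn from public model listings, not details Apple provided for this story.

```python
# Minimal sketch: loading a small OpenELM checkpoint from Hugging Face.
# The model ID and the gated Llama-2 tokenizer pairing are assumptions
# based on public model listings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M",
    trust_remote_code=True,  # OpenELM ships its modeling code alongside the weights
)
# OpenELM reuses the Llama-2 tokenizer; the repo is gated, so approved access is needed.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

prompt = "Open language models are"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=40, repetition_penalty=1.2)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```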
Because OpenELM isn’t used as part of Apple Intelligence, the “YouTube Subtitles” dataset isn’t used to power Apple Intelligence either. In the past, Apple has said that Apple Intelligence models were trained “on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler.”
Finally, Apple tells me that it has no plans to build any new versions of the OpenELM model.
As Wired reported earlier this week, companies including Apple, Anthropic, and NVIDIA all used this “YouTube Subtitles” dataset to train their AI models. This dataset is part of a larger collection called “The Pile,” from the non-profit EleutherAI.