We learned back in May that Apple is already using its own AI chatbot internally, which some have dubbed Apple GPT. A new research paper appears geared toward enabling a ChatGPT-style system to run on iPhones.
A second Apple AI paper looks at ways to generate animated 3D avatars from standard video, with obvious application to Vision Pro …
VentureBeat spotted the papers.
‘Apple GPT’
The chatbot-related paper is titled LLM in a flash: Efficient Large Language Model Inference with Limited Memory.
The ‘flash’ in the title is a pun, as the paper is about minimizing the amount of data that needs to be transferred from flash storage to RAM. LLM (large language model) is the generic term for AI systems like chatbots that have been trained on large amounts of text.
LLMs [have] intensive computational and memory requirements [that] present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM. Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks.
This approach enables LLMs to run up to 25 times faster than a naive loading approach on devices with limited RAM. The researchers conclude:
This breakthrough is particularly crucial for deploying advanced LLMs in resource-limited environments, thereby expanding their applicability and accessibility.
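To make the idea a little more concrete, here is a minimal sketch of the general technique the abstract describes: keep the weights in flash storage, memory-map them, and copy only the rows needed for the current step into RAM using a small number of large, contiguous reads. This is purely illustrative, not Apple’s implementation — the file name, layer dimensions, and the choice of ‘active rows’ below are invented for the example, and the paper’s actual machinery for deciding which parameters to load is far more sophisticated.

```python
import numpy as np

# Hypothetical illustration: a layer's weight matrix lives in flash storage,
# and we only pull the rows we actually need into RAM, reading them as
# contiguous chunks rather than one scattered row at a time.

ROWS, COLS = 8192, 4096             # made-up layer dimensions
WEIGHTS_PATH = "layer_weights.bin"  # hypothetical file sitting in flash

# One-time setup for the demo: write some random weights out to "flash".
np.random.rand(ROWS, COLS).astype(np.float32).tofile(WEIGHTS_PATH)

# Memory-map the file: nothing is copied into DRAM until it is touched.
weights = np.memmap(WEIGHTS_PATH, dtype=np.float32, mode="r", shape=(ROWS, COLS))

def load_rows(row_indices):
    """Copy only the requested rows from flash into RAM.

    Sorting the indices and slicing contiguous runs turns many small reads
    into a few larger sequential ones, which is much friendlier to flash
    throughput -- the same point the abstract makes about reading data in
    larger, more contiguous chunks.
    """
    row_indices = sorted(row_indices)
    chunks = []
    start = prev = row_indices[0]
    for idx in row_indices[1:] + [None]:
        if idx is not None and idx == prev + 1:
            prev = idx
            continue
        chunks.append(np.array(weights[start:prev + 1]))  # one contiguous read
        if idx is not None:
            start = prev = idx
    return np.concatenate(chunks, axis=0)

# Example: an inference step that only needs a small slice of the layer.
active = list(range(100, 150)) + list(range(4000, 4050))
active_weights = load_rows(active)
print(active_weights.shape)  # (100, 4096) -- only these rows occupy DRAM
```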
Generating animated 3D avatars from ‘flat’ video
If you want to shoot spatial video for 3D viewing on Vision Pro, the second beta of iOS 17.2 lets you do that on your iPhone.
But we all have masses of ‘flat’ (monocular) video, and Apple’s second AI paper describes a method of turning 2D video into animated 3D avatars.
The paper notes that generating a realistic 3D avatar usually requires a multi-camera setup to capture footage from different angles, which is then combined into a 3D model. What Apple has achieved here is a method of doing the same thing from a very short piece of standard video footage.
The paper is a deeply technical one, with even the abstract and conclusions packed with acronyms, but the bottom line is that Apple’s method is roughly one hundred times faster than existing ways of achieving the same result.
Our method takes only a monocular video with a small number of (50-100) frames, and it automatically learns to disentangle the static scene and a fully animatable human avatar within 30 minutes.
This has obvious applications for Vision Pro, but it could also enable things like virtual clothes fitting on your iPhone: create a 3D avatar of yourself, then see how you’d look in various items of clothing.
When any of this will be released is a whole other question, with analyst Ming-Chi Kuo saying back in August that there was as yet “no sign” that the company would launch its own AI chatbot in 2024.
Photo: Max Langelott/Unsplash