Apple AI researchers have developed a method for running large language models (LLMs) on iPhones and other Apple devices with limited memory. To get around the memory constraints of LLM-based chatbots, the technique stores the model’s parameters in flash memory, which is more plentiful on mobile devices than RAM, and loads them into RAM only as needed during inference. Two key techniques, windowing and row-column bundling, minimize the amount of data transferred and maximize flash memory throughput: windowing reuses data already loaded for recently processed tokens, while row-column bundling stores related rows and columns together so they can be read from flash in larger contiguous chunks. Together they allow a device to run models up to twice the size of its available memory, with significantly faster inference than naively loading the model from flash.

The work opens up possibilities for more advanced Siri capabilities, real-time language translation, and AI-driven features in photography and augmented reality on future iPhones. Apple is also reportedly developing its own generative AI model, called “Ajax”, to rival OpenAI’s GPT-3 and GPT-4. Ajax is said to have 200 billion parameters and is intended to integrate AI more deeply into Apple’s ecosystem. Apple is expected to offer a combination of cloud-based AI and on-device processing for generative AI features on the iPhone and iPad by late 2024.
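
For readers curious how windowing and row-column bundling work, here is a minimal Python sketch. It is an illustration under stated assumptions, not Apple’s implementation: the class `NeuronWindowCache`, the function `bundle`, and the toy neuron ids are all hypothetical, and flash reads are only counted, not performed.

```python
from collections import deque


class NeuronWindowCache:
    """Toy model of windowing: keep in RAM only the neurons that were
    active for the last `window` tokens (hypothetical helper)."""

    def __init__(self, window):
        self.window = window
        self.recent = deque()   # active-neuron sets for the most recent tokens
        self.resident = set()   # neuron ids currently held in RAM
        self.flash_reads = 0    # how many neurons have been fetched from flash

    def step(self, active):
        """Process one token; return the neuron ids that must be read from flash."""
        active = set(active)
        to_load = active - self.resident      # anything already in RAM is reused
        self.flash_reads += len(to_load)
        self.recent.append(active)
        if len(self.recent) > self.window:
            self.recent.popleft()             # slide the window forward
        # RAM now holds exactly the neurons used inside the window.
        self.resident = set().union(*self.recent)
        return to_load


def bundle(up_proj_columns, down_proj_rows):
    """Toy model of row-column bundling: store neuron i's up-projection column
    next to its down-projection row, so one contiguous read fetches both."""
    return [list(col) + list(row) for col, row in zip(up_proj_columns, down_proj_rows)]


cache = NeuronWindowCache(window=2)
for token_neurons in ([1, 2, 3], [2, 3, 4], [1, 4, 5]):
    print(sorted(cache.step(token_neurons)))
```

In this toy run the second and third tokens each need only one new neuron from flash, because the rest are still resident from the previous tokens. A wider window saves more reads but keeps more neurons in RAM.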
