Apple’s secret AI bet

And yours too?

Hi, and happy Tuesday.

Apple’s AI strategy - after being mocked for years - is starting to look like a winner, and it has implications for all of us.

But, for context, look at this chart of capex spending: 

Capex spending by the big tech companies

When ChatGPT went viral in late 2022, all the big tech companies got into the data center buildout business to serve ChatGPT-like AI models… except for Apple.

After taking a “wait and see” approach, the company introduced Apple Intelligence - which promised the moon and shipped a flashlight.

Now, Apple is paying Google a billion dollars a year to white-label Google's Gemini as the new Siri. Yes, Siri’s new brain will be Google's Gemini - effectively crowning Google the winner in mobile AI, across iPhone and Android.

The popular press would suggest Apple has surrendered the AI race.

But could it be that they’re the only ones in the room who know what game we're actually playing?

On-device AI

For years, Apple has talked about running AI on-device, but the idea has drawn little attention.

But, earlier this year, when OpenClaw and personal always-on agents blew up, the first thing everyone learned was how expensive off-device AI is. 

  • With a normal chatbot, you ask twelve questions a day and it burns a few tokens.

  • An always-on agent is different. It doesn't sleep. It runs around the clock, checking, doing, thinking, twenty-four hours a day. And it burns tokens the entire time.

Running an always-on agent on a top-tier model, hosted in the cloud by a big provider, isn't expensive - it's unaffordable. And it’s one thing to have your documents fetched from a data center 200 miles away. It’s quite another to have every split-second thought in your agent’s brain make that same round trip.

Which points to the emerging shape of how we’ll use AI:
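To make the gap concrete, here is a rough back-of-the-envelope sketch. Every number in it is an assumption chosen for illustration (the token price, the tokens per question, how often the agent runs), not a measurement or any provider's actual pricing.

```python
# Illustrative only: every number below is an assumption, not a measurement.
PRICE_PER_MILLION_TOKENS = 10.00  # assumed USD price for a top-tier cloud model


def daily_cost(tokens_per_day: float,
               price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """Return the USD cost of burning `tokens_per_day` tokens."""
    return tokens_per_day / 1_000_000 * price_per_million


# Chatbot: twelve questions a day, assumed 1,000 tokens each (prompt + answer).
chatbot_tokens = 12 * 1_000

# Always-on agent: assumed one 5,000-token step every minute, around the clock.
agent_tokens = 5_000 * 60 * 24

chatbot_cost = daily_cost(chatbot_tokens)
agent_cost = daily_cost(agent_tokens)

print(f"chatbot: {chatbot_tokens:,} tokens/day -> ${chatbot_cost:.2f}/day")
print(f"agent:   {agent_tokens:,} tokens/day -> ${agent_cost:.2f}/day")
print(f"agent burns {agent_tokens / chatbot_tokens:.0f}x the tokens")
```

Change the assumed numbers and the totals move, but the shape holds: the agent's bill scales with hours of the day, not with the number of questions you happen to ask.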

  • Using a small, cheap model that lives right next to you for the grunt work - the constant, boring, all-day stuff.

  • Saving the expensive genius in the cloud for the handful of problems that are genuinely hard. 

The result? You spend less, you wait less, and your data never leaves the building.
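The split described above can be sketched as a simple router. This is a minimal illustration, not how any particular agent framework works: `local_model`, `cloud_model`, and the `hard` flag are hypothetical stand-ins, and a real system would need its own way of judging which tasks are hard.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    prompt: str
    hard: bool = False  # hypothetical flag; real systems must judge difficulty somehow


def local_model(prompt: str) -> str:
    """Stand-in for a small model running on the device."""
    return f"[local] {prompt}"


def cloud_model(prompt: str) -> str:
    """Stand-in for an expensive hosted model."""
    return f"[cloud] {prompt}"


def route(task: Task) -> str:
    """Send hard tasks to the cloud; keep everything else on the device."""
    handler: Callable[[str], str] = cloud_model if task.hard else local_model
    return handler(task.prompt)


tasks = [
    Task("check the inbox for new mail"),
    Task("summarize today's calendar"),
    Task("draft a merger risk analysis", hard=True),
]
results = [route(t) for t in tasks]
cloud_calls = sum(r.startswith("[cloud]") for r in results)

for line in results:
    print(line)
print(f"{cloud_calls} of {len(tasks)} tasks went to the cloud")
```

Only the prompts routed to `cloud_model` would ever leave the device, which is where the cost, latency, and privacy benefits come from.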

So which device is best for hosting a small, local model? Here's where everyone, all at once, started buying the same machine: the Mac mini.

We did it too. We set up an open-source model on a Mac mini at Prescouter, expecting a weekend of pain. It took a few hours. 

Which raises the obvious question…

Why the Mac mini?

Apple has been building the right technology for this moment for years.

To build the iPhone - a computer that could fit in your pocket - Apple had to rebuild the entire stack, including the microprocessor. The dominant Intel microprocessors of the time were too power-hungry, so Apple designed its own chips, built to do heavy work on very little power. The Mac mini delivers a version of those chips in an affordable package, without the bells and whistles.

For an agent running 24/7, power efficiency stops being a minor detail and becomes an economic necessity. A traditional PC running local AI models can easily draw 400W to 800W of power, turning an office into a noisy, expensive sauna. A Mac mini does the same grunt work pulling about 20 to 30 watts while staying completely silent.
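Here is what those wattages mean on a yearly electricity bill for a machine that never switches off. The wattage figures come from the paragraph above; the electricity price is an assumption, so substitute your own rate.

```python
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.15  # assumed USD per kWh; substitute your local rate


def annual_energy_cost(watts: float, price_per_kwh: float = PRICE_PER_KWH) -> float:
    """USD cost of drawing `watts` continuously for one year."""
    kwh = watts * HOURS_PER_YEAR / 1000
    return kwh * price_per_kwh


# Wattage ranges quoted in the text above.
machines = {
    "traditional PC (low)": 400,
    "traditional PC (high)": 800,
    "Mac mini (low)": 20,
    "Mac mini (high)": 30,
}

for name, watts in machines.items():
    print(f"{name:>22}: {watts:>3} W -> ${annual_energy_cost(watts):,.2f}/year")
```

Whatever the price per kWh, the ratio is fixed by the wattage: at these figures the PC costs roughly 13 to 40 times more to keep running.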

Secondly, traditional computers have a strict division of labor:

  • the CPU, which is the main workhorse processor, and

  • the GPU, which powers the matrix math used for tasks ranging from rendering video to running AI models on your computer.

Traditionally, each has its own pool of RAM, and the GPU’s pool is the smaller of the two.

The problem? Large language models are massive. Running a GPT-3 class model locally can easily require 128GB of GPU RAM just to wake up.

Apple bypassed this bottleneck entirely by unifying the memory used by both the CPU and GPU. If you buy a Mac mini configured with 128GB of unified memory, almost all of that memory is instantly accessible to the GPU and available for use with the LLM.
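A rough way to see why memory is the bottleneck: the weights alone take about (number of parameters) x (bits per parameter) / 8 bytes. The sketch below uses GPT-3's published size of 175 billion parameters as the example; it counts weights only and ignores the extra working memory a model needs while running, so treat the results as a lower bound.

```python
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (weights only, no runtime overhead)."""
    return params_billion * bits_per_param / 8


PARAMS_BILLION = 175   # GPT-3's published parameter count
MEMORY_GB = 128        # the unified-memory configuration discussed above

for bits in (16, 8, 4):
    size = weights_gb(PARAMS_BILLION, bits)
    verdict = "fits" if size <= MEMORY_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {size:6.1f} GB -> {verdict} in {MEMORY_GB} GB")
```

On this estimate, a model of that size only squeezes into 128GB once its weights are compressed to around 4 bits each, which is why local setups lean so heavily on compressed ("quantized") open-source models.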

For the price of a mid-range desktop, you get a compact, whisper-quiet lunchbox capable of loading giant AI models that would completely paralyze a standard consumer computer.

In 2017, Apple introduced the Neural Engine in its A11 Bionic chip - its first dedicated hardware for local, on-device AI processing, initially for facial recognition on the iPhone.

Where this is all headed

To match a Mac on a traditional PC architecture, you would have to daisy-chain multiple enterprise-grade graphics cards - costing thousands of dollars and turning your office into a roaring, 1,000-watt space heater.

But other companies have now taken notice of Apple’s lead in serving the emerging need for on-device AI.

Last week, NVIDIA and Microsoft responded with RTX Spark, which walks Windows away from the Intel-based “Wintel” architecture that cemented Windows’ desktop monopoly.

NVIDIA has produced a combination CPU / GPU chip that copies Apple's homework on efficiency and memory design, then adds its own unmatched AI brute force to create a purpose-built machine for local, always-on AI agents - running on Windows.

And the Gemini-based Siri? Apple is having Google customize it so it runs predominantly on-device. Apple’s hardware advantage means the best version of Gemini might be the one called Siri, running on iPhones.

What this means for you

Right now we assume our AI lives far away, in someone else's data center, forever. 

We’re now entering an era where our computers and phones won’t just run AI models. We will also start to see:

  • AI devices on our networks that run an always-on agent. 

  • Dedicated AI devices - ranging from pins to ones your IT department hands out the way they hand out laptops. 

At Prescouter, we’ve been working with always-on agent technology, figuring out how to make it secure and easy to use for our needs. Over the next few months, we will start rolling this out for our whole team as well as a few clients - all running on Apple hardware.

While Apple may have lost the war for AI in the cloud, it's winning the war for the hardware the whole agent era actually runs on. 

Best,

Dino

P.S. If you’re curious about what we’re working on, you can check out this one-pager. Let me know if you have any questions.