I Asked a Weird Question. The Answer Changed How I Think About AI.

How many Mac Minis would you need to run Claude's Opus 4.8?

May 30, 2026

It started with a silly thought.

Everyone in my feed was posting about Claude Opus 4.8. The announcements, the benchmarks, the “this changes everything” takes. And somewhere in the middle of scrolling through all of it, a completely different question popped into my head:

What would it actually take to run this thing yourself?

Not for any practical reason. I wasn’t about to try it. It was just one of those questions that gets stuck in your brain and won’t leave until you actually work through it. So I did.

Let’s Talk About Scale for a Second

To understand why this question matters, you need to understand what these frontier models actually are under the hood.

AI models are built from billions (or in this case, potentially trillions) of parameters. Think of parameters as the tiny dials and knobs the model uses to process and generate information. Broadly, more parameters give a model more capacity for complex reasoning, though training data and architecture matter too. And Opus 4.8 is enormous.

Anthropic hasn’t published the exact number, so any figure is speculation. But based on how the model performs and how much compute it appears to require, outside estimates put Opus 4.8 somewhere between 1.5 and 3 trillion parameters. That’s not a typo. Trillion.

Now, here’s where the Mac Mini comes in.

A base Mac Mini M4 with 16GB of unified memory is a legitimately great machine. It’s fast, power-efficient, and capable of running smaller AI models locally. But 16GB of memory can only hold so much. To load and run a model the size of Opus 4.8, you need an enormous amount of RAM, and you need it all talking to each other at the same time.

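When I worked through the math, the number that came out the other side was staggering. At 16-bit precision, each parameter takes two bytes, so even the low-end estimate of 1.5 trillion parameters means roughly 3 terabytes of weights. Spread that across machines with 16GB each, a bit less of which is actually usable, and you land somewhere around 200 Mac Minis.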

Two hundred. Networked together. Just to handle a single inference — a single response from the model.

That’s not a server room. That’s a small data center. And we haven’t even talked about the electricity bill, the cooling system, the engineering team to keep it all running, or the software to orchestrate 200 machines into behaving like one.
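For anyone who wants to check the arithmetic, here is a rough Python sketch of the weights-only calculation. It rests on assumptions worth stating plainly: the 1.5 to 3 trillion parameter range is the speculative outside estimate, not a published figure; 16-bit weights take 2 bytes per parameter (8-bit and 4-bit quantization are shown for comparison); and roughly 14GB of each 16GB machine is assumed to be usable, since the operating system needs some memory too. It ignores the KV cache, activations, and networking overhead, all of which would push the count higher.

```python
import math


def machines_needed(params: float, bytes_per_param: float, usable_gb_per_machine: float) -> int:
    """Minimum number of machines whose combined memory can hold the model weights.

    Counts weights only; ignores KV cache, activations, and runtime overhead.
    Uses decimal gigabytes (1 GB = 1e9 bytes) for simplicity.
    """
    weight_bytes = params * bytes_per_param
    per_machine_bytes = usable_gb_per_machine * 1e9
    return math.ceil(weight_bytes / per_machine_bytes)


# Assumption: about 14 GB of each 16 GB machine is usable for weights.
USABLE_GB = 14

for params in (1.5e12, 3e12):  # speculative low and high parameter estimates
    for label, bytes_per_param in (("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)):
        n = machines_needed(params, bytes_per_param, USABLE_GB)
        print(f"{params / 1e12:.1f}T params, {label}: {n} machines")
```

Under these assumptions, the low-end estimate at 16-bit precision needs 215 machines, in the same ballpark as the “around 200” figure. The high-end estimate needs 429, while aggressive 4-bit quantization brings the low end down to 54.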

But Here’s Where It Gets Interesting

Once I got past the initial shock of that number, I realized the more important question isn’t “Can you run it locally?” It’s “Should you be running it at all?”

Because here’s what I’m seeing in practice, across companies big and small: everyone has defaulted to routing everything through the biggest, most capable model they can access.

Customer support tickets? Opus 4.8. Generating a SQL query from a user’s natural language input? Opus 4.8. Classifying whether a piece of content violates a policy? Opus 4.8. Summarizing a two-paragraph email? Opus 4.8.

I get why this happens. When you first discover that a frontier model can do all of these things, the temptation is to just... let it. It’s smart enough. It handles everything. Why think harder?

But this is the part where I have to be direct: this is a really expensive way to solve simple problems.

Imagine hiring a neurosurgeon to give you a band-aid. Technically, they can do it. But it doesn’t make any sense — for your budget, for their time, or for the system you’re trying to build.

That’s what it looks like when you route a simple content classification task through a trillion-parameter model.

The Formula 1 Car Analogy

I keep coming back to this comparison: running Opus 4.8 on everything is like using a Formula 1 car to drive to the grocery store.

The car is spectacular. It goes from 0 to 300 km/h in seconds. The engineering is extraordinary. But it burns through fuel at an insane rate, you can’t park it anywhere, it’s not built for stop-and-go traffic, and you’re dramatically over-specced for a 10-minute trip across town.

Most AI tasks are grocery store trips.

And the teams that are genuinely winning with AI right now have figured this out. They’re not asking “what’s the most powerful model we can use?” They’re asking: “What is the minimum capability we need to solve this problem well?”

That’s a fundamentally different question. And it leads to a very different architecture.

What a Smarter Model Portfolio Actually Looks Like

When you start thinking this way, you stop thinking of AI as a single tool and start thinking of it as a layered system. Different models for different jobs.

Here’s a rough mental model I use:

Small, fast models — things like Claude Haiku or a fine-tuned open-source 7B model — are perfect for high-volume, repetitive tasks where speed and cost matter more than nuanced reasoning. Classifying support tickets. Extracting structured data from a form. Flagging content for review. You run millions of these a day, and you want them cheap and instant.

Mid-tier models (Sonnet, GPT-4o, and their equivalents) handle the bulk of the work that actually requires some intelligence. Drafting responses, answering moderately complex questions, generating and explaining code, summarizing long documents. This is where most of your production AI workload probably lives.

Frontier models — Opus 4.8, GPT-5, Gemini Ultra — are genuinely transformative for hard problems. Complex multi-step reasoning. Sophisticated code architecture. Deep analysis. Novel problem-solving. Tasks where being 10% better actually changes the outcome in a meaningful way.
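Here is one way the layered idea could look in code: a minimal Python sketch, assuming a static table that maps task types to tiers. The task names, tier labels, and assignments are all hypothetical and illustrative; a real system would tune them against its own evaluations, and the actual model identifiers would come from whichever providers you use.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"        # e.g. Haiku or a fine-tuned open-source 7B model
    MID = "mid"            # e.g. Sonnet-class models
    FRONTIER = "frontier"  # e.g. Opus-class models


# Hypothetical task types mapped to the cheapest tier assumed to handle them well.
# These labels and assignments are illustrative, not a recommendation.
ROUTES: dict[str, Tier] = {
    "classify_ticket": Tier.SMALL,
    "extract_form_fields": Tier.SMALL,
    "flag_content": Tier.SMALL,
    "draft_reply": Tier.MID,
    "summarize_document": Tier.MID,
    "generate_code": Tier.MID,
    "multi_step_analysis": Tier.FRONTIER,
    "design_architecture": Tier.FRONTIER,
}


def route(task_type: str) -> Tier:
    """Return the tier for a task type, sending unknown task types to MID."""
    return ROUTES.get(task_type, Tier.MID)


if __name__ == "__main__":
    for task in ("classify_ticket", "draft_reply", "design_architecture", "something_new"):
        print(f"{task} -> {route(task).value}")
```

Defaulting unknown tasks to the mid tier rather than the frontier tier is a deliberate choice in this sketch: a new task type has to earn frontier access instead of getting it automatically. In practice, the task type might itself be assigned by a cheap small-tier classifier rather than a hard-coded label.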

The key insight is: most of your tasks are not frontier tasks.

And when you design your system to match the model to the task, something magical happens: your costs drop dramatically, your latency falls, and your system becomes more predictable and reliable. Not despite using smaller models. Because of it.
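To make the cost side of that concrete, here is some back-of-the-envelope arithmetic in Python. Every number in it is a hypothetical placeholder, not a real price or a measured workload: the per-request costs simply assume each tier is about 10x cheaper than the one above it, and real providers bill per token rather than per request.

```python
# Hypothetical per-request costs in US dollars; placeholders, not real prices.
COST_PER_REQUEST = {"small": 0.0005, "mid": 0.005, "frontier": 0.05}


def daily_cost(requests_per_tier: dict[str, int]) -> float:
    """Total daily cost for a given number of requests sent to each tier."""
    return sum(COST_PER_REQUEST[tier] * n for tier, n in requests_per_tier.items())


# Hypothetical workload of 1,000,000 requests a day.
everything_on_frontier = {"frontier": 1_000_000}
routed = {"small": 700_000, "mid": 250_000, "frontier": 50_000}

print(f"All frontier: ${daily_cost(everything_on_frontier):,.2f}")
print(f"Routed:       ${daily_cost(routed):,.2f}")
```

With these made-up numbers, sending everything to the frontier tier costs about $50,000 a day, while the routed mix costs about $4,100, roughly a twelvefold difference. The exact ratio depends entirely on your prices and task mix, but the shape of the result holds: the frontier share dominates the bill, so shrinking it matters most.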

The Real Takeaway

I started this with a silly question about Mac Minis, but the answer points at something I think matters a lot right now.

We’re in a moment where AI capability is accelerating faster than most people’s ability to think clearly about when and how to use it. The default behavior is to throw the biggest model at every problem and call it a day. And I understand why — it’s fast, it works well enough, and it avoids the harder thinking.

But the teams building AI systems that will actually scale — that will still make sense economically in 12 months, that will perform reliably under real production load — those teams are doing the harder work of understanding their tasks deeply and matching the right tool to each one.

You don’t need 200 Mac Minis.

You need the right model, in the right place, doing the right job.

The Agent Layer
