GPT-5 and the art of prediction: When AI counts

Robot arm holding a wine glass

Before We Begin

If you’re reading this, chances are you’ve already met GPT-5, the latest and smartest iteration of OpenAI’s language models. Whether through ChatGPT Plus or early access, you’re experiencing AI that’s “more powerful and context-aware” than ever before. Very cool.

Yet, despite these leaps forward, there’s a fascinating limitation that stubbornly remains: AI still can’t count the way we do. Sounds trivial? Well, yes. But let’s dig into why this “counting problem” matters more than you might think.

The strawberry “R” count: a small problem with a big reveal?

You’ve probably heard or seen someone in your design/LinkedIn/media circle say something like:

"Well i can fool chattGPT, lets try asking how many ‘r’s are in the word strawberry?”

Most of us humans would (hopefully) answer: “Three.”

But ask GPT-5, and you might get an unexpected answer. Sometimes it nails the three Rs, sometimes it says two or four. It might even go wildly off. This isn’t because GPT-5 is dumb or careless in the way we use those words. It’s because AI doesn’t count letters the way humans do. Still, we expect it to. Why?

Because GPT-5 isn’t scanning letters one by one. It processes language as a sequence of tokens: chunks that can be whole words, word fragments, or single characters. Depending on how the model tokenizes “strawberry,” it might treat it as one token or break it into a few. When you ask “how many ‘r’s,” GPT-5 doesn’t tally letters; it predicts what the answer should be based on patterns in its training data.

Put simply: AI’s “counting” is pattern prediction, not literal tallying.
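To make the gap concrete, here’s a minimal Python sketch of literal tallying: the thing a few lines of code do trivially, but a token-based model never actually performs. The word and letter are just the example from above.

```python
# Literal tallying: walk through the characters of the word one by one.
word = "strawberry"
letter = "r"

count = sum(1 for ch in word if ch == letter)
print(f"'{letter}' appears {count} times in '{word}'")  # -> 3
```

A language model has no step like this loop; it only ever sees the token sequence and predicts a plausible-sounding answer.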

Let’s look at another interesting case, this time with image generation.

The full glass of wine: when AI tries to predict

If counting letters reveals AI’s blind spots, image generation shows where it shines, though not without its quirks.

Ask GPT-5 or its image-generating counterparts to create a full glass of wine, and you might get impressive visuals: reflections on the glass, deep red hues, the way light plays across the liquid. But here’s the catch: the glass usually isn’t filled to the rim. Instead, it’s often about three-quarters full. That’s because the AI’s training data mostly contains images of glasses that aren’t full. The model doesn’t understand physical concepts like volume or how liquids behave; it’s making an educated guess based on patterns in the millions of images it has processed.

So, the images AI generates aren’t exact replicas but probabilistic blends of what “a full glass” typically looks like.

To be fair, GPT-5’s improvements mean it can predict more nuanced details, like wine swirling in a tilted glass. Still, these are predictions, not precise simulations grounded in physics. In short, AI’s idea of “full” is shaped by pattern recognition, not measurement or firsthand experience.

💡
Is this interesting to you? Bommasani et al. (2021) provide a comprehensive overview of foundation models: large-scale AI models trained on vast datasets that power many modern AI applications. And that was back in 2021. These models excel at predicting and generating outputs based on patterns in data, but they don’t possess true understanding. It’s an interesting read. Go look: https://arxiv.org/abs/2108.07258

Tokens, characters, and the quirks of AI counting

Tokens are, to put it simply, GPT’s building blocks. Sometimes one token equals a whole word; sometimes it’s just a few letters. That’s one of the factors that makes character-level precision tricky.

When you want to know how many letters or specific characters appear, GPT-5 has to translate from token patterns back to characters, a somewhat fuzzy process prone to errors, as we saw above. This helps explain why it can confidently give the wrong count for something as simple as “strawberry.”
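If you want to see those token boundaries for yourself, OpenAI’s open-source tiktoken library will show them. Here’s a minimal sketch; I’m using the cl100k_base encoding purely as an illustration, not claiming it’s the exact tokenizer behind GPT-5.

```python
import tiktoken  # pip install tiktoken

# An OpenAI encoding used here purely as an illustration; GPT-5's actual
# tokenizer may differ.
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("strawberry")
pieces = [enc.decode([i]) for i in ids]
print(pieces)
# The word arrives as a handful of multi-character chunks, not as
# individual letters, so the character count is never directly "visible"
# to the model.
```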

And this problem isn’t unique to GPT-5. It’s a limitation baked into how large language models process language.

So, does this still matter?

You might be thinking, “Counting Rs in ‘strawberry’ isn’t exactly a dealbreaker.”

That's fair.

But I think this little quirk exposes something big about AI today:

“GPT-5 isn’t about knowing facts perfectly; it’s about predicting the most likely answer based on vast but imperfect data.” 

This distinction matters hugely as AI moves deeper into decision-making and automating real-world tasks. Imagine AI helping with financial reports, scientific measurements, or legal documents. If it can’t reliably “count” or handle precise data, errors can slip in with serious consequences. This “rant” matters because of the certainty with which we use these tools today: we’re prone to trust whatever sounds professional, knowledgeable, and correct.

Final thoughts on LLM prediction power (and its limits)

GPT-5 marks a huge step forward in AI’s ability to predict, create, and assist. But it’s still fundamentally a pattern predictor, not a literal thinker. The strawberry “r” counting glitch isn’t just a funny mistake; it’s a signal that AI’s “knowledge” is probabilistic, not absolute.

As LLMs like GPT-5 take on bigger roles in work and life, understanding this nuance is crucial. Prediction is powerful but imperfect, and the AI’s quirks remind us to stay curious and cautious. So yes, GPT-5 can help write, plan, and even imagine your next weekend. But don’t expect it to count your Rs or pour your wine perfectly just yet.