Welcome to AIEdTalks’ Newsletter!

In today's edition:

  • Your agent's bill grows quadratically. Your dashboard shows an average.

Let’s dive in.

Today’s Edition

AI TOPIC
Your agent's bill grows quadratically. Your dashboard shows an average.

Last week I wrote about the circuit breaker pattern and why your agents need it. This week is the other half: the cost problem that makes you need one in the first place — and why your monitoring is probably hiding it from you right now.

The math your pricing page doesn't show

LLM APIs are stateless. The model remembers nothing between calls, so to continue a conversation your agent re-sends the entire history every single time. Step 1 sends 1,000 tokens. Step 2 sends the original 1,000 plus its own 1,000. Step 3 sends 3,000.

By step 20, you aren't paying for 20,000 input tokens. You're paying for the running sum: 1,000 + 2,000 + 3,000, all the way up.

That sum is 1,000 × N(N+1)/2: the Nth triangular number, scaled by the tokens each step adds. A 20-step loop where each step adds 1,000 tokens of context costs 210,000 input tokens, not 20,000. That's more than ten times what your back-of-envelope estimate said. Push it to 50 steps and you're at ~1.3 million tokens for a task you mentally budgeted at 50,000.
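If you'd rather check the arithmetic than trust it, here is a minimal sketch. It assumes the simplified model used above (every step adds the same number of tokens and the full history is re-sent on every call); the function names are made up for illustration.

```python
from itertools import accumulate


def input_tokens_per_call(steps: int, tokens_per_step: int) -> list[int]:
    """Input tokens sent on each call when the whole history is re-sent."""
    return [tokens_per_step * k for k in range(1, steps + 1)]


def cumulative_input_tokens(steps: int, tokens_per_step: int) -> int:
    """Closed form of the running sum: tokens_per_step * N(N+1)/2."""
    return tokens_per_step * steps * (steps + 1) // 2


per_call = input_tokens_per_call(20, 1_000)
running_total = list(accumulate(per_call))

print(per_call[:3])                         # [1000, 2000, 3000]
print(running_total[-1])                    # 210000
print(cumulative_input_tokens(50, 1_000))   # 1275000
```

Real traces are lumpier than this, since tool outputs vary in size, but the running total has the same shape.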

The context each step adds looks flat. The size of each call grows in a straight line. Your cumulative cost is a parabola. And if the loop has no stop condition, it rides that parabola until your credit card taps out.

This isn't theoretical. One engineer on DEV.to described an agent that burned $200 in a single night — the same API call, over and over, for six hours. A separate first-person account on Substack described a four-agent LangChain system that ran up roughly $47,000 over eleven days when two agents got stuck in a conversation loop with each other. I can't independently verify that second number — the company isn't named and the details have drifted as the story got retold — but the shape is consistent with the math. A 50-step loop at $0.01 per 1,000 input tokens doesn't cost $0.50. It costs $12.75. Scale that to real agent traces and real token prices, and five-figure incidents aren't surprising. They're arithmetic.
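The $0.50 versus $12.75 comparison can be reproduced in a few lines. This is a sketch under the same assumptions as the example above: a fixed 1,000 tokens added per step and a flat illustrative price of $0.01 per 1,000 input tokens. `loop_cost` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal


def loop_cost(steps: int, tokens_per_step: int, price_per_1k: str) -> tuple[Decimal, Decimal]:
    """Return (naive_estimate, billed_cost) in dollars for a loop that re-sends its history."""
    price = Decimal(price_per_1k)
    naive_tokens = steps * tokens_per_step                      # what you budgeted
    billed_tokens = tokens_per_step * steps * (steps + 1) // 2  # what you are charged for
    return naive_tokens * price / 1000, billed_tokens * price / 1000


naive, billed = loop_cost(50, 1_000, "0.01")
print(f"naive ${naive:.2f}, billed ${billed:.2f}, multiplier {billed / naive:.1f}x")
# naive $0.50, billed $12.75, multiplier 25.5x
```

The multiplier is always (N+1)/2, so it grows with the length of the loop. That is why long runs hurt far more than the step count suggests.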

Two things that make this invisible

Prompt caching won't save you. Most major providers now offer prompt caching, and it feels like a safety net. It isn't, not for this problem. Caching discounts a prefix that is byte-for-byte identical to an earlier call: reliably the system message, the tool definitions, the few-shot examples. Conversation history qualifies only while it stays an unchanged, append-only prefix, and agents that summarize, truncate, or reorder their context break that on every call. Even on a cache hit, the re-sent tokens are billed at a reduced rate rather than dropped, so the curve keeps its shape with a smaller multiplier. The fast-growing, expensive part of your bill is the part caching helps least.

Run the numbers on your last week of agent calls. Calculate what percentage of your total input tokens came from the stable prefix (reliably cacheable) versus the conversation history (cacheable only when it arrives unchanged). If the conversation history dominates, and for any multi-step agent it will, caching is shaving the small end while the big end compounds unchecked.
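That percentage takes a few lines to compute, assuming each logged call records how many input tokens came from the fixed prompt and how many from the conversation history. The field names `prefix_tokens` and `history_tokens` are hypothetical; map them to whatever your logging captures.

```python
def history_share(calls: list[dict]) -> float:
    """Fraction of all input tokens that came from conversation history."""
    prefix = sum(c["prefix_tokens"] for c in calls)
    history = sum(c["history_tokens"] for c in calls)
    total = prefix + history
    if total == 0:
        raise ValueError("no input tokens recorded")
    return history / total


# Illustrative data: a 20-step run with a 1,500-token fixed prompt
# and a history that grows by 1,000 tokens per step.
calls = [
    {"prefix_tokens": 1_500, "history_tokens": 1_000 * step}
    for step in range(1, 21)
]

share = history_share(calls)
print(f"{share:.1%} of input tokens were conversation history")  # 87.5%
```

In this made-up run the fixed prompt is only 12.5% of the input, which is the most a prefix-only discount could ever touch.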

Your monthly average is lying. Agent cost distributions are bimodal. Most runs are cheap: a few tool calls, done in 30 seconds, for a fraction of a cent. But a few runs are catastrophic: a stuck loop, a retry storm, an agent that explores 40 paths when it needed 4. Average the cheap runs and the catastrophic ones together and the number looks fine. It always looks fine right up until the invoice arrives.

The signal lives in the tail. A monthly average can't tell you that three runs last Tuesday cost more than the other 10,000 combined. A per-run alert set against your p95 or p99 cost-per-run can, and it catches the runaway while it's a $4 problem instead of a $4,000 one.
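One way to turn a tail percentile into an alert, sketched with made-up numbers: take the p95 of recent per-run costs as the baseline and flag any run that costs more than a chosen multiple of it. The 10× multiplier and the sample costs are assumptions for illustration, not recommended defaults.

```python
import math
from statistics import mean


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def runaway_runs(costs: list[float], pct: int = 95, multiple: float = 10.0) -> list[float]:
    """Costs that exceed `multiple` times the pct-th percentile baseline."""
    threshold = multiple * percentile(costs, pct)
    return [c for c in costs if c > threshold]


# 97 cheap runs and 3 runaways, in dollars.
run_costs = [0.004] * 97 + [4.0, 6.0, 8.0]

print(f"mean: ${mean(run_costs):.4f}")          # mean: $0.1839
print(f"p95:  ${percentile(run_costs, 95):.4f}")  # p95:  $0.0040
print(runaway_runs(run_costs))                  # [4.0, 6.0, 8.0]
```

The mean lands at a value no run actually cost. The percentile baseline describes a normal run, which is what makes the three outliers stand out.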

Find out if you have this problem tonight

Before you build anything, two checks. Five minutes.

Check 1: Sort last week's runs by cost, descending, and look at the top 1%. Pull your API usage by run or session ID. Sort by total tokens per run, highest first. If your most expensive run is 50–100× your median run, you have a tail problem. That tail is where loops hide — runs that didn't crash, didn't error, just quietly compounded until they stopped.

What you're looking for: a small number of runs that account for a disproportionate share of your total spend. Three runs out of 10,000 consuming 40% of the bill. That's the shape.
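Check 1 fits in a short script once you have total tokens per run. The sketch below assumes a plain mapping from run ID to total tokens; how you export that from your provider or your own logs is up to you, and the run IDs and numbers here are invented.

```python
from statistics import median


def tail_report(tokens_by_run: dict[str, int], top_percent: int = 1) -> dict[str, float]:
    """Summarize how concentrated token spend is in the most expensive runs."""
    totals = sorted(tokens_by_run.values(), reverse=True)
    top_n = max(1, len(totals) * top_percent // 100)
    return {
        "max_over_median": totals[0] / median(totals),
        "top_share": sum(totals[:top_n]) / sum(totals),
    }


# Illustrative data: 199 ordinary runs and one that quietly compounded.
tokens_by_run = {f"run-{i}": 2_000 for i in range(199)}
tokens_by_run["run-stuck"] = 200_000

report = tail_report(tokens_by_run)
print(f"most expensive run is {report['max_over_median']:.0f}x the median")
print(f"top 1% of runs used {report['top_share']:.0%} of all tokens")
```

A `max_over_median` in the 50 to 100 range, or a top 1% holding a large share of the total, is the tail described above.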

Check 2: Find your longest tool-call chain. Group your logs by run ID and count tool calls per run. Your typical agent probably finishes in 5–15 tool calls. Anything an order of magnitude past that — 80, 150, 300 calls in a single run — is a loop that happened to stop before the bill got scary. It's a near-miss, and next time it won't stop.
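Check 2 is a group-and-count. This sketch assumes your logs can be read as a list of events, each with a `run_id` and a `type` field where tool calls are marked `"tool_call"`. Those names are hypothetical, and the 10× factor simply encodes "an order of magnitude past typical".

```python
from collections import Counter
from statistics import median


def long_chains(events: list[dict], factor: int = 10) -> dict[str, int]:
    """Runs whose tool-call count is at least `factor` times the median run."""
    calls = Counter(e["run_id"] for e in events if e["type"] == "tool_call")
    typical = median(calls.values())
    return {run: n for run, n in calls.items() if n >= factor * typical}


def fake_run(run_id: str, tool_calls: int) -> list[dict]:
    """Build illustrative log events: one model message plus N tool calls."""
    events = [{"run_id": run_id, "type": "message"}]
    events += [{"run_id": run_id, "type": "tool_call"} for _ in range(tool_calls)]
    return events


events = fake_run("a", 8) + fake_run("b", 10) + fake_run("c", 12) + fake_run("d", 150)
print(long_chains(events))  # {'d': 150}
```

Anything this returns is worth reading end to end, even if the run finished without an error.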

If both come back clean, you're in good shape — build the prevention layer at your own pace. If they don't, you just found money on the floor. And if you found a 50× tail in Check 1, you need the circuit breaker from last week's issue right now — not eventually. That piece has the three-state pattern, the code, and the defaults. Start there.

The parabola doesn't care about your estimate

Every team that runs agents in production eventually hits this. The question is whether you find it in a five-minute log query or a five-figure invoice.

The curve is structural. The same re-billed context that makes an agent feel coherent is the thing that makes it expensive. Caching can't fix it. Averages hide it. The only things that surface it are per-run cost tracking, tail-percentile alerts, and a hard ceiling that kills the run when the tab gets too high.
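The hard ceiling is the simplest of the three to sketch. What follows is not the three-state circuit breaker from last week's issue, just a bare per-run token budget under assumed names (`RunBudget`, `BudgetExceeded`) and an arbitrary 100,000-token limit. The loop simulates an agent whose history grows by 1,000 tokens per step.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run's cumulative input tokens pass its ceiling."""


class RunBudget:
    def __init__(self, max_input_tokens: int) -> None:
        self.max_input_tokens = max_input_tokens
        self.spent = 0

    def charge(self, input_tokens: int) -> None:
        """Record the tokens billed for one call and stop the run if over budget."""
        self.spent += input_tokens
        if self.spent > self.max_input_tokens:
            raise BudgetExceeded(
                f"spent {self.spent} of {self.max_input_tokens} input tokens"
            )


budget = RunBudget(max_input_tokens=100_000)
history_tokens = 0
stopped_at = None

for step in range(1, 1_000):        # stand-in for a loop with no stop condition
    history_tokens += 1_000         # each step adds context
    try:
        budget.charge(history_tokens)  # the whole history is billed again
    except BudgetExceeded:
        stopped_at = step
        break

print(stopped_at, budget.spent)  # 14 105000
```

Because this version charges after each call, it overshoots the ceiling by one call. A stricter variant would check the projected total before sending.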

Measure the curve or pay the curve. There is no third option.

👋 That’s All Folks!

Before you go, just a few public service announcements:

  • Do you have a topic in mind you'd like us to cover? DM me.

  • Looking to sponsor AIEdTalks’ Newsletter? DM me, and we’ll get back to you asap.

See you soon,

AIEdTalks’ Newsletter Team
