Most LLM interactions follow the same pattern: ask a question, get an answer in seconds. The model does one pass — searches the web, reads a few pages, synthesises a response — and you're done. The result is usually competent. It's also usually a first draft pretending to be a final answer.
term-llm has a feature called progressive mode that changes this. Instead of answering once and stopping, the agent gets a time budget — say, 5 or 10 minutes — and keeps working. It drafts, critiques its own work, searches for more sources, verifies claims, corrects errors, and only finalises when the budget runs out. The model explicitly knows how much time it has and can choose how to spend it.
I wanted to know: does this actually produce better results? Or does the model just pad its answer with filler?
The Experiment
Same question, same model, three configurations. The question is deliberately one where surface-level answers are easy but thorough answers require cross-referencing multiple sources:
What are the top 10 audible sci fi audiobooks released in the past year? Search the web for current listings.
The three runs:
| Run | Command | Time budget |
|---|---|---|
| Vanilla | term-llm ask "..." --agent jarvis | None (finishes when done) |
| Progressive 5m | term-llm ask "..." --agent jarvis --progressive --timeout 5m | 5 minutes + finalization grace |
| Progressive 10m | term-llm ask "..." --agent jarvis --progressive --timeout 10m | 10 minutes + finalization grace |
Model: Claude Opus (via claude-bin provider). All three runs used the same agent configuration with web search enabled. No cherry-picking — these are the actual outputs from one run of each.
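If you want to rerun the comparison yourself, a small harness along these lines would do it. Treat it as a sketch rather than the script used for this post: the flags (`--agent`, `--progressive`, `--timeout`) are the ones from the table above, but `build_command`, `run_once`, and the idea that the answer arrives on stdout are my assumptions. It defaults to a dry run, so it only prints the commands unless you flip `dry_run` to `False`.

```python
import shlex
import subprocess
import time
from typing import List, Optional, Tuple

QUESTION = (
    "What are the top 10 audible sci fi audiobooks released in the past year? "
    "Search the web for current listings."
)


def build_command(question: str, agent: str, timeout: Optional[str] = None) -> List[str]:
    """Build the argv for one run; passing a timeout switches on progressive mode."""
    cmd = ["term-llm", "ask", question, "--agent", agent]
    if timeout is not None:
        cmd += ["--progressive", "--timeout", timeout]
    return cmd


def run_once(label: str, cmd: List[str], dry_run: bool = True) -> Optional[Tuple[float, int]]:
    """Run one configuration and report wall time (seconds) and output size (bytes)."""
    if dry_run:
        print(f"[{label}] would run: {shlex.join(cmd)}")
        return None
    start = time.monotonic()
    # Assumption: the final answer is written to stdout.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    elapsed = time.monotonic() - start
    size = len(result.stdout.encode("utf-8"))
    print(f"[{label}] {elapsed:.0f}s, {size / 1024:.1f} KB")
    return elapsed, size


# Hypothetical labels; None means "no time budget" (the vanilla run).
RUNS = {"vanilla": None, "progressive-5m": "5m", "progressive-10m": "10m"}
for label, timeout in RUNS.items():
    run_once(label, build_command(QUESTION, "jarvis", timeout), dry_run=True)
```

Passing the command as a list rather than a shell string avoids quoting problems with the question text.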
Vanilla: The First Draft
The vanilla run took about 90 seconds. It searched the web, found Audible's "Best of 2025" editorial list, read the page, and presented all 10 titles with genre annotations:
Full vanilla output (3.3 KB)
Straight from Audible's own "Best of 2025" list, here are their top 10
sci-fi & fantasy audiobooks of the past year. Fair warning: it's a mixed
bag — some are hard sci-fi, others lean fantasy or LitRPG. I've flagged
the genre for each:
- The Incandescent – Emily Tesh (dark fantasy / magical school)
- There Is No Antimemetics Division – qntm (sci-fi / horror)
- The Fourth Consort – Edward Ashton (space opera / sci-fi comedy)
- What We Can Know – Ian McEwan (speculative / cli-fi)
- This Inevitable Ruin – Matt Dinniman (LitRPG / sci-fi)
- 1% Lifesteal – Robert Blaise (progression fantasy / LitRPG)
- The Prophecy, Season 2 – Kerry Washington (supernatural thriller)
- The River Has Roots – Amal El-Mohtar (fairy tale / literary fantasy)
- Bury Our Bones in the Midnight Soil – V.E. Schwab (gothic / vampire)
- Greenteeth – Molly O’Neill (cozy fantasy)
The two strongest pure sci-fi picks: There Is No Antimemetics Division
and The Fourth Consort. If you only want hard SF and skip the
fantasy/LitRPG, start there.
This is a perfectly fine answer. The genre flags are helpful. The recommendations at the bottom are sensible. If you asked a friend who'd spent 90 seconds researching, you'd be satisfied.
But it has blind spots it doesn't know about:
- It only consulted one source (Audible's editorial list from November 2025)
- Half the titles are fantasy, not sci-fi — the list is "Sci-Fi & Fantasy" combined, and the model didn't question that
- It missed the 2025 Hugo Award winner (The Tainted Cup) entirely
- It missed the 2025 Nebula Award winner (The Dragonfly Gambit) entirely
- No narrator details, no runtimes, no cross-referencing against critics' lists
- Nothing released after November 2025
The model didn't lie. It just stopped too early.
Progressive 5 Minutes: The Self-Critique
The 5-minute run is where things get interesting. After producing its initial draft (similar to vanilla), the model did something the vanilla run never does: it criticised its own output and kept going.
Here's the exact moment it pivots, visible in the raw output:
The current answer is solid but has gaps: Audible's list is from
November 2025 (misses anything since), it's mixed sci-fi/fantasy
rather than sci-fi-focused, and I haven't checked current bestseller
charts or early 2026 releases. Let me fix that.
It then searched for the AudioFile critics' list, the 2025 Hugo winner, the Nebula Awards, and the current Audible live chart. It found The Tainted Cup (Hugo winner), Operation Bounce House (Feb 2026, currently charting), and The Dragonfly Gambit (Nebula winner). It also did something subtle: it started verifying its own claims.
Risky claims to verify: The Tainted Cup's actual release date
(2024 or 2025?), whether Shroud by Tchaikovsky has a firm Audible
listing and release date, whether The Dragonfly Gambit is available
as an audiobook...
The final output includes 10 entries with sources, award data, and honest caveats about where the Audible editorial list falls short. Here's entry #1, which vanilla missed entirely:
5-minute output excerpt — The Tainted Cup (entry #1)
1. The Tainted Cup — Robert Jackson Bennett
Detective sci-fi/fantasy | Hugo Award Winner 2025 · World Fantasy
Award Winner 2025 | Narrated by Andrew Fallaize
A Holmes-and-Watson detective duo investigate an imperial officer who
died when a tree erupted from his body. Set in an empire built on
leviathan blood and biological contagion — strange, layered
worldbuilding wrapped in a fiendishly clever murder mystery. Ana
Dolabra wears a blindfold and never leaves home. Her assistant Din
has a magically perfect memory. T.J. Klune called it “a fantasy
version of Knives Out.” NYT, NPR, and Elle all named it a best book
of the year. It’s the year’s most award-laden genre novel and
inexplicably absent from Audible’s own November list.
Confirmed on Audible — also recommended: the just-released sequel
A Trade of Blood (2025).
Progressive 10 Minutes: The Full Audit
The 10-minute run does everything the 5-minute run does, then keeps going. It consulted more sources (Penguin Random House audiobook lists, Book Riot new releases, Goodreads), fact-checked individual Audible product pages, and verified narrator credits and runtimes.
But the most interesting difference is structural. The 10-minute run produced:
- Full metadata cards for every entry — narrator, runtime, genre, sourcing
- UK vs US comparison — the UK and US Audible "Best of 2025" lists share 7 of 10 titles, with 3 swaps. Vanilla didn't even know there were two lists.
- A "pure sci-fi" filter — extracted the 3 titles that are unambiguously science fiction from the mixed genre list
- Post-list coverage — flagged notable releases from Dec 2025 through March 2026 that postdate the editorial list
- Explicit corrections — caught and fixed things from its own earlier drafts
10-minute output excerpt — entry cards with metadata
### 3. There Is No Antimemetics Division — qntm
Narrator: Rebecca Calder · Runtime: 8h 47m · Genre: Sci-Fi Horror
Expanded from the SCP Foundation universe. Entities that destroy your
memory of their existence — how do you fight something you can’t
remember exists? Non-linear, redacted-document structure. Raises
questions about deepfakes and fascism without being heavy-handed.
The genuinely weird pick of the list.
4. The Fourth Consort — Edward Ashton
Narrator: Barrie Kreinik · Runtime: 8h 25m · Genre: Sci-Fi (Space Comedy)
Humans as a client civilization in a galactic power structure. A
human diplomat and a stick-man alien stranded on an insect-politics
planet. Mickey7 author doing highbrow sci-fi meets Simon Pegg
buddy comedy.
10-minute output — UK vs US analysis (vanilla had no idea this existed)
UK vs. US: The UK "Best of 2025" shares 7 of these 10 titles.
The UK-only swaps include:
- Flybot by Dennis E. Taylor (standalone AI thriller, narrated by
Ray Porter — not part of Bobiverse but same author/narrator energy)
- The Devils by Joe Abercrombie
- Harry Potter Full-Cast Production
The 3 US-only titles that get swapped out: The Fourth Consort,
1% Lifesteal, and The Prophecy Season 2.
10-minute output — self-correction in action
A few things worth verifying: the list was published Nov 17,
2025 — it misses ~4 months of releases up to today (March 2026).
Also, half the list is fantasy, not strict sci-fi. Let me check
for December 2025–2026 notable releases and cross-reference
against other major lists.
[searches Penguin Random House, Book Riot, GoodEReader]
Earlier claim “Flybot is in the Bobiverse” — confirmed correct in
spirit but imprecise. Flybot is a standalone novel, not in the
Bobiverse series. The Audible listing calls Taylor “the number one
best-selling author of the Bobiverse series” and reviewers note
“echoes of the Bobiverse” — but it’s its own thing.
The Lists Side by Side
| # | Vanilla | Progressive 5m | Progressive 10m |
|---|---|---|---|
| 1 | The Incandescent (fantasy) | The Tainted Cup ★ (Hugo winner) | The Incandescent (fantasy) |
| 2 | Antimemetics Division | Shroud ★ (hard sci-fi) | The River Has Roots (fantasy) |
| 3 | The Fourth Consort | Antimemetics Division | Antimemetics Division |
| 4 | What We Can Know | The Fourth Consort | The Fourth Consort |
| 5 | This Inevitable Ruin | This Inevitable Ruin | This Inevitable Ruin |
| 6 | 1% Lifesteal (LitRPG) | Operation Bounce House ★ (Feb 2026) | The Prophecy S2 |
| 7 | The Prophecy S2 | What We Can Know | What We Can Know |
| 8 | The River Has Roots (fantasy) | Katabasis ★ (AudioFile pick) | 1% Lifesteal (LitRPG) |
| 9 | Bury Our Bones (vampire fantasy) | The Dragonfly Gambit ★ (Nebula winner) | Bury Our Bones (vampire fantasy) |
| 10 | Greenteeth (cozy fantasy) | The Incandescent (fantasy) | Greenteeth (cozy fantasy) |
★ = unique to that run.
The vanilla and 10-minute runs both reproduced the same ten titles from Audible's editorial list (in a slightly different order), but the 10-minute run added extensive context around it (UK vs US analysis, pure sci-fi filter, post-list releases, narrator/runtime metadata for every entry). The 5-minute run diverged most aggressively: it replaced 5 of the 10 titles with picks from Hugo/Nebula awards, AudioFile critics' lists, and the current Audible chart, surfacing the year's most acclaimed sci-fi that Audible's own editorial list missed.
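The overlap claims are easy to check mechanically. The snippet below is just the table above transcribed into three Python lists (using the table's shortened titles) and compared with set operations; the variable names are mine.

```python
VANILLA = [
    "The Incandescent", "Antimemetics Division", "The Fourth Consort",
    "What We Can Know", "This Inevitable Ruin", "1% Lifesteal",
    "The Prophecy S2", "The River Has Roots", "Bury Our Bones", "Greenteeth",
]
PROGRESSIVE_5M = [
    "The Tainted Cup", "Shroud", "Antimemetics Division",
    "The Fourth Consort", "This Inevitable Ruin", "Operation Bounce House",
    "What We Can Know", "Katabasis", "The Dragonfly Gambit", "The Incandescent",
]
PROGRESSIVE_10M = [
    "The Incandescent", "The River Has Roots", "Antimemetics Division",
    "The Fourth Consort", "This Inevitable Ruin", "The Prophecy S2",
    "What We Can Know", "1% Lifesteal", "Bury Our Bones", "Greenteeth",
]

# Vanilla and 10-minute lists: same titles, different ranking.
same_titles = set(VANILLA) == set(PROGRESSIVE_10M)
same_order = VANILLA == PROGRESSIVE_10M

# Titles the 5-minute run introduced, and the ones it dropped to make room.
new_in_5m = [t for t in PROGRESSIVE_5M if t not in VANILLA]
dropped_in_5m = [t for t in VANILLA if t not in PROGRESSIVE_5M]

print(same_titles, same_order)  # True False
print(len(new_in_5m), new_in_5m)
print(len(dropped_in_5m), dropped_in_5m)
```

The five titles the 5-minute run dropped are the ones the vanilla run had flagged as fantasy, LitRPG, or supernatural thriller, with The Incandescent the one fantasy pick that survived.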
The Numbers
| | Vanilla | Progressive 5m | Progressive 10m |
|---|---|---|---|
| Output size | 3.3 KB | 15 KB | 24 KB |
| Sources consulted | 2 | 8 | 14 |
| Wall time | ~90 sec | ~6 min | ~8 min |
| Hugo/Nebula winners found | No | Yes | Yes |
| Self-corrections | 0 | 3 | 5+ |
| UK vs US comparison | No | No | Yes |
| Narrator/runtime metadata | No | Partial | Full |
| Post-Nov 2025 releases | No | Yes | Yes |
The vanilla answer is not wrong — it's just the first thing the model found presented as the final thing. Progressive mode changes the incentive: the model knows it has time remaining and uses it to question its own work.
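To put the table in relative terms, here is the same data expressed as multiples of the vanilla run. The figures are copied from the table (wall times are the approximate values shown there, converted to seconds); the function name `ratios_vs_baseline` is mine.

```python
NUMBERS = {
    "Vanilla": {"output_kb": 3.3, "sources": 2, "wall_seconds": 90},
    "Progressive 5m": {"output_kb": 15, "sources": 8, "wall_seconds": 6 * 60},
    "Progressive 10m": {"output_kb": 24, "sources": 14, "wall_seconds": 8 * 60},
}


def ratios_vs_baseline(numbers, baseline="Vanilla"):
    """Express every metric of every other run as a multiple of the baseline run."""
    base = numbers[baseline]
    return {
        run: {metric: round(value / base[metric], 1) for metric, value in metrics.items()}
        for run, metrics in numbers.items()
        if run != baseline
    }


for run, ratios in ratios_vs_baseline(NUMBERS).items():
    print(run, ratios)
# Progressive 5m {'output_kb': 4.5, 'sources': 4.0, 'wall_seconds': 4.0}
# Progressive 10m {'output_kb': 7.3, 'sources': 7.0, 'wall_seconds': 5.3}
```

Sources consulted scale at least as fast as wall time in both progressive runs, which fits the idea that the extra minutes went into research rather than padding. With one run per configuration, though, these ratios are an illustration, not a measurement.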
How It Works
Progressive mode gives the agent two synthetic tools: update_progress and finalize_progress. As the agent works, it checkpoints its best-so-far state with update_progress. When the time budget expires, a finalization pass runs: the agent reviews its accumulated state, writes a polished prose response, and calls finalize_progress to commit it.
The key insight is that the model sees the time budget in its prompt. A 10-minute budget doesn't just mean "run longer" — it means "you have time to consult additional sources, verify claims, and restructure." The model adjusts its strategy based on the budget it has.
If the model decides it's done before the timeout, what happens next depends on stop_when. If it is set to "timeout", the agent keeps going: searching for more sources, verifying entries, correcting errors. If it is set to "done", the run stops there. When you pass --timeout, stop_when defaults to "timeout", so the model uses the full budget.
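The control flow is easier to see as a toy simulation. This is not term-llm's implementation: `run_progressive`, `agent_step`, and `FakeClock` are hypothetical names, the checkpoint list merely stands in for `update_progress`, and picking the last checkpoint stands in for the `finalize_progress` pass. Only the tool names and the two `stop_when` values come from the description above.

```python
class FakeClock:
    """Deterministic clock: each call returns the current time, then advances it."""

    def __init__(self, step_seconds):
        self.now = 0.0
        self.step_seconds = step_seconds

    def __call__(self):
        current = self.now
        self.now += self.step_seconds
        return current


def agent_step(round_no, remaining_seconds):
    """Stand-in for one round of model work; it 'feels done' from round 2 onward."""
    state = f"draft v{round_no} ({int(remaining_seconds)}s left)"
    return state, round_no >= 2


def run_progressive(step, budget_seconds, clock, stop_when="timeout"):
    start = clock()
    checkpoints = []
    while True:
        remaining = budget_seconds - (clock() - start)
        if remaining <= 0:
            break  # budget exhausted: move on to finalization
        # The step sees how much time is left, as the model does in its prompt.
        state, thinks_done = step(len(checkpoints) + 1, remaining)
        checkpoints.append(state)  # stands in for update_progress
        if thinks_done and stop_when == "done":
            break  # stop as soon as the model is satisfied
    final = checkpoints[-1] if checkpoints else ""  # stands in for finalize_progress
    return {"rounds": len(checkpoints), "final": final}


until_timeout = run_progressive(agent_step, 300, FakeClock(60), stop_when="timeout")
until_done = run_progressive(agent_step, 300, FakeClock(60), stop_when="done")
print(until_timeout)  # {'rounds': 4, 'final': 'draft v4 (60s left)'}
print(until_done)     # {'rounds': 2, 'final': 'draft v2 (180s left)'}
```

With the same step function and the same 300-second budget, the only difference between the two runs is `stop_when`: one stops at the first "good enough" draft, the other spends the remaining rounds revising it.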
How to Use It
Install term-llm, configure a provider and agent, then:
# Vanilla — fast, single-pass
term-llm ask "your research question" --agent my-agent
# Progressive — 5 minute budget
term-llm ask "your research question" --agent my-agent --progressive --timeout 5m
# Progressive — 10 minute budget
term-llm ask "your research question" --agent my-agent --progressive --timeout 10m
Progressive mode also works with scheduled jobs. Set progressive: true and timeout_seconds in the job's runner config, and the job runner handles the time budget, finalization, and result storage.
When to Use It
Progressive mode isn't for everything. Quick factual questions don't benefit — the model will just pad. It shines on research tasks where the quality difference comes from consulting more sources, cross-referencing claims, and catching your own errors:
- Literature reviews and "best of" lists
- Competitive analysis and market research
- Technical comparisons that require reading multiple docs
- Any question where the first Google result isn't sufficient
The tradeoff is cost. A 10-minute progressive run uses more tokens than a vanilla run — roughly 3–5x in this experiment. Whether that's worth it depends on whether you're optimising for speed or for getting it right the first time.
The Takeaway
The vanilla answer reads better than it deserves to. It's confident, well-structured, and wrong by omission. The progressive answers are messier in their working — you can see the self-doubt, the corrections, the "let me check that" moments — but they converge on something significantly more trustworthy.
The interesting thing isn't that more time produces more output. It's that more time produces self-correction. The model catches its own mistakes when it has the budget to look back at what it wrote. That's the real value of progressive mode: not "think longer" but "think, then check your work."