Give Your LLM Time to Think: Progressive Mode in term-llm

Most LLM interactions follow the same pattern: ask a question, get an answer in seconds. The model does one pass — searches the web, reads a few pages, synthesises a response — and you're done. The result is usually competent. It's also usually a first draft pretending to be a final answer.

term-llm has a feature called progressive mode that changes this. Instead of answering once and stopping, the agent gets a time budget — say, 5 or 10 minutes — and keeps working. It drafts, critiques its own work, searches for more sources, verifies claims, corrects errors, and only finalises when the budget runs out. The model explicitly knows how much time it has and can choose how to spend it.

I wanted to know: does this actually produce better results? Or does the model just pad its answer with filler?

The Experiment

Same question, same model, three configurations. The question is deliberately one where surface-level answers are easy but thorough answers require cross-referencing multiple sources:

What are the top 10 audible sci fi audiobooks released in the past year? Search the web for current listings.

The three runs:

Run	Command	Time budget
Vanilla	`term-llm ask "..." --agent jarvis`	None (finishes when done)
Progressive 5m	`term-llm ask "..." --agent jarvis --progressive --timeout 5m`	5 minutes + finalization grace
Progressive 10m	`term-llm ask "..." --agent jarvis --progressive --timeout 10m`	10 minutes + finalization grace

Model: Claude Opus (via claude-bin provider). All three runs used the same agent configuration with web search enabled. No cherry-picking — these are the actual outputs from one run of each.

Vanilla: The First Draft

The vanilla run took about 90 seconds. It searched the web, found Audible's "Best of 2025" editorial list, read the page, and presented all 10 titles with genre annotations:

Full vanilla output (3.3 KB)

Straight from Audible's own "Best of 2025" list, here are their top 10 sci-fi & fantasy audiobooks of the past year. Fair warning: it's a mixed bag — some are hard sci-fi, others lean fantasy or LitRPG. I've flagged the genre for each:

1. The Incandescent – Emily Tesh (dark fantasy / magical school)
2. There Is No Antimemetics Division – qntm (sci-fi / horror)
3. The Fourth Consort – Edward Ashton (space opera / sci-fi comedy)
4. What We Can Know – Ian McEwan (speculative / cli-fi)
5. This Inevitable Ruin – Matt Dinniman (LitRPG / sci-fi)
6. 1% Lifesteal – Robert Blaise (progression fantasy / LitRPG)
7. The Prophecy, Season 2 – Kerry Washington (supernatural thriller)
8. The River Has Roots – Amal El-Mohtar (fairy tale / literary fantasy)
9. Bury Our Bones in the Midnight Soil – V.E. Schwab (gothic / vampire)
10. Greenteeth – Molly O’Neill (cozy fantasy)

The two strongest pure sci-fi picks: There Is No Antimemetics Division and The Fourth Consort. If you only want hard SF and skip the fantasy/LitRPG, start there.

This is a perfectly fine answer. The genre flags are helpful. The recommendations at the bottom are sensible. If you asked a friend who'd spent 90 seconds researching, you'd be satisfied.

But it has blind spots it doesn't know about:

It only consulted one source (Audible's editorial list from November 2025)
Half the titles are fantasy, not sci-fi — the list is "Sci-Fi & Fantasy" combined, and the model didn't question that
It missed the 2025 Hugo Award winner (The Tainted Cup) entirely
It missed the 2025 Nebula Award winner (The Dragonfly Gambit) entirely
No narrator details, no runtimes, no cross-referencing against critics' lists
Nothing released after November 2025

The model didn't lie. It just stopped too early.

Progressive 5 Minutes: The Self-Critique

The 5-minute run is where things get interesting. After producing its initial draft (similar to vanilla), the model did something the vanilla run never does: it criticised its own output and kept going.

Here's the exact moment it pivots, visible in the raw output:

The current answer is solid but has gaps: Audible's list is from
November 2025 (misses anything since), it's mixed sci-fi/fantasy
rather than sci-fi-focused, and I haven't checked current bestseller
charts or early 2026 releases. Let me fix that.

It then searched for the AudioFile critics' list, the 2025 Hugo winner, the Nebula Awards, and the current Audible live chart. It found The Tainted Cup (Hugo winner), Operation Bounce House (Feb 2026, currently charting), and The Dragonfly Gambit (Nebula winner). It also did something subtle: it started verifying its own claims.

Risky claims to verify: The Tainted Cup's actual release date
(2024 or 2025?), whether Shroud by Tchaikovsky has a firm Audible
listing and release date, whether The Dragonfly Gambit is available
as an audiobook...

The final output includes 10 entries with sources, award data, and honest caveats about where the Audible editorial list falls short. Here's entry #1 — which vanilla missed entirely:

5-minute output excerpt — The Tainted Cup (entry #1)

1. The Tainted Cup — Robert Jackson Bennett
Detective sci-fi/fantasy | Hugo Award Winner 2025 · World Fantasy Award Winner 2025 | Narrated by Andrew Fallaize

A Holmes-and-Watson detective duo investigate an imperial officer who died when a tree erupted from his body. Set in an empire built on leviathan blood and biological contagion — strange, layered worldbuilding wrapped in a fiendishly clever murder mystery. Ana Dolabra wears a blindfold and never leaves home. Her assistant Din has a magically perfect memory. T.J. Klune called it “a fantasy version of Knives Out.” NYT, NPR, and Elle all named it a best book of the year. It’s the year’s most award-laden genre novel and inexplicably absent from Audible’s own November list.

Confirmed on Audible — also recommended: the just-released sequel A Trade of Blood (2025).

Progressive 10 Minutes: The Full Audit

The 10-minute run does everything the 5-minute run does, then keeps going. It consulted more sources (Penguin Random House audiobook lists, Book Riot new releases, GoodEReader), fact-checked individual Audible product pages, and verified narrator credits and runtimes.

But the most interesting difference is structural. The 10-minute run produced:

Full metadata cards for every entry — narrator, runtime, genre, sourcing
UK vs US comparison — the UK and US Audible "Best of 2025" lists share 7 of 10 titles, with 3 swaps. Vanilla didn't even know there were two lists.
A "pure sci-fi" filter — extracted the 3 titles that are unambiguously science fiction from the mixed genre list
Post-list coverage — flagged notable releases from Dec 2025 through March 2026 that postdate the editorial list
Explicit corrections — caught and fixed things from its own earlier drafts

10-minute output excerpt — entry cards with metadata

3. There Is No Antimemetics Division — qntm
Narrator: Rebecca Calder · Runtime: 8h 47m · Genre: Sci-Fi Horror

Expanded from the SCP Foundation universe. Entities that destroy your memory of their existence — how do you fight something you can’t remember exists? Non-linear, redacted-document structure. Raises questions about deepfakes and fascism without being heavy-handed. The genuinely weird pick of the list.

4. The Fourth Consort — Edward Ashton
Narrator: Barrie Kreinik · Runtime: 8h 25m · Genre: Sci-Fi (Space Comedy)

Humans as a client civilization in a galactic power structure. A human diplomat and a stick-man alien stranded on an insect-politics planet. Mickey7 author doing highbrow sci-fi meets Simon Pegg buddy comedy.

10-minute output — UK vs US analysis (vanilla had no idea this existed)

UK vs. US: The UK "Best of 2025" shares 7 of these 10 titles. The UK-only swaps include:
- Flybot by Dennis E. Taylor (standalone AI thriller, narrated by Ray Porter — not part of Bobiverse but same author/narrator energy)
- The Devils by Joe Abercrombie
- Harry Potter Full-Cast Production

The 3 US-only titles that get swapped out: The Fourth Consort, 1% Lifesteal, and The Prophecy Season 2.

10-minute output — self-correction in action

A few things worth verifying: the list was published Nov 17, 2025 — it misses ~4 months of releases up to today (March 2026). Also, half the list is fantasy, not strict sci-fi. Let me check for December 2025–2026 notable releases and cross-reference against other major lists. [searches Penguin Random House, Book Riot, GoodEReader]

Earlier claim “Flybot is in the Bobiverse” — confirmed correct in spirit but imprecise. Flybot is a standalone novel, not in the Bobiverse series. The Audible listing calls Taylor “the number one best-selling author of the Bobiverse series” and reviewers note “echoes of the Bobiverse” — but it’s its own thing.

The Lists Side by Side

#	Vanilla	Progressive 5m	Progressive 10m
1	The Incandescent (fantasy)	The Tainted Cup ★ (Hugo winner)	The Incandescent (fantasy)
2	Antimemetics Division	Shroud ★ (hard sci-fi)	The River Has Roots (fantasy)
3	The Fourth Consort	Antimemetics Division	Antimemetics Division
4	What We Can Know	The Fourth Consort	The Fourth Consort
5	This Inevitable Ruin	This Inevitable Ruin	This Inevitable Ruin
6	1% Lifesteal (LitRPG)	Operation Bounce House ★ (Feb 2026)	The Prophecy S2
7	The Prophecy S2	What We Can Know	What We Can Know
8	The River Has Roots (fantasy)	Katabasis ★ (AudioFile pick)	1% Lifesteal (LitRPG)
9	Bury Our Bones (vampire fantasy)	The Dragonfly Gambit ★ (Nebula winner)	Bury Our Bones (vampire fantasy)
10	Greenteeth (cozy fantasy)	The Incandescent (fantasy)	Greenteeth (cozy fantasy)

★ = unique to that run. The vanilla and 10-minute runs contain the same ten titles from Audible's editorial list, in a different order, but the 10-minute run added extensive context around them (UK vs US analysis, pure sci-fi filter, post-list releases, narrator/runtime metadata for every entry). The 5-minute run diverged most aggressively: it replaced 5 of the 10 titles with picks from the Hugo and Nebula awards, AudioFile's critics' list, and the current Audible chart, surfacing the year's most acclaimed sci-fi that Audible's own editorial list missed.
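The ★ markers and the "5 of the 10" count can be checked mechanically by treating each run's list as a set. A minimal Python sketch: the shortened titles are transcribed from the table above, and the `compare` helper is illustrative, not part of term-llm.

```python
# Rankings transcribed from the side-by-side table (shortened titles).
vanilla = [
    "The Incandescent", "Antimemetics Division", "The Fourth Consort",
    "What We Can Know", "This Inevitable Ruin", "1% Lifesteal",
    "The Prophecy S2", "The River Has Roots", "Bury Our Bones", "Greenteeth",
]
progressive_5m = [
    "The Tainted Cup", "Shroud", "Antimemetics Division", "The Fourth Consort",
    "This Inevitable Ruin", "Operation Bounce House", "What We Can Know",
    "Katabasis", "The Dragonfly Gambit", "The Incandescent",
]
progressive_10m = [
    "The Incandescent", "The River Has Roots", "Antimemetics Division",
    "The Fourth Consort", "This Inevitable Ruin", "The Prophecy S2",
    "What We Can Know", "1% Lifesteal", "Bury Our Bones", "Greenteeth",
]


def compare(baseline: list[str], other: list[str]) -> dict[str, set[str]]:
    """Set-level diff of two rankings; ordering is deliberately ignored."""
    a, b = set(baseline), set(other)
    return {"shared": a & b, "dropped": a - b, "added": b - a}


diff_5m = compare(vanilla, progressive_5m)
diff_10m = compare(vanilla, progressive_10m)

print("5m added:", sorted(diff_5m["added"]))
print("5m dropped:", sorted(diff_5m["dropped"]))
print("10m added:", sorted(diff_10m["added"]))
```

Ordering is ignored on purpose: the vanilla and 10-minute runs rank the titles differently but contain the same ten, which is why the 10-minute diff comes back empty.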

The Numbers

	Vanilla	Progressive 5m	Progressive 10m
Output size	3.3 KB	15 KB	24 KB
Sources consulted	2	8	14
Wall time	~90 sec	~6 min	~8 min
Hugo/Nebula winners found	No	Yes	Yes
Self-corrections	0	3	5+
UK vs US comparison	No	No	Yes
Narrator/runtime metadata	No	Partial	Full
Post-Nov 2025 releases	No	Yes	Yes
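Expressed relative to the vanilla run, the table works out as follows. This Python sketch takes the approximate wall times at face value (90 s, 6 min, 8 min) and says nothing about token usage.

```python
# Figures transcribed from the table above. Wall times are the post's
# approximations (~90 s, ~6 min, ~8 min), so the ratios are rough.
runs = {
    "vanilla": {"output_kb": 3.3, "sources": 2, "wall_s": 90},
    "progressive_5m": {"output_kb": 15, "sources": 8, "wall_s": 6 * 60},
    "progressive_10m": {"output_kb": 24, "sources": 14, "wall_s": 8 * 60},
}


def ratios_vs(baseline: str, data: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Divide every metric of every run by the same metric of the baseline run."""
    base = data[baseline]
    return {
        name: {metric: round(value / base[metric], 1) for metric, value in row.items()}
        for name, row in data.items()
    }


for name, r in ratios_vs("vanilla", runs).items():
    print(f"{name:<16} output x{r['output_kb']}  sources x{r['sources']}  wall time x{r['wall_s']}")
```

By these figures the 10-minute run produced roughly 7x the output from 7x the sources, in a little over 5x the wall time.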

The vanilla answer is not wrong — it's just the first thing the model found presented as the final thing. Progressive mode changes the incentive: the model knows it has time remaining and uses it to question its own work.

How It Works

Progressive mode gives the agent two synthetic tools: update_progress and finalize_progress. As the agent works, it checkpoints its best-so-far state with update_progress. When the time budget expires, a finalization pass runs: the agent reviews its accumulated state, writes a polished prose response, and calls finalize_progress to commit it.
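The checkpoint-then-finalize pattern can be sketched in a few lines of Python. This is a simulation under stated assumptions, not term-llm's implementation: the two methods only mirror the roles of the update_progress and finalize_progress tools, each agent "step" is reduced to a string, and `polish` is a hypothetical stand-in for the model's finalization pass. The clock is injected so the behaviour can be tested without waiting.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class ProgressState:
    """Hypothetical stand-in for the state the agent accumulates."""
    checkpoints: list[str] = field(default_factory=list)
    final: Optional[str] = None

    def update_progress(self, draft: str) -> None:
        # Plays the role of update_progress: keep the best-so-far state so a
        # timeout never leaves the finalization pass with nothing to work from.
        self.checkpoints.append(draft)

    def finalize_progress(self, answer: str) -> None:
        # Plays the role of finalize_progress: commit the answer exactly once.
        if self.final is not None:
            raise RuntimeError("already finalized")
        self.final = answer


def run_progressive(
    steps: Iterable[str],
    budget_s: float,
    clock: Callable[[], float],
    polish: Callable[[list[str]], str],
) -> ProgressState:
    state = ProgressState()
    deadline = clock() + budget_s
    for draft in steps:
        if clock() >= deadline:
            break  # time budget spent: stop doing new work
        state.update_progress(draft)  # checkpoint after each unit of work
    # Finalization pass: review everything checkpointed and commit one answer.
    state.finalize_progress(polish(state.checkpoints))
    return state
```

In real use the clock would be `time.monotonic`. The value of checkpointing shows up when the deadline hits mid-task: the finalization pass still has the most recent best-so-far state to work from.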

The key insight is that the model sees the time budget in its prompt. A 10-minute budget doesn't just mean "run longer" — it means "you have time to consult additional sources, verify claims, and restructure." The model adjusts its strategy based on the budget it has.
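What "sees the time budget in its prompt" might look like concretely: a hypothetical helper that renders elapsed and remaining time into a line a harness could include in the model's context on each turn. The function name, wording, and format are invented for illustration; the post does not show term-llm's actual prompt text.

```python
def budget_note(elapsed_s: float, budget_s: float) -> str:
    """Hypothetical time-budget line for the model's context (wording invented)."""
    remaining = max(0, int(budget_s - elapsed_s))  # never report negative time
    rem_m, rem_s = divmod(remaining, 60)
    # Percentage of the budget already used, capped at 100 once time is up.
    used_pct = min(100, round(100 * elapsed_s / budget_s)) if budget_s > 0 else 100
    return (f"Time budget: {int(budget_s) // 60}m. "
            f"Elapsed: {used_pct}%. Remaining: {rem_m}m {rem_s:02d}s.")


print(budget_note(60, 600))
print(budget_note(570, 600))
```

A model that reads "Remaining: 9m 00s" on one turn and "Remaining: 0m 30s" on a later one has what it needs to decide between opening another source and consolidating what it already has.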

If the model decides it is done before the timeout, what happens next depends on the stop_when setting. With stop_when set to "timeout", it keeps going: searching for more sources, verifying entries, correcting errors. With "done", it stops there. Passing --timeout makes stop_when default to "timeout", so the model uses the full budget.
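The stop rule described above fits in two small functions. A Python sketch under two assumptions: how stop_when is set explicitly is left abstract (the post names the setting but not its syntax), and the default without --timeout is assumed to be "done", which the post does not state. Both function names are hypothetical.

```python
from typing import Literal, Optional

StopWhen = Literal["timeout", "done"]


def resolve_stop_when(explicit: Optional[StopWhen], timeout_given: bool) -> StopWhen:
    """An explicit setting wins; otherwise --timeout implies "timeout"."""
    if explicit is not None:
        return explicit
    # Assumption: without --timeout the run simply ends when the model is done.
    return "timeout" if timeout_given else "done"


def should_continue(model_says_done: bool, stop_when: StopWhen, time_left_s: float) -> bool:
    """Decide whether the agent gets another turn of work."""
    if time_left_s <= 0:
        return False  # budget exhausted: hand over to the finalization pass
    if not model_says_done:
        return True  # the model still has work in progress
    # Finished early: keep researching only when told to use the full budget.
    return stop_when == "timeout"
```

The budget check comes first on purpose: even in "timeout" mode, an exhausted budget hands control to finalization rather than letting the model keep searching.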

How to Use It

Install term-llm, configure a provider and agent, then:

```bash
# Vanilla: fast, single-pass
term-llm ask "your research question" --agent my-agent

# Progressive: 5 minute budget
term-llm ask "your research question" --agent my-agent --progressive --timeout 5m

# Progressive: 10 minute budget
term-llm ask "your research question" --agent my-agent --progressive --timeout 10m
```
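To rerun this post's comparison against your own question, a small Python harness can drive the three configurations in sequence. It uses only the flags shown above. It assumes term-llm prints the final answer to stdout when its output is captured, which this post does not confirm, and `compare_runs` and the other names are hypothetical.

```python
import subprocess
import time
from typing import Optional

# Timeout per configuration; None means a vanilla, single-pass run.
CONFIGS: dict[str, Optional[str]] = {
    "vanilla": None,
    "progressive_5m": "5m",
    "progressive_10m": "10m",
}


def ask_command(question: str, agent: str, timeout: Optional[str] = None) -> list[str]:
    """Build argv for one run, using only the flags shown in this post."""
    cmd = ["term-llm", "ask", question, "--agent", agent]
    if timeout is not None:
        cmd += ["--progressive", "--timeout", timeout]
    return cmd


def run_and_time(cmd: list[str]) -> tuple[float, str]:
    """Run one configuration; return wall time in seconds and captured stdout."""
    start = time.monotonic()
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return time.monotonic() - start, result.stdout


def compare_runs(question: str, agent: str) -> dict[str, tuple[float, int]]:
    """Wall time and output size in bytes for each configuration."""
    results = {}
    for name, timeout in CONFIGS.items():
        elapsed, out = run_and_time(ask_command(question, agent, timeout))
        results[name] = (elapsed, len(out.encode("utf-8")))
    return results
```

Calling `compare_runs("your research question", "my-agent")` runs the configurations one after another, so the total time is roughly the sum of the three runs' wall times.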

Progressive mode also works with scheduled jobs. Set progressive: true and timeout_seconds in the job's runner config, and the job runner handles the time budget, finalization, and result storage.

When to Use It

Progressive mode isn't for everything. Quick factual questions don't benefit; the model will just pad. It shines on research tasks where the quality difference comes from consulting more sources, cross-referencing claims, and catching its own errors:

Literature reviews and "best of" lists
Competitive analysis and market research
Technical comparisons that require reading multiple docs
Any question where the first Google result isn't sufficient

The tradeoff is cost. A 10-minute progressive run uses more tokens than a vanilla run — roughly 3–5x in this experiment. Whether that's worth it depends on whether you're optimising for speed or for getting it right the first time.

The Takeaway

The vanilla answer reads better than it deserves to. It's confident, well-structured, and wrong by omission. The progressive answers are messier in their working — you can see the self-doubt, the corrections, the "let me check that" moments — but they converge on something significantly more trustworthy.

The interesting thing isn't that more time produces more output. It's that more time produces self-correction. The model catches its own mistakes when it has the budget to look back at what it wrote. That's the real value of progressive mode: not "think longer" but "think, then check your work."