Discourse has an automation plugin. It ships with 19 triggers, 22 built-in scripts, and 5 AI-powered scripts from the discourse-ai plugin — plus additional scripts from other plugins like discourse-data-explorer. On paper, that sounds comprehensive. In practice, it's a collection of monoliths that can't talk to each other.
A recent feature request on Meta captures the core frustration well: "I often find one script has something I need, while I need that part to work within the context of another script." The request is to split automations into composable triggers and actions. This article is a concrete proposal for how to do that — and a diagnosis of why the current design makes it necessary.
What Exists Today
The current architecture has three concepts, wired in a way that defeats composability:
- Triggers — event sources like `post_created_edited`, `topic_closed`, `user_badge_granted`
- Scripts — monolithic blocks that bundle filtering logic, action execution, and sometimes multiple side-effects into a single proc
- Automations — the join: exactly one trigger → one script
The Automation model has two string columns: trigger and script. The trigger! method calls the script's proc directly. There's no indirection, no action pipeline, no data-passing between steps.
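That coupling can be sketched in a few lines. This is a toy stand-in modelled on the description above — `Script.register` and the registry are hypothetical, not the plugin's actual code:

```ruby
# Toy sketch of today's design: one trigger, one script, direct call.
class Script
  REGISTRY = {}

  # Each script is a single proc that does everything: filter, act, side-effect.
  def self.register(name, &block)
    REGISTRY[name] = block
  end
end

Script.register("send_pms") do |context|
  "PM sent to #{context[:user]}"
end

class Automation
  attr_reader :trigger, :script

  def initialize(trigger:, script:)
    @trigger = trigger
    @script = script
  end

  # trigger! calls the script's proc directly — no pipeline, no indirection
  def trigger!(context)
    Script::REGISTRY.fetch(script).call(context)
  end
end
```

Because `trigger!` resolves straight to one proc, there is nowhere to slot in a second action or pass data between steps — which is exactly the limitation the rest of this article addresses.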
The kitchen-sink problem
The best illustration is llm_triage from discourse-ai. It's a single script that handles: running an LLM call, searching the response for text, optionally changing the category, optionally adding tags, optionally hiding the topic, optionally flagging the post (with multiple flag types), optionally posting a canned reply, optionally replying via a different AI persona, optionally sending a PM to the post author, and optionally whispering. That's fifteen config fields for one script, because the framework gives it no way to delegate to other actions.
Every script is an island. auto_tag_topic knows how to tag things. send_pms knows how to send PMs. close_topic knows how to close topics. But you can't chain them. If you want "LLM triage → tag → PM the author → close if spam," you either use the kitchen-sink llm_triage script (hoping it has all the knobs you need), or you create multiple automations on the same trigger and hope they don't race.
The trigger-side mess
The trigger side is equally tangled. Here's a simplified view of what handle_post_created_edited does in event_handlers.rb — before it even gets to the script:
```ruby
# Simplified from ~80 lines of actual filtering code
next if selected_action == :created && action != :create
next if restricted_archetype != topic_archetype
next if !restricted_category_ids.include?(topic.category_id)
next if (restricted_tags["value"] & topic.tags.map(&:name)).empty?
next if valid_trust_levels["value"].exclude?(post.user.trust_level)
next if user_group_ids.present? && !post.user.in_any_groups?(user_group_ids)
next if excluded_group_ids.present? && post.user.in_any_groups?(excluded_group_ids)
next if skip_via_email["value"] && post.via_email?
next if first_post_only["value"] && post.user.user_stat.post_count != 1
next if first_topic_only["value"] && post.user.user_stat.topic_count != 1
# ...more
```
And every trigger reinvents this. after_post_cook has its own restricted_category check with different field names. topic_tags_changed has watching_categories instead of restricted_categories. pm_created has restricted_user and ignore_staff. It's the same filtering concepts with different names, different code paths, and different edge-case handling everywhere.
The Proposal: Three Clean Layers
The fix is to separate three concerns that the current design conflates:
- Trigger — the event source. Thin. Just fires with raw context.
- Conditions — generic filters. Reusable across any trigger.
- Actions — composable steps. Chained in a pipeline. Each reads and extends a shared context.
A worked example of the flow:

- The trigger emits `{ post, topic, user, action: :created }`
- An LLM call action reads `context[:post]`, writes `context[:llm_response]`
- A tagging action reads `context[:llm_response]` to pick tags
- A reply action uses `context[:post]` and `context[:llm_response]` as template data

Triggers become thin — but not all are one-liners
Most triggers become trivial — just emit a context hash when something happens. The post_created_edited trigger drops from 80+ lines to roughly this:
```ruby
on(:post_created) do |post|
  { post: post, topic: post.topic, user: post.user, action: :created }
end
```
No category checks. No trust-level checks. No group exclusions. Those are conditions — a separate, shared layer.
But not every trigger is a simple event. Stalled Topic is inherently a periodic query — it scans for topics that haven't received a reply within a time window. The stall duration is the trigger's only intrinsic config, because it's what defines "stalled." Everything else — which category, only open topics, not staff-authored — those are conditions. The same condition definitions that work for post_created work identically here. The same applies to Recurring (needs an interval) and Point in Time (needs a datetime).
The rule is clean: if it's about which event fires, it's trigger config. If it's about filtering which firings matter, it's a condition. The stall duration defines the event. The category filters it.
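Under that rule, a query-based trigger like Stalled Topic carries exactly one piece of config. A sketch with an in-memory topic list — `StalledTopicTrigger` and its `scan` method are illustrative names, not the plugin's API:

```ruby
# Query-based trigger: the stall duration is the only intrinsic config.
# Category, archetype, and author filters would be conditions, not shown here.
StalledTopicTrigger = Struct.new(:stall_days) do
  # Emits one context hash per stalled topic on each scheduled scan.
  def scan(topics, now: Time.now)
    cutoff = now - stall_days * 86_400
    topics
      .select { |t| t[:last_reply_at] < cutoff }
      .map { |t| { topic: t, stalled_since: t[:last_reply_at] } }
  end
end
```

The trigger defines the event ("no reply in N days"); everything downstream decides whether each firing matters.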
Conditions are generic and composable
Instead of each trigger reinventing restricted_categories, watching_categories, and restricted_category (yes, all three exist), there's one shared condition library:
| Condition | Checks | Replaces |
|---|---|---|
| category_is | Topic is in specified categories (with optional subcategory inclusion) | restricted_categories, watching_categories, restricted_category |
| has_tags | Topic has any of the specified tags | restricted_tags, watching_tags |
| trust_level | User's trust level matches threshold | valid_trust_levels |
| user_in_group | User is member of specified groups | restricted_groups |
| user_not_in_group | User is not member of specified groups | excluded_groups |
| is_first_post | Post is the first in the topic | original_post_only |
| is_first_topic | User's first ever topic | first_topic_only |
| archetype_is | Topic archetype matches (regular, PM, etc.) | restricted_archetype |
| not_via_email | Post was not created via email | skip_via_email |
| not_bot | User is not a bot | ignore_automated |
| not_staff | User is not staff | ignore_staff |
These work identically regardless of whether the trigger is post_created, topic_tags_changed, or pm_created. Write once, apply anywhere.
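A minimal sketch of what such a shared library could look like — `Conditions.register` and the predicate signature are assumptions, not the plugin's API:

```ruby
# Shared condition registry: each condition is a predicate over (config, context),
# written once and reusable from any trigger that supplies the right context keys.
module Conditions
  REGISTRY = {}

  def self.register(type, &predicate)
    REGISTRY[type] = predicate
  end

  def self.evaluate(type, config, context)
    REGISTRY.fetch(type).call(config, context)
  end
end

# One definition replaces restricted_categories / watching_categories / restricted_category
Conditions.register(:category_is) do |config, context|
  Array(config[:category_ids]).include?(context[:topic][:category_id])
end

Conditions.register(:not_staff) do |_config, context|
  !context[:user][:staff]
end
```

Any trigger that puts a `topic` and `user` into context gets these checks for free, with one shared implementation and one set of edge cases.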
Actions chain through context
Each action receives the context hash, does its work, and optionally extends it for downstream actions. The framework runs them in sequence:
```ruby
# Simplified pipeline executor
def run_pipeline(trigger_context, conditions, actions)
  context = trigger_context.dup

  # All conditions must pass
  conditions.each do |condition|
    return unless condition.evaluate(context)
  end

  # Actions run in sequence, each can extend context
  actions.sort_by(&:position).each do |action|
    result = action.execute(context)
    break if result[:halt]
    context.merge!(result[:additions] || {})
  end
end
```
An LLM call action writes its response to context. A tagging action reads from context to decide which tags to apply. A conditional action can halt the pipeline. The actions don't need to know about each other — they communicate through the shared context.
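As an illustration, a hypothetical tagging action that consumes the LLM response might look like this — the `{ additions:, halt: }` return shape matches the simplified executor above, and `TagFromLlmAction` is an invented name:

```ruby
# One small action: reads context[:llm_response], decides which tags apply,
# and returns additions for downstream steps. It knows nothing about the
# LLM action that ran before it or the reply action that runs after.
class TagFromLlmAction
  def initialize(available_tags)
    @available_tags = available_tags
  end

  def execute(context)
    response = context[:llm_response].to_s.downcase
    tags = @available_tags.select { |tag| response.include?(tag) }
    { additions: { applied_tags: tags }, halt: false }
  end
end
```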
LLM input: what does the model actually see?
This is the hardest design surface. Today every AI script hardcodes its own input assembly:
- `llm_triage` sends `"title: {title}\n{post.raw}"` — just one post, truncated to a token budget
- `llm_tagger` sends the title plus multiple posts from the topic (up to `max_posts_for_context`), plus the available tags list
- `llm_report` sends the admin's custom instructions wrapped around a structured `<context>` block with topic summaries, post excerpts, stats, and top-user data
In the pipeline model, the persona (system prompt) defines what the LLM does. The LLM Call action's input config defines what data it sees. Three levels of control:
- Auto (default) — the action looks at context and assembles input automatically. Post in context? Send `"title: {title}\n{post.raw}"`. Report data? Wrap it in `<context>` tags. Query result? Include the markdown table. Zero config for simple cases.
- Source selector — choose explicitly: just the post body, post + topic replies (with a limit), report data, query results. Plus toggles for images/uploads, a token budget, and a custom preamble.
- Template mode — write a full input template with placeholders: `"Classify this post:\nTitle: {{topic.title}}\nBody: {{post.raw}}\nAvailable tags: {{available_tags}}"`. Full control for power users. The template can reference any context key.
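At its core, template mode is a placeholder lookup against the context hash. A minimal sketch that ignores escaping and token budgeting — `render_template` is an illustrative helper, not the proposed API:

```ruby
# Resolves {{dotted.path}} placeholders against a nested context hash.
def render_template(template, context)
  template.gsub(/\{\{([\w.]+)\}\}/) do
    path = Regexp.last_match(1).split(".").map(&:to_sym)
    path.reduce(context) { |node, key| node[key] }.to_s
  end
end
```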
The key insight: what data to send is action config, not persona config. A spam classifier persona works the same whether it's reading a single post or a batch of flagged posts — the input config determines the scope. This also means the same persona can be reused across different pipelines with different input shapes.
Try It: Build a Pipeline
The demo below lets you assemble a pipeline from the proposed model. Pick a trigger, add conditions and configure them, then chain actions — each with its own settings. The output panel shows the context hash as it flows through. Start with one of the presets to see fully configured real-world scenarios — spam triage, auto-tagging, stalled topic nudges, weekly AI reports, and Data Explorer pipelines.
Compare: Today vs. Pipeline
Here's a real-world scenario to make the difference concrete. You want: "When a new topic is created in Support, run LLM triage. If it looks like spam, flag it and hide the topic. If it looks like a billing question, tag it and reply with the billing persona."
Today: two kitchen-sink automations
You need two separate llm_triage automations on the same trigger, because each can only do one "branch" of logic. Both run the LLM independently (doubling the cost). Neither can condition on the other's output. And if you also want to tag non-spam billing questions, you might need a third automation running auto_tag_topic with yet another duplicate trigger evaluation.
Proposed: one pipeline, two branches
One automation, one trigger, one LLM call. After the LLM call, a conditional action checks the response. The pipeline forks cleanly:
- The LLM call writes `context[:llm_response]`; a conditional action derives `context[:classification]` from it
- The spam branch flags and hides the topic; the billing branch tags it and replies with the billing persona

One LLM call instead of two. Clear control flow. Each action is small, testable, and reusable in other pipelines.
Reports: data gathering as an action
The AI report use case stress-tests this model differently. Today's llm_report script is another kitchen sink: 25 config fields covering data gathering (time range, categories, tags, excludes, sample size, token budget), LLM summarisation (persona, model, instructions, temperature), and publishing (topic, PM recipients, email addresses, notification suppression). One monolithic script does all three.
In the pipeline model, these decompose cleanly. But there's a subtle distinction: the report's category and tag filtering isn't the same as pipeline conditions. Conditions gate whether the automation fires. The report's scoping determines what data the action gathers. A weekly report always fires — there's no event to filter. The category selection is action-level config: "gather posts from Support and Feature Requests, excluding Staff."
- Gather Report Data writes `context[:report_data]`
- LLM Call reads `context[:report_data]` → writes `context[:llm_response]`
- Create Topic, Send PM, and Send Email each publish `context[:llm_response]`

This decomposition unlocks things the monolith can't do: insert a Continue Only If to suppress empty weeks, gather data from two different scopes and feed both to one LLM, swap the persona without touching the data config, or add a Tag Topic step to the report topic. Try building it in the demo above — pick Recurring, add Gather Report Data, then LLM Call, then any combination of publish actions.
Data Explorer: the purest case for decomposition
The discourse-data-explorer plugin ships two automation scripts: recurring_data_explorer_result_pm and recurring_data_explorer_result_topic. They do the same thing — run a saved SQL query on a cron, format the result as a markdown table — and differ only in where they publish: one sends a PM, the other posts to a topic. Two scripts, because the framework can't compose "run a query" with "decide where to put the output."
In the pipeline model, this is one action — Run Data Explorer Query — that writes context[:query_result]. What happens next is up to the pipeline. Send a PM? Add a Send PM step. Post to a topic? Add a Create Topic step. Do both? Add both. Feed the raw results through an LLM for analysis first? Stick an LLM Call in between. Skip empty result sets? Toggle a checkbox or add a Continue Only If on query_result_count. The "📋 Data Explorer Report" preset in the demo above shows this in action.
This is also where the pipeline model benefits other plugins directly. Any plugin that adds a data-producing action — custom reports, analytics queries, external API calls — gets the full power of conditions, context passing, and flexible publishing for free. The plugin just writes one action that emits to context. The framework handles everything else.
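A sketch of that composition — a stubbed query action plus a generic gate, with illustrative names and the query execution faked as a fixed row set:

```ruby
# A data-producing action: emits the query result (and its count) to context.
RunQueryAction = Struct.new(:rows) do
  def execute(_context)
    { additions: { query_result: rows, query_result_count: rows.size } }
  end
end

# A generic gate: halts the pipeline when the predicate fails.
ContinueOnlyIf = Struct.new(:predicate) do
  def execute(context)
    { halt: !predicate.call(context) }
  end
end
```

Chained in a pipeline, `RunQueryAction` followed by `ContinueOnlyIf.new(->(ctx) { ctx[:query_result_count] > 0 })` skips empty result sets before any publish step runs — no per-script "skip if empty" checkbox required.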
Schema Changes
The database changes are modest. The existing discourse_automation_automations table keeps trigger but drops script. Two new tables:
| Table | Columns | Purpose |
|---|---|---|
| automation_conditions | automation_id, condition_type, config (jsonb), position | Ordered list of filter predicates |
| automation_actions | automation_id, action_type, config (jsonb), position, enabled | Ordered action pipeline |
The existing discourse_automation_fields table can be repurposed — its target column already distinguishes "trigger" from "script" fields. Extend it to support "condition:0", "action:0", etc., or just move to the JSONB config column on the new tables.
Backward Compatibility
Existing automations with script set continue to work as single-action pipelines. The migration path:
- Any automation with a `script` value gets an `automation_actions` row with `action_type = "legacy_script"` and the script name in config
- Trigger filter fields (like `restricted_categories`) get migrated to `automation_conditions` rows
- The legacy script wrapper passes through to the existing `Scriptable` proc — no script code changes required
- New automations use the pipeline UI exclusively
Over time, the monolithic scripts get decomposed into actions. llm_triage becomes llm_call + conditional + individual action steps. But nothing breaks in the meantime.
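The wrapper itself can be tiny. A sketch, with a toy registry standing in for the plugin's `Scriptable` store:

```ruby
# Stand-in for the existing script registry; the real one lives in Scriptable.
LEGACY_SCRIPTS = {
  "close_topic" => ->(context) { context[:topic][:closed] = true },
}

# A pipeline action whose only job is to invoke the old monolithic proc.
class LegacyScriptAction
  def initialize(script_name)
    @script_name = script_name
  end

  def execute(context)
    LEGACY_SCRIPTS.fetch(@script_name).call(context)
    { additions: {} } # legacy scripts don't feed data to downstream steps
  end
end
```

From the executor's point of view, a migrated automation is just a one-action pipeline; the old proc runs exactly as before.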
Testing Pipelines
Testing automations today is essentially: configure it, save, wait for the trigger, check logs, repeat. The only observability is discourse_automation_stats (aggregate run count and timing per day) and the AI report's debug_mode (which dumps raw LLM input into a follow-up PM — but only in development). There's no dry run, no step-level inspection, no way to test conditions against a real post without triggering the entire automation for real.
Pipelines make this worse — more steps, more context passing, more places for things to silently go wrong — but they also make it solvable, because each step has typed inputs and outputs. The proposal needs three testing features to be viable:
1. Dry run with real data
The admin picks a real post, topic, or user from the site and runs the pipeline against it. Every action executes in read-only mode: LLM calls actually fire (they're read-only by nature), but side effects — flagging, tagging, PMing, closing — are captured and reported instead of executed.
The output is a step-by-step trace:
- LLM Call — response: "not spam, billing question about refund"; wrote `context[:llm_classification] = "billing"`
- Pipeline stopped. Actions 06–07 were not reached.
This immediately answers the two questions admins always have: "would this trigger for post X?" and "what would the LLM say?" No side effects, real data, instant feedback.
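One way to capture side effects in dry-run mode is a recorder that either executes an effect or logs a "would have" entry instead — a sketch with hypothetical names:

```ruby
# In dry-run mode, side effects are captured as descriptions; in live mode,
# the real effect block runs. Read-only work (like LLM calls) bypasses this.
class SideEffects
  attr_reader :captured

  def initialize(dry_run:)
    @dry_run = dry_run
    @captured = []
  end

  def perform(description, &real_effect)
    if @dry_run
      @captured << description
    else
      real_effect.call
    end
  end
end
```

Actions route every mutation through `perform`, so the dry-run trace falls out of the same code path as production execution.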
2. Execution log
Every production pipeline run writes a structured log — not just aggregate stats, but per-step detail:
| Column | Content |
|---|---|
| automation_id | Which pipeline |
| trigger_context | JSON snapshot of what the trigger emitted |
| condition_results | Array of { type, passed, reason } |
| action_results | Array of { type, status, context_before, context_after, duration_ms, error } |
| halted_at | Which step halted the pipeline (null if completed) |
| total_duration_ms | Wall-clock time for the full run |
This replaces the guessing game. When an admin says "my automation didn't fire," the log shows exactly which condition failed. When it fired but did the wrong thing, the log shows what the LLM returned and what context each action saw. The existing discourse_automation_stats table can stay as-is for aggregate dashboards — the execution log is the detailed view.
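A sketch of an executor instrumented to produce those per-step records — plain hashes stand in for real condition and action objects:

```ruby
# Runs conditions and actions while recording a per-step log: which condition
# failed, and what context each action saw before and after.
def run_logged(context, conditions, actions)
  log = { condition_results: [], action_results: [], halted_at: nil }

  conditions.each do |c|
    passed = c[:check].call(context)
    log[:condition_results] << { type: c[:type], passed: passed }
    return log.merge(halted_at: c[:type]) unless passed
  end

  actions.each do |a|
    before = context.dup
    context.merge!(a[:run].call(context) || {})
    log[:action_results] << {
      type: a[:type], context_before: before, context_after: context.dup
    }
  end

  log
end
```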
3. Action-level test
Test a single action in isolation. The admin provides or overrides the input context manually — paste in a JSON context hash, or pick a real post and let the framework build the context — and the action runs in dry-run mode. This is the escape hatch for debugging step 5 of a 7-step pipeline without running steps 1–4 every time.
For LLM actions specifically, this also doubles as a prompt debugger: show the exact text that would be sent to the model, the token count, and the model's response. The current debug_mode on AI reports does a crude version of this by appending the raw LLM input to the PM thread. Pipeline testing makes it a first-class feature.
Why this matters for pipelines specifically
A single-step automation is testable by accident — if the script runs, you can see if it worked. A five-step pipeline with conditional halts and context passing between LLM calls and side effects is not testable by accident. Without dry run, you deploy to production and hope. Without execution logs, you debug by reading source code. Without action-level testing, you re-trigger the entire pipeline to iterate on one step. Pipelines are only as useful as the confidence admins have in them — and confidence comes from observability.
Open Questions
A few things this proposal intentionally leaves underspecified:
- Branching vs. linear: The pipeline above is linear with conditional halts. Full branching (if/else with divergent paths) is more powerful but significantly more complex to build and to reason about in a UI. Linear-with-halts covers 90% of use cases.
- Context namespacing: If a pipeline has two LLM calls, do they both write to `context[:llm_response]`? Flat context with last-writer-wins is simpler for v1. Namespacing (`context[:actions][2][:output]`) is better long-term but adds UI complexity.
- Error handling: What happens when Action₂ fails? Options: halt the pipeline, skip to the next action, retry. The current system has no error handling at all, so even "halt on error" is an improvement.
- Async actions: Some actions (like LLM calls) are slow. Should the pipeline support "fire and forget" steps, or is everything synchronous? The current `run_in_background` flag on scripts suggests async is already needed.
- Dedup and cooldowns: Query-based triggers like Stalled Topic will find the same topics every scan until something changes. The framework needs a `fired_for(automation_id, target_id)` ledger so pipelines don't re-fire endlessly. A cooldown config — "don't re-fire for this topic within N days" — would let admins control re-engagement for topics that remain stalled after the initial nudge.
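The dedup ledger from the last point could be sketched like this, with an in-memory hash standing in for a database table:

```ruby
# Records (automation_id, target_id) firings and suppresses re-fires
# inside the cooldown window.
class FiredLedger
  def initialize(cooldown_days:)
    @cooldown = cooldown_days * 86_400
    @fired = {} # [automation_id, target_id] => last fired_at
  end

  # Returns true (and records the firing) only if outside the cooldown.
  def fire?(automation_id, target_id, now: Time.now)
    key = [automation_id, target_id]
    last = @fired[key]
    return false if last && now - last < @cooldown
    @fired[key] = now
    true
  end
end
```

A query-based trigger would consult `fire?` before emitting each context hash, so a topic that stays stalled gets at most one nudge per cooldown window.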
The Core Argument
The current automation system forces every script to be a self-contained program. Need LLM + tagging + reply? That's one 250-line script with fifteen config fields. Need LLM + flagging + PM? That's a different 250-line script with a different fifteen fields. The shared logic gets copied, the edge cases diverge, and admins can't customize the flow without writing Ruby.
Pipelines fix this by making the unit of composition small: one action does one thing. The framework handles sequencing, context flow, and condition evaluation. Scripts stop being monoliths and start being building blocks.
The irony is that the current codebase is almost there. The trigger/script separation exists. The context hash exists. The fields system exists. The missing piece is allowing more than one script per automation, and giving them a shared context to talk through. That's a schema change, a pipeline executor, and a UI — not a rewrite.