Middleware

Overview

Agent middleware lets you inject custom logic (logging, tracing, input rewriting, access control, …) at key points in an agent’s execution flow without modifying the agent or model code.

In AgentScope Java, you can hook into 5 places — covering everything from the outer reply flow down to the raw model API call:

Position	Type	Description
`onAgent`	Onion	Wraps a full reply flow, covering all ReAct rounds, tool execution, and the final output
`onReasoning`	Onion	Wraps one reasoning step in the ReAct loop (input assembly → model call → streaming decode)
`onActing`	Onion	Wraps the execution of a single tool call
`onModelCall`	Onion	Wraps a raw `ChatModel` API call — closest to the model
`onSystemPrompt`	Transformer	Triggers when the system prompt is assembled; multiple middlewares run in sequence, each transforming the previous output

The two types differ:

Onion — middleware wraps the next handler; you can insert logic before/after next.apply(input) and observe the intermediate event stream.
Transformer — middlewares form a pipeline; the previous output is the next input. There’s no “inner layer” concept.
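The difference between the two shapes can be sketched in plain Java without any AgentScope types. Everything in this block (`HookShapes`, `Onion`, `LOGGING`, `ADD_DATE`, `ADD_USER`) is a hypothetical stand-in chosen for illustration, and a `List<String>` plays the role of the event stream; it is a minimal sketch of the idea, not the framework's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Plain-Java illustration of the two hook shapes; none of these names are AgentScope types. */
class HookShapes {

    /** Onion shape: receives the input plus the next handler and decides when to call it. */
    interface Onion {
        List<String> handle(String input, Function<String, List<String>> next);
    }

    /** An onion layer that adds logic around next.apply(input) and passes inner events through. */
    static final Onion LOGGING = (input, next) -> {
        List<String> out = new ArrayList<>();
        out.add("before");              // logic before the inner layer
        out.addAll(next.apply(input));  // enter the inner layer
        out.add("after");               // logic after the inner layer
        return out;
    };

    /** Transformer shape: there is no next handler, only previous output in and new output out. */
    static final UnaryOperator<String> ADD_DATE = prompt -> prompt + " [date]";
    static final UnaryOperator<String> ADD_USER = prompt -> prompt + " [user]";

    static List<String> runOnion(String input) {
        Function<String, List<String>> core = in -> List.of("core(" + in + ")");
        return LOGGING.handle(input, core);
    }

    static String runTransformers(String prompt) {
        String current = prompt;
        for (UnaryOperator<String> transformer : List.of(ADD_DATE, ADD_USER)) {
            current = transformer.apply(current); // the previous output is the next input
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(runOnion("hi"));          // [before, core(hi), after]
        System.out.println(runTransformers("base")); // base [date] [user]
    }
}
```

The onion layer owns the call to the inner handler, so it can act on both sides of it; the transformers never see an inner handler at all.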

The diagram below shows how the hooks nest in the agent lifecycle. onSystemPrompt is nested inside onReasoning because it fires when the reasoning step assembles the system prompt:

onAgent
└── ReAct loop (per round)
    ├── onReasoning
    │   ├── onSystemPrompt (assemble system prompt)
    │   └── onModelCall (model API call)
    └── onActing (per tool call)

Note

onActing only wraps tool executions inside the agent runtime. Tools executed outside the agent via external execution are not tracked by onActing.

Equipping middleware

AgentScope groups all five hooks in a single MiddlewareBase interface, so one middleware class can implement any subset of them. Onion hooks you leave out default to next.apply(input); onSystemPrompt has no next, so its default presumably returns the prompt unchanged. Pass the instances to the builder’s middlewares(...):

import io.agentscope.core.ReActAgent;
import io.agentscope.core.middleware.MiddlewareBase;
import io.agentscope.core.tracing.OtelTracingMiddleware;
import java.util.List;

ReActAgent agent =
        ReActAgent.builder()
                .name("assistant")
                .sysPrompt("You are a helpful assistant.")
                .model(model)
                .toolkit(toolkit)
                .middlewares(List.of(new OtelTracingMiddleware()))
                .build();

middleware(...) (singular) appends one; middlewares(...) accepts List<? extends MiddlewareBase>. Hooks not implemented by a middleware are skipped at zero cost.

Built-in middlewares

OtelTracingMiddleware

OtelTracingMiddleware (io.agentscope.core.tracing) wires up OpenTelemetry tracing for the agent lifecycle. It instruments onAgent, onModelCall, and onActing, producing nested spans:

invoke_agent <name> — wraps a full reply
chat <model> — wraps each model API call
execute_tool <name> — wraps each tool execution

When no OpenTelemetry SDK is configured (only the default no-op provider), every hook short-circuits to next.apply(input) — near-zero overhead.

Initialise the OpenTelemetry SDK in your process (OTLP exporter, SdkTracerProvider, OpenTelemetrySdk.builder().setTracerProvider(...).buildAndRegisterGlobal()) and then equip the middleware:

import io.agentscope.core.ReActAgent;
import io.agentscope.core.tracing.OtelTracingMiddleware;
import java.util.List;

ReActAgent agent =
        ReActAgent.builder()
                .name("assistant")
                .sysPrompt("You are a helpful assistant.")
                .model(model)
                .toolkit(toolkit)
                .middlewares(List.of(new OtelTracingMiddleware()))
                .build();

Each reply produces a nested span tree with attributes such as agent name, session ID, model name, token counts, tool name, and inputs.

TaskReminderMiddleware

TaskReminderMiddleware (io.agentscope.core.middleware) pairs with the built-in TodoTools: before every reasoning step it renders the current AgentState.tasksContext as a <system-reminder> and injects it into the context, keeping long-running tasks aligned with the plan.

Enable it together with TodoTools via enableTaskList(true):

import io.agentscope.core.ReActAgent;
import io.agentscope.core.tool.Toolkit;
import io.agentscope.core.tool.builtin.TodoTools;

Toolkit toolkit = new Toolkit();
toolkit.registerTool(new TodoTools());

ReActAgent agent =
        ReActAgent.builder()
                .name("planner")
                .sysPrompt("You plan tasks step by step.")
                .model(model)
                .toolkit(toolkit)
                .enableTaskList(true)
                .build();

Custom middleware

Implement MiddlewareBase (io.agentscope.core.middleware) and override only the hooks you need.

Each onion hook receives a next function — calling next.apply(input) enters the next layer. You can insert logic before or after, or use Reactor operators (doOnNext / flatMap / map, …) to observe and rewrite the event stream.

import io.agentscope.core.agent.Agent;
import io.agentscope.core.agent.RuntimeContext;
import io.agentscope.core.event.AgentEvent;
import io.agentscope.core.middleware.ActingInput;
import io.agentscope.core.middleware.AgentInput;
import io.agentscope.core.middleware.MiddlewareBase;
import io.agentscope.core.middleware.ModelCallInput;
import io.agentscope.core.middleware.ReasoningInput;
import java.util.function.Function;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

/** Observes agent / reasoning / model_call / system_prompt at the same time. */
public class FullObservabilityMiddleware implements MiddlewareBase {

    @Override
    public Flux<AgentEvent> onAgent(
            Agent agent, RuntimeContext ctx, AgentInput input, Function<AgentInput, Flux<AgentEvent>> next) {
        System.out.println("[agent] start for " + agent.getName());
        return next.apply(input)
                .doOnComplete(() -> System.out.println("[agent] end for " + agent.getName()));
    }

    @Override
    public Flux<AgentEvent> onReasoning(
            Agent agent, RuntimeContext ctx, ReasoningInput input, Function<ReasoningInput, Flux<AgentEvent>> next) {
        System.out.println("[reasoning] start");
        return next.apply(input).doOnComplete(() -> System.out.println("[reasoning] end"));
    }

    @Override
    public Flux<AgentEvent> onModelCall(
            Agent agent, RuntimeContext ctx, ModelCallInput input, Function<ModelCallInput, Flux<AgentEvent>> next) {
        System.out.println("[model_call] " + input.model().getClass().getSimpleName());
        return next.apply(input).doOnComplete(() -> System.out.println("[model_call] done"));
    }

    @Override
    public Mono<String> onSystemPrompt(Agent agent, RuntimeContext ctx, String currentPrompt) {
        System.out.println("[system_prompt] length=" + currentPrompt.length());
        return Mono.just(currentPrompt);
    }
}

Input record types per hook (under io.agentscope.core.middleware):

Hook	Input record	Fields
`onAgent`	`AgentInput`	`msgs: List<Msg>`
`onReasoning`	`ReasoningInput`	`messages: List<Msg>`, `tools: List<ToolSchema>`, `options: GenerateOptions`
`onActing`	`ActingInput`	`toolCalls: List<ToolUseBlock>`
`onModelCall`	`ModelCallInput`	`messages`, `tools`, `options`, `model: Model`
`onSystemPrompt`	`String`	The current prompt

To replace fields flowing into the next layer, construct a new input record, then call next.apply(...).
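The "construct a new input record" step can be sketched with a stand-in record. `DemoInput` and `withReminder` below are hypothetical names used only for illustration; `DemoInput` mirrors the shape of a record such as `ReasoningInput` with simplified field types, and the real records may expose additional constructors or helpers that this sketch does not assume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch of the "build a new record, then call next" pattern with stand-in types. */
class ReplaceInputSketch {

    /** Hypothetical stand-in for an input record such as ReasoningInput. */
    record DemoInput(List<String> messages, String options) {}

    /** Prepends a reminder message and leaves every other field untouched. */
    static List<String> withReminder(DemoInput input, Function<DemoInput, List<String>> next) {
        List<String> messages = new ArrayList<>();
        messages.add("reminder: answer briefly");
        messages.addAll(input.messages());
        // Record fields are final, so copy the unchanged fields into a new instance.
        DemoInput rewritten = new DemoInput(List.copyOf(messages), input.options());
        return next.apply(rewritten);
    }

    public static void main(String[] args) {
        DemoInput original = new DemoInput(List.of("hello"), "temperature=0");
        // The inner layer here simply echoes the messages it received.
        List<String> seen = withReminder(original, DemoInput::messages);
        System.out.println(seen);                // [reminder: answer briefly, hello]
        System.out.println(original.messages()); // [hello]
    }
}
```

Because the original record is never mutated, outer layers that still hold a reference to it keep seeing the input they passed in.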

Runnable examples: agentscope-examples/documentation/.../middleware/CustomizedMiddlewareExample.java, middleware/ModelCallMiddlewareExample.java, middleware/SystemPromptMiddlewareExample.java.

Reading RuntimeContext

Every MiddlewareBase hook receives the RuntimeContext bound for this call / stream as the second argument — you can read session fields and typed/string attributes, and you can write back to it to forward values to downstream hooks and tools.

import io.agentscope.core.agent.Agent;
import io.agentscope.core.agent.RuntimeContext;
import io.agentscope.core.event.AgentEvent;
import io.agentscope.core.middleware.AgentInput;
import io.agentscope.core.middleware.MiddlewareBase;
import java.util.function.Function;
import reactor.core.publisher.Flux;

/** Log user / request id and propagate a trace id for downstream tools. */
public class RequestContextMiddleware implements MiddlewareBase {

    @Override
    public Flux<AgentEvent> onAgent(
            Agent agent, RuntimeContext ctx, AgentInput input, Function<AgentInput, Flux<AgentEvent>> next) {
        System.out.printf(
                "[req] user=%s session=%s reqId=%s%n",
                ctx.getUserId(),
                ctx.getSessionId(),
                ctx.get("request_id"));
        ctx.put("trace_id", java.util.UUID.randomUUID().toString());  // visible to later hooks / tools
        return next.apply(input);
    }
}

Things to keep in mind:

The same RuntimeContext instance is shared by every hook and tool in the reply; its maps are thread-safe, so put from any hook is safe.
Don’t cache per-request state on middleware instance fields — a middleware instance is typically reused across agents / calls. Use RuntimeContext or Reactor’s contextWrite instead.
If the builder also has a global toolExecutionContext, the framework merges it with the per-call context when dispatching to tools; the global values rank after the per-call ones, so per-call wins on key collisions.
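The precedence rule in the last point can be illustrated with two plain maps. This is only a sketch of the stated outcome (per-call wins on key collisions); `ContextMergeSketch`, `merge`, and the keys used are hypothetical, and the framework's actual merge code may look different.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of "per-call wins on key collisions" using plain maps (not the framework's code). */
class ContextMergeSketch {

    static Map<String, Object> merge(Map<String, Object> global, Map<String, Object> perCall) {
        Map<String, Object> merged = new HashMap<>(global); // start from the global defaults
        merged.putAll(perCall);                             // per-call entries overwrite collisions
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> global = Map.of("tenant", "default", "region", "eu");
        Map<String, Object> perCall = Map.of("tenant", "acme", "trace_id", "t-1");
        Map<String, Object> merged = merge(global, perCall);
        System.out.println(merged.get("tenant"));   // acme
        System.out.println(merged.get("region"));   // eu
        System.out.println(merged.get("trace_id")); // t-1
    }
}
```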

Execution order

Onion hooks (onAgent, onReasoning, onActing, onModelCall) — the first middleware in the list is outermost:

middlewares = [mw1, mw2]
// Order:
// mw1 pre → mw2 pre → inner → mw2 post → mw1 post

For streaming / event-emitting hooks, the inner middleware sees each emitted event first:

mw1_pre → mw2_pre → mw2_event → mw1_event → ... → mw2_post → mw1_post
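The ordering above can be reproduced in a small, dependency-free sketch. Here `java.util.stream.Stream` stands in for the reactive event stream, `peek` stands in for an operator such as `doOnNext`, and `onClose` stands in for a completion callback; `OnionOrderSketch` and `wrap` are hypothetical names, so treat this as an analogy for the nesting order rather than the framework's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;

/** Order demo with java.util.stream.Stream standing in for a reactive event stream. */
class OnionOrderSketch {

    /** Hypothetical onion layer: logs pre, taps every event, logs post when the stream closes. */
    static Function<String, Stream<String>> wrap(
            String name, List<String> log, Function<String, Stream<String>> next) {
        return input -> {
            log.add(name + "_pre");
            return next.apply(input)
                    .peek(event -> log.add(name + "_event(" + event + ")"))
                    .onClose(() -> log.add(name + "_post"));
        };
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        Function<String, Stream<String>> inner = input -> Stream.of("e1", "e2");
        // List order [mw1, mw2]: mw2 wraps the inner handler, mw1 wraps mw2 and is outermost.
        Function<String, Stream<String>> chain = wrap("mw1", log, wrap("mw2", log, inner));
        try (Stream<String> events = chain.apply("hi")) {
            events.forEach(event -> { }); // drain the stream so every tap fires
        }
        return log;
    }

    public static void main(String[] args) {
        // [mw1_pre, mw2_pre, mw2_event(e1), mw1_event(e1),
        //  mw2_event(e2), mw1_event(e2), mw2_post, mw1_post]
        System.out.println(run());
    }
}
```

Each event passes through the taps from the inside out, which is why the inner layer observes it before the outer one.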

Transformer hooks (onSystemPrompt) — left to right pipeline:

middlewares = [mw1, mw2]
// originalPrompt → mw1.onSystemPrompt() → mw2.onSystemPrompt() → final

Overall hook execution order across one reply:

onAgent
  └── per ReAct round:
        ├── onReasoning
        │     ├── prepare model input → onSystemPrompt
        │     └── onModelCall
        └── onActing (per tool call)

Practical examples

Timing middleware

The middleware below records the wall-clock time of each model call:

import io.agentscope.core.agent.Agent;
import io.agentscope.core.agent.RuntimeContext;
import io.agentscope.core.event.AgentEvent;
import io.agentscope.core.middleware.MiddlewareBase;
import io.agentscope.core.middleware.ModelCallInput;
import java.util.function.Function;
import reactor.core.publisher.Flux;

public class TimingMiddleware implements MiddlewareBase {
    @Override
    public Flux<AgentEvent> onModelCall(
            Agent agent, RuntimeContext ctx, ModelCallInput input, Function<ModelCallInput, Flux<AgentEvent>> next) {
        long start = System.nanoTime();
        return next.apply(input)
                .doFinally(sig -> {
                    long ms = (System.nanoTime() - start) / 1_000_000;
                    System.out.println(
                            "[timing] " + agent.getName() + ": " + ms + "ms");
                });
    }
}

Rate-limit middleware

Enforce a minimum interval between consecutive model calls:

import io.agentscope.core.agent.Agent;
import io.agentscope.core.agent.RuntimeContext;
import io.agentscope.core.event.AgentEvent;
import io.agentscope.core.middleware.MiddlewareBase;
import io.agentscope.core.middleware.ModelCallInput;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class RateLimitMiddleware implements MiddlewareBase {

    private final long minIntervalMs;
    private final AtomicLong lastCall = new AtomicLong(0);

    public RateLimitMiddleware(Duration minInterval) {
        this.minIntervalMs = minInterval.toMillis();
    }

    @Override
    public Flux<AgentEvent> onModelCall(
            Agent agent, RuntimeContext ctx, ModelCallInput input, Function<ModelCallInput, Flux<AgentEvent>> next) {
        // Defer so the wait is computed at subscription time, not when the chain is assembled.
        return Flux.defer(() -> {
            long wait = minIntervalMs - (System.currentTimeMillis() - lastCall.get());
            Mono<Void> delay = wait > 0 ? Mono.delay(Duration.ofMillis(wait)).then() : Mono.empty();
            // Stamp lastCall when the model call itself starts, i.e. after any delay.
            return delay.thenMany(
                    next.apply(input)
                            .doOnSubscribe(s -> lastCall.set(System.currentTimeMillis())));
        });
    }
}
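When several replies share one middleware instance, two callers can read the same `lastCall` value and both conclude that no wait is needed. One possible way around this, sketched below in plain Java, is to reserve each call's start slot atomically. `SlotReserver`, `reserve`, and `nextFreeAt` are hypothetical names and not part of AgentScope; this is an alternative technique under those assumptions, not the example's original logic.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: atomically reserve the start time of each call so callers never share a slot. */
class SlotReserver {

    private final long minIntervalMs;
    private final AtomicLong nextFreeAt = new AtomicLong(0); // earliest time the next call may start

    SlotReserver(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns how long the caller should wait (in ms) before starting, given the current time. */
    long reserve(long nowMs) {
        // This caller starts at max(previous free time, now); the slot after it is one interval later.
        long previousFree = nextFreeAt.getAndUpdate(free -> Math.max(free, nowMs) + minIntervalMs);
        return Math.max(0, previousFree - nowMs);
    }

    public static void main(String[] args) {
        SlotReserver reserver = new SlotReserver(100);
        System.out.println(reserver.reserve(1_000)); // 0
        System.out.println(reserver.reserve(1_000)); // 100
        System.out.println(reserver.reserve(1_000)); // 200
        System.out.println(reserver.reserve(2_000)); // 0
    }
}
```

In a middleware, the value returned by `reserve(System.currentTimeMillis())` could feed `Mono.delay(...)` before `next.apply(input)` is subscribed.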

Dynamic system-prompt middleware

Inject runtime context into the system prompt, or start from the runnable example middleware/SystemPromptMiddlewareExample.java:

import io.agentscope.core.agent.Agent;
import io.agentscope.core.agent.RuntimeContext;
import io.agentscope.core.middleware.MiddlewareBase;
import java.time.Instant;
import java.util.function.Supplier;
import reactor.core.publisher.Mono;

public class DynamicContextMiddleware implements MiddlewareBase {

    private final Supplier<String> contextFn;

    public DynamicContextMiddleware(Supplier<String> contextFn) {
        this.contextFn = contextFn;
    }

    @Override
    public Mono<String> onSystemPrompt(Agent agent, RuntimeContext ctx, String currentPrompt) {
        return Mono.just(currentPrompt + "\n\n## Current Context\n" + contextFn.get());
    }
}

// Wire-up:
// .middlewares(List.of(new DynamicContextMiddleware(() -> "Time: " + Instant.now())))

Model-fallback middleware

Swap to a backup model if the primary fails:

import io.agentscope.core.agent.Agent;
import io.agentscope.core.agent.RuntimeContext;
import io.agentscope.core.event.AgentEvent;
import io.agentscope.core.middleware.MiddlewareBase;
import io.agentscope.core.middleware.ModelCallInput;
import io.agentscope.core.model.Model;
import java.util.function.Function;
import reactor.core.publisher.Flux;

public class ModelFallbackMiddleware implements MiddlewareBase {

    private final Model fallback;

    public ModelFallbackMiddleware(Model fallback) {
        this.fallback = fallback;
    }

    @Override
    public Flux<AgentEvent> onModelCall(
            Agent agent, RuntimeContext ctx, ModelCallInput input, Function<ModelCallInput, Flux<AgentEvent>> next) {
        return next.apply(input)
                .onErrorResume(err -> {
                    System.err.println("Primary model failed: " + err.getMessage()
                            + ", switching to fallback");
                    return next.apply(
                            new ModelCallInput(
                                    input.messages(),
                                    input.tools(),
                                    input.options(),
                                    fallback));
                });
    }
}

Tip

For a simple primary→backup fallback, ReActAgent.Builder already exposes fallbackModel(...) and maxRetries(...) directly — no middleware needed.

Stop agent when all tools are denied

When a user denies all tool calls from a reasoning step via HITL, the agent continues to the next reasoning iteration by default (backward compatible). To stop the agent in this scenario, write an onActing middleware that observes AllToolsDeniedEvent and emits a RequestStopEvent:

import io.agentscope.core.agent.Agent;
import io.agentscope.core.agent.RuntimeContext;
import io.agentscope.core.event.AgentEvent;
import io.agentscope.core.event.AllToolsDeniedEvent;
import io.agentscope.core.event.RequestStopEvent;
import io.agentscope.core.message.GenerateReason;
import io.agentscope.core.middleware.ActingInput;
import io.agentscope.core.middleware.MiddlewareBase;
import java.util.function.Function;
import reactor.core.publisher.Flux;

public class StopOnAllDeniedMiddleware implements MiddlewareBase {

    @Override
    public Flux<AgentEvent> onActing(
            Agent agent, RuntimeContext ctx, ActingInput input,
            Function<ActingInput, Flux<AgentEvent>> next) {
        return next.apply(input)
                .flatMap(event -> {
                    if (event instanceof AllToolsDeniedEvent) {
                        return Flux.just(
                                event,
                                new RequestStopEvent(
                                        "All tools denied by user",
                                        GenerateReason.ALL_TOOLS_DENIED));
                    }
                    return Flux.just(event);
                });
    }
}

Once wired up, the agent stops immediately when all tools are denied, returning GenerateReason.ALL_TOOLS_DENIED:

ReActAgent agent =
        ReActAgent.builder()
                .name("guarded")
                .sysPrompt("...")
                .model(model)
                .toolkit(toolkit)
                .middlewares(List.of(new StopOnAllDeniedMiddleware()))
                .build();