# Model
This guide introduces the LLM models supported by AgentScope Java and how to configure them.
## Supported Models

| Provider | Class | Streaming | Tools | Vision | Reasoning |
|---|---|---|---|---|---|
| DashScope | `DashScopeChatModel` | ✅ | ✅ | ✅ | ✅ |
| OpenAI | `OpenAIChatModel` | ✅ | ✅ | ✅ | - |
| Anthropic | `AnthropicChatModel` | ✅ | ✅ | ✅ | ✅ |
| Gemini | `GeminiChatModel` | ✅ | ✅ | ✅ | ✅ |
| Ollama | `OllamaChatModel` | ✅ | ✅ | ✅ | ✅ |
Note:

- `OpenAIChatModel` is compatible with the OpenAI API specification and works with vLLM, DeepSeek, etc.
- `GeminiChatModel` supports both the Gemini API and Vertex AI.
## Getting API Keys

| Provider | URL | Environment Variable |
|---|---|---|
| DashScope | https://dashscope.console.aliyun.com/ | `DASHSCOPE_API_KEY` |
| OpenAI | https://platform.openai.com/api-keys | `OPENAI_API_KEY` |
| Anthropic | https://console.anthropic.com/ | `ANTHROPIC_API_KEY` |
| Gemini | https://aistudio.google.com/apikey | `GEMINI_API_KEY` |
| DeepSeek | https://platform.deepseek.com/ | - |
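The examples in this guide read keys from environment variables rather than hard-coding them. A minimal pattern for doing so, failing fast if the key is missing:

```java
// Read the API key from the environment and fail fast if it is absent
String apiKey = System.getenv("DASHSCOPE_API_KEY");
if (apiKey == null || apiKey.isBlank()) {
    throw new IllegalStateException("DASHSCOPE_API_KEY is not set");
}
```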
## DashScope

Alibaba Cloud's LLM platform, providing the Qwen series of models.

```java
DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .build();
```
### Configuration

| Option | Description |
|---|---|
| `apiKey` | DashScope API key |
| `modelName` | Model name, e.g., `qwen3-max` |
| `baseUrl` | Custom API endpoint (optional) |
| `stream` | Enable streaming |
| `enableThinking` | Enable thinking mode to show the reasoning process |
| `enableSearch` | Enable web search for real-time information |
### Thinking Mode

```java
DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .enableThinking(true) // Automatically enables streaming
    .defaultOptions(GenerateOptions.builder()
        .thinkingBudget(5000) // Token budget for thinking
        .build())
    .build();
```
For Ollama, thinking is enabled through `OllamaOptions`:

```java
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .baseUrl("http://localhost:11434")
    .defaultOptions(OllamaOptions.builder()
        .thinkOption(ThinkOption.ThinkBoolean.ENABLED)
        .temperature(0.8)
        .build())
    .build();
```
## OpenAI

OpenAI models and OpenAI-compatible APIs.

```java
OpenAIChatModel model = OpenAIChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("gpt-4o")
    .build();
```
### Compatible APIs

For DeepSeek, vLLM, and other OpenAI-compatible providers:

```java
OpenAIChatModel model = OpenAIChatModel.builder()
    .apiKey("your-api-key")
    .modelName("deepseek-chat")
    .baseUrl("https://api.deepseek.com")
    .build();
```
### Configuration

| Option | Description |
|---|---|
| `apiKey` | API key |
| `modelName` | Model name, e.g., `gpt-4o` |
| `baseUrl` | Custom API endpoint (optional) |
| `stream` | Enable streaming |
## Anthropic

Anthropic's Claude series models.

```java
AnthropicChatModel model = AnthropicChatModel.builder()
    .apiKey(System.getenv("ANTHROPIC_API_KEY"))
    .modelName("claude-sonnet-4-5-20250929") // Default
    .build();
```
### Configuration

| Option | Description |
|---|---|
| `apiKey` | Anthropic API key |
| `modelName` | Model name, default `claude-sonnet-4-5-20250929` |
| `baseUrl` | Custom API endpoint (optional) |
| `stream` | Enable streaming |
## Gemini

Google's Gemini series models, supporting both the Gemini API and Vertex AI.

### Gemini API

```java
GeminiChatModel model = GeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_API_KEY"))
    .modelName("gemini-2.5-flash") // Default
    .build();
```
### Vertex AI

```java
GeminiChatModel model = GeminiChatModel.builder()
    .modelName("gemini-2.0-flash")
    .project("your-gcp-project")
    .location("us-central1")
    .vertexAI(true)
    .credentials(GoogleCredentials.getApplicationDefault())
    .build();
```
### Configuration

| Option | Description |
|---|---|
| `apiKey` | Gemini API key |
| `modelName` | Model name, default `gemini-2.5-flash` |
| `project` | GCP project ID (Vertex AI) |
| `location` | GCP region (Vertex AI) |
| `vertexAI` | Whether to use Vertex AI |
| `credentials` | GCP credentials (Vertex AI) |
| `stream` | Enable streaming |
## Ollama

Self-hosted open-source LLM platform supporting various models.

```java
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .baseUrl("http://localhost:11434") // Default
    .build();
```
### Configuration

| Option | Description |
|---|---|
| `modelName` | Model name, e.g., `qwen3-max` |
| `baseUrl` | Ollama server endpoint (optional, default `http://localhost:11434`) |
| `defaultOptions` | Default generation options |
| `formatter` | Message formatter (optional) |
| `transport` | HTTP transport configuration (optional) |
### Advanced Configuration

For advanced model loading and generation parameters:

```java
OllamaOptions options = OllamaOptions.builder()
    .numCtx(4096)       // Context window size
    .temperature(0.7)   // Generation randomness
    .topK(40)           // Top-K sampling
    .topP(0.9)          // Nucleus sampling
    .repeatPenalty(1.1) // Repetition penalty
    .build();

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .baseUrl("http://localhost:11434")
    .defaultOptions(options)
    .build();
```
### GenerateOptions Support

Ollama also supports `GenerateOptions` for standard configuration:

```java
GenerateOptions options = GenerateOptions.builder()
    .temperature(0.7)      // Maps to Ollama's temperature
    .topP(0.9)             // Maps to Ollama's top_p
    .topK(40)              // Maps to Ollama's top_k
    .maxTokens(2000)       // Maps to Ollama's num_predict
    .seed(42L)             // Maps to Ollama's seed
    .frequencyPenalty(0.5) // Maps to Ollama's frequency_penalty
    .presencePenalty(0.5)  // Maps to Ollama's presence_penalty
    .additionalBodyParam(OllamaOptions.ParamKey.NUM_CTX.getKey(), 4096)       // Context window size
    .additionalBodyParam(OllamaOptions.ParamKey.NUM_GPU.getKey(), -1)         // Offload all layers to GPU
    .additionalBodyParam(OllamaOptions.ParamKey.REPEAT_PENALTY.getKey(), 1.1) // Repetition penalty
    .additionalBodyParam(OllamaOptions.ParamKey.MAIN_GPU.getKey(), 0)         // Main GPU index
    .additionalBodyParam(OllamaOptions.ParamKey.LOW_VRAM.getKey(), false)     // Low VRAM mode
    .additionalBodyParam(OllamaOptions.ParamKey.F16_KV.getKey(), true)        // 16-bit KV cache
    .additionalBodyParam(OllamaOptions.ParamKey.NUM_THREAD.getKey(), 8)       // Number of CPU threads
    .build();

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .baseUrl("http://localhost:11434")
    .defaultOptions(OllamaOptions.fromGenerateOptions(options)) // Convert GenerateOptions to OllamaOptions
    .build();
```
### Available Parameters

Ollama supports over 40 parameters for fine-tuning:

#### Model Loading Parameters

- `numCtx`: Context window size (default: 2048)
- `numBatch`: Batch size for prompt processing (default: 512)
- `numGPU`: Number of layers to offload to GPU (-1 for all)
- `lowVRAM`: Enable low VRAM mode for limited GPU memory
- `useMMap`: Use memory mapping for model loading
- `useMLock`: Lock model in memory to prevent swapping

#### Generation Parameters

- `temperature`: Generation randomness (0.0-2.0)
- `topK`: Top-K sampling (standard: 40)
- `topP`: Nucleus sampling (standard: 0.9)
- `minP`: Minimum probability threshold (default: 0.0)
- `numPredict`: Max tokens to generate (-1 for infinite)
- `repeatPenalty`: Penalty for repetitions (default: 1.1)
- `presencePenalty`: Penalty based on token presence
- `frequencyPenalty`: Penalty based on token frequency
- `seed`: Random seed for reproducible results
- `stop`: Strings that stop generation immediately

#### Sampling Strategies

- `mirostat`: Mirostat sampling (0=disabled, 1=Mirostat v1, 2=Mirostat v2)
- `mirostatTau`: Target entropy for Mirostat (default: 5.0)
- `mirostatEta`: Learning rate for Mirostat (default: 0.1)
- `tfsZ`: Tail-free sampling (default: 1.0 disables)
- `typicalP`: Typical probability sampling (default: 1.0)
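As a rough sketch, these parameters map onto `OllamaOptions` builder methods. The Mirostat method names below are assumed to mirror the parameter names listed above and may differ in the actual API:

```java
// Hedged sketch: assumes OllamaOptions exposes builder methods named after
// the Mirostat parameters above; verify exact names against the API.
OllamaOptions samplingOptions = OllamaOptions.builder()
    .mirostat(2)      // Enable Mirostat v2 adaptive sampling
    .mirostatTau(5.0) // Target entropy
    .mirostatEta(0.1) // Learning rate
    .build();
```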
## Generation Options

Configure generation parameters with `GenerateOptions`:

```java
GenerateOptions options = GenerateOptions.builder()
    .temperature(0.7)              // Randomness (0.0-2.0)
    .topP(0.9)                     // Nucleus sampling
    .topK(40)                      // Top-K sampling
    .maxTokens(2000)               // Maximum output tokens
    .seed(42L)                     // Random seed
    .toolChoice(ToolChoice.auto()) // Tool choice strategy
    .build();

DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .defaultOptions(options)
    .build();
```

Or with Ollama:

```java
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .baseUrl("http://localhost:11434")
    .defaultOptions(OllamaOptions.fromGenerateOptions(options))
    .build();
```
### Parameters

| Parameter | Type | Description |
|---|---|---|
| `temperature` | Double | Controls randomness, 0.0-2.0 |
| `topP` | Double | Nucleus sampling threshold, 0.0-1.0 |
| `topK` | Integer | Limits candidate tokens |
| `maxTokens` | Integer | Maximum tokens to generate |
| `thinkingBudget` | Integer | Token budget for thinking |
| `seed` | Long | Random seed |
| `toolChoice` | ToolChoice | Tool choice strategy |
### Tool Choice Strategy

```java
ToolChoice.auto()                // Model decides (default)
ToolChoice.none()                // Disable tool calling
ToolChoice.required()            // Force tool calling
ToolChoice.specific("tool_name") // Force a specific tool
```
### Additional Parameters

Support for provider-specific parameters:

```java
GenerateOptions options = GenerateOptions.builder()
    .additionalHeader("X-Custom-Header", "value") // Extra HTTP header
    .additionalBodyParam("custom_param", "value") // Extra request-body field
    .additionalQueryParam("version", "v2")        // Extra query-string parameter
    .build();
```
### Timeout and Retry

```java
ExecutionConfig execConfig = ExecutionConfig.builder()
    .timeout(Duration.ofMinutes(2))        // Per-call timeout
    .maxAttempts(3)                        // Maximum number of attempts
    .initialBackoff(Duration.ofSeconds(1)) // Delay before the first retry
    .maxBackoff(Duration.ofSeconds(10))    // Upper bound on retry delay
    .backoffMultiplier(2.0)                // Exponential backoff factor
    .build();

GenerateOptions options = GenerateOptions.builder()
    .executionConfig(execConfig)
    .build();
```
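With this configuration, up to 3 attempts are made per call, with the retry delay starting at 1 second and doubling each time, capped at 10 seconds. The resulting options attach to a model like any other `GenerateOptions`:

```java
DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .defaultOptions(options) // GenerateOptions carrying the ExecutionConfig
    .build();
```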
## Formatter

A Formatter converts AgentScope's unified message format to each LLM provider's API format. Each provider has two types of Formatter:

| Provider | Single-Agent | Multi-Agent |
|---|---|---|
| DashScope | `DashScopeChatFormatter` | `DashScopeMultiAgentFormatter` |
| OpenAI | `OpenAIChatFormatter` | `OpenAIMultiAgentFormatter` |
| Anthropic | `AnthropicChatFormatter` | `AnthropicMultiAgentFormatter` |
| Gemini | `GeminiChatFormatter` | `GeminiMultiAgentFormatter` |
| Ollama | `OllamaChatFormatter` | `OllamaMultiAgentFormatter` |
### Default Behavior

When no Formatter is specified, the model uses the corresponding ChatFormatter, suitable for single-agent scenarios.
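You can also set the single-agent formatter explicitly. A minimal sketch for DashScope, assuming the class name follows the table above:

```java
DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .formatter(new DashScopeChatFormatter()) // Explicit single-agent formatter
    .build();
```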
### Multi-Agent Scenarios

In multi-agent collaboration (such as Pipeline or MsgHub), use a MultiAgentFormatter. It will:

- Merge messages from multiple agents into conversation history
- Use `<history></history>` tags to structure historical messages
- Distinguish between the current agent's messages and other agents' messages
```java
// DashScope multi-agent
DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .formatter(new DashScopeMultiAgentFormatter())
    .build();
```

```java
// OpenAI multi-agent
OpenAIChatModel model = OpenAIChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("gpt-4o")
    .formatter(new OpenAIMultiAgentFormatter())
    .build();
```

```java
// Anthropic multi-agent
AnthropicChatModel model = AnthropicChatModel.builder()
    .apiKey(System.getenv("ANTHROPIC_API_KEY"))
    .formatter(new AnthropicMultiAgentFormatter())
    .build();
```

```java
// Gemini multi-agent
GeminiChatModel model = GeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_API_KEY"))
    .formatter(new GeminiMultiAgentFormatter())
    .build();
```

```java
// Ollama multi-agent
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .formatter(new OllamaMultiAgentFormatter())
    .build();
```
### Custom History Prompt

You can customize the conversation history prompt:

```java
String customPrompt = "# Conversation Record\nBelow is the previous conversation:\n";

DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .formatter(new DashScopeMultiAgentFormatter(customPrompt))
    .build();
```
### When to Use MultiAgentFormatter

| Scenario | Recommended Formatter |
|---|---|
| Single-agent conversation | ChatFormatter (default) |
| Pipeline sequential execution | MultiAgentFormatter |
| MsgHub group chat | MultiAgentFormatter |
| Multi-agent debate | MultiAgentFormatter |