# Model
This guide introduces the LLM models supported by AgentScope Java and how to configure them.
## Supported Models

| Provider | Class | Streaming | Tools | Vision | Reasoning |
|---|---|---|---|---|---|
| DashScope | `DashScopeChatModel` | ✅ | ✅ | ✅ | ✅ |
| OpenAI | `OpenAIChatModel` | ✅ | ✅ | ✅ | - |
| Anthropic | `AnthropicChatModel` | ✅ | ✅ | ✅ | ✅ |
| Gemini | `GeminiChatModel` | ✅ | ✅ | ✅ | ✅ |
| Ollama | `OllamaChatModel` | ✅ | ✅ | ✅ | ✅ |
Note:

- `OpenAIChatModel` is compatible with the OpenAI API specification and works with vLLM, DeepSeek, etc.
- `GeminiChatModel` supports both the Gemini API and Vertex AI.
## Getting API Keys

| Provider | URL | Environment Variable |
|---|---|---|
| DashScope | https://dashscope.console.aliyun.com/ | `DASHSCOPE_API_KEY` |
| OpenAI | https://platform.openai.com/api-keys | `OPENAI_API_KEY` |
| Anthropic | https://console.anthropic.com/ | `ANTHROPIC_API_KEY` |
| Gemini | https://aistudio.google.com/apikey | `GEMINI_API_KEY` |
| DeepSeek | https://platform.deepseek.com/ | - |
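The examples in this guide read keys from environment variables rather than hard-coding them. A minimal pattern for doing so, failing fast if the key is missing:

```java
// Read the API key from the environment and fail fast if it is absent
String apiKey = System.getenv("DASHSCOPE_API_KEY");
if (apiKey == null || apiKey.isBlank()) {
    throw new IllegalStateException("DASHSCOPE_API_KEY is not set");
}
```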
## DashScope

Alibaba Cloud's LLM platform, providing the Qwen series of models.

```java
DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .build();
```
### Configuration

| Option | Description |
|---|---|
| `apiKey` | DashScope API key |
| `modelName` | Model name, e.g., `qwen3-max` |
| `baseUrl` | Custom API endpoint (optional) |
| `stream` | Enable streaming |
| `enableThinking` | Enable thinking mode to show the reasoning process |
| `enableSearch` | Enable web search for real-time information |
### Thinking Mode

```java
DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .enableThinking(true) // Automatically enables streaming
    .defaultOptions(GenerateOptions.builder()
        .thinkingBudget(5000) // Token budget for thinking
        .build())
    .build();
```
For Ollama, thinking is enabled through `OllamaOptions`:

```java
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .baseUrl("http://localhost:11434")
    .defaultOptions(OllamaOptions.builder()
        .thinkOption(ThinkOption.ThinkBoolean.ENABLED)
        .temperature(0.8)
        .build())
    .build();
```
## OpenAI

OpenAI models and OpenAI-compatible APIs.

```java
OpenAIChatModel model = OpenAIChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("gpt-4o")
    .build();
```
### Compatible APIs

For DeepSeek, vLLM, and other OpenAI-compatible providers:

```java
OpenAIChatModel model = OpenAIChatModel.builder()
    .apiKey("your-api-key")
    .modelName("deepseek-chat")
    .baseUrl("https://api.deepseek.com")
    .build();
```
### Configuration

| Option | Description |
|---|---|
| `apiKey` | API key |
| `modelName` | Model name, e.g., `gpt-4o` |
| `baseUrl` | Custom API endpoint (optional) |
| `stream` | Enable streaming |
## Anthropic

Anthropic's Claude series models.

```java
AnthropicChatModel model = AnthropicChatModel.builder()
    .apiKey(System.getenv("ANTHROPIC_API_KEY"))
    .modelName("claude-sonnet-4-5-20250929") // Default
    .build();
```
### Configuration

| Option | Description |
|---|---|
| `apiKey` | Anthropic API key |
| `modelName` | Model name, default `claude-sonnet-4-5-20250929` |
| `baseUrl` | Custom API endpoint (optional) |
| `stream` | Enable streaming |
## Gemini

Google's Gemini series models, supporting both the Gemini API and Vertex AI.

### Gemini API

```java
GeminiChatModel model = GeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_API_KEY"))
    .modelName("gemini-2.5-flash") // Default
    .build();
```
### Vertex AI

```java
GeminiChatModel model = GeminiChatModel.builder()
    .modelName("gemini-2.0-flash")
    .project("your-gcp-project")
    .location("us-central1")
    .vertexAI(true)
    .credentials(GoogleCredentials.getApplicationDefault())
    .build();
```
### Configuration

| Option | Description |
|---|---|
| `apiKey` | Gemini API key |
| `modelName` | Model name, default `gemini-2.5-flash` |
| `project` | GCP project ID (Vertex AI) |
| `location` | GCP region (Vertex AI) |
| `vertexAI` | Whether to use Vertex AI |
| `credentials` | GCP credentials (Vertex AI) |
| `stream` | Enable streaming |
## Ollama

Self-hosted open-source LLM platform supporting various models.

```java
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .baseUrl("http://localhost:11434") // Default
    .build();
```
### Configuration

| Option | Description |
|---|---|
| `modelName` | Model name, e.g., `qwen3-max` |
| `baseUrl` | Ollama server endpoint (optional, default `http://localhost:11434`) |
| `defaultOptions` | Default generation options |
| `formatter` | Message formatter (optional) |
| `transport` | HTTP transport configuration (optional) |
### Advanced Configuration

For advanced model loading and generation parameters:

```java
OllamaOptions options = OllamaOptions.builder()
    .numCtx(4096)       // Context window size
    .temperature(0.7)   // Generation randomness
    .topK(40)           // Top-K sampling
    .topP(0.9)          // Nucleus sampling
    .repeatPenalty(1.1) // Repetition penalty
    .build();

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .baseUrl("http://localhost:11434")
    .defaultOptions(options)
    .build();
```
### GenerateOptions Support

Ollama also supports `GenerateOptions` for standard configuration:

```java
GenerateOptions options = GenerateOptions.builder()
    .temperature(0.7)      // Maps to Ollama's temperature
    .topP(0.9)             // Maps to Ollama's top_p
    .topK(40)              // Maps to Ollama's top_k
    .maxTokens(2000)       // Maps to Ollama's num_predict
    .seed(42L)             // Maps to Ollama's seed
    .frequencyPenalty(0.5) // Maps to Ollama's frequency_penalty
    .presencePenalty(0.5)  // Maps to Ollama's presence_penalty
    .additionalBodyParam(OllamaOptions.ParamKey.NUM_CTX.getKey(), 4096)       // Context window size
    .additionalBodyParam(OllamaOptions.ParamKey.NUM_GPU.getKey(), -1)         // Offload all layers to GPU
    .additionalBodyParam(OllamaOptions.ParamKey.REPEAT_PENALTY.getKey(), 1.1) // Repetition penalty
    .additionalBodyParam(OllamaOptions.ParamKey.MAIN_GPU.getKey(), 0)         // Main GPU index
    .additionalBodyParam(OllamaOptions.ParamKey.LOW_VRAM.getKey(), false)     // Low VRAM mode
    .additionalBodyParam(OllamaOptions.ParamKey.F16_KV.getKey(), true)        // 16-bit KV cache
    .additionalBodyParam(OllamaOptions.ParamKey.NUM_THREAD.getKey(), 8)       // Number of CPU threads
    .build();

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .baseUrl("http://localhost:11434")
    .defaultOptions(OllamaOptions.fromGenerateOptions(options)) // Convert GenerateOptions to OllamaOptions
    .build();
```
### Available Parameters

Ollama supports over 40 parameters for fine-tuning:

#### Model Loading Parameters

- `numCtx`: Context window size (default: 2048)
- `numBatch`: Batch size for prompt processing (default: 512)
- `numGPU`: Number of layers to offload to GPU (-1 for all)
- `lowVRAM`: Enable low VRAM mode for limited GPU memory
- `useMMap`: Use memory mapping for model loading
- `useMLock`: Lock model in memory to prevent swapping

#### Generation Parameters

- `temperature`: Generation randomness (0.0-2.0)
- `topK`: Top-K sampling (standard: 40)
- `topP`: Nucleus sampling (standard: 0.9)
- `minP`: Minimum probability threshold (default: 0.0)
- `numPredict`: Max tokens to generate (-1 for infinite)
- `repeatPenalty`: Penalty for repetitions (default: 1.1)
- `presencePenalty`: Penalty based on token presence
- `frequencyPenalty`: Penalty based on token frequency
- `seed`: Random seed for reproducible results
- `stop`: Strings that stop generation immediately

#### Sampling Strategies

- `mirostat`: Mirostat sampling (0=disabled, 1=Mirostat v1, 2=Mirostat v2)
- `mirostatTau`: Target entropy for Mirostat (default: 5.0)
- `mirostatEta`: Learning rate for Mirostat (default: 0.1)
- `tfsZ`: Tail-free sampling (default: 1.0 disables)
- `typicalP`: Typical probability sampling (default: 1.0)
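As a rough sketch, these parameters map onto `OllamaOptions` builder methods. The Mirostat method names below are assumed to mirror the parameter names listed above and may differ in the actual API:

```java
// Hedged sketch: assumes OllamaOptions exposes builder methods named after
// the Mirostat parameters above; verify exact names against the API.
OllamaOptions samplingOptions = OllamaOptions.builder()
    .mirostat(2)      // Enable Mirostat v2 adaptive sampling
    .mirostatTau(5.0) // Target entropy
    .mirostatEta(0.1) // Learning rate
    .build();
```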
## Generation Options

Configure generation parameters with `GenerateOptions`:

```java
GenerateOptions options = GenerateOptions.builder()
    .temperature(0.7)              // Randomness (0.0-2.0)
    .topP(0.9)                     // Nucleus sampling
    .topK(40)                      // Top-K sampling
    .maxTokens(2000)               // Maximum output tokens
    .seed(42L)                     // Random seed
    .toolChoice(ToolChoice.auto()) // Tool choice strategy
    .build();

DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .defaultOptions(options)
    .build();
```

Or with Ollama:

```java
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .baseUrl("http://localhost:11434")
    .defaultOptions(OllamaOptions.fromGenerateOptions(options))
    .build();
```
### Parameters

| Parameter | Type | Description |
|---|---|---|
| `temperature` | Double | Controls randomness, 0.0-2.0 |
| `topP` | Double | Nucleus sampling threshold, 0.0-1.0 |
| `topK` | Integer | Limits candidate tokens |
| `maxTokens` | Integer | Maximum tokens to generate |
| `thinkingBudget` | Integer | Token budget for thinking |
| `seed` | Long | Random seed |
| `toolChoice` | ToolChoice | Tool choice strategy |
### Tool Choice Strategy

```java
ToolChoice.auto()                // Model decides (default)
ToolChoice.none()                // Disable tool calling
ToolChoice.required()            // Force tool calling
ToolChoice.specific("tool_name") // Force a specific tool
```
### Additional Parameters

Support for provider-specific parameters:

```java
GenerateOptions options = GenerateOptions.builder()
    .additionalHeader("X-Custom-Header", "value") // Extra HTTP header
    .additionalBodyParam("custom_param", "value") // Extra request-body field
    .additionalQueryParam("version", "v2")        // Extra query-string parameter
    .build();
```
### Timeout and Retry

```java
ExecutionConfig execConfig = ExecutionConfig.builder()
    .timeout(Duration.ofMinutes(2))        // Per-call timeout
    .maxAttempts(3)                        // Maximum number of attempts
    .initialBackoff(Duration.ofSeconds(1)) // Delay before the first retry
    .maxBackoff(Duration.ofSeconds(10))    // Upper bound on retry delay
    .backoffMultiplier(2.0)                // Exponential backoff factor
    .build();

GenerateOptions options = GenerateOptions.builder()
    .executionConfig(execConfig)
    .build();
```
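With this configuration, up to 3 attempts are made per call, with the retry delay starting at 1 second and doubling each time, capped at 10 seconds. The resulting options attach to a model like any other `GenerateOptions`:

```java
DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .defaultOptions(options) // GenerateOptions carrying the ExecutionConfig
    .build();
```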
## Formatter

A Formatter converts AgentScope's unified message format to each LLM provider's API format. Each provider has two types of Formatter:

| Provider | Single-Agent | Multi-Agent |
|---|---|---|
| DashScope | `DashScopeChatFormatter` | `DashScopeMultiAgentFormatter` |
| OpenAI | `OpenAIChatFormatter` | `OpenAIMultiAgentFormatter` |
| Anthropic | `AnthropicChatFormatter` | `AnthropicMultiAgentFormatter` |
| Gemini | `GeminiChatFormatter` | `GeminiMultiAgentFormatter` |
| Ollama | `OllamaChatFormatter` | `OllamaMultiAgentFormatter` |
### Default Behavior

When no Formatter is specified, the model uses the corresponding ChatFormatter, suitable for single-agent scenarios.
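You can also set the single-agent formatter explicitly. A minimal sketch for DashScope, assuming the class name follows the table above:

```java
DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .formatter(new DashScopeChatFormatter()) // Explicit single-agent formatter
    .build();
```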
### Multi-Agent Scenarios

In multi-agent collaboration (such as Pipeline or MsgHub), use a MultiAgentFormatter. It will:

- Merge messages from multiple agents into conversation history
- Use `<history></history>` tags to structure historical messages
- Distinguish between the current agent's messages and other agents' messages
```java
// DashScope multi-agent
DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .formatter(new DashScopeMultiAgentFormatter())
    .build();
```

```java
// OpenAI multi-agent
OpenAIChatModel model = OpenAIChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("gpt-4o")
    .formatter(new OpenAIMultiAgentFormatter())
    .build();
```

```java
// Anthropic multi-agent
AnthropicChatModel model = AnthropicChatModel.builder()
    .apiKey(System.getenv("ANTHROPIC_API_KEY"))
    .formatter(new AnthropicMultiAgentFormatter())
    .build();
```

```java
// Gemini multi-agent
GeminiChatModel model = GeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_API_KEY"))
    .formatter(new GeminiMultiAgentFormatter())
    .build();
```

```java
// Ollama multi-agent
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("qwen3-max")
    .formatter(new OllamaMultiAgentFormatter())
    .build();
```
### Custom History Prompt

You can customize the conversation history prompt:

```java
String customPrompt = "# Conversation Record\nBelow is the previous conversation:\n";

DashScopeChatModel model = DashScopeChatModel.builder()
    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
    .modelName("qwen3-max")
    .formatter(new DashScopeMultiAgentFormatter(customPrompt))
    .build();
```
### When to Use MultiAgentFormatter

| Scenario | Recommended Formatter |
|---|---|
| Single-agent conversation | ChatFormatter (default) |
| Pipeline sequential execution | MultiAgentFormatter |
| MsgHub group chat | MultiAgentFormatter |
| Multi-agent debate | MultiAgentFormatter |