RAG (Retrieval-Augmented Generation)¶
AgentScope provides built-in support for Retrieval-Augmented Generation (RAG) tasks, enabling agents to access and utilize external knowledge bases to provide more accurate and informative responses.
Overview¶
The RAG module in AgentScope consists of two core components:
Reader: Responsible for reading and chunking input documents into processable units
Knowledge: Responsible for storing documents, generating embeddings, and retrieving relevant information
AgentScope supports two types of knowledge base implementations:
| Type | Implementation | Features | Use Cases |
|---|---|---|---|
| Local Knowledge | SimpleKnowledge | Requires a local embedding model and vector store | Development, testing, full data control |
| Cloud-hosted Knowledge | BailianKnowledge | Uses the Alibaba Cloud Bailian Knowledge Base service | Enterprise production, zero maintenance, advanced retrieval |
Supported Readers¶
AgentScope provides several built-in readers for different document formats:
| Reader | Description | Supported Formats |
|---|---|---|
| TextReader | Reads and chunks plain text documents | text |
| PDFReader | Extracts text from PDF files | pdf |
| WordReader | Extracts text, tables, and images from Word documents | docx |
| ImageReader | Reads image files (for multimodal RAG) | jpg, jpeg, png, gif, bmp, tiff, webp |
Each reader chunks documents into Document objects with the following fields:
metadata: Contains content (TextBlock/ImageBlock), doc_id, chunk_id, and total_chunks
embedding: The embedding vector (filled when the document is added to or retrieved from the knowledge base)
score: The relevance score (filled during retrieval)
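For example, given a List<Document> documents produced by a reader or returned from retrieval (see the Quick Start below), the fields can be inspected like this (a sketch using only accessors that appear later in this guide):
// Inspect chunk metadata and relevance score of each Document
for (Document doc : documents) {
    System.out.println("Doc ID: " + doc.getMetadata().getDocId());
    System.out.println("Content: " + doc.getMetadata().getContent());
    System.out.println("Score: " + doc.getScore()); // only populated after retrieval
}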
Quick Start¶
1. Creating a Knowledge Base¶
First, create a knowledge base with an embedding model and vector store:
import io.agentscope.core.embedding.EmbeddingModel;
import io.agentscope.core.embedding.dashscope.DashScopeTextEmbedding;
import io.agentscope.core.rag.Knowledge;
import io.agentscope.core.rag.knowledge.SimpleKnowledge;
import io.agentscope.core.rag.store.InMemoryStore;
import io.agentscope.core.rag.store.VDBStoreBase;
// Create embedding model
EmbeddingModel embeddingModel = DashScopeTextEmbedding.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.modelName("text-embedding-v3")
.dimensions(1024)
.build();
// Create vector store
VDBStoreBase vectorStore = InMemoryStore.builder()
.dimensions(1024)
.build();
// Create knowledge base
Knowledge knowledge = SimpleKnowledge.builder()
.embeddingModel(embeddingModel)
.embeddingStore(vectorStore)
.build();
2. Adding Documents¶
Use readers to process documents and add them to the knowledge base:
import io.agentscope.core.rag.model.Document;
import io.agentscope.core.rag.model.ReaderInput;
import io.agentscope.core.rag.reader.SplitStrategy;
import io.agentscope.core.rag.reader.TextReader;
// Create a text reader
TextReader reader = new TextReader(512, SplitStrategy.PARAGRAPH, 50);
// Read and chunk a document
String text = "AgentScope is a multi-agent framework...";
ReaderInput input = ReaderInput.fromString(text);
List<Document> documents = reader.read(input).block();
// Add to knowledge base
knowledge.addDocuments(documents).block();
3. Retrieving Knowledge¶
Query the knowledge base to retrieve relevant documents:
import io.agentscope.core.rag.model.Document;
import io.agentscope.core.rag.model.RetrieveConfig;
// Configure retrieval parameters
RetrieveConfig config = RetrieveConfig.builder()
.limit(3) // Return top 3 results
.scoreThreshold(0.5) // Minimum similarity score
.build();
// Retrieve documents
List<Document> results = knowledge.retrieve("What is AgentScope?", config).block();
for (Document doc : results) {
System.out.println("Score: " + doc.getScore());
System.out.println("Content: " + doc.getMetadata().getContent());
}
Cloud-hosted Knowledge Base (Bailian)¶
AgentScope supports the Alibaba Cloud Bailian Knowledge Base service, providing an enterprise-grade, cloud-hosted RAG solution. Unlike local knowledge bases, a Bailian knowledge base requires no local embedding model or vector store: document parsing, embedding, and retrieval are all handled by the cloud service.
Core Features¶
Zero Infrastructure: No need to deploy and maintain vector databases
Automatic Processing: Documents are automatically parsed, chunked, and embedded
Enterprise-grade Retrieval: Supports reranking and query rewriting
Multi-turn Conversations: Automatically leverages conversation history to improve retrieval accuracy
Structured/Unstructured Data: Supports various knowledge base types
Quick Start¶
1. Configure Bailian Connection¶
import io.agentscope.core.rag.integration.bailian.BailianConfig;
import io.agentscope.core.rag.integration.bailian.BailianKnowledge;
// Configure Bailian connection
BailianConfig config = BailianConfig.builder()
.accessKeyId(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"))
.accessKeySecret(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"))
.workspaceId("llm-xxx") // Your workspace ID
.indexId("mymxbdxxxx") // Your knowledge base index ID
.build();
// Create knowledge base instance
BailianKnowledge knowledge = BailianKnowledge.builder()
.config(config)
.build();
2. Configure Advanced Retrieval Options¶
Bailian supports rich retrieval configuration options:
import io.agentscope.core.rag.integration.bailian.RerankConfig;
import io.agentscope.core.rag.integration.bailian.RewriteConfig;
BailianConfig config = BailianConfig.builder()
.accessKeyId(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"))
.accessKeySecret(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"))
.workspaceId("llm-xxx")
.indexId("mymxbdxxxx")
// Configure dense vector retrieval
.denseSimilarityTopK(20) // Dense retrieval returns top 20
// Configure sparse vector retrieval (optional)
.sparseSimilarityTopK(10) // Sparse retrieval returns top 10
// Enable reranking
.enableReranking(true)
.rerankConfig(
RerankConfig.builder()
.modelName("gte-rerank-hybrid")
.rerankMinScore(0.3f) // Minimum reranking score
.rerankTopN(5) // Return top 5 results
.build())
// Enable query rewriting (multi-turn conversations)
.enableRewrite(true)
.rewriteConfig(
RewriteConfig.builder()
.modelName("conv-rewrite-qwen-1.8b")
.build())
.build();
3. Retrieve Documents¶
import io.agentscope.core.rag.model.RetrieveConfig;
import io.agentscope.core.rag.model.Document;
// Configure retrieval parameters
RetrieveConfig retrieveConfig = RetrieveConfig.builder()
.limit(5) // Return up to 5 documents
.scoreThreshold(0.3) // Minimum similarity score
.build();
// Retrieve documents
List<Document> results = knowledge.retrieve("What is RAG?", retrieveConfig).block();
for (Document doc : results) {
System.out.println("Score: " + doc.getScore());
System.out.println("Document ID: " + doc.getMetadata().getDocId());
System.out.println("Content: " + doc.getMetadata().getContent());
}
4. Multi-turn Retrieval with Conversation History¶
Bailian can leverage conversation history to improve retrieval effectiveness by automatically rewriting queries based on context:
import io.agentscope.core.message.Msg;
import io.agentscope.core.message.MsgRole;
// Prepare conversation history
List<Msg> conversationHistory = List.of(
Msg.builder().role(MsgRole.USER).textContent("What is AgentScope?").build(),
Msg.builder().role(MsgRole.ASSISTANT).textContent("AgentScope is a multi-agent framework...").build()
);
// Retrieval config with history
RetrieveConfig config = RetrieveConfig.builder()
.limit(5)
.scoreThreshold(0.3)
.conversationHistory(conversationHistory) // Add conversation history
.build();
// Query will be automatically rewritten to consider context
List<Document> results = knowledge.retrieve("What are its features?", config).block();
Integration with ReActAgent¶
In Agentic mode, the agent automatically extracts conversation history from its Memory and passes it to Bailian for context-aware retrieval:
import io.agentscope.core.ReActAgent;
import io.agentscope.core.rag.RAGMode;
import io.agentscope.core.tool.Toolkit;
import io.agentscope.core.rag.KnowledgeRetrievalTools;
// Create Bailian knowledge base
BailianKnowledge knowledge = BailianKnowledge.builder()
.config(bailianConfig)
.build();
// Use Agentic mode
ReActAgent agent = ReActAgent.builder()
.name("Assistant")
.sysPrompt("You are a helpful assistant with a knowledge retrieval tool. " +
"Use the retrieve_knowledge tool when you need information.")
.model(chatModel)
.toolkit(new Toolkit())
.knowledge(knowledge)
.ragMode(RAGMode.AGENTIC) // Agent autonomously decides when to retrieve
.retrieveConfig(
RetrieveConfig.builder()
.limit(5)
.scoreThreshold(0.3)
.build())
.build();
// Multi-turn conversations automatically leverage historical context
agent.call(Msg.builder().textContent("What is AgentScope?").build());
agent.call(Msg.builder().textContent("What models does it support?").build());
// The second query will leverage the first conversation's context to improve retrieval accuracy
Document Management¶
Note: Currently, document upload and management need to be done through the Bailian console. API-based document management will be supported in future releases.
Log in to Alibaba Cloud Bailian Platform
Create a knowledge base and upload documents
Obtain workspace ID and index ID
Use these IDs in your code for retrieval
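Once you have the IDs, a common pattern is to read them from environment variables rather than hard-coding them, using the BailianConfig builder shown above (the variable names below match the run example later on this page):
// Read credentials and knowledge base IDs from environment variables
BailianConfig config = BailianConfig.builder()
    .accessKeyId(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"))
    .accessKeySecret(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"))
    .workspaceId(System.getenv("BAILIAN_WORKSPACE_ID"))
    .indexId(System.getenv("BAILIAN_INDEX_ID"))
    .build();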
Bailian vs SimpleKnowledge¶
| Feature | SimpleKnowledge | BailianKnowledge |
|---|---|---|
| Deployment | Requires local embedding model and vector store | Cloud service, zero deployment |
| Document Processing | Write Reader code yourself | Upload via console, automatic processing |
| Retrieval Capabilities | Basic vector retrieval | Advanced retrieval (reranking, rewriting) |
| Scalability | Limited by local resources | Cloud service auto-scaling |
| Cost | Computing resource costs | Pay per use |
| Data Control | Full local control | Hosted in cloud |
| Multi-turn Conversations | Must be implemented manually | Automatically supported |
| Use Cases | Development, testing, small-scale | Production, enterprise, large-scale |
Complete Example¶
See the complete Bailian RAG example:
examples/src/main/java/io/agentscope/examples/BailianRAGExample.java
Run the example:
cd examples
# Set environment variables
export ALIBABA_CLOUD_ACCESS_KEY_ID="your-access-key-id"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="your-access-key-secret"
export BAILIAN_WORKSPACE_ID="your-workspace-id"
export BAILIAN_INDEX_ID="your-index-id"
mvn exec:java -Dexec.mainClass="io.agentscope.examples.BailianRAGExample"
Integrating with ReActAgent¶
AgentScope supports two integration modes for RAG with ReActAgent:
| Mode | Description | Advantages | Disadvantages |
|---|---|---|---|
| Generic Mode | Automatically retrieves and injects knowledge before each reasoning step | Simple, works with any LLM | Retrieves even when unnecessary |
| Agentic Mode | Agent decides when to retrieve using a tool | Flexible, only retrieves when needed | Requires strong reasoning capabilities |
Generic Mode¶
In Generic mode, knowledge is automatically retrieved and injected into the user’s message:
import io.agentscope.core.ReActAgent;
import io.agentscope.core.rag.RAGMode;
ReActAgent agent = ReActAgent.builder()
.name("Assistant")
.sysPrompt("You are a helpful assistant with access to a knowledge base.")
.model(chatModel)
.toolkit(new Toolkit())
// Enable Generic RAG mode
.knowledge(knowledge)
.ragMode(RAGMode.GENERIC)
.retrieveConfig(
RetrieveConfig.builder()
.limit(3)
.scoreThreshold(0.3)
.build())
.enableOnlyForUserQueries(true) // Only retrieve for user messages
.build();
// The agent will automatically retrieve knowledge for each query
agent.call(Msg.builder()
.name("user")
.textContent("What is AgentScope?")
.build());
How it works:
User sends a query
Knowledge base automatically retrieves relevant documents
Retrieved documents are prepended to the user’s message
Agent processes the enhanced message and responds
Agentic Mode¶
In Agentic mode, the agent has a retrieve_knowledge tool and decides when to use it:
ReActAgent agent = ReActAgent.builder()
.name("Agent")
.sysPrompt("You are a helpful assistant with a knowledge retrieval tool. " +
"Use the retrieve_knowledge tool when you need information.")
.model(chatModel)
.toolkit(new Toolkit())
// Enable Agentic RAG mode
.knowledge(knowledge)
.ragMode(RAGMode.AGENTIC)
.retrieveConfig(
RetrieveConfig.builder()
.limit(3)
.scoreThreshold(0.5)
.build())
.build();
// The agent decides when to retrieve
agent.call(Msg.builder()
.name("user")
.textContent("What is RAG?")
.build());
How it works:
User sends a query
Agent reasons and decides whether to retrieve knowledge
If needed, the agent calls retrieve_knowledge(query="...")
Retrieved documents are returned as tool results
Agent reasons again with the retrieved information
Reading Different Document Types¶
Text Documents¶
TextReader reader = new TextReader(
512, // Chunk size
SplitStrategy.PARAGRAPH, // Split by paragraph
50 // Overlap size
);
ReaderInput input = ReaderInput.fromString("Your text content...");
List<Document> docs = reader.read(input).block();
Supported split strategies:
SplitStrategy.CHARACTER: Split by character count
SplitStrategy.PARAGRAPH: Split by paragraphs (double newline)
SplitStrategy.SENTENCE: Split by sentences
SplitStrategy.TOKEN: Split by approximate token count
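For example, to split on sentence boundaries instead of paragraphs, only the strategy argument changes (a minimal sketch using the same constructor as above):
// Sentence-based splitting with the same chunk size and overlap
TextReader sentenceReader = new TextReader(512, SplitStrategy.SENTENCE, 50);
List<Document> sentenceDocs = sentenceReader.read(ReaderInput.fromString("Your text content...")).block();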
PDF Documents¶
import io.agentscope.core.rag.reader.PDFReader;
PDFReader reader = new PDFReader(512, SplitStrategy.PARAGRAPH, 50);
ReaderInput input = ReaderInput.fromString("/path/to/document.pdf");
List<Document> docs = reader.read(input).block();
Word Documents¶
import io.agentscope.core.rag.reader.WordReader;
import io.agentscope.core.rag.reader.TableFormat;
WordReader reader = new WordReader(
512, // Chunk size
SplitStrategy.PARAGRAPH, // Split strategy
50, // Overlap size
true, // Include images
true, // Separate tables as chunks
TableFormat.MARKDOWN // Table format (MARKDOWN or JSON)
);
ReaderInput input = ReaderInput.fromString("/path/to/document.docx");
List<Document> docs = reader.read(input).block();
Image Documents (Multimodal RAG)¶
import io.agentscope.core.rag.reader.ImageReader;
import io.agentscope.core.embedding.dashscope.DashScopeMultiModalEmbedding;
import io.agentscope.core.rag.store.InMemoryStore;
import io.agentscope.core.rag.store.VDBStoreBase;
// Create multimodal embedding model
EmbeddingModel embeddingModel = DashScopeMultiModalEmbedding.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.modelName("multimodal-embedding-one")
.dimensions(1024)
.build();
// Create vector store
VDBStoreBase vectorStore = InMemoryStore.builder()
.dimensions(1024)
.build();
// Create knowledge base with multimodal embedding
Knowledge knowledge = SimpleKnowledge.builder()
.embeddingModel(embeddingModel)
.embeddingStore(vectorStore)
.build();
// Read image
ImageReader reader = new ImageReader(false); // OCR disabled
ReaderInput input = ReaderInput.fromString("/path/to/image.jpg");
// or from URL
// ReaderInput input = ReaderInput.fromString("https://example.com/image.jpg");
List<Document> docs = reader.read(input).block();
knowledge.addDocuments(docs).block();
Vector Stores¶
AgentScope supports multiple vector store backends:
In-Memory Store¶
Fast, suitable for development and small datasets:
InMemoryStore store = InMemoryStore.builder()
.dimensions(1024)
.build();
Qdrant Store¶
Production-ready vector database with persistence:
import io.agentscope.core.rag.store.QdrantStore;
QdrantStore store = QdrantStore.builder()
.location("localhost:6334") // Qdrant server location
.collectionName("my_collection")
.dimensions(1024)
.apiKey("your-api-key") // Optional: for cloud
.useTransportLayerSecurity(true) // Enable TLS
.build();
Qdrant supports various storage backends via the location parameter:
:memory: - In-memory storage
path/to/db - Local file storage
localhost:6334 - Remote server (gRPC)
http://localhost:6333 - Remote server (REST)
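For example, an embedded in-memory Qdrant instance is convenient for tests (a sketch using the builder shown above; no API key or TLS required):
// Embedded in-memory Qdrant, handy for tests and experiments
QdrantStore testStore = QdrantStore.builder()
    .location(":memory:")
    .collectionName("test_collection")
    .dimensions(1024)
    .build();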
Customizing RAG Components¶
AgentScope encourages customization of RAG components. You can extend the following base classes:
| Base Class | Description | Abstract Methods |
|---|---|---|
| Reader | Base for document readers | read(ReaderInput), getSupportedFormats() |
| VDBStoreBase | Base for vector stores | |
| Knowledge | Base for knowledge implementations | |
Custom Reader Example¶
import io.agentscope.core.rag.model.Document;
import io.agentscope.core.rag.model.ReaderInput;
import io.agentscope.core.rag.reader.Reader;
import io.agentscope.core.rag.reader.ReaderException; // assumed package; adjust to your version
import reactor.core.publisher.Mono;
import java.util.List;
public class CustomReader implements Reader {
    @Override
    public Mono<List<Document>> read(ReaderInput input) throws ReaderException {
        return Mono.fromCallable(() -> {
            // Your custom reading logic: extract content, chunk it, wrap it as Documents
            String content = processInput(input);
            List<String> chunks = chunkContent(content);
            return createDocuments(chunks);
        });
    }
    @Override
    public List<String> getSupportedFormats() {
        return List.of("custom", "fmt");
    }
    private String processInput(ReaderInput input) {
        // Extract raw text from the input (replace with your own logic)
        return input.toString(); // placeholder
    }
    private List<String> chunkContent(String content) {
        // Split the content into chunks (replace with your own logic)
        return List.of(content); // placeholder: a single chunk
    }
    private List<Document> createDocuments(List<String> chunks) {
        // Create Document objects with metadata (doc_id, chunk_id, total_chunks)
        // ...
        return List.of(); // placeholder
    }
}
Best Practices¶
Chunk Size: Choose chunk size based on your model’s context window and use case. Typical values: 256-1024 characters.
Overlap: Use 10-20% overlap to maintain context across chunks.
Score Threshold: Start with 0.3-0.5 and adjust based on retrieval quality.
Top-K: Retrieve 3-5 documents initially, then adjust based on context window limits (a consolidated configuration sketch follows this list).
Mode Selection:
Use Generic mode for: Simple Q&A, consistent retrieval patterns, weaker LLMs
Use Agentic mode for: Complex tasks, selective retrieval, strong LLMs
Vector Store Selection:
Use InMemoryStore for: Development, testing, small datasets (<10K documents)
Use QdrantStore for: Production, large datasets, persistence required
Embedding Models:
Use text embedding for text-only documents
Use multimodal embedding for mixed content (text + images)
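As a starting point, these guidelines translate into the following configuration, reusing the builders shown earlier on this page (a sketch; tune the values for your own data and model):
// Reader: ~512-character chunks with roughly 10% overlap
TextReader reader = new TextReader(512, SplitStrategy.PARAGRAPH, 50);
// Retrieval: top 3 documents with a moderate score threshold
RetrieveConfig retrieveConfig = RetrieveConfig.builder()
    .limit(3)
    .scoreThreshold(0.4)
    .build();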
Complete Example¶
See the full RAG example at:
examples/src/main/java/io/agentscope/examples/RAGExample.java
Run the example:
cd examples
mvn exec:java -Dexec.mainClass="io.agentscope.examples.RAGExample"