Large Language Models (LLMs) are powerful, but they have a critical limitation: they only know what they were trained on. Ask them about your company’s internal documentation, recent product updates, or proprietary processes, and they’ll either hallucinate or admit ignorance. Retrieval-Augmented Generation (RAG) solves this by combining the reasoning power of LLMs with real-time access to your own data.
In this deep-dive, we’ll explore how to implement production-ready RAG systems using Spring AI, covering everything from document ingestion to advanced retrieval strategies.
What is RAG and Why Does It Matter?
RAG is an architectural pattern that enhances LLM responses by retrieving relevant information from a knowledge base before generating answers. Instead of relying solely on the model’s training data, RAG systems:
- Retrieve relevant documents from a vector database
- Augment the prompt with retrieved context
- Generate a response grounded in actual data
┌─────────────────────────────────────────────────────────────────────┐
│ RAG Pipeline │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────────┐ ┌───────────────────────┐ │
│ │ User │ │ Embedding │ │ Vector Database │ │
│ │ Query │────►│ Model │────►│ (Similarity Search) │ │
│ └──────────┘ └──────────────┘ └───────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Retrieved Context Documents │ │
│ └──────────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Augmented Prompt = User Query + Retrieved Context │ │
│ └──────────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ LLM (Generate) │ │
│ └──────────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Grounded Response with Citations │ │
│ └──────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Benefits of RAG
- Current Information: Access data that didn’t exist when the model was trained
- Reduced Hallucination: Responses are grounded in actual documents
- Source Attribution: Cite specific documents for transparency
- Cost Efficiency: Smaller models can perform well with good retrieval
- Data Privacy: Keep sensitive data in your own infrastructure
Spring AI RAG Architecture
Spring AI provides comprehensive support for building RAG systems with familiar Spring patterns. Let’s explore the key components.
Project Dependencies
<dependencies>
<!-- Spring AI Core -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
<version>1.0.0</version>
</dependency>
<!-- Vector Store - Choose one -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
<version>1.0.0</version>
</dependency>
<!-- Document Readers -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-pdf-document-reader</artifactId>
<version>1.0.0</version>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-tika-document-reader</artifactId>
<version>1.0.0</version>
</dependency>
</dependencies>
Configuration
spring:
ai:
openai:
api-key: ${OPENAI_API_KEY}
embedding:
options:
model: text-embedding-3-small
chat:
options:
model: gpt-4-turbo-preview
temperature: 0.2
vectorstore:
pgvector:
index-type: HNSW
distance-type: COSINE_DISTANCE
dimensions: 1536
datasource:
url: jdbc:postgresql://localhost:5432/ragdb
username: ${DB_USERNAME}
password: ${DB_PASSWORD}
Document Ingestion Pipeline
The first step in RAG is getting your documents into the vector database. This involves reading, chunking, embedding, and storing documents.
Document Reader Service
@Service
public class DocumentIngestionService {
private final VectorStore vectorStore;
private final EmbeddingModel embeddingModel;
private final DocumentTransformer documentTransformer;
public DocumentIngestionService(VectorStore vectorStore,
EmbeddingModel embeddingModel) {
this.vectorStore = vectorStore;
this.embeddingModel = embeddingModel;
this.documentTransformer = new TokenTextSplitter();
}
/**
* Ingest a PDF document into the vector store
*/
public IngestionResult ingestPdf(Resource pdfResource,
Map<String, Object> metadata) {
// Read PDF
PagePdfDocumentReader reader = new PagePdfDocumentReader(
pdfResource,
PdfDocumentReaderConfig.builder()
.withPageExtractedTextFormatter(
new ExtractedTextFormatter.Builder()
.withNumberOfBottomTextLinesToDelete(3)
.withNumberOfTopPagesToSkipBeforeDelete(1)
.build()
)
.withPagesPerDocument(1)
.build()
);
List<Document> documents = reader.get();
// Add metadata
documents.forEach(doc -> doc.getMetadata().putAll(metadata));
// Process and store
return processAndStore(documents);
}
/**
* Ingest various document types using Apache Tika
*/
public IngestionResult ingestDocument(Resource resource,
String contentType,
Map<String, Object> metadata) {
TikaDocumentReader reader = new TikaDocumentReader(resource);
List<Document> documents = reader.get();
// Add metadata
documents.forEach(doc -> {
doc.getMetadata().putAll(metadata);
doc.getMetadata().put("contentType", contentType);
doc.getMetadata().put("ingestedAt", Instant.now().toString());
});
return processAndStore(documents);
}
private IngestionResult processAndStore(List<Document> documents) {
// Split into chunks
List<Document> chunks = documentTransformer.apply(documents);
// Store in vector database (embedding happens automatically)
vectorStore.add(chunks);
return new IngestionResult(
documents.size(),
chunks.size(),
Instant.now()
);
}
}
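The IngestionResult returned by processAndStore isn't shown above. A minimal sketch, assuming it only needs to report what was ingested (the field names are my choice, not from the original):

```java
import java.time.Instant;

/**
 * Summary of one ingestion run: how many source documents were read,
 * how many chunks were produced, and when the run finished.
 */
public record IngestionResult(int documentCount, int chunkCount, Instant ingestedAt) {

    /** Average number of chunks produced per source document. */
    public double chunksPerDocument() {
        return documentCount == 0 ? 0.0 : (double) chunkCount / documentCount;
    }
}
```

A record keeps this immutable and gives you equals/hashCode for free, which is handy if you log or cache ingestion results.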
Advanced Text Splitting
Proper chunking is crucial for retrieval quality. Here’s a custom splitter that respects document structure:
@Component
public class SemanticTextSplitter implements DocumentTransformer {
private static final int DEFAULT_CHUNK_SIZE = 1000;
private static final int DEFAULT_OVERLAP = 200;
private final int chunkSize;
private final int overlap;
public SemanticTextSplitter() {
this(DEFAULT_CHUNK_SIZE, DEFAULT_OVERLAP);
}
public SemanticTextSplitter(int chunkSize, int overlap) {
this.chunkSize = chunkSize;
this.overlap = overlap;
}
@Override
public List<Document> apply(List<Document> documents) {
return documents.stream()
.flatMap(doc -> splitDocument(doc).stream())
.collect(Collectors.toList());
}
private List<Document> splitDocument(Document document) {
String content = document.getContent();
List<Document> chunks = new ArrayList<>();
// First, try to split by semantic boundaries
List<String> sections = splitBySections(content);
for (int sectionIndex = 0; sectionIndex < sections.size(); sectionIndex++) {
String section = sections.get(sectionIndex);
// If section is small enough, keep as single chunk
if (section.length() <= chunkSize) {
chunks.add(createChunk(document, section, sectionIndex, 0));
continue;
}
// Split large sections by paragraphs with overlap
List<String> paragraphChunks = splitWithOverlap(section);
for (int chunkIndex = 0; chunkIndex < paragraphChunks.size(); chunkIndex++) {
chunks.add(createChunk(document, paragraphChunks.get(chunkIndex),
sectionIndex, chunkIndex));
}
}
return chunks;
}
private List<String> splitBySections(String content) {
// Split by markdown headers or double newlines.
// (?m) enables MULTILINE mode so ^ matches at each line start;
// without it, the header lookahead only matches the very start of the input.
String[] sections = content.split("(?m)(?=^#{1,3}\\s)|\\n\\n(?=[A-Z])");
return Arrays.stream(sections)
.filter(s -> !s.isBlank())
.collect(Collectors.toList());
}
private List<String> splitWithOverlap(String text) {
List<String> chunks = new ArrayList<>();
String[] sentences = text.split("(?<=[.!?])\\s+");
StringBuilder currentChunk = new StringBuilder();
StringBuilder overlapBuffer = new StringBuilder();
for (String sentence : sentences) {
if (currentChunk.length() + sentence.length() > chunkSize) {
if (currentChunk.length() > 0) {
chunks.add(currentChunk.toString().trim());
// Start new chunk with overlap
currentChunk = new StringBuilder(overlapBuffer.toString());
overlapBuffer = new StringBuilder();
}
}
currentChunk.append(sentence).append(" ");
overlapBuffer.append(sentence).append(" ");
// Keep overlap buffer trimmed
while (overlapBuffer.length() > overlap) {
int spaceIndex = overlapBuffer.indexOf(" ", 1);
if (spaceIndex > 0) {
overlapBuffer.delete(0, spaceIndex + 1);
} else {
break;
}
}
}
if (currentChunk.length() > 0) {
chunks.add(currentChunk.toString().trim());
}
return chunks;
}
private Document createChunk(Document source, String content,
int sectionIndex, int chunkIndex) {
Map<String, Object> metadata = new HashMap<>(source.getMetadata());
metadata.put("sectionIndex", sectionIndex);
metadata.put("chunkIndex", chunkIndex);
metadata.put("sourceId", source.getId());
return new Document(content, metadata);
}
}
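The section-splitting regex is easy to get subtly wrong: in Java, `^` only matches the very start of the input unless MULTILINE mode (`(?m)`) is on, so the header lookahead silently never fires mid-document. A standalone check of the intended behavior on a toy markdown string:

```java
import java.util.Arrays;
import java.util.List;

public class SectionSplitDemo {

    /** Same splitting rule as splitBySections: markdown headers or blank lines. */
    static List<String> splitBySections(String content) {
        // (?m) makes ^ match at every line start, so each header opens a new section
        String[] sections = content.split("(?m)(?=^#{1,3}\\s)|\\n\\n(?=[A-Z])");
        return Arrays.stream(sections).filter(s -> !s.isBlank()).toList();
    }

    public static void main(String[] args) {
        String doc = "# Intro\nRAG combines retrieval and generation.\n"
                + "# Setup\nAdd the starter dependency.";
        // Two sections, one per header
        splitBySections(doc).forEach(s -> System.out.println("---\n" + s));
    }
}
```

Since Java 8, a zero-width match at the start of the input no longer produces a leading empty string, so a document that begins with a header splits cleanly.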
Building the RAG Service
Now let’s implement the core RAG functionality:
@Service
public class RagService {
private final VectorStore vectorStore;
private final ChatClient chatClient;
private final EmbeddingModel embeddingModel;
public RagService(VectorStore vectorStore,
ChatClient.Builder chatClientBuilder,
EmbeddingModel embeddingModel) {
this.vectorStore = vectorStore;
this.chatClient = chatClientBuilder.build();
this.embeddingModel = embeddingModel;
}
public RagResponse query(String question) {
return query(question, RagOptions.defaults());
}
public RagResponse query(String question, RagOptions options) {
// 1. Retrieve relevant documents
SearchRequest searchRequest = SearchRequest.query(question)
.withTopK(options.getTopK())
.withSimilarityThreshold(options.getSimilarityThreshold())
.withFilterExpression(options.getFilterExpression());
List<Document> relevantDocs = vectorStore.similaritySearch(searchRequest);
if (relevantDocs.isEmpty()) {
return RagResponse.noContext(
"I couldn't find any relevant information to answer your question."
);
}
// 2. Build augmented prompt
String context = buildContext(relevantDocs);
String augmentedPrompt = buildPrompt(question, context, options);
// 3. Generate response
ChatResponse response = chatClient.prompt()
.user(augmentedPrompt)
.call()
.chatResponse();
// 4. Build response with sources
return RagResponse.builder()
.answer(response.getResult().getOutput().getContent())
.sources(extractSources(relevantDocs))
.tokensUsed(response.getMetadata().getUsage().getTotalTokens())
.build();
}
private String buildContext(List<Document> documents) {
StringBuilder context = new StringBuilder();
for (int i = 0; i < documents.size(); i++) {
Document doc = documents.get(i);
context.append(String.format("[Document %d]%n", i + 1));
context.append(String.format("Source: %s%n",
doc.getMetadata().getOrDefault("source", "Unknown")));
context.append(String.format("Content: %s%n%n", doc.getContent()));
}
return context.toString();
}
private String buildPrompt(String question, String context, RagOptions options) {
return String.format("""
You are a helpful assistant that answers questions based on the provided context.
## Instructions
- Answer the question based ONLY on the provided context
- If the context doesn't contain enough information, say so
- Cite your sources using [Document N] notation
- Be concise but thorough
%s
## Context
%s
## Question
%s
## Answer
""",
options.getAdditionalInstructions(),
context,
question
);
}
private List<SourceReference> extractSources(List<Document> documents) {
return documents.stream()
.map(doc -> new SourceReference(
(String) doc.getMetadata().getOrDefault("source", "Unknown"),
(String) doc.getMetadata().getOrDefault("title", "Untitled"),
doc.getMetadata().containsKey("page") ?
((Number) doc.getMetadata().get("page")).intValue() : null
))
.distinct()
.collect(Collectors.toList());
}
}
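RagService relies on a RagOptions holder that isn't shown. A minimal sketch, assuming defaults that match the controller later in this post (topK 5, threshold 0.7) and using a plain String for the filter expression (your vector store may want its own filter type instead):

```java
public class RagOptions {
    private final int topK;
    private final double similarityThreshold;
    private final String filterExpression;        // optional metadata filter
    private final String additionalInstructions;  // extra prompt instructions

    private RagOptions(Builder b) {
        this.topK = b.topK;
        this.similarityThreshold = b.similarityThreshold;
        this.filterExpression = b.filterExpression;
        this.additionalInstructions = b.additionalInstructions;
    }

    public static RagOptions defaults() { return builder().build(); }
    public static Builder builder() { return new Builder(); }

    public int getTopK() { return topK; }
    public double getSimilarityThreshold() { return similarityThreshold; }
    public String getFilterExpression() { return filterExpression; }
    public String getAdditionalInstructions() { return additionalInstructions; }

    public static class Builder {
        private int topK = 5;
        private double similarityThreshold = 0.7;
        private String filterExpression = null;
        private String additionalInstructions = "";

        public Builder topK(int topK) { this.topK = topK; return this; }
        public Builder similarityThreshold(double t) { this.similarityThreshold = t; return this; }
        public Builder filterExpression(String f) { this.filterExpression = f; return this; }
        public Builder additionalInstructions(String a) { this.additionalInstructions = a; return this; }
        public RagOptions build() { return new RagOptions(this); }
    }
}
```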
Advanced Retrieval Strategies
Basic similarity search works well, but production systems often need more sophisticated retrieval.
Hybrid Search
Combine semantic and keyword search for better results:
@Service
public class HybridSearchService {
private final VectorStore vectorStore;
private final ElasticsearchClient elasticsearchClient;
public HybridSearchService(VectorStore vectorStore,
ElasticsearchClient elasticsearchClient) {
this.vectorStore = vectorStore;
this.elasticsearchClient = elasticsearchClient;
}
public List<Document> hybridSearch(String query, int topK) {
// Semantic search
List<Document> semanticResults = vectorStore.similaritySearch(
SearchRequest.query(query).withTopK(topK * 2)
);
// Keyword search (BM25)
List<Document> keywordResults = keywordSearch(query, topK * 2);
// Reciprocal Rank Fusion
return reciprocalRankFusion(semanticResults, keywordResults, topK);
}
private List<Document> keywordSearch(String query, int topK) {
try {
SearchResponse<Document> response = elasticsearchClient.search(s -> s
.index("documents")
.query(q -> q
.multiMatch(m -> m
.query(query)
.fields("content^2", "title^3", "metadata.*")
.fuzziness("AUTO")
)
)
.size(topK),
Document.class
);
return response.hits().hits().stream()
.map(Hit::source)
.collect(Collectors.toList());
} catch (IOException e) {
// The Elasticsearch client throws a checked IOException;
// degrade gracefully to semantic-only results rather than failing the query
return List.of();
}
}
private List<Document> reciprocalRankFusion(List<Document> list1,
List<Document> list2,
int topK) {
int k = 60; // Constant for RRF
Map<String, Double> scores = new HashMap<>();
Map<String, Document> documents = new HashMap<>();
// Score from first list
for (int i = 0; i < list1.size(); i++) {
Document doc = list1.get(i);
String id = doc.getId();
scores.merge(id, 1.0 / (k + i + 1), Double::sum);
documents.put(id, doc);
}
// Score from second list
for (int i = 0; i < list2.size(); i++) {
Document doc = list2.get(i);
String id = doc.getId();
scores.merge(id, 1.0 / (k + i + 1), Double::sum);
documents.put(id, doc);
}
// Sort by combined score and return top K
return scores.entrySet().stream()
.sorted(Map.Entry.<String, Double>comparingByValue().reversed())
.limit(topK)
.map(e -> documents.get(e.getKey()))
.collect(Collectors.toList());
}
}
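To build intuition for reciprocal rank fusion, here is the same scoring worked through on a toy example, standalone with plain string IDs instead of Documents:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RrfDemo {

    /** Fuse two ranked ID lists with RRF: score(id) = sum over lists of 1 / (k + rank + 1). */
    static List<String> fuse(List<String> list1, List<String> list2, int topK) {
        int k = 60; // standard RRF constant
        Map<String, Double> scores = new HashMap<>();
        for (int i = 0; i < list1.size(); i++) scores.merge(list1.get(i), 1.0 / (k + i + 1), Double::sum);
        for (int i = 0; i < list2.size(); i++) scores.merge(list2.get(i), 1.0 / (k + i + 1), Double::sum);
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return fused.subList(0, Math.min(topK, fused.size()));
    }

    public static void main(String[] args) {
        // "B" appears in both lists, so it beats "A" even though "A" tops the first list:
        // B = 1/62 + 1/61, C = 1/63 + 1/62, A = 1/61, D = 1/63
        System.out.println(fuse(List.of("A", "B", "C"), List.of("B", "C", "D"), 4));
        // prints [B, C, A, D]
    }
}
```

This is the core property of RRF: documents that both retrievers agree on rise to the top, without having to calibrate semantic and BM25 scores against each other.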
Query Expansion with HyDE
Hypothetical Document Embeddings (HyDE) generates a hypothetical answer to improve retrieval:
@Service
public class HydeQueryExpander {
private final ChatClient chatClient;
private final VectorStore vectorStore;
public HydeQueryExpander(ChatClient chatClient, VectorStore vectorStore) {
this.chatClient = chatClient;
this.vectorStore = vectorStore;
}
public List<Document> searchWithHyde(String query, int topK) {
// Generate hypothetical answer
String hypotheticalAnswer = generateHypotheticalAnswer(query);
// Combine query and hypothetical answer for embedding
String expandedQuery = query + "\n\n" + hypotheticalAnswer;
// Search with expanded query
return vectorStore.similaritySearch(
SearchRequest.query(expandedQuery).withTopK(topK)
);
}
private String generateHypotheticalAnswer(String query) {
String prompt = String.format("""
Write a short, factual paragraph that would answer the following question.
Do not include phrases like "According to..." or "Based on...".
Just write the content as if it were from an authoritative source.
Question: %s
Answer:
""", query);
return chatClient.prompt()
.user(prompt)
.call()
.content();
}
}
Multi-Query Retrieval
Generate multiple query variations to improve recall:
@Service
public class MultiQueryRetriever {
private final ChatClient chatClient;
private final VectorStore vectorStore;
private final EmbeddingModel embeddingModel; // used by rerank() below
public MultiQueryRetriever(ChatClient chatClient,
VectorStore vectorStore,
EmbeddingModel embeddingModel) {
this.chatClient = chatClient;
this.vectorStore = vectorStore;
this.embeddingModel = embeddingModel;
}
public List<Document> retrieve(String originalQuery, int topK) {
// Generate query variations
List<String> queryVariations = generateQueryVariations(originalQuery);
// Search with each variation
Set<Document> allResults = new LinkedHashSet<>();
for (String query : queryVariations) {
List<Document> results = vectorStore.similaritySearch(
SearchRequest.query(query).withTopK(topK)
);
allResults.addAll(results);
}
// Re-rank and return top K
return rerank(new ArrayList<>(allResults), originalQuery, topK);
}
private List<String> generateQueryVariations(String query) {
String prompt = String.format("""
Generate 3 different versions of the following query for a search system.
Each version should capture the same intent but use different words or phrasings.
Return only the queries, one per line.
Original query: %s
Variations:
""", query);
String response = chatClient.prompt()
.user(prompt)
.call()
.content();
List<String> variations = new ArrayList<>();
variations.add(query); // Include original
Arrays.stream(response.split("\n"))
.map(String::trim)
.map(s -> s.replaceFirst("^\\d+[.)]\\s*", "")) // strip "1." / "2)" numbering LLMs often add
.filter(s -> !s.isEmpty())
.forEach(variations::add);
return variations.stream()
.distinct()
.limit(4)
.collect(Collectors.toList());
}
private List<Document> rerank(List<Document> documents,
String query, int topK) {
// Simple reranking using embedding similarity
float[] queryEmbedding = embeddingModel.embed(query);
return documents.stream()
.sorted((d1, d2) -> {
float sim1 = cosineSimilarity(queryEmbedding,
embeddingModel.embed(d1.getContent()));
float sim2 = cosineSimilarity(queryEmbedding,
embeddingModel.embed(d2.getContent()));
return Float.compare(sim2, sim1);
})
.limit(topK)
.collect(Collectors.toList());
}
}
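The rerank method above (and the evaluation service later in this post) calls a cosineSimilarity helper that isn't shown. A straightforward implementation, presented standalone here; in practice it would live as a private static method in the classes that call it:

```java
public final class VectorMath {

    private VectorMath() {}

    /** Cosine similarity between two equal-length vectors, in [-1, 1]. */
    public static float cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0f; // degenerate zero vector: define similarity as 0
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }
}
```

Note that reranking by re-embedding every candidate, as the code above does, issues one embedding call per document per comparison; caching document embeddings or using a dedicated cross-encoder reranker is usually cheaper.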
Conversation Memory with RAG
Maintain context across multiple turns:
@Service
public class ConversationalRagService {
private final RagService ragService;
private final ChatClient chatClient;
private final ConversationMemoryStore memoryStore;
public ConversationalRagService(RagService ragService,
ChatClient chatClient,
ConversationMemoryStore memoryStore) {
this.ragService = ragService;
this.chatClient = chatClient;
this.memoryStore = memoryStore;
}
public RagResponse chat(String sessionId, String userMessage) {
// Get conversation history
List<Message> history = memoryStore.getHistory(sessionId);
// Determine if this is a follow-up question
String standaloneQuestion = reformulateQuestion(history, userMessage);
// Perform RAG with reformulated question
RagResponse ragResponse = ragService.query(standaloneQuestion);
// Store in memory
memoryStore.addMessage(sessionId, MessageType.USER, userMessage);
memoryStore.addMessage(sessionId, MessageType.ASSISTANT,
ragResponse.getAnswer());
return ragResponse;
}
private String reformulateQuestion(List<Message> history, String question) {
if (history.isEmpty()) {
return question;
}
String conversationContext = history.stream()
.map(m -> m.getType() + ": " + m.getContent())
.collect(Collectors.joining("\n"));
String prompt = String.format("""
Given the conversation history and the follow-up question,
reformulate the question to be a standalone question that captures
all necessary context.
Conversation history:
%s
Follow-up question: %s
Standalone question:
""", conversationContext, question);
return chatClient.prompt()
.user(prompt)
.call()
.content()
.trim();
}
}
Evaluation and Monitoring
RAG Evaluation Metrics
@Service
public class RagEvaluationService {
private final ChatClient chatClient;
private final EmbeddingModel embeddingModel;
public RagEvaluationService(ChatClient chatClient, EmbeddingModel embeddingModel) {
this.chatClient = chatClient;
this.embeddingModel = embeddingModel;
}
public EvaluationResult evaluate(String question,
String generatedAnswer,
List<Document> retrievedDocs,
String groundTruth) {
return EvaluationResult.builder()
.faithfulness(evaluateFaithfulness(generatedAnswer, retrievedDocs))
.relevance(evaluateRelevance(question, generatedAnswer))
.contextRelevance(evaluateContextRelevance(question, retrievedDocs))
.answerSimilarity(computeAnswerSimilarity(generatedAnswer, groundTruth))
.build();
}
private double evaluateFaithfulness(String answer, List<Document> docs) {
String context = docs.stream()
.map(Document::getContent)
.collect(Collectors.joining("\n"));
String prompt = String.format("""
Evaluate if the following answer is faithful to the given context.
An answer is faithful if all claims in it can be verified from the context.
Context: %s
Answer: %s
Return a score from 0 to 1, where 1 means completely faithful.
Return ONLY the number.
""", context, answer);
String response = chatClient.prompt()
.user(prompt)
.call()
.content();
return Double.parseDouble(response.trim());
}
private double evaluateRelevance(String question, String answer) {
String prompt = String.format("""
Evaluate if the following answer is relevant to the question.
Question: %s
Answer: %s
Return a score from 0 to 1, where 1 means highly relevant.
Return ONLY the number.
""", question, answer);
String response = chatClient.prompt()
.user(prompt)
.call()
.content();
return Double.parseDouble(response.trim());
}
private double evaluateContextRelevance(String question, List<Document> docs) {
if (docs.isEmpty()) {
return 0.0;
}
// Embed the question once, not once per document
float[] questionEmbed = embeddingModel.embed(question);
double totalScore = 0;
for (Document doc : docs) {
float[] docEmbed = embeddingModel.embed(doc.getContent());
totalScore += cosineSimilarity(questionEmbed, docEmbed);
}
return totalScore / docs.size();
}
private double computeAnswerSimilarity(String generated, String groundTruth) {
float[] genEmbed = embeddingModel.embed(generated);
float[] truthEmbed = embeddingModel.embed(groundTruth);
return cosineSimilarity(genEmbed, truthEmbed);
}
}
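LLM judges don't always return a bare number, so the Double.parseDouble calls above can throw on replies like "Score: 0.8". A defensive parser (a hypothetical helper, not part of the service above) that extracts the first number and clamps it to [0, 1]:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScoreParser {

    private static final Pattern NUMBER = Pattern.compile("\\d*\\.?\\d+");

    /** Extract the first number from an LLM reply, clamped to [0, 1]; fallback if none found. */
    static double parseScore(String response, double fallback) {
        Matcher m = NUMBER.matcher(response);
        if (!m.find()) {
            return fallback;
        }
        double score = Double.parseDouble(m.group());
        return Math.max(0.0, Math.min(1.0, score));
    }
}
```

Choosing a sensible fallback matters: for faithfulness a conservative default (e.g. 0) avoids silently passing answers the judge failed to score.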
Observability
@Configuration
public class RagObservabilityConfig {
@Bean
public MeterRegistryCustomizer<MeterRegistry> ragMetrics() {
return registry -> {
// Tag every metric from this service for easy filtering in dashboards
registry.config().commonTags("component", "rag");
// Note: the VectorStore interface exposes no portable count() method.
// If your store does (e.g. a SQL COUNT on the pgvector table),
// register a Gauge for the document count here.
};
}
}
@Aspect
@Component
public class RagMetricsAspect {
private final MeterRegistry meterRegistry;
public RagMetricsAspect(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
}
@Around("execution(* com.example.rag.RagService.query(..))")
public Object measureQuery(ProceedingJoinPoint joinPoint) throws Throwable {
Timer.Sample sample = Timer.start(meterRegistry);
try {
Object result = joinPoint.proceed();
RagResponse response = (RagResponse) result;
// Record metrics
meterRegistry.counter("rag.queries.total").increment();
meterRegistry.counter("rag.sources.retrieved")
.increment(response.getSources().size());
meterRegistry.counter("rag.tokens.used")
.increment(response.getTokensUsed());
sample.stop(Timer.builder("rag.query.duration")
.tag("status", "success")
.register(meterRegistry));
return result;
} catch (Exception e) {
sample.stop(Timer.builder("rag.query.duration")
.tag("status", "error")
.register(meterRegistry));
meterRegistry.counter("rag.queries.errors",
"exception", e.getClass().getSimpleName()).increment();
throw e;
}
}
}
REST API for RAG
@RestController
@RequestMapping("/api/rag")
public class RagController {
private final RagService ragService;
private final ConversationalRagService conversationalRagService;
private final DocumentIngestionService ingestionService;
public RagController(RagService ragService,
ConversationalRagService conversationalRagService,
DocumentIngestionService ingestionService) {
this.ragService = ragService;
this.conversationalRagService = conversationalRagService;
this.ingestionService = ingestionService;
}
@PostMapping("/query")
public ResponseEntity<RagResponse> query(@RequestBody QueryRequest request) {
RagOptions options = RagOptions.builder()
.topK(request.getTopK() != null ? request.getTopK() : 5)
.similarityThreshold(request.getThreshold() != null ?
request.getThreshold() : 0.7)
.filterExpression(request.getFilter())
.build();
RagResponse response = ragService.query(request.getQuestion(), options);
return ResponseEntity.ok(response);
}
@PostMapping("/chat/{sessionId}")
public ResponseEntity<RagResponse> chat(
@PathVariable String sessionId,
@RequestBody ChatRequest request) {
RagResponse response = conversationalRagService.chat(
sessionId,
request.getMessage()
);
return ResponseEntity.ok(response);
}
@PostMapping("/ingest")
public ResponseEntity<IngestionResult> ingestDocument(
@RequestParam("file") MultipartFile file,
@RequestParam(required = false) String source,
@RequestParam(required = false) String category) {
try {
Resource resource = new ByteArrayResource(file.getBytes());
Map<String, Object> metadata = new HashMap<>();
metadata.put("source", source != null ? source : file.getOriginalFilename());
metadata.put("category", category);
metadata.put("originalFilename", file.getOriginalFilename());
IngestionResult result = ingestionService.ingestDocument(
resource,
file.getContentType(),
metadata
);
return ResponseEntity.ok(result);
} catch (IOException e) {
return ResponseEntity.badRequest().build();
}
}
}
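The request DTOs bound by the controller aren't shown. Minimal sketches with the getter names the controller expects; the exact field set is an assumption:

```java
public class QueryRequest {
    private String question;
    private Integer topK;      // optional; controller defaults to 5
    private Double threshold;  // optional; controller defaults to 0.7
    private String filter;     // optional metadata filter expression

    public String getQuestion() { return question; }
    public void setQuestion(String question) { this.question = question; }
    public Integer getTopK() { return topK; }
    public void setTopK(Integer topK) { this.topK = topK; }
    public Double getThreshold() { return threshold; }
    public void setThreshold(Double threshold) { this.threshold = threshold; }
    public String getFilter() { return filter; }
    public void setFilter(String filter) { this.filter = filter; }
}

class ChatRequest {
    private String message;

    public String getMessage() { return message; }
    public void setMessage(String message) { this.message = message; }
}
```

The optional fields are boxed types (Integer, Double) rather than primitives so that Jackson leaves them null when the client omits them, which is what the controller's null checks rely on.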
Best Practices
1. Chunk Size Optimization
// Experiment with different chunk sizes for your use case
// General guidelines:
// - 256-512 tokens: Better for precise factual retrieval
// - 512-1024 tokens: Good balance for most use cases
// - 1024-2048 tokens: Better for complex topics requiring more context
@Configuration
public class ChunkingConfig {
@Bean
public DocumentTransformer textSplitter(
@Value("${rag.chunk.size:800}") int chunkSize,
@Value("${rag.chunk.min-chars:350}") int minChunkSizeChars) {
// TokenTextSplitter splits on token counts and does not support overlap
// or custom separators; use a custom splitter (like the SemanticTextSplitter
// above) if you need overlapping chunks.
// Args: chunk size, min chunk chars, min chunk length to embed,
// max chunks, keep separator
return new TokenTextSplitter(chunkSize, minChunkSizeChars, 5, 10000, true);
}
}
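The chunk-size guidelines above are given in tokens, while most Java string handling works in characters. A crude heuristic for translating between the two (roughly 4 characters per token for English prose; an approximation, not a real tokenizer):

```java
public class TokenEstimate {

    private static final double CHARS_PER_TOKEN = 4.0; // rough average for English text

    /** Approximate token count for a piece of English text. */
    static int estimateTokens(String text) {
        return (int) Math.ceil(text.length() / CHARS_PER_TOKEN);
    }

    /** Approximate character budget for a target chunk size in tokens. */
    static int charBudget(int tokens) {
        return (int) (tokens * CHARS_PER_TOKEN);
    }

    public static void main(String[] args) {
        // An 800-token chunk is roughly 3200 characters of English prose
        System.out.println(charBudget(800));
    }
}
```

For anything where the budget actually matters (staying under an embedding model's input limit, for instance), use the model's real tokenizer rather than this estimate.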
2. Metadata Enrichment
@Component
public class MetadataEnricher implements DocumentTransformer {
private final ChatClient chatClient;
@Override
public List<Document> apply(List<Document> documents) {
return documents.stream()
.map(this::enrichWithMetadata)
.toList();
}
private Document enrichWithMetadata(Document doc) {
// Generate summary
String summary = generateSummary(doc.getContent());
// Extract keywords
List<String> keywords = extractKeywords(doc.getContent());
// Add to metadata
Map<String, Object> metadata = new HashMap<>(doc.getMetadata());
metadata.put("summary", summary);
metadata.put("keywords", keywords);
metadata.put("wordCount", doc.getContent().split("\\s+").length);
return new Document(doc.getContent(), metadata);
}
}
3. Filter by Metadata
// Use filter expressions to scope searches
SearchRequest request = SearchRequest.query(question)
.withTopK(5)
.withFilterExpression(
new Filter.Expression(
Filter.ExpressionType.AND,
new Filter.Expression(
Filter.ExpressionType.EQ,
new Filter.Key("category"),
new Filter.Value("technical-docs")
),
new Filter.Expression(
Filter.ExpressionType.GTE,
new Filter.Key("date"),
new Filter.Value("2024-01-01")
)
)
);
Conclusion
RAG with Spring AI provides a powerful foundation for building context-aware AI applications. By combining retrieval with generation, you can create systems that provide accurate, grounded responses based on your own data.
Key takeaways:
- Document quality matters: Good chunking and metadata lead to better retrieval
- Retrieval is crucial: Invest in advanced retrieval strategies like hybrid search and query expansion
- Evaluate continuously: Use metrics to understand and improve your RAG pipeline
- Monitor in production: Track latency, token usage, and retrieval quality
References and Further Reading
- Spring AI Documentation - RAG
- LangChain RAG Conceptual Guide
- InfoQ - Building Production-Ready RAG Applications
- DZone - Vector Databases for AI
- Pinecone - RAG Best Practices
- OpenAI Cookbook - Techniques to Improve Reliability
The code examples in this post are simplified for clarity. Always follow security best practices and thoroughly test RAG implementations before deploying to production.