Spring AI applications bring the power of large language models to enterprise Java development, but deploying them to production requires careful attention to resource management, scaling, secrets handling, and observability. Kubernetes provides a proven platform for running these workloads at scale.
In this comprehensive guide, we’ll walk through deploying Spring AI applications to Kubernetes, covering everything from containerization to production-ready GitOps workflows.
Understanding Spring AI Deployment Requirements
Spring AI applications have unique characteristics that influence deployment strategies:
Spring AI Application Architecture

┌────────────────┐     ┌────────────────┐     ┌────────────────────┐
│   REST API     │     │   AI Service   │     │    Vector Store    │
│  Controllers   │────►│  (ChatClient)  │────►│    (PostgreSQL/    │
│                │     │                │     │     pgvector)      │
└────────────────┘     └───────┬────────┘     └────────────────────┘
                               │
                               ▼
                   ┌────────────────────────┐
                   │   External AI APIs     │
                   │   (OpenAI, Azure,      │
                   │   Anthropic, etc.)     │
                   └────────────────────────┘
Key Considerations
- External API Dependencies: Connections to OpenAI, Azure OpenAI, or other AI providers
- Secret Management: API keys must be securely stored and injected
- Resource Requirements: AI operations can be memory and CPU intensive
- Latency Sensitivity: AI API calls have inherent latency (seconds, not milliseconds)
- Vector Database: Often requires a dedicated database instance
- Cost Management: AI API calls have per-token costs
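To make the cost consideration concrete, a back-of-the-envelope estimate can be computed from expected traffic and per-token pricing. The prices and traffic numbers below are purely hypothetical placeholders; substitute your provider's current rates:

```java
// Rough monthly cost estimate for a chat endpoint.
// All pricing and traffic figures here are hypothetical examples.
public class TokenCostEstimator {

    // priceInPer1K / priceOutPer1K: USD per 1,000 input/output tokens
    public static double estimateMonthlyCostUsd(long requestsPerDay,
                                                long inputTokensPerRequest,
                                                long outputTokensPerRequest,
                                                double priceInPer1K,
                                                double priceOutPer1K) {
        double perRequest = (inputTokensPerRequest / 1000.0) * priceInPer1K
                          + (outputTokensPerRequest / 1000.0) * priceOutPer1K;
        return perRequest * requestsPerDay * 30;  // ~30-day month
    }

    public static void main(String[] args) {
        // 10k requests/day, 1k tokens in, 500 tokens out,
        // $0.01 / $0.03 per 1K tokens (hypothetical rates)
        System.out.printf("Estimated monthly cost: $%.2f%n",
                estimateMonthlyCostUsd(10_000, 1_000, 500, 0.01, 0.03));
    }
}
```

Even at modest traffic, per-token costs add up quickly, which is why the observability section below tracks token usage as a first-class metric.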
Containerizing Spring AI Applications
Optimized Dockerfile
# Build stage
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /build
# Copy Maven/Gradle wrapper and dependencies first for better caching
COPY mvnw pom.xml ./
COPY .mvn .mvn
RUN ./mvnw dependency:go-offline -B
# Copy source and build
COPY src src
RUN ./mvnw package -DskipTests -B
# Extract layered JAR for optimal Docker layers
RUN java -Djarmode=layertools -jar target/*.jar extract
# Runtime stage
FROM eclipse-temurin:21-jre-alpine
# Security: Create non-root user
RUN addgroup -g 1000 spring && \
    adduser -u 1000 -G spring -s /bin/sh -D spring && \
    apk add --no-cache curl
WORKDIR /app
# Copy layers in order of change frequency
COPY --from=builder /build/dependencies/ ./
COPY --from=builder /build/spring-boot-loader/ ./
COPY --from=builder /build/snapshot-dependencies/ ./
COPY --from=builder /build/application/ ./
# Set ownership
RUN chown -R spring:spring /app
USER spring
# JVM tuning for containers
ENV JAVA_OPTS="-XX:+UseContainerSupport \
    -XX:MaxRAMPercentage=75.0 \
    -XX:InitialRAMPercentage=50.0 \
    -XX:+UseG1GC \
    -XX:+UseStringDeduplication \
    -Djava.security.egd=file:/dev/./urandom"
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl --fail --silent http://localhost:8080/actuator/health || exit 1
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS org.springframework.boot.loader.launch.JarLauncher"]
Cloud-Native Buildpacks Alternative
For simpler containerization, use Spring Boot’s buildpack support:
# Using Maven
./mvnw spring-boot:build-image \
  -Dspring-boot.build-image.imageName=myregistry/spring-ai-app:latest
# Using Gradle
./gradlew bootBuildImage \
  --imageName=myregistry/spring-ai-app:latest
Kubernetes Manifests
Namespace and Resource Quotas
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: spring-ai
  labels:
    name: spring-ai
    istio-injection: enabled  # If using Istio service mesh
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: spring-ai-quota
  namespace: spring-ai
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "10"
Secrets Management with External Secrets Operator
Instead of storing secrets in Kubernetes directly, use External Secrets Operator to fetch from a secrets manager:
# external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: spring-ai-secrets
  namespace: spring-ai
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: azure-keyvault  # or aws-secrets-manager, gcp-secret-manager
    kind: ClusterSecretStore
  target:
    name: spring-ai-api-keys
    creationPolicy: Owner
  data:
    - secretKey: OPENAI_API_KEY
      remoteRef:
        key: openai-api-key
    - secretKey: AZURE_OPENAI_API_KEY
      remoteRef:
        key: azure-openai-api-key
    - secretKey: DATABASE_PASSWORD
      remoteRef:
        key: pgvector-password
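The ExternalSecret above references a `ClusterSecretStore` named `azure-keyvault`, which must exist separately. A sketch of one is shown below, assuming Azure Key Vault with workload identity; the vault URL and service account names are placeholders to adapt to your environment:

```yaml
# cluster-secret-store.yaml (sketch; names and URL are placeholders)
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: azure-keyvault
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: "https://my-vault.vault.azure.net"   # your Key Vault URL
      serviceAccountRef:
        name: external-secrets-sa                    # SA with workload identity federation
        namespace: external-secrets
```

Equivalent providers exist for AWS Secrets Manager (`aws`) and GCP Secret Manager (`gcpsm`); see the External Secrets Operator documentation for the provider-specific fields.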
ConfigMap for Application Configuration
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: spring-ai-config
  namespace: spring-ai
data:
  application.yaml: |
    spring:
      ai:
        openai:
          chat:
            options:
              model: gpt-4-turbo-preview
              temperature: 0.7
              max-tokens: 4096
          embedding:
            options:
              model: text-embedding-3-small
        retry:
          max-attempts: 3
          backoff:
            initial-interval: 2s
            multiplier: 2
            max-interval: 30s
      datasource:
        url: jdbc:postgresql://pgvector-service:5432/vectordb
        username: springai
        hikari:
          maximum-pool-size: 10
          minimum-idle: 5
          connection-timeout: 30000
    management:
      endpoints:
        web:
          exposure:
            include: health,info,metrics,prometheus
      endpoint:
        health:
          show-details: when_authorized
          probes:
            enabled: true
      metrics:
        tags:
          application: spring-ai-app
Deployment Specification
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-ai-app
  namespace: spring-ai
  labels:
    app: spring-ai-app
    version: v1
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: spring-ai-app
  template:
    metadata:
      labels:
        app: spring-ai-app
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/actuator/prometheus"
    spec:
      serviceAccountName: spring-ai-sa
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: spring-ai-app
          image: myregistry/spring-ai-app:1.0.0
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: kubernetes
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: spring-ai-api-keys
                  key: OPENAI_API_KEY
            - name: SPRING_DATASOURCE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: spring-ai-api-keys
                  key: DATABASE_PASSWORD
            - name: JAVA_OPTS
              value: >-
                -XX:+UseContainerSupport
                -XX:MaxRAMPercentage=75.0
                -XX:+UseG1GC
                -Dspring.config.additional-location=/config/
          volumeMounts:
            - name: config-volume
              mountPath: /config
              readOnly: true
            - name: tmp-volume
              mountPath: /tmp
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 15
            timeoutSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 30
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
      volumes:
        - name: config-volume
          configMap:
            name: spring-ai-config
        - name: tmp-volume
          emptyDir: {}
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: spring-ai-app
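The deployment references a `spring-ai-sa` service account, which is not defined elsewhere in the manifests. A minimal definition would look like this; since the application itself does not call the Kubernetes API, the token need not be mounted:

```yaml
# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spring-ai-sa
  namespace: spring-ai
automountServiceAccountToken: false  # app does not talk to the Kubernetes API
```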
Service and Ingress
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: spring-ai-service
  namespace: spring-ai
  labels:
    app: spring-ai-app
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      name: http
  selector:
    app: spring-ai-app
---
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: spring-ai-ingress
  namespace: spring-ai
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx  # replaces the deprecated kubernetes.io/ingress.class annotation
  tls:
    - hosts:
        - ai-api.example.com
      secretName: spring-ai-tls
  rules:
    - host: ai-api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: spring-ai-service
                port:
                  number: 80
Horizontal Pod Autoscaling
Spring AI applications may need to scale based on request volume or custom metrics:
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spring-ai-hpa
  namespace: spring-ai
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-ai-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Custom metric: requests per second
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
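The `http_requests_per_second` Pods metric is not available out of the box; it has to be served through the custom metrics API by an adapter such as prometheus-adapter. A sketch of a matching adapter rule, assuming the `http_server_requests_seconds_count` counter that Spring Boot Actuator exposes via Micrometer:

```yaml
# prometheus-adapter values fragment (sketch; metric and label names are assumptions)
rules:
  custom:
    - seriesQuery: 'http_server_requests_seconds_count{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^http_server_requests_seconds_count$"
        as: "http_requests_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

Once the adapter is installed, `kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1` should list the metric, and the HPA above can consume it.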
Vector Database Deployment (pgvector)
Deploy PostgreSQL with pgvector extension for RAG applications:
# pgvector-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pgvector
  namespace: spring-ai
spec:
  serviceName: pgvector
  replicas: 1
  selector:
    matchLabels:
      app: pgvector
  template:
    metadata:
      labels:
        app: pgvector
    spec:
      containers:
        - name: pgvector
          image: pgvector/pgvector:pg16
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: vectordb
            - name: POSTGRES_USER
              value: springai
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: spring-ai-api-keys
                  key: DATABASE_PASSWORD
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: pgvector-data
              mountPath: /var/lib/postgresql/data
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          livenessProbe:
            exec:
              command:
                - pg_isready
                - -U
                - springai
                - -d
                - vectordb
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - pg_isready
                - -U
                - springai
                - -d
                - vectordb
            initialDelaySeconds: 5
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: pgvector-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard
        resources:
          requests:
            storage: 50Gi
---
apiVersion: v1
kind: Service
metadata:
  name: pgvector-service
  namespace: spring-ai
spec:
  type: ClusterIP
  ports:
    - port: 5432
      targetPort: 5432
  selector:
    app: pgvector
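Shipping the image is not quite enough: the `vector` extension still has to be created in the database. One option is an init script, since the pgvector image inherits the standard postgres entrypoint and runs any `*.sql` files mounted under `/docker-entrypoint-initdb.d` on first initialization. A sketch (the ConfigMap name and mount are assumptions; add a corresponding volume and volumeMount to the StatefulSet above):

```yaml
# pgvector-init-configmap.yaml (sketch; mount at /docker-entrypoint-initdb.d)
apiVersion: v1
kind: ConfigMap
metadata:
  name: pgvector-init
  namespace: spring-ai
data:
  init.sql: |
    CREATE EXTENSION IF NOT EXISTS vector;
```

Alternatively, Spring AI's PGVector store can create the extension and table itself if the application is configured with `spring.ai.vectorstore.pgvector.initialize-schema: true` and the database user has sufficient privileges.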
Helm Chart Structure
For production deployments, package everything as a Helm chart:
spring-ai-chart/
├── Chart.yaml
├── values.yaml
├── values-production.yaml
├── templates/
│   ├── _helpers.tpl
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── hpa.yaml
│   ├── configmap.yaml
│   ├── external-secret.yaml
│   ├── serviceaccount.yaml
│   ├── pdb.yaml
│   └── servicemonitor.yaml
└── charts/
    └── pgvector/
Chart.yaml
apiVersion: v2
name: spring-ai-app
description: Spring AI Application Helm Chart
type: application
version: 1.0.0
appVersion: "1.0.0"
dependencies:
  - name: postgresql
    version: "13.x.x"
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled
values.yaml
# Default values for spring-ai-app
replicaCount: 2

image:
  repository: myregistry/spring-ai-app
  pullPolicy: Always
  tag: "latest"

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/actuator/prometheus"

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  className: nginx
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: ai-api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: spring-ai-tls
      hosts:
        - ai-api.example.com

resources:
  limits:
    cpu: 1000m
    memory: 2Gi
  requests:
    cpu: 500m
    memory: 1Gi

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

# Pod Disruption Budget
pdb:
  enabled: true
  minAvailable: 1

nodeSelector: {}
tolerations: []
affinity: {}

# Application configuration
config:
  spring:
    profiles:
      active: kubernetes
    ai:
      openai:
        chat:
          options:
            model: gpt-4-turbo-preview
            temperature: 0.7

# External Secrets configuration
externalSecrets:
  enabled: true
  refreshInterval: 1h
  secretStore:
    name: azure-keyvault
    kind: ClusterSecretStore
  secrets:
    - key: OPENAI_API_KEY
      remoteRef: openai-api-key
    - key: DATABASE_PASSWORD
      remoteRef: pgvector-password

# PostgreSQL with pgvector
postgresql:
  enabled: true
  auth:
    database: vectordb
    username: springai
    existingSecret: spring-ai-api-keys
    secretKeys:
      userPasswordKey: DATABASE_PASSWORD
  primary:
    persistence:
      size: 50Gi
    extendedConfiguration: |
      shared_preload_libraries = 'vector'
  image:
    repository: pgvector/pgvector
    tag: pg16
Helm Deployment Commands
# Install/upgrade with values file
helm upgrade --install spring-ai ./spring-ai-chart \
  --namespace spring-ai \
  --create-namespace \
  -f values-production.yaml
# Check deployment status
helm status spring-ai -n spring-ai
# View generated manifests
helm template spring-ai ./spring-ai-chart -f values-production.yaml
GitOps with ArgoCD
For production deployments, use GitOps practices with ArgoCD:
# argocd-application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: spring-ai-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/spring-ai-k8s-manifests.git
    targetRevision: HEAD
    path: spring-ai-app/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: spring-ai
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
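The Application's `path` points at a Kustomize overlay (`overlays/production`), implying a base-plus-overlays layout in the manifests repository. A hypothetical production overlay might pin the image tag and replica count on top of a shared base:

```yaml
# spring-ai-app/overlays/production/kustomization.yaml (sketch; layout is an assumption)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: spring-ai
resources:
  - ../../base        # base holds the manifests shown earlier in this post
images:
  - name: myregistry/spring-ai-app
    newTag: "1.0.0"   # promote releases by bumping this tag in Git
replicas:
  - name: spring-ai-app
    count: 2
```

With this layout, a release is a Git commit that changes `newTag`, and ArgoCD's automated sync rolls it out.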
Observability Stack
ServiceMonitor for Prometheus
# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: spring-ai-monitor
  namespace: spring-ai
  labels:
    app: spring-ai-app
spec:
  selector:
    matchLabels:
      app: spring-ai-app
  endpoints:
    - port: http
      path: /actuator/prometheus
      interval: 15s
      scrapeTimeout: 10s
Grafana Dashboard ConfigMap
# grafana-dashboard-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: spring-ai-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  spring-ai-dashboard.json: |
    {
      "dashboard": {
        "title": "Spring AI Application",
        "panels": [
          {
            "title": "AI API Latency",
            "type": "graph",
            "targets": [
              {
                "expr": "histogram_quantile(0.95, sum(rate(spring_ai_chat_client_duration_seconds_bucket[5m])) by (le))",
                "legendFormat": "p95 Latency"
              }
            ]
          },
          {
            "title": "Token Usage",
            "type": "stat",
            "targets": [
              {
                "expr": "sum(rate(spring_ai_tokens_total[5m]))",
                "legendFormat": "Tokens/sec"
              }
            ]
          },
          {
            "title": "RAG Retrieval Time",
            "type": "graph",
            "targets": [
              {
                "expr": "histogram_quantile(0.95, sum(rate(rag_retrieval_duration_seconds_bucket[5m])) by (le))",
                "legendFormat": "p95 Retrieval Time"
              }
            ]
          }
        ]
      }
    }
Application Metrics Configuration
import java.time.Duration;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationHandler;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AiMetricsConfig {

    @Bean
    public MeterRegistryCustomizer<MeterRegistry> aiMetricsCustomizer() {
        // Common tags for all AI metrics
        return registry -> registry.config().commonTags(
                "application", "spring-ai-app",
                "environment", System.getenv("ENVIRONMENT"));
    }

    @Bean
    public ObservationHandler<Observation.Context> aiObservationHandler(MeterRegistry registry) {
        return new AiOperationObservationHandler(registry);
    }
}

// Registered via the @Bean method above; annotating it @Component as well
// would register the handler twice.
class AiOperationObservationHandler implements ObservationHandler<Observation.Context> {

    private final Counter tokenCounter;
    private final Timer chatLatencyTimer;

    AiOperationObservationHandler(MeterRegistry registry) {
        this.tokenCounter = Counter.builder("spring_ai_tokens_total")
                .description("Total tokens used in AI operations")
                .register(registry);
        this.chatLatencyTimer = Timer.builder("spring_ai_chat_client_duration_seconds")
                .description("AI chat operation duration")
                .register(registry);
    }

    @Override
    public boolean supportsContext(Observation.Context context) {
        // ObservationHandler's only abstract method: restrict to chat observations
        return context.getName() != null && context.getName().contains("chat");
    }

    @Override
    public void onStop(Observation.Context context) {
        // Record latency and token usage; the instrumentation is expected to
        // have put "duration" and "tokens_used" into the observation context
        chatLatencyTimer.record(context.getOrDefault("duration", Duration.ZERO));
        Integer tokens = context.getOrDefault("tokens_used", 0);
        tokenCounter.increment(tokens);
    }
}
Network Policies
Restrict network access for security:
# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: spring-ai-network-policy
  namespace: spring-ai
spec:
  podSelector:
    matchLabels:
      app: spring-ai-app
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
        # Prometheus scrapes from the monitoring namespace, so pair the
        # pod selector with a namespace selector
        - namespaceSelector:
            matchLabels:
              name: monitoring
          podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # Allow DNS
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
    # Allow pgvector database
    - to:
        - podSelector:
            matchLabels:
              app: pgvector
      ports:
        - protocol: TCP
          port: 5432
    # Allow external AI APIs (OpenAI, Azure, etc.)
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
Production Deployment Checklist
Pre-Deployment
# 1. Verify image security
trivy image myregistry/spring-ai-app:1.0.0
# 2. Validate Kubernetes manifests
kubectl apply --dry-run=client -f manifests/
# 3. Check resource quotas
kubectl describe resourcequota -n spring-ai
# 4. Verify secrets are properly configured
kubectl get externalsecrets -n spring-ai
Deployment
# 1. Deploy using Helm
helm upgrade --install spring-ai ./spring-ai-chart \
  --namespace spring-ai \
  --create-namespace \
  -f values-production.yaml \
  --wait \
  --timeout 10m
# 2. Verify deployment
kubectl rollout status deployment/spring-ai-app -n spring-ai
# 3. Check pod health
kubectl get pods -n spring-ai -l app=spring-ai-app
# 4. View logs
kubectl logs -n spring-ai -l app=spring-ai-app --tail=100 -f
Post-Deployment Verification
# 1. Test health endpoints
kubectl run test-pod --rm -it --restart=Never --image=curlimages/curl \
  --command -- curl http://spring-ai-service.spring-ai.svc/actuator/health
# 2. Verify HPA is working
kubectl get hpa -n spring-ai
# 3. Check metrics are being collected
kubectl port-forward svc/prometheus-server -n monitoring 9090:80
# Visit http://localhost:9090 and query: spring_ai_chat_client_duration_seconds_count
# 4. Run a test query
curl -X POST https://ai-api.example.com/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello, can you help me?"}'
Troubleshooting Guide
Common Issues
1. Pods stuck in Pending state
kubectl describe pod -n spring-ai <pod-name>
# Check for resource quota issues or node availability
2. API key not found
kubectl get externalsecrets -n spring-ai
kubectl describe externalsecret spring-ai-secrets -n spring-ai
3. Database connection failures
# Check pgvector pod
kubectl logs -n spring-ai -l app=pgvector
# Test connectivity
kubectl run test-db --rm -it --restart=Never --image=postgres:16 -- \
  psql -h pgvector-service.spring-ai.svc -U springai -d vectordb
4. High latency issues
# Check if requests are being rate-limited by AI provider
kubectl logs -n spring-ai -l app=spring-ai-app | grep -i "rate limit"
# Check HPA status
kubectl describe hpa spring-ai-hpa -n spring-ai
Best Practices Summary
- Use External Secrets: Never put API keys in ConfigMaps or plain-text manifest values; source them from a secrets manager at runtime
- Set Appropriate Timeouts: AI API calls can take seconds; configure ingress and client timeouts accordingly
- Implement Circuit Breakers: Use Resilience4j to handle AI provider outages gracefully
- Monitor Token Usage: Track AI API costs through Prometheus metrics
- Use Pod Disruption Budgets: Ensure availability during cluster maintenance
- Enable Autoscaling: Configure HPA based on request volume, not just CPU
- Secure Network Access: Use NetworkPolicies to restrict egress to only necessary endpoints
- Plan for Failure: AI providers have outages; implement fallback strategies
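The circuit-breaker recommendation can be sketched as Spring Boot configuration, assuming the resilience4j-spring-boot3 starter on the classpath and a hypothetical `openai` instance name protecting calls to the AI provider:

```yaml
# application.yaml fragment (sketch; instance name and thresholds are assumptions)
resilience4j:
  circuitbreaker:
    instances:
      openai:
        slidingWindowSize: 20
        failureRateThreshold: 50          # open after 50% of recent calls fail
        waitDurationInOpenState: 30s      # back off before probing the provider again
        permittedNumberOfCallsInHalfOpenState: 3
  timelimiter:
    instances:
      openai:
        timeoutDuration: 60s              # AI calls are slow; don't cut them off at defaults
```

Service methods that call the provider can then be annotated with `@CircuitBreaker(name = "openai", fallbackMethod = "...")` to return a cached or degraded response during outages.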
Conclusion
Deploying Spring AI applications to Kubernetes requires attention to several unique concerns: secret management for API keys, appropriate resource allocation for AI workloads, and robust observability for monitoring costs and performance.
By following the patterns in this guide—containerizing efficiently, using GitOps for deployments, implementing proper health checks, and establishing comprehensive monitoring—you can run Spring AI applications reliably in production.
References and Further Reading
- Spring AI Documentation
- Kubernetes Best Practices
- External Secrets Operator
- InfoQ - Kubernetes Production Best Practices
- DZone - Spring Boot on Kubernetes
- Helm Documentation
- ArgoCD Best Practices
The configurations in this post are examples and should be adapted to your specific requirements and security policies. Always review and test thoroughly before deploying to production.