# Kubernetes Deployment
This guide covers deploying Octokraft on Kubernetes for production environments that need scaling, high availability, and operational maturity. This is the recommended approach for teams with 50+ developers or when uptime requirements are critical.

## Prerequisites
- Kubernetes 1.28+
- Helm 3.x
- `kubectl` configured for your cluster
- PostgreSQL 16+ (managed or self-hosted)
- Redis 7+ (managed or self-hosted)
- FalkorDB instance
- Temporal server
- A registered GitHub App (see GitHub Integration)
- A Clerk account for authentication
- Access to an OpenAI-compatible AI model API
For production deployments, use managed database services (e.g., Amazon RDS, Cloud SQL, Azure Database) rather than running PostgreSQL and Redis inside the cluster.
## Quick Start

### Create a values file
Create `values.yaml` with your configuration. See the values reference below.

## values.yaml Reference
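As an illustrative sketch, a minimal `values.yaml` might look like the following. Every key name here is an assumption about the chart's schema, not its documented layout — consult the chart's default values for the real key names:

```yaml
# Hypothetical values sketch -- key names are assumptions, not the chart's real schema.
api:
  replicas: 2
worker:
  replicas: 2
postgres:
  url: "postgres://octokraft:CHANGE_ME@db.example.com:5432/octokraft"
redis:
  url: "redis://redis.example.com:6379"
ingress:
  enabled: true
  host: octokraft.example.com
```

With the file in place, the chart would typically be installed with something like `helm install octokraft <chart-reference> -f values.yaml` (the chart repository and name are assumptions).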
## Scaling

### Horizontal Scaling

The API server and workers scale independently. Workers are the primary scaling target — add more replicas to process analysis tasks faster.

### Horizontal Pod Autoscaler
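As a sketch, a standard `autoscaling/v2` HPA targeting the worker deployment might look like the following (the Deployment name `octokraft-worker` and the 70% CPU target are assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: octokraft-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: octokraft-worker   # assumed deployment name
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%
```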
### Resource Sizing Guide
| Team Size | API Replicas | Worker Replicas | Worker CPU | Worker Memory |
|---|---|---|---|---|
| 10-50 devs | 2 | 2 | 1000m | 1Gi |
| 50-200 devs | 3 | 4 | 2000m | 2Gi |
| 200+ devs | 4+ | 6+ | 2000m | 4Gi |
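Translating the middle row of the table into chart values might look like this (key names are assumptions about the chart's schema):

```yaml
# Sketch for a 50-200 developer deployment -- keys are assumed, not documented.
worker:
  replicas: 4
  resources:
    requests:
      cpu: 2000m
      memory: 2Gi
    limits:
      cpu: 2000m
      memory: 2Gi
```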
## High Availability

### API Server
Run at least 2 replicas with pod anti-affinity to spread across nodes:
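A sketch of such an anti-affinity rule, assuming the API pods carry an `app: octokraft-api` label:

```yaml
# Prefer scheduling API pods on different nodes (label is an assumption).
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: octokraft-api   # assumed pod label
          topologyKey: kubernetes.io/hostname
```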
### Workers

Workers are stateless and can tolerate pod restarts. Temporal automatically retries tasks assigned to workers that go down. Run at least 2 replicas in production.
### Infrastructure Services

For high availability of infrastructure services:

- PostgreSQL: Use a managed service with automated failover (Amazon RDS, Cloud SQL, Azure Database).
- Redis: Use a managed service with replication (ElastiCache, Memorystore, Azure Cache).
- Temporal: Deploy the Temporal server cluster with multiple history and matching service replicas.
- FalkorDB: Run with persistent storage and regular backups.
## Operations

### Health Checks
The Helm chart configures liveness and readiness probes automatically. The health endpoints are:

| Endpoint | Purpose |
|---|---|
| `/healthz` | Simple health check. Returns 200 if the service is running. Used for liveness probes. |
| `/health/detailed` | Component health check. Returns status of each infrastructure dependency with circuit breaker state. Used for readiness probes. |
### Monitoring
Octokraft exposes Prometheus metrics at `/metrics` on port 8080. The Helm chart can create a ServiceMonitor for automatic Prometheus discovery.
Key metrics to monitor:
- HTTP request latency and error rates
- Active Temporal workflow counts
- Analysis task duration and failure rates
- Database connection pool utilization
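Enabling the ServiceMonitor might look like this in values (the key names are assumptions about the chart's schema):

```yaml
# Hypothetical key names -- check the chart's default values.
metrics:
  serviceMonitor:
    enabled: true
```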
Octokraft supports OpenTelemetry instrumentation, configured through the standard `OTEL_*` environment variables. Enable it in your values:
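A sketch of enabling the OpenTelemetry exporter through values, assuming the chart passes arbitrary environment variables through a key like `extraEnv` (the collector address is a placeholder):

```yaml
# `extraEnv` is an assumed chart key; the OTEL_* variables are the standard
# OpenTelemetry SDK configuration variables.
extraEnv:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.observability:4317"
  - name: OTEL_SERVICE_NAME
    value: "octokraft"
```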
### Logs

All components write structured JSON logs to stdout. Use your cluster’s log aggregation pipeline (Fluentd, Loki, CloudWatch Logs, etc.) to collect and search logs.

## Upgrades
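A typical Helm upgrade flow might look like the following (the release name, chart reference, and deployment name are assumptions):

```shell
# Pull the latest chart versions from configured repositories
helm repo update

# Apply the upgrade and wait for the rollout to complete
helm upgrade octokraft <chart-reference> -f values.yaml --wait

# Verify the API deployment rolled out cleanly (deployment name is assumed)
kubectl rollout status deploy/octokraft-api
```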
## Backups

Back up the PostgreSQL database regularly using your managed service’s backup features or `pg_dump`. Redis and FalkorDB contain derived data that can be reconstructed from a fresh analysis run.
## Troubleshooting

### Pods stuck in CrashLoopBackOff

Check the pod logs for the specific error. The most common causes are missing environment variables (the application logs which variable is unset) or unreachable infrastructure services.
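For example (the namespace and `<pod-name>` placeholder are assumptions):

```shell
# Find the crashing pod
kubectl get pods -n octokraft

# Show logs from the last failed run of that pod
kubectl logs <pod-name> --previous -n octokraft

# Inspect events (OOMKilled, failed mounts, probe failures)
kubectl describe pod <pod-name> -n octokraft
```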
### Readiness probe failing

The readiness probe calls `/health/detailed`, which checks connectivity to all infrastructure services. The response includes the status of each dependency (PostgreSQL, Redis, FalkorDB, Temporal). Identify which component is unhealthy:
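One way to inspect the endpoint directly (the service name and namespace are assumptions):

```shell
# Forward the API port to your machine
kubectl port-forward svc/octokraft-api 8080:8080 -n octokraft &

# Query per-dependency health status
curl -s localhost:8080/health/detailed
```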
### Workers not picking up tasks
Verify workers are connected to Temporal and the correct namespace. Confirm that `TEMPORAL_ADDRESS` and `TEMPORAL_NAMESPACE` match between the API and worker deployments.
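For example (deployment names and namespace are assumptions):

```shell
# Look for Temporal connection errors in the worker logs
kubectl logs deploy/octokraft-worker -n octokraft | grep -i temporal

# Compare the Temporal settings seen by each deployment
kubectl exec deploy/octokraft-api -n octokraft -- env | grep TEMPORAL
kubectl exec deploy/octokraft-worker -n octokraft -- env | grep TEMPORAL
```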
### Ingress not routing traffic
Verify the ingress resource is created and has an address assigned. Confirm your DNS records point to the ingress controller’s external IP or load balancer, and check the ingress controller logs if requests are not reaching the Octokraft pods.
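For example (namespace and hostname are assumptions):

```shell
# The ADDRESS column should show an external IP or hostname
kubectl get ingress -n octokraft

# Check that DNS resolves to that address
dig +short octokraft.example.com
```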
### Analysis tasks timing out
Analysis task duration depends on repository size and AI model response time. If tasks are timing out:
- Check AI model API latency — slow responses from the model provider are the most common cause.
- Scale workers to reduce queue depth.
- Verify workers have sufficient memory — large repositories require more memory during analysis.
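To check whether workers are resource-constrained, something like the following can help (requires metrics-server; the pod label and namespace are assumptions):

```shell
# Show current CPU and memory usage of worker pods
kubectl top pods -l app=octokraft-worker -n octokraft
```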