Prometheus Gateway
A high-performance, multi-provider LLM gateway with advanced caching, monitoring, and security features.
Features
Multi-Provider Support
- OpenAI GPT - GPT-4o, GPT-3.5-turbo, and more
- Google Gemini - Gemini 2.5 Flash, Gemini 2.5 Pro
- Anthropic Claude - Claude Sonnet, Claude Opus
- Extensible Architecture - Easy to add new providers
Intelligent Routing
- Configuration-Driven - YAML-based provider configuration
- Model-to-Provider Mapping - Automatic routing based on model names
- Failover Support - Automatic fallback to alternative providers
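The routing and failover behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: the `ROUTES` table, `ProviderError`, `resolve_providers`, and `complete` are hypothetical names, and the prefix-based matching is an assumption about how model names might map to providers.

```python
from typing import Callable, Dict, List

# Hypothetical routing table: model-name prefix -> ordered provider names.
# In the gateway this information would come from the YAML provider config.
ROUTES: Dict[str, List[str]] = {
    "gpt-": ["openai"],
    "gemini-": ["google"],
    # Illustrative fallback order only; a real fallback would also need to
    # translate the model name into one the second provider understands.
    "claude-": ["anthropic", "openai"],
}


class ProviderError(Exception):
    """Raised by a provider callable when a request fails."""


def resolve_providers(model: str) -> List[str]:
    """Return the ordered provider list for the longest matching prefix."""
    matches = [prefix for prefix in ROUTES if model.startswith(prefix)]
    if not matches:
        raise ValueError(f"no provider configured for model {model!r}")
    return ROUTES[max(matches, key=len)]


def complete(
    model: str,
    prompt: str,
    providers: Dict[str, Callable[[str, str], str]],
) -> str:
    """Try each mapped provider in order, falling back on ProviderError."""
    last_error = None
    for name in resolve_providers(model):
        try:
            return providers[name](model, prompt)
        except ProviderError as exc:
            last_error = exc  # remember the failure and try the next provider
    raise ProviderError(f"all providers failed for {model!r}") from last_error
```

Each provider is passed in as a plain callable here so the sketch stays independent of any SDK.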
Two-Level Caching System
- Exact Cache - Redis-based caching for identical requests
- Semantic Cache - ChromaDB vector search for similar queries
- Configurable TTL - Customizable cache expiration times
- Cache Analytics - Monitor cache hit rates and performance
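To make the two cache levels concrete, here is a self-contained sketch of the lookup order: exact match first, then a similarity search. It deliberately uses in-memory structures and a toy bag-of-words embedding as stand-ins for Redis and ChromaDB; `TwoLevelCache`, `exact_key`, `embed`, and the 0.9 threshold are all assumptions made for illustration.

```python
import hashlib
import json
import math
import time
from collections import Counter
from typing import Dict, List, Optional, Tuple


def exact_key(model: str, prompt: str) -> str:
    """Stable key for identical requests: SHA-256 of canonical JSON."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real setup would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TwoLevelCache:
    def __init__(self, ttl_seconds: float = 3600.0, threshold: float = 0.9):
        self.ttl = ttl_seconds
        self.threshold = threshold
        # Level 1 (stand-in for Redis): key -> (expiry, response)
        self._exact: Dict[str, Tuple[float, str]] = {}
        # Level 2 (stand-in for ChromaDB): (expiry, model, vector, response)
        self._semantic: List[Tuple[float, str, Counter, str]] = []

    def get(self, model: str, prompt: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        hit = self._exact.get(exact_key(model, prompt))
        if hit is not None and hit[0] > now:
            return hit[1]  # exact hit
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for expiry, entry_model, vector, response in self._semantic:
            if expiry <= now or entry_model != model:
                continue  # skip expired entries and other models
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, model: str, prompt: str, response: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        expiry = now + self.ttl
        self._exact[exact_key(model, prompt)] = (expiry, response)
        self._semantic.append((expiry, model, embed(prompt), response))
```

The `now` parameter exists only so expiry can be exercised deterministically. With Redis, the TTL would normally be delegated to the server instead of being checked on read.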
Security & Privacy
- Data Loss Prevention (DLP) - Automatic PII detection and anonymization
- API Key Management - Secure key generation and validation
- Rate Limiting - Per-key request throttling
- Input Sanitization - Comprehensive request validation
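As a rough illustration of the DLP step, the sketch below replaces detected PII with typed placeholders before a prompt leaves the gateway. It is a simplified, regex-only stand-in: the Quick Start installs a spaCy model, which suggests the real scanner relies on NER, and the `PII_PATTERNS` table and `anonymize` function are hypothetical names.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def anonymize(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each match with a <LABEL> placeholder and count replacements."""
    counts: Dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, replaced = pattern.subn(f"<{label}>", text)
        if replaced:
            counts[label] = replaced
    return text, counts
```

Returning the per-label counts alongside the redacted text makes it easy to feed detections into metrics or audit logs.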
Monitoring & Observability
- Prometheus Metrics - Custom metrics for all operations
- Grafana Dashboards - Pre-built monitoring dashboards
- Structured Logging - JSON-formatted logs with correlation IDs
- Health Checks - Comprehensive service health endpoints
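JSON-formatted logs with correlation IDs can be produced with the standard library alone. The sketch below is one possible arrangement, not the gateway's actual logging setup: the `correlation_id` context variable, `JsonFormatter`, `build_logger`, and the logger name `gateway.example` are assumptions.

```python
import contextvars
import json
import logging

# Holds the ID of the request being handled in the current context.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
                "correlation_id": correlation_id.get(),
            }
        )


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("gateway.example")
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

In an ASGI application, a middleware would typically call `correlation_id.set(...)` once per request so that every log line emitted while handling it carries the same ID.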
Architecture
```mermaid
graph TB
    Client[Client Application] --> Gateway[Prometheus Gateway]
    Gateway --> DLP[DLP Scanner]
    Gateway --> Cache[Redis Cache]
    Gateway --> Semantic[Semantic Cache]
    Gateway --> OpenAI[OpenAI Provider]
    Gateway --> Google[Google Provider]
    Gateway --> Anthropic[Anthropic Provider]
    Gateway --> Metrics[Prometheus Metrics]
    Metrics --> Grafana[Grafana Dashboard]
    Cache --> Redis[(Redis)]
    Semantic --> ChromaDB[(ChromaDB)]
```
Performance
- Sub-50ms Latency - For cached responses
- 99.9% Uptime - Availability target, contingent on a properly configured deployment
- Horizontal Scaling - Stateless design for easy scaling
- Memory Efficient - Optimized for high-throughput scenarios
Security
- Zero-Trust Architecture - All requests require valid API keys
- PII Protection - Automatic detection and anonymization
- Audit Logging - Complete request/response audit trail
- Rate Limiting - Protection against abuse and DoS attacks
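Per-key throttling is often implemented as a token bucket. The sketch below shows that technique under stated assumptions: `TokenBucket` and `PerKeyRateLimiter` are hypothetical names, state is kept in process memory, and nothing here is taken from the gateway's source, which may use a different algorithm or a shared store.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so short bursts are allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token for this request
            return True
        return False


class PerKeyRateLimiter:
    """Keeps one independent bucket per API key."""

    def __init__(self, rate: float = 1.0, capacity: float = 5.0):
        self.rate = rate
        self.capacity = capacity
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(api_key)
        if bucket is None:
            bucket = self._buckets[api_key] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

Because the gateway is described as stateless and horizontally scalable, a production limiter would likely need its counters in a shared store such as Redis, not in a per-process dictionary like this one.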
Monitoring
Real-time monitoring with:
- Request latency histograms
- Token usage tracking
- Error rate monitoring
- Cache performance metrics
- Provider health status
Quick Start
```bash
# Clone the repository
git clone https://github.com/yourusername/prometheus-gateway.git
cd prometheus-gateway

# Start with Docker Compose
docker-compose up -d

# Or run locally
pip install -r requirements.txt
python -m spacy download en_core_web_lg
uvicorn app.main:app --reload
```
Documentation
Contributing
We welcome contributions! Please see our Testing Guide for development information.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Documentation
- Issue Tracker
- Discussions