Troubleshooting Guide
Encountering issues with Aicser Platform? This comprehensive troubleshooting guide will help you diagnose and resolve common problems, from installation issues to performance bottlenecks.
🔍 Quick Diagnosis
Health Check Commands
# Check system status
docker-compose ps
docker-compose logs --tail=50
# Check service health
curl -f http://localhost:3005/health
curl -f http://localhost:8000/health
curl -f http://localhost:5432/health
# Check resource usage
docker stats
df -h
free -h
Common Error Patterns
- Connection Refused: Service not running or wrong port
- Timeout Errors: Network issues or service overload
- Authentication Failed: Invalid credentials or expired tokens
- Permission Denied: File system or database access issues
🚨 Common Issues & Solutions
1. Installation & Setup Issues
Docker Compose Won't Start
Symptoms: docker-compose up fails with connection errors
Solutions:
# Check Docker service
sudo systemctl status docker
sudo systemctl start docker
# Verify ports are available
netstat -tulpn | grep :3005
netstat -tulpn | grep :8000
# Clear Docker cache
docker system prune -a
docker volume prune
Port Already in Use
Symptoms: Error: Port 3005 is already in use
Solutions:
# Find process using port
lsof -i :3005
sudo fuser -k 3005/tcp
# Or change port in docker-compose.yml
ports:
- "3006:3005" # Use 3006 instead
2. Database Connection Issues
PostgreSQL Connection Failed
Symptoms: FATAL: password authentication failed
Solutions:
# Check database logs
docker-compose logs postgres
# Reset database password
docker-compose exec postgres psql -U postgres
ALTER USER aicser_user PASSWORD 'new_password';
# Verify environment variables
echo $POSTGRES_PASSWORD
echo $POSTGRES_USER
Database Migration Errors
Symptoms: Alembic migration failed
Solutions:
# Check migration status
docker-compose exec auth alembic current
docker-compose exec auth alembic history
# Reset migrations (WARNING: Data loss)
docker-compose exec auth alembic downgrade base
docker-compose exec auth alembic upgrade head
# Check database schema
docker-compose exec postgres psql -U aicser_user -d aicser_db -c "\dt"
3. AI Service Issues
OpenAI API Errors
Symptoms: OpenAI API error: Invalid API key
Solutions:
# Verify API key
echo $OPENAI_API_KEY
# Test API connection
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
https://api.openai.com/v1/models
# Check rate limits
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
https://api.openai.com/v1/usage
Model Loading Failures
Symptoms: Failed to load AI model
Solutions:
# Check model cache
ls -la /tmp/aicser_models/
# Clear model cache
rm -rf /tmp/aicser_models/*
# Check available memory
free -h
# Verify model files
docker-compose exec ai-analytics ls -la /app/models/
4. Frontend Issues
React App Won't Load
Symptoms: White screen or JavaScript errors
Solutions:
# Check frontend logs
docker-compose logs client
# Verify build files
ls -la packages/client/client/build/
# Rebuild frontend
docker-compose exec client npm run build
# Check browser console for errors
# Press F12 in browser
API Endpoint Errors
Symptoms: 404 Not Found or 500 Internal Server Error
Solutions:
# Check API routes
curl -X GET http://localhost:8000/api/v1/health
# Verify service discovery
docker-compose exec client curl http://auth:8000/health
# Check NGINX configuration
docker-compose exec nginx nginx -t
5. Performance Issues
Slow Query Response
Symptoms: Queries taking > 5 seconds
Solutions:
# Check database performance
docker-compose exec postgres psql -U aicser_user -d aicser_db -c "
SELECT query, mean_time, calls
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;"
# Check cache hit rate
docker-compose exec redis redis-cli info memory
# Monitor resource usage
docker stats --no-stream
High Memory Usage
Symptoms: Out of memory errors or slow performance
Solutions:
# Check memory usage
free -h
docker stats --no-stream
# Restart memory-intensive services
docker-compose restart ai-analytics
docker-compose restart postgres
# Adjust memory limits in docker-compose.yml
services:
ai-analytics:
deploy:
resources:
limits:
memory: 4G
🛠️ Debugging Techniques
1. Log Analysis
Enable Debug Logging
# Set debug level
export LOG_LEVEL=DEBUG
export DEBUG=true
# Restart services
docker-compose restart
# Follow logs in real-time
docker-compose logs -f --tail=100
Log Search & Filtering
# Search for errors
docker-compose logs | grep -i error
# Search for specific service
docker-compose logs auth | grep -i "authentication"
# Search with context
docker-compose logs | grep -A 5 -B 5 "error"
2. Network Debugging
Check Network Connectivity
# Test inter-service communication
docker-compose exec auth ping postgres
docker-compose exec client ping auth
# Check DNS resolution
docker-compose exec auth nslookup postgres
docker-compose exec auth nslookup redis
# Verify port accessibility
telnet postgres 5432
telnet redis 6379
Network Configuration
# Check Docker network
docker network ls
docker network inspect aicser-world_default
# Inspect container networking
docker inspect <container_id> | grep -A 20 "NetworkSettings"
3. Database Debugging
PostgreSQL Debugging
# Connect to database
docker-compose exec postgres psql -U aicser_user -d aicser_db
# Check active connections
SELECT * FROM pg_stat_activity;
# Check query performance
SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;
# Check table sizes
SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
Redis Debugging
# Connect to Redis
docker-compose exec redis redis-cli
# Check memory usage
INFO memory
# Check key statistics
INFO keyspace
# Monitor commands in real-time
MONITOR
📊 Monitoring & Alerting
1. Health Checks
Service Health Endpoints
# Health check URLs
http://localhost:3005/health # Frontend
http://localhost:8000/health # Auth API
http://localhost:8001/health # AI Analytics
http://localhost:5432/health # PostgreSQL
http://localhost:6379/health # Redis
# Health check script
#!/bin/bash
services=("3005" "8000" "8001" "5432" "6379")
for port in "${services[@]}"; do
if curl -f "http://localhost:$port/health" >/dev/null 2>&1; then
echo "✅ Port $port: Healthy"
else
echo "❌ Port $port: Unhealthy"
fi
done
Automated Monitoring
# Prometheus health check configuration
scrape_configs:
- job_name: 'aicser-health'
static_configs:
- targets: ['localhost:3005', 'localhost:8000', 'localhost:8001']
metrics_path: /health
scrape_interval: 30s
2. Performance Monitoring
Key Metrics to Monitor
- Response Time: API endpoint latency
- Error Rate: Failed requests percentage
- Throughput: Requests per second
- Resource Usage: CPU, memory, disk utilization
Alerting Rules
# Grafana alerting rules
groups:
- name: aicser-alerts
rules:
- alert: HighResponseTime
expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "High response time detected"
🆘 Getting Help
1. Self-Service Resources
Documentation
Community Support
2. Professional Support
Enterprise Support
- Email: support@aicser.com
- Phone: +1-800-AICSER-HELP
- Slack: Enterprise customers only
- Response Time: 4 hours (business hours)
Support Tiers
- Community: GitHub issues and community forum
- Standard: Email support with 24-hour response
- Premium: Phone, email, and Slack with 4-hour response
- Enterprise: Dedicated support engineer
3. Escalation Process
When to Escalate
- Critical Issues: System down, data loss, security breach
- Performance Issues: Response time > 10 seconds
- Integration Problems: API failures affecting business operations
- Compliance Issues: Security or regulatory concerns
Escalation Contacts
- Technical Escalation: tech-lead@aicser.com
- Management Escalation: management@aicser.com
- Emergency: emergency@aicser.com (24/7)
📝 Issue Reporting
1. Bug Report Template
## Issue Description
Brief description of the problem
## Steps to Reproduce
1. Step 1
2. Step 2
3. Step 3
## Expected Behavior
What should happen
## Actual Behavior
What actually happens
## Environment
- OS: [e.g., Ubuntu 20.04]
- Docker: [e.g., 20.10.0]
- Aicser Version: [e.g., 1.0.0]
## Logs
Relevant log output
## Additional Context
Any other information that might be helpful
2. Feature Request Template
## Feature Description
What feature would you like to see
## Use Case
How would this feature help you
## Proposed Solution
Your suggested implementation
## Alternatives Considered
Other approaches you've considered
## Additional Context
Any other relevant information
Still Need Help?