Home
A Python-based chaos engineering sidecar tool for testing the resilience of containerized applications.
Overview
Py-Chaos-Agent runs alongside your application containers to inject controlled failures and validate system behavior under stress. It's designed to help you build more resilient systems by proactively testing how they handle various failure scenarios.
Key Features
- Multiple Failure Modes: CPU stress, memory pressure, process termination, network latency
- Flexible Configuration: YAML-based configuration with probability controls
- Kubernetes Native: Designed as a sidecar container with proper security contexts
- Observable: Prometheus metrics for monitoring chaos experiments
- Safe by Default: Self-protection mechanisms and dry-run mode
- Infrastructure as Code: Terraform modules for AWS EKS deployment
Quick Example
agent:
  interval_seconds: 10
  dry_run: false
failures:
  cpu:
    enabled: true
    probability: 0.3
    duration_seconds: 5
    cores: 1
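To make the probability control concrete, here is a minimal Python sketch of how a setting like `probability: 0.3` could be read. It is not taken from the Py-Chaos-Agent source. It assumes the agent makes one random draw per enabled failure on each interval and that `dry_run` only reports what would happen. The function names `should_inject` and `plan_tick` are hypothetical, and the config is written as a plain dict standing in for the parsed YAML.

```python
import random

# Plain-dict stand-in for the parsed Quick Example YAML (assumed shape).
config = {
    "agent": {"interval_seconds": 10, "dry_run": False},
    "failures": {
        "cpu": {
            "enabled": True,
            "probability": 0.3,
            "duration_seconds": 5,
            "cores": 1,
        },
    },
}


def should_inject(failure_cfg, rng):
    """Hypothetical helper: decide whether a failure fires on this tick.

    Assumes a disabled failure never fires and an enabled one fires when
    a uniform draw in [0, 1) falls below its configured probability.
    """
    if not failure_cfg.get("enabled", False):
        return False
    return rng.random() < failure_cfg.get("probability", 0.0)


def plan_tick(cfg, rng):
    """Hypothetical helper: list the actions one tick would take.

    Nothing is executed here; with dry_run set, the action is only
    described as something that would have been injected.
    """
    dry_run = cfg["agent"].get("dry_run", False)
    actions = []
    for name, failure_cfg in cfg["failures"].items():
        if should_inject(failure_cfg, rng):
            prefix = "would inject" if dry_run else "inject"
            actions.append(f"{prefix} {name}")
    return actions


# Simulate many ticks with a seeded generator to see the long-run rate.
rng = random.Random(42)
ticks = [plan_tick(config, rng) for _ in range(1000)]
fired = sum(1 for actions in ticks if actions)
print(f"cpu failure selected on {fired} of 1000 simulated ticks")
```

Under these assumptions the CPU failure is selected on roughly 30% of ticks, so with `interval_seconds: 10` it would fire about once every 33 seconds on average.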
Why Chaos Engineering?
Chaos engineering helps you:
- Identify weaknesses before they cause outages
- Build confidence in system resilience
- Validate monitoring and alerting systems
- Improve incident response procedures
- Test auto-scaling and recovery mechanisms
Getting Started
Ready to start chaos testing? Check out our Quick Start Guide to get up and running in minutes.
Safety First
Testing Environments Only
This tool is designed for testing environments only. Exercise caution, and never run it in production without proper safeguards and approval.
See our Safety & Ethics guidelines for responsible chaos engineering practices.
Project Status
Py-Chaos-Agent is actively maintained and used in development and staging environments. We welcome contributions and feedback from the community.
License
This project is licensed under the MIT License. See the LICENSE file for details.