Practice of intentionally introducing failures to test system resilience. Like conducting fire drills to ensure everyone knows how to respond to emergencies.
Netflix practices chaos engineering by randomly terminating servers in production to verify their systems can handle failures gracefully.
Chaos Engineering is a practice, not a single product. AWS Fault Injection Service and Azure Chaos Studio are first-party services designed to run controlled fault experiments. Google Cloud and OCI commonly rely on third-party chaos tools (often run on Kubernetes or VMs) plus native monitoring/incident tooling; OCI’s DR service focuses on recovery orchestration rather than fault injection.