The ability to understand what's happening inside a system by examining its outputs: logs, metrics, and traces. It is like having security cameras, temperature sensors, and activity logs that together show everything happening in a building.
Good observability lets you see not just that your website is slow, but exactly which database query is causing the problem and why.
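To make the idea of pinpointing a slow step concrete, here is a minimal sketch in Python using only the standard library. It assumes a hypothetical request handler whose steps are wrapped in timed "spans"; the names `span`, `handle_request`, `slowest_span`, and the query labels are invented for illustration, and the sleeps stand in for real work. A production system would more likely use a tracing library such as OpenTelemetry and send the data to a backend rather than keep it in a list.

```python
import time
from contextlib import contextmanager

# Collected (span_name, duration_seconds) pairs: a stand-in for trace data.
spans = []


@contextmanager
def span(name):
    """Time the enclosed block and record it as a named span."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, time.perf_counter() - start))


def handle_request():
    # Hypothetical request handler; each sleep simulates a unit of work.
    with span("render_template"):
        time.sleep(0.01)
    with span("query_user_profile"):
        time.sleep(0.01)
    with span("query_order_history"):
        time.sleep(0.2)  # simulated slow database query


def slowest_span(recorded):
    """Return the (name, duration) pair with the longest duration."""
    return max(recorded, key=lambda s: s[1])


handle_request()
name, duration = slowest_span(spans)
print(f"slowest step: {name} ({duration * 1000:.0f} ms)")
```

Without the per-step timings, the only visible signal would be that the request as a whole is slow. With them, the slow step can be named directly, which is the difference between monitoring a symptom and observing its cause.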
All major cloud providers offer an observability stack covering metrics, logs, and traces. AWS commonly combines CloudWatch (metrics and logs) with X-Ray (tracing) and supports OpenTelemetry through the AWS Distro for OpenTelemetry (ADOT). Azure centers on Azure Monitor, with Application Insights for application telemetry, often visualized with Azure Managed Grafana. Google Cloud bundles these capabilities under Google Cloud Observability (formerly Stackdriver). Oracle Cloud Infrastructure (OCI) provides the Monitoring and Logging services plus Application Performance Monitoring (APM) for tracing and performance insights.