Processing large volumes of data through an AI model all at once rather than one item at a time. Like grading a whole stack of exams in one sitting rather than grading each exam as it is handed in.
An e-commerce company runs batch inference overnight to generate product recommendations for millions of users in a single run.
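A toy sketch of the idea: the model here is a stand-in (a hypothetical linear scorer, not any particular library's API), but it shows the difference between issuing one prediction per item and scoring every row in a single vectorized pass.

```python
import numpy as np

# Stand-in for a trained model: a hypothetical linear scorer.
# In practice this would be a loaded sklearn/XGBoost/torch model.
def predict(features: np.ndarray) -> np.ndarray:
    weights = np.array([0.4, 0.6])
    return features @ weights

users = np.random.rand(1_000_000, 2)  # feature rows for a million users

# Online inference: one item at a time, paying per-call overhead each time.
first_score = predict(users[0:1])

# Batch inference: score all rows at once in a single pass.
all_scores = predict(users)
```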
All four clouds support offline/batch prediction with the same pattern: read data from storage, score it with a trained model on scalable compute, and write the results back to storage. AWS (SageMaker batch transform), Azure (Azure Machine Learning batch endpoints), and GCP (Vertex AI batch prediction) provide first-class managed batch inference features; OCI commonly implements batch inference via Data Science Jobs that load a model and score datasets on a schedule.
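As a rough illustration of that read-score-write pattern, here is a minimal job sketch. The file paths, model file, and column names are all hypothetical; in a real cloud job the paths would point at object storage (s3://, abfss://, gs://, or oci://) and the script would run on managed compute.

```python
import pandas as pd
import joblib

# Hypothetical paths: in a cloud batch job these would be object-storage URIs.
INPUT_PATH = "input/user_features.parquet"
MODEL_PATH = "model/recommender.joblib"
OUTPUT_PATH = "output/recommendations.parquet"

def run_batch_job() -> None:
    # 1. Read the full dataset from storage.
    features = pd.read_parquet(INPUT_PATH)

    # 2. Load the trained model and score every row in one pass.
    model = joblib.load(MODEL_PATH)
    features["score"] = model.predict(
        features[["recency", "frequency", "spend"]]  # hypothetical feature columns
    )

    # 3. Write results back to storage for downstream systems to consume.
    features[["user_id", "score"]].to_parquet(OUTPUT_PATH, index=False)

if __name__ == "__main__":
    run_batch_job()
```

Scheduling this script (nightly, for example) is what turns it into the overnight recommendation job described above.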