Suppose you’ve trained a machine learning model to accomplish some task, and you’d now like to provide that model’s inference capabilities as a service. This is the purpose of endpoints — they provide a simple web-based API for feeding data to your model and getting back inference results.
Azure ML currently supports three types of endpoints: batch endpoints, Kubernetes online endpoints, and managed online endpoints. In the blog post linked below, I'll discuss batch endpoints. These are designed to handle large requests, working asynchronously and generating results that are held in blob storage. Because compute resources are only provisioned when the job starts, the latency of the response is higher than using online endpoints. However, that can result in substantially lower costs.