
Apps on Azure Blog

Understanding Idle Usage in Azure Container Apps

samcogan
Microsoft
May 30, 2025

Introduction

Azure Container Apps provides a serverless platform for running containers at scale, and one of the big benefits is that you can easily scale workloads to zero when they are not getting any traffic. Scaling to zero ensures you only pay when your workloads are actively receiving traffic or performing work.

However, for some workloads, scaling to zero might not be possible for a variety of reasons. Some workloads must always be able to respond to requests quickly, and the time it takes to scale from 0 to 1 replicas, while short, is too long. Some applications need to be able to always respond to health checks, and so removing all replicas is not possible. In these scenarios, there may still be time periods where there is no traffic, or the application isn't doing any work. While you can't reduce costs to zero in these scenarios, you can reduce them through the concept of "Idle" usage charges.

Where it is possible to scale your application to zero, it is recommended that this is where you focus your effort and optimise for scaling down when idle.

What Are Idle Charges?

When using the consumption plan (idle charges apply only to the consumption plan), you pay per second for vCPU and memory. In the Central US region, this is currently $0.000024 per vCPU per second and $0.000003 per GiB per second for pay-as-you-go pricing. This cost is based on the resources you have requested and that have been allocated to your container, which may not match what you are actually consuming at any point in time.

If your container qualifies for idle billing, the CPU cost drops to $0.000003 per vCPU per second, which is a fairly significant drop. Memory costs remain the same.
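To make the difference concrete, here is a small sketch of the per-replica cost using the Central US pay-as-you-go rates quoted above (the 0.25 vCPU / 0.5 GiB app size is a hypothetical example, and the memory rate is assumed to be billed per GiB-second):

```python
# Per-second consumption-plan rates (Central US, pay-as-you-go, as quoted above)
ACTIVE_VCPU_RATE = 0.000024   # $ per vCPU per second
IDLE_VCPU_RATE   = 0.000003   # $ per vCPU per second
MEMORY_RATE      = 0.000003   # $ per GiB per second (same whether active or idle)

def hourly_cost(vcpu, memory_gib, idle):
    """Cost of one replica for one hour, in dollars."""
    vcpu_rate = IDLE_VCPU_RATE if idle else ACTIVE_VCPU_RATE
    return (vcpu * vcpu_rate + memory_gib * MEMORY_RATE) * 3600

# Hypothetical small app: 0.25 vCPU, 0.5 GiB
active = hourly_cost(0.25, 0.5, idle=False)
idle = hourly_cost(0.25, 0.5, idle=True)
print(f"active: ${active:.4f}/hr, idle: ${idle:.4f}/hr")
# → active: $0.0270/hr, idle: $0.0081/hr
```

For this app size, idle pricing cuts the per-hour cost to roughly 30% of the active rate, because the vCPU component dominates.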

Eligibility for Idle Pricing

To be eligible to receive idle pricing, your container needs to meet several criteria:

  • Consumption Plan - Idle pricing is only applicable to containers in the consumption plan. Resources in a dedicated plan are charged for the underlying compute resources and do not receive idle pricing.
  • Not GPU Apps - Idle pricing does not apply to containers that have GPUs allocated.
  • Scaled to Minimum - To be eligible for idle pricing, the app must be scaled to its minimum replica count. This does not have to be one replica; you can still run multiple replicas to support availability zone deployments or similar, but the app must be at whatever minimum you have set.
  • All Containers Are Running - All containers in an app must be running to get idle charges, so if any are in a failed state, you will not see this pricing.
  • Not a Container App Job - Jobs are stopped once the job completes, and so do not get charged.
  • Meet the Required Resource Usage - Containers must have the following resource usage to be eligible:
    • Not processing any HTTP requests
    • Using less than 0.01 vCPU cores (this is actual usage)
    • Receiving less than 1000 bytes/s of inbound network traffic

On your bill, usage during periods when all of the above conditions are met should appear as idle usage.
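As a sketch, the criteria above can be expressed as a single check. The function name, parameter names, and defaults here are illustrative, not part of any Azure SDK; the plan, GPU, and job checks are static properties of the app, while the rest come from runtime metrics:

```python
def is_idle_eligible(
    replicas: int,
    min_replicas: int,
    active_http_requests: int,
    cpu_cores_used: float,
    inbound_bytes_per_sec: float,
    on_consumption_plan: bool = True,
    uses_gpu: bool = False,
    all_containers_running: bool = True,
    is_job: bool = False,
) -> bool:
    """Mirror the idle-pricing eligibility rules described above."""
    # Static disqualifiers: dedicated plan, GPU workloads, and jobs never qualify
    if not on_consumption_plan or uses_gpu or is_job:
        return False
    # Any container in a failed state disqualifies the whole app
    if not all_containers_running:
        return False
    # Must be scaled down to the configured minimum replica count
    if replicas != min_replicas:
        return False
    # Runtime resource thresholds
    return (
        active_http_requests == 0          # not processing any HTTP requests
        and cpu_cores_used < 0.01          # actual usage below 0.01 vCPU cores
        and inbound_bytes_per_sec < 1000   # under 1000 bytes/s inbound traffic
    )

print(is_idle_eligible(replicas=2, min_replicas=2, active_http_requests=0,
                       cpu_cores_used=0.004, inbound_bytes_per_sec=120))  # → True
```

Note that the replica check compares against whatever minimum you configured, not against 1, matching the "Scaled to Minimum" rule above.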

Monitoring Idle Usage

There is no single metric or counter that shows whether a container is in a state where it will receive idle billing; instead, you need to check a few different things. Criteria such as being on the consumption plan and not using GPUs are static, and you can verify them on your container at any time. The metrics that vary are scale, HTTP requests, vCPU cores, and network traffic. You can view all of these in Azure Monitor under the following counters:

  • Replica Count – Shows the number of replicas currently running. Expected value: the value set for "Min Replicas" in the scale rule settings.
  • CPU Usage – The number of CPU cores in use (ensure this is "CPU Usage" and not "CPU Usage Percentage"). Expected value: less than 0.01.
  • Network In Bytes – The amount of network traffic being received. Expected value: less than 1000 bytes/s.
  • Requests – The number of HTTP requests. Expected value: 0.

Ensure that your granularity is set to the lowest level (1 minute) to avoid skewed results due to averaging.
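As a sketch of how you might post-process those four counters (for example, exported from Azure Monitor at 1-minute granularity) to estimate how much of a window was idle-eligible. The record layout and names here are hypothetical, not an Azure Monitor schema:

```python
from dataclasses import dataclass

@dataclass
class MinuteSample:
    """One 1-minute granularity sample of the four counters above."""
    replicas: int
    cpu_cores: float          # "CPU Usage" (cores), not the percentage metric
    rx_bytes_per_sec: float   # "Network In Bytes", normalised to bytes/s
    requests: int             # "Requests" received during the minute

def idle_fraction(samples, min_replicas):
    """Fraction of 1-minute samples meeting all four idle thresholds."""
    if not samples:
        return 0.0
    idle = sum(
        1 for s in samples
        if s.replicas == min_replicas
        and s.cpu_cores < 0.01
        and s.rx_bytes_per_sec < 1000
        and s.requests == 0
    )
    return idle / len(samples)

window = [
    MinuteSample(1, 0.002, 300, 0),   # idle
    MinuteSample(1, 0.150, 5000, 4),  # active: handling requests
    MinuteSample(1, 0.003, 100, 0),   # idle
    MinuteSample(1, 0.004, 900, 0),   # idle
]
print(idle_fraction(window, min_replicas=1))  # → 0.75
```

If the fraction is consistently low even when your app is receiving no real traffic, the pitfalls below are the usual suspects.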

Common Pitfalls

Health Probes

Health probes from the ACA environment to confirm that containers are running and healthy are not counted towards HTTP requests or network traffic. However, health checks that come from outside the container environment are treated the same as any other inbound traffic and will cause your container to no longer be "idle". If your container sits behind Azure Front Door, Azure API Management, or Application Gateway, you will likely have configured these to check the health of the service, and this will cause the container to show as active. For some services, like Azure Front Door, the volume of health probes will be high and may stop your container from ever dropping into an idle state.
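A back-of-the-envelope calculation shows why Front Door probes alone can keep an app out of the idle state. The point-of-presence count and probe size below are illustrative assumptions, not published figures; only the 30-second default probe interval comes from the text above:

```python
# Illustrative assumptions: actual values vary by Front Door tier and probe config
pops = 100                 # assumed number of points of presence sending probes
probe_interval_s = 30      # Front Door's default health probe interval
probe_bytes = 500          # assumed size of one inbound probe request

inbound_bytes_per_sec = pops * probe_bytes / probe_interval_s
print(inbound_bytes_per_sec)  # about 1,667 bytes/s, over the 1000 bytes/s idle limit
```

Under these assumptions, probe traffic alone exceeds the 1000 bytes/s inbound threshold before the app handles a single real request.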

For some applications, this real-time health reporting at the ingress layer is vital for ensuring the health of the application and being able to fail over or recover from an issue. If that is the case, you may not be able to take advantage of idle pricing. However, if there is some flexibility in this reporting, there are techniques you can use to reduce the number of probes that reach the container app:

  • Reduce probe frequency – Tools like Azure Front Door will query a health probe endpoint every 30 seconds by default, and this will be done by every point of presence in the network, resulting in a lot of requests. Most services have the option to reduce the frequency of requests, making it more likely that you will meet the requirements for idle billing.
  • Aggregating and Caching – Rather than your ingress solution querying every application directly, you could use tools like Azure API Management (APIM) to create a health endpoint that aggregates health data from the appropriate endpoints and caches it. Your ingress solution then queries APIM, and APIM only sends requests to the backend container apps when needed, and only to the applications that need to be queried to understand the overall application health.
  • ARM Health APIs – As mentioned, the health queries done by the Container App Environment to ensure the Container Apps are healthy are not counted when it comes to idle billing. This health data is used to indicate whether the Container App is healthy and is reported in the ARM APIs for the Container App. You could have your ingress solution make a call to the ARM APIs to get this status, rather than querying the application directly.

External Services

Many applications will make calls out to external services, which could include:

  • Monitoring services to record metrics and logs
  • Messaging services (Service Bus, Storage Queues) to check for messages to process
  • Background tasks to write data to external datastores

While these tasks are mostly outbound data, they will often result in some level of inbound response to the Container App, which could push you over the 1000 bytes/s limit. Again, some of this may be unavoidable, but where possible, you will need to limit the amount of communication with external services while idle. For example:

  • Avoid polling external services – Polling queues like Service Bus, Storage Queues, or APIs can generate frequent inbound responses. Where possible, use event-driven patterns to eliminate polling entirely.
  • Throttle background tasks – If your app writes to external datastores or sends telemetry, ensure these tasks are batched or delayed. Avoid frequent writes that trigger acknowledgements or status responses.
  • Limit monitoring and logging traffic – Sending metrics or logs to external monitoring platforms (e.g., Prometheus, Datadog) can result in inbound handshakes or status codes. Use buffering and reduce the frequency of exports.
  • Use lightweight protocols – Where possible, prefer protocols or formats that minimize response size (e.g., gRPC over HTTP, compressed JSON).
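The "throttle background tasks" advice can be sketched as a simple buffered exporter: events accumulate in memory and are flushed in one batch per interval, producing one network round trip (and one inbound acknowledgement) per batch instead of per event. The `BufferedExporter` class and its parameters are hypothetical, not a real SDK class:

```python
import time

class BufferedExporter:
    """Accumulate telemetry and flush in batches to minimise network chatter."""

    def __init__(self, send, flush_interval_s=300, max_buffer=500):
        self._send = send              # callable that ships one batch of events
        self._interval = flush_interval_s
        self._max = max_buffer
        self._buffer = []
        self._last_flush = time.monotonic()

    def record(self, event):
        self._buffer.append(event)
        # Flush only when the buffer is full or the interval has elapsed,
        # instead of shipping every event (and receiving every ack) as it occurs
        if (len(self._buffer) >= self._max
                or time.monotonic() - self._last_flush >= self._interval):
            self.flush()

    def flush(self):
        if self._buffer:
            self._send(self._buffer)   # one outbound call, one inbound ack
            self._buffer = []
        self._last_flush = time.monotonic()

# Demo with a tiny buffer and a fake sender that just collects batches
sent = []
exporter = BufferedExporter(send=sent.append, flush_interval_s=300, max_buffer=3)
for i in range(7):
    exporter.record({"metric": "cpu", "value": i})
exporter.flush()  # drain whatever remains before shutdown
print([len(batch) for batch in sent])  # → [3, 3, 1]
```

Seven events produce three network calls instead of seven; with a realistic buffer size and interval, an idle app can go minutes between round trips.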

CPU Usage

The amount of CPU usage allowed in an idle state is low, at 0.01 vCPU cores. It is fairly easy for a container to breach that threshold even when not handling any inbound requests. Some reasons include:

  • Background threads that are running at all times
  • Inefficient idle code not using proper async patterns
  • Polling external services for messages
  • Garbage collection or other framework services
  • Additional services running within your container but not in use

To avoid unintentionally breaching the 0.01 vCPU core limit and losing idle billing benefits, consider the following strategies:

  • Use proper async patterns – Avoid tight loops or background threads that run continuously. Instead, use async/await, Task.Delay(), or similar non-blocking patterns to yield CPU time. This helps ensure your container doesn’t exceed the 0.01 vCPU core idle threshold.
  • Throttle or eliminate background activity – If your app polls external services (e.g., queues, APIs), increase the polling interval or switch to event-driven triggers like KEDA. This reduces unnecessary CPU usage during idle periods.
  • Tune framework and runtime settings – Some frameworks (like .NET or Java) may perform garbage collection or diagnostics even when idle. Configure these features to run less frequently or disable them if not needed.
  • Audit container contents – Remove any unnecessary services or agents bundled in your container image. Background daemons, telemetry exporters, or cron jobs can all contribute to idle CPU usage.
  • Monitor and profile – Use Azure Monitor to track CPU usage per replica. Set alerts for unexpected spikes and use profiling tools to identify hidden sources of CPU consumption.
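The first two strategies can be sketched in Python's asyncio (an analogue of the .NET async/await and Task.Delay patterns mentioned above): instead of a tight polling loop that burns CPU checking for work, the worker suspends on an event and wakes only when signalled. The names and the sentinel-based shutdown are illustrative:

```python
import asyncio

async def worker(wake, queue, results):
    """Event-driven worker: uses no CPU at all until signalled."""
    while True:
        await wake.wait()          # suspends without spinning, unlike a poll loop
        wake.clear()
        while queue:
            item = queue.pop(0)
            if item is None:       # sentinel value: shut the worker down
                return
            results.append(item * 2)

async def main():
    wake, queue, results = asyncio.Event(), [], []
    task = asyncio.create_task(worker(wake, queue, results))

    queue.extend([1, 2, 3])
    wake.set()                     # signal that work arrived, instead of polling
    await asyncio.sleep(0)         # yield so the worker can run

    queue.append(None)             # request shutdown
    wake.set()
    await task
    return results

print(asyncio.run(main()))  # → [2, 4, 6]
```

The same shape applies to queue consumers: a KEDA-scaled, event-triggered worker sits at effectively zero CPU between messages, whereas a loop that polls Service Bus every second accrues both CPU usage and inbound response bytes the whole time.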

Summary

Idle usage pricing is a good way to reduce your Container App bill if you are unable to scale to zero but do have periods where your applications are idle. Scaling to zero will always provide the lowest and most predictable cost, and I recommend using this wherever possible. However, when that is not feasible, idle pricing may be applicable.

Being eligible for idle pricing does require meeting some fairly low resource limits. If your application is likely to spend a good amount of time idling, it is worth doing some work to optimize your containers to ensure they are using as few resources as possible when they are not servicing requests.
