Overview
Java applications often suffer from slow startup because of JVM initialization and class loading. In the cloud-native era, applications start and stop more often and must scale out quickly to follow dynamic traffic, which makes the problem more visible. CRaC (Coordinated Restore at Checkpoint) addresses it by letting you checkpoint a running application and restore it later, which skips the lengthy startup after the first initialization. In our experiment with the Spring PetClinic project on Azure Kubernetes Service, enabling CRaC improved startup speed by 7x.
In the final section, we will discuss CRaC's limitations and potential future developments. We welcome your feedback, which will help us continue improving and optimizing Java on Azure. Feel free to share your thoughts in the comments section at the end of this article.
Next, we will walk through how to:
1. Package and containerize a Java application locally.
2. Deploy it to Azure Kubernetes Service (AKS).
3. Use CRaC to create a checkpoint of the running application.
4. Update the deployment so the application is restored from the checkpoint.
5. Compare the startup performance between the original and restored applications.
Packaging a Java Application
Before we deploy our Java application to AKS, we need to package it and create a container image. Follow these steps to clone and package the application:
1. Clone the repository:
For this example, we will use the popular Spring PetClinic project, which can be found on GitHub.
git clone -b crac-poc https://github.com/leonard520/spring-petclinic.git
cd spring-petclinic
Note that this repository is a fork of the official Spring PetClinic project. The only change is the addition of the CRaC dependency (org.crac:crac), which Spring uses to support checkpoint and restore:
<dependency>
    <groupId>org.crac</groupId>
    <artifactId>crac</artifactId>
    <version>1.4.0</version>
</dependency>
For more details, please refer to https://docs.spring.io/spring-framework/reference/integration/checkpoint-restore.html
2. Create a Dockerfile:
Create a Dockerfile that defines how the application is containerized. This example uses Azul Zulu, an OpenJDK build with CRaC support. The -XX:CRaCCheckpointTo=/test startup option tells the JVM where to write the checkpoint image.
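# Build stage: compile the application with the CRaC-enabled JDK
FROM azul/zulu-openjdk:17-jdk-crac-latest AS builder
WORKDIR /home/app
COPY . /home/app/spring-petclinic
RUN cd spring-petclinic && ./mvnw -Dmaven.test.skip=true clean package

# Runtime stage: a CRaC-enabled JDK is needed both to checkpoint and to restore
FROM azul/zulu-openjdk:17-jdk-crac-latest
WORKDIR /home/app
EXPOSE 8080
COPY --from=builder /home/app/spring-petclinic/target/*.jar petclinic.jar
# Start normally; a later "jcmd ... JDK.checkpoint" writes the image to /test
ENTRYPOINT ["java", "-XX:CRaCCheckpointTo=/test", "-jar", "petclinic.jar"]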
3. Build the Docker Image:
Use Docker to build the image:
docker build -t spring-petclinic:crac .
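The "coordinated" in CRaC means the application is told about a checkpoint. It can then release resources that cannot live in a process image and re-acquire them after restore. With the org.crac dependency on the classpath, Spring does this for its lifecycle beans, and the restore log later in this post shows it restarting them. The stdlib-only model below illustrates the idea. `CheckpointCoordinator`, `Component`, and `Flag` are hypothetical names, not the org.crac or Spring API. Stopping components in reverse registration order is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of coordinated checkpoint/restore; not the org.crac or Spring API. */
final class CheckpointCoordinator {

    /** Something holding OS resources (sockets, pools, timers) that must be released before a snapshot. */
    public interface Component {
        String name();

        void stop();   // release resources before the checkpoint

        void start();  // re-acquire them after restore
    }

    private final List<Component> components = new ArrayList<>();
    private final List<String> events = new ArrayList<>();

    public void register(Component component) {
        components.add(component);
    }

    /** Stops components in reverse registration order, like unwinding a stack. */
    public void beforeCheckpoint() {
        for (int i = components.size() - 1; i >= 0; i--) {
            Component c = components.get(i);
            c.stop();
            events.add("stop " + c.name());
        }
    }

    /** Starts components again in registration order once the JVM is restored. */
    public void afterRestore() {
        for (Component c : components) {
            c.start();
            events.add("start " + c.name());
        }
    }

    public List<String> events() {
        return List.copyOf(events);
    }

    /** Minimal component that only tracks whether it is running. */
    public static final class Flag implements Component {
        private final String name;
        private boolean running = true;

        public Flag(String name) { this.name = name; }

        @Override public String name() { return name; }

        @Override public void stop() { running = false; }

        @Override public void start() { running = true; }

        public boolean isRunning() { return running; }
    }

    public static void main(String[] args) {
        CheckpointCoordinator coordinator = new CheckpointCoordinator();
        coordinator.register(new Flag("dataSource"));
        coordinator.register(new Flag("webServer"));
        coordinator.beforeCheckpoint(); // a real JVM would write the image at this point
        coordinator.afterRestore();
        System.out.println(coordinator.events());
    }
}
```

Running `main` prints `[stop webServer, stop dataSource, start dataSource, start webServer]`. The web server stops taking requests before its data source goes away, and it comes back only after the data source is available again.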
Creating a Deployment on Azure Kubernetes Service
With the application containerized, we can now deploy it on AKS. Follow these steps:
1. Create an AKS Cluster:
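If you don't have an AKS cluster, create a resource group and a cluster with the Azure CLI. Then fetch the cluster credentials so that kubectl can connect to it:
az group create --name myResourceGroup --location eastus
az aks create --resource-group myResourceGroup --name myAKSCluster --node-count 1 --enable-addons monitoring --generate-ssh-keys
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster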
2. Create an Azure Container Registry (ACR) and push the Docker image to it:
Create the registry with the admin account enabled, because the pull secret in the next step uses it. Then log in, tag the image, and push it:
az acr create -n <acr-name> -g myResourceGroup --sku Basic --admin-enabled true
az acr login -n <acr-name>
docker tag spring-petclinic:crac <acr-name>.azurecr.io/spring-petclinic:crac
docker push <acr-name>.azurecr.io/spring-petclinic:crac
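3. Create an image pull secret for your ACR:
The secret uses the registry's admin credentials. The username is the registry name, and <acr-key> is one of the passwords returned by az acr credential show -n <acr-name>.
kubectl create secret docker-registry regcred --docker-server=<acr-name>.azurecr.io --docker-username=<acr-name> --docker-password=<acr-key>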
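4. Create an Azure Files share to mount into the deployment:
Restore speed depends heavily on disk performance, so we strongly recommend a Premium file share in the same region as your AKS cluster. The commands below do four things:
- create the storage account and the file share;
- list the account keys;
- store the account name and one of the keys (used as <storage-account-key>) in a Kubernetes secret;
- make that secret available to the Azure Files CSI driver.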
az storage account create --name mystorageaccount --resource-group myResourceGroup --location eastus --kind FileStorage --sku Premium_LRS
az storage share-rm create --resource-group myResourceGroup --storage-account mystorageaccount --name myfileshare
az storage account keys list --resource-group myResourceGroup --account-name mystorageaccount
kubectl create secret generic azure-secret --from-literal=azurestorageaccountname=mystorageaccount --from-literal=azurestorageaccountkey=<storage-account-key>
5. Create a Kubernetes Deployment:
Create a deployment YAML file (`deployment.yaml`) for your application:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: <acr-name>.azurecr.io/spring-petclinic:crac
        ports:
        - containerPort: 8080
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add: # These two capabilities are required to create a checkpoint
            - SYS_PTRACE
            - CHECKPOINT_RESTORE
          privileged: false
        volumeMounts:
        - name: crac-storage
          mountPath: /test
      volumes:
      - name: crac-storage
        csi:
          driver: file.csi.azure.com
          volumeAttributes:
            secretName: azure-secret
            shareName: myfileshare
            mountOptions: 'dir_mode=0777,file_mode=0777,cache=strict,actimeo=30,nosharesock,nobrl'
      imagePullSecrets:
      - name: regcred
6. Deploy to AKS:
Apply the deployment to your AKS cluster:
kubectl apply -f deployment.yaml
7. Check the startup logs and duration:
kubectl logs -l app=myapp
:: Built with Spring Boot :: 3.3.0
2024-09-26T14:59:41.464Z INFO 129 --- [ main] o.s.s.petclinic.PetClinicApplication : Starting PetClinicApplication v3.3.0-SNAPSHOT using Java 17.0.12 with PID 129 (/home/app/petclinic.jar started by root in /home/app)
2024-09-26T14:59:41.470Z INFO 129 --- [ main] o.s.s.petclinic.PetClinicApplication : No active profile set, falling back to 1 default profile: "default"
2024-09-26T14:59:42.994Z INFO 129 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2024-09-26T14:59:43.071Z INFO 129 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 66 ms. Found 2 JPA repository interfaces.
2024-09-26T14:59:44.125Z INFO 129 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port 8080 (http)
2024-09-26T14:59:44.134Z INFO 129 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2024-09-26T14:59:44.135Z INFO 129 --- [ main] o.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/10.1.24]
2024-09-26T14:59:44.176Z INFO 129 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2024-09-26T14:59:44.178Z INFO 129 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 2595 ms
2024-09-26T14:59:44.560Z INFO 129 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
2024-09-26T14:59:44.779Z INFO 129 --- [ main] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Added connection conn0: url=jdbc:h2:mem:131e3017-7e28-4a31-b704-5d3840cd46d6 user=SA
2024-09-26T14:59:44.781Z INFO 129 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed.
2024-09-26T14:59:45.011Z INFO 129 --- [ main] o.hibernate.jpa.internal.util.LogHelper : HHH000204: Processing PersistenceUnitInfo [name: default]
2024-09-26T14:59:45.073Z INFO 129 --- [ main] org.hibernate.Version : HHH000412: Hibernate ORM core version 6.5.2.Final
2024-09-26T14:59:45.113Z INFO 129 --- [ main] o.h.c.internal.RegionFactoryInitiator : HHH000026: Second-level cache disabled
2024-09-26T14:59:45.451Z INFO 129 --- [ main] o.s.o.j.p.SpringPersistenceUnitInfo : No LoadTimeWeaver setup: ignoring JPA class transformer
2024-09-26T14:59:46.466Z INFO 129 --- [ main] o.h.e.t.j.p.i.JtaPlatformInitiator : HHH000489: No JTA platform available (set 'hibernate.transaction.jta.platform' to enable JTA platform integration)
2024-09-26T14:59:46.468Z INFO 129 --- [ main] j.LocalContainerEntityManagerFactoryBean : Initialized JPA EntityManagerFactory for persistence unit 'default'
2024-09-26T14:59:46.826Z INFO 129 --- [ main] o.s.d.j.r.query.QueryEnhancerFactory : Hibernate is in classpath; If applicable, HQL parser will be used.
2024-09-26T14:59:48.666Z INFO 129 --- [ main] o.s.b.a.e.web.EndpointLinksResolver : Exposing 14 endpoints beneath base path '/actuator'
2024-09-26T14:59:48.778Z INFO 129 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path '/'
2024-09-26T14:59:48.810Z INFO 129 --- [ main] o.s.s.petclinic.PetClinicApplication : Started PetClinicApplication in 8.171 seconds (process running for 8.862)
As the last log line shows, a regular startup takes a little over 8 seconds (8.171 seconds in this run).
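To record startup times instead of reading them off the logs, you can extract two lines. For a regular start, use Spring Boot's `Started ... in N seconds` line. After a restore, use Spring's `restored JVM running for N ms` line, which appears later in this post. The sketch below is a minimal example. `StartupLogParser` is a hypothetical helper, and its patterns assume the exact message formats shown in these logs. These are JVM-reported times, so they do not include container scheduling or image pulls.

```java
import java.util.Locale;
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts startup timings from Spring log output. */
final class StartupLogParser {
    // e.g. "Started PetClinicApplication in 8.171 seconds (process running for 8.862)"
    private static final Pattern COLD_START = Pattern.compile("Started \\S+ in ([0-9.]+) seconds");

    // e.g. "Spring-managed lifecycle restart completed (restored JVM running for 1009 ms)"
    private static final Pattern RESTORE = Pattern.compile("restored JVM running for (\\d+) ms");

    private StartupLogParser() {}

    /** Seconds reported by Spring Boot for a regular start, if that line is present. */
    static OptionalDouble coldStartSeconds(String log) {
        Matcher m = COLD_START.matcher(log);
        return m.find() ? OptionalDouble.of(Double.parseDouble(m.group(1))) : OptionalDouble.empty();
    }

    /** Seconds since the JVM was restored, converted from the reported milliseconds. */
    static OptionalDouble restoreSeconds(String log) {
        Matcher m = RESTORE.matcher(log);
        return m.find() ? OptionalDouble.of(Long.parseLong(m.group(1)) / 1000.0) : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        String cold = "Started PetClinicApplication in 8.171 seconds (process running for 8.862)";
        String restored = "Spring-managed lifecycle restart completed (restored JVM running for 1009 ms)";
        System.out.println(String.format(Locale.ROOT, "cold start: %.3f s, restore: %.3f s",
                coldStartSeconds(cold).getAsDouble(), restoreSeconds(restored).getAsDouble()));
    }
}
```

In practice, you would pass it the text returned by `kubectl logs -l app=myapp`.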
Creating a Checkpoint with CRaC
With the application running, the next step is to create a checkpoint using CRaC.
1. Create the Checkpoint:
Once the application reaches the desired state (for example, after it has fully initialized), trigger a checkpoint with jcmd. You can find the pod name with kubectl get pods -l app=myapp. CRaC captures the state of the running JVM so that it can be restored later for a fast startup. The checkpoint image is written to /test, which is backed by the Azure Files share created earlier. By default, the JVM exits once the checkpoint has been written.
kubectl exec -it <pod-name> -- jcmd petclinic JDK.checkpoint
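The checkpoint captures the JVM exactly as it is when jcmd runs, so it can pay to exercise the application before taking it. Below is a minimal warm-up sketch that uses the JDK's built-in HTTP client. It rests on three assumptions:
- the pod's port 8080 is reachable locally, for example through `kubectl port-forward <pod-name> 8080:8080`;
- the listed PetClinic paths are representative of real traffic;
- the `WarmUp` class, the paths, and the request count are illustrative choices, not part of CRaC.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

/** Hypothetical warm-up driver: replays requests so hot paths get JIT-compiled before the checkpoint. */
final class WarmUp {
    private WarmUp() {}

    /** Sends `rounds` GET requests to each path under baseUri and returns how many got a 2xx response. */
    static int run(URI baseUri, List<String> paths, int rounds) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        int ok = 0;
        for (int i = 0; i < rounds; i++) {
            for (String path : paths) {
                HttpRequest request = HttpRequest.newBuilder(baseUri.resolve(path))
                        .timeout(Duration.ofSeconds(10))
                        .GET()
                        .build();
                try {
                    int status = client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
                    if (status >= 200 && status < 300) {
                        ok++;
                    }
                } catch (IOException e) {
                    // Count as a failed request and continue; occasional errors should not abort the warm-up.
                }
            }
        }
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        // Illustrative PetClinic pages; substitute the endpoints your real traffic hits.
        List<String> paths = List.of("/", "/owners/find", "/vets.html");
        int ok = run(URI.create("http://localhost:8080"), paths, 1_000);
        System.out.println(ok + " successful requests; the application is ready for JDK.checkpoint");
    }
}
```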
Restoring from the Checkpoint
Now that we have a checkpoint, we can restore the application from it. The container image stays the same. Only the start command changes, and the checkpoint is read from the mounted file share.
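1. Update the deployment to restore from the checkpoint:
Modify deployment.yaml so that the container starts with the restore command instead of the image's default entrypoint:
      containers:
      - name: myapp
        image: <acr-name>.azurecr.io/spring-petclinic:crac
        # Override the image ENTRYPOINT: restore from the checkpoint instead of starting fresh
        command:
        - java
        - -XX:CRaCRestoreFrom=/test
        # ports, securityContext, and volumeMounts stay as before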
Apply the changes:
kubectl apply -f deployment.yaml
2. Check startup time
kubectl logs -l app=myapp
2024-09-26T15:01:42.400Z INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Restarting Spring-managed lifecycle beans after JVM restore
2024-09-26T15:01:42.396Z WARN 129 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=4m9s910ms846µs988ns).
2024-09-26T15:01:42.473Z INFO 129 --- [Attach Listener] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path '/'
2024-09-26T15:01:42.474Z INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Spring-managed lifecycle restart completed (restored JVM running for 1009 ms)
This time, the application was ready in just over one second (the restored JVM reports 1009 ms).
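The two startup modes differ only in their JVM flags. The sketch below puts that decision in one place. It starts normally with -XX:CRaCCheckpointTo while the image directory is empty, and it restores with -XX:CRaCRestoreFrom once the directory contains a checkpoint. Two parts of this are assumptions made for illustration: the `CracLauncher` name, and treating "directory is non-empty" as "a usable checkpoint exists", which is a deliberately naive check. In a real container image, this logic would more typically live in a short shell entrypoint, so that no extra JVM starts before the restore.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper: choose between a normal (checkpointable) start and a restore. */
final class CracLauncher {
    private CracLauncher() {}

    /** Naively treats a non-empty image directory as "a checkpoint exists". */
    static boolean hasCheckpoint(Path imageDir) throws IOException {
        if (!Files.isDirectory(imageDir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(imageDir)) {
            return entries.findAny().isPresent();
        }
    }

    /** Same flags as the Dockerfile ENTRYPOINT and the restore command in deployment.yaml. */
    static List<String> command(Path imageDir, String jar) throws IOException {
        if (hasCheckpoint(imageDir)) {
            // The image already contains the loaded application, so no -jar is passed.
            return List.of("java", "-XX:CRaCRestoreFrom=" + imageDir);
        }
        return List.of("java", "-XX:CRaCCheckpointTo=" + imageDir, "-jar", jar);
    }

    public static void main(String[] args) throws IOException {
        Path imageDir = Path.of(args.length > 0 ? args[0] : "/test");
        System.out.println(String.join(" ", command(imageDir, "petclinic.jar")));
    }
}
```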
Performance Comparison
The final step is to compare the startup times of the original and restored versions of the application.
1. Measure Startup Time:
For both the original and the restored application, measure the time from container start to application readiness. The original startup took over 8 seconds, while restoring from the checkpoint took just over 1 second, a 7x improvement. Notably, this gain required only adding the CRaC dependency; no application code was changed.
2. Compare Results:
Beyond faster startup, a restored application can also reach good performance sooner. The checkpoint captures the JVM in whatever state it was in, including code the JIT compiler had already optimized. To benefit from this, create the checkpoint only after your application has had sufficient time to warm up, for example by serving representative requests.
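For the end-to-end measurement in step 1, one option is to poll a readiness URL from the moment the application starts until it answers. Below is a minimal sketch that uses the JDK HTTP client. It assumes the actuator health endpoint is reachable at the given URL; the startup log shows that PetClinic exposes actuator endpoints. `ReadinessTimer` and the poll interval are hypothetical choices. Start the timer at the same moment as the application. The measured time is only as precise as the poll interval.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper that measures how long an HTTP endpoint takes to answer 200. */
final class ReadinessTimer {
    private ReadinessTimer() {}

    /**
     * Polls healthUri until it returns HTTP 200 and returns the elapsed time.
     * Throws IllegalStateException if the endpoint is not ready before the timeout.
     */
    static Duration timeUntilReady(URI healthUri, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(pollInterval)
                .build();
        HttpRequest request = HttpRequest.newBuilder(healthUri)
                .timeout(pollInterval)
                .GET()
                .build();
        long start = System.nanoTime();
        long deadline = start + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            try {
                HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
                if (response.statusCode() == 200) {
                    return Duration.ofNanos(System.nanoTime() - start);
                }
            } catch (IOException notReadyYet) {
                // Connection refused or timed out: the JVM is still starting or restoring.
            }
            Thread.sleep(pollInterval.toMillis());
        }
        throw new IllegalStateException("not ready after " + timeout);
    }

    public static void main(String[] args) throws InterruptedException {
        URI health = URI.create(args.length > 0 ? args[0] : "http://localhost:8080/actuator/health");
        Duration elapsed = timeUntilReady(health, Duration.ofSeconds(60), Duration.ofMillis(100));
        System.out.println("ready after " + elapsed.toMillis() + " ms");
    }
}
```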
Conclusion
In this post, we walked through how to use CRaC to accelerate the startup of a Java application running on Azure Kubernetes Service. By checkpointing a fully initialized application and restoring it later, we can drastically reduce startup times, improving performance for both cold and warm starts in containerized environments. CRaC is a promising technology wherever fast application startup is critical, such as on serverless platforms or in microservices architectures.
For comparison, Spring Native is another way to improve startup performance. It compiles Spring applications into native executables with GraalVM, offering extremely fast startup and low memory usage, which is ideal for short-lived, stateless services. Spring Native has since been superseded by the GraalVM Native Image support built into Spring Boot 3. The trade-offs differ: CRaC keeps the full JVM, including JIT compilation, while native images may require code adjustments (such as reflection hints) and take longer to build.
However, as a relatively new technology, CRaC has its limitations. Many third-party libraries do not yet support it. Spring Boot, Quarkus, and Micronaut already support CRaC, but many other frameworks and libraries still need to be adapted. The application must also close open file handles before the checkpoint is captured. See https://github.com/CRaC/docs/blob/master/fd-policies.md for the file descriptor policies that control this. Finally, the environment in which the checkpoint is created must closely match the environment in which it is restored.
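A common way to meet the file-handle requirement is to close the handle in a checkpoint callback and reopen it after restore. The sketch below applies that pattern to a log file. To stay self-contained, it declares a hypothetical `CheckpointAware` interface with the same two callbacks as `org.crac.Resource`. With the real library, you would implement `org.crac.Resource`, whose callbacks also receive a `Context` argument, and register the object with `Core.getGlobalContext().register(...)`. `AuditLog` is a hypothetical example class.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical stand-in with the same two callbacks as org.crac.Resource (minus the Context argument). */
interface CheckpointAware {
    void beforeCheckpoint() throws Exception;

    void afterRestore() throws Exception;
}

/** Hypothetical append-only log that releases its file handle across a checkpoint. */
final class AuditLog implements CheckpointAware {
    private final Path file;
    private BufferedWriter writer; // null while a checkpoint is in progress

    AuditLog(Path file) throws IOException {
        this.file = file;
        open();
    }

    private void open() throws IOException {
        writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    synchronized void append(String line) throws IOException {
        if (writer == null) {
            throw new IllegalStateException("log is closed for a checkpoint");
        }
        writer.write(line);
        writer.newLine();
        writer.flush();
    }

    synchronized boolean isOpen() {
        return writer != null;
    }

    @Override
    public synchronized void beforeCheckpoint() throws IOException {
        // Close the descriptor so that no open file remains when the checkpoint is taken.
        writer.close();
        writer = null;
    }

    @Override
    public synchronized void afterRestore() throws IOException {
        // Reopen in append mode; lines written before the checkpoint are kept.
        open();
    }
}
```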
We will continue to closely monitor these limitations and work alongside the community to improve its broader applicability.
We would also love to hear your thoughts on this technology. Your feedback will help us improve how Java runs on Azure. Feel free to share your thoughts in the comments section at the end of this article.
Updated Nov 07, 2024
Version 4.0
Xiaoyun_Ding
Microsoft
Apps on Azure Blog