I was troubleshooting a master-0 pod eviction due to disk pressure.
The eviction was resolved as follows:
- The pod was shut down.
- Ephemeral space was reclaimed (85% down to 73%) by deleting unused images from the node (e.g., docker image prune; see the sketch after these steps).
- The pod was started and master-0 came back online.
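For reference, here is a minimal sketch of that cleanup step, assuming shell access to the affected node; the path /var/lib/docker is an assumed location for the image store and may differ in your environment:

```
# Check ephemeral disk usage before cleanup (85% in this case).
df -h /var/lib/docker

# Remove all images not referenced by any container; -f skips the confirmation prompt.
docker image prune -a -f

# Verify usage dropped (85% -> 73% here).
df -h /var/lib/docker
```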
Right after the restart I saw disk space usage drop to 38%, but found no events in the journal or other logs indicating where the additional free space had come from. Looking at the disk metrics, I was able to map the time of the jump in free space to the startup of SQL Server, specifically the recreation of TEMPDB.
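One way to confirm that correlation is to check when TEMPDB was created; because TEMPDB is rebuilt on every SQL Server startup, its create_date is effectively the instance start time. A sketch, assuming sqlcmd connectivity to the master instance (the endpoint, port, login, and password are placeholders):

```
sqlcmd -S <endpoint>,<port> -U <login> -P '<password>' -Q \
  "SELECT create_date AS tempdb_recreated_at FROM sys.databases WHERE name = 'tempdb';"
```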
TEMPDB had grown large prior to the restart (auto-grow with no size limit), consuming disk space on the master-0 persistent volume. Once the 80% disk space usage threshold was crossed, the pod was evicted, followed by the cleanup and restart attempt described above. Luckily, deleting the unused images reclaimed enough space for the master-0 pod to restart. When it did, SQL Server restarted and TEMPDB was recreated at its default file sizes, which is what dropped disk space usage from 73% to 38%.
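To see how large TEMPDB has grown and whether its growth is capped, you can inspect the file metadata. Another sketch with the same placeholder connection details; sizes in sys.database_files are stored as 8 KB pages, and max_size = -1 means unlimited growth:

```
sqlcmd -S <endpoint>,<port> -U <login> -P '<password>' -Q "
SELECT name,
       size * 8 / 1024  AS current_size_mb,
       CASE max_size
            WHEN -1 THEN 'UNLIMITED'
            ELSE CAST(max_size * 8 / 1024 AS varchar(20))
       END              AS max_size_mb,
       growth,
       is_percent_growth
FROM tempdb.sys.database_files;"
```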
If enough space cannot be reclaimed as part of the eviction resolution, the pod typically goes into CrashLoopBackOff and/or remains in the Evicted state. While disk space usage stays above the eviction threshold (80%), Kubernetes cannot restart the pod, and reclaiming space requires manual intervention. You commonly have to log into the node and free disk space, or add disk capacity to the persistent volume, before the pod can restart.
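A sketch of that intervention path, assuming a working kubectl context for the cluster; the namespace (mssql-cluster) and PVC name (data-master-0) are examples and will differ per deployment:

```
# Confirm why the pod is stuck (eviction / disk-pressure events).
kubectl get events -n mssql-cluster --sort-by=.lastTimestamp | grep -iE 'evict|disk'
kubectl describe pod master-0 -n mssql-cluster

# If the node itself is out of ephemeral space, free it there (see the
# docker image prune example above). If the persistent volume is full and
# the storage class supports expansion, grow the PVC instead:
kubectl patch pvc data-master-0 -n mssql-cluster \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
```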
Recommendation
Be sure to set file size limits (MAXSIZE) and sensible growth increments on your TEMPDB files to avoid unbounded disk space consumption leading to pod eviction.
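As a sketch of that setting, the default TEMPDB file names (tempdev, templog) can be capped with MAXSIZE and given fixed growth increments; the connection details and size values are placeholders to adjust for your persistent volume size:

```
sqlcmd -S <endpoint>,<port> -U <login> -P '<password>' -Q "
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILEGROWTH = 256MB, MAXSIZE = 8GB);
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILEGROWTH = 128MB, MAXSIZE = 4GB);"
```

Repeat the MODIFY FILE statement for any additional TEMPDB data files reported by the earlier file-size query, so that every file is capped.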