Health Checks for Gradient Deployments
Validated on 14 Dec 2023 • Last edited on 18 Dec 2024
Paperspace Deployments are containers-as-a-service that allow you to run container images and serve machine learning models using a high-performance, low-latency service with a RESTful API.
Health checks leverage Kubernetes probes under the hood. Some slight changes in the configuration have been chosen to deliver a better experience.
There are three configurable health checks available: liveness, readiness, and startup.
-
livenesschecks detect deployment containers that transition to an unhealthy state.livenesschecks remedy said situations through targeted restarts. -
readinesschecks tell our load balancers when a container is ready to receive traffic. These checks run for the life of the container. Applications that leveragereadinesschecks may need to load a model into memory or initiate connections to external services before receiving requests. -
startupchecks detect if a container has started successfully. If the container never enters a successful state, the container is killed and restarted. Once astartuphealth check detects a successful start of the container, it initiates theliveness&readinesshealth checks (if configured).
Any status codes returned greater than or equal to 200 and less than 400 indicate success. Any other code indicates failure.
Configure Health Checks
Use the following parameters in the deployment spec to configure health checks:
-
healthChecks: The overall label used to specify any health checks. -
liveness/readiness/startup: The type of health check specified.-
path: The path of the http endpoint that the health check calls. -
port: (Optional) The port that the path is running on. The default is to use the same port as the image itself. -
initialDelaySeconds: (Optional) The number of seconds after the container has started before health checks are initiated. Defaults to 0 seconds. -
periodSeconds: (Optional) How often (in seconds) to perform the health check. Defaults to 10 seconds. -
timeoutSeconds: (Optional) The number of seconds after which the health check times out. Defaults to 1 second. Minimum value is 1. -
failureThreshold: (Optional) The number of times the health check has to return a failed response for the health check to be assigned a failed status. Defaults to 3 tries.
-
Health Check Example
Below is a deployment spec and a Python script that use health checks to monitor a FastAPI application. On startup the application downloads a model, checks that it can make a connection to an S3 bucket, and waits to be marked healthy before serving requests.
# Deployment Spec Example using HTTP.
healthChecks: # health checks allow you to define a set of probes to check the health of your app
readiness:
path: /readiness
port: 8000
periodSeconds: 10
timeoutSeconds: 5
liveness:
path: /liveness
port: 8000
startup:
path: /startup
port: 8000
periodSeconds: 10
failureThreshold: 6# FastAPI Application Example
from pydantic import BaseSettings
from fastapi import FastAPI, Response, status
class LoadStatus(BaseSettings):
model_loaded: bool = False
# Other statuses
load_status = LoadStatus()
app = FastAPI()
@app.on_event("startup")
async def model_load():
# Download model
load_status.model_loaded = True
@app.get("/liveness/", status_code=200)
def liveness_check():
return "Liveness check succeeded."
@app.get("/readiness/", status_code=200)
def readiness_check(response: Response):
s3_successful = # S3 connection check
if not s3_successful or not load_status.model_loaded:
response.status_code = status.HTTP_503_SERVICE_UNAVAILABLE
return "Readiness check failed."
return "Readiness check succeeded."
@app.get("/startup/", status_code=200)
def startup_check():
return "Startup check succeeded."
@app.post("/predict/")
def predict():
# Make a prediction
# Upload to S3 bucket
# Return responseWhen the deployment spec is submitted, the container is pulled from the container registry. When finished, the deployment starts to build the container. As the container starts to build, the startup health check starts probing the application. The app has 60 seconds to startup before the container is marked as unhealthy by the startup health check and restarted (periodSeconds*failureThreshold = 10*6 = 60 seconds).
The readiness health check ensures the model has been downloaded and the container can make a connection to the S3 bucket and then return a 200 status code which marks the container to be in a successful and ready state.
Once all health checks have passed, the container starts to receive incoming traffic (for example into the /predict/ endpoint) and the liveness and readiness health checks continue to probe and monitor the container. In the case of the readiness probe, if at some point in the future the container can’t make a connection to the S3 bucket, it return a 503 status code to tell the deployment to no longer send traffic to this container until it can successfully makes a connection with the S3 bucket again.
Because the FastAPI app above defines a startup event process, that process (model download) has to finish before the container is considered to have a successful startup. When the model download finishes, assuming it’s within 60 seconds, the startup health check succeeds, stops probing, and the liveness and readiness health checks start to probe the container every 10 seconds (Kubernetes default) to monitor the health and readiness for the life of the container.