# Health Checks for Gradient Deployments

Paperspace Deployments are containers-as-a-service that allow you to run container images and serve machine learning models using a high-performance, low-latency service with a RESTful API.

Health checks use [Kubernetes probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) under the hood. The configuration differs slightly from the Kubernetes probe configuration to deliver a better experience.

There are three configurable health checks available: `liveness`, `readiness`, and `startup`.

- `liveness` checks detect deployment containers that transition to an unhealthy state and remedy the situation through targeted restarts.
- `readiness` checks tell our load balancers when a container is ready to receive traffic. These checks run for the life of the container. Applications that use `readiness` checks may need to load a model into memory or initiate connections to external services before receiving requests.
- `startup` checks detect whether a container has started successfully. If the container never enters a successful state, it is killed and restarted. Once a `startup` health check detects a successful start of the container, it initiates the `liveness` and `readiness` health checks (if configured).

Any returned status code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.

## Configure Health Checks

Use the following parameters in the deployment spec to configure health checks:

- `healthChecks`: The top-level key under which all health checks are specified.
- `liveness/readiness/startup`: The type of health check specified.
- `path`: The path of the HTTP endpoint that the health check calls.
- `port`: (Optional) The port that the path is served on. Defaults to the same port as the image itself.
- `initialDelaySeconds`: (Optional) The number of seconds after the container has started before health checks are initiated. Defaults to 0 seconds.
- `periodSeconds`: (Optional) How often (in seconds) to perform the health check. Defaults to 10 seconds.
- `timeoutSeconds`: (Optional) The number of seconds after which the health check times out. Defaults to 1 second. Minimum value is 1.
- `failureThreshold`: (Optional) The number of times the health check has to return a failed response before it is assigned a failed status. Defaults to 3 tries.

## Health Check Example

Below is a deployment spec and a Python script that use health checks to monitor a FastAPI application. On startup the application downloads a model, checks that it can make a connection to an S3 bucket, and waits to be marked healthy before serving requests.

```yaml
# Deployment Spec Example using HTTP.
healthChecks: # health checks allow you to define a set of probes to check the health of your app
  readiness:
    path: /readiness
    port: 8000
    periodSeconds: 10
    timeoutSeconds: 5
  liveness:
    path: /liveness
    port: 8000
  startup:
    path: /startup
    port: 8000
    periodSeconds: 10
    failureThreshold: 6
```

```python
# FastAPI Application Example
# Note: this import works with pydantic v1; in pydantic v2,
# BaseSettings lives in the separate pydantic-settings package.
from pydantic import BaseSettings
from fastapi import FastAPI, Response, status


class LoadStatus(BaseSettings):
    model_loaded: bool = False
    # Other statuses


load_status = LoadStatus()
app = FastAPI()


def check_s3_connection() -> bool:
    # Placeholder: replace with a real S3 connection check.
    return True


@app.on_event("startup")
async def model_load():
    # Download model
    load_status.model_loaded = True


# Health check routes match the `path` values in the deployment spec.
@app.get("/liveness", status_code=200)
def liveness_check():
    return "Liveness check succeeded."


@app.get("/readiness", status_code=200)
def readiness_check(response: Response):
    s3_successful = check_s3_connection()
    if not s3_successful or not load_status.model_loaded:
        response.status_code = status.HTTP_503_SERVICE_UNAVAILABLE
        return "Readiness check failed."
    return "Readiness check succeeded."


@app.get("/startup", status_code=200)
def startup_check():
    return "Startup check succeeded."


@app.post("/predict/")
def predict():
    # Make a prediction
    # Upload to S3 bucket
    # Return response
    return {"prediction": None}  # placeholder response
```

When the deployment spec is submitted, the container is pulled from the container registry. When the pull finishes, the deployment starts to build the container. As the container starts to build, the `startup` health check starts probing the application. The app has 60 seconds to start up before the container is marked as unhealthy by the `startup` health check and restarted (`periodSeconds * failureThreshold = 10 * 6 = 60` seconds).

The `readiness` health check ensures that the model has been downloaded and that the container can make a connection to the S3 bucket. When both hold, the endpoint returns a `200` status code, which marks the container as being in a successful and ready state. Once all health checks have passed, the container starts to receive incoming traffic (for example at the `/predict/` endpoint), and the `liveness` and `readiness` health checks continue to probe and monitor the container. If at some point in the future the container can’t make a connection to the S3 bucket, the `readiness` endpoint returns a `503` status code to tell the deployment to stop sending traffic to this container until it can successfully make a connection to the S3 bucket again.

Because the FastAPI app above defines a startup event process, that process (the model download) has to finish before the container is considered to have started successfully. When the model download finishes, assuming it’s within 60 seconds, the `startup` health check succeeds and stops probing, and the `liveness` and `readiness` health checks start to probe the container every 10 seconds (the Kubernetes default) to monitor its health and readiness for the life of the container.
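
The timing and status-code rules described above can be sanity-checked with a small standalone sketch. It is not part of the Gradient tooling: the helper names `is_success` and `startup_window_seconds` are hypothetical, the defaults are copied from the parameter list in "Configure Health Checks", and adding `initialDelaySeconds` to the startup window is an assumption of this sketch (the walkthrough only multiplies `periodSeconds` by `failureThreshold`).

```python
# Defaults as listed in the "Configure Health Checks" parameter descriptions.
DEFAULTS = {
    "initialDelaySeconds": 0,
    "periodSeconds": 10,
    "timeoutSeconds": 1,
    "failureThreshold": 3,
}


def is_success(status_code: int) -> bool:
    """A health check response is healthy when 200 <= code < 400."""
    return 200 <= status_code < 400


def startup_window_seconds(check: dict) -> int:
    """Approximate time an app has to start before it is restarted.

    Follows periodSeconds * failureThreshold from the walkthrough.
    Adding initialDelaySeconds is an assumption of this sketch.
    """
    cfg = {**DEFAULTS, **check}  # explicit values override the defaults
    return cfg["initialDelaySeconds"] + cfg["periodSeconds"] * cfg["failureThreshold"]


# The `startup` check from the example deployment spec.
startup = {"path": "/startup", "port": 8000, "periodSeconds": 10, "failureThreshold": 6}

print(startup_window_seconds(startup))   # 60
print(is_success(200), is_success(503))  # True False
```

If a model download is expected to take longer than the computed window, raising `failureThreshold` or `periodSeconds` on the `startup` check widens it accordingly.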