
Health Checks for Gradient Deployments

Paperspace Deployments are containers-as-a-service that allow you to run container images and serve machine learning models using a high-performance, low-latency service with a RESTful API.


Health checks are built on Kubernetes probes, with a few configuration changes chosen to deliver a better experience.

There are three configurable health checks available: liveness, readiness, and startup.

  • liveness checks detect deployment containers that transition to an unhealthy state and remedy the situation through targeted restarts.

  • readiness checks tell our load balancers when a container is ready to receive traffic. These checks run for the life of the container. They are useful for applications that need to load a model into memory or open connections to external services before receiving requests.

  • startup checks detect if a container has started successfully. If the container never enters a successful state, the container is killed and restarted. Once a startup health check detects a successful start of the container, it initiates the liveness & readiness health checks (if configured).

A returned status code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.
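As a minimal sketch of the rule above, the success test can be written as a single range comparison. The function name `is_successful_status` is hypothetical and used only for illustration; it is not part of any Gradient tooling.

```python
def is_successful_status(status_code: int) -> bool:
    """Return True when an HTTP status code counts as a passing health check."""
    # Success range described above: 200 <= code < 400.
    return 200 <= status_code < 400


for code in (200, 204, 302, 404, 503):
    print(code, "success" if is_successful_status(code) else "failure")
```

Note that redirects (3xx) fall inside the success range, so an endpoint that redirects still counts as passing under this rule.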

Configure Health Checks

Use the following parameters in the deployment spec to configure health checks:

  • healthChecks: The overall label used to specify any health checks.

  • liveness/readiness/startup: The type of health check specified.

    • path: The path of the HTTP endpoint that the health check calls.

    • port: (Optional) The port that the endpoint is served on. The default is to use the same port as the image itself.

    • initialDelaySeconds: (Optional) The number of seconds after the container has started before health checks are initiated. Defaults to 0 seconds.

    • periodSeconds: (Optional) How often (in seconds) to perform the health check. Defaults to 10 seconds.

    • timeoutSeconds: (Optional) The number of seconds after which the health check times out. Defaults to 1 second. Minimum value is 1.

    • failureThreshold: (Optional) The number of times the health check has to return a failed response for the health check to be assigned a failed status. Defaults to 3 tries.
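The parameters and defaults listed above can be modeled locally to sanity-check a spec before submitting it. This is only an illustrative sketch: `HealthCheck` and `parse_health_checks` are hypothetical names, and the code does not reflect how the platform itself parses a deployment spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthCheck:
    """One liveness, readiness, or startup check with the documented defaults."""
    path: str
    port: Optional[int] = None  # None here stands for "same port as the image"
    initialDelaySeconds: int = 0
    periodSeconds: int = 10
    timeoutSeconds: int = 1
    failureThreshold: int = 3

    def __post_init__(self) -> None:
        # The documented minimum for timeoutSeconds is 1.
        if self.timeoutSeconds < 1:
            raise ValueError("timeoutSeconds must be at least 1")


def parse_health_checks(spec: dict) -> dict:
    """Build one HealthCheck per configured type from a healthChecks mapping."""
    allowed = {"liveness", "readiness", "startup"}
    unknown = set(spec) - allowed
    if unknown:
        raise ValueError(f"unknown health check types: {sorted(unknown)}")
    return {name: HealthCheck(**fields) for name, fields in spec.items()}


checks = parse_health_checks({
    "readiness": {"path": "/readiness", "port": 8000, "timeoutSeconds": 5},
    "liveness": {"path": "/liveness", "port": 8000},
})
print(checks["liveness"])
```

Only `path` is required for each check; every omitted field falls back to the default stated in the list above.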

Health Check Example

Below is a deployment spec and a Python script that use health checks to monitor a FastAPI application. On startup the application downloads a model, checks that it can make a connection to an S3 bucket, and waits to be marked healthy before serving requests.

# Deployment Spec Example using HTTP.
healthChecks: # health checks allow you to define a set of probes to check the health of your app
  readiness:
    path: /readiness
    port: 8000
    periodSeconds: 10
    timeoutSeconds: 5
  liveness:
    path: /liveness
    port: 8000
  startup:
    path: /startup
    port: 8000
    periodSeconds: 10
    failureThreshold: 6

# FastAPI Application Example

from dataclasses import dataclass

from fastapi import FastAPI, Response, status

@dataclass
class LoadStatus:
    # Tracks what has finished loading; add other statuses as needed.
    model_loaded: bool = False

load_status = LoadStatus()
app = FastAPI()

def s3_connection_ok() -> bool:
    # Placeholder: replace with a real S3 connection check.
    return True

@app.on_event("startup")
async def model_load():
    # Download the model here, then record that it has loaded.
    load_status.model_loaded = True

@app.get("/liveness", status_code=200)
def liveness_check():
    return "Liveness check succeeded."

@app.get("/readiness", status_code=200)
def readiness_check(response: Response):
    s3_successful = s3_connection_ok()

    # Report 503 until the model is loaded and S3 is reachable.
    if not s3_successful or not load_status.model_loaded:
        response.status_code = status.HTTP_503_SERVICE_UNAVAILABLE
        return "Readiness check failed."

    return "Readiness check succeeded."

@app.get("/startup", status_code=200)
def startup_check():
    return "Startup check succeeded."

@app.post("/predict/")
def predict():
    # Make a prediction
    # Upload to S3 bucket
    # Return response
    return {"detail": "Prediction placeholder."}

When the deployment spec is submitted, the container image is pulled from the container registry. Once the pull finishes, the deployment starts the container, and the startup health check begins probing the application. The app has 60 seconds to start up before the startup health check marks the container as unhealthy and restarts it (periodSeconds * failureThreshold = 10 * 6 = 60 seconds).
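The startup budget above is just the product of two spec values, so it is easy to compute for other settings. A small sketch, using the hypothetical helper name `startup_window_seconds` and the documented defaults as fallback values:

```python
def startup_window_seconds(period_seconds: int = 10, failure_threshold: int = 3) -> int:
    """Seconds the app has to start before the startup check gives up."""
    # periodSeconds * failureThreshold, as in the worked example above.
    return period_seconds * failure_threshold


print(startup_window_seconds(10, 6))  # the spec in this example: 60
print(startup_window_seconds())       # documented defaults: 30
```

If a model download regularly takes longer than this window, raising failureThreshold or periodSeconds on the startup check gives the container more time before it is restarted.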

The readiness health check confirms that the model has been downloaded and that the container can connect to the S3 bucket. When both conditions hold, the readiness endpoint returns a 200 status code, which marks the container as ready.

Once all health checks have passed, the container starts to receive incoming traffic (for example, at the /predict/ endpoint), and the liveness and readiness health checks continue to probe and monitor the container. If the container later can’t connect to the S3 bucket, the readiness endpoint returns a 503 status code, which tells the deployment to stop sending traffic to this container until it can connect to the S3 bucket again.

Because the FastAPI app above defines a startup event process, that process (model download) has to finish before the container is considered to have a successful startup. When the model download finishes, assuming it’s within 60 seconds, the startup health check succeeds, stops probing, and the liveness and readiness health checks start to probe the container every 10 seconds (Kubernetes default) to monitor the health and readiness for the life of the container.
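To make the ongoing probing concrete, the sketch below replays a sequence of probe responses and reports when a check would be marked as failed. It assumes failures must be consecutive to reach failureThreshold, which is how Kubernetes probes count them; this page does not state that detail for Gradient, so treat it as an assumption. The function name `first_failed_probe` is hypothetical.

```python
from typing import Iterable, Optional


def first_failed_probe(status_codes: Iterable[int], failure_threshold: int = 3) -> Optional[int]:
    """Return the 1-based probe number that trips the threshold, or None."""
    consecutive_failures = 0
    for attempt, code in enumerate(status_codes, start=1):
        if 200 <= code < 400:
            # Any success resets the count (assumed consecutive-failure rule).
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return attempt
    return None


# A brief S3 outage that recovers, followed by a longer one.
responses = [200, 503, 503, 200, 503, 503, 503]
print(first_failed_probe(responses))  # 7
```

With a 10-second period, the seventh probe in this sequence lands roughly a minute after the first, so short outages that recover within two probes never mark the check as failed.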