Workflows automate machine learning tasks, combining GPU instances with an expressive syntax to generate production-ready machine learning pipelines with a few lines of code.
A Gradient Workflow is composed of a series of steps. These steps specify how to orchestrate computational tasks. Each step can communicate with other steps through what are known as inputs and outputs.
There are three types of inputs and outputs: dataset, volume, and string.
The dataset type leverages the Gradient platform's native dataset primitive. Information stored within datasets is not limited to any single type of data. A generic dataset can include anything from pre-trained models to generated images to configuration files. Inherent to datasets is the notion of versions. Workflows can consume existing dataset versions, produce new ones, and tag new versions of existing datasets.
To consume a dataset that already exists within Gradient, use the following YAML:
inputs:
  my-dataset:
    type: dataset
    with:
      ref: my-dataset-id
To generate a new dataset version from a Workflow step, use the following YAML:
my-job:
  uses: container@v1
  with:
    args:
      - bash
      - "-c"
      - cp -R /my-trained-model /outputs/my-dataset
    image: bash:5
  outputs:
    my-dataset:
      type: dataset
      with:
        ref: my-dataset-id
The ref value my-dataset-id can be the actual ID of the dataset, a 15-character string that looks like def123ghi456jkl (optionally with a version ID appended), or a name for the dataset.
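As a rough sketch of how the two snippets above might fit together, the following job reads an existing dataset and writes a new version of another. The job name copy-model, the input name source-data, and the dataset name my-output-dataset are hypothetical. The sketch also assumes that a dataset input is mounted under /inputs/<name>, mirroring the /outputs/<name> path used for the output.

```yaml
defaults:
  resources:
    instance-type: P4000

jobs:
  copy-model:                        # hypothetical job name
    uses: container@v1
    with:
      args:
        - bash
        - "-c"
        # Assumes the input dataset is mounted at /inputs/source-data
        - cp -R /inputs/source-data/. /outputs/my-output-dataset
      image: bash:5
    inputs:
      source-data:                   # hypothetical input name
        type: dataset
        with:
          ref: def123ghi456jkl       # reference by 15-character dataset ID
    outputs:
      my-output-dataset:
        type: dataset
        with:
          ref: my-output-dataset     # reference by dataset name
```

Here the input is referenced by ID and the output by name, which shows two of the ref forms described above.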
Unlike in GitHub Actions, for example, the steps of a Gradient Workflow are likely to execute on different machines. To facilitate the passing of data between these nodes, Gradient Actions expose the notion of volumes and volume passing.
Volumes enable actions such as the git-checkout action. Volumes can be defined as input volumes, output volumes, or both. When a volume is an output, it is mounted in /outputs and is writeable. When a volume is an input, it is mounted in /inputs and is read-only.
To define an output volume, use the following YAML:
outputs:
  my-volume:
    type: volume
This YAML creates a volume as an output and then uses it as an input in a subsequent job step:
defaults:
  resources:
    instance-type: P4000

jobs:
  job1:
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - echo hello > /outputs/my-volume/testfile1; echo "wrote testfile1 to volume"
      image: bash
    outputs:
      my-volume:
        type: volume
  job2:
    needs:
      - job1
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - cat /inputs/my-volume/testfile1
      image: bash
    inputs:
      my-volume: job1.outputs.my-volume
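The git-checkout action mentioned earlier presumably follows the same pattern: it writes a repository into an output volume that a later job mounts as an input. The sketch below is illustrative only. The action reference git-checkout@v1, its url parameter, the job names, and the repository URL are assumptions that this page does not confirm, so check them against the action's own documentation before use.

```yaml
defaults:
  resources:
    instance-type: P4000

jobs:
  clone-repo:                        # hypothetical job name
    # Assumed action name and version; not confirmed by this page
    uses: git-checkout@v1
    with:
      # Assumed parameter name; placeholder repository URL
      url: https://github.com/example-org/example-repo.git
    outputs:
      repo:
        type: volume
  list-files:                        # hypothetical job name
    needs:
      - clone-repo
    uses: container@v1
    with:
      args:
        - bash
        - -c
        # The volume is read-only here because it is mounted as an input
        - ls -la /inputs/repo
      image: bash
    inputs:
      repo: clone-repo.outputs.repo
```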
In some cases, you may need to pass a single value between Workflow steps. The string type makes this possible.
To pass the string as a workflow-level input, use the following YAML:
inputs:
  my-string:
    type: string
    with:
      value: "my string value"

jobs:
  job-1:
    resources:
      instance-type: P4000
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - cat /inputs/my-string
      image: bash:5
    inputs:
      my-string: workflow.inputs.my-string
To pass a string between job steps, use the following YAML:
defaults:
  resources:
    instance-type: P4000

jobs:
  job-1:
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - echo "string output from job-1" > /outputs/my-string; echo job-1 finished
      image: bash:5
    outputs:
      my-string:
        type: string
  job-2:
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - cat /inputs/my-string
      image: bash:5
    needs:
      - job-1
    inputs:
      my-string: job-1.outputs.my-string
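Because a string input arrives as a file under /inputs, a consuming job can read it into a shell variable with command substitution instead of only printing it. The following is a minimal sketch of an alternative job-2, assuming the same job-1 as in the example above. The variable name MESSAGE is illustrative.

```yaml
jobs:
  # job-1 is assumed to be defined exactly as in the example above
  job-2:
    uses: container@v1
    with:
      args:
        - bash
        - -c
        # Folded block scalar: the two lines are joined into one shell command
        - >-
          MESSAGE="$(cat /inputs/my-string)";
          echo "job-2 received ${MESSAGE}"
      image: bash:5
    needs:
      - job-1
    inputs:
      my-string: job-1.outputs.my-string
```

The same substitution pattern can feed the value into a command-line flag.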
To create a model from a dataset and pass the model ID as a string to a Deployment step, you first need to:
1. Create a dataset named test-model and upload valid TensorFlow model files to it.
2. Create a secret named MY_API_KEY with your gradient-cli api-key.
3. Set the clusterId in the deployment create step to your own cluster ID.
Then, use the following YAML:
defaults:
  resources:
    instance-type: P4000

jobs:
  UploadModel:
    uses: create-model@v1
    with:
      name: my-model
      type: Tensorflow
    inputs:
      model:
        type: dataset
        with:
          ref: test-model
    outputs:
      model-id:
        type: string
  DeployModel:
    needs:
      - UploadModel
    inputs:
      model-id: UploadModel.outputs.model-id
    env:
      PAPERSPACE_API_KEY: secret:MY_API_KEY
    uses: container@v1
    with:
      command: bash
      args:
        - -c
        - >-
          gradient deployments create
          --clusterId cl1234567
          --deploymentType TFServing
          --modelId $(cat /inputs/model-id)
          --name "Sample Deployment"
          --machineType P4000
          --imageUrl tensorflow/serving:latest-gpu
          --instanceCount 1
      image: paperspace/gradient-sdk