# Understanding Inputs & Outputs

Workflows automate machine learning tasks, combining GPU instances with an expressive syntax to generate production-ready machine learning pipelines in a few lines of code. A Gradient Workflow is composed of a series of steps that specify how to orchestrate computational tasks. Each step can communicate with other steps through what are known as `inputs` and `outputs`.

There are three types of inputs and outputs:

- Datasets
- Volumes
- Strings

## Datasets

The dataset type leverages the Gradient platform's native [dataset](https://docs.digitalocean.com/products/paperspace/machines/index.html.md) primitive. Information stored within datasets is not limited to any single type of data: a generic dataset can include anything from pre-trained models to generated images to configuration files. Inherent to datasets is the notion of versions. Workflows can consume and produce new dataset versions as well as tag new versions of existing datasets.

**Note**: Datasets must be defined before they are referenced in a Workflow. See [Create Datasets for the Workflow](https://docs.digitalocean.com/products/paperspace/workflows/how-to/create-datasets/index.html.md) for more information.

To consume a dataset that already exists within Gradient, use the following YAML:

```yaml
inputs:
  my-dataset:
    type: dataset
    with:
      ref: my-dataset-id
```

To generate a new dataset version from a Workflow step, use the following YAML:

```yaml
my-job:
  uses: container@v1
  with:
    args:
      - bash
      - "-c"
      - cp -R /my-trained-model /outputs/my-dataset
    image: bash:5
  outputs:
    my-dataset:
      type: dataset
      with:
        ref: my-dataset-id
```

`my-dataset-id` can be the actual ID of the dataset (a 15-character string such as `def123ghi456jkl`, optionally appended with a version ID) or the dataset's name.

## Volumes

Unlike GitHub Actions, for example, Gradient steps are likely to execute across multiple machines. To facilitate passing data between these nodes, Gradient Actions expose the notion of volumes and volume passing. Volumes enable actions such as the git-checkout action (a sketch follows at the end of this section).

Volumes can be defined as input volumes, output volumes, or both. When a volume is an `output`, it is mounted at `/outputs` and is writable. When a volume is an `input`, it is mounted at `/inputs` and is read-only.

**Note**: Volumes are currently limited to 5 GB of data. If you need more space, we recommend using Datasets.

To define an output volume, use the following YAML:

```yaml
outputs:
  my-volume:
    type: volume
```

The following YAML creates a volume as an output and then uses it as an input in a subsequent job step:

```yaml
defaults:
  resources:
    instance-type: P4000

jobs:
  job1:
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - echo hello > /outputs/my-volume/testfile1; echo "wrote testfile1 to volume"
      image: bash
    outputs:
      my-volume:
        type: volume
  job2:
    needs:
      - job1
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - cat /inputs/my-volume/testfile1
      image: bash
    inputs:
      my-volume: job1.outputs.my-volume
```

**Warning**: You currently cannot use a volume as an output after the job that created it ends.
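As a concrete illustration of volume passing, the sketch below clones a repository into an output volume with the git-checkout action and then reads it from a later step. The repository URL is a placeholder; adjust it, the volume name, and the instance type for your project.

```yaml
defaults:
  resources:
    instance-type: P4000

jobs:
  # Clone a repository into an output volume named "repo".
  CloneRepo:
    uses: git-checkout@v1
    with:
      url: https://github.com/your-org/your-repo  # placeholder: replace with your repository URL
    outputs:
      repo:
        type: volume
  # Read the cloned files through the read-only input volume.
  ListRepo:
    needs:
      - CloneRepo
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - ls /inputs/repo
      image: bash:5
    inputs:
      repo: CloneRepo.outputs.repo
```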
## Strings

In some cases, you may need to pass a single value between Workflow steps. The string type makes this possible.

To pass a string as a workflow-level input, use the following YAML:

```yaml
inputs:
  my-string:
    type: string
    with:
      value: "my string value"

jobs:
  job-1:
    resources:
      instance-type: P4000
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - cat /inputs/my-string
      image: bash:5
    inputs:
      my-string: workflow.inputs.my-string
```

To pass a string between job steps, use the following YAML:

```yaml
defaults:
  resources:
    instance-type: P4000

jobs:
  job-1:
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - echo "string output from job-1" > /outputs/my-string; echo job-1 finished
      image: bash:5
    outputs:
      my-string:
        type: string
  job-2:
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - cat /inputs/my-string
      image: bash:5
    needs:
      - job-1
    inputs:
      my-string: job-1.outputs.my-string
```

To create a model from a dataset and pass the model ID as a string to a [Deployment](https://docs.digitalocean.com/products/paperspace/deployments/index.html.md) step, you first need to:

1. Create a dataset named `test-model` and upload valid TensorFlow model files to it.
2. Define a secret named `MY_API_KEY` containing your Gradient CLI API key.
3. Substitute your `clusterId` in the `deployments create` step.

Then, use the following YAML:

```yaml
defaults:
  resources:
    instance-type: P4000

jobs:
  UploadModel:
    uses: create-model@v1
    with:
      name: my-model
      type: Tensorflow
    inputs:
      model:
        type: dataset
        with:
          ref: test-model
    outputs:
      model-id:
        type: string
  DeployModel:
    needs:
      - UploadModel
    inputs:
      model-id: UploadModel.outputs.model-id
    env:
      PAPERSPACE_API_KEY: secret:MY_API_KEY
    uses: container@v1
    with:
      command: bash
      args:
        - -c
        - >-
          gradient deployments create
          --clusterId cl1234567
          --deploymentType TFServing
          --modelId $(cat inputs/model-id)
          --name "Sample Deployment"
          --machineType P4000
          --imageUrl tensorflow/serving:latest-gpu
          --instanceCount 1
      image: paperspace/gradient-sdk
```
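To tie the three types together, the following sketch combines a dataset input, a volume output, and a string output in a single two-step Workflow. Job and output names are placeholders, and `my-dataset-id` is assumed to reference a dataset that already exists.

```yaml
defaults:
  resources:
    instance-type: P4000

jobs:
  # Reads from an existing dataset, copies it into an output volume,
  # and writes a single value to a string output.
  prepare:
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - cp -R /inputs/my-dataset /outputs/scratch/data; echo "prepared" > /outputs/status
      image: bash:5
    inputs:
      my-dataset:
        type: dataset
        with:
          ref: my-dataset-id
    outputs:
      scratch:
        type: volume
      status:
        type: string
  # Consumes the volume and the string produced by the previous step.
  report:
    needs:
      - prepare
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - ls /inputs/scratch/data; cat /inputs/status
      image: bash:5
    inputs:
      scratch: prepare.outputs.scratch
      status: prepare.outputs.status
```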