2.10. Workflows#

A workflow is a CWL processing unit that executes command-line tools, expression tools, or workflows (sub-workflows) as steps. It must have inputs, outputs, and steps defined in the CWL document.

digraph G {
  compound=true;
  rankdir="LR";
  fontname="Verdana";
  fontsize="10";
  graph [splines=ortho];
  node [fontname="Verdana", fontsize="10", shape=box];
  edge [fontname="Verdana", fontsize="10"];
  subgraph cluster_0 {
    node [width = 1.75];
    steps_0[style="filled" label="Command-line tools"];
    steps_1[style="filled" label="Expression tools"];
    steps_2[style="filled" label="Sub-workflows"];
    label="steps";
    fill=gray;
  }
  inputs -> steps_1 [lhead=cluster_0];
  steps_1 -> outputs [ltail=cluster_0];
}

CWL workflow.#

The CWL document echo-uppercase.cwl defines a workflow that runs the command-line tool and the expression tool shown in the earlier examples.

echo-uppercase.cwl#
cwlVersion: v1.2
class: Workflow

requirements:
  InlineJavascriptRequirement: {}

inputs:
  message: string

outputs:
  out:
    type: string
    outputSource: uppercase/uppercase_message

steps:
  echo:
    run: echo.cwl
    in:
      message: message
    out: [out]
  uppercase:
    run: uppercase.cwl
    in:
      message:
        source: echo/out
    out: [uppercase_message]

A command-line tool or expression tool can also be written directly in the same CWL document as the workflow. For example, we can rewrite the echo-uppercase.cwl workflow as a single file:

echo-uppercase-single-file.cwl#
cwlVersion: v1.2
class: Workflow

requirements:
  InlineJavascriptRequirement: {}

inputs:
  message: string

outputs:
  out:
    type: string
    outputSource: uppercase/uppercase_message

steps:
  echo:
    run:
      class: CommandLineTool

      baseCommand: echo

      stdout: output.txt

      inputs:
        message:
          type: string
          inputBinding: {}
      outputs:
        out:
          type: string
          outputBinding:
            glob: output.txt
            loadContents: true
            outputEval: $(self[0].contents)
    in:
      message: message
    out: [out]
  uppercase:
    run:
      class: ExpressionTool

      requirements:
        InlineJavascriptRequirement: {}

      inputs:
        message: string
      outputs:
        uppercase_message: string

      expression: |
        ${ return {"uppercase_message": inputs.message.toUpperCase()}; }
    in:
      message:
        source: echo/out
    out: [uppercase_message]

Having separate files helps with modularity and code organization, but writing everything in a single file can be helpful during development. There are also other ways to combine multiple files into a single file (e.g. cwltool --pack), which are discussed in other sections of this user guide.

Note

For sub-workflows, you need to enable the SubworkflowFeatureRequirement requirement. It is covered in more detail in another section of this user guide.

2.10.1. Writing Workflows#

This workflow extracts a Java source file from a tar file and then compiles it.

1st-workflow.cwl#
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow
inputs:
  tarball: File
  name_of_file_to_extract: string

outputs:
  compiled_class:
    type: File
    outputSource: compile/classfile

steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: tarball
      extractfile: name_of_file_to_extract
    out: [extracted_file]

  compile:
    run: arguments.cwl
    in:
      src: untar/extracted_file
    out: [classfile]

Visualization of 1st-workflow.cwl


Use a YAML or a JSON object in a separate file to describe the input of a run:

1st-workflow-job.yml#
tarball:
  class: File
  path: hello.tar
name_of_file_to_extract: Hello.java

Next, create a sample Java file and add it to a tar file to use with the command-line tool.

$ echo "public class Hello {}" > Hello.java && tar -cvf hello.tar Hello.java
Hello.java

Now invoke cwltool with the tool description and the input object on the command line:

$ cwltool 1st-workflow.cwl 1st-workflow-job.yml
INFO /opt/hostedtoolcache/Python/3.9.13/x64/bin/cwltool 3.1.20220913185150
INFO Resolved '1st-workflow.cwl' to 'file:///home/runner/work/user_guide/user_guide/src/_includes/cwl/workflows/1st-workflow.cwl'
INFO [workflow ] start
INFO [workflow ] starting step untar
INFO [step untar] start
INFO [job untar] /tmp/r3oshu1e$ tar \
    --extract \
    --file \
    /tmp/_norg0cf/stg5d0e8759-d441-40dc-be46-17e11829a7a3/hello.tar \
    Hello.java
INFO [job untar] completed success
INFO [step untar] completed success
INFO [workflow ] starting step compile
INFO [step compile] start
INFO ['udocker', 'pull', 'openjdk:9.0.1-11-slim']
Info: downloading layer sha256:8d602e635a7063b254ddcd64997153b2e8f74c29ff4648089ae116a4ca3ea8e3
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Info: downloading layer sha256:45b0cb5bfff7921055b3160e463c0cbbd0da8804c54c0e81870e32789de17696
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Info: downloading layer sha256:31aaf5b382af90e713d7581c352ac81060358c641b90a3708b45268486ae3911
Info: downloading layer sha256:5713db526a481e662cb137cca84372e8433d562ce47cab6f445157cd465a6caf
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Info: downloading layer sha256:a8a43101ae4292a3536f04251309008da5dbec2da6fb32802dca83a617d2688e
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
INFO [job compile] /tmp/xzf2cp_y$ udocker \
    --quiet \
    run \
    --volume=/tmp/xzf2cp_y:/GVTkLE \
    --volume=/tmp/6_t2bjcr:/tmp \
    --volume=/tmp/r3oshu1e/Hello.java:/var/lib/cwl/stg4a58e64b-5fa9-4922-8ca6-95a7878dd744/Hello.java \
    --workdir=/GVTkLE \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/GVTkLE \
    openjdk:9.0.1-11-slim \
    javac \
    -d \
    /GVTkLE \
    /var/lib/cwl/stg4a58e64b-5fa9-4922-8ca6-95a7878dd744/Hello.java
INFO [job compile] Max memory used: 19MiB
INFO [job compile] completed success
INFO [step compile] completed success
INFO [workflow ] completed success
{
    "compiled_class": {
        "location": "file:///home/runner/work/user_guide/user_guide/src/_includes/cwl/workflows/Hello.class",
        "basename": "Hello.class",
        "class": "File",
        "checksum": "sha1$39e3219327347c05aa3e82236f83aa6d77fe6bfd",
        "size": 419,
        "path": "/home/runner/work/user_guide/user_guide/src/_includes/cwl/workflows/Hello.class"
    }
}
INFO Final process status is success

What’s going on here? Let’s break it down:

cwlVersion: v1.0
class: Workflow

The cwlVersion field indicates the version of the CWL spec used by the document. The class field indicates this document describes a workflow.

inputs:
  tarball: File
  name_of_file_to_extract: string

The inputs section describes the inputs of the workflow. This is a list of input parameters where each parameter consists of an identifier and a data type. These parameters can be used as sources for input to specific workflow steps.
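
The short form used above (identifier: type) can also be written in a longer mapping form, which leaves room for extra fields. As a minimal sketch, the same inputs could be declared as follows; the doc text and the default value are illustrative assumptions and are not part of 1st-workflow.cwl:

```yaml
inputs:
  tarball:
    type: File
    # Free-text description; illustrative wording only
    doc: Tar archive that contains the Java source file
  name_of_file_to_extract:
    type: string
    # Assumed default, used when the job file omits this input
    default: Hello.java
```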

outputs:
  compiled_class:
    type: File
    outputSource: compile/classfile

The outputs section describes the outputs of the workflow. This is a list of output parameters where each parameter consists of an identifier and a data type. The outputSource connects the output parameter classfile of the compile step to the workflow output parameter compiled_class.
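
An outputSource may point at the output of any step, not only the last one. As a sketch, the workflow could also expose the extracted source file; extracted_source is a hypothetical name added here for illustration, and the File type assumes that is what tar-param.cwl declares for extracted_file:

```yaml
outputs:
  compiled_class:
    type: File
    outputSource: compile/classfile
  # Hypothetical extra output: also keep the file extracted by the untar step
  extracted_source:
    type: File
    outputSource: untar/extracted_file
```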

steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: tarball
      extractfile: name_of_file_to_extract
    out: [extracted_file]

The steps section describes the actual steps of the workflow. In this example, the first step extracts a file from a tar file, and the second step compiles the file from the first step using the Java compiler. Workflow steps are not necessarily run in the order they are listed; instead, the order is determined by the dependencies between steps (using source). In addition, workflow steps that do not depend on one another may run in parallel.
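
To illustrate that listing order does not matter, here is a sketch of the same steps section with the two steps swapped. Assuming a conforming runner, untar should still run first, because compile names untar/extracted_file as its source:

```yaml
steps:
  # Listed first, but cannot start until untar has produced extracted_file
  compile:
    run: arguments.cwl
    in:
      src: untar/extracted_file
    out: [classfile]

  # Listed second, but depends on no other step, so it is ready immediately
  untar:
    run: tar-param.cwl
    in:
      tarfile: tarball
      extractfile: name_of_file_to_extract
    out: [extracted_file]
```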

The first step, untar, runs tar-param.cwl (described previously in Parameter References). This tool has two input parameters, tarfile and extractfile, and one output parameter, extracted_file.

The in section of the workflow step connects these two input parameters to the inputs of the workflow, tarball and name_of_file_to_extract using source. This means that when the workflow step is executed, the values assigned to tarball and name_of_file_to_extract will be used for the parameters tarfile and extractfile in order to run the tool.
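
The short form tarfile: tarball is shorthand for an explicit source field. As a sketch, the untar step could equivalently be written in the longer form, the same style that the uppercase step of echo-uppercase.cwl uses:

```yaml
steps:
  untar:
    run: tar-param.cwl
    in:
      # Explicit form of "tarfile: tarball"
      tarfile:
        source: tarball
      # Explicit form of "extractfile: name_of_file_to_extract"
      extractfile:
        source: name_of_file_to_extract
    out: [extracted_file]
```

The longer form is mainly useful when a step input needs additional fields next to source, such as the default used later in this section.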

The out section of the workflow step lists the output parameters that are expected from the tool.

  compile:
    run: arguments.cwl
    in:
      src: untar/extracted_file
    out: [classfile]

The second step, compile, depends on the results from the first step: its input parameter src is connected to the output parameter of untar using untar/extracted_file. It runs arguments.cwl (described previously in Additional Arguments and Parameters). The output of this step, classfile, is connected to the outputs section of the workflow, described above.

2.10.2. Nested Workflows#

Workflows are ways to combine multiple tools to perform a larger operation. We can also think of a workflow as being a tool itself; a CWL workflow can be used as a step in another CWL workflow, if the workflow engine supports the SubworkflowFeatureRequirement:

requirements:
  SubworkflowFeatureRequirement: {}

Here’s an example workflow that uses our 1st-workflow.cwl as a nested workflow:

nestedworkflows.cwl#
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

inputs: []

outputs:
  classout:
    type: File
    outputSource: compile/compiled_class

requirements:
  SubworkflowFeatureRequirement: {}

steps:
  compile:
    run: 1st-workflow.cwl
    in:
      tarball: create-tar/tar_compressed_java_file
      name_of_file_to_extract:
        default: "Hello.java"
    out: [compiled_class]

  create-tar:
    in: []
    out: [tar_compressed_java_file]
    run:
      class: CommandLineTool
      requirements:
        InitialWorkDirRequirement:
          listing:
            - entryname: Hello.java
              entry: |
                public class Hello {
                  public static void main(String[] argv) {
                      System.out.println("Hello from Java");
                  }
                }
      inputs: []
      baseCommand: [tar, --create, --file=hello.tar, Hello.java]
      outputs:
        tar_compressed_java_file:
          type: File
          streamable: true
          outputBinding:
            glob: "hello.tar"

Note

Visualization of the workflow and the inner workflow from its `compile` step

This two-step workflow starts with the create-tar step which is connected to the compile step in orange; compile is another workflow, diagrammed on the right. In purple we see the fixed string "Hello.java" being supplied as the name_of_file_to_extract.

Visualization of nestedworkflows.cwl Visualization of 1st-workflow.cwl

A CWL Workflow can be used as a step just like a CommandLineTool; its CWL file is included with run. The workflow inputs (tarball and name_of_file_to_extract) and outputs (compiled_class) can then be mapped to become the step’s inputs and outputs.

  compile:
    run: 1st-workflow.cwl
    in:
      tarball: create-tar/tar_compressed_java_file
      name_of_file_to_extract:
        default: "Hello.java"
    out: [compiled_class]

Our 1st-workflow.cwl was parameterized with workflow inputs, so when running it we had to provide a job file to denote the tar file and *.java filename. This is generally best practice, as it means the workflow can be reused in multiple parent workflows, or even in multiple steps within the same workflow.

Here we use default: to hard-code "Hello.java" as the name_of_file_to_extract input. However, our workflow also requires a tar file at tarball, which we will prepare in the create-tar step. At this point it is probably a good idea to refactor 1st-workflow.cwl to have more specific input/output names, as those also appear in its usage as a tool.

It is also possible to take a less generic approach and avoid external dependencies in the job file. In this workflow we generate a hard-coded Hello.java file using the previously mentioned InitialWorkDirRequirement requirement, before adding it to a tar file.

  create-tar:
    requirements:
      InitialWorkDirRequirement:
        listing:
          - entryname: Hello.java
            entry: |
              public class Hello {
                public static void main(String[] argv) {
                    System.out.println("Hello from Java");
                }
              }

In this case our step can assume Hello.java rather than be parameterized, so we can use hardcoded values hello.tar and Hello.java in a baseCommand and the resulting outputs:

  run:
    class: CommandLineTool
    inputs: []
    baseCommand: [tar, --create, --file=hello.tar, Hello.java]
    outputs:
      tar_compressed_java_file:
        type: File
        streamable: true
        outputBinding:
          glob: "hello.tar"

Did you notice that we didn’t split out the tar --create tool to a separate file, but rather embedded it within the CWL Workflow file? This is generally not best practice, as the tool then can’t be reused. The reason for doing it in this case is that the command line is hard-coded with filenames that only make sense within this workflow.

In this example we had to prepare a tar file outside, but only because our inner workflow was designed to take that as an input. A better refactoring of the inner workflow would be to take a list of Java files to compile, which would simplify its usage as a tool step in other workflows.
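
As a rough sketch of that refactoring, the inner workflow could accept an array of Java files and compile each one with the scatter feature described in the next section. The names java_sources and compiled_classes are hypothetical, and the sketch assumes that arguments.cwl takes a single File as src and returns a single File as classfile, as in the compile step above:

```yaml
cwlVersion: v1.0
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  # Hypothetical input: the Java source files to compile
  java_sources: File[]

outputs:
  # One class file per source file, collected into an array
  compiled_classes:
    type: File[]
    outputSource: compile/classfile

steps:
  compile:
    run: arguments.cwl
    # Run the tool once per element of java_sources
    scatter: src
    in:
      src: java_sources
    out: [classfile]
```

With an interface like this, a parent workflow would no longer need a create-tar step just to satisfy the inner workflow’s inputs.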

Nested workflows can be a powerful feature to generate higher-level functional and reusable workflow units - but just like for creating a CWL Tool description, care must be taken to improve its usability in multiple workflows.

2.10.3. Scattering Steps#

Now that we know how to write workflows, we can start utilizing the ScatterFeatureRequirement. This feature tells the runner that you wish to run a tool or workflow multiple times over a list of inputs. The workflow then takes the input(s) as an array and will run the specified step(s) on each element of the array as if it were a single input. This allows you to run the same workflow on multiple inputs without having to generate many different commands or input YAML files.

requirements:
  ScatterFeatureRequirement: {}

The most common reason a new user might want to use scatter is to perform the same analysis on different samples. Let’s start with a simple workflow that calls our first example (hello_world.cwl) and takes an array of strings as input to the workflow:

scatter-workflow.cwl#
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  message_array: string[]

steps:
  echo:
    run: hello_world.cwl
    scatter: message
    in:
      message: message_array
    out: []

outputs: []

Aside from the requirements section including ScatterFeatureRequirement, what is going on here?

inputs:
  message_array: string[]

First of all, notice that the main workflow level input here requires an array of strings.

steps:
  echo:
    run: hello_world.cwl
    scatter: message
    in:
      message: message_array
    out: []

Here we’ve added a new field to the step echo called scatter. This field tells the runner that we’d like to scatter over this input for this particular step. Note that the name listed after scatter is the name of one of the step’s inputs, not a workflow-level input.

For our first scatter, it’s as simple as that! Since our tool doesn’t collect any outputs, we still use outputs: [] in our workflow, but if you expect that the final output of your workflow will now have multiple outputs to collect, be sure to update that to an array type as well!

Using the following input file:

scatter-job.yml#
message_array: 
  - Hello world!
  - Hola mundo!
  - Bonjour le monde!
  - Hallo welt!

As a reminder, hello_world.cwl simply calls the command echo on a message. If we invoke cwltool scatter-workflow.cwl scatter-job.yml on the command line:

$ cwltool scatter-workflow.cwl scatter-job.yml
INFO /opt/hostedtoolcache/Python/3.9.13/x64/bin/cwltool 3.1.20220913185150
INFO Resolved 'scatter-workflow.cwl' to 'file:///home/runner/work/user_guide/user_guide/src/_includes/cwl/workflows/scatter-workflow.cwl'
INFO [workflow ] start
INFO [workflow ] starting step echo
INFO [step echo] start
INFO [job echo] /tmp/v3p2wjsr$ echo \
    'Hello world!' > /tmp/v3p2wjsr/3f8f180805bc1795e077d75bc6dbec026a376483
INFO [job echo] completed success
INFO [step echo] start
INFO [job echo_2] /tmp/0tkw6zpd$ echo \
    'Hola mundo!' > /tmp/0tkw6zpd/3f8f180805bc1795e077d75bc6dbec026a376483
INFO [job echo_2] completed success
INFO [step echo] start
INFO [job echo_3] /tmp/ieahidj9$ echo \
    'Bonjour le monde!' > /tmp/ieahidj9/3f8f180805bc1795e077d75bc6dbec026a376483
INFO [job echo_3] completed success
INFO [step echo] start
INFO [job echo_4] /tmp/q0g1sqaf$ echo \
    'Hallo welt!' > /tmp/q0g1sqaf/3f8f180805bc1795e077d75bc6dbec026a376483
INFO [job echo_4] completed success
INFO [step echo] completed success
INFO [workflow ] completed success
{}
INFO Final process status is success

You can see that the workflow calls echo once for each element of our message_array. Ok, so how about if we want to scatter over two steps in a workflow?

Let’s perform a simple echo like above, but capture stdout by replacing outputs: [] in the tool with the following lines:

hello_world_to_stdout.cwl#
outputs:
  echo_out:
    type: stdout

And add a second step that uses wc to count the characters in each file. See the tool below:

wc-tool.cwl#
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: wc
arguments: ["-c"]
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
outputs: []

Now, how do we incorporate scatter? Remember the scatter field is under each step:

scatter-two-steps.cwl#
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  message_array: string[]

steps:
  echo:
    run: hello_world_to_stdout.cwl
    scatter: message
    in:
      message: message_array
    out: [echo_out]
  wc:
    run: wc-tool.cwl
    scatter: input_file
    in:
      input_file: echo/echo_out
    out: []

outputs: []

Here we have placed the scatter field under each step. This is fine for this example since it runs quickly, but if you’re running many samples through a more complex workflow, you may wish to consider an alternative. Each step is scattered independently here. Processing one message in the second step does not logically depend on the first step having finished every other message, yet the second step expects an array as input from the first step, so it waits until everything in step one is finished before doing anything. This means we aren’t using the scatter functionality efficiently.

For example, pretend that echo Hello World! takes 1 minute and wc -c on its output takes 3 minutes, while echo Hallo welt! takes 5 minutes and wc -c on its output takes 3 minutes. Even though the Hello World! message could be fully processed in 4 minutes (1 + 3), it will actually take 8 minutes (5 + 3), because the second step must wait for echo Hallo welt! to finish. You can see how this might not scale well.

Ok, so how do we scatter on steps that can proceed independently of other samples? Remember from Nested Workflows that we can make an entire workflow a single step in another workflow! Convert our two-step workflow to a single-step subworkflow:

scatter-nested-workflow.cwl#
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
  ScatterFeatureRequirement: {}
  SubworkflowFeatureRequirement: {}

inputs:
  message_array: string[]

steps:
  subworkflow:
    run:
      class: Workflow
      inputs:
        message: string
      outputs: []
      steps:
        echo:
          run: hello_world_to_stdout.cwl
          in:
            message: message
          out: [echo_out]
        wc:
          run: wc-tool.cwl
          in:
            input_file: echo/echo_out
          out: []
    scatter: message
    in:
      message: message_array
    out: []
outputs: []

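Now the scatter acts on a single step, but that step is itself a workflow made up of two steps. Each message therefore passes through echo and wc independently of the other messages, so the messages can be processed in parallel.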
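
The workflows in this section discard their results with outputs: []. As noted earlier, outputs collected from a scattered step become arrays. The following is a minimal sketch, assuming hello_world_to_stdout.cwl exposes echo_out as described above; echo_files is a hypothetical output name:

```yaml
cwlVersion: v1.0
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  message_array: string[]

steps:
  echo:
    run: hello_world_to_stdout.cwl
    scatter: message
    in:
      message: message_array
    out: [echo_out]

outputs:
  # The tool yields one File per run, so the scattered step yields File[]
  echo_files:
    type: File[]
    outputSource: echo/echo_out
```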

2.10.4. Conditional workflows#

This workflow contains conditional steps, which are executed or skipped based on the input. Conditions allow a workflow to skip steps based on input parameters given at the start of the run or on values produced by previous steps.

conditional-workflow.cwl#
class: Workflow
cwlVersion: v1.2
inputs:
  val: int

steps:

  step1:
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.in1 < 1)
    out: [out1]

  step2:
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.a_new_var > 2)
    out: [out1]

outputs:
  out1:
    type: string
    outputSource:
      - step1/out1
      - step2/out1
    pickValue: first_non_null

requirements:
  InlineJavascriptRequirement: {}
  MultipleInputFeatureRequirement: {}

The first thing you’ll notice is that this workflow is only compatible with version 1.2 or greater of the CWL standards.

class: Workflow
cwlVersion: v1.2

The first step of the workflow (step1) has two input properties and executes foo.cwl only when its condition is met. The new property when is where the condition is evaluated. In this case, the step is executed only when in1, which receives the workflow input val, has a value less than 1.

steps:

  step1:
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.in1 < 1)
    out: [out1]
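
The tool foo.cwl is not listed in this section. As a hedged sketch, a tool along the following lines would be consistent with the log output shown below (a bare echo command and an output such as "foo 0"); its exact contents are an assumption:

```yaml
cwlVersion: v1.2
class: CommandLineTool

inputs:
  # No inputBinding, so the value is not added to the command line
  in1: int

baseCommand: [echo]

outputs:
  out1:
    type: string
    outputBinding:
      # Builds the output string from the input value, e.g. "foo 0"
      outputEval: foo $(inputs.in1)
```

In this sketch a_new_var is not declared as a tool input; the assumption is that it is only consumed by the when expression of the step.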

Assuming the workflow is saved as cond-wf-003.1.cwl, run it with cwltool cond-wf-003.1.cwl --val 0. The value satisfies the condition of the first step, which is therefore executed, as shown in the log by INFO [step step1] start, whereas the second step is skipped, as indicated by INFO [step step2] will be skipped.

INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] start
INFO [job step1] /private/tmp/docker_tmpdcyoto2d$ echo

INFO [job step1] completed success
INFO [step step1] completed success
INFO [workflow ] starting step step2
INFO [step step2] will be skipped
INFO [step step2] completed skipped
INFO [workflow ] completed success
{
    "out1": "foo 0"
}
INFO Final process status is success

When a value of 3 is given with cwltool cond-wf-003.1.cwl --val 3, the first conditional step is not executed but the second step is.

INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] will be skipped
INFO [step step1] completed skipped
INFO [workflow ] starting step step2
INFO [step step2] start
INFO [job step2] /private/tmp/docker_tmpqwr93mxx$ echo

INFO [job step2] completed success
INFO [step step2] completed success
INFO [workflow ] completed success
{
    "out1": "foo 3"
}
INFO Final process status is success

If no conditions are met, for example when using --val 2, both steps are skipped and the workflow fails with permanentFail, because pickValue: first_non_null finds no non-null source for out1.

$ cwltool cond-wf-003.1.cwl --val 2

INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] will be skipped
INFO [step step1] completed skipped
INFO [workflow ] starting step step2
INFO [step step2] will be skipped
INFO [step step2] completed skipped
ERROR [workflow ] Cannot collect workflow output: All sources for 'out1' are null
INFO [workflow ] completed permanentFail
WARNING Final process status is permanentFail
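
If a run in which every step is skipped should not count as a failure, one option in CWL v1.2 is to collect the results with pickValue: all_non_null and declare the output as an array. This is a sketch of the changed outputs section only; under this assumption, --val 2 should yield an empty list for out1 instead of an error:

```yaml
outputs:
  out1:
    # An array, because the number of non-null results can vary (here zero or one)
    type: string[]
    outputSource:
      - step1/out1
      - step2/out1
    # Drop null results from skipped steps; an empty list is allowed
    pickValue: all_non_null
```

The trade-off is that consumers of the workflow then receive a list and must handle the empty case themselves.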