2.10. Workflows
A workflow is a CWL processing unit that executes command-line tools, expression tools, or workflows (sub-workflows) as steps. It must have inputs, outputs, and steps defined in the CWL document.
The CWL document echo-uppercase.cwl defines a workflow that runs the command-line tool and the expression tool shown in the earlier examples.
cwlVersion: v1.2
class: Workflow
requirements:
  InlineJavascriptRequirement: {}
inputs:
  message: string
outputs:
  out:
    type: string
    outputSource: uppercase/uppercase_message
steps:
  echo:
    run: echo.cwl
    in:
      message: message
    out: [out]
  uppercase:
    run: uppercase.cwl
    in:
      message:
        source: echo/out
    out: [uppercase_message]
A command-line tool or expression tool can also be written directly in the same CWL document as the workflow. For example, we can rewrite the echo-uppercase.cwl workflow as a single file:
cwlVersion: v1.2
class: Workflow
requirements:
  InlineJavascriptRequirement: {}
inputs:
  message: string
outputs:
  out:
    type: string
    outputSource: uppercase/uppercase_message
steps:
  echo:
    run:
      class: CommandLineTool
      baseCommand: echo
      stdout: output.txt
      inputs:
        message:
          type: string
          inputBinding: {}
      outputs:
        out:
          type: string
          outputBinding:
            glob: output.txt
            loadContents: true
            outputEval: $(self[0].contents)
    in:
      message: message
    out: [out]
  uppercase:
    run:
      class: ExpressionTool
      requirements:
        InlineJavascriptRequirement: {}
      inputs:
        message: string
      outputs:
        uppercase_message: string
      expression: |
        ${ return {"uppercase_message": inputs.message.toUpperCase()}; }
    in:
      message:
        source: echo/out
    out: [uppercase_message]
Having separate files helps with modularity and code organization, but writing everything in a single file can be helpful during development. There are other ways to combine multiple files into a single file (e.g. cwltool --pack), discussed further in other sections of this user guide.
Note
For sub-workflows you need to enable the requirement SubworkflowFeatureRequirement. It is covered in more detail in another section of this user guide.
2.10.1. Writing Workflows
This workflow extracts a Java source file from a tar file and then compiles it.
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
inputs:
  tarball: File
  name_of_file_to_extract: string
outputs:
  compiled_class:
    type: File
    outputSource: compile/classfile
steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: tarball
      extractfile: name_of_file_to_extract
    out: [extracted_file]
  compile:
    run: arguments.cwl
    in:
      src: untar/extracted_file
    out: [classfile]
Use a YAML or a JSON object in a separate file to describe the input of a run:
tarball:
  class: File
  path: hello.tar
name_of_file_to_extract: Hello.java
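Since the input object may be written as JSON instead of YAML, the sketch below builds the same input object as a Python dictionary and serializes it to JSON. The helper name render_job is hypothetical and exists only for this illustration; it is not part of cwltool.

```python
import json

# The same input object as the YAML job file above, as a Python dict.
job = {
    "tarball": {"class": "File", "path": "hello.tar"},
    "name_of_file_to_extract": "Hello.java",
}


def render_job(job_object):
    """Serialize an input object as indented JSON text (hypothetical helper)."""
    return json.dumps(job_object, indent=2)


if __name__ == "__main__":
    # Print the JSON form; redirect it to a file to use it as a job file.
    print(render_job(job))
```

The printed JSON carries the same keys and values as the YAML version, so either form describes the same run.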
Next, create a sample Java file and add it to a tar file to use with the command-line tool.
$ echo "public class Hello {}" > Hello.java && tar -cvf hello.tar Hello.java
Hello.java
Now invoke cwltool with the workflow description and the input object on the command line:
$ cwltool 1st-workflow.cwl 1st-workflow-job.yml
INFO /opt/hostedtoolcache/Python/3.9.13/x64/bin/cwltool 3.1.20220913185150
INFO Resolved '1st-workflow.cwl' to 'file:///home/runner/work/user_guide/user_guide/src/_includes/cwl/workflows/1st-workflow.cwl'
INFO [workflow ] start
INFO [workflow ] starting step untar
INFO [step untar] start
INFO [job untar] /tmp/r3oshu1e$ tar \
    --extract \
    --file \
    /tmp/_norg0cf/stg5d0e8759-d441-40dc-be46-17e11829a7a3/hello.tar \
    Hello.java
INFO [job untar] completed success
INFO [step untar] completed success
INFO [workflow ] starting step compile
INFO [step compile] start
INFO ['udocker', 'pull', 'openjdk:9.0.1-11-slim']
Info: downloading layer sha256:8d602e635a7063b254ddcd64997153b2e8f74c29ff4648089ae116a4ca3ea8e3
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Info: downloading layer sha256:45b0cb5bfff7921055b3160e463c0cbbd0da8804c54c0e81870e32789de17696
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Info: downloading layer sha256:31aaf5b382af90e713d7581c352ac81060358c641b90a3708b45268486ae3911
Info: downloading layer sha256:5713db526a481e662cb137cca84372e8433d562ce47cab6f445157cd465a6caf
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Info: downloading layer sha256:a8a43101ae4292a3536f04251309008da5dbec2da6fb32802dca83a617d2688e
Info: downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
INFO [job compile] /tmp/xzf2cp_y$ udocker \
    --quiet \
    run \
    --volume=/tmp/xzf2cp_y:/GVTkLE \
    --volume=/tmp/6_t2bjcr:/tmp \
    --volume=/tmp/r3oshu1e/Hello.java:/var/lib/cwl/stg4a58e64b-5fa9-4922-8ca6-95a7878dd744/Hello.java \
    --workdir=/GVTkLE \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/GVTkLE \
    openjdk:9.0.1-11-slim \
    javac \
    -d \
    /GVTkLE \
    /var/lib/cwl/stg4a58e64b-5fa9-4922-8ca6-95a7878dd744/Hello.java
INFO [job compile] Max memory used: 19MiB
INFO [job compile] completed success
INFO [step compile] completed success
INFO [workflow ] completed success
{
    "compiled_class": {
        "location": "file:///home/runner/work/user_guide/user_guide/src/_includes/cwl/workflows/Hello.class",
        "basename": "Hello.class",
        "class": "File",
        "checksum": "sha1$39e3219327347c05aa3e82236f83aa6d77fe6bfd",
        "size": 419,
        "path": "/home/runner/work/user_guide/user_guide/src/_includes/cwl/workflows/Hello.class"
    }
}
INFO Final process status is success
What’s going on here? Let’s break it down:
cwlVersion: v1.0
class: Workflow
The cwlVersion field indicates the version of the CWL spec used by the document. The class field indicates this document describes a workflow.
inputs:
  tarball: File
  name_of_file_to_extract: string
The inputs section describes the inputs of the workflow. This is a list of input parameters where each parameter consists of an identifier and a data type. These parameters can be used as sources for input to specific workflow steps.
outputs:
  compiled_class:
    type: File
    outputSource: compile/classfile
The outputs section describes the outputs of the workflow. This is a list of output parameters where each parameter consists of an identifier and a data type. The outputSource connects the output parameter classfile of the compile step to the workflow output parameter compiled_class.
steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: tarball
      extractfile: name_of_file_to_extract
    out: [extracted_file]
The steps section describes the actual steps of the workflow. In this example, the first step extracts a file from a tar file, and the second step compiles the file from the first step using the Java compiler. Workflow steps are not necessarily run in the order they are listed; instead, the order is determined by the dependencies between steps (using source). In addition, workflow steps which do not depend on one another may run in parallel.
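To make the ordering rule concrete, here is a toy model in Python. It is only a sketch of the idea, not how cwltool schedules jobs: each step is mapped to the set of steps whose outputs it consumes, and the standard-library graphlib module (Python 3.9 or later) derives a valid execution order. The function name execution_order is hypothetical.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it uses as a source.
# "compile" is deliberately listed first to show that listing order is irrelevant.
dependencies = {
    "compile": {"untar"},  # src: untar/extracted_file
    "untar": set(),        # reads only workflow-level inputs
}


def execution_order(deps):
    """Return one valid order in which the steps could be executed."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(dependencies))
```

Because compile names untar/extracted_file as a source, untar always comes first in the computed order, however the steps are listed in the document.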
The first step, untar, runs tar-param.cwl (described previously in Parameter References). This tool has two input parameters, tarfile and extractfile, and one output parameter, extracted_file.
The in section of the workflow step connects these two input parameters to the inputs of the workflow, tarball and name_of_file_to_extract, using source. This means that when the workflow step is executed, the values assigned to tarball and name_of_file_to_extract will be used for the parameters tarfile and extractfile in order to run the tool.
The out section of the workflow step lists the output parameters that are expected from the tool.
  compile:
    run: arguments.cwl
    in:
      src: untar/extracted_file
    out: [classfile]
The second step, compile, depends on the results from the first step by connecting the input parameter src to the output parameter of untar using untar/extracted_file. It runs arguments.cwl (described previously in Additional Arguments and Parameters). The output of this step, classfile, is connected to the outputs section for the Workflow, described above.
2.10.2. Nested Workflows
Workflows are ways to combine multiple tools to perform larger operations. We can also think of a workflow as being a tool itself; a CWL workflow can be used as a step in another CWL workflow, if the workflow engine supports the SubworkflowFeatureRequirement:
requirements:
  SubworkflowFeatureRequirement: {}
Here’s an example workflow that uses our 1st-workflow.cwl as a nested workflow:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
inputs: []
outputs:
  classout:
    type: File
    outputSource: compile/compiled_class
requirements:
  SubworkflowFeatureRequirement: {}
steps:
  compile:
    run: 1st-workflow.cwl
    in:
      tarball: create-tar/tar_compressed_java_file
      name_of_file_to_extract:
        default: "Hello.java"
    out: [compiled_class]
  create-tar:
    in: []
    out: [tar_compressed_java_file]
    run:
      class: CommandLineTool
      requirements:
        InitialWorkDirRequirement:
          listing:
            - entryname: Hello.java
              entry: |
                public class Hello {
                  public static void main(String[] argv) {
                      System.out.println("Hello from Java");
                  }
                }
      inputs: []
      baseCommand: [tar, --create, --file=hello.tar, Hello.java]
      outputs:
        tar_compressed_java_file:
          type: File
          streamable: true
          outputBinding:
            glob: "hello.tar"
Note
Visualization of the workflow and the inner workflow from its `compile` step
This two-step workflow starts with the create-tar step which is connected to the compile step in orange; compile is another workflow, diagrammed on the right. In purple we see the fixed string "Hello.java" being supplied as the name_of_file_to_extract.
A CWL Workflow can be used as a step just like a CommandLineTool; its CWL file is included with run. The workflow inputs (tarball and name_of_file_to_extract) and outputs (compiled_class) can then be mapped to become the step’s inputs and outputs.
  compile:
    run: 1st-workflow.cwl
    in:
      tarball: create-tar/tar_compressed_java_file
      name_of_file_to_extract:
        default: "Hello.java"
    out: [compiled_class]
Our 1st-workflow.cwl was parameterized with workflow inputs, so when running it we had to provide a job file to denote the tar file and *.java filename. This is generally best practice, as it means it can be reused in multiple parent workflows, or even in multiple steps within the same workflow.
Here we use default: to hard-code "Hello.java" as the name_of_file_to_extract input. However, our workflow also requires a tar file at tarball, which we will prepare in the create-tar step. At this point it is probably a good idea to refactor 1st-workflow.cwl to have more specific input/output names, as those also appear in its usage as a tool.
It is also possible to take a less generic approach and avoid external dependencies in the job file. So in this workflow we can generate a hard-coded Hello.java file using the previously mentioned InitialWorkDirRequirement, before adding it to a tar file.
  create-tar:
    requirements:
      InitialWorkDirRequirement:
        listing:
          - entryname: Hello.java
            entry: |
              public class Hello {
                public static void main(String[] argv) {
                    System.out.println("Hello from Java");
                }
              }
In this case our step can assume Hello.java rather than be parameterized, so we can use hardcoded values hello.tar and Hello.java in a baseCommand and the resulting outputs:
  run:
    class: CommandLineTool
    inputs: []
    baseCommand: [tar, --create, --file=hello.tar, Hello.java]
    outputs:
      tar_compressed_java_file:
        type: File
        streamable: true
        outputBinding:
          glob: "hello.tar"
Did you notice that we didn’t split out the tar --create tool to a separate file, but rather embedded it within the CWL Workflow file? This is generally not best practice, as the tool then can’t be reused. The reason for doing it in this case is that the command line is hard-coded with filenames that only make sense within this workflow.
In this example we had to prepare a tar file outside, but only because our inner workflow was designed to take that as an input. A better refactoring of the inner workflow would be to take a list of Java files to compile, which would simplify its usage as a tool step in other workflows.
Nested workflows can be a powerful feature to generate higher-level functional and reusable workflow units, but just as when creating a CWL Tool description, care must be taken to improve their usability in multiple workflows.
2.10.3. Scattering Steps
Now that we know how to write workflows, we can start utilizing the ScatterFeatureRequirement. This feature tells the runner that you wish to run a tool or workflow multiple times over a list of inputs. The workflow then takes the input(s) as an array and will run the specified step(s) on each element of the array as if it were a single input. This allows you to run the same workflow on multiple inputs without having to generate many different commands or input YAML files.
requirements:
  ScatterFeatureRequirement: {}
The most common reason a new user might want to use scatter is to perform the same analysis on different samples. Let’s start with a simple workflow that calls our first example (hello_world.cwl) and takes an array of strings as input to the workflow:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
inputs:
  message_array: string[]
steps:
  echo:
    run: hello_world.cwl
    scatter: message
    in:
      message: message_array
    out: []
outputs: []
Aside from the requirements section including ScatterFeatureRequirement, what is going on here?
inputs:
  message_array: string[]
First of all, notice that the main workflow level input here requires an array of strings.
steps:
  echo:
    run: hello_world.cwl
    scatter: message
    in:
      message: message_array
    out: []
Here we’ve added a new field to the step echo called scatter. This field tells the runner that we’d like to scatter over this input for this particular step. Note that the name listed after scatter is that of one of the step’s inputs, not a workflow-level input.
For our first scatter, it’s as simple as that! Since our tool doesn’t collect any outputs, we still use outputs: [] in our workflow, but if you expect that your workflow will now have multiple outputs to collect, be sure to update the output to an array type as well!
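The reason outputs turn into arrays can be pictured with a small Python toy model. This is a sketch of the semantics only, not cwltool code: the stand-in tool echo_tool, its output name echoed, and the helper scatter are all hypothetical names invented for this illustration.

```python
def echo_tool(message):
    """Stand-in for a tool with one input (message) and one output (echoed)."""
    return {"echoed": message}


def scatter(tool, scatter_input, values):
    """Run the tool once per element and gather every output into a list."""
    gathered = {}
    for value in values:
        result = tool(**{scatter_input: value})
        for name, output in result.items():
            gathered.setdefault(name, []).append(output)
    return gathered


message_array = ["Hello world!", "Hola mundo!", "Bonjour le monde!", "Hallo welt!"]
outputs = scatter(echo_tool, "message", message_array)

if __name__ == "__main__":
    print(outputs)
```

Each output of the scattered step holds one element per input element, in the same order, which is why a workflow output fed by a scattered step needs an array type.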
Using the following input file:
message_array:
- Hello world!
- Hola mundo!
- Bonjour le monde!
- Hallo welt!
As a reminder, hello_world.cwl simply calls the command echo on a message. If we invoke cwltool scatter-workflow.cwl scatter-job.yml on the command line:
$ cwltool scatter-workflow.cwl scatter-job.yml
INFO /opt/hostedtoolcache/Python/3.9.13/x64/bin/cwltool 3.1.20220913185150
INFO Resolved 'scatter-workflow.cwl' to 'file:///home/runner/work/user_guide/user_guide/src/_includes/cwl/workflows/scatter-workflow.cwl'
INFO [workflow ] start
INFO [workflow ] starting step echo
INFO [step echo] start
INFO [job echo] /tmp/v3p2wjsr$ echo \
    'Hello world!' > /tmp/v3p2wjsr/3f8f180805bc1795e077d75bc6dbec026a376483
INFO [job echo] completed success
INFO [step echo] start
INFO [job echo_2] /tmp/0tkw6zpd$ echo \
    'Hola mundo!' > /tmp/0tkw6zpd/3f8f180805bc1795e077d75bc6dbec026a376483
INFO [job echo_2] completed success
INFO [step echo] start
INFO [job echo_3] /tmp/ieahidj9$ echo \
    'Bonjour le monde!' > /tmp/ieahidj9/3f8f180805bc1795e077d75bc6dbec026a376483
INFO [job echo_3] completed success
INFO [step echo] start
INFO [job echo_4] /tmp/q0g1sqaf$ echo \
    'Hallo welt!' > /tmp/q0g1sqaf/3f8f180805bc1795e077d75bc6dbec026a376483
INFO [job echo_4] completed success
INFO [step echo] completed success
INFO [workflow ] completed success
{}
INFO Final process status is success
You can see that the workflow calls echo once for each element of our message_array. Ok, so how about if we want to scatter over two steps in a workflow?
Let’s perform a simple echo like above, but capture stdout by adding the following lines instead of outputs: []
outputs:
  echo_out:
    type: stdout
And add a second step that uses wc to count the characters in each file. See the tool below:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: wc
arguments: ["-c"]
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
outputs: []
Now, how do we incorporate scatter? Remember the scatter field is under each step:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
inputs:
  message_array: string[]
steps:
  echo:
    run: hello_world_to_stdout.cwl
    scatter: message
    in:
      message: message_array
    out: [echo_out]
  wc:
    run: wc-tool.cwl
    scatter: input_file
    in:
      input_file: echo/echo_out
    out: []
outputs: []
Here we have placed the scatter field under each step. This is fine for this example since it runs quickly, but if you’re running many samples for a more complex workflow, you may wish to consider an alternative. Here we are running scatter on each step independently. Processing one message in the second step does not require the first step to have completed every other message, yet the second step expects an array as input from the first step, so it will wait until everything in step one is finished before doing anything. We therefore aren’t using the scatter functionality efficiently.
Pretend that echo Hello World! takes 1 minute to perform and wc -c on its output takes 3 minutes, and that echo Hallo welt! takes 5 minutes to perform and wc on that output takes 3 minutes. Even though the Hello World! sample could finish in 4 minutes, it will actually finish in 8 minutes because the second step must wait for echo Hallo welt! to complete. You can see how this might not scale well.
Ok, so how do we scatter on steps that can proceed independently of other samples? Remember from Nested Workflows that we can make an entire workflow a single step in another workflow! Convert our two-step workflow to a single-step subworkflow:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
  SubworkflowFeatureRequirement: {}
inputs:
  message_array: string[]
steps:
  subworkflow:
    run:
      class: Workflow
      inputs:
        message: string
      outputs: []
      steps:
        echo:
          run: hello_world_to_stdout.cwl
          in:
            message: message
          out: [echo_out]
        wc:
          run: wc-tool.cwl
          in:
            input_file: echo/echo_out
          out: []
    scatter: message
    in:
      message: message_array
    out: []
outputs: []
Now the scatter acts on a single step, but that step consists of two steps, so each sample proceeds through both steps independently of, and in parallel with, the other samples.
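The arithmetic behind the earlier timing example can be checked with a short Python sketch. It is a toy model that assumes every job can run in parallel as soon as its inputs are ready; the durations are the pretend values from the text, and both function names are hypothetical.

```python
# Durations in minutes, taken from the pretend timings in the text.
durations = {
    "Hello World!": {"echo": 1, "wc": 3},
    "Hallo welt!": {"echo": 5, "wc": 3},
}


def per_step_scatter(times):
    """Scatter on each step: wc starts only after every echo job has finished."""
    barrier = max(t["echo"] for t in times.values())
    return {sample: barrier + t["wc"] for sample, t in times.items()}


def subworkflow_scatter(times):
    """Scatter on a subworkflow: each sample runs echo then wc on its own."""
    return {sample: t["echo"] + t["wc"] for sample, t in times.items()}


if __name__ == "__main__":
    print(per_step_scatter(durations))
    print(subworkflow_scatter(durations))
```

Under these assumptions, scattering each step separately makes both samples finish at minute 8, while scattering the subworkflow lets the Hello World! sample finish at minute 4 without waiting for Hallo welt!.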
2.10.4. Conditional workflows
This workflow contains conditional steps, each of which is executed or skipped based on its input. This allows workflows to skip steps based on input parameters given at the start of the run or produced by previous steps.
class: Workflow
cwlVersion: v1.2
inputs:
  val: int
steps:
  step1:
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.in1 < 1)
    out: [out1]
  step2:
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.a_new_var > 2)
    out: [out1]
outputs:
  out1:
    type: string
    outputSource:
      - step1/out1
      - step2/out1
    pickValue: first_non_null
requirements:
  InlineJavascriptRequirement: {}
  MultipleInputFeatureRequirement: {}
The first thing you’ll notice is that this workflow is only compatible with version 1.2 or later of the CWL standards.
class: Workflow
cwlVersion: v1.2
The first step of the workflow (step1) has two inputs and will execute foo.cwl only when its condition is met. The new field when is where the condition is evaluated. In this case, the step is executed only when in1, which is fed from the workflow input val, is less than 1.
steps:
  step1:
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.in1 < 1)
    out: [out1]
Running cwltool cond-wf-003.1.cwl --val 0 satisfies the condition of the first step, so that step is executed, as shown in the log by INFO [step step1] start, whereas the second step is skipped, as indicated by INFO [step step2] will be skipped.
INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] start
INFO [job step1] /private/tmp/docker_tmpdcyoto2d$ echo
INFO [job step1] completed success
INFO [step step1] completed success
INFO [workflow ] starting step step2
INFO [step step2] will be skipped
INFO [step step2] completed skipped
INFO [workflow ] completed success
{
    "out1": "foo 0"
}
INFO Final process status is success
When a value of 3 is given (cwltool cond-wf-003.1.cwl --val 3), the first conditional step is skipped but the second step is executed.
INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] will be skipped
INFO [step step1] completed skipped
INFO [workflow ] starting step step2
INFO [step step2] start
INFO [job step2] /private/tmp/docker_tmpqwr93mxx$ echo
INFO [job step2] completed success
INFO [step step2] completed success
INFO [workflow ] completed success
{
    "out1": "foo 3"
}
INFO Final process status is success
If no conditions are met, for example when using --val 2, the workflow will raise a permanentFail.
$ cwltool cond-wf-003.1.cwl --val 2
INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] will be skipped
INFO [step step1] completed skipped
INFO [workflow ] starting step step2
INFO [step step2] will be skipped
INFO [step step2] completed skipped
ERROR [workflow ] Cannot collect workflow output: All sources for 'out1' are null
INFO [workflow ] completed permanentFail
WARNING Final process status is permanentFail
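The three runs above can be summarized with a toy Python model of the workflow. It is a sketch of the semantics rather than cwltool code: a skipped step contributes a null (None) output, and first_non_null picks the first source that is not null or fails when every source is null. The body of foo is an assumption based on the "foo 0" and "foo 3" values in the logs, since foo.cwl itself is not shown here, and all function names are hypothetical.

```python
def foo(in1):
    """Stand-in for foo.cwl; the "foo <value>" form is assumed from the logs."""
    return f"foo {in1}"


def run_step(condition, in1):
    """A step whose when condition is false is skipped and yields None."""
    return foo(in1) if condition else None


def first_non_null(values):
    """Return the first value that is not None, failing if all are None."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("All sources are null")


def workflow(val):
    step1 = run_step(val < 1, val)  # when: $(inputs.in1 < 1)
    step2 = run_step(val > 2, val)  # when: $(inputs.a_new_var > 2)
    return {"out1": first_non_null([step1, step2])}


if __name__ == "__main__":
    print(workflow(0))
    print(workflow(3))
```

With val set to 1 or 2 neither condition holds, so the model raises an error, mirroring the permanentFail reported when all sources for out1 are null.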