1.3. Basic Concepts

This section describes the basic concepts that users need to get started with Common Workflow Language (CWL) workflows. Readers are expected to be familiar with workflow managers and YAML, and to be comfortable following command-line instructions. The other sections of the user guide cover the same concepts in more detail. If you are already familiar with CWL or are looking for more advanced content, you may want to skip this section.

1.3.1. The CWL specification

CWL is a way to describe command-line tools and connect them together to create workflows. Because CWL is a specification and not a specific piece of software, tools and workflows described using CWL are portable across a variety of platforms that support the CWL standard.

The CWL specification is a document written and maintained by the CWL community. The specification is released in versions; this user guide covers v1.2.

A specification version can have up to three numbers separated by dots. The first number is the major release, used for backward-incompatible changes such as the removal of deprecated features. The second is the minor release, used for new features or smaller changes that are backward-compatible. The last number is used for bug fixes, such as typos and other corrections to the specification.
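A CWL document states which version of the specification it targets in its cwlVersion field. The fragment below is a minimal sketch of a v1.2 document; the echo tool and the empty input and output lists are placeholders chosen only for illustration.

```yaml
# Declares that this document follows version 1.2 of the CWL specification.
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo   # placeholder tool, chosen for illustration
inputs: []
outputs: []
```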

Note

The model used for the specification version is called Semantic Versioning. See the end of this section to learn more about it.

1.3.2. Implementations

An implementation of the CWL specification is any software written to follow a version of the specification document. An implementation might not support every aspect of the specification. CWL implementations are available under both open source and commercial licenses.

CWL is well suited for describing large-scale workflows in cluster, cloud, and high-performance computing environments, where tasks are scheduled in parallel across many nodes.

digraph G { compound=true; rankdir="LR"; ranksep=0.75; fontname="Verdana"; fontsize="10"; graph [splines=ortho]; node [fontname="Verdana", fontsize="10", shape=box]; edge [fontname="Verdana", fontsize="10"]; subgraph cluster_0 { label="Implementations"; ranksep=0.25; cwltool; toil; Arvados; runner_others[label="..."]; label="CWL Runners"; } subgraph cluster_1 { label="Tools"; ranksep=0.25; subgraph cluster_2 { "vscode-cwl"; "vim-cwl"; benten; editor_others[label="..."]; label="Editors"; } subgraph cluster_3 { "CWL Viewer"; "vue-cwl"; viewer_others[label="..."]; label="Viewers"; } "And more"; } cwltool -> "CWL Specification" [ltail=cluster_0, dir=back]; "CWL Specification" -> "vscode-cwl" [lhead=cluster_1]; "vscode-cwl" -> "CWL Viewer" [style=invis]; "CWL Viewer" -> "And more" [style=invis]; }

CWL specification, implementations, and other tools.

1.3.3. Processes and Requirements

A process is a computing unit that takes inputs and produces outputs. The behavior of a process can be affected by the inputs, requirements, and hints. There are four types of processes defined in the CWL specification v1.2:

  • A command-line tool;

  • An expression tool;

  • An operation;

  • And a workflow.

digraph "A GraphViz graph with the CWL processing units, e.g. Process, Workflow, CommandLineTool, etc." { rankdir="TB"; graph [splines=false]; node [fontname="Verdana", fontsize="10", shape=box]; edge [fontname="Verdana", fontsize="10"]; Process; CommandLineTool; ExpressionTool; Operation; Workflow; node[label="", width=0, height=0]; edge[arrowhead=none]; n1; {rank=same; CommandLineTool; ExpressionTool; Operation; Workflow;} Process -> n1 [arrowhead=normal, dir=back]; n1 -> CommandLineTool; n1 -> ExpressionTool; n1 -> Operation; n1 -> Workflow; }

The processing units available in the CWL objects model.

A command-line tool is a wrapper for a command-line utility like echo, ls, and tar. A command-line tool can be called from a workflow.
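As a minimal sketch, a command-line tool that wraps echo could look like the following. The input name message is an arbitrary choice for this example, and the tool declares no outputs because echo only prints to the terminal.

```yaml
cwlVersion: v1.2
class: CommandLineTool
# The command-line utility being wrapped.
baseCommand: echo
inputs:
  # "message" is an illustrative input name, not a reserved word.
  message:
    type: string
    inputBinding:
      position: 1   # placed as the first argument after "echo"
outputs: []
```

Saved to a file, a description like this can be run directly by a CWL runner or referenced as a step from a workflow.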

An expression tool is a wrapper for a JavaScript expression. It can simplify workflows and command-line tools by moving common parts of a workflow execution into reusable JavaScript code that takes inputs and produces outputs, just like a command-line tool.
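The following sketch shows what an expression tool might look like. It adds two integers; the names first, second, and total are assumptions made for this example.

```yaml
cwlVersion: v1.2
class: ExpressionTool
requirements:
  # Needed because the expression below is a JavaScript function body.
  InlineJavascriptRequirement: {}
inputs:
  first: int    # illustrative input names
  second: int
outputs:
  total: int
# The expression returns an object whose keys match the declared outputs.
expression: |
  ${
    return {"total": inputs.first + inputs.second};
  }
```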

The workflow is a process that contains steps. Steps can be other workflows (nested workflows), command-line tools, or expression tools. The inputs of a workflow can be passed to any of its steps, and the outputs produced by its steps can be used in the final output of the workflow.
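A minimal sketch of a workflow with a single step is shown below. It assumes a hypothetical file named echo.cwl, located next to the workflow, that contains a command-line tool with one string input called message and no outputs.

```yaml
cwlVersion: v1.2
class: Workflow
inputs:
  message: string        # workflow-level input
outputs: []
steps:
  say_hello:             # illustrative step name
    run: echo.cwl        # hypothetical file containing the tool to run
    in:
      message: message   # pass the workflow input to the step input
    out: []
```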

An operation is an abstract process: it declares inputs and outputs and can be used as a step in a workflow, but it does not provide enough information to be executed. It is less commonly used than the other process types and is discussed in another section.
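As a rough sketch, an operation only describes a signature. The input and output names below are hypothetical and stand in for a step whose implementation has not been chosen yet.

```yaml
cwlVersion: v1.2
class: Operation
# Only the signature is declared; there is no command or expression to run.
inputs:
  raw_data: File       # hypothetical input name
outputs:
  cleaned_data: File   # hypothetical output name
```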

The CWL specification allows for implementations to provide extra functionality and specify prerequisites to workflows through requirements. There are many requirements defined in the CWL specification, for instance:

  • InlineJavascriptRequirement, enables JavaScript in expressions.

  • SubworkflowFeatureRequirement, enables nested workflows.

  • InitialWorkDirRequirement, controls which files are staged in the tool's working directory before it runs.
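Requirements are declared in the requirements section of a process. The sketch below uses InitialWorkDirRequirement to create a small file before the tool runs and then prints it with cat; the file names and contents are made up for this example.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: cat
requirements:
  InitialWorkDirRequirement:
    listing:
      # Create a file with literal contents before the command starts.
      - entryname: greeting.txt   # illustrative file name
        entry: "Hello from CWL"
arguments: [greeting.txt]
inputs: []
# Capture what cat prints into a file and expose it as an output.
stdout: output.txt
outputs:
  printed:
    type: stdout
```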

Some CWL runners may provide requirements that are not in the specification. For example, GPU requirements are supported in cwltool through the cwltool:CUDARequirement requirement, but it is not part of the v1.2 specification and may not be supported by other CWL runners.

Hints are similar to requirements, but while requirements list features that are required, hints list optional features. Requirements are explained in detail in another section.
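Hints use the same names and fields as requirements but are declared in a hints section. In the sketch below, the resource values are arbitrary examples; a runner that cannot honor them may still run the tool.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo
hints:
  # Optional: a suggestion to the runner rather than a hard prerequisite.
  ResourceRequirement:
    coresMin: 2     # example value
    ramMin: 1024    # example value, in mebibytes
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []
```

Moving the same ResourceRequirement block under requirements would turn it into a prerequisite for running the tool.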

1.3.4. FAIR workflows

The FAIR principles have laid a foundation for sharing and publishing digital assets and, in particular, data. The FAIR principles emphasize machine accessibility and that all digital assets should be Findable, Accessible, Interoperable, and Reusable. Workflows encode the methods by which the scientific process is conducted and via which data are created. It is thus important that workflows both support the creation of FAIR data and themselves adhere to the FAIR principles. — FAIR Computational Workflows, Workflows Community Initiative.

CWL has roots in “make” and many similar tools that determine order of execution based on dependencies between tasks. However, unlike “make”, CWL tasks are isolated, and you must be explicit about your inputs and outputs.

The benefits of explicitness and isolation are flexibility, portability, and scalability: tools and workflows described with CWL can transparently leverage technologies such as Docker and be used with CWL implementations from different vendors.
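To illustrate what explicit inputs and outputs look like in practice, here is a minimal sketch of a tool that counts the lines in a file. The container image and the input and output names are assumptions chosen for this example.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [wc, -l]
hints:
  # Ask the runner to execute the tool inside a container when it can.
  DockerRequirement:
    dockerPull: debian:stable-slim   # example image, any image with wc works
inputs:
  # The only file the tool is given is the one declared here.
  text_file:
    type: File
    inputBinding:
      position: 1
# The only result kept is the declared output.
stdout: line_count.txt
outputs:
  line_count:
    type: stdout
```

Because the description names every input and output, a runner can stage the file, run the tool in isolation, and collect the result without knowing anything else about the environment.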

cwltool also uses the PROV-O standard ontology for data provenance.

1.3.5. Learn more