2.16. Best practices
Below is a set of recommended practices to keep in mind when writing a Common Workflow Language description for a tool or workflow. They are offered for consideration rather than as rules: the more you follow, the better, but not all are required.
Do not use type: string parameters for names of input or reference files/directories; use type: File or type: Directory as appropriate.
Include a license that allows for re-use by anyone, e.g. Apache 2.0. If possible, specify the license with its corresponding SPDX identifier. Construct the metadata field for the license by providing a URL of the form https://spdx.org/licenses/[SPDX-ID], where SPDX-ID is taken from the SPDX list of license identifiers. See the example snippet below for guidance. For non-standard licenses without an SPDX identifier, provide a URL to the license.
Example of metadata field for license with SPDX identifier:
    $namespaces:
      s: https://schema.org/
    s:license: https://spdx.org/licenses/Apache-2.0
    # other s: declarations
For more examples of providing metadata within CWL descriptions, see the Metadata and Authorship section of this User Guide.
Include attribution information for the author(s) of the CWL tool or workflow description. Use unambiguous identifiers like ORCID.
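As a minimal sketch of the attribution recommendation above, author metadata could be expressed with schema.org terms as follows. The name and ORCID shown are placeholders for illustration, not a real person or identifier.

```yaml
# Metadata fields added to a CWL tool or workflow description.
# The name and ORCID below are placeholders (assumptions for illustration).
s:author:
  - class: s:Person
    s:name: Jane Doe                                      # placeholder name
    s:identifier: https://orcid.org/0000-0000-0000-0000   # placeholder ORCID

$namespaces:
  s: https://schema.org/
```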
In tool descriptions, list dependencies using short name(s) under SoftwareRequirement.
Include SciCrunch identifiers for dependencies in https://identifiers.org/rrid/RRID:SCR_NNNNNN format.
All input and output identifiers should reflect their conceptual identity. Use informative names like unaligned_sequences, reference_genome, phylogeny, or aligned_sequences instead of foo_input, foo_file, result, input, output, and so forth.
In tool descriptions, include a list of version(s) of the tool that are known to work with this description under SoftwareRequirement.
The format field should be specified for all input and output File objects. Bioinformatics tools should use format identifiers from EDAM. See also iana:text/plain and iana:text/tab-separated-values, with $namespaces: { iana: "https://www.iana.org/assignments/media-types/" }, and the full IANA media type list (also known as MIME types). For non-bioinformatics tools, use or build an appropriate ontology/controlled vocabulary in the same way. Please edit this page to let us know about it.
Mark all input and output File objects that are read from or written to in a streaming-compatible way (only once, no random access) as streamable: true.
Each CommandLineTool description should focus on a single operation only, even if the (sub)command is capable of more. Don’t overcomplicate your tool descriptions with options that you don’t need or use.
Custom types should be defined with one external YAML file per type definition for re-use.
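Several of the recommendations above (File-typed inputs, informative identifiers, SoftwareRequirement with an RRID and known-good versions, format, and streamable) can be combined in one tool description. The sketch below is illustrative only: the executable example-aligner, its --reference flag, the version number, and the choice of EDAM formats are assumptions, and the RRID is left as the SCR_NNNNNN placeholder.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: example-aligner          # hypothetical executable

hints:
  SoftwareRequirement:
    packages:
      example-aligner:                # short name of the dependency
        specs: [ "https://identifiers.org/rrid/RRID:SCR_NNNNNN" ]  # placeholder RRID
        version: [ "1.2.3" ]          # illustrative known-good version

inputs:
  unaligned_sequences:                # informative name, not "input"
    type: File                        # File, not a string holding a path
    format: edam:format_1929          # FASTA
    streamable: true                  # assumed to be read once, front to back
    inputBinding:
      position: 1
  reference_genome:
    type: File
    format: edam:format_1929          # FASTA
    inputBinding:
      prefix: --reference             # hypothetical flag

stdout: aligned_sequences.sam

outputs:
  aligned_sequences:
    type: File
    format: edam:format_2573          # SAM
    streamable: true                  # written once, sequentially, via stdout
    outputBinding:
      glob: aligned_sequences.sam

$namespaces:
  edam: http://edamontology.org/
```

Only the options this description actually needs are exposed, in keeping with the single-operation guideline.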
Include a short top-level label summarising the tool/workflow.
If useful, include a top-level doc as well. This should provide a longer, more detailed description than the top-level label (see above).
Use type: enum instead of type: string for elements with a fixed list of valid values.
Evaluate all use of JavaScript for possible elimination or replacement. One common example is manipulating File names and paths: consider whether one of the built-in File properties like basename, nameroot, nameext, etc., could be used instead.
Give the tool description to a colleague (preferably at a different institution) to test and provide feedback.
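The label, doc, enum, and File property recommendations above might look like the following sketch. It assumes GNU gzip with its --stdout, --fast, and --best options; the identifiers and wording are illustrative choices, not a canonical description.

```yaml
cwlVersion: v1.2
class: CommandLineTool
label: Compress a single file with gzip
doc: |
  Compresses one input file and writes the result to a new file named
  after the input with a ".gz" suffix. Only the compression preset is
  exposed; other gzip options are deliberately left out.
baseCommand: gzip
arguments: [ "--stdout" ]

inputs:
  compression_preset:
    type:
      type: enum                      # fixed list of valid values
      symbols: [ fast, best ]
    inputBinding:
      position: 1
      prefix: "--"                    # renders as --fast or --best
      separate: false
  uncompressed_file:
    type: File
    inputBinding:
      position: 2

# A plain parameter reference to the built-in basename property;
# no InlineJavascriptRequirement is needed for this.
stdout: $(inputs.uncompressed_file.basename).gz

outputs:
  compressed_file:
    type: stdout
```

Using an enum lets a CWL runner reject invalid presets before the command is ever executed.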
Complex workflows with individual components that can be abstracted should use the SubworkflowFeatureRequirement to make the workflow modular and allow sections of it to be easily reused.
Software containers should conform to the “Recommendations for the packaging and containerizing of bioinformatics software” (also useful to other disciplines).
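To illustrate the SubworkflowFeatureRequirement recommendation above, here is a minimal sketch of a workflow that runs another workflow as one of its steps. The file names align_workflow.cwl and build_tree.cwl, and the identifiers of their inputs and outputs, are hypothetical.

```yaml
cwlVersion: v1.2
class: Workflow
label: Align sequences, then infer a phylogeny

requirements:
  SubworkflowFeatureRequirement: {}   # allows a step to run another Workflow

inputs:
  unaligned_sequences: File
  reference_genome: File

outputs:
  phylogeny:
    type: File
    outputSource: build_tree/phylogeny

steps:
  align:
    run: align_workflow.cwl           # hypothetical nested workflow
    in:
      unaligned_sequences: unaligned_sequences
      reference_genome: reference_genome
    out: [ aligned_sequences ]
  build_tree:
    run: build_tree.cwl               # hypothetical tool description
    in:
      aligned_sequences: align/aligned_sequences
    out: [ phylogeny ]
```

Because the alignment stage lives in its own file, it can be reused by other workflows without copying its steps.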