# Best practices Below are a set of recommended good practices to keep in mind when writing a Common Workflow Language description for a tool or workflow. These guidelines are presented for consideration on a scale of usefulness: more is better, not all are required. - No `type: string` parameters for names of input or reference files/directories; use `type: File` or `type: Directory` as appropriate. - Include a license that allows for re-use by anyone, e.g. [Apache 2.0][apache-license]. If possible, the license should be specified with its corresponding [SPDX identifier][spdx]. Construct the metadata field for the licence by providing a URL of the form `https://spdx.org/licenses/[SPDX-ID]` where `SPDX-ID` is the taken from the list of identifiers linked above. See the example snippet below for guidance. For non-standard licenses without an SPDX identifier, provide a URL to the license. _Example of metadata field for license with SPDX identifier:_ ```cwl $namespaces: s: https://schema.org/ s:license: https://spdx.org/licenses/Apache-2.0 # other s: declarations ``` For more examples of providing metadata within CWL descriptions, see the Metadata and Authorship section of this User Guide. - Include [attribution information][license-example] for the author(s) of the CWL tool or workflow description. Use unambiguous identifiers like [ORCID][orcid]. - In tool descriptions, list dependencies using short name(s) under `SoftwareRequirement`. - Include [SciCrunch][scicrunch] identifiers for dependencies in `https://identifiers.org/rrid/RRID:SCR_NNNNNN` format. - All `input` and `output` identifiers should reflect their conceptual identity. Use informative names like `unaligned_sequences`, `reference_genome`, `phylogeny`, or `aligned_sequences` instead of `foo_input`, `foo_file`, `result`, `input`, `output`, and so forth. - In tool descriptions, include a list of version(s) of the tool that are known to work with this description under `SoftwareRequirement`. - `format` should be specified for all input and output `File`s. Bioinformatics tools should use format identifiers from [EDAM][edam-example]. See also `iana:text/plain`, `iana:text/tab-separated-values` with `$namespaces: { iana: "https://www.iana.org/assignments/media-types/" }`. [Full IANA media type list][iana-types] (also known as MIME types). For non-bioinformatics tools use or build an appropriate ontology/controlled vocabulary in the same way. Please edit this page to let us know about it. - Mark all input and output `File`s that are read from or written to in a streaming compatible way (only once, no random-access), as `streamable: true`. - Each `CommandLineTool` description should focus on a single operation only, even if the (sub)command is capable of more. Don't overcomplicate your tool descriptions with options that you don't need/use. - Custom types should be defined with one external YAML per type definition for re-use. - Include a top level short `label` summarising the tool/workflow. - If useful, include a top level `doc` as well. This should provide a longer, more detailed description than was provided in the top level `label` (see above). - Use `type: enum` instead of `type: string` for elements with a fixed list of valid values. - Evaluate all use of JavaScript for possible elimination or replacement. One common example: manipulating `File` names and paths? Consider whether one of the [built in `File` properties][file-prop] like `basename`, `nameroot`, `nameext`, etc., could be used instead. - Give the tool description to a colleague (preferably at a different institution) to test and provide feedback. - Complex workflows with individual components which can be abstracted should utilise the [`SubworkflowFeatureRequirement`][subworkflow] to make their workflow modular and allow sections of them to be easily reused. - Software containers should be made to be conformant to the ["Recommendations for the packaging and containerizing of bioinformatics software"][containers] (also useful to other disciplines). [containers]: https://doi.org/10.12688/f1000research.15140.1 [apache-license]: https://spdx.org/licenses/Apache-2.0.html [license-example]: https://github.com/ProteinsWebTeam/ebi-metagenomics-cwl/blob/master/workflows/emg-assembly.cwl#L200 [scicrunch]: https://scicrunch.org [edam-example]: http://edamontology.org/format_1915 [iana-types]: https://www.iana.org/assignments/media-types/media-types.xhtml [file-prop]: https://www.commonwl.org/v1.0/CommandLineTool.html#File [orcid]: https://orcid.org [subworkflow]: https://www.commonwl.org/v1.0/Workflow.html#SubworkflowFeatureRequirement [spdx]: https://spdx.org/licenses/ % TODO % % - Writing CWL workflows (include existing docs from https://github.com/common-workflow-library/cwl-patterns/blob/main/README.md) % - FAIR best practices with CWL % - Docker best practices with CWL - https://github.com/common-workflow-language/common-workflow-language/issues/347