2.17. File formats

Tools and workflows can take File types as input and produce them as output. We recommend declaring the format of each File. Doing so documents for others how to use your tool, and it enables some simple type-checking when you create parameter files.

For file formats, we recommend one of three approaches: reference an existing ontology (such as EDAM in our example), reference a local ontology maintained by your institution, or leave the file format out at first for quick development and add it before sharing your tool with others. Both IANA and EDAM publish browsable listings of existing file formats.

In the next tutorial, we explain the $namespaces and $schemas sections of the document in greater detail, so don’t worry about these for now.

As an added benefit, cwltool can do some basic reasoning based on file formats and warn you about obvious mismatches.

metadata_example.cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool

label: An example tool demonstrating metadata.

inputs:
  aligned_sequences:
    type: File
    label: Aligned sequences in BAM format
    format: edam:format_2572
    inputBinding:
      position: 1

baseCommand: [ wc, -l ]

stdout: output.txt

outputs:
  report:
    type: stdout
    format: edam:format_1964
    label: A text file that contains a line count

$namespaces:
  edam: http://edamontology.org/
$schemas:
  - http://edamontology.org/EDAM_1.18.owl

On the command line, this CWL description is equivalent to:

$ wc -l /path/to/aligned_sequences.ext > output.txt
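
The same pattern works with vocabularies other than EDAM. As a minimal sketch, assuming the IANA media-type registry is used as the namespace, a tool that accepts plain text might declare its input as shown below. The `iana` prefix, its base URL, and the input name `notes` are illustrative assumptions and are not part of the example above.

```yaml
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool

baseCommand: [ wc, -l ]

inputs:
  notes:                        # hypothetical input name
    type: File
    label: A plain-text file
    format: iana:text/plain     # assumed IANA media-type identifier
    inputBinding:
      position: 1

outputs: []

$namespaces:
  # Assumed base URL for IANA media types
  iana: https://www.iana.org/assignments/media-types/
```

Here `iana:text/plain` is expected to expand to the namespace URL followed by `text/plain`, in the same way that `edam:format_2572` expands to http://edamontology.org/format_2572.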

2.17.1. Sample Parameter Files

Below is a sample parameter file for the tool above. We encourage checking in working examples of parameter files for your tool. This allows others to quickly work with your tool, starting from a “known good” parameterization.

sample.yml
aligned_sequences:
    class: File
    format: http://edamontology.org/format_2572
    path: file-formats.bam

Note: To follow the example below, you need to download the example input file, file-formats.bam. The file is available from https://github.com/common-workflow-language/user_guide/raw/main/_includes/cwl/file-formats/file-formats.bam and can be downloaded, for example, with wget:

$ wget https://github.com/common-workflow-language/user_guide/raw/main/_includes/cwl/file-formats/file-formats.bam

Now invoke cwltool with the tool description and the input object on the command line:

$ cwltool metadata_example.cwl sample.yml
INFO /opt/hostedtoolcache/Python/3.9.13/x64/bin/cwltool 3.1.20220913185150
INFO Resolved 'metadata_example.cwl' to 'file:///home/runner/work/user_guide/user_guide/src/_includes/cwl/file-formats/metadata_example.cwl'
INFO [job metadata_example.cwl] /tmp/84q07fzm$ wc \
    -l \
    /tmp/8zqh50_d/stg6d60e98e-d904-4d35-9a1d-0945fec0ba89/file-formats.bam > /tmp/84q07fzm/output.txt
INFO [job metadata_example.cwl] completed success
{
    "report": {
        "location": "file:///home/runner/work/user_guide/user_guide/src/_includes/cwl/file-formats/output.txt",
        "basename": "output.txt",
        "class": "File",
        "checksum": "sha1$6e2c9b019eaf728b40b9d7e275b30239ef5c8eb5",
        "size": 77,
        "format": "http://edamontology.org/format_1964",
        "path": "/home/runner/work/user_guide/user_guide/src/_includes/cwl/file-formats/output.txt"
    }
}
INFO Final process status is success
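
For contrast, here is a sketch of a parameter file whose declared format does not match the tool's input. The file name mismatch.yml is hypothetical. The only change from sample.yml is the format, which now points at the text term (format_1964) that the tool uses for its output rather than the BAM term (format_2572) that the aligned_sequences input declares.

```yaml
# mismatch.yml (hypothetical file name, for illustration only)
aligned_sequences:
    class: File
    # Text term (format_1964) instead of the BAM term (format_2572)
    # that metadata_example.cwl declares for this input
    format: http://edamontology.org/format_1964
    path: file-formats.bam
```

This is the kind of obvious mismatch that the format reasoning mentioned earlier is meant to catch. How it is reported may depend on the runner and its version, so no output is shown here.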