
Use Biocontainers API for creating modules #875

@ewels

Putting down an idea into an issue for nf-core create so I don't forget (but probably too much work to get into PR #869).

Biocontainers itself has quite a nice API that we can use. It's documented here: https://api.biocontainers.pro/ga4gh/trs/v2/ui/#/GA4GH/tools_id_get

For example, we can query MultiQC:

curl -X GET "https://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc" -H "accept: application/json"
JSON Response
{
  "contains": [],
  "description": "Multiqc aggregates results from multiple bioinformatics analyses across many samples into a single report. it searches a given directory for analysis logs and compiles a html report. i is a general use tool, perfect for summarising the output from numerous bioinformatics tools.",
  "id": "multiqc",
  "identifiers": [
    "biotools:multiqc",
    "PMID:27312411"
  ],
  "license": "GPL-3.0",
  "name": "multiqc",
  "organization": "biocontainers",
  "pulls": 3602004,
  "tool_tags": [
    "High-Throughput Nucleotide Sequencing",
    "Quality Control",
    "Computational Biology",
    "Sequencing",
    "Bioinformatics",
    "RNA-Seq",
    "Transcriptomics"
  ],
  "tool_url": "https://github.com/ewels/MultiQC",
  "toolclass": {
    "description": "CommandLineTool",
    "id": "0",
    "name": "CommandLineTool"
  },
  "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc",
  "versions": [
    {
      "id": "multiqc-1.0",
      "meta_version": "1.0",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.0"
    },
    {
      "id": "multiqc-1.5",
      "meta_version": "1.5",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.5"
    },
    {
      "id": "multiqc-1.4",
      "meta_version": "1.4",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.4"
    },
    {
      "id": "multiqc-0.9.1a0",
      "meta_version": "0.9.1a0",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-0.9.1a0"
    },
    {
      "id": "multiqc-1.3",
      "meta_version": "1.3",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.3"
    },
    {
      "id": "multiqc-1.6a0",
      "meta_version": "1.6a0",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.6a0"
    },
    {
      "id": "multiqc-1.5a",
      "meta_version": "1.5a",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.5a"
    },
    {
      "id": "multiqc-1.2",
      "meta_version": "1.2",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.2"
    },
    {
      "id": "multiqc-1.1",
      "meta_version": "1.1",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.1"
    },
    {
      "id": "multiqc-1.7",
      "meta_version": "1.7",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.7"
    },
    {
      "id": "multiqc-1.6",
      "meta_version": "1.6",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.6"
    },
    {
      "id": "multiqc-1.8",
      "meta_version": "1.8",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.8"
    },
    {
      "id": "multiqc-1.9",
      "meta_version": "1.9",
      "name": "multiqc",
      "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.9"
    }
  ]
}

Using this API call gives us several things in a single shot:

  • Version information
  • Tool description
  • Tool homepage
  • Biocontainers URL
  • Tool identifiers
  • Licence
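
As a rough illustration of pulling those fields out of the response, here is a minimal Python sketch. `SAMPLE_TOOL` is a hand-trimmed copy of the MultiQC response above (shortened description, only two versions kept), and `summarise_tool` is a hypothetical helper name, not something that exists in nf-core/tools.

```python
# Hand-trimmed copy of the MultiQC response shown above (assumption: only the
# fields used below are kept, and just two of the listed versions).
SAMPLE_TOOL = {
    "description": "Multiqc aggregates results from multiple bioinformatics "
    "analyses across many samples into a single report.",
    "identifiers": ["biotools:multiqc", "PMID:27312411"],
    "license": "GPL-3.0",
    "name": "multiqc",
    "tool_url": "https://github.com/ewels/MultiQC",
    "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc",
    "versions": [
        {"id": "multiqc-1.8", "meta_version": "1.8"},
        {"id": "multiqc-1.9", "meta_version": "1.9"},
    ],
}


def summarise_tool(tool: dict) -> dict:
    """Collect the fields listed above from a single tool response."""
    return {
        "description": tool.get("description"),
        "homepage": tool.get("tool_url"),
        "biocontainers_url": tool.get("url"),
        "identifiers": list(tool.get("identifiers", [])),
        "licence": tool.get("license"),
        # Versions are kept in the order the API returned them (not sorted).
        "versions": [v["meta_version"] for v in tool.get("versions", [])],
    }


if __name__ == "__main__":
    for key, value in summarise_tool(SAMPLE_TOOL).items():
        print(f"{key}: {value}")
```

Using `.get()` throughout means a tool with missing fields yields `None` or an empty list rather than an exception, which is probably what we would want when pre-filling `meta.yml`.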

It also gives a URL for each version, which we can query in turn (NOTE: the listed URLs use http, which doesn't work; they need to be https).
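
One way to work around the scheme problem is to rewrite the URL before requesting it. The sketch below uses only the standard library; `to_https` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Return the URL with an http scheme upgraded to https; leave others alone."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    listed = "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.9"
    print(to_https(listed))
```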

For example, MultiQC 1.9:

curl -X GET "https://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.9" -H "accept: application/json"
JSON Response
{
  "id": "multiqc-1.9",
  "images": [
    {
      "downloads": 48596,
      "image_name": "multiqc==1.9--pyh9f0ad1d_0",
      "image_type": "Conda",
      "registry_host": "http://anaconda.org/",
      "size": 862231,
      "updated": "2020-05-30T00:00:00Z"
    },
    {
      "downloads": 0,
      "image_name": "quay.io/biocontainers/multiqc:1.9--pyh9f0ad1d_0",
      "image_type": "Docker",
      "registry_host": "quay.io/",
      "size": 194294593,
      "updated": "2020-05-30T00:00:00Z"
    },
    {
      "image_name": "https://depot.galaxyproject.org/singularity/multiqc:1.9--pyh9f0ad1d_0",
      "image_type": "Singularity",
      "registry_host": "depot.galaxyproject.org/singularity/",
      "size": 189788160,
      "updated": "2020-05-31T04:44:00Z"
    },
    {
      "downloads": 48596,
      "image_name": "multiqc==1.9--py_1",
      "image_type": "Conda",
      "registry_host": "http://anaconda.org/",
      "size": 862231,
      "updated": "2020-05-30T00:00:00Z"
    },
    {
      "downloads": 0,
      "image_name": "quay.io/biocontainers/multiqc:1.9--py_1",
      "image_type": "Docker",
      "registry_host": "quay.io/",
      "size": 179981913,
      "updated": "2020-07-28T00:00:00Z"
    },
    {
      "image_name": "https://depot.galaxyproject.org/singularity/multiqc:1.9--py_1",
      "image_type": "Singularity",
      "registry_host": "depot.galaxyproject.org/singularity/",
      "size": 176119808,
      "updated": "2020-07-29T06:19:00Z"
    }
  ],
  "meta_version": "1.9",
  "name": "multiqc",
  "url": "http://api.biocontainers.pro/ga4gh/trs/v2/tools/multiqc/versions/multiqc-1.9"
}

This gives us:

  • Details of each build for a given version (as there can be many)
  • Conda, Docker and Singularity URLs in one shot (with no guessing)
  • Doesn't assume a given image registry (nearly everything is quay.io now, but that could change in the future)
  • Metadata: updated timestamps, size, downloads
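
To show how the builds could be matched up across Conda, Docker and Singularity, here is a minimal Python sketch. `SAMPLE_IMAGES` copies only `image_name` and `image_type` from the response above. The helper names are hypothetical, and treating everything after the last `--` as the build string is an assumption based on this one example, not a documented guarantee of the API.

```python
from collections import defaultdict

# image_name / image_type pairs copied from the MultiQC 1.9 response above.
SAMPLE_IMAGES = [
    {"image_name": "multiqc==1.9--pyh9f0ad1d_0", "image_type": "Conda"},
    {"image_name": "quay.io/biocontainers/multiqc:1.9--pyh9f0ad1d_0", "image_type": "Docker"},
    {"image_name": "https://depot.galaxyproject.org/singularity/multiqc:1.9--pyh9f0ad1d_0",
     "image_type": "Singularity"},
    {"image_name": "multiqc==1.9--py_1", "image_type": "Conda"},
    {"image_name": "quay.io/biocontainers/multiqc:1.9--py_1", "image_type": "Docker"},
    {"image_name": "https://depot.galaxyproject.org/singularity/multiqc:1.9--py_1",
     "image_type": "Singularity"},
]


def build_tag(image_name: str) -> str:
    """Assumption: the build string is whatever follows the last '--'."""
    return image_name.rsplit("--", 1)[-1]


def group_by_build(images: list) -> dict:
    """Map each build string to {image_type: image_name}."""
    builds = defaultdict(dict)
    for image in images:
        builds[build_tag(image["image_name"])][image["image_type"]] = image["image_name"]
    return dict(builds)


if __name__ == "__main__":
    for tag, by_type in group_by_build(SAMPLE_IMAGES).items():
        print(tag)
        for image_type, name in by_type.items():
            print(f"  {image_type}: {name}")
```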

My thought is that we could query this when running nf-core modules create instead of bioconda / quay.io. I think that this would be more accurate as well as giving us a bunch of additional information to put into meta.yml about the tool.
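
For reference, the two curl calls above could be made from Python roughly like this, using only the standard library. `API_BASE`, `tool_url` and `fetch_json` are hypothetical names, and building the version ID as `<tool>-<version>` is an assumption generalised from the `multiqc-1.9` example.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base URL taken from the curl examples above (https, not http).
API_BASE = "https://api.biocontainers.pro/ga4gh/trs/v2"


def tool_url(tool: str, version: Optional[str] = None) -> str:
    """Build the tool URL, or the version URL when a version is given."""
    url = f"{API_BASE}/tools/{quote(tool, safe='')}"
    if version is not None:
        # Assumption: version IDs follow the "<tool>-<version>" pattern.
        version_id = f"{tool}-{version}"
        url += f"/versions/{quote(version_id, safe='')}"
    return url


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET the URL with the same accept header as the curl calls and parse the JSON."""
    request = Request(url, headers={"accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Needs network access; prints the version IDs listed for MultiQC.
    tool = fetch_json(tool_url("multiqc"))
    print([v["id"] for v in tool.get("versions", [])])
```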

Ideally, we could either use an exact build tag provided on the command line (failing if it is not found) or show a questionary select list, as done in nf-core launch. This would be very precise (select first from the available versions, then from the builds available for that version) and also super user-friendly.
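
A minimal sketch of that selection logic, assuming the available build tags have already been collected into a list. `pick_build` is a hypothetical helper; `questionary` is only imported when the interactive prompt is actually needed.

```python
from typing import List, Optional


def pick_build(builds: List[str], requested: Optional[str] = None) -> Optional[str]:
    """Return the requested build if it exists, otherwise prompt for one."""
    if requested is not None:
        if requested not in builds:
            available = ", ".join(sorted(builds))
            raise LookupError(f"Build '{requested}' not found. Available: {available}")
        return requested

    # No tag given on the command line: fall back to an interactive list.
    import questionary  # third-party, imported lazily

    # .ask() returns None if the user cancels the prompt.
    return questionary.select("Select a build:", choices=sorted(builds)).ask()


if __name__ == "__main__":
    print(pick_build(["pyh9f0ad1d_0", "py_1"], requested="py_1"))
```

The same function could be called twice, once with the list of versions and once with the builds of the chosen version, to get the two-step selection described above.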

Phil
