Included motif db incompatible with modisco reports #193

@casblaauw

Report

Creating a report from a modisco run (e.g. with crested.tl.modisco.tfmodisco(..., report=True)) using the default motif MEME database (from crested.get_motif_db()) doesn't work: the report code builds output filenames from the motif names, and many names in the database contain forward slashes and/or are too long to be valid filenames.

First error

I'm getting this error when using the file, whether from the command line with modisco report, as a separate call to modiscolite.report.report_motifs, or through crested.tl.modisco.tfmodisco (which calls report_motifs):

File "/lustre1/project/stg_00002/lcb/cblaauw/software/mambaforge/envs/crested_py311_tf216_v4/lib/python3.11/site-packages/modiscolite/report.py", line 221, in _plot_weights
    plt.savefig(path)
[...]
FileNotFoundError: [Errno 2] No such file or directory: '/lustre1/project/stg_00002/lcb/cblaauw/python_modules/CREsted/docs/tutorials/modisco_results_ft_2000/Astro_report/metacluster_18.2.dbtfbs__SMAD1_GM12878_ENCSR813DCK_merged_N1 Foxa1/2/3::Lcorl::Nfatc1::Nfia::Nfib::Nfic::Nfix::Nr2e1::Smad1::Tlx1::Yy1.png'

I just updated modiscolite as well, so I'm on the current GitHub main branch (v2.4.0).

The report directory exists, so that can't be the reason this is failing. After some trial and error, I found out it's the forward slashes in the motif names: they get interpreted as path separators, so savefig ends up looking for subdirectories that don't exist. You could argue these should be cleaned up in the filename creation in modisco, but we should also just not distribute a file that has motif names with forward slashes.

Here's an example motif name from the MEME file:

[...]
MOTIF tfdimers__MD00225 Bptf::Cdx2::Cebpb::Crx::Dbx2::Ep300::Ets2::Fos::Gfi1::Gfi1b::Hltf::Hnf4a::Hnf4g::Hoxa10/13::Ikzf1::Isl1::Nkx6-2::Nr2f1/2::Otx1/2::Pax4/7::Pbx1::Pdx1::Pitx3::Pou2f1::Pou5f1::Smad4::Taf6::Tbp::Tfap2c
[...]
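
To illustrate why a slash in a motif name leads to FileNotFoundError, here is a minimal sketch that only uses pathlib. The helper name split_output_path and the label "some_id Foxa1/2/3::Lcorl" are made up for illustration; this is not how modiscolite builds its paths, just a demonstration of how a POSIX path with slashes gets split.

```python
from pathlib import PurePosixPath


def split_output_path(report_dir, motif_label):
    """Return the path components below report_dir for '<motif_label>.png'.

    Hypothetical helper: it mimics joining a directory with a filename
    that was built from a motif label, nothing more.
    """
    path = PurePosixPath(report_dir) / f"{motif_label}.png"
    return path.relative_to(report_dir).parts


# Each "/" in the label starts a new directory level.
parts = split_output_path("Astro_report", "some_id Foxa1/2/3::Lcorl")
print(parts)  # ('some_id Foxa1', '2', '3::Lcorl.png')
```

The intended single filename turns into two nested directories plus a file, and since those directories were never created, writing the file fails.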

Second error

After cleaning up the meme database's forward slashes, I ran into the next issue:

OSError: [Errno 36] File name too long: '/lustre1/project/stg_00002/lcb/cblaauw/python_modules/CREsted/docs/tutorials/modisco_results_ft_2000/Astro_report/metacluster_118.1.jaspar__MA0132.2 Alx134::Arx::Barhl12::Barx12::Bsx::Cphx1::Dbx12::Dlx12345::Dmbx1::Elf1::Emx12::En12::Esx1::Evx12::Fli1::Gbx12::Gsx12::Hesx1::Hmx1::Hoxa1234567::Hoxb12345678::Hoxc458::Hoxd1348::Isl12::Isx::Lbx12::Lhx12345689::Lmx1a::Lmx1b::Meox12::Mixl1::Mnx1::Msx123::Mycs::Nanog::Nkx1-12::Nkx2-69::Nkx6-123::Nobox::Noto::Pax347::Pdx1::Pou1f1::Pou2f23::Pou3f124::Pou6f1::Prrx12::Prrxl1::Rax::Rhox610::Shox2::Tcf7l1::Tead4::Uncx::Vax12::Vsx1::Zfhx2.png'
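
For context, errno 36 on Linux is ENAMETOOLONG, and the limit applies to each single path component rather than the whole path. The sketch below assumes a limit of 255 bytes per component, which is typical for common Linux filesystems but is an assumption here, not something I checked for this Lustre mount. The function name filename_too_long is hypothetical.

```python
import os

# Assumed per-component limit in bytes; typical on Linux, not verified for this filesystem.
ASSUMED_NAME_MAX = 255


def filename_too_long(path, limit=ASSUMED_NAME_MAX):
    """Check whether the last component of path exceeds limit bytes."""
    name = os.path.basename(path)
    # The limit is counted in bytes, so encode before measuring.
    return len(name.encode("utf-8")) > limit


# A name built from dozens of '::'-joined TF names easily passes the limit.
long_name = "metacluster_x." + "::".join(f"Tf{i}" for i in range(100)) + ".png"
print(filename_too_long(os.path.join("Astro_report", long_name)))
print(filename_too_long(os.path.join("Astro_report", "short_name.png")))
```

On Unix, os.pathconf(directory, "PC_NAME_MAX") can be used to query the actual limit for a given directory instead of relying on the assumed value.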

Stripping the TF names from the MOTIF lines, so that only the motif ID remains, seems to fix things:

awk '/^MOTIF/ { $0 = $1 " " $2 } { print }' motif_db.meme > motif_db_cleannames.meme

But, you know, the whole reason we include this file is to run this modisco report function, so it should probably work out of the box.
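
As a possible middle ground between the full names and the awk one-liner, here is a rough Python sketch that keeps part of the TF name info. It replaces slashes in the name field and shortens the name at a "::" boundary. The function names, the "_" replacement and the 80-character cap are all arbitrary choices of mine, not anything crested or modiscolite defines, and I have only reasoned about it on "MOTIF <id> <name>" lines like the ones shown above.

```python
def shorten_name(name, max_len=80, sep="::"):
    """Keep as many whole sep-separated TF names as fit within max_len characters."""
    if len(name) <= max_len:
        return name
    kept = []
    for tf in name.split(sep):
        if len(sep.join(kept + [tf])) > max_len:
            break
        kept.append(tf)
    # Fall back to a hard cut if even the first TF name is too long.
    return sep.join(kept) if kept else name[:max_len]


def clean_motif_line(line, max_len=80, slash_replacement="_"):
    """Sanitize the name field of a 'MOTIF <id> [<name>]' line; pass other lines through."""
    if not line.startswith("MOTIF"):
        return line
    fields = line.split()
    if len(fields) < 3:  # no name field to clean
        return line
    motif_id = fields[1]  # the ID is left untouched so lookups by ID still work
    name = " ".join(fields[2:]).replace("/", slash_replacement)
    return f"MOTIF {motif_id} {shorten_name(name, max_len)}\n"


def clean_meme_file(src, dst, max_len=80):
    """Write a copy of the MEME file at src to dst with cleaned MOTIF lines."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(clean_motif_line(line, max_len))
```

Usage would be something like clean_meme_file("motif_db.meme", "motif_db_cleannames.meme"). Whether 80 characters leaves enough room depends on what else ends up in the filename, so the cap may need tuning.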

Version information

No response
