Description
Report
Creating a report from a modisco run (e.g. with crested.tl.modisco.tfmodisco(..., report=True)) using the default motif meme database (from crested.get_motif_db()) doesn't work, because the report step tries to write files whose names contain forward slashes and/or are too long for the filesystem.
First error
I get this error when generating the report, whether from the command line with modisco report, as a separate call to modiscolite.report.report_motifs, or from crested.tl.modisco.tfmodisco (which calls report_motifs):
File "/lustre1/project/stg_00002/lcb/cblaauw/software/mambaforge/envs/crested_py311_tf216_v4/lib/python3.11/site-packages/modiscolite/report.py", line 221, in _plot_weights
plt.savefig(path)
[...]
FileNotFoundError: [Errno 2] No such file or directory: '/lustre1/project/stg_00002/lcb/cblaauw/python_modules/CREsted/docs/tutorials/modisco_results_ft_2000/Astro_report/metacluster_18.2.dbtfbs__SMAD1_GM12878_ENCSR813DCK_merged_N1 Foxa1/2/3::Lcorl::Nfatc1::Nfia::Nfib::Nfic::Nfix::Nr2e1::Smad1::Tlx1::Yy1.png'
I just updated modiscolite as well, so I'm on the most current GitHub main branch (v2.4.0).
The directory exists, so that can't be the reason this is failing. After some trial and error, I found out it's the forward slashes in the motif names: they are interpreted as path separators, so the file name points into nested directories that don't exist. You could argue these should be cleaned up when modisco builds the filename, but we should also just not distribute a file that has motif names with forward slashes.
Here's an example motif name from the MEME file:
[...]
MOTIF tfdimers__MD00225 Bptf::Cdx2::Cebpb::Crx::Dbx2::Ep300::Ets2::Fos::Gfi1::Gfi1b::Hltf::Hnf4a::Hnf4g::Hoxa10/13::Ikzf1::Isl1::Nkx6-2::Nr2f1/2::Otx1/2::Pax4/7::Pbx1::Pdx1::Pitx3::Pou2f1::Pou5f1::Smad4::Taf6::Tbp::Tfap2c
[...]
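As a rough sketch of the cleanup, the slashes can be replaced on the MOTIF header lines only, leaving the rest of the file untouched. The function name and the choice of "_" as the replacement character are assumptions for illustration; this is not how CREsted or modiscolite handle it.

```python
def replace_slashes_in_motif_names(meme_text: str, replacement: str = "_") -> str:
    """Replace '/' on MOTIF header lines only; all other lines are kept as-is.

    Hypothetical helper: the replacement character is an arbitrary choice.
    """
    cleaned = []
    for line in meme_text.splitlines():
        if line.startswith("MOTIF"):
            # Only motif headers end up in file names, so only those are touched.
            line = line.replace("/", replacement)
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"
```

Applied to a header like the one above, Hoxa10/13 would become Hoxa10_13, and a URL line elsewhere in the file would keep its slashes.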
Second error
After removing the forward slashes from the meme database, I ran into the next issue:
OSError: [Errno 36] File name too long: '/lustre1/project/stg_00002/lcb/cblaauw/python_modules/CREsted/docs/tutorials/modisco_results_ft_2000/Astro_report/metacluster_118.1.jaspar__MA0132.2 Alx134::Arx::Barhl12::Barx12::Bsx::Cphx1::Dbx12::Dlx12345::Dmbx1::Elf1::Emx12::En12::Esx1::Evx12::Fli1::Gbx12::Gsx12::Hesx1::Hmx1::Hoxa1234567::Hoxb12345678::Hoxc458::Hoxd1348::Isl12::Isx::Lbx12::Lhx12345689::Lmx1a::Lmx1b::Meox12::Mixl1::Mnx1::Msx123::Mycs::Nanog::Nkx1-12::Nkx2-69::Nkx6-123::Nobox::Noto::Pax347::Pdx1::Pou1f1::Pou2f23::Pou3f124::Pou6f1::Prrx12::Prrxl1::Rax::Rhox610::Shox2::Tcf7l1::Tead4::Uncx::Vax12::Vsx1::Zfhx2.png'
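To see which motifs would trigger this, one could scan the MEME file for names that push a single file name over the limit. This is only a sketch: the 255-byte limit is an assumption (it is common on Linux filesystems but not universal), and the prefix and suffix are copied from the error message above rather than from modiscolite's code, so the real naming scheme may differ.

```python
NAME_MAX_BYTES = 255  # assumed per-file-name limit; check your own filesystem


def motif_names_too_long(meme_text, prefix="metacluster_118.1.", suffix=".png",
                         limit=NAME_MAX_BYTES):
    """Return motif names whose resulting file name would exceed `limit` bytes.

    `prefix` and `suffix` are illustrative placeholders taken from the traceback.
    """
    too_long = []
    for line in meme_text.splitlines():
        if not line.startswith("MOTIF"):
            continue
        # Everything after the MOTIF keyword ends up in the file name.
        name = line[len("MOTIF"):].strip()
        file_name = f"{prefix}{name}{suffix}"
        if len(file_name.encode("utf-8")) > limit:
            too_long.append(name)
    return too_long
```

The check is on the file name alone, not the full path, since Errno 36 is typically raised per path component.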
Removing the TF name info from the motifs seems to fix things:
awk '/^MOTIF/ { $0 = $1 " " $2 } { print }' motif_db.meme > motif_db_cleannames.meme
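For anyone who would rather do this from Python, here is a rough equivalent of the awk one-liner. The function name is made up for this example, and it only approximates awk's behaviour (a MOTIF line with no ID comes out as "MOTIF" without a trailing space).

```python
from pathlib import Path


def write_clean_names(src: Path, dst: Path) -> None:
    """Copy a MEME file, keeping only 'MOTIF <id>' on each motif header line."""
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            if line.startswith("MOTIF"):
                # Keep the MOTIF keyword and the motif ID, drop the TF name list.
                line = " ".join(line.split()[:2]) + "\n"
            fout.write(line)
```

Calling write_clean_names(Path("motif_db.meme"), Path("motif_db_cleannames.meme")) should produce the same kind of output as the awk command above.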
But, y'know, the whole reason we include this file is so that people can run this modisco report function, so it should probably work out of the box.
Version information
No response