@pinin4fjords pinin4fjords commented Dec 2, 2025

Summary

HISAT2 uses the .ht2l extension instead of .ht2 for large genome indices. The current module only detects .ht2 indices, so alignment fails when a large genome index is supplied.

Changes

Updated the index detection command from:

INDEX=`find -L ./ -name "*.1.ht2" | sed 's/\.1.ht2$//'`

To:

INDEX=`find -L ./ -name "*.1.ht2*" | sed 's/\.1.ht2.*$//'`

This uses a wildcard glob to match both .1.ht2 and .1.ht2l extensions in a single pattern.
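
As a rough illustration of the name matching only, the updated command can be run against empty placeholder files. This is a minimal sketch: `detect_index`, the temporary directories, and the `genome` base name are hypothetical and not part of the module, and the placeholder files are not valid HISAT2 indices.

```shell
#!/usr/bin/env bash
# Sketch: run the updated detection command inside a directory and print
# the index base path. detect_index is a hypothetical helper name.
detect_index() {
    # Subshell so the cd does not leak into the caller.
    (
        cd "$1" || exit 1
        find -L ./ -name "*.1.ht2*" | sed 's/\.1.ht2.*$//'
    )
}

# Empty placeholder files: enough to exercise the name matching,
# but NOT usable as real HISAT2 indices.
small_dir=$(mktemp -d)
large_dir=$(mktemp -d)
touch "$small_dir/genome.1.ht2" "$small_dir/genome.2.ht2"
touch "$large_dir/genome.1.ht2l" "$large_dir/genome.2.ht2l"

# Both calls should print a path ending in "genome"
# (the exact "./" prefix depends on the find implementation).
detect_index "$small_dir"
detect_index "$large_dir"
```

Only the first index chunk (`.1.ht2` or `.1.ht2l`) is matched, so a directory holding a single index yields a single base path.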

Related

Related to nf-core/rnaseq#1643

Testing notes

Automated testing of large genome indices (.ht2l) is not feasible because:

  1. hisat2-build only generates large genome indices for very large reference genomes (>4GB, roughly 4 billion nucleotides or more)
  2. Simply renaming .ht2 files to .ht2l doesn't work - the file formats are fundamentally different

Manual verification: When .ht2l files are present:

  • The updated find command matches files named *.1.ht2l
  • sed strips the suffix to give the correct index base path
  • HISAT2 then invokes hisat2-align-l (the large-index variant)
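
For spot checks of this kind, it can help to confirm which index flavor a detected base path points at before running the aligner. The following is a minimal sketch that looks only at file names; `index_flavor` is a hypothetical helper, it is not how HISAT2 itself selects its aligner binary, and the placeholder files are not real indices.

```shell
#!/usr/bin/env bash
# Sketch: report whether an index base path refers to a small or large
# index, judged only by which first-chunk file exists on disk.
# index_flavor is a hypothetical name, not part of HISAT2 or the module.
index_flavor() {
    local base=$1
    if [ -e "${base}.1.ht2l" ]; then
        echo "large"
    elif [ -e "${base}.1.ht2" ]; then
        echo "small"
    else
        echo "missing"
        return 1
    fi
}

workdir=$(mktemp -d)
# Placeholder files for illustration only.
touch "$workdir/big.1.ht2l" "$workdir/std.1.ht2"

index_flavor "$workdir/big"           # prints "large"
index_flavor "$workdir/std"           # prints "small"
index_flavor "$workdir/none" || true  # prints "missing", returns non-zero
```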

The existing tests continue to pass, verifying no regression for standard .ht2 indices.

🤖 Generated with Claude Code

HISAT2 uses .ht2l extension instead of .ht2 for large genomes.
Updated index detection to match both extensions.

Related to nf-core/rnaseq#1643

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@pinin4fjords pinin4fjords force-pushed the fix/hisat2-large-genome-index branch from 412dfda to 0553755 Compare December 2, 2025 11:29
@pinin4fjords pinin4fjords added this pull request to the merge queue Dec 2, 2025
Merged via the queue into master with commit 5ec0e05 Dec 2, 2025
49 checks passed
@pinin4fjords pinin4fjords deleted the fix/hisat2-large-genome-index branch December 2, 2025 12:20
pinin4fjords added a commit to nf-core/rnaseq that referenced this pull request Dec 2, 2025
- Update to latest upstream hisat2/align module which includes support
  for large genome indices (.ht2l extension) via nf-core/modules#9493
- Regenerate patch file for rnaseq-specific changes (contaminant_screening)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
vagkaratzas pushed a commit that referenced this pull request Dec 8, 2025