sam: keep otherTags when adding a duplicate reference on BAM read #154
ckingsford wants to merge 1 commit into biogo:master from
Thanks for sending this. Can you post a small reproducer so that I can understand the issue better?
Sure, here: mappings.bam.gz is a .bam file with just the header where references each have a It's the

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/biogo/hts/bam"
	"github.com/biogo/hts/sam"
)

func main() {
	f, err := os.Open("mappings.bam")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	b, err := bam.NewReader(f, 4)
	if err != nil {
		log.Fatal(err)
	}

	// Look up the DS tag on every reference in the header.
	tag := sam.NewTag("DS")
	for _, r := range b.Header().Refs() {
		// expected: NAME LEN (D) or NAME LEN (T) when run on attached .bam (see samtools view -H mappings.bam)
		// result: NAME LEN ()
		fmt.Printf("%s %d (%s)\n", r.Name(), r.Len(), r.Get(tag))
	}
}
OK. I think this is the correct fix for the fast path that's used in deserialisation, but I don't think it properly handles the case when a user is adding a constructed Looking at
This PR fixes an issue where the BAM reader (via bam.DecodeBinary) did not preserve additional tags for reference entries in the header. If, for example, a reference had a DS:foo tag, it would not be preserved in the final header after decoding.

This occurred because the BAM format lists the references in two places: the "text" portion and a binary-encoded list of references. The reading of the binary-encoded list (in bam.DecodeBinary) overrode some parts of the references read via the text portion and did not preserve the otherTags field of references that had already been read in the text portion of the header.

The fix is in sam.Header.AddReference: when adding a reference that has already been seen, if the reference being added has a nil otherTags, use the existing tags. This follows the pattern already used in AddReference for the md5, assemID, etc. fields.
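The merge rule described above can be illustrated with a minimal, self-contained sketch. This is not the code from biogo/hts: the ref and header types, their fields, and addReference are hypothetical stand-ins for sam.Reference, sam.Header and sam.Header.AddReference, reduced to a name, a length and a tag map. It assumes that a duplicate with the same name and length replaces the existing entry, and that a nil otherTags on the incoming reference means "no tags supplied" rather than "clear the tags".

```go
package main

import (
	"errors"
	"fmt"
)

// ref is a hypothetical, simplified stand-in for sam.Reference.
type ref struct {
	name      string
	length    int
	otherTags map[string]string // nil when the reference comes from the binary list
}

// header is a hypothetical, simplified stand-in for sam.Header.
type header struct {
	refs []*ref
	seen map[string]int // reference name -> index into refs
}

// addReference appends r, or merges it into an already-seen reference
// with the same name.
func (h *header) addReference(r *ref) error {
	if h.seen == nil {
		h.seen = make(map[string]int)
	}
	if i, dup := h.seen[r.name]; dup {
		old := h.refs[i]
		if old.length != r.length {
			return errors.New("duplicate reference with conflicting length")
		}
		// Keep the tags seen earlier when the incoming reference has none.
		if r.otherTags == nil {
			r.otherTags = old.otherTags
		}
		h.refs[i] = r
		return nil
	}
	h.seen[r.name] = len(h.refs)
	h.refs = append(h.refs, r)
	return nil
}

func main() {
	h := &header{}
	inputs := []*ref{
		// As read from the text portion of the header: carries a DS tag.
		{name: "chr1", length: 1000, otherTags: map[string]string{"DS": "foo"}},
		// As read from the binary reference list: same reference, no tags.
		{name: "chr1", length: 1000},
	}
	for _, r := range inputs {
		if err := h.addReference(r); err != nil {
			panic(err)
		}
	}
	for _, r := range h.refs {
		fmt.Printf("%s %d (%s)\n", r.name, r.length, r.otherTags["DS"])
	}
}
```

With the nil check in place, the sketch prints chr1 1000 (foo). Without it, the second call would replace the entry with a tagless reference, which mirrors the empty parentheses seen in the reproducer.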