BGZip slow performance near end of chromosomes #153

Open
@davmlaw

It can take over a minute to retrieve a few bases:

$ time faidx --no-rebuild Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz 7:117199563-117199564
>7:117199563-117199564
GG

real	1m8.888s
user	0m31.042s
sys	0m37.785s

Low coordinates are fine:

$ time faidx --no-rebuild Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz 7:10000-11000 > /dev/null

real	0m0.295s
user	0m0.241s
sys	0m0.056s
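
For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It assumes the `faidx` command is on PATH and that the index already exists (hence `--no-rebuild`, as in the commands above). The helper names `time_command` and `compare_regions` are made up for illustration and are not part of pyfaidx.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return the wall-clock seconds it took."""
    start = time.perf_counter()
    # Discard stdout so terminal output does not affect the measurement.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def compare_regions(fasta, regions):
    """Time one `faidx --no-rebuild` call per region; returns {region: seconds}."""
    return {
        region: time_command(["faidx", "--no-rebuild", fasta, region])
        for region in regions
    }


# Example usage (needs faidx and the indexed file from this report):
# timings = compare_regions(
#     "Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz",
#     ["7:10000-11000", "7:117199563-117199564"],
# )
# for region, seconds in timings.items():
#     print(f"{region}\t{seconds:.3f}s")
```

This only measures wall-clock time, so it will not show the user/sys split that `time` reports.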

You said in a previous issue:

"there is still a large performance penalty for fetching small substrings near the end of a record, and I'll open an issue to remind me to explore a solution."

I can't find that issue, so I am raising this one. Good luck!
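
To give a sense of how far into a record the slow query lands, here is a small sketch of the standard `.fai` offset arithmetic. It is only an illustration of where the bases sit in the uncompressed stream, not a confirmed explanation of the slowdown. The function name `fai_byte_offset` is hypothetical, and the 60 bases per line / 61 bytes per line values in the comment are assumptions about this file, not something read from its index.

```python
def fai_byte_offset(record_offset, linebases, linewidth, pos0):
    """Uncompressed byte offset of a 0-based sequence position.

    record_offset: byte offset of the record's first base (.fai OFFSET column)
    linebases:     bases per line (.fai LINEBASES column)
    linewidth:     bytes per line including the newline (.fai LINEWIDTH column)
    pos0:          0-based position within the sequence
    """
    full_lines, within_line = divmod(pos0, linebases)
    return record_offset + full_lines * linewidth + within_line


# Assumed layout: 60 bases per line, 61 bytes per line (one newline byte).
# Offsets are relative to the start of the record (record_offset=0).
low = fai_byte_offset(0, 60, 61, 10000 - 1)
high = fai_byte_offset(0, 60, 61, 117199563 - 1)
print(low, high)
```

Under those assumptions the second query starts over a hundred megabytes into the uncompressed record, while the first starts about ten kilobytes in.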
