Tetraploid species #201

Open
Homap opened this issue May 19, 2025 · 5 comments
Labels
genomescope included smudgeplot included if smudgeplot was posted with the quesiton / problem

Comments

@Homap

Homap commented May 19, 2025

Hello,

Thank you so much for the great software and the time you put into helping us with our questions.

I am having trouble understanding my smudgeplot. I used the following commands to generate it:

FastK -v -t4 -k31 -M16 -T4 $fastq_filtered_dir/P18758_173_S83_L004_R1_001.filtered.fastq.gz $fastq_filtered_dir/P18758_173_S83_L004_R2_001.filtered.fastq.gz -NBAT4x_FastK_db
Histex -G BAT4x_FastK_db > BAT4x_Kmer31.hist
genomescope2 -i BAT4x_Kmer31.hist -k 31 -p 4 -o BAT4x -n BAT4x
smudgeplot.py hetmers -L 5 -t 4 -o BAT4x --verbose ../FastK_db/BAT4x_FastK_db
smudgeplot.py plot -o BAT4x_smudgeplot_2D BAT4x_smudgeplot_masked_errors_smu.txt BAT4x_smudgeplot_smudge_sizes.txt 13.2

and it looks like this:

[Two attached images]

BAT4x_smudgeplot_2D_10thtry_smudgeplot_log10.pdf

I already know the ploidy from flow cytometry data (tetraploid). The histograms make sense, I think. I was wondering whether this is indicative of autopolyploidy or allopolyploidy. I was also wondering about the smudgeplot: do you think it's worth going ahead with this analysis given my low haploid coverage of about 13X?

Our main goal here is to distinguish between the modes of polyploidy, auto- versus allo-. I'd appreciate your help very much with this.

Thank you in advance,
Homa

@Homap
Author

Homap commented May 19, 2025

Sorry, another question. I installed smudgeplot using conda. However, I cannot get the top and right-side histograms. I tried to add these myself, but I haven't been fully successful yet. I see in other smudgeplot examples that these graphs are also produced. I also tried copying and pasting the code from the GitHub repository into the conda installation, but it still didn't work. Thanks so much for your help again!

@KamilSJaron KamilSJaron added smudgeplot included if smudgeplot was posted with the quesiton / problem genomescope included labels May 20, 2025
@KamilSJaron
Owner

Hi, this indeed looks like a tetraploid. It's one of those funny cases that are hard to make anything out of. What is the species? Is it sexual?

Hannes Becher developed some explicit expectations for auto- and allo- tetraploid k-mer spectra. In his model, it always needs to be auto- when the first peak is the tallest: https://www.cell.com/plant-communications/fulltext/S2590-3462(20)30133-4?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2590346220301334%3Fshowall%3Dtrue

However, there are two caveats to that. First, his model is a symmetrical allo-, so AABB-style genomes. If you had a hybrid that would be AAAB, the expectation would break down (e.g. in the case of root-knot nematodes). The second caveat is that you need to trust that you know where the 1n peak is: is your genome size estimate in the right ballpark? Is the heterozygosity sensible? What is the species?
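
A minimal sketch of why the position of the 1n peak matters, assuming the 1n coverage of 13.2 passed to `smudgeplot.py plot` in the commands above and a ploidy of 4. The function names are hypothetical and only illustrate the arithmetic; they are not part of GenomeScope2 or smudgeplot.

```python
def expected_peaks(haploid_cov, ploidy):
    """Expected k-mer coverage peak positions (1n, 2n, ..., ploidy*n)."""
    return [round(haploid_cov * n, 1) for n in range(1, ploidy + 1)]


def genome_size_scale(assumed_1n, true_1n):
    """Factor by which a size estimate is off if the 1n peak is misplaced.

    Haploid genome size is roughly (total k-mers) / (1n coverage), so
    estimate_with_assumed / estimate_with_true = true_1n / assumed_1n.
    """
    return true_1n / assumed_1n


# Peaks expected for a tetraploid if 13.2 really is the 1n coverage.
print(expected_peaks(13.2, 4))  # [13.2, 26.4, 39.6, 52.8]

# Hypothetical scenario: the real 1n peak sits at 6.6 and 13.2 is the 2n peak.
# The size estimate based on 13.2 would then be half of the true value.
print(genome_size_scale(13.2, 6.6))  # 0.5
```

If the peaks in the fitted histogram do not sit near these multiples, the assumed 1n coverage, and with it the genome size estimate, deserves a second look.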

@Homap
Author

Homap commented May 20, 2025

Hi Kamil, thank you so much for your quick responses and the time you put into this. The organism is a flowering plant called Lithophragma bolanderi. Its tetraploid nature was determined using flow cytometry. We are puzzled by it because the ITS sequences, obtained by Sanger sequencing, originally suggested an allopolyploid origin. Excited, we sequenced the parental diploids and the polyploids, but the genome-wide data fail to show any sign of allopolyploidy. In the PCA, the polyploids cluster with only one of the parents, and STRUCTURE shows the same. The reviewers asked for a k-mer analysis using GenomeScope2. The genome size reported based on flow cytometry is about 470 Mb, similar to the one reported by GenomeScope2; however, our assembly size is about 780 Mb. Sorry, it has all been a bit confusing. I'd appreciate any advice you may have. Thank you so much!

@KamilSJaron
Owner

"The genome size reported based on flow cytometry is about 470 Mb"

  • what is this number? 1C? 2C? Or do you divide it by ploidy?

Well, the reviewer is quite right that you need to be cautious about what you are actually looking at: if you have uncollapsed haplotypes and you call variants and run STRUCTURE, that will be a disaster.

Your coverage / genome model / genome assembly MUST make sense together.
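
To make that check concrete, here is a rough sketch using only the figures quoted in this thread (470 Mb from flow cytometry, a 780 Mb assembly, ploidy 4). Which convention the 470 Mb follows is exactly the open question, so the code tries all three; the helper name and the convention labels are assumptions made for illustration.

```python
PLOIDY = 4
FLOW_MB = 470.0      # flow cytometry value quoted above; convention unknown
ASSEMBLY_MB = 780.0  # assembly size quoted above


def monoploid_size(value_mb, convention, ploidy=PLOIDY):
    """Convert a reported genome size to the monoploid (1Cx) size in Mb.

    "2C"  : whole somatic nucleus, holding `ploidy` monoploid genomes
    "1C"  : half of 2C, holding `ploidy / 2` monoploid genomes
    "1Cx" : already divided by ploidy
    """
    if convention == "1Cx":
        return value_mb
    if convention == "1C":
        return value_mb / (ploidy / 2)
    if convention == "2C":
        return value_mb / ploidy
    raise ValueError(f"unknown convention: {convention}")


for convention in ("1Cx", "1C", "2C"):
    mono = monoploid_size(FLOW_MB, convention)
    ratio = ASSEMBLY_MB / mono
    print(f"{convention}: monoploid ~{mono:.1f} Mb, "
          f"assembly / monoploid = {ratio:.2f}")
```

As a rule of thumb rather than a threshold from either tool: a ratio near 1 would suggest a collapsed assembly, a ratio between 1 and the ploidy would be compatible with partly uncollapsed haplotypes, and a ratio above the ploidy would mean the numbers cannot all be right.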

Did you read through the wiki? Give it a few days of playing around with the models, and perhaps look at the BGA tutorial about GenomeScope too. I am sorry, I don't have the capacity to help with this more right now... I am preparing for a k-mer course we are running June 1-6...

@Homap
Author

Homap commented May 20, 2025

This is already great! Thank you so much and good luck with the course!
