
Issue with Tumor Fraction Prediction Using Pre-trained MethylBERT #18


Closed
LiJingqi7 opened this issue Mar 27, 2025 · 4 comments

Comments

@LiJingqi7

As described in the data preparation tutorial, fine-tuning the MethylBERT model with pure tumor and normal samples is optional. So I used the pre-trained model from https://huggingface.co/hanyangii/methylbert_hg19_12l to predict plasma samples directly, but it failed to detect any tumor signal: the estimated tumor fraction was 0 in every sample. Did I choose the wrong pre-trained model, or do I have to fine-tune it with cancer and normal tissue samples before it can estimate the tumor fraction in plasma samples?

@hanyangii
Collaborator

Dear @LiJingqi7

Thank you for your interest in MethylBERT.

Although the tutorial describes it as optional, in your case you need pure tumour and normal samples for fine-tuning. Reading our paper should help you understand the pipeline; the Method section describes it in more detail.

Please let me know if the model still does not work for you after fine-tuning.

@LiJingqi7
Author

Thank you for getting back to me. I have followed the fine-tuning process using pure tumor and normal samples as suggested. Specifically, I fine-tuned the model using liver cancer tissue samples and their matched normal tissue samples and then used it to predict the tumor fraction in plasma samples. However, after fine-tuning, the model still fails to detect tumor signals, with the predicted tumor fraction remaining at 0. Could you provide any insights into what might be causing this issue?

@hanyangii
Collaborator

Hello @LiJingqi7

Sorry for my late reply. This sounds weird to me. Can you share more information about your fine-tuned model?

  • training and validation accuracy
  • approximate number of reads in the training data and in a plasma sample

You can also try the estimation without the adjustment option and see whether the result looks better. Depending on the quality of the selected DMRs, the adjustment option may hinder an accurate estimation.
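To make it easier to reason about an unadjusted estimate, here is a minimal sketch of a grid-search maximum-likelihood estimate of the tumour fraction from per-read classifier outputs. It is an illustration only, not MethylBERT's actual implementation: the function name `estimate_tumour_fraction`, the grid search, and the assumption that the classifier was trained on balanced tumour/normal reads are all hypothetical.

```python
import math


def estimate_tumour_fraction(p_tumour, grid_size=1001):
    """Grid-search maximum-likelihood estimate of the tumour fraction.

    p_tumour: iterable of per-read P(tumour | read) values.
    Assumption (hypothetical): the classifier was trained on balanced
    tumour/normal reads, so each read contributes
    theta * p + (1 - theta) * (1 - p) to the mixture likelihood.
    """
    eps = 1e-9
    # Clip probabilities so the logarithm below is always defined.
    probs = [min(max(p, eps), 1.0 - eps) for p in p_tumour]

    best_theta, best_ll = 0.0, -math.inf
    for k in range(grid_size):
        theta = k / (grid_size - 1)
        ll = sum(math.log(theta * p + (1.0 - theta) * (1.0 - p)) for p in probs)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta
```

Under this simplified model, if every plasma read gets a tumour probability below 0.5, the likelihood decreases monotonically in the fraction and the estimate is exactly 0. Inspecting the distribution of per-read probabilities is therefore a useful first check when all samples come out as 0.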

@LiJingqi7
Author

LiJingqi7 commented Jun 3, 2025

Dear @hanyangii,
A total of 23 paired normal and liver cancer tissue samples were used to fine-tune the model. train_seq.csv contains 800,000 reads and test_seq.csv contains 240,000 reads. Details of the trained model are provided in the files listed below. Approximately 3265 reads from plasma samples (~3x) were used as input for prediction. Could you please help me check what might be causing the inaccurate tumor fraction prediction? The results are shown in the attached merged_deconvolution.csv file.
fine-tuned model eval.csv
fine-tuned model train.csv
train_param.txt

merged_deconvolution.csv
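
For reporting the read counts requested above, a small helper like the following can tally reads per class in a file such as train_seq.csv. This is a sketch under assumptions: the label column name and the delimiter are hypothetical parameters and have to be matched to the actual file header.

```python
import csv
from collections import Counter


def count_reads_by_label(lines, label_column, delimiter="\t"):
    """Count rows (reads) per value of `label_column`.

    lines: an open text file or any iterable of lines with a header row.
    label_column and delimiter are assumptions; check the real header.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    return Counter(row[label_column] for row in reader)


# Example usage (file name from the thread, column name hypothetical):
# with open("train_seq.csv", newline="") as fh:
#     print(count_reads_by_label(fh, label_column="ctype"))
```

A strongly unbalanced count between tumour and normal reads, or a much smaller number of reads in the plasma sample than in training, would be worth mentioning alongside the accuracy figures.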
