Issue with Tumor Fraction Prediction Using Pre-trained MethylBERT #18

LiJingqi7 · 2025-03-27T03:45:58Z

As described in the data preparation tutorial, fine-tuning the MethylBERT model with pure tumor and normal samples is optional, so I used the pre-trained model from https://huggingface.co/hanyangii/methylbert_hg19_12l to predict tumor fraction in plasma samples directly. It failed to detect any tumor signal: the estimated tumor fraction was 0 in every sample. Did I choose the wrong pre-trained model, or do I have to fine-tune it with cancer and normal tissue samples before it can estimate the tumor fraction in plasma samples?

hanyangii · 2025-03-27T09:22:34Z

Dear @LiJingqi7

Thank you for your interest in MethylBERT.

Although the tutorial describes it as optional, in your case you need pure tumour and normal samples for fine-tuning. Reading our paper should help you understand the pipeline; the Method section describes it in more detail.

Please let me know if the model still does not work for you after fine-tuning.

LiJingqi7 · 2025-04-02T03:14:24Z

Thank you for getting back to me. I have followed the fine-tuning process using pure tumor and normal samples as suggested. Specifically, I fine-tuned the model using liver cancer tissue samples and their matched normal tissue samples and then used it to predict the tumor fraction in plasma samples. However, after fine-tuning, the model still fails to detect tumor signals, with the predicted tumor fraction remaining at 0. Could you provide any insights into what might be causing this issue?

hanyangii · 2025-04-17T06:20:32Z

Hello @LiJingqi7

Sorry for my late reply. This sounds weird to me. Can you share more information about your fine-tuned model?

- training and validation accuracy
- the approximate number of reads in the training data and in a plasma sample
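For the read counts requested above, a minimal sketch follows. It assumes each file stores one read per non-empty line with a single header line; that layout is an assumption, not something confirmed in this thread, and `count_reads` is a hypothetical helper, not part of MethylBERT.

```python
def count_reads(path, has_header=True):
    """Count reads in a text file, assuming one read per non-empty line.

    The one-read-per-line layout and the single header line are assumptions;
    set has_header=False if the file has no header row.
    """
    with open(path, encoding="utf-8") as handle:
        n_lines = sum(1 for line in handle if line.strip())
    if has_header:
        return max(n_lines - 1, 0)
    return n_lines
```

Calling `count_reads("train_seq.csv")` and the same function on a plasma input file would give the two numbers to report.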

You can also try the estimation without the adjustment option and see whether the result looks better. Depending on the quality of the selected DMRs, the adjustment option can hinder an accurate estimation.
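To see how an estimate of exactly 0 can arise, here is a minimal sketch of an unadjusted tumor fraction estimate from read-level tumor probabilities. It is a generic two-component mixture likelihood maximised by grid search, not MethylBERT's actual implementation; the function name, the `train_prior` parameter (the assumed share of tumor reads in the fine-tuning data), and the grid resolution are all illustrative assumptions.

```python
import math


def estimate_tumor_fraction(p_tumor, train_prior=0.5, grid_size=1001):
    """Grid-search maximum-likelihood estimate of the tumor fraction.

    p_tumor: per-read probabilities that a read is tumor-derived, as a
        read-level classifier might output them (assumed input format).
    train_prior: assumed fraction of tumor reads seen during fine-tuning,
        used to turn posteriors into relative likelihoods.
    """
    if not p_tumor:
        raise ValueError("p_tumor must contain at least one read")
    eps = 1e-6
    like_tumor, like_normal = [], []
    for p in p_tumor:
        p = min(max(p, eps), 1.0 - eps)  # keep the logarithm finite
        like_tumor.append(p / train_prior)
        like_normal.append((1.0 - p) / (1.0 - train_prior))

    def log_likelihood(rho):
        # Each read is drawn from tumor with weight rho, from normal otherwise.
        return sum(
            math.log(rho * lt + (1.0 - rho) * ln)
            for lt, ln in zip(like_tumor, like_normal)
        )

    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=log_likelihood)
```

Under these assumptions, if nearly every plasma read receives a low tumor probability, the likelihood is maximised at a fraction of 0, so the read-level accuracy of the fine-tuned model and the number of reads available both matter.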

LiJingqi7 · 2025-06-03T04:04:42Z

Dear @hanyangii,
A total of 23 paired normal and liver cancer tissue samples were used to fine-tune the model. train_seq.csv contains 800,000 reads, and test_seq.csv contains 240,000 reads. Details of the trained model are provided in the files listed below. Approximately 3265 reads from plasma samples (~3x) were used as input for prediction. Could you please help me check what might be causing the inaccurate tumor fraction prediction? The results are shown in the attached merged_deconvolution.csv file.
fine-tuned model eval.csv
fine-tuned model train.csv
train_param.txt

merged_deconvolution.csv

LiJingqi7 closed this as completed Jun 3, 2025
