MiLiC-Eval is an NLP evaluation suite for Minority Languages in China, covering Tibetan (bo), Uyghur (ug), Kazakh (kk, in the Kazakh Arabic script), and Mongolian (mn, in the traditional Mongolian script).
MiLiC-Eval currently covers 9 tasks across the 4 languages, with 24K instances in total. Per-task statistics are shown in the following table.
Task | Size | Metric | Languages |
---|---|---|---|
Vocabulary Understanding | 1,000/lang | Accuracy | bo, ug, kk, mn |
Topic Classification (Sentence) | 492/lang | Accuracy | bo, ug, kk, mn, zh, en |
Topic Classification (Passage) | 600/lang | Accuracy | bo, ug, kk, mn |
Reading Comprehension | 250/lang | Accuracy | bo, ug, kk, mn, zh, en |
Response Selection | 507/lang | Accuracy | bo, ug, kk, mn, zh, en |
Title Generation | 1,000/lang | ROUGE-L | bo, ug, kk, mn |
Machine Translation (Article) | 1,012/lang | chrF++ | bo, ug, kk, mn, zh, en |
Machine Translation (Dialogue) | 773/lang | chrF++ | bo, ug, kk, mn, zh, en |
Math Reasoning | 250/lang | Accuracy | bo, ug, kk, mn, zh, en |
For each task, we provide a data split into training, development, and test sets.
The training sets are small and intended for in-context learning. For each task, we provide three training sets sampled with different seeds, to reduce the impact of randomness during prompting (see the usage sketch after the table below).
The development sets are used for hyperparameter tuning; the test sets are used for evaluation.
For each language, the data split is shown in the following table.
Task | Train | Dev | Test |
---|---|---|---|
Vocabulary Understanding | 20 * 3 | 40 | 900 |
Topic Classification (Sentence) | 10 * 3 | 30 | 432 |
Topic Classification (Passage) | 16 * 3 | 48 | 504 |
Reading Comprehension | 10 * 3 | 20 | 200 |
Response Selection | 20 * 3 | 40 | 407 |
Title Generation | 20 * 3 | 40 | 900 |
Machine Translation (Article) | 20 * 3 | 40 | 912 |
Machine Translation (Dialogue) | 20 * 3 | 40 | 673 |
Math Reasoning | 10 * 3 | 20 | 200 |
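To illustrate how the three seed-sampled training sets can be used, here is a minimal Python sketch of few-shot prompting with scores averaged over seeds. The file layout (`data/<task>/<lang>/<split>.json`) and the `input`/`output` field names are assumptions for illustration, not necessarily the actual data schema.

```python
# Hypothetical sketch of in-context learning with the three training sets.
# The directory layout and field names below are assumptions, not the
# repository's actual schema.
import json
from pathlib import Path

DATA_DIR = Path("data")

def load_split(task: str, lang: str, split: str) -> list[dict]:
    """Load one split of one task/language as a list of JSON instances."""
    path = DATA_DIR / task / lang / f"{split}.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)

def build_prompt(demos: list[dict], query: str) -> str:
    """Concatenate few-shot demonstrations with the test query."""
    shots = "\n\n".join(f"Input: {d['input']}\nOutput: {d['output']}" for d in demos)
    return f"{shots}\n\nInput: {query}\nOutput:"

# Run the evaluation once per seed-sampled training set (train_0..train_2)
# and average the resulting scores to reduce prompt-sampling variance.
for seed in range(3):
    demos = load_split("topic_classification_sentence", "bo", f"train_{seed}")
    test = load_split("topic_classification_sentence", "bo", "test")
    prompt = build_prompt(demos, test[0]["input"])
    # ... run the model on `prompt`, collect predictions, score ...
```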
The dataset can be downloaded from Hugging Face. Put the downloaded dataset in the `data` directory.
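For example, the whole dataset can be fetched with `huggingface_hub`; the repo id below is an assumption, so use the id listed on the project page:

```python
# Fetch the dataset snapshot into the local `data/` directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="pkupie/MiLiC-Eval",  # hypothetical repo id; check the project page
    repo_type="dataset",
    local_dir="data",
)
```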
- Install the packages required for inference:

  ```bash
  pip install -r requirements.txt
  ```

- Install the package required for multilingual ROUGE scoring. See https://github.com/csebuetnlp/xl-sum/tree/master/multilingual_rouge_scoring for instructions.

- Run the scripts for inference and metric calculation (with Qwen-2.5 as an example):

  ```bash
  cd scripts
  bash run_eval.sh
  bash calculate_metrics.sh
  ```
The evaluation results will be saved in the `output` directory.
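If you want to sanity-check the machine translation scores outside the provided scripts, chrF++ can be computed directly with `sacrebleu` (setting `word_order=2` turns chrF into chrF++). This is an independent sketch, not necessarily what `calculate_metrics.sh` does internally:

```python
# Compute corpus-level chrF++ with sacrebleu.
import sacrebleu

hypotheses = ["a model translation"]
references = [["a reference translation"]]  # one list per reference stream

# word_order=2 adds word bigrams on top of character n-grams, i.e. chrF++.
score = sacrebleu.corpus_chrf(hypotheses, references, word_order=2)
print(score.score)
```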
Current LLMs have limited performance in minority languages due to the lack of pretraining data. We therefore provide a pretraining corpus, MC^2, for the four languages in MiLiC-Eval.
The corpus can be downloaded from Hugging Face. You can read the details of the corpus in our paper MC^2: Towards Transparent and Culturally-Aware NLP for Minority Languages in China (ACL 2024).
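As a sketch, the corpus can be streamed with the `datasets` library; the repo id and config name below are assumptions, so check the corpus page on Hugging Face:

```python
# Stream the Tibetan portion of MC^2 without downloading it in full.
from datasets import load_dataset

corpus = load_dataset(
    "pkupie/mc2_corpus",  # hypothetical repo id; check the corpus page
    "bo",                 # hypothetical config name for Tibetan
    split="train",
    streaming=True,
)
for doc in corpus:
    print(doc)  # inspect one document's fields
    break
```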
If you use MiLiC-Eval in your research, please cite our paper:
```bibtex
@article{zhang2025milic,
  title={MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages},
  author={Zhang, Chen and Tao, Mingxu and Liao, Zhiyuan and Feng, Yansong},
  journal={arXiv preprint arXiv:2503.01150},
  year={2025},
  url={https://arxiv.org/abs/2503.01150},
}
```