# ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
- 📜 Introduction
- ⚙️ Environment setup
- 🔨 Data construction
- 🏋🏻‍♂️ ReaRAG Training
- 🤖️ Inference
- 📝 Citation
## ⚙️ Environment setup

```bash
conda create --name rearag python=3.10 -y && conda activate rearag
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install vllm==0.6.5
pip install datasets flask langid uvicorn termcolor jieba fuzzywuzzy rouge
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
```
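Optionally, you can verify the installation with a quick sanity check. This is a minimal sketch: it only confirms that `torch` sees your GPU and that the `faiss-gpu` package imports correctly.

```python
# check_env.py -- quick sanity check for the setup above
import torch
import faiss  # provided by the faiss-gpu conda package

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("faiss:", faiss.__version__, "| GPUs visible to faiss:", faiss.get_num_gpus())
```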
### RAG Engine Deployment

Please note that the RAG engine provided here differs from the implementation described in the original paper. To facilitate public usage, we offer a practical and simplified version in this repository.

To set up the RAG engine, first download the following:
- E5-base-v2 Retriever. 🤗 Link
- 2018 Wikipedia Corpus. 🤗 Link
- Indexed 2018 Wikipedia Corpus: `FlashRAG_Dataset/retrieval_corpus/wiki18_100w_e5_index.zip`
- Modify the config in `ReaRAG/deploy/deploy_config.sh` and `ReaRAG/deploy/retriever_config.yaml`.
- Run the deployment script and make sure you see the phrase `'xx running on http://{host}:{port}'` to confirm deployment:
```bash
# From within ReaRAG/deploy/
bash deploy_rag_engine.sh
```
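Once the engine is up, you can probe it with a small request. The snippet below is a minimal sketch: the `/search` route and the `{"query", "top_k"}` payload are placeholders, not the repo's documented API, so check the Flask app under `ReaRAG/deploy/` for the actual route and schema, and substitute the host/port you configured.

```python
# query_rag_engine.py -- smoke test for the deployed RAG engine
# NOTE: the endpoint path and payload below are assumptions; adjust them
# to match the routes defined in the Flask app under ReaRAG/deploy/.
import requests  # pip install requests

HOST, PORT = "localhost", 5000  # use the values from deploy_config.sh
payload = {"query": "What is the capital of France?", "top_k": 5}  # hypothetical schema

resp = requests.post(f"http://{HOST}:{PORT}/search", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # expected: a list of retrieved passages
```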
## 🔨 Data construction

Before starting, make sure you have deployed the `rag_engine` (see RAG Engine Deployment). Then, follow the steps below to deploy an LLM/LRM (e.g., `QwQ-32b-preview`) for data construction:
- Modify the environment variables in `ReaRAG/deploy/deploy_async.sh`.
- Run the deployment script and make sure you see the phrase `'xx running on http://{host}:{port}'` to confirm deployment:

```bash
# From within ReaRAG/deploy/
bash deploy_async.sh
```
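To confirm the model server answers requests, a probe along the following lines can help. This assumes `deploy_async.sh` exposes vLLM's OpenAI-compatible API (an assumption on our part; if the script wraps the model in a custom route instead, adapt the URL and payload accordingly).

```python
# query_llm.py -- smoke test for the deployed LLM/LRM
# ASSUMPTION: deploy_async.sh serves vLLM's OpenAI-compatible API; if it
# uses a custom Flask route instead, adapt the URL and payload.
import requests  # pip install requests

HOST, PORT = "localhost", 8000  # use the values from deploy_async.sh
payload = {
    "model": "QwQ-32b-preview",  # must match the served model name
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 64,
}

resp = requests.post(f"http://{HOST}:{PORT}/v1/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```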
We construct data from HotpotQA, MuSiQue, and NQ. Therefore, make sure you have downloaded them and processed them into the following structure:

```json
{
  "question": "What is the capital of ...",
  "answer": "The capital of xxx is ..."
}
```
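For reference, here is a minimal sketch of how raw HotpotQA records could be flattened into that structure. The file names are illustrative, not part of the repo; MuSiQue and NQ need analogous field mappings.

```python
# to_qa_format.py -- flatten HotpotQA records into {"question", "answer"} pairs
# The input/output file names below are illustrative.
import json

with open("hotpot_dev_distractor_v1.json") as f:  # raw HotpotQA release file
    raw = json.load(f)

# HotpotQA records carry "question" and "answer" fields directly.
records = [{"question": ex["question"], "answer": ex["answer"]} for ex in raw]

with open("hotpotqa_processed.json", "w") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```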
Next, modify the config in `ReaRAG/src_data/data_config.yaml`. Then, execute the script below:

```bash
# From within ReaRAG/
bash data_construct.sh
```
The constructed data will be saved under `ReaRAG/src_data/data`, with a name such as `conv_qwq.json`, where each entry contains a list of conversation messages, structured as below:

```json
{
  "messages": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "reasoning": "..."},
    {"role": "observation", "content": "..."},
    ...
  ]
}
```

During SFT, the loss is computed only on messages that contain the `reasoning` key, rather than the `content` key.
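As an illustration, a tokenizer-agnostic sketch of that masking rule might look like this. The `loss_weight` helper is hypothetical; the actual masking lives in the training framework.

```python
# loss_mask_sketch.py -- sketch of the SFT masking rule described above:
# only assistant turns carrying a "reasoning" key contribute to the loss.
def loss_weight(message: dict) -> float:
    """Return 1.0 for messages that should be trained on, else 0.0."""
    return 1.0 if "reasoning" in message else 0.0

conversation = [
    {"role": "user", "content": "..."},         # masked (weight 0.0)
    {"role": "assistant", "reasoning": "..."},  # trained on (weight 1.0)
    {"role": "observation", "content": "..."},  # masked (weight 0.0)
]
print([loss_weight(m) for m in conversation])   # -> [0.0, 1.0, 0.0]
```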
## 🏋🏻‍♂️ ReaRAG Training

Training data can be found on 🤗 Hugging Face. You can mix it with general SFT data such as ShareGPT. We adopt Megatron-LM for model training. For a more lightweight implementation, you may adopt the code and environment from LongAlign.
## 🤖️ Inference

Before starting, make sure you have deployed the `rag_engine` (see RAG Engine Deployment). Then, follow the steps below to deploy ReaRAG:
- Modify the config in `ReaRAG/deploy/deploy_config.sh`.
- Run the deployment script and make sure you see the phrase `'xx running on http://{host}:{port}'` to confirm deployment:

```bash
# From within ReaRAG/deploy/
bash deploy.sh
```
Next, modify the config in `ReaRAG/infer.sh`. Then, execute the script below:

```bash
# From within ReaRAG/
bash infer.sh
```
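If `infer.sh` fails, it can help to send a single question to the deployed ReaRAG server by hand. The sketch below is assumption-heavy: the `/reason` route and payload shape are placeholders we made up for illustration, so consult the Flask app under `ReaRAG/deploy/` for the real interface.

```python
# probe_rearag.py -- manually send one question to the deployed ReaRAG server
# NOTE: the route ("/reason") and payload schema below are assumptions;
# check the Flask app in ReaRAG/deploy/ for the actual API.
import requests  # pip install requests

HOST, PORT = "localhost", 8001  # use the values from deploy_config.sh
payload = {"question": "Who wrote the novel adapted into the film Blade Runner?"}

resp = requests.post(f"http://{HOST}:{PORT}/reason", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())  # expected: the reasoning chain plus a final answer
```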
## 📝 Citation

If you find our work useful, please consider citing ReaRAG:
```bibtex
@article{lee2025rearag,
  title={ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation},
  author={Lee, Zhicheng and Cao, Shulin and Liu, Jinxin and Zhang, Jiajie and Liu, Weichuan and Che, Xiaoyin and Hou, Lei and Li, Juanzi},
  journal={arXiv preprint arXiv:2503.21729},
  year={2025}
}
```