Skip to content

LAYRA is a ready-to-use visual RAG system with a complete web UI built with Next.js and FastAPI, preserving document layout, tables, paragraphs, and graphical elements without any structural fragmentation.

License

Notifications You must be signed in to change notification settings

liweiphys/layra

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

15 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

English | ็ฎ€ไฝ“ไธญๆ–‡

๐ŸŒŒ LAYRA: A Visual-First Retrieval Agent Beyond OCR

Forget tokenization. Forget layout loss.
With pure visual embeddings, LAYRA understands documents like a human โ€” page by page, structure and all.

LAYRA is a next-generation Retrieval-Augmented Generation (RAG) system powered by pure visual embeddings. It treats documents not as sequences of tokens but as visually structured artifacts โ€” preserving layout, semantics, and graphical elements like tables, figures, and charts.

Built for both research exploration and enterprise deployment, LAYRA features:

  • ๐Ÿง‘โ€๐Ÿ’ป Modern frontend stack: TypeScript-based Next.js 15 and TailwindCSS 4.0 โ€” delivering a snappy, responsive, and developer-friendly interface.
  • โš™๏ธ Async-first backend architecture: Built on FastAPI, seamlessly integrated with fully asynchronous components including Redis, MySQL, MongoDB, and MinIO, optimized for high-performance data flow and scalability.
  • ๐Ÿง  Visual-first multimodal foundation: Uses Qwen2.5-VL series as the current default large language model, with plans to support GPT-4o, Claude, Gemini, and other multimodal models in future releases.
  • ๐ŸŽฏ Image-level embedding: Document parsing and visual embedding is powered by the Colpali project โ€” using the colqwen2.5 to convert pages into rich semantic vectors stored in Milvus.

LAYRA aims to be an enterprise-ready, plug-and-play visual RAG platform, bridging unstructured document understanding with multimodal AI.

๐Ÿšง Currently in active development:
The first test version is now available for trial, with support for PDF documents only.
๐Ÿ“š Future releases will add support for more document types (e.g., Word, PPT, Excel, images, Markdown).
๐Ÿ“ˆ For details, see the Roadmap.


๐Ÿ“š Table of Contents


๐Ÿš€ Latest Updates

  • (2025.4.6) First Trial Version Now Available:
    The first testable version of LAYRA has been released! Users can now upload PDF documents, ask questions, and receive layout-aware answers. Weโ€™re excited to see how this feature can help with real-world document understanding.

  • Current Features:

    • PDF batch upload and parsing functionality
    • Visual-first retrieval-augmented generation (RAG) for querying document content
    • Backend fully optimized for scalable data flow with FastAPI, Milvus, Redis, MongoDB, and MinIO
  • Upcoming Features:

    • Expanded document format support (Word, PPT, Excel, and images)
    • Support for additional large models, including GPT-4o and Claude
    • Integration of intelligent agent for multi-hop reasoning and advanced document analysis

Stay tuned for future updates and feature releases!

โ“ Why LAYRA?

Most RAG systems rely on OCR or text-based parsing to process documents. But these approaches:

  • โŒ Lose layout fidelity (columns, tables, hierarchy collapse)
  • โŒ Struggle with non-text visuals (charts, diagrams, figures)
  • โŒ Break semantic continuity due to poor OCR segmentation

LAYRA changes this.

๐Ÿ” It sees each page of the document as a whole โ€” just like a human reader.

By using pure visual embeddings, LAYRA preserves:

  • โœ… Layout structure (headers, lists, sections)
  • โœ… Tabular integrity (rows, columns, merged cells)
  • โœ… Embedded visuals (plots, graphs, stamps, handwriting)
  • โœ… Multi-modal consistency between layout and content

๐Ÿงช First Trial Version Available

โœ… The first testable version of LAYRA is now available!
Upload your own PDF documents, ask questions, and receive layout-aware answers.

You can now explore the first version, which supports PDF uploads and returns questions about your documents with visual layout context.

Screenshots:

  1. Homepage โ€” Get start Demo Screenshot

  2. Knowledge Base โ€” Upload your document and view files Demo Screenshot

  3. Interactive Dialogue โ€” Ask and get layout-preserving answers Demo Screenshot Demo Screenshot


๐Ÿง  System Architecture

LAYRAโ€™s pipeline is designed for async-first, visual-native, and scalable document retrieval and generation.

๐Ÿ” Query Flow

The query goes through embedding โ†’ vector retrieval โ†’ anser generation:

Query Architecture

๐Ÿ“ค Upload & Indexing Flow

PDFs are parsed into images and embedded visually via ColQwen2.5, with metadata and files stored in appropriate databases:

Upload Architecture


โœจ Key Features

Feature Description
๐Ÿง  Visual-First RAG Embeds raw document images without relying on OCR
๐Ÿงพ Layout-Preserving QA Understands tables, headers, and multi-column layouts
๐Ÿ“Š Visual Content Support Parses and reasons over plots, diagrams, and charts
โš™๏ธ Async Document Parsing Background document processing via Kafka
๐Ÿ” Fast Vector Search Powered by Milvus for scalable dense retrieval
๐Ÿค– Flexible LLM Backend Supports Qwen2.5-VL series, and extensible to GPT-4o, Claude 3, etc.
๐ŸŒ Modern Web UI Built with Next.js + Typescript + TailwindCSS + Zustand

๐Ÿงฐ Tech Stack

Frontend:

  • Next.js, TypeScript, TailwindCSS, Zustand

Backend & Infrastructure:

  • FastAPI, Kafka, Redis, MySQL, MongoDB, MinIO, Milvus

Models & RAG:

  • Embedding: colqwen2.5-v0.2
  • LLM Serving: VLM (Qwen2.5-VL series)

๐Ÿš€ Deployment

โ–ถ๏ธ Local Development

# Clone the repo
git clone https://github.com/liweiphys/layra.git
cd layra

# Set up database and FastAPI environment configuration
vim .env
vim web/.env.local 
vim gunicorn_config.py
# Or use default settings

# Launch Milvus, Redis, MongoDB, Kafka, and MinIO via Docker Compose.
cd docker
sudo docker-compose -f milvus-standalone-docker-compose.yml -f docker-compose.yml up -d

# Back to project root
cd ../

# Install Python 3.10.6 and create virtual environment (optional)
# python -m venv venv && source venv/bin/activate
# Or install with conda
conda create --name layra python=3.10
conda activate layra

# Install system dependencies
# For Ubuntu/Debian:
sudo apt-get update && sudo apt-get install -y poppler-utils
# For Fedora/CentOS:
# sudo dnf install -y poppler-utils

# Install dependencies
pip install -r requirements.txt

# Download ColQwen2.5 model weights
# โš ๏ธ If Git LFS not installed, run:
git lfs install

# Download base model weights
git clone https://huggingface.co/vidore/colqwen2.5-base
# For users in China:
# git clone https://hf-mirror.com/vidore/colqwen2.5-base

# Download LoRA fine-tuned weights
git clone https://huggingface.co/vidore/colqwen2.5-v0.2
# For users in China:
# git clone https://hf-mirror.com/vidore/colqwen2.5-v0.2

# Modify the `base_model_name_or_path` field in `colqwen2.5-v0.2/adapter_config.json`
base_model_name_or_path="/absolute/path/to/colqwen2.5-base"
# Set it to local path of colqwen2.5-base

# Set the following in your .env file
COLBERT_MODEL_PATH="/absolute/path/to/colqwen2.5-v0.2"

# Initialize MySQL database
alembic init migrations
cp env.py migrations
alembic revision --autogenerate -m "Init Mysql"
alembic upgrade head

# Start backend with Gunicorn
gunicorn -c gunicorn_config.py app.main:app
# http://localhost:8000

# Start ColBERT embedding model server
python model_server.py

# Frontend development
cd web
npm install
npm run dev  
# http://localhost:3000

# Or build frontend (recommended)
# cd web
# npm install
# npm run build
# npm start  # http://localhost:3000

๐Ÿงช Note: Milvus, Redis, MongoDB, Kafka, and MinIO are expected to run locally or via Docker.

๐ŸŽ‰ Enjoy!

Now that everything is up and running, enjoy exploring and building with Layra! ๐Ÿš€

โ–ถ๏ธ Future Deployment Options

In the future, we will support multiple deployment methods including Docker, Kubernetes (K8s), and other containerized environments. More details will be provided when these deployment options are available.


๐Ÿ“š Use Cases

  • ๐Ÿงพ Intelligent document QA: Contracts, invoices, scanned reports
  • ๐Ÿ› Policy/legal documents: Structure-rich PDF understanding
  • ๐Ÿญ Industrial manuals: OCR-unfriendly layouts, tables, flowcharts
  • ๐Ÿ“ˆ Visual analytics: Trend analysis from plots and charts

๐Ÿ“ฆ Roadmap

  • Knowledge Base PDF batch upload and parsing functionality
  • RAG-based dialogue system for querying and answering
  • Support openai-compatible API interface๏ผˆollamaใ€sglangใ€vllm๏ผ‰
  • Code architecture and modular optimization for scalability
  • Support for additional large models
  • Expanded document format support (e.g., Word, PPT, Excel)
  • Integration of intelligent Agent for multi-hop reasoning
  • Integration with knowledge graph
  • Deployment with Docker Compose
  • Public Knowledge Base API access

๐Ÿค Contributing

Contributions are welcome! Feel free to open an issue or pull request if youโ€™d like to contribute.
We are in the process of creating a CONTRIBUTING.md file, which will provide guidelines for code contributions, issue reporting, and best practices. Stay tuned!


๐Ÿ“ซ Contact

liweiphys
๐Ÿ“ง liweixmu@foxmail.com
๐Ÿ™ github.com/liweiphys/layra
๐Ÿ“บ bilibili: Biggestbiaoge
๐Ÿ” ๅพฎไฟกๅ…ฌไผ—ๅท๏ผšLAYRA้กน็›ฎ
๐Ÿ’ผ Available for hire โ€” open to new opportunities!


๐ŸŒŸ Star History

Star History Chart


๐Ÿ“„ License

This project is licensed under the Apache License 2.0. See the LICENSE file for more details.


LAYRA sees what OCR cannot. It reads documents like we do โ€” visually, structurally, holistically.

About

LAYRA is a ready-to-use visual RAG system with a complete web UI built with Next.js and FastAPI, preserving document layout, tables, paragraphs, and graphical elements without any structural fragmentation.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published