A text-to-speech document reader for EPUB, PDF, DOCX, MD, and TXT files. Read documents in real time with high-quality TTS, or export audiobooks. Use your own Kokoro TTS API or OpenAI API endpoint.

richardr1126/OpenReader-WebUI

OpenReader WebUI 📄🔊

OpenReader WebUI is a document reader with text-to-speech (TTS) capabilities, offering a read-along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It can use any OpenAI-compatible TTS endpoint, including Kokoro-FastAPI and Orpheus-FastAPI.

A highly available demo is hosted at https://openreader.richardr.dev/

  • 🎯 TTS API Integration:
    • Compatible with the OpenAI text-to-speech API and GPT-4o Mini TTS, Kokoro-FastAPI TTS, Orpheus-FastAPI, or any other compatible service
    • Support for TTS models (tts-1, tts-1-hd, gpt-4o-mini-tts, kokoro, and custom)
  • 💾 Local-First Architecture: Uses IndexedDB browser storage for documents
  • 🛜 Optional Server-side Documents: Manually upload documents to the Next.js backend for all users to download
  • 📖 Read-Along Experience: Follow along with highlighted text as the TTS narrates
  • 📄 Document Formats: EPUB, PDF, TXT, MD, DOCX (with LibreOffice installed)
  • 🎧 Audiobook Creation: Create and export audiobooks from PDF and EPUB files (in m4b format, using ffmpeg and AAC TTS output)
  • 📲 Mobile Support: Works on mobile devices and can be installed as a PWA
  • 🎨 Customizable Experience:
    • 🔑 Set the TTS API base URL (and optional API key)
    • 🎯 Set model-specific instructions for GPT-4o Mini TTS
    • 🏎️ Adjustable playback speed
    • Customize PDF text extraction margins
    • 🗣️ Multiple voice options (checks the /v1/audio/voices endpoint)
    • 🎨 Multiple app theme options
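
As a rough illustration of what "OpenAI-compatible" means in the list above, the sketch below builds (without sending) the two kinds of HTTP request involved: a POST to the speech endpoint and a GET to the voices endpoint. The /audio/speech path and JSON field names follow the OpenAI speech API, and /audio/voices is the endpoint named in the list. The helper names and the default base URL are assumptions made for illustration; this is not OpenReader's own code.

```python
import json
import urllib.request

# Assumed default base URL (the host and port used in this README's API_BASE
# example). Any OpenAI-compatible base URL ending in /v1 should work the same way.
DEFAULT_API_BASE = "http://localhost:8880/v1"


def build_speech_request(text, voice, model="kokoro", speed=1.0,
                         api_base=DEFAULT_API_BASE, api_key=None):
    """Build (but do not send) a POST request to <api_base>/audio/speech."""
    payload = {
        "model": model,            # e.g. tts-1, tts-1-hd, gpt-4o-mini-tts, kokoro
        "input": text,             # the text to narrate
        "voice": voice,            # a voice name the server recognises
        "speed": speed,            # playback speed multiplier
        "response_format": "mp3",  # audio container requested from the server
    }
    headers = {"Content-Type": "application/json"}
    if api_key:  # the API key is optional for self-hosted servers
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=f"{api_base.rstrip('/')}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def build_voices_request(api_base=DEFAULT_API_BASE, api_key=None):
    """Build (but do not send) a GET request to <api_base>/audio/voices."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return urllib.request.Request(
        url=f"{api_base.rstrip('/')}/audio/voices",
        headers=headers,
        method="GET",
    )
```

Passing either request to urllib.request.urlopen would send it; for the speech request, the response body is the audio data. The voice name must be one the target server actually offers, which is what the voices endpoint is for.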

🛠️ Work in progress

  • Audiobook creation and download (m4b format)
  • Support for GPT-4o Mini TTS with instructions
  • Easy Orpheus-FastAPI support
  • More document formats: .txt, .md added
  • Native .docx support (currently requires libreoffice)
  • Support non-OpenAI TTS APIs: ElevenLabs, etc.
  • Accessibility Improvements

🐳 Docker Quick Start

Prerequisites

  • Recent version of Docker installed on your machine
  • A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, or OpenAI API)

docker run --name openreader-webui \
  -p 3003:3003 \
  -v openreader_docstore:/app/docstore \
  ghcr.io/richardr1126/openreader-webui:latest

Optionally, set API_BASE and/or API_KEY to provide a default TTS endpoint for all devices:

docker run --name openreader-webui \
  -e API_BASE=http://host.docker.internal:8880/v1 \
  -p 3003:3003 \
  -v openreader_docstore:/app/docstore \
  ghcr.io/richardr1126/openreader-webui:latest

Requests to the TTS API are made by the Next.js server, not by the client (browser). The base URL must therefore be reachable from the Next.js server. If the server runs inside a Docker container, localhost refers to the container itself, so you may need to use host.docker.internal instead to reach a TTS service running on the host machine.
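
To make the localhost caveat concrete, here is a minimal sketch of a helper that rewrites a loopback base URL into one a container can use. The function name and the set of loopback hosts are assumptions chosen for illustration; OpenReader does not necessarily perform any such rewriting itself.

```python
from urllib.parse import urlsplit

# Hostnames that point back at the container itself when used inside Docker.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def api_base_for_container(api_base, docker_host="host.docker.internal"):
    """Return api_base with a loopback host swapped for docker_host.

    URLs that already point somewhere else are returned unchanged.
    Any user:password part of the URL is dropped (not needed for this sketch).
    """
    parts = urlsplit(api_base)
    if parts.hostname not in LOOPBACK_HOSTS:
        return api_base
    # Keep the original port, if one was given.
    netloc = docker_host if parts.port is None else f"{docker_host}:{parts.port}"
    return parts._replace(netloc=netloc).geturl()
```

For example, api_base_for_container("http://localhost:8880/v1") yields the value used for API_BASE in the docker run command above. On Linux hosts, host.docker.internal may not resolve by default and may need to be mapped explicitly, for example with Docker's --add-host host.docker.internal:host-gateway option.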

Visit http://localhost:3003 to open the app and configure your settings.

Note: The openreader_docstore volume stores server-side documents. You can mount a local directory instead, or remove the volume if you don't need server-side documents.

⬆️ Update Docker Image

docker stop openreader-webui && \
docker rm openreader-webui && \
docker pull ghcr.io/richardr1126/openreader-webui:latest

Adding to a Docker Compose file (e.g. with open-webui or Kokoro-FastAPI)

Note: This is an example of how to add OpenReader WebUI to a docker-compose file. You can add it to your existing docker-compose file or create a new one in this directory. Then run docker-compose up --build to start the services.

Create or add to a docker-compose.yml:

volumes:
  docstore:

services:
  openreader-webui:
    container_name: openreader-webui
    image: ghcr.io/richardr1126/openreader-webui:latest
    environment:
      - API_BASE=http://host.docker.internal:8880/v1
    ports:
      - "3003:3003"
    volumes:
      - docstore:/app/docstore
    restart: unless-stopped

Dev Installation

Prerequisites

  • Node.js & npm (recommended: use nvm)

Optionally required for different features:

  • FFmpeg (required for audiobook m4b creation only)
    • On Linux: sudo apt install ffmpeg
    • On macOS: brew install ffmpeg
  • LibreOffice (required for DOCX files)
    • On Linux: sudo apt install libreoffice
    • On macOS: brew install libreoffice
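
Before starting the dev server, it can help to confirm that the optional tools are on your PATH. The sketch below is one way to do that. The executable names (ffmpeg, and soffice or libreoffice for LibreOffice) are assumptions about a typical install, and the script is not part of the repository.

```python
import shutil

# Candidate executable names per optional feature. These names are assumptions:
# LibreOffice is commonly exposed as "soffice" and/or "libreoffice".
OPTIONAL_TOOLS = {
    "audiobook (m4b) export": ["ffmpeg"],
    "DOCX support": ["soffice", "libreoffice"],
}


def find_optional_tools(which=shutil.which):
    """Map each feature to the first matching executable path, or None."""
    found = {}
    for feature, candidates in OPTIONAL_TOOLS.items():
        # Take the first candidate name that resolves to a path on PATH.
        found[feature] = next((p for p in map(which, candidates) if p), None)
    return found


if __name__ == "__main__":
    for feature, path in find_optional_tools().items():
        print(f"{feature}: {path or 'not found on PATH'}")
```

The which parameter exists only so the lookup can be swapped out when testing; in normal use the default shutil.which is enough.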

Steps

  1. Clone the repository:

    git clone https://github.com/richardr1126/OpenReader-WebUI.git
    cd OpenReader-WebUI
  2. Install dependencies:

    npm install
  3. Configure the environment:

    cp template.env .env
    # Edit .env with your configuration settings

    Note: The base URL for the TTS API must be reachable from the Next.js server, since that is where TTS requests are made.

  4. Start the development server:

    npm run dev

    or build and run the production server:

    npm run build
    npm start

    Visit http://localhost:3003 to open the app.

💡 Feature requests

For feature requests or ideas you have for the project, please use the Discussions tab.

🙋‍♂️ Support and issues

If you encounter issues, please open an issue on GitHub following the template (which is very light).

👥 Contributing

Contributions are welcome! Fork the repository and submit a pull request with your changes.

❀️ Acknowledgements

This project would not be possible without standing on the shoulders of these giants:

Docker Supported Architectures

  • linux/amd64 (x86_64)
  • linux/arm64 (Apple Silicon, Raspberry Pi, SBCs, etc.)

Stack

License

This project is licensed under the MIT License.
