Added OpenSearch2.19.1 as the vector_database support #7140

Open
wants to merge 8 commits into
base: main

@pyyuhao pyyuhao commented Apr 18, 2025

What problem does this PR solve?

This PR adds support for the latest OpenSearch release, 2.19.1, as a storage and search engine option for RAGFlow.

Main Benefit

  1. OpenSearch 2.19.1 is licensed under the [Apache v2.0 License], which is more permissive than Elasticsearch's license.
  2. For search, OpenSearch 2.19.1 supports full-text search, vector search, and hybrid search, all of which resemble their Elasticsearch counterparts in schema.
  3. For storage, OpenSearch 2.19.1 stores text and vectors with a schema quite similar to Elasticsearch's.
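
To make the storage point concrete, here is a minimal sketch of an OpenSearch index body with one text field and one vector field. The helper name `build_os_index_body`, the field names `content` and `embedding`, and the default dimension are hypothetical and are not taken from conf/os_mapping.json; `index.knn`, the `knn_vector` type, and the `hnsw` method are settings of the OpenSearch k-NN plugin.

```python
def build_os_index_body(dim: int = 1024) -> dict:
    """Return an illustrative OpenSearch index body (not the PR's os_mapping.json)."""
    return {
        # k-NN search must be enabled per index in OpenSearch.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Plain full-text field, same idea as in Elasticsearch.
                "content": {"type": "text"},
                # Vector field: OpenSearch uses `knn_vector` with `dimension`.
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": {
                        "name": "hnsw",
                        "engine": "lucene",
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
    }
```

For comparison, the Elasticsearch counterpart of the vector field is `dense_vector` with a `dims` parameter, which is one reason a separate mapping file is needed.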

Changes

  • Support the OpenSearch Python connector. I made many adaptations because the schema and API/methods differ between ES and OpenSearch in many ways (knn_search in particular has a significant gap): rag/utils/opensearch_coon.py
  • Support static config adaptations by changing: conf/service_conf.yaml, api/settings.py, rag/settings.py
  • Support the store and search schema differences between OpenSearch and ES: conf/os_mapping.json
  • Support the OpenSearch Python SDK: pyproject.toml
  • Support Docker config for OpenSearch 2.19.1: docker/.env, docker/docker-compose-base.yml, docker/service_conf.yaml.template
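
The k-NN gap mentioned in the first item can be illustrated with the request bodies alone. The sketch below is not the code from rag/utils/opensearch_coon.py; the helper names and the `num_candidates` default are assumptions made for illustration. It only shows that Elasticsearch 8 takes a top-level `knn` section, while OpenSearch expects a `knn` clause inside `query`, keyed by the field name.

```python
from typing import Any, Dict, List, Optional


def es_knn_body(field: str, vector: List[float], k: int,
                num_candidates: Optional[int] = None) -> Dict[str, Any]:
    """Elasticsearch 8 style: top-level `knn` section with `query_vector`."""
    return {
        "knn": {
            "field": field,
            "query_vector": vector,
            "k": k,
            # Assumed default for the sketch: search twice as many candidates as k.
            "num_candidates": num_candidates if num_candidates is not None else k * 2,
        },
        "size": k,
    }


def os_knn_body(field: str, vector: List[float], k: int) -> Dict[str, Any]:
    """OpenSearch style: `knn` query clause keyed by the vector field name."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
    }
```

With the opensearch-py client, a body like the second one would be passed to `client.search(index=..., body=...)`; a connector therefore has to translate the query shape, not just swap the client class.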

How to use

  • I did not change the default: ES remains the default doc/search engine. OpenSearch is used only when DOC_ENGINE=${DOC_ENGINE:-opensearch} is set in docker/.env.
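
As a rough sketch of how the engine selection could be resolved on the Python side (this is an assumption for illustration, not the actual code in api/settings.py; the function name `resolve_doc_engine` and the validation step are hypothetical):

```python
import os
from typing import Mapping, Optional

# Engine names listed in docker/.env after this PR.
SUPPORTED_DOC_ENGINES = ("elasticsearch", "infinity", "opensearch")


def resolve_doc_engine(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured doc engine, falling back to elasticsearch."""
    env = os.environ if environ is None else environ
    engine = env.get("DOC_ENGINE", "elasticsearch").strip().lower()
    if engine not in SUPPORTED_DOC_ENGINES:
        raise ValueError(f"Unsupported DOC_ENGINE: {engine!r}")
    return engine
```

With no variable set the result is `elasticsearch`, which matches the default described above; setting `DOC_ENGINE=opensearch` switches the engine.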

Others

Our team tested many documents in our environment using OpenSearch as the vector database, and it works very well.
All of the config changes for OpenSearch are necessary.

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. 🌈 python Pull requests that update Python code 💞 feature Feature request, pull request that fullfill a new feature. labels Apr 18, 2025
@KevinHuSh KevinHuSh requested a review from asiroliu April 18, 2025 08:31
@KevinHuSh KevinHuSh added the ci Continue Integration label Apr 18, 2025
@yingfeng
Member

Thanks for the contribution. OpenSearch can be added as a doc engine alternative. However, maintaining OpenSearch is not easy work: after this PR is merged, OpenSearch support might stop working after several weeks.

… bytes-like object

### What problem does this PR solve?
Fix bug infiniflow#6990: internal server error while chunking: expected string or
bytes-like object

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: unknown <taoshi.ln@chinatelecom.cn>
docker/.env Outdated
@@ -2,7 +2,8 @@
 # Available options:
 # - `elasticsearch` (default)
 # - `infinity` (https://github.com/infiniflow/infinity)
-DOC_ENGINE=${DOC_ENGINE:-elasticsearch}
+# - `opensearch` (https://github.com/opensearch-project/OpenSearch)
+DOC_ENGINE=${DOC_ENGINE:-opensearch}
Contributor

the default DOC_ENGINE should be elasticsearch

Author

@pyyuhao pyyuhao Apr 18, 2025

@asiroliu I've changed it back to elasticsearch by default in my commit. It was a config mistake: I forgot to revert it from my local environment.

pyyuhao and others added 3 commits April 18, 2025 17:18
…finiflow#7138)

### What problem does this PR solve?

Feat: Rendering a search test list with real data infiniflow#3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
@pyyuhao
Author

pyyuhao commented Apr 18, 2025

> Thanks for the contribution. OpenSearch can be added as a doc engine alternative. However, maintaining OpenSearch is not easy work: after this PR is merged, OpenSearch support might stop working after several weeks.

Thanks for your reply. I've fixed the small problems mentioned above. I have been a search-engine engineer focusing on OpenSearch/Elasticsearch for years, and I have also written some plugins for OpenSearch. I will continue to pay close attention to ES/OS. Over the past two years, we have given more attention to RAG.

@pyyuhao pyyuhao requested a review from asiroliu April 18, 2025 10:10
@pyyuhao
Author

pyyuhao commented Apr 18, 2025

@asiroliu @yingfeng @KevinHuSh Hi, I've made some commits, mainly about formatting and comments. Please review again. Thanks a lot.

@yingfeng
Member

It cannot pass CI: the elasticsearch container cannot be started. See the CI logs: https://github.com/infiniflow/ragflow/actions/runs/14533575135/job/40777952188

@pyyuhao
Author

pyyuhao commented Apr 18, 2025

> It cannot pass CI: the elasticsearch container cannot be started. See the CI logs: https://github.com/infiniflow/ragflow/actions/runs/14533575135/job/40777952188

It works well in my local environment now. I will check the code again and create a virtual machine to verify it.
Maybe it is because I run opensearch on port 9200, the same port as elasticsearch (in most cases the two use the same port). Is there a rule about this that would block the test job for ES?
I will check this in a few days.

Have a nice weekend
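
One quick way to test the port hypothesis before starting a container is to check whether something is already listening on the port. This is a generic sketch using only the Python standard library; the helper name `port_is_open` is hypothetical and nothing here comes from the RAGFlow CI scripts.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

Calling `port_is_open("127.0.0.1", 9200)` on the CI host before the elasticsearch container starts would show whether another service, such as an opensearch container, already holds the port.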

6 participants