mention OpenSearchHybridRetrieval in OpenSearch integration page #327

Draft: wants to merge 2 commits into main
30 changes: 30 additions & 0 deletions integrations/opensearch-document-store.md
@@ -28,6 +28,7 @@ toc: true
- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
- [Hybrid Retriever](#hybrid-retriever)

## Overview

@@ -66,6 +67,35 @@ indexing.connect("converter", "writer")
indexing.run({"converter": {"paths": file_paths}})
```

### Hybrid Retriever

This integration also provides a hybrid retriever. The `OpenSearchHybridRetriever` combines the capabilities of a vector search and a keyword search. It uses the OpenSearch document store to retrieve documents based on both semantic and keyword-based queries.
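
To make the idea of combining keyword and vector search concrete, here is a minimal, library-free sketch of one common merging strategy, reciprocal rank fusion. It is only an illustration: the function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample document IDs are assumptions made for this example, and the sketch does not claim to reproduce how `OpenSearchHybridRetriever` merges results internally.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `k` is a smoothing constant (assumed value).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists (`doc_a`, `doc_c`) come out ahead of documents found by only one of the two searches.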

You can use the `OpenSearchHybridRetriever` together with the `OpenSearchDocumentStore` to perform hybrid retrieval.

```python
from haystack_integrations.components.retrievers.opensearch import OpenSearchHybridRetriever
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore

# Initialize the document store
document_store = OpenSearchDocumentStore(
    hosts=["http://localhost:9200"],
    index="document_store",
    embedding_dim=384,
)

# Initialize the retriever
retriever = OpenSearchHybridRetriever(
    document_store=document_store,
    embedding_dim=384,
    top_k=10,
)

pipeline.run(query="What is the capital of France?")
```

Review comment (Member):

This usage example can't work: `Pipeline` is not imported or initialized, the retriever is never added to a pipeline, and the document store contains no documents.

You could extend the following example. It's based on the one we used in the 2.13.0 release: https://github.com/deepset-ai/haystack/releases/tag/v2.13.0
Please add the embedder, adjust the dimensions parameter of the document store, and test it.

```python
# pip install haystack-ai datasets "sentence-transformers>=3.0.0"

from haystack import Document
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.opensearch import OpenSearchHybridRetriever
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
from datasets import load_dataset

# Load documents that already carry precomputed embeddings and index them
dataset = load_dataset("HaystackBot/medrag-pubmed-chunk-with-embeddings", split="train")
docs = [Document(content=doc["contents"], embedding=doc["embedding"]) for doc in dataset]
document_store = OpenSearchDocumentStore()
document_store.write_documents(docs)

query = "What treatments are available for chronic bronchitis?"
result = OpenSearchHybridRetriever(document_store).run(...)  # add SentenceTransformersTextEmbedder with "BAAI/bge-small-en-v1.5"
print(result)
```

You can learn more about the `OpenSearchHybridRetriever` in the [documentation]().
Review comment (Member):

Suggested change:

- You can learn more about the `OpenSearchHybridRetriever` in the [documentation]().
+ You can learn more about the `OpenSearchHybridRetriever` in the [documentation](https://docs.haystack.deepset.ai/docs/opensearchhybridretriever).


### License

`opensearch-haystack` is distributed under the terms of the [Apache-2.0](https://spdx.org/licenses/Apache-2.0.html) license.