Documentation Index
Fetch the complete documentation index at: https://mintlify.com/JhonHander/obstetrics-rag-benchmark/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Hybrid RAG combines two complementary retrieval strategies:
- Lexical search (BM25): Matches exact keywords and terms
- Semantic search (ChromaDB): Matches meaning and context
An EnsembleRetriever merges the results from both retrievers with configurable weights, providing more robust retrieval than either method alone.
How It Works
The Hybrid RAG pipeline follows these steps:
1. Parallel Retrieval: Query is sent to both BM25 and semantic retrievers simultaneously
2. Weighted Fusion: Results are combined using configurable weights (default 0.5/0.5)
3. Result Merging: Ensemble retriever produces a unified ranked list
4. Context Formatting: Merged documents are formatted with metadata
5. Answer Generation: LLM generates the final answer from combined context
The ensemble weights determine how much influence each retriever has on the final ranking. Equal weights (0.5/0.5) give balanced importance to both keyword matching and semantic similarity.
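The weighted-fusion step can be sketched with weighted Reciprocal Rank Fusion, the scheme LangChain's EnsembleRetriever uses to combine ranked lists: each retriever's ranking contributes a score proportional to its weight and inversely to a document's rank, and documents found by both retrievers accumulate score from both lists (which also deduplicates them). The document ids and the constant `c = 60` below are illustrative:

```python
def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of doc ids into one list, weighting each source."""
    scores = {}
    for ranked, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranked):
            # Higher-ranked documents (small rank) contribute more score;
            # documents in several lists accumulate score, which dedups them.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (c + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranked = ["d3", "d1", "d7"]      # lexical results, best first
semantic_ranked = ["d1", "d5", "d3"]  # semantic results, best first
fused = weighted_rrf([bm25_ranked, semantic_ranked], weights=[0.5, 0.5])
```

Here `d1` wins the fused ranking because it appears near the top of both lists, even though neither retriever ranked it strictly first.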
Key Features
- Best of both worlds: Combines keyword precision with semantic understanding
- Configurable weights: Adjust the balance between lexical and semantic retrieval
- Deduplication: Automatically handles documents retrieved by both methods
- Complementary coverage: Catches documents that only one method would find
Implementation Details
Retriever Configuration
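A plausible wiring of the two retrievers, assuming the project uses LangChain's BM25Retriever, a Chroma vector store, and EnsembleRetriever; the module paths, collection name, and parameter values below are a sketch, not the repo's exact code:

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# `docs` is the list of LangChain Documents loaded elsewhere in the project
bm25_retriever = BM25Retriever.from_documents(docs)
bm25_retriever.k = 5  # top-k for the lexical side

vectorstore = Chroma(
    collection_name="obstetrics",  # hypothetical collection name
    embedding_function=OpenAIEmbeddings(),
)
semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, semantic_retriever],
    weights=[0.5, 0.5],  # e.g. [0.7, 0.3] to favor keyword matching
)
```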
Core Processing Function
The process_hybrid_query() function handles the complete hybrid pipeline:
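A minimal, dependency-free sketch of what such a function does. The real signature lives in hybrid.py; the `retrieve`/`generate` callables and the returned keys here are hypothetical stand-ins for the ensemble retriever and the LLM:

```python
def process_hybrid_query(query, retrieve, generate, top_k=5):
    """Retrieve fused documents, format them as context, and generate an answer."""
    docs = retrieve(query)[:top_k]  # ensemble results, already rank-fused
    context = "\n\n".join(f"[{d['source']}] {d['text']}" for d in docs)
    answer = generate(f"Context:\n{context}\n\nQuestion: {query}")
    return {"query": query, "answer": answer, "num_docs": len(docs)}

# Stub retriever and LLM, for illustration only
result = process_hybrid_query(
    "What is preeclampsia?",
    retrieve=lambda q: [{"source": "ch3.pdf", "text": "Preeclampsia is ..."}],
    generate=lambda prompt: "A hypertensive disorder of pregnancy.",
)
```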
Usage with query_for_evaluation()
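As a hedged placeholder, query_for_evaluation() presumably wraps the hybrid pipeline and returns a structure the evaluation harness consumes; everything below except the function name (and the line range noted under Source Files) is hypothetical:

```python
def query_for_evaluation(question: str) -> dict:
    # In the real code (hybrid.py:176-242) the hybrid pipeline runs here;
    # this stub only illustrates the call shape.
    answer = "stubbed answer"
    return {"question": question, "answer": answer, "architecture": "hybrid"}

result = query_for_evaluation("¿Qué es la preeclampsia severa?")
```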
Return Structure
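As a hedged placeholder for the missing listing, a hybrid run plausibly returns a dictionary along these lines (the field names are illustrative, not the repo's exact keys):

```python
result = {
    "query": "...",              # the original question
    "answer": "...",             # the LLM-generated answer
    "contexts": ["...", "..."],  # fused documents used as context
    "num_docs": 2,               # documents remaining after fusion and dedup
}
```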
When to Use This Approach
Best For
- Mixed query types: Questions that combine specific terms with conceptual meaning
- Medical terminology: Queries with exact drug names, procedures, or diagnostic terms
- Acronyms and abbreviations: Terms like “IMC” (BMI) or “VIH” (HIV)
- Recall improvement: When semantic search alone misses important keyword matches
- General robustness: When you want consistent performance across diverse query types
Advantages Over Simple Semantic
- Better keyword coverage: BM25 catches exact term matches that embeddings might miss
- Reduced vocabulary gap: Lexical search doesn’t depend on semantic similarity
- Complementary retrieval: Each method covers the other’s blind spots
- Improved recall: More likely to retrieve all relevant documents
Limitations
- No explicit rank fusion: Simple weighted averaging may not optimally combine scores
- Fixed weights: Ensemble weights are static, not query-adaptive
- Potential redundancy: Both retrievers may return very similar documents
- Higher complexity: Requires maintaining two separate indexes
Performance Characteristics
Speed
- Moderate latency: ~2-4 seconds (two retrievers + fusion)
- Parallel retrieval: Both methods can run concurrently
- Minimal overhead: Simple weighted fusion is computationally cheap
Cost
- Embedding cost: Same as simple semantic (~$0.00001 per query)
- LLM cost: Same as simple semantic (~$0.002-0.005 per query)
- No additional API costs: BM25 runs locally
- Total: Slightly higher than simple semantic due to longer context
Quality
- Higher recall: More likely to retrieve all relevant documents
- Better precision: Keyword matching reduces false positives from semantic drift
- More diverse results: Different retrieval methods surface different documents
- Robust performance: Consistent across various query types
Comparison with Other Architectures
Tuning Ensemble Weights
You can adjust the balance between lexical and semantic retrieval by changing the weights passed to the ensemble retriever.
Source Files
- Implementation: ~/workspace/source/src/rag/hybrid.py:134-173
- Evaluation interface: ~/workspace/source/src/rag/hybrid.py:176-242
- Document loading: ~/workspace/source/src/rag/hybrid.py:46-54
