Documentation Index
Fetch the complete documentation index at: https://mintlify.com/JhonHander/obstetrics-rag-benchmark/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The Multi-Query Rewriter RAG module implements a RAG pipeline that generates multiple variations of the user’s question to improve document retrieval. It creates three different query reformulations, retrieves documents for each, combines and re-ranks the results, then synthesizes a final answer.

Module: src.rag.rewriter
Source: src/rag/rewriter.py
Configuration
Default Models
Retriever Configuration
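The configuration blocks are not reproduced on this page. Based on the values cited elsewhere in this document (gpt-3.5-turbo at temperature 0.3 for rewriting, gpt-4o at temperature 0 for answers, top-5 retrieval per query variation, top-8 final documents), the defaults could be sketched as follows; the constant names are assumptions, only the values come from this page:

```python
# Sketch of the module's default configuration. The constant names are
# assumptions; the values are taken from the Pipeline Flow section below.
REWRITER_MODEL = "gpt-3.5-turbo"   # query rewriting
REWRITER_TEMPERATURE = 0.3         # balanced creativity for reformulation
ANSWER_MODEL = "gpt-4o"            # answer generation
ANSWER_TEMPERATURE = 0             # deterministic, precise answers

NUM_REWRITES = 3                   # query variations per question
TOP_K_PER_QUERY = 5                # documents retrieved per variation
FINAL_TOP_K = 8                    # documents kept after re-ranking
```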
Query Rewriting Templates
The module uses three different rewriting strategies:

Template 1: Standalone Query
Template 2: Synonym-Based Rephrasing
Template 3: Expanded Context
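The exact prompt wording is not shown on this page, so the templates below are illustrative assumptions; only the three strategy names (standalone, synonym-based, expanded context) come from the documentation:

```python
# Illustrative prompt templates for the three rewriting strategies.
# The prompt wording is an assumption; only the strategy names are documented.
REWRITE_TEMPLATES = [
    # Template 1: Standalone Query
    "Rewrite the following question as a single, self-contained query:\n{question}",
    # Template 2: Synonym-Based Rephrasing
    "Rephrase the following question using synonyms for its key terms:\n{question}",
    # Template 3: Expanded Context
    "Expand the following question with relevant surrounding context:\n{question}",
]

def build_rewrite_prompts(question: str) -> list[str]:
    """Fill every template with the user's question."""
    return [t.format(question=question) for t in REWRITE_TEMPLATES]
```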
Functions
format_docs

Parameter: List of documents to format

Returns: Formatted string with documents labeled by relevance (High/Medium/Low)
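A minimal sketch of what format_docs might do, assuming documents arrive ranked and the top third is labeled High, the middle third Medium, and the rest Low; the thresholds and output layout are assumptions, only the High/Medium/Low labels come from this page:

```python
# Hypothetical sketch of format_docs. The relevance thresholds and output
# layout are assumptions; only the High/Medium/Low labels are documented.
def format_docs(docs: list) -> str:
    n = len(docs)
    parts = []
    for i, doc in enumerate(docs):
        if i < n / 3:
            label = "High"
        elif i < 2 * n / 3:
            label = "Medium"
        else:
            label = "Low"
        # LangChain-style documents expose .page_content; fall back to str().
        text = getattr(doc, "page_content", str(doc))
        parts.append(f"[Relevance: {label}]\n{text}")
    return "\n\n".join(parts)
```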
process_rewriter_query
Parameters:
- The user’s question
- Custom model for query rewriting. Defaults to gpt-3.5-turbo with temperature=0.3
- Custom model for answer generation. Defaults to gpt-4o with temperature=0
- The maximum number of documents to return after deduplication and re-ranking

Returns: Dictionary containing:
- answer (str): The generated answer
- contexts (List[str]): Retrieved document contents
- retrieved_documents (List[Document]): Full document objects
- rewritten_queries (List[str]): The 3 generated query variations
- metrics (dict):
  - rewrite_input_tokens (int): Tokens used for rewriting
  - rewrite_output_tokens (int): Tokens generated during rewriting
  - rewrite_cost (float): Cost of rewriting
  - answer_input_tokens (int): Tokens used for the answer
  - answer_output_tokens (int): Tokens generated for the answer
  - answer_cost (float): Cost of answer generation
  - total_input_tokens (int): Total input tokens
  - total_output_tokens (int): Total output tokens
  - total_cost (float): Total cost in USD
  - usage_source (str): Source of usage data
  - cost_source (str): Source of cost calculation
query_for_evaluation
Parameters:
- The question to process
- The name of the LLM model to use for query rewriting. Defaults to “gpt-3.5-turbo”
- The name of the LLM model to use for answer generation. Defaults to “gpt-4o”
- Pre-configured LLM for rewriting; takes precedence over rewriter_model
- Pre-configured LLM for the answer; takes precedence over answer_model

Returns: Dictionary containing:
- question (str): Original question
- answer (str): Generated answer
- contexts (List[str]): Retrieved document contents
- source_documents (List[Document]): Full retrieved documents
- metadata (dict): Comprehensive metadata, including:
  - num_contexts (int): Number of contexts
  - retrieval_method (str): “multi_query_rewrite”
  - rewrite_count (int): Number of query variations (3)
  - llm_model (str): Answer model name
  - rewriter_model (str): Rewriter model name
  - provider (str): Answer provider
  - model_id (str): Answer model ID
  - rewriter_provider (str): Rewriter provider
  - rewriter_model_id (str): Rewriter model ID
  - execution_time (float): Total execution time
  - input_tokens (int): Total input tokens
  - output_tokens (int): Total output tokens
  - total_cost (float): Total cost in USD
  - tokens_used (int): Total tokens
  - usage_source (str): Usage data source
  - cost_source (str): Cost calculation source
Usage Example
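The original example is not reproduced on this page. A minimal call, assuming the question is passed positionally and the defaults described above, might look like this (the import path follows Module: src.rag.rewriter; the dictionary keys follow the return-value documentation):

```python
# Hypothetical usage sketch: keys follow the documented return dictionary.
from src.rag.rewriter import process_rewriter_query

result = process_rewriter_query("What are the risk factors for preeclampsia?")

print(result["answer"])                 # final synthesized answer
print(result["rewritten_queries"])      # the 3 generated query variations
print(result["metrics"]["total_cost"])  # total cost in USD
```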
Pipeline Flow
- Generate Queries: Creates 3 query variations using gpt-3.5-turbo (temp=0.3):
- Standalone specific query
- Synonym-based rephrasing
- Expanded contextual query
- Retrieve: Performs semantic search for each query variation (top 5 per query)
- Deduplicate: Removes duplicate documents using content-based IDs
- Weight & Re-rank: Applies query-based weighting (later queries get 5% penalty)
- Select: Chooses top 8 documents after re-ranking
- Format: Formats documents with relevance indicators
- Generate: Uses gpt-4o (temp=0) to generate final answer
- Track: Captures separate metrics for rewriting and answer generation
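The retrieve/deduplicate/re-rank/select steps above can be sketched as follows. The content-based ID (here a hash of the document text) and the score bookkeeping are assumptions; the 5% per-query penalty and the top-8 cutoff come from this page:

```python
import hashlib

def dedupe_and_rerank(results_per_query: list[list[tuple[str, float]]],
                      final_k: int = 8) -> list[str]:
    """Merge (text, score) results from each query variation.

    Deduplicates by a content-based ID and applies a 5% weight penalty
    per later query before selecting the top documents.
    """
    best: dict[str, tuple[float, str]] = {}
    for query_index, results in enumerate(results_per_query):
        weight = 1.0 - 0.05 * query_index   # 1.0, 0.95, 0.90
        for text, score in results:
            doc_id = hashlib.sha256(text.encode()).hexdigest()
            weighted = score * weight
            # Keep the best weighted score seen for each unique document.
            if doc_id not in best or weighted > best[doc_id][0]:
                best[doc_id] = (weighted, text)
    ranked = sorted(best.values(), key=lambda p: p[0], reverse=True)
    return [text for _, text in ranked[:final_k]]
```

With 3 queries and 5 documents each, this yields up to 15 candidates before deduplication, matching the coverage figure in Key Features.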
Query Weighting Strategy
- Query 1: weight = 1.0 (100%)
- Query 2: weight = 0.95 (95%)
- Query 3: weight = 0.90 (90%)
Key Features
- Multi-perspective retrieval: 3 different query formulations
- Automatic deduplication: Removes duplicate documents across queries
- Intelligent weighting: Prioritizes more direct query reformulations
- High coverage: Up to 15 candidates (3 queries × 5 docs)
- Relevance labeling: Documents marked as High/Medium/Low relevance
- Dual cost tracking: Separate metrics for rewriting and answer generation
- Temperature tuning: 0.3 for rewriting (balanced), 0 for answer (precise)
