Documentation Index
Fetch the complete documentation index at: https://mintlify.com/JhonHander/obstetrics-rag-benchmark/llms.txt
Use this file to discover all available pages before exploring further.
Obstetrics RAG Benchmark
A systematic evaluation of Retrieval-Augmented Generation (RAG) architectures applied to medical question-answering in the obstetrics domain. This research project benchmarks multiple RAG strategies across various Large Language Models using the RAGAS evaluation framework.
Overview
This project investigates the effectiveness of different RAG retrieval strategies for medical Q&A, focusing specifically on pregnancy and childbirth guidance. We implement and evaluate six distinct RAG architectures, comparing their performance across multiple state-of-the-art language models.
Quick Start
Get up and running with your first evaluation in minutes
RAG Architectures
Learn about the different RAG strategies we benchmark
Evaluation Framework
Understand RAGAS metrics and how we measure performance
API Reference
Explore the complete API documentation
Key Features
Multiple RAG Architectures
6 RAG Strategies: Simple Semantic, Hybrid (BM25+Semantic), Hybrid-RRF, HyDE, Query Rewriter, and PageIndex (a hybrid-retrieval sketch follows this feature list)
RAGAS Evaluation
4 Core Metrics: Faithfulness, Answer Relevancy, Context Precision, and Context Recall (an evaluation sketch follows this feature list)
Multi-Model Support
Multiple LLMs: Default models (GPT-4o, GPT-3.5-turbo) plus an extensible registry supporting GPT-5, GPT-5.2, MediPhi, and MedGemma
Vector Search
ChromaDB + OpenAI: Persistent vector store with OpenAI text-embedding-3-small
LangChain Pipeline
Production-Ready: Built on LangChain for reliable retrieval and generation
Comprehensive Results
Detailed Analytics: JSON output with timestamps and a question-by-question breakdown
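To make the retrieval cards above concrete, here is a minimal, hypothetical sketch of the Hybrid (BM25+Semantic) strategy wired over a persistent ChromaDB store with OpenAI text-embedding-3-small, using LangChain. The sample documents, paths, and fusion weights are placeholders, not this repository's actual configuration; note that LangChain's `EnsembleRetriever` fuses rankings with Reciprocal Rank Fusion, so the sketch is closest to the Hybrid-RRF variant.

```python
# Hypothetical hybrid-retrieval sketch (not the repo's actual pipeline).
# Assumes OPENAI_API_KEY is set; BM25Retriever needs the rank-bm25 package.
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

# Placeholder corpus; the benchmark builds its store from its own obstetrics documents.
docs = [
    Document(page_content="Folic acid supplementation is recommended before conception."),
    Document(page_content="Gestational diabetes is screened between 24 and 28 weeks."),
]

# Semantic retriever: persistent Chroma store with text-embedding-3-small.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(docs, embeddings, persist_directory="./chroma_db")
semantic = vectorstore.as_retriever(search_kwargs={"k": 4})

# Lexical retriever: BM25 over the same corpus.
bm25 = BM25Retriever.from_documents(docs)
bm25.k = 4

# Hybrid: weighted Reciprocal Rank Fusion of both rankings.
hybrid = EnsembleRetriever(retrievers=[bm25, semantic], weights=[0.5, 0.5])
results = hybrid.invoke("When is gestational diabetes screening performed?")
```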
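The four RAGAS metrics above can be computed without manual annotation beyond a reference answer. Below is a hypothetical sketch of scoring a single Q&A sample; the row values are illustrative, and depending on your RAGAS version the expected column names may instead be `user_input`, `response`, `retrieved_contexts`, and `reference`.

```python
# Hypothetical RAGAS scoring sketch (assumes OPENAI_API_KEY is set).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# One illustrative sample; the benchmark builds these rows from its own corpus.
data = Dataset.from_dict({
    "question": ["When is gestational diabetes screening performed?"],
    "answer": ["Screening is usually done between 24 and 28 weeks of pregnancy."],
    "contexts": [["Gestational diabetes is screened between 24 and 28 weeks."]],
    "ground_truth": ["Between 24 and 28 weeks of gestation."],
})

result = evaluate(
    data,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # one score per metric for the dataset
```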
Research Focus
This project addresses several key questions in the RAG domain:
- How do different retrieval strategies (semantic, hybrid, hypothetical document embeddings, query reformulation) compare in medical Q&A scenarios? (A minimal HyDE sketch follows this list.)
- What is the impact of model selection on RAG performance in specialized domains?
- How do we quantitatively assess retrieval quality and generation faithfulness without manual annotation?
- Which RAG configuration produces the highest quality responses for obstetrics-related questions?
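As a sketch of the hypothetical-embeddings idea mentioned in the first question: HyDE drafts a plausible answer with an LLM and retrieves against that draft rather than the raw question. This assumes the Chroma vector store from the earlier sketch; the model choice and prompt wording are illustrative only.

```python
# Hypothetical HyDE sketch (assumes OPENAI_API_KEY and the vectorstore above).
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

def hyde_retrieve(question: str, vectorstore, k: int = 4):
    # 1. Ask the LLM to draft a hypothetical passage answering the question.
    draft = llm.invoke(f"Write a short medical passage that answers: {question}").content
    # 2. Embed the draft and retrieve real documents that lie near it.
    return vectorstore.similarity_search(draft, k=k)

# Usage: hyde_retrieve("What vitamins are recommended during pregnancy?", vectorstore)
```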
Use Cases
Medical AI Research
Benchmark RAG techniques for healthcare applications, compare retrieval strategies for domain-specific knowledge bases, and establish baseline performance metrics for medical Q&A systems.
RAG Architecture Comparison
Evaluate different RAG strategies side-by-side, identify optimal configurations for your use case, and understand trade-offs between retrieval approaches.
LLM Evaluation
Compare multiple language models on the same task, assess model-specific performance variations, and identify the best model for your requirements.
Educational Resource
Learn RAG implementation patterns, understand evaluation methodologies, and explore best practices for knowledge-augmented generation.
Getting Started
Install Dependencies
Clone the repository and install Python dependencies including LangChain, ChromaDB, and RAGAS.
Research Contributions
- Systematic Evaluation: RAGAS-based assessment of RAG architectures in the medical domain
- Multiple Architectures: Comparison of 6 distinct RAG retrieval strategies
- Model Diversity: Evaluation across general-purpose and specialized medical language models
- Reproducible Benchmark: Complete pipeline from data processing to evaluation with detailed results
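For orientation, here is a hypothetical sketch of what a timestamped, question-by-question results file could look like. The keys and scores are placeholders chosen for illustration, not the repository's actual schema or measured numbers.

```python
# Hypothetical results-file shape (illustrative keys and placeholder scores).
import json
from datetime import datetime, timezone

results = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "model": "gpt-4o",         # placeholder
    "strategy": "hybrid_rrf",  # placeholder
    "questions": [
        {
            "question": "When is gestational diabetes screening performed?",
            "faithfulness": 0.97,      # placeholder score
            "answer_relevancy": 0.91,  # placeholder score
            "context_precision": 0.88, # placeholder score
            "context_recall": 0.93,    # placeholder score
        },
    ],
}

with open("results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2)
```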
Next Steps
Installation
Complete installation guide
Core Concepts
Understand the fundamentals
Evaluation Guide
Run your first benchmark
