RAG Service

Retrieval-Augmented Generation service for the TFG educational chatbot. Provides document management, semantic search, and vector storage using Qdrant and Ollama embeddings.

Overview

graph LR
    subgraph "RAG Service :8081"
        API[FastAPI API]
        FileLoader[File Loader]
        DocProcessor[Document Processor]
        EmbeddingService[Embedding Service]
        VectorStore[Vector Store]
    end
    
    subgraph "External Services"
        Ollama[Ollama :11434]
        Qdrant[(Qdrant :6333)]
    end
    
    API --> FileLoader
    FileLoader --> DocProcessor
    DocProcessor --> EmbeddingService
    EmbeddingService --> Ollama
    EmbeddingService --> VectorStore
    VectorStore --> Qdrant

Key Features

  • 📄 Document Management: Upload, list, and organize documents by subject and type
  • 🔍 Semantic Search: Find relevant content using natural language queries
  • 🧩 Automatic Chunking: Split documents into overlapping chunks (CHUNK_SIZE, CHUNK_OVERLAP) before indexing
  • 🔢 Embedding Generation: Convert text to vectors via Ollama (nomic-embed-text)
  • 🗃️ Vector Storage: Efficient similarity search with Qdrant
  • 📊 Prometheus Metrics: Built-in monitoring and instrumentation
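
The chunking feature can be pictured as a sliding window over the document text. The sketch below is a simplified, character-based stand-in and not the service's own code, which relies on LangChain for chunking. `chunk_text` is a hypothetical helper, and its defaults mirror the CHUNK_SIZE and CHUNK_OVERLAP values from the Configuration section under the assumption that both are measured in characters.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.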

Quick Start

Local Development

# Install dependencies
cd rag_service
pip install -e ".[dev]"

# Set environment variables
export QDRANT_HOST=localhost
export OLLAMA_HOST=localhost

# Start the service
uvicorn rag_service.api:app --reload --port 8081

Docker

# Start with dependencies
docker compose up -d qdrant ollama rag_service

# Initialize embedding model
docker exec ollama ollama pull nomic-embed-text
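
Once the model is pulled, a quick sanity check is to request one embedding and compare its length with EMBEDDING_DIMENSION (768 by default). The sketch below assumes Ollama is reachable on localhost:11434 and uses Ollama's `/api/embeddings` endpoint; the helper names `build_embedding_request` and `embedding_dimension` are illustrative and not part of the service.

```python
import json
import urllib.request


def build_embedding_request(
    text: str,
    host: str = "localhost",
    port: int = 11434,
    model: str = "nomic-embed-text",
) -> urllib.request.Request:
    """Build a POST request for Ollama's embeddings endpoint."""
    payload = {"model": model, "prompt": text}
    return urllib.request.Request(
        url=f"http://{host}:{port}/api/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embedding_dimension(text: str = "hello", **kwargs) -> int:
    """Send the request and return the length of the returned vector."""
    request = build_embedding_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return len(json.load(response)["embedding"])
```

Calling `embedding_dimension()` against a running Ollama instance should return the same number the service is configured with; a mismatch usually means a different model is being served.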

API Access

  • API: http://localhost:8081
  • Health: http://localhost:8081/health
  • Metrics: http://localhost:8081/metrics

API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Health check with Qdrant status |
| POST | /search | Semantic search with filters |
| POST | /index | Index documents |
| GET | /files | List available files |
| POST | /upload | Upload and index file |
| POST | /load-file | Index existing file |
| GET | /subjects | List subjects |
| GET | /subjects/{asignatura}/types | List document types |
| GET | /collection/info | Qdrant collection stats |
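
As a rough illustration of calling the search endpoint from Python, the sketch below builds a POST request with the standard library. The JSON field names `query` and `top_k` are assumptions made for this example, as are the helper names; the actual request schema is defined in the API reference.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"


def build_search_request(
    query: str, top_k: int = 5, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST /search request (field names are assumed, not confirmed)."""
    payload = {"query": query, "top_k": top_k}  # hypothetical request body
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, top_k: int = 5, base_url: str = BASE_URL):
    """Send the request to a running service and decode the JSON response."""
    request = build_search_request(query, top_k, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Separating request construction from the network call keeps the payload easy to inspect and adjust once the real schema is known.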

Document Organization

documents/
├── logica-difusa/
│   ├── apuntes/
│   │   └── tema1.pdf
│   └── ejercicios/
│       └── practica1.md
├── iv/
│   ├── teoria/
│   │   └── docker.pdf
│   └── examenes/
│       └── examen-2024.pdf
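
The tree above follows a subject/type/file layout. A minimal sketch of how a path relative to documents/ could be mapped to its subject and document type is shown below; `DocumentLocation` and `parse_document_path` are hypothetical names, and the strict three-level check is an assumption based on the example tree.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class DocumentLocation(NamedTuple):
    subject: str    # e.g. "logica-difusa"
    doc_type: str   # e.g. "apuntes"
    filename: str   # e.g. "tema1.pdf"


def parse_document_path(relative_path: str) -> DocumentLocation:
    """Split a path relative to documents/ into subject, type, and file name."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 3:
        raise ValueError(
            f"expected <subject>/<type>/<file>, got {relative_path!r}"
        )
    return DocumentLocation(*parts)
```

For example, `parse_document_path("iv/teoria/docker.pdf")` yields subject `iv` and type `teoria`, which are the kind of values the subject and type listing endpoints deal with.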

Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| QDRANT_HOST | qdrant | Qdrant server hostname |
| QDRANT_PORT | 6333 | Qdrant server port |
| OLLAMA_HOST | ollama | Ollama server hostname |
| OLLAMA_PORT | 11434 | Ollama API port |
| OLLAMA_MODEL | nomic-embed-text | Embedding model |
| EMBEDDING_DIMENSION | 768 | Vector dimension |
| TOP_K_RESULTS | 5 | Default search results |
| SIMILARITY_THRESHOLD | 0.5 | Minimum similarity score |
| CHUNK_SIZE | 1000 | Document chunk size |
| CHUNK_OVERLAP | 200 | Chunk overlap |
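
One way to read these variables with their defaults is a small settings object. The sketch below is illustrative only: the `Settings` class and `from_env` method are hypothetical names, and the service may load its configuration differently.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the configuration table.
    qdrant_host: str = "qdrant"
    qdrant_port: int = 6333
    ollama_host: str = "ollama"
    ollama_port: int = 11434
    ollama_model: str = "nomic-embed-text"
    embedding_dimension: int = 768
    top_k_results: int = 5
    similarity_threshold: float = 0.5
    chunk_size: int = 1000
    chunk_overlap: int = 200

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Override defaults with upper-cased environment variables, if set."""
        values = {}
        for field in fields(cls):
            key = field.name.upper()
            if key in env:
                cast = type(field.default)  # str, int, or float
                values[field.name] = cast(env[key])
        return cls(**values)
```

For local development, exporting `QDRANT_HOST=localhost` and `OLLAMA_HOST=localhost` before calling `Settings.from_env()` would override only those two fields and leave the rest at their defaults.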

Dependencies

  • Qdrant: Vector database for similarity search
  • Ollama: Local model server that generates the embeddings
  • LangChain: Document processing and chunking

Documentation

| Document | Description |
|----------|-------------|
| Architecture | System design and data flow |
| API Endpoints | Complete API reference |
| Embeddings | Embedding service and models |
| Vector Store | Qdrant integration |
| Document Processing | Chunking and loading |
| Configuration | Environment variables |
| Development | Local setup and testing |
| Deployment | Docker and production |
