System Design Overview
Our offline RAG system is a desktop application built with Python, with Tkinter providing the user interface. It stores document embeddings in a local vector database (Chroma), computes those embeddings with a Hugging Face transformer (via a custom embedding module), and generates responses with a Llama-based language model. Documents (PDF, Markdown, Word) are ingested via a dedicated ingestion script.
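To make the UI layer concrete, here is a minimal sketch of a Tkinter shell of the kind described above. The `answer_query` stub, widget layout, and window title are illustrative assumptions, not the actual main.py implementation.

```python
import tkinter as tk
from tkinter import scrolledtext


def answer_query(question: str) -> str:
    # Placeholder for the RAG pipeline (retrieval + LLM generation).
    return f"(stub) You asked: {question}"


def on_submit() -> None:
    question = entry.get().strip()
    if not question:
        return
    output.insert(tk.END, f"Q: {question}\nA: {answer_query(question)}\n\n")
    entry.delete(0, tk.END)


root = tk.Tk()
root.title("Offline RAG Assistant")

entry = tk.Entry(root, width=80)
entry.pack(padx=8, pady=4)

tk.Button(root, text="Ask", command=on_submit).pack(pady=4)

output = scrolledtext.ScrolledText(root, width=100, height=30)
output.pack(padx=8, pady=8)

root.mainloop()
```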
High-Level Architecture Diagram
The diagram below outlines the key components and their interactions:
"] EMB["Embedding Model
multilingual-e5-small"] VDB["Vector Database
Chroma"] end subgraph Data["Document Processing"] LOAD["Document Loaders
(PDF, Markdown, Word)"] SPLIT["Text Splitter"] end User -->|"Query"| LLM LLM -->|"Response"| User User -->|"Load Documents"| LOAD LOAD -->|"Raw Documents"| SPLIT SPLIT -->|"Text Chunks"| VDB VDB <-->|"Embeddings"| EMB LLM -->|"Similarity Search"| VDB
Explanation: The user interface sends queries to the LLM component, which runs a similarity search against the Chroma vector database to retrieve relevant chunks; the embedding model supplies the vectors used for both indexing and search. On the ingestion side, the document loaders and text splitter populate the database with text chunks. The LLM then generates a response from the query and the retrieved context.
Sequence Diagram: Query Processing Flow
The following sequence diagram illustrates the processing of a user query:
Class Design
The following class diagram shows the design of our system:
Component Details
- User Interface (Tkinter): Provides the primary interaction point with features like dynamic font resizing, file/folder browsing, and conversation export. (main.py)
- Embedding Module: Computes embeddings using Hugging Face’s transformer model (multilingual-e5-small). (embedding.py)
- Vector Database (Chroma): Stores and retrieves document embeddings. (populate_database.py)
- Document Ingestion: Processes PDF, Markdown, and Word documents and splits them into chunks for indexing (see the ingestion sketch after this list). (populate_database.py)
- LLM Generation: Generates natural language responses with the Qwen2.5-1.5b model, run locally through a Llama-based inference backend. (main.py)
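As a rough illustration of how the embedding, vector database, and ingestion pieces fit together, the sketch below indexes text chunks into Chroma using multilingual-e5-small. The library choices (chromadb, sentence-transformers), the naive splitter, and all paths and names are assumptions; populate_database.py and embedding.py may do this differently.

```python
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("intfloat/multilingual-e5-small")
client = chromadb.PersistentClient(path="./chroma_db")  # hypothetical path
collection = client.get_or_create_collection("documents")


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size splitter standing in for the real text splitter."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


def ingest(doc_id: str, text: str) -> None:
    chunks = split_text(text)
    # e5-family models expect a "passage: " prefix on documents being indexed.
    embeddings = embedder.encode([f"passage: {c}" for c in chunks]).tolist()
    collection.add(
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
    )


ingest("example.md", "Some Markdown content extracted by a document loader...")
```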
Integration & Data Flow
All components are designed to operate offline. The data flow starts with document ingestion and embedding creation, followed by storage in the Chroma vector database. When a query is processed, the system retrieves relevant documents using semantic search, and the LLM generates a response based on the retrieved context.
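A minimal sketch of that query path follows, assuming the collection and embedding model from the ingestion example and a GGUF build of the model served with llama-cpp-python. The model path, prompt format, and parameter values are hypothetical.

```python
import chromadb
from sentence_transformers import SentenceTransformer
from llama_cpp import Llama

embedder = SentenceTransformer("intfloat/multilingual-e5-small")
collection = chromadb.PersistentClient(path="./chroma_db").get_or_create_collection("documents")
llm = Llama(model_path="models/qwen2.5-1.5b-instruct.gguf", n_ctx=4096)  # hypothetical path


def answer(question: str, k: int = 4) -> str:
    # e5-family models expect a "query: " prefix on search queries.
    query_embedding = embedder.encode(f"query: {question}").tolist()
    hits = collection.query(query_embeddings=[query_embedding], n_results=k)
    context = "\n\n".join(hits["documents"][0])
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    result = llm(prompt, max_tokens=256)
    return result["choices"][0]["text"].strip()


print(answer("What does the ingestion script do?"))
```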
Final Notes
The system is modular, so each component can be updated or replaced independently. Ingestion and embedding run in batches to keep larger document sets manageable, and resource management adapts to GPU/MPS environments. The user interface provides interactive feedback throughout.
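The GPU/MPS handling mentioned above could look roughly like the following device-selection helper, assuming a PyTorch-backed embedding model; the actual checks in embedding.py may differ.

```python
import torch


def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, then CPU (assumed policy)."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


# Embedding in batches keeps memory bounded on small GPUs and Apple silicon, e.g.:
# embedder = SentenceTransformer("intfloat/multilingual-e5-small", device=pick_device())
# embeddings = embedder.encode(chunks, batch_size=32)
print(f"Embedding model will run on: {pick_device()}")
```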