System Design Overview

Our offline RAG system is built as a desktop application using Python and Tkinter for the user interface. It leverages a local vector database (Chroma) to store document embeddings, uses a Hugging Face transformer (via a custom embedding module) to compute embeddings, and employs a locally run language model (Qwen2.5-1.5b) for generating responses. Documents (PDF, Markdown, Word) are ingested via a dedicated ingestion script.

High-Level Architecture Diagram

The diagram below outlines the key components and their interactions:

```mermaid
flowchart TB
    subgraph User["User Interface"]
        UI["Tkinter GUI"]
    end
    subgraph Core["Core Components"]
        LLM["LLM Processing<br/>Qwen2.5-1.5b"]
        EMB["Embedding Model<br/>multilingual-e5-small"]
        VDB["Vector Database<br/>Chroma"]
    end
    subgraph Data["Document Processing"]
        LOAD["Document Loaders<br/>(PDF, Markdown, Word)"]
        SPLIT["Text Splitter"]
    end
    User -->|"Query"| LLM
    LLM -->|"Response"| User
    User -->|"Load Documents"| LOAD
    LOAD -->|"Raw Documents"| SPLIT
    SPLIT -->|"Text Chunks"| VDB
    VDB <-->|"Embeddings"| EMB
    LLM -->|"Similarity Search"| VDB
```

Explanation: The diagram shows how the user interface forwards queries to the LLM component, which performs a similarity search against the Chroma vector database to fetch relevant context. The document loaders and text splitter feed text chunks into the database, and the embedding model (multilingual-e5-small) converts both chunks and queries into vectors. Finally, the LLM generates responses based on the query and the retrieved context.
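The text-splitting step above can be illustrated with a minimal sketch. The actual implementation likely uses a library splitter; the fixed-size, character-based chunking with overlap shown here (and the `chunk_size`/`overlap` parameters) is an assumption for illustration:

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so context
    spanning a chunk boundary is not lost (illustrative sketch)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk repeats the last `overlap` characters of the previous one, which is what lets the retriever match passages that straddle chunk boundaries.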

Sequence Diagram: Query Processing Flow

The following sequence diagram illustrates the processing of a user query:

```mermaid
sequenceDiagram
    participant User
    participant UI as User Interface
    participant DP as Document Processor
    participant EE as Embedding Engine
    participant VDB as Vector Database
    participant LLM as LLM Engine

    %% Document Indexing Flow
    User->>UI: Load Document/Folder
    UI->>DP: Process Document Files
    DP->>DP: Split into Chunks
    DP->>EE: Generate Embeddings
    EE->>VDB: Store Document Vectors
    VDB-->>UI: Indexing Complete
    UI-->>User: Confirmation

    %% Query Flow
    User->>UI: Submit Query
    UI->>VDB: Semantic Search
    VDB->>VDB: Apply Content Filtering
    VDB-->>UI: Return Relevant Documents
    UI->>LLM: Send Query + Context
    LLM->>LLM: Generate Response
    LLM-->>UI: Return Response
    UI-->>User: Display Response

    %% Database Management
    User->>UI: Delete Database
    UI->>VDB: Clear Database
    VDB-->>UI: Confirmation
    UI-->>User: Database Cleared
```
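The "Send Query + Context" step in the query flow can be sketched as a prompt-assembly helper. The exact prompt template is an implementation detail not specified in this document, so the `build_prompt` function and its wording below are assumptions:

```python
def build_prompt(query: str, context_docs: list[str]) -> str:
    """Assemble the retrieved chunks and the user query into a single
    prompt string for the LLM (hypothetical template)."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context_docs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

Numbering the chunks (`[1]`, `[2]`, ...) makes it easy for the model, and the reader of exported conversations, to see which retrieved passage supports which part of the answer.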

Class Design

The following class diagram shows the design of our system:

```mermaid
classDiagram
    class LocalHuggingFaceEmbeddings {
        +tokenizer
        +model
        +device
        +__init__(model_path)
        +embed_documents(texts) list[list[float]]
        +embed_query(query) list[float]
    }
    class ChromaDB {
        +persist_directory
        +embedding_function
        +similarity_search(query, k)
        +add_documents(documents, ids)
        +get(include)
    }
    class DocumentLoader {
        +load_documents_from_directory(directory_path)
        +load_pdf(path)
        +load_md(path)
        +load_doc(path)
        +split_documents(documents)
        +add_to_chroma(chunks)
        +calculate_chunk_ids(chunks)
        +clear_database()
    }
    class RAGInterface {
        +conversation_history
        +current_font_size
        +do_not_include_items
        +tokenizer
        +model
        +generator
        +retrieve_similar_documents(query, top_k)
        +generate_response(input_text, context)
        +send_query()
        +browse_file()
        +browse_folder()
        +export_conversation()
        +delete_database()
    }
    RAGInterface --> LocalHuggingFaceEmbeddings : uses
    RAGInterface --> ChromaDB : queries
    RAGInterface --> DocumentLoader : processes files
    DocumentLoader --> LocalHuggingFaceEmbeddings : generates embeddings
    DocumentLoader --> ChromaDB : stores vectors
```
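`DocumentLoader.calculate_chunk_ids(chunks)` must produce stable, unique IDs so that re-ingesting the same file does not duplicate vectors in Chroma. One plausible scheme, sketched below under the assumption that each chunk carries `source` and `page` metadata (the ID format `source:page:index` is an assumption, not confirmed by this document):

```python
def calculate_chunk_ids(chunks: list[dict]) -> list[dict]:
    """Assign each chunk a deterministic ID of the form 'source:page:index',
    where the index restarts whenever the (source, page) pair changes.
    Hypothetical sketch; chunk metadata keys are assumed."""
    last_page_id = None
    index = 0
    for chunk in chunks:
        page_id = f"{chunk['source']}:{chunk['page']}"
        index = index + 1 if page_id == last_page_id else 0
        chunk["id"] = f"{page_id}:{index}"
        last_page_id = page_id
    return chunks
```

Because the IDs are deterministic, `add_documents(documents, ids)` can be made idempotent: chunks whose IDs already exist in the database can simply be skipped.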

Component Details

Integration & Data Flow

All components are designed to operate offline. The data flow starts with document ingestion and embedding creation, followed by storage in the Chroma vector database. When a query is processed, the system retrieves relevant documents using semantic search, and the LLM generates a response based on the retrieved context.
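The semantic search in the retrieval step reduces to ranking stored vectors by similarity to the query vector; Chroma handles this internally, but the core idea can be sketched in plain Python using cosine similarity (the `retrieve_top_k` helper and its dict-of-vectors input are illustrative, not the actual Chroma API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_top_k(query_vec: list[float],
                   doc_vecs: dict[str, list[float]],
                   k: int = 3) -> list[str]:
    """Return the IDs of the k stored vectors most similar to the query
    (illustrative stand-in for the vector database's similarity_search)."""
    ranked = sorted(doc_vecs.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

In the real system the query vector comes from `embed_query` and the stored vectors from `embed_documents`, so both sides live in the same embedding space.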

Final Notes

The system is modular, allowing each component to be updated independently. It scales through batch processing during ingestion and includes resource management for GPU/MPS environments. The user interface provides interactive feedback to ensure a smooth experience.