User and Deployment Manual

Contents

  Phase 1: Offline-LLM Powered Literature Review Tool
  Phase 2: Offline LLM with Ossia Voice

Phase 1: Offline-LLM Powered Literature Review Tool

Introduction

The Offline-LLM Powered Literature Review Tool is designed to enhance small-scale LLMs with retrieval-augmented generation (RAG). This manual provides a step-by-step guide to setting up and using the system.

Getting Started

System Requirements:

  • Operating System: Windows, macOS (Apple Silicon, M1 or later)
  • Memory: 16GB recommended
  • Storage: 30GB free space
  • GPU: NVIDIA GPU with CUDA support or Apple Silicon (for MPS acceleration) preferred
  • Python: 3.10, 3.11 or 3.12 - UCL CS Hub Setup Guide
  • For Apple MPS acceleration debugging, see Accelerated PyTorch on Mac

Running Steps (CPU and Apple MPS accelerated):

  1. Download the software executable package for your OS.
  2. Extract files and navigate to the directory.
  3. Run the executable. (A terminal window will open first; please wait for the application to load, as this may take some time depending on your device.)

If you have problems running the macOS executable or Windows .exe, please refer to the steps below.

Deployment Steps from source code (CPU and Apple MPS accelerated):

  1. Clone the project repository:
    git clone https://github.com/nigelm48/COMP0016_Group27_2024-25.git
  2. Navigate to the project directory:
    cd COMP0016_Group27_2024-25
  3. Manually download the required models (multilingual-e5-small and Qwen2.5-1.5B) from Hugging Face; if you use the code from OneDrive, the models may already be included in the folder. A download sketch is given after these steps.
  4. Place the downloaded model directories inside the project root directory, maintaining the following structure:
    COMP0016_Group27_2024-25/
    ├── multilingual-e5-small/       # Directory containing the multilingual-e5-small model files
    ├── Qwen2.5-1.5B/                # Directory containing the Qwen2.5-1.5B model files
  5. Install the required Python libraries (please do this in a clean environment or venv):
    pip install -r requirements.txt
  6. Compile the code:
    pyinstaller build.spec
    OR
    python -m PyInstaller build.spec
  7. Run the executable file in the dist folder.
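
For step 3, the models can also be fetched with the huggingface_hub Python package. The following is only a minimal sketch; the repository IDs intfloat/multilingual-e5-small and Qwen/Qwen2.5-1.5B are assumptions, so please verify them on Hugging Face before downloading:

    # Minimal sketch: download both models into the project root.
    # The repo IDs below are assumptions -- confirm them on Hugging Face first.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="intfloat/multilingual-e5-small",
                      local_dir="multilingual-e5-small")
    snapshot_download(repo_id="Qwen/Qwen2.5-1.5B",
                      local_dir="Qwen2.5-1.5B")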

Running Steps (CUDA Accelerated):

  1. Download and install the latest NVIDIA driver and CUDA toolkit compatible with your GPU from: https://developer.nvidia.com/cuda-downloads
  2. Clone the project repository:
    git clone https://github.com/nigelm48/COMP0016_Group27_2024-25.git
  3. Navigate to the project directory:
    cd COMP0016_Group27_2024-25
  4. Manually download the required models (multilingual-e5-small and Qwen2.5-1.5B) from Hugging Face; if you use the code from OneDrive, the models may already be included in the folder. The download sketch in the previous section also applies here.
  5. Place the downloaded model directories inside the project root directory, maintaining the following structure:
    COMP0016_Group27_2024-25/
    ├── multilingual-e5-small/       # Directory containing the multilingual-e5-small model files
    ├── Qwen2.5-1.5B/                # Directory containing the Qwen2.5-1.5B model files
  6. Install the required Python libraries (please do this in a clean environment or venv):
    pip install -r requirements.txt

    Make sure you install a CUDA-enabled build of torch.

    If not, uninstall torch (pip uninstall torch) and reinstall with:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

    A verification sketch is given after these steps.
  7. For CUDA: Ensure your system uses a Python version >= 3.10 and your environment supports CUDA.
  8. Run the main application:
    python main.py
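
To confirm that the installed PyTorch build can actually see the GPU, a quick sanity check (not part of the official steps) is:

    import torch  # a CUDA build reports a version such as 2.x.x+cu121

    print("Torch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))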

Deployment Steps (CUDA accelerated):

  1. After completing the installation steps above, compile the code:
    pyinstaller build.spec
    OR
    python -m PyInstaller build.spec
  2. Run the executable file in the dist folder.

Sample data:

Sample data can be found in the sample_data folder of the OneDrive.

Features

  • RAG-enhanced LLM responses.
  • Offline functionality for privacy.
  • RAG with Chroma Database.
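
To illustrate the "RAG with Chroma Database" feature, the following is a minimal sketch of how document chunks could be embedded with multilingual-e5-small and stored and queried in a Chroma database. The function names, collection name, and paths are illustrative assumptions, not the application's actual code:

    # Illustrative sketch only -- names and paths are assumptions.
    import chromadb
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("multilingual-e5-small")   # local model directory
    client = chromadb.PersistentClient(path="chroma_db")      # on-disk, fully offline
    collection = client.get_or_create_collection("documents")

    def add_chunk(chunk_id: str, text: str) -> None:
        # e5 models expect a "passage: " prefix for stored text
        emb = embedder.encode("passage: " + text).tolist()
        collection.add(ids=[chunk_id], documents=[text], embeddings=[emb])

    def retrieve(question: str, k: int = 3) -> list[str]:
        # e5 models expect a "query: " prefix for the question
        emb = embedder.encode("query: " + question).tolist()
        return collection.query(query_embeddings=[emb], n_results=k)["documents"][0]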

Usage

  • Load documents from the User Folder.
  • Manage and remove the database as needed.
  • Ask questions and receive LLM-generated responses.
  • Export chat history for future reference.

Literature Review Tool Interface

Figure 1: Screenshot of the Literature Review Tool interface showing the document query and response system

AI RAG Assistant Interface Explanation

The GUI (Graphical User Interface) of the AI RAG Assistant includes the following elements:

  1. Model Information (Top Bar)
    • Shows the LLM path and the Embedding model used.
    • In this case:
      • LLM: Located in /Users/haochengxu/Documents/COMP0016_Group27_2024-25/jids/AI_RAG_Assistant/_internal/Qwen2.5-1.5B
      • Embedding model: multilingual-e5-small
  2. Settings of Font (Top Bar)
    • Allows users to adjust the AI assistant's font size.
  3. Main Text Display Area
    • The large blank space is where responses and interactions appear.
  4. Input Box (Below the Text Display)
    • Users type their queries here before clicking Send to get AI responses.
  5. Control Buttons (Below the Input Box)
    • Load Folder: Import a directory of documents.
    • Load File: Import a single document.
    • Delete Database: Remove the stored vector database.
    • Export Chat: Save the chat history.
    • Send: Submit a query to the AI assistant.
  6. Filtering and Exclusion Options
    • Do Not Include Items: Users can specify documents or terms to be excluded from responses.
    • Filter Field: Allows users to refine the Do Not Include Items based on keywords.
    • Apply Filter and Sort Items: Help manage Do Not Include Items efficiently.
  7. Helper (Top left)
    • Displays an introduction to the software and example usage.
  8. Search (Top left)
    • Search within the main display area
    • <-: previous result
    • ->: next result
    • Clear: remove all search marks and results

Workflow

The typical workflow of the AI RAG Assistant involves the following steps:

  1. Document Loading: Import documents from the User Folder or as a single file.
  2. Add Do Not Include Items: Ensure data safety by excluding documents or terms from responses.
  3. Query Submission: Ask questions and receive AI-generated responses.
  4. Export Chat History: Save the chat history for future reference.
  5. Database Management: Remove the database or import more documents into it as needed.
  6. Close the app: The app will automatically release resources.

Workflow guide video

Troubleshooting

Q: How do I reset the database?

A: Use the "Delete Database" button below the input box.

Q: Can I add new document sources?

A: Yes, PDF, Markdown, and Word files can be added.

Q: Is my data stored online?

A: No, the entire workflow remains offline for privacy.

Q: I have trouble compiling.

A: Please add magic to the excludes=[] list in build.spec.
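
In build.spec, this means adding 'magic' to the excludes argument of the Analysis block, roughly as in the excerpt below (a sketch only; keep your other Analysis options unchanged):

    a = Analysis(
        ['main.py'],
        # ... your other Analysis options stay as they are ...
        excludes=['magic'],  # exclude the 'magic' module to work around the build error
    )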

Q: I have trouble running on macOS.

A: Please refer to https://developer.apple.com/metal/pytorch/ to install MPS-enabled PyTorch.
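
After installing, a quick check that the MPS backend is actually available (a sanity check, assuming PyTorch is installed in the same environment):

    import torch

    # Both should be True on Apple Silicon with a Metal-enabled PyTorch build
    print("MPS built:", torch.backends.mps.is_built())
    print("MPS available:", torch.backends.mps.is_available())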

Phase 2: Offline LLM with Ossia Voice

Introduction

Ossia Voice is an accessibility tool for Augmentative and Alternative Communication designed to help people with significant speech and motion difficulties, such as those with Motor Neurone Disease. This offline version eliminates the need for API keys or internet connectivity.

Getting Started

System Requirements:

  • Operating System: Windows 10/11, macOS 12+
  • Memory: 16GB recommended
  • Storage: 20GB free space
  • GPU: NVIDIA with CUDA support or Apple Silicon (for MPS acceleration); a standard x86 CPU also works
  • Chrome browser
  • Internet speed: at least 100 Mbps recommended
  • Node.js: latest version

Installation and Deployment Steps:

  1. Download the Ossia Voice offline package from our GitHub repository
  2. Extract the downloaded archive to your preferred location
  3. Install Node.js from https://nodejs.org/en
  4. Choose the version that fits your needs:

    Standard Version

    git clone https://github.com/Rainy-Day04/OssiaVoiceOffline.git
    cd OssiaVoiceOffline
    npm install
    npm run dev

    Diarization Version

    git clone https://github.com/Rainy-Day04/OssiaVoiceOffline.git
    cd OssiaVoiceOffline
    git checkout stt-diarization
    git pull
    npm install
    npm run dev

    Realtime STT Version

    git clone https://github.com/Rainy-Day04/OssiaVoiceOffline.git
    cd OssiaVoiceOffline
    git checkout stt-realtime-whisper
    git pull
    npm install
    npm run dev
  5. Follow the on-screen instructions to complete setup

Features

  • Offline LLM generation with no uploads to external parties
  • Multiple experimental voice input options
  • Accessible interface for users with MND
  • Less than one word typed per chat
  • Integration with assistive devices and switches
  • No API costs or usage limits

UI

Ossia Voice Setup Screen

Figure 2: Screenshot of the Ossia Voice UI

Interface Controls:

  • Main input area (top left): type a message or enter it by voice input
  • Saved phrases panel (bottom left): set the keywords (first option), tone (second option) and topic (third option), then generate sentences
  • Message center (bottom right): choose the generated suggestions to submit
  • Settings menu (top-left gear icon): Initial Setup

Usage

  1. Initial Setup: Finish the settings and backstory.

Ossia Voice Setup Screen

Figure 3: Screenshot of the Ossia Voice setup screen where users can configure voice settings and backstory

  2. Usage: follow the flow shown in the figure below.

    Figure 4: Usage flow of the Ossia Voice

Workflow guide video

Troubleshooting

Q: Why is the voice output not working?

A: Check your microphone and browser microphone permission settings.

Q: How do I change the voice settings?

A: Use the settings menu.

Q: What should I do if the UI fails to load a model?

A: Please check your internet connection and browser privacy settings.

If this does not solve the issue, please switch to Google Chrome.

Support

For additional support or to report issues, please contact us through the GitHub repository.