GDPR & Privacy Considerations

Our project comprises two distinct experiments, each with its own privacy considerations. Both are offline AI solutions built with data privacy by design: all data processing occurs locally, in line with GDPR principles and the legal guidelines outlined in System engineering.

Phase 1: Offline LLM-Powered Literature Review Tool

License: Please contact UCL GDIHUB for licensing information

For more details, visit UCL Global Disability Innovation Hub

Data Collection

The literature review tool processes the following data types entirely offline, in compliance with GDPR:

  • Document Content: Academic papers, research documents, and literature uploaded by researchers are all processed locally.
  • User Queries: Search terms and questions entered by the user; chat history can be exported to local text files.
  • User Interactions: Filter selections and choices for excluding certain items.

Privacy Advantage: All processing occurs locally on the user's machine with no data transmitted to external servers, ensuring data minimization and purpose limitation.
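As an illustration of keeping chat history on the local machine, the following is a minimal sketch of a text-file export. The function name export_chat_history, the (question, answer) pair format, and the file naming scheme are assumptions made for this example and are not taken from the tool's source code.

```python
from datetime import datetime
from pathlib import Path


def export_chat_history(history, export_dir):
    """Write (question, answer) pairs to a timestamped UTF-8 text file on local disk.

    Hypothetical helper: the name, arguments, and file layout are illustrative only.
    """
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)

    # A timestamp in the file name keeps earlier exports from being overwritten.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out_path = export_dir / f"chat_history_{stamp}.txt"

    with out_path.open("w", encoding="utf-8") as fh:
        for question, answer in history:
            fh.write(f"Q: {question}\nA: {answer}\n\n")
    return out_path
```

Because the export only writes to a directory the user chooses, the exported history stays under the user's control and can be removed with ordinary file operations.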

Data Storage & Security

  • All documents and derived vector embeddings are stored locally in a secure database on the user's device.
  • No cloud write/export of sensitive research materials.
  • History and generated responses remain solely on the local system.
  • Users maintain complete control over document insertion and deletion.

GDPR Compliance Measures

  • Right to Access: Users can directly access all their data via the local file system.
  • Right to Erasure: A dedicated button deletes document collections in a single step.
  • Data Minimization: Only processes documents explicitly added by the user.
  • Purpose Limitation: Data is used exclusively for document analysis and retrieval functions.
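The access and erasure measures above can be sketched with Python's standard library alone. This is a hedged example, not the tool's actual implementation: the function names and the idea of a single database directory are assumptions, and a real application would also need to close its vector store client before deleting files.

```python
import gc
import shutil
from pathlib import Path


def list_local_data(db_dir):
    """Return every file under db_dir so a user can see what is stored (right of access).

    Hypothetical helper; db_dir stands in for wherever the local database lives.
    """
    root = Path(db_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


def erase_local_data(db_dir):
    """Delete the whole database directory (right to erasure).

    Returns True if data was removed, False if there was nothing to delete.
    """
    root = Path(db_dir)
    if not root.exists():
        return False
    # Encourage release of unreferenced objects that may still hold file handles open.
    gc.collect()
    shutil.rmtree(root)
    return True
```

A "clear database" button could call erase_local_data and then report the result to the user. Since nothing is stored outside the device, removing the directory removes the documents and the embeddings derived from them.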

User Controls

  • Full control over which documents are indexed.
  • Options to exclude specific sections from analysis.
  • Database clearing functionality to remove all processed data.
  • No usage statistics collection.

Phase 2: Offline LLM with Ossia Voice

License: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0 DEED)

In addition, the contributor agreement below applies. If these terms do not suit your needs, please reach out for suitable collaboration.

Any contributor, by adding to or adapting the contents of this repository, accepts that the original author (@arneyjfs) retains full ownership over its contents.

The information below applies when OpenAI is NOT selected as the model (the OpenAI option is kept for compatibility).

If OpenAI is selected, data will be sent to OpenAI for processing; please refer to the official OpenAI website for more information.

Data Collection

The Ossia Voice system processes the following data types:

  • User Input: Text, voice, and commands (button presses) entered into the system.
  • Generated Speech: Voice outputs produced by the system.
  • User Preferences: Voice settings and personalization options.

Privacy Advantage: Unlike current OpenAI-dependent versions, our solution processes all data locally and sends nothing to OpenAI, eliminating cloud privacy risks.

Special Considerations for Accessibility

As an assistive technology for individuals with NMD:

  • User communication content is treated as highly sensitive personal data.
  • Testing was carried out to simulate how users with disabilities would operate the software.

Data Security Measures

  • All processing occurs on the user's device with no external data transmission.
  • The system operates on one-way transmission (download only), eliminating network-based vulnerabilities.

GDPR Compliance

  • Data Minimization: Only essential information for operation is processed and stored.
  • Storage Limitation: No long-term storage of user interactions is performed.
  • Transparency: No chat history is saved.
  • User Rights: Users can easily delete any settings stored locally in their browser.

Privacy by Design Approach

Both phases of our project were developed with privacy as a fundamental design principle. By creating fully offline alternatives to cloud-based AI systems, we have eliminated many traditional privacy concerns while providing equivalent functionality.

Reference & Source Libraries

Phase 1: Offline LLM-Powered Literature Review Tool

Source Libraries

  • Python Standard Libraries - PSF license
    • gc - Garbage collection
    • os - Operating system interfaces for file operations
    • time - Time access
    • shutil - High-level file operations
    • argparse - Argument parsing
  • GUI Framework - no license required
    • tkinter - Standard Python interface to Tk GUI toolkit
  • Machine Learning
    • torch - BSD 3-Clause License; the third clause prohibits using the name of the copyright holder or its contributors to promote derived products without written consent.
    • transformers - Apache License 2.0.
  • LangChain Ecosystem - MIT license
    • langchain_chroma
    • langchain.prompts
    • langchain_community.vectorstores
    • langchain_community.document_loaders
    • langchain_text_splitters
    • langchain.schema.document
  • Loading logo
    • pillow - MIT license
  • Models used
    • multilingual-e5-small - MIT license
    • Qwen 2.5 series models - Qwen LICENSE AGREEMENT: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/resolve/main/LICENSE

Phase 2: Ossia Voice with Offline LLM

Source Libraries

  • STT Libraries
    • huggingface/transformers - Apache License Version 2.0
    • vue - MIT License
    • Whisper models of different sizes, onnx format - MIT License
    • onnx-community/pyannote-segmentation-3.0 - MIT License
  • TTS Libraries
    • huggingface/transformers - Apache License Version 2.0
    • Xenova/speecht5_tts - MIT License
  • Offline LLM Libraries
    • huggingface/transformers - Apache License Version 2.0
    • Gemma2 - Apache License Version 2.0
  • Other Libraries
    • huggingface/transformers - Apache License 2.0
    • @mlc-ai/web-llm - Apache License 2.0
    • @xenova/transformers - Apache License 2.0
    • openai N/A
    • pinia - MIT License
    • vue-router - MIT License
    • vuetify - MIT License
    • @mdi/font - Apache 2.0 and MIT
    • @rushstack/eslint-patch - MIT License
    • @vitejs/plugin-vue - MIT License
    • @vue/eslint-config-prettier - MIT License
    • @vue/test-utils - MIT License
    • cypress - MIT License
    • eslint - MIT License
    • eslint-plugin-cypress - MIT License
    • eslint-plugin-vue - MIT License
    • jsdom - MIT License
    • prettier - MIT License
    • sass - MIT License
    • start-server-and-test - N/A
    • vite - MIT License
    • vitest - MIT License

Reference Projects

  • Speech processing research (feature not listed in the MoSCoW requirements)
    • Whisper realtime - Reuses this project's worker file and its chunk-processing concept
    • https://github.com/huggingface/transformers.js/blob/main/examples/webgpu-whisper/src/worker.js
    • Licensed under Apache License Version 2.0.