GDPR & Privacy Considerations
Our project comprises two distinct experiments, each with its own privacy considerations. As offline AI solutions, both systems prioritize data privacy by design: all data processing occurs locally, in line with GDPR principles and the legal guidelines outlined in System engineering.
Phase 1: Offline LLM-Powered Literature Review Tool
License: Please contact UCL GDIHUB for licensing information
For more details, visit UCL Global Disability Innovation Hub
Data Collection
The literature review tool processes the following data types entirely offline, in compliance with GDPR:
- Document Content: Academic papers, research documents, and literature uploaded by researchers are all processed locally.
- User Queries: Search terms and questions; chat history can be exported to local text files.
- User Interactions: Filter selections and choices for excluding certain items.
Privacy Advantage: All processing occurs locally on the user's machine with no data transmitted to external servers, which supports data minimization and purpose limitation.
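One way to back up the "no data transmitted" property in code is to switch the model-loading stack into offline mode before anything is imported. The sketch below assumes the tool loads its models through Hugging Face `transformers` (which appears in the library list further down); `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are documented Hugging Face environment variables, while the helper name `enable_offline_mode` is hypothetical and not taken from the project.

```python
import os


def enable_offline_mode() -> dict:
    """Set the Hugging Face offline flags for this process.

    Assumption: models are already cached locally. Call this before
    importing transformers, because the flags are read at import time.
    """
    flags = {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}
    os.environ.update(flags)
    return flags


# Usage: run once at start-up, before any model library is imported.
active_flags = enable_offline_mode()
print(active_flags)
```

With these flags set, the libraries are expected to use only locally cached files and to raise an error instead of attempting a download.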
Data Storage & Security
- All documents and derived vector embeddings are stored locally in a secure database on the user's device.
- Sensitive research materials are never written or exported to the cloud.
- History and generated responses remain solely on the local system.
- Users maintain complete control over document insertion and deletion.
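As an illustration of keeping chat history on the local system, exporting it can be as simple as writing a plain text file to disk. This is a minimal sketch, not the tool's actual export code: the function name `export_chat_history`, the `(role, text)` message format, and the file name are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def export_chat_history(messages, out_path):
    """Write (role, text) pairs to a UTF-8 text file on the local disk."""
    out_path = Path(out_path)
    # Create the target folder if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{role}: {text}" for role, text in messages]
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out_path


# Usage: write a short exchange to a temporary local folder.
demo_dir = Path(tempfile.mkdtemp())
demo_file = export_chat_history(
    [
        ("user", "Which papers discuss assistive technology?"),
        ("assistant", "Three documents match."),
    ],
    demo_dir / "chat_history.txt",
)
print(demo_file.read_text(encoding="utf-8"))
```

Because the export only touches the local file system, the user can inspect, move, or delete the resulting file with ordinary file tools.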
GDPR Compliance Measures
- Right to Access: Users can directly access all their data via the local file system.
- Right to Erasure: Document collections can be deleted with a dedicated button.
- Data Minimization: Only processes documents explicitly added by the user.
- Purpose Limitation: Data is used exclusively for document analysis and retrieval functions.
User Controls
- Full control over which documents are indexed.
- Options to exclude specific sections from analysis.
- Database clearing functionality to remove all processed data.
- No usage statistics collection.
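The database clearing control above can be pictured as deleting the folder that holds the local store. The following is a sketch under stated assumptions rather than the tool's implementation: it assumes the processed data lives in a single local directory, and the names `clear_local_database`, `vector_store`, and `embeddings.bin` are hypothetical. In a real application any open database handle would likely need to be released before the folder can be removed.

```python
import shutil
import tempfile
from pathlib import Path


def clear_local_database(db_dir):
    """Remove the local database folder and everything in it.

    Returns True if a folder was deleted, False if there was nothing to delete.
    """
    db_dir = Path(db_dir)
    if not db_dir.is_dir():
        return False
    shutil.rmtree(db_dir)
    return True


# Usage: build a throwaway folder that stands in for the real store.
demo_db = Path(tempfile.mkdtemp()) / "vector_store"
demo_db.mkdir()
(demo_db / "embeddings.bin").write_bytes(b"\x00")
removed = clear_local_database(demo_db)
print(removed, demo_db.exists())
```

Returning a boolean lets a button handler tell the user whether anything was actually erased.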
Phase 2: Offline LLM with Ossia Voice
License: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0 DEED)
In addition, the contributor agreement below applies. If these terms do not suit your needs, please reach out to arrange a suitable collaboration.
Any contributor, by adding to or adapting the contents of this repository, accepts that the original author (@arneyjfs) retains full ownership over its contents.
The information below applies when OpenAI is NOT selected as the model (the OpenAI option is kept for compatibility).
If OpenAI is selected, data will be passed to OpenAI for processing; please refer to the official OpenAI website for more information.
Data Collection
The Ossia Voice system processes the following data types:
- User Input: Text, voice, and commands (button presses) entered into the system.
- Generated Speech: Voice outputs produced by the system.
- User Preferences: Voice settings and personalization options.
Privacy Advantage: Unlike current OpenAI-dependent versions, our solution processes all data locally without sending it to OpenAI, eliminating cloud privacy risks.
Special Considerations for Accessibility
As an assistive technology for individuals with NMD:
- User communication content is treated as highly sensitive personal data.
- Testing was carried out to simulate how users with disabilities would operate the software.
Data Security Measures
- All processing occurs on the user's device with no external data transmission.
- The system operates on one-way transmission (download only), eliminating network-based vulnerabilities.
GDPR Compliance
- Data Minimization: Only essential information for operation is processed and stored.
- Storage Limitation: No long-term storage of user interactions is performed.
- Transparency: No chat history is saved.
- User Rights: Users can easily delete any settings stored locally in their browser.
Privacy by Design Approach
Both phases of our project were developed with privacy as a fundamental design principle. By creating fully offline alternatives to cloud-based AI systems, we have eliminated many traditional privacy concerns while providing equivalent functionality. The approach rests on the following principles:
- Minimizing data collection to only what is necessary.
- Processing all data locally on user devices.
- Providing users with complete control over their data.
- Eliminating risks associated with data transmission and third-party processing.
- Ensuring accessibility without compromising privacy.
Legal Compliance and Data Governance
In accordance with the guidelines outlined in our course and the COMP0016 lecture materials on Legal Issues, this project adheres to:
- GDPR principles: lawfulness, fairness and transparency; purpose limitation; data minimisation; accuracy; storage limitation; integrity and confidentiality (security); and accountability.
- User rights: including access, correction, and erasure of personal data.
- Data protection by design: all processing is executed locally without transmission to external servers.
Reference & Source Libraries
Phase 1: Offline LLM-Powered Literature Review Tool
Source Libraries
- Python Standard Libraries - PSF license
  - gc: Garbage collection
  - os: Operating system interfaces for file operations
  - time: Time access
  - shutil: High-level file operations
  - argparse: Argument parsing
- GUI Framework - no license required
  - tkinter: Standard Python interface to the Tk GUI toolkit
- Machine Learning
  - torch: BSD 2-Clause License, but with a 3rd clause that prohibits others from using the name of the copyright holder or its contributors to promote derived products without written consent.
  - transformers: Apache License 2.0.
- LangChain Ecosystem - MIT license
  - langchain_chroma
  - langchain.prompts
  - langchain_community.vectorstores
  - langchain_community.document_loaders
  - langchain_text_splitters
  - langchain.schema.document
- Loading logo
  - pillow: MIT license
- Models used
  - multilingual-e5-small: MIT license
  - Qwen 2.5 series models: Qwen LICENSE AGREEMENT (https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/resolve/main/LICENSE)
Phase 2: Ossia Voice with Offline LLM
Source Libraries
- STT Libraries
  - huggingface/transformers: Apache License Version 2.0
  - vue: MIT License
  - Whisper models of different sizes, onnx format: MIT License
  - onnx-community/pyannote-segmentation-3.0: MIT License
- TTS Libraries
  - huggingface/transformers: Apache License Version 2.0
  - Xenova/speecht5_tts: MIT License
- Offline LLM Libraries
  - huggingface/transformers: Apache License Version 2.0
  - Gemma2: Apache License Version 2.0
- All other libraries
  - huggingface/transformers: Apache License 2.0
  - @lc-ai/web-llm: Apache License 2.0
  - @xenova/transformers: Apache License 2.0
  - openai: N/A
  - pinia: MIT License
  - vue-router: MIT License
  - vuetify: MIT License
  - @mdi/font: Apache 2.0 and MIT
  - @rushstack/eslint-patch: MIT License
  - @vitejs/plugin-vue: MIT License
  - @vue/eslint-config-prettier: MIT License
  - @vue/test-utils: MIT License
  - cypress: MIT License
  - eslint: MIT License
  - eslint-plugin-cypress: MIT License
  - eslint-plugin-vue: MIT License
  - jsdom: MIT License
  - prettier: MIT License
  - sass: MIT License
  - start-server-and-test: N/A
  - vite: MIT License
  - vitest: MIT License
Reference Projects
- Speech Processing research (non-MoSCoW-listed feature)
  - Whisper realtime
    - Reused this project's worker file and its chunk-processing concept: https://github.com/huggingface/transformers.js/blob/main/examples/webgpu-whisper/src/worker.js
    - Licensed under Apache License Version 2.0.