Testing Both Experiments
Our project has two experiments:
- Experiment 1: Offline-LLM powered literature review tool with UCL GDIHUB
- Experiment 2: Offline LLM with Ossia Voice
Experiment 1: Offline-LLM powered literature review tool with UCL GDIHUB
Testing Strategy
Our testing approach is multi-layered to make sure everything works smoothly, from individual parts to the entire system under pressure.
We start with unit tests to check each component, then move on to client (integration) tests that mimic real user interactions, and finally, we perform stress tests to see how the system behaves under heavy load. This process helps us catch problems early, confirm that all parts work well together, and ensure the system remains reliable even under extreme usage (50+ documents loaded, repeated open/close cycles).
Unit and Integration Testing
Purpose
Unit tests target the database.py module to ensure that:
- The database can be cleared and reloaded with new data.
- The database can be queried by the RAG model.
- The database avoids duplicate entries, keeping storage efficient.
Testing Tools
- Python unittest framework
- Mocking libraries
Methodology
Run all tests using the command pytest tests.
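A minimal sketch of what these tests look like is below; LiteratureDB and its methods are hypothetical stand-ins for the actual interface in database.py, not the shipped code:

```python
# test_database.py -- illustrative unit tests for the database layer.
# LiteratureDB and its methods are hypothetical stand-ins for database.py.
import unittest

from database import LiteratureDB  # hypothetical import


class TestLiteratureDB(unittest.TestCase):
    def setUp(self):
        # Fresh in-memory database for each test.
        self.db = LiteratureDB(path=":memory:")

    def test_clear_and_reload(self):
        self.db.add_document("paper1.pdf")
        self.db.clear()
        self.assertEqual(self.db.count(), 0)
        self.db.add_document("paper2.pdf")
        self.assertEqual(self.db.count(), 1)

    def test_rag_query(self):
        self.db.add_document("paper1.pdf")
        results = self.db.query("digital health interventions", top_k=3)
        self.assertLessEqual(len(results), 3)

    def test_deduplication(self):
        # Adding the same file twice must not create a duplicate entry.
        self.db.add_document("paper1.pdf")
        self.db.add_document("paper1.pdf")
        self.assertEqual(self.db.count(), 1)


if __name__ == "__main__":
    unittest.main()
```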
Results
All 6 tests passed.
💡 Analysis & Conclusion
These tests form the baseline for all further development by ensuring that the data loaded into the database is correct.
Compatibility Testing
Purpose
To ensure our offline RAG system operates consistently across various operating systems and environments, making it accessible to all potential users regardless of their platform.
Testing Tools
- Machines with different OS configurations
Methodology
We established a systematic testing protocol across multiple environments:
- Deployed and tested the application on Windows and macOS
- Verified functionality with multiple Python versions
- Windows machines ran Windows 11 23H2/24H2 with an RTX 4060 GPU and 32 GB of RAM
- macOS machines ran macOS 15.3 on M4 Pro (24 GB) and M4 Max (48 GB) hardware (an environment-recording sketch follows this list)
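To keep results traceable to a specific OS and Python version, each run can be tagged with a small environment report; the helper below is an illustration using only the standard library, not part of the shipped tool:

```python
# env_report.py -- record the environment used for a compatibility run.
import json
import platform
import sys


def environment_report() -> dict:
    """Collect OS and Python details relevant to a compatibility run."""
    return {
        "os": platform.system(),           # e.g. "Windows" or "Darwin"
        "os_version": platform.version(),
        "machine": platform.machine(),     # e.g. "AMD64" or "arm64"
        "python": sys.version.split()[0],  # e.g. "3.11.4"
    }


if __name__ == "__main__":
    print(json.dumps(environment_report(), indent=2))
```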
Results
The application demonstrated consistent functionality across all tested platforms with only minor differences:
- Successfully ran on Windows and macOS
- Compatible with all tested Python versions
- Documentation updated with detailed installation requirements
💡 Analysis & Conclusion
Cross-platform compatibility was successfully achieved, ensuring our solution is accessible to researchers regardless of their operating system preference. The application demonstrated consistent behavior and performance across all tested environments.
Responsive Design Testing
Purpose
To ensure the user interface adapts appropriately to different screen sizes and resolutions, providing an optimal experience across various display configurations.
Testing Tools
- Screen recording software
- Various display resolutions and aspect ratios
Methodology
We implemented and tested adaptive UI features through:
- Testing the application at various screen sizes from 1080p to 2.5K resolution
- Verifying proper element repositioning during window resizing
- Ensuring UI elements remain accessible and functional at all sizes (a parametrised test sketch follows this list)
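These checks can be made repeatable with a parametrised test over a matrix of window sizes; in the sketch below, launch_app and visible_elements are hypothetical helpers, not the tool's actual API:

```python
# test_responsive.py -- illustrative window-resize checks.
# launch_app and visible_elements are hypothetical stand-ins for the UI API.
import pytest

# Window sizes spanning the manual test matrix (1080p up to 2.5K).
SIZES = [(1280, 720), (1920, 1080), (2560, 1440)]


@pytest.mark.parametrize("width,height", SIZES)
def test_ui_adapts_to_window_size(width, height):
    app = launch_app(width=width, height=height)  # hypothetical helper
    try:
        for element in visible_elements(app):     # hypothetical helper
            assert element.is_accessible(), (
                f"{element} not accessible at {width}x{height}"
            )
    finally:
        app.close()
```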
Results
The responsive design features performed effectively:
- UI successfully adapted to all tested window sizes
- Elements maintained proper proportions and accessibility
- Text remained readable across all display configurations
💡 Analysis & Conclusion
The implementation of adaptive UI successfully ensures that our application provides a consistent user experience regardless of display size or resolution. This feature is particularly valuable for researchers who may use the tool across different devices or in varied workspace configurations.
User Acceptance Testing
Purpose
To validate that our offline RAG system meets the actual needs of researchers and effectively supports their literature review workflows in real-world scenarios. This testing phase confirms that the system delivers value to end users and identifies any usability improvements before final deployment.
Testers
- UCL GDIHUB researchers as actual users
- Year 2 students as simulated users
Client Feedback
The clients expressed high satisfaction with both the operation and precision of the entire workflow.
"The system efficiently handled our needs while maintaining excellent response accuracy."
Detailed Analysis
The user acceptance testing was conducted with researchers from UCL GDIHUB, who simulated real-world usage of the system. The test cases covered the full workflow, from adding documents to the database, applying and managing filter items, querying the system using retrieval-augmented generation (RAG), verifying the results, and exporting or deleting data.
Throughout the testing process, simulated researchers interacted with the system as intended users, assessing usability, accuracy, and overall system performance. The testing confirmed that the system efficiently handled document management and filtering, provided precise and relevant responses to queries, and maintained stability during data operations.
Based on feedback, clients expressed satisfaction with the overall functionality. No major issues were reported, though minor suggestions were made for UI improvements to enhance the user experience. These insights have been incorporated into the deployed system to refine usability further.
Stress Testing
Purpose
To evaluate stability under heavy load conditions, ensuring the offline RAG system can handle extensive document collections.
Testing Tools
- Memory profiler
- Large document corpus (50+ academic papers)
Methodology
We conducted systematic stress testing through:
- Loading progressively larger document collections (5, 20, 50+ documents)
- Monitoring memory usage during extended operation periods (see the memory-monitoring sketch after this list)
- Testing with lengthy documents
- Simulating user interactions in quick succession
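The memory monitoring step can be scripted with Python's built-in tracemalloc module, as sketched below; load_documents is a hypothetical stand-in for the tool's ingestion call:

```python
# stress_memory.py -- track peak memory while loading growing batches.
# load_documents is a hypothetical stand-in for the ingestion call.
import tracemalloc

BATCH_SIZES = [5, 20, 50]  # document counts used in the stress runs


def measure_peak_mib(batch_size: int) -> float:
    """Return peak traced memory (in MiB) while loading one batch."""
    tracemalloc.start()
    load_documents(n=batch_size)  # hypothetical ingestion call
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / (1024 * 1024)


if __name__ == "__main__":
    for n in BATCH_SIZES:
        print(f"{n} documents -> peak {measure_peak_mib(n):.1f} MiB")
```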
Results
Stress testing revealed:
- Successfully handled 50+ documents without significant performance degradation
- Memory usage remained within acceptable bounds during extended sessions
- No critical failures occurred during intensive testing scenarios
💡 Analysis & Conclusion
The stress testing confirmed that our offline RAG system can reliably handle the document volumes and usage patterns expected in real-world research scenarios.
Experiment 2: Offline LLM with Ossia Voice
Testing Strategy
Our testing for the Ossia Voice project follows a comprehensive approach that addresses both technical functionality and user experience. We focus on validating each subsystem (STT, TTS, LLM) individually before moving to whole-system testing.
Unit and Integration Testing
Purpose
To verify that individual components of the Ossia voice system function correctly and work together seamlessly, ensuring reliability of the core offline speech processing functionality.
Testing Tools
- Browser console
- Debugging windows implemented as Vue components
Methodology
Our testing methodology included:
- Tests for the speech processing modules
- Tests for offline LLM responses
- End-to-end integration tests
Tests were run by all team members as well as external testers, since LLM and speech-recognition responses are unpredictable; a property-based checking sketch follows.
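Because exact-match assertions are brittle for non-deterministic output, one way to automate part of this verification is to check properties of a response rather than exact strings. A minimal Python sketch of the idea (the application itself runs in the browser, so this is illustrative only; generate_reply is a hypothetical stand-in for the offline LLM call):

```python
# property_checks.py -- illustrative property checks for LLM output.
# generate_reply is a hypothetical stand-in for the offline LLM call.

def reply_problems(reply: str, max_words: int = 30) -> list[str]:
    """Return a list of property violations for a candidate reply."""
    problems = []
    if not reply.strip():
        problems.append("reply is empty")
    if len(reply.split()) > max_words:
        problems.append("reply too long for a communication aid")
    if any(ch in reply for ch in "<>{}"):
        problems.append("reply contains markup artifacts")
    return problems


# Example: run the same prompt several times; individual runs may differ,
# but every run must satisfy the properties.
# for _ in range(5):
#     reply = generate_reply("Ask how the trip went")  # hypothetical
#     assert not reply_problems(reply)
```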
Results
Testing results demonstrated:
- All core modules passed manual verification of results
- Integration points between components work correctly
- Edge cases were identified and addressed, and errors are handled correctly
💡 Analysis & Conclusion
The unit and integration testing confirmed that our offline-LLM-powered Ossia Voice system functions reliably across all core components. Minor issues were identified and resolved during the testing process, ensuring a stable foundation for the user acceptance phase.
Compatibility Testing
Purpose
To ensure the Ossia Voice system works consistently across different operating systems, hardware configurations, and user needs.
Testing Tools
- Multiple hardware configurations
- Various operating systems (Windows, macOS)
- Different assistive input devices (on-screen keyboard, touchpad, and mocked use of eye-tracking devices)
Methodology
We conducted compatibility testing across:
- Windows 10/11 and macOS environments
- Systems with various CPU/GPU configurations
- Trials with common assistive input technologies
Results
Compatibility testing revealed:
- Successful operation across all tested operating systems
- Compatible with standard assistive input devices
💡 Analysis & Conclusion
The system demonstrated good compatibility across different environments, with adequate speech-processing and offline-LLM performance on every configuration tested.
Responsive Design Testing
Purpose
To ensure the Ossia Voice application interface adapts acceptably to different screen sizes and resolutions, providing an accessible experience for users working on standard desktops and laptops.
Testing Tools
- Browser DevTools
- Various physical devices (Windows tablets, Windows and Mac laptops, Windows desktops)
- Screen recording for further analysis
Methodology
Our responsive design testing approach included:
- Testing the application across multiple screen sizes from 1080p to 4K resolution
- Verifying button and interactive element sizing for accessibility on Chromium-based browsers
- Ensuring critical interface components stay in place
Results
The responsive design testing showed:
- Interface successfully adapted to all tested screen sizes
- Touch targets and UI elements maintained suitable sizes for users with motor impairments
- Text and speech controls and inputs remained accessible across all tested devices
- Microsoft Edge for macOS (version 134.0.3124.68) has a bug that prevents file downloads on personal accounts (it works fine with a school Microsoft account); end users are instructed to reset their settings or switch to Google Chrome
💡 Analysis & Conclusion
Our responsive design testing confirmed that the Ossia Voice application provides consistent accessibility across different devices and screen resolutions. This is particularly important for users with NMDs, who may use various devices depending on their environment and care situation. The adaptable interface ensures that users can effectively communicate with their loved ones and friends.
User Acceptance Testing
Purpose
To validate that the offline Ossia Voice system meets the needs of people with NMDs, providing an accessible and effective communication tool without requiring OpenAI API subscriptions.
Testers
- Simulated users with motor neuron disease (testers with movement of certain body parts restricted)
- Accessibility specialists
- Regular testers simulating patient's loved ones and friends
Methodology
User testing was conducted with:
- Guided setup to validate the setup procedure
- Assessment of accessibility and of the system's ability to generate suitable words
- Feedback collection through interviews and questionnaires
Results
User feedback revealed:
- High satisfaction with the offline-LLM functionality
- Positive response to voice quality and naturalness
- Appreciation for the elimination of API costs
💡 Analysis & Conclusion
User acceptance testing confirmed that our offline-LLM solution successfully addresses the original goal of making Ossia Voice accessible without API dependencies. The system provides quality similar to online solutions while eliminating subscription costs, which was particularly valued by users who depend on the system for daily communication.