MoSCoW achievement table

Phase 1

ID | Requirement | Priority | State | Contributors
1 | Have working GUI with AI response and textbox to enter keywords or prompt text | Must | Yes | Nigel, Haocheng
2 | Be able to extract text from PDFs | Must | Yes | Nigel
3 | Generate a suitable response using an offline LLM | Must | Yes | Nigel, Neethesh, Haocheng, Zizhou
4 | Use RAG and a vector database to search documents | Must | Yes | Neethesh, Haocheng, Zizhou
5 | Chunk texts for easy retrieval and more understandable responses | Should | Yes | Nigel
6 | Add support to choose from files or folders | Should | Yes | Nigel
7 | Save history of previous responses | Should | Yes | Haocheng
8 | Add support for Word documents | Could | Yes | Haocheng
9 | Add support for Markdown files | Could | Yes | Haocheng
10 | Add support for multi-column documents | Could | Yes | Nigel, Neethesh
11 | Allow for Do-Not-Include items | Could | Yes | Haocheng
12 | Filters and option to sort | Could | Yes | Haocheng
13 | Be able to link to similar online resources/OneDrive for further study | Could | Partial (able to extract from OneDrive if already logged in on the system and the download button is clicked) | Haocheng, Nigel, Zizhou, Neethesh
14 | Add support for queries regarding images in documents | Could | No | -
15 | Add built-in helper page | Could | Yes | Haocheng
Key Functionalities (Must and Should): 100%
Optional Functionalities (Could): 81%

Phase 2

ID | Requirement | Priority | State | Contributors
1 | Speech To Text working locally in browser | Must | Yes | Haocheng
2 | Text To Speech working locally in browser | Must | Yes | Neethesh, Nigel
3 | Use offline LLM to generate possible keywords | Must | Yes | Zizhou
4 | Allow users to choose between offline/online models and STT models | Must | Yes | Nigel
5 | Have suitable expression | Should | Yes | Nigel, Neethesh, Haocheng, Zizhou
6 | Ensure that the LLM generates a wide array of responses based on mood | Should | Yes | Zizhou
7 | Diarisation to discern between multiple speakers** | Could | Yes | Haocheng
8 | Allow user to upload voice clips for voice cloning in the user's own voice | Could | Partial (voice cloning works by creating a speaker embedding; however, this is done using a Python script rather than on the client side as per the initial requirements) | Nigel, Neethesh
9 | Support real-time speech-to-text using Whisper with finalization for larger models** | Could* | Yes | Haocheng
Key Functionalities (Must and Should): 100%
Optional Functionalities (Could): 83%

* (Not in MoSCoW document but preferred by client as a research feature)

** (More research than feature; client does not expect them in the main branch)

Non-Functional Requirements (Both)

ID | Requirement | Priority | State | Contributors
10 | Be easily usable for patients | Must | Yes | All
11 | Be easily perceivable and maintainable for future development | Must | Yes | All
12 | Have minimal latency or wait times while using the app | Should | Yes | All
13 | Compile a Windows executable (Phase 1) | Should | Yes | All
14 | Compile a macOS executable (Phase 1) | Could | Yes | All

Known Bug List

In this project, our team members use GitHub to update the project and fix bugs. Once a bug is discovered, a team member is assigned to identify the issue and find a solution. As a result, we have fixed almost all bugs that would seriously impact the user experience.

In the end, only two bugs remained unfixed:

ID | Which Part | Known Bug | Priority
1 | Phase 1 | The software is slow when uploading more than ten files at a time | Low
2 | Phase 2 | LLMs smaller than 1B parameters may fail to output a valid JSON answer | Low
Bug-fix commits on our GitHub project

Individual contribution distribution table for Phase 1

Work Packages | Haocheng Xu | Nigel Mathews | Neethesh Neethesh | Zizhou Shi
UI Design | 50% | 50% | 0% | 0%
LLM | 30% | 20% | 20% | 30%
Database | 25% | 10% | 40% | 25%
Testing | 40% | 20% | 20% | 20%
Overall | 25% | 25% | 25% | 25%

Individual contribution distribution table for Phase 2

Work Packages | Haocheng Xu | Nigel Mathews | Neethesh Neethesh | Zizhou Shi
UI Design | 15% | 70% | 0% | 15%
LLM | 0% | 0% | 0% | 100%
Text-To-Speech | 0% | 15% | 85% | 0%
Speech-To-Text | 100% | 0% | 0% | 0%
Testing | 40% | 20% | 20% | 20%
Overall | 25% | 25% | 25% | 25%

Individual contribution distribution table for website

Work Packages | Haocheng Xu | Nigel Mathews | Neethesh Neethesh | Zizhou Shi
Home page | 0% | 100% | 0% | 0%
Requirement | 10% | 80% | 10% | 0%
Research | 20% | 20% | 20% | 40%
System Design | 0% | 0% | 100% | 0%
Implementation | 0% | 0% | 100% | 0%
UI Design | 50% | 50% | 0% | 0%
Testing | 100% | 0% | 0% | 0%
Evaluation | 0% | 0% | 0% | 100%
User Manual | 100% | 0% | 0% | 0%
GDPR & Privacy | 100% | 0% | 0% | 0%
Development Blog | 0% | 100% | 0% | 0%
Monthly Videos | 25% | 25% | 25% | 25%
Overall | 25% | 25% | 25% | 25%

Note: Tasks are not weighted equally, so the per-task percentages do not necessarily average to the figures in the Overall row.

Critical Evaluation

Project management

Throughout all stages of the project, we used GitHub to update code and fix bugs. At the same time, we held weekly meetings with our client to provide progress updates. We also had a clear timeline, outlining the tasks we needed to complete each week.

In addition to coding, we regularly updated our research findings on Notion and shared them with our teammates and client. These measures helped us manage the project's progress effectively, and we did not encounter any management issues through to the end of the project.

Functionality

In our Phase 1 project, users could upload PDF, DOCX, and Markdown (MD) files, covering the most common data types. Additionally, users could ask questions via the input box using their keyboard and add negative prompts to filter responses.
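
For context, supporting these formats mainly comes down to dispatching extraction by file type. The sketch below is a minimal illustration assuming the pypdf and python-docx libraries; it is not necessarily the exact code or libraries used in our implementation.

```python
from pathlib import Path

from docx import Document    # python-docx, for Word files (illustrative choice)
from pypdf import PdfReader  # for PDF files (illustrative choice)


def extract_text(path: str) -> str:
    """Return the plain text of a PDF, DOCX, or Markdown file."""
    file = Path(path)
    suffix = file.suffix.lower()
    if suffix == ".pdf":
        reader = PdfReader(str(file))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        doc = Document(str(file))
        return "\n".join(p.text for p in doc.paragraphs)
    if suffix in {".md", ".markdown"}:
        return file.read_text(encoding="utf-8")
    raise ValueError(f"Unsupported file type: {suffix}")
```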

In Phase 2, we introduced multiple input methods, including voice input and text input. Users could also modify responses in various ways, such as selecting keywords and adjusting the tone of the answers.

Overall, we believe we successfully implemented the planned functionality.

Stability

We put great effort into the stability of our software. Throughout the project, we conducted tests to ensure that all of the software's functionality worked properly.

Additionally, we implemented various fallback measures, such as regenerating a response when the answer did not match the requirements. These measures ensured that our software could run stably.
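
As a rough illustration of that fallback, the sketch below regenerates an answer until it parses as valid JSON or a retry limit is reached; the `generate` callable, retry limit, and expected field name are hypothetical placeholders rather than our exact implementation.

```python
import json
from typing import Callable

MAX_RETRIES = 3  # hypothetical limit; the real value may differ


def generate_checked(generate: Callable[[str], str], prompt: str) -> dict:
    """Call the LLM and regenerate until the answer is valid JSON (or retries run out)."""
    last_error: Exception | None = None
    for _ in range(MAX_RETRIES):
        raw = generate(prompt)
        try:
            answer = json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err          # malformed output: regenerate
            continue
        if "response" in answer:      # minimal schema check (assumed field name)
            return answer
    raise RuntimeError(f"No valid answer after {MAX_RETRIES} attempts: {last_error}")
```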

Efficiency

To ensure that our product is compatible with most users' devices, we adopted the most cost-effective approach in both model and solution selection.

For example, we chose the model that performed best under the lowest hardware requirements as our final solution and used quantized versions to minimize memory usage. Additionally, we used a specialized embedding model to handle the embedding task, ensuring that our project achieves good performance with minimal resource consumption.
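
For illustration only, such a split might look like the sketch below, which pairs a quantized GGUF chat model loaded through llama-cpp-python with a small sentence-transformers embedding model; the specific libraries, model files, and prompt format are assumptions, not a record of our exact stack.

```python
from llama_cpp import Llama                       # runs quantized GGUF models locally
from sentence_transformers import SentenceTransformer

# Quantized 3B chat model (the file name is an illustrative placeholder).
llm = Llama(model_path="models/llama-3.2-3b-instruct.Q4_K_M.gguf", n_ctx=4096)

# Lightweight dedicated embedding model for the vector database.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")


def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Embed document chunks for retrieval; the chat model never does embedding work."""
    return embedder.encode(chunks, normalize_embeddings=True).tolist()


def answer(question: str, context: str) -> str:
    """Generate an answer from retrieved context using the quantized chat model."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    result = llm(prompt, max_tokens=256)
    return result["choices"][0]["text"]
```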

Compatibility

Our project takes into account the performance of users' devices so that it can run on as many of them as possible. Both our Phase 1 and Phase 2 projects offer Windows and Mac versions and support hardware acceleration across all platforms, ensuring that users can run our software smoothly.

In Phase 1, we adopted a combination of a 3B model and a lightweight embedding model, with a total memory requirement of no more than 16GB, ensuring that it could run on most mainstream devices with 16GB of RAM.

In Phase 2, we provided users with three model options: 8B, 3B, and the OpenAI API. If the user's device has lower performance, such as a mobile device or an older computer, they can choose to run the 3B model or use OpenAI's API. If their device has better performance and they want a better user experience, they can choose the 8B model.
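
A simplified version of that choice is sketched below, using available system memory as the deciding factor; the psutil check, thresholds, and model identifiers are illustrative assumptions rather than the exact logic in our app.

```python
import psutil  # cross-platform system information


def choose_model(prefer_quality: bool, has_api_key: bool) -> str:
    """Suggest a model tier from total RAM and user preference (illustrative thresholds)."""
    total_gb = psutil.virtual_memory().total / 2**30
    if total_gb < 8:
        # Low-end or mobile device: fall back to the hosted API if the user has a key.
        return "openai-api" if has_api_key else "local-3b"
    if prefer_quality and total_gb >= 16:
        return "local-8b"   # enough headroom for the 8B model
    return "local-3b"       # safe default for most devices
```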

Maintainability

For our project's maintainability, we have structured our code by writing separate functions for different features, making it easier to modify and update specific functionalities without affecting the entire system. This modular approach enhances the readability and scalability of our code.

However, one drawback is that bug fixes and patches were not well managed. Some patches were added without proper documentation or organization, which could lead to difficulties in tracking and maintenance in the long run.

Overall, we think we did a good job on maintainability, but there are some drawbacks that can be improved in future projects.

User experience

During the project showcase day, we demonstrated our demo to many guests and recorded their user experiences and suggestions. Many users were amazed by our product and found it very useful. Of course, we also received some suggestions for improvement.

One particularly valuable suggestion was whether our project could be integrated with visual input devices, such as eye-tracking technology, to help users input their thoughts more quickly. This is especially relevant because our target users typically have mobility disabilities. While our product helps reduce the amount of input required, entering keywords can still be quite challenging for them. This suggestion is very useful and has provided inspiration for our future plans.

Picture taken with the guests

Future Work

Integration with Visual Input Devices

One particularly valuable improvement we plan to test is the integration of our project with visual input devices, such as eye-tracking technology. This would allow users to input their thoughts faster. This feature is especially important because our target users often have mobility disabilities, making traditional input methods challenging. While our product already helps reduce the amount of input required, entering keywords can still be difficult for some users. By incorporating visual input technology, we aim to make our product more accessible, efficient, and user-friendly for those who need it most.

Expanding Compatibility Across More Devices

We aim to optimize our project for more devices, particularly mobile devices. This will ensure that users can run the software on a wider range of hardware, improving accessibility and flexibility.

Embedding Voice on the Client Side

The initial plan to run voice cloning entirely on the client side was hindered by the difficulty and complexity of creating a transformers pipeline for the voice embedding models. In the future, with more research and knowledge in this field, we plan to move voice embedding to the client side by creating such a pipeline.
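
For reference, the current script-based step roughly amounts to extracting a fixed-size speaker embedding from an uploaded clip. The sketch below is a minimal illustration assuming a pretrained SpeechBrain ECAPA-TDNN encoder; the actual script and embedding model we used may differ.

```python
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Pretrained speaker-embedding encoder (assumed model choice; ours may differ).
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)


def voice_embedding(wav_path: str) -> torch.Tensor:
    """Return a fixed-size speaker embedding for a user-supplied voice clip."""
    signal, sample_rate = torchaudio.load(wav_path)
    signal = signal.mean(dim=0, keepdim=True)              # mix down to mono
    if sample_rate != 16000:                               # the encoder expects 16 kHz audio
        signal = torchaudio.functional.resample(signal, sample_rate, 16000)
    return encoder.encode_batch(signal).squeeze()          # e.g. a 192-dimensional vector


if __name__ == "__main__":
    emb = voice_embedding("user_clip.wav")
    torch.save(emb, "speaker_embedding.pt")  # later consumed by the TTS voice-cloning step
```

Moving this step into the browser would require an equivalent pipeline running client-side, which is the part that still needs further research.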

Third-Party Software Integration

We plan to integrate our project with third-party applications, such as WhatsApp. This will enable users to access and use our software directly within their chat apps, eliminating the need to open a separate application. This seamless integration will enhance user convenience and improve the overall experience.