MoSCoW achievement table

Phase 1

ID | Requirement | Priority | State | Contributors
1 | Have working GUI with AI response and textbox to enter keywords or prompt text | Must | Yes | Nigel, Haocheng
2 | Be able to extract text from PDFs | Must | Yes | Nigel
3 | Generate a suitable response using an offline LLM | Must | Yes | Nigel, Neethesh, Haocheng, Zizhou
4 | Use RAG and a vector database to search documents | Must | Yes | Neethesh, Haocheng, Zizhou
5 | Chunk texts for easy retrieval and more understandable responses | Should | Yes | Nigel
6 | Add support to choose from files or folders | Should | Yes | Nigel
7 | Save history of previous responses | Should | Yes | Haocheng
8 | Add support for Word documents | Could | Yes | Haocheng
9 | Add support for Markdown files | Could | Yes | Haocheng
10 | Add support for multi-column documents | Could | Yes | Nigel, Neethesh
11 | Allow for Do-Not-Include items | Could | Yes | Haocheng
12 | Filters and option to sort | Could | Yes | Haocheng
13 | Be able to link to similar online resources/OneDrive for further study | Could | Partial (able to extract from OneDrive if already logged in on the system and the download button is clicked) | Haocheng, Nigel, Zizhou, Neethesh
14 | Add support for queries regarding images in documents | Could | No | -
15 | Add built-in helper page | Could | Yes | Haocheng
Key Functionalities (Must and Should): 100%
Optional Functionalities (Could): 81%

Phase 2

ID | Requirement | Priority | State | Contributors
1 | Speech To Text working locally in browser | Must | Yes | Haocheng
2 | Text To Speech working locally in browser | Must | Yes | Neethesh, Nigel
3 | Use offline LLM to generate possible keywords | Must | Yes | Zizhou
4 | Allow users to choose between offline/online models and STT models | Must | Yes | Nigel
5 | Have suitable expression | Should | Yes | Nigel, Neethesh, Haocheng, Zizhou
6 | Ensure that the LLM generates a wide array of responses based on mood | Should | Yes | Zizhou
7 | Diarisation to discern between multiple speakers** | Could | Yes | Haocheng
8 | Allow user to upload voice clips for voice cloning in the user's own voice | Could | Partial (voice cloning works by creating a speaker embedding; however, this is done using a Python script rather than on the client side as per the initial requirements) | Nigel, Neethesh
9 | Support real-time speech-to-text using Whisper with finalization for larger models** | Could* | Yes | Haocheng
Key Functionalities (Must and Should): 100%
Optional Functionalities (Could): 83%

* (Not in MoSCoW document but preferred by client as a research feature)

** (More research than feature; client does not expect them in the main branch)

Non-Functional Requirements (Both)

ID | Requirement | Priority | State | Contributors
10 | Be easily usable for patients | Must | Yes | All
11 | Be easily perceivable and maintainable for future development | Must | Yes | All
12 | Have minimal latency or wait times while using the app | Should | Yes | All
13 | Compile a Windows executable (Phase 1) | Should | Yes | All
14 | Compile a macOS executable (Phase 1) | Could | Yes | All

Known Bug List

In this project, our team members use GitHub to update the project and fix bugs. Once a bug is discovered, a team member is assigned to identify the issue and find a solution. As a result, we have fixed almost all bugs that would seriously impact the user experience.

In the end, only two bugs remained unfixed:

ID | Which Part | Known Bug | Priority
1 | Phase 1 | The software is slow when uploading more than ten files at a time | Low
2 | Phase 2 | LLMs smaller than 1B parameters may fail to output a valid JSON answer | Low
Bug-fix commits on our GitHub project

Individual contribution distribution table for Phase 1

Work Packages | Haocheng Xu | Nigel Mathews | Neethesh Neethesh | Zizhou Shi
UI Design | 50% | 50% | 0% | 0%
LLM | 30% | 20% | 20% | 30%
Database | 25% | 10% | 40% | 25%
Testing | 40% | 20% | 20% | 20%
Overall | 25% | 25% | 25% | 25%

Individual contribution distribution table for Phase 2

Work Packages | Haocheng Xu | Nigel Mathews | Neethesh Neethesh | Zizhou Shi
UI Design | 15% | 70% | 0% | 15%
LLM | 0% | 0% | 0% | 100%
Text-To-Speech | 0% | 15% | 85% | 0%
Speech-To-Text | 100% | 0% | 0% | 0%
Testing | 40% | 20% | 20% | 20%
Overall | 25% | 25% | 25% | 25%

Individual contribution distribution table for website

Work Packages | Haocheng Xu | Nigel Mathews | Neethesh Neethesh | Zizhou Shi
Home page | 0% | 100% | 0% | 0%
Requirement | 10% | 80% | 10% | 0%
Research | 20% | 20% | 20% | 40%
System Design | 0% | 0% | 100% | 0%
Implementation | 0% | 0% | 100% | 0%
UI Design | 50% | 50% | 0% | 0%
Testing | 100% | 0% | 0% | 0%
Evaluation | 0% | 0% | 0% | 100%
User Manual | 100% | 0% | 0% | 0%
GDPR & Privacy | 100% | 0% | 0% | 0%
Development Blog | 0% | 100% | 0% | 0%
Monthly Videos | 25% | 25% | 25% | 25%
Overall | 25% | 25% | 25% | 25%

Note: Tasks are not weighted equally, so the per-task percentages do not necessarily average to the figures in the Overall row.

Critical Evaluation

Project management

Throughout all stages of the project, we used GitHub to update code and fix bugs. At the same time, we held weekly meetings with our client to provide progress updates. We also had a clear timeline, outlining the tasks we needed to complete each week.

In addition to coding, we regularly updated our research findings on Notion and shared them with our teammates and client. These measures helped us manage the project's progress effectively, and we did not encounter any management issues through to the end of the project.

Functionality

In our Phase 1 project, users could upload PDF, DOCX, and Markdown (MD) files, covering the most common data types. Additionally, users could ask questions via the input box using their keyboard and add negative prompts to filter responses.
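
For context, supporting these formats mainly comes down to dispatching extraction by file type. The sketch below is a minimal illustration assuming the pypdf and python-docx libraries; it is not necessarily the exact code or libraries used in our implementation.

```python
from pathlib import Path

from docx import Document    # python-docx, for Word files (illustrative choice)
from pypdf import PdfReader  # for PDF files (illustrative choice)


def extract_text(path: str) -> str:
    """Return the plain text of a PDF, DOCX, or Markdown file."""
    file = Path(path)
    suffix = file.suffix.lower()
    if suffix == ".pdf":
        reader = PdfReader(str(file))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        doc = Document(str(file))
        return "\n".join(p.text for p in doc.paragraphs)
    if suffix in {".md", ".markdown"}:
        return file.read_text(encoding="utf-8")
    raise ValueError(f"Unsupported file type: {suffix}")
```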

In Phase 2, we introduced multiple input methods, including voice input and text input. Users could also modify responses in various ways, such as selecting keywords and adjusting the tone of the answers.

Overall, we believe we successfully implemented the planned functionality.

Stability

We put great effort into the stability of our software. Throughout the project, we conducted tests to ensure that all of the software's functionality worked properly.

Additionally, we implemented various fallback measures, such as regenerating a response when the answer did not match the requirements. These measures ensured that our software could run stably.
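
As a rough illustration of that fallback, the sketch below regenerates an answer until it parses as valid JSON or a retry limit is reached; the `generate` callable, retry limit, and expected field name are hypothetical placeholders rather than our exact implementation.

```python
import json
from typing import Callable

MAX_RETRIES = 3  # hypothetical limit; the real value may differ


def generate_checked(generate: Callable[[str], str], prompt: str) -> dict:
    """Call the LLM and regenerate until the answer is valid JSON (or retries run out)."""
    last_error: Exception | None = None
    for _ in range(MAX_RETRIES):
        raw = generate(prompt)
        try:
            answer = json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err          # malformed output: regenerate
            continue
        if "response" in answer:      # minimal schema check (assumed field name)
            return answer
    raise RuntimeError(f"No valid answer after {MAX_RETRIES} attempts: {last_error}")
```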

Efficiency

To ensure that our product is compatible with most users' devices, we adopted the most cost-effective approach in both model and solution selection.

For example, we chose the model that performed best under the lowest hardware requirements as our final solution and used quantized versions to minimize memory usage. Additionally, we used a specialized embedding model to handle the embedding task, ensuring that our project achieves good performance with minimal resource consumption.
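
For illustration only, such a split might look like the sketch below, which pairs a quantized GGUF chat model loaded through llama-cpp-python with a small sentence-transformers embedding model; the specific libraries, model files, and prompt format are assumptions, not a record of our exact stack.

```python
from llama_cpp import Llama                       # runs quantized GGUF models locally
from sentence_transformers import SentenceTransformer

# Quantized 3B chat model (the file name is an illustrative placeholder).
llm = Llama(model_path="models/llama-3.2-3b-instruct.Q4_K_M.gguf", n_ctx=4096)

# Lightweight dedicated embedding model for the vector database.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")


def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Embed document chunks for retrieval; the chat model never does embedding work."""
    return embedder.encode(chunks, normalize_embeddings=True).tolist()


def answer(question: str, context: str) -> str:
    """Generate an answer from retrieved context using the quantized chat model."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    result = llm(prompt, max_tokens=256)
    return result["choices"][0]["text"]
```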

Compatibility

Our project takes into account the performance of users' devices so that it can run on as many of them as possible. Both our Phase 1 and Phase 2 projects offer Windows and Mac versions and support hardware acceleration across all platforms, ensuring that users can run our software smoothly.

In Phase 1, we adopted a combination of a 3B model and a lightweight embedding model, with a total memory requirement of no more than 16GB, ensuring that it could run on most mainstream devices with 16GB of RAM.

In Phase 2, we provided users with three model options: 8B, 3B, and the OpenAI API. If the user's device has lower performance, such as a mobile device or an older computer, they can choose to run the 3B model or use OpenAI's API. If their device has better performance and they want a better user experience, they can choose the 8B model.
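
A simplified version of that choice is sketched below, using available system memory as the deciding factor; the psutil check, thresholds, and model identifiers are illustrative assumptions rather than the exact logic in our app.

```python
import psutil  # cross-platform system information


def choose_model(prefer_quality: bool, has_api_key: bool) -> str:
    """Suggest a model tier from total RAM and user preference (illustrative thresholds)."""
    total_gb = psutil.virtual_memory().total / 2**30
    if total_gb < 8:
        # Low-end or mobile device: fall back to the hosted API if the user has a key.
        return "openai-api" if has_api_key else "local-3b"
    if prefer_quality and total_gb >= 16:
        return "local-8b"   # enough headroom for the 8B model
    return "local-3b"       # safe default for most devices
```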

Maintainability

For our project's maintainability, we have structured our code by writing separate functions for different features, making it easier to modify and update specific functionalities without affecting the entire system. This modular approach enhances the readability and scalability of our code.

However, one drawback is that bug fixes and patches were not well managed. Some patches were added without proper documentation or organization, which could lead to difficulties in tracking and maintenance in the long run.

Overall, we think we did a good job on maintainability, but there are some drawbacks that can be improved in future projects.

User experience

During the project showcase day, we demonstrated our demo to many guests and recorded their user experiences and suggestions. Many users were amazed by our product and found it very useful. Of course, we also received some suggestions for improvement.

One particularly valuable suggestion was whether our project could be integrated with visual input devices, such as eye-tracking technology, to help users input their thoughts more quickly. This is especially relevant because our target users typically have mobility disabilities. While our product helps reduce the amount of input required, entering keywords can still be quite challenging for them. This suggestion is very useful and has provided inspiration for our future plans.

Picture taken with the guests

Future Work

Integration with Visual Input Devices

One particularly valuable improvement we plan to test is the integration of our project with visual input devices, such as eye-tracking technology. This would allow users to input their thoughts faster. This feature is especially important because our target users often have mobility disabilities, making traditional input methods challenging. While our product already helps reduce the amount of input required, entering keywords can still be difficult for some users. By incorporating visual input technology, we aim to make our product more accessible, efficient, and user-friendly for those who need it most.

Expanding Compatibility Across More Devices

We aim to optimize our project for more devices, particularly mobile devices. This will ensure that users can run the software on a wider range of hardware, improving accessibility and flexibility.

Embedding Voice on the Client Side

The initial plan to run voice cloning entirely on the client side was hindered by the difficulty and complexity of creating a transformers pipeline for the voice embedding models. In the future, with more research and knowledge in this field, we plan to move voice embedding to the client side by creating such a pipeline.
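
For reference, the current script-based step roughly amounts to extracting a fixed-size speaker embedding from an uploaded clip. The sketch below is a minimal illustration assuming a pretrained SpeechBrain ECAPA-TDNN encoder; the actual script and embedding model we used may differ.

```python
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Pretrained speaker-embedding encoder (assumed model choice; ours may differ).
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)


def voice_embedding(wav_path: str) -> torch.Tensor:
    """Return a fixed-size speaker embedding for a user-supplied voice clip."""
    signal, sample_rate = torchaudio.load(wav_path)
    signal = signal.mean(dim=0, keepdim=True)              # mix down to mono
    if sample_rate != 16000:                               # the encoder expects 16 kHz audio
        signal = torchaudio.functional.resample(signal, sample_rate, 16000)
    return encoder.encode_batch(signal).squeeze()          # e.g. a 192-dimensional vector


if __name__ == "__main__":
    emb = voice_embedding("user_clip.wav")
    torch.save(emb, "speaker_embedding.pt")  # later consumed by the TTS voice-cloning step
```

Moving this step into the browser would require an equivalent pipeline running client-side, which is the part that still needs further research.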

Third-Party Software Integration

We plan to integrate our project with third-party applications, such as WhatsApp. This will enable users to access and use our software directly within their chat apps, eliminating the need to open a separate application. This seamless integration will enhance user convenience and improve the overall experience.