| ID | Requirement | Priority | Status | Contributors |
|---|---|---|---|---|
| Key Functionalities (Must + Should Have) | | | | |
| 1 | User can input PDFs | Must | Completed | All |
| 2 | User can enter a prompt to define the systematic review | Must | Completed | All |
| 3 | Systematic review is displayed to the user | Must | Completed | Wing Ho |
| 4 | Vector database integration | Must | Completed | All |
| 5 | LLM-powered section generation | Must | Completed | Wing Ho, Yiwei |
| 6 | User authentication system | Should | Completed | Wing Ho, Kevin |
| 7 | Review history tracking for users | Should | Completed | Wing Ho, Kevin |
| 8 | Quality assessment visualisation of the review through graphs | Should | Completed | Kevin, Jennifer |
| Optional Functionalities (Could Have) | | | | |
| 9 | Users can view the source files of each review | Could | Completed | Wing Ho |
| 10 | PDFs of the systematic review can be exported | Could | Completed | Wing Ho |
| 11 | Medical paper database integration | Could | Not Completed | n/a |
| ID | Bug Description | Priority |
|---|---|---|
| 1 | Entering an arbitrary ID into the URL triggers the generation process even when no files or prompt have been provided (a possible guard is sketched below) | Medium |
| 2 | Occasional Markdown rendering issues in the HTML view of the review | Low |
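A possible fix for the first bug is to check that a stored request actually contains files and a prompt before starting generation. The sketch below is illustrative only: it assumes a Flask-style route, and the `REVIEW_REQUESTS` store and route path are hypothetical names rather than our actual code.

```python
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Hypothetical in-memory store mapping review IDs to their uploaded files and prompt.
REVIEW_REQUESTS: dict[str, dict] = {}

@app.route("/review/<review_id>")
def show_review(review_id):
    data = REVIEW_REQUESTS.get(review_id)
    # Guard clause: a hand-typed ID in the URL must not trigger generation
    # unless both input files and a prompt were actually provided.
    if not data or not data.get("files") or not data.get("prompt"):
        abort(404)
    return jsonify(status="ready", prompt=data["prompt"])  # placeholder response
```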
| Work Packages | Wing Ho Yeung | Jennifer Zhang | Yiwei Wang | Kevin Jin |
|---|---|---|---|---|
| Research and Experiments | 20 | 40 | 20 | 20 |
| UI Design | 80 | 0 | 0 | 20 |
| Coding | 25 | 25 | 25 | 25 |
| Testing | 0 | 20 | 40 | 40 |
| Overall Contribution (%) | 31.25 | 21.25 | 21.25 | 26.25 |
| Report Website Sections | Wing Ho Yeung | Jennifer Zhang | Yiwei Wang | Kevin Jin |
|---|---|---|---|---|
| Website Template and Setup | 0 | 0 | 100 | 0 |
| Home | 10 | 55 | 35 | 0 |
| Video | 0 | 0 | 0 | 100 |
| Requirement | 0 | 80 | 0 | 20 |
| Research | 0 | 70 | 0 | 30 |
| Algorithm | 0 | 50 | 20 | 30 |
| UI Design | 30 | 40 | 0 | 30 |
| System Design | 60 | 20 | 0 | 20 |
| Implementation | 60 | 20 | 0 | 20 |
| Testing | 0 | 0 | 100 | 0 |
| Evaluation and Future Work | 0 | 80 | 20 | 0 |
| User and Deployment Manuals | 70 | 0 | 0 | 30 |
| Legal Issues | 0 | 30 | 0 | 70 |
| Blog and Monthly Video | 100 | 0 | 0 | 0 |
| Overall Contribution (%) | 23.6 | 31.8 | 19.6 | 25.0 |
User Interface and Experience: We aimed to create a simple, clean, and effective user interface that is easy to navigate, even for users who are not familiar with computers or AI. Key features include history tracking of generated systematic reviews, clickable input PDFs, and download options for reviews. These features enhance the user experience, and general feedback was positive.
Functionality: All must-have and should-have features were successfully implemented, along with most could-have features. No major issues were reported regarding core functionality.
Stability: The application performs reliably, although improvements could be made to enhance long-term maintainability of the codebase.
Efficiency: The application's efficiency is its main weakness: upserting vectors into the database and waiting for the LLM to generate each section both take considerable time. This criterion will be prioritised in future work; one possible optimisation is sketched below.
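One direction we are considering is batching the vector upserts and sending the batches concurrently, instead of issuing one request per vector. The sketch below is a minimal illustration assuming a Pinecone-style `index.upsert(vectors=...)` method; the batch size, worker count, and helper names are assumptions, not our current implementation.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 100  # assumed batch size; tune to the vector database's request limits

def batched(vectors, size=BATCH_SIZE):
    """Yield successive fixed-size slices of (id, embedding) pairs."""
    for i in range(0, len(vectors), size):
        yield vectors[i:i + size]

def upsert_all(index, vectors, max_workers=4):
    """Upsert vectors in parallel batches rather than one request per vector.

    `index` is assumed to expose a Pinecone-style upsert(vectors=...) method.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(index.upsert, vectors=batch)
                   for batch in batched(vectors)]
        for future in futures:
            future.result()  # re-raise any failed upsert so errors are not silent
```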
Compatibility: The system currently only runs locally. We plan to deploy it to the cloud in future to enhance accessibility and portability.
Maintainability: Project files are well organised in a modular structure, which helped us identify and fix bugs more efficiently.
Project Management: We maintained good communication via WhatsApp and GitHub. GitHub was used extensively for issue tracking, pull requests, and branching workflows. Regular weekly meetings with and without the client kept progress on track. These practices improved teamwork and version control.
While overall collaboration was smooth, we believe that starting to code earlier would have allowed us to implement more features and run more experiments.
We often faced the question: “Why use RAG-n-Bones instead of ChatGPT directly?” To address this, we conducted a user survey comparing outputs of both systems using the same prompt. Ratings ranged from 1 (ChatGPT better) to 10 (RAG-n-Bones better):
| Evaluation Criteria | Average Score | Interpretation |
|---|---|---|
| Relevance and Accuracy | 7 | Better than expected: despite using the same underlying model, users found our results more relevant. |
| Response Speed | 1 | Significantly slower than ChatGPT due to embedding & document processing overhead. |
| Response Structure | 10 | RAG-n-Bones generates better-structured reviews, aiding comprehension. |
| Reduction of Hallucinations | 5 | No significant difference — both use GPT-3.5 and are similarly prone to hallucination. |
| Overall Preference for Systematic Review Generation | 7 | Despite latency, users preferred our system for accuracy and structure. |
Here is a comparison between the output of our application and that of ChatGPT for the same query: “Write me a systematic review for efficacy of COVID-19 vaccines by the attached research papers.”
The comparison shows that the review generated by our application focuses on the efficacy of RBD-based COVID-19 vaccines: it provides an in-depth systematic review of studies on this vaccine type, discussing efficacy rates and public health implications. The ChatGPT-generated review, on the other hand, has a broader scope, comparing multiple COVID-19 vaccines and discussing their general efficacy. Moreover, the review generated by our application is more structured and detailed: it follows the PRISMA guidelines and includes a detailed discussion of the studies and their results. The ChatGPT review is less structured and focuses only on extracting information from the three input research papers. Overall, our application demonstrates a stronger grasp of the context and format required of a systematic review, whereas ChatGPT provides only a general overview built from extracted information.