An overview of our core algorithmic processes and data flows
We wanted to improve accuracy and relevancy, so we planned to fine-tune a language model to achieve better performance; theoretically, a fine-tuned model should produce more accurate and relevant output. However, due to GPU, cost, and time limitations, we stopped pursuing this approach. The fine-tuning code is available in our GitHub repository, under model_agent in the backend. Here is the process we followed.
There are three files involved in the fine-tuning process:
Responsible for loading PDF files, cleaning the text, and splitting the data.
This script performs the following steps:
We prepared the dataset by pairing input medical papers (X) with output systematic reviews (Y) in a pandas DataFrame. We built the dataset manually by finding systematic reviews (Y) online and collecting their reference papers (X). We then split the data into training and testing sets, with 80% for training and 20% for testing, and saved them as training_data.csv and testing_data.csv.
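Below is a minimal sketch of what this preparation step might look like. The column names ("input_text", "target_text") and the use of scikit-learn for the split are assumptions for illustration; only the 80/20 ratio and the CSV file names come from our actual process.

```python
# Sketch of the dataset-preparation step. Column names and the
# scikit-learn split are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

# Manually collected (reference papers, systematic review) pairs,
# as described above. Real entries are full document texts.
collected_pairs = [
    ("Full text of the reference paper(s) ...", "Text of the matching systematic review ..."),
    # ... more manually collected pairs
]

df = pd.DataFrame(
    [{"input_text": x, "target_text": y} for x, y in collected_pairs]
)

# 80% training, 20% testing, as in the original process.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

train_df.to_csv("training_data.csv", index=False)
test_df.to_csv("testing_data.csv", index=False)
```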
Handles the fine-tuning process of the pre-trained model
This script fine-tunes a pre-trained T5-small model (from Hugging Face) on the prepared dataset:
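The sketch below shows one standard way to fine-tune T5-small with the Hugging Face Trainer API on the CSVs produced earlier. The hyperparameters, the "summarize:" task prefix, the truncation lengths, and the output path are illustrative assumptions, not our exact settings.

```python
# Hedged sketch of fine-tuning T5-small on the prepared dataset.
# Hyperparameters and column names are illustrative assumptions.
import pandas as pd
from datasets import Dataset
from transformers import (
    T5ForConditionalGeneration,
    T5TokenizerFast,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    DataCollatorForSeq2Seq,
)

model_name = "t5-small"
tokenizer = T5TokenizerFast.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

train_ds = Dataset.from_pandas(pd.read_csv("training_data.csv"))

def tokenize(batch):
    # T5 is a text-to-text model: prefix the task and truncate long papers.
    inputs = tokenizer(
        ["summarize: " + t for t in batch["input_text"]],
        max_length=512, truncation=True,
    )
    labels = tokenizer(batch["target_text"], max_length=512, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

train_ds = train_ds.map(tokenize, batched=True, remove_columns=train_ds.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="finetuned-t5-small",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=3e-4,
    logging_steps=50,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
trainer.save_model("finetuned-t5-small")
```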
Used for testing the fine-tuned model
This script tests the fine-tuned model:
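A minimal sketch of the testing step follows: it loads the fine-tuned checkpoint and generates a review for each held-out paper. The checkpoint path and generation settings (beam search, token limits) are assumptions.

```python
# Sketch of the testing step: generate output for each held-out paper.
# Checkpoint path and generation settings are assumptions.
import pandas as pd
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("finetuned-t5-small")
model = T5ForConditionalGeneration.from_pretrained("finetuned-t5-small")

test_df = pd.read_csv("testing_data.csv")

for paper in test_df["input_text"]:
    inputs = tokenizer(
        "summarize: " + paper,
        return_tensors="pt", max_length=512, truncation=True,
    )
    output_ids = model.generate(**inputs, max_new_tokens=256, num_beams=4)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```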
Unfortunately, the output consisted of sentences that were not even grammatically correct; many were meaningless and lacked coherence. As the model failed to produce usable output, we did not run quantitative evaluations such as cosine similarity or BLEU scores.
Overall, it was an interesting experience, especially the fine-tuning process for transformer models. We also learned that there are many challenges beyond writing the code itself, such as data quality and the choice of pre-trained model. Given more time and resources, we could try this on a larger dataset with a better language model, potentially achieving higher-quality results.
Here is a comparison between the output of our fine-tuned model and our current application's output:
This subsystem enhances content quality by systematically acquiring and refining literature in three main stages:
Extracts domain-relevant terms (e.g., "COVID-19") from segmented PDF sections to guide focused literature retrieval.
Searches PubMed → PMC/DOI, retrieves full texts or abstracts via PMC/Sci-Hub.
Cleans and normalizes the collected data (a minimal sketch of the full flow follows this list).
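The sketch below strings the three stages together. The keyword-extraction and cleaning helpers are illustrative placeholders, not our actual implementation; only the NCBI E-utilities endpoints used for the PubMed search are real, public APIs (we omit the PMC/Sci-Hub full-text retrieval here).

```python
# Rough sketch of the three-stage acquisition flow. extract_keywords()
# and clean() are placeholders; the E-utilities endpoints are real.
import re
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def extract_keywords(section_text: str) -> list[str]:
    # Stage 1 placeholder: a real extractor might use TF-IDF or a
    # biomedical NER model; here we just pull capitalized terms.
    return re.findall(r"\b[A-Z][\w-]{3,}\b", section_text)

def search_pubmed(term: str, retmax: int = 5) -> list[str]:
    # Stage 2: search PubMed for article IDs matching the term.
    resp = requests.get(
        f"{EUTILS}/esearch.fcgi",
        params={"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"},
    )
    return resp.json()["esearchresult"]["idlist"]

def fetch_abstract(pmid: str) -> str:
    # Retrieve the abstract text (full texts would go through PMC instead).
    resp = requests.get(
        f"{EUTILS}/efetch.fcgi",
        params={"db": "pubmed", "id": pmid, "rettype": "abstract", "retmode": "text"},
    )
    return resp.text

def clean(text: str) -> str:
    # Stage 3 placeholder: collapse whitespace; a real cleaner would also
    # strip references, boilerplate, and encoding artifacts.
    return re.sub(r"\s+", " ", text).strip()

for kw in extract_keywords("Studies on COVID-19 vaccination outcomes"):
    for pmid in search_pubmed(kw, retmax=2):
        print(clean(fetch_abstract(pmid))[:200])
```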
This system is key to building a reliable, academically grounded database that supports accurate and responsible content generation. Future iterations will aim to deepen literature coverage and context relevance for systematic review tasks.