An overview of our core algorithmic processes and data flows
We wanted to improve accuracy and relevancy, so we planned to fine-tune a language model to achieve better performance; theoretically, a fine-tuned model should produce more accurate and relevant output. However, due to GPU, cost, and time limitations, we stopped pursuing this approach. The fine-tuning code is available in our GitHub repository, under model_agent in the backend. Here is the process we followed.
There are three files involved in the fine-tuning process:
Responsible for loading PDF files, cleaning the text, and splitting the data.
This script performs the following steps:
We prepared the dataset by pairing input medical papers (X) with output systematic reviews (Y) in a pandas DataFrame. We built the dataset manually by finding systematic reviews (Y) online and collecting their reference papers (X). We then split the data into training and testing sets, with 80% for training and 20% for testing, and saved them as training_data.csv and testing_data.csv.
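Below is a minimal sketch of what this preparation step might look like. The column names ("input_text", "target_text") and the use of scikit-learn for the split are assumptions for illustration; only the 80/20 ratio and the CSV file names come from our actual process.

```python
# Sketch of the dataset-preparation step. Column names and the
# scikit-learn split are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

# Manually collected (reference papers, systematic review) pairs,
# as described above. Real entries are full document texts.
collected_pairs = [
    ("Full text of the reference paper(s) ...", "Text of the matching systematic review ..."),
    # ... more manually collected pairs
]

df = pd.DataFrame(
    [{"input_text": x, "target_text": y} for x, y in collected_pairs]
)

# 80% training, 20% testing, as in the original process.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

train_df.to_csv("training_data.csv", index=False)
test_df.to_csv("testing_data.csv", index=False)
```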
Handles the fine-tuning process of the pre-trained model
This script fine-tunes a pre-trained T5-small model (from Hugging Face) on the prepared dataset:
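The sketch below shows one standard way to fine-tune T5-small with the Hugging Face Trainer API on the CSVs produced earlier. The hyperparameters, the "summarize:" task prefix, the truncation lengths, and the output path are illustrative assumptions, not our exact settings.

```python
# Hedged sketch of fine-tuning T5-small on the prepared dataset.
# Hyperparameters and column names are illustrative assumptions.
import pandas as pd
from datasets import Dataset
from transformers import (
    T5ForConditionalGeneration,
    T5TokenizerFast,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    DataCollatorForSeq2Seq,
)

model_name = "t5-small"
tokenizer = T5TokenizerFast.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

train_ds = Dataset.from_pandas(pd.read_csv("training_data.csv"))

def tokenize(batch):
    # T5 is a text-to-text model: prefix the task and truncate long papers.
    inputs = tokenizer(
        ["summarize: " + t for t in batch["input_text"]],
        max_length=512, truncation=True,
    )
    labels = tokenizer(batch["target_text"], max_length=512, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

train_ds = train_ds.map(tokenize, batched=True, remove_columns=train_ds.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="finetuned-t5-small",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=3e-4,
    logging_steps=50,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
trainer.save_model("finetuned-t5-small")
```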
Used for testing the fine-tuned model
This script tests the fine-tuned model:
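A minimal sketch of the testing step follows: it loads the fine-tuned checkpoint and generates a review for each held-out paper. The checkpoint path and generation settings (beam search, token limits) are assumptions.

```python
# Sketch of the testing step: generate output for each held-out paper.
# Checkpoint path and generation settings are assumptions.
import pandas as pd
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("finetuned-t5-small")
model = T5ForConditionalGeneration.from_pretrained("finetuned-t5-small")

test_df = pd.read_csv("testing_data.csv")

for paper in test_df["input_text"]:
    inputs = tokenizer(
        "summarize: " + paper,
        return_tensors="pt", max_length=512, truncation=True,
    )
    output_ids = model.generate(**inputs, max_new_tokens=256, num_beams=4)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```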
Unfortunately, the output consisted of sentences that were not even grammatically correct; many were meaningless and lacked coherence. As the model failed to produce usable output, we did not run quantitative evaluations such as cosine similarity or BLEU scores.
Overall, it was an interesting experience, especially the fine-tuning process for transformer models. We also learned that there are many challenges beyond writing the code itself, such as data quality and the choice of pre-trained model. Given more time and resources, we could try this on a larger dataset with a better language model, potentially achieving higher-quality results.
Here is a comparison between the output of our fine-tuned model and our current application's output:
This subsystem enhances content quality by systematically acquiring and refining literature in three main stages:
Extracts domain-relevant terms (e.g., "COVID-19") from segmented PDF sections to guide focused literature retrieval.
Searches PubMed → PMC/DOI, retrieves full texts or abstracts via PMC/Sci-Hub.
Cleans and normalizes the collected data (a minimal sketch of the full flow follows this list).
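The sketch below strings the three stages together. The keyword-extraction and cleaning helpers are illustrative placeholders, not our actual implementation; only the NCBI E-utilities endpoints used for the PubMed search are real, public APIs (we omit the PMC/Sci-Hub full-text retrieval here).

```python
# Rough sketch of the three-stage acquisition flow. extract_keywords()
# and clean() are placeholders; the E-utilities endpoints are real.
import re
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def extract_keywords(section_text: str) -> list[str]:
    # Stage 1 placeholder: a real extractor might use TF-IDF or a
    # biomedical NER model; here we just pull capitalized terms.
    return re.findall(r"\b[A-Z][\w-]{3,}\b", section_text)

def search_pubmed(term: str, retmax: int = 5) -> list[str]:
    # Stage 2: search PubMed for article IDs matching the term.
    resp = requests.get(
        f"{EUTILS}/esearch.fcgi",
        params={"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"},
    )
    return resp.json()["esearchresult"]["idlist"]

def fetch_abstract(pmid: str) -> str:
    # Retrieve the abstract text (full texts would go through PMC instead).
    resp = requests.get(
        f"{EUTILS}/efetch.fcgi",
        params={"db": "pubmed", "id": pmid, "rettype": "abstract", "retmode": "text"},
    )
    return resp.text

def clean(text: str) -> str:
    # Stage 3 placeholder: collapse whitespace; a real cleaner would also
    # strip references, boilerplate, and encoding artifacts.
    return re.sub(r"\s+", " ", text).strip()

for kw in extract_keywords("Studies on COVID-19 vaccination outcomes"):
    for pmid in search_pubmed(kw, retmax=2):
        print(clean(fetch_abstract(pmid))[:200])
```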
This system is key to building a reliable, academically grounded database that supports accurate and responsible content generation. Future iterations will aim to deepen literature coverage and context relevance for systematic review tasks.