RESEARCH
Technology Review
Pinecone Vector DB
Pinecone is a vector database designed for managing high-dimensional vector data.
A vector database is essential for storing and querying high-dimensional vector
embeddings, a core operation in RAG systems built on LLMs.
We chose Pinecone for three reasons: scalability, real-time search, and semantic
search. Pinecone is built to handle large-scale vector data, which raises the
practical limit on the number of input files. It provides real-time search,
which allows quick query responses. Lastly, Pinecone performs semantic search,
retrieving results that are contextually relevant rather than mere keyword
matches, which leads to higher consistency and relevance.
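The idea behind semantic search can be illustrated with a toy sketch: documents and queries become vectors, and relevance is measured by vector similarity (here cosine similarity) rather than shared keywords. The three-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions, and Pinecone's internal index is far more sophisticated than this linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the semantically related document shares no keywords with
# the query, but its vector points in nearly the same direction, so a
# similarity search still ranks it first.
query = [0.9, 0.1, 0.2]
doc_semantic_match = [0.8, 0.2, 0.1]  # contextually related, no shared keywords
doc_keyword_only = [0.1, 0.9, 0.7]    # superficially similar text, different meaning

assert cosine_similarity(query, doc_semantic_match) > \
       cosine_similarity(query, doc_keyword_only)
```

This is why retrieval quality depends on the embedding model as much as on the database: the database only ranks vectors it is given.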
Why Pinecone over other vector databases? We considered Weaviate; however,
Pinecone scales further without the need to manually configure the database.
Scalability is a key factor in our project because, unlike a chatbot or other
typical RAG applications, we deal specifically with a large number of papers.
Pinecone is also easier to use, as it is a fully managed service that requires
no maintenance or configuration, whereas Weaviate is open source and typically
self-hosted. We did not choose Weaviate because it would require more time and
effort to set up and maintain.
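Because Pinecone is a managed service, ingesting papers reduces to formatting records and calling the client, with no cluster administration. The helper below shows Pinecone's upsert record shape; the index name, chunk IDs, and metadata fields are hypothetical placeholders, not values from our actual codebase.

```python
def to_pinecone_records(paper_chunks):
    """Format (chunk_id, embedding, text) tuples into the record shape
    Pinecone's upsert call expects: id, values, and optional metadata."""
    return [
        {"id": chunk_id, "values": embedding, "metadata": {"text": text}}
        for chunk_id, embedding, text in paper_chunks
    ]

# With the managed service, the upload itself is a short client call
# (hypothetical index name; requires an API key, so shown as a comment):
#   from pinecone import Pinecone
#   pc = Pinecone(api_key="...")
#   index = pc.Index("systematic-review")
#   index.upsert(vectors=to_pinecone_records(chunks))

records = to_pinecone_records(
    [("paper1-chunk0", [0.1, 0.2, 0.3], "Background section text...")]
)
```

Storing the source text in metadata lets a later query return the passages themselves, not just vector IDs.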
OpenAI GPT-4o Mini
We used OpenAI GPT-4o Mini as the Large Language Model (LLM) of our prototype.
The LLM plays a crucial role in a RAG system: it processes the retrieved text
to generate sections of a review, which are then combined into a full
systematic review. We chose GPT-4o Mini because it follows prompts well and
produces output of decent quality. We could not invest in stronger models such
as GPT-4.5, or in fine-tuning a pre-trained LLM, due to resource and pricing
constraints.
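The per-section generation step can be sketched as assembling a chat prompt from the retrieved chunks and sending it to the model. The prompt wording and function names below are illustrative, not the exact prompts used in our prototype; the commented client call is the standard OpenAI Chat Completions API.

```python
def build_section_messages(section_title, retrieved_chunks):
    """Assemble a chat prompt asking the model to draft one review section,
    grounded only in the retrieved excerpts (illustrative wording)."""
    context = "\n\n".join(retrieved_chunks)
    return [
        {"role": "system",
         "content": "You are an assistant that writes sections of a systematic review."},
        {"role": "user",
         "content": f"Using only the excerpts below, write the '{section_title}' "
                    f"section of the review.\n\nExcerpts:\n{context}"},
    ]

# Sending the prompt to GPT-4o Mini (requires an API key, so shown as a comment):
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="gpt-4o-mini",
#       messages=build_section_messages("Methods", chunks),
#   )
#   section_text = resp.choices[0].message.content

messages = build_section_messages("Methods", ["Excerpt A...", "Excerpt B..."])
```

Generating one section per call, then concatenating the sections, keeps each prompt focused and within the context window.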
So how is GPT-4o Mini better than GPT-3.5? GPT-4o Mini supports a context
window of up to 128k tokens, whereas GPT-3.5 Turbo supports only about 16.4k.
The larger window lets us pass more retrieved text into each call and generate
longer output of better quality. Moreover, GPT-4o Mini has a more recent
knowledge cut-off (October 2023) compared to GPT-3.5 (September 2021).
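The practical effect of the larger window can be sketched as a context budget: how many retrieved chunks fit into one call. The 4-characters-per-token ratio is a rough rule of thumb for English text, not an exact tokenizer, and the reserve for output is an assumed figure.

```python
# Context window sizes in tokens (GPT-3.5 Turbo's limit is 16,385).
GPT4O_MINI_CONTEXT = 128_000
GPT35_CONTEXT = 16_385

def approx_tokens(text):
    """Rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4

def chunks_that_fit(chunks, context_limit, reserve_for_output=4_000):
    """Greedily select retrieved chunks until the context budget is spent."""
    budget = context_limit - reserve_for_output
    selected, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break
        selected.append(chunk)
        used += cost
    return selected

# 200 chunks of roughly 1,000 tokens each: GPT-4o Mini's ~8x larger window
# admits far more retrieved text per call than GPT-3.5's.
chunks = ["x" * 4_000] * 200
assert len(chunks_that_fit(chunks, GPT4O_MINI_CONTEXT)) > \
       len(chunks_that_fit(chunks, GPT35_CONTEXT))
```

Under these assumptions, GPT-3.5's window admits only about a dozen such chunks per call, while GPT-4o Mini's admits over a hundred.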
Next.js
We have chosen Next.js, a React framework, with JavaScript to develop our website, as opposed to other web frameworks such as Angular or Vue.js, because React offers more flexibility. React provides plenty of tools, such as state-management hooks and support for the UI framework Tailwind CSS, allowing for more efficient development of the design. A vast ecosystem of libraries is available for React, including libraries for converting Markdown text to HTML. We specifically chose Next.js because it provides server-side rendering, which reduces load time, and file-based routing, which simplifies page navigation.
Technical Decisions Summary
Our technical choices evolved through iterative experimentation, balancing effectiveness with cost-efficiency. Decisions were made both pre-development and during implementation phases, with continuous optimization of approaches to minimize resource expenditure while maximizing system performance.