COMP0016 Team 5 -- JurisBUD AI

RESEARCH

AI Agents


During the research stage for our AI agent system, we examined a number of open-source systems online. Based on them, we developed our own AI agent system using LangChain. The systems we researched are listed below.




HuggingGPT

HuggingGPT is a framework that leverages LLMs (e.g., ChatGPT) to connect various AI models in the Hugging Face community to solve AI tasks.(1) Specifically, when it receives a user request, ChatGPT is used to plan tasks, select models according to their function descriptions available on Hugging Face, execute each subtask with the selected AI model, and summarize the response according to the execution results. This is an example of using large language models to build a chatbot system.
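The four stages described above can be sketched in plain Python. This is only an illustration under stated assumptions, not HuggingGPT's actual code: the `Model` class, the `REGISTRY` entries and the keyword-based `plan_tasks` are hypothetical stand-ins for the LLM calls and Hugging Face models that a real system would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Model:
    name: str
    description: str           # function description, as on a model card
    run: Callable[[str], str]  # stand-in for real model inference


# Hypothetical registry; a real system would read descriptions from Hugging Face.
REGISTRY = [
    Model("toy-captioner", "image captioning", lambda x: f"caption of {x}"),
    Model("toy-translator", "translation to French", lambda x: f"French version of '{x}'"),
]


def plan_tasks(request: str) -> list[str]:
    """Stage 1: task planning. A real system would prompt the LLM here."""
    tasks = []
    if "caption" in request:
        tasks.append("image captioning")
    if "French" in request:
        tasks.append("translation to French")
    return tasks


def select_model(task: str) -> Model:
    """Stage 2: model selection by matching the function description."""
    return next(m for m in REGISTRY if m.description == task)


def handle(request: str, payload: str) -> str:
    results = []
    data = payload
    for task in plan_tasks(request):
        model = select_model(task)
        data = model.run(data)  # Stage 3: execute, feeding each output forward
        results.append(f"{model.name}: {data}")
    return "; ".join(results)   # Stage 4: summarise the execution results


print(handle("caption this image and give it in French", "photo.jpg"))
```

In this sketch the output of one subtask becomes the input of the next, which mirrors how dependent subtasks are chained in the description above.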



AutoGen

AutoGen was first introduced to us by our client, Mr. Fergus Kidd. It is a multi-agent conversation framework developed by Microsoft that provides a high-level abstraction. With AutoGen, users can build LLM workflows with customizable agents that solve different tasks based on their assigned responsibilities.(2)

BabyAGI

BabyAGI is also an LLM-based AI agent system, but with a simpler structure. The key concept it introduces is task generation: it creates new tasks from a predefined objective and the outcome of the previous task. By default, it has an Execution Agent, a Task Creation Agent and a Prioritization Agent, which together form a basic LLM workflow.(3) This system explores and demonstrates the potential of large language models such as GPT to perform tasks autonomously.
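The loop formed by the three agents can be sketched as follows. This is a minimal illustration, not BabyAGI's own implementation: each agent is a hypothetical stub where the real system would call an LLM, and the shortest-first ordering in `prioritization_agent` is an arbitrary placeholder rule.

```python
from collections import deque


def execution_agent(objective: str, task: str) -> str:
    # Stub: a real agent would ask the LLM to carry out the task.
    return f"done: {task}"


def task_creation_agent(objective: str, task: str, result: str) -> list[str]:
    # Stub: new tasks are derived from the outcome of the previous task.
    return [f"check {task}", f"expand {task}"]


def prioritization_agent(tasks: list[str]) -> deque:
    # Stub: shortest description first; a real agent would ask the LLM to reorder.
    return deque(sorted(tasks, key=len))


def run(objective: str, first_task: str, max_iterations: int = 3) -> list[tuple[str, str]]:
    tasks = deque([first_task])
    log = []
    while tasks and len(log) < max_iterations:
        task = tasks.popleft()                     # take the top-priority task
        result = execution_agent(objective, task)  # execute it
        log.append((task, result))
        new_tasks = task_creation_agent(objective, task, result)
        tasks = prioritization_agent(list(tasks) + new_tasks)
    return log


for task, result in run("write a summary", "draft outline"):
    print(task, "->", result)
```

The `max_iterations` cap is an assumption added here so that the sketch terminates; without some stopping rule, a task-generating loop like this can run indefinitely.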

Vector Database


What is a Vector Database?

A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes.(4) It converts every item in the database into a vector generated by an embedding function.
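To make the idea concrete, the sketch below stores texts as vectors and retrieves the closest one by cosine similarity. It is a toy under stated assumptions: `VOCAB`, `embed` and `ToyVectorStore` are hypothetical names, and the word-count embedding stands in for the learned embedding models that real vector databases use.

```python
import math

# Hypothetical fixed vocabulary; each term is one dimension of the vector.
VOCAB = ["contract", "court", "tenant", "patent", "lease"]


def embed(text: str) -> list[float]:
    """Toy embedding function: count how often each vocabulary term occurs."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self) -> None:
        self._rows: list[tuple[list[float], str]] = []  # (vector, original text)

    def add(self, text: str) -> None:
        self._rows.append((embed(text), text))

    def query(self, text: str, k: int = 1) -> list[str]:
        q = embed(text)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [original for _, original in ranked[:k]]


store = ToyVectorStore()
store.add("tenant signed the lease")
store.add("court upheld the patent")
print(store.query("lease dispute with tenant"))
```

The query shares no exact phrasing with the stored text, yet it is matched because both map to nearby vectors; this is the property that makes vector databases useful for semantic search.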

ChromaDB

ChromaDB is an open-source vector database designed specifically for LLM applications. It offers both a user-friendly API and good performance, making it a strong choice for many embedding applications. One particularly useful feature is that it can filter queries using metadata, which speeds up document processing considerably, especially when users need to interact directly with documents in the database.(5)
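The effect of metadata filtering can be illustrated without the library itself. The sketch below is plain Python, not the ChromaDB API: the record layout, the two-dimensional embeddings and the file names are made up for illustration. In ChromaDB, a comparable filter is passed through the `where` argument of a collection's `query` method.

```python
def query(records: list[dict], query_vec: list[float], where: dict, k: int = 1) -> list[str]:
    """Return the ids of the k most similar records whose metadata matches `where`."""
    # Step 1: keep only records whose metadata matches every key/value pair.
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    # Step 2: rank the remaining candidates by dot-product similarity.
    candidates.sort(
        key=lambda r: sum(a * b for a, b in zip(query_vec, r["embedding"])),
        reverse=True,
    )
    return [r["id"] for r in candidates[:k]]


# Hypothetical records with made-up embeddings and source file names.
records = [
    {"id": "doc1", "embedding": [0.9, 0.1], "metadata": {"source": "contract.pdf"}},
    {"id": "doc2", "embedding": [0.8, 0.2], "metadata": {"source": "judgment.pdf"}},
    {"id": "doc3", "embedding": [0.1, 0.9], "metadata": {"source": "judgment.pdf"}},
]

print(query(records, [1.0, 0.0], where={}))
print(query(records, [1.0, 0.0], where={"source": "judgment.pdf"}))
```

Filtering first means that similarity only has to be computed for records from the relevant document, which is why restricting a query to one uploaded file can be much cheaper than searching the whole collection.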

Technology Research

Next.js

Next.js is an open-source web development framework created by the private company Vercel. It provides React-based web applications with server-side rendering and static website generation,(6) where React is a free and open-source front-end JavaScript library for building user interfaces based on components. However, developers have found some issues when using React on its own.(7) For example, in a website developed with plain React, all components are rendered locally in the browser. In contrast, Next.js lets most components be rendered on the server, which considerably reduces the workload on users’ machines and improves performance.


Tailwind CSS

Tailwind CSS is an open-source, utility-first CSS framework that enables rapid styling. It simplifies traditional CSS code and works especially well with Daisy UI.(8)


CSS Modules

CSS Modules locally scope CSS by automatically creating a unique class name.(9) This allows developers to use the same class name in different files without worrying about collisions. This behavior makes CSS Modules the ideal way to include component-level CSS. Next.js has built-in support for CSS Modules, which can make the code better structured.


Daisy UI

Daisy UI is a Tailwind CSS plugin that adds a number of component classes that can be used directly. It provides built-in styles for decorating website components. It also introduces class names for Tailwind, so a lot of duplicated styling can be avoided.(10)


Backend

Django

Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design.(11) Django web applications typically group the code that handles each step of processing a request (URL routing, views, models and templates) into separate files. The Django framework helps make the resulting web app complete, versatile, secure, scalable, maintainable and portable.


PDF to Text

To extract text from PDF files, we use a Python package called pdfminer. It is a powerful tool that can extract information from PDFs, including text, images and tables. It also works with different font types and multiple languages.(12)
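A minimal sketch of this step is shown below, assuming the pdfminer.six distribution is installed (the documentation cited above is for pdfminer.six). `extract_text` is its high-level helper; `normalise_whitespace` and `pdf_to_text` are hypothetical helper names added here for illustration and are not part of the library.

```python
import re


def normalise_whitespace(raw: str) -> str:
    """Collapse runs of spaces and drop blank lines left over from the PDF layout."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)


def pdf_to_text(path: str) -> str:
    """Extract the text of a PDF file and tidy its whitespace."""
    # Imported lazily so the helper above can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return normalise_whitespace(extract_text(path))


print(normalise_whitespace("Clause  1\n\n\n  The tenant\tshall pay.\n"))
```

Calling `pdf_to_text` with the path of an uploaded file would return its cleaned text, ready to be split into chunks and embedded. The cleanup step is optional; it is included because extracted PDF text often contains irregular spacing.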


AI System

LangChain

LangChain is a framework for developing applications powered by language models. It enables applications that are context-aware and able to reason.(13) LangChain serves as a generic interface for nearly any LLM, providing a centralized development environment to build LLM applications and integrate them with external data sources and software workflows.(14) It can facilitate most use cases for LLMs and natural language processing (NLP), such as chatbots, intelligent search, question answering, summarization services or even virtual agents capable of robotic process automation.


Languages

Python

Python is a high-level, general-purpose programming language. It was created by Guido van Rossum and released in 1991. It can be used to create a variety of different programs and isn’t specialized for any specific problem. This versatility, along with its beginner-friendliness, has made it one of the most-used programming languages today. For our project, we use it for backend development of the web application, database development, and AI model development.(15)


TypeScript

TypeScript is a syntactic superset of JavaScript that adds static typing. This means that TypeScript adds syntax on top of JavaScript, allowing developers to add types. It is used for both server-side and client-side development. Since it is a superset of JavaScript, existing JavaScript code also works in TypeScript.(16)


Technical Decisions

After researching the different technologies that could be used for development, and learning from the systems described above, we decided on the ones we would use to build JurisBUD AI.

Tech                    Decision
Frontend Framework      Next.js
CSS Framework           Tailwind CSS
CSS Plugin              Daisy UI
Backend Framework       Django (Python)
PDF to Text             pdfminer
Database                ChromaDB
AI model development    LangChain
Languages               Python & TypeScript
AI API                  Azure OpenAI API

References:

1. Shen Y, Song K, Tan X, Li D, Lu W, Zhuang Y. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [Internet]. www.microsoft.com. 2023 [cited 2024 Mar 22]. Available from: https://www.microsoft.com/en-us/research/publication/hugginggpt-solving-ai-tasks-with-chatgpt-and-its-friends-in-hugging-face/

2. AutoGen [Internet]. Microsoft Research. [cited 2024 Mar 22]. Available from: https://www.microsoft.com/en-us/research/project/autogen/

3. Baby AGI: The Birth of a Fully Autonomous AI [Internet]. KDnuggets. Available from: https://www.kdnuggets.com/2023/04/baby-agi-birth-fully-autonomous-ai.html

4. What is a Vector Database? [Internet]. Cloudflare. Available from: https://www.cloudflare.com/learning/ai/what-is-vector-database/

5. ChromaDB: A Vector Database for AI Applications [Internet]. Real Python. Available from: https://realpython.com/chromadb-vector-database/

6. Differences Between Static Generated Sites And Server-Side Rendered Apps [Internet]. Smashing Magazine. 2020. Available from: https://www.smashingmagazine.com/2020/07/differences-static-generated-sites-server-side-rendered-apps/

7. What Is React? - What React Is and Why It Matters [Book] [Internet]. www.oreilly.com. Available from: https://www.oreilly.com/library/view/what-react-is/9781491996744/ch01.html

8. Introduction to Tailwind CSS [Internet]. GeeksforGeeks. 2020. Available from: https://www.geeksforgeeks.org/introduction-to-tailwind-css/

9. Styling: CSS Modules | Next.js [Internet]. nextjs.org. [cited 2024 Mar 22]. Available from: https://nextjs.org/docs/app/building-your-application/styling/css-modules

10. Daisy UI [Internet]. Tailwind Awesome. 2020 [cited 2024 Mar 22]. Available from: https://www.tailwindawesome.com/resources/daisy-ui

11. Django introduction [Internet]. MDN Web Docs. Available from: https://developer.mozilla.org/en-US/docs/Learn/Server-side/Django/Introduction

12. Welcome to pdfminer.six’s documentation! [Internet]. pdfminersix.readthedocs.io. Available from: https://pdfminersix.readthedocs.io/en/latest/

13. What is LangChain? | IBM [Internet]. www.ibm.com. Available from: https://www.ibm.com/topics/langchain

14. Introduction | 🦜️🔗 Langchain [Internet]. python.langchain.com. Available from: https://python.langchain.com/docs/get_started/introduction

15. Coursera. What is Python used for? A beginner’s guide [Internet]. Coursera. 2021. Available from: https://www.coursera.org/articles/what-is-python-used-for-a-beginners-guide-to-using-python

16. Staff A. Microsoft TypeScript: the JavaScript we need, or a solution looking for a problem? [Internet]. Ars Technica. 2012. Available from: https://arstechnica.com/information-technology/2012/10/microsoft-typescript-the-javascript-we-need-or-a-solution-looking-for-a-problem/