Research

Key insights and tools behind FinSync.

Related Project Review

SAP Concur Expense

    Concur Expense is software that lets users photograph receipts with their mobile phones; an expense entry is then created by extracting the relevant information from the receipt [1]. SAP Concur also integrates with a number of financial and accounting systems, allowing it to slot into pre-existing architecture with little effort. A dashboard provides a visual summary of the data, enabling users to gain quick insights. However, since the software is generalised for use by many different companies, it may not be specialised for particular invoice and budget formats, which is extremely important for our project. From this review, we determined the following key takeaways for our own project:

  • System integration: Compatibility with an organisation's pre-existing systems ensures that data stays consistent across platforms, and allows companies to transition into using the product with minimal friction.
  • Real-time visibility: Providing up-to-date visualisations of data makes financial reporting more efficient and, as a result, supports better-informed decision making.

Technology Review

Possible solutions

    To solve the main problem of reconciling expenses with budgets, we considered two possible solutions: manual data engineering, and using an LLM to match expenses with budgets. LLM-based matching is quick to set up, scales well and is adaptable, but it would require fine-tuning and consistent human review to maintain accuracy. The major concern with this method is the lower predictability of its results: the model may misinterpret data, causing incorrect matches and therefore inaccurate summary spreadsheets. It is also far less explainable, leaving users less certain of the accuracy of the results. Manual data engineering, by contrast, achieves high accuracy by following predefined rules and logic, and engineers have full control over, and visibility of, how data is matched. Despite the slower setup and ongoing maintenance, once the data is cleaned and processed the results are more reliable. It can also be optimised for larger datasets, although this requires additional engineering; in our case, the manipulation needed is merging by column and summing, which demands little processing power. After consideration, we chose manual data engineering for the following key reasons:

  • Accuracy: Invoice and budget reconciliation involves financial data where errors can have a significant negative impact, resulting in inaccurate financial reporting. Manual data engineering ensures a clear and precise approach.
  • Explainability & Control: Every change made to the data is transparent and follows predefined logic, making it easier to debug and describe.

    As an extension to our project, we also decided to experiment with an AI chatbot feature that allows users to ask questions about the data; each question is converted into a query that returns the appropriate results. This will enable users to easily gain tailored insights into their data.
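
    The merging and summing this reconciliation requires can be sketched in a few lines of Pandas. The column names and figures below are hypothetical, chosen only to illustrate the predefined-rules approach:

```python
import pandas as pd

# Hypothetical invoice lines and budget table; the real column names differ.
expenses = pd.DataFrame({
    "budget_code": ["MKT-01", "MKT-01", "OPS-02"],
    "amount": [1200.0, 800.0, 450.0],
})
budgets = pd.DataFrame({
    "budget_code": ["MKT-01", "OPS-02"],
    "budgeted": [2500.0, 400.0],
})

# Sum spend per budget line, then merge on the shared key column.
spend = expenses.groupby("budget_code", as_index=False)["amount"].sum()
summary = budgets.merge(spend, on="budget_code", how="left")
summary["remaining"] = summary["budgeted"] - summary["amount"]
print(summary)
```

Because every step is an explicit, deterministic operation, each value in the summary can be traced back to the rule that produced it, which is exactly the explainability argument made above.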

Languages

TypeScript

    TypeScript was chosen for the frontend over JavaScript because its features better suit our project. JavaScript's dynamic typing increases flexibility and development speed, but this is only a significant advantage for smaller projects where scalability and future extensions are not a concern. JavaScript also has a large volume of resources, libraries and tools, which eases development [2]. TypeScript is statically typed, allowing type-related errors to be detected early, and it supports interfaces and generics, improving collaboration by making code more consistent and readable. This in turn makes the code much easier to maintain and scale, which, looking forward, would allow project extensions to be made seamlessly [2]. We decided these features were more advantageous for our project's requirements, and hence chose TypeScript.

Python

    Initially we planned to use TypeScript for the backend as well, to keep everything streamlined and deployment clean. However, we ended up using Python once we realised it was the better choice for our project: it is one of the most commonly used languages for data engineering thanks to the large number of data-manipulation libraries available, such as Pandas and NumPy [3], which allow efficient cleaning and merging of the given datasets. Python also integrates easily with other languages [4], which is essential for our project given the TypeScript frontend.

Libraries and frameworks

Next.js

    The two frameworks we considered for the frontend were React and Next.js. React is highly flexible and allows for greater customisation, making it better suited to projects that require complex animations or tailored layouts, which are not an important part of our project. React's component-based architecture makes rendering and updates fairly efficient [7], but this is only an advantage when server-side rendering is unnecessary. With Next.js, pages are pre-rendered on the server or at build time, so they load faster. Next.js also splits code automatically, loading only what each page needs and reducing initial load times. Both features enhance the user experience by speeding up the web app. The use of API routes to fetch data is also a key aspect of our project, as it is required to create Excel previews and dashboard visuals; this feature is provided by Next.js, making it a well-suited framework for us [7].

Pandas

    The Pandas library was chosen because it excels at cleaning and manipulating large amounts of data. Missing data is handled well: empty values can be quickly identified and addressed accordingly [5]. In addition, preprocessing, sorting, filtering and grouping data are made quicker and easier by Pandas' built-in functions [6].
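
    A minimal sketch of this cleaning workflow, using a hypothetical raw export with the kinds of gaps found in uploaded spreadsheets:

```python
import pandas as pd
import numpy as np

# Hypothetical raw export; real column names and values differ.
raw = pd.DataFrame({
    "supplier": ["Acme", None, "Acme", "Globex"],
    "amount": [100.0, 250.0, np.nan, 75.0],
})

# Identify and address missing values before any aggregation:
# label unknown suppliers, drop rows with no usable amount.
raw["supplier"] = raw["supplier"].fillna("Unknown")
clean = raw.dropna(subset=["amount"])

# Built-in grouping and sorting replace manual loops.
totals = clean.groupby("supplier")["amount"].sum().sort_values(ascending=False)
print(totals)
```

Each of `fillna`, `dropna`, `groupby` and `sort_values` is a single built-in call, which is what makes the preprocessing step quick to write and easy to audit.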

Azure services

    We decided to use a variety of Microsoft Azure services to prioritise ease of deployment and integration with Chanel's existing architecture, as they already have an organisation-wide Microsoft Azure account. This also enhances security: all of the data displayed on the web app is housed within their own account, reducing the risk of unauthorised access, which is particularly important given the sensitive nature of the financial data provided by the clients. The Azure services were accessed and managed through their respective Python libraries, which allowed us to combine them easily with our Pandas data analysis. We used three Azure services: Blob Storage to store all of the Excel and CSV files, Azure Data Explorer to store the data in tables, and Azure Functions to automatically trigger data processing upon upload.
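
    The upload-triggered processing can be sketched as a pure Pandas step wrapped by an Azure Function. The processing function below is illustrative (the column names are hypothetical), and the commented-out wiring shows the shape of a blob-triggered function in the `azure-functions` Python programming model, which only runs inside a deployed Function App:

```python
import io
import pandas as pd

def summarise_upload(csv_bytes: bytes) -> pd.DataFrame:
    """Pure processing step: parse an uploaded CSV and total by category."""
    df = pd.read_csv(io.BytesIO(csv_bytes))
    return df.groupby("category", as_index=False)["amount"].sum()

# Hypothetical blob-triggered wiring (requires the `azure-functions`
# package and a deployed Function App; shown here for shape only):
#
# import azure.functions as func
# app = func.FunctionApp()
#
# @app.blob_trigger(arg_name="blob", path="uploads/{name}",
#                   connection="AzureWebJobsStorage")
# def on_upload(blob: func.InputStream):
#     summary = summarise_upload(blob.read())
#     # ...write summary back to Blob Storage / Data Explorer

sample = b"category,amount\ntravel,120\ntravel,80\nmeals,30\n"
print(summarise_upload(sample))
```

Keeping the Pandas logic in its own function means it can be tested locally without any Azure credentials, with the trigger acting only as glue.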

LLMs and Agents

    To select an appropriate LLM for querying table data, we experimented with a variety of models. Initially, Microsoft's TAPEX model was tried [8], but it did not perform particularly well when passed the billed report as a data frame; performance was adequate on a virtualised GPU in Google Colab, but it was too intensive to run on a local laptop. Next, a fine-tuned Mistral v4 model [9] was tried, which was again too large to run locally, and quantizing it degraded its output significantly. As a result, we reconsidered the approach and introduced agents in conjunction with an LLM. We experimented with LangChain's pandas agent [10], run locally with IBM Granite 3.2 8B [11], which produced noticeably better results than the models previously tried. IBM Granite was chosen because it is trained on business data and specifically targeted at businesses, making it a great fit for the project. Running LLMs offline is also a large security advantage, as no data needs to be put on the cloud. Although this feature was not integrated into our web app, the experimentation is key to considering the project's future.
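
    Conceptually, the pandas agent works by having the LLM emit a pandas expression, executing it against the data frame, and feeding the result back to the model to phrase an answer. The sketch below shows the kind of generated code a question would produce; the `report` frame and its columns are hypothetical stand-ins for the billed report:

```python
import pandas as pd

# Hypothetical billed-report frame; the real columns differ.
report = pd.DataFrame({
    "department": ["Marketing", "Marketing", "Retail"],
    "billed": [5000.0, 2000.0, 3500.0],
})

# A question such as "What was billed per department?" is translated by
# the agent into a pandas expression like this, whose executed result is
# then summarised back to the user in natural language.
answer = report.groupby("department")["billed"].sum()
print(answer)
```

Because the model only generates short, inspectable pandas expressions, the agent approach is also easier to audit than asking an LLM to read the whole table directly.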

Technical Decisions Summary

After thorough research into the different technologies, we made our final decisions on what the project would be built with.

Category                | Details
Languages               | Python, TypeScript
Frameworks              | Next.js
Data processing library | Pandas
Storage                 | Azure Blob Storage
Automation              | Azure Function Apps
Database                | Azure Data Explorer
LLMs and agents         | Azure OpenAI, LangChain, and Pandas

References

[1] SAP Concur Team, "How Concur Expense Works," www.concur.com, Jun. 10, 2023. https://www.concur.com/blog/article/how-concur-expense-works (accessed Mar. 26, 2025).
[2] Chaewonkong, "TypeScript, What Good For?," Medium, Apr. 18, 2023. https://medium.com/@chaewonkong/typescript-what-good-for-8240dc9173c7 (accessed Mar. 26, 2025).
[3] Yash Patel, "Programming Languages for Data Engineering: A Guide for 2024," Medium, Feb. 14, 2024. https://medium.com/@laners.org/programming-languages-for-data-engineering-a-guide-for-2024-d4bfc9cdcc46 (accessed Mar. 26, 2025).
[4] Nishaadequate, "Advantages of Python Language," Medium, Jun. 02, 2023. https://medium.com/@nishaadequate123/advantages-of-python-language-technical-chamber-9efb5b5be0e7 (accessed Mar. 26, 2025).
[5] "What is pandas Python?," NVIDIA Data Science Glossary, 2024. https://www.nvidia.com/en-gb/glossary/pandas-python/ (accessed Mar. 26, 2025).
[6] S. Pillai, "Advantages of Pandas Library for Data Analysis," Incentius, Feb. 16, 2023. https://www.incentius.com/blog-posts/advantages-of-pandas-library-for-data-analysis/ (accessed Mar. 26, 2025).
[7] UXPin, "NextJS vs React? Which One is Better for Web Development?," Studio by UXPin, Apr. 11, 2024. https://www.uxpin.com/studio/blog/nextjs-vs-react/ (accessed Mar. 26, 2025).
[8] "microsoft/tapex-large-finetuned-wtq," Hugging Face, Jan. 18, 2024. https://huggingface.co/microsoft/tapex-large-finetuned-wtq (accessed Mar. 27, 2025).
[9] "aisha44/mistral_instruct_v4_KQL," Hugging Face, 2025. https://huggingface.co/aisha44/mistral_instruct_v4_KQL (accessed Mar. 27, 2025).
[10] "Pandas Dataframe | LangChain," Langchain.com, 2025. https://python.langchain.com/docs/integrations/tools/pandas/ (accessed Mar. 27, 2025).
[11] Dinesh Nirmal, "Granite foundation models," Ibm.com, Sep. 07, 2023. https://www.ibm.com/think/news/granite-foundation-models (accessed Mar. 27, 2025).