
Project Background

The X5Learn platform hosts hundreds of thousands of learning resources in different formats. To find relevant resources, users rely on the platform’s search engine. The problem is that most resources are in video, audio or PDF format, which cannot be indexed directly because they are not plain text. The search engine therefore has to rely on computer-generated transcripts. These transcripts inevitably contain errors, which makes them inherently noisy. The main goal of this project is thus to build a search engine that works effectively with this noisy data and still gives users relevant results.

Client Requirements

The project is divided into two parts: machine learning (the backend) and software engineering (the frontend). In the machine learning part, one key requirement is to create multiple search models with Elasticsearch that take the noisy nature of the data into account. The models then have to be compared statistically to find the most effective one. In the software engineering part, we had to develop an API architecture that allows different search algorithms to be plugged in. The results are displayed in a Google-like web user interface that is independent of the search algorithm. We also had to implement search filters that let users specify what kind of materials they are looking for.
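To make the plug-in requirement more concrete, the following is a minimal Python sketch of how search algorithms could sit behind a common interface, with a media-type filter applied independently of the algorithm. It is an illustration under stated assumptions, not the project's actual code: the names SearchAlgorithm, KeywordOverlapSearch, SearchService and fuzzy_match_query, the "transcript" field and the document layout are all hypothetical. The only Elasticsearch-specific detail used is the "fuzziness" parameter of the match query, which is one way a search model might tolerate transcription errors.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SearchResult:
    resource_id: str
    title: str
    score: float
    media_type: str  # e.g. "video", "audio", "pdf"


class SearchAlgorithm(ABC):
    """Hypothetical interface that every pluggable search model implements."""

    @abstractmethod
    def search(self, query: str, media_type: Optional[str] = None) -> List[SearchResult]:
        ...


class KeywordOverlapSearch(SearchAlgorithm):
    """Toy in-memory model: scores a document by the fraction of query terms it contains."""

    def __init__(self, documents: List[dict]):
        self.documents = documents

    def search(self, query: str, media_type: Optional[str] = None) -> List[SearchResult]:
        terms = set(query.lower().split())
        results = []
        for doc in self.documents:
            # The filter is applied the same way whichever algorithm is plugged in.
            if media_type is not None and doc["media_type"] != media_type:
                continue
            words = set(doc["transcript"].lower().split())
            overlap = len(terms & words)
            if overlap:
                results.append(
                    SearchResult(doc["id"], doc["title"], overlap / len(terms), doc["media_type"])
                )
        return sorted(results, key=lambda r: r.score, reverse=True)


def fuzzy_match_query(query: str, field: str = "transcript") -> dict:
    """Build an Elasticsearch match query body that tolerates small spelling errors.

    The field name "transcript" is an assumption about the index mapping.
    """
    return {"query": {"match": {field: {"query": query, "fuzziness": "AUTO"}}}}


class SearchService:
    """Registry the API layer could use to route a request to a named algorithm."""

    def __init__(self) -> None:
        self._algorithms: Dict[str, SearchAlgorithm] = {}

    def register(self, name: str, algorithm: SearchAlgorithm) -> None:
        self._algorithms[name] = algorithm

    def search(self, algorithm: str, query: str, media_type: Optional[str] = None) -> List[SearchResult]:
        return self._algorithms[algorithm].search(query, media_type)
```

Because the user interface only talks to SearchService, a new model (for example, one that sends the body from fuzzy_match_query to Elasticsearch) could be registered under a new name without changing the frontend.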

Project Goals

This project entails building an AI-powered information retrieval engine that can search noisy documents, that is, documents generated via transcription, which can introduce errors. The work comprises initial research into existing methods of robust information retrieval and solutions proposed for similar problems; a research report on the findings; identification of the requirements and a proposal for an application that can carry out information retrieval with noisy data; and, finally, development and deployment of the proposed system with the necessary integrations.

Personas

Use Cases

From the user personas, we derived the use case diagram shown below.

[Figure: use case diagram]

MoSCoW List

Software Engineering

Machine Learning