Term 2 brings a list of challenges for us to overcome. Our team is excited to get stuck into the wide range of tasks needed to complete this project on time.
Continuing from term 1, we will remain in constant and professional communication with our clients from the ANCSSC during term 2. We will achieve this through our bi-weekly reports and regular video conferences. Our clients are non-technical, so we will strive to make our briefings and updates understandable and straightforward.
As mentioned in other sections of this website, a big part of our project is the implementation of a form and corresponding database that allow ANCSSC staff to collect and record information from the NGOs they visit. Having built the database in term 1, in term 2 we will create a web form and server so that this data can be collected and entered into the database. We may also need to refactor our database further as the clients refine what kind of information they want to collect.
As a clearer picture emerges of the type of data the forms will collect, we will need to refine the table layouts in database 1 to accommodate that data.
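To illustrate the form-to-database flow described above, here is a minimal sketch of how a server might validate a submitted form and store it. It assumes SQLite purely for illustration; the table name `ngo_visits`, its columns, and the `REQUIRED_FIELDS` list are hypothetical and not taken from our actual schema, which is still being refined with the clients.

```python
import sqlite3

# Hypothetical required fields; the real list depends on what the clients decide to collect.
REQUIRED_FIELDS = ("ngo_name", "visit_date")


def create_schema(conn):
    """Create an illustrative table for form submissions (assumed layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ngo_visits ("
        "id INTEGER PRIMARY KEY, "
        "ngo_name TEXT NOT NULL, "
        "visit_date TEXT NOT NULL, "
        "notes TEXT)"
    )


def save_submission(conn, form):
    """Validate a submitted form (a dict of strings) and insert it as one row."""
    missing = [k for k in REQUIRED_FIELDS if not str(form.get(k, "")).strip()]
    if missing:
        raise ValueError("Missing required fields: " + ", ".join(missing))
    # Parameterised query, so user input is never spliced into the SQL text.
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO ngo_visits (ngo_name, visit_date, notes) VALUES (?, ?, ?)",
            (form["ngo_name"].strip(), form["visit_date"].strip(), form.get("notes", "")),
        )
    return cur.lastrowid
```

Keeping the required fields in one list means that, as the clients refine what they want to collect, the validation can change in one place alongside the table layout.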
Term 1 has involved a lot of research and trial and error when it comes to PDF extraction. This is the most complex and difficult task we must undertake for this project. Most likely, our research will continue into term 2 as we refine our approach to this large problem. Throughout term 1 we have worried that this task may not be possible at all, at least given our team size, experience, and available time. Our algorithm will most likely tend towards a state of "semi-automation", where we automate as much of the extraction as possible and ask the user to do the rest. For this particular task, it is important that we continue to be realistic with our clients about what is possible, and create an end product that fulfills the brief as well as possible.
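One way the "semi-automation" idea could work is sketched below: values the extractor is confident about are accepted automatically, and the rest are passed to the user for checking. This is only an assumption about how we might structure it. The `ExtractedField` type, the `confidence` score, and the 0.8 threshold are hypothetical and do not describe any particular extraction tool.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed score between 0.0 and 1.0 from the extractor


def split_for_review(fields, threshold=0.8):
    """Split fields into those accepted automatically and those the user must check."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


if __name__ == "__main__":
    # Illustrative values only, not real extraction results.
    fields = [
        ExtractedField("organisation_name", "Example NGO", 0.95),
        ExtractedField("annual_budget", "12 000", 0.42),
    ]
    accepted, needs_review = split_for_review(fields)
    print("Accepted automatically:", [f.name for f in accepted])
    print("Ask the user to check:", [f.name for f in needs_review])
```

Raising the threshold sends more fields to the user and lowers the risk of storing wrong values, so the right setting would depend on how much manual checking the clients are willing to do.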
Building this extraction tool could also involve consulting representatives at Microsoft, who can help us understand what tools they have available and how best to use them to solve the problems we face.
Microsoft has very recently (within the last week) released a new feature of their Cognitive Services API that allows the extraction of data from tables. This addresses a big problem we were facing, as the PDFs are filled with tables of different structures. We will need to look into this new feature in depth going forward.
This term, we focused on the extraction of data itself, but we will also need a way to store this data once it has been extracted. During term 2 we will need to build a separate database structure to contain this data.
It is important that the data we collect from these PDF documents can be searched in a user-friendly way, so we need to build a front-end to allow for this. We are not planning to make this front-end overly complex; a basic search function will do.
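As a rough sketch of what the "basic search function" behind such a front-end could look like, the example below runs a simple keyword match over stored text. It assumes SQLite for illustration only; the table `extracted_records` and its columns are hypothetical, since the database structure for the extracted data has not been designed yet.

```python
import sqlite3


def create_schema(conn):
    """Create an illustrative table for extracted PDF data (assumed layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted_records ("
        "id INTEGER PRIMARY KEY, "
        "source_file TEXT NOT NULL, "
        "content TEXT NOT NULL)"
    )


def search_records(conn, term):
    """Return (id, source_file, content) rows whose content contains the search term."""
    # Escape LIKE wildcards so that the user's text is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = "%" + escaped + "%"
    return conn.execute(
        "SELECT id, source_file, content FROM extracted_records "
        "WHERE content LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    ).fetchall()
```

A plain substring match like this is easy to explain to non-technical users. If it proves too limited, a full-text index could replace the `LIKE` query without changing the front-end.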
Right before the Prototype 1 deadline, we became aware of potential limitations in our clients' budget. We will need to consult with our clients about this after the submission of Prototype 1, which could lead to deliverables such as our databases being deployed on a different platform.