Achievements and Requirements
Original Requirements
| ID | Description | Rationale | Type | Priority |
| --- | --- | --- | --- | --- |
| 1 | Well-documented project that can be built upon in the future | This is only the first iteration of the project, and it is important that any success can be built on in the future rather than needing to be re-implemented | Non-functional | Must Have |
| 2 | A pipeline that goes from inputting images to being able to search through those images by tags | The system will be used by arts and humanities researchers, so they should not need to know about the inner workings of the system to be able to use it | Non-functional | Must Have |
| 3 | Segmented steps in our pipeline | The method of tagging, the way images are input, or even the search functionality may change in the future, so it is important that the pipeline can be easily adapted | Non-functional | Must Have |
| 4 | A set of images with correct tags | To train our classifier to work on the entire set, we need a number of images whose tags we are confident are correct | Functional | Must Have |
| 5 | An automated image tagging method | There are far too many images to be tagged by hand, especially if the nature of the tags changes over time and every image needs to be re-classified | Non-functional | Must Have |
| 6 | An automated way to tag images based on their medium (photograph/painting/diagram etc.) | Although this would still result in large sets of images, it at least narrows down the number of images to search through, and could be used in combination with other tags | Non-functional | Should Have |
| 7 | A way to infer information about the image by the book that contains it | The books have a number of attributes, such as publication date, location and genre, which we can use in conjunction with the visual information to classify the images more accurately | Non-functional | Could Have |
| 8 | A way to use the existing Flickr tags | There is a wealth of tags that already exist on Flickr for the BL dataset; this is valuable data but not necessarily reliable. It may be useful for generating new tags | Non-functional | Should Have |
| 9 | A way to find images by their tags | Once the images have been correctly tagged, there still needs to be a way of retrieving them. This could be either our own system or by passing the tags on to another system such as Flickr | Non-functional | Must Have |
| 10 | A way to group similar images for retrieval | It is helpful to find images that have similar attributes when searching, especially if the user is not quite sure exactly what they are looking for | Non-functional | Should Have |
| 11 | A way to describe the clusters of images in a way that a human could understand | This makes browsing and retrieval of images much more straightforward, but it is difficult and, with the time left, may not be feasible | Non-functional | Would like to have |
| 12 | An object classification method for images | Searching by the image's content is likely to be the most intuitive way to search for pictures, for example searching 'boat' for pictures of boats. It is also the hardest, and may not be possible in the time given | Non-functional | Would like to have |
Implementation of Requirements and Achievements
The user was very happy with our system because we implemented all of his requirements successfully. The following table shows each requirement, whether it was implemented, and its priority:
| ID | Description | Implemented successfully? | Priority of Requirement |
| --- | --- | --- | --- |
| 1 | Well-documented project that can be built upon in the future | Implemented Successfully | Must Have |
| 2 | A pipeline that goes from inputting images to being able to search through those images by tags | Implemented Successfully | Must Have |
| 3 | Segmented steps in our pipeline | Implemented Successfully | Must Have |
| 4 | A set of images with correct tags | Implemented Successfully | Must Have |
| 5 | An automated image tagging method | Implemented Successfully | Must Have |
| 6 | An automated way to tag images based on their medium (photograph/painting/diagram etc.) | Implemented Successfully | Should Have |
| 7 | A way to infer information about the image by the book that contains it | Implemented Successfully | Could Have |
| 8 | A way to use the existing Flickr tags | Implemented Successfully | Should Have |
| 9 | A way to find images by their tags | Implemented Successfully | Must Have |
| 10 | A way to group similar images for retrieval | Implemented Successfully | Should Have |
| 11 | A way to describe the clusters of images in a way that a human could understand | Implemented Successfully | Would like to have |
| 12 | An object classification method for images | Implemented Successfully | Would like to have |
Further Explanations on Achievements
- “Well-documented project that can be built upon in the future”: (Implemented successfully) We have written the documentation so that it shows the steps required to continue building the project. This includes:
  - A user manual which shows how the application works, including the steps required to carry out the main features.
  - Code that is formatted so that people can understand its different segments, using comments, indentation and appropriate identifier names.
- “A pipeline that goes from inputting images to being able to search through those images by tags”: (Implemented successfully) This requirement is met because:
  - All of the images used are input from the British Library image data-set (extracted from Flickr).
  - Users can search for images that are stored on Flickr (image data-set).
- “Segmented steps in our pipeline”: (Implemented successfully) All of the features have been separated into different modules throughout the code. This has several benefits:
  - It is easier for future developers to continue developing certain features without having to worry about other features being affected.
  - It is easier to read the code if the different features and steps are modular.
  - If bugs occur when future teams work on our project, it is much easier to debug the code when the features are separated into different modules.
- “A set of images with correct tags”: (Implemented successfully) We have tagged thousands of images in the data-set using two APIs and machine learning algorithms. It is not possible to tag every image 100% accurately, so we conducted thorough testing to check the percentage of images that are tagged correctly. We found that around 90% of them were tagged correctly using our algorithms.
- “An automated image tagging method”: (Implemented successfully) Our tagging methods are algorithmic. We use two APIs (AlchemyAPI and Imagga API) to tag images based on what they represent. There are approximately one million images in the data-set, so it would take too long to tag all of them. Therefore, we have written a script which automatically tags a certain number of images every day.
- “An automated way to tag images based on their medium (photograph/painting/diagram etc.)”: (Implemented successfully) We have also used machine learning algorithms to classify images based on characteristics such as:
  - Whether the image is in black and white or colour.
  - Whether the image shows musical notation.
  - Whether the image is a line drawing or a photograph.
- “A way to infer information about the image by the book that contains it”: (Implemented successfully) We have made a feature that allows users to view information about the book that each image is from. The following information is displayed about each book:
  - Volume of the book.
  - Name of the publisher.
  - The book title.
  - The book author.
  - The place of publication.
  - The year it was published.
  - The number of pages.
- “A way to use the existing Flickr tags”: (Implemented successfully) Many of the existing Flickr tags are still associated with their images, so users can search for images that have already been tagged on Flickr.
- “A way to find images by their tags”: (Implemented successfully) On the home page, users can enter a search query and choose to find all of the images that have tags associated with the input. The user is then redirected to a page that shows all of the matching images as the search results.
- “A way to group similar images for retrieval”: (Implemented successfully) All of the images that the user is presented with on the search page are similar in some way: they are either from the same book or they have similar tags (based on the APIs and machine learning algorithms used).
- “A way to describe the clusters of images in a way that a human could understand”: (Implemented successfully) On the search results page, the images are clustered by the books that they are from or by the tags that they have: images with the same tags, or from the same book, are clustered together.
- “An object classification method for images”: (Implemented successfully) Images can be classified by different characteristics (including the colour of the image and whether it is a line drawing or a photograph).
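
The tag search and the grouping of results by book described in the list above can be illustrated with a minimal sketch. This is not the project's actual code: the record layout (`id`, `book`, `tags`), the sample data and the function names `build_tag_index`, `search_by_tag` and `group_by_book` are all assumptions made for illustration, and matching is simplified to exact, case-insensitive tag equality.

```python
from collections import defaultdict

# Hypothetical image records; field names and values are illustrative only.
IMAGES = [
    {"id": "img1", "book": "Book A", "tags": {"boat", "photograph"}},
    {"id": "img2", "book": "Book A", "tags": {"map", "line drawing"}},
    {"id": "img3", "book": "Book B", "tags": {"boat", "line drawing"}},
]


def build_tag_index(images):
    """Map each lower-cased tag to the set of image ids that carry it."""
    index = defaultdict(set)
    for image in images:
        for tag in image["tags"]:
            index[tag.lower()].add(image["id"])
    return index


def search_by_tag(index, query):
    """Return the sorted ids of images with a tag equal to the query."""
    return sorted(index.get(query.strip().lower(), set()))


def group_by_book(images, image_ids):
    """Group the given image ids by the book each image comes from."""
    wanted = set(image_ids)
    groups = defaultdict(list)
    for image in images:
        if image["id"] in wanted:
            groups[image["book"]].append(image["id"])
    return dict(groups)


index = build_tag_index(IMAGES)
matches = search_by_tag(index, "Boat")
print(matches)                         # ['img1', 'img3']
print(group_by_book(IMAGES, matches))  # {'Book A': ['img1'], 'Book B': ['img3']}
```

Building the index once and looking tags up in it keeps each search independent of the size of the image set, which matters for a collection of around one million images. Grouping by tag rather than by book would follow the same pattern with a different key.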