Requirements and Use Cases
Initial Requirements
The implementation should include a functional web-based UI for the key features, including data loading, data analysis and visualisation. Sample data will be provided. All or part of the project will be made available as an open-source project.
Requirements Gathering
The requirements should take into consideration feedback from the following stakeholders:
- The Seldon team
- Selected Seldon enterprise customers (i.e. the customer providing a data sample)
- Surveys to the wider Seldon developer community
Questionnaire
A questionnaire was devised in the early stages of the project, intended for Seldon customers and the developer community. Its aim was to identify the general needs and desires of potential users of a data analysis program.
MoSCoW Requirements
This is a list of all the requirements as agreed with our client. These requirements are subject to change over the course of the development phase of this project.
ID | Requirement | Type | Category | Priority |
UI/UX | ||||
Um1 | The DCS shall display the user imported dataset as a spreadsheet/table. | Functional | UI/UX | Must |
Uc1 | The DCS shall display all unresolved cleanliness issues | Functional | UI/UX | Could |
Uc2 | The DCS shall offer the user a choice between a pure GUI interface and a notebook style interface. | Functional | UI/UX | Could |
Uc3 | The DCS shall support persistence of user sessions. | Functional | UI/UX | Could |
Uc4 | The DCS shall support partial loading of rows in datasets. | Functional | UI/UX | Could |
Uw1 | The DCS shall allow multiple users to collaborate showing changes in real time | Functional | UI/UX | Would |
Uw2 | The DCS shall compute a "messiness" score. | Functional | UI/UX | Would |
Data Loading | ||||
Lm1 | The DCS shall support loading of user-uploaded CSV files. | Functional | Data Loading | Must |
Lm2 | The DCS shall allow users to specify variable names and types. | Functional | Data Loading | Must |
Ls1 | The DCS shall support a Date variable type. | Functional | Data Loading | Should |
Ls2 | The DCS shall parse dates with a user-provided format string. | Functional | Data Loading | Should |
Lc1 | The DCS shall support loading of user-uploaded structured file formats (JSON, Excel). | Functional | Data Loading | Could |
Lc2 | The DCS shall support partial loading of columns in datasets. | Functional | Data Loading | Could |
Lw1 | The DCS shall support loading user-uploaded unstructured data text files. | Functional | Data Loading | Would |
Lw2 | The DCS shall support parsing of unstructured file format. | Functional | Data Loading | Would |
Lw3 | The DCS shall support loading of data files over network. | Functional | Data Loading | Would |
Lw4 | The DCS shall load asynchronously. | Functional | Data Loading | Would |
Lw5 | The DCS shall support loading of data streams. | Functional | Data Loading | Would |
Lw6 | The DCS shall support an Email variable type. | Functional | Data Loading | Would |
Lw7 | The DCS shall intelligently guess variable types. | Functional | Data Loading | Would |
Data Viewing | ||||
Xs1 | The DCS shall support sorting of rows by user-specified column | Functional | Data Viewing | Should |
Xs2 | The DCS shall support searching datasets with a keyword | Functional | Data Viewing | Should |
Xc1 | The DCS shall support querying datasets with SQL | Functional | Data Viewing | Could |
Data Cleaning | ||||
Cm1 | The DCS shall support removing rows as a universal cleaning operation | Functional | Data Cleaning | Must |
Cm2 | The DCS shall support inserting user-specified values as a universal cleaning operation | Functional | Data Cleaning | Must |
Cm3 | The DCS shall show rows with invalid numbers in specified column | Functional | Data Cleaning | Must |
Cm4 | The DCS shall show rows with missing values in specified column | Functional | Data Cleaning | Must |
Cm5 | The DCS shall support cleaning of missing values by inserting an average | Functional | Data Cleaning | Must |
Cm6 | The DCS shall support cleaning of missing values by filling with the most recent value | Functional | Data Cleaning | Must |
Cm7 | The DCS shall support cleaning of missing values by interpolation | Functional | Data Cleaning | Must |
Cs1 | The DCS shall show rows where Date parsing failed | Functional | Data Cleaning | Should |
Cs2 | The DCS shall show duplicate rows | Functional | Data Cleaning | Should |
Cs3 | The DCS shall provide the option to ignore outliers | Functional | Data Cleaning | Should |
Cs4 | The DCS shall provide the option to filter rows using regular expression | Functional | Data Cleaning | Should |
Cs5 | The DCS shall support data normalisation and standardisation | Functional | Data Cleaning | Should |
Cc1 | The DCS shall provide the option to group multiple text representations of the same entity and replace them with a single value | Functional | Data Cleaning | Could |
Cc2 | The DCS shall fix escaped HTML strings | Functional | Data Cleaning | Could |
Cc3 | The DCS shall show values that are not found in an English dictionary | Functional | Data Cleaning | Could |
Cw1 | The DCS shall show rows where Email parsing failed | Functional | Data Cleaning | Would |
Data Analysis | ||||
Am1 | The DCS shall show the unique values and their count of every column of the dataset | Functional | Data Analysis | Must |
Am2 | The DCS shall show the mean, median, mode of columns with numerical data | Functional | Data Analysis | Must |
Am3 | The DCS shall show the max and min values of columns with numerical data | Functional | Data Analysis | Must |
Am4 | The DCS shall show the range and standard deviation of columns with numerical data | Functional | Data Analysis | Must |
As1 | The DCS shall show text analysis such as most frequent word for string type data | Functional | Data Analysis | Should |
Data Visualisation | ||||
Vm1 | The DCS shall be able to visualise data using histograms | Functional | Data Visualisation | Must |
Vm2 | The DCS shall be able to visualise data using line charts | Functional | Data Visualisation | Must |
Vs1 | The DCS shall be able to visualise data using scatter plots | Functional | Data Visualisation | Should |
Vs2 | The DCS shall be able to visualise data using time-series plots | Functional | Data Visualisation | Should |
Vc1 | The DCS shall provide the option to export graphs to image | Functional | Data Visualisation | Could |
Vc2 | The DCS shall be able to visualise data using pie charts | Functional | Data Visualisation | Could |
Vw1 | The DCS shall be able to visualise data using regression matrices | Functional | Data Visualisation | Would |
Vw2 | The DCS shall be able to visualise data using bar charts | Functional | Data Visualisation | Would |
Others | ||||
Nm1 | The DCS shall use a browser as its user interface | Non-Functional | Compliance to Standards | Must |
Nm2 | The DCS shall support the latest versions of Safari, Internet Explorer, Chrome, Firefox | Non-Functional | Performance | Must |
Nm3 | The DCS shall be easily installable by an untrained user with the help of documentation | Non-Functional | Deployment | Must |
Nc1 | The DCS shall ensure that error messages give the users specific instructions for recovery | Non-Functional | Ease of Use | Could |
Nc2 | The DCS shall ensure that a user's persistence data has an availability of 100% | Non-Functional | Availability | Could |
Nw1 | The DCS shall support 100 concurrent sessions | Non-Functional | Capacity | Would |
Nw2 | The DCS shall be easily scalable to accommodate more concurrent users | Non-Functional | Capacity | Would |
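Requirements Cm6 and Cm7 name two missing-value strategies without fixing their exact behaviour. The following is a minimal sketch of one possible reading, assuming a column is a list of numbers in which `None` marks a missing value. The function names `fill_forward` and `interpolate_linear` are hypothetical and are not part of the agreed requirements.

```python
from typing import List, Optional

# Assumption: a column is a list of floats, with None marking a missing value.
Column = List[Optional[float]]


def fill_forward(values: Column) -> Column:
    """Replace each missing value with the most recent non-missing one (Cm6)."""
    filled: Column = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)  # stays None if no value has been seen yet
    return filled


def interpolate_linear(values: Column) -> Column:
    """Fill gaps lying between two known values by linear interpolation (Cm7)."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled  # leading and trailing gaps are left as None
```

In this sketch, gaps at the very start of a column (and, for interpolation, at the very end) are left untouched, since there is no earlier or surrounding value to draw on. How the DCS should treat those cases is a design decision the requirements leave open.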
Use-Cases
Use Case ID | Use Case Name |
UC1 | LoadFile |
UC2 | DefineVariableTypes |
UC3 | DeleteSelectedRows |
UC4 | ChangeValueInCell |
UC5 | DisplayRowsWithMissingValues |
UC6 | FillMissingValuesWithAverage |
UC7 | ViewAnalysisOfColumn |
UC8 | DisplayUniqueValues |
UC9 | CreateVisualisation |
UC10 | ChangeVisualisation |
UC11 | DisplayDuplicateRows |
UC12 | SortValuesInColumns |
UC13 | QueryDataWithKeyword |
UC14 | DefineConstraintsOnRange |
UC15 | EnterCodeInNotebook |
UC16 | ClusterData |
USE CASE | LoadFile |
ID | UC1 |
BRIEF DESCRIPTION | A user attempts to load a file on the system |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | None |
MAIN FLOW | 1. The use case is initiated when the user opens the web app in a browser 2. The system displays a file chooser 3. The user selects a file to upload from a local directory 4. The user selects the upload option |
POSTCONDITIONS | The file is uploaded into the workspace |
ALTERNATIVE FLOWS | None |
USE CASE | DefineVariableTypes |
ID | UC2 |
BRIEF DESCRIPTION | The user defines a name and a type for each variable |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | The user has selected a file to be uploaded |
MAIN FLOW | 1. The use case is initiated when the user selects a file to be uploaded 2. The system displays a list of all the variable names based on the first value of every column 3. The user selects the name(s) they wish to edit, enters a new name for each, then selects the next button 4. The system displays a list of the variable names preceded by the type 5. The user selects the appropriate type for each variable |
POSTCONDITIONS | The type and names of the variables have been set |
ALTERNATIVE FLOWS | None |
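Steps 2 and 3 of DefineVariableTypes can be illustrated with a short sketch. It assumes the uploaded file is CSV text whose first row holds the initial variable names. The helper names `read_variable_names` and `rename_variables` are hypothetical.

```python
import csv
import io
from typing import Dict, List


def read_variable_names(csv_text: str) -> List[str]:
    """Take the first value of every column as the initial variable name."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader)


def rename_variables(names: List[str], renames: Dict[str, str]) -> List[str]:
    """Apply the user's edits; names the user did not touch are kept as they are."""
    return [renames.get(name, name) for name in names]
```

For example, `rename_variables(read_variable_names("id,name\n1,Ann\n"), {"name": "full_name"})` yields `["id", "full_name"]`.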
USE CASE | DeleteSelectedRows |
ID | UC3 |
BRIEF DESCRIPTION | The user selects one or more rows to be deleted |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system |
MAIN FLOW | 1. The user selects one or more rows within the spreadsheet 2. The system highlights all the selected rows 3. The user selects the delete rows option in the sidebar |
POSTCONDITIONS | The selected rows are deleted |
ALTERNATIVE FLOWS | None |
USE CASE | ChangeValueInCell |
ID | UC4 |
BRIEF DESCRIPTION | The user updates the value of cells within the spreadsheet |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system |
MAIN FLOW | 1. The user selects the cell to be updated 2. The user types in the new value of the cell 3. The system checks if the value entered conforms with the variable type 4. The system displays the updated value in the cell |
POSTCONDITIONS | The old value is replaced with the new value |
ALTERNATIVE FLOWS | ValueEnteredDoesNotMatchType |
ID | UC4.1 |
BRIEF DESCRIPTION | The value entered does not conform with the type of the variable |
PRIMARY ACTORS | System (App) |
SECONDARY ACTORS | None |
PRECONDITIONS | The user enters a value that is of an incorrect type |
MAIN FLOW | 1. The alternative flow starts after step 3 of the main flow 2. The system replaces the value of the updated cell with an N/A |
POSTCONDITIONS | The value is replaced with N/A |
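The type check in ChangeValueInCell and its alternative flow UC4.1 could look roughly like the sketch below. The type names in `PARSERS` and the function `update_cell` are assumptions made for illustration. The use case itself only states that a non-conforming value is replaced with N/A.

```python
from typing import Any, Callable, Dict

NA = "N/A"  # placeholder written to the cell when the type check fails (UC4.1)

# Hypothetical mapping from a variable type name to a parser for raw input.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "integer": int,
    "float": float,
    "string": str,
}


def update_cell(row: Dict[str, Any], column: str, raw: str, var_type: str) -> Dict[str, Any]:
    """Return a copy of the row with the edited cell, or N/A on a type mismatch."""
    updated = dict(row)
    try:
        updated[column] = PARSERS[var_type](raw)  # main flow, steps 3 and 4
    except ValueError:
        updated[column] = NA  # alternative flow UC4.1
    return updated
```

Returning a copy rather than mutating the row in place is a choice made for this sketch only. It keeps the old value available, for instance for an undo feature.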
USE CASE | DisplayRowsWithMissingValues |
ID | UC5 |
BRIEF DESCRIPTION | The system displays all the rows in a particular column that have missing values |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system |
MAIN FLOW | 1. The user selects a column in the spreadsheet 2. The user selects the missing value option in the sidebar 3. The system updates the spreadsheet to display only those rows consisting of missing values within the specified column |
POSTCONDITIONS | The spreadsheet is updated |
ALTERNATIVE FLOWS | None |
USE CASE | FillMissingValuesWithAverage |
ID | UC6 |
BRIEF DESCRIPTION | The user replaces all the missing values within a selected column using an average |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system |
MAIN FLOW | 1. The user selects a column within the spreadsheet 2. The user selects the replace missing values option in the sidebar 3. The system displays a list consisting of the different averages depending on the type of column 4. The user selects the appropriate average, then selects the run option 5. The system replaces all the missing values within the column with the chosen average |
POSTCONDITIONS | The spreadsheet is updated |
ALTERNATIVE FLOWS | None |
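As a rough illustration of the final step of FillMissingValuesWithAverage, the sketch below assumes that the available averages are mean, median and mode and that `None` marks a missing value. Neither assumption is fixed by the use case, and the names `AVERAGES` and `fill_missing_with_average` are hypothetical.

```python
import statistics
from typing import Callable, Dict, List, Optional

# Assumption: these are the averages offered to the user in step 3.
AVERAGES: Dict[str, Callable] = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def fill_missing_with_average(values: List[Optional[float]], average: str) -> List[Optional[float]]:
    """Replace every missing value (None) with the chosen average of the present values."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to average over, so leave the column unchanged
    replacement = AVERAGES[average](present)
    return [replacement if v is None else v for v in values]
```

The average is computed over the present values only, so the missing cells do not influence the value that replaces them.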
USE CASE | ViewAnalysisOfColumn |
ID | UC7 |
BRIEF DESCRIPTION | The system displays an analysis of each variable within the spreadsheet |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system |
MAIN FLOW | 1. The user selects the analyse tab at the top 2. The system carries out an analysis of each variable depending on its type 3. The system displays the analysis of each variable in a table |
POSTCONDITIONS | None |
ALTERNATIVE FLOWS | None |
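For a numerical variable, the analysis in ViewAnalysisOfColumn would cover the statistics listed in requirements Am2 to Am4. A minimal sketch using Python's standard `statistics` module follows. The function name `analyse_numeric` is hypothetical, and the use of the population standard deviation is an assumption, since the requirements do not say which variant is meant.

```python
import statistics
from typing import Dict, List, Optional


def analyse_numeric(values: List[Optional[float]]) -> Dict[str, float]:
    """Summarise a numerical column, ignoring missing values (None)."""
    present = [v for v in values if v is not None]
    lowest, highest = min(present), max(present)
    return {
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        # Assumption: population standard deviation (pstdev), not the sample one.
        "std": statistics.pstdev(present),
    }
```

A column with no present values raises an error in this sketch. A real implementation would need to decide what to display for such a column.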
USE CASE | DisplayUniqueValues |
ID | UC8 |
BRIEF DESCRIPTION | The system displays a list of the frequency of values within a variable |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system |
MAIN FLOW | 1. The use case is initiated when the user is in the analyse tab 2. The system will display the analysis results of each variable in a table 3. The user toggles the display unique value feature in the appropriate table 4. The system displays the frequencies of all values within the selected variable |
POSTCONDITIONS | None |
ALTERNATIVE FLOWS | None |
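The frequency list in step 4 of DisplayUniqueValues can be sketched with `collections.Counter`. The representation of rows as dictionaries and the function name `unique_value_counts` are assumptions made for this example.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def unique_value_counts(rows: List[Dict[str, Any]], column: str) -> List[Tuple[Any, int]]:
    """Return (value, count) pairs for one column, most frequent first."""
    return Counter(row[column] for row in rows).most_common()
```

For the rows `[{"city": "London"}, {"city": "Paris"}, {"city": "London"}]` and the column `"city"`, this returns `[("London", 2), ("Paris", 1)]`.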
USE CASE | CreateVisualisation |
ID | UC9 |
BRIEF DESCRIPTION | The user creates a visualisation by selecting a chart type followed by the variables |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system |
MAIN FLOW | 1. The use case is initiated when the user selects the visualise option at the top 2. The system displays a list of chart types and variable names 3. The user selects a chart type 4. The system limits the number of variables a user is able to select depending on the chart 5. The user selects the variable names 6. The system displays the graph |
POSTCONDITIONS | The system displays the graph that is generated |
ALTERNATIVE FLOWS | None |
USE CASE | ChangeVisualisation |
ID | UC10 |
BRIEF DESCRIPTION | The user changes the type of chart used to display results |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A chart is already generated |
MAIN FLOW | 1. The use case is initiated when the user has already generated a chart 2. The system displays a list of chart types 3. The user selects the chart type they wish to change to 4. The system displays the new chart |
POSTCONDITIONS | The new chart is displayed |
ALTERNATIVE FLOWS | UnableToGenerateGraph |
ID | UC10.1 |
BRIEF DESCRIPTION | The chosen chart supports a different number of variables to that previously selected |
PRIMARY ACTORS | System |
SECONDARY ACTORS | User |
PRECONDITIONS | The user selects a chart that supports a different number of variables |
MAIN FLOW | 1. The alternate flow starts after step 3 of the main flow 2. The system prompts the user to select an appropriate number of variables depending on the newly selected chart 3. The user selects the variables 4. The newly generated graph is displayed |
POSTCONDITIONS | The new graph is displayed |
USE CASE | DisplayDuplicateRows |
ID | UC11 |
BRIEF DESCRIPTION | The system displays all rows within a column that have duplicate values |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system; the user is in the clean tab |
MAIN FLOW | 1. The user selects a column in the spreadsheet 2. The user selects the duplicate values option in the sidebar 3. The system groups together rows that have the same value in a particular column and displays this in the spreadsheet |
POSTCONDITIONS | Rows with duplicate values are grouped together |
ALTERNATIVE FLOWS | None |
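The grouping in step 3 of DisplayDuplicateRows might be implemented along the lines of the sketch below. It assumes that rows are dictionaries and that two rows count as duplicates when their values in the selected column are exactly equal. The function name `group_duplicates` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_duplicates(rows: List[Dict[str, Any]], column: str) -> List[Dict[str, Any]]:
    """Return only rows whose value in `column` occurs more than once, grouped by value."""
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    # Keep groups with at least two rows; each group's rows stay adjacent in the output.
    return [row for group in groups.values() if len(group) > 1 for row in group]
```

Groups appear in the order in which their value first occurs in the dataset, so the display stays stable between runs.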
USE CASE | SortValuesInColumns |
ID | UC12 |
BRIEF DESCRIPTION | The user sorts values in a particular column |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system; the user is in the clean tab |
MAIN FLOW | 1. The user selects a column 2. The user selects the sort option in the sidebar 3. The system displays a list of sorting techniques 4. The user selects the sorting technique 5. The system sorts the values in the variable according to the technique selected |
POSTCONDITIONS | The column is sorted |
ALTERNATIVE FLOWS | None |
USE CASE | QueryDataWithKeyword |
ID | UC13 |
BRIEF DESCRIPTION | The user is able to query data in a particular column using a keyword |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system |
MAIN FLOW | 1. The user selects a column within the spreadsheet 2. The user selects the query data option in the sidebar 3. The system displays a textbox in which the user can enter a keyword 4. The user enters the keyword 5. The system updates the spreadsheet to display only those rows which contain the keyword in the chosen variable field |
POSTCONDITIONS | The spreadsheet is updated to display rows that match |
ALTERNATIVE FLOWS | None |
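The filtering step of QueryDataWithKeyword can be sketched as a substring match on the selected column. Case-insensitive matching is an assumption here, as the use case does not specify it, and the function name `filter_by_keyword` is hypothetical.

```python
from typing import Any, Dict, List


def filter_by_keyword(rows: List[Dict[str, Any]], column: str, keyword: str) -> List[Dict[str, Any]]:
    """Keep rows whose value in `column` contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [row for row in rows if needle in str(row.get(column, "")).lower()]
```

Values are converted to strings before matching, so a keyword such as `"20"` would also match the number `2020` in a numerical column.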
USE CASE | DefineConstraintsOnRange |
ID | UC14 |
BRIEF DESCRIPTION | The user is able to define constraints on the range of a variable |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system |
MAIN FLOW | 1. The user selects a column in the spreadsheet 2. The user selects the define range option 3. The system displays two textboxes 4. The user enters the upper and lower bound of the range in the textboxes 5. The system updates the spreadsheet to display only those rows where the value of the variable is within the defined range |
POSTCONDITIONS | None |
ALTERNATIVE FLOWS | InvalidRange |
ID | UC14.1 |
BRIEF DESCRIPTION | The range entered does not conform with the type of the variable |
PRIMARY ACTORS | System |
SECONDARY ACTORS | User |
PRECONDITIONS | The range entered does not conform with the type |
MAIN FLOW | 1. The alternate flow starts after step 4 of the main flow 2. The system displays a message that the values entered are of a different type 3. The system provides the user with an option to enter the range again or close the feature |
POSTCONDITIONS | None |
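DefineConstraintsOnRange and its InvalidRange alternative flow can be illustrated as follows. The sketch assumes a numerical variable, inclusive bounds, and rows stored as dictionaries. The exception class `InvalidRange` mirrors the name of the alternative flow, while `parse_bounds` and `filter_by_range` are hypothetical helper names.

```python
from typing import Any, Callable, Dict, List, Tuple


class InvalidRange(ValueError):
    """Raised when an entered bound does not conform with the variable type (UC14.1)."""


def parse_bounds(lower_raw: str, upper_raw: str, parser: Callable[[str], Any] = float) -> Tuple[Any, Any]:
    """Convert the two textbox entries to the variable's type, or raise InvalidRange."""
    try:
        return parser(lower_raw), parser(upper_raw)
    except ValueError as exc:
        raise InvalidRange("the values entered are of a different type") from exc


def filter_by_range(rows: List[Dict[str, Any]], column: str, lower: Any, upper: Any) -> List[Dict[str, Any]]:
    """Keep rows whose value lies within [lower, upper]; inclusive bounds are an assumption."""
    return [row for row in rows if lower <= row[column] <= upper]
```

A caller would catch `InvalidRange` and then offer the user the choice described in step 3 of the alternative flow: enter the range again or close the feature.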
USE CASE | EnterCodeInNotebook |
ID | UC15 |
BRIEF DESCRIPTION | The user enters code snippets in a notebook |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | None |
MAIN FLOW | 1. The user selects the notebook option 2. The system displays a notebook interface 3. The user enters snippets of code in the notebook 4. The user selects the run option 5. The system updates the spreadsheet |
POSTCONDITIONS | The spreadsheet is updated |
ALTERNATIVE FLOWS | None |
USE CASE | ClusterData |
ID | UC16 |
BRIEF DESCRIPTION | The user clusters data in a particular column |
PRIMARY ACTORS | User |
SECONDARY ACTORS | None |
PRECONDITIONS | A file has been uploaded to the system |
MAIN FLOW | 1. The user selects a column in the spreadsheet 2. The user clicks on the cluster option in the sidebar 3. The system groups all similar values within the column and displays this in the spreadsheet |
POSTCONDITIONS | The spreadsheet is updated |
ALTERNATIVE FLOWS | None |
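ClusterData does not say how "similar" values are detected. One common approach, used here purely as an assumption, is key-collision clustering: each value is reduced to a normalised fingerprint, and values sharing a fingerprint are grouped. The function names `fingerprint` and `cluster_values` are hypothetical.

```python
import string
from collections import defaultdict
from typing import Dict, List

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def fingerprint(value: str) -> str:
    """Normalise a value: trim, lowercase, drop punctuation, sort the unique tokens."""
    cleaned = value.strip().lower().translate(_STRIP_PUNCTUATION)
    return " ".join(sorted(set(cleaned.split())))


def cluster_values(values: List[str]) -> List[List[str]]:
    """Group values that share a fingerprint; keep groups with more than one distinct spelling."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        clusters[fingerprint(value)].append(value)
    return [members for members in clusters.values() if len(set(members)) > 1]
```

For example, `cluster_values(["Acme Ltd.", "acme ltd", "Ltd Acme", "Globex"])` returns a single cluster containing the three Acme spellings. This method catches differences in case, punctuation and word order, but not misspellings, which would need a distance-based technique instead.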