Requirements and Use Cases


Initial Requirements

The implementation should include a functional web-based UI for the key features: data loading, data analysis and visualisation. Sample data will be provided. All or part of the project will be released as open source.




Requirements Gathering

The requirements should take into consideration feedback from the following stakeholders:

  • The Seldon team
  • Selected Seldon enterprise customers (i.e. the customer providing a data sample)
  • Surveys to the wider Seldon developer community

Questionnaire

A questionnaire was devised in the early stages of the project for Seldon customers and the developer community. Its aim is to identify what potential users need and want from a data analysis program.




MoSCoW Requirements

This is a list of all the requirements as agreed with our client. These requirements are subject to change over the course of the development phase of this project.

ID Requirement Type Category Priority
UI/UX
Um1 The DCS shall display the user imported dataset as a spreadsheet/table. Functional UI/UX Must
Uc1 The DCS shall display all unresolved cleanliness issues Functional UI/UX Could
Uc2 The DCS shall offer the user a choice between a pure GUI interface and a notebook style interface. Functional UI/UX Could
Uc3 The DCS shall support persistence of user sessions. Functional UI/UX Could
Uc4 The DCS shall support partial loading of rows in datasets. Functional UI/UX Could
Uw1 The DCS shall allow multiple users to collaborate showing changes in real time Functional UI/UX Would
Uw2 The DCS shall compute a "messiness" score. Functional UI/UX Would
Data Loading
Lm1 The DCS shall support loading of user-uploaded CSV files. Functional Data Loading Must
Lm2 The DCS shall allow users to specify variable names and types. Functional Data Loading Must
Ls1 The DCS shall support a Date variable type. Functional Data Loading Should
Ls2 The DCS shall parse dates with a user-provided format string. Functional Data Loading Should
Lc1 The DCS shall support loading of user-uploaded structured file formats (JSON, Excel). Functional Data Loading Could
Lc2 The DCS shall support partial loading of columns in datasets. Functional Data Loading Could
Lw1 The DCS shall support loading user-uploaded unstructured data text files. Functional Data Loading Would
Lw2 The DCS shall support parsing of unstructured file format. Functional Data Loading Would
Lw3 The DCS shall support loading of data files over network. Functional Data Loading Would
Lw4 The DCS shall load asynchronously. Functional Data Loading Would
Lw5 The DCS shall support loading of data streams. Functional Data Loading Would
Lw6 The DCS shall support an Email variable type. Functional Data Loading Would
Lw7 The DCS shall intelligently guess variable types. Functional Data Loading Would
Data Viewing
Xs1 The DCS shall support sorting of rows by user-specified column Functional Data Viewing Should
Xs2 The DCS shall support searching datasets with a keyword Functional Data Viewing Should
Xc1 The DCS shall support querying datasets with SQL Functional Data Viewing Could
Data Cleaning
Cm1 The DCS shall support removing rows as a universal cleaning operation Functional Data Cleaning Must
Cm2 The DCS shall support inserting user-specified values as a universal cleaning operation Functional Data Cleaning Must
Cm3 The DCS shall show rows with invalid numbers in specified column Functional Data Cleaning Must
Cm4 The DCS shall show rows with missing values in specified column Functional Data Cleaning Must
Cm5 The DCS shall support cleaning of missing values by inserting an average Functional Data Cleaning Must
Cm6 The DCS shall support cleaning of missing values by filling with the most recent value Functional Data Cleaning Must
Cm7 The DCS shall support cleaning of missing values by interpolation Functional Data Cleaning Must
Cs1 The DCS shall show rows where Date parsing failed Functional Data Cleaning Should
Cs2 The DCS shall show duplicate rows Functional Data Cleaning Should
Cs3 The DCS shall provide the option to ignore outliers Functional Data Cleaning Should
Cs4 The DCS shall provide the option to filter rows using regular expression Functional Data Cleaning Should
Cs5 The DCS shall support data normalisation and standardisation Functional Data Cleaning Should
Cc1 The DCS shall provide the option to group multiple text representation of the same entity and replace them with a single value Functional Data Cleaning Could
Cc2 The DCS shall fix escaped HTML strings Functional Data Cleaning Could
Cc3 The DCS shall show values that are not found in English dictionary Functional Data Cleaning Could
Cw1 The DCS shall show rows where Email parsing failed Functional Data Cleaning Would
Data Analysis
Am1 The DCS shall show the unique values and their count of every column of the dataset Functional Data Analysis Must
Am2 The DCS shall show the mean, median, mode of columns with numerical data Functional Data Analysis Must
Am3 The DCS shall show the max and min values of columns with numerical data Functional Data Analysis Must
Am4 The DCS shall show the range and standard deviation of columns with numerical data Functional Data Analysis Must
As1 The DCS shall show text analysis such as most frequent word for string type data Functional Data Analysis Should
Data Visualisation
Vm1 The DCS shall be able to visualise data using histograms Functional Data Visualisation Must
Vm2 The DCS shall be able to visualise data using line charts Functional Data Visualisation Must
Vs1 The DCS shall be able to visualise data using scatter plots Functional Data Visualisation Should
Vs2 The DCS shall be able to visualise data using time-series plots Functional Data Visualisation Should
Vc1 The DCS shall provide the option to export graphs to image Functional Data Visualisation Could
Vc2 The DCS shall be able to visualise data using pie charts Functional Data Visualisation Could
Vw1 The DCS shall be able to visualise data using regression matrices Functional Data Visualisation Would
Vw2 The DCS shall be able to visualise data using bar charts Functional Data Visualisation Would
Others
Nm1 The DCS shall use a browser as its user interface Non-Functional Compliance to Standards Must
Nm2 The DCS shall support the latest versions of Safari, Internet Explorer, Chrome, Firefox Non-Functional Performance Must
Nm3 The DCS shall be easily installable by an untrained user with the help of documentation Non-Functional Deployment Must
Nc1 The DCS shall ensure that error messages give the users specific instructions for recovery Non-Functional Ease of Use Could
Nc2 The DCS shall ensure that a user's persistence data has an availability of 100% Non-Functional Availability Could
Nw1 The DCS shall support 100 concurrent sessions Non-Functional Capacity Would
Nw2 The DCS shall be easily scalable to accommodate more concurrent users Non-Functional Capacity Would
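
Requirements Cm6 and Cm7 name two gap-filling strategies without saying how they behave at the edges of a column. The following is a minimal sketch of one possible reading, not the project's implementation: missing values are represented as None, and the function names forward_fill and interpolate_linear are hypothetical.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the most recent non-missing value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        # Leading gaps stay None: there is nothing to carry forward yet.
        filled.append(last)
    return filled


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps on a straight line between the nearest known neighbours."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    # Gaps before the first or after the last known value are left as None.
    return result
```

In this sketch neither function extrapolates, so a column that starts or ends with missing values still needs another rule, such as the average fill in Cm5.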




Use-Cases

Use Case ID Use Case Name
UC1 LoadFile
UC2 DefineVariableTypes
UC3 DeleteSelectedRows
UC4 ChangeValueInCell
UC5 DisplayRowsWithMissingValues
UC6 FillMissingValuesWithAverage
UC7 ViewAnalysisOfColumn
UC8 DisplayUniqueValues
UC9 CreateVisualisation
UC10 ChangeVisualisation
UC11 DisplayDuplicateRows
UC12 SortValuesInColumns
UC13 QueryDataWithKeyword
UC14 DefineConstraintsOnRange
UC15 EnterCodeInNotebook
UC16 ClusterData


USE CASE LoadFile
ID UC1
BRIEF DESCRIPTION A user attempts to load a file on the system
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS None
MAIN FLOW 1. The use case is initiated when the user opens the web app in a browser
2. The system displays a file chooser
3. The user selects a file to upload from a local directory
4. The user selects the upload option
POSTCONDITIONS The file is uploaded into the workspace
ALTERNATIVE FLOWS None


USE CASE DefineVariableTypes
ID UC2
BRIEF DESCRIPTION The user defines a name and a type for each variable
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS The user has selected a file to be uploaded
MAIN FLOW 1. The use case is initiated when the user selects a file to be uploaded
2. The system displays a list of all the variable names based on the first value of every column
3. The user selects the name(s) they wish to edit, enters a new name for each, then selects the next button
4. The system displays a list of the variable names preceded by the type
5. The user selects the appropriate type for each variable
POSTCONDITIONS The type and names of the variables have been set
ALTERNATIVE FLOWS None
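
The flow above, together with requirements Lm2, Ls1 and Ls2, can be sketched as follows. This is only an illustration under stated assumptions: the type labels "int", "float", "date" and "str", the function name load_typed_csv, and the choice to store None for values that fail to parse are all hypothetical, and renaming of variables is left out.

```python
import csv
import io
from datetime import datetime


def load_typed_csv(text, types, date_format="%Y-%m-%d"):
    """Parse CSV text; names come from the first row, values are cast per `types`.

    `types` maps a column name to "int", "float", "date" or "str".
    Values that fail to parse become None so they can be flagged for cleaning.
    """
    casters = {
        "int": int,
        "float": float,
        "str": str,
        # Dates are parsed with the user-provided format string (Ls2).
        "date": lambda s: datetime.strptime(s, date_format).date(),
    }
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, raw in record.items():
            try:
                row[name] = casters[types.get(name, "str")](raw)
            except (ValueError, TypeError):
                row[name] = None
        rows.append(row)
    return rows
```

Columns not listed in `types` default to strings here; a real loader might instead guess the type (Lw7).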


USE CASE DeleteSelectedRows
ID UC3
BRIEF DESCRIPTION The user selects one or more rows to be deleted
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS A file has been uploaded to the system
MAIN FLOW 1. The user selects one or more rows within the spreadsheet
2. The system highlights all the selected rows
3. The user selects the delete rows option in the sidebar
POSTCONDITIONS The selected rows are deleted
ALTERNATIVE FLOWS None


USE CASE ChangeValueInCell
ID UC4
BRIEF DESCRIPTION The user updates the value of cells within the spreadsheet
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS A file has been uploaded to the system
MAIN FLOW 1. The user selects the cell to be updated
2. The user types in the new value of the cell
3. The system checks if the value entered conforms with the variable type
4. The system displays the updated value in the cell
POSTCONDITIONS The old value is replaced with the new value
ALTERNATIVE FLOWS ValueEnteredDoesNotMatchType
ID UC4.1
BRIEF DESCRIPTION The value entered does not conform with the type of the variable
PRIMARY ACTORS System (App)
SECONDARY ACTORS None
PRECONDITIONS The user enters a value that is of an incorrect type
MAIN FLOW 1. The alternative flow starts after step 3 of the main flow
2. The system replaces the value of the updated cell with N/A
POSTCONDITIONS The value is replaced with N/A
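
A minimal sketch of the type check in UC4 and its alternative flow UC4.1, assuming rows are held as a list of dictionaries and that each variable type is represented by a conversion function such as int or float. The names NA and set_cell are hypothetical.

```python
NA = "N/A"  # placeholder stored when the entered value has the wrong type


def set_cell(rows, row_index, column, raw_value, caster):
    """Store `raw_value` if `caster` accepts it, otherwise store N/A (UC4.1)."""
    try:
        rows[row_index][column] = caster(raw_value)
    except ValueError:
        rows[row_index][column] = NA
    return rows[row_index][column]
```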


USE CASE DisplayRowsWithMissingValues
ID UC5
BRIEF DESCRIPTION The system displays all the rows in a particular column that have missing values
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS A file has been uploaded to the system
MAIN FLOW 1. The user selects a column in the spreadsheet
2. The user selects the missing value option in the sidebar
3. The system updates the spreadsheet to display only those rows consisting of missing values within the specified column
POSTCONDITIONS The spreadsheet is updated
ALTERNATIVE FLOWS None


USE CASE FillMissingValuesWithAverage
ID UC6
BRIEF DESCRIPTION The user replaces all the missing values within a selected column using an average
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS A file has been uploaded to the system
MAIN FLOW 1. The user selects a column within the spreadsheet
2. The user selects the replace missing values option in the sidebar
3. The system displays a list consisting of the different averages depending on the type of column
4. The user selects the appropriate average, then selects the run option
5. The system replaces all the missing values within the column with the chosen average
POSTCONDITIONS The spreadsheet is updated
ALTERNATIVE FLOWS None
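
Step 3 says the averages offered depend on the type of the column. One way to read that, sketched below as an assumption rather than the actual behaviour, is to offer mean, median and mode for numeric columns and only the mode otherwise. The names average_options and fill_missing are hypothetical, and missing values are represented as None.

```python
import statistics
from collections import Counter


def average_options(values):
    """Return the averages that make sense for the column, keyed by name."""
    present = [v for v in values if v is not None]
    if not present:
        return {}
    # The mode is defined for any type; ties go to the value seen first.
    options = {"mode": Counter(present).most_common(1)[0][0]}
    if all(isinstance(v, (int, float)) for v in present):
        options["mean"] = statistics.mean(present)
        options["median"] = statistics.median(present)
    return options


def fill_missing(values, average):
    """Replace every None in `values` with the chosen average."""
    options = average_options(values)
    if average not in options:
        raise ValueError(f"{average!r} is not available for this column")
    return [options[average] if v is None else v for v in values]
```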


USE CASE ViewAnalysisOfColumn
ID UC7
BRIEF DESCRIPTION The system displays an analysis of each variable within the spreadsheet
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS A file has been uploaded to the system
MAIN FLOW 1. The user selects the analyse tab at the top
2. The system carries out an analysis of each variable depending on its type
3. The system displays the analysis of each variable in a table
POSTCONDITIONS None
ALTERNATIVE FLOWS None
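
The per-type analysis in step 2 could cover the statistics listed in requirements Am2 to Am4 and As1. The sketch below uses only the Python standard library; the function names and the dictionary keys are hypothetical, and the standard deviation shown is the sample form, which is an assumption since the requirements do not say which form is intended.

```python
import statistics
from collections import Counter


def summarise_numeric(values):
    """Summary statistics for a numeric column, ignoring missing values (None)."""
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": Counter(present).most_common(1)[0][0],
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        # Sample standard deviation; needs at least two values.
        "std_dev": statistics.stdev(present),
    }


def summarise_text(values):
    """Most frequent word across a string column, case-insensitive."""
    words = Counter(
        word for v in values if v is not None for word in v.lower().split()
    )
    word, count = words.most_common(1)[0]
    return {"most_frequent_word": word, "count": count}
```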


USE CASE DisplayUniqueValues
ID UC8
BRIEF DESCRIPTION The system displays a list of the frequency of values within a variable
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS A file has been uploaded to the system
MAIN FLOW 1. The use case is initiated when the user is in the analyse tab
2. The system will display the analysis results of each variable in a table
3. The user toggles the display unique value feature in the appropriate table
4. The system displays the frequencies of all values within the selected variable
POSTCONDITIONS None
ALTERNATIVE FLOWS None
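
The frequency list in step 4 (requirement Am1) amounts to counting the occurrences of each value. A short sketch, assuming rows are dictionaries keyed by variable name; value_frequencies is a hypothetical name.

```python
from collections import Counter


def value_frequencies(rows, column):
    """Return (value, count) pairs for one column, most frequent first."""
    return Counter(row[column] for row in rows).most_common()
```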


USE CASE CreateVisualisation
ID UC9
BRIEF DESCRIPTION The user creates a visualisation by selecting a chart type followed by the variables
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS A file has been uploaded to the system
MAIN FLOW 1. The use case is initiated when the user selects the visualise option at the top
2. The system displays a list of chart types and variable names
3. The user selects the chart type
4. The system limits the number of variables the user can select, depending on the chart type
5. The user selects the variable names
6. The system displays the graph
POSTCONDITIONS The system displays the graph that is generated
ALTERNATIVE FLOWS None


USE CASE ChangeVisualisation
ID UC10
BRIEF DESCRIPTION The user changes the type of chart used to display results
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS A chart is already generated
MAIN FLOW 1. The use case is initiated when the user has already generated a chart
2. The system displays a list of chart types
3. The user selects the chart type they wish to change to
4. The system displays the new chart
POSTCONDITIONS The new chart is displayed
ALTERNATIVE FLOWS UnableToGenerateGraph
ID UC10.1
BRIEF DESCRIPTION The chosen chart supports a different number of variables to that previously selected
PRIMARY ACTORS System
SECONDARY ACTORS User
PRECONDITIONS The user selects a chart that supports a different number of variables
MAIN FLOW 1. The alternate flow starts after step 3 of the main flow
2. The system prompts the user to select an appropriate number of variables depending on the newly selected chart
3. The user selects the variables
4. The newly generated graph is displayed
POSTCONDITIONS The new graph is displayed


USE CASE DisplayDuplicateRows
ID UC11
BRIEF DESCRIPTION The system displays all rows within a column that have duplicate values
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS 1. A file has been uploaded to the system
2. The user is in the clean tab
MAIN FLOW 1. The user selects a column in the spreadsheet
2. The user selects the duplicate values option in the sidebar
3. The system groups together rows that have the same value in a particular column and displays this in the spreadsheet
POSTCONDITIONS Rows with duplicate values are grouped together
ALTERNATIVE FLOWS None
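
Step 3 groups rows by their value in the selected column. A minimal sketch of that grouping, assuming rows are dictionaries and that only values occurring more than once are of interest; duplicate_groups is a hypothetical name.

```python
from collections import defaultdict


def duplicate_groups(rows, column):
    """Group rows sharing a value in `column`; keep values that occur more than once."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return {value: members for value, members in groups.items() if len(members) > 1}
```

Note that this follows the use case and compares a single column, whereas requirement Cs2 speaks of duplicate rows; comparing whole rows would need a key built from every column.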


USE CASE SortValuesInColumns
ID UC12
BRIEF DESCRIPTION The user sorts values in a particular column
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS 1. A file has been uploaded to the system
2. The user is in the clean tab
MAIN FLOW 1. The user selects a column
2. The user selects the sort option in the sidebar
3. The system displays a list of sorting techniques
4. The user selects the sorting technique
5. The system sorts the values in the variable according to the technique selected
POSTCONDITIONS The column is sorted
ALTERNATIVE FLOWS None


USE CASE QueryDataWithKeyword
ID UC13
BRIEF DESCRIPTION The user is able to query data in a particular column using a keyword
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS A file has been uploaded to the system
MAIN FLOW 1. The user selects a column within the spreadsheet
2. The user selects the query data option in the sidebar
3. The system displays a textbox in which the user can enter a keyword
4. The user enters the keyword
5. The system updates the spreadsheet to display only those rows which contain the keyword in the chosen variable field
POSTCONDITIONS The spreadsheet is updated to display rows that match
ALTERNATIVE FLOWS None
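
The filtering in the final step could look like the sketch below. Case-insensitive substring matching is an assumption, since the use case does not say how the keyword is compared; rows_matching_keyword is a hypothetical name.

```python
def rows_matching_keyword(rows, column, keyword):
    """Return rows whose value in `column` contains `keyword`, ignoring case."""
    needle = keyword.lower()
    return [row for row in rows if needle in str(row[column]).lower()]
```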


USE CASE DefineConstraintsOnRange
ID UC14
BRIEF DESCRIPTION The user is able to define constraints on the range of a variable
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS A file has been uploaded to the system
MAIN FLOW 1. The user selects a column in the spreadsheet
2. The user selects the define range option
3. The system displays two textboxes
4. The user enters the upper and lower bound of the range in the textboxes
5. The system updates the spreadsheet to display only those rows where the value of the variable is within the defined range
POSTCONDITIONS None
ALTERNATIVE FLOWS InvalidRange
ID UC14.1
BRIEF DESCRIPTION The range entered does not conform with the type of the variable
PRIMARY ACTORS System
SECONDARY ACTORS User
PRECONDITIONS The range entered does not conform with the type
MAIN FLOW 1. The alternate flow starts after step 4 of the main flow
2. The system displays a message that the values entered are of a different type
3. The system provides the user with an option to enter the range again or close the feature
POSTCONDITIONS None
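
UC14 and UC14.1 can be sketched as a validation step followed by a filter. The assumptions here are that each variable type is represented by a conversion function such as int or float, that the range is inclusive at both ends, and that a lower bound above the upper bound is also rejected; none of these is stated in the use case, and both function names are hypothetical.

```python
def parse_bounds(lower_text, upper_text, caster):
    """Convert the two textbox entries; raise ValueError if either has the wrong type."""
    lower, upper = caster(lower_text), caster(upper_text)
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return lower, upper


def rows_within_range(rows, column, lower, upper):
    """Keep rows whose value in `column` lies in the closed interval [lower, upper]."""
    return [
        row for row in rows
        if row[column] is not None and lower <= row[column] <= upper
    ]
```

A caller would catch the ValueError to show the message and retry option described in UC14.1.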


USE CASE EnterCodeInNotebook
ID UC15
BRIEF DESCRIPTION The user enters code snippets in a notebook
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS None
MAIN FLOW 1. The user selects the notebook option
2. The system displays a notebook interface
3. The user enters snippets of code in the notebook
4. The user selects the run option
5. The system updates the spreadsheet
POSTCONDITIONS The spreadsheet is updated
ALTERNATIVE FLOWS None


USE CASE ClusterData
ID UC16
BRIEF DESCRIPTION The user clusters data in a particular column
PRIMARY ACTORS User
SECONDARY ACTORS None
PRECONDITIONS A file has been uploaded to the system
MAIN FLOW 1. The user selects a column in the spreadsheet
2. The user clicks on the cluster option in the sidebar
3. The system groups all similar values within the column and displays this in the spreadsheet
POSTCONDITIONS The spreadsheet is updated
ALTERNATIVE FLOWS None
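
The use case does not define what counts as "similar values". One simple interpretation, in line with requirement Cc1, is a key-collision approach: values that reduce to the same normalised key are treated as spellings of the same entity. The sketch below is an assumption about how this might work, and the names fingerprint and cluster_values are hypothetical.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Normalise a string: lowercase, strip punctuation, sort and de-duplicate words."""
    words = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(words)))


def cluster_values(values):
    """Group values sharing a fingerprint; return clusters with more than one spelling."""
    clusters = defaultdict(set)
    for value in values:
        clusters[fingerprint(value)].add(value)
    return [sorted(group) for group in clusters.values() if len(group) > 1]
```

This only catches differences in case, punctuation and word order; misspellings would need a distance-based comparison instead.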

