Testing and Evaluation

To ensure that our system meets quality standards, we placed a heavy emphasis on software testing throughout development. This page details the testing strategies used in this project, organised into the sections listed below.

1. Unit Testing
2. Functional Testing
3. Compatibility Testing
4. Requirements Evaluation
5. Performance Testing
6. Acceptance Testing

Unit Testing

To ensure the integrity, completeness and correctness of our software, we developed individual test suites consisting of unit tests for both our front end and back end. After developing each feature, we carried out regression testing by automatically running the test suites to ensure that no new bugs had been introduced.

Back-end - Python Modules

Since our back end consisted of Python modules, we used pytest, a full-featured testing tool, to create our unit tests. While developing the back-end test suite, we tested each Python function individually with normal, extreme and abnormal data to ensure that our software was robust. In total, the back-end suite contains 100 unit tests.
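As an illustration of testing one function with normal, extreme and abnormal data, the sketch below shows pytest-style tests for a hypothetical `impute_mean` helper. The function name, its signature and the test names are assumptions made for this example and are not taken from the project's code base. The tests use plain `assert` statements so that pytest can collect them as written; with pytest imported, `pytest.raises(ValueError)` would be the more idiomatic way to write the abnormal case.

```python
from typing import List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Fill missing (None) entries with the mean of the present values.

    Hypothetical helper used only to illustrate the three test data types.
    """
    present = [v for v in values if v is not None]
    if not present:
        # Abnormal input: there is nothing to compute a mean from.
        raise ValueError("cannot impute a column with no valid values")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def test_normal_data():
    # Normal: a column with one missing value in the middle.
    assert impute_mean([1.0, None, 3.0]) == [1.0, 2.0, 3.0]


def test_extreme_data_no_missing_values():
    # Extreme: valid input at the boundary, nothing to impute.
    assert impute_mean([5.0, 5.0]) == [5.0, 5.0]


def test_abnormal_data_all_missing():
    # Abnormal: every value is missing, so the call must be rejected.
    try:
        impute_mean([None, None])
    except ValueError:
        return
    raise AssertionError("expected ValueError for an all-missing column")
```

Saved under a file name such as `test_impute.py` (hypothetical), the tests can be run with `pytest test_impute.py`.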

Front-end - Angular.js

To construct unit tests for the front end, we used Jasmine, a behaviour-driven development framework, together with Karma, a framework-agnostic test runner. We used Karma to run the Jasmine tests on different browsers. While constructing the unit tests, we tried to cover most of the important features of our UI. Overall, the front-end suite contains 75 tests.

The locations of these unit tests can be found in the Project File Structure page of our technical documentation.




Functional Testing

Functional testing is a quality assurance process for a software system. It refers to activities that verify a specific action or function of the code. Functional tests tend to answer questions like “can the user do this?” or “does this particular feature work?”. To carry out our functional testing, we used our requirements specification to construct a series of test cases, which we then used for manual black-box testing.

Manual Testing

The test cases we used were built around the specification, system requirements and design parameters. For each test case we selected both valid and invalid inputs to determine whether the system produces the correct output. Each test case was carried out manually; the type of test data and the result of each test are recorded in the table below:

Test Scenario | Test Data | Test Data Type | Expected Results | Actual Results | Pass/Fail
Upload Screen
Upload a CSV file | Valid CSV file | Normal | Parse file successfully and update spreadsheet with data | As expected | Pass
Upload a JSON file | Valid JSON file | Normal | Parse file successfully and update spreadsheet with data | As expected | Pass
Upload an XLS(X) file | Valid XLS(X) file | Normal | Parse file successfully and update spreadsheet with data | As expected | Pass
Upload a file with inconsistencies | Invalid CSV file | Abnormal | "Unable to parse file" error appears | As expected | Pass
Select a sample size when uploading a file | Sample size = 50 | Normal | Upload a percentage of the dataset based on the sample size (sampling should be done randomly) | As expected | Pass
Select a sample size when uploading a file | Sample size = -10 | Abnormal | Input rejected | As expected | Pass
Specify a seed for sampling the data | Seed = 4 | Normal | Use the seed for sampling the data | As expected | Pass
Specify a seed for sampling the data | Seed = -50 | Abnormal | Input rejected | Input accepted, file parsing failed | Fail
Specify a number of initial rows to skip when uploading a file | Skip = 10 | Normal | Skip the first 10 rows of the file before uploading | As expected | Pass
Specify a number of initial rows to skip when uploading a file | Skip = -10 | Abnormal | Input rejected | As expected | Pass
Unmark the option for 'file includes column headers' | Uncheck the checkbox | Normal | Upload the dataset without column headers | As expected | Pass
Clean Tab
Create a duplicate of a column | Select a column | Normal | Create a duplicate of the column specified and add it to the dataset | As expected | Pass
Split a column using a delimiter | Delimiter = '.' | Normal | Split the data contained within the column into multiple columns if the delimiter is present | As expected | Pass
Split a column that contains invalid values | Delimiter = '.' | Extreme | Split the data in the column if the delimiter is present | Operation fails | Fail
Combine multiple columns into a new column | Separator = '@', New column name = email | Normal | Combine the selected columns using '@' as a separator into a new column named 'email' | As expected | Pass
Combine multiple columns into a new column | Separator = ' ' (space) | Extreme | Combine the selected columns using ' ' as a separator | The separator is not used | Fail
Use combine feature on a single column | Select only a single column to combine | Abnormal | The user cannot run the operation | As expected | Pass
Impute missing data with the column's mean | Select a column with missing values | Normal | Missing values are filled with the column mean | As expected | Pass
Impute missing data with the column's mean | Select a column with no missing values | Extreme | Column stays the same | As expected | Pass
Impute missing data with the column's mode | Select a column with missing values | Normal | Missing values are filled with the column mode | As expected | Pass
Impute missing data with the column's mode | Select a column with missing values but with no mode | Extreme | Missing values are filled with any value contained within the column | As expected | Pass
Impute missing data with the column's median | Select a column with missing values | Normal | Missing values are filled with the column median | As expected | Pass
Impute missing data with the last valid value | Select a column with missing values | Normal | Missing values are filled with the last valid value | As expected | Pass
Impute missing data with the next valid value | Select a column with missing values | Normal | Missing values are filled with the next valid value | As expected | Pass
Impute missing data using linear interpolation | Select a column with missing values | Normal | Missing values are filled using linear interpolation | As expected | Pass
Impute missing data using spline interpolation | Spline order = 10 | Normal | Missing values are filled using spline interpolation | As expected | Pass
Impute missing data using spline interpolation | Spline order = -5 | Abnormal | Input rejected | As expected | Pass
Impute missing data using polynomial interpolation | Polynomial order = 3 | Normal | Missing values are filled using polynomial interpolation | As expected | Pass
Impute missing data using PCHIP interpolation | Select a column with missing values | Normal | Missing values are filled using PCHIP interpolation | As expected | Pass
Impute missing data in a specific column using a custom value | Custom value = "%£$^&^%* | Extreme | Missing values are filled using the custom value | As expected | Pass
Delete rows containing missing values in a single column | Select a column with missing values | Normal | Rows containing missing values should be deleted | As expected | Pass
Delete rows containing missing values in a single column | Select a column with no missing values | Extreme | Dataset should remain unchanged | As expected | Pass
Discretize the data in a column using bins | Number of bins = 5 | Normal | Column data replaced with discretized data | As expected | Pass
Discretize the data in a column using bins | Select a column that contains invalid values | Extreme | Column data replaced with discretized data | Operation fails | Fail
Specify custom ranges for discretizing a column | Ranges = 1,20,30,40 | Normal | Discretize the data using the ranges provided | As expected | Pass
Quantile the data in a column using bins | Number of bins = 5 | Normal | Column data replaced with quantized data | As expected | Pass
Quantile the data in a column using bins | Number of bins = -20 | Abnormal | Input rejected | As expected | Pass
Apply categorical feature encoding to a column | Select a column | Normal | Convert each record of the column to a combination of high and low bits | As expected | Pass
Apply categorical feature encoding to a column | Select a column with missing/invalid values | Extreme | Convert each record of the column to a combination of high and low bits | As expected | Pass
Apply min-max feature scaling to a numeric column | Range = 10 to 20 | Normal | Scale the data in the column according to the range specified | As expected | Pass
Apply min-max feature scaling to a numeric column | Range = -3543 to 294342 | Extreme | Scale the data in the column according to the range specified | As expected | Pass
Standardise the data in a column | Select a column | Normal | Standardise the data within the column | As expected | Pass
Find and replace all values within a column that match the input string | Match = 'hello', Replace = 'world' | Normal | Replace all occurrences of the string 'hello' with the string 'world' | As expected | Pass
Apply find and replace feature to a column of type date | Select a column of type date | Normal | Replace all values of the specified string with the replacement string | Operation fails | Fail
Use find and replace feature using a regular expression | Regex = [0-9]*, Replace = 10 | Normal | Replace all values that match the regex with the string '10' | As expected | Pass
Use find and replace feature using a regular expression | Regex = [0-9]*, Replace = 48"£$%^&*f | Extreme | Replace all values that match the regex with the string provided | As expected | Pass
Filter the dataset by duplicates in a selected column | Select a column with duplicates | Normal | Filter the data to display only duplicates in the selected column | As expected | Pass
Filter the dataset by duplicates in a selected column | Select a column with no duplicates | Extreme | Filtered dataset should contain nothing | As expected | Pass
Filter the dataset by invalid values in a selected column | Select a column with invalid values | Normal | Filter the data to display only rows containing invalid values in the selected column | As expected | Pass
Filter the dataset by invalid values in a selected column | Select a column with no invalid values | Extreme | Filtered dataset should contain nothing | As expected | Pass
Filter the dataset using outliers in a selected column | Standard deviation = 2, Trim percentage = 10 | Normal | Filter the data to display only rows containing outliers in the selected column | As expected | Pass
Filter the dataset using outliers in a selected column | Standard deviation = -24, Trim percentage = -11 | Abnormal | Input rejected | As expected | Pass
Change the name of a column | New name = 'col1' | Normal | Change the column name to 'col1' | As expected | Pass
Change the name of a column | New name = '"£$%^&*(' | Extreme | Change the column name | As expected | Pass
Change the name of a column | Change the name to a name that is already present | Extreme | Reject the input | Operation fails | Fail
Change the type of a column from float to int | Select a column with no invalid values | Normal | Data type changes to int | As expected | Pass
Change the type of a column from float to int | Select a column with invalid values | Extreme | Data type changes to int | Operation fails | Fail
Change the type of a column from int to float | Select a column | Normal | Data type changes to float | As expected | Pass
Change the type of a column from int/float to string | Select a column | Normal | Data type changes to string | As expected | Pass
Change the type of a column from int/float to string | Select a column with missing values | Extreme | Data type changes to string | As expected | Pass
Change the type of a column from string to datetime | Select a column of type string with date values | Normal | Data type changes to datetime | As expected | Pass
Change the type of a column from string to datetime | Select a column of type date | Normal | Data type changes to string | Operation fails | Fail
Delete a selected column | Select a column | Normal | The selected column should be deleted | As expected | Pass
Sort the dataset by a single column | Select a column | Normal | Sort the dataset with respect to the specified column | As expected | Pass
Sort the dataset by a single column | Select a column where every value is the same | Extreme | Dataset remains unchanged | As expected | Pass
Sort the dataset by multiple columns | Select multiple columns for sorting | Normal | Sort the dataset with respect to the specified columns | As expected | Pass
Search the rows in a column using a search term | Search term = 'hello' | Normal | Display only those rows within the column that contain the word hello (complete match) | The filtered results are not a complete match | Fail
Search the rows in a column using a regular expression | Select a column that is not a numeric type, regex = [0-9] | Normal | Display only those rows that contain the matched regex | As expected | Pass
Search the rows in a column using a search term | Select a search term that is not contained within the column | Extreme | Filtered dataset should contain no rows | As expected | Pass
Search for a term in all the columns | Search term = 'hello' | Normal | Display only those rows which contain the search term in any column | As expected | Pass
Search for a term in all the columns using a regular expression | Regex = [0-9]* | Normal | Display only those rows which contain the regex term in any column | As expected | Pass
Filter the column then search through the column | Filter = Duplicates, Search = 'hello' | Normal | Display only those rows which contain the search term in the filtered column | Operation fails | Fail
Edit the value in a particular cell | New cell value = 'hello world' | Normal | Change the value of the cell to 'hello world' | As expected | Pass
Edit the value in a particular cell | New cell value = '£$^&&@(-3' | Extreme | Change the value of the cell to '£$^&&@(-3' | As expected | Pass
Analysis Tab
Check if all generic analyses are displayed for every column | Full dataset | Normal | Display all generic analyses for every column | As expected | Pass
Check if all numeric analyses are displayed for numeric columns | Numeric columns | Normal | Display numeric analyses for all the numeric columns | As expected | Pass
Check if all string analyses are displayed for string and date columns | String and date columns | Normal | Display string analyses for all the string and date columns | As expected | Pass
Check if a word frequency table is available for string types in the analysis tab | String and date columns | Normal | Display the word frequency table for all the string and date columns | As expected | Pass
Visualise Tab
Check if a line chart works correctly | Select two columns | Normal | Line chart is displayed | As expected | Pass
Check if a scatter chart works correctly | Select two columns | Normal | Scatter chart is displayed | As expected | Pass
Check if a histogram works correctly | Select two columns | Normal | Histogram is displayed | As expected | Pass
Change the bin size of the histogram | Bin size = 10 | Normal | Display histogram with a bin size of 10 | As expected | Pass
Change the bin size of the histogram | Bin size = '$%^&* | Extreme | Input rejected | Input accepted | Fail
Check if a frequency chart works correctly | Select a single column | Normal | Frequency chart is displayed | As expected | Pass
Check if a time series chart works correctly | Select a column | Normal | Time series chart is displayed | Dates are not ordered before the chart is generated | Fail
Export/Save
Save the dataframe as a CSV file | Dataset | Normal | Dataset is downloaded as a CSV file | As expected | Pass
Save the dataframe as a JSON file | Dataset | Normal | Dataset is downloaded as a JSON file | As expected | Pass

N.B. Most of the failures have since been fixed.
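The failed seed test above shows why abnormal inputs are best rejected before any parsing starts. The following is a minimal sketch of such a check, assuming hypothetical names (`UploadOptions`, `validate_upload_options`) and assuming that the sampler only accepts seeds between 0 and 2**32 - 1, as NumPy's legacy `RandomState` does. It is not the validation code used in the system.

```python
from dataclasses import dataclass
from typing import List

# Assumed upper bound for a seed, matching NumPy's legacy RandomState.
MAX_SEED = 2**32 - 1


@dataclass
class UploadOptions:
    """Hypothetical container for the options on the upload screen."""
    sample_size: int = 100  # assumed to be a percentage of rows to load
    seed: int = 0           # seed used for random sampling
    skip_rows: int = 0      # number of initial rows to skip


def validate_upload_options(options: UploadOptions) -> List[str]:
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    if not 0 < options.sample_size <= 100:
        errors.append("sample size must be a percentage between 1 and 100")
    if not 0 <= options.seed <= MAX_SEED:
        errors.append("seed must be between 0 and 2**32 - 1")
    if options.skip_rows < 0:
        errors.append("number of rows to skip must not be negative")
    return errors


if __name__ == "__main__":
    # Normal data from the table: sample size 50, seed 4, skip 10 rows.
    print(validate_upload_options(UploadOptions(50, 4, 10)))
    # Abnormal data from the table: every field is negative.
    print(validate_upload_options(UploadOptions(-10, -50, -10)))
```

Returning a list of messages rather than raising on the first problem lets the upload form report every rejected field at once.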




Compatibility Testing

With all the different operating systems and web browsers available, it is very important to test the compatibility of an application. Below are some of the tests we carried out:

Browser compatibility

Ensuring that a web application works on the most widely used browsers is extremely important, because applications can behave differently depending on the browser they run on. Different browsers also have different configurations and settings that a web page should be compatible with. We therefore tested our web application on several versions of Internet Explorer, Firefox, Chrome, Safari and Edge.

OS compatibility

In some cases, a feature of a web application may not be compatible with all operating systems, because technologies such as graphics and interface calls may not be available on every operating system. We therefore tested our application on Windows, Mac and Linux systems.




Requirements Evaluation

Throughout the development of the system, we evaluated its progress and success against our requirements specification, and made sure that every Must and Should requirement was either implemented or withdrawn. The status of each requirement at the end of the project is listed below:

ID | Requirement | Type | Category | Priority | Status
UI/UX
Um1 | The DCS shall display the user imported dataset as a spreadsheet/table. | Functional | UI/UX | Must | Implemented
Uc1 | The DCS shall display all unresolved cleanliness issues | Functional | UI/UX | Could | Not Implemented
Uc2 | The DCS shall offer the user a choice between a pure GUI interface and a notebook style interface. | Functional | UI/UX | Could | Withdrawn
Uc3 | The DCS shall support persistence of user sessions. | Functional | UI/UX | Could | Implemented
Uc4 | The DCS shall support partial loading of rows in datasets. | Functional | UI/UX | Could | Implemented
Uw1 | The DCS shall allow multiple users to collaborate showing changes in real time | Functional | UI/UX | Would | Partially Implemented
Uw2 | The DCS shall compute a "messiness" score. | Functional | UI/UX | Would | Not Implemented
Data Loading
Lm1 | The DCS shall support loading of user-uploaded CSV files. | Functional | Data Loading | Must | Implemented
Lm2 | The DCS shall allow users to specify variable names and types. | Functional | Data Loading | Must | Implemented
Ls1 | The DCS shall support a Date variable type. | Functional | Data Loading | Should | Implemented
Ls2 | The DCS shall parse dates with a user-provided format string. | Functional | Data Loading | Should | Implemented
Lc1 | The DCS shall support loading of user-uploaded structured file formats (JSON, Excel). | Functional | Data Loading | Could | Implemented
Lc2 | The DCS shall support partial loading of columns in datasets. | Functional | Data Loading | Could | Not Implemented
Lw1 | The DCS shall support loading user-uploaded unstructured data text files. | Functional | Data Loading | Would | Not Implemented
Lw2 | The DCS shall support parsing of unstructured file format. | Functional | Data Loading | Would | Not Implemented
Lw3 | The DCS shall support loading of data files over network. | Functional | Data Loading | Would | Not Implemented
Lw4 | The DCS shall load asynchronously. | Functional | Data Loading | Would | Not Implemented
Lw5 | The DCS shall support loading of data streams. | Functional | Data Loading | Would | Not Implemented
Lw6 | The DCS shall support an Email variable type. | Functional | Data Loading | Would | Not Implemented
Lw7 | The DCS shall intelligently guess variable types. | Functional | Data Loading | Would | Partially Implemented
Data Viewing
Xs1 | The DCS shall support sorting of rows by user-specified column | Functional | Data Viewing | Should | Implemented
Xs2 | The DCS shall support searching datasets with a keyword | Functional | Data Viewing | Should | Implemented
Xc1 | The DCS shall support querying datasets with SQL | Functional | Data Viewing | Could | Not Implemented
Data Cleaning
Cm1 | The DCS shall support removing rows as a universal cleaning operation | Functional | Data Cleaning | Must | Implemented
Cm2 | The DCS shall support inserting user-specified values as a universal cleaning operation | Functional | Data Cleaning | Must | Implemented
Cm3 | The DCS shall show rows with invalid numbers in specified column | Functional | Data Cleaning | Must | Implemented
Cm4 | The DCS shall show rows with missing values in specified column | Functional | Data Cleaning | Must | Implemented
Cm5 | The DCS shall support cleaning of missing values by inserting an average | Functional | Data Cleaning | Must | Implemented
Cm6 | The DCS shall support cleaning of missing values by filling with the most recent value | Functional | Data Cleaning | Must | Implemented
Cm7 | The DCS shall support cleaning of missing values by interpolation | Functional | Data Cleaning | Must | Implemented
Cs1 | The DCS shall show rows where Date parsing failed | Functional | Data Cleaning | Should | Withdrawn
Cs2 | The DCS shall show duplicate rows | Functional | Data Cleaning | Should | Implemented
Cs3 | The DCS shall provide the option to ignore outliers | Functional | Data Cleaning | Should | Withdrawn
Cs4 | The DCS shall provide the option to filter rows using regular expression | Functional | Data Cleaning | Should | Implemented
Cs5 | The DCS shall support data normalisation and standardisation | Functional | Data Cleaning | Should | Implemented
Cc1 | The DCS shall provide the option to group multiple text representations of the same entity and replace them with a single value | Functional | Data Cleaning | Could | Not Implemented
Cc2 | The DCS shall fix escaped HTML strings | Functional | Data Cleaning | Could | Not Implemented
Cc3 | The DCS shall show values that are not found in English dictionary | Functional | Data Cleaning | Could | Not Implemented
Cw1 | The DCS shall show rows where Email parsing failed | Functional | Data Cleaning | Would | Not Implemented
Data Analysis
Am1 | The DCS shall show the unique values and their count of every column of the dataset | Functional | Data Analysis | Must | Implemented
Am2 | The DCS shall show the mean, median, mode of columns with numerical data | Functional | Data Analysis | Must | Implemented
Am3 | The DCS shall show the max and min values of columns with numerical data | Functional | Data Analysis | Must | Implemented
Am4 | The DCS shall show the range and standard deviation of columns with numerical data | Functional | Data Analysis | Must | Implemented
As1 | The DCS shall show text analysis such as most frequent word for string type data | Functional | Data Analysis | Should | Implemented
Data Visualisation
Vm1 | The DCS shall be able to visualise data using histograms | Functional | Data Visualisation | Must | Implemented
Vm2 | The DCS shall be able to visualise data using line charts | Functional | Data Visualisation | Must | Implemented
Vs1 | The DCS shall be able to visualise data using scatter plots | Functional | Data Visualisation | Should | Implemented
Vs2 | The DCS shall be able to visualise data using time-series plots | Functional | Data Visualisation | Should | Implemented
Vc1 | The DCS shall provide the option to export graphs to image | Functional | Data Visualisation | Could | Implemented
Vc2 | The DCS shall be able to visualise data using pie charts | Functional | Data Visualisation | Could | Not Implemented
Vw1 | The DCS shall be able to visualise data using regression matrices | Functional | Data Visualisation | Would | Not Implemented
Vw2 | The DCS shall be able to visualise data using bar charts | Functional | Data Visualisation | Would | Partially Implemented
Others
Nm1 | The DCS shall use a browser as its user interface | Non-Functional | Compliance to Standards | Must | Implemented
Nm2 | The DCS shall support the latest versions of Safari, Internet Explorer, Chrome, Firefox | Non-Functional | Performance | Must | Implemented
Nm3 | The DCS shall be easily installable by an untrained user with the help of documentation | Non-Functional | Deployment | Must | Implemented
Nc1 | The DCS shall ensure that error messages give the users specific instructions for recovery | Non-Functional | Ease of Use | Could | Implemented
Nc2 | The DCS shall ensure that a user's persistence data has an availability of 100% | Non-Functional | Availability | Could | Not Tested
Nw1 | The DCS shall support 100 concurrent sessions | Non-Functional | Capacity | Would | Not Tested
Nw2 | The DCS shall be easily scalable to accommodate more concurrent users | Non-Functional | Capacity | Would | Not Implemented




Performance Testing

To make sure the web application performs well with reasonably sized datasets, we tested the final system with a series of artificially generated datasets, ranging from less than 1 MB to over 100 MB in size. We tested five main operations:

  • Upload the dataset
  • Impute missing values in a column (with 95.8% of its values missing) using the column mode
  • Perform a find and replace operation using regex on a column
  • Generate a scatter plot between 2 numerical columns
  • Split a column into 3 columns using a delimiter

We measured the time the system takes to perform each of these operations using a stopwatch.
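Stopwatch timing is adequate for operations that take seconds, but the same measurement can also be scripted. The following is a minimal sketch, not the procedure we used: it generates an artificial column with roughly 95.8% of its values missing and times a mode imputation with `time.perf_counter`. The helper names (`make_column`, `impute_mode`, `time_operation`) and the row counts are assumptions for illustration, and the sketch measures a pure-Python stand-in rather than the system's own back-end code.

```python
import random
import statistics
import time
from typing import Any, Callable, List, Optional, Tuple


def make_column(n_rows: int, missing_fraction: float, seed: int = 0) -> List[Optional[int]]:
    """Generate an artificial column where about `missing_fraction` of values are None."""
    rng = random.Random(seed)
    return [
        None if rng.random() < missing_fraction else rng.randint(0, 9)
        for _ in range(n_rows)
    ]


def impute_mode(column: List[Optional[int]]) -> List[int]:
    """Fill missing values with the most common present value."""
    present = [v for v in column if v is not None]
    mode = statistics.mode(present)
    return [mode if v is None else v for v in column]


def time_operation(operation: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run `operation(*args)` and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = operation(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Row counts are arbitrary; larger values stand in for larger files.
    for n_rows in (1_000, 10_000, 100_000):
        column = make_column(n_rows, missing_fraction=0.958)
        _, seconds = time_operation(impute_mode, column)
        print(f"{n_rows:>7} rows: {seconds:.4f} s")
```

A scripted timer of this kind excludes upload, network and rendering time, so its figures would not be directly comparable with stopwatch measurements taken in the browser.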

Specification of the machine used for testing:

  • MacBook Pro 15-inch (Late 2013)
  • 2.0 GHz (i7-4750HQ) quad-core Intel Core i7 Haswell with 6 MB on-chip L3 and 128 MB L4 cache (Crystalwell)
  • 8 GB built-in onboard RAM
  • Intel Iris Pro 5200 Graphics with DDR3L SDRAM shared with main memory
  • Running on Google Chrome 50

Results

[Figure: Results of performance testing.]
[Figure: The effect of file size on the time taken by each operation.]

N.B. Most operations performed by the system work on columns rather than rows of a dataset, so the performance of the system depends more on the number of rows in the dataset than on the number of columns.

The results did not surprise us: most basic operations perform well enough even with a large file. Some operations are resource-intensive by nature and there is little we can do to improve their performance, but they still work well with smaller datasets. Compared with other software such as OpenRefine and Excel, our system performs as well, if not better, on some features. For example, both OpenRefine and Excel failed to load the 106.8 MB file, while our system loaded it in under 30 seconds. Overall, we are satisfied with the performance of our system.




User Acceptance Testing (UAT)

This is the last phase of the software engineering process. During UAT, the completed software is tested by potential users of the system to ensure it can handle the required tasks in real-world scenarios. To carry out UAT, we designed several test cases covering most of our key requirements. We then presented these test cases to our client, who in turn decided to pass them on to potential users of the system to carry out the UAT. Our client informed us that they would pass on any feedback they receive regarding bugs and usability. It is worth mentioning, however, that our client was extremely pleased with each individual feature offered by the system as well as with the overall user experience.

