WebSocket API¶

Most of the communication between the Willow backend and frontend occurs over a WebSocket connection using SocketIO. WebSocket was chosen over HTTP because the Willow backend needs to be able to push the results of potentially long-running data operations without forcing the frontend to continuously poll for results.

Request Message Structure¶

WebSocket messages made with SocketIO consist of a title and JSON body. Our WebSocket API defines a common structure to requests:

Title

<String>

Body

{
        'operation': <String>,
        'sessionID': <String>,
        'requestID': <String>,
        ...request-specific key-value pairs...
}

Message title must match one of the defined API requests.

Value for sessionID identifies the dataset the operation will be performed on and is a 30 character long hexadecimal string which gets returned after successfully uploading a new dataset.

Value for requestID does not affect how the server performs the operation, but should be a unique string that the message sender remembers in order to identify the corresponding response message received from the backend.

Value for operation must match the message title.

Example Request¶

Title

changeColumnDataType

Body

{
        'operation': 'changeColumnDataType',
        'sessionID': '617646cb1e421f72b7e742dbdbd4cb',
        'requestID': '0002a43e',
        'column': 'Date',
        'newDataType': 'datetime64'
}
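The structure above can be assembled programmatically. The following is a minimal sketch in Python; the helper name `build_request` and the choice of a short random hex string for requestID are assumptions made for illustration, not part of the Willow API.

```python
import json
import uuid


def build_request(operation, session_id, **params):
    """Return (title, body) for a request message.

    The title and the 'operation' value are the same string, and a fresh
    requestID is generated so the response can be matched later.
    """
    body = {
        "operation": operation,
        "sessionID": session_id,
        # Any unique string works; a short random hex string is one option.
        "requestID": uuid.uuid4().hex[:8],
    }
    body.update(params)  # request-specific key-value pairs
    return operation, body


title, body = build_request(
    "changeColumnDataType",
    "617646cb1e421f72b7e742dbdbd4cb",
    column="Date",
    newDataType="datetime64",
)
print(title)
print(json.dumps(body, indent=4))
```

With a SocketIO client library, the returned pair would typically be handed to the client's emit call as the message title and body.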

Response Message Structure¶

After a well-formed request message for an operation is received and parsed by the Willow backend, the operation gets queued. Once the backend performs and completes the operation, a SocketIO response message with the following structure is sent to the client:

Title

<String>

Body

{
        'operation': <String>,
        'sessionID': <String>,
        'requestID': <String>,
        'success': <Boolean>,
        'error': <String>,
        'errorDescription': <String>,
        ...request-specific response key-value pairs...
}

The message title and the value for operation in the body will both hold the name of the original operation. Besides operation, requestID and sessionID will also be identical to the values specified in the original request, allowing the client to identify which request the response corresponds to.

error and errorDescription keys will only be present in the response body if success is false, meaning the operation failed.
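One way a client might use requestID to pair responses with requests is sketched below. The class name `PendingRequests` and the sample error strings are hypothetical; only the key names come from the response structure described above.

```python
class PendingRequests:
    """Tracks sent requests so responses can be matched by requestID."""

    def __init__(self):
        self._pending = {}

    def register(self, body):
        # Remember the request body under its requestID.
        self._pending[body["requestID"]] = body

    def resolve(self, response):
        """Return (request, response) for a response, raising on failure."""
        request = self._pending.pop(response["requestID"], None)
        if request is None:
            raise KeyError("response does not match any pending request")
        if not response["success"]:
            # 'error' and 'errorDescription' are only present on failure.
            raise RuntimeError(
                f"{response['error']}: {response['errorDescription']}"
            )
        return request, response


pending = PendingRequests()
request = {
    "operation": "renameColumn",
    "sessionID": "617646cb1e421f72b7e742dbdbd4cb",
    "requestID": "0002a43f",
    "column": "Date",
    "newName": "Day",
}
pending.register(request)

# A successful response echoes operation, sessionID and requestID.
ok_response = {
    "operation": "renameColumn",
    "sessionID": "617646cb1e421f72b7e742dbdbd4cb",
    "requestID": "0002a43f",
    "success": True,
}
matched_request, _ = pending.resolve(ok_response)
print(matched_request["column"])
```

Raising an exception on failure is a design choice of this sketch; a client could equally return the error fields to the caller.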

Requests¶

The Willow WebSocket API defines the following requests. Note that the headings are the request names which should be supplied as the title of the SocketIO message.

`metadata`¶

Requests the metadata for a dataset. Metadata includes information such as dataset size (number of rows and columns), column names and data types.

It is possible to request the metadata on a filtered and/or searched view of the dataset. This can be used to, for example, get the number of rows which contain outliers in a specific column.

Request Body Structure
{
        ...standard request key-value pairs...,
        'filterType': <String>,
        'filterColumnIndices': [<Integer>],
        'outliersStdDev': <Number>,
        'outliersTrimPortion': <Float>,
        'searchQuery': <String>,
        'searchColumnIndices': [<Integer>],
        'searchIsRegex': <Boolean>
}
Params

filterType, optional

Supply a value to request the metadata on a filtered view of the specified dataset. Valid options are:

'invalid' for missing/invalid values in specified columns

'outliers' for outliers in specified numerical columns

'duplicates' for duplicates in specified columns

filterColumnIndices, optional

Must be used with filterType parameter for specifying which columns the filter should be applied on. Value should be a list of column indices (integers).

Note

The filter is applied using a boolean conjunction, meaning that a row must satisfy the filter condition for all specified columns to be included.

outliersStdDev, optional

Must be used if filterType is set to 'outliers' to specify how many standard deviations away a value must be to be considered an outlier.

outliersTrimPortion, optional

Must be used if filterType is set to 'outliers' to specify the portion of the dataset to trim from the highest and lowest values.

searchQuery, optional

Supply a value to request the metadata on a searched view of the dataset. Can be a simple search term or a regular expression.

searchColumnIndices, optional

Must be used with searchQuery parameter for specifying which columns the search will be performed on.

Note

Unlike filters, the search is performed using a boolean disjunction, meaning that a row only has to contain the search term in one of the specified columns to be included.

searchIsRegex, optional

Must be used with searchQuery parameter for specifying whether or not the search term is a regular expression.
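The difference between the conjunction used by filters and the disjunction used by searches can be illustrated on a toy dataset. This is a plain Python sketch of the row-selection logic only, not the backend's implementation; the functions `filter_invalid` and `search` and the sample rows are made up for the example.

```python
# Toy dataset: each row is a list of cell values; None marks a missing value.
rows = [
    ["alice", None, None],
    ["bob", None, "berlin"],
    ["carol", "carol@example.com", "paris"],
]


def filter_invalid(rows, column_indices):
    """Conjunction: keep rows where ALL listed columns are missing."""
    return [r for r in rows if all(r[i] is None for i in column_indices)]


def search(rows, term, column_indices):
    """Disjunction: keep rows where ANY listed column contains the term."""
    return [
        r for r in rows
        if any(r[i] is not None and term in r[i] for i in column_indices)
    ]


# Only the first row is missing values in both column 1 and column 2.
print(filter_invalid(rows, [1, 2]))  # [['alice', None, None]]

# The last row matches because column 2 contains the term, even though
# column 0 does not.
print(search(rows, "paris", [0, 2]))  # [['carol', 'carol@example.com', 'paris']]
```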

Response Body Structure
{
        ...standard response key-value pairs...,
        'undoAvailable': <Boolean>,
        'dataSize': {
                'rows': <Integer>,
                'columns': <Integer>
        },
        'columns': [<String>],
        'columnInfo': {
                <column_name>: {
                        'dataType': <String>,
                        'invalidValues': <Integer>
                },
                ...
        }
}
undoAvailable specifies whether or not an undo operation is currently possible for the dataset

dataSize specifies the size of the data set as a dictionary indexed by ‘rows’ and ‘columns’

columns specifies the names of each column as a list

columnInfo specifies the data type and number of invalid values in each column as a dictionary indexed by column name

`data`¶

Requests the data for a dataset in JSON format.

It is possible to request the data of a filtered, sorted and/or searched view of the dataset. This can be used to, for example, get the rows and columns that are duplicated.

Note

Because Willow generally handles large datasets, you must always specify a slice when retrieving data through this request. Although there is nothing to prevent specifying the entire dataset as the slice, doing so will degrade performance.

Request Body Structure
{
        ...standard request key-value pairs...,
        'rowIndexFrom': <Integer>,
        'rowIndexTo': <Integer>,
        'columnIndexFrom': <Integer>,
        'columnIndexTo': <Integer>,
        'filterType': <String>,
        'filterColumnIndices': [<Integer>],
        'outliersStdDev': <Number>,
        'outliersTrimPortion': <Float>,
        'searchQuery': <String>,
        'searchColumnIndices': [<Integer>],
        'searchIsRegex': <Boolean>,
        'sortColumnIndex': <Integer>,
        'sortAscending': <Boolean>
}
Params

rowIndexFrom

Required. Starting row index of the slice of the dataset to view

rowIndexTo

Required. Ending row index of the slice of the dataset to view

columnIndexFrom

Required. Starting column index of the slice of the dataset to view

columnIndexTo

Required. Ending column index of the slice of the dataset to view

sortColumnIndex, optional

Index of a column to sort the dataset by

sortAscending, optional

Specify true to sort in ascending order, false for descending

Remaining parameters behave identically to the parameters for `metadata`.

Response Body Structure
{
        ...standard response key-value pairs...,
        'data': {
                'index': [<Integer>],
                'columns': [<String>],
                'data': [[<Any>]]
        }
}
The data is encapsulated in a dictionary under the data key in the response. The dictionary holds the indices of the requested slice, names of the requested columns and the actual data as an array of arrays.
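A client will often want one record per row rather than three parallel lists. The sketch below converts the dictionary described above into a list of row dictionaries; the helper name `to_records`, the `_index` key and the sample values are assumptions for illustration.

```python
def to_records(data):
    """Convert the 'data' dictionary of a response into a list of row dicts."""
    records = []
    for idx, row in zip(data["index"], data["data"]):
        record = {"_index": idx}  # keep the row index alongside the values
        record.update(zip(data["columns"], row))
        records.append(record)
    return records


# Made-up example of the 'data' value in a response body.
sample = {
    "index": [10, 11],
    "columns": ["Date", "Price"],
    "data": [["2015-01-01", 9.5], ["2015-01-02", None]],
}

for record in to_records(sample):
    print(record)
```

The index/columns/data layout resembles the "split" orientation that pandas can produce when serializing a data frame to JSON, so a pandas-based client could likely rebuild a data frame from it directly.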

`analyze`¶

Compute statistics on the specified column.

The Willow backend will provide the appropriate statistics based on the data type of column.

Request Body Structure
{
        ...standard request key-value pairs...,
        'column': <String>
}
Params

column

Name of column to analyze

Response Body Structure
{
        'invalid': <Integer>,
        'unique_count': <Integer>,
        'mode': [<Any>],
        'mode_count': <Integer>,
        'frequencies': [
                <value>: <Integer>,
                ...
        ],
        ...data-type specific statistical metrics...
}
invalid specifies the number of invalid/missing values in the column

unique_count specifies the number of unique values in the column

mode specifies the most frequently occurring values as a list

mode_count specifies the frequency of the mode(s)

frequencies is a list of the top 50 most commonly occurring values and their frequencies

The returned response will also contain more statistical metrics depending on the data type.

`changeColumnDataType`¶

Change the data type of a column

Request Body Structure
{
        ...standard request key-value pairs...,
        'column': <String>,
        'newDataType': <String>,
        'dateFormat': <String>
}
Params

column

Name of column to change data type of

newDataType

Any valid type string that can be parsed by numpy.dtype().

dateFormat, optional

Supply a Python date format string to override automatic date parsing when converting a column to numpy.datetime64.

Response Body Structure
{
        ...standard response key-value pairs...
}
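Since dateFormat is described as a Python date format string, it can be checked locally against a sample value before sending the request. The sketch below assumes the format uses the standard `strptime` directives; the sample value and format are made up.

```python
from datetime import datetime

date_format = "%d/%m/%Y"  # day/month/year, e.g. 31/12/2015

# Check locally that the format string parses a sample value as intended.
parsed = datetime.strptime("31/12/2015", date_format)
print(parsed.date())  # 2015-12-31

# Request body that overrides automatic date parsing with the format above.
body = {
    "operation": "changeColumnDataType",
    "sessionID": "617646cb1e421f72b7e742dbdbd4cb",
    "requestID": "0002a43e",
    "column": "Date",
    "newDataType": "datetime64",
    "dateFormat": date_format,
}
```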

`combineColumns`¶

Combines multiple columns into a new column, concatenating each value using a specified separator.

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnsToCombine': [<Integer>],
        'seperator': <String>,
        'newName': <String>,
        'insertIndex': <Integer>
}
Params

columnsToCombine

List of indices of columns to combine

seperator

Separator character or string

newName

Name for column containing combined values

insertIndex

Index to insert new column at

Response Body Structure
{
        ...standard response key-value pairs...
}

`deleteColumns`¶

Delete columns

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndices': [<Integer>]
}
Params

columnIndices

List of column indices

Response Body Structure
{
        ...standard response key-value pairs...
}

`deleteRows`¶

Delete rows

Request Body Structure
{
        ...standard request key-value pairs...,
        'rowIndices': [<Integer>]
}
Params

rowIndices

List of row indices

Response Body Structure
{
        ...standard response key-value pairs...
}

`deleteRowsWithNA`¶

Delete rows with missing values in the specified column

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>
}
Params

columnIndex

Index of column for operation

Response Body Structure
{
        ...standard response key-value pairs...
}

`emptyStringToNaN`¶

Replaces all instances of ‘’ (empty string) with NaN for a specified string column. Useful for a consistent definition of “missing/invalid” value.

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>
}
Params

columnIndex

Index of column for operation

Response Body Structure
{
        ...standard response key-value pairs...
}

`executeCommand`¶

Executes a Python statement in a pre-configured environment

Danger

Using this function carries direct risk, as any arbitrary command can be executed

The command parameter can be a string containing multiple lines of Python statements. The command is executed in a pre-configured environment with df holding a reference to the data frame, and multiple modules loaded, including pandas and numpy.

Request Body Structure
{
        ...standard request key-value pairs...,
        'command': <String>
}
Params

command

String containing a single Python command, or multiple Python commands delimited by newline

Response Body Structure
{
        ...standard response key-value pairs...
}

`fillDown`¶

Fill missing values with last or next seen valid value for a range of columns

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnFrom': <Integer>,
        'columnTo': <Integer>,
        'method': <String>
}
Params

columnFrom

Starting index of column range

columnTo

Ending index of column range (inclusive)

method

Mode of operation: specify ‘bfill’ for backwards fill (next valid value) and ‘pad’ for forward fill (last valid value)

Response Body Structure
{
        ...standard response key-value pairs...
}
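The two fill methods can be illustrated on a single column. This is a plain Python sketch of the semantics described above (last valid value for 'pad', next valid value for 'bfill'), not the backend's code; the function name `fill` is made up.

```python
def fill(values, method):
    """Fill None entries using 'pad' (last valid) or 'bfill' (next valid)."""
    if method not in ("pad", "bfill"):
        raise ValueError("method must be 'pad' or 'bfill'")
    # For a backwards fill, walk the list in reverse and flip the result.
    ordered = values if method == "pad" else values[::-1]
    filled, last = [], None
    for v in ordered:
        if v is not None:
            last = v
        filled.append(last)
    return filled if method == "pad" else filled[::-1]


column = [None, 1, None, None, 4, None]
print(fill(column, "pad"))    # [None, 1, 1, 1, 4, 4]
print(fill(column, "bfill"))  # [1, 1, 4, 4, 4, None]
```

Note that in this sketch a missing value with no valid value before it (for 'pad') or after it (for 'bfill') stays missing.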

`fillWithCustomValue`¶

Fill missing values with a custom specified value, in-place

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>,
        'newValue': <Any>
}
Params

columnIndex

Index of column to operate on

newValue

Fill value

Response Body Structure
{
        ...standard response key-value pairs...
}

`fillWithAverage`¶

Fill missing values with an average metric. Average metrics that can be used to fill with are: mean, median and mode.

Warning

Using the mean or median metric on a non-numeric column will result in an error response.

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>,
        'metric': <String>
}
Params

columnIndex

Index of column to operate on

metric

Average metric to use, options are: ‘mean’, ‘median’ and ‘mode’

Response Body Structure
{
        ...standard response key-value pairs...
}

`findReplace`¶

Finds all values matching the given patterns in the specified column and replaces them with a value.

Searching for multiple patterns is supported, and search patterns can be strings which will be matched as a whole or regular expressions (if matchRegex param set to true).

Standard Pythonic regex substitutions are also possible.

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>,
        'toReplace': [<String>],
        'replaceWith': [<String>],
        'matchRegex': <Boolean>
}
Params

columnIndex

Index of column for operation

toReplace

List of search strings or regular expressions. Length of list must match length of replaceWith parameter.

replaceWith

List of replacement strings or regular expressions. Length of list must match length of toReplace parameter.

matchRegex

Must be set to true if supplying list of regular expressions

Response Body Structure
{
        ...standard response key-value pairs...
}
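The pairing of toReplace and replaceWith, and the difference between whole-value matching and regex substitution, can be sketched with Python's `re` module. The function `find_replace` is a hypothetical local illustration; applying the patterns in list order is an assumption of this sketch, not documented backend behavior.

```python
import re


def find_replace(values, to_replace, replace_with, match_regex=False):
    """Replace values matching each pattern with the paired replacement."""
    if len(to_replace) != len(replace_with):
        raise ValueError("toReplace and replaceWith must have equal length")
    result = []
    for value in values:
        for pattern, replacement in zip(to_replace, replace_with):
            if match_regex:
                # Regex mode: backreferences such as \1 are allowed.
                value = re.sub(pattern, replacement, value)
            elif value == pattern:
                # Plain mode: the pattern must match the whole value.
                value = replacement
                break
        result.append(value)
    return result


column = ["2015-01-02", "N/A"]

# Whole-value match: only the cell equal to 'N/A' changes.
print(find_replace(column, ["N/A"], ["unknown"]))
# ['2015-01-02', 'unknown']

# Regex substitution reordering the date parts via capture groups.
print(find_replace(column, [r"(\d{4})-(\d{2})-(\d{2})"], [r"\3/\2/\1"],
                   match_regex=True))
# ['02/01/2015', 'N/A']
```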

`findReplace`¶

Generates dummy/indicator variable columns from a specified column (containing categorical data).

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>,
        'inplace': <Boolean>
}
Params

columnIndex

Index of column for operation

inplace

Removes original column if set to true

Response Body Structure
{
        ...standard response key-value pairs...
}

`interpolate`¶

Fill missing values for a specified numeric column using interpolation

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>,
        'method': <String>,
        'order': <Integer>
}
Params

columnIndex

Index of numeric column to operate on

method

Interpolation method. Options include ‘linear’, ‘spline’ and ‘polynomial’. Refer to list of all available methods of interpolation here.

order, optional

Must be specified if using ‘polynomial’ or ‘spline’ interpolation.

Warning

The higher the order (and larger the dataset), the more computationally expensive the interpolation will be.

Response Body Structure
{
        ...standard response key-value pairs...
}

`insertDuplicateColumn`¶

Duplicates a column, inserting the new column to the right of the original column.

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>
}
Params

columnIndex

Index of column to duplicate

Response Body Structure
{
        ...standard response key-value pairs...
}

`newCellValue`¶

Modifies the value of a specified cell.

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>,
        'rowIndex': <Integer>,
        'newValue': <Any>
}
Params

columnIndex

Integer index of column

rowIndex

Integer index of row

newValue

New value for cell

Response Body Structure
{
        ...standard response key-value pairs...
}

`normalize`¶

Performs normalization on a numeric column, uniformly scaling the values to fit in the specified range

Warning

Requesting normalize on a non-numeric column will invoke an error response.

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>,
        'rangeFrom': <Number>,
        'rangeTo': <Number>
}
Params

columnIndex

Index of (numeric) column to normalize

rangeFrom, optional

Start of scaling range, default = 0

rangeTo, optional

End of scaling range, default = 1

Response Body Structure
{
        ...standard response key-value pairs...
}
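Uniform scaling into a range is commonly done with min-max scaling, which maps the smallest value to the start of the range and the largest to the end. The sketch below shows that technique in plain Python; whether the backend uses exactly this formula is an assumption, and the function name and arguments are illustrative.

```python
def normalize(values, range_from=0.0, range_to=1.0):
    """Min-max scale values so they span [range_from, range_to]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot scale a column with a single distinct value")
    width = range_to - range_from
    # (v - lo) / (hi - lo) is the position of v between the extremes (0..1).
    return [range_from + (v - lo) / (hi - lo) * width for v in values]


print(normalize([10, 20, 40]))         # default range 0 to 1
print(normalize([10, 20, 40], -1, 1))  # custom range -1 to 1
```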

`renameColumn`¶

Rename a column

Request Body Structure
{
        ...standard request key-value pairs...,
        'column': <String>,
        'newName': <String>
}
Params

column

Name of column to rename

newName

New name

Response Body Structure
{
        ...standard response key-value pairs...
}

`splitColumn`¶

Splits a string column according to a specified delimiter or regular expression.

The split values are put in new columns inserted to the right of the original column.

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>,
        'delimiter': <String>,
        'regex': <Boolean>
}
Params

columnIndex

Index of (string) column to split

delimiter

Delimiting character, string or regular expression for splitting each row

regex

Set to true if delimiter is a regular expression

Response Body Structure
{
        ...standard response key-value pairs...
}

`standardize`¶

Performs standardization on a numeric column, uniformly scaling the values so that the mean equals 0 and the standard deviation equals 1.

Warning

Requesting standardize on a non-numeric column will invoke an error response.

Request Body Structure
{
        ...standard request key-value pairs...,
        'columnIndex': <Integer>
}
Params

columnIndex

Index of (numeric) column to standardize

Response Body Structure
{
        ...standard response key-value pairs...
}
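Scaling to mean 0 and standard deviation 1 corresponds to computing a z-score for each value. The following plain Python sketch uses the population standard deviation, which is an assumption; the backend may use the sample standard deviation instead.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std dev (an assumption)
    if std == 0:
        raise ValueError("cannot standardize a column with zero variance")
    return [(v - mean) / std for v in values]


# Mean is 5 and population standard deviation is 2 for this sample.
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```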

`undo`¶

Undo the previous operation

Note

The Willow backend does not track changes past the most recent operation, meaning that the effective number of undos is limited to 1. Requesting undo twice equates to a redo.

Before requesting undo, check the undoAvailable key in the response of a metadata request to confirm that the operation is available for the dataset. Otherwise, an error response will be given.

Request Body Structure
{
        ...standard request key-value pairs...
}
Response Body Structure
{
        ...standard response key-value pairs...
}

`visualize`¶

Generate a visualization on the specified column.

Supported visualizations are frequency bar charts, histograms, scatter plots, time series plots and line charts. The image is encoded as a Base64 PNG image string in the response.

Histogram Request Body Structure
{
        ...standard request key-value pairs...,
        'type': 'histogram',
        'columnIndices': [<Integer>],
        'options': {
                'numberOfBins': <Integer>,
                'axis': {
                        'x': {
                                'start': <Number>,
                                'end': <Number>
                        },
                        'y': {
                                'start': <Number>,
                                'end': <Number>
                        }
                }
        }
}
Histogram Request Body Params

columnIndices

List of column indices for plotting (histograms support multiple columns)

options, optional

Custom settings for plotting:

numberOfBins, optional

Number of bins to categorize values into

axis, optional

Dictionary specifying axis/window settings

Frequency Chart Request Body Structure
{
        ...standard request key-value pairs...,
        'type': 'frequency',
        'columnIndex': <Integer>,
        'options': {
                'useWords': <Boolean>,
                'cutoff': <Integer>
        }
}
Frequency Chart Request Body Params

columnIndex

Index of column to plot

options, optional

Custom settings for plotting:

useWords, optional

Set to true to plot word frequencies instead of row value frequencies for a string column

cutoff, optional

Specify the top n values by frequency to plot, default is 50, maximum is 50

Scatter Plot Request Body Structure
{
        ...standard request key-value pairs...,
        'type': 'scatter',
        'xColumnIndex': <Integer>,
        'yColumnIndices': [<Integer>],
        'options': {
                'axis': {
                        'x': {
                                'start': <Number>,
                                'end': <Number>
                        },
                        'y': {
                                'start': <Number>,
                                'end': <Number>
                        }
                }
        }
}
Scatter Plot Request Body Params

xColumnIndex

Index of column to plot on x-axis

yColumnIndices

List of indices of columns to plot on y-axis.

Note

The function supports plotting multiple columns with respect to one axis, but the number of columns should be limited to 6 for optimal color assignment of the plot points.

options, optional

Custom settings for plotting:

axis, optional

Dictionary specifying axis/window settings

Line Chart Request Body Structure
{
        ...standard request key-value pairs...,
        'type': 'line',
        'xColumnIndex': <Integer>,
        'yColumnIndices': [<Integer>],
        'options': {
                'axis': {
                        'x': {
                                'start': <Number>,
                                'end': <Number>
                        },
                        'y': {
                                'start': <Number>,
                                'end': <Number>
                        }
                }
        }
}
Line Chart Request Body Params

xColumnIndex

Index of column to plot on x-axis

yColumnIndices

List of indices of columns to plot on y-axis.

Note

The function supports plotting multiple columns with respect to one axis, but the number of columns should be limited to 6 for optimal color assignment of the plot points.

options, optional

Custom settings for plotting:

axis, optional

Dictionary specifying axis/window settings

Response Body Structure
{
        ...standard response key-value pairs...,
        'image': <String>,
        'axis': {
                'x': {
                        'start': <Number>,
                        'end': <Number>
                },
                'y': {
                        'start': <Number>,
                        'end': <Number>
                }
        }
}
image

Base64 encoded PNG image string of generated plot

axis

Dictionary specifying axis/window settings used in the generated chart
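A client that wants to save or display the plot needs to decode the image string first. The sketch below assumes the value is plain Base64 without a data URI prefix; the helper name `decode_image` and the stand-in payload are made up for the example.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_image(response):
    """Decode the Base64 'image' string of a visualize response into bytes."""
    raw = base64.b64decode(response["image"])
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return raw


# Stand-in response: a real one would carry a complete PNG, not just a header.
fake_response = {
    "image": base64.b64encode(PNG_SIGNATURE + b"...").decode("ascii"),
}
png_bytes = decode_image(fake_response)
print(len(png_bytes))
```

The returned bytes can then be written to a file opened in binary mode, for example `open("plot.png", "wb")`.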

Notifications¶

Besides sending request messages, clients which are connected to the Willow backend should be prepared to receive the following notifications:

`dataChanged`¶

Notification sent when a dataset has been changed due to a data manipulation operation performed by any client using the same sessionID (operating on the same dataset).

Notification Body Structure
{
        'sessionID': <String>
}
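A typical reaction to this notification is to discard any cached view of the dataset and request fresh metadata. The sketch below shows one possible handler; the `emit` callable, the cache dictionary and the `refresh-N` requestID scheme are all assumptions for illustration.

```python
import itertools


def make_data_changed_handler(emit, cache):
    """Return a handler that refreshes metadata after a dataChanged notification.

    emit  -- callable(title, body) that sends a request (assumed to be
             provided by whatever SocketIO client is in use)
    cache -- dict mapping sessionID to the last metadata response received
    """
    request_numbers = itertools.count(1)

    def on_data_changed(notification):
        session_id = notification["sessionID"]
        # Whatever was cached for this dataset may no longer be accurate.
        cache.pop(session_id, None)
        emit("metadata", {
            "operation": "metadata",
            "sessionID": session_id,
            "requestID": f"refresh-{next(request_numbers)}",
        })

    return on_data_changed


sent = []  # records what would have been emitted
cache = {"617646cb1e421f72b7e742dbdbd4cb": {"dataSize": {"rows": 10, "columns": 2}}}
handler = make_data_changed_handler(
    lambda title, body: sent.append((title, body)), cache
)
handler({"sessionID": "617646cb1e421f72b7e742dbdbd4cb"})
print(sent)
```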
