
🧪 Testing


Published on Mar 09, 2021 by Team 26

Although we experimented with various methods to test our system, our main testing strategy consisted of user acceptance testing, to get feedback on the end-user experience, and widget testing (a form of white-box testing), to verify specific outcomes depending on certain branches of logic. We also frequently carried out our own black-box testing to ensure the system behaves as expected.

Flutter supports three main testing strategies - unit, widget and integration testing - and after evaluating all of them, we found widget testing the most useful for our app. (These strategies are all white-box testing; black-box testing, of course, requires no special tooling.)

Because Flutter is a (mostly) declarative UI framework, some classes of bugs are eliminated outright, so we were able to focus on testing the parts that do contain more complex logic.

Unit Testing

Unit testing, at its simplest, tests whether a function produces the expected output for a given input. We use this style to test our cryptographic procedures.

We generated an example public/private key pair (using OpenSSL1) and took the components:

final publicExponent = BigInt.parse('65537');
final privateExponent = BigInt.parse('0x...',
    radix: 16);
final modulus = BigInt.parse('0x...',
    radix: 16);
final p = BigInt.parse('0x...',
    radix: 16);
final q = BigInt.parse('0x...',
    radix: 16);
const publicPEM = '-----BEGIN RSA PUBLIC KEY-----\n'
    'MIIBCgKCAQEAz6mfTm1Kwa9c4SKioJtRHtRSXKmnlFciMc1cGNKqVCaM00rW5Z5L'
    'Z3socDnx65ljsgiYhWXM1te6+x2HRX7qmS4SfgL/BHBYGzvnMbV7KTImuB03AeCA'
    'u4E5dYqFIhLPr7yROetuQujarPO7WKbFX3iYwj27Anr2FYm2xEtMDI1VkrpFPePm'
    'FS6w+s5EsSRAF3gDqrIDN66hGPPX08cLnaxVCn2IHAasjsrSvWXUaXvpTQD+8Zq0'
    'dac1u6UmqgGUqiwoZxjeliVHNWSEbHrd5UxgN8OgOeX7//KMHfbX21bbVyNrYDIB'
    'frPDdq+ifNYo9L2GgqtAE/GzkkzzILCvqQIDAQAB'
    '\n-----END RSA PUBLIC KEY-----';

We have omitted the full strings since they are rather long.

Then we perform some short, quick tests to ensure it works as expected. For example, we check if our encoding produces the same PEM string as the one generated by OpenSSL:

test('encodePublicKeyInPem correctly encodes PKCS#1 public key to PEM', () {
  // setup public key
  final publicKey = RSAPublicKey(modulus, publicExponent);
  
  // execute function we are testing
  final encoded = encodePublicKeyInPem(publicKey);
  
  // verify expectations
  expect(encoded, publicPEM);
});

Widget Testing

Since Flutter apps are primarily made up of widgets, Flutter provides a way to test them in a virtual execution environment2. We can then interact with the widget with methods like tester.drag or tester.tap.

Here is an example of a test which found a bug:

  testWidgets('Swipes through without exception', (WidgetTester tester) async {
    // draws up the `IntroScreen` onto the testing environment
    await tester.pumpWidget(wrapAppProvider(IntroScreen()));

    for (int i = 0; i < numberOfPages - 1; ++i) {
      // perform a swipe to the left, beginning from where the current image displayed
      // by the app is located
      await tester.drag(find.byType(Image), Offset(-500.0, 0.0));
      
      // wait for all the animations to complete
      await tester.pumpAndSettle();
    }
  });

wrapAppProvider is explained in the ‘Mocking’ section.

The bug that this found was a UI rendering bug. We initially ran the program on our physical devices, but since our devices have large screens, we didn’t realize that our UI would not scale properly to fit smaller screens. The virtual testing environment in which these widget tests run has an intentionally small screen size precisely to pick up this class of bugs.

Flutter - Mocking with mockito

For most widgets we are interested in testing, there is often some dependency that we want to remove to ensure that we are only testing a small snippet of code. We use the mockito package to mock dependencies.

In our code, the objects that we end up mocking are the database helper classes, so we use a helper function to wrap a given widget around mocked databases:

// wraps a [Widget] with [MaterialApp] and also provides mocked databases
Widget wrapAppProvider(Widget w, {UserWellbeingDB wbDB, FriendDB friendDB}) {
  // default to mocked databases if none were provided
  wbDB ??= MockedWBDB();
  friendDB ??= MockedFriendDB();

  return MultiProvider(
    providers: [
      // provide the mocked databases for descendants:
      ChangeNotifierProvider.value(
        value: wbDB,
      ),
      ChangeNotifierProvider.value(
        value: friendDB,
      ),
    ],
    // we need to wrap it around a MaterialApp since the widgets we test are
    // smaller components that are normally already wrapped in a MaterialApp
    child: MaterialApp(
      home: w,
    ),
  );
}

We define mocked versions of a class like so:

class _MockedDB extends Mock implements UserWellbeingDB {}

By default, _MockedDB is now stubbed, returning null for any methods called on it. We can use when, thenAnswer, etc. on an instance of _MockedDB to specify how it should behave.

Here is one mock test that found a bug in the wellbeing check page:

  testWidgets('Works when steps reset', (WidgetTester tester) async {
    // we also need to mock SharedPreferences:
    SharedPreferences.setMockInitialValues(
        {'postcode': 'N6', 'support_code': '12345', PREV_STEP_COUNT_KEY: 6666});

    // setup mocked wellbeing database:
    final mockedDB = _MockedDB();
    when(mockedDB.getLastNWeeks(3)).thenAnswer((_) async => <WellbeingItem>[]);

    // we don't need to use mockito or mocking to fake the step count stream,
    // we just create one:
    final fakeStepStream = Stream.fromIterable([0]);

    await tester.pumpWidget(
        wrapAppProvider(WellbeingCheck(fakeStepStream), wbDB: mockedDB));
    await tester.pumpAndSettle();

    // should be at score of 10 after dragging
    await tester.drag(find.byType(Slider), Offset(500.0, 0.0));
    await tester.pumpAndSettle();
    await withClock(
        // this should use the fake clock when requesting date
        Clock.fixed(DateTime(2021)),
        () async => await tester.tap(find.byType(ElevatedButton)));

    verify(mockedDB.getLastNWeeks(3));
    verify(mockedDB.insertWithData(
        date: "2021-01-01",
        postcode: 'N6',
        wellbeingScore: 10.0,
        // this part found the bug, the actual numSteps was negative:
        numSteps: 0, 
        supportCode: '12345'));
    final newPrev = await SharedPreferences.getInstance()
        .then((prefs) => prefs.getInt(PREV_STEP_COUNT_KEY));
    assert(newPrev == 0);
  });

We also use the clock package to fake the time in the above test, since tests should not depend on when they are run.

Verifying that mockedDB.insertWithData was called with the specified arguments was the check that found the bug. (In particular, there was an arithmetic error in the code that produced negative numbers when the step count was reset.)

API Testing

The Go web framework that we used, echo3, supports API testing. This allows us to create requests and record responses, asserting whether they match our expectations. We use the testify4 toolkit for assertions and mock testing in Go.

Go - Mocking with testify

As usual, the main object we wish to mock is the database helper object.

We have some interface that some type needs to implement to act as a database helper:

// interface that defines the methods needed to interact with the database
type DataSource interface {
  DoesUserExist(identifier string) (bool, error)

  // @param password is plaintext
  isValidPassword(identifier string, password string) (bool, error)

  // etc...
}

So in addition to our actual database helper that implements these (elsewhere in the code), we have a FakeDB:

// mocked object that implements DataSource
type FakeDB struct {
  // including `mock.Mock` gives access to the mock object tracking
  mock.Mock
}

func (db *FakeDB) DoesUserExist(identifier string) (bool, error) {
  // notify the mocking tracker that we were called with `identifier`
  args := db.Called(identifier)
  // these behave as strongly typed getters
  return args.Bool(0), args.Error(1)
}

func (mydb *FakeDB) InsertUser(identifier string, digest []byte) error {
  args := mydb.Called(identifier, digest)
  return args.Error(0)
}

// etc...

Note that the mockito package for Flutter performs this ‘tracking’ automatically, whereas here we have to record each call and its arguments ourselves5.

Now we can set up the mocked return values on an instance of FakeDB with fakeDB.On(...).Return(...).

Example API Test

func TestDoesNotAddExistingUser(t *testing.T) {
  identifier := "user"
  password := "battery horse staple"
  body := "{\"identifier\":\"" + identifier + "\", \"password\": \""+ password +"\"}"

  // setup the return value for the FakeDB
  fakeDB := new(FakeDB)
  fakeDB.On("DoesUserExist", identifier).Return(true, nil)

  // new request to the /user/new endpoint, using `body` as the request body
  req := httptest.NewRequest(http.MethodPost, "/user/new", strings.NewReader(body))
  req.Header.Set(echo.HeaderContentType, echo.MIMEApplicationJSON)
  rec := httptest.NewRecorder()
  // the echo framework uses contexts to determine the current state (e.g. the
  // parameters of the current request):
  c := echo.New().NewContext(req, rec)

  if assert.NoError(t, handleAddUser(fakeDB)(c)) {
    // assert expectations related to the mocked object:
    fakeDB.AssertExpectations(t)
    fakeDB.AssertNotCalled(t, "InsertUser", identifier, mock.AnythingOfType("[]uint8"))

    // assert expectations related to API response:
    assert.Equal(t, http.StatusBadRequest, rec.Code)
    assert.Contains(t, rec.Body.String(), "\"success\":false")
  }
}

The key part above is that we are using AssertNotCalled to ensure that the mocked object (the database) was not asked to add the new user during the execution of handleAddUser. (In this case, we expect this because we are specifying that the user already exists.)

Continuous Integration

We used GitHub Actions to run a series of steps on any pull request or push to master. We only merged pull requests into the master branch once all checks passed. Here are the steps we ran (from .github/workflows/flutter.yml):

# setup branches, docker image, etc. omitted

    # run these 4 steps on new PRs
    steps:
      - uses: actions/checkout@v2

      - name: Print Flutter version
        run: flutter --version

      - name: Install dependencies
        run: flutter pub get

      - name: Analyze project source
        run: flutter analyze

      - name: Run tests
        run: flutter test

flutter analyze runs static analysis on our codebase, and picks up common errors like missing imports. flutter test runs our own test suite.

We set up a similar workflow for the Go/backend repo.

Integration testing

Integration testing allows for the automated running of tests on a physical or emulated device. This is the slowest method of testing, but is useful to identify bugs that may not appear when running in Flutter’s widget testing execution environment.

We need to use a separate main function to drive integration tests, from integration_test.dart:

import 'package:integration_test/integration_test_driver.dart';

Future<void> main() => integrationDriver();

The actual tests are found in the integration_test/ directory and are defined similarly to the existing widget tests.

Performance Testing & Profiling

Flutter App - FPS Targets

After using watchPerformance to generate performance data, we inspect integration_response_data.json to determine whether the app is meeting our performance targets. Although we were not given explicit requirements, we aim for an average of 60fps to provide users with a smooth UI. Here is a snippet of the data:

{
  "performance": {
    "average_frame_build_time_millis": 6.704,
    "90th_percentile_frame_build_time_millis": 8.904,
    "99th_percentile_frame_build_time_millis": 63.525,
    "worst_frame_build_time_millis": 81.152,
    "missed_frame_build_budget_count": 8,
    "average_frame_rasterizer_time_millis": 9.668,
    "90th_percentile_frame_rasterizer_time_millis": 10.654,
    "99th_percentile_frame_rasterizer_time_millis": 97.241,
    "worst_frame_rasterizer_time_millis": 143.301,
    "missed_frame_rasterizer_budget_count": 11,
    "frame_count": 188,
    "frame_build_times": [
      842,
      764,

From the Flutter docs:

“For 60fps, frames need to render approximately every 16ms.” (Flutter Performance Profiling6)

So we can see that NudgeMe achieves the target, since the 90th_percentile_frame_build_time_millis is 8.904ms. (So 90% of the frames displayed to the user were built in 8.904ms or less.)

Go Backend - Web Performance

It’s important to test our website’s performance since our back-end server also handles other work (message passing, serving the add-friend page, etc.). We need to confirm that it’s not under too much load.

To test our website visualization performance, we compared it to the previous version using GTmetrix (which uses Google’s Lighthouse). Here is the full report.


Our website, previous version, and Google maps (from left-to-right).

As you can see, we have notably improved performance compared to the old version. In addition, we have better performance versus Google Maps, beating it in all criteria except for “Cumulative Layout Shift”7.

User Acceptance Testing

We carried out two types of user acceptance testing: alpha and beta testing. In both cases, we installed IPAs onto the testers’ phones and asked them to send any feedback they had over the following two weeks (enough time for two wellbeing checks).

Alpha testing

Throughout development, we downloaded NudgeMe onto Naima and Vishnu’s phones and asked them to report any issues or feedback.

Testers:

  • Naima (19 yrs, Computer Science student)

  • Vishnu (24 yrs, Software Engineer)

Feedback from testers included:

  • During the development process, Naima informed us of small bugs associated with the step count and the graph. She also provided feedback on the various colour changes, for example with the wellbeing score circle on the home page and the change of navigation bar design.

  • Vishnu responded about the minute and second drop down buttons (in the notification time selector), saying that it was confusing. This was because the single digit numbers were not formatted with preceding 0’s (the minute options included 1, 2, 3 etc. rather than 01, 02, 03 etc.). He also expressed that it was not clear where users can scroll down and suggested adding a scroll bar to make this more obvious. Additionally, he asked that we add the option to change the sharing data permissions after the introduction screen, so we added this to the Settings page.

We sent several APKs of NudgeMe to our project partner, Joseph Connor, who provided continual feedback. This feedback was often in the form of small changes to the language used in the application, particularly on the introduction screen and the network page. To make this process simpler, we made Figma mockups of these pages, which you can find here: Introduction Screen Figma, Network Page Figma. We used these to finalise the language before transferring the changes to the actual application, which kept everyone on the same page and saved time.

Beta testing

Once we finished NudgeMe, we installed the end product onto the phones of Vishnu, Anchal and Manni and asked them to provide feedback, which may be useful for future development of NudgeMe.

Testers:

  • Vishnu (24 yrs, Software Engineer)

  • Anchal (30 yrs, Graphic Designer)

  • Manni (54 yrs, Pre-school teacher)

Feedback from testers included:

  • Vishnu found the home page confusing. He believes the wellbeing page should be considered the home page as it is the most useful and important page. Additionally, he thought that in the Support Code and Postcode sections of the Settings page, the current postcode/support code should be presented as hint text in the textboxes. He also thought a ‘Send to All’ button on the network page could be a useful addition.

  • Anchal reported that she likes the bolded text on the network page, as it makes the information easier to read, and the bump on the navigation bar that indicates which page is selected. She also loves the home screen: there is limited information, so nothing confuses her, and she thinks the emphasis on last week’s score encourages her to do better this week.

  • Manni stated that NudgeMe has encouraged her to do more exercise and that the notifications push her to move more. However, she also thought that the caption text is too small, and she could not read the support-code text on the introduction screen about typing selfhelp. She found the text message containing the deep link confusing and was slightly reluctant to send it to her friends because she didn’t recognise the link. In future iterations, the caption text could be enlarged and the deep link replaced with a bit.ly link to address these comments.

Feedback from Joseph (our project partner) included that on the step goal progress page, the person should be moving clockwise rather than anticlockwise (view the step goal progress here). We had previously considered making this change, but we had to prioritise other changes and, in the end, did not have time.

Remote Error Reporting

We used Sentry to remotely collect error logs and stack traces. Using Sentry helped us during user acceptance testing since we didn’t have to rely on bug reports from (often non-technical) people. Instead, we were usually able to identify the exact line of code that caused an exception.

Sentry was especially useful when our project partner reported issues that we were not able to reproduce on our devices. There are slight differences between Android versions, and an app that works on one version may be broken on another, so even with a precise bug report we would not have had a clear picture of the problem. One such occurrence was what motivated us to use Sentry in the first place, and Sentry allowed us to debug the issue.


  1. https://en.wikibooks.org/wiki/Cryptography/Generate_a_keypair_using_OpenSSL ↩︎

  2. Note that this ‘virtual execution environment’ is not quite an ‘emulator’; since it does far less than an emulator, widget tests run much faster. ↩︎

  3. https://echo.labstack.com/ ↩︎

  4. https://github.com/stretchr/testify ↩︎

  5. There is a way to auto generate it but this involves modifying our development build process. ↩︎

  6. https://flutter.dev/docs/perf/rendering/ui-performance ↩︎

  7. The reason we include this is to demonstrate the unavoidable performance penalty that including the Google Maps API into a website incurs. ↩︎