Requirements

Project Background

Our project has been developed in collaboration with Microsoft, one of the world’s leading technology companies, renowned for its software solutions that empower individuals and businesses globally. As part of our brief, we were tasked with addressing a critical pain point in the software development process by creating a Visual Studio Code extension that leverages GitHub Copilot. Developed by GitHub and Microsoft, GitHub Copilot is an AI-powered coding assistant that enhances developer productivity by using OpenAI’s Codex to generate intelligent code suggestions.


Despite the numerous tools available, developers still face challenges in ensuring code quality, maintaining consistency, and adhering to best practices throughout the development lifecycle. These pain points are particularly prevalent when working with large codebases or legacy systems, where debugging and optimising code can become time-consuming and error-prone.

One of the most significant pain points is unit testing, which remains a time-consuming and error-prone aspect of development. Developers often struggle with ensuring test effectiveness, adhering to coding standards, and managing the writing, maintenance, and optimisation of tests. This widespread frustration underscores a critical issue – one our project aims to address by enhancing GitHub Copilot’s capabilities to streamline the unit testing workflow.

To bridge this gap, we are developing UnitPylot, a GitHub Copilot-powered tool for Python developers who use Visual Studio Code. By creating a tool that offers intelligent insights into the quality of unit tests, we aim to make the process more accessible and efficient for both new and experienced developers.

Project Goals

Following discussions with our client and our initial requirements meeting, we have identified the overarching goal of our project: to develop a VS Code extension that extends GitHub Copilot to address a critical pain point in the software development lifecycle. Specifically, it aims to:

  1. Enhance developer efficiency and productivity by providing intelligent, context-aware suggestions. The extension should streamline workflows, reduce cognitive load, and minimise repetitive tasks, enabling developers to focus on higher-level problem-solving.
  2. Improve code quality and maintainability by offering real-time insights and enforcing best practices, helping developers write cleaner, more robust code.


As will be explained below, through our research, we identified unit testing as a key area for improvement, making it our chosen pain point. Therefore, the specific goals of our project are to:
  • Provide immediate insights into an existing testing suite, helping developers analyse, optimise, and maintain their tests more effectively.
  • Offer real-time feedback on test quality, identify inefficiencies, and provide actionable recommendations.
  • Seamlessly integrate into the developer’s workflow within VS Code, making unit testing more efficient and accessible.
By addressing these challenges, our extension will empower developers to write better tests, reduce debugging time, and ultimately enhance overall code quality.

Requirements Gathering

Primary Research: Investigating our User Landscape

To ensure that our project meets the needs of our target audience, we conducted primary research to investigate our user landscape.

Before investigating, we evaluated several possible research methods. Ultimately, we chose to conduct semi-structured interviews with academics, software engineers, and our Microsoft clients to gain in-depth qualitative insights and identify key pain points and inefficiencies in the software development life cycle. This helped us identify potential areas where extension tools could be beneficial.

Taking advantage of the fact that our department is currently building a portfolio of Software Engineering Tools, we decided to speak to some of the participating professors in order to gain insights into tools that are already being developed and identify key areas for innovation.

User Interviews

Example responses from three of our semi-structured interviews are given below:

Interviewee 1:

  • “What are the main challenges you face in the current software development process?”
  • "Often I find myself writing code, running it once, and accepting it if it doesn't crash. I can't be bothered to write tests because it just takes too long."

  • “How do you currently use AI in your workflow?”
  • "I use it to start off on a project as this is the most difficult part for me. I don't know what to do first."

  • “What kinds of mistakes do you see most often in code?”
  • "I often make silly mistakes such as missing an '=' sign in an if statement or forgetting to return a value from a function."

  • “What do you think AI tools should improve upon?”
  • "The code generated by these tools look correct but often contain small errors which end up taking more time to fix than just manually writing it out."

  • “What do you wish development tools could do better?”
  • "I wish debugging tools could provide more intuitive insights into test failures instead of just dumping logs."

  • “Are there any IDE tools you particularly find helpful or unhelpful?”
  • "Integrated linters and static analysis tools are great, but auto-formatting sometimes changes code in ways that break tests unexpectedly."

Interviewee 2:

  • “What are the main challenges you face in the current software development process?”
  • "Debugging failing test cases is frustrating, especially when error messages are unclear. Balancing speed and thorough testing under deadlines is tough."

  • “How do you currently use AI in your workflow?”
  • "Copilot helps with generating functions and test cases, but its suggestions are sometimes too general and not specific to certain metrics. I usually have to tweak them for better accuracy."

  • “What kinds of mistakes do you see most often in code?”
  • "I sometimes forget to test certain scenarios, and that leads to unexpected bugs later."

  • “What do you think AI tools should improve upon?”
  • "It needs better contextual awareness. Right now, it’s good at patterns but weak in understanding project-specific needs."

  • “What do you wish development tools could do better?”
  • "More intuitive ways to measure code quality. Metrics exist, but they aren’t always easy to act upon."

  • “Are there any IDE tools you particularly find helpful or unhelpful?”
  • "I like VS Code for its extensions, but I wish it had more AI-powered features."

Interviewee 3:

  • “What are the main challenges you face in the current software development process?”
  • "Working in a large team is time consuming for me as I need to put in effort to write code that follows the same style as my peers. Also, new technologies are being launched every day and it is very hard to keep up."

  • “How do you currently use AI in your workflow?”
  • "I use it to help me understand other people's code better. I frequently use the /explain feature of Copilot which breaks down complex parts of a system."

  • “What kinds of mistakes do you see most often in code?”
  • "Most of the code I work with is very well documented but some legacy codebases are written using old naming conventions which are very hard to follow."

  • “What do you think AI tools should improve upon?”
  • "I think AI tools should be more adaptive. They should only provide code snippets if they are certain to work."

  • “What do you wish development tools could do better?”
  • "I wish there was a tool which could automate the tedious parts of development while I work on the interesting aspects."

  • “Are there any IDE tools you particularly find helpful or unhelpful?”
  • "When AI tools apply suggestions to my code, I find it hard to see what it has changed and for what reason."

Additionally, we distributed a questionnaire to other academics and our peers to gather quantitative data about their experience, proficiency, desired AI tool features, and pain points. Participants were given an information sheet and consent form, and all ethical guidelines were strictly followed throughout the process, ensuring the integrity and confidentiality of their information.

Questionnaire Results


Here is a snapshot of some of the results from our questionnaire:

Survey Results

The insights gathered from our primary research were invaluable in helping us understand the challenges faced by developers and the potential opportunities for innovation in the software development landscape. This information was crucial in shaping our project goals and identifying the key features that our tool should offer.

Secondary Research

Along with our primary research, we reviewed online articles and journals to gain a broader understanding of the current software development landscape and identified existing tools that address similar pain points. This secondary research was crucial in helping us find gaps and limitations in the existing tools the market has to offer.

Continued in our Research Section...

Personas

Using the data collected from our user interviews, questionnaire, and secondary research, we created personas and scenarios for our target users in order to better understand their needs. The personas represent two types of user who would benefit from our tool: beginner and expert software developers.

Scenarios

DeMarcus Cousins (Senior Developer)

DeMarcus is responsible for maintaining code quality in his team, which primarily works with a large brownfield codebase. Reviewing interns’ contributions is particularly challenging because they often struggle to write effective tests for legacy code that was not originally designed with testing in mind. Many existing tests are outdated, redundant, or provide poor coverage, making it difficult to determine whether new changes are truly reliable. Debugging test failures is another pain point—figuring out whether a failure stems from a genuine issue, an unstable test, or an untested dependency is time-consuming. Additionally, DeMarcus has noticed inefficiencies in how testing is approached across the team. Some tests take significantly longer to run than they should, slowing down development workflows, while others consume excessive memory, impacting performance. In a rapidly evolving codebase, tracking and addressing these inefficiencies is difficult without clear visibility into test metrics over time. With deadlines approaching, DeMarcus needs better tools to help interns improve their testing practices and identify areas where tests could be optimised.

Alex Summer (Computer Science Student at University)

Alex, an intern at DeMarcus’s company, is eager to prove herself while contributing to a complex brownfield codebase. Unlike greenfield projects, where tests are written alongside new code, much of the existing code lacks proper unit tests, making it difficult for her to validate her changes. She spends hours manually tracing dependencies and trying to understand how existing tests (if any) relate to her work. When she does write tests, she often struggles to determine whether they provide meaningful coverage or whether they simply repeat existing tests without adding real value. Debugging failing tests is another challenge: because the codebase has evolved over time, failures can stem from legacy issues rather than her changes, making it hard to pinpoint the root cause. Additionally, running tests takes longer than expected, as some legacy tests are inefficient and slow down the entire suite. Without clear visibility into which tests are problematic, she hesitates to run the full suite frequently, risking integration issues down the line. If she had better insights into test performance (such as coverage trends, test execution time, and failure patterns), she could contribute more effectively and avoid common pitfalls in maintaining a legacy system.

Use Cases

The use case diagram below highlights the features through which a software developer can interact with the extension. Developers can use the extension to view, improve, download, and customise test metric data.

Use Case Diagram


Use Case List and Descriptions

Each use case below lists its ID and name, actors, description, preconditions, basic flow, and postconditions.

UC1: Run all or selected tests
  • Actors: User
  • Description: The user can execute all tests or selectively run tests based on recent changes, ensuring efficient test execution.
  • Preconditions: The user has a test suite in their project.
  • Basic Flow:
    1. The user selects an option to run all tests or particular tests.
    2. The system executes the selected tests.
    3. The system updates relevant metrics (e.g., pass/fail rate, execution time) within the dashboard.
  • Postconditions: Test metrics are updated in the dashboard.

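As a rough illustration of how UC1 could be wired up, the TypeScript sketch below registers two VS Code commands that invoke pytest through a child process. The command identifiers, messages, and the choice to shell out to pytest directly are illustrative assumptions rather than the extension's actual implementation.

```typescript
import * as vscode from 'vscode';
import { execFile } from 'child_process';

export function activate(context: vscode.ExtensionContext) {
  // Hypothetical command IDs; UnitPylot's real identifiers may differ.
  context.subscriptions.push(
    vscode.commands.registerCommand('unitpylot.runAllTests', () => runPytest([])),
    vscode.commands.registerCommand('unitpylot.runCurrentFile', () => {
      const file = vscode.window.activeTextEditor?.document.fileName;
      if (file) { runPytest([file]); }
    })
  );
}

function runPytest(targets: string[]): void {
  const cwd = vscode.workspace.workspaceFolders?.[0]?.uri.fsPath;
  if (!cwd) { return; }
  // Run pytest quietly in the workspace root; a non-zero exit code signals failures.
  execFile('pytest', ['-q', ...targets], { cwd }, (error, stdout) => {
    // A fuller implementation would parse the output and refresh the dashboard metrics here.
    vscode.window.showInformationMessage(error ? 'Some tests failed.' : 'All tests passed.');
    console.log(stdout);
  });
}
```
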
UC2: View overall test metrics in the dashboard
  • Actors: User
  • Description: The user can view an overview of test suite performance (e.g., passing/failing tests, slowest tests, memory usage, coverage), highlighting critical metrics at a glance.
  • Preconditions: The user has run all the tests at least once, so metrics can be computed.
  • Basic Flow:
    1. The user opens the test metrics dashboard.
    2. The system computes and displays key test metrics, such as:
      • Number of passing and failing tests
      • Slowest tests
      • Most memory-intensive tests
      • Overall coverage percentage
    3. The user can click on specific metrics to view more detailed information.
  • Postconditions: The user has a clear overview of detailed test metrics.

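One lightweight way to realise the dashboard in UC2 is a webview panel that renders a precomputed metrics snapshot as HTML. The sketch below assumes the metrics have already been gathered from a pytest run; the view type, field names, and layout are placeholders, not the extension's actual design.

```typescript
import * as vscode from 'vscode';

// Assumed shape of a metrics snapshot computed from the most recent test run.
interface SuiteMetrics {
  passed: number;
  failed: number;
  coveragePercent: number;
  slowestTests: string[];
}

export function showDashboard(metrics: SuiteMetrics): void {
  const panel = vscode.window.createWebviewPanel(
    'unitpylotDashboard',        // hypothetical view type
    'UnitPylot Dashboard',
    vscode.ViewColumn.Beside,
    {}
  );
  // Render the key metrics; a richer dashboard would add styling and click-through details.
  panel.webview.html = `
    <html><body>
      <h2>Test Suite Overview</h2>
      <p>Passing: ${metrics.passed} | Failing: ${metrics.failed}</p>
      <p>Line coverage: ${metrics.coveragePercent.toFixed(1)}%</p>
      <h3>Slowest tests</h3>
      <ol>${metrics.slowestTests.map(t => `<li>${t}</li>`).join('')}</ol>
    </body></html>`;
}
```
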
UC3: View coverage annotations in the IDE
  • Actors: User
  • Description: The user can see untested or under-tested areas of the codebase with specific line or module annotations directly within the IDE itself.
  • Preconditions: The system can generate test coverage data.
  • Basic Flow:
    1. The user opens the IDE and has enabled code coverage visualisation in the extension's settings.
    2. The system highlights untested areas directly in the code editor.
  • Postconditions: Coverage insights are visible directly in the editor.

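UC3 maps naturally onto VS Code's editor decoration API. The sketch below highlights untested lines with a translucent background; it assumes the missing line numbers have already been extracted from a coverage report (for example, the "missing_lines" arrays in coverage.py's JSON output), which is an assumption about the data source rather than a description of our implementation.

```typescript
import * as vscode from 'vscode';

// Whole-line highlight for lines the coverage data reports as untested.
const uncoveredDecoration = vscode.window.createTextEditorDecorationType({
  backgroundColor: 'rgba(255, 0, 0, 0.2)',
  isWholeLine: true,
});

// `missingLines` uses 1-based line numbers, as produced by most coverage tools.
export function annotateUncoveredLines(editor: vscode.TextEditor, missingLines: number[]): void {
  const ranges = missingLines
    .filter(line => line >= 1 && line <= editor.document.lineCount)
    .map(line => editor.document.lineAt(line - 1).range);
  editor.setDecorations(uncoveredDecoration, ranges);
}
```
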
UC4: View all test metrics on a granular scale
  • Actors: User
  • Description: The user can view all test metrics on a granular scale, via a tree view of the test suite.
  • Preconditions: The system must support a tree view for displaying the test suite. The user must have run all tests at least once, so metrics can be computed.
  • Basic Flow:
    1. The user opens the test metrics dashboard.
    2. The system computes test metrics for any test cases with changes, and displays these in a tree view.
    3. The user can expand/collapse the tree view to view tests that are failing, memory-intensive, or slow, organised by file.
    4. The user can click on a test case, and the relevant file containing the code for that test case will open in the editor.
  • Postconditions: The test metrics are displayed on a granular level via a tree view, and are updated if changes are made in the test suite.

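The granular view in UC4 could be backed by a standard VS Code TreeDataProvider. The sketch below is a minimal version in which each node is a test file or an individual test case; the view ID and node structure are illustrative and would need to match whatever the extension declares in its package.json.

```typescript
import * as vscode from 'vscode';

// A node is either a test file (with children) or an individual test case (leaf).
class TestNode extends vscode.TreeItem {
  constructor(label: string, public readonly children: TestNode[] = []) {
    super(label, children.length
      ? vscode.TreeItemCollapsibleState.Collapsed
      : vscode.TreeItemCollapsibleState.None);
  }
}

class TestTreeProvider implements vscode.TreeDataProvider<TestNode> {
  private readonly changed = new vscode.EventEmitter<TestNode | undefined>();
  readonly onDidChangeTreeData = this.changed.event;

  constructor(private roots: TestNode[]) {}

  getTreeItem(node: TestNode): vscode.TreeItem { return node; }
  getChildren(node?: TestNode): TestNode[] { return node ? node.children : this.roots; }

  // Called after a test run so the tree reflects the latest metrics.
  refresh(roots: TestNode[]): void { this.roots = roots; this.changed.fire(undefined); }
}

// 'unitpylot.testTree' is a hypothetical view ID declared under "contributes.views".
export function registerTestTree(context: vscode.ExtensionContext, roots: TestNode[]): TestTreeProvider {
  const provider = new TestTreeProvider(roots);
  context.subscriptions.push(vscode.window.registerTreeDataProvider('unitpylot.testTree', provider));
  return provider;
}
```
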
UC5: Improve test metrics using LLM optimisation commands
  • Actors: User, LLM Model
  • Description: The user can improve test metrics using LLM optimisation commands to enhance test efficiency and resolve failing cases.
  • Preconditions: The user has access to the commands. The system can generate test optimisation recommendations.
  • Basic Flow:
    1. The system computes metrics for the test cases in the codebase.
    2. The system generates suggestions based on the selected command for that particular metric.
    3. The user can accept or reject the suggestion.
    4. The system applies the accepted suggestion.
  • Postconditions: The test suite is optimised based on user-accepted changes.

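For UC5, one plausible route to extending GitHub Copilot is VS Code's Language Model API (vscode.lm), which lets an extension send requests to Copilot-backed chat models. The sketch below asks a model to suggest a fix for a failing test; the prompt wording is a placeholder, and the returned suggestion would be presented to the user to accept or reject rather than applied automatically.

```typescript
import * as vscode from 'vscode';

// Ask a Copilot-backed model to propose a fix for a failing test.
// `failingTestSource` and `errorOutput` are assumed to come from a prior pytest run.
export async function suggestTestFix(failingTestSource: string, errorOutput: string): Promise<string> {
  const [model] = await vscode.lm.selectChatModels({ vendor: 'copilot' });
  if (!model) {
    throw new Error('No Copilot language model is available.');
  }

  const messages = [
    vscode.LanguageModelChatMessage.User(
      'Rewrite this failing pytest test so that it passes while keeping its intent:\n\n' +
      failingTestSource +
      '\n\nError output:\n' + errorOutput
    ),
  ];

  const response = await model.sendRequest(messages, {}, new vscode.CancellationTokenSource().token);

  // Stream the reply into a single string; the caller presents it as an accept/reject suggestion.
  let suggestion = '';
  for await (const chunk of response.text) {
    suggestion += chunk;
  }
  return suggestion;
}
```
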
UC6: View trends through a graphical display
  • Actors: User
  • Description: The user can see trends (e.g., coverage, failing tests) through graphical visualisations.
  • Preconditions: The system has collected the relevant data for trends (e.g., coverage, failing tests) and has processed it so that it is available for visualisation.
  • Basic Flow:
    1. The user opens the test metrics dashboard and clicks an option to view a specific trend.
    2. The system computes and displays trends over time for the chosen metric (i.e. coverage or failing tests).
    3. The user can view the trends in graphical form and interact with the display, such as hovering to get more details.
  • Postconditions: The user has successfully viewed trends through a graphical display.

UC7: Download and export test suite data
  • Actors: User
  • Description: The user can download/export test data in Markdown or JSON format for documentation.
  • Preconditions: Test metric data exists within the system.
  • Basic Flow:
    1. The user selects the export option from the dashboard.
    2. The system prompts the user to choose a format (Markdown or JSON) and select the location to save the file.
    3. The user confirms the export.
    4. The system generates and saves the exported file in the selected format.
  • Postconditions: Logs of the test metrics are successfully exported in the chosen format and saved in the specified location.

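UC7 can be covered with a save dialog plus the workspace filesystem API. In the sketch below, the metrics snapshot is serialised as JSON or as a simple Markdown bullet list depending on the chosen file extension; the snapshot shape and the Markdown layout are assumptions for illustration.

```typescript
import * as vscode from 'vscode';

// Export the current metrics snapshot as JSON or Markdown, chosen via the save dialog.
export async function exportMetrics(metrics: Record<string, unknown>): Promise<void> {
  const target = await vscode.window.showSaveDialog({
    saveLabel: 'Export test metrics',
    filters: { 'JSON': ['json'], 'Markdown': ['md'] },
  });
  if (!target) { return; } // the user cancelled the dialog

  const asMarkdown = target.fsPath.endsWith('.md');
  const content = asMarkdown
    ? '# UnitPylot Test Metrics\n\n' +
      Object.entries(metrics).map(([key, value]) => `- **${key}**: ${JSON.stringify(value)}`).join('\n')
    : JSON.stringify(metrics, null, 2);

  await vscode.workspace.fs.writeFile(target, Buffer.from(content, 'utf8'));
  vscode.window.showInformationMessage(`Test metrics exported to ${target.fsPath}`);
}
```
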
UC8: Customise metric display
  • Actors: User
  • Description: The user can configure how test metrics are displayed within the dashboard, such as specifying the number of slowest and memory-intensive tests to display, and whether results are updated each time a file is saved or after a set interval.
  • Preconditions: The settings page is accessible to the user.
  • Basic Flow:
    1. The user opens the settings page.
    2. The user customises the display of test metrics, for example specifying:
      • The number of slowest tests to display.
      • The number of memory-intensive tests to display.
      • Whether to update metrics on file save or after a set interval.
    3. The system applies and saves the preferences.
  • Postconditions: The user's display settings are successfully updated and applied.

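The preferences in UC8 would normally be surfaced through VS Code's standard settings mechanism. The sketch below reads three hypothetical keys (the section and key names would need to be declared under "contributes.configuration" in the extension's package.json) and re-applies them when the user edits their settings.

```typescript
import * as vscode from 'vscode';

// Read the display preferences; the 'unitpylot' section and key names are hypothetical.
export function getDisplaySettings() {
  const config = vscode.workspace.getConfiguration('unitpylot');
  return {
    slowestTestsShown: config.get<number>('dashboard.slowestTestsShown', 5),
    memoryTestsShown: config.get<number>('dashboard.memoryIntensiveTestsShown', 5),
    updateOnSave: config.get<boolean>('updateOnSave', true),
  };
}

// Re-apply preferences whenever the user changes any UnitPylot setting.
export function watchDisplaySettings(apply: () => void): vscode.Disposable {
  return vscode.workspace.onDidChangeConfiguration(event => {
    if (event.affectsConfiguration('unitpylot')) { apply(); }
  });
}
```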

MoSCoW Requirements

Functional Requirements

Key: each functional requirement below falls into one of the following feature categories:

  • Dashboard View
  • AI-Driven Test Optimisation
  • Other Features

Must Have:

  1. Compute and display the overall test pass/fail rates.
  2. Display overall line and branch coverage.
  3. Identify and list tests with the highest memory usage.
  4. Identify and list the slowest tests.
  5. Resolve issues with test case code coverage.
  6. Fix failing test cases.
  7. Optimise the slowest tests.
  8. Optimise tests with the highest memory usage.

Should Have:

  1. Display specific metrics per test case.
  2. Visualise trends over time through graphs.
  3. Provide insights into test interconnectedness and robustness.
  4. Continuously execute tests in the background.
  5. Enable users to accept or reject suggested code snippets.
  6. Store logs about the metrics.

Could Have:

  1. Suggest PyDoc documentation.
  2. Educate developers on best practices for testing.
  3. Include a settings page for customisation.
  4. Allow users to specify certain tests for execution.


Non-Functional Requirements

Must Have:

  • User-Friendly Interface: The extension must be intuitive to navigate and easy to use.
  • GitHub Copilot Integration: The extension must extend GitHub Copilot for AI-powered enhancements.
  • Performance: The extension should be highly responsive and efficient, ensuring minimal lag.
  • Availability: The extension must be publicly available on the VS Code Marketplace.
  • Compatibility: The extension must be compatible with Windows, macOS, and Linux to support all users.

Should Have:

  • Usability: The extension should include clear documentation and a user manual for guidance.
  • Extensibility: The architecture should support the addition of new features with minimal disruption.
  • Maintainability: The codebase should be well-documented, modular, and easy to update for long-term support.

Could Have:

  • Security: The extension could include data privacy measures, such as secure API calls.
  • Offline Mode: The extension could function with limited features when offline, ensuring usability without an internet connection.

Won't Have:

  • Custom IDE Support: The extension is exclusive to VS Code and will not be developed for other IDEs (e.g., PyCharm, IntelliJ).