Testing

Testing Background

Testing is the process of evaluating whether software meets its requirements and performs as expected. To ensure this, we conducted thorough testing of our project's components. Testing is especially important in this project because we have built custom data structures for features such as text prefix matching, and have integrated LLM interaction through a chatbot, which can introduce non-deterministic results.

To gain a holistic view of our project, we conducted many forms of testing, ranging from unit testing of individual, self-contained components to qualitative user acceptance testing. Finally, a major difficulty in this project was porting our local version of the program to Cisco's codebase. To ensure that our changes did not impact the core functionality of their log viewer and were compatible with their codebase, we ran comprehensive integration tests.

Frontend Core Functionality Testing: Filter Matching Function Testing

A crucial aspect of our frontend is its ability to accurately filter log entries. The checkMatch function, part of our log service module, encapsulates this core functionality. It determines whether a log object satisfies a set of filters by checking if any log property (converted to a string) meets the filter criteria. The filters can be plain text—evaluated in either a case-sensitive or case-insensitive manner—or regular expressions.

Test Environment Setup

  • Test Framework: We use Jest as our test runner, which supports JavaScript/TypeScript unit testing.
  • Fixtures and Sample Data: No external fixtures (like file or network dependencies) are required because the function operates purely on provided inputs. However, each test case defines sample log objects and filter arrays to simulate realistic scenarios.
  • Code Under Test: The function checkMatch is defined as follows:

/**
* Determines if any property value in the given log object matches at least one of the filters.
*
* Each filter is an object containing:
* - `regex` {boolean}: If true, the filter's `text` is treated as a regular expression.
* - `caseSensitive` {boolean}: If true, the matching is case-sensitive.
* - `text` {string}: The substring or pattern to match against the log's property values.
*
* If no filters are provided (i.e. the filters array is empty), the function returns true.
*
* @param {Object} log - An object whose values will be checked for matches.
* @param {Array<{regex: boolean, caseSensitive: boolean, text: string}>} filters - An array of filter criteria.
* @returns {boolean} True if any filter matches any value in the log, false otherwise.
*/
export const checkMatch = (log, filters) => {
  if (filters.length === 0) return true;
  return filters.some(({ regex, caseSensitive, text }) => {
    return Object.values(log).some((value) => {
      const strValue = String(value);
      if (regex) {
        const pattern = new RegExp(text, caseSensitive ? "" : "i");
        return pattern.test(strValue);
      }
      return caseSensitive
        ? strValue.includes(text)
        : strValue.toLowerCase().includes(text.toLowerCase());
    });
  });
};

Test Cases and Code Walkthrough

Below is the complete set of test cases for checkMatch, with explanations for each scenario:

import { checkMatch } from '../src/logService';

describe("checkMatch", () => {
  // Test when no filters are provided. The function should always return true.
  test("should return true when no filters are applied", () => {
      const log = { messages: ["Test log"], level: "info" };
      const filters = [];
      expect(checkMatch(log, filters)).toBe(true);
  });

  // Test a case-insensitive plain text match.
  test("should return true for a case-insensitive plain text match", () => {
      const log = { message: "Test log" };
      const filters = [{ text: "test", regex: false, caseSensitive: false }];
      expect(checkMatch(log, filters)).toBe(true);
  });

  // Test a case-sensitive plain text match.
  test("should return true for a case-sensitive plain text match", () => {
      const log = { message: "Test log" };
      const filters = [{ text: "Test", regex: false, caseSensitive: true }];
      expect(checkMatch(log, filters)).toBe(true);
  });

  // Test failure of a case-sensitive match when case differs.
  test("should return false when case-sensitive match fails due to case difference", () => {
      const log = { message: "Test log" };
      const filters = [{ text: "test", regex: false, caseSensitive: true }];
      expect(checkMatch(log, filters)).toBe(false);
  });

  // Test regex matching in a case-insensitive manner.
  test("should correctly handle regex filters (case-insensitive)", () => {
      const log = { message: "Test log" };
      const filters = [{ text: "^test", regex: true, caseSensitive: false }];
      expect(checkMatch(log, filters)).toBe(true);
  });

  // Test regex matching with a case-sensitive match.
  test("should correctly handle regex filters (case-sensitive match)", () => {
      const log = { message: "Test log" };
      const filters = [{ text: "^Test", regex: true, caseSensitive: true }];
      expect(checkMatch(log, filters)).toBe(true);
  });

  // Test failure of a regex match when case-sensitive setting is not met.
  test("should return false for regex filters when case-sensitive match fails", () => {
      const log = { message: "Test log" };
      const filters = [{ text: "^test", regex: true, caseSensitive: true }];
      expect(checkMatch(log, filters)).toBe(false);
  });

  // Test matching when one of multiple log values satisfies the filter.
  test("should match when one of multiple log values satisfies the filter", () => {
      const log = { message: "Test log", level: "warning" };
      const filters = [{ text: "warning", regex: false, caseSensitive: false }];
      expect(checkMatch(log, filters)).toBe(true);
  });

  // Test when none of the provided filters match any log value.
    test("should return false when none of multiple filters match any log value", () => {
      const log = { message: "Test log", level: "info" };
      const filters = [
          { text: "error", regex: false, caseSensitive: false },
          { text: "critical", regex: false, caseSensitive: false }
      ];
      expect(checkMatch(log, filters)).toBe(false);
  });

  // Test matching numeric log values by converting them to strings.
  test("should match numeric log values when converted to string", () => {
      const log = { id: 12345 };
      const filters = [{ text: "123", regex: false, caseSensitive: false }];
      expect(checkMatch(log, filters)).toBe(true);
  });

  // Test that the function handles undefined or null log values gracefully.
  test("should handle undefined or null log values gracefully", () => {
      const log = { message: null, level: undefined };
      const filters = [{ text: "null", regex: false, caseSensitive: false }];
      // Since String(null) results in "null", the filter should match.
      expect(checkMatch(log, filters)).toBe(true);
  });

  // Test matching using a more complex regex pattern.
  test("should correctly match a complex regex pattern", () => {
      const log = { message: "Testing log with special characters: *+?^$" };
      const filters = [{ text: "Testing log with", regex: true, caseSensitive: false }];
      expect(checkMatch(log, filters)).toBe(true);
  });

  // Test that an empty log object returns false when filters are provided.
  test("should return false when the log object is empty and a filter is provided", () => {
      const log = {};
      const filters = [{ text: "anything", regex: false, caseSensitive: false }];
      expect(checkMatch(log, filters)).toBe(false);
  });
});

Explanation of the Tests

  • No Filters: When the filters array is empty, the function should immediately return true since there are no constraints to check against.
  • Plain Text Matching: Tests validate both case-sensitive and case-insensitive plain text matching by comparing the text from the log with the filter text.
  • Regex Matching: We test both case-insensitive and case-sensitive regular expression scenarios. The tests ensure that the regex is compiled with the correct flags and matches (or fails) as expected.
  • Multiple Log Values: The function examines all values within a log object. If any value satisfies the filter condition, the function returns true.
  • Handling Non-string Values: Since log values may be numeric or even null/undefined, the function converts all values to strings. Tests ensure that these conversions allow the filter to match properly (e.g., matching "123" in a numeric value); see the short snippet after this list.
  • Complex Regex Patterns and Empty Logs: These tests cover edge cases where the filter contains special regex characters and where the log object is empty, ensuring robustness in all scenarios.
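For reference, the conversions relied on by the last two cases are plain JavaScript String() coercion:

// Standard coercion used inside checkMatch (illustrative)
String(12345).includes("123"); // true: numeric values can be matched as text
String(null);                  // "null", so a filter with text "null" matches a null value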

Execution and Results

  • Execution: The tests are executed using Jest in our continuous integration (CI) pipeline. Each test case runs independently, and results are reported in the standard Jest output format.
  • Results: All tests passed, confirming that:
    • The checkMatch function correctly handles both plain text and regex-based filtering.
    • It behaves as expected when dealing with multiple values, non-string data types, and edge cases.

Analysis and Conclusion

The comprehensive suite of unit tests for checkMatch demonstrates that our log filtering functionality is robust and versatile. By verifying behavior across a range of inputs—including edge cases—we ensure that the function can be confidently integrated into the larger log viewer system. This rigorous testing approach provides a solid foundation for future enhancements and maintains the overall quality of our log service.

AI Agent (Backend) Testing

To ensure that our AI agent behaves correctly as we extend its functionality, we developed a comprehensive suite of integration tests using pytest. These tests simulate real-world scenarios and validate the behavior of the agent’s core actions, such as summary decision-making, summary generation, known issue evaluation, and filter decision and generation.

Our test environment is designed to closely mimic production conditions while providing controlled inputs. The environment is configured as follows:

  • Environment Variables: The tests require an API key (e.g., OPENAI_API_KEY) to instantiate the model client. If the API key is not set, the tests are skipped to avoid false failures.
  • Fixtures: We use pytest fixtures to set up reusable components (the specific configurations are shown in the code below):
    • chat_agent Fixture: Initializes the ChatAgent using an instance of OpenAIModelClient along with a base prompt. This simulates our agent’s production configuration.
    • logs Fixture: Loads log data from a JSON file using the load_logs() utility. If no logs are found, the tests are skipped.
    • issue_name and issue_details Fixtures: Provide realistic issue metadata, including error descriptions, context, keywords, conditions, and resolutions. The issue_details fixture further extracts a subset of logs based on defined keywords using the extract_top_rows() utility.

Below is an excerpt of the test environment code with fixtures:

import os

import pytest

# ChatAgent, OpenAIModelClient, load_logs and extract_top_rows are imported from the
# project's agent and log-utility modules (import paths omitted in this excerpt).

TEST_MODEL = "gpt-4o"  # stick to this model for consistency

# Fixture for ChatAgent setup
@pytest.fixture
def chat_agent():
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        pytest.skip("OPENAI_API_KEY not set; skipping integration tests.")
    model = OpenAIModelClient(api_key=api_key, model=TEST_MODEL)
    base_prompt = (
        "You are a helpful assistant integrated with a log viewer tool for Cisco engineers. "
        "Your role is to help analyze and filter large log files quickly. "
        "Do not include any additional text."
    )
    return ChatAgent(model=model, base_prompt=base_prompt)

# Fixture for loading logs from a JSON file
@pytest.fixture
def logs():
    logs = load_logs()
    if not logs:
        pytest.skip("No logs found; skipping integration tests.")
    return logs

# Fixture for issue name
@pytest.fixture
def issue_name():
    return "Missing Media Track Error"

# Fixture for issue details and extracting relevant log rows
@pytest.fixture
def issue_details():
    details = {
        "description": "A media track could not be retrieved by the Media Track Manager, resulting in a 'No Track!' error. This may indicate a failure in creating or negotiating the required media track for a video call.",
        "context": "This error is logged when the system attempts to retrieve a video track (vid=1) during a media session and finds that no track exists. This might be due to signaling failures, media engine initialization issues, or network problems that prevented proper track creation.",
        "keywords": {
            "media": ["CMediaTrackMgr::GetTrack", "No Track!"],
            "video": ["vid=1"],
        },
        "conditions": "This error typically occurs during call setup or renegotiation and may be accompanied by other signaling errors or warnings in the logs.",
        "resolution": "Investigate preceding log entries for errors in media negotiation or track creation. Ensure that the media engine is properly initialized and that network conditions support the required media streams. Verify configuration settings for media track management.",
    }
    logs = load_logs()
    details["logs"] = extract_top_rows(logs, details["keywords"])
    return details

Test Cases and Code Walkthrough

Each test case invokes a specific asynchronous method of the ChatAgent and asserts that the returned output meets expected criteria. Below are examples of the tests along with explanations.

  • Summary Decision Tests: These tests check whether the agent correctly decides to generate a summary based on the user query.
    @pytest.mark.asyncio
    async def test_decide_summary_integration_true(chat_agent: ChatAgent, logs: list[dict[str, Any]]):
      message = "Can you generate a summary for my logs?"
      generate_summary, explanation = await chat_agent.decide_summary(message, logs)
      assert generate_summary is True
      assert explanation is not None and explanation != ""
    
    @pytest.mark.asyncio
    async def test_decide_summary_integration_false(chat_agent: ChatAgent, logs: list[dict[str, Any]]):
      message = "Filter for debug logs."
      generate_summary, explanation = await chat_agent.decide_summary(message, logs)
      assert generate_summary is False
      assert explanation is not None and explanation != ""

    Explanation:

    • Test 1: Verifies that a query requesting a summary returns True along with a non-empty explanation.
    • Test 2: Ensures that a query focused on filtering (e.g., "Filter for debug logs") returns False.

  • Summary Generation Test: Validates that a natural language summary and accompanying statistics are generated correctly.
    @pytest.mark.asyncio
    async def test_generate_summary_integration(chat_agent: ChatAgent, logs: list[dict[str, Any]]):
      message = "Can you generate a summary for my logs?"
      summary, stats = await chat_agent.generate_summary(message, logs)
      most_common_keywords = list(stats["Most Common Keywords"])
      assert summary is not None
      assert stats["Debug"] == 465
      assert stats["Info"] == 3415
      assert stats["Warn"] == 81
      assert stats["Error"] == 39
      assert most_common_keywords == ["[]WME:0", "=", "[cid=3691033875]", "::[UTIL]", "::[AudioEngine]"]
    

    Explanation: This test checks both the generated summary and the computed log statistics to ensure they align with our expected values.

  • Known Issue Evaluation Test: Ensures the agent correctly decides whether to evaluate known issues and returns an appropriate issue summary.
    @pytest.mark.asyncio
    async def test_evaluate_decision_integration_true(chat_agent: ChatAgent):
      message = "Can you look for potential issues in my logs?"
      should_check, explanation = await chat_agent.evaluate_decision(message)
      assert should_check is True
      assert explanation is not None and explanation != ""
    
    @pytest.mark.asyncio
    async def test_evaluate_decision_integration_false(chat_agent: ChatAgent):
      message = "Can you filter for debug logs?"
      should_check, explanation = await chat_agent.evaluate_decision(message)
      assert should_check is False
      assert explanation is not None

    Explanation: These tests verify that the evaluation decision method returns the correct Boolean flag based on the query's intent.

    @pytest.mark.asyncio
    async def test_evaluate_issue_integration(chat_agent: ChatAgent, logs: list[dict[str, Any]], issue_name: str, issue_details: dict[str, Any]):
      message = "Can you look for potential issues in my logs?"
      issue_text = (await chat_agent.evaluate_issue(issue_name, issue_details, message, [])).strip()
      assert issue_text is not None and issue_text != ""
      assert "Issue Summary" in issue_text
      assert "Resolution" in issue_text
                

    Explanation: This test checks that when evaluating a specific issue, the returned text includes both an "Issue Summary" and a "Resolution" section.

  • Filter Decision and Generation Tests: Verify that the agent can decide to add a filter based on detected issues and that it generates a correctly structured filter group.
    @pytest.mark.asyncio
    async def test_filter_decision_integration_true(chat_agent: ChatAgent, issue_name: str, issue_details: dict[str, Any]):
      message = "Can you filter for debug logs?"
      should_filter, explanation = await chat_agent.decide_filter(message, {issue_name: issue_details})
      assert should_filter is True
      assert explanation is not None and explanation != ""
    
    @pytest.mark.asyncio
    async def test_filter_decision_integration_false(chat_agent: ChatAgent, issue_name: str, issue_details: dict[str, Any]):
      message = "Can you look for potential issues in my logs?"
      should_filter, explanation = await chat_agent.decide_filter(message, {})
      assert should_filter is False
      assert explanation is not None and explanation != ""
    
    @pytest.mark.asyncio
    async def test_generate_filter_integration(chat_agent: ChatAgent, issue_name: str, issue_details: dict[str, Any]):
      message = "Can you filter for potential issues in my logs?"
      filter_group = await chat_agent.generate_filter_group(message, {issue_name: issue_details})
      assert "title" in filter_group
      assert "description" in filter_group
      assert "filters" in filter_group
      assert isinstance(filter_group["filters"], list)
      assert len(filter_group["filters"]) > 0
      assert all("text" in f and "regex" in f and "caseSensitive" in f and "color" in f for f in filter_group["filters"])
      assert all("description" in f for f in filter_group["filters"])
      assert filter_group["title"] is not None
      assert filter_group["description"] is not None
      assert filter_group["filters"] is not None
      # Ensure that filters include keywords defined in the issue details
      assert any(any(kw in f["text"] for kw in issue_details["keywords"]["media"]) for f in filter_group["filters"])
      assert any(any(kw in f["text"] for kw in issue_details["keywords"]["video"]) for f in filter_group["filters"])
                

Explanation: These tests confirm that the filter decision method returns the appropriate decision based on the presence of known issues and that the generated filter group contains all required fields, including filters that capture keywords from the issue details.

Test Execution and Results

  • Execution: The tests are executed using pytest in a continuous integration (CI) environment. Each test is run asynchronously to simulate real-world usage of the AI agent. If any fixture (like the API key or logs) is missing, tests are automatically skipped to prevent false negatives.
  • Results: All test cases have been passing consistently:
    • The summary decision functions return expected Boolean values along with appropriate explanations.
    • The summary generation test produces both a valid summary and accurate log statistics.
    • The known issue evaluation tests return structured responses containing the necessary sections.
    • The filter decision and filter group tests validate that all required fields are present and correctly populated.

Analysis and Conclusion

Our testing strategy covers a broad range of scenarios that reflect actual usage by Cisco engineers. The integration tests confirm that:

  • Each asynchronous action in the AI agent functions correctly.
  • The core matching mechanism (keyword/regex) and the semantic augmentation (when available) work together as intended.
  • Responses are returned asynchronously, so the front-end receives updates promptly, contributing to a responsive and reliable user experience.

The consistent passing of these tests gives us confidence in the AI agent’s stability and its ability to support future enhancements without introducing regressions.

Custom Data Structure Testing

To ensure that the custom Prefix Trie and Min Priority Queue we built work properly, we created detailed unit tests. These tests focus primarily on edge and error cases, to make sure that the data structures do not break any of the vital components of the program.

Test Environment Setup

  • Test Framework: We use Jest as our test runner, which supports JavaScript/TypeScript unit testing.
  • Test Config: Jest is configured to use the Node.js test environment. Instead of simulating a browser environment, the tests run as if they are executing in a Node.js context, which is useful for testing server-side or Node-specific code (see the example configuration after this list).
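The snippet below is a minimal illustrative configuration; the file name and any additional options are assumptions rather than a copy of the project's actual config:

// jest.config.js (illustrative sketch)
module.exports = {
  testEnvironment: "node",
};
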
Test Cases and Code Walkthrough

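Both suites exercise the Trie and PriorityQueue through a small public surface, summarised in the sketch below. These declarations are our reading of the test code, not the actual implementation:

// Assumed shapes exercised by the tests (illustrative, TypeScript-style)
interface Pair {
  word: string;  // the stored search term
  freq: number;  // how many times the term has been recorded
}

interface Trie {
  insertWord(word: string): void;             // insert a word, or bump its frequency if already present
  collect(prefix: string, k: number): Pair[]; // up to k highest-frequency matches for the prefix
}

interface PriorityQueue {
  n: number;                 // count of valid elements currently held
  insert(pair: Pair): void;  // evicts the lowest-frequency pair once capacity is exceeded
  getTopResults(): Pair[];   // the pairs currently retained
}

Note that the tests construct pairs with new Pair(word, freq), so in the implementation Pair is a class rather than a plain interface.
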
Trie Test Cases

  • Standard Use Test Case: This is the baseline (non-edge) case. It adds four words to the Trie and ensures that, for a given prefix, only the words with that prefix are returned.
    test('Should insert words and collect them based on prefix', () => {
        // Insert several words
        t.insertWord("hello");
        t.insertWord("hell");
        t.insertWord("heaven");
        t.insertWord("goodbye");
    
        // Test collecting words that start with "he"
        const results = t.collect("he", 10);
        const words = results.map(pair => pair.word);
    
        // We expect the words "hello", "hell", and "heaven" to appear, but not "goodbye"
        expect(words).toEqual(expect.arrayContaining(["hello", "hell", "heaven"]));
        expect(words).not.toEqual(expect.arrayContaining(["goodbye"]));
    });
                    
  • Frequency Test: The Trie is supposed to keep track of the number of times a particular word is searched for. This test ensures that the frequency count of a node is incremented properly.
    test('Should increase frequency on duplicate word insertion', () => {
        t.insertWord("hello");
        t.insertWord("hello");
        
        // Collecting "hello" should show a frequency of 2.
        const results = t.collect("hello", 5);
        // We assume that the Pair structure is preserved from the Trie
        expect(results.length).toBe(1);
        expect(results[0].freq).toBe(2);
    });
                    
  • Non-Existent Prefix Test: If the prefix the user is searching for does not exist in the Trie, collect should return an empty array with no search hits.
    test('Should return an empty array for a non-existent prefix', () => {
        t.insertWord("hello");
        t.insertWord("world");
        
        const results = t.collect("abc", 5);
        expect(results).toEqual([]);
    });
                    
  • Trie Search Filtering: The prefix matching engine should only return the top k most frequently searched words. This test verifies this behaviour works as expected.
    
    test('Should remove the least frequently searched word from trie if number of search hits exceeds number of searches', () => {
        t.insertWord("hello");
        t.insertWord("hello");
        t.insertWord("hell");
        t.insertWord("hell");
        t.insertWord("hell");
        t.insertWord("hellochickenkatsu");
        t.insertWord("he");
        t.insertWord("he"); 
    
        // hellochickenkatsu should not be in the search result list
        const results = t.collect("h", 3);
        const words = results.map(pair => pair.word);
        expect(results.length).toBe(3);
        expect(words).not.toContain("hellochickenkatsu");
    });
                        

Priority Queue Test Cases

Before each test case, a new PriorityQueue is initialised with its maximum size preset to 3 elements; a sketch of the assumed setup is shown below.
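The following is an illustrative version of that shared setup, assuming the constructor takes the maximum size directly (the actual beforeEach block may differ):

// Illustrative setup shared by the PriorityQueue tests
// PriorityQueue and Pair are imported from the data-structures module (path assumed)
const maxSize = 3;
let pq;

beforeEach(() => {
  pq = new PriorityQueue(maxSize);
});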

  • Empty PQ Test: When the PriorityQueue is instantiated, it should start empty. This test ensures that the internal counter is zero and that retrieving the top results returns an empty array.
    test("initializes with an empty queue (n = 0)", () => {
        expect(pq.n).toBe(0);
        // The heap array is pre-filled with dummy Pair objects,
        // but only elements from index 1 to n are considered valid.
        expect(pq.getTopResults()).toEqual([]);
    });
                        
  • Insertion Test: The PriorityQueue should correctly insert elements until it reaches its maximum capacity. This test verifies that these elements are stored and can be retrieved from the queue.
    test("inserts elements without exceeding maxSize", () => {
        const pair1 = new Pair("apple", 5);
        const pair2 = new Pair("banana", 3);
        const pair3 = new Pair("cherry", 8);
    
        pq.insert(pair1);
        pq.insert(pair2);
        pq.insert(pair3);
    
        expect(pq.n).toBe(maxSize);
        const results = pq.getTopResults();
        // All inserted pairs should be in the queue
        expect(results).toContainEqual(pair1);
        expect(results).toContainEqual(pair2);
        expect(results).toContainEqual(pair3);
    });
                        
  • Capacity Overflow Test: When the PriorityQueue is full and a new element is inserted, the queue must remove the element with the smallest frequency to maintain the maximum capacity.
    test("inserts extra element and removes the smallest frequency pair", () => {
        const pair1 = new Pair("apple", 5);
        const pair2 = new Pair("banana", 3);
        const pair3 = new Pair("cherry", 8);
    
        // Fill the priority queue to maxSize
        [pair1, pair2, pair3].forEach(pair => pq.insert(pair));
        expect(pq.n).toBe(maxSize);
    
        // Now, insert a new pair with frequency 7
        const newPair = new Pair("date", 7);
        pq.insert(newPair);
    
        // Because capacity is 3, one element should be removed.
        // The element with the smallest frequency is "banana" (freq: 3).
        expect(pq.n).toBe(maxSize);
        const results = pq.getTopResults();
    
        // The remaining pairs should be "apple" (5), "cherry" (8) and "date" (7)
        expect(results).toContainEqual(pair1);
        expect(results).toContainEqual(pair3);
        expect(results).toContainEqual(newPair);
    
        // Ensure that "banana" was removed.
        const bananaPresent = results.some(
            pair => pair.word === "banana" && pair.freq === 3
        );
        expect(bananaPresent).toBe(false);
    });
                        
  • Low-Frequency Element Rejection Test: This test ensures that if an element with a frequency lower than those already in the full PriorityQueue is inserted, it is rejected and does not affect the current top results.
    test("inserts element with lower frequency than current minimum and gets removed", () => {
        // Insert higher frequency pairs
        pq.insert(new Pair("a", 50));
        pq.insert(new Pair("b", 60));
        pq.insert(new Pair("c", 70));
        expect(pq.n).toBe(maxSize);
    
        // Insert a pair with a much lower frequency
        const lowPair = new Pair("d", 10);
        pq.insert(lowPair);
    
        // Since lowPair is the smallest, it should be removed immediately.
        expect(pq.n).toBe(maxSize);
        const results = pq.getTopResults();
        const words = results.map(pair => pair.word);
        expect(words).not.toContain("d");
    });
                        

Test Execution and Results

Results: All test cases for both the Trie and PriorityQueue have been consistently passing. Notable outcomes include:

  • Trie Insertion and Retrieval: Verifies that inserting words and collecting them by prefix returns the correct subset of entries.
  • Frequency Handling: Confirms that duplicate word insertions correctly increase frequency counters, which are crucial for ranking and search suggestions.
  • Prefix Filtering: Demonstrates that only the most frequently searched terms (up to a specified limit) are retained, removing less relevant words as needed.
  • PriorityQueue Capacity Management: Ensures that when the queue reaches its maximum size, the item with the smallest frequency is removed to maintain the correct capacity.
  • Heap Ordering: Validates that the PriorityQueue preserves internal ordering based on frequency, removing and retaining elements appropriately even when inserted out of order.

Analysis and Conclusion

Overall, the unit tests demonstrate that the Trie and PriorityQueue:

  • Correctly handle word insertions and prefix-based queries.
  • Robustly manage duplicates through frequency counting.
  • Properly filter and prioritize search results, ensuring high-frequency words are retained.

These outcomes provide confidence in the reliability and efficiency of the data structures' design, supporting their intended use in autocomplete and search suggestions.