Evaluation

This section details what went well in this project and what was done to a good standard in the final system, as well as areas that could be improved if the project were repeated. For recommendations about what could be done with the project in the future, see the following related sections:
The Future
Further Development
Collaboration Opportunities

Depth Sensing Surgical System Project
This subsection evaluates the project as a whole; the subsections that follow it evaluate the individual parts of the system.

One of the most fundamental principles the team adhered to during this project was that it is a starting point - a proof of concept - and not something that will be deployed in its current state. It was therefore imperative that the team make the system future-proof, so that all of the concepts used in the project now remain applicable and usable in future iterations of the system.

To this end, every design decision and every aspect of the system was made with future developers in mind. The architecture follows the strategy design pattern and is heavily commented and explained, so that future developers will easily be able to add to the system. Furthermore, the team produced designs for features that were never going to be implemented in this iteration of development, such as haptic and audio feedback; because those designs are included with the project, future developers will know exactly how to build them.

The user interface design PDF (UI design) shows designs for a haptic headset that would both take voice commands and vibrate to give the surgeon haptic feedback. Tools could also be made to stiffen when they touch the body tissue. For audio feedback, the system would emit a series of increasingly frequent beeps as the tool approached the background body tissue - the closer the tool, the more frequent the beeps, much like a parking sensor. This would allow colour-blind users to use the system, as they could rely on the quickening beeps to tell when a tool is approaching the body tissue rather than looking at the colour of the tools, which they would not be able to discern.
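
As an illustration of the audio feedback design, the sketch below (hypothetical; not part of the delivered system) maps the tool-to-tissue distance to the pause between beeps, so that beeping becomes more frequent as the tool gets closer. The distance range, pitch and intervals are assumptions chosen purely for demonstration.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of the parking-sensor-style audio feedback described above.
// The distance range, beep pitch and intervals are illustrative assumptions.
public static class AudioProximityFeedback
{
    // Map the tool-to-tissue distance (millimetres) to the pause between beeps:
    // the closer the tool, the shorter the pause, hence the more frequent the beeps.
    public static int BeepIntervalMs(double distanceMm)
    {
        const double maxDistanceMm = 300.0;  // beyond this distance, stay silent
        const int minIntervalMs = 100;       // rapid beeping when almost touching
        const int maxIntervalMs = 1000;      // slow beeping at the edge of the range

        if (distanceMm >= maxDistanceMm) return -1;  // -1 means "do not beep"
        double fraction = Math.Max(distanceMm, 0.0) / maxDistanceMm;
        return (int)(minIntervalMs + fraction * (maxIntervalMs - minIntervalMs));
    }

    // Runs on its own thread; readDistanceMm is a stand-in for however the
    // system measures the distance between the tool and the background tissue.
    public static void BeepLoop(Func<double> readDistanceMm, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            int interval = BeepIntervalMs(readDistanceMm());
            if (interval < 0) { Thread.Sleep(200); continue; }
            Console.Beep(880, 50);   // short, fixed-pitch beep
            Thread.Sleep(interval);  // shorter pause = more frequent beeps
        }
    }
}
```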

The fact that the team has already thought these designs out means that it will be easier for future developers to implement them than to design them from scratch. The same principle applies to the technical aspects of the project. One of the most important parts of the final system will be its ability to identify tools in the camera feed. The current system identifies tools by their bright green colour, but a system actually used in surgery would have to identify the tools by other means, without physical markers on them. To make this task easier for future developers, the team has talked with others such as the Surgical Robot Vision Group (http://www.surgicalvision.cs.ucl.ac.uk) about how this could be done, so that this information is available to future developers; more detail is given in the Green Tool Identification section of this document.

Furthermore, the depth-based augmentations that have been created solve the problems the team was presented with at the start of this project. The surgeon wanted to get a sense of the general 3D environment in which he was working, which we have provided with our ColoredDepth and ColoredToolDepth augmentations. He wanted to be able to tell how close his tools were to the background body tissue, which he can do with our ToolProximity augmentation. Finally, our clients wanted all of this to be as simple and intuitive as possible, which the team has achieved by keeping the UI minimalistic, so there are not lots of different buttons and options to confuse users.
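
As a rough illustration of the kind of depth-to-colour mapping these augmentations rely on, the following sketch blends linearly from red (near) to blue (far) across an assumed working range; the class name, range and colour scheme are illustrative rather than taken from the project code.

```csharp
// Illustrative sketch of the kind of depth-to-colour mapping these augmentations
// perform; the class name, working range and colour scheme are assumptions.
public static class DepthColouring
{
    // Blend linearly from red (near) to blue (far) across the working range.
    public static (byte R, byte G, byte B) ColourForDepth(ushort depthMm,
                                                          ushort nearMm = 500,
                                                          ushort farMm = 1500)
    {
        ushort clamped = depthMm < nearMm ? nearMm : (depthMm > farMm ? farMm : depthMm);
        double t = (double)(clamped - nearMm) / (farMm - nearMm);
        byte red = (byte)((1.0 - t) * 255);  // near pixels tend towards red
        byte blue = (byte)(t * 255);         // far pixels tend towards blue
        return (red, 0, blue);
    }
}
```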

In addition, the system is multithreaded, which is a critical feature. The visual augmentation algorithms run in a separate thread to the UI, so if the augmentations run slowly the rest of the system is unaffected. This also promotes patient safety: if the augmentation froze while the surgeon was operating, the normal camera view, which runs in a separate thread, would still provide a real-time image of the surgical area, so the surgeon would never lose sight of what he is doing even if the augmentations run slowly.
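
The following is a simplified sketch of this arrangement, assuming a WPF dispatcher for the UI thread; grabLatestFrame, augment and showAugmented are stand-ins for the project's own frame source, augmentation strategy and display code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Threading;

// Simplified sketch of running the augmentation pipeline off the UI thread, assuming
// a WPF Dispatcher; grabLatestFrame, augment and showAugmented stand in for the
// project's own frame source, augmentation strategy and display code.
public class AugmentationWorker
{
    private readonly Dispatcher uiDispatcher;
    private readonly CancellationTokenSource cts = new CancellationTokenSource();

    public AugmentationWorker(Dispatcher uiDispatcher)
    {
        this.uiDispatcher = uiDispatcher;
    }

    public void Start(Func<byte[]> grabLatestFrame,
                      Func<byte[], byte[]> augment,
                      Action<byte[]> showAugmented)
    {
        // The augmentation loop runs on a thread-pool thread, so a slow augmenter
        // never blocks the UI thread that renders the normal camera feed.
        Task.Run(() =>
        {
            while (!cts.IsCancellationRequested)
            {
                byte[] frame = grabLatestFrame();
                if (frame == null) continue;
                byte[] augmented = augment(frame);  // the potentially slow work
                uiDispatcher.BeginInvoke((Action)(() => showAugmented(augmented)));
            }
        }, cts.Token);
    }

    public void Stop() => cts.Cancel();
}
```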

However, there was one particular area where the system could have been improved: it could have been built in C++. The team used C# to build the system because C# is extremely easy to understand and build with, so the team could start building features quickly. This does, however, mean that the augmentations can run slowly at times, especially when processing lots of tools moving quickly at the same time, because C# is not built for raw performance. This does not affect the normal feed of the system, because it runs in a separate thread to the augmentation.

C++ is typically used to build performance-critical systems. Had it been used here, the system would have been much faster, and with sensible algorithms the team would likely never have run into issues with augmentations running slowly.

On balance, however, it is a good thing that the team did not use C++. Had C++ been used, the team would have spent months learning it before any part of the system could have been built. Because the team used C#, which was easy to pick up given the team's familiarity with Java, prototypes and experiments with the Kinect could be built extremely quickly. Without this ability to prototype quickly, development would not have reached the level that it has. In the future, developers could look at reimplementing the system in C++ to make it faster, but that would have been unnecessary at this stage of research and development.

System Architecture and the Strategy Pattern
The system architecture was based on the strategy design pattern, which meant that the behaviour of the system could be changed at runtime. This was essential because there are multiple augmentations that can be applied to the feed according to the options the user has selected, each of which is a separate strategy. The architecture was well designed and highly suited to the system, and it also makes future development much easier: to add to the system, all future developers need to do is add a new strategy to the relevant file, and that strategy can then be selected and displayed by the system; developers will barely need to touch the MainWindow file (which handles issues such as displaying the normal and augmented images).
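
For illustration, a minimal sketch of this arrangement is shown below, using hypothetical names; the project's real interfaces and signatures may differ.

```csharp
// Hypothetical names illustrating the strategy arrangement described above;
// the real project's interfaces may differ in naming and signature.
public interface IVisualAugmenter
{
    // Takes the raw colour frame plus per-pixel depth data and returns an augmented frame.
    byte[] Augment(byte[] colourPixels, ushort[] depthData);
}

public class ColoredDepthAugmenter : IVisualAugmenter
{
    public byte[] Augment(byte[] colourPixels, ushort[] depthData)
    {
        // ... colour every pixel according to its depth ...
        return colourPixels;
    }
}

public class ToolProximityAugmenter : IVisualAugmenter
{
    public byte[] Augment(byte[] colourPixels, ushort[] depthData)
    {
        // ... recolour only the tool pixels according to their distance from the tissue ...
        return colourPixels;
    }
}

// Inside MainWindow, the active strategy is simply a field that can be swapped at
// runtime, for example in response to a button click or voice command:
//     currentAugmenter = new ToolProximityAugmenter();
//     augmentedFrame = currentAugmenter.Augment(colourPixels, depthData);
```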

However, it could be argued that the strategy pattern was overused. Our system had three separate folders for different families of strategies - ToolIdentifiers, VisualAugmenters and Controllers - and it is possible that making a family of strategies called Controllers was excessive. The idea behind these was that if developers wanted to add a new way of controlling the system, all they would have to do is implement the Controller interface and communicate with the MainWindow file. However, there are not that many different ways to control the system - currently it has voice control and click control from the user interface, but it could also have a feature such as control from a mobile application.

It may have been better simply to implement the different types of control directly in the MainWindow file rather than encapsulating them in strategies, as the extra layer of indirection potentially makes things more confusing for future developers than it needs to be.

The Kinect Sensor
The team used the Kinect 2.0 sensor in building this project, as it has good depth sensing ability. However, the Kinect also posed one of the biggest challenges of the entire project.

The Kinect is far larger than any camera that would actually be used in minimally invasive surgery, so the team had to scale up its test scenarios dramatically, making them very large in order to fill the Kinect's entire field of view. That made constructing test scenarios difficult, because a lot of space was needed, and sourcing props was even more difficult. In the camera feed of actual minimally invasive surgery, the tools take up a fairly large percentage of the image; with the Kinect, even with scaled-up tools, the tools only took up a tiny percentage of the image. That may prove not to be a problem at all if the system were implemented at the surgical scale, but it is still a discrepancy between the view from an endoscope and the view provided by our system.

Furthermore, the Kinect's documentation states that its depth sensing starts working at 50 centimetres; at anything closer, depth cannot be measured accurately. In the team's testing, however, the practical minimum distance was more like 1 metre, because at anything less than this a “doubling” effect was observed, where there appeared to be two of everything in the camera feed.

In conclusion, the Kinect sensor was a great tool with which to build this proof of concept, because it is so easy to use, especially with C#. However, once the team got to the point of developing test scenarios and realised how big they had to be, it became clear that a more accurate, smaller-scale depth sensor was needed - after all, the Kinect was designed for tracking whole people rather than anything at a small scale.

In the future, the team recommends looking at other depth sensors such as the Intel® RealSense™, which may offer greater accuracy and a closer minimum range than the Kinect. The Intel® RealSense™ also comes with an SDK.

ToolIdentification Strategies
The team created a folder and interface for a family of strategies called ToolIdentifiers. These take the raw image provided by the Kinect, analyse it, and return either an integer array whose values are the indices of the pixels in the main colour image array that belong to the tool, or a boolean array of the same size as the image array, in which a value of “true” at a given index means that the corresponding pixel belongs to the tool.

This would be useful because the array returned by these strategies could then be passed to the visual augmenter strategies, which could use this information about which pixels comprise the tool to augment the view of the tool.
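
A minimal sketch of the intended contract, using hypothetical names, might look like this:

```csharp
// Sketch of the intended ToolIdentifier contract, using hypothetical names;
// the real interface in the project may differ.
public interface IToolIdentifier
{
    // Returns a mask the same size as the colour image: mask[i] is true
    // when pixel i belongs to a surgical tool.
    bool[] IdentifyToolPixels(byte[] colourPixels, ushort[] depthData);
}

// A visual augmenter could then restrict its work to the tool pixels, for example:
//     bool[] mask = toolIdentifier.IdentifyToolPixels(colourPixels, depthData);
//     for (int i = 0; i < mask.Length; i++)
//         if (mask[i]) RecolourPixel(colourPixels, i, depthData[i]);  // RecolourPixel is hypothetical
```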

However, the team did not actually implement any of these strategies. This was because the visual augmentations were created separately from the final system during the research phase of the project, and they already had identification of tools by their fluorescent green colour built into them, rather than tool identification being handled separately from visual augmentation. As a result, no tool identification strategies were added to the system.

It could therefore be argued that the folder of ToolIdentifier strategies is useless, because nothing was added to it and the tool identification algorithms were incorporated into the visual augmentation strategies. However, this is not the case. Tool identification was only folded into the visual augmentation algorithms because the team was using fluorescent green-coated tools, which could be identified trivially.

In the future, especially if this system is actually deployed, tools will need to be identified by other means, because it is unreasonable to expect surgical tool makers to paint their tools fluorescent green, even though that would make the whole system much easier to implement. When tools are identified by other means, such as machine learning, an entirely separate ToolIdentifier class will be needed, because embedding machine learning code inside the visual augmentation code would be extremely cumbersome. It is much simpler if a tool identification algorithm first identifies the tool in the picture and then passes the identified pixels to the visual augmentation algorithm to work with.

So in short, although the team did not implement any ToolIdentifier strategies itself, it has left the interface and file in our project for future developers to benefit from.

Green Tool Identification
Before applying any augmentations to images received from the Kinect sensor, the team first had to identify which pixels of the image belonged to the surgical tool or tools. The team tried a few simplistic algorithms to do this, but realised that identifying tools was an extremely difficult machine learning task that would take a great deal of time to implement.

This project was about applying depth sensing to surgery, not about identifying surgical tools. Therefore the team decided to simplify the whole process by making all of our test props bright green, a colour that would never be found in the body, which made identifying them a trivial task. For full details of the development and implementation of this idea, see this PDF:

Making the tools fluorescent green worked well for this project, but it is unrealistic to expect the same idea to be applied to actual surgery, because it would require a change in manufacturing to make the tools bright green.

It is important to remember that this project is a proof of concept - that depth sensing can be applied to surgery - not a deployment of an actual, to-scale system, and effectively what we are doing with identifying tools by their green colour is saying:

“Assuming we had identified the tool, this is what we can do to it with its depth information.”

Identifying tools by their green colour completely cuts out the need to identify them by complex means, so that the depth sensing ability of the system can be showcased. In an actual deployment, a tool identification algorithm - using machine learning, for example - would be implemented, and it would then work automatically with the depth sensing work that has been done in this project.
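
A minimal sketch of this kind of green-threshold check is shown below, assuming a BGRA colour frame; the specific thresholds are illustrative rather than those used in the project.

```csharp
// Minimal sketch of the green-threshold idea, assuming a BGRA colour frame:
// a pixel counts as "tool" when its green channel clearly dominates red and blue.
// The thresholds are illustrative, not those used in the project.
public static class GreenToolIdentifier
{
    public static bool[] IdentifyGreenPixels(byte[] bgraPixels)
    {
        int pixelCount = bgraPixels.Length / 4;   // 4 bytes per pixel: B, G, R, A
        bool[] mask = new bool[pixelCount];
        for (int i = 0; i < pixelCount; i++)
        {
            byte b = bgraPixels[i * 4];
            byte g = bgraPixels[i * 4 + 1];
            byte r = bgraPixels[i * 4 + 2];
            // "Bright green": strong green channel, well above both red and blue.
            mask[i] = g > 150 && g > r + 50 && g > b + 50;
        }
        return mask;
    }
}
```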

However, the team did appreciate that a different method of tool identification would be needed in the future. We therefore experimented with identifying tools by depth, by finding all of the edges in the image. This allowed us to identify the tool by depth most of the time - except when it was touching the background, because then it has the same depth value as the background. The team has therefore not only developed a depth sensing system, but also considered how the system would eventually be deployed by experimenting with tool identification algorithms that do not require any physical markers on the tool.
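
The sketch below shows the general idea of that experiment, flagging a pixel as an edge when the depth jump to its neighbour exceeds a threshold; the image width parameter and the threshold value are assumptions for illustration.

```csharp
using System;

// Rough sketch of the depth-edge experiment: flag a pixel as an edge when the depth
// jump to its right-hand neighbour exceeds a threshold. The width parameter and the
// threshold value are assumptions for illustration. Note that, exactly as described
// above, a tool resting on the background produces no jump and hence no edge.
public static class DepthEdgeDetector
{
    public static bool[] FindDepthEdges(ushort[] depthMm, int width, int jumpThresholdMm = 30)
    {
        bool[] edges = new bool[depthMm.Length];
        for (int i = 0; i < depthMm.Length - 1; i++)
        {
            if ((i + 1) % width == 0) continue;            // do not compare across row ends
            int jump = Math.Abs(depthMm[i] - depthMm[i + 1]);
            edges[i] = jump > jumpThresholdMm;             // large jump = likely tool boundary
        }
        return edges;
    }
}
```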

Furthermore, because tool identification will be such an important part of this system when it is deployed, the team has tried to help future developers as much as possible by researching ways in which it could be done. For example, the team corresponded with Max Allen of the Surgical Robot Vision Research Group at UCL, who has experience with recognising surgical tools algorithmically and gave the team advice and guidance on how this could be done. His suggestions can be found in the Surgical Vision Group section at this page:

UI Improvements
The main goals for the system's user interface were that it be highly intuitive and simple, so that the surgeon would not have to work out how to use the interface while performing surgery.

The system’s UI is certainly very simple, and it is also voice controllable, which makes it more natural to use. However, there are certain areas where it could be improved. For example, visibility of system status is fairly poor. There is no indication of the status of the Kinect (or other depth sensor), as there is in the example programs provided by Microsoft; an example of such a status would be “Kinect: Connected” or “Kinect: Running”. This would improve the system because, if it stopped working for any reason, the user could simply check the Kinect status to see whether the Kinect had disconnected - in the vast majority of cases that, rather than any issue with the code, is why the system stops working.
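
A sketch of such a status indicator, assuming the Kinect SDK 2.0 KinectSensor API, might look like the following; statusLabel stands in for whichever UI element would display the text.

```csharp
using Microsoft.Kinect;

// Sketch of the suggested status indicator, assuming the Kinect SDK 2.0 KinectSensor
// API; statusLabel stands in for whichever UI element (e.g. a Label declared in the
// window's XAML) would display the text.
public partial class MainWindow
{
    private KinectSensor sensor;

    private void InitialiseSensorStatus()
    {
        sensor = KinectSensor.GetDefault();
        sensor.IsAvailableChanged += OnSensorAvailabilityChanged;
        sensor.Open();
    }

    private void OnSensorAvailabilityChanged(object sender, IsAvailableChangedEventArgs e)
    {
        // e.IsAvailable becomes false when the Kinect is unplugged or loses power.
        statusLabel.Content = e.IsAvailable ? "Kinect: Connected" : "Kinect: Disconnected";
    }
}
```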

Furthermore, given that the system is voice controllable, there should be a small button that, when clicked, gives users an overview of the voice commands in case they forget them. It could also provide reminders of what each of the Visual Augmentations does.

In summary, the user interface is well designed and is both simple and intuitive; however, there are minor areas for improvement, such as making the system status more visible and adding in-system guides in case the user forgets the voice commands or what any of the augmentations do.