Literature Review Video

The main findings from our research are summarised in our literature review video.

Depth Mapping

To improve the quality of projection we need a depth map of the TV screen’s irregular surroundings. The chosen method must balance ease of use, configuration complexity, calibration time and the precision of the resulting depth map. Additionally, we are required to replace IllumiRoom’s Kinect camera with a webcam or smartphone camera. Many single-camera (monocular) solutions are either unreliable, hard to operate, or require a long series of frames to compute a depth map.

Thankfully, with the projector available for calibration, we can consider using one of the many "structured light" algorithms, which address these concerns and are widely used in existing 3D scanners. One such technique was even used in the initial Kinect design and prototype, with the help of infrared projection [5]. We will compare three structured light approaches tested by Alhareth Altalib in his paper, Depth Map Extraction Using Structured Light [6], to find the most suitable one for our use case.

Binary Codes

These project patterns of fully white and fully black stripes, subdividing the surface into progressively smaller equal sectors. For each pattern, every camera pixel is read as 1 (fully illuminated) or 0 (not illuminated), and the resulting per-pixel codes are triangulated to recover depth. Because the data is binary, this approach gives precise results, but at a high time cost, since one frame must be projected and captured per bit.
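A minimal sketch of how such binary stripe patterns could be generated and decoded in Python with NumPy (the function names, bit depth and threshold are illustrative, not taken from [6]):

```python
import numpy as np

def binary_stripe_patterns(width, height, n_bits=10):
    """One stripe image per bit: bit k of each projector column index
    decides whether that column is white (255) or black (0)."""
    columns = np.arange(width)
    patterns = []
    for bit in range(n_bits - 1, -1, -1):            # coarsest stripes first
        row = (((columns >> bit) & 1) * 255).astype(np.uint8)
        patterns.append(np.tile(row, (height, 1)))
    return patterns

def decode_binary(captured_frames, threshold=128):
    """Turn a stack of captured camera frames back into a per-pixel
    projector column code, which is then used for triangulation."""
    code = np.zeros(captured_frames[0].shape, dtype=np.int32)
    for frame in captured_frames:
        code = (code << 1) | (frame > threshold)
    return code
```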

Gray Codes

These work similarly to binary codes, but encode the stripe index with a Gray code, in which consecutive values differ by only a single bit. This makes decoding more robust, since errors at stripe boundaries can only shift the recovered code by one step.
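For reference, the binary-reflected Gray code of an index n is simply n XOR (n >> 1); a quick sketch showing the single-bit changes between neighbouring columns:

```python
def gray_code(n):
    """Binary-reflected Gray code: consecutive values differ in exactly one bit."""
    return n ^ (n >> 1)

print([format(gray_code(c), "03b") for c in range(8)])
# ['000', '001', '011', '010', '110', '111', '101', '100']
```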

Fringe Patterns

These project a continuously varying pattern generated from a function (usually a sine wave) and shift its phase between frames. Because the phase at each pixel can be recovered from only a few captured frames, this can reduce the time it takes to capture depth.
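A minimal sketch of three-step phase shifting, one common fringe technique (the fringe period and function names are illustrative):

```python
import numpy as np

def fringe_patterns(width, height, periods=16):
    """Three sinusoidal fringe patterns, phase-shifted by 120 degrees each."""
    x = np.arange(width)
    phase = 2 * np.pi * periods * x / width
    shifts = (-2 * np.pi / 3, 0.0, 2 * np.pi / 3)
    return [np.tile((127.5 + 127.5 * np.cos(phase + s)).astype(np.uint8), (height, 1))
            for s in shifts]

def wrapped_phase(i1, i2, i3):
    """Recover the wrapped phase from the three captured images (float arrays),
    using the classic three-step phase-shifting formula."""
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)
```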

Projection Mapping

RoomAlive Toolkit

The project requires us to implement 2D projection mapping. This technique will be used to turn the irregular surroundings behind the TV into a display surface for video projection.

The projection mapping used for IllumiRoom is explained in Microsoft’s RoomAlive Toolkit [4], which is used to create dynamic projection mapping experiences. The aim of the projection mapping process is to generate a render of the view to be projected onto the surroundings, from the user’s viewpoint. This requires a projection matrix to be computed for the user’s projector. The projector’s intrinsic parameters (internal parameters such as focal length, field of view, etc.) should be considered when assembling this matrix.

Several types of projection matrix can be formulated, as detailed in Projections in Context [7], including parallel, oblique, orthographic and perspective projections. The paper argues that perspective projection is the most realistic, as it closely resembles human vision.
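As a reference point, a symmetric perspective projection matrix can be assembled from the field of view and aspect ratio; a minimal NumPy sketch using OpenGL-style clip-space conventions (Direct3D uses a slightly different depth range):

```python
import numpy as np

def perspective_matrix(fov_y_deg, aspect, near, far):
    """Symmetric perspective projection matrix (OpenGL-style, column vectors)."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0,                         0.0],
        [0.0,        f,   0.0,                         0.0],
        [0.0,        0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                        0.0],
    ])
```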

Graphics APIs

Presently, there are several widely used real-time graphics APIs which can be used to assemble a perspective projection matrix. Leading APIs include Microsoft’s Direct3D, OpenGL and Vulkan.

  1. DirectX / Direct3D — widely used for Windows and Xbox
  2. OpenGL — available for most operating systems
  3. Vulkan — a lower-level API which offers higher performance and more efficient CPU and GPU usage, but presents a steeper learning curve, requiring more boilerplate and state management.

OpenGL is often considered to have better performance than Direct3D, though this varies from game to game. Since Direct3D typically works best on Windows, which is our client’s operating system, we plan to use it as our graphics API.

Libraries

The RoomAlive toolkit uses Direct3D to set up a perspective projection matrix for the projector. Along with the projector intrinsics (mentioned previously), it is also necessary to take into account that the principal point of the projector may not coincide with the centre of projection (COP). Since the COP is typically placed at the origin [8], this results in an off-centre (off-axis) perspective projection matrix.
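A hedged sketch of how such an off-axis matrix could be built from projector intrinsics, again using OpenGL-style frustum conventions rather than RoomAlive’s Direct3D code (the function name and parameters are our own):

```python
import numpy as np

def projection_from_intrinsics(fx, fy, cx, cy, width, height, near, far):
    """Off-axis perspective projection matrix built from projector intrinsics.
    (cx, cy) is the principal point; when it is not at the image centre,
    the frustum becomes asymmetric (off-centre)."""
    left   = -cx * near / fx
    right  = (width - cx) * near / fx
    bottom = -(height - cy) * near / fy
    top    = cy * near / fy
    return np.array([
        [2 * near / (right - left), 0.0, (right + left) / (right - left), 0.0],
        [0.0, 2 * near / (top - bottom), (top + bottom) / (top - bottom), 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ])
```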

For simple 2D mapping, RoomAlive uses Direct3D via SharpDX, a .NET wrapper of the DirectX API, along with the Math.NET Numerics packages. SharpDX is relatively stable but has not been maintained since 2019. Furthermore, SharpDX is one of the biggest blockers to migrating to newer .NET versions, as it is only compatible with .NET Framework 4.8.

Hence, if we want to ensure the longevity of the project, it is important that any code taken from the toolkit is rewritten so that it does not use SharpDX. Alternatives to the SharpDX library include Vortice, TerraFX (low-level) and Silk.NET. We have decided to investigate Vortice further, as it most closely resembles SharpDX.

Game-specific Projection Modes

Selecting Games

The purpose of the system is to increase immersion for all users. We will mostly use closed-source mainstream games to test and implement the system, since these are the games most users will play (as identified when gathering requirements). The games we select for testing must have controller support, because TV and couch-based gaming makes keyboard and mouse input inconvenient.

To demonstrate the system with an open-source game, we can use SuperTuxKart, a clone of Nintendo’s Super Mario Kart. It has solid gameplay features and controller support, making it well suited to playing on a TV from a sofa.


FIFA 20, Call of Duty: Black Ops, Rocket League, Forza Horizon and MultiVersus are five very different games with which we hope to test our system, to ensure it can also be used with more popular, closed-source titles.

Reading & Processing Games

How can we read in and process game data to allow us to display game-specific projection effects?

  1. Do we hack the game memory? – This might work for open source games; however, since we are focusing on closed-source games with DRM and anti-cheat software built in, we are unlikely to be able to access this information.
  2. Computer vision of screen content? – This is probably our best bet! We discovered a number of OpenCV-based resources that may help with this, including PyImageSearch.
  3. Read controller inputs/vibration output? – This is definitely a possibility that will need some consideration. If we can read the controller inputs, then we may be able to display certain effects when buttons are pressed (see the sketch after this list). Further investigation into this option is necessary.
  4. Listen to game sound? – Analysing game sound could be a great option for detecting when in-game actions occur.
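As a quick feasibility check for option 3, a minimal sketch of reading controller button presses with pygame (one possible library for this; the loop and effect hook are illustrative):

```python
import pygame

# Minimal controller-reading loop using pygame's joystick module.
pygame.init()
pygame.joystick.init()

if pygame.joystick.get_count() == 0:
    raise SystemExit("No controller connected")

controller = pygame.joystick.Joystick(0)
controller.init()

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.JOYBUTTONDOWN:
            # A button press could trigger a projection effect here.
            print(f"Button {event.button} pressed")
        elif event.type == pygame.QUIT:
            running = False
```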

Libraries

Video analysis can be conducted using OpenCV, supported by resources such as PyImageSearch, a site offering OpenCV-based tutorials and utilities. This would let us take screenshots, analyse the game content, and dispatch actions to the projection mapping system to display effects.
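A minimal sketch of such a pipeline, assuming MSS for screen capture (our chosen library, see the summary below) and a simple frame-difference trigger; the monitor index and thresholds are illustrative:

```python
import cv2
import numpy as np
from mss import mss

sct = mss()
monitor = sct.monitors[1]        # primary display (index 0 is the combined virtual screen)
previous = None

while True:
    frame = np.array(sct.grab(monitor))              # BGRA screenshot of the game display
    gray = cv2.cvtColor(frame, cv2.COLOR_BGRA2GRAY)

    if previous is not None:
        diff = cv2.absdiff(gray, previous)
        changed = np.count_nonzero(diff > 40) / diff.size
        if changed > 0.5:
            # A large, sudden change on screen is a candidate trigger
            # for a projection effect (e.g. a flash or explosion).
            print("Large frame change detected")

    previous = gray
```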

Sound analysis can be performed with a library such as Librosa to detect sounds such as gunshots, car crashes and other loud, distinctive in-game noises. These can be used to trigger certain projections.
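A hedged sketch of offline onset detection with Librosa (the real system would analyse audio captured in short chunks rather than a file; the file name is illustrative):

```python
import librosa

# Load a short recording of game audio (path is illustrative).
audio, sample_rate = librosa.load("game_audio.wav", sr=None)

# Onset detection finds sudden increases in energy, a reasonable proxy
# for gunshots, crashes and other impact sounds.
onset_times = librosa.onset.onset_detect(y=audio, sr=sample_rate, units="time")

# Per-frame RMS energy could then be used to keep only the loud onsets.
rms = librosa.feature.rms(y=audio)[0]

print(f"Detected {len(onset_times)} onsets")
```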

Summary of Technical Decisions

We will implement our system, UCL Open-Illumiroom V2, in Python rather than building on the original C# implementation, for two reasons:

  1. Our team is much more experienced with Python, and we have identified various Python libraries we could make use of, such as OpenCV, Librosa and more.
  2. We need to create a new implementation of the system as the original relied on the Kinect camera and ours needs to be able to calibrate using only a webcam or smartphone camera.

For creating the projection modes, the system will make use of the MSS library for capturing game data (by recording the game display). OpenCV will be used to analyse and transform the data to generate various visual effects. We will also make use of Librosa to analyse audio data. Since we aim to make our own variation of Microsoft IllumiRoom’s Radial Wobble, we hope to use audio analysis to trigger that animation when certain sounds are detected.
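To illustrate the kind of effect we have in mind (a sketch of our own, not IllumiRoom’s actual implementation), a radial ripple distortion can be applied to a captured frame with OpenCV’s remap:

```python
import cv2
import numpy as np

def radial_wobble(frame, time_s, amplitude=8.0, wavelength=60.0, speed=6.0):
    """Displace pixels radially by a travelling sine wave centred on the frame,
    giving a simple ripple ("wobble") distortion."""
    h, w = frame.shape[:2]
    ys, xs = np.indices((h, w), dtype=np.float32)
    cx, cy = w / 2.0, h / 2.0
    dx, dy = xs - cx, ys - cy
    r = np.sqrt(dx * dx + dy * dy) + 1e-6
    offset = amplitude * np.sin(2 * np.pi * r / wavelength - speed * time_s)
    map_x = xs + offset * dx / r
    map_y = ys + offset * dy / r
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```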

Our projection effects will be displayed using a PySide2 window (the official Python binding for the Qt GUI toolkit), and our system will be compiled for distribution, accessible on the Microsoft Store. Nuitka will most likely be used to compile the Python code to C, as it allows a full build to be generated with all of the libraries used statically linked.
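A minimal sketch of a full-screen PySide2 window on a secondary display (the projector), showing a frame converted from an OpenCV-style NumPy image; the placeholder frame and widget structure are illustrative:

```python
import sys
import numpy as np
from PySide2.QtWidgets import QApplication, QLabel
from PySide2.QtGui import QImage, QPixmap

app = QApplication(sys.argv)

# Placeholder frame; in the real system this would come from the
# projection mapping pipeline (a BGR image from OpenCV).
frame = np.zeros((720, 1280, 3), dtype=np.uint8)

# Convert BGR -> RGB and wrap it in a QImage for display.
rgb = frame[:, :, ::-1].copy()
height, width, _ = rgb.shape
image = QImage(rgb.data, width, height, 3 * width, QImage.Format_RGB888)

label = QLabel()
label.setPixmap(QPixmap.fromImage(image))

# Show full screen on the second screen (the projector), if one is connected.
screens = app.screens()
target = screens[1] if len(screens) > 1 else screens[0]
label.setGeometry(target.geometry())
label.showFullScreen()

sys.exit(app.exec_())
```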

We will continue to investigate depth and projection mapping, as this will be the most complex part of the system to implement. As discussed in Depth Mapping, structured light algorithms are a likely option, and the projection mapping approach from RoomAlive will almost certainly be helpful.

References

[1] B. Jones, H. Benko, E. Ofek, A. Wilson, “IllumiRoom: Peripheral Projected Illusions for Interactive Experiences”, 2013. [Online] Available: https://www.microsoft.com/en-us/research/wp-content/uploads/2013/04/illumiroom-illumiroom_chi2013_bjones.pdf

[2] “Projection Mapping Central”. [Online] Available: https://projection-mapping.org/inspiration/

[3] B. Jones, R. Sodhi, M. Murdock, “RoomAlive: Magical Experiences Enabled by Scalable, Adaptive Projector-Camera Units”, 2014. [Online] Available: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/paper2943.pdf

[4] B. Jones, R. Sodhi, M. Murdock, “RoomAlive Github Toolkit”, 2014. [Online] Available: https://github.com/microsoft/RoomAliveToolkit/

[5] V. Villena-Martínez, A. Fuster-Guilló, J. Azorín-López, M. Saval-Calvo, J. Mora-Pascual, J. Garcia-Rodriguez, A. Garcia-Garcia, “A Quantitative Comparison of Calibration Methods for RGB-D Sensors Using Different Technologies”, Sensors, vol. 17, no. 2, p. 243, 2017. [Online] Available: https://doi.org/10.3390/s17020243

[6] A. Altalib, “Depth Map Extraction Using Structured Light”, 2019. [Online] Available: https://www.researchgate.net/publication/337947456_DEPTH_MAP_EXTRACTION_USING_STRUCTURED_LIGHT

[7] K. Mason, S. Carpendale, B. Wyvill, “Projections in Context”, 2003. [Online] Available: https://www.researchgate.net/publication/266456254

[8] T. Pejsa, J. Kantor, H. Benko, E. Ofek, A. Wilson, “Room2Room: Enabling Life-Size Telepresence in a Projected Augmented Reality Environment”, 2016. [Online] Available: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/Room2Room_CSCW2016-2.pdf