
Icosi-Do

Autumn Sale and Steam Awards

During the Steam Autumn Sale, icosi-do is available at a discount of 40%! Also, don't forget to nominate icosi-do in the "Labor of Love" category of the Steam Awards 2023.

On the potential benefits of hand gesture controls

As I've mentioned, I'm working on hand gesture controls for icosi-do with Tilt Five glasses, which involves quite a bit of testing. I was somewhat surprised by how good these controls feel (as long as they work as intended). That's not to say that my implementation already works sufficiently well, but I hope that it will once I can access the wide-angle head-tracking camera of the Tilt Five glasses.

Now I'm wondering: what makes hand gesture controls feel better than hand-held controllers? Before answering that question, it's useful to consider what is required to make hand gesture controls feel at least as good as hand-held controllers.

The Power of Controlling


Of course, the detection of specific hand gestures has to work sufficiently well. On top of that, many hand gestures will include some form of motion tracking, which has to be sufficiently accurate and fast to facilitate comfortable, interactive controls. While these requirements are not trivial, they are only the most basic ones in a longer list.

The Power of Resting


Long-term use of input devices such as keyboards, computer mice, and touch pads is only comfortable if the hands can rest on their palms while using these devices. Similarly, the lower arms should be able to rest while using hand gesture controls. With Tilt Five, it might sometimes be possible to rest one's elbows on the same surface that the game board is placed on. Armrest pillows or adjustable armrests might be even more comfortable, as demonstrated by the armrests of the surgeon's console of da Vinci Surgical Systems.

The Power of Touching


Another challenge for hand gesture controls is the lack of touch. Part of the problem is that touching is inherently pleasurable to most humans - even very young humans: some start thumb sucking before birth. But there is more to it: holding a button down and keeping it in exactly that position is a lot easier than keeping a trigger or joystick in a specific position midway between its extreme positions. That's because we can hold a button down with a varying force, as long as that force is always great enough to keep the button down; there is no need to continuously correct the applied force based on sensory feedback, which means that the required cognitive load is smaller. One problem with mid-air hand gestures is that you usually cannot push against other objects to reduce this cognitive load. Thus, if a hand gesture requires users to keep their fingers in specific positions in mid-air, holding that gesture can feel quite demanding and uncomfortable.

A possible design solution to this challenge might be two-fold: one part is to include touch by requiring two fingers to touch each other, e.g., as part of an OK or ring hand gesture. The other part is:

The Power of Clutching


With "clutch", I mean a way for users to temporarily stop controlling a user interface. Lifting a computer mouse up from a surface or lifting a finger up from a touch surface or muting a microphone are clutches in this sense. Often these clutches only exist in software: a microphone "muted in software" might technically still record sounds, but that doesn't matter if the software discards all recordings.

There are multiple benefits and uses of clutches. One of them is that users can relax and don't have to worry that their actions have unintended consequences. Thus, if hovering with a mouse (without pressing a mouse button) doesn't have any consequences (other than changing the position of the mouse pointer), then the mouse buttons act as a clutch, too.

As mentioned above, holding your fingers or hand in a specific, controlled position may feel quite demanding and uncomfortable; thus, it is useful to offer users a clutch that allows them to temporarily escape this requirement. A good candidate for such a clutch is the touching of two fingers, i.e., as long as a user's fingers don't touch each other, no actions are triggered - similar to hovering with a mouse.
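To make this concrete, here is a minimal sketch (in Python, and not icosi-do's actual code) of such a gesture clutch: hand movements only rotate the puzzle while the ring formed by the thumb and another finger is closed; as soon as the fingers separate, the clutch opens and the hand can move and rest freely without consequences. The class and function names are illustrative assumptions.

```python
# Hypothetical sketch of a gesture "clutch": gesture-driven rotation is only
# applied while the ring (thumb touching another finger) is closed, so users
# can relax or reposition their hand without triggering anything.
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class GestureFrame:
    ring_closed: bool                            # thumb touches another finger
    hand_pos: Optional[Tuple[float, float]]      # hand position in the camera image, or None

class ClutchedRotationControl:
    def __init__(self, sensitivity: float = 1.0):
        self.sensitivity = sensitivity
        self.last_pos: Optional[Tuple[float, float]] = None   # last position while clutched in

    def update(self, frame: GestureFrame, rotate_puzzle: Callable[[float, float], None]) -> None:
        if not frame.ring_closed or frame.hand_pos is None:
            # Clutch is open: discard state and do nothing (like hovering with a mouse).
            self.last_pos = None
            return
        if self.last_pos is not None:
            dx = frame.hand_pos[0] - self.last_pos[0]
            dy = frame.hand_pos[1] - self.last_pos[1]
            rotate_puzzle(self.sensitivity * dx, self.sensitivity * dy)
        self.last_pos = frame.hand_pos
```

Resetting the stored hand position whenever the clutch opens avoids sudden jumps when the user re-engages in a different part of the camera image.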

Less is More


Assuming that all above issues have been solved: why would hand gesture controls feel better than hand-held controllers? A useful comparison might be to the relation between finger touch and stylus touch. Nowadays, finger touch is considerably more popular than stylus touch for a couple of reasons. For example:
  • Finger touch does not require buying, storing, finding, handling, and/or charging another device, which improves convenience and lowers friction.
  • Finger touch provides more direct interaction with a screen without the need for an intermediate device, which potentially improves immersion.

In my opinion, both points are relevant for the comparison between hand gesture controls and the Tilt Five wand: finding the wand, putting batteries into it, and turning it on do not take long, but they add to the friction of using the system.

While pointing with the wand is fine, the long stick at its front means that the user's hands are usually more than 15 cm away from the point of interaction. Compare this with finger touch or with hand-held VR controllers, which are sometimes required to be moved directly next to or even into virtual objects to interact with them.

The Magic of Hand Gestures


While the comparison to finger touch and stylus touch is useful, in my opinion it doesn't fully explain how good hand gestures may feel. The missing element might be the magical feeling of using hand gestures to interact with objects at a distance without touching them. In fact, magicians often distract the attention of their audience with dramatic hand gestures that pretend to affect objects at a distance. Similarly, telekinetic abilities in films are often visually communicated with hand gestures, for example, the use of the "Force" in the Star Wars universe. The VR gaming industry has long recognised grabbing at a distance as a more convenient alternative to grabbing only those virtual objects that are very close to the virtual representations of hand-held controllers. One example of distance grabbing is provided by the "gravity gloves" in "Half-Life: Alyx" (which was predated by the "gravity gun" in "Half-Life 2").

Remote controls, laser pointers, and VR controllers have familiarised many people with the idea of interacting with objects at a distance. However, the possibility of interacting with distant objects by means of hand gestures without holding any technical device might cause a renewed feeling of performing magic when using hand gesture controls.

TL;DR


The feeling of magic might be an important benefit of hand gesture controls. But make no mistake: all the other issues mentioned above have to be solved first before one can hope to reap this prize. The good news is that reproducing the experience of grabbing or touching virtual objects is not necessary - and probably not even desirable.

Patch V-14: Second test of hand gesture control for Tilt Five glasses

This small update includes another small test of hand gesture controls for users of Tilt Five™ glasses. If you have access to Tilt Five™ glasses, please try it out with the full game or the demo.

Some instructions (only for Tilt Five™ users):

  • In the Tilt Five™ Control Panel under "Settings", activate "Allow Camera Frame Sender".
  • In icosi-do, the test of hand gesture control is activated under SETTINGS > AR GLASSES > GESTURES > TEST. Through the Tilt Five™ glasses, you should then see a red, low-resolution camera image below the puzzle.
  • The hand gesture equivalent to a mouse click/trigger pull is an OK/ring sign formed by the thumb and one finger. When recognised, the inside of the ring becomes green in the camera image. To be recognised, the ring has to be closed and the inside of the ring has to be of a certain size but may not be too large. (You may want to adjust the apparent size of your hand by moving it closer to or farther away from the glasses.) Also, a large portion of the rest of the gameboard has to be visible; thus, please make sure that there are no other objects in the camera view.
  • If you move your hand while forming the OK/ring sign, the puzzle is rotated based on the movements of your hand in the camera image. (You may change the direction of rotations under SETTINGS > CONTROLS > INVERT and the sensitivity under SETTINGS > CONTROLS > SPEED.)
  • You may activate a white 3D reticle to select rods and solve the puzzles without the wand. To do this, turn off(!) the Tilt Five™ wand and(!) select SETTINGS > AR GLASSES > CLONE VIEW > ON. Then form a clamp/pinch gesture without(!) forming a ring, i.e., your thumb should not touch any finger. The fingertips of the two fingers forming the clamp/pinch gesture should be marked by blue pixels in the camera image. If the reticle does not appear, make sure that your hand appears in the center of the camera image, and avoid spreading out the remaining fingers. While the white 3D reticle is on top of a rod, you may then select that rod by forming the OK/ring sign.
  • Depending on which hand you use and which of your eyes is dominant, you might want to adjust the position of the reticle. To this end, form the OK/ring sign, which should fix the position of the reticle. While you form the ring, move your hand such that the reticle appears where you want it to appear relative to your hand. Then spread out the remaining fingers of your hand while still forming the ring, i.e., you should form a proper OK sign with your middle, ring, and little fingers spread out. (As an alternative to spreading out your fingers, you can press "K" on the keyboard with your other hand while still forming the ring.) If the gesture is recognised, the red color of the camera image changes to yellow, and the relative position of the reticle is adjusted. Test the adjustment by forming a clamp/pinch gesture. If you are not satisfied, repeat this adjustment process.
  • Note that the game menu cannot be controlled with hand gestures yet; however, if you are at "CONTINUE" in the main menu, showing an OK/ring sign can bring you back to the puzzle.
  • Also note that the Tilt Five™ Glasses often crash after using the tangible camera. This shouldn't be a problem: just unplug the glasses and plug them in again, or reboot them in the Tilt Five™ Control Panel.

The limited field of view of the tangible camera is certainly a problem, in particular because the app only detects fingertips near the center of the camera image. Nonetheless, I'm quite enjoying this new way of solving icosi-do puzzles. Since the system has grown quite a bit, I would again appreciate feedback on this test from as many users of Tilt Five™ glasses as possible. Please leave a comment below this announcement, or in the discussion forum, or on the Tilt Five™ discord. Thank you!

ICOSI-DO ON SUMMER SALE!

icosi-do is participating in the Steam Summer Sale 2023 with a whopping 30% discount!

And the demo is still free!

In other news, I continue working on hand gestures for selecting rods when using Tilt Five glasses. Currently, the pointing is quite jittery, but it's getting there.



More thoughts on hand control for Tilt Five users

Thanks to everyone who has provided feedback about the first test of hand control for Tilt Five users in icosi-do! (If you wonder: these discussions happened on the Tilt Five discord.)

Here is a summary of my conclusions:



1) Performance matters: the approach of downsampling the camera image on the GPU and continuing the CPU computations with a 32 x 32 pixel image appears to provide a sufficient refresh rate for somewhat smooth interactivity.

2) The connected component analysis works quite well to detect whether the thumb and another finger touch each other and thereby complete the OK/ring sign/gesture. The passive haptic feedback of this moment is a nice touch (pun intended). Of course there are limitations due to the single infrared camera, but within these technical limitations, it appears to work very well.

3) The "head cursor", i.e. controlling a 3D reticle with head movements, doesn't feel very good. This becomes particularly obvious when comparing the feel of rotating the puzzle with hand control and the feel of selecting rods - the former feels a lot better. One part of the problem might be the required control loop between head movements and movements of the 3D reticle, which feels quite unnatural. Furthermore, the required head movements make the next point worse:

4) The limited field of view of the tangible camera is a real problem. The visualization of the camera image helps, but ideally the interaction design should encourage users to keep their hand in the center of the field of view such that they don't lose tracking. The wider field of view of the head-tracking camera would improve the situation, but I don't think that it would completely solve it. Also, I'm not sure whether or when images of the tracking camera are going to be available.

So, plan for future (i.e. current) work:



1) I'll focus on rod selection. Since the selection is confirmed by closing the ring formed by thumb and another finger, my current idea is to aim at (and highlight while "hovering over") a rod with an "open clamp" hand pose where the tips of thumb and another finger are somewhat close to each other and the center point between them is in line of sight of the selected rod. (The lack of depth information limits the feasible designs.)

2) To implement this design, I'm trying to find the positions of all fingertips in the image by computing the shortest-path distance of dark pixels from the image boundary and picking the pixels with locally maximal distance. I hope that I can easily identify the position of the thumb based on its small y-coordinate. The second fingertip might then just be the fingertip closest to the thumb in the image. I suspect that positions on a 32 x 32 grid will not be accurate enough for a pleasant interaction, but that's a problem for another day (and I have an idea how to address it).
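To illustrate point 2), here is a minimal Python sketch of this fingertip idea, assuming that hand pixels are the dark pixels of the downsampled 32 x 32 infrared image; the threshold value, the 4-connectivity, and the exact local-maximum test are illustrative choices of mine, not the actual icosi-do implementation.

```python
# Sketch: shortest-path distance of dark ("hand") pixels from the image boundary,
# computed with a BFS that only walks through dark pixels; fingertip candidates
# are the pixels whose distance is locally maximal.
from collections import deque
import numpy as np

def fingertip_candidates(img: np.ndarray, dark_threshold: int = 64):
    h, w = img.shape
    dark = img < dark_threshold                  # skin reflects little infrared light
    dist = np.full((h, w), -1, dtype=int)
    queue = deque()
    # Seed the BFS with dark pixels on the image boundary (where the hand/arm enters).
    for y in range(h):
        for x in range(w):
            if dark[y, x] and (y in (0, h - 1) or x in (0, w - 1)):
                dist[y, x] = 0
                queue.append((y, x))
    # 4-connected BFS through dark pixels only.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dark[ny, nx] and dist[ny, nx] < 0:
                dist[ny, nx] = dist[y, x] + 1
                queue.append((ny, nx))
    # Keep pixels with locally maximal distance as fingertip candidates.
    tips = []
    for y in range(h):
        for x in range(w):
            if dist[y, x] <= 0:
                continue
            neighbours = [dist[ny, nx]
                          for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                          if 0 <= ny < h and 0 <= nx < w]
            if all(dist[y, x] >= n for n in neighbours):
                tips.append((y, x, dist[y, x]))  # the thumb might be the tip with the smallest y
    return tips
```

On a 32 x 32 image, the BFS and the local-maximum scan touch only about a thousand pixels, so even this naive version should easily run within a frame.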

In other news: Steam Summer Sale 2023 is starting on June 29!




Patch V-13: First test of hand gesture control for Tilt Five glasses

This small update only includes a small test for users of Tilt Five™ glasses to see how icosi-do might be controlled with hand gestures (as suggested in my previous post). If you have access to Tilt Five™ glasses, please try it out with the full game or the demo.

Some instructions (only for Tilt Five™ users):

  • In the Tilt Five™ Control Panel under "Settings", activate "Allow Camera Frame Sender". (Without camera frames, hand gesture control is not possible. The infrared camera frames are sent from the Tilt Five™ glasses to the connected PC and analysed there to control the game.)
  • In icosi-do, the test of hand gesture control is activated under SETTINGS > AR GLASSES > OK SIGN > TEST. Through the Tilt Five™ glasses, you should then see a red, low-resolution camera image below the puzzle.
  • The only supported hand gesture is an OK/ring sign formed by the thumb and one finger. This gesture represents a left mouse click or trigger pull. When recognised, the inside of the ring becomes green in the camera image. To be recognised, the ring has to be closed and the inside of the ring has to be of a certain size but may not be too large. (You may want to adjust the apparent size of your hand by moving it closer to or farther away from the glasses.) Also, a large portion of the rest of the gameboard has to be visible; thus, please make sure that there are no other objects in the camera view.
  • If you move your hand while forming the OK/ring sign, the puzzle is rotated based on the movements of your hand in the camera image. (You may change the direction of rotations under SETTINGS > CONTROLS > INVERT and the sensitivity under SETTINGS > CONTROLS > SPEED.)
  • You may activate a white 3D reticle to select rods by head movements and solve the puzzles without the wand. To do this, turn off(!) the Tilt Five™ wand and(!) select SETTINGS > AR GLASSES > CLONE VIEW > ON. While the white 3D reticle is on top of a rod, you may then select it by forming the OK/ring sign.
  • Note that the game menu cannot be controlled with hand gestures yet; however, if you are at "CONTINUE" in the main menu, showing an OK/ring sign can bring you back to the puzzle.

All that might sound a bit complicated because it is! In fact, the small field of view of the camera can be quite frustrating. Moreover, I'm not sure about some of the parameters used in the detection of the OK/ring sign, specifically the minimum and maximum allowed size of the inside of the ring. Thus, I would really appreciate feedback on this test from as many users of Tilt Five™ glasses as possible. Please leave a comment below this announcement, or in the discussion forum, or on the Tilt Five™ discord. Thank you!

What I am working on (June 2023 edition)

On June 5, 2023, Apple presented the Apple Vision Pro and with it a new "spatial computing" interface based on (among other things) eye tracking and hand gestures. While the keynote presentation had its controversial moments, the reports by journalists who were able to experience early devices were quite enthusiastic.

So, this made me think: how could I integrate some of the features of this spatial computing interface into the Tilt Five version of icosi-do? To answer this question, let's step back a bit: at its core, eye tracking provides a direction in 3D space, which is similar to the tracked direction of the Tilt Five glasses. The hand gestures potentially provide much more information, but the most basic information is just a single bit: users are or aren't pinching with their hands, which is similar to pressing or not pressing a single button.

How to build a user interface based on a tracked direction and one button? It turns out that this is not only similar to using Google Cardboard but also to using icosi-do with the Tilt Five glasses but without the Tilt Five wand. And even better: I've already dealt with this situation and implemented a kind of reticle that can be controlled by moving the Tilt Five glasses! This covers the tracked direction.
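For the tracked direction, a rough (and purely illustrative) sketch of such a reticle is to cast a ray from the tracked pose of the glasses and place the reticle on the nearest rod that the ray hits; the rods are approximated by spheres here, which certainly differs from icosi-do's actual geometry and math.

```python
# Sketch: ray-cast from the tracked glasses pose to place a selection reticle.
import numpy as np

def ray_sphere_t(origin, direction, center, radius):
    """Ray parameter of the nearer intersection with a sphere, or None if the ray
    misses or the intersection lies behind the origin (direction must be normalised)."""
    oc = origin - center
    b = np.dot(oc, direction)
    c = np.dot(oc, oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - np.sqrt(disc)
    return t if t >= 0.0 else None

def place_reticle(head_position, head_forward, rod_centers, rod_radius=0.01):
    """Returns (rod_index, reticle_position) for the nearest rod in line of sight, or None."""
    origin = np.asarray(head_position, dtype=float)
    direction = np.asarray(head_forward, dtype=float)
    direction = direction / np.linalg.norm(direction)
    best = None
    for i, center in enumerate(rod_centers):
        t = ray_sphere_t(origin, direction, np.asarray(center, dtype=float), rod_radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, i)
    if best is None:
        return None
    t, i = best
    return i, origin + t * direction
```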

What about the button? Instead of the trigger and buttons of the Tilt Five wand, the buttons of a gamepad controller, mouse, or keyboard may be used. To create an interface without such controllers, icosi-do could use an infrared camera of the Tilt Five glasses and detect a specific hand gesture. If the hand gesture is detected, the game could behave as if a button was pressed. Easy, right?
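The gesture-as-button part could then look roughly like the following sketch, which turns per-frame detection results into press/release events; the small debounce that tolerates a few missed frames is my own assumption, not something from the post.

```python
# Sketch: treat a detected hand gesture like a single button with press/release events.
class GestureButton:
    def __init__(self, release_after_missed_frames: int = 3):
        self.release_after = release_after_missed_frames   # debounce window (assumption)
        self.missed = 0
        self.pressed = False

    def update(self, gesture_detected: bool):
        """Returns 'press', 'release', or None for the current frame."""
        if gesture_detected:
            self.missed = 0
            if not self.pressed:
                self.pressed = True
                return "press"          # handled like a trigger pull / mouse click
        else:
            self.missed += 1
            if self.pressed and self.missed >= self.release_after:
                self.pressed = False
                return "release"
        return None
```

The game logic can then consume these events exactly like presses and releases of the wand's trigger or a mouse button.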

But which hand gesture is suited best? "Pinching" is not ideal because some users might "pinch" with their thumb and index finger forming a small angle (almost parallel fingers) as if they try to hold a large "pinch" of salt between thumb and index finger. This would be difficult to detect because of fingers overlapping in the camera image. A better hand gesture is the "OK sign", i.e., forming an "O" or "ring" with the thumb and index finger, and spreading the other fingers to form a "K". An alternative is the "ring gesture" (or "ring sign"), which doesn't require spreading the other fingers.

How to detect an OK/ring sign/gesture? I did a few experiments with the available infrared camera of the Tilt Five glasses and concluded that the gesture has to be performed in front of the retroreflective part of the gameboard, otherwise there probably isn't enough contrast between fingers and background to reliably detect hand gestures. Also, this isn't a big limitation because the gesture has to be visible to the camera anyway. The easiest way to detect an OK/ring sign/gesture in front of the retroreflective gameboard is probably to classify the pixels into "foreground" (high intensity of retroreflected infrared light) and "background" (low intensity of infrared light diffusely reflected by skin), run a connected-component analysis and search for a connected component ("blob") that represents the inside of the closed ring formed by thumb and index finger. My assumption is that this blob should be larger than the blobs corresponding to the dots on the boundary of the gameboard and smaller than the blobs corresponding to the rest of the visible retroreflective parts of the gameboard.
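As a rough sketch of this idea (assuming SciPy is available; the real implementation differs and partly runs on the GPU), the foreground/background classification and the blob search could look like this, with the intensity threshold and the allowed blob sizes as placeholder values.

```python
# Sketch: threshold the infrared image, label connected components, and look for a
# medium-sized bright blob that could be the inside of the closed thumb-finger ring.
import numpy as np
from scipy import ndimage

def find_ring_interior(ir_image: np.ndarray,
                       bright_threshold: int = 128,
                       min_area: int = 4,
                       max_area: int = 80):
    bright = ir_image > bright_threshold            # retroreflected light = foreground
    labels, count = ndimage.label(bright)           # connected-component analysis
    areas = ndimage.sum(bright, labels, index=range(1, count + 1))
    for label_id, area in enumerate(areas, start=1):
        # Larger than the dots on the gameboard boundary, smaller than the rest of
        # the visible retroreflective gameboard.
        if min_area <= area <= max_area:
            ys, xs = np.nonzero(labels == label_id)
            return float(xs.mean()), float(ys.mean())   # centroid of the candidate blob
    return None                                     # no closed ring detected
```

The centroid of this blob could then also serve as the hand position that drives the puzzle rotation while the ring stays closed.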

The interesting question is then: how to run an efficient connected-component analysis on a relatively large image at more than 10 frames per second? My first attempt was to create a GPU-based pyramid algorithm for this. It worked in some cases, but wasn't very reliable. Also, I concluded that I had to analyse a low-resolution version of the camera image on the CPU in any case, which inspired another idea: a GPU-based downsampling algorithm that preserves the connected components of the image, followed by a connected-component analysis on the CPU.

So, that's what I'm working on right now. To be clear: this won't include any eye tracking, nor will it be able to detect more than a single hand gesture. But at least this might be a step towards a controller-free interface when using Tilt Five glasses, which might be a lot of fun.

Note about the music in icosi-do (for YouTubers and other streamers)

On YouTube, when you upload videos of icosi-do that include the built-in music, you might get hit by Content ID claims. That's because Bach's music is very popular, and a couple of companies have claimed copyright on specific performances.

Here is what you need to know: It is safe to dispute any Content ID claim related to Bach's music in icosi-do, because there is no copyright on the 300-year-old music by Bach, and the performance in icosi-do is not copyrighted by third parties because I programmed that performance myself based on sheet music.

Personally, I got 4 Content ID claims on the music by 4 different companies, and successfully disputed all of them without problems. If you want to avoid the issue of Content ID claims altogether, you can turn off the background music under SETTINGS > AUDIO > MUSIC.

Please let me know if you have any questions or comments.

Patch V-12: new button with old function and old button with new function

Another small patch polishing the interface for users of a single-button mouse:

  • A "hint"/"undo" button has been added in the lower-left corner, which has the same function as the "X" key/button.
  • In the tutorial, the "right arrow" button is now clickable in the hands-on parts of the tutorial with the puzzle. In this case, it has the same function as the escape key/button, i.e., it continues with the next step of the tutorial. The idea is that you can quickly click through the tutorial by repeatedly clicking that button. (It might also be less confusing than using the escape key/button to continue in the hands-on parts but the right arrow/"D" key to continue when text is presented.)

Please let me know any further ideas to improve the interface for users of a single-button mouse, or other comments or feedback. Thanks!

Patch V-11: slightly better layout

Only a tiny patch polishing the user interface:

  • The left/right/esc icons have been moved slightly towards the centre to improve the layout for some aspect ratios (and the main text is a bit smaller to make space for them).
  • The areas to click the left/right/esc icons now move with the icons. (Previously they were defined relative to the window).
  • The bottom rod moves smaller distances for the credits such that it should stay inside the window for more aspect ratios.

As always, please let me know any comments!