What Input Modalities Would be Effective for AR and Text Input?


Experimenting With Gestural Input Potential for AR Applications


If augmented reality (AR) is to become more ubiquitous, the input modalities need to be established. For situations when speech input isn’t possible or ideal, there will need to be a secondary mode of input which should be possible while walking, standing or sitting. Gestural input could offer a solution.

Individual Project (as part of major project)



1 week


My Role:

Desk Research

User Experience Design

Interaction Design

Usability Testing

Visual Design

Coding (Processing)

Video Prototyping

My major project asks what the future of web usage might be with ubiquitous head-worn AR glasses, and what kind of operating system would run them.


As part of the project, I carried out a sprint of research, usability testing and prototyping to uncover which input modalities would be required, by testing how text input might be accomplished. Speech input is likely to be a primary mode of input; however, there are situations where speech isn't viable, such as loud environments, anxiety about speaking in public, or password input.


Reviewing Other Methods


To gain a deeper understanding of text input methods established on other systems, I carried out an extensive review of what already exists.


The review explored physical and digital keyboards and their potential viability for an AR system. Analogues of gestural inputs such as the Xbox “Daisy Wheel” and Steam Controller keyboards were of particular interest.

Current AR Options


Current text input methods for AR are essentially digital re-creations of physical keyboards. “Gaze and select” or “gaze and dwell” methods involve the user pointing their head at individual characters on a digital QWERTY keyboard to type, leading to an extremely cumbersome experience.


Other methods transpose a QWERTY keyboard onto a surface or object, also resulting in a less-than-ideal experience. New methods need to be explored, rather than simply applying what works for other devices.

Image Credit: thetechjournal.com



CTRL-Kit by CTRL-Labs is being pitched as a non-invasive neural interface wearable which decodes signals from motor neurons using machine learning and surface electromyography (EMG).


This technology would allow the user to translate hand movement into digital information, and it is also claimed to enable control of digital content without physically moving. Applied to AR, it could take the form of a wearable that tracks hand movements regardless of location.


Daisy Wheel & Dual Touch


As part of testing a gestural form of text input, I conducted some usability tests with the Steam and daisy wheel keyboards. Due to time constraints, the testing was a limited study of 4 participants, including myself.


While these tests were quick and involved a small number of participants, I believe they gave good insight into how a text input method should feel:


A level of familiarity helps the user transfer their existing skills and knowledge (Steam uses a QWERTY layout, while the daisy wheel is completely new)


The typing experience can be ranked highly even if it's less efficient (the Steam keyboard was ranked above or comparable to the mobile keyboard in preference)


Feedback is linked to experience & performance (the Steam controller provided haptic feedback and freedom of movement, while the daisy wheel was slightly unresponsive and made it difficult to tell when a key had been pressed)


Error rate reduces experience quality (The daisy wheel led to irritation towards the end of the test)


Prototype 1


The first version tracks both hands. The right hand is used for character selection: rotating the hand moves the selection left and right, and tilting it up or down switches between the keyboard's rows. The left hand selects the highlighted character by pinching, adds a space by tilting down, and deletes a character by tilting up.
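As a rough sketch, the two-handed mapping can be expressed as a pair of functions from hand pose to key, written here in Java (the language underlying Processing). The angle thresholds and ranges are illustrative assumptions, not the values used in the prototype:

```java
// Illustrative two-handed mapping for Prototype 1. Tilt of the right hand
// picks the QWERTY row, rotation sweeps along the row, and a left-hand pinch
// (not shown) commits the key. All thresholds are hypothetical.
class PoseToKey {
    static final String[] ROWS = { "qwertyuiop", "asdfghjkl", "zxcvbnm" };

    // Tilt (pitch) selects the row: up = top row, level = home row, down = bottom.
    static int rowForTilt(float pitchDeg) {
        if (pitchDeg > 15) return 0;
        if (pitchDeg < -15) return 2;
        return 1;
    }

    // Rotation (roll) sweeps the highlighted column across the row.
    static int columnForRoll(float rollDeg, int columns) {
        float t = Math.max(0, Math.min(1, (rollDeg + 60) / 120f)); // [-60, 60] deg -> [0, 1]
        return Math.min(columns - 1, (int) (t * columns));
    }

    static char keyFor(float pitchDeg, float rollDeg) {
        String row = ROWS[rowForTilt(pitchDeg)];
        return row.charAt(columnForRoll(rollDeg, row.length()));
    }

    public static void main(String[] args) {
        System.out.println(keyFor(20, -60)); // hand tilted up, rolled fully left -> 'q'
    }
}
```

The troublesome coupling described below falls out of this model directly: pitch and roll both change as the wrist rotates, so row and column selection interfere with each other.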


I used a QWERTY keyboard layout for the prototype to increase familiarity of character location. After some testing, major problems became very apparent. When rotating the right hand, it is a natural reaction to also tilt it upwards or downwards, which causes a lot of difficulty in character selection. Hand tracking limitations with the hardware caused some difficulty too; when users moved one hand over the other, the tracking sometimes failed.


I wanted to create a prototype which felt more natural, making more use of actual gestures.


Prototype 2


For this prototype I removed keyboard row selection through tilting, replacing it with row selection through finger extension: 0–1 fingers extended selects the top row, 2 fingers extended selects the second row, and 3 or more fingers extended selects the bottom row.
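The row-selection rule is simple enough to state directly; a minimal sketch in Java (the helper name is mine, not from the prototype code):

```java
// Prototype 2 row selection: rows are chosen by how many fingers are
// extended rather than by tilting the hand.
class FingerRows {
    // 0-1 fingers -> top row (0), 2 fingers -> middle row (1), 3+ -> bottom row (2).
    static int rowForExtendedFingers(int extended) {
        if (extended <= 1) return 0;
        if (extended == 2) return 1;
        return 2;
    }

    public static void main(String[] args) {
        for (int f = 0; f <= 5; f++) {
            System.out.println(f + " fingers -> row " + rowForExtendedFingers(f));
        }
    }
}
```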


This method felt similar to playing a musical instrument or using sign language to type, and the concept was enjoyed by those who used it; however, some people had trouble getting used to fully gestural input.


Assessing the Prototypes


I tested myself on both prototypes to assess their efficiency against the other methods I had already tested. I found the efficiency to be comparable to the daisy wheel Xbox input (especially when cross-referenced with other participant metrics), but with a far superior experience and flow. The error rate was far higher, primarily due to limitations in the accuracy of the tracking hardware and software.


Even though the gestural typing methods performed worse than the other methods in this test, I believe that with some improvements they could be a viable form of text input for AR:


Improved hand tracking technology: A company called CTRL-Labs is working on a wearable (CTRL-Kit) which measures electrical signals along the neurons in the arm, allowing hand movements, or even intended movements, to be tracked without cameras.


Optimised Keyboard Layout: With a keyboard layout more attuned to gestural input, I can see the efficiency of the method rising significantly.


Single Hand Typing: I was unable to create a comparable single-handed prototype using the technology I had; however, the CTRL-Kit could bypass camera tracking and unlock the full potential of one-handed gestural typing.


Hand Position: My prototype required the user's hands to be raised to elbow level for accurate hand tracking; once again, CTRL-Kit could allow the user's hand to be wherever they want.


Prototype 3


I created a keyboard layout based on the most frequently used letters in English text. This prototype still used the same extended-finger and hand-rotation controls; however, to increase the accuracy of character selection on each row, I increased the row count to 5.
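One way to derive such a layout is to split a frequency-ordered alphabet across the five rows. This is a sketch assuming the commonly cited English frequency order beginning "etaoin…"; the prototype's actual letter placement isn't reproduced here:

```java
// Hypothetical reconstruction of a frequency-based layout: the alphabet in
// descending English letter frequency, chunked into 5 short rows so each row
// needs less rotation precision than QWERTY's 10-key top row.
class FrequencyLayout {
    static final String FREQ_ORDER = "etaoinshrdlcumwfgypbvkjxqz";

    static String[] buildRows(int rowCount) {
        String[] rows = new String[rowCount];
        int perRow = (FREQ_ORDER.length() + rowCount - 1) / rowCount; // ceiling division
        for (int r = 0; r < rowCount; r++) {
            int start = Math.min(r * perRow, FREQ_ORDER.length());
            int end = Math.min(start + perRow, FREQ_ORDER.length());
            rows[r] = FREQ_ORDER.substring(start, end);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : buildRows(5)) System.out.println(row);
    }
}
```

With 5 rows the widest row holds 6 letters rather than QWERTY's 10, which is what reduces the rotation accuracy needed per selection.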


Much higher speed and accuracy were achievable with this iteration of the prototype. I managed to increase my total speed by over 30% (258 seconds vs 352 seconds) and reduce my error rate by nearly 80% (4 errors vs 19 errors) in comparison to the previous gestural QWERTY prototype.
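The raw numbers can be sanity-checked directly (times and error counts are the ones reported above; the helper names are mine):

```java
// Checking the Prototype 3 metrics.
// Gestural QWERTY prototype: 352 s, 19 errors; frequency layout: 258 s, 4 errors.
class Metrics {
    // Completing the same task in less time = higher speed.
    static double speedIncreasePct(double oldSeconds, double newSeconds) {
        return (oldSeconds / newSeconds - 1) * 100;
    }

    static double errorReductionPct(int oldErrors, int newErrors) {
        return (1 - (double) newErrors / oldErrors) * 100;
    }

    public static void main(String[] args) {
        System.out.printf("speed increase: %.0f%%%n", speedIncreasePct(352, 258));  // ~36%
        System.out.printf("error reduction: %.0f%%%n", errorReductionPct(19, 4));   // ~79%
    }
}
```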


This improvement is due to the reduced accuracy and movement required to select each letter, thanks to the optimised keyboard layout and the reduced number of letters per row, though increased familiarity with the gestural interface could also be a factor.







My proposed gestural input method for text input in AR is a one-handed approach combined with a suitable keyboard layout. This layout takes frequently used letters and letter combinations into account and places them near the neutral hand position.


Each row is controlled by individual fingers and characters are selected by moving the specified finger to the palm of the hand. Space and backspace are controlled by moving the thumb to the index finger. Letter columns are selected by rotating the hand.
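The final mapping can be summarised as a small event model; a sketch in Java where the row letters and the way space is distinguished from backspace are illustrative assumptions, not the exact prototype behaviour:

```java
// One-handed concept: each finger owns a row, and curling it to the palm
// commits the column currently highlighted by hand rotation. The thumb
// touching the index finger handles space/backspace (how the two gestures
// are distinguished is an assumption here). Row letters are placeholders.
class OneHandTyping {
    static final String[] ROWS = {
        "etaoi",  // index finger row
        "nshrd",  // middle finger row
        "lcumw",  // ring finger row
        "fgy"     // pinky row: fewest, least common letters (hardest to curl)
    };

    // finger: 0 = index .. 3 = pinky; column was chosen earlier by rotation.
    static char commit(int finger, int column) {
        String row = ROWS[finger];
        return row.charAt(Math.min(column, row.length() - 1));
    }

    // Thumb-to-index gesture: space or backspace.
    static char thumbGesture(boolean forward) {
        return forward ? ' ' : '\b';
    }

    public static void main(String[] args) {
        System.out.println(commit(0, 0)); // index finger, leftmost column
    }
}
```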


Ergonomic testing showed that moving the pinky finger to the palm is not always a comfortable or easy action. The least commonly used letters have been placed on the pinky row, and the number of letters on that row has been reduced.


This concept would allow a user to type while sitting, standing or walking, with a single hand by their side. Predictive text would be combined with the concept to considerably increase typing speed and reduce required accuracy.






Chosen Input Modalities


Primary Input – Speech: Speech recognition software is becoming more effective at interpreting long strings of spoken language. When it comes to fast and effective text input, speech would perform well and be suitable for the majority of contexts.


Secondary Input – Gestural: For contexts where speech wouldn't be ideal, such as loud environments or situations where silence is required or socially expected; when inputting passwords or sensitive information; or when the user dislikes speech input and prefers gesture.


The gestural input would be interpreted through a wrist-worn wearable using technology such as CTRL-Kit. This would also capture fine motor information, such as movement, position and gesture, for other forms of interaction.