Speaking in gestures

Designing a gesture library using hand tracking

Doug Cook, Oct 2023

In the right setting, hand gestures can be an attractive alternative to on-screen interfaces, especially for XR and wearables. With advances in computer vision and machine learning, it is now easier than ever to train your own custom models to recognize these gestures.

Continuing our explorations into natural interactions, we recently set out to build a small library of gestures based on simple hand signals and finger tracking to aid our prototyping on partner projects.

ABC

Easy as 1, 2, 3

Google’s MediaPipe libraries provide a great starting point for prototyping these interactions. Specifically, MediaPipe’s Gesture Recognizer provides a quick and accessible set of models for categorizing hand gestures, identifying handedness, and locating 21 hand landmarks.

[Image: four hands held up, with blue and white nodes outlining the digits]

To do this, Google’s Gesture Recognizer uses two different models: a hand landmark model and a gesture classification model.

The landmark model detects the presence of hands and their geometry to identify palm and finger coordinates, while the classification model uses a two-step neural network to recognize gestures from that geometry.

That may sound like a lot, but out of the box, it just kind of works. In fact, a number of gestures turn out to be easy to recognize, and some are even recognized by default by Google’s own models, which makes the library great for design prototyping.

Identifying fingers and positions

Readers of our last post will remember MediaPipe’s landmark model. That model provides an array of 21 points, each of which is a 3D coordinate in a normalized coordinate system (i.e., screen-independent).
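
For reference, those 21 points come back in a fixed order: index 0 is the wrist, and each finger contributes four consecutive points running from its base joint to its tip. The enum below copies the names MediaPipe uses in its Python `mp.solutions.hands.HandLandmark` enum, redefined locally so the sketch runs without MediaPipe installed; `FINGER_CHAINS` is our own illustrative grouping.

```python
from enum import IntEnum


class HandLandmark(IntEnum):
    """Indices of the 21 hand landmarks, in the order the hand model returns them."""
    WRIST = 0
    THUMB_CMC = 1
    THUMB_MCP = 2
    THUMB_IP = 3
    THUMB_TIP = 4
    INDEX_FINGER_MCP = 5
    INDEX_FINGER_PIP = 6
    INDEX_FINGER_DIP = 7
    INDEX_FINGER_TIP = 8
    MIDDLE_FINGER_MCP = 9
    MIDDLE_FINGER_PIP = 10
    MIDDLE_FINGER_DIP = 11
    MIDDLE_FINGER_TIP = 12
    RING_FINGER_MCP = 13
    RING_FINGER_PIP = 14
    RING_FINGER_DIP = 15
    RING_FINGER_TIP = 16
    PINKY_MCP = 17
    PINKY_PIP = 18
    PINKY_DIP = 19
    PINKY_TIP = 20


# Each finger occupies four consecutive indices (base joint -> tip),
# so the chains can be derived instead of typed out by hand.
FINGER_CHAINS = {
    "thumb": list(range(1, 5)),
    "index": list(range(5, 9)),
    "middle": list(range(9, 13)),
    "ring": list(range(13, 17)),
    "pinky": list(range(17, 21)),
}
```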

Each landmark is composed of x, y, and z coordinates. x and y give the landmark’s position, normalized to the range 0.0 to 1.0 by the image width and height, while z represents the landmark’s depth relative to the wrist: the smaller the value, the closer the landmark is to the camera. Using these landmark coordinates, it’s possible to track and recognize a number of basic gestures.
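
A minimal sketch of putting these coordinates to work, assuming a simple stand-in `Landmark` type of our own (MediaPipe’s landmark objects expose the same `x`, `y`, and `z` fields): scaling a normalized point back to pixels for drawing, and finding the point nearest the camera. `to_pixels` and `nearest_to_camera` are hypothetical helpers, not MediaPipe functions.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Landmark:
    """One hand landmark: x/y normalized to [0, 1] by image size, z relative to the wrist."""
    x: float
    y: float
    z: float


def to_pixels(lm: Landmark, width: int, height: int) -> Tuple[int, int]:
    """Map normalized x/y onto a width x height frame (origin at top-left)."""
    # Clamp so points that drift slightly outside the frame still land on a valid pixel.
    px = min(max(int(lm.x * width), 0), width - 1)
    py = min(max(int(lm.y * height), 0), height - 1)
    return px, py


def nearest_to_camera(landmarks: Sequence[Landmark]) -> int:
    """Index of the landmark closest to the camera (smallest z)."""
    return min(range(len(landmarks)), key=lambda i: landmarks[i].z)
```

Because x and y are fractions of the frame, the same landmark maps correctly onto a small preview or a full-resolution capture; only the width and height passed in change.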

Recognizing signals and directions

Google’s gesture recognizer and default models can detect a number of basic hand signals, including open palm, closed fist, pointing up, thumbs up, thumbs down, victory, and I love you.
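
In the recognizer’s output, each gesture arrives as a category name (`Open_Palm`, `Closed_Fist`, `Pointing_Up`, `Thumb_Up`, `Thumb_Down`, `Victory`, `ILoveYou`, or `None` when nothing matches) with a confidence score. A minimal sketch of routing those results to prototype actions follows; the `ACTIONS` table, the `pick_action` helper, and the 0.6 threshold are our own illustrative choices, and the input is assumed to be plain (name, score) pairs rather than MediaPipe’s result objects.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical prototype actions, keyed by the default model's category names.
ACTIONS = {
    "Open_Palm": "pause",
    "Closed_Fist": "grab",
    "Pointing_Up": "select",
    "Thumb_Up": "confirm",
    "Thumb_Down": "cancel",
    "Victory": "next",
    "ILoveYou": "favorite",
}


def pick_action(categories: Sequence[Tuple[str, float]], min_score: float = 0.6) -> Optional[str]:
    """Return the action for the top-scoring gesture, or None if it is not confident enough."""
    best = max(categories, key=lambda c: c[1], default=None)
    if best is None:
        return None
    name, score = best
    if score < min_score:
        return None
    # "None" (no gesture) is absent from ACTIONS, so it maps to no action.
    return ACTIONS.get(name)
```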

Learning our numbers

In creating our gesture library, we thought a good next step would be to add hand-signaled numbers. Although numbers aren’t recognized by default, the number of raised fingers is easy to detect using MediaPipe’s default landmarks.

We determine how many fingers are up by checking the tip of each finger. If a fingertip falls below its finger’s central landmark (in image coordinates, where y increases downward, this means the tip’s y value is greater than that landmark’s), the finger is recognized as closed.
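
A minimal sketch of that check, assuming an upright hand, MediaPipe’s 21-point indexing, and the middle (PIP) joint as each finger’s central landmark (tips at 8, 12, 16, 20; PIP joints at 6, 10, 14, 18). The thumb bends sideways rather than down, so a y test is unreliable for it; for the thumb we substitute a distance heuristic of our own. `count_raised_fingers` and the stand-in `Landmark` type are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class Landmark(NamedTuple):
    x: float  # normalized, 0 = left edge
    y: float  # normalized, 0 = top edge, so y grows downward
    z: float = 0.0


# (tip, PIP joint) index pairs for the index, middle, ring, and pinky fingers.
FINGER_TIP_AND_PIP = [(8, 6), (12, 10), (16, 14), (20, 18)]
THUMB_TIP, THUMB_IP, PINKY_MCP = 4, 3, 17


def count_raised_fingers(hand: Sequence[Landmark]) -> int:
    """Count extended fingers in one hand's 21 landmarks."""
    raised = 0
    for tip, pip in FINGER_TIP_AND_PIP:
        # A tip above its middle joint (smaller y) counts as raised.
        if hand[tip].y < hand[pip].y:
            raised += 1

    def dist(a: int, b: int) -> float:
        return math.hypot(hand[a].x - hand[b].x, hand[a].y - hand[b].y)

    # Assumed thumb heuristic: extended if its tip is farther from the pinky base
    # than its IP joint is (a folded thumb tucks its tip in toward the palm).
    if dist(THUMB_TIP, PINKY_MCP) > dist(THUMB_IP, PINKY_MCP):
        raised += 1
    return raised
```

The y test assumes fingers point roughly upward; a hand tilted sideways would need the comparison made along each finger’s own direction instead.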

Adding the full alphabet

To extend our library, we thought it only natural to try adding support for American Sign Language (ASL), a natural language that serves as the primary sign language of deaf communities in the United States and Canada. ASL includes a set of 26 signs, known as the American Manual Alphabet, that can be used to fingerspell words from the English language.

Creating a model to recognize ASL is a bit more involved, but you can leverage other models or train your own custom model to use with MediaPipe.

Fortunately, ASL training data is readily available on the web, from custom image libraries to hand shape datasets to pre-built models. For our purposes, we decided to start with a subset of images from a larger training dataset of 87,000 images found on Kaggle.
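
Pulling a subset out of a large image dataset is easy to script if the images are stored one folder per label, a common layout for image-classification data (an assumption on our part here, not a description of the dataset’s exact structure). The hypothetical `sample_subset` helper below keeps up to `per_class` images from each label folder, using a fixed random seed so the subset is reproducible between training runs.

```python
import random
from pathlib import Path
from typing import Dict, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sample_subset(root: str, per_class: int, seed: int = 0) -> Dict[str, List[Path]]:
    """Pick up to per_class images from each label folder under root."""
    rng = random.Random(seed)  # fixed seed -> same subset on every run
    subset: Dict[str, List[Path]] = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Sort before shuffling so the starting order doesn't depend on the filesystem.
        images = sorted(p for p in class_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
        rng.shuffle(images)
        subset[class_dir.name] = images[:per_class]
    return subset
```

Capping every label at the same count also keeps the subset balanced, so no single sign dominates training.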

Using this image set, we were able to train a custom TensorFlow model for use with MediaPipe.

Another option would have been to use MediaPipe to capture images or landmark coordinates for each sign to create our own dataset. Either of these methods would have worked, and we explored both, but since larger datasets tend to yield better results, and the datasets on Kaggle were more than adequate, we ultimately decided not to reinvent the wheel.
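
For the landmark-coordinate route, one lightweight way to try the idea is to make each hand’s 21 (x, y) points independent of position and size, then label a new hand by its nearest class average. This nearest-centroid classifier is a swapped-in technique for illustration, not the TensorFlow model we trained; `to_feature`, `fit_centroids`, and `predict` are hypothetical helpers.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def to_feature(landmarks: Sequence[Point]) -> List[float]:
    """Flatten 21 (x, y) points into a wrist-relative, scale-normalized vector of 42 values."""
    wrist_x, wrist_y = landmarks[0]
    # Translate so the wrist is the origin: position in the frame no longer matters.
    rel = [(x - wrist_x, y - wrist_y) for x, y in landmarks]
    # Divide by the largest offset: apparent hand size no longer matters.
    scale = max(max(abs(dx), abs(dy)) for dx, dy in rel) or 1.0
    return [v / scale for dx, dy in rel for v in (dx, dy)]


def fit_centroids(samples: Dict[str, List[Sequence[Point]]]) -> Dict[str, List[float]]:
    """Average the feature vectors of each label's example hands."""
    centroids = {}
    for label, hands in samples.items():
        feats = [to_feature(hand) for hand in hands]
        centroids[label] = [sum(column) / len(feats) for column in zip(*feats)]
    return centroids


def predict(centroids: Dict[str, List[float]], landmarks: Sequence[Point]) -> str:
    """Return the label whose centroid is closest (Euclidean) to this hand's features."""
    feature = to_feature(landmarks)
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))
```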

We have a few more things in the works, but in the meantime, be sure to check out our first write-up on gesture-based interactions.

Have an idea or interested in learning more? Feel free to reach out to us on Instagram or Twitter!

Special thanks to Natalie Vanderveen, Jaden Flores, and Morgan Gerber

Doug Cook

FOUNDER AND PRINCIPAL

Doug is the founder of thirteen23. When he’s not providing strategic creative leadership on our engagements, he can be found practicing the time-honored art of getting out of the way.
