Tracing a gesture from a single chip

My final-year engineering project at CEA: reconstructing the 3D path of a simple, repeated hand gesture from nothing but a coin-sized inertial sensor, using deep learning. A CNN + bidirectional-LSTM network learns the motion, an error-state Kalman filter serves as the classical baseline, and a rig I built by hand supplies the ground-truth path.

host: CEA LIST · LISA lab
programme: UTC engineering · ARS
role: Final-year intern
period: 2019 → 2020
sensor: BNO055 IMU

The pipeline — short windows of raw accelerometer and gyroscope data from a BNO055 IMU feed a CNN + bidirectional-LSTM network that predicts the change in position (Δp) and orientation (Δq) per window; chaining those with quaternion math rebuilds the 3D path. A learned multi-task loss balances position against orientation during training, ground truth comes from a hand-built mirror-and-infrared rig (and a Google Tango phone), and an error-state Kalman filter serves as the classical baseline.

01 OVERVIEW

A camera can watch you move, but it needs line of sight, space and decent light. A tiny inertial sensor — the accelerometer-and-gyroscope chip that sits in every phone — has none of those needs: it's cheap, it fits anywhere, and it samples motion a hundred times a second. The catch is turning its raw readings into an actual path. You have to integrate noisy acceleration twice, and the tiniest bias snowballs into metres of error within seconds.
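To make the snowballing concrete, here is a minimal sketch (not the project's code) that double-integrates a constant accelerometer bias for a sensor sitting perfectly still. The bias value of 0.05 m/s² and the helper name `drift_from_bias` are illustrative assumptions, not measured figures for the BNO055.

```python
import numpy as np

def drift_from_bias(bias_ms2: float, duration_s: float, rate_hz: float = 100.0) -> float:
    """Position error (m) after double-integrating a constant accelerometer bias."""
    dt = 1.0 / rate_hz
    n = int(round(duration_s * rate_hz))
    accel = np.full(n, bias_ms2)         # sensor at rest: the only "signal" is the bias
    velocity = np.cumsum(accel) * dt     # first integration: error grows linearly
    position = np.cumsum(velocity) * dt  # second integration: error grows quadratically
    return float(position[-1])

# Assumed bias of 0.05 m/s^2 over 10 s at 100 Hz.
error_m = drift_from_bias(bias_ms2=0.05, duration_s=10.0)
```

The result follows the closed form 0.5 · b · t², so under these assumptions ten seconds of a small bias already puts the estimate roughly 2.5 m off.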

This was my final-year engineering project (projet de fin d'études) at CEA LIST, in the LISA lab for sensory and ambient interfaces, supervised by Pierre-Henri Orefice and Mehdi Boukallel. The brief was deliberately narrow: not a general motion tracker, but the 3D trajectory of one simple, repeated hand gesture — the kind you'd study in sport or rehabilitation — recovered from a single BNO055 IMU, using deep learning to swallow the sensor biases that classical methods have to model by hand.

I built two systems side by side. A classical baseline first — an error-state Kalman filter over the full six-degree-of-freedom kinematics — so I'd know what 'good' looked like. Then the deep-learning model: it takes short windows of raw acceleration and angular velocity and predicts the change in position and orientation between two instants; chain those small deltas with quaternion math and a trajectory falls out. Two convolutional branches read the accelerometer and gyroscope separately, a bidirectional LSTM ties the sequence together, and — the part I'm proudest of — a learned multi-task loss automatically balances the position error (in metres) against the orientation error (unit-less), so neither term drowns the other during training.

The hard, unglamorous part was ground truth: you can't train a model without knowing the real path. With no motion-capture lab — and then a COVID lockdown — I built my own rig at home. The sensor slid down an inclined mirror past three infrared distance sensors wired to an Arduino and a Raspberry Pi, which gave an exact reference path from basic physics. After lockdown I strapped the IMU to a Google Tango phone, whose visual-inertial tracking is accurate to a few centimetres, and time-aligned the two devices by cross-correlating their motion. The verdict was honest: orientation was predicted almost perfectly, but position is genuinely hard — accurate over short, mostly-1D motions and drifting on longer 2D ones. That limit of inertial-only odometry was exactly what the project set out to measure.

02 WHAT I BUILT

Deep inertial-odometry network

A CNN + bidirectional-LSTM that takes short windows of raw accelerometer and gyroscope data and predicts the change in position and orientation per window. Two convolutional branches read the two sensors separately before a BiLSTM fuses the sequence; chaining the per-window deltas with quaternion math reconstructs the full 3D path.
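The chaining step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes Hamilton quaternions stored as [w, x, y, z], that each Δp is expressed in the sensor frame at the start of its window, and that each Δq is the rotation over that window. The function names are hypothetical.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions in [w, x, y, z] order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def rotate(q, v):
    """Rotate vector v by unit quaternion q (computes q * v * q_conjugate)."""
    qv = np.array([0.0, v[0], v[1], v[2]])
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, qv), q_conj)[1:]

def chain(delta_p, delta_q):
    """Accumulate per-window deltas into a path, starting at the origin."""
    p = np.zeros(3)
    q = np.array([1.0, 0.0, 0.0, 0.0])
    path = [p.copy()]
    for dp, dq in zip(delta_p, delta_q):
        p = p + rotate(q, np.asarray(dp, dtype=float))  # move in the current heading
        q = quat_mul(q, np.asarray(dq, dtype=float))    # then apply the window's rotation
        q = q / np.linalg.norm(q)                       # renormalise against numeric drift
        path.append(p.copy())
    return np.array(path), q

# Toy input: four windows, each 1 m forward followed by a 90-degree turn about z.
yaw90 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
path, q_final = chain([[1.0, 0.0, 0.0]] * 4, [yaw90] * 4)
```

With this toy input the path traces a unit square and ends back at the origin, which is a quick sanity check that the frame conventions are consistent.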

Learned multi-task loss

Position error is measured in metres and orientation error is unit-less, so left alone, one swamps the other. A small learned layer weights the two terms by their own uncertainty, so the network balances translating well against rotating well, with no hand-tuned coefficients.
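One common way to write such an uncertainty-weighted loss gives each task a learned log-variance s and uses exp(-s) · L + s per task. The sketch below assumes that form; the exact formulation used in the project may differ. The task losses are made-up constants and the names `s_pos` and `s_rot` are hypothetical. In real training the network weights and the log-variances are updated together.

```python
import numpy as np

def multitask_loss(loss_pos, loss_rot, s_pos, s_rot):
    """Uncertainty-weighted sum; s_pos and s_rot are learned log-variances."""
    return (np.exp(-s_pos) * loss_pos + s_pos
            + np.exp(-s_rot) * loss_rot + s_rot)

def log_var_gradients(loss_pos, loss_rot, s_pos, s_rot):
    """Analytic gradient of the combined loss with respect to each log-variance."""
    return 1.0 - np.exp(-s_pos) * loss_pos, 1.0 - np.exp(-s_rot) * loss_rot

# Made-up task losses on very different scales, held fixed for the demonstration.
loss_pos, loss_rot = 0.04, 2.0
s_pos, s_rot = 0.0, 0.0
for _ in range(500):
    g_pos, g_rot = log_var_gradients(loss_pos, loss_rot, s_pos, s_rot)
    s_pos -= 0.1 * g_pos  # plain gradient descent on the log-variances
    s_rot -= 0.1 * g_rot

weighted_pos = np.exp(-s_pos) * loss_pos
weighted_rot = np.exp(-s_rot) * loss_rot
```

With the task losses held fixed, each log-variance settles at the log of its own loss, so both weighted terms end up near 1 even though the raw losses differ by a factor of fifty. The added s terms stop the weights from collapsing to zero.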

Error-state Kalman baseline

A full 6-DOF error-state Kalman filter over the kinematic and error models, with quaternion orientation, as the classical yardstick the learned model had to beat — and the honest reference for how far raw integration drifts.

A ground-truth rig, built by hand

Under lockdown, an inclined mirror plus three infrared rangefinders on an Arduino and Raspberry Pi gave an exact reference path from physics; afterwards, a Google Tango phone provided a few-centimetre visual-inertial reference. The two devices were time-synced by cross-correlating their motion signals.
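The time-alignment step can be sketched with a plain cross-correlation. This is a minimal illustration, not the project's code: it assumes both devices have already been resampled to the same rate and reduced to one comparable 1-D motion signal each (for example angular-rate magnitude, which is an assumption here). The signals below are synthetic.

```python
import numpy as np

def estimate_lag(reference, other, rate_hz):
    """Seconds by which `other` lags `reference` (positive means `other` is delayed)."""
    a = reference - reference.mean()
    b = other - other.mean()
    corr = np.correlate(b, a, mode="full")
    # In "full" mode, index len(a) - 1 corresponds to zero lag.
    lag_samples = int(np.argmax(corr)) - (len(a) - 1)
    return lag_samples / rate_hz

# Synthetic check: the second signal is the first one delayed by 25 samples.
rng = np.random.default_rng(0)
imu_signal = rng.standard_normal(1000)
tango_signal = np.concatenate([np.zeros(25), imu_signal[:-25]])
lag_s = estimate_lag(imu_signal, tango_signal, rate_hz=100.0)
```

Shifting one recording by the estimated lag puts both streams on a common clock before the Tango poses are used as training targets.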

2D methods, pushed to 3D

Re-implemented and extended state-of-the-art inertial-navigation methods (RoNIN, RIDI) from 2D navigation to full 3D, and validated the whole approach on the public OxIOD dataset before trusting it on my own recordings.

03 STACK

Method

CNN + BiLSTM · error-state Kalman filter · learned multi-task loss · 6-DOF odometry

Hardware

BNO055 IMU · Raspberry Pi · Arduino · IR rangefinders · Google Tango

Data · tools

Python · OxIOD dataset · quaternion kinematics