I happen to work in machine learning. You are most likely referring to the Stanford Gyrophone paper. The gyroscope on a typical smartphone samples at only around 200 Hz, so by the Nyquist theorem you can only recover frequency content below roughly 100 Hz, which misses most of the speech band.
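To put a number on the Nyquist point, here's a minimal sketch assuming that ~200 Hz sampling rate (the ballpark reported for the devices in the paper). Anything above fs/2 = 100 Hz doesn't just disappear, it folds back into the low band as an alias:

```python
# Minimal sketch of the Nyquist limit, assuming a ~200 Hz gyroscope
# sampling rate. Only frequencies below fs/2 = 100 Hz are captured
# faithfully; anything higher folds (aliases) down into that band.

fs = 200.0          # assumed gyroscope sampling rate in Hz
nyquist = fs / 2.0  # 100 Hz: highest faithfully recoverable frequency

def aliased_frequency(f, fs):
    """Frequency at which a tone of f Hz appears when sampled at fs Hz."""
    f_folded = f % fs
    return min(f_folded, fs - f_folded)

for f in [60.0, 150.0, 440.0]:  # 440 Hz is well inside the speech band
    print(f"{f:6.1f} Hz tone appears at {aliased_frequency(f, fs):6.1f} Hz "
          f"(Nyquist limit: {nyquist:.0f} Hz)")
```

A 60 Hz tone survives, but 150 Hz shows up at 50 Hz and 440 Hz at 40 Hz, which is why the raw signal is so degraded.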
No human could process or understand the recorded signals directly, so the researchers trained a machine learning model on the recorded samples, using a very limited vocabulary: just the digits 0 through 9 plus “oh”.
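For a sense of what that looks like, here's a toy sketch of the general recipe, not the paper's actual pipeline (they used more elaborate features and classifiers); the spectral_features helper and the placeholder data are purely illustrative:

```python
# Toy sketch of the general approach: classify short low-rate sensor
# clips into an 11-word vocabulary ("oh" plus the digits 0-9) from
# simple spectral features. Not the paper's exact method.

import numpy as np
from sklearn.svm import SVC

WORDS = ["oh"] + [str(d) for d in range(10)]  # 11-word vocabulary

def spectral_features(clip, n_bins=32):
    """Magnitude spectrum of a 1-D sensor clip, truncated to n_bins."""
    return np.abs(np.fft.rfft(clip))[:n_bins]

# Placeholder data: in the real study these would be labeled gyroscope
# recordings of each spoken word.
rng = np.random.default_rng(0)
clips = rng.normal(size=(110, 200))    # 110 clips, 200 samples each
labels = np.repeat(np.arange(11), 10)  # 10 clips per word

X = np.array([spectral_features(c) for c in clips])
clf = SVC().fit(X, labels)
print(WORDS[clf.predict(X[:1])[0]])    # predicted word for the first clip
```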
If the model was not trained on the particular speaker (which would require annotated training data for that speaker, almost impossible to obtain in the assumed eavesdropping scenario), the recognition rate was 26%. That beats the roughly 9% chance level for an 11-word vocabulary, but it is nowhere near usable transcription.
It’s a nice proof of concept, and perhaps a genuine worry if the CIA considers you a target, but otherwise it’s not happening.