Speechmatics advances recognition of accented English


Voice recognition has gone from practical to crucial in recent years as smart speakers and driver-assist systems have taken off, but not everyone’s voice is recognized equally well. Speechmatics claims to have the most inclusive and accurate model around, beating Amazon, Google and others on speech outside the most common American accents.

The company says it was drawn to the question of accuracy by a 2019 Stanford study, “Racial Disparities in Speech Recognition,” which found exactly that. Voice engines from Amazon, Apple, Google, IBM and Microsoft “exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers versus 0.19 for white speakers.” Not great!
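For reference, WER is the standard metric here: the number of word-level substitutions, deletions and insertions needed to turn a system’s transcript into the human reference, divided by the length of that reference. A minimal sketch of the usual computation (my own illustration, not the study’s tooling):

```python
# Word error rate (WER): word-level edit distance between a reference
# transcript and a hypothesis, normalized by the reference's length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # ~0.167
```

A WER of 0.35 thus means roughly one word in three is wrong, which is the gap the study flagged for Black speakers.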

Part of this disparity can be attributed to a lack of diversity in the datasets used to train these systems. After all, if there are few Black speakers in the data, the model will not learn those speech patterns as well. The same goes for speakers with other accents, dialects and so on – America (not to mention the UK) is full of accents, and any business claiming to offer its services to “everyone” should be aware of that.

At any rate, UK-based Speechmatics has made accuracy on accented English a priority for its latest model, and it claims to blow the others out of the water. Based on the same data sets used in the Stanford study (but run through the latest versions of the speech software), “Speechmatics recorded an overall accuracy of 82.8% for African American voices compared to Google (68.7%) and Amazon (68.6%),” the company wrote in its press release.

The company attributes this success to a relatively new approach to building voice recognition models. Traditionally, a machine learning system is supplied with labeled data – think an audio file of speech accompanied by metadata or a text file containing what is said, usually transcribed and verified by humans. For a cat detection algorithm, you would have images paired with data indicating which ones contain cats, where the cat is in each image and so on. This is supervised learning, in which a model learns the correlations between two forms of prepared data.
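As a toy illustration of that setup (hypothetical stand-in data, not Speechmatics’ actual pipeline), a supervised trainer consumes input–label pairs and penalizes the model wherever its prediction disagrees with the human-provided label:

```python
# Illustrative supervised-learning loop: every training example pairs
# acoustic features with a human-verified label, and the loss measures
# how far the model's prediction is from that label.
import torch
import torch.nn as nn

# Toy stand-ins: 100 clips, each a 40-dim feature vector, labeled
# with one of 10 "word" classes.
features = torch.randn(100, 40)
labels = torch.randint(0, 10, (100,))

model = nn.Linear(40, 10)  # trivially small "recognizer"
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)  # prediction vs. label
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```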

Speechmatics instead used self-supervised learning, a method that has gained momentum in recent years as datasets, learning efficiency and computing power have grown. In addition to labeled data, it ingests raw, unlabeled data – and much more of it – building its own “understanding” of speech with far less guidance.
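One common self-supervised recipe – masked prediction, popularized by wav2vec-style speech models – hides stretches of the audio and trains the model to reconstruct them from the surrounding context, so no transcripts are needed. Speechmatics hasn’t detailed its exact method here, but a rough sketch of the idea:

```python
# Sketch of a masked-prediction objective (wav2vec-style); the "label"
# is the audio itself, so unlabeled recordings suffice.
import torch
import torch.nn as nn

audio = torch.randn(100, 40)       # 100 frames of unlabeled features
mask = torch.rand(100) < 0.15      # hide ~15% of the frames
corrupted = audio.clone()
corrupted[mask] = 0.0

# A 1-D convolution over time, so each frame is predicted from its
# neighbors rather than from itself.
encoder = nn.Conv1d(in_channels=40, out_channels=40, kernel_size=5, padding=2)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for step in range(5):
    optimizer.zero_grad()
    reconstructed = encoder(corrupted.T.unsqueeze(0)).squeeze(0).T
    # Score only the masked frames: recover hidden audio from context.
    loss = ((reconstructed[mask] - audio[mask]) ** 2).mean()
    loss.backward()
    optimizer.step()
    print(f"step {step}: reconstruction loss {loss.item():.3f}")
```

The payoff is scale: transcription by humans is expensive, but raw audio is abundant, which is exactly the trade the next paragraph describes.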

In this case, the model was first trained on roughly 30,000 hours of labeled data to establish a basic level of understanding, then fed 1.1 million hours of publicly available audio from YouTube, podcasts and other content. Collecting audio this way is something of a gray area, as no one explicitly consented to their podcast being used to train someone’s commercial speech recognition engine. But it’s used that way by many, just as “the whole internet” was used to train OpenAI’s GPT-3, probably including thousands of my own articles. (Though it hasn’t mastered my unique voice yet.)

In addition to improved accuracy for Black American speakers, the Speechmatics model claims better transcription of children’s speech (around 92% accuracy versus around 83% for Google and Deepgram) and small but significant improvements for accented English from around the world: Indian, Filipino, Southern African and many more – even Scottish.

The company supports dozens of other languages and is competitive in many of them; this is not just an English recognition model. But given English’s use as a lingua franca (a comically unfit idiom these days), accents matter especially to it.

Speechmatics may be ahead on the metrics it cites, but the world of AI is moving at an incredible pace, and I wouldn’t be surprised to see further leaps in the next year. Google, for example, is working hard to make sure its engines work for people with speech impairments. Inclusion is a big part of all AI work these days, and it’s good to see companies trying to outdo one another at it.
