Enriched Speech for Effortless Listening

Block Diagram

Abstract

People usually speak without special effort when communicating under casual or typical circumstances. Speech produced this way is referred to as casual speech, and it tends to be less articulated. In contrast, clear speech is well articulated and requires increased speaking effort. Clear speech has been shown to increase intelligibility in many listening contexts, and voice talents receive special training to produce it. In general, however, speakers try to achieve maximum clarity with minimum effort, and the usual result is casual speech. Moreover, on many occasions speakers do not know whether their audience faces a communication barrier such as background noise or a hearing impairment. Converting casual speech to clear speech is therefore important, but also quite challenging. Using many examples of clear speech from various speakers, data-driven machine learning techniques are employed to first convert the casual speech to text and then convert the text to clear speech, without modifying the speaker's identity.
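
The two-stage conversion described in the abstract (casual speech to text, then text to clear speech) can be pictured as a simple composition of stages. The sketch below is illustrative only and is not the authors' implementation: the names `CasualToClearPipeline`, `recognize`, `speaker_of`, and `synthesize_clear` are hypothetical, and the separate speaker-identity step is an assumption about how identity might be carried across the text bottleneck.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A waveform is modelled here as a plain sequence of samples (assumption).
Waveform = Sequence[float]


@dataclass
class CasualToClearPipeline:
    """Hypothetical two-stage pipeline: speech -> text -> clear speech."""

    # Stage 1 (hypothetical): speech recognition, casual speech to text.
    recognize: Callable[[Waveform], str]
    # Assumed helper: extracts some speaker representation from the input
    # so the output can keep the same speaker identity.
    speaker_of: Callable[[Waveform], str]
    # Stage 2 (hypothetical): synthesis of clear-style speech from text,
    # conditioned on the speaker representation.
    synthesize_clear: Callable[[str, str], Waveform]

    def convert(self, casual: Waveform) -> Waveform:
        text = self.recognize(casual)        # casual speech -> text
        speaker = self.speaker_of(casual)    # keep track of who is speaking
        return self.synthesize_clear(text, speaker)  # text -> clear speech
```

Each stage is passed in as a callable so that any recognizer or synthesizer could be substituted; the sketch says nothing about which models the system actually uses.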

Publication
In ICASSP 2020 Show & Tell