A speech signal conveys both linguistic and paralinguistic information. Linguistic information comprises language- and dialect-related characteristics, whereas speaker timbre and prosody are paralinguistic. A typical voice conversion (VC) system converts a source speaker's utterance so that it is perceived as having been spoken by a specified target speaker. The objective of a VC system is to learn, through training, a mapping function that mimics the desired target speaker's voice. This is done by transforming speaker timbre and prosody while keeping the linguistic message of the utterance unchanged. Applications such as personalized Text-to-Speech (TTS) systems, entertainment, speaking assistance, and speech enhancement benefit from VC.

