Dipjyoti Paul

Ph.D. Student | Research Scientist | Machine Learning | Deep Learning | Audio Signal Processing | Computer Vision | Conversational AI

Computer Science Department, University of Crete

Biography

I am a PhD researcher at the Department of Computer Science, University of Crete, Greece, working under Dr. Yannis Stylianou and co-advised by Dr. Yannis Pantazis and Dr. Simon King. My research interests span the theory and applications of machine learning algorithms, especially deep learning. My research agenda is to establish a thorough understanding of the theoretical concepts and then apply them to real-world problems. My interests cover diverse areas such as Applied Probability Theory, Machine/Deep Learning, Computer Vision, Signal/Speech Processing, and Optimization Theory. I also have a broad interest in probabilistic machine learning methods and generative models.

Interests

Machine/Deep Learning
Audio Signal Processing
Speech Synthesis & Voice Conversion
Image Processing & Computer Vision
Spoofing & Anti-spoofing
Speaker Recognition

Education

PhD in Computer Science, 2018 – Present
University of Crete

MS in Electronics & Electrical Communication Engineering, 2014 – 2017
Indian Institute of Technology Kharagpur

BTech in Electronics & Communication Engineering, 2009 – 2013
West Bengal University of Technology

Skills

Python

PyTorch

TensorFlow

Spark

Linux

Docker

AWS

Statistics & Signal Processing

Machine/Deep Learning

Experience

Senior Research Scientist (Part-time)

Stealth Startup, London, UK (Remote)

Oct 2020 – Present Greece

Responsibilities include:

Development of voice morphing algorithms.
Development of Text-to-Speech (TTS) systems.
Working on improving speech quality from lower bandwidth and lower bit-depth samples.

ML/AI/Audio/CV Freelancer

Upwork

Oct 2017 – Sep 2020 Greece

My work ranged from generic task formulation and consulting to the full design-development-deployment cycle. In general, I helped identify and develop viable solutions for ML-based applications in areas such as audio and computer vision.

Visiting Researcher

Chania General Hospital, Greece

Sep 2020 – Sep 2020 Greece

Analysed voice pathology data and designed novel deep learning algorithms to enhance speech intelligibility.

Visiting Researcher

Voxygen, France

Mar 2019 – May 2019 France

Developed a pipeline for expressive text-to-speech synthesis using sequence-to-sequence learning.

Visiting Researcher

Foundation for Research and Technology - Hellas, Greece

Aug 2018 – Oct 2018 Greece

Developed new training algorithms for Generative Adversarial Networks (GANs).

Marie Skłodowska-Curie Fellow

ENRICH European Union’s Training Network (ETN).

Oct 2017 – Sep 2020 Greece

Introduced universal multi-speaker, multi-style expressive TTS systems.
Developed speech intelligibility enhancement algorithms in speech synthesis.
Analyzed the feasibility of incorporating variational disentangled representation learning in real-world scenarios.
Proposed novel voice morphing algorithms.

Junior Project Officer

Indian Institute of Technology, Kharagpur & Indian Space Research Organization (ISRO), Govt. of India.

Dec 2013 – Aug 2017 India

Built authentication systems that enhance the security of automatic speaker verification systems against intentional circumvention using fake audio recordings.

Selected Publications & Patents

A Universal Multi-Speaker Multi-Style Text-to-Speech via Disentangled Representation Learning based on Rényi Divergence Minimization

This paper presents a universal multi-speaker, multi-style Text-to-Speech (TTS) synthesis system which is able to generate speech from text with speaker characteristics and speaking style similar to a given reference signal.

Dipjyoti Paul, Sankar Mukherjee, Yannis Pantazis, Yannis Stylianou

Cumulant GAN

In this paper, we propose a novel loss function for training Generative Adversarial Networks (GANs) aiming towards deeper theoretical understanding as well as improved performance for the underlying optimization problem.

Yannis Pantazis, Dipjyoti Paul, Michail Fasoulakis, Yannis Stylianou, Markos Katsoulakis

Speech frame selection for spoofing detection with an application to partially spoofed audio-data

In this paper, we introduce a frame selection strategy for improved detection of spoofed speech.

A Kishore Kumar, Dipjyoti Paul, Monisankha Pal, Md Sahidullah, Goutam Saha

Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion

We propose a novel transfer learning approach using Tacotron- and WaveRNN-based TTS synthesis to provide high intelligibility gains in speech-shaped noise and competing-speaker noise.

Dipjyoti Paul, Muhammed PV Shifas, Yannis Pantazis, Yannis Stylianou

Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions

We propose a variant of WaveRNN, referred to as speaker conditional WaveRNN (SC-WaveRNN). We aim to develop an efficient universal vocoder that works even for unseen speakers and recording conditions.

Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou

Non-Parallel Voice Conversion Using Weighted Generative Adversarial Networks

We suggest a novel way to train a Generative Adversarial Network (GAN) for non-parallel, many-to-many voice conversion.

Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou

Training Generative Adversarial Networks with Weights

We propose a simple training variation where suitable weights are defined and assist the training of the Generator.

Yannis Pantazis, Dipjyoti Paul, Yannis Stylianou

Synthetic speech detection using fundamental frequency variation and spectral feature

In this paper, we focus on three major types of artifacts, related to magnitude, phase, and pitch variation, that are introduced during the generation of synthetic speech. We propose a new approach to detect synthetic speech using score-level fusion of three front-end features: constant Q cepstral coefficients (CQCCs), all-pole group delay function (APGDF), and fundamental frequency variation (FFV).

Monisankha Pal, Dipjyoti Paul, Goutam Saha

Generalization of Spoofing Countermeasures: a Case Study with ASVspoof 2015 and BTAS 2016 Corpora

This work investigates the generalization capability of spoofing countermeasures in restricted training conditions where speech from broad attack types is left out of the training database.

Dipjyoti Paul, Md Sahidullah, Goutam Saha

Spectral Features for Synthetic Speech Detection

In this paper, we first study the characteristics of synthetic speech vis-à-vis natural speech and then propose a set of novel short-term spectral features that can efficiently capture the discriminative information between them.

Dipjyoti Paul, Monisankha Pal, Goutam Saha

Enriched Speech for Effortless Listening

The present demo exhibits a system and method for casual-to-clear speech conversion in the context of speech intelligibility enhancement.

Dipjyoti Paul, Avashna Govender

System and method for automatic synthetic speech detection for speech based biometric authentication

The present invention discloses a system and method for automatic synthetic speech detection for speech-based biometric authentication.

Goutam Saha, Dipjyoti Paul, Monisankha Pal

Accomplishments

Invited Talk at Apple Siri.

Apple, UK Nov 2020

Presented my research work on Text-to-Speech Synthesis.

Public understanding event at the Royal Institution in London.

Royal Institution, London, UK Mar 2020

Presented my research work at the public understanding event.

Google’s Speech Technology Summit.

Google AI, London, UK May 2018

Invited to attend Google’s 3rd Speech Technology Summit at Google London.

Marie Skłodowska-Curie Fellowship

European Union. Oct 2017 – Sep 2020

Awarded a Marie Skłodowska-Curie Fellowship for 2017–2020 by the European Union’s training network (ETN).

Travel Grant.

IIT Kharagpur, India Apr 2017

Received full financial assistance to present a research paper at ICASSP 2017, held in New Orleans, Louisiana, USA.

Reviewer (Journals and Conferences)

Apr 2016 – Aug 2021

IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE Journal of Selected Topics in Signal Processing
IEEE Access
Computer Speech and Language
International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Conference of the International Speech Communication Association (INTERSPEECH)
Spoken Language Technology (SLT)

Research Fellowship

Indian Space Research Organization (ISRO), Govt. of India. Apr 2014 – Jun 2017

Awarded a fellowship for 2014–2017 by the Indian Space Research Organization (ISRO), Govt. of India.

Projects

Image Translation
Image Generation
Text-to-Speech Synthesis
Voice Conversion
Spoofing Countermeasure

Contact

dipjyotipaul@csd.uoc.gr
+30 6944360873
Office H304, Department of Computer Science, University of Crete, Heraklion, Crete 70013