Key Responsibilities:

Train mel-spectrogram and vocoder models for speech synthesis
Measure and benchmark model performance across use cases
Maintain and enhance text-to-speech evaluation systems
Analyze model accuracy and bias and recommend improvements
Improve processes related to speech data preparation, augmentation, and filtering
Develop and refine training datasets for speech models
Characterize performance and quality metrics across different platforms
Collaborate with cross-functional teams to deliver new product features
Participate in code development, design reviews, and test planning
Identify issues, propose solutions, and contribute to continuous innovation

Required Qualifications:

Master’s degree or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, Applied Mathematics, Linguistics, or Computational Linguistics, or equivalent experience
Minimum of 5 years of relevant experience
Strong programming skills in Python
Solid understanding of programming fundamentals and software design
Deep knowledge of machine learning and deep learning techniques, including CNNs, RNNs, LSTMs, and Transformers
Experience applying deep learning to speech synthesis, large language models, and speech-to-speech translation
Hands-on experience with speech technologies such as speech synthesis and voice cloning
Experience training speech models
Proficiency with the PyTorch deep learning framework
Knowledge of speech signal processing techniques, including FFT, MFCC, and mel spectrograms
Familiarity with version control tools such as Git, Gerrit, or GitLab
Strong collaboration and communication skills in a matrixed environment

Preferred Qualifications:

Fluency in one or more languages such as Spanish, Mandarin, German, Japanese, Russian, French, Arabic, Hindi, Korean, Italian, or Portuguese
Experience with multilingual or code-switched text-to-speech systems
Experience with voice cloning and cross-lingual voice cloning
Knowledge of text normalization and inverse text normalization using neural networks or WFSTs
Experience working with grapheme-to-phoneme systems for multiple languages
Interest in linguistics, phonetics, and language technologies
Strong C++ programming skills
Familiarity with GPU technologies such as CUDA, cuDNN, or TensorRT
Experience deploying machine learning models to cloud, data center, or embedded systems

Vailexa: Faster Solutions, Smarter Talent
