Job Category: IT
Duration: 6+ Months

Key Responsibilities:

  • Train mel spectrogram and vocoder models for speech synthesis
  • Measure and benchmark model performance across use cases
  • Maintain and enhance text-to-speech evaluation systems
  • Analyze model accuracy and bias and recommend improvements
  • Improve processes related to speech data preparation, augmentation, and filtering
  • Develop and refine training datasets for speech models
  • Characterize performance and quality metrics across different platforms
  • Collaborate with cross-functional teams to deliver new product features
  • Participate in code development, design reviews, and test planning
  • Identify issues, propose solutions, and contribute to continuous innovation

Required Qualifications:

  • Master’s degree or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, Applied Mathematics, Linguistics, or Computational Linguistics, or equivalent experience
  • Minimum of 5 years of relevant experience
  • Strong programming skills in Python
  • Solid understanding of programming fundamentals and software design
  • Deep knowledge of machine learning and deep learning techniques including CNN, RNN, LSTM, and Transformers
  • Experience applying deep learning to speech synthesis, large language models, and speech-to-speech translation
  • Hands-on experience with speech technologies such as speech synthesis and voice cloning
  • Experience training speech models
  • Proficiency with the PyTorch deep learning framework
  • Knowledge of speech signal processing techniques including FFT, MFCC, and mel spectrograms
  • Familiarity with version control tools such as Git, Gerrit, or GitLab
  • Strong collaboration and communication skills in a matrixed environment

Preferred Qualifications:

  • Fluency in one or more languages such as Spanish, Mandarin, German, Japanese, Russian, French, Arabic, Hindi, Korean, Italian, or Portuguese
  • Experience with multilingual or code-switched text-to-speech systems
  • Experience with voice cloning and cross-lingual voice cloning
  • Knowledge of text normalization and inverse text normalization using neural networks or WFST
  • Experience working with grapheme-to-phoneme systems for multiple languages
  • Interest in linguistics, phonetics, and language technologies
  • Strong C++ programming skills
  • Familiarity with GPU technologies such as CUDA, cuDNN, or TensorRT
  • Experience deploying machine learning models to cloud, data center, or embedded systems

 
