Research Engineer – Audio & Speech Models

Zyphra·California, US

Posted 2928w ago

Full-Time

About the Role

As a Research Engineer – Audio & Speech Models, you will be a core contributor on Zyphra’s Audio Team, building the next generation of open-source text-to-speech and audio models. You will be deeply involved in the entire model training process, from data gathering and processing to designing novel architectures and training methodologies.

You’ll work across:

  • Large-scale audio training runs
  • Performance optimization of our training stack
  • Audio dataset collection, processing, and evaluation
  • Architecture and training methodology ablations and improvements

Requirements

  • Strong research taste and intuition
  • The ability to carry a research project from conception through execution to write-up
  • Strong implementation and prototyping ability (can take an idea from conception to experimentation quickly)
  • The ability to work well with others in a fast-paced research setting
  • The ability to learn new fields rapidly, and excitement about implementing new ideas
  • Excellent communication and collaboration skills, and the ability to work effectively on both research and engineering implementation at scale
  • Proficiency with PyTorch and Python
  • Experience contributing to large pre-existing codebases and rapidly getting up to speed

Nice-to-haves

  • Expertise and intuition for training models in the audio domain, including text-to-speech, ASR, speech-to-speech, speech emotion recognition, or other models
  • Experience in training audio autoencoders
  • Understanding of signal processing, especially of audio signals
  • Experience with diffusion models, consistency models, or GANs
  • Experience with training on large-scale (multi-node) GPU clusters
  • Strong grasp of proper experimental methodology for running rigorous ablations and other hypothesis testing
  • Understanding of and interest in large-scale, highly parallel data processing pipelines
  • Previously published machine learning research in well-respected venues
  • Postgraduate degree in a scientific subject (Computer Science, EE/EECS, Mathematics, Physics, Machine Learning)

Benefits

  • Comprehensive medical, dental, vision, and FSA plans
  • Competitive compensation and 401(k)
  • Relocation and immigration support on a case-by-case basis
  • On-site meals prepared by a dedicated culinary team; Thursday Happy Hours
  • In-person team in Palo Alto, CA, with a collaborative, high-energy environment
