Research Engineer – Audio & Speech Models
Zyphra·California, US
Posted 2928w ago
Full-Time
About the position
As a Research Engineer - Audio & Speech Models, you will be a core contributor on Zyphra’s Audio Team, building the next generation of open-source text-to-speech and audio models. You will be deeply involved in the entire model training process, from data gathering and processing to designing novel architectures and training methodologies.
You’ll work across:
• Large-scale audio training runs
• Performance optimization of our training stack
• Audio dataset collection, processing, and evaluation
• Architecture and training methodology ablations and improvements
Requirements
• Strong research taste and intuition.
• The ability to work through a research project from conception to execution to write-up.
• Strong implementation and prototyping ability (can take an idea from conception to experimentation quickly).
• The ability to work well with others in a fast-paced research setting.
• The ability to rapidly learn new fields, and excitement about implementing new ideas.
• Excellent communication and collaboration skills, and the ability to work effectively on both research and engineering implementation at scale.
• Proficiency with PyTorch and Python.
• Experience contributing to large pre-existing codebases and rapidly getting up to speed.
Nice-to-haves
• Expertise and intuition for training models in the audio domain, including text-to-speech, ASR, speech-to-speech, speech-emotion-recognition, or other models.
• Experience in training audio autoencoders.
• Understanding of signal processing, especially of audio signals.
• Experience with diffusion models, consistency models, or GANs.
• Experience with training on large-scale (multi-node) GPU clusters.
• Strong grasp of proper experimental methodology for running rigorous ablations and other hypothesis testing.
• Understanding of and interest in large-scale, highly parallel data processing pipelines.
• Previously published machine learning research in well-respected venues.
• Postgraduate degree in a scientific subject (Computer Science, EE/EECS, Mathematics, Physics, Machine Learning)
Benefits
• Comprehensive medical, dental, vision, and FSA plans
• Competitive compensation and 401(k)
• Relocation and immigration support on a case-by-case basis
• On-site meals prepared by a dedicated culinary team; Thursday Happy Hours
• In-person team in Palo Alto, CA, with a collaborative, high-energy environment