Voice Recognition Engineer – Browser-Based Speech Interfaces
New York Technology Partners · US · Remote Friendly
Posted 1w ago
Full-Time · USD 90,000–120,000
About the Role
Senior Technical Recruiter/Trainer @ New York Technology Partners | Resume Writer
Position Type: Contract
Location: Remote
Key Responsibilities
• Develop and optimize voice recognition functionality across Chrome, Edge, Safari, Firefox, and Brave.
• Ensure consistent performance, compatibility, and user experience across desktop, laptop, mobile, and tablet environments.
• Customize and extend the Web Speech API and integrate third‑party speech frameworks, including (but not limited to):
    • ElevenLabs (Scribe)
    • Deepgram
    • OpenAI Whisper API
    • Amazon Transcribe / Polly
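As an illustration of the integration work described above, here is a minimal TypeScript sketch that checks whether a browser exposes the Web Speech API and otherwise falls back to a hosted speech-to-text service. The names `EngineKind`, `SpeechGlobals`, and `pickEngine` are hypothetical and not part of any library; only the `SpeechRecognition` and `webkitSpeechRecognition` globals come from browsers.

```typescript
// Hypothetical helper: decides which recognition path to use.
type EngineKind = "web-speech" | "remote-stt";

// The two global constructor names under which browsers expose the
// Web Speech API recognizer (unprefixed and WebKit-prefixed).
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

function pickEngine(globals: SpeechGlobals): EngineKind {
  const ctor = globals.SpeechRecognition ?? globals.webkitSpeechRecognition;
  // No constructor means the caller should stream audio to a hosted
  // service (Deepgram, Whisper, and so on) instead.
  return typeof ctor === "function" ? "web-speech" : "remote-stt";
}
```

In a page this would be called as `pickEngine(window as unknown as SpeechGlobals)`. The presence of a constructor does not guarantee a working recognizer in every browser, so a real implementation would likely add a runtime check as well.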
Performance, Accuracy & Resilience
• Optimize recognition speed, accuracy, and robustness, especially in noisy or low‑bandwidth environments.
• Conduct benchmarking and tuning for real‑world usage scenarios across diverse accents, languages, and acoustic conditions.
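Benchmarking of this kind is commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the recognizer output, divided by the reference length. The sketch below is one simple way to compute it and is an assumption about method, not a description of the team's tooling. `tokenize` and `wordErrorRate` are hypothetical names, and the whitespace tokenizer is deliberately naive.

```typescript
// Lowercase and split on whitespace. Real benchmarks usually also
// normalize punctuation and numerals; that is omitted here.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed with a two-row Levenshtein distance over words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    // Convention chosen for this sketch: empty vs. empty is a perfect match.
    return hyp.length === 0 ? 0 : 1;
  }
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

Running the same reference set through each engine and comparing WER per accent or noise condition gives a like-for-like accuracy figure.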
User Experience & Accessibility
• Collaborate with product and design teams to build intuitive, inclusive voice interactions.
• Support configurable speech duration thresholds and accessibility best practices for users with varying abilities.
• Partner with technical leads and product managers to align voice capabilities with product roadmap.
• Support client‑facing pilots, demos, and proof‑of‑concept initiatives.
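The "configurable speech duration thresholds" mentioned above could look something like the following sketch: an endpoint detector whose pause length and maximum utterance length are settings, so a user who speaks slowly or with long pauses is not cut off. `EndpointConfig` and `EndpointDetector` are hypothetical names, and the timestamps are assumed to come from whatever voice-activity signal the application already has.

```typescript
interface EndpointConfig {
  silenceMs: number;      // pause length that ends an utterance
  maxUtteranceMs: number; // hard cap on a single utterance
}

class EndpointDetector {
  private config: EndpointConfig;
  private startedAt: number | null = null;
  private lastSpeechAt: number | null = null;

  constructor(config: EndpointConfig) {
    this.config = config;
  }

  // Call whenever speech activity is observed (for example on an
  // interim recognition result or a voice-activity callback).
  onSpeech(nowMs: number): void {
    if (this.startedAt === null) this.startedAt = nowMs;
    this.lastSpeechAt = nowMs;
  }

  // True once the speaker has paused long enough or hit the cap.
  isComplete(nowMs: number): boolean {
    if (this.startedAt === null || this.lastSpeechAt === null) return false;
    const silentFor = nowMs - this.lastSpeechAt;
    const runningFor = nowMs - this.startedAt;
    return (
      silentFor >= this.config.silenceMs ||
      runningFor >= this.config.maxUtteranceMs
    );
  }

  reset(): void {
    this.startedAt = null;
    this.lastSpeechAt = null;
  }
}
```

An accessibility setting might raise `silenceMs` from, say, 1500 to 4000 for users who need more time between words, without any change to the detection logic.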
Ideal Candidate Profile
• API Tailor: Deep familiarity with Web Speech API and at least one major commercial speech‑to‑text platform.
• Accuracy‑Focused: Passionate about refining speech models for real‑world reliability, speed, and multilingual performance.
• Collaborative Partner: Communicates effectively with cross‑functional teams (engineering, product, UX).
• Innovative Builder: Enjoys prototyping, problem‑solving, and elevating voice interaction beyond basic transcription.
Required Qualifications
• Hands‑on experience with the Web Speech API plus at least one commercial speech framework.
• Experience implementing custom logic for error handling, timeout management, speech completion detection, and multilingual support.
• 3+ years of experience in speech recognition, voice UI, or audio processing.
• Demonstrated work with Web Speech API and at least one of the following: ElevenLabs, AssemblyAI, Deepgram, OpenAI Whisper, Google Cloud STT, Azure Speech, or Amazon Transcribe.
• Understanding of latency, privacy, and security considerations in client‑side voice processing.
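For a sense of what custom error handling can involve, the sketch below maps the error codes reported by the Web Speech API's `SpeechRecognitionErrorEvent` to a recovery action. The codes are those defined by the Web Speech API; the `RecoveryAction` type, the `recoveryFor` function, and the policy itself are hypothetical and shown only as one plausible arrangement.

```typescript
type RecoveryAction = "retry" | "prompt-user" | "switch-engine" | "ignore";

// Hypothetical policy: which action to take for each error code.
function recoveryFor(errorCode: string): RecoveryAction {
  switch (errorCode) {
    case "no-speech": // nothing was heard before the recognizer gave up
    case "network":   // transient connectivity problem
      return "retry";
    case "not-allowed":   // microphone permission denied
    case "audio-capture": // no usable microphone
      return "prompt-user";
    case "service-not-allowed":
    case "language-not-supported":
      return "switch-engine"; // try a hosted speech-to-text service
    case "aborted": // the application stopped recognition itself
      return "ignore";
    default:
      // Unknown codes are treated as unrecoverable for this engine.
      return "switch-engine";
  }
}
```

In a browser this would typically be wired up as `recognition.onerror = (event) => handle(recoveryFor(event.error))`, where `handle` is the application's own (here hypothetical) dispatcher.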
Preferred Qualifications
• Experience with WebRTC, MediaRecorder API, or AudioContext.
• Background in natural language understanding (NLU) or voice assistant development.
• Contributions to open‑source speech or accessibility projects.
Seniority Level
Mid‑Senior level
Employment Type
Contract
Job Function
Information Technology and Engineering
Industries
Research Services