
Voice Recognition Engineer – Browser-Based Speech Interfaces

New York Technology Partners · US · Remote Friendly

Posted 1w ago

Full-Time · USD 90,000–120,000

About the Role

Position Type: Contract
Location: Remote

Key Responsibilities
• Develop and optimize voice recognition functionality across Chrome, Edge, Safari, Firefox, and Brave.
• Ensure consistent performance, compatibility, and user experience across desktop, laptop, mobile, and tablet environments.
• Customize and extend the Web Speech API and integrate third‑party speech frameworks, including (but not limited to):
  • ElevenLabs (Scribe)
  • Deepgram
  • OpenAI Whisper API
  • Amazon Transcribe / Polly

Performance, Accuracy & Resilience
• Optimize recognition speed, accuracy, and robustness, especially in noisy or low‑bandwidth environments.
• Conduct benchmarking and tuning for real‑world usage scenarios across diverse accents, languages, and acoustic conditions.

User Experience & Accessibility
• Collaborate with product and design teams to build intuitive, inclusive voice interactions.
• Support configurable speech duration thresholds and accessibility best practices for users with varying abilities.
• Partner with technical leads and product managers to align voice capabilities with the product roadmap.
• Support client‑facing pilots, demos, and proof‑of‑concept initiatives.

Ideal Candidate Profile
• API Tailor: Deep familiarity with the Web Speech API and at least one major commercial speech‑to‑text platform.
• Accuracy‑Focused: Passionate about refining speech models for real‑world reliability, speed, and multilingual performance.
• Collaborative Partner: Communicates effectively with cross‑functional teams (engineering, product, UX).
• Innovative Builder: Enjoys prototyping, problem‑solving, and elevating voice interaction beyond basic transcription.

Required Qualifications
• Hands‑on experience with the Web Speech API and at least one other commercial speech framework.
• Ability to implement custom logic for error handling, timeout management, speech completion detection, and multilingual support.
• At least 3 years of experience in speech recognition, voice UI, or audio processing.
• Demonstrated work with the Web Speech API and at least one of the following: ElevenLabs, AssemblyAI, Deepgram, OpenAI Whisper, Google Cloud STT, Azure Speech, or Amazon Transcribe.
• Understanding of latency, privacy, and security considerations in client‑side voice processing.

Preferred Qualifications
• Experience with WebRTC, MediaRecorder API, or AudioContext.
• Background in natural language understanding (NLU) or voice assistant development.
• Contributions to open‑source speech or accessibility projects.

Seniority Level: Mid‑Senior level
Employment Type: Contract
Job Function: Information Technology and Engineering
Industries: Research Services

What you'll do

Develop and optimize voice recognition functionality across Chrome, Edge, Safari, Firefox, and Brave
Ensure consistent performance, compatibility, and user experience across desktop, laptop, mobile, and tablet environments
Customize and extend the Web Speech API and integrate third‑party speech frameworks, including (but not limited to):
ElevenLabs (Scribe)
Deepgram
OpenAI Whisper API
Amazon Transcribe / Polly
Optimize recognition speed, accuracy, and robustness, especially in noisy or low‑bandwidth environments
Conduct benchmarking and tuning for real‑world usage scenarios across diverse accents, languages, and acoustic conditions
Support client‑facing pilots, demos, and proof‑of‑concept initiatives

User Experience & Accessibility

Collaborate with product and design teams to build intuitive, inclusive voice interactions
Support configurable speech duration thresholds and accessibility best practices for users with varying abilities
Partner with technical leads and product managers to align voice capabilities with the product roadmap

Requirements

API Tailor: Deep familiarity with Web Speech API and at least one major commercial speech‑to‑text platform
Accuracy‑Focused: Passionate about refining speech models for real‑world reliability, speed, and multilingual performance
Collaborative Partner: Communicates effectively with cross‑functional teams (engineering, product, UX)
Innovative Builder: Enjoys prototyping, problem‑solving, and elevating voice interaction beyond basic transcription
Hands‑on experience with the Web Speech API and at least one other commercial speech framework
Ability to implement custom logic for error handling, timeout management, speech completion detection, and multilingual support
At least 3 years of experience in speech recognition, voice UI, or audio processing
Demonstrated work with Web Speech API and at least one of the following: ElevenLabs, AssemblyAI, Deepgram, OpenAI Whisper, Google Cloud STT, Azure Speech, or Amazon Transcribe
Understanding of latency, privacy, and security considerations in client‑side voice processing
