Voice Recognition Engineer – Browser-Based Speech Interfaces

New York Technology Partners · US · Remote Friendly

Posted 1w ago

Full-Time · USD 90,000–120,000

About the Role

Position Type: Contract
Location: Remote

Preferred Qualifications

  • Experience with WebRTC, MediaRecorder API, or AudioContext
  • Background in natural language understanding (NLU) or voice assistant development
  • Contributions to open‑source speech or accessibility projects

Seniority Level: Mid‑Senior level
Employment Type: Contract
Job Function: Information Technology and Engineering
Industries: Research Services

What you'll do

  • Develop and optimize voice recognition functionality across Chrome, Edge, Safari, Firefox, and Brave
  • Ensure consistent performance, compatibility, and user experience across desktop, laptop, mobile, and tablet environments
  • Customize and extend the Web Speech API and integrate third‑party speech frameworks, including (but not limited to):
      • ElevenLabs (Scribe)
      • Deepgram
      • OpenAI Whisper API
      • Amazon Transcribe / Polly
  • Optimize recognition speed, accuracy, and robustness, especially in noisy or low‑bandwidth environments
  • Conduct benchmarking and tuning for real‑world usage scenarios across diverse accents, languages, and acoustic conditions
  • Support client‑facing pilots, demos, and proof‑of‑concept initiatives

Requirements

User Experience & Accessibility

  • Collaborate with product and design teams to build intuitive, inclusive voice interactions
  • Support configurable speech duration thresholds and accessibility best practices for users with varying abilities
  • Partner with technical leads and product managers to align voice capabilities with the product roadmap

Ideal Candidate Profile

  • API Tailor: Deep familiarity with the Web Speech API and at least one major commercial speech‑to‑text platform
  • Accuracy‑Focused: Passionate about refining speech models for real‑world reliability, speed, and multilingual performance
  • Collaborative Partner: Communicates effectively with cross‑functional teams (engineering, product, UX)
  • Innovative Builder: Enjoys prototyping, problem‑solving, and elevating voice interaction beyond basic transcription

Required Qualifications

  • Hands‑on experience with the Web Speech API and at least one other commercial speech framework
  • Ability to implement custom logic for error handling, timeout management, speech completion detection, and multilingual support
  • At least 3 years of experience in speech recognition, voice UI, or audio processing
  • Demonstrated work with the Web Speech API and at least one of the following: ElevenLabs, AssemblyAI, Deepgram, OpenAI Whisper, Google Cloud STT, Azure Speech, or Amazon Transcribe
  • Understanding of latency, privacy, and security considerations in client‑side voice processing