Whisper Description

We have developed and are open-sourcing Whisper, a neural network that approaches human-level robustness and accuracy in English speech recognition. Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual, multitask supervised data collected from the internet. Training on such a large and diverse dataset improves robustness to accents, background noise, technical language, and other linguistic variation. It also enables transcription in multiple languages, as well as translation from those languages into English. We provide open-source models and inference code to help you build useful applications and to support further research on robust speech processing. The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. The input audio is split into 30-second chunks, each chunk is converted into a log-Mel spectrogram, and the spectrogram is then passed into the encoder.
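The chunking step described above can be illustrated with a minimal sketch. It is not Whisper's own code: the function name split_into_chunks, the 16 kHz sample rate, and the zero-padding of the final chunk are assumptions made for illustration, and the log-Mel spectrogram conversion is left out.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000  # assumption: samples per second (not stated in the description)
CHUNK_SECONDS = 30    # from the description: audio is split into 30-second chunks


def split_into_chunks(samples: Sequence[float],
                      sample_rate: int = SAMPLE_RATE,
                      chunk_seconds: int = CHUNK_SECONDS) -> List[List[float]]:
    """Split mono audio samples into fixed-length chunks.

    The final chunk is zero-padded (an assumption of this sketch) so that
    every chunk has the same length.
    """
    chunk_len = sample_rate * chunk_seconds
    chunks: List[List[float]] = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # Pad the last chunk with silence up to the fixed chunk length.
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    # 70 seconds of silence as stand-in audio.
    audio = [0.0] * (SAMPLE_RATE * 70)
    chunks = split_into_chunks(audio)
    print(len(chunks), len(chunks[0]))
```

In a real pipeline each fixed-length chunk would next be converted into a log-Mel spectrogram before being fed to the encoder; that step depends on a signal-processing library and is omitted here.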

Integrations

API:
Yes, Whisper has an API

Company Details

Company: OpenAI
Headquarters: United States
Website: openai.com/blog/whisper/

Product Details

Platforms: SaaS
Type of Training: Documentation, Webinars, Videos
Customer Support: Online

Whisper Features and Options

Speech Recognition Software

Audio Capture
Automatic Form Fill
Automatic Transcription
Call Analysis
Concatenated Speech
Continuous Speech
Customizable Macros
Multi-Languages
Specialty Vocabularies
Speech-to-Text Analysis
Variable Frequency
Voice Recognition

Transcription Software

AI / Machine Learning
Annotations
Audio/Video File Upload
Automatic Transcription
Collaboration Tools
File Sharing
For Manual Transcription
Full Text Search
Multi-Language Support
Natural Language Processing (NLP)
Playback Controls
Speech Recognition
Subtitles
Text Editor
Timecoding
