Speech Recognition API: Transcribe Audio to Text
What is the Speech Recognition API?
The Speech Recognition API performs automatic speech recognition (ASR): it transcribes audio files into text. It supports multiple languages and can return start and end timestamps for each segment of speech.
Use Cases
- Transcription Services: Convert meetings, interviews, and lectures to text
- Subtitles & Captions: Generate subtitles for videos with timestamps
- Voice Commands: Process voice input for applications
- Content Indexing: Make audio content searchable
- Accessibility: Create text versions of audio content
API Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| audio | string | URL to the audio file to transcribe |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | string | — | Language code (e.g., en, zh, ja, es) |
| ignore_timestamps | boolean | true | Set to false to get precise timestamps |
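As a rough illustration of how these parameters fit together, the sketch below assembles a request body before it is sent. The helper name `buildTranscriptionBody` and its `timestamps` option are hypothetical conveniences, not part of the API; the `model` value is taken from the sample request in this article.

```javascript
// Hypothetical helper (not part of the Hypereal API): builds the JSON body
// from the parameters documented in the tables above.
function buildTranscriptionBody({ audioUrl, language, timestamps = false }) {
  // new URL() throws a TypeError on malformed input, so bad URLs fail early.
  const url = new URL(audioUrl);
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`audio must be an http(s) URL, got ${url.protocol}`);
  }

  const body = {
    model: 'audio-asr',             // value used in this article's sample request
    audio: url.href,                // required
    ignore_timestamps: !timestamps, // the API default is true (no timestamps)
  };

  // language is optional, so only include it when the caller provides one.
  if (language) body.language = language;
  return body;
}

console.log(buildTranscriptionBody({
  audioUrl: 'https://example.com/speech-recording.mp3',
  language: 'en',
  timestamps: true,
}));
```

Inverting the flag in one place keeps call sites readable, since `ignore_timestamps: false` is easy to misread.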
Pricing
| Usage | Price (USD) | Credits |
|---|---|---|
| Per minute of audio | $0.006 | ~1 |
Based on $0.36 per audio hour.
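For budgeting, the rate above can be turned into a quick estimate. This is a minimal sketch that assumes cost scales linearly with audio duration; the article does not say whether billing rounds up to whole minutes, so treat the result as approximate. The function name `estimateCostUsd` is illustrative.

```javascript
// Rate quoted in the pricing table: $0.36 per audio hour ($0.006 per minute).
const PRICE_PER_AUDIO_HOUR_USD = 0.36;

// Illustrative estimate only. Assumes linear, per-second proration,
// which this article does not confirm.
function estimateCostUsd(durationSeconds) {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError('durationSeconds must be a non-negative number');
  }
  return (durationSeconds / 3600) * PRICE_PER_AUDIO_HOUR_USD;
}

// Estimate for a 90-minute recording, formatted to cents.
console.log(estimateCostUsd(90 * 60).toFixed(2));
```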
How to Use Speech Recognition API
Step 1: Create an Account
Sign up at Hypereal to get started.
Step 2: Get Your API Key
Generate your API key from the dashboard.
Step 3: Make Your API Call
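// Request a transcription with timestamps enabled
const response = await fetch('https://api.hypereal.com/v1/audio/generate', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'audio-asr',
    audio: 'https://example.com/speech-recording.mp3',
    language: 'en',
    ignore_timestamps: false
  })
});

// Fail early on HTTP errors instead of treating an error body as a transcript
if (!response.ok) {
  throw new Error(`Transcription request failed: ${response.status}`);
}

const result = await response.json();
console.log(result.text);     // Full transcript
console.log(result.segments); // With timestamps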
Response Format
{
  "text": "Hello, welcome to our presentation today.",
  "duration": 5.2,
  "segments": [
    { "text": "Hello,", "start": 0.0, "end": 0.8 },
    { "text": "welcome to our presentation today.", "start": 0.9, "end": 5.2 }
  ]
}
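Since subtitles are one of the listed use cases, here is a sketch of how the `segments` array in this response could be converted to SubRip (SRT) text. It assumes `start` and `end` are in seconds, as the sample values suggest; the helper names `formatSrtTime` and `segmentsToSrt` are illustrative and not part of the API.

```javascript
// Convert seconds (e.g. 5.2) to the SRT timestamp form HH:MM:SS,mmm.
function formatSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build one numbered SRT cue per segment, separated by blank lines.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${formatSrtTime(seg.start)} --> ${formatSrtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n');
}

// Segments copied from the sample response above.
const sampleSegments = [
  { text: 'Hello,', start: 0.0, end: 0.8 },
  { text: 'welcome to our presentation today.', start: 0.9, end: 5.2 },
];
console.log(segmentsToSrt(sampleSegments));
```

Rounding to whole milliseconds before splitting into fields avoids floating-point artifacts such as a cue ending at `5,199`.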
Best Practices
- Specify language - Providing the language code improves accuracy
- Audio quality matters - Clear audio produces better transcriptions
- Use timestamps wisely - Enable timestamps only when needed (adds latency for short audio)
- Supported formats - Use MP3, WAV, M4A, or FLAC for best compatibility
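The format recommendation can be checked on the client before a request is made. The sketch below only inspects the file extension in the URL path, which is a heuristic: it says nothing about the actual encoding, and the article does not state that other formats are rejected. The helper name `hasRecommendedExtension` is hypothetical.

```javascript
// Formats recommended in the best practices above.
const RECOMMENDED_EXTENSIONS = new Set(['mp3', 'wav', 'm4a', 'flac']);

// Heuristic check based on the URL path only; query strings are ignored.
function hasRecommendedExtension(audioUrl) {
  const { pathname } = new URL(audioUrl);
  const dot = pathname.lastIndexOf('.');
  if (dot === -1) return false; // no extension at all
  return RECOMMENDED_EXTENSIONS.has(pathname.slice(dot + 1).toLowerCase());
}

console.log(hasRecommendedExtension('https://example.com/speech-recording.mp3'));
console.log(hasRecommendedExtension('https://example.com/clip.ogg'));
```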
Supported Languages
The API supports multiple languages including:
- English (en)
- Chinese (zh)
- Japanese (ja)
- Spanish (es)
- French (fr)
- German (de)
- And many more
FAQ
What is the maximum audio length?
There's no hard limit. Long audio files are processed in segments.
How accurate is the transcription?
Accuracy depends on audio quality and clarity. Clear speech typically achieves 95%+ accuracy.
Can I get word-level timestamps?
Set ignore_timestamps: false to receive timestamps. The response format above shows them at the segment level (a start and end time per segment), not per word.
Why Choose Hypereal?
Access Speech Recognition and 100+ other AI models through a single, unified API.
- One API key for all models
- Unified billing across providers
- Competitive pricing with volume discounts
Get Started Free - No credit card required.
