Speech Recognition API: Transcribe Audio to Text

What is the Speech Recognition API?

The Speech Recognition API (ASR - Automatic Speech Recognition) transcribes audio files into text. It supports multiple languages and can provide precise timestamps for each segment of speech.

Use Cases

Transcription Services: Convert meetings, interviews, and lectures to text
Subtitles & Captions: Generate subtitles for videos with timestamps
Voice Commands: Process voice input for applications
Content Indexing: Make audio content searchable
Accessibility: Create text versions of audio content

API Parameters

Required Parameters

Parameter	Type	Description
`audio`	string	URL to the audio file to transcribe

Optional Parameters

Parameter	Type	Default	Description
`language`	string	—	Language code (e.g., `en`, `zh`, `ja`, `es`)
`ignore_timestamps`	boolean	`true`	Set to `false` to get precise timestamps
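
The parameters above can be assembled into a request body with a small helper. This is a minimal sketch, not an official client: the `buildTranscriptionBody` name and its `timestamps` option are hypothetical, and the `model: 'audio-asr'` field is taken from the request example in this guide rather than from the parameter tables.

```javascript
// Hypothetical helper: builds the JSON body for a transcription request.
// Field names (audio, language, ignore_timestamps) come from the tables above;
// the model value mirrors the request example in this guide.
function buildTranscriptionBody(audioUrl, options = {}) {
  if (typeof audioUrl !== 'string' || audioUrl.length === 0) {
    throw new TypeError('audio must be a non-empty URL string');
  }
  // new URL() throws a TypeError when the string is not a valid absolute URL.
  new URL(audioUrl);

  const body = { model: 'audio-asr', audio: audioUrl };
  if (options.language !== undefined) {
    body.language = options.language;
  }
  // ignore_timestamps defaults to true per the table above, so it is only
  // sent when the caller explicitly asks for timestamps.
  if (options.timestamps === true) {
    body.ignore_timestamps = false;
  }
  return body;
}

console.log(JSON.stringify(buildTranscriptionBody(
  'https://example.com/speech-recording.mp3',
  { language: 'en', timestamps: true }
)));
```

Omitting the optional fields when they are not needed keeps the request aligned with the documented defaults.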

Pricing

Usage	Price (USD)	Credits
Per minute of audio	$0.006	~1

Based on $0.36 per audio hour.
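
The per-minute rate makes cost estimates straightforward. The sketch below assumes billing is strictly proportional to audio duration; any rounding or minimum charge is not covered in this guide, and the `estimateCostUsd` helper is hypothetical.

```javascript
// Rate from the pricing table: $0.006 per minute of audio.
const PRICE_PER_MINUTE_USD = 0.006;

// Hypothetical helper: estimates cost from a duration in seconds.
// Assumes cost scales linearly with duration (rounding rules are not documented here).
function estimateCostUsd(durationSeconds) {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError('durationSeconds must be a non-negative number');
  }
  return (durationSeconds / 60) * PRICE_PER_MINUTE_USD;
}

console.log(estimateCostUsd(3600).toFixed(2));    // one hour of audio -> "0.36"
console.log(estimateCostUsd(90 * 60).toFixed(2)); // a 90-minute recording -> "0.54"
```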

How to Use Speech Recognition API

Step 1: Create an Account

Step 2: Get Your API Key

Generate your API key from the dashboard.

Step 3: Make Your API Call

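// Send the transcription request (top-level await requires an ES module).
const response = await fetch('https://api.hypereal.com/v1/audio/generate', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'audio-asr',
    audio: 'https://example.com/speech-recording.mp3',
    language: 'en',
    ignore_timestamps: false
  })
});

// Stop on HTTP errors instead of parsing an error body as a transcript.
if (!response.ok) {
  throw new Error(`Transcription request failed: HTTP ${response.status}`);
}

const result = await response.json();
console.log(result.text);     // Full transcript
console.log(result.segments); // Segments with start/end timestamps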
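
For repeated calls, the request above can be wrapped in a reusable function. The following is a sketch under stated assumptions: the `transcribe` name and the injectable `fetchImpl` parameter are hypothetical, the endpoint and model name are copied from the example above, and the format of error responses is not documented in this guide, so the sketch surfaces them as raw text.

```javascript
// Endpoint and model name are copied from the request example above.
const ENDPOINT = 'https://api.hypereal.com/v1/audio/generate';

// Hypothetical wrapper. fetchImpl is injectable so the function can be
// exercised without network access; it defaults to the global fetch.
async function transcribe(audioUrl, apiKey, fetchImpl = fetch) {
  const response = await fetchImpl(ENDPOINT, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'audio-asr',
      audio: audioUrl,
      ignore_timestamps: false
    })
  });

  if (!response.ok) {
    // The error body format is an unknown here, so it is read as plain text.
    const detail = await response.text();
    throw new Error(`Transcription failed (HTTP ${response.status}): ${detail}`);
  }
  return response.json();
}
```

A call might look like `transcribe('https://example.com/speech-recording.mp3', process.env.HYPEREAL_API_KEY)`, where the environment variable name is an assumption; reading the key from the environment keeps it out of source code.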

Response Format

{
  "text": "Hello, welcome to our presentation today.",
  "duration": 5.2,
  "segments": [
    { "text": "Hello,", "start": 0.0, "end": 0.8 },
    { "text": "welcome to our presentation today.", "start": 0.9, "end": 5.2 }
  ]
}
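
For the subtitle use case, the `segments` array maps naturally onto SubRip (SRT) cues. The sketch below assumes `start` and `end` are in seconds, as the sample response suggests; the `toSrtTime` and `segmentsToSrt` helpers are hypothetical names.

```javascript
// Formats a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Turns the segments array into SRT cues, numbered from 1 and
// separated by a blank line.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n');
}

// Sample data copied from the response format above.
const example = {
  segments: [
    { text: 'Hello,', start: 0.0, end: 0.8 },
    { text: 'welcome to our presentation today.', start: 0.9, end: 5.2 }
  ]
};
console.log(segmentsToSrt(example.segments));
```

Converting to integer milliseconds first avoids floating-point drift when splitting the time into hours, minutes, and seconds.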

Best Practices

Specify language - Providing the language code improves accuracy
Audio quality matters - Clear audio produces better transcriptions
Use timestamps wisely - Enable timestamps only when needed (adds latency for short audio)
Supported formats - Use MP3, WAV, M4A, or FLAC for best compatibility
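
The format recommendation can be checked before a request is sent. This is a rough, hypothetical pre-flight check that looks only at the file extension in the URL path; it cannot detect the actual encoding, and a URL without an extension may still point to a valid file.

```javascript
// Formats listed in the best practices above.
const RECOMMENDED_EXTENSIONS = ['mp3', 'wav', 'm4a', 'flac'];

// Hypothetical check: returns true when the URL path ends in a recommended
// extension. Query strings are ignored because only the pathname is inspected.
function hasRecommendedExtension(audioUrl) {
  const path = new URL(audioUrl).pathname;
  const dot = path.lastIndexOf('.');
  if (dot === -1) return false;
  return RECOMMENDED_EXTENSIONS.includes(path.slice(dot + 1).toLowerCase());
}

console.log(hasRecommendedExtension('https://example.com/speech-recording.mp3'));
console.log(hasRecommendedExtension('https://example.com/clip.ogg'));
```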

Supported Languages

The API supports multiple languages including:

English (en)
Chinese (zh)
Japanese (ja)
Spanish (es)
French (fr)
German (de)
And many more

FAQ

What is the maximum audio length?

There's no hard limit. Long audio files are processed in segments.

How accurate is the transcription?

Accuracy depends on audio quality and clarity. Clear speech typically achieves 95%+ accuracy.

Can I get word-level timestamps?

Set `ignore_timestamps: false` to receive timestamps. As the Response Format example shows, they are reported per segment (a phrase or sentence), not per word.
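
Segment-level timestamps are enough to make a transcript searchable by time. The sketch below, with a hypothetical `findMentions` helper, returns the segments that contain a keyword along with their start and end times, assuming the `segments` shape from the Response Format section.

```javascript
// Hypothetical helper: case-insensitive keyword search over transcript segments.
// Returns one entry per matching segment, with its time range.
function findMentions(segments, keyword) {
  const needle = keyword.toLowerCase();
  return segments
    .filter((seg) => seg.text.toLowerCase().includes(needle))
    .map((seg) => ({ start: seg.start, end: seg.end, text: seg.text }));
}

// Sample data copied from the Response Format section.
const sampleSegments = [
  { text: 'Hello,', start: 0.0, end: 0.8 },
  { text: 'welcome to our presentation today.', start: 0.9, end: 5.2 }
];
console.log(findMentions(sampleSegments, 'Presentation'));
```

Because matches are per segment, the returned times locate the phrase containing the keyword rather than the exact word.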

Why Choose Hypereal?

Access Speech Recognition and 100+ other AI models through a single, unified API.

One API key for all models
Unified billing across providers
Competitive pricing with volume discounts

Get Started Free - No credit card required.
