Arabic speech,
recognized. Every dialect.
Voicexa transcribes Arabic in real time across Khaleeji, MSA, Egyptian, and Levantine dialects, with speaker diarization, custom vocabulary, and on-premises deployment.
All major Arabic dialects, one API
Live Transcription
Voicexa Stream
Detected Dialect
Streaming Transcript
Speaker 1
السلام عليكم، كيف أقدر أساعدك اليوم؟
How Voicexa Works
From microphone to transcript in four steps, all over a single API.
Send Audio
Stream audio over WebSocket in real time, or upload a file. Voicexa accepts MP3, WAV, FLAC, OGG, and M4A from the browser, mobile, or your backend.
AI Transcribes
The acoustic model decodes speech with sub-second latency. Speaker diarization, punctuation, and named entities are added inline as the audio comes in.
Dialect Detected
Voicexa identifies the regional dialect on the fly (Khaleeji, MSA, Egyptian, or Levantine) and adapts the language model so the transcript matches how it was actually spoken.
Get Text
Receive the transcript with word-level timestamps, speaker labels, and confidence scores. Stream it live, store it in history, or pipe it straight into your stack.
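As an illustration of what word-level output makes possible, the sketch below groups consecutive words from the same speaker into turns. The `Word` fields (`text`, `start`, `end`, `speaker`, `confidence`) are hypothetical names chosen for this example and are not taken from the Voicexa response schema; check the API reference for the real field names.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    # Hypothetical shape of one transcribed word (assumed, not the real schema).
    text: str
    start: float       # seconds from the start of the audio
    end: float         # seconds from the start of the audio
    speaker: str       # speaker label, e.g. "Speaker 1"
    confidence: float  # 0.0 to 1.0


def speaker_turns(words):
    """Collapse consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.speaker):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0].start,
            "end": group[-1].end,
            "text": " ".join(w.text for w in group),
            # The weakest word is a simple flag for turns worth reviewing.
            "min_confidence": min(w.confidence for w in group),
        })
    return turns


words = [
    Word("السلام", 0.00, 0.42, "Speaker 1", 0.97),
    Word("عليكم", 0.42, 0.80, "Speaker 1", 0.95),
    Word("وعليكم", 1.10, 1.55, "Speaker 2", 0.91),
    Word("السلام", 1.55, 1.98, "Speaker 2", 0.93),
]
turns = speaker_turns(words)
```

Keeping the lowest confidence per turn is one possible design choice: it lets a reviewer jump to the turns most likely to contain an error without reading every word score.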
Built for the way Arabic is actually spoken.
Real dialects, real code-switching, real accuracy on the audio that your customers, agents, and citizens are producing today.
Every Major Arabic Dialect
Voicexa is trained on one of the largest Gulf Arabic speech datasets ever assembled, and extends across MSA, Egyptian, and Levantine. Code-switching with English and Hindi is handled inline, with no toggle required.
Real-Time Diarization
Two speakers, ten speakers, an entire call centre queue: Voicexa attributes every utterance to the right speaker as the audio streams in.
Agent
Speaker 1, 51% airtime
Customer
Speaker 2, 49% airtime
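Airtime figures like the ones above can be derived from diarized segments. The following is a minimal sketch, assuming each segment is a `(speaker, start_ms, end_ms)` tuple; that tuple layout is an assumption made for this example, not the Voicexa output format.

```python
from collections import defaultdict


def airtime_share(segments):
    """Return each speaker's share of total speaking time, in whole percent."""
    totals = defaultdict(float)
    for speaker, start_ms, end_ms in segments:
        totals[speaker] += end_ms - start_ms
    spoken = sum(totals.values())
    if spoken == 0:
        return {}  # no speech detected, so there is nothing to apportion
    return {speaker: round(100 * total / spoken)
            for speaker, total in totals.items()}


# Hypothetical ten-second call with two diarized segments.
segments = [
    ("Speaker 1", 0, 5100),
    ("Speaker 2", 5100, 10000),
]
shares = airtime_share(segments)
```

Shares are computed against total speaking time rather than call duration, so silence does not count toward either speaker. Because each value is rounded independently, the percentages may not always sum to exactly 100.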
Sub-500ms Streaming
Real-time captions for live broadcasts, agent assist, and voice interfaces with latency that feels instant.
280ms
Median latency
A simple, well-documented API
REST and WebSocket endpoints, official Python and JavaScript SDKs, and on-premises deployment for regulated workloads, with a visible badge for data residency.
See exactly what Voicexa does.
From the live console to the analytics view, every screen in this section is the actual product.
One console for every transcription job.
The Voicexa overview gives operators a single view of streaming sessions, batch jobs, and recent transcripts, with usage trends, error rates, and queue status updating live.
- Live status of every streaming session and batch job
- Usage by hour, day, and project for billing and capacity
- Per-job latency and word error rate at a glance
- Quick links to recent transcripts and saved presets

Type-as-you-speak Arabic transcription.
Stream from a microphone or any audio source and watch the transcript build word by word, with speaker labels, punctuation, and confidence scores arriving inline.
- Word-by-word streaming with sub-500ms latency
- Speaker diarization for multi-party conversations
- Inline punctuation, casing, and named entity tagging
- Save, export, or pipe directly into your downstream system

Pick the language model, or let it pick itself.
Voicexa ships with a family of language models tuned for Khaleeji, MSA, Egyptian, and Levantine Arabic. Auto-detect routes audio to the right model, or pin a model per project.
- Auto-detect dialect from the first seconds of audio
- Manual model selection per project or per session
- Custom vocabulary and named entities per workspace
- Code-switching with English and Hindi handled inline

Searchable archive of every transcript.
Every session is saved with audio, transcript, speaker labels, and metadata. Full-text search and filters make it trivial to find a specific call, meeting, or broadcast moment.
- Full-text search across every transcript in your workspace
- Filter by date, speaker, dialect, project, or tag
- Side-by-side audio playback synced to the transcript
- Bulk export as JSON, SRT, VTT, or plain text
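
For readers unfamiliar with subtitle formats, this sketch shows how speaker-labelled segments map onto SRT cues. It illustrates the format only and is not Voicexa's exporter; the segment keys (`start`, `end`, `speaker`, `text`) are assumed names.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}"
        )
    return "\n\n".join(blocks) + "\n"


# Hypothetical segment, using the greeting from the live demo.
segments = [
    {"start": 0.0, "end": 2.5, "speaker": "Speaker 1",
     "text": "السلام عليكم، كيف أقدر أساعدك اليوم؟"},
]
srt_text = to_srt(segments)
```

VTT output would follow the same structure, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.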

REST and WebSocket, with first-class SDKs.
The Voicexa API is built around two endpoints: a streaming WebSocket for real-time audio and a REST endpoint for batch files. Official Python and JavaScript SDKs cover the rest.
- WebSocket streaming with backpressure and reconnection
- REST batch endpoint with webhook callbacks
- API keys with per-key quotas and rate limits
- Live request logs with replay for debugging
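
Reconnection on a streaming connection is commonly paced with exponential backoff. The sketch below computes such a delay schedule; it is a generic illustration, not the policy the Voicexa SDKs actually use, and the parameter names and default values are assumptions.

```python
import random


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=6,
                   jitter=0.0, rng=random.random):
    """Return the wait, in seconds, before each reconnection attempt.

    Each delay grows by `factor` and is capped at `cap`. When `jitter` is
    above zero, up to that fraction of the delay is added at random so that
    many clients do not reconnect at the same instant.
    """
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        delay += delay * jitter * rng()
        delays.append(delay)
    return delays


schedule = backoff_delays()                      # deterministic, no jitter
long_schedule = backoff_delays(attempts=8)       # later attempts hit the cap
jittered = backoff_delays(jitter=0.2)            # each delay up to 20% longer
```

A client would sleep for each delay in turn before retrying the connection, and reset to the start of the schedule once a connection succeeds.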

Audio analytics, not just transcripts.
Voicexa tracks accuracy, latency, throughput, and cost across every project. Spot regressions, capacity ceilings, and the projects driving spend before they become problems.
- Word error rate trends per dialect and per project
- Latency distribution across streaming sessions
- Throughput, concurrency, and queue depth in real time
- Cost breakdown by project, model, and team member
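
Word error rate is conventionally defined as the word-level edit distance between a reference transcript and the recognized text, divided by the number of reference words. The sketch below implements that standard definition; how Voicexa normalizes text before scoring (for example diacritics or punctuation) is not stated here, so this version simply splits on whitespace.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
wer = word_error_rate("كيف أقدر أساعدك اليوم", "كيف أقدر أساعد اليوم")
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.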

From contact centres to public sector archives.
Voicexa runs in the workloads where Arabic transcription has to be accurate, fast, and on the record. Same engine, different deployment.
Every Arabic call, transcribed and searchable.
GCC contact centres deploy Voicexa to transcribe every customer call across Khaleeji and MSA, in real time, with speaker diarization. Quality teams move from sampling 2 percent of calls to reviewing 100 percent, with full-text search across the entire archive.
- 100 percent call coverage instead of 2 percent sampling
- Live agent assist with on-screen prompts during the call
- Full-text search across every recorded conversation
Public meetings, parliamentary sessions, on the record.
Government entities use Voicexa to transcribe parliamentary debates, public hearings, and citizen service centre calls, on premises and inside their own data residency boundary. Speaker attribution and bilingual export are built in for the official record.
- On-premises deployment inside national data residency
- Speaker attribution for the official meeting record
- Bilingual Arabic and English export for archives
Ready to transcribe Arabic
the way it is actually spoken?
Request API Access
Cloud or on-premises. Real-time and batch. Every major Arabic dialect, one engine.