Arabic Speech Recognition Engine

Arabic speech, recognized.
Every dialect.

Voicexa transcribes Arabic in real time across Khaleeji, MSA, Egyptian, and Levantine dialects, with speaker diarization, custom vocabulary, and on-premises deployment.

All major Arabic dialects, one API

Request API Access · See How It Works

Live Transcription

Voicexa Stream

Recording

Detected Dialect

Streaming Transcript

Speaker 1

السلام عليكم، كيف أقدر أساعدك اليوم؟

Latency 280ms · Speaker Diarization On

How Voicexa Works

From microphone to transcript in four steps, all over a single API.

Send Audio

Stream audio over WebSocket in real time, or upload a file. Voicexa accepts MP3, WAV, FLAC, OGG, and M4A from the browser, mobile, or your backend.
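Before uploading a file, a client might check it against the formats listed above. This is a minimal sketch: the format list comes from this page, but the helper name `isAcceptedAudio` and the extension-based check are illustrative assumptions, not part of the Voicexa SDKs.

```javascript
// Formats named on this page; checking by file extension is an assumption.
const ACCEPTED_FORMATS = new Set(['mp3', 'wav', 'flac', 'ogg', 'm4a']);

// Returns true when the file name ends in one of the accepted extensions.
function isAcceptedAudio(fileName) {
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return false; // no extension at all
  return ACCEPTED_FORMATS.has(fileName.slice(dot + 1).toLowerCase());
}
```

A check like this only catches obvious mistakes early; the service remains the authority on what it can decode.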

AI Transcribes

The acoustic model decodes speech with sub-second latency. Speaker diarization, punctuation, and named entities are added inline as the audio comes in.

Dialect Detected

Voicexa identifies the regional dialect on the fly (Khaleeji, MSA, Egyptian, or Levantine) and adapts the language model so the transcript matches how it was actually spoken.

Get Text

Receive the transcript with word-level timestamps, speaker labels, and confidence scores. Stream it live, store it in history, or pipe it straight into your stack.
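To show how word-level output could feed a downstream system, here is a sketch that folds consecutive words from the same speaker into turns. The word shape `{ word, start, end, speaker, confidence }` is a hypothetical stand-in; the actual response schema is not shown on this page.

```javascript
// Hypothetical word shape: { word, start, end, speaker, confidence }.
// Groups consecutive words from the same speaker into a single turn.
function groupBySpeaker(words) {
  const turns = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speaker) {
      last.words.push(w.word);
      last.end = w.end;
      last.minConfidence = Math.min(last.minConfidence, w.confidence);
    } else {
      turns.push({
        speaker: w.speaker,
        start: w.start,
        end: w.end,
        words: [w.word],
        minConfidence: w.confidence,
      });
    }
  }
  // Replace the word array with a joined text field.
  return turns.map(({ words: ws, ...rest }) => ({ ...rest, text: ws.join(' ') }));
}
```

Keeping the lowest confidence per turn is one simple way to flag turns that may need human review.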

Built for the way Arabic is actually spoken.

Real dialects, real code-switching, real accuracy on the audio that your customers, agents, and citizens are producing today.

Every Major Arabic Dialect

Voicexa is trained on one of the largest Gulf Arabic speech datasets ever assembled, and extends across MSA, Egyptian, and Levantine. Code-switching with English and Hindi is handled inline, with no toggle required.

Khaleeji · MSA · Egyptian · Levantine

Bahraini · Saudi · Emirati · Kuwaiti · Qatari · Omani

Real-Time Diarization

Two speakers, ten speakers, an entire call centre queue: Voicexa attributes every utterance to the right speaker as the audio streams in.

Agent

Speaker 1, 51% airtime

Customer

Speaker 2, 49% airtime
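Airtime figures like the ones above can be derived from diarized segments. The sketch below assumes a hypothetical segment shape `{ speaker, start, end }` with times in seconds; it is not the product's own calculation.

```javascript
// Hypothetical segment shape: { speaker, start, end } with times in seconds.
// Returns each speaker's share of total speaking time as a rounded percentage.
function airtimeShare(segments) {
  const totals = new Map();
  let all = 0;
  for (const { speaker, start, end } of segments) {
    const duration = Math.max(0, end - start); // ignore malformed segments
    totals.set(speaker, (totals.get(speaker) || 0) + duration);
    all += duration;
  }
  const shares = {};
  for (const [speaker, duration] of totals) {
    shares[speaker] = all === 0 ? 0 : Math.round((duration / all) * 100);
  }
  return shares;
}
```

Shares are measured against total speaking time, so silence is excluded and overlapping speech counts toward both speakers.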

Sub-500ms Streaming

Real-time captions for live broadcasts, agent assist, and voice interfaces with latency that feels instant.

280ms

Median latency
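A median is a useful latency summary because a few slow responses do not skew it. If you collect your own latency samples in milliseconds (for example, the time between sending an audio chunk and receiving its text, which is an assumed way of measuring), the median could be computed like this:

```javascript
// Returns the median of an array of latency samples in milliseconds,
// or null when there are no samples.
function medianLatency(samplesMs) {
  if (samplesMs.length === 0) return null;
  const sorted = [...samplesMs].sort((a, b) => a - b); // copy, then numeric sort
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```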

A simple, well-documented API

REST and WebSocket endpoints, official Python and JavaScript SDKs, and on-premises deployment for regulated workloads with data residency requirements.

// Stream Arabic audio
const ws = new WebSocket('wss://api.voicexa.io/v1/stream');
ws.onmessage = (e) => {
  const { text, speaker, dialect } = JSON.parse(e.data);
};

GCC Data Residency

Inside the platform

See exactly what Voicexa does.

From the live console to the analytics view, every screen in this section is the actual product.

Console

One console for every transcription job.

The Voicexa overview gives operators a single view of streaming sessions, batch jobs, and recent transcripts, with usage trends, error rates, and queue status updating live.

Live status of every streaming session and batch job
Usage by hour, day, and project for billing and capacity
Per-job latency and word error rate at a glance
Quick links to recent transcripts and saved presets

Voicexa overview console with live sessions, usage trends, and recent transcripts

Live Transcribe

Type-as-you-speak Arabic transcription.

Stream from a microphone or any audio source and watch the transcript build word by word, with speaker labels, punctuation, and confidence scores arriving inline.

Word-by-word streaming with sub-500ms latency
Speaker diarization for multi-party conversations
Inline punctuation, casing, and named entity tagging
Save, export, or pipe directly into your downstream system

Real-time transcription view with streaming Arabic words and speaker labels

Dialects

Pick the language model, or let it pick itself.

Voicexa ships with a family of language models tuned for Khaleeji, MSA, Egyptian, and Levantine Arabic. Auto-detect routes audio to the right model, or pin a model per project.

Auto-detect dialect from the first seconds of audio
Manual model selection per project or per session
Custom vocabulary and named entities per workspace
Code-switching with English and Hindi handled inline
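The choice between auto-detect and a pinned model can be pictured as a small resolution step. In this sketch the identifiers (`'khaleeji'`, `'auto'`, and so on) and the precedence order (session pin, then project pin, then auto-detect) are assumptions for illustration; this page does not specify either.

```javascript
// Model identifiers are hypothetical; only the dialect names come from this page.
const DIALECTS = ['khaleeji', 'msa', 'egyptian', 'levantine'];

// Assumed precedence: a session-level pin wins over a project-level pin,
// and auto-detect is used when nothing is pinned.
function resolveModel({ sessionModel, projectModel } = {}) {
  for (const choice of [sessionModel, projectModel]) {
    if (choice === undefined || choice === null) continue;
    if (!DIALECTS.includes(choice)) {
      throw new Error(`Unknown dialect model: ${choice}`);
    }
    return choice;
  }
  return 'auto';
}
```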

Language and dialect model picker with Khaleeji, MSA, Egyptian, and Levantine options

History

Searchable archive of every transcript.

Every session is saved with audio, transcript, speaker labels, and metadata. Full-text search and filters make it trivial to find a specific call, meeting, or broadcast moment.

Full-text search across every transcript in your workspace
Filter by date, speaker, dialect, project, or tag
Side-by-side audio playback synced to the transcript
Bulk export as JSON, SRT, VTT, or plain text
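For readers curious what an SRT export involves, the sketch below turns timed segments into SRT cues. The segment shape `{ start, end, speaker, text }` is hypothetical, and prefixing each cue with the speaker label is a design choice of this example, not a statement about Voicexa's own export.

```javascript
// Formats a time in seconds as an SRT timestamp: HH:MM:SS,mmm.
function srtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Hypothetical segment shape: { start, end, speaker, text }.
// Each cue is a 1-based index, a time range, and the text, separated by blank lines.
function toSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.speaker}: ${seg.text}\n`)
    .join('\n');
}
```

WebVTT is similar but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.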

History view of saved transcripts with search, filters, and audio playback

API

REST and WebSocket, with first-class SDKs.

The Voicexa API is built around two endpoints: a streaming WebSocket for real-time audio and a REST endpoint for batch files. Official Python and JavaScript SDKs cover the rest.

WebSocket streaming with backpressure and reconnection
REST batch endpoint with webhook callbacks
API keys with per-key quotas and rate limits
Live request logs with replay for debugging
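If you work with the raw WebSocket rather than an SDK, reconnection is commonly handled with capped exponential backoff. The sketch below is one such client-side approach; the function names and the delay values are illustrative assumptions, and it says nothing about how the Voicexa service or SDKs implement reconnection internally.

```javascript
// Capped exponential backoff; base and cap values are illustrative.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000 } = {}) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper around the standard WebSocket constructor:
// reopens the socket after a close, waiting longer after each failure.
function connectWithRetry(url, onMessage, attempt = 0) {
  const ws = new WebSocket(url);
  ws.onopen = () => { attempt = 0; }; // a successful open resets the backoff
  ws.onmessage = onMessage;
  ws.onclose = () => {
    setTimeout(
      () => connectWithRetry(url, onMessage, attempt + 1),
      backoffDelay(attempt),
    );
  };
  return ws;
}
```

In production you would usually add random jitter to the delay and stop retrying on deliberate closes.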

API console showing endpoints, keys, and live request logs

Analytics

Audio analytics, not just transcripts.

Voicexa tracks accuracy, latency, throughput, and cost across every project. Spot regressions, capacity ceilings, and the projects driving spend before they become problems.

Word error rate trends per dialect and per project
Latency distribution across streaming sessions
Throughput, concurrency, and queue depth in real time
Cost breakdown by project, model, and team member
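Word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the recognized text, divided by the number of reference words. The sketch below implements that standard definition with whitespace tokenization; it is not Voicexa's internal scoring, and Arabic text normalization (diacritics, letter variants) is left out.

```javascript
// WER = (substitutions + deletions + insertions) / reference word count.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  if (ref.length === 0) return hyp.length === 0 ? 0 : Infinity;

  // prev[j] holds the edit distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

Because insertions count as errors, WER can exceed 1 when the hypothesis is much longer than the reference.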

Analytics dashboard with WER, latency, throughput, and cost metrics

In production

From contact centres to public sector archives.

Voicexa runs in the workloads where Arabic transcription has to be accurate, fast, and on the record. Same engine, different deployment.

Contact Centre

Every Arabic call, transcribed and searchable.

GCC contact centres deploy Voicexa to transcribe every customer call across Khaleeji and MSA, in real time, with speaker diarization. Quality teams move from sampling 2 percent of calls to reviewing 100 percent, with full-text search across the entire archive.

100 percent call coverage instead of 2 percent sampling
Live agent assist with on-screen prompts during the call
Full-text search across every recorded conversation

Streaming Track · Telecom & BPO

Government Service

Public meetings, parliamentary sessions, on the record.

Government entities use Voicexa to transcribe parliamentary debates, public hearings, and citizen service centre calls, on premises and inside their own data residency boundary. Speaker attribution and bilingual export are built in for the official record.

On-premises deployment inside national data residency
Speaker attribution for the official meeting record
Bilingual Arabic and English export for archives

Batch Track · Public Sector

Ready to transcribe Arabic
the way it is actually spoken?

Request API Access

Cloud or on-premises. Real-time and batch. Every major Arabic dialect, one engine.
