Arabic Speech Recognition Engine

Arabic speech, recognized. Every dialect.

Voicexa transcribes Arabic in real time across Khaleeji, MSA, Egyptian, and Levantine dialects, with speaker diarization, custom vocabulary, and on-premises deployment.

All major Arabic dialects, one API

Live Transcription: Voicexa Stream

Speaker 1: السلام عليكم، كيف أقدر أساعدك اليوم؟ ("Peace be upon you, how can I help you today?")

Latency 280 ms · Speaker diarization on

How Voicexa Works

From microphone to transcript in four steps, all over a single API.

01

Send Audio

Stream audio over WebSocket in real time, or upload a file. Voicexa accepts MP3, WAV, FLAC, OGG, and M4A from the browser, mobile apps, or your backend.
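Streaming clients usually send audio in small fixed-duration frames rather than one large buffer. The sketch below splits raw PCM into frames. It assumes 16 kHz, 16-bit mono PCM and 100 ms frames, which are illustrative choices, not documented Voicexa requirements, and the helper name `chunkPcm` is hypothetical.

```javascript
// Hypothetical helper: split raw PCM audio into fixed-duration frames for streaming.
// Defaults assume 16 kHz, 16-bit (2-byte) mono samples; these are illustrative,
// not a documented Voicexa requirement.
function chunkPcm(pcm, { sampleRate = 16000, bytesPerSample = 2, frameMs = 100 } = {}) {
  // Bytes per frame = samples per frame * bytes per sample
  const frameBytes = Math.floor((sampleRate * frameMs) / 1000) * bytesPerSample;
  const frames = [];
  for (let offset = 0; offset < pcm.length; offset += frameBytes) {
    // subarray shares memory with the source buffer, so no copy is made;
    // the final frame may be shorter than frameBytes
    frames.push(pcm.subarray(offset, Math.min(offset + frameBytes, pcm.length)));
  }
  return frames;
}

// Usage: 1 second of silence at 16 kHz, 16-bit mono = 32,000 bytes
const oneSecond = new Uint8Array(32000);
const frames = chunkPcm(oneSecond);
console.log(frames.length, frames[0].length); // 10 3200
```

Each frame could then be passed to `ws.send()`. Whether the Voicexa stream expects raw PCM frames or an encoded format over the socket is not specified on this page.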

02

AI Transcribes

The acoustic model decodes speech with sub-second latency. Speaker diarization, punctuation, and named entities are added inline as the audio comes in.

03

Dialect Detected

Voicexa identifies the dialect on the fly (Khaleeji, MSA, Egyptian, or Levantine) and adapts the language model so the transcript matches how the audio was actually spoken.

04

Get Text

Receive the transcript with word-level timestamps, speaker labels, and confidence scores. Stream it live, store it in history, or pipe it straight into your stack.
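Word-level output is easiest to read once consecutive words from the same speaker are merged into turns. A minimal sketch, assuming a hypothetical result shape (an array of `{ word, start, end, speaker, confidence }` objects, times in seconds); the actual Voicexa response format may differ, and `toSpeakerTurns` is an illustrative name.

```javascript
// Hypothetical word-level result shape (not the documented Voicexa schema):
// { word: string, start: number, end: number, speaker: string, confidence: number }
function toSpeakerTurns(words) {
  const turns = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speaker) {
      // Same speaker as the previous word: extend the current turn
      last.text += " " + w.word;
      last.end = w.end;
      last.minConfidence = Math.min(last.minConfidence, w.confidence);
    } else {
      // First word, or the speaker changed: start a new turn
      turns.push({ speaker: w.speaker, start: w.start, end: w.end, text: w.word, minConfidence: w.confidence });
    }
  }
  return turns;
}

// Illustrative data: "as-salamu alaykum" / "wa alaykum as-salam"
const words = [
  { word: "السلام", start: 0.0, end: 0.4, speaker: "Speaker 1", confidence: 0.97 },
  { word: "عليكم", start: 0.4, end: 0.8, speaker: "Speaker 1", confidence: 0.95 },
  { word: "وعليكم", start: 1.1, end: 1.5, speaker: "Speaker 2", confidence: 0.93 },
  { word: "السلام", start: 1.5, end: 1.9, speaker: "Speaker 2", confidence: 0.96 },
];
console.log(toSpeakerTurns(words));
```

Keeping the lowest word confidence per turn is one simple way to flag turns for human review. Averaging would hide a single badly recognized word.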

Built for the way Arabic is actually spoken.

Real dialects, real code-switching, real accuracy on the audio that your customers, agents, and citizens are producing today.

Every Major Arabic Dialect

Voicexa is trained on one of the largest Gulf Arabic speech datasets ever assembled and extends to MSA, Egyptian, and Levantine Arabic. Code-switching with English and Hindi is handled inline, with no toggle required.

Khaleeji · MSA · Egyptian · Levantine
Bahraini · Saudi · Emirati · Kuwaiti · Qatari · Omani

Real-Time Diarization

Two speakers, ten speakers, an entire call centre queue: Voicexa attributes every utterance to the right speaker as the audio streams in.

Agent: Speaker 1, 51% airtime

Customer: Speaker 2, 49% airtime
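Airtime shares like the 51/49 split above can be derived from diarized segments by summing each speaker's speaking time. A minimal sketch, assuming a hypothetical segment shape `{ speaker, start, end }` in seconds; the function name and data are illustrative, not Voicexa output.

```javascript
// Hypothetical diarized segment shape: { speaker: string, start: number, end: number } (seconds)
function airtimeShares(segments) {
  const totals = new Map();
  let all = 0;
  for (const { speaker, start, end } of segments) {
    const dur = Math.max(0, end - start); // treat malformed negative spans as zero
    totals.set(speaker, (totals.get(speaker) ?? 0) + dur);
    all += dur;
  }
  // Convert each speaker's total to a whole-number percentage of all speech
  const shares = {};
  for (const [speaker, t] of totals) {
    shares[speaker] = all > 0 ? Math.round((100 * t) / all) : 0;
  }
  return shares;
}

const segments = [
  { speaker: "Agent", start: 0, end: 30 },
  { speaker: "Customer", start: 30, end: 59 },
  { speaker: "Agent", start: 59, end: 80 },
  { speaker: "Customer", start: 80, end: 100 },
];
console.log(airtimeShares(segments)); // { Agent: 51, Customer: 49 }
```

Silence is excluded and overlapping speech counts toward each speaker, so the percentages describe shares of speech rather than of call duration. Because each share is rounded, they may not always sum to exactly 100.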

Sub-500ms Streaming

Real-time captions for live broadcasts, agent assist, and voice interfaces with latency that feels instant.

280 ms median latency
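A median is a robust headline figure because a few slow outliers barely move it, unlike a mean. A minimal sketch of the calculation over per-request latency samples in milliseconds. The sample values are illustrative, not Voicexa measurements.

```javascript
// Median of latency samples in milliseconds.
function median(samples) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort; input is not mutated
  const mid = Math.floor(sorted.length / 2);
  // Odd count: the middle value; even count: mean of the two middle values
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Illustrative samples: one 1,900 ms outlier drags the mean far more than the median
const latenciesMs = [250, 270, 280, 290, 1900];
const mean = latenciesMs.reduce((a, b) => a + b, 0) / latenciesMs.length;
console.log(median(latenciesMs), mean); // 280 598
```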

A simple, well-documented API

REST and WebSocket endpoints, official Python and JavaScript SDKs, and on-premises deployment for regulated workloads with data-residency requirements.

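```javascript
// Stream Arabic audio
const ws = new WebSocket('wss://api.voicexa.io/v1/stream');

ws.onmessage = (e) => {
  // Each message carries transcript text with its speaker and detected dialect
  const { text, speaker, dialect } = JSON.parse(e.data);
  console.log(`[${dialect}] ${speaker}: ${text}`);
};
```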
GCC Data Residency
Inside the platform

See exactly what Voicexa does.

From the live console to the analytics view, every screen in this section is taken from the actual product.

Console

One console for every transcription job.

The Voicexa overview gives operators a single view of streaming sessions, batch jobs, and recent transcripts, with usage trends, error rates, and queue status updating live.

  • Live status of every streaming session and batch job
  • Usage by hour, day, and project for billing and capacity
  • Per-job latency and word error rate at a glance
  • Quick links to recent transcripts and saved presets
Screenshot: Voicexa overview console with live sessions, usage trends, and recent transcripts.

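Word error rate is conventionally the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the engine's output, divided by the number of reference words. The sketch below is a standard dynamic-programming implementation, not Voicexa's scoring code. It splits on whitespace only; Arabic evaluations often normalize diacritics and letter variants first, and that step is omitted here.

```javascript
// Word error rate: (S + D + I) / N, via Levenshtein distance over words.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  if (ref.length === 0) throw new Error("reference is empty");
  // prev holds the previous DP row: prev[j] = distance from ref[0..i-1) to hyp[0..j)
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const cur = [i]; // i deletions turn i reference words into an empty hypothesis
    for (let j = 1; j <= hyp.length; j++) {
      const sub = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1); // match or substitution
      const del = prev[j] + 1; // reference word missing from the hypothesis
      const ins = cur[j - 1] + 1; // extra word in the hypothesis
      cur.push(Math.min(sub, del, ins));
    }
    prev = cur;
  }
  return prev[hyp.length] / ref.length;
}

// One substituted word ("you" singular vs plural) out of four reference words
console.log(wordErrorRate("كيف أقدر أساعدك اليوم", "كيف أقدر أساعدكم اليوم")); // 0.25
```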
Live Transcribe

Type-as-you-speak Arabic transcription.

Stream from a microphone or any audio source and watch the transcript build word by word, with speaker labels, punctuation, and confidence scores arriving inline.

  • Word-by-word streaming with sub-500ms latency
  • Speaker diarization for multi-party conversations
  • Inline punctuation, casing for English segments, and named-entity tagging
  • Save, export, or pipe directly into your downstream system
Screenshot: real-time transcription view with streaming Arabic words and speaker labels.

Dialects

Pick the language model, or let it pick itself.

Voicexa ships with a family of language models tuned for Khaleeji, MSA, Egyptian, and Levantine Arabic. Auto-detect routes audio to the right model, or you can pin a model per project.

  • Auto-detect dialect from the first few seconds of audio
  • Manual model selection per project or per session
  • Custom vocabulary and named entities per workspace
  • Code-switching with English and Hindi handled inline
Screenshot: language and dialect model picker with Khaleeji, MSA, Egyptian, and Levantine options.

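Offering auto-detection alongside per-project and per-session pinning implies a precedence order. The sketch below assumes, purely for illustration, that a session pin overrides a project pin and that auto-detect is the fallback. The function, the `"auto"` value, and the model identifiers are hypothetical, not documented Voicexa behaviour.

```javascript
// Hypothetical model identifiers; not documented Voicexa values.
const MODELS = new Set(["khaleeji", "msa", "egyptian", "levantine"]);

// Resolve which dialect model a session uses.
// Assumed precedence (illustrative): session pin > project pin > auto-detect.
function resolveModel({ sessionModel, projectModel } = {}) {
  const levels = [["session", sessionModel], ["project", projectModel]];
  for (const [source, model] of levels) {
    if (model == null || model === "auto") continue; // nothing pinned at this level
    if (!MODELS.has(model)) throw new Error(`unknown model: ${model}`);
    return { model, source };
  }
  return { model: "auto", source: "auto-detect" };
}

console.log(resolveModel({ projectModel: "msa" })); // { model: 'msa', source: 'project' }
console.log(resolveModel({ sessionModel: "egyptian", projectModel: "msa" })); // { model: 'egyptian', source: 'session' }
console.log(resolveModel({})); // { model: 'auto', source: 'auto-detect' }
```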
History

Searchable archive of every transcript.

Every session is saved with audio, transcript, speaker labels, and metadata. Full-text search and filters make it trivial to find a specific call, meeting, or broadcast moment.

  • Full-text search across every transcript in your workspace
  • Filter by date, speaker, dialect, project, or tag
  • Side-by-side audio playback synced to the transcript
  • Bulk export as JSON, SRT, VTT, or plain text
Screenshot: History view of saved transcripts with search, filters, and audio playback.

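SRT is a simple text format: numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the caption text, separated by blank lines. A minimal sketch of writing segments to SRT, assuming a hypothetical `{ start, end, speaker, text }` segment shape with times in seconds. Voicexa's own exporter may lay out cues differently, for example in how it shows speaker labels.

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const pad = (n, w) => String(n).padStart(w, "0");
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Hypothetical segment shape: { start, end, speaker, text } with times in seconds
function toSrt(segments) {
  return segments
    .map((seg, i) =>
      // Cue number, time range, then text with the speaker label prefixed
      `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.speaker}: ${seg.text}\n`)
    .join("\n"); // each cue ends in "\n", so joining with "\n" leaves a blank line between cues
}

console.log(toSrt([
  { start: 0, end: 2.5, speaker: "Speaker 1", text: "السلام عليكم" },
  { start: 3.2, end: 61.04, speaker: "Speaker 2", text: "وعليكم السلام" },
]));
```

WebVTT output differs mainly in a `WEBVTT` header line and a period instead of a comma before the milliseconds.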
API

REST and WebSocket, with first-class SDKs.

The Voicexa API is built around two endpoints: a streaming WebSocket for real-time audio and a REST endpoint for batch files. Official Python and JavaScript SDKs cover the rest.

  • WebSocket streaming with backpressure and reconnection
  • REST batch endpoint with webhook callbacks
  • API keys with per-key quotas and rate limits
  • Live request logs with replay for debugging
Screenshot: API console showing endpoints, keys, and live request logs.

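Reconnection logic for a streaming WebSocket typically waits progressively longer between attempts, so a brief outage does not turn into a flood of simultaneous reconnects. A minimal sketch of exponential backoff with "full jitter". The base delay, cap, and function name are illustrative assumptions; this is a common client-side pattern, not a description of how the Voicexa SDKs implement reconnection.

```javascript
// Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt)).
// baseMs and capMs are illustrative defaults, not Voicexa SDK settings.
// `random` is injectable so the behaviour can be tested deterministically.
function backoffDelayMs(attempt, { baseMs = 250, capMs = 10000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical usage in a browser client (not run here):
// function connect(attempt = 0) {
//   const ws = new WebSocket("wss://api.voicexa.io/v1/stream");
//   ws.onopen = () => { attempt = 0; }; // reset after a successful connection
//   ws.onclose = () => setTimeout(() => connect(attempt + 1), backoffDelayMs(attempt));
// }

console.log(backoffDelayMs(3, { random: () => 0.5 })); // 1000 (ceiling is 2,000 ms)
```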
Analytics

Audio analytics, not just transcripts.

Voicexa tracks accuracy, latency, throughput, and cost across every project. Spot regressions, capacity ceilings, and the projects driving spend before they become problems.

  • Word error rate trends per dialect and per project
  • Latency distribution across streaming sessions
  • Throughput, concurrency, and queue depth in real time
  • Cost breakdown by project, model, and team member
Screenshot: Analytics dashboard with WER, latency, throughput, and cost metrics.

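When WER is rolled up per dialect or per project, one common approach is to pool errors and reference words across jobs rather than average each job's WER, so a 10-word clip does not weigh as much as an hour-long call. A minimal sketch, assuming a hypothetical per-job record `{ dialect, errors, refWords }`; the numbers are illustrative, not Voicexa results.

```javascript
// Hypothetical per-job records: errors = S + D + I, refWords = reference word count.
// `key` selects the grouping field, e.g. "dialect" or "project".
function werByGroup(jobs, key = "dialect") {
  const acc = new Map();
  for (const job of jobs) {
    const g = acc.get(job[key]) ?? { errors: 0, refWords: 0 };
    g.errors += job.errors;
    g.refWords += job.refWords;
    acc.set(job[key], g);
  }
  // Pooled (word-weighted) WER per group
  const out = {};
  for (const [group, { errors, refWords }] of acc) {
    out[group] = refWords > 0 ? errors / refWords : null;
  }
  return out;
}

const jobs = [
  { dialect: "Khaleeji", errors: 2, refWords: 10 },   // short clip, 20% WER
  { dialect: "Khaleeji", errors: 48, refWords: 990 }, // long call, about 4.8% WER
  { dialect: "MSA", errors: 30, refWords: 1000 },
];
console.log(werByGroup(jobs)); // { Khaleeji: 0.05, MSA: 0.03 }
```

Averaging the two Khaleeji jobs' individual WERs would report about 12.4%, dominated by the 10-word clip; pooling reports 5%, which reflects the words actually transcribed.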
In production

From contact centres to public sector archives.

Voicexa runs in the workloads where Arabic transcription has to be accurate, fast, and on the record. Same engine, different deployment.

Contact Centre

Every Arabic call, transcribed and searchable.

GCC contact centres deploy Voicexa to transcribe every customer call across Khaleeji and MSA, in real time, with speaker diarization. Quality teams move from sampling 2 percent of calls to reviewing 100 percent, with full-text search across the entire archive.

  • 100 percent call coverage instead of 2 percent sampling
  • Live agent assist with on-screen prompts during the call
  • Full-text search across every recorded conversation
Streaming Track · Telecom & BPO

Government Service

Public meetings, parliamentary sessions, on the record.

Government entities use Voicexa to transcribe parliamentary debates, public hearings, and citizen service centre calls, on premises and inside their own data residency boundary. Speaker attribution and bilingual export are built in for the official record.

  • On-premises deployment inside national data residency
  • Speaker attribution for the official meeting record
  • Bilingual Arabic and English export for archives
Batch Track · Public Sector

Ready to transcribe Arabic the way it is actually spoken?

Request API Access

Cloud or on-premises. Real-time and batch. Every major Arabic dialect, one engine.