Open Source · Apache 2.0

Open voice infrastructure for AI agents.

A Python framework that bridges telephony infrastructure (Asterisk, FreeSWITCH, LiveKit) with AI voice agents (STT, LLM, TTS). It lets developers build AI-powered call centers without needing to understand telecom internals.

Real-time call audio · STT → LLM → TTS

$ pip install voxtra

Telephony in, AI out.

Voxtra sits between the PBX and the model providers. It hands you a session API (say, listen, agent) and orchestrates the STT → LLM → TTS pipeline behind the scenes. Every layer is a registry, so you can swap providers without touching the rest.
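To make the registry idea concrete, here is a minimal sketch of a name-to-factory registry. It is an illustration only, not Voxtra's actual API: `ProviderRegistry`, `UpperCaseTTS`, `EchoTTS`, and the `"upper"` / `"echo"` names are all hypothetical.

```python
from typing import Callable, Dict


class ProviderRegistry:
    """Hypothetical registry: maps a provider name to a factory callable."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._factories: Dict[str, Callable[..., object]] = {}

    def register(self, name: str):
        """Decorator that records a provider class under `name`."""
        def decorator(factory):
            self._factories[name] = factory
            return factory
        return decorator

    def create(self, name: str, **options):
        """Instantiate the provider registered under `name`."""
        try:
            factory = self._factories[name]
        except KeyError:
            known = ", ".join(sorted(self._factories)) or "none"
            raise ValueError(
                f"unknown {self.kind} provider {name!r} (registered: {known})"
            ) from None
        return factory(**options)


tts_registry = ProviderRegistry("tts")


@tts_registry.register("upper")
class UpperCaseTTS:
    """Stand-in provider: 'synthesizes' by upper-casing the text."""

    def __init__(self, voice: str = "default") -> None:
        self.voice = voice

    def synthesize(self, text: str) -> bytes:
        return text.upper().encode()


@tts_registry.register("echo")
class EchoTTS:
    """Stand-in provider: returns the text unchanged as bytes."""

    def __init__(self, voice: str = "default") -> None:
        self.voice = voice

    def synthesize(self, text: str) -> bytes:
        return text.encode()


# Swapping providers is a one-string change; calling code stays the same.
tts = tts_registry.create("echo", voice="demo")
audio = tts.synthesize("hello")
```

Because callers only depend on the registered name and the `synthesize` method, changing `"echo"` to `"upper"` swaps the provider without touching anything else.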

Telephony

Asterisk

ARI adapter in production · LiveKit + FreeSWITCH planned

AI Stack

Streaming

Deepgram (STT) · OpenAI GPT-4o (LLM) · ElevenLabs + Cartesia (TTS)

Layered by design.

Voxtra is a stack of small Python packages, each owning one concern. Core handles app lifecycle and session routing. Telephony adapts ARI / SIP. Audio runs the AudioSocket TCP transport. Media bridges sessions into the pipeline. AI exposes the STT / LLM / TTS / VAD provider registry. Pipeline orchestrates all of it per call.
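For a feel of what the Audio layer deals with, here is a rough sketch of AudioSocket framing. It assumes the framing described in Asterisk's AudioSocket documentation (a 1-byte type, a 2-byte big-endian payload length, then the payload). The function names are hypothetical and this is not Voxtra's own code.

```python
import struct
import uuid

# Frame types, as assumed from the AudioSocket protocol description.
KIND_HANGUP = 0x00  # terminate the connection
KIND_UUID = 0x01    # payload is the 16-byte call UUID
KIND_AUDIO = 0x10   # payload is signed linear 16-bit, 8 kHz, mono PCM
KIND_ERROR = 0xFF   # payload carries an error code

HEADER = struct.Struct("!BH")  # type (1 byte) + length (2 bytes, big-endian)


def encode_frame(kind: int, payload: bytes = b"") -> bytes:
    """Prefix `payload` with the 3-byte header."""
    return HEADER.pack(kind, len(payload)) + payload


def decode_frames(buffer: bytes):
    """Split `buffer` into complete (kind, payload) frames.

    Returns the frames plus any trailing bytes that do not yet form a
    complete frame, so the caller can prepend them to the next TCP read.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        kind, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((kind, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]


# Example: a UUID frame, 20 ms of silence (160 samples), and a partial header.
call_id = uuid.UUID(int=1)
stream = (
    encode_frame(KIND_UUID, call_id.bytes)
    + encode_frame(KIND_AUDIO, b"\x00\x00" * 160)
    + b"\x10\x01"
)
frames, remainder = decode_frames(stream)
```

Returning the remainder matters because TCP delivers a byte stream, not messages: a read can end in the middle of a frame.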

Pipeline · per session

audio in → STT → LLM → TTS → audio out
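The flow above can be sketched as three awaited stages per conversational turn. This is a simplified, non-streaming illustration with stub providers. The `Pipeline` class, `run_turn`, and the `fake_*` functions are hypothetical, and Voxtra's real pipeline may stream audio rather than process whole turns.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical stage signatures; real provider interfaces may differ.
SpeechToText = Callable[[bytes], Awaitable[str]]
LanguageModel = Callable[[str], Awaitable[str]]
TextToSpeech = Callable[[str], Awaitable[bytes]]


@dataclass
class Pipeline:
    """One pipeline instance per call session."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    async def run_turn(self, audio_in: bytes) -> bytes:
        transcript = await self.stt(audio_in)  # audio in -> text
        reply = await self.llm(transcript)     # text -> reply text
        return await self.tts(reply)           # reply text -> audio out


# Stub providers so the sketch runs without any external service.
async def fake_stt(audio: bytes) -> str:
    return audio.decode()


async def fake_llm(text: str) -> str:
    return f"You said: {text}"


async def fake_tts(text: str) -> bytes:
    return text.encode()


pipeline = Pipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = asyncio.run(pipeline.run_turn(b"hello"))
```

Each stage only sees the previous stage's output, which is what lets a provider be replaced independently of the others.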

Two files, one running call center.

Configure providers in YAML; describe the call flow in Python. voxtra start wires the rest.

app.py, Python

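from voxtra import VoxtraApp

# Load the provider configuration from the YAML file.
app = VoxtraApp.from_yaml("voxtra.yaml")

# Handle calls that arrive on extension 1000.
@app.route(extension="1000")
async def support_call(session):
    await session.answer()                           # pick up the call
    await session.say("Hello, how can I help you?")  # speak the greeting
    text = await session.listen()                    # caller's speech as text
    reply = await session.agent.respond(text)        # ask the agent for a reply
    await session.say(reply.text)                    # speak the reply

app.run()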
