Backed by Y Combinator

Voice agents that get better every day

Learn from your production calls, ship fixes, and prove measurable outcome lift with every single call.

$0.05/min all-in · 7 lines of code
Connect via MCP
from callingbox import Callingbox
client = Callingbox()
call = client.calls.create(
    agent_id="agt_7f3a...",
    to="+15551234567",
    context={ "patient": "Emma Cohen" },
)
call dispatched · ai talks · json returned

How it works

Configure an agent. Dispatch the call.

Define what the agent does once. Trigger calls against it with a single POST. Every call returns structured JSON.
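
As a rough sketch of what that single POST could look like without the SDK, the snippet below builds (but does not send) the request using only Python's standard library. The base URL, the bearer-token header, and the `build_call_request` helper are assumptions for illustration, not documented CallingBox values. Only the `/v1/calls` path and the `agent_id`, `to`, and `context` fields come from the example on this page.

```python
import json
import urllib.request

# Hypothetical placeholder host; substitute the real API host.
BASE_URL = "https://api.example.invalid"


def build_call_request(api_key: str, agent_id: str, to: str, context: dict) -> urllib.request.Request:
    """Build a POST /v1/calls request without sending it."""
    body = json.dumps({"agent_id": agent_id, "to": to, "context": context}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/calls",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


req = build_call_request(
    api_key="YOUR_API_KEY",
    agent_id="agt_7f3a...",  # elided on the page; use your own agent ID
    to="+15551234567",
    context={"patient": "Emma Cohen"},
)
# Sending it would be urllib.request.urlopen(req); the response body is the call's JSON.
```

Building the request separately from sending it keeps the payload easy to inspect or log before any call is dispatched.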

1
Request · agent v1.4.3
POST /v1/calls
agent_id = "agt_7f3a..."
to = "+1555..."
context = { patient }
2
Call
Tool call

check_availability(date="Thu")

3
Result
{
"confirmed":
}
4
Learn → improvement queue

Every call is scored for failure modes. Ranked prompt and policy fixes land in your improvement queue, gated behind an A/B harness before they ship.
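
The page does not say how the A/B harness decides what ships. One common way to gate a candidate fix is a two-proportion z-test on the win rate, sketched below under that assumption. The function names, the example counts, and the 1.645 threshold (roughly a one-sided 5% level) are all illustrative choices, not CallingBox behavior.

```python
import math


def two_proportion_z(wins_a: int, n_a: int, wins_b: int, n_b: int) -> float:
    """z statistic for candidate B's win rate versus control A's, using a pooled standard error."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # both arms all-won or all-lost: no evidence either way
    return (p_b - p_a) / se


def should_ship(wins_a: int, n_a: int, wins_b: int, n_b: int, z_threshold: float = 1.645) -> bool:
    """Ship only if B beats A at roughly the one-sided 5% significance level."""
    return two_proportion_z(wins_a, n_a, wins_b, n_b) > z_threshold


# Hypothetical counts: control wins 380 of 500 calls, candidate wins 410 of 500.
print(should_ship(380, 500, 410, 500))  # clear improvement
print(should_ship(380, 500, 385, 500))  # difference too small to separate from noise
```

A gate like this holds back fixes whose apparent gain could be sampling noise, which is the point of testing before shipping.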

↺ FEEDS THE NEXT CALL

The loop

Improve your agents with every production call.

Most voice agents plateau at the quality wall, usually 75–85% on the metric that matters. CallingBox closes the loop: real calls become ranked fixes, shipped diffs, and verified uplift in dollars.

01

Captured

Every production call lands as labeled data: audio, transcript, agent version, and the outcome that matters.

Live feed
last 7 days · 1,284 calls
won · Linda M. · 02:14
lost · Robert K. · 01:47
won · Sarah J. · 03:02
lost · Marcus T. · 00:58
972 won · 312 lost → diagnose
02

Diagnosed

Failure modes like hallucinations and weak handoffs, ranked by how much each one costs your KPI.

Failure modes · from 312 lost calls
01 · weak coverage objection · 38
02 · voicemail handoff dropped · 24
03 · verification step skipped · 17
04 · tone too transactional · 11
4 modes ranked · 90 of 312 lost
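
To make the ranking concrete, here is a minimal sketch that reproduces the panel's totals from per-call labels. It assumes each lost call carries at most one failure-mode label and ranks by raw count, which is only a proxy for KPI cost. The label list is rebuilt from the counts shown above, not from real call data.

```python
from collections import Counter

# Counts taken from the panel above; in practice each label would come from diagnosing one lost call.
labels = (
    ["weak coverage objection"] * 38
    + ["voicemail handoff dropped"] * 24
    + ["verification step skipped"] * 17
    + ["tone too transactional"] * 11
)
LOST_CALLS = 312

# most_common() returns (mode, count) pairs sorted by count, highest first.
ranked = Counter(labels).most_common()
explained = sum(count for _, count in ranked)

for rank, (mode, count) in enumerate(ranked, start=1):
    print(f"{rank:02d} {mode}: {count} ({count / LOST_CALLS:.1%} of lost calls)")
print(f"{len(ranked)} modes ranked, {explained} of {LOST_CALLS} lost")
```

Weighting each mode by the revenue or bookings it cost, rather than by count, would be closer to "ranked by KPI cost" but needs per-call outcome values the panel does not show.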
03

Shipped

Prompt and policy diffs ranked by expected lift. A/B-gated, versioned, and reversible.

VOICE · TOOLS · POLICY +0.8pt · TONE +2.1pt · PROMPT +1.4pt
04

Verified

Real lift on the metric that matters: dollars collected, bookings won, measured against a held-out baseline.

+$12,480/wk

recovered · 13 weeks · 9 shipped diffs

+18% vs baseline
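
A figure like "+18% vs baseline" is typically a relative lift over a held-out group. The sketch below shows that arithmetic. The page does not state whether its number is relative or absolute, and the counts here are hypothetical, chosen only so the result matches the figure above.

```python
def relative_lift(treated_rate: float, baseline_rate: float) -> float:
    """Relative improvement of the treated group over the held-out baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (treated_rate - baseline_rate) / baseline_rate


# Hypothetical counts, not taken from the page.
baseline_rate = 500 / 1000  # held-out calls still on the old agent version
treated_rate = 590 / 1000   # calls on the version with shipped diffs
lift = relative_lift(treated_rate, baseline_rate)
print(f"{lift:+.0%} vs baseline")
```

Keeping the baseline group on the old version for the whole measurement window is what lets the difference be attributed to the shipped diffs rather than to seasonality or traffic mix.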

Examples

Watch a call become an improvement.

Conversation, tool calls, the webhook, and a queued diff for the next version of the agent.

Why CallingBox

Other voice runtimes ship and stop. We ship and compound.

Most teams launch, hit the quality wall in a few weeks, and stay there. CallingBox keeps moving the metric every week, with verified uplift attached.

CallingBox · Other voice runtimes
outcome KPI · closed-won rate · 12 weeks
ship cadence ~weekly · uplift counterfactual · rollback 1 click
illustrative · internal benchmarks

Twelve weeks · same starting line

CallingBox

+14pts

Other voice runtimes

+0.5pts

Same metric. Same horizon. Different curve.

plateau wall (where most agents stop after launch): 75–85%
failures auto-detected by incumbents: 0
data feeding the loop on CallingBox: every call

Migrate off Retell, Vapi, or Bland in an afternoon. Keep your numbers. Watch your KPI move.

Also included

And yes: we're still the runtime.

The improvement loop sits on top of a production-grade voice stack. One vendor, one bill, one API.

What you're replacing

4 layers · 1 API
01 Telephony: SIP · Numbers · STIR/SHAKEN
02 STT · LLM · TTS: Streaming · Tools · Voices
03 Voicemail · IVR · DTMF: Detection · Navigation · Tones
04 JSON + webhooks: Returns · Events · Recordings

$0.05/min connected time, all-in. No per-component metering, no surprise add-ons.
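
For a sense of scale, this sketch prices the four calls from the live feed earlier on this page at the flat rate. It assumes connected time is prorated per second. The page does not say whether billing rounds up to whole minutes, so treat the result as an estimate.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_MIN = Decimal("0.05")  # flat rate from the pricing line above


def call_cost(duration: str) -> Decimal:
    """Cost of one call given 'MM:SS' connected time, prorated per second (an assumption)."""
    minutes, seconds = (int(part) for part in duration.split(":"))
    total_seconds = minutes * 60 + seconds
    return RATE_PER_MIN * Decimal(total_seconds) / Decimal(60)


durations = ["02:14", "01:47", "03:02", "00:58"]  # the four calls in the live feed
total = sum((call_cost(d) for d in durations), Decimal("0"))
print(total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # prints 0.40
```

The four calls add up to 481 seconds of connected time, so the estimate comes to about $0.40.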

See your first improvement in week one.

$5 free credit. No card. First live call in 5 minutes, first measured improvement in 7 days.

  • Keep your phone numbers
  • Cancel anytime

FAQ

Common questions

Still have questions?

Talk to the founders directly: a 20-minute slot with real engineers.

Talk to founders