Shipped 2026-02-28 · Posted by the Platform Team

February 2026: Streaming inference, voice call infrastructure, and video generation v1.1

Streaming response pipeline cuts TTFT to 0.8s; live voice calls enter beta at sub-400ms latency; video generation moves to 720p; and iOS reaches general availability.


AI Model

Streaming inference pipeline deployed

TTFT (time-to-first-token) reduced from 3.2s to 0.8s across all platforms.
Tokens are streamed incrementally as they are generated; clients no longer wait for the full response to complete.
Achieved by switching from batched to streaming decode with a revised KV-cache layout.
p99 TTFT is 1.4s under normal load.

Ref: ARIA-312
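To show why streaming decode lowers TTFT, the sketch below contrasts batched and streaming delivery using a simulated decoder. The decoder, its per-token cost, and the nearest-rank method used for p99 are assumptions for illustration, not the production pipeline; the KV-cache layout change is not modeled.

```python
import math
import time
from typing import Iterator, List


def decode_tokens(n_tokens: int, step_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a decoder: yields one token per decode step."""
    for i in range(n_tokens):
        time.sleep(step_s)  # simulated per-token decode cost
        yield f"tok{i}"


def ttft_batched(n_tokens: int, step_s: float) -> float:
    """Batched delivery: the client sees nothing until every token is decoded."""
    start = time.perf_counter()
    _ = list(decode_tokens(n_tokens, step_s))  # first token arrives with the whole response
    return time.perf_counter() - start


def ttft_streaming(n_tokens: int, step_s: float) -> float:
    """Streaming delivery: the first token is forwarded after one decode step."""
    start = time.perf_counter()
    stream = decode_tokens(n_tokens, step_s)
    next(stream)  # first token is sent to the client immediately
    return time.perf_counter() - start


def p99(samples: List[float]) -> float:
    """Nearest-rank 99th percentile of TTFT samples (assumed method)."""
    ordered = sorted(samples)
    rank = math.ceil(99 * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    print(f"batched TTFT:   {ttft_batched(50, 0.01):.2f}s")
    print(f"streaming TTFT: {ttft_streaming(50, 0.01):.2f}s")
```

With batched delivery, TTFT grows with response length; with streaming, it is bounded by prefill plus one decode step, which is why the p99 figure is driven mainly by load rather than output length.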

Voice

Live voice call infrastructure: beta

WebRTC-based voice pipeline with sub-400ms round-trip latency on stable connections.
Opus codec at 24kbps with VAD-gated transmission to reduce bandwidth.
Adaptive bitrate on degraded networks (bitrate ladder incomplete at launch; see known issues).
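A minimal sketch of the bandwidth arithmetic behind the bullets above. It assumes 20 ms Opus frames (the notes give only the 24kbps target) and uses a simple RMS-energy gate with a hypothetical -40 dBFS threshold; the frame duration, the VAD method, and the threshold are illustrative assumptions, not the shipped detector. No Opus encoder is called. (Opus also has a built-in DTX mode; which mechanism the pipeline uses is not stated.)

```python
import math
from typing import List, Sequence

BITRATE_BPS = 24_000  # target bitrate from the release notes
FRAME_MS = 20         # assumption: a common Opus frame duration


def payload_bytes(bitrate_bps: int = BITRATE_BPS, frame_ms: int = FRAME_MS) -> int:
    """Encoded bytes per frame at a constant target bitrate (no RTP/UDP headers)."""
    return bitrate_bps * frame_ms // (8 * 1000)


def rms_dbfs(frame: Sequence[float]) -> float:
    """RMS level of float PCM samples in [-1, 1], expressed in dBFS."""
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def vad_gate(frames: List[Sequence[float]], threshold_dbfs: float = -40.0) -> List[bool]:
    """Hypothetical energy-based VAD: True means 'transmit this frame'."""
    return [rms_dbfs(f) >= threshold_dbfs for f in frames]


def effective_kbps(gate: List[bool], bitrate_bps: int = BITRATE_BPS) -> float:
    """Average payload rate once gated-off frames are dropped (ignores headers)."""
    if not gate:
        return 0.0
    return bitrate_bps * sum(gate) / len(gate) / 1000
```

At 24kbps and 20 ms frames, each frame carries 60 bytes of payload; if half the frames are gated off as silence, the average payload rate halves to 12kbps.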

Video

Video generation v1.1: 720p output

Resolution increased from 512x512 to 1280x720.
Temporal consistency score improved from 0.71 to 0.79 on the internal motion-stability benchmark.
p95 generation time rose from 12s at 512x512 to 18s at 720p (hardware-bound).
Closed beta expanded from 500 to 2,000 accounts.
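The v1.1 figures can be put side by side with quick arithmetic. This sketch compares pixels per frame only; clip length, frame rate, and frame count are not given in the notes, so the per-pixel ratio is a rough normalization, not a measured throughput.

```python
OLD_RES = (512, 512)
NEW_RES = (1280, 720)
OLD_P95_S = 12.0  # p95 generation time at 512x512, from the notes
NEW_P95_S = 18.0  # p95 generation time at 720p, from the notes


def pixels(res):
    """Pixels per frame for a (width, height) resolution."""
    width, height = res
    return width * height


pixel_ratio = pixels(NEW_RES) / pixels(OLD_RES)  # 921,600 / 262,144 = 3.515625
time_ratio = NEW_P95_S / OLD_P95_S               # 18 / 12 = 1.5
per_pixel_ratio = time_ratio / pixel_ratio       # about 0.43x the old p95 time per pixel
```

Roughly 3.5x more pixels per frame for 1.5x the p95 time suggests the hardware-bound increase is well below linear in pixel count, under the stated assumption that frame count is unchanged.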

Photos

Image generation: prompt adherence improvements

Fine-tuned the image generation model on a curated set of 40k character-scene pairs.
CLIP similarity score (prompt-to-image) improved by 0.06 on the internal eval set.
Consistent-character engine applies to all new image requests automatically.
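The notes do not define the CLIP similarity score precisely. A common convention is the cosine similarity between a prompt's text embedding and the generated image's embedding, averaged over an eval set. The sketch assumes that convention and takes precomputed embeddings as plain float lists; it does not load a CLIP model, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_clip_score(pairs: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average prompt-to-image similarity over (text_emb, image_emb) pairs."""
    return sum(cosine(text, image) for text, image in pairs) / len(pairs)


def score_delta(baseline, finetuned) -> float:
    """Change in mean score between two model runs on the same eval set."""
    return mean_clip_score(finetuned) - mean_clip_score(baseline)
```

Under this convention, a +0.06 change means the mean cosine similarity over the eval set rose by 0.06 in absolute terms.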

Mobile