Shipped 2026-02-28 · Posted by the Platform Team
February 2026: Streaming inference, voice call infrastructure, and video generation v1.1
The streaming response pipeline cuts TTFT to 0.8s, live voice calls enter beta at sub-400ms latency, video generation moves to 720p, and the iOS app reaches general availability.
AI Model
Streaming inference pipeline deployed
- TTFT (time-to-first-token) reduced from 3.2s to 0.8s across all platforms.
- Tokens stream incrementally as they are generated; clients no longer wait for the full response to complete.
- Achieved by switching from batched to streaming decode with a revised KV-cache layout.
- p99 TTFT is 1.4s under normal load.
Ref: ARIA-312
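For readers who want to reproduce TTFT and p99 figures like the ones above against their own client, the following is a minimal sketch, not the platform's instrumentation. It assumes the response arrives as any Python iterable of token strings. The names `measure_ttft` and `percentile` are hypothetical, and the nearest-rank percentile method is an assumption, since these notes do not say how p99 is computed.

```python
import math
import time
from typing import Callable, Iterable, List, Tuple


def measure_ttft(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, str]:
    """Consume a token stream and return (seconds to first token, full text).

    `stream` is assumed to yield token strings as they are generated.
    `clock` is injectable so the function can be tested deterministically.
    """
    start = clock()
    ttft = None
    parts: List[str] = []
    for token in stream:
        if ttft is None:
            # The first token has arrived: record time-to-first-token once.
            ttft = clock() - start
        parts.append(token)
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, "".join(parts)


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct=99 gives p99). The method is an assumption."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

Collecting one `measure_ttft` result per request and passing the list of TTFT values to `percentile(values, 99)` gives a p99 comparable in spirit to the figure above, though load conditions and measurement points will differ.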
Voice
Live voice call infrastructure: beta
- WebRTC-based voice pipeline with sub-400ms round-trip latency on stable connections.
- Opus codec at 24kbps with VAD-gated transmission to reduce bandwidth.
- Adaptive bitrate on degraded networks (bitrate ladder incomplete at launch; see Known Issues).
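To illustrate how an adaptive bitrate ladder behaves when it lacks a low-bandwidth rung, here is a minimal sketch under stated assumptions. The ladder values, the 0.8 headroom factor, and the function `pick_bitrate` are hypothetical and chosen only for illustration; the shipped ladder and selection logic are not described in these notes.

```python
from typing import Optional, Sequence

# Hypothetical ladder in kbps: the 24 kbps Opus target plus one lower rung.
# The real ladder is not published in these notes.
LADDER_KBPS = (24, 16)


def pick_bitrate(
    estimated_kbps: float,
    ladder: Sequence[int] = LADDER_KBPS,
    headroom: float = 0.8,
) -> Optional[int]:
    """Return the highest rung that fits within the bandwidth budget.

    The budget is `headroom * estimated_kbps`, leaving margin for jitter
    and packet overhead (the 0.8 factor is an assumption). Returns None
    when no rung fits, i.e. the call cannot be sustained on this ladder.
    """
    budget = estimated_kbps * headroom
    fitting = [rung for rung in ladder if rung <= budget]
    return max(fitting) if fitting else None
```

With this hypothetical ladder, `pick_bitrate(15)` returns `None` because nothing fits a 12 kbps budget, while `pick_bitrate(15, ladder=(24, 16, 8))` returns `8`. That is the general shape of the gap a lower rung would close.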
Video
Video generation v1.1: 720p output
- Resolution increased from 512x512 to 1280x720.
- Temporal consistency score improved from 0.71 to 0.79 on internal motion-stability benchmark.
- p95 generation time increased from 12s at 512x512 to 18s at 720p (hardware-bound).
- Closed beta expanded from 500 to 2,000 accounts.
Photos
Image generation: prompt adherence improvements
- Fine-tuned the image generation model on a curated set of 40k character-scene pairs.
- CLIP similarity score (prompt-to-image) improved by 0.06 on the internal eval set.
- Consistent-character engine applies to all new image requests automatically.
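For context on the CLIP similarity metric mentioned above: a prompt-to-image CLIP score is conventionally the cosine similarity between the text embedding and the image embedding. The sketch below shows that calculation only. It assumes the embeddings have already been produced by a CLIP encoder, which is not shown, and the helper names are hypothetical. The internal eval set and its exact scoring procedure are not described in these notes.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_similarity(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
) -> float:
    """Average prompt-to-image similarity over (text, image) embedding pairs."""
    if not pairs:
        raise ValueError("no pairs")
    return sum(cosine_similarity(t, i) for t, i in pairs) / len(pairs)
```

An improvement like the one reported would then be the difference between `mean_similarity` computed on the same eval pairs before and after fine-tuning, assuming the eval uses a simple mean.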
Mobile
iOS app general availability
- Full App Store release; TestFlight beta closed February 28.
- Mobile inference client updated: cold-start latency reduced by 340ms on iPhone 13 and later.
Trust & Safety
Age verification in 8 regions
- Age verification integrated via a third-party identity vendor across 8 regions.
- Vendor listed in /legal-information.
Known Issues
- Voice calls drop on weak 4G connections. Root cause: the bitrate ladder is missing a low-bandwidth rung below 16kbps. Fix targeted for March.
What's Next
- TTS v3: prosody model with context-aware intonation.
- Video generation: improved motion coherence between frames.
- Memory retrieval latency reduction targeting p50 under 80ms.