Shipped 2026-03-31 · Posted by the Platform Team
March 2026: TTS v3 prosody model, video generation quality pass, and memory retrieval improvements
This release covers the TTS v3 prosody model with context-aware intonation, motion coherence improvements in video generation, session-level style memory for image generation, faster memory retrieval, and a fix for voice call drops on 4G.
Voice
TTS v3: context-aware prosody model
- New prosody model predicts pitch, rate, and energy from semantic context rather than a fixed style.
- Naturalness score (5-point MOS scale, internal panel of 50 raters) improved from 3.6 to 4.1.
- Handles punctuation-free input more gracefully; fewer robotic pauses on long responses.
- Time to first audio (TTFA) unchanged at 0.9s p50.
Video
Video generation: motion coherence and frame interpolation
- Replaced single-pass generation with a two-stage model: keyframe generation followed by frame interpolation.
- Temporal consistency score improved from 0.79 to 0.86 on our internal benchmark.
- Inter-frame flickering on hair and clothing reduced by 61% vs. v1.1.
- p95 generation time reduced to 14s at 720p despite the two-stage pipeline.
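To make the two-stage structure concrete, here is a minimal sketch of the control flow only. The production interpolator is a learned model; the linear blend below is a stand-in we chose for illustration, and `generate_keyframes`, `interpolate`, and `two_stage_generate` are hypothetical names, not the service's API.

```python
def generate_keyframes(num_keyframes, frame_size):
    """Stage 1 placeholder: returns simple ramp frames instead of model output."""
    return [[float(k)] * frame_size for k in range(num_keyframes)]


def interpolate(frame_a, frame_b, steps):
    """Stage 2 stand-in: linearly blend `steps` in-between frames.

    The real stage is a learned frame-interpolation model; linear
    blending is used here only to show where that stage plugs in.
    """
    frames = []
    for s in range(1, steps + 1):
        t = s / (steps + 1)
        frames.append([(1 - t) * a + t * b for a, b in zip(frame_a, frame_b)])
    return frames


def two_stage_generate(num_keyframes, frame_size, steps):
    """Run keyframe generation, then fill the gaps between neighbouring keyframes."""
    keyframes = generate_keyframes(num_keyframes, frame_size)
    if not keyframes:
        return []
    video = []
    for a, b in zip(keyframes, keyframes[1:]):
        video.append(a)
        video.extend(interpolate(a, b, steps))
    video.append(keyframes[-1])
    return video


video = two_stage_generate(num_keyframes=3, frame_size=2, steps=1)
print(len(video))
```

Because the expensive model only has to produce the keyframes, the number of in-between frames can be tuned separately from keyframe quality.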
Photos
Image generation: session-level style memory
- Character appearance attributes (clothing, hairstyle, accessories) now persisted as structured metadata per session.
- Image generation model receives this metadata at inference time to maintain visual continuity.
- Reduces how often appearance details must be re-specified in prompts to keep a consistent look across multiple images.
Ref: PHO-88
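As a rough illustration of per-session style metadata, the sketch below stores appearance attributes per session and attaches them to each generation request. The field names, the in-memory store, and the request shape are all assumptions made for this example; the actual schema is not described in these notes.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class SessionStyle:
    """Hypothetical structured appearance metadata for one session."""
    clothing: str = ""
    hairstyle: str = ""
    accessories: list = field(default_factory=list)


# Hypothetical in-memory store keyed by session ID.
style_store = {}


def remember_style(session_id, **attributes):
    """Create or update the stored style for a session."""
    style = style_store.setdefault(session_id, SessionStyle())
    for name, value in attributes.items():
        setattr(style, name, value)
    return style


def build_generation_request(session_id, prompt):
    """Attach the stored style (if any) to the request sent at inference time."""
    style = style_store.get(session_id, SessionStyle())
    return {"prompt": prompt, "style_metadata": asdict(style)}


remember_style("session-1", clothing="red jacket", hairstyle="short bob")
request = build_generation_request("session-1", "standing on a beach")
print(request)
```

The second prompt does not mention the jacket or hairstyle, yet the request still carries them, which is the continuity behaviour described above.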
Personality & Memory
Memory retrieval latency: p50 under 80ms
- Reduced memory retrieval p50 from 140ms to 74ms by switching to an HNSW approximate-nearest-neighbour index.
- Index now rebuilt incrementally on write rather than on a nightly batch job.
- No changes to the embedding model or vector dimensions.
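For readers unfamiliar with graph-based nearest-neighbour indexes, here is a simplified sketch of the idea behind incremental inserts. It is a single-layer navigable small-world graph, not full HNSW: it omits the hierarchy of layers and the neighbour-list pruning that a production index uses, and `TinyGraphIndex` is a hypothetical name for illustration only.

```python
import heapq
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TinyGraphIndex:
    """Single-layer small-world graph; a simplified relative of HNSW."""

    def __init__(self, max_neighbours=8, ef=32):
        self.max_neighbours = max_neighbours  # links created per insert
        self.ef = ef                          # size of the search beam
        self.vectors = []
        self.neighbours = []                  # adjacency lists by node ID

    def _search(self, query, ef):
        """Best-first graph search; returns sorted (distance, node_id) pairs."""
        if not self.vectors:
            return []
        d0 = distance(query, self.vectors[0])
        visited = {0}
        candidates = [(d0, 0)]   # min-heap of nodes still to expand
        best = [(-d0, 0)]        # max-heap (negated) of the ef closest so far
        while candidates:
            d, node = heapq.heappop(candidates)
            if len(best) >= ef and d > -best[0][0]:
                break  # nothing left can improve the beam
            for nb in self.neighbours[node]:
                if nb in visited:
                    continue
                visited.add(nb)
                dn = distance(query, self.vectors[nb])
                if len(best) < ef or dn < -best[0][0]:
                    heapq.heappush(candidates, (dn, nb))
                    heapq.heappush(best, (-dn, nb))
                    if len(best) > ef:
                        heapq.heappop(best)
        return sorted((-nd, i) for nd, i in best)

    def add(self, vector):
        """Insert on write: link the new node to its nearest existing nodes."""
        new_id = len(self.vectors)
        nearest = self._search(vector, self.ef)[: self.max_neighbours]
        self.vectors.append(list(vector))
        self.neighbours.append([i for _, i in nearest])
        for _, i in nearest:
            self.neighbours[i].append(new_id)  # real HNSW would prune here
        return new_id

    def search(self, query, k):
        """Return the IDs of the (approximately) k nearest vectors."""
        return [i for _, i in self._search(query, max(self.ef, k))[:k]]


random.seed(7)
index = TinyGraphIndex(max_neighbours=6, ef=64)
for _ in range(40):
    index.add([random.random() for _ in range(4)])
print(index.search([0.5, 0.5, 0.5, 0.5], k=3))
```

Each `add` touches only the new node and a handful of neighbours, which is why a graph index can be updated on every write instead of being rebuilt in a batch. The `ef` parameter trades recall for speed: a larger beam visits more of the graph.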
Reliability
Voice call 4G drop fix
- Added 12kbps and 8kbps rungs to the WebRTC bitrate ladder.
- Deployed March 12; call drop rate on 4G connections down 74% vs. February baseline.
- Validated across 18 carrier/device combinations in the device lab.
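The effect of the lower rungs can be sketched as a simple selection rule: pick the highest rung that fits within the estimated bandwidth, and fall back to the lowest rung rather than dropping the call. Only the 12kbps and 8kbps rungs come from these notes; the higher rungs, the headroom factor, and the `pick_rung` function are hypothetical, and real WebRTC stacks also apply their own congestion control.

```python
# Only the 12 and 8 kbps rungs are from the release notes;
# the higher rungs are placeholder values for illustration.
LADDER_KBPS = [64, 32, 24, 12, 8]


def pick_rung(estimated_kbps, ladder=LADDER_KBPS, headroom=0.8):
    """Return the highest rung that fits within a fraction of the estimate.

    `headroom` (an assumed value) leaves slack for jitter and overhead.
    If nothing fits, stay on the lowest rung instead of ending the call.
    """
    budget = estimated_kbps * headroom
    for rung in sorted(ladder, reverse=True):
        if rung <= budget:
            return rung
    return min(ladder)


for estimate in (100, 16, 11, 5):
    print(estimate, "->", pick_rung(estimate))
```

With a ladder that stopped at 24kbps, an estimate of 16kbps would have no rung within budget; the added rungs give the call somewhere to degrade to on a weak 4G link.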
Rollback / Incident
On March 8 we enabled an aggressive memory pruning strategy on 5% of accounts to reduce vector store cost. After 36 hours we received reports of lost conversation context from affected users and reverted immediately. Memory data was restored for all affected accounts from backup. A post-mortem covering root cause and process changes is on the blog.
Status: Resolved. All affected accounts restored from backup.
What's Next
- Aria-3.5: targeted fine-tune for episodic recall and longer context coherence.
- Video generation: test 1080p output tier on new inference hardware.
- TTS v3.1: multilingual checkpoint expansion to 22 languages.
Thanks to everyone who reported the memory regression quickly. The speed of those reports is what allowed us to contain it.