Shipped 2026-04-30 · Posted by the Platform Team
April 2026: Aria-3.5, episodic memory, multilingual TTS, and video generation at 1080p
Aria-3.5 fine-tune with episodic recall, TTS v3.1 in 22 languages, video generation at 1080p on new inference hardware, and a DSP fix that resolves audio echo on Bluetooth headsets.
AI Model
Aria-3.5: episodic recall fine-tune
- Fine-tuned Aria-3 on a synthetic dataset of 1.2M episodic memory examples.
- Model now distinguishes between factual memory ("you like coffee") and event memory ("the conversation about your job change").
- Episodic recall accuracy improved by 38% on a held-out benchmark vs. the Aria-3 base model.
- Context window unchanged at 32k tokens.
Ref: ARIA-350
Personality & Memory
Episodic memory: event-based retrieval
- Memory store now indexes events as time-stamped episodes in addition to discrete facts.
- Retrieval pipeline selects the most relevant episode from the index given the current conversation context.
- Interpreting episodes requires Aria-3.5; older sessions fall back gracefully to fact retrieval.
Ref: MEM-210
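The retrieval flow above can be sketched roughly as follows. This is an illustration only, not the production pipeline: the `Episode` type, the word-overlap scoring, the `min_score` threshold, and the `supports_episodes` flag are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Episode:
    """Hypothetical time-stamped episode record."""
    timestamp: datetime
    summary: str


def _overlap(a: str, b: str) -> float:
    """Jaccard word overlap; a simple stand-in for a real relevance model."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def retrieve(context, episodes, facts, supports_episodes=True, min_score=0.1):
    """Return ("episode", Episode) for the best match, else ("facts", list)."""
    if supports_episodes and episodes:
        # Pick the single most relevant episode for the current context.
        best = max(episodes, key=lambda e: _overlap(context, e.summary))
        if _overlap(context, best.summary) >= min_score:
            return "episode", best
    # Fallback to fact retrieval. Applying it when no episode scores above
    # the threshold is an assumption of this sketch.
    return "facts", [f for f in facts if _overlap(context, f) > 0]
```

In this sketch, a session that cannot interpret episodes would call `retrieve(..., supports_episodes=False)` and receive only matching facts.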
Voice
TTS v3.1: 22-language multilingual checkpoint
- Extended the TTS prosody model to 22 languages in a single multilingual checkpoint.
- Added Arabic, Hindi, Thai, and Vietnamese.
- Cross-lingual MOS on new languages: 3.8 average, vs. 4.1 for English.
- Time to first audio (TTFA) unchanged at 0.9s p50 across all languages.
Video
Video generation: 1080p on new inference hardware
- Deployed a new GPU inference tier; 1080p output now available at p95 generation time of 22s.
- Temporal consistency score at 1080p: 0.84, slightly below the 0.86 measured at 720p. The gap comes from resolution scaling artifacts, which are being worked on.
- 720p remains default; 1080p opt-in in generation settings.
Photos
Image generation: face-match score v2
- Updated the face-similarity evaluation model to a newer backbone (ArcFace v3).
- Recalibrated face-match targets; effective score on the existing test set: 81%.
- No change to the generation model itself; improvement is in evaluation accuracy.
Reliability
Audio DSP echo fix (VOX-412)
- Root cause: DSP buffer size misconfigured for chipsets with asymmetric capture/playback latency (Qualcomm WCD9385 and similar).
- Fix deployed April 3; validated on 40 Bluetooth headset models in the device lab.
- Echo reports dropped to baseline noise level post-deploy.
Ref: VOX-412
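To illustrate why asymmetric latency matters for buffer sizing, here is a minimal sketch. It assumes an echo canceller that keeps a reference buffer of far-end audio long enough to cover the full playback-plus-capture delay. The function name, the 16 kHz sample rate, the 160-sample frame size, the safety margin, and the latency figures in the usage note are hypothetical and do not describe the shipped DSP configuration.

```python
import math


def reference_buffer_frames(capture_ms, playback_ms,
                            sample_rate_hz=16000, frame_size=160,
                            margin_ms=10.0):
    """Frames of far-end audio to retain so the echo delay fits in the buffer.

    All defaults are illustrative assumptions, not production values.
    """
    # The echo returns after the playback delay plus the capture delay.
    total_ms = capture_ms + playback_ms + margin_ms
    samples = total_ms * sample_rate_hz / 1000.0
    # Round up to whole frames so the buffer is never too short.
    return math.ceil(samples / frame_size)
```

With these assumed numbers, a symmetric 20 ms / 20 ms path needs 5 frames, while an asymmetric 20 ms / 120 ms path needs 15. A buffer sized for the symmetric case would be too short to hold the echo reference on the asymmetric path.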
Trust & Safety
Independent privacy audit completed
- Third-party privacy audit completed. Summary report available at /privacy-policy.
- Data export (full JSON archive) and deletion with 7-day grace period now available from account settings.
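The 7-day grace period can be thought of along these lines. This is a sketch under assumptions: the function names and the idea that a deletion request can be cancelled until the grace period ends are illustrative, not a description of the account-settings implementation.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=7)  # grace period stated in the release note


def purge_time(requested_at: datetime) -> datetime:
    """Earliest moment the data would be permanently deleted."""
    return requested_at + GRACE_PERIOD


def can_cancel(requested_at: datetime, now: datetime) -> bool:
    """Assumed behavior: a request is cancellable until the grace period ends."""
    return now < purge_time(requested_at)


requested = datetime(2026, 4, 30, 12, 0, tzinfo=timezone.utc)
deadline = purge_time(requested)
```

Using timezone-aware timestamps, as in the example, avoids ambiguity about when the grace period ends.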
Known Issues
- In sessions with high-frequency scene changes, 1080p video generation occasionally produces frame tears at the transition point. A fix is planned for the v1.3 generation model.
What's Next
- Aria-4: 64k context window, new pre-training data mix.
- Video generation v1.3: frame tear fix and improved scene-change handling.
- Image generation: targeting face-match score above 85% on the standard test set.