Shipped 2026-04-30 · Posted by the Platform Team

April 2026: Aria-3.5, episodic memory, multilingual TTS, and video generation at 1080p

This release covers the Aria-3.5 fine-tune with episodic recall, TTS v3.1 in 22 languages, 1080p video generation on new inference hardware, and a DSP fix that resolves audio echo on Bluetooth headsets.

AI Model · Personality & Memory · Voice · Video · Photos · Reliability · Trust & Safety

AI Model

Aria-3.5: episodic recall fine-tune

Fine-tuned Aria-3 on a synthetic dataset of 1.2M episodic memory examples.
Model now distinguishes between factual memory ("you like coffee") and event memory ("the conversation about your job change").
Episodic recall accuracy improved 38% on a held-out benchmark vs. Aria-3 base.
Context window unchanged at 32k tokens.

Ref: ARIA-350

Personality & Memory

Episodic memory: event-based retrieval

Memory store now indexes events as time-stamped episodes in addition to discrete facts.
Retrieval pipeline selects the most relevant episode from the index given the current conversation context.
Episodes require Aria-3.5 to interpret; older sessions fall back gracefully to fact retrieval.

Ref: MEM-210
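
As a rough illustration of the retrieval behavior described above, here is a minimal sketch. It is not the production pipeline: the class names, the `model_supports_episodes` flag, and the word-overlap relevance score are all assumptions made for the example, since the actual scoring method is not described in this note.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Fact:
    text: str  # discrete fact, e.g. "you like coffee"


@dataclass
class Episode:
    timestamp: datetime  # when the event happened
    summary: str  # event memory, e.g. "the conversation about your job change"


def _overlap(a: str, b: str) -> int:
    """Toy relevance score: count of shared lowercase words (hypothetical)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


@dataclass
class MemoryStore:
    facts: list[Fact] = field(default_factory=list)
    episodes: list[Episode] = field(default_factory=list)

    def retrieve(self, context: str, model_supports_episodes: bool):
        """Return the most relevant episode, or fall back to the best fact."""
        if model_supports_episodes and self.episodes:
            best = max(self.episodes, key=lambda e: _overlap(e.summary, context))
            if _overlap(best.summary, context) > 0:
                return best
        # Fallback path: fact retrieval only.
        if self.facts:
            return max(self.facts, key=lambda f: _overlap(f.text, context))
        return None


store = MemoryStore(
    facts=[Fact("you like coffee")],
    episodes=[
        Episode(
            datetime(2026, 4, 1, tzinfo=timezone.utc),
            "the conversation about your job change",
        )
    ],
)
hit = store.retrieve("how did my job change go", model_supports_episodes=True)
```

With the flag set to `False`, the same call returns the stored fact instead, which mirrors the fallback to fact retrieval.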

Voice

TTS v3.1: 22-language multilingual checkpoint

Extended the TTS prosody model to 22 languages in a single multilingual checkpoint.
Added Arabic, Hindi, Thai, and Vietnamese.
Cross-lingual MOS on new languages: 3.8 average, vs. 4.1 for English.
Time to first audio (TTFA) unchanged at 0.9s p50 across all languages.

Video

Video generation: 1080p on new inference hardware

Deployed a new GPU inference tier; 1080p output now available at p95 generation time of 22s.
Temporal consistency score at 1080p: 0.84, slightly below the 0.86 at 720p because of resolution-scaling artifacts; a fix is in progress.
720p remains default; 1080p opt-in in generation settings.

Photos

Image generation: face-match score v2

Updated the face-similarity evaluation model to a newer backbone (ArcFace v3).
Recalibrated face-match targets; effective score on the existing test set: 81%.
No change to the generation model itself; improvement is in evaluation accuracy.

Reliability

Audio DSP echo fix (VOX-412)

Root cause: DSP buffer size misconfigured for chipsets with asymmetric capture/playback latency (Qualcomm WCD9385 and similar).
Fix deployed April 3; validated on 40 Bluetooth headset models in the device lab.
Echo reports dropped to baseline noise level post-deploy.

Ref: VOX-412

Trust & Safety

Independent privacy audit completed

Third-party privacy audit completed. Summary report available at /privacy-policy.
Data export (full JSON archive) and deletion with 7-day grace period now available from account settings.
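
To make the grace-period timing concrete, a small sketch follows. Only the 7-day figure comes from this note; the function names are hypothetical, and treating the request as still pending (and cancellable) until the period ends is an assumption.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=7)  # grace period stated in the release note


def purge_at(requested_at: datetime) -> datetime:
    """Earliest time the account data would be permanently removed."""
    return requested_at + GRACE_PERIOD


def in_grace_period(requested_at: datetime, now: datetime) -> bool:
    """True while the deletion request is still pending (assumed cancellable)."""
    return now < purge_at(requested_at)


requested = datetime(2026, 4, 30, 12, 0, tzinfo=timezone.utc)
print(purge_at(requested).isoformat())  # 2026-05-07T12:00:00+00:00
```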

Known Issues

1080p video generation occasionally produces frame tears at the transition point in sessions with high-frequency scene changes. A fix is planned for the v1.3 generation model.

What's Next

Aria-4: 64k context window, new pre-training data mix.
Video generation v1.3: frame tear fix and improved scene-change handling.
Image generation: targeting face-match score above 85% on the standard test set.
