Aigirlfriends.ai
Shipped 2026-04-30 · Posted by the Platform Team

April 2026: Aria-3.5, episodic memory, multilingual TTS, and video generation at 1080p

Aria-3.5 fine-tune with episodic recall, TTS v3.1 in 22 languages, video generation at 1080p on new inference hardware, and a DSP fix resolving audio echo on Bluetooth.

AI Model

Aria-3.5: episodic recall fine-tune

  • Fine-tuned Aria-3 on a synthetic dataset of 1.2M episodic memory examples.
  • Model now distinguishes between factual memory ("you like coffee") and event memory ("the conversation about your job change").
  • Episodic recall accuracy improved 38% on a held-out benchmark vs. Aria-3 base.
  • Context window unchanged at 32k tokens.

Ref: ARIA-350

Personality & Memory

Episodic memory: event-based retrieval

  • Memory store now indexes events as time-stamped episodes in addition to discrete facts.
  • Retrieval pipeline selects the most relevant episode from the index given the current conversation context.
  • Episodes require Aria-3.5 to interpret; older sessions fall back gracefully to fact-only retrieval.

Ref: MEM-210
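
A minimal sketch of the retrieval flow described above, not the production pipeline. The `Episode` type, the `retrieve` function, and the word-overlap `relevance` score are all hypothetical stand-ins; the real relevance model and index format are not described in this post.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Episode:
    """Hypothetical time-stamped episode record."""
    timestamp: datetime
    summary: str


def relevance(query: str, text: str) -> float:
    """Jaccard word overlap; a placeholder for a real relevance model."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(query: str, episodes: List[Episode], facts: List[str],
             model_supports_episodes: bool) -> Optional[str]:
    """Pick the most relevant episode, or fall back to fact retrieval."""
    if model_supports_episodes and episodes:
        # Ties on relevance are broken in favor of the more recent episode.
        best = max(episodes, key=lambda e: (relevance(query, e.summary), e.timestamp))
        if relevance(query, best.summary) > 0:
            return best.summary
    # Fallback path: sessions without episodic support, or no matching episode.
    if facts:
        return max(facts, key=lambda f: relevance(query, f))
    return None


episodes = [
    Episode(datetime(2026, 3, 1), "the conversation about your job change"),
    Episode(datetime(2026, 3, 15), "planning a weekend hiking trip"),
]
facts = ["you like coffee"]
```

With this data, `retrieve("how is the job change going", episodes, facts, True)` selects the job-change episode, while passing `False` for the last argument returns the stored fact instead.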

Voice

TTS v3.1: 22-language multilingual checkpoint

  • Extended the TTS prosody model to 22 languages in a single multilingual checkpoint.
  • Added Arabic, Hindi, Thai, and Vietnamese.
  • Cross-lingual MOS on new languages: 3.8 average, vs. 4.1 for English.
  • TTFA (time to first audio) unchanged at 0.9s p50 across all languages.

Video

Video generation: 1080p on new inference hardware

  • Deployed a new GPU inference tier; 1080p output now available at p95 generation time of 22s.
  • Temporal consistency score at 1080p: 0.84, slightly below the 0.86 at 720p because of resolution-scaling artifacts; a fix is in progress.
  • 720p remains default; 1080p opt-in in generation settings.

Photos

Image generation: face-match score v2

  • Updated the face-similarity evaluation model to a newer backbone (ArcFace v3).
  • Recalibrated face-match targets; effective score on the existing test set: 81%.
  • No change to the generation model itself; improvement is in evaluation accuracy.

Reliability

Audio DSP echo fix (VOX-412)

  • Root cause: DSP buffer size misconfigured for chipsets with asymmetric capture/playback latency (Qualcomm WCD9385 and similar).
  • Fix deployed April 3; validated on 40 Bluetooth headset models in the device lab.
  • Echo reports dropped to baseline noise level post-deploy.

Ref: VOX-412
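
To illustrate why asymmetric latency matters, here is a generic sketch of sizing an echo-reference buffer. It is not the actual fix: the function name, the latency figures, the echo-tail length, and the assumption that the affected buffer is an echo-canceller reference buffer are all hypothetical. The idea is that the buffer must cover playback latency plus capture latency, so sizing it as if both sides matched the capture path can leave it too short.

```python
import math


def echo_buffer_frames(capture_ms: float, playback_ms: float, tail_ms: float,
                       sample_rate_hz: int, frame_size: int) -> int:
    """Frames needed to span the round-trip delay plus the echo tail."""
    total_ms = capture_ms + playback_ms + tail_ms
    samples = math.ceil(total_ms * sample_rate_hz / 1000)
    return math.ceil(samples / frame_size)


# Hypothetical figures: 20 ms capture, 150 ms Bluetooth playback, 64 ms tail,
# 16 kHz audio processed in 160-sample (10 ms) frames.
asymmetric = echo_buffer_frames(20, 150, 64, 16_000, 160)

# Same device, but wrongly assuming playback latency equals capture latency.
symmetric_guess = echo_buffer_frames(20, 20, 64, 16_000, 160)
```

Under these assumed numbers the symmetric guess allocates 11 frames where 24 are needed, so the delayed playback signal falls outside the buffer.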

Trust & Safety

Independent privacy audit completed

  • Third-party privacy audit completed. Summary report available at /privacy-policy.
  • Data export (full JSON archive) and deletion with 7-day grace period now available from account settings.
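
A small sketch of how the 7-day grace period could be computed. The function names are hypothetical, and the ability to cancel a deletion request before the grace period ends is an assumption, not something this post confirms.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=7)  # grace period stated in this update


def purge_time(requested_at: datetime) -> datetime:
    """Earliest moment the account data may be permanently deleted."""
    return requested_at + GRACE_PERIOD


def can_cancel(requested_at: datetime, now: datetime) -> bool:
    """Assumed behavior: a request can be withdrawn until the grace period ends."""
    return now < purge_time(requested_at)


requested = datetime(2026, 4, 30, 12, 0, tzinfo=timezone.utc)
```

Using timezone-aware UTC timestamps keeps the 7-day window unambiguous across regions.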

Privacy Policy →

Known Issues

  • 1080p video generation occasionally produces frame tears at transition points in sessions with high-frequency scene changes. A fix is planned for the v1.3 generation model.

What's Next

  • Aria-4: 64k context window, new pre-training data mix.
  • Video generation v1.3: frame tear fix and improved scene-change handling.
  • Image generation: targeting face-match score above 85% on the standard test set.