Gemini 3.1 Flash TTS: the next generation of expressive AI speech
Summary
Google has launched Gemini 3.1 Flash TTS, its latest text-to-speech model, designed to provide improved controllability, expressivity, and quality for AI-generated speech. The model introduces "audio tags," which allow users to precisely control vocal style, pacing, and delivery using natural language commands embedded directly into the text. This feature enables developers to fine-tune AI voices for specific scenarios, create consistent characters, and build immersive audio experiences. Gemini 3.1 Flash TTS supports over 70 languages and has achieved a high score on the Artificial Analysis TTS leaderboard for its natural and expressive speech. All audio generated by the model is watermarked with SynthID to ensure reliable detection of AI-generated content and help prevent misinformation. The model is available for developers via the Gemini API and Google AI Studio, for enterprises on Vertex AI, and for Workspace users through Google Vids.
(Source: Gemini)
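To make the idea of audio tags embedded in the text more concrete, here is a minimal sketch of how a script with inline style cues might be assembled before it is sent to a speech model. The summary does not specify the tag syntax, so the square-bracket form, the tag wording ("whispering", "excited, fast"), and the helper name `with_audio_tags` are all assumptions made for illustration. The sketch only builds a string and does not call the Gemini API.

```python
def with_audio_tags(segments):
    """Join (tag, text) pairs into one prompt string.

    Hypothetical helper: the "[tag] text" bracket syntax is an assumption,
    not a format confirmed by the summary above. A tag of None (or an
    empty string) leaves the segment unstyled.
    """
    parts = []
    for tag, text in segments:
        text = text.strip()
        if not text:
            # Skip empty segments so no dangling tag is emitted.
            continue
        parts.append(f"[{tag.strip()}] {text}" if tag else text)
    return " ".join(parts)


# Example script: each segment carries its own natural-language style cue.
prompt = with_audio_tags([
    ("whispering", "I think someone is at the door."),
    (None, "Stay here."),
    ("excited, fast", "It's the delivery we were waiting for!"),
])
print(prompt)
```

Keeping the cues next to the text they apply to, rather than in a separate settings object, matches the summary's description of commands placed directly in the text. The exact tag vocabulary and delimiters should be taken from the official model documentation.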