How Descript enables multilingual video dubbing at scale

Descript redesigned its video dubbing pipeline using OpenAI models to optimize simultaneously for semantic fidelity and duration adherence, significantly improving the natural pacing of dubbed speech.

Summary

Descript, an AI-native video editor, has significantly improved its multilingual video dubbing by redesigning its translation pipeline to address a critical weakness: poor duration adherence, which often made translated speech sound unnatural.

Previously, translations optimized for meaning alone often broke timing constraints, because the same content takes different amounts of speech in different languages (German, for example, is often 'longer' than English). This forced users into tedious manual adjustments. Descript's new approach uses OpenAI reasoning models, leveraging their improved consistency on tasks like syllable counting, to optimize for semantic fidelity and duration adherence simultaneously during generation rather than correcting timing afterward; a minimal sketch of the idea follows.
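The article does not show Descript's actual implementation, so the sketch below only illustrates the general technique under stated assumptions: it prompts a reasoning model to translate a segment within a syllable budget derived from the segment's duration and an assumed per-language speaking rate, then sanity-checks the output with a rough vowel-group syllable counter. The model name, the RATE values, the ±15% tolerance, and all helper names are hypothetical, not taken from the source.

```python
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumed average speaking rates (syllables/second); hypothetical values,
# since real rates vary by language, speaker, and register.
RATE = {"German": 5.5, "Spanish": 6.0}

def estimate_syllables(text: str) -> int:
    """Very rough estimate: count vowel groups in each word (minimum 1 per word)."""
    words = re.findall(r"[^\W\d_]+", text)
    return sum(max(1, len(re.findall(r"[aeiouyäöü]+", w.lower()))) for w in words)

def duration_constrained_translate(text: str, duration_s: float, language: str) -> str:
    """Ask the model for a translation that fits the source segment's timing."""
    budget = round(duration_s * RATE[language])
    resp = client.chat.completions.create(
        model="o4-mini",  # hypothetical choice; the article doesn't name the model
        messages=[{
            "role": "user",
            "content": (
                f"Translate the following English text into {language}. "
                f"It must be spoken in about {duration_s:.1f} seconds, so aim for "
                f"roughly {budget} syllables while preserving the meaning.\n\n{text}"
            ),
        }],
    )
    candidate = resp.choices[0].message.content.strip()
    # Flag candidates that miss the budget so a retry (with explicit feedback)
    # or a downstream TTS re-pacing step can handle them.
    if abs(estimate_syllables(candidate) - budget) > 0.15 * budget:
        print(f"warning: {estimate_syllables(candidate)} syllables vs budget {budget}")
    return candidate

print(duration_constrained_translate(
    "Thanks for watching, and see you next week.", 2.5, "German"))
```

The key design point, per the article, is that the timing constraint is part of the generation step itself rather than a post-hoc correction; the syllable check above merely catches the occasional miss.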

The results showed a 15% increase in translated video exports and a 13 to 43 percentage point improvement in duration adherence. Listening tests confirmed the gain: the share of segments falling within a natural pacing window rose from 40-60% to 73-83%. Descript is now building batch processing capabilities to enable large-scale localization, with future improvements focused on making the pipeline more multimodal so it better preserves nonverbal speech characteristics such as tone and emphasis.
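The article does not define how duration adherence or the pacing window were measured. One plausible way to operationalize the metric, assuming "adherence" means a dubbed segment's duration lands within a tolerance band around the source segment's (the ±10% default below is an assumption, not Descript's threshold):

```python
def within_pacing_window(src_s: float, dub_s: float, tol: float = 0.10) -> bool:
    """True if the dubbed duration is within ±tol of the source duration."""
    return abs(dub_s - src_s) <= tol * src_s

def adherence_rate(segments: list[tuple[float, float]], tol: float = 0.10) -> float:
    """Share of (source, dubbed) duration pairs that fall inside the window."""
    hits = sum(within_pacing_window(s, d, tol) for s, d in segments)
    return hits / len(segments)

# Example: 3 of 4 segments fit a ±10% window -> 0.75 adherence.
print(adherence_rate([(4.0, 4.2), (3.0, 3.1), (5.0, 6.0), (2.5, 2.4)]))
```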

(Source: OpenAI)