Overdub is Descript's AI voice clone: train it on at least 10 minutes of your voice, then type new sentences that sound like you. Used right, it saves hours of re-recording. Used wrong, it can sound robotic or raise consent issues. Here's the full setup and ethical-use guide.
Who this is for
Podcasters, course creators, and content producers who frequently need to fix mistakes, add corrections, or insert short voiceovers without re-recording the full segment.
Step 1
Overdub is Pro plan only. By default you train only your own voice; cloning anyone else requires their explicit recorded consent. Misuse violates Descript's ToS.
Confirm you're on Descript Pro: Settings → Plan & Billing. Overdub is not available on the Free or Creator plans.
Read Descript's Overdub consent policy: you can train Overdub ONLY on YOUR own voice OR on voices where the speaker has provided recorded consent.
Recorded consent format: the speaker says "I, [name], on [date], give Descript permission to train an AI model on my voice." Descript checks that the voice in this statement matches the voice in the training audio.
Violations: cloning a public figure or someone without consent violates the ToS and may carry legal risk (right-of-publicity, defamation if misused).
Use cases that are clearly fine: fixing your own mistakes, generating corrections, adding short intros/outros in your own voice. Cases that are not fine: cloning a guest's voice to put words in their mouth.
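If you are asking a co-host or guest to record consent, it can help to hand them the exact wording in advance. The sketch below only fills in the statement quoted above; the function name `consent_statement` and the date style (month name, day, year) are assumptions made for illustration, and Descript's own consent screen remains the authority on the required wording.

```python
from datetime import date


def consent_statement(name: str, on: date) -> str:
    """Fill in the spoken consent wording quoted in this guide."""
    # Build a date that reads naturally aloud, e.g. "January 5, 2024".
    spoken_date = f"{on.strftime('%B')} {on.day}, {on.year}"
    return (
        f"I, {name}, on {spoken_date}, give Descript permission "
        "to train an AI model on my voice."
    )


# Example: text to send to a speaker before their recording session.
print(consent_statement("Jane Doe", date(2024, 1, 5)))
```

The speaker still has to read the statement aloud in their own voice; the script only saves them from improvising the wording.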
Step 2
Settings → Overdub → Train New Voice. Record 10-30 minutes of clean, naturally varied speech. Quiet room, good mic, follow Descript's script.
Settings → Overdub → + Train New Voice.
Descript provides a training script (varies but typically 5-15 minutes of text covering different sentence types).
Recording environment: quiet room (no AC hum, no traffic), good mic (USB condenser at minimum, XLR ideally), 6-12 inches from mic, headphones to monitor.
Vocal performance: read the script in YOUR normal speaking voice. Vary your tone naturally: don't read in a monotone, and don't over-perform. Imagine talking to a colleague.
Record 2-3 takes if Descript suggests it. Pick the cleanest, most natural take.
After recording, Descript provides a consent verification step: read the consent statement out loud. Click Submit.
Training time: 12-48 hours. Descript emails you when the model is ready.
Step 3
After training, open any project → highlight a missing-word gap → right-click → Overdub. Type the missing word. Listen. Repeat for several sentences.
When Descript notifies you the model is ready, open any project.
In the script, click in a gap, or wherever you want to insert new text. Type 3-5 words.
Right-click the typed text → Overdub → Generate.
Listen to the result with headphones. Compare to your real voice in the surrounding audio.
Good Overdub: indistinguishable from your real voice in short fragments (3-10 words). Slight robotic feel on longer passages (15+ words).
Common quality issues: (a) wrong intonation on questions (rising vs falling), (b) flat affect on emotional words, (c) mispronounced unfamiliar names.
If quality is poor: re-train with more/cleaner training audio. Settings → Overdub → Retrain Voice.
Step 4
Best use case: you got a name wrong, missed a word, or made a slight error. Type the fix in the transcript → right-click → Overdub. A 30-second fix.
Scenario: you recorded a podcast and said 'Sarah' when you meant 'Sara.' In Premiere/Audacity, fixing this requires re-recording the section.
In Descript with Overdub: delete the wrong word, type the correct word, right-click → Overdub. Listen. Done.
Other common small fixes: mispronounced names, misspoken words (you said 'the' when you meant 'three'), correcting a date, fixing a small factual error.
Workflow during editing: as you read the transcript, mark fixes with a tag (#overdub). After the main edit, do a 5-min Overdub pass for all marked sections.
Time savings: 30-60 min per episode of re-recording or careful audio editing replaced with 5-10 min of Overdub generation.
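The tag-then-batch workflow above can be sketched outside Descript as well. The example assumes you have copied or exported the transcript as plain text with the `#overdub` tag left inline; the function name `find_overdub_fixes` and the sample transcript are hypothetical, and nothing here calls a Descript API.

```python
import re

# Matches the inline marker used in this guide, case-insensitively.
TAG = re.compile(r"#overdub\b", re.IGNORECASE)


def find_overdub_fixes(transcript: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for every line carrying the #overdub tag."""
    fixes = []
    for number, line in enumerate(transcript.splitlines(), start=1):
        if TAG.search(line):
            # Drop the tag so only the text that needs fixing remains.
            fixes.append((number, TAG.sub("", line).strip()))
    return fixes


sample = """Welcome back to the show.
Today's guest is Sarah Lee. #overdub
We recorded this on the third of May. #overdub
Let's get started."""

# Print a to-do list for the Overdub pass.
for number, text in find_overdub_fixes(sample):
    print(f"line {number}: {text}")
```

A list like this makes the 5-minute Overdub pass a checklist rather than a second read-through of the transcript.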
Step 5
OK for short voiceover bridges (10-20 words). Risky for full paragraphs. Long passages start sounding robotic — listeners notice.
Beyond fixes, Overdub can generate entirely new content. Example: you forgot to record a brief intro line for an episode; type it, Overdub generates.
Sweet spot: 5-20 words of new content. Sounds natural, blends into surrounding real audio.
Risky: 30+ words of new content. The slight robotic tendency becomes audible. Listeners may notice without being able to articulate what's off.
Don't use for: emotional content, complex sentences with subtle inflection, anything where authenticity matters more than convenience.
Mental rule: if the change is small enough that you'd treat it as a typo fix, Overdub is fine. If you can't tell the Overdub from your real voice, it's fine. If you hear something even slightly off, re-record.
Always sanity-check final episodes by listening to Overdub sections back-to-back with surrounding real audio.
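The length guidance in this step can be expressed as a simple triage rule. This is a minimal sketch using only the thresholds stated above (up to 20 words is the sweet spot, 30 or more is risky); the function name `triage_new_content`, the "borderline" label for the unaddressed 21-29 word range, and counting words by whitespace are all assumptions. Word count is only a first filter: emotional or subtly inflected lines should be re-recorded regardless of length.

```python
SWEET_SPOT_MAX_WORDS = 20  # upper end of the "5-20 words" sweet spot
RISKY_MIN_WORDS = 30       # "30+ words" starts to sound robotic


def triage_new_content(text: str) -> str:
    """Suggest how to produce a new line, based only on its word count."""
    words = len(text.split())
    if words <= SWEET_SPOT_MAX_WORDS:
        return "overdub"
    if words < RISKY_MIN_WORDS:
        # The guide gives no rule for 21-29 words, so flag for a careful listen.
        return "borderline"
    return "re-record"


print(triage_new_content("Welcome back to the show, I'm your host."))
```

Whatever the function returns, the listening check stays mandatory: play the generated line back-to-back with the real audio around it.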
Step 6
Workspace owners can manage trained voices. Multiple team members each train their own. Set permissions: who can use whose voice clone.
Settings → Workspace → Overdub Voices. Lists every trained voice in the workspace.
Each team member trains their own voice individually (consent required per voice).
Permissions: by default, only the voice's owner can use their own clone. Workspace owners can grant access to other team members (e.g., a video editor can use the host's clone to fix mistakes during post).
Audit trail: workspace admins see who generated what Overdub content and when. Useful for accountability and compliance.
Best practice: limit Overdub access to 1-2 video editors per voice. Broader access invites misuse.
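When you write the team policy down, it can help to model the rules explicitly. The sketch below is not Descript's permission system or API; it is a hypothetical in-memory model (the `VoiceClone` class and its methods are invented for illustration) of the rules described in this step: the owner can always use their clone, at most two editors can be granted access, and every generation is logged.

```python
from dataclasses import dataclass, field

MAX_EDITORS_PER_VOICE = 2  # the "1-2 editors per voice" guideline


@dataclass
class VoiceClone:
    owner: str
    editors: set[str] = field(default_factory=set)
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def grant(self, user: str) -> None:
        """Give an editor access, enforcing the per-voice cap."""
        if user == self.owner or user in self.editors:
            return  # already has access
        if len(self.editors) >= MAX_EDITORS_PER_VOICE:
            raise PermissionError(f"{self.owner}'s voice already has {MAX_EDITORS_PER_VOICE} editors")
        self.editors.add(user)

    def can_use(self, user: str) -> bool:
        return user == self.owner or user in self.editors

    def generate(self, user: str, text: str) -> None:
        """Record who generated what; refuse users without access."""
        if not self.can_use(user):
            raise PermissionError(f"{user} may not use {self.owner}'s voice")
        self.audit_log.append((user, text))


host_voice = VoiceClone(owner="host")
host_voice.grant("editor_a")
host_voice.generate("editor_a", "three")
print(host_voice.audit_log)
```

The point of the model is the policy shape, not the code: a default of owner-only access, a hard cap on grants, and a log that makes a quarterly review possible.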
Step 7
Before exporting, listen back to every Overdub-generated section. If any sound robotic or off, re-generate or re-record the real audio.
Before final export, run a QC pass specifically on Overdub sections.
View → Highlight Overdubbed sections (or use the tag if you used #overdub).
Listen to each section in context with surrounding real audio. Slight tone shifts, breath patterns, or pace changes mean Overdub didn't blend cleanly.
Re-generate options: right-click → Overdub → Regenerate (sometimes produces a better take), or shorten the Overdub (smaller chunks blend better), or re-record the real audio (sometimes the right call).
Document Overdub policy for your team: when it's acceptable, who can use it, what gets flagged for re-record. A 1-pager prevents misuse.
Per Descript's terms, you should disclose AI voice generation in content where listeners might reasonably expect 100% authentic audio — typically you don't need to disclose for small typo fixes, but you should for long passages.
Common mistakes
Recording training audio in a noisy environment
What goes wrong: Voice clone inherits the room noise + reverb. Generated speech sounds like it was recorded in a different room than your real audio. Blending fails.
How to avoid: Train in a quiet, acoustically-treated room with the SAME mic + setup you use for normal recordings. Match conditions exactly so generated voice blends seamlessly.
Using Overdub for long passages
What goes wrong: 30+ word passages develop a subtle robotic affect. Listeners may not consciously notice but trust erodes. 'Something felt off' reviews appear.
How to avoid: Overdub for fixes (3-15 words). Re-record for anything longer. Test: would you accept a typo fix? If yes, Overdub OK. If you'd want a full re-take, do that instead.
Cloning voices without consent
What goes wrong: Violates Descript ToS, potentially carries legal liability (right-of-publicity, defamation), and erodes trust if discovered. The damage outweighs the convenience.
How to avoid: Only train YOUR own voice (or someone who has given recorded consent per Descript's policy). For guest voices: re-invite them to re-record or skip the section.
Not testing the clone before relying on it
What goes wrong: First time you use Overdub in production is also the first time you hear how it sounds. Quality is poor. Episode ships with audible AI voice.
How to avoid: After training completes, generate 20-30 test sentences. Listen carefully. Decide whether quality is shippable BEFORE using it in real content.
No final QC pass on Overdub sections
What goes wrong: Overdub sections that sounded fine in isolation feel off in context. Listeners notice the transitions, and the episode's quality feels uneven.
How to avoid: Mandatory QC pass: listen to every Overdub section in context with surrounding real audio. Re-generate or re-record if anything sounds off.
Letting Overdub access spread across the team
What goes wrong: Multiple editors can generate fake host audio. Accountability blurs. Misuse risk grows. Compliance issues if Overdub is used without proper logging.
How to avoid: Limit Overdub access to 1-2 editors per voice. Workspace admins audit usage quarterly. Document the policy.
FAQ
Does Overdub really sound like me?
For 3-15 word fixes: yes, indistinguishable in most cases. For 20+ word passages: trained ears detect slight artifacts. For full paragraphs: most listeners notice something feels off even if they can't articulate why. Use accordingly.
Can I clone someone else's voice?
Only with their explicit recorded consent per Descript's ToS. They must record a consent statement giving their name, the date, and permission to train an AI model on their voice. Descript checks that this statement matches the training audio. Cloning without consent violates the ToS and may carry legal risk.
How much training audio does Overdub need?
Descript provides a training script of ~10-15 minutes. Read it cleanly, in your natural voice, in a treated room with your usual mic. More training audio (up to ~30 min) marginally improves quality. Audio quality matters more than quantity.
Do I need to disclose that I used Overdub?
For small typo fixes: typically no. Listeners don't expect 'fixing a misspoken name' to be disclosed. For long Overdub-generated passages: yes, ethical practice (and emerging FTC guidance) suggests disclosure. When in doubt, disclose. Trust is the foundation; once broken, it's hard to rebuild.
Can Overdub generate other languages or accents?
Overdub generates in the language of the training audio. If you trained in English, it generates English. It keeps your accent and won't switch to a different one. To generate in another language, you'd need to train a separate voice on audio in that language.