How to set up and use Descript Overdub voice cloning

Overdub is Descript's AI voice clone — train it on 10 minutes of your voice and type new sentences that sound like you. Used right, it saves hours of re-recording. Used wrong, it can sound robotic or raise consent issues. Here's the full setup + ethical use guide.

~2 hr · Intermediate · Updated May 26, 2026

Who this is for: Podcasters, course creators, and content producers who frequently need to fix mistakes, add corrections, or insert short voiceovers without re-recording the full segment.

What you'll need

Descript Pro plan (Overdub is Pro-only as of 2026)
A quiet recording environment + good mic for training data
About 10-30 min of clean recording of YOUR OWN voice for training
Clear understanding of Descript's voice consent policy
About 60-90 minutes for setup + first usage

Step 1

Verify your plan and understand the consent policy

Overdub is Pro-plan only. You can train ONLY your own voice; cloning anyone else requires their explicit recorded consent. Misuse violates Descript's ToS.

Confirm you're on Descript Pro: Settings → Plan & Billing. Overdub does not exist on Free or Creator plans.

Read Descript's Overdub consent policy: you can train Overdub ONLY on YOUR own voice OR on voices where the speaker has provided recorded consent.

Recorded consent format: speaker says "I, [name], on [date], give Descript permission to train an AI model on my voice." Descript validates this voice-statement matches the training audio.
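If you are collecting consent from a co-host, it can help to print the exact statement for them to read aloud. A minimal Python sketch, assuming the wording quoted above; `consent_statement` is a hypothetical helper, and whatever statement Descript shows on screen during training takes precedence:

```python
from datetime import date


def consent_statement(name: str, when: date) -> str:
    """Fill the [name] and [date] placeholders (hypothetical helper)."""
    date_text = f"{when:%B} {when.day}, {when.year}"  # e.g. "May 26, 2026"
    return (
        f"I, {name}, on {date_text}, give Descript permission "
        "to train an AI model on my voice."
    )


print(consent_statement("Jane Doe", date(2026, 5, 26)))
```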

Violations: cloning a public figure or someone without consent violates the ToS and may carry legal risk (right-of-publicity, defamation if misused).

Use cases that are clearly fine: fixing your own mistakes, generating corrections, adding short intros/outros in your own voice. Cases that are not fine: cloning a guest's voice to put words in their mouth.

Step 2

Record clean training audio

Settings → Overdub → Train New Voice. Record 10-30 minutes of clean, naturally varied (not monotone) speech in a quiet room with a good mic, following Descript's script.

Settings → Overdub → + Train New Voice.

Descript provides a training script (length varies, but typically 5-15 minutes of text covering different sentence types).

Recording environment: quiet room (no AC hum, no traffic), good mic (USB condenser at minimum, XLR ideally), 6-12 inches from mic, headphones to monitor.
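Before recording, you can sanity-check the room by capturing 10-20 seconds of silence (room tone) and measuring its level. The sketch below is a minimal Python example assuming a 16-bit mono WAV export; the -60 dBFS target is a rule of thumb roughly in line with audiobook noise-floor specs such as ACX's, not a number Descript publishes for Overdub:

```python
import array
import math
import sys
import wave

# Rule-of-thumb target borrowed from audiobook noise-floor specs; not a Descript number.
NOISE_FLOOR_TARGET_DBFS = -60.0


def rms_dbfs(samples):
    """RMS level of 16-bit PCM samples in dB relative to full scale (32768)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / 32768)


def read_pcm16_mono(path):
    """Read a 16-bit mono WAV file into a list of integer samples."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples.tolist()


def room_is_quiet(path, target=NOISE_FLOOR_TARGET_DBFS):
    """Return (passes, level_dbfs) for a room-tone recording."""
    level = rms_dbfs(read_pcm16_mono(path))
    return level <= target, level
```

Usage: `ok, level = room_is_quiet("room_tone.wav")`. If the check fails, hunt down the hum (AC, fridge, computer fans) before recording training audio, since the clone will inherit it.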

Vocal performance: read the script in YOUR normal speaking voice. Vary your tone naturally; don't drone in a monotone, and don't over-perform. Imagine talking to a colleague.

Read 2-3 takes if Descript suggests it, then pick the cleanest, most natural one.

After recording, Descript provides a consent verification step: read the consent statement out loud. Click Submit.

Training time: 12-48 hours. Descript emails you when the model is ready.

Step 3

Test your voice clone quality

After training, open any project, click where a word is missing, type it, then right-click → Overdub. Listen. Repeat for several sentences.

When Descript notifies you the model is ready, open any project.

In the script, click in a gap or wherever you want to insert new text, then type 3-5 words.

Right-click the typed text → Overdub → Generate.

Listen to the result with headphones. Compare to your real voice in the surrounding audio.

Good Overdub: indistinguishable from your real voice in short fragments (3-10 words). Slight robotic feel on longer passages (15+ words).

Common quality issues: (a) wrong intonation on questions (rising vs falling), (b) flat affect on emotional words, (c) mispronounced unfamiliar names.

If quality is poor: re-train with more/cleaner training audio. Settings → Overdub → Retrain Voice.
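To make testing repeatable, keep a fixed set of test sentences that hits each length band and each common issue, and tally your verdicts per category. A minimal Python sketch; the sentences, category names, and the "every short fragment must pass" rule are placeholders and assumptions, not anything Descript provides. Extend the list to the 20-30 sentences this guide recommends:

```python
from collections import defaultdict

# Placeholder test sentences, one or more per category. Replace them with
# phrasing and names from your own show.
TEST_PLAN = [
    ("short", "Thanks for listening to the show."),
    ("short", "We'll be back next week."),
    ("long", "In today's episode we walk through the whole editing workflow, "
             "from the first rough cut to the final export, and talk about "
             "what we would do differently."),
    ("question", "Have you ever tried editing audio by editing text?"),
    ("emotion", "Honestly, I was thrilled with how it turned out."),
    ("name", "Our guest today is Siobhan Nguyen-Okafor."),
]


def pass_rates(verdicts):
    """Fraction of passing sentences per category.

    verdicts: list of (category, passed) pairs recorded while listening.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, passed in verdicts:
        totals[category] += 1
        passes[category] += int(passed)
    return {category: passes[category] / totals[category] for category in totals}


def shippable(rates, required=("short",), threshold=1.0):
    """Assumed rule: every required category must meet the threshold."""
    return all(rates.get(category, 0.0) >= threshold for category in required)
```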

Step 4

Use Overdub for small fixes in existing recordings

This is the best use case: you misspoke a name, dropped a word, or made a small error. Type the fix in the transcript → right-click → Overdub. A 30-second fix.

Scenario: you recorded a podcast and said 'Sarah' when you meant 'Susan.' In a traditional editor like Premiere or Audacity, fixing this means re-recording the section.

In Descript with Overdub: delete the wrong word, type the correct word, right-click → Overdub. Listen. Done.

Other common small fixes: mispronounced names, missed or wrong words (you said 'the' when you meant 'three'), a wrong date, a small factual error.

Workflow during editing: as you read the transcript, mark fixes with a tag (#overdub). After the main edit, do a 5-min Overdub pass for all marked sections.
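If you export the transcript as plain text with the tags inline, a few lines of Python can turn the #overdub markers into a to-do list for the Overdub pass. This is a sketch that assumes the tag appears literally as `#overdub` in a plain-text export; adjust the pattern to however your export actually represents tags:

```python
import re

# Assumption: the tag appears literally as "#overdub" in a plain-text export.
TAG = re.compile(r"#overdub\b", re.IGNORECASE)


def overdub_todo(transcript: str):
    """Return (line_number, text_without_tag) for every tagged line."""
    todo = []
    for number, line in enumerate(transcript.splitlines(), start=1):
        if TAG.search(line):
            cleaned = " ".join(TAG.sub("", line).split())  # drop tag, tidy spaces
            todo.append((number, cleaned))
    return todo


transcript = """Welcome back to the show.
Our guest Susan #overdub joins us today.
Let's get started."""
for number, text in overdub_todo(transcript):
    print(f"line {number}: {text}")
```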

Time savings: 30-60 min per episode of re-recording or careful audio editing replaced with 5-10 min of Overdub generation.
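The savings estimate is simple arithmetic. Pairing the fastest manual fix with the slowest Overdub pass (and vice versa) gives the range; the sketch below also assumes a weekly show of 52 episodes per year:

```python
def minutes_saved(manual=(30, 60), overdub=(5, 10)):
    """(worst case, best case) minutes saved per episode."""
    worst = manual[0] - overdub[1]  # fastest manual fix vs slowest Overdub pass
    best = manual[1] - overdub[0]   # slowest manual fix vs fastest Overdub pass
    return worst, best


def hours_saved_per_year(episodes_per_year=52, manual=(30, 60), overdub=(5, 10)):
    """Scale the per-episode range to a year (52 = weekly show, an assumption)."""
    worst, best = minutes_saved(manual, overdub)
    return worst * episodes_per_year / 60, best * episodes_per_year / 60


low, high = hours_saved_per_year()
print(f"{low:.1f}-{high:.1f} hours saved per year")
```

With the guide's numbers this comes to 20-55 minutes per episode, or roughly 17-48 hours per year for a weekly show.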

Step 5

Use Overdub sparingly for new content generation

OK for short voiceover bridges (10-20 words). Risky for full paragraphs. Long passages start sounding robotic — listeners notice.

Beyond fixes, Overdub can generate entirely new content. Example: you forgot to record a brief intro line for an episode; type it, Overdub generates.

Sweet spot: 5-20 words of new content. Sounds natural, blends into surrounding real audio.

Risky: 30+ words of new content. The slight robotic tendency becomes audible. Listeners may notice without being able to articulate what's off.

Don't use for: emotional content, complex sentences with subtle inflection, anything where authenticity matters more than convenience.
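The length guidance above can be written down as a rough rule. A minimal sketch; the 21-29 word band falls between the "sweet spot" and "risky" ranges this guide gives, so treating it as "check in context" is an assumption:

```python
def overdub_verdict(text: str, emotional: bool = False) -> str:
    """Rough length rule for new Overdub content (thresholds from this guide)."""
    words = len(text.split())
    if emotional:
        return "re-record"  # authenticity matters more than convenience
    if words <= 20:
        return "overdub"  # sweet spot: blends into surrounding real audio
    if words < 30:
        return "overdub, then check in context"  # in-between band: an assumption
    return "re-record"  # 30+ words: robotic tendency becomes audible


print(overdub_verdict("Welcome back to episode forty-two."))
```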

Mental rule: if you'd accept the result as a small typo fix, it's fine. If you can't tell the Overdub apart from your real voice, it's fine. If you hear something slightly off, re-record.

Always sanity-check final episodes by listening to Overdub sections back-to-back with surrounding real audio.

Step 6

Set up team Overdub (multiple voices in one workspace)

Workspace owners can manage trained voices. Multiple team members each train their own. Set permissions: who can use whose voice clone.

Settings → Workspace → Overdub Voices. Lists every trained voice in the workspace.

Each team member trains their own voice individually (consent required per voice).

Permissions: by default, only the voice's owner can use their own clone. Workspace owners can grant access to other team members (e.g., a video editor can use the host's clone to fix mistakes during post).

Audit trail: workspace admins see who generated what Overdub content and when. Useful for accountability and compliance.

Best practice: limit Overdub access to 1-2 video editors per voice. Broader access invites misuse.
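The permission rules in this step can be sketched as a small data structure. This is a hypothetical illustration of the rules described above (owner-only by default, grants made by the workspace owner, at most two editors per voice), not Descript's actual API or data model:

```python
from dataclasses import dataclass, field

MAX_EDITORS_PER_VOICE = 2  # this guide's best practice, not a Descript limit


@dataclass
class VoiceAccess:
    owner: str
    editors: set = field(default_factory=set)

    def can_use(self, user: str) -> bool:
        # Default: only the voice's owner; editors need an explicit grant.
        return user == self.owner or user in self.editors


class Workspace:
    def __init__(self, workspace_owner: str):
        self.workspace_owner = workspace_owner
        self.voices = {}

    def add_voice(self, voice_owner: str) -> None:
        # Each member trains their own voice (consent handled at training time).
        self.voices[voice_owner] = VoiceAccess(owner=voice_owner)

    def grant(self, acting_user: str, voice_owner: str, editor: str) -> None:
        if acting_user != self.workspace_owner:
            raise PermissionError("only the workspace owner can grant access")
        access = self.voices[voice_owner]
        if editor not in access.editors and len(access.editors) >= MAX_EDITORS_PER_VOICE:
            raise ValueError(f"{voice_owner} already has {MAX_EDITORS_PER_VOICE} editors")
        access.editors.add(editor)


ws = Workspace("studio-admin")
ws.add_voice("host")
ws.grant("studio-admin", "host", "editor-1")
print(ws.voices["host"].can_use("editor-1"), ws.voices["host"].can_use("editor-2"))
```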

Step 7

Quality control + final review

Before exporting, listen back to every Overdub-generated section. If any sound robotic or off, re-generate or re-record the real audio.

Before final export, run a QC pass specifically on Overdub sections.

View → Highlight Overdubbed sections (or use the tag if you used #overdub).

Listen to each section in context with surrounding real audio. Slight tone shifts, breath patterns, or pace changes mean Overdub didn't blend cleanly.

Re-generate options: right-click → Overdub → Regenerate (sometimes produces a better take), or shorten the Overdub (smaller chunks blend better), or re-record the real audio (sometimes the right call).
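One way to keep the regenerate/shorten/re-record decision consistent across editors is to agree on a fixed order. A minimal sketch; the order and both thresholds (two regeneration attempts, 15 words) are assumptions, not Descript rules:

```python
def next_qc_action(blends: bool, word_count: int, regenerations: int,
                   max_regenerations: int = 2) -> str:
    """Next step for one Overdub section after listening to it in context.

    The order (regenerate, then shorten, then re-record) and both thresholds
    are assumptions, not Descript rules.
    """
    if blends:
        return "keep"
    if regenerations < max_regenerations:
        return "regenerate"  # sometimes produces a better take
    if word_count > 15:
        return "shorten"  # smaller chunks blend better
    return "re-record"  # sometimes the right call
```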

Document Overdub policy for your team: when it's acceptable, who can use it, what gets flagged for re-record. A 1-pager prevents misuse.

Per Descript's terms, you should disclose AI voice generation in content where listeners might reasonably expect 100% authentic audio — typically you don't need to disclose for small typo fixes, but you should for long passages.
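At the episode level, this guidance reduces to one question: is any Overdub section longer than a small fix? A minimal sketch, assuming the 15-word upper bound for fixes used elsewhere in this guide and the "when in doubt, disclose" rule from the FAQ:

```python
def needs_disclosure(section_word_counts, fix_limit=15, unsure=False):
    """Episode-level check: disclose if any Overdub section exceeds a small fix.

    fix_limit=15 matches the guide's 3-15 word range for fixes (an assumption
    for where "small" ends); unsure=True applies "when in doubt, disclose".
    """
    return unsure or any(n > fix_limit for n in section_word_counts)


print(needs_disclosure([2, 4, 9]))  # only short fixes
print(needs_disclosure([3, 28]))    # contains a longer generated passage
```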

Common mistakes

What goes wrong (and how to avoid it)

Recording training audio in a noisy environment
What goes wrong: Voice clone inherits the room noise + reverb. Generated speech sounds like it was recorded in a different room than your real audio. Blending fails.
How to avoid: Train in a quiet, acoustically treated room with the SAME mic and setup you use for normal recordings. Matching conditions exactly lets the generated voice blend seamlessly.
Using Overdub for long passages
What goes wrong: 30+ word passages develop a subtle robotic affect. Listeners may not consciously notice, but trust erodes and 'something felt off' reviews appear.
How to avoid: Overdub for fixes (3-15 words). Re-record for anything longer. Test: would you accept a typo fix? If yes, Overdub OK. If you'd want a full re-take, do that instead.
Cloning voices without consent
What goes wrong: Violates Descript ToS, potentially carries legal liability (right-of-publicity, defamation), and erodes trust if discovered. The damage outweighs the convenience.
How to avoid: Only train YOUR own voice (or someone who has given recorded consent per Descript's policy). For guest voices: re-invite them to re-record or skip the section.
Not testing the clone before relying on it
What goes wrong: First time you use Overdub in production is also the first time you hear how it sounds. Quality is poor. Episode ships with audible AI voice.
How to avoid: After training completes, generate 20-30 test sentences. Listen carefully. Decide whether quality is shippable BEFORE using it in real content.
No final QC pass on Overdub sections
What goes wrong: Overdub sections that sounded fine in isolation feel off in context. Listeners notice the transitions, and the episode sounds uneven.
How to avoid: Mandatory QC pass: listen to every Overdub section in context with surrounding real audio. Re-generate or re-record if anything sounds off.
Letting Overdub access spread across the team
What goes wrong: Multiple editors can generate fake host audio. Accountability blurs. Misuse risk grows. Compliance issues if Overdub is used without proper logging.
How to avoid: Limit Overdub access to 1-2 editors per voice. Workspace admins audit usage quarterly. Document the policy.
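A quarterly audit can be partly automated if you can export the audit trail. The sketch below assumes a hypothetical export of user/voice records in which each voice is keyed by its owner's name, and flags both ungranted use and voices used by more than two editors:

```python
from collections import defaultdict


def audit_overdub_log(entries, grants, max_editors=2):
    """Flag ungranted use and over-shared voices in an exported audit trail.

    entries: iterable of {"user": ..., "voice": ...} records (hypothetical
             export format); each voice is keyed by its owner's name.
    grants:  dict mapping voice -> set of editors granted access.
    """
    users_by_voice = defaultdict(set)
    for entry in entries:
        users_by_voice[entry["voice"]].add(entry["user"])

    findings = []
    for voice in sorted(users_by_voice):
        users = users_by_voice[voice]
        allowed = grants.get(voice, set()) | {voice}  # the owner is always allowed
        for user in sorted(users - allowed):
            findings.append(f"{user} used the '{voice}' voice without a grant")
        editors = users - {voice}
        if len(editors) > max_editors:
            findings.append(f"'{voice}' was used by {len(editors)} editors (limit {max_editors})")
    return findings


log = [{"user": u, "voice": "host"} for u in ("host", "ed1", "ed2", "ed3")]
for finding in audit_overdub_log(log, {"host": {"ed1", "ed2"}}):
    print(finding)
```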

Recap

What to take away

Pro plan only. Train ONLY your own voice or with explicit recorded consent.
Train in a quiet, treated room with the SAME mic you use for normal recordings.
Best for fixes (3-15 words). Risky for long passages (30+ words).
Test quality before relying on it. Generate 20-30 test sentences.
Final QC pass: listen to every Overdub section in context.
Limit access to 1-2 editors per voice. Disclose AI voice generation where appropriate.

Done — what's next

How to set up a Descript account for podcast + video editing


Hand it off

Setting up Overdub takes 90 minutes. Using it judiciously across weekly episode production — knowing when to fix, when to regenerate, when to re-record — is craft. A vetted video editor on EverestX handles Overdub responsibly as part of full post-production from $14-16/hr.


Frequently Asked Questions

Is Descript Overdub really good enough to fool listeners?

For 3-15 word fixes: yes, indistinguishable in most cases. For 20+ word passages: trained ears detect slight artifacts. For full paragraphs: most listeners notice something feels off even if they can't articulate why. Use accordingly.

Can I clone someone else's voice in Descript?

Only with their explicit recorded consent per Descript's ToS. They must record a consent statement saying their name, the date, and permission to train an AI model on their voice. Descript validates this matches the training audio. Cloning without consent violates ToS and may carry legal risk.

How much training audio does Overdub need?

Descript provides a training script of ~10-15 minutes. Read it cleanly, in your natural voice, in a treated room with your usual mic. More training audio (up to ~30 min) marginally improves quality. Quality of audio matters more than quantity.

Do I need to disclose Overdub use in my podcast?

For small typo fixes: typically no — listeners don't expect 'fixing a misspoken name' to be disclosed. For long Overdub-generated passages: yes, ethical practice (and emerging FTC guidance) suggests disclosure. When in doubt, disclose. Trust is the foundation; once broken it's hard to rebuild.

Can Overdub speak in different languages or accents?

Overdub generates in the language of training audio. If you trained in English, it generates English. It maintains your accent — won't switch to a different accent. To generate in another language, you'd need to train a separate voice on audio in that language.

How to set up and use Descript Overdub voice cloning

Verify your plan and understand the consent policy

Record clean training audio

Test your voice clone quality

Use Overdub for small fixes in existing recordings

Use Overdub sparingly for new content generation

Set up team Overdub (multiple voices in one workspace)

Quality control + final review

What goes wrong (and how to avoid it)

What to take away

Frequently Asked Questions

Related tutorials

How to set up a Descript account for podcast + video editing

How to produce a podcast end-to-end in Descript

How to use Descript's transcript-based editing workflow

When to hire a podcast / video editor for Descript workflows

How to set up Loom AI transcripts, summaries, and chapters
