Descript's killer feature: edit your video by editing the transcript. Delete a word in the doc and the matching audio/video is deleted too. This tutorial walks through the full workflow (Filler Word Removal, Word Find & Replace, Strikethrough, and Studio Sound) that makes Descript 5-10x faster than traditional editors for talking-head content.
Who this is for
Anyone editing talking-head content (podcasts, interviews, webinars, course modules, sales-team training). If you spend most of your editing time on Premiere/DaVinci timeline trimming, this workflow saves 60-80% of that time.
Step 1
New Project → Import existing audio/video, OR record directly inside Descript. Transcript generates automatically in 1-3 min for most clips.
Open Descript → New Project. Name it after your episode/video.
Import: drag-and-drop your source MP4, MOV, MP3, WAV, or M4A into the project. Descript supports most common formats.
OR record directly: click Record → choose Audio Only, Audio + Video, or Screen Recording. Set tracks (one per speaker for interviews — Descript can auto-detect speakers but not perfectly).
After import or recording, Descript automatically transcribes. A 60-min file takes ~3-5 minutes to transcribe (uses Descript's servers).
Verify that the transcript appears in the script panel (left side of the editor). During playback, each word is underlined as it is spoken.
If transcription quality is poor: open Project Settings → Transcription Language. Confirm correct language. Re-transcribe if needed.
Step 2
Tools → Filler Word Removal → preview detected 'um/uh/like/you-know.' Choose to remove all or selectively. Saves 30-60 min per episode.
Tools menu → Filler Word Removal (or use the shortcut depending on UI version).
Descript scans the transcript for: 'um,' 'uh,' 'er,' 'ah,' 'like,' 'you know,' 'I mean,' 'so' (when used as filler). Configurable per project.
Preview the detections. Descript shows each instance with surrounding context — you decide what to keep or remove.
For most podcasts: remove 'um,' 'uh,' 'er.' Keep 'like' and 'you know' (they're natural speech; removing all makes the speaker sound robotic).
For sales/marketing video: remove more aggressively. The polish matters more than natural cadence.
Click 'Remove All' or selectively check what to remove. Descript deletes both the transcript words and the corresponding audio/video.
Time savings: a 60-min raw interview typically has 200-400 filler words. Manually removing each in Premiere = 60-90 min. Descript auto-detection + selective approve = 5-10 min.
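To picture the detect-then-approve pass, here is a minimal Python sketch. It is an illustration only, not Descript's actual detection: the `Word` record, the `FILLERS` set, and the `find_fillers` and `seconds_saved` helpers are all hypothetical names, and the sketch matches single words only (multi-word fillers like 'you know' and context-dependent ones like 'so' would need more logic).

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds into the recording
    end: float    # seconds into the recording


# Hypothetical default: the fillers this tutorial suggests removing for most podcasts.
FILLERS = {"um", "uh", "er"}


def find_fillers(words, fillers=FILLERS):
    """Return (index, word) pairs whose text, minus case and punctuation, is a filler."""
    hits = []
    for i, w in enumerate(words):
        if w.text.lower().strip(".,!?") in fillers:
            hits.append((i, w))
    return hits


def seconds_saved(hits):
    """Total audio duration covered by the detected fillers."""
    return sum(w.end - w.start for _, w in hits)


transcript = [
    Word("So", 0.0, 0.3), Word("um,", 0.3, 0.8), Word("we", 0.8, 1.0),
    Word("like", 1.0, 1.3), Word("shipped", 1.3, 1.8), Word("uh", 1.8, 2.2),
    Word("yesterday.", 2.2, 2.9),
]
hits = find_fillers(transcript)
print([w.text for _, w in hits])      # the words flagged for review
print(round(seconds_saved(hits), 2))  # seconds removed if all are approved
```

Passing a larger `fillers` set (for example adding 'like') models the more aggressive pass suggested for sales/marketing video.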
Step 3
Select any word/sentence in the transcript → Delete. The corresponding audio/video deletes. Drag-select for multi-sentence cuts. Cmd/Ctrl+Z reverts.
The core workflow: instead of scrubbing a timeline, you edit the transcript like a Google Doc.
Click the start of a sentence you want to cut. Shift-click the end. Delete key. The transcript text vanishes AND the corresponding audio/video disappears.
Want to keep something but mute the audio? Right-click the selection → Mute. Useful for crosstalk where you want to keep the video but remove the sound.
Restore a deleted section: Cmd+Z (Mac) or Ctrl+Z (Windows). Or use the History panel.
For 'almost' cuts: Strikethrough (Cmd+Shift+S on Mac). The text crosses out but doesn't delete. Lets you preview the cut before committing.
Speed: a 60-min interview typically edits down to 35-45 min in 30-45 min of Descript work. Same edit in Premiere = 3-4 hours.
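The reason deleting text can delete media is that every transcript word carries a start and end time. The sketch below shows that idea under simple assumptions; it is not how Descript is implemented. The `keep_segments` function and the `(text, start, end)` tuple layout are hypothetical.

```python
def keep_segments(words, deleted):
    """Map a transcript edit to media ranges.

    words:   list of (text, start_seconds, end_seconds) tuples, in order
    deleted: set of word indexes removed from the transcript
    Returns the merged (start, end) ranges of media that survive the edit.
    """
    segments = []
    prev_kept = None  # index of the last word we kept
    for i, (_text, start, end) in enumerate(words):
        if i in deleted:
            continue
        if prev_kept == i - 1 and segments:
            # Adjacent to the previous kept word: extend the current range.
            segments[-1] = (segments[-1][0], end)
        else:
            # First kept word, or a cut sits in between: start a new range.
            segments.append((start, end))
        prev_kept = i
    return segments


words = [("Welcome", 0.0, 0.5), ("back", 0.5, 0.9), ("um", 0.9, 1.4),
         ("to", 1.4, 1.6), ("the", 1.6, 1.8), ("show", 1.8, 2.4)]

print(keep_segments(words, {2}))  # [(0.0, 0.9), (1.4, 2.4)]
```

Undo fits the same model: removing index 2 from the `deleted` set gives back the single uncut range.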
Step 4
Cmd+F to find any word in the transcript. Replace a filler such as 'so' with nothing to batch-remove it, after reviewing the hits. Tag sections with hashtags so you can find them later.
Cmd+F (or Ctrl+F) opens Find. Type any word; Descript shows every occurrence in the transcript with surrounding context.
Find + Replace: replace one word with another, or batch-delete by replacing with blank. Example: replace 'so basically' with '' to remove every instance of that filler phrase.
Caveat: word-replace deletes the audio/video for that word too. Listen before replacing widely — a literal 'so' as filler is fine to remove, but a 'so' that begins an actual conclusion isn't.
Tagging: select any section → right-click → Add Tag (or #shortcut). Tags help mark 'great quote,' 'b-roll moment,' 'cut potential' for later passes.
View tagged sections: View → Tags Panel. Jump to any tagged section instantly. Great for podcasts where you flag highlights for clip-extraction.
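The 'listen before replacing widely' caveat is easier to see with a small example. This Python sketch lists every occurrence of a phrase with a few words of context, which is roughly what you want to review before a batch replace. It is a hypothetical helper (`find_phrase`), not Descript's Find feature.

```python
def find_phrase(words, phrase, context=2):
    """Return (start_index, context_string) for each occurrence of phrase.

    Matching ignores case and trailing punctuation; context is the number
    of neighboring words shown on each side of the match.
    """
    target = phrase.lower().split()
    norm = [w.lower().strip(".,!?") for w in words]
    hits = []
    for i in range(len(norm) - len(target) + 1):
        if norm[i:i + len(target)] == target:
            lo = max(0, i - context)
            hi = min(len(words), i + len(target) + context)
            hits.append((i, " ".join(words[lo:hi])))
    return hits


words = "So basically we tested it and so basically it worked. So we shipped.".split()

for idx, ctx in find_phrase(words, "so basically"):
    print(idx, "|", ctx)

# A bare 'so' also matches the final sentence, where it starts a conclusion.
print(len(find_phrase(words, "so")))
```

Searching for the two-word phrase returns only the filler uses, while searching for 'so' alone also catches the 'So we shipped.' that should stay.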
Step 5
Tools → Studio Sound → applies AI-driven noise reduction + EQ to make any recording sound podcast-clean. One-click per track.
Tools → Studio Sound (or the toolbar Studio Sound button).
Descript applies AI-driven noise reduction, room-reverb removal, EQ smoothing, and level normalization to the selected audio track.
Before/After: always compare. Studio Sound is aggressive, sometimes too aggressive on already-clean studio audio, where it introduces a slightly 'compressed' sound. The worse the source audio, the bigger the lift.
Per-track application: apply Studio Sound only to tracks that need it (e.g., the noisy guest mic). Leave the clean host mic untouched.
Toggle on/off per clip if you have inconsistent source quality across sections.
When to use: phone interviews, Zoom recordings, podcasts recorded in noisy cafes. When to skip: high-quality studio recordings where Studio Sound makes them sound LESS natural.
Caveat: Studio Sound is not a substitute for good recording. It's a rescue tool. Garbage in = better-but-still-not-great out.
Step 6
Drag media into the script position where you want it. Descript timelines support multi-track video, audio, graphics. Stays in sync as you edit.
Beyond the transcript edit, Descript supports timeline-style additions: b-roll clips, graphics, music beds, lower-third titles.
Drag media (image, video, audio file) from the Library into the script at the position you want it. It anchors to that transcript word and stays anchored as you continue editing.
Add lower-thirds: Insert → Lower Third → choose template → customize text + brand. Renders on the video output during the specified transcript range.
Music beds: drag an audio file into the timeline; Descript auto-ducks (lowers volume) under voice. Configure duck amount: Audio menu → Ducking settings.
Transitions: between video clips, drag a transition from the Effects panel. Cut, dissolve, push, slide. Don't overuse — content is the story, transitions are seasoning.
Brand templates: workspace can save lower-third templates, intro/outro sequences, and color palettes. Editors clone instead of rebuilding per project.
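For the music-bed step, ducking just means the music gain drops while someone is speaking. Here is a minimal sketch of that idea, assuming a fixed duck amount of -12 dB and hard on/off switching; both are assumptions for illustration (real ducking typically ramps in and out), and `db_to_gain` and `music_gain` are hypothetical names, not Descript settings.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def music_gain(t, voice_ranges, duck_db=-12.0):
    """Gain applied to the music bed at time t (seconds).

    voice_ranges: list of (start, end) times where someone is speaking.
    The music is ducked by duck_db while any range is active, else left at 1.0.
    """
    speaking = any(start <= t < end for start, end in voice_ranges)
    return db_to_gain(duck_db) if speaking else 1.0


voice = [(2.0, 10.0), (12.5, 30.0)]
print(music_gain(1.0, voice))            # intro, nobody speaking yet
print(round(music_gain(5.0, voice), 3))  # under voice, ducked
```

A -12 dB duck leaves the music at roughly a quarter of its original amplitude, which is why a modest-looking number makes a clearly audible difference.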
Step 7
Watch the full edit at 1.25× speed for pace check. Final read-through of transcript for typos. Publish → Export → MP4/MP3/Audio podcast feed.
Before exporting, do TWO reviews:
Review 1 (pace): play the full edit at 1.25× or 1.5× speed. Listen for: sentences that feel rushed (need a breath added), sections that drag (need more cuts), abrupt transitions (need a beat).
Review 2 (transcript): read the transcript carefully. Fix typos. Most teams ship the transcript alongside the audio/video as captions or show notes.
Export: Publish → Export.
Export options: MP4 video (1080p or 4K), MP3 audio, WAV audio for podcast hosts (Buzzsprout, Anchor, Libsyn), separate audio tracks for multi-track post.
For podcasts: export to MP3 at 128 kbps stereo (audiophile podcasts: 192 kbps). For video: 1080p H.264 at the highest quality your platform accepts.
Publish directly: Descript can publish to YouTube, Buzzsprout, Anchor, Spotify, Apple Podcasts via integrations. Saves the export-then-upload step.
Save the final project. Move to the right Library folder (don't leave in Recent).
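The bitrate choice in the export step mostly comes down to file size. A quick back-of-the-envelope check in Python, assuming constant-bitrate encoding and ignoring metadata and cover art (`mp3_size_mb` is a hypothetical helper):

```python
def mp3_size_mb(bitrate_kbps, minutes):
    """Approximate size of a constant-bitrate file in megabytes (1 MB = 1,000,000 bytes)."""
    bytes_per_second = bitrate_kbps * 1000 / 8  # kilobits/s -> bytes/s
    return bytes_per_second * minutes * 60 / 1_000_000


for kbps in (128, 192):
    print(f"{kbps} kbps, 60 min: {mp3_size_mb(kbps, 60):.1f} MB")
```

So a 60-minute episode is about 57.6 MB at 128 kbps and about 86.4 MB at 192 kbps, a 50% larger download for the higher setting.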
Common mistakes
Trying to use Descript like Premiere
What goes wrong: You drag clips on timelines, ignore the transcript, and miss the 5-10x speed advantage. Descript becomes a worse version of Premiere instead of a better one.
How to avoid: Edit BY transcript. Delete words in the doc. Use the timeline only for b-roll, music, and transitions. The mental shift is the unlock.
Over-removing filler words
What goes wrong: Removing every 'um, uh, like, you know' makes the speaker sound robotic and unnatural. Listeners trust the speaker less. Engagement drops.
How to avoid: Remove 'um,' 'uh,' 'er.' Keep 'like' and 'you know' by default (natural cadence), and remove them only where they are clearly filler rather than a natural pause.
Applying Studio Sound to everything
What goes wrong: Already-clean studio audio gets the 'over-compressed' sound. Music tracks sound bad. Listeners notice and the polish feels artificial.
How to avoid: Studio Sound only on noisy tracks. Compare before/after every time. Don't apply to music. Less is more.
No b-roll or visual variety on video projects
What goes wrong: Video version is 30 minutes of one talking head. Watch time drops to <40%. YouTube algorithm doesn't push the video.
How to avoid: Add b-roll every 30-60 seconds. Stock footage, screen recordings, image overlays. Even a 2-3 second visual break lifts watch time.
Skipping the pace review at 1.25×
What goes wrong: Edit feels right at normal speed but drags at listener pace. Real audience drops off at the 4-min mark. You don't notice until the analytics roll in a week later.
How to avoid: Always play the final edit at 1.25× speed before export. If it feels boring at 1.25×, it's boring at 1×. Cut more.
Not using tags for highlight extraction
What goes wrong: After editing a 60-min podcast, you don't know which 90-sec section to clip for social. You rewatch the whole thing trying to find the highlight.
How to avoid: Tag highlight-worthy quotes #clip during the edit pass. After publish, jump to tagged sections and extract clips in 5 minutes.
Hand it off
Mastering the workflow takes 2 hours. Producing 4-8 episodes/month with the editorial judgment to know what to cut, what to elevate, and how to pace listeners is craft work. A vetted video editor on EverestX runs this end-to-end starting at $14-16/hr — typically $800-1,800/mo for a weekly podcast with video + clips.
For a 60-min talking-head interview: Descript takes 45-75 minutes for a clean edit (filler removal + cuts + clean export). Premiere/Final Cut takes 3-5 hours for the same edit. The 5-10x speedup comes from the transcript-based workflow on the mechanical edits. Color grading + advanced motion graphics: Premiere still wins.
For talking-head content (podcasts, interviews, webinars, course modules, screen-share tutorials), Descript outputs the same quality as Premiere, 5-10x faster. For cinematic/film work, multi-cam concerts, or heavy VFX: Premiere/DaVinci is still better. Match the tool to the content.
To restore something you deleted by mistake, press Cmd+Z (or Ctrl+Z). Descript also keeps a History panel showing every action, so you can revert to any prior state. For the cautious: use Strikethrough (Cmd+Shift+S) instead of Delete during the first pass, and commit the cut only when you're confident.
Auto removal detects 'um/uh/er/like' patterns globally and lets you bulk-approve. Manual deletion is per-word in the transcript. Use Auto first to catch the obvious filler at scale, then manual for editorial judgment cuts. Auto saves 30-60 min on a 60-min episode.
Descript transcribes 30+ languages with varying accuracy. Best: English, Spanish, French, German, Portuguese, Italian, Japanese. The transcript-based EDITING still works (delete word → delete audio), but transcript accuracy matters more in non-English where dictionary correction options are limited.