Most podcasters bounce between 4 tools: Riverside for recording, Descript for editing, Audacity for cleanup, Canva for show art. Descript can do 80% of it in one app. Here's the full episode workflow.
Who this is for
Podcasters publishing weekly to monthly. Solo show hosts, interview shows, or co-hosted formats. If you're spending 8-12 hours per episode in post, this workflow cuts it to 3-5.
What you'll need
Step 1
Build episode template in Descript Library → Templates. Includes intro/outro placeholders, music beds, lower-thirds. Clone per episode.
Library → 05 — Templates folder → + New Project → "Podcast Episode Template."
Add at the start: intro music (15-20 sec), a placeholder text "Welcome to [show name] — today we're talking about [topic]" for the host's intro voiceover, then a placeholder for guest intro.
Add at the end: 'Thanks for listening — links in show notes' outro voiceover placeholder, outro music (15-20 sec).
Add brand lower-third for video version: name + episode number, host name, guest name. Position bottom-left, appears for first 5 sec of each speaker change.
Set color palette: workspace-defined brand colors. Lower-thirds, title cards, transitions all pull from this.
Save. For every new episode, right-click the template → Duplicate, rename the copy, and it's ready for audio import.
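The lower-third rule in this step (show for the first 5 seconds of each speaker change) can be sanity-checked outside Descript. What follows is a minimal sketch, assuming you have a list of speaker segments as (start, end, speaker) in seconds. The function name and the sample segments are hypothetical, and nothing here calls a Descript API.

```python
def lower_third_windows(segments, show_seconds=5.0):
    """Return (speaker, show_from, show_until) for each speaker change.

    segments: list of (start, end, speaker) tuples in timeline order.
    A window is clipped to the segment end if the speaker talks for
    less than show_seconds.
    """
    windows = []
    previous_speaker = None
    for start, end, speaker in segments:
        if speaker != previous_speaker:
            windows.append((speaker, start, min(start + show_seconds, end)))
        previous_speaker = speaker
    return windows


# Hypothetical timeline: the last two segments are the same speaker,
# so only the first of them gets a lower-third.
segments = [
    (0.0, 42.0, "Host"),
    (42.0, 45.0, "Guest"),
    (45.0, 120.0, "Host"),
    (120.0, 130.0, "Host"),
]
for speaker, show_from, show_until in lower_third_windows(segments):
    print(f"{speaker}: {show_from:.0f}s to {show_until:.0f}s")
```

A list like this is handy as a checklist when you scrub through the video version and confirm each lower-third appears where expected.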
Step 2
Solo: record directly in Descript. Interviews: use Riverside/SquadCast/Zoom local-only recording. Always get each speaker on their own track.
Solo episodes: Descript → New Project → Record → Audio Only. Mic check (peak levels around -6 to -3 dB). Record.
Remote interviews: use a tool with separate local tracks (Riverside, SquadCast, Zencastr). Avoid relying on Zoom's default recording: its single combined track is unfixable in post when one speaker has bad audio.
After remote recording, download each speaker's separate track. Import all tracks into Descript as a multi-track project.
In Descript: View → Tracks → assign each speaker. Descript can also auto-detect speakers if recorded as one file, but accuracy is 60-75% — manual track-per-speaker is reliable.
In-person multi-mic recording: each mic into its own channel. Same multi-track import workflow.
Always record + save backup audio locally on each speaker's machine. Network drops happen. Re-recording due to lost audio is the worst.
Step 3
Let Descript auto-transcribe (3-5 min for 60-min episode). Run Filler Word Removal first. Quick read-through to catch obvious issues.
After import, transcript generates automatically. Each speaker shows in a different color in the script panel.
Tools → Filler Word Removal → preview detections. Selectively approve (keep some 'um' for natural cadence, remove the egregious ones).
Quick read-through: skim the transcript while playing back at 1.5× speed. Flag major issues: long tangents, dead air, technical glitches, anything that needs an editorial cut.
Don't try to edit on this pass — just mark with tags or strikethrough. Editorial decisions happen in pass 2.
Time check: cleanup pass on a 60-min raw interview should take 15-25 minutes. If it's taking longer, the source recording has bigger issues that need addressing before continuing.
Step 4
Read transcript end-to-end. Delete tangents, off-topic sections, and weak quotes. Rearrange order for narrative flow. Target 20-30% time reduction.
Read the transcript completely. Mark editorial decisions:
Cut: long tangents that don't serve the episode's promise. 4-minute side-stories about something irrelevant get deleted.
Tighten: when a guest takes 2 minutes to make a point that could land in 30 seconds, cut the buildup.
Rearrange: drag transcript sections to reorder for narrative flow. Sometimes the best quote in minute 47 should open the episode.
Strengthen: if a key topic was discussed briefly, leave space for the host to add a short voiceover bridge during post-production (record separately).
Cold open: pick the most compelling 30-90 seconds and move it to the start, before the intro music. Hooks listeners in the first 10 seconds.
Target reduction: 60-min raw interview → 40-50 min final episode. 90-min raw → 55-70 min final. Final cuts longer than 60 min tend to hold niche audiences only; mass audiences drop off.
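To turn the 20-30% reduction target into a concrete runtime for any raw recording, the arithmetic can be sketched as below. The function name is illustrative. A strict 20-30% cut gives slightly narrower ranges than the rounded figures quoted in this step.

```python
def trim_targets(raw_minutes, low=0.20, high=0.30):
    """Return (shortest, longest) final runtime in minutes.

    low and high are the fractions of the raw recording to cut,
    defaulting to the 20-30% target from this step.
    """
    shortest = raw_minutes * (1 - high)
    longest = raw_minutes * (1 - low)
    return shortest, longest


for raw in (60, 90):
    shortest, longest = trim_targets(raw)
    print(f"{raw}-min raw -> {shortest:.0f}-{longest:.0f} min final")
# 60-min raw -> 42-48 min final
# 90-min raw -> 63-72 min final
```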
Step 5
Apply Studio Sound only to noisy voice tracks. Set music ducking (-12 to -18 dB under voice). Normalize peaks to -3 dB. Master loudness to -16 LUFS.
Audio mixing is where amateur and professional podcasts diverge.
Studio Sound: apply to any voice track that sounds noisy/echoey. Compare before/after; don't apply if the source is already clean.
EQ: most voice tracks benefit from a high-pass filter at 80 Hz (removes rumble) and a slight cut at 200-300 Hz (removes 'boxy' resonance). Descript's effects panel handles both.
Compression: light compression evens out volume between quiet and loud moments. Descript applies this automatically in Studio Sound; manual compression on top is usually unnecessary.
Music ducking: when intro/outro music plays under voice, music should sit -12 to -18 dB below voice. Descript auto-ducks; verify per track.
Loudness normalization: target -16 LUFS, the level Apple Podcasts recommends for podcast distribution. Descript shows LUFS in the export panel; adjust master gain to hit -16.
Peak normalization: peaks should not exceed -3 dB. Descript flags clipping; fix with gain reduction on the offending track.
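The level targets in this step are all simple decibel arithmetic, sketched below as a rough planning aid. The meter readings are hypothetical values, the function names are illustrative, and the sketch assumes that a master gain change shifts loudness and peaks by about the same number of dB. It does not read anything from Descript.

```python
def db_to_linear(db):
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def gain_to_target(measured_lufs, target_lufs=-16.0):
    """Master gain in dB needed to move measured loudness to the target."""
    return target_lufs - measured_lufs


# Ducking range from the text: music 12 to 18 dB below the voice.
for duck_db in (-12, -18):
    print(f"{duck_db} dB -> music at {db_to_linear(duck_db):.2f}x amplitude")

# Hypothetical meter readings for one episode mix.
measured_lufs = -20.5
loudest_peak_db = -5.0

gain_db = gain_to_target(measured_lufs)
peak_after_gain = loudest_peak_db + gain_db
print(f"Master gain: {gain_db:+.1f} dB")
print(f"Peak after gain: {peak_after_gain:+.1f} dB")
if peak_after_gain > -3.0:
    print("Peak would exceed -3 dB: tame the loudest moments before adding gain.")
```

The example is chosen to show that the two targets can pull against each other: raising a quiet mix to -16 LUFS can push its loudest peak past the -3 dB ceiling, which is when gain reduction on the offending track comes first.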
Step 6
For YouTube/video versions: lower-thirds per speaker, brand color overlays, chapter markers, b-roll on key moments, ad reads if applicable.
Most podcasts now also publish a video version (talking heads + lower-thirds) to YouTube. Descript handles this in the same project.
Lower-thirds: pre-built in your template. Verify each speaker change triggers a fresh lower-third with name + role.
Chapters: Tools → Chapter Markers. Add at major topic shifts (typically 5-8 per episode). YouTube uses these for chapters in the player.
B-roll: add image/screen-recording overlays at the moments where a speaker references something visual. Reduces fatigue of pure talking-head viewing.
Brand stinger/intro card: 3-5 sec animated brand intro at the very start. Built once in the template, reused per episode.
Ad reads: if you do mid-roll ads, insert at natural break points. Use a music transition (1-2 sec stinger) into and out of ad reads.
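If you paste chapters into the YouTube description by hand, a small formatter keeps the timestamps consistent. This is a sketch under stated assumptions: the chapter list is made-up sample data, the function names are hypothetical, and the checks reflect YouTube's published chapter rules as I understand them (first chapter at 0:00, at least three chapters, each at least 10 seconds long).

```python
def format_timestamp(seconds):
    """Format seconds as M:SS, or H:MM:SS once past an hour."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def youtube_chapter_lines(chapters, min_length=10):
    """Turn (start_seconds, title) pairs into description lines."""
    ordered = sorted(chapters)
    if len(ordered) < 3:
        raise ValueError("need at least three chapters")
    if ordered[0][0] != 0:
        raise ValueError("first chapter must start at 0:00")
    for (start, _), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - start < min_length:
            raise ValueError("each chapter must run at least 10 seconds")
    return [f"{format_timestamp(start)} {title}" for start, title in ordered]


# Hypothetical chapter markers for one episode.
chapters = [
    (0, "Cold open"),
    (75, "Intro"),
    (310, "Guest background"),
    (1490, "Main topic"),
    (3725, "Wrap-up"),
]
print("\n".join(youtube_chapter_lines(chapters)))
```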
Step 7
Export MP3 for podcast feed, MP4 for YouTube, separate clip files for socials. Publish via Descript or upload manually. Schedule socials.
Export → Audio (MP3, 128 kbps stereo) for podcast hosts (Buzzsprout, Anchor, Libsyn, Transistor).
Export → Video (MP4, 1080p H.264) for YouTube.
Extract socials: jump to your tagged #clip sections from the edit pass. For each, select the 30-90 sec section → File → Export Selection → MP4 with auto-subtitles (Descript can generate subtitles in the export).
Clips for vertical formats (Reels, TikTok, Shorts): Descript can crop 16:9 to 9:16 with auto speaker-tracking. Tools → Aspect Ratio → 9:16. Each clip becomes a vertical short.
Show notes: open the transcript → File → Export Transcript → Markdown. Edit for show notes (links, key timestamps, guest bio).
Publish: Descript can publish directly to Buzzsprout, YouTube, Spotify via integrations. Saves the export-and-upload step.
Schedule socials in Buffer, Later, or Hootsuite. Stagger clip releases over the week the episode drops.
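For a sense of what the 128 kbps MP3 setting means for upload size and host storage limits, the estimate below may help. It is a rough sketch assuming a constant bitrate and ignoring metadata and cover art; the function name is illustrative.

```python
def mp3_size_mb(minutes, kbps=128):
    """Approximate constant-bitrate MP3 size in megabytes (10**6 bytes)."""
    total_bits = kbps * 1000 * minutes * 60
    return total_bits / 8 / 1_000_000


for minutes in (45, 60):
    print(f"{minutes} min at 128 kbps is about {mp3_size_mb(minutes):.1f} MB")
# 45 min at 128 kbps is about 43.2 MB
# 60 min at 128 kbps is about 57.6 MB
```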
Common mistakes
Recording remote interviews on Zoom only
What goes wrong: By default, Zoom records a single combined track. One bad mic ruins both speakers. Post-production can't separate them. You either ship low-quality audio or re-record entirely.
How to avoid: Use Riverside, SquadCast, or Zencastr for remote interviews. Each speaker gets their own local high-quality track. Worst case, even Zoom with "Record each participant separately" (paid feature) is better than nothing.
Not building episode templates
What goes wrong: Each episode starts from scratch. Intro music, brand colors, lower-thirds, music ducking all get re-built every time. 1-2 hours wasted per episode.
How to avoid: Build the episode template once in 05 — Templates folder. Clone per episode. Saves 1-2 hours every time.
Skipping the editorial pass
What goes wrong: 60-min episodes drag through 4-min tangents. Listener drop-off at minute 12. Reviews complain about pacing. Subscribers stop growing.
How to avoid: Mandatory editorial pass: cut to 20-30% shorter than raw. Strongest opens, tightest middles, shortest endings. Listen at 1.5× to verify pace.
No loudness normalization
What goes wrong: Episodes get auto-adjusted by distribution platforms. Sometimes too quiet, sometimes too loud. Listener fatigue. Compared to professional podcasts you sound amateur.
How to avoid: Normalize to -16 LUFS before every export. Descript shows the meter; adjust master gain to hit target. Industry standard, non-negotiable.
Releasing without clips
What goes wrong: Podcast episode drops to crickets on social. Discoverability suffers. New listener acquisition stalls at organic-search-only.
How to avoid: Extract 3-5 clips per episode (30-90 sec). Vertical format for Reels/TikTok/Shorts. Schedule the week of release. Clips can drive 30-50% of new listener growth.
Not exporting show notes from transcript
What goes wrong: Show notes get written from scratch in Google Docs. Takes 30-45 min per episode. Inaccurate quotes. Timestamps off.
How to avoid: File → Export Transcript → Markdown. Edit transcript into show notes in 10-15 min. Timestamps are accurate. Quotes are verbatim.
Hand it off
Producing weekly podcasts solo takes 8-12 hours/episode in DIY mode, 4-6 hours with the Descript workflow. Hiring a vetted podcast video editor gets it down to 1-2 hours of host time (recording + light review) at $14-16/hr — typically $1,000-2,000/mo for a weekly podcast with video + clips + show notes.
Frequently asked questions
How long does it take to produce an episode?
With the workflow above: 4-6 hours per 60-min episode (excluding recording). Breakdown, about 4 hours at the fast end: 30 min cleanup, 60 min editorial edit, 60 min mix, 60 min visuals + chapters, 30 min export + socials + show notes. Faster with templates + practice. Slower the first 5-10 episodes.
Should I record in Descript or in a separate tool?
For solo episodes: Descript's built-in recorder is fine, with the same audio quality as any other tool when paired with a good mic. For remote interviews with 2+ speakers: use Riverside, SquadCast, or Zencastr. They record each speaker locally on their own machine, then upload separate tracks. Descript can't do that for remote guests.
What makes a podcast sound professional?
Four things: (1) good mic + treated room (matters more than tools), (2) multi-track recording, never one combined track, (3) loudness normalization to -16 LUFS, (4) editorial discipline: cut 20-30% of the raw recording. Pro production isn't 'better tools,' it's relentless craft.
How do I make social clips from an episode?
Tag #clip during your editorial pass on highlight quotes. After publish, jump to each tag → Export Selection → MP4 with subtitles. For vertical formats (Reels/TikTok/Shorts), use Tools → Aspect Ratio → 9:16 with auto speaker-tracking. 3-5 clips per episode in 30-45 min.
Can Descript publish directly to my podcast host?
Yes. Integrations exist for Buzzsprout, Anchor (Spotify for Podcasters), Transistor, and Captivate. Connect once in Settings → Publishing. Then in your project, click Publish → choose host → uploads MP3 + metadata. Saves the export-then-upload step. For non-integrated hosts (Libsyn, Podbean), export and manually upload.