The best content marketing teams produce one long-form piece and extract 5-10 short-form pieces from it. Descript is built for this workflow — transcript-driven highlight extraction, auto vertical cropping, captions on every clip. Here's the playbook.
Who this is for
Content marketers, founders, or marketing leads who already produce long-form content (podcast, webinar, YouTube video) and want to systematically extract short-form clips for Reels, TikTok, Shorts, LinkedIn.
What you'll need
Step 1
During the long-form editorial pass, tag 6-12 highlight sections with #clip. Each tag becomes a candidate short-form piece.
The most efficient time to find highlights is during the original edit — you're already reading the transcript carefully.
Select any 30-90 sec section of strong content → right-click → Add Tag → #clip.
What makes a good clip:
— A specific claim or insight (not generic 'lots of value here')
— Self-contained: makes sense without the surrounding 5 minutes of context
— Has a hook in the first 3-5 seconds
— Ideally a specific number, framework, contrarian opinion, or memorable story
Aim for 6-12 tagged moments per 60-min episode. More than 12 = you're tagging weak stuff.
After publishing the long-form, jump to each tag → start the clip extraction workflow.
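The tagging guidance above (30-90 sec per clip, 6-12 tags per 60 minutes) can be checked mechanically before extraction starts. The sketch below is a minimal illustration, assuming you have copied each #clip tag's start and end times out of the project by hand. The `ClipTag` structure and `review_tags` function are hypothetical names and are not part of Descript.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds taken from the guidance above; the constant names are illustrative.
MIN_CLIP_SEC = 30
MAX_CLIP_SEC = 90
MIN_CLIPS_PER_HOUR = 6
MAX_CLIPS_PER_HOUR = 12


@dataclass
class ClipTag:
    """One #clip tag as start/end offsets in seconds (hypothetical structure)."""
    label: str
    start_sec: float
    end_sec: float

    @property
    def duration(self) -> float:
        return self.end_sec - self.start_sec


def review_tags(tags: list[ClipTag], episode_sec: float) -> list[str]:
    """Return warnings; an empty list means the tags fit the guidance."""
    warnings = []
    for tag in tags:
        if not MIN_CLIP_SEC <= tag.duration <= MAX_CLIP_SEC:
            warnings.append(f"{tag.label}: {tag.duration:.0f} sec is outside 30-90 sec")
    # Scale the 6-12 target to the actual episode length.
    hours = episode_sec / 3600
    low = MIN_CLIPS_PER_HOUR * hours
    high = MAX_CLIPS_PER_HOUR * hours
    if len(tags) < low:
        warnings.append(f"only {len(tags)} tags; aim for at least {low:.0f}")
    elif len(tags) > high:
        warnings.append(f"{len(tags)} tags; more than {high:.0f} suggests weak picks")
    return warnings
```

Calling `review_tags(tags, 3600)` on a 60-minute episode returns one message per problem, so an empty result means the tag list is ready for extraction.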
Step 2
Library → Templates → 9:16 Vertical Clip template. Brand colors, animated captions, lower-third with show name, end card. Clone per clip.
Build the vertical-clip template once. Save in `05 — Templates`.
Project size: 1080×1920 (9:16 vertical) for Reels/TikTok/Shorts.
Top safe zone (~150px): leave clear for platform UI overlays.
Bottom safe zone (~250px): leave clear for platform username/CTA bars.
Center: where main video content lives.
Caption template: 32-40px font, bold, white text with dark stroke, animated word-by-word. Caption position: middle or lower-center.
Brand lower-third (first 3-5 sec): your podcast/show name + episode #.
End card (last 3-5 sec): 'Full episode in bio' + your handle. Or a CTA matched to the platform.
Save as template. Every clip extraction = clone, drop in source clip, customize captions.
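To make the safe-zone numbers above concrete, here is a small sketch of the arithmetic. It only restates the approximate 150px and 250px margins from the template notes; the function names are illustrative, and the exact margins each platform needs may differ.

```python
FRAME_W, FRAME_H = 1080, 1920  # 9:16 project size from the template notes
TOP_SAFE_PX = 150              # approximate top margin for platform UI
BOTTOM_SAFE_PX = 250           # approximate bottom margin for username/CTA bars


def safe_area(frame_w=FRAME_W, frame_h=FRAME_H, top=TOP_SAFE_PX, bottom=BOTTOM_SAFE_PX):
    """Return (x, y, width, height) of the region clear of platform UI."""
    return (0, top, frame_w, frame_h - top - bottom)


def fits_in_safe_area(y, height):
    """True if an element spanning y..y+height stays inside the safe region."""
    _, safe_y, _, safe_h = safe_area()
    return y >= safe_y and y + height <= safe_y + safe_h
```

With these margins the usable region is 1080×1520 starting 150px from the top, so a caption block placed at y=1200 with a height of 200px fits, while one placed at y=1600 would run into the bottom bar.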
Step 3
For each tagged section: File → Export Selection → New Project → Aspect Ratio 9:16 → Auto speaker tracking → captions on.
Open the long-form project. Jump to each #clip tag.
Select the section (30-90 sec). Right-click → Send to New Project (or File → Export Selection → New Project).
In the new project: Tools → Aspect Ratio → 9:16 Vertical.
Descript prompts: which subject to track? Choose 'Active speaker' for interview clips, 'Specific person' for solo clips. Auto-tracking keeps the speaker centered in 9:16 frame as they move.
Verify the auto-tracking: scrub through the clip. If the speaker drifts off frame, manually adjust by setting keyframes (Tools → Keyframe → Position).
Apply your saved vertical template: layout, brand colors, lower-third, end card all populate.
Generate captions: Tools → Captions → Auto-generate from transcript. Use animated word-by-word style for high engagement.
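When you verify or manually correct the speaker tracking, it helps to know how little of a landscape frame survives a 9:16 crop. The sketch below shows only the geometry of a crop window centered on the speaker. It is not how Descript implements tracking, and `vertical_crop` is a hypothetical helper name.

```python
from __future__ import annotations


def vertical_crop(src_w: int, src_h: int, speaker_x: float) -> tuple[int, int, int, int]:
    """Return (left, top, width, height) of a 9:16 crop centered on speaker_x.

    speaker_x is the speaker's horizontal position in source pixels.
    """
    crop_h = src_h
    crop_w = src_h * 9 // 16  # full height, width scaled to 9:16
    if crop_w > src_w:
        raise ValueError("source is narrower than 9:16")
    # Center the window on the speaker, then clamp it inside the frame.
    left = int(speaker_x - crop_w / 2)
    left = max(0, min(left, src_w - crop_w))
    return (left, 0, crop_w, crop_h)
```

For a 1920×1080 source the crop is only 607px wide, under a third of the frame, which is why a speaker who leans or walks can drift out of shot and why scrubbing through each clip is worth the time.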
Step 4
Reels: 30-90 sec, snappy captions, hook in first 3 sec. TikTok: 30-60 sec, trending audio optional. Shorts: <60 sec, strong cold open.
Per-platform optimization matters. Don't just upload the same clip to every channel.
Instagram Reels: 30-90 sec optimal. Captions essential. Use 3-5 trending hashtags in caption. Hook the viewer in first 3 sec.
TikTok: 30-60 sec sweet spot. Captions essential. Trending audio can boost reach (overlay subtly under voice). Front-load the hook.
YouTube Shorts: <60 sec. Captions essential. Add #Shorts in title. End-screen with full-episode link.
LinkedIn: 60-90 sec. Captions essential. Polished, professional tone. CTAs in caption text (not video) — LinkedIn audience clicks captions, not video CTAs.
X/Twitter: 30-60 sec. Captions essential. Lead with the most provocative claim in the first sentence — Twitter scrolls fast.
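The duration windows above can be kept in one small lookup so each clip is routed only to platforms where its length fits. This is a sketch under the assumption that the ranges listed in this step are your targets; the dictionary keys and the `platforms_for` function are illustrative names.

```python
# Duration windows in seconds, copied from the per-platform notes above.
# "YouTube Shorts: <60 sec" is encoded here as an upper bound of 59.
PLATFORM_SPECS = {
    "reels": (30, 90),
    "tiktok": (30, 60),
    "shorts": (0, 59),
    "linkedin": (60, 90),
    "x": (30, 60),
}


def platforms_for(duration_sec):
    """Return the platforms whose duration window contains this clip length."""
    return [
        name
        for name, (shortest, longest) in PLATFORM_SPECS.items()
        if shortest <= duration_sec <= longest
    ]
```

A 45-second clip fits Reels, TikTok, Shorts, and X as cut, while a 75-second clip fits only Reels and LinkedIn and would need a shorter variant for the rest.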
Step 5
Audiogram (waveform + still image): great for Twitter/LinkedIn where video plays muted. Descript: Tools → Audiogram → choose template.
Audiograms are video clips where the visual is a static image (host photo, podcast cover art) + animated waveform + captions.
They work where short-form video doesn't get sound (Twitter feed, LinkedIn feed, embedded blog posts).
Descript: Tools → Audiogram (or via Insert → Audiogram in some UI versions).
Choose template: full-screen still image with waveform on bottom, side-by-side image + waveform, or animated branded card with waveform.
Customize: image (host photo or cover art), brand colors, caption position.
Generate: Descript creates the audiogram clip. Export as MP4 (vertical 9:16 or square 1:1 for LinkedIn).
Use case: each podcast episode generates 3-5 audiograms in addition to video clips. Different content for muted-feed scrolling vs sound-on engagement.
Step 6
Day 0: episode + main clip. Day 1: hook clip. Day 2-3: insight clips. Day 4-7: deep cuts + audiogram. Schedule in Buffer/Later/Hootsuite.
Don't dump all clips on launch day — you compete with yourself.
Recommended launch-week cadence for 1 long-form episode + 6 clips:
— Day 0 (launch day): full episode published + 1 hero clip (the strongest, most universally appealing).
— Day 1: hook clip (a 30-sec teaser of the most provocative moment).
— Day 2-3: insight clips (the framework/lesson/numbered moments).
— Day 4-5: story clips (anecdotal moments that humanize).
— Day 6-7: deep cuts + audiogram + 'next episode preview' if applicable.
Schedule in your social tool (Buffer, Later, Hootsuite, Metricool). Set platform-specific times based on audience timezone data.
Track per-clip performance: views, completion rate, replies. Reuse the format and topic of the highest-performing clip in the next episode.
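The launch-week cadence above can be turned into concrete dates before you load anything into a scheduler. The sketch below is one possible mapping, assuming each clip has been labeled with a kind. The kind names, the `WINDOWS` table, and `launch_schedule` are illustrative and are not tied to Buffer, Later, Hootsuite, or Metricool.

```python
from datetime import date, timedelta

# Day windows follow the cadence above. Where the text gives a range
# (for example Day 2-3), clips of that kind are spread across it in order.
WINDOWS = {
    "hero": (0, 0),
    "hook": (1, 1),
    "insight": (2, 3),
    "story": (4, 5),
    "deep_cut": (6, 7),
    "audiogram": (6, 7),
}


def launch_schedule(launch, clips):
    """clips is a list of (title, kind). Returns (date, title) pairs sorted by date."""
    used = {}  # kind -> number of clips of that kind already placed
    plan = []
    for title, kind in clips:
        first, last = WINDOWS[kind]
        placed = used.get(kind, 0)
        offset = first + placed % (last - first + 1)
        used[kind] = placed + 1
        plan.append((launch + timedelta(days=offset), title))
    return sorted(plan)
```

For a launch on 2 March 2026 with one hero, one hook, two insight clips, one story clip, and one deep cut, this places the two insight clips on 4 and 5 March rather than stacking them on the same day.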
Step 7
Top 20% of clips get re-cut for different platforms. Iterate the format. Build a library of "what works" for your audience.
After 4-8 weeks, you'll have data on which clips/topics performed.
Top performers (top 20% by engagement): re-cut for additional platforms, longer formats (LinkedIn video carousel, blog excerpt), or sequels (this topic deserves more depth).
Bottom performers (bottom 20%): what didn't work? Topic? Hook? Format? Don't repeat the pattern.
Build a 'what works' doc: 5-10 patterns that consistently engage your audience. Examples: 'numbered frameworks land,' 'contrarian takes on industry sacred cows land,' 'specific dollar amounts land more than vague claims.'
Feed this back into the original recording strategy: when interviewing or scripting next episodes, deliberately create more of the proven formats.
Compound effect: by month 6, your hit rate on clips improves 30-50% because you're building on data, not guessing.
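The top 20% and bottom 20% split described above is a simple ranking. Here is a minimal sketch, assuming you have one engagement number per clip. Which metric you use (views, completion rate, or a blend) is your choice, and `split_by_engagement` is a hypothetical name.

```python
def split_by_engagement(scores, fraction=0.2):
    """scores maps clip name -> engagement score.

    Returns (top, bottom): roughly the best and worst `fraction` of clips.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if not ranked:
        return [], []
    # Always review at least one clip on each end, even with few clips.
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]
```

With 10 tracked clips this returns the best two for re-cutting and the worst two for a post-mortem. With very few clips the two lists can overlap, so the split is only meaningful once you have several weeks of data.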
Common mistakes
Extracting only 1-2 clips from a 60-min episode
What goes wrong: 80% of the long-form ROI is left on the table. Audience growth stalls because long-form alone doesn't compound in 2026's short-form-dominant feeds.
How to avoid: 6-12 tagged moments per episode. Even average clips perform — the volume + variety is what drives growth.
Cross-posting identical clips to every platform
What goes wrong: Reels caption style fails on LinkedIn. TikTok hook fails on YouTube Shorts. Engagement drops 30-50% vs platform-optimized variants.
How to avoid: Per-platform optimization: different captions, durations, hooks, hashtags. Same clip can become 4-5 platform-specific variants in 5 min each.
No captions on clips
What goes wrong: 60-85% of short-form viewers scroll muted. No captions = no comprehension. Completion rate tanks. Algorithm doesn't push.
How to avoid: Captions on every clip, every platform. Animated word-by-word style gets the best engagement on Reels/TikTok/Shorts.
Dumping all clips on launch day
What goes wrong: 6 clips compete for the same audience attention in the same 24 hours. Each underperforms. Stretching them across a week gets 3-5x more total reach.
How to avoid: Stagger clips over launch week. Day 0 hero, days 1-7 supporting. Use Buffer/Later/Hootsuite for scheduling.
Skipping the audiogram variants
What goes wrong: LinkedIn and Twitter feeds play muted by default. Without audiograms, your content has no visual story when sound is off.
How to avoid: Generate 3-5 audiograms per episode using Descript's built-in tool. Different formats for muted-feed scrolling.
Not tracking per-clip performance
What goes wrong: You make 6 clips per episode, two episodes a month, for 6 months. 72 clips total. You can't tell which formats worked. You guess at next episode's strategy.
How to avoid: Spreadsheet per clip: platform, topic, hook style, views, completion rate. After 8-12 weeks, patterns emerge. Lean into what works.
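If the tracking spreadsheet is exported as CSV, the pattern-finding step can be a few lines of code. The sketch below assumes the columns named above (platform, topic, hook style, views, completion rate). The header spellings, the sample rows, and the function names are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows, only to show the expected columns.
SAMPLE_CSV = """platform,topic,hook_style,views,completion_rate
reels,pricing,number,12000,0.62
tiktok,pricing,number,8000,0.58
linkedin,hiring,story,3000,0.40
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def average_completion(rows, column):
    """Average completion rate grouped by one column, such as hook_style."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(float(row["completion_rate"]))
    return {key: round(mean(values), 3) for key, values in sorted(groups.items())}
```

Grouping the sample rows by `hook_style` gives an average per hook type, which is the kind of comparison that turns a pile of clips into a short list of patterns worth repeating.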
Hand it off
Running this workflow weekly takes 2-3 hours per episode. A vetted video editor on EverestX runs the long-to-short clip extraction + distribution from $14-16/hr — typically $600-1,400/mo for a weekly content cadence. ROI is usually a 3-5x lift in audience growth within 60 days.
Target 6-12 clips per 60-min episode. Fewer than 6 = leaving audience growth on the table. More than 12 = you're tagging weak moments. Track per-clip performance after 60 days; the top-3 clips typically drive 60-70% of clip-based audience growth.
Same source clip, different optimizations. Reels: 30-90 sec, 3-5 hashtags, hook in 3 sec. TikTok: 30-60 sec, trending audio optional. Shorts: <60 sec, #Shorts in title. Captions on all. The differences in caption style and duration matter more than you'd think.
Partially. Descript AI can suggest clip candidates based on transcript analysis (looks for self-contained moments with strong hooks). It's about 60-75% accurate. Use as a starting point — your editorial judgment refines the selection. Auto-extraction without review = inconsistent clip quality.
Descript's caption styles include animated word-by-word, line-by-line, and karaoke styles. For short-form: animated word-by-word with bold white text + dark stroke at 32-40px. Position middle or lower-center to stay clear of platform UI. Custom brand fonts available on Pro+.
Typical lift: if long-form alone reaches an audience of X, long-form plus 6 well-extracted clips reaches 5-10 times X. Why: short-form discovers new audience (TikTok/Reels algorithms surface to non-followers), while long-form deepens engagement with existing audience. The combo compounds.