Audio & Video Transcription
The Archiver transcribes spoken content in audio and video items, producing a time-segmented transcript synchronised with the player. Available on Professional plans and above.
What gets transcribed
Audio and video transcription is opt-in. After uploading a media file, you'll see a transcribability score and can choose whether to run transcription. When you do, the transcript:
- Is segmented by speaker turn where speakers are distinguishable
- Includes per-segment timestamps
- Is fully searchable — the transcript joins the file's full-text index, so search hits across the archive include audio and video content
- Powers the synchronised player — click any segment to jump to that point
Quality and the transcribability score
Before running a full transcription, the platform produces a transcribability score from 1 to 10 based on a short sample:
- 8–10 — clean audio, single or distinguishable speakers, modern recording. Expect excellent accuracy.
- 5–7 — usable audio, some background noise, accents or older recording. Good for content; review proper nouns.
- 1–4 — poor audio quality. Transcription may still run but expect significant errors; consider uploading a higher-quality version if available.
Items scoring below 4 are flagged. The transcript is still generated, but it carries a quality warning.
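The bands and the flag threshold above can be sketched as a small lookup. This is only an illustration of the rules as written, not the platform's implementation; the function names and the band labels ("excellent", "good", "poor") are hypothetical.

```python
def score_band(score: int) -> str:
    """Map a 1-10 transcribability score to the band described above.

    The band labels are illustrative names, not values returned by the platform.
    """
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score >= 8:
        return "excellent"  # 8-10: clean audio
    if score >= 5:
        return "good"       # 5-7: usable audio, review proper nouns
    return "poor"           # 1-4: expect significant errors


def is_flagged(score: int) -> bool:
    """Items scoring below 4 get a quality warning alongside the transcript."""
    return score < 4


for s in (9, 6, 4, 2):
    print(s, score_band(s), is_flagged(s))
```

As the thresholds are worded, a score of 4 falls in the lowest band without being flagged.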
Item costs
| Item | Item cost |
|---|---|
| Audio file | 1 |
| Video file | 3 |
A video file costs 3 because three jobs run in parallel: transcription, frame extraction, and content analysis. Each costs about as much to process as a single document item.
There's no extra charge for transcription beyond the standard item cost. It's bundled with processing on Professional plans and above.
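To estimate what a batch of uploads will consume, the table above can be applied directly. A minimal sketch; the `ITEM_COST` mapping and `batch_cost` helper are hypothetical names used for illustration.

```python
# Item costs taken from the table above.
ITEM_COST = {"audio": 1, "video": 3}


def batch_cost(kinds: list[str]) -> int:
    """Sum the item cost for a list of media kinds ('audio' or 'video')."""
    return sum(ITEM_COST[kind] for kind in kinds)


# Two audio files and one video file: 1 + 1 + 3
print(batch_cost(["audio", "audio", "video"]))  # prints 5
```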
Editing the transcript
Open the item detail page. Beneath the player, the transcript is editable in place:
- Click any segment to edit the text
- Tab advances to the next segment
- Speaker labels are editable — useful for renaming "Speaker 1 / Speaker 2" to actual names
Edits save automatically. The original (untouched) machine transcript is preserved internally and can be restored via the … menu.
Downloading transcripts
From the item's … menu:
- Download as SRT — standard subtitle format
- Download as VTT — WebVTT for HTML5
- Download as TXT — plain text, no timestamps
- Download as JSON — full structure with timestamps, confidence, speaker labels
Transcripts are also included in every accession export — see Export formats.
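The SRT and VTT downloads carry the same segments and differ mainly in framing: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a dot. The sketch below shows that difference on a list of segments. The segment keys (`start`, `end`, `text`) are an assumed shape for illustration, not the documented layout of the JSON download.

```python
def _stamp(seconds: float, sep: str) -> str:
    """Format a time in seconds as HH:MM:SS<sep>mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as SRT: numbered cues, comma before milliseconds."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        timing = f"{_stamp(seg['start'], ',')} --> {_stamp(seg['end'], ',')}"
        cues.append(f"{i}\n{timing}\n{seg['text']}\n")
    return "\n".join(cues)


def to_vtt(segments: list[dict]) -> str:
    """Render segments as WebVTT: header line, dot before milliseconds."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        timing = f"{_stamp(seg['start'], '.')} --> {_stamp(seg['end'], '.')}"
        cues.append(f"{timing}\n{seg['text']}\n")
    return "\n".join(cues)


# Hypothetical segments, as they might appear after parsing a JSON download.
segments = [
    {"start": 0.0, "end": 2.5, "text": "Hello."},
    {"start": 2.5, "end": 3661.25, "text": "Goodbye."},
]
print(to_srt(segments))
print(to_vtt(segments))
```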
Two-pass refinement
Professional plans and above use a two-pass approach for longer audio:
- First pass captures the overall content and produces a draft transcript with rough boundaries.
- Second pass refines the draft using context from the first pass, improving proper nouns, speaker turn boundaries, and overall coherence.
Both passes run automatically whenever a transcription is run; there is nothing extra to configure.
Limits
- Maximum duration per file: in practice the file-size cap is the binding constraint. A 500 MB MP3 holds several hours of audio, and a 2 GB video is comfortably an hour at HD.
- Languages: accuracy is highest for English. The platform auto-detects the spoken language; non-English transcription is supported, but accuracy varies by language. Set your preferred output language in Profile → AI & Defaults.
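The duration figures in the first limit follow from file size divided by bitrate. A rough worked example, assuming a 128 kbps MP3 and a 4 Mbps HD video; both bitrates are illustrative assumptions, and real files vary widely.

```python
MB = 1_000_000
GB = 1_000_000_000


def hours_of_media(size_bytes: float, bitrate_bps: float) -> float:
    """Approximate playing time in hours for a file of a given size and bitrate."""
    return size_bytes * 8 / bitrate_bps / 3600


# 500 MB MP3 at an assumed 128 kbps
print(round(hours_of_media(500 * MB, 128_000), 1))   # prints 8.7
# 2 GB video at an assumed 4 Mbps
print(round(hours_of_media(2 * GB, 4_000_000), 1))   # prints 1.1
```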
Bypassing transcription
If you already have a transcript (SRT, VTT, or TXT), upload it alongside the media. The platform uses your transcript instead of generating one. Useful for:
- Files you've had professionally transcribed
- Older items where you have a typed transcript already
- Languages where you have a better tool for transcription
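Before uploading your own SRT file, it can be worth checking that it parses cleanly. The sketch below is a minimal standalone SRT reader, independent of the platform; `parse_srt` and the segment keys it returns are hypothetical names, and it handles only plain cues (number, timing line, text).

```python
import re

# SRT timestamps look like 00:01:02,345 (comma before milliseconds).
_TIME = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def _seconds(stamp: str) -> float:
    """Convert an SRT timestamp to seconds, raising on malformed input."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, m, s, ms = map(int, match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text: str) -> list[dict]:
    """Split SRT text into cues and return start, end, and text for each."""
    segments = []
    normalised = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.splitlines()
        if len(lines) < 3:
            raise ValueError(f"incomplete cue: {block!r}")
        start, end = lines[1].split("-->")
        segments.append(
            {"start": _seconds(start), "end": _seconds(end), "text": "\n".join(lines[2:])}
        )
    return segments


sample = (
    "1\n00:00:00,000 --> 00:00:02,500\nHello.\n\n"
    "2\n00:00:02,500 --> 00:00:05,000\nGoodbye.\n"
)
for seg in parse_srt(sample):
    print(seg)
```

A file that raises `ValueError` here is likely to need cleaning up before it is uploaded alongside the media.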