Audio & Video Transcription

The Archiver transcribes spoken content in audio and video items, producing a time-segmented transcript synchronised with the player. Available on Professional plans and above.

What gets transcribed

Audio and video transcription is opt-in. After uploading a media file, you'll see a transcribability score and can choose whether to run transcription. When you do, the transcript:

Is segmented by speaker turn where speakers are distinguishable
Includes per-segment timestamps
Is fully searchable — the transcript joins the file's full-text index, so search hits across the archive include audio and video content
Powers the synchronised player — click any segment to jump to that point

Quality and the transcribability score

Before running a full transcription, the platform produces a transcribability score from 1 to 10 based on a short sample:

8–10 — clean audio, single or distinguishable speakers, modern recording. Expect excellent accuracy.
5–7 — usable audio, some background noise, accents or older recording. Good for content; review proper nouns.
1–4 — poor audio quality. Transcription may still run but expect significant errors; consider uploading a higher-quality version if available.

Items scoring below 4 are flagged. The transcript is still generated, but it carries a quality warning.
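
The bands above can be expressed as a small lookup, which may help if you triage scores outside the platform. This is a minimal sketch: the function names and band labels are illustrative and are not part of any Archiver API.

```python
def quality_band(score: int) -> str:
    """Map a transcribability score (1-10) to the band described on this page."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score >= 8:
        return "excellent"  # 8-10: clean audio, expect excellent accuracy
    if score >= 5:
        return "good"       # 5-7: usable audio, review proper nouns
    return "poor"           # 1-4: expect significant errors


def is_flagged(score: int) -> bool:
    """Items scoring below 4 are flagged with a quality warning."""
    return score < 4


for s in (9, 6, 4, 2):
    print(s, quality_band(s), "flagged" if is_flagged(s) else "not flagged")
```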

Item costs

File type	Item cost
Audio file	1
Video file	3

A video file costs 3 because three processes run on it in parallel: transcription, frame extraction, and content analysis. Each costs about as much as processing a single document item.

There's no extra charge for transcription beyond the standard item cost — it's bundled with processing on Professional and above.
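
To estimate the item cost of a batch before uploading, the table can be applied per file. The sketch below is illustrative only: the extension lists are assumptions (this page does not enumerate supported formats), and the function names are hypothetical.

```python
from pathlib import PurePath

# Item costs from the table above.
ITEM_COST = {"audio": 1, "video": 3}

# Assumed extension lists, for illustration only.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def media_kind(filename: str) -> str:
    """Classify a file as 'audio' or 'video' by its extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"not a recognised media file: {filename}")


def batch_item_cost(filenames: list[str]) -> int:
    """Sum the item cost of every media file in the batch."""
    return sum(ITEM_COST[media_kind(name)] for name in filenames)


# Two audio files (1 each) plus one video (3) cost 5 items in total.
print(batch_item_cost(["interview.mp3", "lecture.MP4", "memo.wav"]))
```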

Editing the transcript

Open the item detail page. Beneath the player, the transcript is editable in place:

Click any segment to edit the text
Tab advances to the next segment
Speaker labels are editable — useful for renaming "Speaker 1 / Speaker 2" to actual names

Edits save automatically. The original (untouched) machine transcript is preserved internally and can be restored via the … menu.

Downloading transcripts

From the item's … menu:

Download as SRT — standard subtitle format
Download as VTT — WebVTT for HTML5
Download as TXT — plain text, no timestamps
Download as JSON — full structure with timestamps, confidence, speaker labels

Transcripts are also included in every accession export — see Export formats.
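
If you post-process the JSON download yourself, converting segments to SRT is mostly a matter of timestamp formatting. The sketch below assumes each segment is an object with `start` and `end` in seconds plus `speaker` and `text` fields. Those field names are assumptions, since this page does not document the JSON schema; check a real download before relying on them.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        speaker = seg.get("speaker")
        text = f"{speaker}: {seg['text']}" if speaker else seg["text"]
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(cues)


# Hypothetical segments shaped like the assumed JSON structure.
example = [
    {"start": 0.0, "end": 2.5, "speaker": "Speaker 1", "text": "Hello."},
    {"start": 2.5, "end": 5.0, "speaker": "Speaker 2", "text": "Hi there."},
]
print(segments_to_srt(example))
```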

Two-pass refinement

Professional plans and above use a two-pass approach for longer audio:

First pass captures the overall content and produces a draft transcript with rough boundaries.
Second pass refines the draft using context from the first pass — improves proper nouns, speaker turn boundaries, and overall coherence.

This happens automatically whenever transcription runs; there is nothing extra to configure.

Limits

Maximum duration per file: in practice, the file-size cap is the binding constraint. A 500 MB MP3 holds several hours of audio, and a 2 GB video comfortably holds an hour of HD footage.
Languages: accuracy is highest for English. The platform auto-detects the language; non-English transcription is supported, but accuracy varies by language. Set your preferred output language in Profile → AI & Defaults.
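
The relationship between file size and duration is simple arithmetic: duration equals size in bits divided by bitrate. The sketch below works through the two examples above, assuming a 128 kbit/s MP3 and a 4 Mbit/s HD video. Both bitrates are assumptions chosen for illustration; real files vary widely.

```python
MB = 1_000_000
GB = 1_000_000_000


def hours_of_media(file_size_bytes: int, bitrate_bits_per_second: int) -> float:
    """Approximate playing time in hours for a constant-bitrate file."""
    seconds = file_size_bytes * 8 / bitrate_bits_per_second
    return seconds / 3600


# Assumed bitrates: 128 kbit/s for MP3 audio, 4 Mbit/s for HD video.
mp3_hours = hours_of_media(500 * MB, 128_000)
video_hours = hours_of_media(2 * GB, 4_000_000)

print(f"500 MB MP3 at 128 kbit/s: about {mp3_hours:.1f} hours")   # about 8.7 hours
print(f"2 GB video at 4 Mbit/s: about {video_hours:.1f} hours")   # about 1.1 hours
```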

Bypassing transcription

If you already have a transcript (SRT, VTT, or TXT), upload it alongside the media. The platform uses your transcript instead of generating one. Useful for:

Files you've had professionally transcribed
Older items where you have a typed transcript already
Languages where you have a better tool for transcription
