Audio & Video Transcription

The Archiver transcribes spoken content in audio and video items, producing a time-segmented transcript synchronised with the player. Available on Professional plans and above.


What gets transcribed

Audio and video transcription is opt-in. After uploading a media file, you'll see a transcribability score and can choose whether to run transcription. When you do, the transcript:

  • Is segmented by speaker turn where speakers are distinguishable
  • Includes per-segment timestamps
  • Is fully searchable — the transcript joins the file's full-text index, so search hits across the archive include audio and video content
  • Powers the synchronised player — click any segment to jump to that point
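
The player and search features above both work from a list of timed segments. The sketch below shows that idea under assumptions: the field names (start, end, speaker, text) and the helpers segment_at and search are hypothetical, segments are assumed sorted and non-overlapping, and none of this is the platform's internal model. Clicking a segment seeks the player to its start; highlighting the segment that is currently playing is the reverse lookup.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical field names; the real JSON export may differ.
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    speaker: str
    text: str


def segment_at(segments: list[Segment], t: float) -> Segment | None:
    """Return the segment playing at time t, or None if t falls in a gap.

    Assumes segments are sorted by start time and do not overlap.
    """
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i].end:
        return segments[i]
    return None


def search(segments: list[Segment], term: str) -> list[float]:
    """Return the start times of segments containing term (case-insensitive)."""
    term = term.lower()
    return [s.start for s in segments if term in s.text.lower()]
```

Each search hit is a timestamp, so a result can open the player at exactly the matching moment.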

Quality and the transcribability score

Before running a full transcription, the platform produces a transcribability score from 1 to 10 based on a short sample:

  • 8–10 — clean audio, single or distinguishable speakers, modern recording. Expect excellent accuracy.
  • 5–7 — usable audio, some background noise, accents or older recording. Good for content; review proper nouns.
  • 1–4 — poor audio quality. Transcription may still run but expect significant errors; consider uploading a higher-quality version if available.

Items scoring below 4 are flagged on the item; the transcript is generated but accompanied by a quality warning.
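
The bands and the warning rule can be expressed as a small lookup. This is a sketch only: the band names are hypothetical labels for the three ranges listed above, and the flag follows the stated rule that items scoring below 4 are flagged.

```python
def score_band(score: int) -> tuple[str, bool]:
    """Map a 1-10 transcribability score to (band, quality_warning).

    Band names are hypothetical labels for the 8-10, 5-7 and 1-4 ranges.
    The warning flag applies the stated rule: scores below 4 are flagged.
    """
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score >= 8:
        band = "excellent"
    elif score >= 5:
        band = "good"
    else:
        band = "poor"
    return band, score < 4
```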


Item costs

Item          Item cost
Audio file    1
Video file    3

A video costs 3 because three processes run on it in parallel: transcription, frame extraction and content analysis. Each costs roughly as much to process as a single document item.

There's no extra charge for transcription beyond the standard item cost — it's bundled with processing on Professional and above.
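
As a quick way to budget a batch of uploads, the table above can be turned into a lookup. The names VIDEO_STEPS, ITEM_COST and batch_cost are hypothetical and only illustrate the arithmetic; they are not part of the platform.

```python
# Processing steps a video runs in parallel, per the explanation above.
VIDEO_STEPS = ("transcription", "frame extraction", "content analysis")

# Item cost per media type; each step costs about one document item.
ITEM_COST = {"audio": 1, "video": len(VIDEO_STEPS)}


def batch_cost(media_types: list[str]) -> int:
    """Total item cost for a batch of uploads, e.g. ["audio", "video"].

    Raises KeyError for media types not covered by this table.
    """
    return sum(ITEM_COST[m] for m in media_types)
```

For example, four audio files and two videos cost 4 × 1 + 2 × 3 = 10 items, with transcription included.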


Editing the transcript

Open the item detail page. Beneath the player, the transcript is editable in place:

  • Click any segment to edit the text
  • Tab advances to the next segment
  • Speaker labels are editable — useful for renaming "Speaker 1 / Speaker 2" to actual names

Edits save automatically. The original machine transcript is preserved internally and can be restored from the menu.
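
One way to picture "edits on top of a preserved original" is an overlay: the machine text is never modified, and edits and speaker renames are stored separately, so restoring just discards the overlay. The class and method names below are hypothetical, and this sketch assumes that restoring also reverts speaker renames; the platform's actual behaviour may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EditableTranscript:
    machine: list[str]   # original machine text per segment; never modified
    speakers: list[str]  # machine speaker label per segment, e.g. "Speaker 1"
    edits: dict[int, str] = field(default_factory=dict)   # segment index -> edited text
    labels: dict[str, str] = field(default_factory=dict)  # machine label -> new name

    def edit(self, i: int, text: str) -> None:
        self.edits[i] = text

    def rename_speaker(self, old: str, new: str) -> None:
        # Renaming applies to every segment carrying the old label.
        self.labels[old] = new

    def current(self) -> list[tuple[str, str]]:
        """The transcript as displayed: (speaker, text) with edits applied."""
        return [
            (self.labels.get(spk, spk), self.edits.get(i, txt))
            for i, (spk, txt) in enumerate(zip(self.speakers, self.machine))
        ]

    def restore(self) -> None:
        """Discard all edits and renames, returning to the machine transcript."""
        self.edits.clear()
        self.labels.clear()
```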


Downloading transcripts

From the item's menu:

  • Download as SRT — standard subtitle format
  • Download as VTT — WebVTT for HTML5
  • Download as TXT — plain text, no timestamps
  • Download as JSON — full structure with timestamps, confidence, speaker labels

Transcripts are also included in every accession export — see Export formats.


Two-pass refinement

On Professional plans and above, longer audio is transcribed in two passes:

  1. The first pass captures the overall content and produces a draft transcript with rough boundaries.
  2. The second pass refines the draft using context from the first, improving proper nouns, speaker-turn boundaries and overall coherence.

This happens automatically; you don't need to do anything beyond uploading.
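
The general two-pass pattern can be illustrated with a toy example. This is not the platform's method: the recogniser interface is hypothetical, and "context" here is simply the set of capitalised words that recur across the draft, used as stand-ins for proper nouns. The point is the control flow: the second pass sees information from the whole file that no single chunk had during the first pass.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable

# Hypothetical interface: a recogniser takes an audio chunk plus a set of
# hint terms and returns text. The platform's models are not documented here.
Recogniser = Callable[[bytes, frozenset[str]], str]


def capitalised_terms(texts: list[str], min_count: int = 2) -> frozenset[str]:
    """Collect capitalised words that recur across the draft (likely proper nouns)."""
    counts = Counter(
        word
        for text in texts
        # Skip each chunk's first word, which is often capitalised anyway.
        for word in (raw.strip('.,;:!?"') for raw in text.split()[1:])
        if word[:1].isupper()
    )
    return frozenset(w for w, c in counts.items() if c >= min_count)


def two_pass(chunks: list[bytes], recognise: Recogniser) -> list[str]:
    # Pass 1: transcribe each chunk with no context, producing a rough draft.
    draft = [recognise(chunk, frozenset()) for chunk in chunks]
    # Build whole-file context from the draft.
    hints = capitalised_terms(draft)
    # Pass 2: re-transcribe every chunk, now informed by the whole-file hints.
    return [recognise(chunk, hints) for chunk in chunks]
```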


Limits

  • Maximum duration per file: in practice, the file-size cap is usually the binding constraint. A 500 MB MP3 holds several hours of audio; a 2 GB video comfortably holds an hour at HD.
  • Languages: English is the strongest. The platform auto-detects the language; non-English transcription is supported but accuracy varies by language. Set your preferred output language in Profile → AI & Defaults.
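
To estimate how much recording fits in a given file size, divide the size in bits by the bitrate. The sketch below does that arithmetic for the sizes quoted above; the bitrates (128 and 320 kbps for MP3, 4 Mbps for HD video) are assumed typical values, decimal megabytes and gigabytes are assumed, and container overhead is ignored.

```python
def max_duration_hours(size_bytes: float, bitrate_kbps: float) -> float:
    """Approximate playback time that fits in a file of the given size.

    Assumes a constant bitrate and ignores container overhead.
    """
    return size_bytes * 8 / (bitrate_kbps * 1000) / 3600


MB, GB = 10**6, 10**9  # decimal units (assumption)

for label, size, kbps in [
    ("500 MB MP3 @ 128 kbps", 500 * MB, 128),
    ("500 MB MP3 @ 320 kbps", 500 * MB, 320),
    ("2 GB video @ 4 Mbps", 2 * GB, 4000),
]:
    print(f"{label}: ~{max_duration_hours(size, kbps):.1f} h")
```

This prints roughly 8.7 h, 3.5 h and 1.1 h respectively, consistent with "several hours" of MP3 and "an hour at HD". Higher video bitrates shorten the fit proportionally.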

Bypassing transcription

If you already have a transcript (SRT, VTT, or TXT), upload it alongside the media. The platform uses your transcript instead of generating one. Useful for:

  • Files you've had professionally transcribed
  • Older items where you have a typed transcript already
  • Languages for which you have a better transcription tool
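
Before uploading your own SRT file, it can help to check that it is well formed: numbered cues, valid timing lines, end times not earlier than start times. The checker below is a minimal sketch; parse_srt and its strictness are assumptions, and the platform's own importer may accept more or fewer variations (for example, non-standard timestamp separators).

```python
import re

# An SRT timing line, e.g. "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _secs(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text: str) -> list[tuple[float, float, str]]:
    """Parse SRT text into (start, end, text) cues; raise ValueError if malformed."""
    text = text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        if len(lines) < 3 or not lines[0].strip().isdigit():
            raise ValueError(f"malformed cue block: {block!r}")
        match = TIMING.match(lines[1].strip())
        if match is None:
            raise ValueError(f"bad timing line: {lines[1]!r}")
        g = match.groups()
        start, end = _secs(*g[:4]), _secs(*g[4:])
        if end < start:
            raise ValueError(f"cue {lines[0].strip()} ends before it starts")
        cues.append((start, end, "\n".join(lines[2:])))
    return cues
```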