How AI Processing Works

This page gives the end-to-end view of what happens between a file being uploaded and a catalogue record being ready to export. It is useful for understanding both the capabilities and the costs.

The pipeline

Every file goes through some subset of these stages, depending on its type:

Ingest: virus scan, MIME-type detection, and deduplication (a file uploaded twice is skipped).
Extraction — vision analysis for photographs, artefacts, and documents. OCR for documents is opt-in (see below). Transcription for audio and video is opt-in.
Classification — assign a category (Document, Photograph, Artefact, Audio, Video) and populate the metadata schema for that category.
Authority resolution — resolve extracted subjects, people, places, and organisations against published vocabularies. Each entity becomes a pill linked to its source-of-record URI.
Optional collection-level Analysis — when you ask for it on the Analysis tab, the platform produces a top-down narrative summary, themes, and a proposed arrangement.

You see the result in the Items, Analysis, and Authorities tabs on the accession page.

What each model does

The Archiver doesn't lock you into a single model — different stages use different providers:

OCR. Mistral OCR for documents, opt-in only. Documents are analysed with vision first; you then review the transcribability score and choose whether to run OCR.
Vision / object description. Gemini 3.5 Flash for photographs, artefacts, and document metadata extraction (when OCR is not run). This model version provides superior metadata extraction quality compared to earlier versions.
Handwritten content. Gemini Flash runs first with a confidence score; if confidence is low, the file is flagged so you can opt into a deeper pass.
Transcription. Speech-to-text for audio and video, with timed segments. Opt-in after reviewing the transcribability score. Two-pass refinement on Pro+.
Authority resolution. Live API calls to LCSH, FAST, Getty AAT, Getty TGN, VIAF, GeoNames, and Wikidata; semantic LLM-based disambiguation for ambiguous matches.
Research & Explore. Gemini Flash on Professional / Team; Gemini Pro on Enterprise.

OCR and transcription are opt-in. Documents and media items receive a transcribability score (1-10) during initial processing. Items scoring 7+ are marked "Excellent" for OCR/transcription; 4-6 are "Challenging" (expect some errors); below 4 are "Poor" but you can still run OCR/transcription if needed. Click "Run OCR" or "Transcribe" on any item to extract the full text.

Transcribability scoring

Every document and media file is assessed for OCR or transcription suitability during initial processing, before any text extraction runs. You'll see:

Score (1-10). Based on image quality, text density, print clarity (for documents) or audio quality and speech clarity (for media).
Band label. "Excellent" (7+), "Challenging" (4-6), or "Poor" (below 4).
Limiting factors. Why the score is what it is — e.g. "handwritten", "low resolution", "background noise".

After reviewing the score, click Run OCR (documents) or Transcribe (audio/video) to extract the text. Items with "Excellent" scores typically produce accurate, clean transcripts. "Challenging" items will run successfully, but expect errors in proper nouns, dates, or degraded passages. "Poor" items can still be processed if needed: the platform offers a "Run OCR anyway" option.
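The score-to-band mapping described above can be sketched in a few lines. This is only an illustration of the thresholds stated on this page; the function name `band_for` is hypothetical and is not part of any Archiver API.

```python
def band_for(score: int) -> str:
    """Map a 1-10 transcribability score to its band label.

    Thresholds follow this page: 7+ is "Excellent",
    4-6 is "Challenging", below 4 is "Poor".
    """
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score >= 7:
        return "Excellent"
    if score >= 4:
        return "Challenging"
    return "Poor"


if __name__ == "__main__":
    for s in (9, 5, 2):
        print(s, band_for(s))
```

Note that the band is advisory only: a "Poor" item can still be sent for OCR or transcription.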

OCR and transcription do not count against your item quota beyond the base processing cost — but they do take time (a 100-page document may take 30+ seconds; a 1-hour audio file may take several minutes).

Instant vs. batch

There are two processing modes:

Instant. Each item is processed as it's uploaded. You see results within seconds to minutes of each file landing. The Community plan is capped at 2 instant accessions per month and falls back to batch beyond that.
Batch. Items are queued and processed in bulk. Higher latency, but cheaper at scale, and the Archiver gives you a 30% uplift on your monthly item quota when you choose batch.

You don't have to pick: the platform routes Community accounts to batch once they've used their instant runs, and Pro+ users have unlimited instant processing.
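As a rough sketch of the routing rule and the batch uplift described above, assuming a plain string plan name and a per-month counter of instant accessions already used. The function names, the treatment of every non-Community plan as uncapped, the rounding down of the uplift, and the base quota of 1,000 in the example are all assumptions made for illustration, not the platform's actual implementation.

```python
COMMUNITY_INSTANT_CAP = 2   # instant accessions per month on Community (from this page)
BATCH_UPLIFT_PERCENT = 30   # quota uplift when batch is chosen (from this page)


def processing_mode(plan: str, instant_used_this_month: int) -> str:
    """Return "instant" or "batch" for the next accession.

    Assumption: only the Community plan is capped; any other
    plan name is treated as having unlimited instant runs.
    """
    if plan.lower() == "community" and instant_used_this_month >= COMMUNITY_INSTANT_CAP:
        return "batch"
    return "instant"


def effective_quota(base_quota: int, mode: str) -> int:
    """Apply the 30% batch uplift to a monthly item quota.

    Assumption: the result is rounded down to a whole item.
    """
    if mode == "batch":
        return base_quota * (100 + BATCH_UPLIFT_PERCENT) // 100
    return base_quota


if __name__ == "__main__":
    print(processing_mode("Community", 1))   # second instant run is still allowed
    print(processing_mode("Community", 2))   # cap reached, falls back to batch
    print(effective_quota(1000, "batch"))    # hypothetical base quota of 1000 items
```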

Item costs

How many items a file counts as against your monthly quota:

File type	Item cost
Document	1
Image / photograph	1
Audio	1
Video	3

A video counts as 3 items because transcription, frame extraction, and content analysis each consume independent capacity.
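To estimate how much of your quota an upload will consume, you can total the costs from the table above. The snippet below is a minimal sketch; the type keys (`"document"`, `"image"`, `"audio"`, `"video"`) and the function name `quota_cost` are illustrative labels, not identifiers used by the platform.

```python
from collections import Counter

# Item cost per file type, taken from the table on this page.
ITEM_COST = {"document": 1, "image": 1, "audio": 1, "video": 3}


def quota_cost(file_types):
    """Total item cost for an iterable of file-type labels.

    Raises KeyError for a label that is not in the cost table.
    """
    counts = Counter(file_types)
    return sum(ITEM_COST[kind] * n for kind, n in counts.items())


if __name__ == "__main__":
    upload = ["document"] * 40 + ["image"] * 10 + ["audio"] * 2 + ["video"] * 3
    # 40*1 + 10*1 + 2*1 + 3*3 = 61
    print(quota_cost(upload))
```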

Confidence and flagging

For every metadata value the AI assigns, it emits a self-assessed confidence score (high / medium / low). Low-confidence values mark the item as Flagged in the review screen. The model also flags items where it bailed out early: unreadable scans, audio with too much noise, or images that don't seem to contain the subject implied by the filename.

Filter to Flagged in the review toolbar to deal with the awkward cases first.
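The flagging rule can be pictured as follows: an item is flagged if any of its metadata values has low confidence, or if processing bailed out early. This is a sketch only; the `Item` class, its field names, and the sample records are hypothetical and do not reflect the platform's data model.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    # Metadata field name -> self-assessed confidence ("high", "medium", "low").
    confidences: dict[str, str]
    # True when the model gave up early (e.g. unreadable scan, noisy audio).
    bailed_out: bool = False


def is_flagged(item: Item) -> bool:
    """An item is flagged on any low-confidence value or an early bail-out."""
    return item.bailed_out or any(c == "low" for c in item.confidences.values())


if __name__ == "__main__":
    items = [
        Item("letter.pdf", {"title": "high", "date": "medium"}),
        Item("portrait.jpg", {"title": "high", "subject": "low"}),
        Item("tape01.wav", {}, bailed_out=True),
    ]
    # Mirrors filtering to Flagged in the review toolbar.
    print([i.name for i in items if is_flagged(i)])
```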

What's not in scope

The Archiver does not retrain its models on your data.
No customer content leaves The Archiver and its model providers for any other purpose.
We do not publish your files or metadata anywhere — the Authorities page on the marketing site shows examples from public collections, not customer data.

Want more control?

Profile → AI & Defaults lets you set an Output language, Institutional context, Writing style, and optionally Reparative description, all of which are injected into every AI prompt. See Your profile.
Settings → Data Model controls which fields the AI populates per category and which vocabularies it's allowed to resolve against. See Data Model.
Settings → Exports → Mappings controls how AI-populated fields appear in each export format.
