AI extraction built around how CMS content actually works

Pith isn't a generic PDF reader. It's an extraction pipeline designed to produce CMS-native structured content.

Features

Extraction templates

Tell the AI exactly what to find

Define a template with field names, types, and plain-English instructions for the AI. Text, numbers, dates, booleans, arrays — Pith maps each field to your CMS schema before a single page is processed.

Templates are reusable across documents. Set one up for administrative orders, then run every new filing through it automatically.
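As a rough illustration, a template for administrative orders might look something like the sketch below. Pith's actual template format isn't shown on this page, so the `ExtractionTemplate` and `TemplateField` shapes, the field names, and the `validateTemplate` helper are all hypothetical.

```typescript
// Hypothetical shapes for illustration only; not Pith's documented format.
type FieldType = "text" | "number" | "date" | "boolean" | "array";

interface TemplateField {
  name: string;         // field name as it appears in the CMS schema
  type: FieldType;      // one of the field types listed above
  instructions: string; // plain-English guidance for the AI
}

interface ExtractionTemplate {
  name: string;
  fields: TemplateField[];
}

const administrativeOrder: ExtractionTemplate = {
  name: "administrative-order",
  fields: [
    { name: "title", type: "text", instructions: "The order's full title as printed on the first page." },
    { name: "pageCount", type: "number", instructions: "Total number of pages in the order." },
    { name: "effectiveDate", type: "date", instructions: "The date the order takes effect, not the signing date." },
    { name: "isAmendment", type: "boolean", instructions: "True if the order amends an earlier order." },
    { name: "citedStatutes", type: "array", instructions: "Every statute cited in the body, one entry each." },
  ],
};

// Returns a list of problems; an empty list means the template looks usable.
function validateTemplate(template: ExtractionTemplate): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const field of template.fields) {
    if (seen.has(field.name)) {
      problems.push(`duplicate field name: ${field.name}`);
    }
    seen.add(field.name);
    if (field.instructions.trim() === "") {
      problems.push(`missing instructions: ${field.name}`);
    }
  }
  return problems;
}
```

Checking a template once before any documents run through it keeps a typo in a field name from surfacing hundreds of filings later.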

Start building templates

Confidence scoring

Know what the AI is uncertain about — before it ships

Every extracted field comes with a confidence score from 0 to 100. High-confidence fields can be auto-approved. Low-confidence fields get flagged for human review.

Color-coded indicators make the review interface fast to scan. Editors only touch what actually needs attention.
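The routing described above can be sketched in a few lines. This is a minimal sketch, not Pith's implementation: the `ExtractedField` shape, the `routeField` and `reviewQueue` helpers, and the cutoff of 90 are assumptions chosen for illustration.

```typescript
// Hypothetical types and threshold; the real cutoff is not stated on this page.
interface ExtractedField {
  name: string;
  value: unknown;
  confidence: number; // 0 to 100, as described above
}

type ReviewStatus = "auto-approved" | "needs-review";

const AUTO_APPROVE_THRESHOLD = 90; // assumed value

// Decide whether a single field can skip human review.
function routeField(
  field: ExtractedField,
  threshold: number = AUTO_APPROVE_THRESHOLD,
): ReviewStatus {
  if (field.confidence < 0 || field.confidence > 100) {
    throw new RangeError(`confidence out of range for ${field.name}: ${field.confidence}`);
  }
  return field.confidence >= threshold ? "auto-approved" : "needs-review";
}

// Collect only the fields an editor has to look at.
function reviewQueue(
  fields: ExtractedField[],
  threshold: number = AUTO_APPROVE_THRESHOLD,
): ExtractedField[] {
  return fields.filter((field) => routeField(field, threshold) === "needs-review");
}
```

Keeping the threshold as a parameter means a stricter cutoff can be used for sensitive fields, such as dates, without changing the routing logic.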

CMS connectors

Content arrives as native CMS data — not HTML

When Pith pushes to Sanity, body content becomes proper Portable Text blocks. Images become references. Related documents resolve to typed references in your schema.

No post-processing, no copy-paste, no HTML cleanup. Your CMS gets the structured data it was designed to hold.
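To make "native CMS data" concrete, here is a minimal sketch of what plain paragraphs look like once they are expressed as Portable Text blocks, plus a typed reference. The block, span, and reference shapes follow Sanity's published conventions (`_type`, `_key`, `children`, `_ref`); the `toPortableText` and `toReference` helpers and the index-based keys are hypothetical and are not Pith's actual conversion code.

```typescript
interface PortableTextSpan {
  _type: "span";
  _key: string;
  text: string;
  marks: string[];
}

interface PortableTextBlock {
  _type: "block";
  _key: string;
  style: string;
  markDefs: unknown[];
  children: PortableTextSpan[];
}

interface SanityReference {
  _type: "reference";
  _ref: string; // ID of the referenced document
}

// Turn plain paragraphs into one "normal" block each, skipping empty ones.
// Keys are derived from the position for readability; a real pipeline would
// more likely use random keys.
function toPortableText(paragraphs: string[]): PortableTextBlock[] {
  return paragraphs
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)
    .map((text, index): PortableTextBlock => ({
      _type: "block",
      _key: `b${index}`,
      style: "normal",
      markDefs: [],
      children: [{ _type: "span", _key: `b${index}s0`, text, marks: [] }],
    }));
}

// Point at another document by ID instead of embedding a link as text.
function toReference(documentId: string): SanityReference {
  return { _type: "reference", _ref: documentId };
}
```

Because each paragraph is a typed block rather than an HTML string, the CMS can validate, query, and render it without a cleanup step.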

Reference resolution

Cross-document relationships — handled automatically

Pith detects when a document references another document in your CMS and resolves it to a proper typed reference. Internal links, related content, parent-child relationships — all wired up at push time.

Batch processing

Hundreds of documents. Same pipeline.

Upload a folder of PDFs and let Pith work through them. Real-time progress tracking shows page counts, extraction status, and review queue depth as each document finishes.

Batch review lets you approve low-variance documents in bulk and focus manual time on exceptions.

Slug generation

Clean URLs configured for your CMS

Define slug rules with prefix support, multi-field composition, and custom separators. Pith generates clean, consistent URLs at push time — no manual slug entry, no duplicates.
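A slug rule of this kind could be sketched as follows. The `SlugRule` shape, the `/` placed between prefix and slug body, the ASCII-only cleanup, and the numeric suffix used to avoid duplicates are all assumptions for illustration, not Pith's documented behavior.

```typescript
// Hypothetical rule shape; names are illustrative.
interface SlugRule {
  prefix?: string;   // optional path prefix, e.g. "orders"
  fields: string[];  // document fields to compose, in order
  separator: string; // custom separator, e.g. "-"
}

// Lowercase a value and join its alphanumeric runs with the separator.
// Non-ASCII characters are treated as word breaks in this sketch.
function slugifyPart(value: string, separator: string): string {
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim()
    .split(" ")
    .filter((word) => word.length > 0)
    .join(separator);
}

// Build a slug from several fields and make it unique within `taken`.
function buildSlug(
  rule: SlugRule,
  doc: Record<string, string>,
  taken: Set<string>,
): string {
  const parts = rule.fields
    .map((field) => slugifyPart(doc[field] ?? "", rule.separator))
    .filter((part) => part.length > 0);
  const body = parts.join(rule.separator);
  const base = rule.prefix ? `${rule.prefix}/${body}` : body;

  // Assumed strategy: append 2, 3, ... until the slug is unused.
  let slug = base;
  for (let n = 2; taken.has(slug); n++) {
    slug = `${base}${rule.separator}${n}`;
  }
  taken.add(slug);
  return slug;
}
```

Passing the set of existing slugs in explicitly keeps the function pure apart from recording the new slug, which makes the uniqueness rule easy to test.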

More under the hood

PDF & Word support

Upload PDFs and .docx files. More formats on the roadmap.

Source citations

See which part of the source document each extracted value came from.

Inline editing

Edit any field in the review interface before approving.

Re-extract

Update your template and re-run extraction on any document without re-uploading.

Team review

Multiple reviewers can work through the queue simultaneously.

Usage tracking

See pages used this month, your limit, and a breakdown by project.