Converting Court Orders to Structured Web Content

The First Judicial Circuit of Florida had a problem that many courts share: thousands of administrative orders existed as PDF files, accessible only by knowing the exact file name or navigating a legacy document management system. The orders weren't indexed, weren't searchable, and weren't accessible to people using screen readers.

The requirement was to publish them as structured web content — proper HTML with semantic markup, searchable fields, and consistent formatting. The challenge was volume: thousands of existing orders, plus new orders issued regularly.

What information lives in an administrative order

Court administrative orders typically contain a consistent set of fields, even when the formatting varies between order types: the case number (or order number), effective date, judge or judicial officer, order type or category, title, the body of the order, and sometimes references to prior orders.

The formatting is inconsistent across document templates and across time — courts update their templates, different divisions use different formats, and older orders use older conventions. But the underlying data is consistent enough to extract reliably.
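Because the underlying data is consistent, each order can be modeled as one record type no matter which template produced the PDF. The sketch below is a minimal Python dataclass. The field names (`order_number`, `effective_date`, and so on) are illustrative and are not Pith's actual schema, and the sample values are invented.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AdministrativeOrder:
    """One extracted administrative order (illustrative field names)."""

    order_number: str        # case number or order number
    effective_date: date
    judicial_officer: str    # judge or other judicial officer
    order_type: str          # one of the court's own categories
    title: str
    body: list               # rich-text body, e.g. Portable Text blocks
    referenced_orders: list[str] = field(default_factory=list)  # often empty


# Hypothetical sample record; no real order is being described.
order = AdministrativeOrder(
    order_number="2021-07",
    effective_date=date(2021, 3, 5),
    judicial_officer="Chief Judge",
    order_type="Court Operations",
    title="Example order",
    body=[],
)
```

Keeping `referenced_orders` optional with an empty default reflects that only some orders cite earlier ones.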

Building the extraction template

The extraction template for administrative orders defines each field with specific instructions that account for formatting variation. The effectiveDate field, for example, instructs the AI to look for patterns like "Ordered this [day] of [month], [year]", "Effective [date]", and date strings near the signature block.
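One way to picture such a template is as a list of field specifications. Each one pairs a type with natural-language instructions and the phrasings the model should look for. This is a minimal sketch: `FieldSpec`, `render_prompt`, and the YYYY-MM-DD output format are hypothetical and do not describe Pith's real template format.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """One field in an extraction template (hypothetical schema)."""

    name: str
    field_type: str                                     # e.g. "date", "enum", "richText"
    instructions: str                                   # guidance for the model
    patterns: list[str] = field(default_factory=list)  # phrasings to look for


effective_date = FieldSpec(
    name="effectiveDate",
    field_type="date",
    instructions=(
        "Return the date the order takes effect as YYYY-MM-DD. "
        "Check the phrasings below first, then any date string "
        "near the signature block."
    ),
    patterns=[
        "Ordered this [day] of [month], [year]",
        "Effective [date]",
    ],
)


def render_prompt(spec: FieldSpec) -> str:
    """Fold a field's instructions and example patterns into one prompt fragment."""
    lines = [f"{spec.name} ({spec.field_type}): {spec.instructions}"]
    lines += [f'  - look for: "{p}"' for p in spec.patterns]
    return "\n".join(lines)


print(render_prompt(effective_date))
```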

For the order type field, the template provides the court's own taxonomy as a list — so the AI classifies each order against the court's actual categories, not its own interpretation.
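Here is a sketch of the same idea for the order type field. It uses a hypothetical four-item category list in place of the court's real taxonomy. `enum_instructions` builds the constraint for the model. `validate_order_type` is an assumed safety check that the source does not describe: it maps the answer back to a canonical category, or returns `None` so the value can be routed to a reviewer.

```python
from typing import Optional

# Hypothetical categories standing in for the court's real taxonomy.
ORDER_TYPES = [
    "Court Operations",
    "Case Management",
    "Judicial Assignments",
    "Fees and Costs",
]


def enum_instructions(categories: list) -> str:
    """Instruction text that restricts the model to the supplied categories."""
    options = "\n".join(f"- {c}" for c in categories)
    return (
        "Classify the order as exactly one of the following categories. "
        "Do not invent a new category.\n" + options
    )


def validate_order_type(value: str, categories: list) -> Optional[str]:
    """Return the matching canonical category, or None to flag for review."""
    wanted = value.strip().casefold()
    for category in categories:
        if category.casefold() == wanted:
            return category
    return None


print(validate_order_type(" case management ", ORDER_TYPES))  # Case Management
print(validate_order_type("Family Law", ORDER_TYPES))         # None
```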

The body field is configured as rich text, so the AI preserves paragraph structure, numbered lists, and other formatting from the original document as Portable Text blocks.
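Portable Text represents rich text as an array of block objects, each holding spans of text. List items are ordinary blocks with `listItem` and `level` set. The function below is an illustrative heuristic that shows this target shape for plain paragraphs and numbered lists. In the pipeline described here, the AI produces the blocks itself, so `to_portable_text` and its numbering regex are stand-ins. The sample order text is invented.

```python
import re

# Matches "1. text" or "1) text" at the start of a paragraph.
NUMBERED = re.compile(r"^\s*(\d+)[.)]\s+(.*)$")


def to_portable_text(paragraphs: list) -> list:
    """Convert plain paragraphs into Portable Text blocks.

    Numbered paragraphs become list items; the renderer supplies the numbers.
    Everything else becomes a normal paragraph block.
    """
    blocks = []
    for i, para in enumerate(paragraphs):
        block = {"_type": "block", "_key": f"b{i}", "style": "normal", "markDefs": []}
        match = NUMBERED.match(para)
        if match:
            block["listItem"] = "number"
            block["level"] = 1
            text = match.group(2)
        else:
            text = para
        block["children"] = [
            {"_type": "span", "_key": f"b{i}s0", "text": text, "marks": []}
        ]
        blocks.append(block)
    return blocks


blocks = to_portable_text([
    "IT IS HEREBY ORDERED:",
    "1. All filings shall be submitted electronically.",
    "2. This order supersedes prior orders on the same subject.",
])
```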

The review step in context

For court orders, the review step serves two purposes: accuracy verification and classification spot-checking. High-confidence field values — case numbers, dates in standard formats — rarely need human attention. Order type classification and ambiguous date formats are where reviewers focus.
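The split between values that can pass straight through and values that need a reviewer can be sketched as a simple triage rule. The 0 to 1 confidence scale, the 0.9 cutoff, and the choice to always route `orderType` to a reviewer are assumptions made for illustration. The source says only that reviewers focus on classification and ambiguous dates.

```python
from dataclasses import dataclass


@dataclass
class ExtractedValue:
    name: str
    value: str
    confidence: float  # assumed 0.0-1.0 score from the extraction step


# Hypothetical review policy.
ALWAYS_REVIEW = {"orderType"}   # classification is always spot-checked
CONFIDENCE_THRESHOLD = 0.9      # below this, a person looks at the value


def needs_review(v: ExtractedValue) -> bool:
    return v.name in ALWAYS_REVIEW or v.confidence < CONFIDENCE_THRESHOLD


def triage(values: list) -> tuple:
    """Split field names into (auto-accepted, flagged for a reviewer)."""
    accepted, flagged = [], []
    for v in values:
        (flagged if needs_review(v) else accepted).append(v.name)
    return accepted, flagged


accepted, flagged = triage([
    ExtractedValue("orderNumber", "2021-07", 0.98),
    ExtractedValue("effectiveDate", "2021-03-05", 0.62),  # ambiguous source format
    ExtractedValue("orderType", "Court Operations", 0.97),
])
print(accepted, flagged)  # ['orderNumber'] ['effectiveDate', 'orderType']
```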

With a well-tuned template, reviewers working through a batch of court orders typically spend 2–3 minutes per document. That's a fraction of the time required for manual entry.

Handling the backlog vs. ongoing volume

The initial migration of thousands of historical orders runs as a batch. New orders run through the same pipeline as they are issued, either individually or in small batches.
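Because the backlog and the day-to-day flow share one pipeline, the only difference between them is how many documents go in at once. This is a minimal sketch. It assumes a hypothetical per-document function `process_one` that covers extraction and scoring, and an arbitrary default batch size of 50. The `on_batch` callback stands in for whatever checkpointing or progress reporting a real migration would use.

```python
from typing import Callable, Iterator, Optional


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_pipeline(
    pdfs: list,
    process_one: Callable[[str], dict],
    batch_size: int = 50,
    on_batch: Optional[Callable[[int, int], None]] = None,
) -> list:
    """Run the same per-document step over a backlog or a few new orders."""
    results = []
    for n, batch in enumerate(chunked(pdfs, batch_size), start=1):
        batch_results = [process_one(path) for path in batch]
        results.extend(batch_results)
        if on_batch is not None:
            on_batch(n, len(batch_results))  # e.g. checkpoint progress
    return results


def fake_extract(path: str) -> dict:
    """Placeholder for extraction and scoring."""
    return {"file": path, "status": "extracted"}


backlog = [f"order_{i:04d}.pdf" for i in range(120)]
progress = []
historical = run_pipeline(backlog, fake_extract, on_batch=lambda n, k: progress.append((n, k)))
new = run_pipeline(["new_order.pdf"], fake_extract)
print(progress)  # [(1, 50), (2, 50), (3, 20)]
```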

Because the extraction template is reusable, the ongoing workflow is straightforward: the clerk uploads the signed PDF, Pith extracts the fields and assigns confidence scores, a reviewer approves the result, and the order publishes to the court's CMS automatically.
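The ongoing workflow can be treated as a small state machine. This makes the key guarantee explicit: an order cannot reach the CMS without passing review. The sketch assumes a strictly linear flow with hypothetical status names. It omits rejections and re-extraction, which the source does not describe.

```python
from enum import Enum


class Status(Enum):
    UPLOADED = "uploaded"    # clerk has uploaded the signed PDF
    EXTRACTED = "extracted"  # fields extracted and scored
    APPROVED = "approved"    # reviewer has signed off
    PUBLISHED = "published"  # order is live in the CMS


# Hypothetical linear flow: each status may only advance to the next one.
NEXT = {
    Status.UPLOADED: Status.EXTRACTED,
    Status.EXTRACTED: Status.APPROVED,
    Status.APPROVED: Status.PUBLISHED,
}


def advance(current: Status, target: Status) -> Status:
    """Move to `target` only if it is the next step; otherwise refuse."""
    if NEXT.get(current) is not target:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


status = Status.UPLOADED
for step in (Status.EXTRACTED, Status.APPROVED, Status.PUBLISHED):
    status = advance(status, step)
```

Trying to jump from `UPLOADED` straight to `PUBLISHED` raises a `ValueError`, so skipping the reviewer is a programming error rather than a silent shortcut.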