February 4, 2026 · 8 min read

How to Extract All Dates, Names, and Amounts from Legal Documents

Turn stacks of complaints, motions, settlement agreements, and court orders into structured entity tables. Extract every date, person, organization, and monetary amount using an AI agent on your local machine.

The entity extraction problem in legal work

Legal documents are dense with factual details -- dates when things happened, names of people and organizations involved, and monetary amounts at stake. A single 30-page settlement agreement might reference 40 dates, 25 named entities, and 15 distinct monetary figures. A litigation matter with 20 documents easily contains hundreds of each.

These details matter. A paralegal building a case chronology needs every date in order. An attorney drafting a motion needs every party referenced accurately. A finance team calculating reserves needs every monetary amount from every settlement and judgment.

The traditional approach is to read each document with a highlighter and a spreadsheet, marking every date, name, and amount as you go. This is slow and error-prone. Dates in dense paragraphs get missed. Names that appear in different forms -- "Johnson & Smith LLC," "J&S," "the Firm" -- get recorded inconsistently. Dollar amounts in footnotes get overlooked.

With docrew, you point the agent at your legal documents and receive structured tables of every date, name, and amount -- organized by document, cross-referenced across the set, and ready for use in chronologies, case management, and financial analysis.

What counts as an entity

Legal documents contain three primary entity types, each with its own complexities.

Dates. Explicit dates ("January 15, 2024"), partial dates ("in March 2024"), relative dates ("within 30 days of the Effective Date"), date ranges, fiscal periods ("Q3 2024"), and conditional dates ("upon the occurrence of a Triggering Event").

Names. Individuals (parties, witnesses, attorneys, judges), organizations (corporate entities, agencies, courts), and roles functioning as named references ("the Plaintiff," "the Indemnifying Party"). Legal documents often introduce a formal name then use a shortened form -- "Meridian Technologies, Inc. (hereinafter 'Meridian')" -- and the extraction needs to track both.

Amounts. Specific figures ("$2,500,000"), calculated amounts ("15% of gross revenue"), ranges ("between $500,000 and $750,000"), per-unit rates ("$350 per hour"), aggregate caps ("not to exceed $1,000,000"), and conditional amounts with installment structures. Some appear in foreign currencies or are stated in both words and numerals.

A thorough extraction captures all of these with context -- not just "January 15, 2024" but "January 15, 2024 -- date of the alleged breach per Paragraph 12 of the Complaint."
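One way to think about "with context" is as a record that keeps each value together with where it came from and what it means. The sketch below is a minimal illustration. The `Entity` class and its field names are hypothetical and are not docrew's output format. They mirror what a reviewer needs: value, type, source document, location, and context.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical record layout for one extracted entity (illustrative only).
EntityType = Literal["date", "name", "amount"]


@dataclass(frozen=True)
class Entity:
    value: str            # normalized value, e.g. "2024-01-15"
    entity_type: EntityType
    original_text: str    # text exactly as it appears in the source
    document: str         # source document name
    location: str         # page, paragraph, or section reference
    context: str          # what the entity refers to


breach_date = Entity(
    value="2024-01-15",
    entity_type="date",
    original_text="January 15, 2024",
    document="Complaint",
    location="Paragraph 12",
    context="Date of the alleged breach",
)

# One pipe-delimited line per entity, readable without reopening the source.
print(f"{breach_date.value} | '{breach_date.original_text}' | "
      f"{breach_date.document}, {breach_date.location} | {breach_date.context}")
```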

Set up the extraction task

Collect your legal documents into a folder. A typical litigation set might include:

The original complaint and answer/counterclaim
Motions (to dismiss, for summary judgment)
Deposition transcripts
A settlement agreement
A court order approving the settlement
Correspondence between counsel

Open docrew and start a conversation with a precise prompt: "I have legal documents for the Meridian Technologies litigation in the Meridian Case folder. Read every document and extract all dates, person and organization names, and monetary amounts. For each entity, record: the value, the type, which document it appears in, the page or section, and the surrounding context. Produce three output files: a dates table sorted chronologically, a names table sorted alphabetically with variant forms grouped, and an amounts table sorted by value."

The more precise the instructions, the more useful the results.

How the agent processes each document

The agent works through the document set methodically, performing a full read of each document and identifying every instance of the three entity types.

Date extraction. The agent identifies explicit dates in any format ("January 15, 2024," "1/15/24," "the fifteenth day of January") and normalizes them. For relative dates, it calculates the actual date when possible -- "within 30 days of the Effective Date" becomes "March 2, 2024" if the Effective Date (February 1, 2024) is defined elsewhere. It captures date ranges, recurring dates, and conditional dates.
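Normalization means mapping every written form of a date to one canonical value. The following is a minimal sketch using Python's standard `datetime.strptime`. It is not docrew's implementation. The format list and the deliberately partial ordinal-word map are assumptions, and the spelled-out form is only handled when a year is present.

```python
import re
from datetime import date, datetime
from typing import Optional

# Formats handled by strptime. %m and %d accept values without leading
# zeros, so "1/15/24" parses; %y maps "24" to 2024.
DATE_FORMATS = ["%B %d, %Y", "%m/%d/%Y", "%m/%d/%y", "%Y-%m-%d"]

# Deliberately partial ordinal map for "the fifteenth day of January, 2024";
# a real extractor would cover all 31 ordinals.
ORDINALS = {"first": 1, "second": 2, "third": 3, "tenth": 10, "fifteenth": 15, "twentieth": 20}
ORDINAL_DATE = re.compile(r"the (\w+) day of (\w+),? (\d{4})", re.IGNORECASE)


def normalize_date(text: str) -> Optional[date]:
    """Return the calendar date for an explicit date string, or None if unrecognized."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next format

    match = ORDINAL_DATE.fullmatch(cleaned)
    if match and match.group(1).lower() in ORDINALS:
        try:
            month = datetime.strptime(match.group(2), "%B").month
        except ValueError:
            return None  # the second word is not a month name
        return date(int(match.group(3)), month, ORDINALS[match.group(1).lower()])
    return None


for raw in ["January 15, 2024", "1/15/24", "the fifteenth day of January, 2024"]:
    print(raw, "->", normalize_date(raw))
```

All three inputs normalize to the same ISO date. Text that matches no pattern returns `None` rather than a guess.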

Name extraction. The agent identifies every person, organization, and role reference. It tracks the full formal name and all subsequent abbreviated forms, grouping them as the same entity. It distinguishes between party names, attorneys, judges, witnesses, and experts based on context, and captures organizational relationships.
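Much of the alias tracking described above starts from definition parentheticals such as "Meridian Technologies, Inc. (hereinafter 'Meridian')". The sketch below shows the idea with a regular expression. The pattern is a simplified assumption: it expects the formal name to be a run of capitalized words immediately before the parenthetical and only handles straight quotes. Real documents use curly quotes and many other phrasings.

```python
import re
from collections import defaultdict

# Formal name: capitalized tokens separated by spaces or commas
# (e.g. "Meridian Technologies, Inc."), followed by a parenthetical
# short form with an optional "hereinafter".
DEFINITION = re.compile(
    r"([A-Z][\w.&]*(?:[ ,]+[A-Z&][\w.&]*)*)"   # formal name
    r"\s*\((?:hereinafter\s+)?[\"']"           # opening of the parenthetical
    r"([^\"']+)[\"']\)"                        # the short form itself
)


def build_alias_map(text: str) -> dict[str, list[str]]:
    """Map each formal name to the short forms defined for it, in order of appearance."""
    aliases: dict[str, list[str]] = defaultdict(list)
    for match in DEFINITION.finditer(text):
        formal, short = match.group(1), match.group(2)
        if short not in aliases[formal]:
            aliases[formal].append(short)
    return dict(aliases)


text = ("This Agreement is entered into by Meridian Technologies, Inc. "
        "(hereinafter 'Meridian') and Robert Johnson (\"Plaintiff\").")
print(build_alias_map(text))
```

Later mentions of "Meridian" or "Plaintiff" can then be grouped under the formal name. Undefined shorthand such as "J&S" still needs contextual judgment.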

Amount extraction. The agent identifies every monetary reference -- specific amounts, formulas, ranges, rates, and caps. It notes the currency, what the amount represents, and any conditions. "$2,500,000 settlement payment, payable in three installments" is recorded with both the total and the installment structure.
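Recording "both the total and the installment structure" also enables a useful consistency check: the installments should add up to the total. Below is a minimal sketch, assuming US-dollar figures written with a `$` sign. It uses `Decimal` rather than floats so that sums stay exact. The installment schedule in the example is hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

# Simplified pattern for figures such as "$2,500,000" or "$1.2 million".
# Other currencies and amounts written only in words are out of scope here.
AMOUNT = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)(?:\s*(million|billion)\b)?", re.IGNORECASE)
SCALE = {None: Decimal(1), "million": Decimal(10) ** 6, "billion": Decimal(10) ** 9}


def parse_amounts(text: str) -> list[Decimal]:
    """Return every dollar figure in `text` as an exact Decimal."""
    values = []
    for match in AMOUNT.finditer(text):
        number = Decimal(match.group(1).replace(",", ""))  # also drops a trailing comma
        unit: Optional[str] = match.group(2).lower() if match.group(2) else None
        values.append(number * SCALE[unit])
    return values


def installments_match_total(total: Decimal, installments: list[Decimal]) -> bool:
    """True when the installment amounts add up exactly to the stated total."""
    return sum(installments, Decimal(0)) == total


total = parse_amounts("a settlement payment of $2,500,000, payable in three installments")[0]
# Hypothetical installment schedule, for illustration only.
schedule = parse_amounts("$1,000,000 on signing, $750,000 at 90 days, and $750,000 at 180 days")
print(total, schedule, installments_match_total(total, schedule))
```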

For each extracted entity, the agent records the source document, page or section, and enough context to understand the reference without reopening the original.

Handling ambiguity

Legal documents are full of ambiguous references. The agent handles these systematically.

Defined terms. "The Company" might appear 50 times in a settlement agreement. The agent traces it to the definition section and records the resolved name. If the term means different entities in different documents, it flags the difference.
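Flagging a defined term that means different things in different documents amounts to grouping definitions by term and looking for disagreement. Here is a minimal sketch. The input shape (document name to a map of term and resolved entity) is an assumption, and "Meridian Holdings LLC" is an invented entity used only to trigger the flag.

```python
from collections import defaultdict


def find_term_conflicts(definitions: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return defined terms that resolve to different entities in different documents.

    `definitions` maps document name -> {defined term: entity it resolves to}.
    """
    by_term: dict[str, dict[str, str]] = defaultdict(dict)
    for document, terms in definitions.items():
        for term, entity in terms.items():
            by_term[term][document] = entity
    # A term is a conflict when its documents disagree on what it means.
    return {term: docs for term, docs in by_term.items() if len(set(docs.values())) > 1}


# Hypothetical definitions; "Meridian Holdings LLC" is invented for illustration.
definitions = {
    "Settlement Agreement": {"the Company": "Meridian Technologies, Inc.",
                             "the Plaintiff": "Robert Johnson"},
    "Non-Compete Agreement": {"the Company": "Meridian Holdings LLC"},
    "Complaint": {"the Plaintiff": "Robert Johnson"},
}
for term, docs in find_term_conflicts(definitions).items():
    print(f"FLAG {term!r}: {docs}")
```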

Relative dates without anchors. "Within 30 days of filing" is clear if the filing date appears elsewhere. If not, the agent records the relative date as-is and notes the unresolved anchor.
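Both outcomes can be captured in a small function that returns either a resolved date or an explicit note about the missing anchor. The sketch below is hypothetical. It only recognizes the phrase "within N days of <anchor>", it looks anchors up in a dictionary keyed by lowercase names, and it counts calendar days. Agreements and procedural rules can define day counting differently (business days, for example), so treat the result as a starting point.

```python
import re
from datetime import date, timedelta
from typing import Optional

RELATIVE = re.compile(r"within (\d+) days of (?:the )?(.+)", re.IGNORECASE)


def resolve_relative(text: str, anchors: dict[str, date]) -> tuple[Optional[date], str]:
    """Resolve "within N days of X" against known anchor dates, or explain why not."""
    match = RELATIVE.fullmatch(text.strip())
    if not match:
        return None, "not a recognized relative date"
    days, anchor_name = int(match.group(1)), match.group(2).strip().lower()
    anchor = anchors.get(anchor_name)
    if anchor is None:
        # Keep the original phrase; record which anchor is missing.
        return None, f"unresolved anchor: {anchor_name}"
    return anchor + timedelta(days=days), f"{days} calendar days after {anchor_name}"


anchors = {"effective date": date(2024, 2, 1)}
print(resolve_relative("within 30 days of the Effective Date", anchors))
print(resolve_relative("within 30 days of filing", anchors))
```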

Ambiguous amounts. "Damages in excess of $75,000" is a jurisdictional allegation, not an actual figure. "Approximately $1.2 million" is an estimate. The agent distinguishes between stated amounts, estimates, thresholds, and calculated amounts, labeling each accordingly.
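A first-pass label can come from cue phrases around the figure. The sketch below is illustrative only. The cue lists are assumptions and far from complete. It returns "stated" when no cue matches, which is exactly the case a human reviewer should sample.

```python
import re

# Checked in order; the first matching rule wins. Cue phrases are illustrative.
RULES = [
    ("threshold", re.compile(r"\b(?:in excess of|not to exceed|at least|up to|more than|less than)\b", re.I)),
    ("estimated", re.compile(r"\b(?:approximately|about|roughly|estimated)\b", re.I)),
    ("calculated", re.compile(r"\d+(?:\.\d+)?%\s+of\b", re.I)),
    ("rate", re.compile(r"\bper\s+(?:hour|day|week|month|year|unit)\b", re.I)),
]


def classify_amount(phrase: str) -> str:
    """Label a monetary phrase as threshold, estimated, calculated, rate, or stated."""
    for label, pattern in RULES:
        if pattern.search(phrase):
            return label
    return "stated"


for phrase in ["damages in excess of $75,000", "approximately $1.2 million",
               "15% of gross revenue", "$350 per hour", "a settlement payment of $2,500,000"]:
    print(f"{classify_amount(phrase):<10} {phrase}")
```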

Same-surname individuals. If "Robert Johnson" is the plaintiff and "David Johnson" a witness, the agent tracks them separately. When "Mr. Johnson" appears without a first name, it uses context to determine which one -- and flags it as ambiguous if context is insufficient.
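The rule described above (resolve when only one person fits, otherwise flag) can be expressed directly. This is a minimal sketch with a hypothetical `Person` type. Its only tiebreaker is whether a role word such as "plaintiff" appears in the surrounding text, which is a weak heuristic standing in for real contextual reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    full_name: str
    role: str  # e.g. "plaintiff", "witness"


def resolve_surname(mention: str, people: list[Person], context: str = "") -> tuple[str, list[str]]:
    """Resolve a surname-only mention such as "Mr. Johnson".

    Returns ("resolved", [name]), ("ambiguous", [candidates]), or ("unmatched", []).
    """
    surname = mention.split()[-1]
    candidates = [p for p in people if p.full_name.split()[-1] == surname]
    if not candidates:
        return "unmatched", []
    if len(candidates) == 1:
        return "resolved", [candidates[0].full_name]
    # Heuristic tiebreaker (an assumption): a role word in the surrounding text.
    by_role = [p for p in candidates if p.role.lower() in context.lower()]
    if len(by_role) == 1:
        return "resolved", [by_role[0].full_name]
    return "ambiguous", [p.full_name for p in candidates]


people = [Person("Robert Johnson", "plaintiff"), Person("David Johnson", "witness")]
print(resolve_surname("Mr. Johnson", people))
print(resolve_surname("Mr. Johnson", people, "The plaintiff, Mr. Johnson, signed the agreement."))
```

Without context the mention stays flagged as ambiguous with both candidates listed, rather than being silently assigned to one of them.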

The agent does not guess. When an entity reference is genuinely ambiguous, the output says so explicitly.

Understanding the output

The agent produces three structured tables.

The dates table. Sorted chronologically: the normalized date, original text as it appears in the document, source document name, page or section, and what the date refers to. Relative and conditional dates that cannot be resolved to a calendar date appear at the end with dependency notes.

Sample entry: "2024-01-15 | 'January 15, 2024' | Complaint, Paragraph 12 | Date plaintiff alleges defendant breached the non-compete agreement."

The names table. Sorted alphabetically: canonical name, all variant forms across the document set, entity type (individual, organization, court), role in the matter (plaintiff, defendant, counsel, witness), and documents where the entity appears.

Sample entry: "Meridian Technologies, Inc. | Variants: 'Meridian,' 'the Company,' 'Defendant' | Organization | Defendant | Appears in: Complaint, Answer, Settlement Agreement, Court Order."

The amounts table. Sorted by value (largest first): normalized amount, original text, amount type (stated, estimated, threshold, rate), context, source document, and page or section.

Sample entry: "$2,500,000 | 'two million five hundred thousand dollars ($2,500,000)' | Stated | Total settlement payment | Settlement Agreement, Section 3.1."
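If you post-process these tables yourself, the three orderings are easy to reproduce. The sketch below assumes rows have already been extracted. The row types and the `render` helper are hypothetical and only approximate the sample entries above. Note the sort key that keeps unresolved dates at the end.

```python
from datetime import date
from decimal import Decimal
from typing import NamedTuple, Optional


class DateRow(NamedTuple):
    normalized: Optional[date]  # None when a relative/conditional date is unresolved
    original: str
    source: str
    meaning: str


class NameRow(NamedTuple):
    canonical: str
    variants: tuple[str, ...]
    kind: str   # individual, organization, court
    role: str   # plaintiff, defendant, counsel, witness


class AmountRow(NamedTuple):
    value: Decimal
    original: str
    kind: str   # stated, estimated, threshold, rate
    meaning: str
    source: str


def sort_dates(rows: list[DateRow]) -> list[DateRow]:
    """Chronological order; unresolved dates go last, keeping their input order."""
    return sorted(rows, key=lambda r: (r.normalized is None, r.normalized or date.min))


def sort_names(rows: list[NameRow]) -> list[NameRow]:
    """Alphabetical by canonical name, ignoring case."""
    return sorted(rows, key=lambda r: r.canonical.casefold())


def sort_amounts(rows: list[AmountRow]) -> list[AmountRow]:
    """Largest value first."""
    return sorted(rows, key=lambda r: r.value, reverse=True)


def render(row: tuple) -> str:
    """Pipe-delimited line in the style of the sample entries."""
    cells = []
    for cell in row:
        if cell is None:
            cells.append("(unresolved)")
        elif isinstance(cell, Decimal):
            cells.append(f"${cell:,}")
        elif isinstance(cell, tuple):
            cells.append(", ".join(cell))
        else:
            cells.append(str(cell))
    return " | ".join(cells)
```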

Cross-document entity index

Beyond per-document extraction, the agent produces a cross-document index showing where each entity appears across the entire set.

Timeline construction. The chronological dates table from all documents gives you a complete matter timeline -- the foundation for a case chronology that normally takes hours of manual compilation.

Party tracking. The names index shows every document where each party, attorney, or witness is mentioned. Need to know everywhere "Robert Johnson" appears across 14 documents? The index answers immediately.
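Under the hood, an index like this is an inverted map from each canonical entity to the documents that mention it. The following is a minimal sketch, assuming mentions have already been resolved to canonical names. The function name and input shape are hypothetical, and "Deposition of Robert Johnson" is an illustrative document name.

```python
from collections import defaultdict


def build_index(mentions: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each canonical entity to the sorted, de-duplicated documents mentioning it.

    `mentions` holds (canonical entity, document name) pairs, one per occurrence.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for entity, document in mentions:
        index[entity].add(document)
    return {entity: sorted(docs) for entity, docs in index.items()}


mentions = [
    ("Robert Johnson", "Complaint"),
    ("Robert Johnson", "Deposition of Robert Johnson"),
    ("Robert Johnson", "Complaint"),  # repeat mentions collapse to one entry
    ("Meridian Technologies, Inc.", "Answer"),
]
print(build_index(mentions)["Robert Johnson"])
```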

Financial summary. The amounts index consolidates every monetary figure -- damages claimed, settlement amounts, attorney fee rates, court-approved amounts -- all in one table for reserve calculation and reporting.

Consistency checking. If the complaint says the contract was executed January 10 but the settlement agreement says January 12, both dates appear in the index with sources, making the discrepancy visible.
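Surfacing such a discrepancy requires that the same event carry the same label in every document. That labeling is an assumption this sketch does not solve. Once labels line up, the check is a grouping step: any event with more than one distinct date gets reported along with its sources. The years in the example are hypothetical.

```python
from collections import defaultdict
from datetime import date


def find_date_discrepancies(facts: list[tuple[str, date, str]]) -> dict[str, dict[date, list[str]]]:
    """Return events that different sources date differently.

    `facts` holds (event label, date, source document) triples.
    """
    by_event: dict[str, dict[date, list[str]]] = defaultdict(dict)
    for event, when, source in facts:
        by_event[event].setdefault(when, []).append(source)
    return {event: dates for event, dates in by_event.items() if len(dates) > 1}


# Hypothetical facts; the years are assumed for illustration.
facts = [
    ("contract executed", date(2023, 1, 10), "Complaint"),
    ("contract executed", date(2023, 1, 12), "Settlement Agreement"),
    ("alleged breach", date(2024, 1, 15), "Complaint"),
]
for event, dates in find_date_discrepancies(facts).items():
    print(event, {d.isoformat(): sources for d, sources in dates.items()})
```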

Validation tips

Automated entity extraction is fast, but it is not infallible. Practical steps to validate:

Spot-check high-value amounts. Verify the three or four largest monetary figures against source documents. This takes two minutes and catches the errors that matter most.

Verify date calculations. For resolved relative dates, check the calendar math, including leap years. "Within 30 days of February 1, 2024" lands on March 2, not March 3, because February 2024 has 29 days.
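A date library is a quick way to check this kind of arithmetic. The sketch below uses only Python's standard `datetime` and `calendar` modules. It shows why the year matters: 30 days after February 1 falls on a different day in a leap year than in a common year.

```python
import calendar
from datetime import date, timedelta


def add_days(anchor: date, days: int) -> date:
    """Calendar-day offset from an anchor date."""
    return anchor + timedelta(days=days)


for year in (2024, 2025):
    start = date(year, 2, 1)
    print(year, "leap" if calendar.isleap(year) else "common", add_days(start, 30).isoformat())
# 2024 leap 2024-03-02
# 2025 common 2025-03-03
```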

Review ambiguity flags. These represent items most likely to need human judgment.

Check entity grouping. Verify that variant names are grouped correctly and that two different entities sharing a shortened name are not merged.

Compare against your case knowledge. If you know five parties are involved but the table shows four, a name was missed. Your existing knowledge is the best validation tool.
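If you keep a list of the parties you already know about, you can turn that comparison into a set difference. The following is a minimal sketch. It ignores case but does not resolve variant forms, so compare against the canonical names in the names table. The function name is hypothetical.

```python
def missing_entities(expected: set[str], extracted: set[str]) -> set[str]:
    """Names you expect in the matter that do not appear in the extracted table.

    Comparison ignores case; variant forms are not resolved here.
    """
    found = {name.casefold() for name in extracted}
    return {name for name in expected if name.casefold() not in found}


expected_parties = {"Robert Johnson", "Meridian Technologies, Inc."}
extracted_names = {"robert johnson", "David Johnson"}
print(missing_entities(expected_parties, extracted_names))
```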

From raw extraction to case work product

The entity tables become case work product with minimal effort.

Case chronology. Filter the dates table down to the procedural dates and you have a chronology skeleton ready for narrative descriptions and trial preparation.

Party and witness list. Filter the names table by role for a complete list useful in conflict checks, deposition planning, and trial prep.

Damages analysis. Filter the amounts table for every monetary figure from initial demand to final settlement, supporting reserve analysis and client reporting.

Discovery responses. "Produce all documents referencing payments to Meridian Technologies" becomes a lookup against the names and amounts tables rather than a manual review of every document.
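A lookup like this intersects two things: the documents that mention the entity (from the names index) and the documents that contain a payment-related amount (from the amounts table). The sketch below is hypothetical. The row type and keyword match are assumptions, and a plain keyword test is crude. Treat the result as a candidate list for attorney review, not a complete response.

```python
from decimal import Decimal
from typing import NamedTuple


class AmountRow(NamedTuple):
    value: Decimal
    context: str   # what the amount represents
    source: str    # document name


def payment_documents(entity: str, names_index: dict[str, list[str]],
                      amounts: list[AmountRow], keyword: str = "payment") -> list[str]:
    """Documents that mention `entity` and contain an amount whose context mentions `keyword`."""
    mentioning = set(names_index.get(entity, []))
    with_payments = {row.source for row in amounts if keyword in row.context.lower()}
    return sorted(mentioning & with_payments)


names_index = {"Meridian Technologies, Inc.": ["Answer", "Complaint", "Court Order", "Settlement Agreement"]}
amounts = [
    AmountRow(Decimal("2500000"), "Total settlement payment", "Settlement Agreement"),
    AmountRow(Decimal("75000"), "Jurisdictional threshold", "Complaint"),
]
print(payment_documents("Meridian Technologies, Inc.", names_index, amounts))
```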

The extraction takes minutes. The manual alternative takes hours or days. And because all processing happens locally with docrew, privileged legal documents never leave your machine -- they are not uploaded to a cloud service, not processed by a third-party API, not stored on someone else's server. For legal work where confidentiality is not optional, this matters.
