LegalLean

When a computer says "you owe tax", can you check its reasoning?

LegalLean makes legal logic machine-checkable. When a system decides you qualify for a visa, owe a tax, or violate a regulation, LegalLean lets you verify exactly which parts are provable and which require human judgment.

Status

Framework complete with generic solver, typed IR pipeline, and factorial hardening programme. Paper targeting RuleML+RR 2026 / JURIX 2026. Open core: MIT-licensed.

87 Verified theorems
4 Case studies
3 AI reviews
10 Release gate KPIs

The Problem

Every day, computer systems make decisions about people's lives: tax deductions, visa eligibility, regulatory compliance. These systems are increasingly powered by AI. But you can't check their reasoning. When a system says "not eligible", you don't know whether that conclusion rests on a verifiable rule or a judgment call.

Today's AI legal systems

Give you an answer but no way to distinguish a machine-provable fact from a contestable judgment. Both appear as the same output. You can't tell where the computer's certainty ends and human judgment should begin.

With LegalLean

Every legal conclusion is marked as either formally verified (the computer proved it) or requires human determination (with an explanation of why, who should decide, and what the contested boundary is).

How It Works

LegalLean uses the same technology that mathematicians use to verify proofs (Lean 4) and applies it to law. Think of it as a spell-checker, but instead of checking spelling, it checks legal logic.

  1. A law is written as a computer program

    A human (or AI) translates a statute into a structured format. For example: "If you're an individual taxpayer, and your mortgage is secured by your home, and it's under $750,000, you may deduct the interest."

  2. The computer checks the logic

    Lean 4 verifies that every rule is consistent, every exception is properly handled, and no contradictions exist. If a rule says "you can deduct" but another says "you can't", the system catches it.

  3. Vague terms are marked, not hidden

    Legal language is deliberately vague sometimes. "Qualified residence" could mean a house, a boat, or a mobile home. Instead of pretending the computer can decide this, LegalLean marks it as RequiresHumanDetermination and documents the contested boundary.

  4. Anyone can verify the result

    The entire system can be independently checked by running lake build. The Lean 4 type checker verifies all 87 theorems from scratch. No trust required.

Under the Hood

LegalLean is not just a collection of proofs. It includes a generic defeasibility solver, a typed intermediate representation for legal language, and a systematic hardening programme.

Generic Solver

An applicability-aware defeat resolution engine. Given a set of legal rules and facts, it filters applicable rules, resolves defeats by priority, and returns a FormalisationBoundary result. Six solver-property theorems prove monotonicity, determinism, and soundness.

Typed Legal IR

A three-stage pipeline: English → Typed Legal Semantic IR → Lean 4. The IR captures eight factors (voice, modality, negation, exceptions, temporal, thresholds, definitions, cross-references) as Lean 4 inductive types, with idempotent normalisation passes and total lowering to LegalLean types.

Solver-Soundness Bridge

Theorems proving the generic solver agrees with hand-coded case study functions on defeat structure, auto-rejection cases, and output kind. Exposes genuine architectural boundaries: where the solver can and cannot replicate manual encoding.

Factorial Hardening

A five-phase programme testing whether controlled variations in English surface form produce semantically stable Lean 4 outputs. Covering arrays across eight factors, variation family tests (5/5 passing), and ten release-gate KPIs including false precision rate and equivalence proof success.

A Real Example: The Tax Cuts and Jobs Act

In 2017, the US Congress changed the rules for deducting mortgage interest. LegalLean doesn't just tell you this. It proves it.

The Monotonicity Break Theorem

"For every home equity interest payment by an individual taxpayer on secured property within the applicable dollar limits: the pre-2018 legal outcome is 'potentially deductible (human must verify if property is a qualified residence)' and the post-2018 outcome is 'definitely not deductible (machine-verified rejection)'."

This is not a test case. It covers every possible payment that meets the conditions. The computer verified it. No human can make it wrong without changing the law.

Without LegalLean

"Your deduction was denied."

Why? Was it the TCJA? Was my property not "qualified"? Was my loan too large? The system doesn't say.

With LegalLean

"Your deduction was denied because your taxable year is 2018 or later and your loan is classified as home equity. This is a machine-verified rejection based on the TCJA."

Clear. Traceable. Verifiable.

Who Benefits

Citizens

Know exactly why a decision was made, and whether it was based on a provable rule or a judgment call.

Lawyers

Identify exactly where a law is clear and where it's contested. Focus arguments on the genuinely debatable points.

Regulators

Verify that regulations are internally consistent before publishing. Find contradictions, gaps, and ambiguities automatically.

💻
Developers

Build compliance systems with machine-checked guarantees. Point to a verified proof, not just a test suite.

🏫
Academics

A new framework for studying the boundary between formalisable and non-formalisable law. 80 theorems, typed IR pipeline, factorial hardening.

🌎
Society

When laws are machine-checkable, it's harder to hide unjust outcomes behind opaque algorithms.

What LegalLean Does Not Do

Honesty matters.

Replace judges or lawyers

Many legal questions require human judgment. LegalLean marks exactly where. It makes the boundary visible; it does not eliminate it.

Formalise all of law

Law is deliberately vague in many places. "Reasonable", "extraordinary", "substantial" resist precise definition. LegalLean classifies this vagueness rather than pretending it doesn't exist.

Guarantee legal correctness

The system verifies internal consistency. Whether the encoding faithfully represents the actual law requires human review. The computer checks logic, not legal interpretation.

Work on every law today

Currently demonstrated on three case studies (US tax, US immigration, Australian telecoms). Healthcare domains are next.

Hardening the Translation Layer

Proving a legal rule is correctly formalised is necessary but not sufficient. The harder question: does the same rule, phrased differently, produce an equivalent formalisation? LegalLean's hardening programme is now implemented and passing.

Three-Stage Pipeline

English → Typed Legal Semantic IR (actors, modalities, exceptions, time, thresholds, vagueness markers) → Lean 4 (definitions + propositions) → Proofs / equivalence checks / explicit abstentions.

Lean as Test Oracle

For two English variants that should be synonymous, translate both to Lean terms T₁ and T₂, then ask Lean to prove T₁ ↔ T₂. Three outcomes: equivalence proved, semantic divergence, or coverage gap. The kernel does not guess.

Factorial Variation Testing

Controlled families of legal-language variations across eight factors: voice, modality, negation, exception structure, temporal phrasing, thresholds, definitions, and cross-references.

False Precision Detection

The most dangerous failure: when the input is genuinely vague, but the system emits a crisp decidable predicate. We measure false precision rate as a primary release gate.

Worked Example: Variation Family

"not eligible unless..." and "eligible only if..." → equivalence expected (same legal effect, different syntax)

"may be treated as eligible where the decision-maker is satisfied..." → should diverge (discretion + evidential burden + open texture)

If the system emits the same Lean predicate for all three, it has committed false precision. The factorial test catches this because the comparison is built into the experimental design.

Full technical details: Legal Reasoning Needs a Wind Tunnel.

Coming Next: Healthcare

Clinical guidelines, drug dosage rules, and insurance coverage decisions follow the same pattern: clear rules with contested boundaries. LegalLean's FormalisationBoundary type already encodes this.

NICE Guidelines

Evidence-graded clinical recommendations. Grade A evidence maps to FormalisationBoundary.formal; "consider" recommendations map to boundary. The evidence grading IS the formalisation boundary.

BNF Drug Dosage Rules

Dosage ranges, contraindications, and interactions. Highly formalisable: numeric thresholds the type checker can verify, with clinical judgment at the boundaries (renal impairment adjustments, off-label use).

US Insurance Coverage

Coverage decisions ("medically necessary", "experimental treatment") are legal reasoning applied to healthcare. The same defeasible structure: base coverage defeated by exclusions, restored by exceptions.

NHS Clinical Pathways

Treatment pathways with decision points: some machine-verifiable (blood test results), some requiring clinical judgment (patient preference, comorbidity weighting). LegalLean makes the boundary explicit.

Don't Trust Us. Verify.

Everything in LegalLean can be independently verified. You don't need to trust us. You don't need to trust any AI. The mathematics is checkable.

Run it yourself Clone the repo and run lake build. 15 seconds. Zero trust required.
Use Docker Run docker build . to verify in a completely clean environment from a blank Ubuntu installation.
3-model adversarial review Gemini, Codex (GPT), and Claude independently reviewed the paper and code. A cross-model synthesis identifies consensus findings.
Automated theorem proving We use Aristotle by Harmonic for independent verification beyond the Lean type checker.

The Source Code is Open

LegalLean is open source under the MIT licence. The academic paper, all 87 theorems, the verification protocol, the generic solver, the typed IR pipeline, and the three-model adversarial review are all published.