18 Theorems: Auditing 1,200 AI Research Sessions
We claimed 957 theorems. The honest count is 18. What went wrong and how we fixed it.
From April 2026 onward, we ran over 1,200 AI-assisted research sessions on the EML operator. The sessions produced code, notebooks, and a running "theorem" count. By session 1,237, that count had reached 957.
The honest count is 18.
This post is about what happened, why it happened, and what we changed.
What we were doing
Each session would take an area of mathematics or science — consciousness, evolutionary biology, the Millennium Problems, grief, dolphins — and ask: what is the EML depth of the key objects in this domain? Then it would call the answer a "theorem."
A sample from session 546:
T267: Animal cognition — insects=EML-0; dogs=EML-2; dolphins=EML-3.
This is not a theorem. It's a metaphor. There's no mathematical content — no proof that the cognitive processes of dogs require exactly two levels of exponential-logarithmic composition to describe. The "theorem" is an informal analogy dressed in formal notation.
We had hundreds of these. Grief mapped to EML depth levels. Consciousness as EML-∞. The Navier-Stokes regularity problem "proved" independent of ZFC by EML-theoretic analysis. The Riemann Hypothesis "resolved" by identifying which EML depth class the critical line belongs to.
None of this was mathematics. It was speculative classification with formal-looking notation.
Why it happened
The pattern emerged naturally from how the AI assistant engaged with the research. Each session asked it to classify something — and it did, confidently and in the style of a theorem. The human reading these outputs saw formal notation and session numbers and a running count, and the count felt like progress.
The underlying incentive structure rewarded breadth. More domains covered, more sessions run, higher theorem count. The quality of the "theorems" wasn't being audited.
This is a general hazard of AI-assisted research: the AI will produce confident-sounding formal output for whatever prompt you give it. If you ask it to classify consciousness by EML depth, it will do so with the same syntactic confidence as when it proves that ln(1) = 0.
What the honest count is
We audited every claimed result. The criteria for a theorem:
- Complete proof with no gaps
- Checkable: can be verified by running code or following mathematical steps
- Mathematical content: makes a precise claim about mathematical objects
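To make the "checkable" criterion concrete, here is a minimal sketch of what an audit step can look like. The `Claim` class, the `audit` function, and the tier names in it are hypothetical illustrations, not our actual tooling; the expansion ceml(0, x) = e^0 - ln(x) used in the T04 check is an assumption inferred from the theorem statements later in this post.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Claim:
    label: str
    statement: str
    check: Optional[Callable[[], bool]] = None  # runnable verification, if any

def audit(claim: Claim) -> str:
    # A claim with no runnable (or steppable) check fails the criteria outright.
    if claim.check is None:
        return "SPECULATION"
    # A passing check only makes it a candidate; a human still reads the proof.
    return "CANDIDATE" if claim.check() else "REJECTED"

claims = [
    # T04, assuming ceml(a, b) = e^a - ln(b): then ln(x) = 1 - ceml(0, x).
    Claim("T04", "ln(x) = 1 - ceml(0, x)",
          check=lambda: math.isclose(math.log(2.0),
                                     1 - (math.exp(0) - math.log(2.0)))),
    # Session 546's dolphin "theorem" admits no check of either kind.
    Claim("T267", "dolphins = EML-3"),
]

for c in claims:
    print(c.label, audit(c))  # T04 CANDIDATE, then T267 SPECULATION
```

The point of the sketch: the dolphin claim is not rejected for being false, it is rejected for having no check at all. That distinction is what the tier labels encode.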
By these criteria, the count is:
| Tier | Count | What it means |
|---|---|---|
| THEOREM | 7 | Complete proof, no gaps |
| PROPOSITION | 5 | Proved, routine |
| CONJECTURE | 4 | Stated, believed, unproved |
| OBSERVATION | 4 | Empirical, no proof |
| DEFINITION | 4 | Choices, not claims |
| SPECULATION | 4 | Interesting but unfalsifiable |
The 4 SPECULATION entries are properly labeled, including "P = EML-2, NP = EML-∞" and "Consciousness and EML-∞." These are interesting metaphors, not theorems. We keep them in the catalog under the SPECULATION label because ideas shouldn't be deleted merely for being unproved, but they must be labeled as what they are.
The 7 actual theorems
For completeness, the 7 theorems (abbreviated):
- T01 — Infinite Zeros Barrier: sin(x) has no finite real EML tree.
- T02 — EML Universality: eml generates every elementary function. (Odrzywołek, arXiv:2603.21852)
- T03 — Euler Gateway: ceml(ix, 1) = exp(ix) = cos(x) + i·sin(x).
- T04 — Log Recovery: ln(x) = 1 − ceml(0, x).
- T05 — Phantom Attractor is a Precision Artifact: The apparent attractor at ~6.27 in EML gradient descent vanishes at 15+ digit precision.
- T06 — Tropical Self-EML: teml(a, a) = |a| in the tropical semiring.
- T07 — BEST Routing sin/cos via 1 node: sin(x) = Im(ceml(ix, 1)).
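Several of these one-liners can be spot-checked numerically, which is exactly what the "checkable" criterion asks for. The definitions below are assumptions, not the paper's: ceml(a, b) = e^a − ln(b) is inferred from T03 and T04 (it reproduces both), and teml(a, b) = max(a, −b) is one plausible tropicalization consistent with T06.

```python
import cmath
import math

def ceml(a, b):
    """Assumed complex EML operator: e^a - ln(b) (inferred from T03/T04)."""
    return cmath.exp(a) - cmath.log(b)

def teml(a, b):
    """Assumed tropical EML: max(a, -b), one reading consistent with T06."""
    return max(a, -b)

x = 0.7
# T03 -- Euler Gateway: ceml(ix, 1) = exp(ix) = cos(x) + i*sin(x)
assert cmath.isclose(ceml(1j * x, 1), complex(math.cos(x), math.sin(x)))
# T04 -- Log Recovery: ln(x) = 1 - ceml(0, x)
assert math.isclose(math.log(x), (1 - ceml(0, x)).real)
# T07 -- sin/cos via one node: sin(x) = Im(ceml(ix, 1))
assert math.isclose(math.sin(x), ceml(1j * x, 1).imag)
# T06 -- Tropical Self-EML: teml(a, a) = |a|
assert all(teml(a, a) == abs(a) for a in (-3.5, 0.0, 2.0))
```

Note what the checks do and don't establish: they verify instances, not proofs. T01 (a nonexistence claim) and T05 (a precision claim) need the full arguments, which is why "complete proof" is a separate criterion from "checkable."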
What we changed
The challenge board at monogate.dev was already clean — it had the 6-tier system (THEOREM / PROPOSITION / CONJECTURE / OBSERVATION / DEFINITION / SPECULATION) from the start. The problem was the private research log, which accumulated 957 speculative classifications without the tier system.
We cleaned the private log: archived 1,500+ lines of speculative session summaries, replaced them with a 50-line honest summary. The frontier research files remain in the codebase (the Python modules that compute EML depths of domain-specific formulas), but they're now described accurately — as domain classifications, not theorems.
The lesson: if you're using an AI to help with research, you need to audit what it calls a "theorem" — and build the auditing into the workflow, not as a one-time correction. The AI doesn't know what it doesn't know.
Honest theorem catalog: challenge.monogate.dev/theorems