18 Theorems: Auditing 1,200 AI Research Sessions
We claimed 957 theorems. The honest count is 18. What went wrong and how we fixed it.
From April 2026 onward, we ran over 1,200 AI-assisted research sessions on the EML operator. The sessions produced code, notebooks, and a running "theorem" count. By session 1,237, that count had reached 957.
The honest count is 18.
This post is about what happened, why it happened, and what we changed.
What we were doing
Each session would take an area of mathematics or science — consciousness, evolutionary biology, the Millennium Problems, grief, dolphins — and ask: what is the EML depth of the key objects in this domain? Then it would call the answer a "theorem."
A sample from session 546:
T267: Animal cognition — insects=EML-0; dogs=EML-2; dolphins=EML-3.
This is not a theorem. It's a metaphor. There's no mathematical content — no proof that the cognitive processes of dogs require exactly two levels of exponential-logarithmic composition to describe. The "theorem" is an informal analogy dressed in formal notation.
We had hundreds of these. Grief mapped to EML depth levels. Consciousness as EML-∞. The Navier-Stokes regularity problem "proved" independent of ZFC by EML-theoretic analysis. The Riemann Hypothesis "resolved" by identifying which EML depth class the critical line belongs to.
None of this was mathematics. It was speculative classification with formal-looking notation.
Why it happened
The pattern emerged naturally from how the AI assistant engaged with the research. Each session asked it to classify something — and it did, confidently and in the style of a theorem. The human reading these outputs saw formal notation and session numbers and a running count, and the count felt like progress.
The underlying incentive structure rewarded breadth. More domains covered, more sessions run, higher theorem count. The quality of the "theorems" wasn't being audited.
This is a general hazard of AI-assisted research: the AI will produce confident-sounding formal output for whatever prompt you give it. If you ask it to classify consciousness by EML depth, it will do so with the same syntactic confidence as when it proves that ln(1) = 0.
What the honest count is
We audited every claimed result. The criteria for a theorem:
- Complete proof with no gaps
- Checkable: can be verified by running code or following mathematical steps
- Mathematical content: makes a precise claim about mathematical objects
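To make the "checkable" criterion concrete, here is a minimal sketch of what an audit step can look like. The `Claim` class, the `audit` function, and the tier names in it are hypothetical illustrations, not our actual tooling; the expansion ceml(0, x) = e^0 - ln(x) used in the T04 check is an assumption inferred from the theorem statements later in this post.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Claim:
    label: str
    statement: str
    check: Optional[Callable[[], bool]] = None  # runnable verification, if any

def audit(claim: Claim) -> str:
    # A claim with no runnable (or steppable) check fails the criteria outright.
    if claim.check is None:
        return "SPECULATION"
    # A passing check only makes it a candidate; a human still reads the proof.
    return "CANDIDATE" if claim.check() else "REJECTED"

claims = [
    # T04, assuming ceml(a, b) = e^a - ln(b): then ln(x) = 1 - ceml(0, x).
    Claim("T04", "ln(x) = 1 - ceml(0, x)",
          check=lambda: math.isclose(math.log(2.0),
                                     1 - (math.exp(0) - math.log(2.0)))),
    # Session 546's dolphin "theorem" admits no check of either kind.
    Claim("T267", "dolphins = EML-3"),
]

for c in claims:
    print(c.label, audit(c))  # T04 CANDIDATE, then T267 SPECULATION
```

The point of the sketch: the dolphin claim is not rejected for being false, it is rejected for having no check at all. That distinction is what the tier labels encode.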
By these criteria, the count is:
| Tier | Count | What it means |
|---|---|---|
| THEOREM | 7 | Complete proof, no gaps |
| PROPOSITION | 5 | Proved, routine |
| CONJECTURE | 4 | Stated, believed, unproved |
| OBSERVATION | 4 | Empirical, no proof |
| DEFINITION | 4 | Choices, not claims |
| SPECULATION | 4 | Interesting but unfalsifiable |
The 4 SPECULATION entries are properly labeled, including "P = EML-2, NP = EML-∞" and "Consciousness and EML-∞." These are interesting metaphors, not theorems. We keep them in the catalog under the SPECULATION label because ideas shouldn't be deleted merely for being unproved, but they must be labeled as what they are.
The 7 actual theorems
For completeness, the 7 theorems (abbreviated):
- T01 — Infinite Zeros Barrier: sin(x) has no finite real EML tree.
- T02 — EML Universality: eml generates every elementary function. (Odrzywołek, arXiv:2603.21852)
- T03 — Euler Gateway: ceml(ix, 1) = exp(ix) = cos(x) + i·sin(x).
- T04 — Log Recovery: ln(x) = 1 − ceml(0, x).
- T05 — Phantom Attractor is a Precision Artifact: The apparent attractor at ~6.27 in EML gradient descent vanishes at 15+ digit precision.
- T06 — Tropical Self-EML: teml(a, a) = |a| in the tropical semiring.
- T07 — BEST Routing sin/cos via 1 node: sin(x) = Im(ceml(ix, 1)).
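Several of these one-liners can be spot-checked numerically, which is exactly what the "checkable" criterion asks for. The definitions below are assumptions, not the paper's: ceml(a, b) = e^a − ln(b) is inferred from T03 and T04 (it reproduces both), and teml(a, b) = max(a, −b) is one plausible tropicalization consistent with T06.

```python
import cmath
import math

def ceml(a, b):
    """Assumed complex EML operator: e^a - ln(b) (inferred from T03/T04)."""
    return cmath.exp(a) - cmath.log(b)

def teml(a, b):
    """Assumed tropical EML: max(a, -b), one reading consistent with T06."""
    return max(a, -b)

x = 0.7
# T03 -- Euler Gateway: ceml(ix, 1) = exp(ix) = cos(x) + i*sin(x)
assert cmath.isclose(ceml(1j * x, 1), complex(math.cos(x), math.sin(x)))
# T04 -- Log Recovery: ln(x) = 1 - ceml(0, x)
assert math.isclose(math.log(x), (1 - ceml(0, x)).real)
# T07 -- sin/cos via one node: sin(x) = Im(ceml(ix, 1))
assert math.isclose(math.sin(x), ceml(1j * x, 1).imag)
# T06 -- Tropical Self-EML: teml(a, a) = |a|
assert all(teml(a, a) == abs(a) for a in (-3.5, 0.0, 2.0))
```

Note what the checks do and don't establish: they verify instances, not proofs. T01 (a nonexistence claim) and T05 (a precision claim) need the full arguments, which is why "complete proof" is a separate criterion from "checkable."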
What we changed
The challenge board at monogate.dev was already clean — it had the 6-tier system (THEOREM / PROPOSITION / CONJECTURE / OBSERVATION / DEFINITION / SPECULATION) from the start. The problem was the private research log, which accumulated 957 speculative classifications without the tier system.
We cleaned the private log: archived 1,500+ lines of speculative session summaries, replaced them with a 50-line honest summary. The frontier research files remain in the codebase (the Python modules that compute EML depths of domain-specific formulas), but they're now described accurately — as domain classifications, not theorems.
The lesson: if you're using an AI to help with research, you need to audit what it calls a "theorem" — and build the auditing into the workflow, not as a one-time correction. The AI doesn't know what it doesn't know.
Honest theorem catalog: challenge.monogate.dev/theorems