Andrews-Curtis conjecture formalization

Robert Joseph George June 2026

What is the Andrews-Curtis conjecture?

The Andrews-Curtis conjecture is easy to state and hard to touch. Start with a balanced presentation of the trivial group: the number of generators equals the number of relators, and the relators normally generate the whole free group. The conjecture says that such a presentation should be reducible to the obvious trivial presentation \(\langle x_1,\ldots,x_n \mid x_1,\ldots,x_n\rangle\) by elementary AC moves: invert a relator, multiply one relator by another, and conjugate a relator. The problem has been open since 1965.

Those elementary moves are the heart of the problem. One may invert a relator, multiply one relator by another, or conjugate a relator by a generator or word. In rank two, where much of our computational work lives, a presentation looks like a pair of words \((r_1,r_2)\) in the free group \(F(x,y)\). A certificate is not a heuristic score or a plausibility argument. It is literally a list of moves taking \((r_1,r_2)\) back to \((x,y)\), with every intermediate word reduced exactly.

State

A pair of relators, for example \(r_1=xyx^{-1}y^{-1}\) and \(r_2=y\), represented as exact free-group words.

Move

An exact rewrite such as \(r_i\leftarrow r_i r_j\), \(r_i\leftarrow r_i^{-1}\), or \(r_i\leftarrow g r_i g^{-1}\).

Certificate

A replayable finite trace of moves. The verifier checks the trace; it does not trust the searcher.

\[ (r_1,r_2) \xrightarrow{\text{AC move}} (r'_1,r'_2) \xrightarrow{\text{AC move}} \cdots \xrightarrow{\text{AC move}} (x,y). \]

For a concrete example, the presentation \[ P_0=\langle x,y \mid xy,\;xyx^{-1}\rangle \] is AC-trivial in two moves: \[ \langle x,y \mid xy,\;xyx^{-1}\rangle \xrightarrow{\,r_2\leftarrow x^{-1}r_2x\,} \langle x,y \mid xy,\;y\rangle \xrightarrow{\,r_1\leftarrow r_1r_2^{-1}\,} \langle x,y \mid x,\;y\rangle. \] The two algebraic equalities behind the trace are \(x^{-1}(xyx^{-1})x=y\) and \((xy)y^{-1}=x\).

A search program can produce a sequence like this. That is not yet a proof. It becomes a proof only after the sequence is replayed inside the intended formal relation. If the trace replays in ACEquivalent, it proves an ACEquivalent statement. If it replays in GeneralACEquivalent, it proves a generalized statement. If it uses stabilization, it proves a stable statement.

Two moves are easy to check by direct algebra. The conjecture asks whether every balanced presentation of the trivial group has a certificate like this. The certificate may be much longer, and the intermediate words may grow enormously before shrinking. For simple presentations the answer is yes. For the hard families, nobody has found a classical certificate, and nobody has proved one cannot exist.

This is also why the problem is such a clean search problem. A program can wander through the move graph for as long as it wants. At the end it has to hand over a move list. Lean does not care how the list was found. It only checks whether the moves replay.

Why this tiny-looking problem is hard

Every AC move is easy to describe. That is the trap. The hard part is finding the right sequence of easy moves.

In rank two, the checker has twelve immediate moves. That sounds small. But after a few steps the tree is already huge, and the good path may start by making the words longer. So a length-greedy search can lie to you. The move that makes the presentation look simpler may be a dead end, and the ugly move may be the beginning of the proof.

There is another trap. Knowing that a presentation gives the trivial group is not the same as having an AC certificate. The first is a group-theoretic fact. The second is a path in a very specific move graph. It is easy to slide between those two statements in prose. Lean does not let you slide.

For reinforcement learning, the game is almost embarrassingly literal. The state is the current pair of relators. An action is one allowed move. You win when the pair becomes \((x,y)\). Easy examples look like the two-move trace above. Hard examples are the same game, except the graph is deeper and meaner. Some branches shorten the words right away. Some branches look worse for a long time and only later open a door. Most branches just waste your life.

The nastiest part is length. Even when a presentation is AC-trivial, the route back to the trivial presentation may pass through much larger words. So "just search everything below this length" is not a serious plan. You might be cutting off the only path.

Group-theoretic difficulty

The hypothesis is about the normal closure of relators in a free group, not about a syntactic normal form.

Search difficulty

Move sequences can grow words before reducing them, so local length is a weak guide and sparse rewards are severe.

Formal difficulty

Different move systems require explicit bridge theorems; Lean treats them as different relations.

What we formalized

The first challenge when formalizing AC is a design question: what exactly is an AC move? In a paper you write "multiply relator \(r_i\) by \(r_j\)" and everyone nods. In Lean, an inductive relation needs constructors. Those constructors encode which moves your system allows. Get that wrong and your theorem about the conjecture is actually about a subtly different move system. We ran into exactly this, and Aristotle helped catch it.

The project is pinned through elan to Lean \(4.29.0\), with mathlib pulled by Lake. That matters because the proof is checked against the definitions in this project, not against a paragraph describing what we meant. If the move relation changes, the theorem changes.

Toolchain and base presentation type

-- lean-toolchain
leanprover/lean4:v4.29.0

-- Problems/Core/Presentation.lean
-- Think of this as the data object for a group presentation.
-- α is the type of generator names, and relations is the list of relators.
structure Presentation (α : Type*) where
  relations : List (FreeGroup α)
deriving DecidableEq

-- Balanced means: same number of generators and relators.
-- The bracketed arguments are Lean's way of saying "α is finite"
-- and "Lean can decide equality of free-group words when needed."
def balanced {α : Type*} [Fintype α] [DecidableEq (FreeGroup α)]
    (p : Presentation α) : Prop :=
  Fintype.card α = p.relations.length

A rank-\(N\) presentation is represented as \(\mathrm{Presentation}(\mathrm{Fin}\,N)\). The generators are the finite type \(\mathrm{Fin}\,N\); each relator is an exact element of \(\mathrm{FreeGroup}(\mathrm{Fin}\,N)\). This is the main modeling move: instead of storing presentations as loose strings, the rank of the presentation appears in the type.

Free words

Words in generators and inverses are reduced by exact cancellation, not floating point heuristics.

Presentations

A balanced presentation is encoded as a finite tuple of relators over a free group.

Move relations

AC, generalized AC, and stable AC are separate relations with separate theorem boundaries.

Certificates

Search output is useful only when replayed as an exact chain of moves in the formal system.

The rank-two relation is the one we use when we want a small action space for experiments. The general relation is the cleaner mathematical version: any finite rank, same kind of moves. Stable AC adds one more freedom: you may temporarily add a new generator together with the relator that kills it.

Why stable AC?

The reason to allow that extra freedom is simple. Sometimes a problem is stuck in rank \(n\), but becomes easier after moving to rank \(n+1\). You add a harmless generator and relator, simplify in the larger presentation, then try to remove the extra pair. That is a different theorem from classical AC, so the formalization keeps it separate.

PresentationGroup.lean: quotient-group semantics

-- This turns relators into the normal subgroup they generate.
-- In group-presentation language, this is "impose these relators."
def normalClosureAny {N : Nat}
    (rels : List (FreeGroup (Fin N))) : Subgroup (FreeGroup (Fin N)) :=
  Subgroup.normalClosure (rels.toFinset : Set (FreeGroup (Fin N)))

-- The presentation is trivial exactly when those relators generate everything.
def presentsTrivialGroupAny {N : Nat} (p : Presentation (Fin N)) : Prop :=
  normalClosureAny p.relations = ⊤

-- The actual presented group: free group modulo the normal closure.
def presentedGroup {N : Nat} (p : Presentation (Fin N)) :=
  (FreeGroup (Fin N)) ⧸ (normalClosureAny p.relations)

-- The standard target presentation: each generator is killed by its own relator.
def trivialPresentationAny (N : Nat) : Presentation (Fin N) :=
  ⟨List.ofFn (fun i : Fin N => FreeGroup.of i)⟩

-- "AC-trivial" means reachable from p by the generated AC equivalence relation.
def GeneralACTrivial (N : Nat) (p : Presentation (Fin N)) : Prop :=
  GeneralACEquivalent N p (trivialPresentationAny N)

This is where the classical statement enters the formal world. \(\mathrm{presentsTrivialGroupAny}(p)\) says the relators normally generate the whole free group. \(\mathrm{GeneralACTrivial}(N,p)\) says there is an AC-equivalence path from \(p\) to the standard trivial presentation. The Andrews-Curtis conjecture is exactly the claim that the first condition implies the second, for balanced presentations.

Basic.lean / GeneralACMoves.lean / StableACMoves.lean: relation excerpts

-- An inductive relation is a list of allowed one-step moves.
-- Each constructor below is one legal move in the move graph.
inductive ACMove : Presentation (Fin 2) → Presentation (Fin 2) → Prop
  | h1 (p : Presentation (Fin 2)):
    -- h1 says: replace r₂ by r₂ * r₁.
    ACMove p ⟨p.relations.set 1 ((p.relations.getD 1 1) * (p.relations.getD 0 1))⟩
  -- h2 through h11 omitted here for space.
  | h12 (p : Presentation (Fin 2)):
    -- h12 says: conjugate r₁ by x⁻¹.
    ACMove p ⟨p.relations.set 0 (x⁻¹ * (p.relations.getD 0 1) * x)⟩

-- EqvGen means "take the equivalence relation generated by those one-step moves."
-- In plain English: any finite sequence of ACMove steps, forwards or backwards.
def ACEquivalent : Presentation (Fin 2) → Presentation (Fin 2) → Prop :=
  Relation.EqvGen ACMove

-- GeneralACMove is the same idea, but for any rank N.
-- Fin N is Lean's finite type with N generator names: 0, 1, ..., N-1.
inductive GeneralACMove (N : Nat) :
    Presentation (Fin N) → Presentation (Fin N) → Prop
  | multiply (p : Presentation (Fin N)) (i j : Fin N) (s : Int)
    (h_ne : i ≠ j) (h_sign : s = 1 ∨ s = -1) :
    -- Replace rᵢ by rᵢ * rⱼ^s. The proofs say i and j are different,
    -- and that s is really one of the two allowed signs.
    GeneralACMove N p (applyMultiplicationMove N p i j s)
  | invert (p : Presentation (Fin N)) (i : Fin N) :
    -- Replace rᵢ by its inverse.
    GeneralACMove N p (applyInversionMove N p i)
  | conjugate (p : Presentation (Fin N)) (i g : Fin N) (s : Int)
    (h_sign : s = 1 ∨ s = -1) :
    -- Conjugate rᵢ by generator g or by g⁻¹.
    GeneralACMove N p (applyConjugationMove N p i g s)

-- Stable moves can change rank, so the rank N travels with the presentation.
def AnyPresentation : Type := Σ N : Nat, Presentation (Fin N)

inductive StableACMove : AnyPresentation → AnyPresentation → Prop
  | ac (N : Nat) (p q : Presentation (Fin N)) :
      -- Ordinary fixed-rank AC moves are still allowed.
      GeneralACMove N p q →
      StableACMove ⟨N, p⟩ ⟨N, q⟩
  | stabilize (N : Nat) (p : Presentation (Fin N)) :
      -- Add a fresh generator and a fresh relator killing it.
      StableACMove
        ⟨N, p⟩
        ⟨N+1, ⟨(liftPresentationSucc N p).relations ++ [newGen N]⟩⟩
  -- destabilize, substitute_remove, substitute_remove_any,
  -- and substitute_remove_neg_any are defined in StableACMoves.lean.

I do not want to rank these relations as better or worse. They are just different. If a certificate uses stable moves, it is a stable certificate. If a theorem is proved for GeneralACMove, it does not automatically become a theorem about the hand-written rank-two ACMove. You need a bridge theorem. If the bridge is not in the file, Lean will not pretend it exists.

Examples that became Lean statements

Definitions are not enough. We also needed examples: a toy certificate we could replay, hard families we could name, and conjecture stubs that made the open part impossible to hide.

A two-step AC certificate

The presentation \(\langle x,y \mid xy, xyx^{-1}\rangle\) is a small sanity check. First conjugate the second relation by \(x^{-1}\), turning \(xyx^{-1}\) into \(y\). Then right-multiply the first relation by \(y^{-1}\), turning \(xy\) into \(x\). The Lean proof constructs exactly these two moves and chains them with \(\mathrm{Relation.EqvGen.trans}\).

Basic.lean: checked toy certificate

-- The destination state for the search game.
-- The two relators are just x and y.
def trivial_presentation : Presentation (Fin 2) :=
  ⟨[x, y]⟩

-- The toy starting state: ⟨x,y | xy, xyx⁻¹⟩.
def example_presentation : Presentation (Fin 2) :=
  ⟨[x * y, x * y * x⁻¹]⟩

-- This is a replayable certificate, not a heuristic claim.
#time theorem example_is_AC_trivial : ACEquivalent example_presentation trivial_presentation := by
  -- Step 1: Conjugate the second relation by x⁻¹
  let rels1 : List (FreeGroup (Fin 2)) := {x * y, y}
  have step1 : ACMove example_presentation (Presentation.mk rels1) := by
    -- h5 is the named move constructor we want.
    -- convert lets Lean match the concrete list representation.
    convert ACMove.h5 example_presentation
    -- decide handles the remaining finite computation.
    decide
  -- Step 2: Right multiply r₁ by r₂⁻¹
  have step2 : ACMove (Presentation.mk rels1) trivial_presentation := by
    -- h2 changes xy into xy * y⁻¹ = x.
    convert ACMove.h2 (Presentation.mk rels1)
    decide
  -- Now combine all the steps using transitivity of ACEquivalent
  have eq1 : ACEquivalent example_presentation (Presentation.mk rels1) :=
    Relation.EqvGen.rel _ _ step1
  have eq2 : ACEquivalent (Presentation.mk rels1) trivial_presentation :=
    Relation.EqvGen.rel _ _ step2
  -- Chain the two certified edges into one certified path.
  apply Relation.EqvGen.trans example_presentation (Presentation.mk rels1) trivial_presentation
  apply eq1
  apply eq2

The AK examples

The AK examples are the kind of examples that make this problem feel unfair. You can write \(AK(n)\) in a few symbols. There are only two generators and two relators. Nothing about it looks large. But when you start applying AC moves, it does not open up the way the toy example did.

AK.lean: a parameterized family

-- AK is a whole family of presentations, one for each natural number n.
def AK (n : Nat) : Presentation (Fin 2) :=
  let x := FreeGroup.of (0 : Fin 2)
  let y := FreeGroup.of (1 : Fin 2)
  -- The first relator encodes x^n = y^(n+1).
  let rel1 := x^n * (y^(n+1))⁻¹
  -- The second relator is the fixed braid-like word in the family.
  let rel2 := x * y * x * (y * x * y)⁻¹
  ⟨[rel1, rel2]⟩

-- This proof is just bookkeeping: AK n has two generators and two relators.
lemma balanced_AK (n : Nat) : balanced (AK n) := by
  unfold balanced AK
  simp

-- This theorem says AK n is equivalent to the Miller-Schupp presentation
-- in the general move system. It is not automatically a Basic.lean ACMove proof.
theorem AK_equiv_MS_w₁ (n : Nat) (_hn : 2 ≤ n) :
    GeneralACEquivalent 2 (AK n) (MS n w₁ w₁_exp_sum) := by
  classical
  -- The proof in AK.lean continues as an explicit chain of GeneralACMove steps.

The annoying thing is that the presentation has structure, but not the helpful kind. One relator ties \(x^n\) to \(y^{n+1}\). The other has a braid-like shape. If you poke it with the obvious moves, you mostly get messier words. Maybe those messier words are useful. Maybe they are just mess. That is the problem.

In the Lean file, \(AK(n)\) is just a function. That is good. It means we are not pasting in a separate example for each \(n\). We prove once that the family is balanced, and then state the theorem connecting it to the Miller-Schupp family. But the theorem is stated in \(\mathrm{GeneralACEquivalent}\,2\). That is not a small wording issue. It means the result lives in the general move system, not automatically in the hand-coded rank-two ACMove relation from Basic.lean.

Stable AC without hiding rank changes

Stable AC needs a different representation because stabilization changes the rank. The code uses \(\mathrm{AnyPresentation}\), a sigma type storing both the rank and the presentation at that rank. Stabilization, destabilization, and substitution-removal are then ordinary constructors of \(\mathrm{StableACMove}\).

StableACMoves.lean: stable presentations, excerpt

-- A stable presentation is a pair:
--   rank N, plus a presentation over Fin N.
def AnyPresentation : Type := Σ N : Nat, Presentation (Fin N)

inductive StableACMove : AnyPresentation → AnyPresentation → Prop
  | ac (N : Nat) (p q : Presentation (Fin N)) :
      -- A normal general AC move, without changing rank.
      GeneralACMove N p q →
      StableACMove ⟨N, p⟩ ⟨N, q⟩

  | stabilize (N : Nat) (p : Presentation (Fin N)) :
      -- Add one new generator and one new relator saying it is trivial.
      StableACMove
        ⟨N, p⟩
        ⟨N+1, ⟨(liftPresentationSucc N p).relations ++ [newGen N]⟩⟩

  | destabilize (N : Nat) (p : Presentation (Fin (N+1)))
      (h_split : ∃ rs : List (FreeGroup (Fin (N+1))),
        p.relations = rs ++ [newGen N] ∧ (∀ r ∈ rs, DoesNotMentionLast N r)) :
      -- Remove the last generator again, but only if the other relators
      -- do not mention it. The proof h_split is that safety check.
      StableACMove
        ⟨N+1, p⟩
        ⟨N,
          let rs := Classical.choose h_split
          let hrs := (Classical.choose_spec h_split).2
          dropLastPresentation N ⟨rs⟩ hrs⟩

  | substitute_remove (N : Nat) (rs : List (FreeGroup (Fin (N+1)))) (w : FreeGroup (Fin N)) :
      -- If the new generator is defined by y = w, substitute w everywhere
      -- and then remove y.
      StableACMove
        ⟨N+1, ⟨rs ++ [ (newGen N)⁻¹ * liftWordSucc w ]⟩⟩
        ⟨N, substituteAndRemovePresentation N rs w⟩

  | substitute_remove_any (N : Nat) (k : Fin (N + 1))
      (rs : List (FreeGroup (Fin (N + 1)))) (w : FreeGroup (Fin N)) :
      -- Same operation, but the removable generator can be any chosen k.
      StableACMove
        ⟨N + 1, ⟨rs ++ [ (FreeGroup.of k)⁻¹ * liftWordAvoiding k w ]⟩⟩
        ⟨N, substituteAndRemovePresentationAt N k rs w⟩

  | substitute_remove_neg_any (N : Nat) (k : Fin (N + 1))
      (rs₁ rs₂ : List (FreeGroup (Fin (N + 1))))
      (u v : FreeGroup (Fin (N + 1))) (w : FreeGroup (Fin N))
      (h_assign : liftWordAvoiding k w = v * u) :
      -- Solver-friendly variant: the generator k appears inside a longer word,
      -- and h_assign records the algebra needed to solve for it.
      StableACMove
        ⟨N + 1, ⟨rs₁ ++ [u * (FreeGroup.of k)⁻¹ * v] ++ rs₂⟩⟩
        ⟨N, substituteAndRemovePresentationAt N k (rs₁ ++ rs₂) w⟩

-- A stable certificate is any finite chain of these stable moves.
def StableACEquivalent : AnyPresentation → AnyPresentation → Prop :=
  Relation.EqvGen StableACMove

Knot-theoretic theorem interfaces

The project also includes a knot-theoretic interface for Wirtinger presentations. For an unknot diagram, deleting a Wirtinger relator and adding a word of exponent sum \(\pm1\) is connected to AC-triviality and stable AC-triviality. This is where the benchmark-style word problems meet the higher-level mathematical story.

I like these examples because they are not just random hard strings. They come from topology. A knot diagram gives a Wirtinger presentation; changing one relator produces a concrete word problem; and the theorem asks whether that presentation can still be simplified by AC-style moves. So the formalization is doing two jobs at once: it handles the search certificates, and it records how those certificates connect back to the geometric source of the examples.

KnotTheory.lean: paper-facing statements

-- Interface theorem: from an unknot diagram, delete one relator,
-- add a word with exponent sum ±1, and get general AC-triviality.
theorem theorem16_paper (K : KnotDiagram)
    (h_unknot : IsUnknotWirtinger K)
    (k : Fin K.n) (w : FreeGroup (Fin K.n)) (hw : HasExponentSumPM1 w) :
    GeneralACTrivial K.n (deleteRelatorAddWord K k w) := by
  -- Proof continues in KnotTheory.lean.

-- The stable version of the same paper-facing construction.
theorem proposition12_paper (K : KnotDiagram)
    (h_unknot : IsUnknotWirtinger K)
    (k : Fin K.n) (w : FreeGroup (Fin K.n)) (hw : HasExponentSumPM1 w) :
    StablyACTrivialAny K.n (deleteRelatorAddWord K k w) := by
  -- Proof continues in KnotTheory.lean.

The open conjectures remain explicit theorem stubs. That is a feature, not an accident. It means the code distinguishes between what has been formalized as an exact theorem, what is a conditional statement, what is a checked certificate for an instance, and what remains the actual open conjecture.

The open conjectures stay visibly open

-- The classical rank-two conjecture as a Lean theorem stub.
-- Assumptions: p is balanced and presents the trivial group.
-- Conclusion: p is AC-equivalent to the standard trivial presentation.
theorem AC_conjecture (p : Presentation (Fin 2)) :
  balanced p →
  AndrewCurtisConjecture.presentsTrivialGroupAny p →
  ACEquivalent p trivial_presentation := by
  -- Still open mathematically.
  sorry

-- The stable version in arbitrary rank.
theorem stable_AC_conjecture (N : Nat) (p : Presentation (Fin N)) :
  balanced p →
  AndrewCurtisConjecture.presentsTrivialGroupAny p →
  StablyACTrivialAny N p := by
  -- Also open.
  sorry

How Harmonic helped

Aristotle became useful after the definitions existed. At the beginning, the work was just formalization: choose the types, define presentations, define move systems, write small examples, and keep the conjecture stubs visible. Once there was enough Lean code to review, Aristotle became a second reader for the theorem graph.

We used Harmonic's Aristotle as an interactive formalization reviewer. The questions were concrete: which relation does this inductive type generate, which results still depend on sorry, does this theorem prove a stable statement or a classical AC statement, and can a certificate in GeneralACEquivalent 2 be used where ACEquivalent is expected?

The workflow was a loop. I would paste in a chunk of the AC Lean development and ask a specific question. Aristotle would summarize the dependencies and point to the exact theorem boundary. I would check that answer against the mathematics, edit the Lean or the prose, and ask again. The value was practical: it turned vague review notes into concrete Lean obligations.

The first thing we asked

One of the first experiments was blunt: try to prove or disprove the Andrews-Curtis conjecture. That is not a realistic request for rank two and above. But it is a good test of whether the tool understands the shape of the problem. A bad answer would claim the conjecture was solved. A useful answer would formalize the statement, prove the easy cases, and stop before the open rank-two problem.

Aristotle dashboard-style diff viewer for the first Andrews-Curtis conjecture run. — The first Aristotle run generated a small Lean project. It proved the \(n=0\) and \(n=1\) cases, checked normal-closure invariance, and did not claim to solve the rank-two problem.

The generated summary was also a good sign because it proved the boring things that should be true before any conjecture talk starts. The standard presentation presents the trivial group. A single AC move preserves the normal closure of the relators. A finite chain of moves therefore preserves the presented group. This is not the part anyone gets excited about, but it is exactly what you want underneath a certificate checker.

The \(n=1\) case is a nice sanity check. The free group on one generator is just \(\mathbb Z\). A single relator normally generates the whole group only when it is the generator or its inverse. If it is the generator, we are already at the standard presentation. If it is the inverse, one inversion move fixes it. Aristotle found that proof, and then stopped before the rank-two case, where the conjecture is open.

Aristotle-generated project: the base cases it found

-- Rank zero: there are no generators and no relators.
-- The only tuple is already the standard tuple.
theorem andrewsCurtis_n0 (r : RelatorTuple 0) (h : r.presentsTriv) :
    ACEquiv r (RelatorTuple.standard 0) := by
  convert ACEquiv.refl _
  exact funext fun i => Fin.elim0 i

-- Rank one: FreeGroup (Fin 1) behaves like ℤ.
-- A relator that normally generates everything has exponent ±1.
-- So either it is already the generator, or one inversion move is enough.
theorem andrewsCurtis_n1 (r : RelatorTuple 1) (h : r.presentsTriv) :
    ACEquiv r (RelatorTuple.standard 1) := by
  -- The generated proof is longer, but the mathematical reason is this:
  -- in rank one, the only normal generators are x and x⁻¹.

The sessions were useful in two different modes. In explanation mode, Aristotle gave a readable account of the current definitions: what \(\mathrm{ACMove}\) generated, what \(\mathrm{GeneralACMove}\) generated, and how stable AC changed rank. In repair mode, it suggested edits to project structure, imports, theorem statements, and review notes. We did not accept every suggestion blindly, but the back-and-forth made the theorem boundary much clearer. By the end of the process, we were quite happy with the tool as a reviewer for Lean formalization work.

Explanation

Summarize which results were proved, which were conditional, which still used \(\mathtt{sorry}\), and how the definitions related to the paper.

Repair suggestions

Flag mismatched definitions, stale imports, missing bridge lemmas, and statements that were stronger than the available proof path.

A concrete interaction

In one session, we asked Aristotle to read the definitions of ACMove and GeneralACMove and describe how they differed. It noted that GeneralACMove listed inversion as an explicit constructor, while ACMove did not. It then asked: is there a lemma in the file proving that the two relations generate the same equivalence on rank-two presentations? There was not. This became the bridge theorem stub, with the mathematical obligation made visible.

Stylized Harmonic Aristotle review showing the missing bridge between ACMove and GeneralACMove. — A cleaned-up version of the later Aristotle review. The UI is just context. The warning was the substance: the generated relation in one file had inversion as a primitive move, while the rank-two relation in another file did not.

What made the review useful

Aristotle was useful because it was literal. In prose, I could write "AC moves" and keep going. In Lean, the constructors are the moves. If a constructor is missing, or if another file uses a slightly different relation, the theorem has changed. Aristotle kept pointing back to that fact.

The missing bridge

We had two move systems that looked like the same mathematics. Lean treated them as two different relations.

The hand-written rank-two ACMove type had four multiplication moves and eight conjugation moves. Inversion, replacing \(r_i\) by \(r_i^{-1}\), was not a constructor. The general GeneralACMove type did include inversion as a primitive move.

Mathematically, this might be harmless if we prove that the missing inversion can be simulated by the other rank-two moves. But that theorem was not in the file. So a result proved using GeneralACEquivalent 2 could not simply be reused as a result about ACEquivalent. Aristotle caught the gap because it read the constructors, not the intention.

The audit warning in mathematical form

-- Informal statement of the issue:
-- ACEquivalent is generated by ACMove.
-- GeneralACEquivalent 2 is generated by GeneralACMove 2.
-- If the generating moves differ, then theorem transfer requires a bridge:

theorem bridge (P Q : Presentation (Fin 2)) :
  GeneralACEquivalent 2 P Q → ACEquivalent P Q := by
  -- Not automatic. A proof would have to translate every
  -- GeneralACMove 2 step into a finite chain of Basic.lean ACMove steps.
  sorry

That was the useful correction. We did not need a dramatic story. We needed one missing theorem boundary to be named.

References and related work

J. J. Andrews and M. L. Curtis. Free groups and handlebodies. Proceedings of the American Mathematical Society, 1965. The classical source of the Andrews-Curtis conjecture.
Caroline Zhang, Aaron Zhao, Robert Joseph George, Sergei Gukov, and Anima Anandkumar. Mathematical Discovery and Formalization Towards the AC Conjecture. NeurIPS MATH-AI 2025.
Robert Joseph George et al. What makes math problems hard for reinforcement learning: a case study. Background on AC-style search as a benchmark for reinforcement learning and theorem discovery.
Harmonic. Aristotle and AI-assisted formalization tools. Used here as an audit layer for Lean definitions, theorem dependencies, and formalization boundaries.
Lean theorem prover. Lean 4 documentation and theorem proving ecosystem. The final certificate checker for the formalized move systems discussed above.