Learning Lean 4 as a programming language 5 – A different style of proofs

Posted on2023-12-19 by Markus Schmaus

I haven’t posted a new article since my previous post on Fibonacci numbers. I found the idea of proving the correctness of my function highly tantalizing, but also more cumbersome than necessary. So I started working on some code which would help me prove statements about functions. This has absorbed the entire time I spend with Lean since then and I haven’t solved any additional Euler problems yet. This post describes the starting point of my journey.

The standard style which I find in Mathlib is to write functions with functional code, just like I would in Haskell, and have separate theorems about properties of these functions with proofs. This is the style which I also used for the Fibonacci numbers in my previous post. Since then I have discovered a different style, which uses Subtype to combine functions and theorems into one, and is based on a brief section on subtypes in Functional Programming in Lean.

Proofs based on `Subtype`

When the idea for this different style crystalized I was looking at Nat.minFac from Mathlib. So I decided to test the viability of this idea by re-implementing Nat.minFac in this style, and I will also use this example to explain what I mean. I don’t believe this idea to be original, others probably had a similar ideas at some point and maybe even abandoned it for good reasons, but I haven’t found a comprehensive description yet, nor any conclusive reason why it wouldn’t work for my use case.

In Mathlib there is a function minFac and a theorem minFac_has_prop with these type signatures:

def minFac (n : ℕ) : ℕ := sorry

theorem minFac_has_prop {n : ℕ} (n1 : n ≠ 1) : minFacProp n (minFac n) := sorry

Both of these can be combined into a single function using Subtype in its type signature:

def minFac' (n : {n : ℕ // 2 ≤ n}) : {m : ℕ // minFacProp' n m} := sorry

This signature essentially expresses the same statement as minFac_has_prop, but it’s body contains both code for computing the result, as well as the proof for this statement. There is one significant difference here, as I restrict the domain to n ≥ 2. I will discuss this in detail below, for now here is my implementation of minFac' (on live.lean-lang.org)

import Mathlib

namespace Nat

theorem eq_or_succ_le_of_le {a b : ℕ} (h : a ≤ b) : a = b ∨ Nat.succ a ≤ b := by
  cases Nat.eq_or_lt_of_le h with
  | inl h_eq => left; assumption
  | inr h_lt => right; assumption

structure minFacProp' (n : ℕ) (result : ℕ) : Prop where
  intro ::
  ge_two : 2 ≤ result
  fac : result ∣ n
  min : ∀ m : ℕ, 2 ≤ m → m < result → ¬ m ∣ n

def minFacAux' (n : {n : ℕ // 2 ≤ n ∧ ¬ 2 ∣ n}) :
    {k : ℕ // (3 ≤ k ∧ 2 ∣ k.succ) ∧ ∀ m : ℕ, 2 ≤ m → m < k → ¬ m ∣ n} →
    {m : ℕ // minFacProp' n m}
  | k =>
    if h : ¬ k.val * k.val ≤ n.val  then
      Subtype.mk n <| by
        apply minFacProp'.intro
        · exact n.property.left
        · simp[dvd_refl]
        · have n_prime : Prime n := by
            have sqrt_lt_k : sqrt n.val < k.val := not_le.mp <| (Iff.not Nat.le_sqrt).mpr h
            apply (and_imp.mp Nat.prime_def_le_sqrt.mpr) n.property.left
            intros m m_ge_2 m_le_sqrt
            exact k.property.right m m_ge_2 <| lt_of_le_of_lt m_le_sqrt sqrt_lt_k
          exact Nat.prime_def_lt'.mp n_prime |>.right
    else
      if k_dvd_n : k.val ∣ n then
        Subtype.mk k <| by
          apply minFacProp'.intro
          · have : 2 ≤ 3 := by decide
            have := le_trans this k.property.left.left
            assumption
          · exact k_dvd_n
          · exact k.property.right
      else
        -- used for termination
        have : Nat.sqrt n - k < Nat.sqrt n + 2 - k := by
          have sqrt_lt_sqrt_add2 : sqrt ↑n < sqrt ↑n + 2 := by simp only [lt_add_iff_pos_right]
          have k_le_sqrt := Nat.le_sqrt.mpr <| of_not_not h
          exact (tsub_lt_tsub_iff_right k_le_sqrt).mpr sqrt_lt_sqrt_add2

        let k' := Subtype.mk (k.val + 2) <| by
          apply And.intro
          · apply And.intro
            · have k_le_k_plus_2 : k.val ≤ k.val + 2 := by simp only [le_add_iff_nonneg_right]
              exact (le_trans k.property.left.left k_le_k_plus_2)
            · exact Nat.dvd_add_self_right.mpr k.property.left.right
          · intros m m_ge_2 m_lt_k_plus_2
            have := k.property.right m m_ge_2
            by_cases m < k
            · exact k.property.right m m_ge_2 h
            · cases eq_or_succ_le_of_le <| le_of_not_gt h with
              | inl k_eq_m =>
                rw [k_eq_m] at k_dvd_n
                exact k_dvd_n
              | inr succ_k_le_m =>
                cases eq_or_succ_le_of_le succ_k_le_m with
                | inl succ_k_eq_m =>
                  rw [←succ_k_eq_m]
                  have := n.property.right
                  by_contra m_dvd_n
                  have := Nat.dvd_trans k.property.left.right m_dvd_n
                  contradiction
                | inr succ_succ_k_le_m =>
                  have := lt_of_le_of_lt succ_succ_k_le_m m_lt_k_plus_2
                  simp only [lt_self_iff_false] at this

        minFacAux' n k'
termination_by _ n k => sqrt n + 2 - k

def minFac' (n : {n : ℕ // 2 ≤ n}) : {m : ℕ // minFacProp' n m} :=
  if _2_dvd_n : 2 ∣ n.val then
    Subtype.mk 2 <| by
      apply minFacProp'.intro
      · decide
      · exact _2_dvd_n
      · intros m _2_le_m m_lt_2
        have := lt_of_le_of_lt _2_le_m m_lt_2
        contradiction
  else
    let n' := Subtype.mk n.val <| And.intro n.property _2_dvd_n
    let k' := Subtype.mk 3 <| by
      simp only [Nat.isUnit_iff, true_and]
      intros m _2_le_m m_lt_3
      have m_eq_2 := eq_of_le_of_lt_succ _2_le_m m_lt_3
      rw [m_eq_2]
      exact _2_dvd_n
    minFacAux' n' k'

end Nat

The key feature of this code is that whenever I compute a result I use Subtype.mk to also supply a proof that the value has some desired property. Most importantly, I use this pattern for all result values. Conversely, I can use .property to access the defining property of such a value in the proof parts.

Why I like this style

One thing I have struggled with when trying to proof theorems about functions, is that the proof needs to replicate the structure of the function. This is also apparent in the original proof of minFacAux_has_prop. Since minFacAux uses if h : n < k * k then n, minFacAux_has_prop has to use by_cases h : n < k * k, and later if k ∣ n then k necessitates by_cases dk : k ∣ n in the proof. In the Subtype based style there is no duplication of structure, the computation of a value and the proof of it’s property automatically happens in the same context. Similarly, because minFacAux recursively calls minFacAux n (k + 2), minFac_has_prop has to invoke minFacAux_has_prop, with some rather complex arguments, to proof that the result has the desired property, while with the Subtype style the proof of this property is automatically attached to the value.

Contract programming

Functions using Subtype for their arguments and return value remind me of contract programming. The Subtype predicates of arguments represent preconditions, while the Subtype predicate of the return type represents postconditions. From this perspective Lean 4 offers the ability to use contract programming with compile time checking of contracts, which makes Lean 4 the most exciting language I have ever looked at.

Additional burden

The section on subtypes in Functional Programming in Lean, which this is all based on includes a warning:

Subtypes are a two-edged sword. They allow efficient representation of validation rules, but they transfer the burden of maintaining these rules to the users of the library, who have to prove that they are not violating important invariants. Generally, it’s a good idea to use them internally to a library, providing an API to users that automatically ensures that all invariants are satisfied, with any necessary proofs being internal to the library.

Since I am promoting the the use of Subtype I should address this warning. First, this doesn’t apply to the return type, the maximal burden a user might incur from a return type using Subtype, is having to type .val in case the values isn’t automatically coerced to the underlying type. Second, there exists a principal in contract programming, that the preconditions should be as weak as possible, which applies here as well. If at all reasonable, arguments shouldn’t use Subtype, and if they do these should be simple properties which are easy to proof. Finally, if preconditions cannot be reasonably weakened any further, putting that burden on the user when invoking the function is justified, as I will show in the next section.

Restricting the domain (`2 ≤ n`)

One big difference with my reimplementation is that I restrict the domain of minFac' to 2 ≤ n, while the standard implementation minFac in Lean 4 accepts both n = 0 and n = 1. This is an additional precondition, which the user of minFac' has to proof before calling this function. Before I justify this choice, let me first address why Lean 4 usually doesn’t make such restrictions. I believe the blog post “Division by zero in type theory: a FAQ” by Kevin Buzzard perfectly explains why functions in Lean 4 have a very broad domain, even including nonsensical input values, like the prime example 1/0 = 0. Trying to summarize it in one sentence, I would say: it doesn’t matter that such functions return some value for nonsensical inputs, as long as these inputs don’t show up in the theorems. This is perfectly valid and justified, as long as you are only interested in the theorems. But, if like I, you are interested in Lean as a programming language, this can be problematic. If I write a/b somewhere in my program, I probably reason, that b can never be 0. If my reasoning is flawed, and b ends up being 0 after all, I would rather have the program throw an error, so I can notice and fix it, rather than silently produce some nonsensical result. The solution within Lean 4 is of course to proof the reasoning to be correct, so I can be sure b never is 0. In the standard style this is done in a separate theorem, but as explained above this requires the proof of that theorem to replicate the structure of the function, and I find it much easier to proof the correctness of the code when the function is invoked. If I want to use such a function, but find that proofing its correct use isn’t worth the effort, I can always use sorry.

Summary and next steps

I think the Subtype style is very promising for implementing provable contract programming in Lean 4. It’s a possible to use this style to combine code with proofs, and since it doesn’t require to replicate the structure of the function within the proof of the theorem, I find it easier. However there is very little infrastructure in place to make working in this style easier. For example, while Subtype itself isn’t a monad or even a functor, there exists a related structure which is both a functor and monad, which isn’t defined in Lean 4 yet, and which might make working in this style even easier. So I’ve been investigating the possibility of this and related monads to make proving properties of functions easier.

Proofs based on Subtype