EGraphs and Equality Saturation

An EGraph is an efficient data structure for representing congruence relations. EGraphs are data structures originating from theorem provers. Several projects have very recently repurposed EGraphs to implement state-of-the-art, rewrite-driven compiler optimizations and program synthesizers using a technique known as equality saturation. Metatheory.jl provides a general purpose, customizable implementation of EGraphs and equality saturation, inspired from the egg library for Rust. You can read more about the design of the EGraph data structure and equality saturation algorithm in the egg paper.

Let's load Metatheory and the rule library

using Metatheory
using Metatheory.Library

What can I do with EGraphs in Metatheory.jl?

In classical term rewriting, rewrites are typically destructive and forget the matched left-hand side. Therefore, rules are applied in an arbitrary or controlled order - this often results in local minima and looping. For decades, programmers and scientists using term rewriting systems have spent their time trying to find confluent and terminating systems of rules. This requires a lot of effort and time. When studying any computational, mathematical or scientific system governed by equational rules, about non obviously oriented equations, such as (a + b) + c = a + (b + c )?

E-Graphs come to our help. EGraphs are bipartite graphs of ENodes and EClasses: a data structure for efficiently represent and rewrite on many equivalent expressions at the same time. A sort of fast data structure for sets of trees. Subtrees and parents are shared if possible. This makes EGraphs similar to DAGs. Most importantly, with EGraph rewriting you can use bidirectional rewrite rules, such as equalities without worrying about the ordering and confluence of your rewrite system! Therefore, rule application in EGraphs is non-destructive - everything is copied! This allows users to run non-deterministic rewrite systems. Many rules can match at the same time and the previous state of expressions will not be lost.

The EGraph backend for Metatheory.jl allows you to create an EGraph from a starting expression, to add more expressions to the EGraph with addexpr!, and then to effectively fill the EGraph with all possible equivalent expressions resulting from applying rewrite rules from a theory, by using the saturate! function. You can then easily extract expressions from an e-graph by calling extract! with a cost function.

A killer feature of egg and Metatheory.jl are EGraph Analyses. They allow you to annotate expressions and equivalence classes in an EGraph with values from a semilattice domain, and then to:

  • Automatically extract optimal expressions from an EGraph deciding from analysis data.
  • Have conditional rules that are executed if some criteria is met on analysis data
  • Have dynamic rules that compute the right hand side based on analysis data.

Library

The Metatheory.Library module contains utility functions and macros for creating rules and theories from commonly used algebraic structures and properties, to be used with the e-graph backend.

comm_monoid = @commutative_monoid (*) 1

# output

4-element Vector{RewriteRule}:
 ~a * ~b --> ~b * ~a
 (~a * ~b) * ~c --> ~a * (~b * ~c)
 ~a * (~b * ~c) --> (~a * ~b) * ~c
 1 * ~a --> ~a

Theories and Algebraic Structures

The e-graphs backend can directly handle associativity, equalities commutativity and distributivity, rules that are otherwise known of causing loops and require extensive user reasoning in classical rewriting.

t = @theory a b c begin
    a * b == b * a
    a * 1 == a
    a * (b * c) == (a * b) * c
end

# output

3-element Vector{EqualityRule}:
 ~a * ~b == ~b * ~a
 ~a * 1 == ~a
 ~a * (~b * ~c) == (~a * ~b) * ~c

Equality Saturation

We can programmatically build and saturate an EGraph. The function saturate! takes an EGraph and a theory, and executes equality saturation. Returns a report of the equality saturation process. saturate! is configurable, customizable parameters include a timeout on the number of iterations, a eclasslimit on the number of e-classes in the EGraph, a stopwhen functions that stops saturation when it evaluates to true.

g = EGraph(:((a * b) * (1 * (b + c))));
report = saturate!(g, t);

With the EGraph equality saturation backend, Metatheory.jl can prove simple equalities very efficiently. The @areequal macro takes a theory and some expressions and returns true iff the expressions are equal according to the theory. The following example may return true with an appropriate example theory.

julia> @areequal some_theory (x+y)*(a+b) ((a*(x+y))+b*(x+y)) ((x*(a+b))+y*(a+b)) 

Configurable Parameters

EGraphs.saturate! can accept an additional parameter of type EGraphs.SaturationParams to configure the equality saturation algorithm. Extensive documentation for the configurable parameters is available in the EGraphs.SaturationParams API docstring.

# create the saturation params
params = SaturationParams(timeout=10, eclasslimit=4000)
saturate!(egraph, theory, params)

Outline of the Equality Saturation Algorithm

The saturate! function behaves as following. Given a starting e-graph g, a set of rewrite rules t and some parameters p (including an iteration limit n):

  • For each rule in t, search through the e-graph for l.h.s.
  • For each match produced, apply the rewrite
  • Do a bottom-up traversal of the e-graph to rebuild the congruence closure
  • If the e-graph hasn’t changed from last iteration, it has saturated. If so, halt saturation.
  • Loop at most n times.

Note that knowing if an expression with a set of rules saturates an e-graph or never terminates is still an open research problem

Extracting from an EGraph

Since e-graphs non-deterministically represent many equivalent symbolic terms, extracting an expression from an EGraph is the process of selecting and extracting a single symbolic expression from the set of all the possible expressions contained in the EGraph. Extraction is done through the extract! function, and the theoretical background behind this procedure is an EGraph Analysis; A cost function is provided as a parameter to the extract! function. This cost function will examine mostly every e-node in the e-graph and will determine which e-nodes will be chosen from each e-class through an automated, recursive algorithm.

Metatheory.jl already provides some simple cost functions, such as astsize, which expresses preference for the smallest expressions contained in equivalence classes.

Here's an example Given the theory:

using Metatheory
using Metatheory.Library

comm_monoid = @commutative_monoid (*) 1;
t = @theory a b c begin
    a + 0 --> a
    a + b --> b + a
    a + inv(a) --> 0 # inverse
    a + (b + c) --> (a + b) + c
	a * (b + c) --> (a * b) + (a * c)
	(a * b) + (a * c) --> a * (b + c)
	a * a --> a^2
	a --> a^1
	a^b * a^c --> a^(b+c)
	log(a^b) --> b * log(a)
	log(a * b) --> log(a) + log(b)
	log(1) --> 0
	log(:e) --> 1
	:e^(log(a)) --> a
	a::Number + b::Number => a + b
	a::Number * b::Number => a * b
end
t = comm_monoid ∪ t ;

We can extract an expression by using

expr = :((log(e) * log(e)) * (log(a^3 * a^2)))
g = EGraph(expr)
saturate!(g, t)
ex = extract!(g, astsize)
:(log(a ^ 5))

The second argument to extract! is a cost function. astsize is a cost function provided by default, which computes the size of expressions.

Defining custom cost functions for extraction.

A cost function for EGraph extraction is a function used to determine which e-node will be extracted from an e-class.

It must return a positive, non-complex number value and, must accept 3 arguments.

  1. The current ENode n that is being inspected.
  2. The current EGraph g.
  3. The current analysis name an::Symbol.

From those 3 parameters, one can access all the data needed to compute the cost of an e-node recursively.

  • One can use TermInterface.jl methods to access the operation and child arguments of an e-node: operation(n), arity(n) and arguments(n)
  • Since e-node children always point to e-classes in the same e-graph, one can retrieve the EClass object for each child of the currently visited enode with g[id] for id in arguments(n)
  • One can inspect the analysis data for a given eclass and a given analysis name an, by using hasdata and getdata.
  • Extraction analyses always associate a tuple of 2 values to a single e-class: which e-node is the one that minimizes the cost

and its cost. More details can be found in the egg paper in the Analyses section.

Here's an example:

# This is a cost function that behaves like `astsize` but increments the cost 
# of nodes containing the `^` operation. This results in a tendency to avoid 
# extraction of expressions containing '^'.
function cost_function(n::ENodeTerm, g::EGraph)
    cost = 1 + arity(n)

    operation(n) == :^ && (cost += 2)

    for id in arguments(n)
        eclass = g[id]
        # if the child e-class has not yet been analyzed, return +Inf
        !hasdata(eclass, cost_function) && (cost += Inf; break)
        cost += last(getdata(eclass, cost_function))
    end
    return cost
end

# All literal expressions (e.g `a`, 123, 0.42, "hello") have cost 1
cost_function(n::ENodeLiteral, g::EGraph) = 1

EGraph Analyses

An EGraph Analysis is an efficient and automated way of analyzing all the possible terms contained in an e-graph. Metatheory.jl provides a toolkit to ease and automate the process of EGraph Analysis.

An EGraph Analysis defines a domain of values and associates a value from the domain to each EClass in the graph. Theoretically, the domain should form a join semilattice. Rewrites can cooperate with e-class analyses by depending on analysis facts and adding equivalences that in turn establish additional facts.

In Metatheory.jl, EGraph Analyses are uniquely identified by either

  • An unique name of type Symbol.
  • A function object f, used for cost function analysis. This will use built-in definitions of make and join.

If you are specifying a custom analysis by its Symbol name, the following functions define an interface for analyses based on multiple dispatch on Val{analysis_name::Symbol}:

  • islazy(an) should return true if the analysis name an should NOT be computed on-the-fly during egraphs operation, but only when inspected.
  • make(an, egraph, n) should take an ENode n and return a value from the analysis domain.
  • join(an, x,y) should return the semilattice join of x and y in the analysis domain (e.g. given two analyses value from ENodes in the same EClass, which one should I choose?). If an is a Function, it is treated as a cost function analysis, it is automatically defined to be the minimum analysis value between x and y. Typically, the domain value of cost functions are real numbers, but if you really do want to have your own cost type, make sure that Base.isless is defined.
  • modify!(an, egraph, eclassid) Can be optionally implemented. This can be used modify an EClass egraph[eclassid] on-the-fly during an e-graph saturation iteration, given its analysis value.

Defining a custom analysis

In this example, we will provide a custom analysis that tags each EClass in an EGraph with :even if it contains an even number or with :odd if it represents an odd number, or nothing if it does not contain a number at all. Let's suppose that the language of the symbolic expressions that we are considering will contain only integer numbers, variable symbols and the `and+` operations.*

Since we are in a symbolic computation context, we are not interested in the the actual numeric result of the expressions in the EGraph, but we only care to analyze and identify the symbolic expressions that will result in an even or an odd number.

Defining an EGraph Analysis is similar to the process Mathematical Induction. To define a custom EGraph Analysis, one should start by defining a name of type Symbol that will be used to identify this specific analysis and to dispatch against the required methods.

using Metatheory
using Metatheory.EGraphs

The next step, the base case of induction, is to define a method for make dispatching against our OddEvenAnalysis. First, we want to associate an analysis value only to the literals contained in the EGraph. To do this we take advantage of multiple dispatch against ENodeLiteral.

function EGraphs.make(::Val{:OddEvenAnalysis}, g::EGraph, n::ENodeLiteral)
    if n.value isa Integer
        return iseven(n.value) ? :even : :odd
    else 
        return nothing
    end
end

Now we have to consider the induction step. Knowing that our language contains only * and + operations, and knowing that:

  • odd * odd = odd
  • odd * even = even
  • even * even = even

And we know that

  • odd + odd = even
  • odd + even = odd
  • even + even = even

We can now define a method for make dispatching against OddEvenAnalysis and ENodeTerms to compute the analysis value for nested symbolic terms. We take advantage of the methods in TermInterface to inspect the content of an ENodeTerm. From the definition of an ENode, we know that children of ENodes are always IDs pointing to EClasses in the EGraph.

function EGraphs.make(::Val{:OddEvenAnalysis}, g::EGraph, n::ENodeTerm)
    # Let's consider only binary function call terms.
    if exprhead(n) == :call && arity(n) == 2
        op = operation(n)
        # Get the left and right child eclasses
        child_eclasses = arguments(n)
        l = g[child_eclasses[1]]
        r = g[child_eclasses[2]]

        # Get the corresponding OddEvenAnalysis value of the children
        # defaulting to nothing 
        ldata = getdata(l, :OddEvenAnalysis, nothing)
        rdata = getdata(r, :OddEvenAnalysis, nothing)

        if ldata isa Symbol && rdata isa Symbol
            if op == :*
                if ldata == rdata
                    ldata
                elseif (ldata == :even || rdata == :even) 
                    :even
                else
                    nothing 
                end
            elseif op == :+
                (ldata == rdata) ? :even : :odd
            end
        elseif isnothing(ldata) && rdata isa Symbol && op == :*
            rdata
        elseif ldata isa Symbol && isnothing(rdata) && op == :*
            ldata
        end
    end

    return nothing
end

We have now defined a way of tagging each ENode in the EGraph with :odd or :even, reasoning inductively on the analyses values. The analyze! function will do the dirty job of doing a recursive walk over the EGraph. The missing piece, is now telling Metatheory.jl how to merge together analysis values. Since EClasses represent many equal ENodes, we have to inform the automated analysis how to extract a single value out of the many analyses values contained in an EGraph. We do this by defining a method for join.

function EGraphs.join(::Val{:OddEvenAnalysis}, a, b)
    if a == b 
        return a 
    else
        # an expression cannot be odd and even at the same time!
        # this is contradictory, so we ignore the analysis value
        return nothing 
    end
end

We do not care to modify the content of EClasses in consequence of our analysis. Therefore, we can skip the definition of modify!. We are now ready to test our analysis.

t = @theory a b c begin 
    a * (b * c) == (a * b) * c
    a + (b + c) == (a + b) + c
    a * b == b * a
    a + b == b + a
    a * (b + c) == (a * b) + (a * c)
end

function custom_analysis(expr)
    g = EGraph(expr)
    saturate!(g, t)
    analyze!(g, OddEvenAnalysis)
    return getdata(g[g.root], OddEvenAnalysis)
end

custom_analysis(:(2*a)) # :even
custom_analysis(:(3*3)) # :odd
custom_analysis(:(3*(2+a)*2)) # :even
custom_analysis(:(3y * (2x*y))) # :even