FreshLists: Containers That Only Accept "Fresh" Elements
When programming, we quite frequently encounter the need for a data
structure that can contain at most one of each given element. Typically,
we might use a Set
, which satisfies this constraint, but
does so as a runtime invariant, which must be taken on trust, and
results in ergonomic concerns when used as a component of API design. We
introduce "Fresh Lists" as an alternative that provide a type-level
guarantee of the uniqueness of each element, while also generalizing the
notion of uniqueness and providing compile-time checking of
literals.
As best as I can tell, the concept of the FreshList
originated with Catarina Coquand's A Formalised Proof of the
Soundness and Completeness of a Simply Typed Lambda-Calculus with
Explicit Substitutions paper, and acquired the name 'FreshList'
when they were modified for inclusion in Agda's
standard library. The implementation in this post is inspired both
by the Agda implementation, and by the implementation in collie.
The Problems With Sets
Let's define a little example function that consumes a
SortedSet
to demonstrate my complaints with:
takesSortedSet : SortedSet a -> Bool
takesSortedSet _ = True
Set
's
Constraints on the Element Type
In general, Set
s require that the element type is either
hash-able or order-able, Idris at the moment only provides a
SortedSet
, requiring the element implement
Ord
.
While many types that we might want to apply the uniqueness constraint to have natural and reasonable orderings, like integers or strings, applying such a constraint isn't always desirable.
As an example, I've found myself implementing contrived orderings based on an internal integer identifier that was supposed to be completely opaque for opaque token types in other languages.
The constraint for a HashSet
is less annoying, but it's
still a bit onerous to require that a type be hashable just to guarantee
that a collection contains at most one of each given item.
Unchecked Literals
In most languages, including Idris, there isn't a literal for
Set
types, instead you have to call some conversion
function on a list or array literal to specify one at compile time.
This works ok, wrapped literals do, in fact, work when
passed to functions consuming Set
s:
takesSortedSet $ fromList [1, 2, 3, 4, 5]
However, this provides no compile time checking that the elements in the input list are actually unique, the programmer could very well accidentally duplicate a value without any warning from the compiler:
takesSortedSet $ fromList [1, 1, 2, 3, 4, 5]
This can be especially problematic when the literal contains variables, when the construction of the values incurs side effects, or when the values are complex and only some of the contained data is used in the equality check.
Sets only provide uniqueness over equality
While the need for a collection of elements to be unique specifically under equality is really common, its often quite useful to work with a more general notion of uniqueness.
As an example, consider a collection grouping updates to a database
table. It might be useful for the collection to enforce that no two of
the updates touch the same rows, maybe to guarantee they can all easily
be run in parallel. This comparison can be thought of as of equality, it
is an equivalence relation1, which is a sort
of equality, but it would be spectacularly weird to
express this through an implementation of a languages Eq
interface, and any ordering derived from this would be quite
contrived.
Lifting Arbitrary Runtime Checks to the Type Level
While we could build our structure on top of type-level witnesses,
such as Not (x = y)
, proving, structurally, that each pair
of values is not equal, this would be needlessly restrictive, not every
criteria of interest is easily expressible as a type-level
predicate.
Instead, we will use Data.So
's So
type to
lift an arbitrary user-defined runtime predicate to the type level.
So
is defined like so:
Note
The apostrophes have been added to avoid conflict with the standard library data type
data So' : Bool -> Type where
Oh' : So' True
This may seem a little bit confusing if you aren't used to dependent types, don't worry if it doesn't sit right at first, it'll start to make sense as we use it.
The So
type constructor takes an argument of type
Bool
, and its sole data constructor, Oh
, can
only produce values of So
where the value of the argument
to the type constructor is True
. This isn't super
useful on its own, but grows wings when we leverage the type checker's
symbolic evaluation capability.
Lets define a silly little runtime check to write some examples with:
isEven : (Integral ty, Eq ty) => ty -> Bool
isEven x = (x `mod` 2) == 0
We can then call this predicate in So
's type
constructor, for example, to construct a function that only accepts even
numbers:
onlyEven : (Integral ty, Eq ty) => (x : ty) -> {auto prf: So (isEven x)} -> Bool
onlyEven x = True
Since isEven
has type ty -> Bool
, the
statement isEven x
has type Bool
, and is thus
allowed as an argument to So
's type constructor. Thanks to
the type checker's symbolic evaluation capability,
So (isEven x)
is a type that we can only construct a value
of if isEven x
returns True
.
Notice that the prf
argument is declared as an
auto-implicit, the caller doesn't need to explicitly provide that
argument, the compiler will try to locate a suitable value from the
environment, or construct one. So
normally plays quite well
with proof search for reasonable predicates, letting us pretend the
argument doesn't exist when working with literals, so long as the
predicate is satisfied.
This function, as expected, accepts even numbers:
Tip
Since we are passing a literal here, the compiler automatically
generates an appropriate value of the prf
argument to
onlyEven
through proof search
onlyEven 4
It also provides a compile time error if you try to pass it an odd literal:
failing
oopsNotEven : Bool
oopsNotEven = onlyEven 3
The type checker's symbolic evaluation gives us a short cut for
constructing values of So
when the predicate is "obviously"
true2:
sixIsEven : So (isEven 6)
sixIsEven = Oh
In this example, isEven 6
gets symbolically expanded and
evaluated to True
during type checking, allowing us to
return Oh
.
For more complicated predicates, or values constructed at run time,
the choose
function is provided. choose
will
evaluate the given predicate and provide a witness of either its
truthiness or its falseiness, defined as so:
choose' : (b : Bool) -> Either (So b) (So (not b))
choose' True = Left Oh
choose' False = Right Oh
To provide an example of its use, let's write an overly complicated function to check if a number is even:
complicatedIsEven : (Integral ty, Eq ty) => ty -> Bool
complicatedIsEven x =
case choose (isEven x) of
Left y => onlyEven x
Right y => False
choose
returns a Left
in the event the
predicate evaluates to True
, by pattern matching on the
result of choose
, we can extract the
So (isEven x)
in the Left
branch, which then
allows us to pass x
to onlyEven
, with the
compiler able to detect that y
has the same type as the
prf
auto-implicit argument and filling in y
for prf
for us.
Introducing The FreshList
Freshness
As we want to generalize the notion of equality, we ought to give our generalized concept a name.
We refer to a general predicate of type
pred : a -> a -> Bool
as a "freshness" criteria, and
say that an element x
is "fresh" relative to an element
y
when pred x y
evaluates to
True
. An element x
is "fresh" relative to a
list xs
when it is independently "fresh" relative to every
element in xs
.
To recover Set
like behavior, we can use non-equality
(/=
) as the "freshness" criteria, stating that an element
is "fresh" when it is not-equal to every element already in the
list.
When an equivalence relation3 is used as the
"freshness" criteria, the FreshList
will behave roughly
like a Set
, but with a potentially non-standard definition
of equality, instead allowing at most one element from each equivalence
class. "Freshness", however, generalizes beyond equivalence relations,
and we can get some really interesting and useful behavior by using a
non-equivalence relation as the "freshness" criteria, we will cover some
examples later in the article.
Forward declarations
Our definitions are about to get mutually recursive, and since Idris is a strictly define-before-use language, we'll forward declare the types to enable that mutual recursion. Don't worry, we'll cover what each definition means in detail as we fill them in.
We'll also go ahead and hide a built-in operator to avoid conflicts, and declare the operators we will be defining.
%hide Builtin.infixr.(#)
export infix 4 #, ##, #?
public export
data FreshList : (fresh : e -> e -> Bool) -> (e : Type) -> Type
||| Check if an element is "fresh" relative to all of the elements in a given freshlist
public export
(##) : {fresh : e -> e -> Bool} -> (x : e) -> (xs : FreshList fresh e) -> Bool
||| Lift `##` to the type level with `So`
public export
(#) : {fresh : e -> e -> Bool} -> (x : e) -> (xs : FreshList fresh e) -> Type
FreshList proper
Our operators need to be able to destructure FreshList
s
for us to be able to define them, so we'll define FreshList
first:
Tip
In Idris, list literals, like [1, 2, 3]
, are pure syntax
sugar. List literals can be used for any type with a ::
constructor taking exactly 2 explicit arguments and a Nil
constructor taking exactly 0 explicit arguments.
public export
data FreshList : (fresh : e -> e -> Bool) -> (e : Type) -> Type where
Nil : FreshList fresh e
(::) : (x : e) -> (xs : FreshList fresh e) -> {auto 0 freshness : x # xs}
-> FreshList fresh e
%name FreshList xs, ys, zs
The freshness criteria that applies to this particular
FreshList
is taken as the first argument to the
FreshList
type constructor, and the element type as the
second argument. This order is chosen to allow implementation of the
standard collection interfaces like Foldable
for
FreshList
.
A value of type FreshList (/=) Nat
would be a list of
Nat
s with the Set
-like behavior of requiring
that each Nat
be non-equal to every other Nat
in the list.
The Nil
constructor for FreshList
is pretty
straight forward, nearly identical to List
's
Nil
constructor.
The ::
(pronounced 'cons') operator is also very similar
to List
's, but "enhanced" with an erased auto implicit
argument, applying our #
operator to the new element
x
and the rest of the list (xs
). We haven't
implemented #
yet, but it's going to be a short hand for
checking that the new element is fresh relative to all of the elements
already in the list.
The erasure (the 0
after the auto
) provides
an interesting bit of semantics, it restricts the freshness
value (which serves as proof that x
is fresh relative to
the rest of the list) such that it is only allowed to exist at compile
time. While we still have to go through the motions of producing the
freshness
value, even if it's handled for us by the
compiler through proof search, but the freshness
value
doesn't exist at runtime. Appending to a FreshList
is thus
a linear time (O(n)
) operation, but the runtime
representation is exactly that of List
(in fact, Idris has
an optimization that should result in conversions from
FreshList
to List
being optimized out), with
no space overhead from the freshness
proof.
The result, being unable to produce a value of a type without going through the motions of making sure its invariants are upheld, but not paying the runtime cost of actually storing the witness, is a very common pattern in Idris 2.
Our Operators
First, we'll tackle the ##
operator, which as a
refresher, has type
(##) : {fresh : e -> e -> Bool} -> (x : e) -> (xs : FreshList fresh e) -> Bool
:
x ## [] = True
x ## (y :: xs) = (x `fresh` y) && (x ## xs)
This operator takes our freshness predicate as an implicit argument,
named fresh
, and recursively checks that an element is
fresh relative to every element in a given FreshList
, in
effect "lifting" our runtime freshness check over two elements into a
runtime freshness check over an element and an entire list.
While this provides us our whole-list runtime freshness check, we
still need to lift this into a type level witness, which the
#
operator, with type
(#) : {fresh : e -> e -> Bool} -> (x : e) -> (xs : FreshList fresh e) -> Type
,
does by dressing our ##
operator in So
clothing:
x # xs = So (x ## xs)
Some operations over FreshLists
We can define all of our usual list operations over
FreshList
s, but most of them are going to be just about
identical to the List
equivalent, so we'll just cover some
interesting ones. This post will be referenced in future posts, and this
section will grow as new operations are required.
We can't unconditionally append elements to a FreshList
,
they must be fresh after all, and it can be a bit onerous to have to do
the freshness check ourself every time, so we'll define a
maybeAppend
function that runs the freshness check,
returning a Just
if the new element is, in fact, fresh, and
Nothing
otherwise:
maybeAppend : {fresh : e -> e -> Bool}
-> (x : e) -> (xs : FreshList fresh e) -> Maybe (FreshList fresh e)
maybeAppend x xs =
case choose (x ## xs) of
Left fresh => Just (x :: xs)
Right not_fresh => Nothing
Basic Usage of FreshLists
Like Set
s, but
with checked literals
To make things a little cleaner, we'll define an alias for
/=
restricted to Nat
s. Using the
/=
operator directly is possible, but requires a little bit
of mess, the type of such a FreshList
needs to be written
like FreshList ((/=) {ty = Nat}) Nat
, specifying the type
that /=
applies to, so to avoid the clutter:
Neq : Nat -> Nat -> Bool
Neq = (/=)
As previously discussed, a FreshList
with non-equality
as the freshness criteria behaves much like a set, containing at most
one of each value. Unlike a set, however, this is ensured at the type
level.
Lets define a silly lil function that accepts a
FreshList
of unique numbers to use for some examples:
Note
Notice how this definition does not impose any constraints upon the
element type (ignoring that we have explicitly specified it to be
Nat
). While our definition of Neq
implicitly
uses Nat
's Eq
implementation, this is entirely
a result of our own choices as a consumer of FreshList
.
takesUniqueNumbers : FreshList Neq Nat -> Bool
takesUniqueNumbers xs = True
As expected, this function happily accepts a literal
FreshList
containing only unique numbers:
takesUniqueNumbers [1, 2, 3, 4, 5]
Notice how, thanks to proof search and literal overloading, we don't
have to apply a conversion function to the list-style literal here, the
literal directly produces a FreshList
carrying the
associated type-level guarantees.
Additionally, as an upgrade to SortedSet
, at least for
this use-case, attempts to define a literal with duplicate elements are
rejected by the compiler:
failing
oopsNotUnique : Bool
oopsNotUnique = takesUniqueNumbers [1, 2, 2, 3, 4, 5]
Like
Set
s, but with other equivalence relations
We can construct another freshness criteria that compares for
non-equality modulo some number, a fairly straight forward example of an
equivalence class that isn't equality and would be profoundly
weird to express as an implementation of the language's Eq
interface:
Note
We have to use modNatNZ
here instead of just
mod
to allow the compiler to proof search the freshness
witnesses for us.
NeqMod10 : Nat -> Nat -> Bool
NeqMod10 k j =
(modNatNZ k 10 ItIsSucc) /= (modNatNZ j 10 ItIsSucc)
We can now use this relationship to define a function that, for
example, accepts a list of Nat
s that are non-equal modulo
10, corresponding to the requirement that every number's last digit be
different:
differentLastDigits : FreshList NeqMod10 Nat -> Bool
differentLastDigits xs = True
This function happily accepts a list where every element's last digit is different:
differentLastDigits [23, 12, 41, 56]
And also fails with a compile time error if we try to provide it a list where two numbers have the same last digit:
failing
oopsSameLastDigit : Bool
oopsSameLastDigit = differentLastDigits [23, 11, 41, 56]
With a non-equivalence relation
Our freshness criteria doesn't have to be an equivalence
relationship, if we instead use, say, <
as our freshness
criteria, we end up enforcing not only uniqueness, but that the list be
in sorted order. As before, we'll define an alias to clean up the
following code:
Lt : Nat -> Nat -> Bool
Lt = (<)
We can then define a silly lil boy accepting lists of numbers sorted in ascending order:
acceptsAscending : FreshList Lt Nat -> Bool
acceptsAscending xs = True
As expected, this happily accepts lists of unique numbers defined in ascending order:
acceptsAscending [1, 2, 3, 4, 5]
But rejects lists containing repeated elements:
failing
oopsRepeated : Bool
oopsRepeated = acceptsAscending [1, 1, 3, 4, 5]
And also rejects lists with elements that are unique, but defined out of order:
failing
oopsOutOfOrder : Bool
oopsOutOfOrder = acceptsAscending [2, 1, 3, 4, 5]
More Complicated Use of FreshList
While FreshList
s are useful in many contexts, in my
opinion, they really shine as a tool for building "defensive" APIs.
Color descriptions
Imagine we are designing an API for building a color pallet, for some imaginary digital art application. Such an API might want to accept a list of pairs of color names and descriptions of those colors:
record ColorDesc where
constructor MkCD
name : String
desc : String
Now, it wouldn't make much sense for each color to have more than one description, and we don't want our programmer to accidentally write out two descriptions for the same color, only for one of them to be silently discarded.
Let's build a freshness criteria to encapsulate this expectation. We
don't particularly care if the descriptions are the same, but if two
colors are the same, we have a problem, so we'll check for non-equality
of the name
field of our ColorDesc
record:
FreshColor : ColorDesc -> ColorDesc -> Bool
FreshColor x y = x.name /= y.name
We can now define a function that accepts a color pallet based on this freshness criteria:
acceptPallet : FreshList FreshColor ColorDesc -> Bool
acceptPallet xs = True
If the programmer is mindful, and only defines a description for each color once, this function will accept their pallet:
acceptPallet
[ MkCD {name = "blue", desc = "the color of the sky"}
, MkCD {name = "red", desc = "the color of very hot things"}
, MkCD {name = "green", desc = "the color of leaves"}
, MkCD {name = "brown", desc = "the color of dirt"}
]
However, if our programmer slips up and defines a description for the same color twice, the compiler rightfully doesn't accept their absent minded behavior, and fails with a compile time error:
failing
oopsSameColorAgain : Bool
oopsSameColorAgain =
[ MkCD {name = "blue", desc = "the color of the sky"}
, MkCD {name = "red", desc = "the color of very hot things"}
, MkCD {name = "green", desc = "the color of leaves"}
, MkCD {name = "brown", desc = "the color of dirt"}
, MkCD {name = "red", desc = "the color of blood"}
]
Real World Examples of FreshLists in Action
collie
collie, a command line
arguments handling library for Idris, makes use of
FreshList
s to ensure that the programmer doesn't define the
same argument or sub-program twice.
Idris 2 by Highly Contrived Example
My work in progress book, Idris
2 by Highly Contrived Example, which is structured as a
day-by-day work-through of the Advent of Code problem backlog, uses
FreshList
s for the runner
api, prevented a given day from being declared twice, and also
enforcing that they are declared in sorted order as a cheeky little bit
of type-level style enforcement.
The exact conditions under which this will work are beyond the scope of this article↩︎