Type Safe Variadic printf
While C can provide "convenient" string formatting by having
hideously unsafe variadics, and dynamic languages, like python, can do
the same, many type safe languages, such as Rust, are forced to provide
such functionality through the use of a macro. Dependently typed
languages, like Idris, can provide a printf
like formatting
interface, while maintaining both memory and type safety, without the
need for macros. We will explore this by implementing a simplified
version of printf
in Idris from scratch.
This article is inspired by an exercise from chapter 6 of Type Driven Development with Idris, and is written as a literate Idris file, with the source available here.
Game plan
Our goal is to provide a printf
function that can be
called much like it's C equivalent:
Note
As this is a literate Idris document, and we haven't defined our
printf
function yet, we have to use a failing
block to ask the compiler to check that this code parses, and syntax
highlight it for us, but not attempt to actually compile it.
failing
example : String
example = printf "%s %d %2d" "hello" 1 2
Idris lacks a dedicated facility for variadics, but we can call functions in type signatures, which gives us an in for implementing them.
Idris allows us to manipulate types as first class values, and we can use the runtime values of previous arguments to the function we are declaring as arguments to later type-level functions. These type-level functions can, furthermore, recurse into themselves to produce type signatures of varying length and content.
Tip
Idris requires that these type level functions be total for them to be expanded at compile time, meaning they are known to the compiler to return a value in finite time for all possible inputs.
This allows us to produce variadic functions, so long as the number of arguments and their types can be determined from one of the arguments of the function before the variadic portion.
To get our variadic printf
function, we can parse our
format string into a data structure, then pass that data structure into
a type-level function that calculates the rest of the type signature of
our printf
function based on contents of the format
string.
Parsing a Format String
First, we need a data structure to describe our format string. We
define the Format
data type, with constructors for each of
the format specifiers we will be supporting, as well as a constructor to
hold literal components.
data Format : Type where
||| A slot that should be filled in with a number
Number : (next : Format) -> Format
||| A slot that should be filled in with a number, padded to a certain number
||| of digits
PaddedNumber : (digits : Nat) -> (next : Format) -> Format
||| A slot that should be filled in with a string
Str : (next : Format) -> Format
||| A literal component of the format string that should not be interpolated
Literal : (literal : String) -> (next : Format) -> Format
||| The end of the format string
End : Format
We'll need to be able to parse numbers for our
PaddedNumber
constructor, so we'll write a little helper
function to handle that component of parsing. We'll simply keep pulling
off characters, converting them to integers by manipulating their
ordinal values, until we hit a non-digit.
Warning
On Idris 2's chez scheme backend, these are unicode ordinals, where the digit characters are in numerical order, but the values are backend dependent and this code is not guaranteed to work properly on other backends.
parseNumber : (xs : List Char) -> (acc : Nat) -> (Nat, List Char)
parseNumber [] acc = (acc, [])
parseNumber (x :: xs) acc =
if isDigit x
then let value = cast $ (ord x) - (ord '0')
in parseNumber xs (acc * 10 + value)
else (acc, x :: xs)
We'll also want another one to scoop up a literal into a
List Char
, continuing until we hit the end of the string or
the next %
. This is optional, we could have each literal
consist of an individual char, but we will go ahead and group them
together.
parseLiteral : (xs : List Char) -> (List Char, List Char)
parseLiteral [] = ([], [])
parseLiteral ('%' :: xs) = ([], '%' :: xs)
parseLiteral (x :: xs) =
let (literal, rest) = parseLiteral xs
in (x :: literal, rest)
Next, parse our format string into an instance of the
Format
data structure. The specifics of the parsing aren't
really material to the main point of this article, but we use a basic
pattern matching approach, calling into our helper functions as
appropriate.
We return a Maybe Format
to signal failure in the case
of an invalid format string, and trigger an error when the programmer
attempts to use an invalid format string.
parseFormat : (xs : List Char) -> Maybe Format
parseFormat [] = Just End
-- A `%` has to come before a specifier
parseFormat ('%' :: []) = Nothing
parseFormat ('%' :: (x :: xs)) =
if isDigit x
-- Invoke parseNumber to get our padding specifier
then let (digits, rest) = parseNumber (x :: xs) 0
in case rest of
-- A padding specifier has to come before something
[] => Nothing
('d' :: ys) => do
rest <- parseFormat ys
Just $ PaddedNumber digits rest
-- A padding specifier is only valid before a number specifier
(y :: ys) => Nothing
-- Parse as an unpadded specifier
else case x of
'd' => do
rest <- parseFormat xs
Just $ Number rest
's' => do
rest <- parseFormat xs
Just $ Str rest
-- Any other character here is an invalid specifier
_ => Nothing
parseFormat (x :: xs) =
let (literal, rest) = parseLiteral (x :: xs)
in do
rest <- parseFormat rest
Just $ Literal (pack literal) rest
Calculating a Type From a Format String
Since our parser returns a Maybe Format
, we will define
two variants of our PrintfType
type-level function. One
that handles the inner Format
, and another that handles a
Maybe Format
, returning a type signature that is impossible
to satisfy in the Nothing
case, and calling our first
variant in the Just
case.
The structure of this type-level function is rather simple, we just
pattern match on our Format
argument, and prepend the
corresponding argument type to the rest of the type signature, computed
through a recursive call on the next
component of the
Format
. When we hit the End
, we return the
String
type, which is the final return value of our
function.
Tip
The ->
that appears in the body of the function is
actually the exact same ->
that appears in the type
signature, in both cases it is the constructor for function types. Idris
uses the same language for type signatures and for function bodies, so
anything you can use in one place, you can use in the other.
PrintfType' : Format -> Type
PrintfType' (Number next) =
(num : Nat) -> PrintfType' next
PrintfType' (PaddedNumber digits next) =
(num : Nat) -> PrintfType' next
PrintfType' (Str next) =
(str : String) -> PrintfType' next
PrintfType' (Literal literal next) =
PrintfType' next
PrintfType' End = String
PrintfType : Maybe Format -> Type
PrintfType Nothing = Void -> String
PrintfType (Just x) = PrintfType' x
printf
With the Format Structure
Now, we need to be able to produce our formatting function from a
Maybe Format
. We'll use the same two variants trick as
above, but put the variant that operates directly on
Format
s in a where
block, since we won't need
to call it from anywhere else.
We build the function argument by argument, pattern matching on the
Format
, producing a lambda that accepts the argument type
demanded by the current specifier. Then recurse into our inner
printfFmt'
function to handle the rest of the specifiers
and arguments, applying the appropriate string conversion to the
argument and tacking it to the end of the accumulator string.
The accumulator string is returned verbatim when we hit
End
.
Note
Much like Haskell and other functional programming languages, Idris, semantically, only actually has functions of one argument. Multi-argument functions actually accept one argument, returning another function that handles the rest of the arguments.
printfFmt : (fmt : Maybe Format) -> (acc : String) -> PrintfType fmt
printfFmt Nothing acc =
\void => absurd void
printfFmt (Just x) acc = printfFmt' x acc
where
printfFmt' : (fmt : Format) -> (acc : String) -> PrintfType' fmt
printfFmt' (Number next) acc =
\i => printfFmt' next (acc ++ show i)
printfFmt' (PaddedNumber digits next) acc =
\i => printfFmt' next (acc ++ left_pad digits '0' (show i))
printfFmt' (Str next) acc =
\str => printfFmt' next (acc ++ str)
printfFmt' (Literal literal next) acc =
printfFmt' next (acc ++ literal)
printfFmt' End acc = acc
With a Format String
printf
is easily defined as a simple wrapper, converting
the provided string to a Format
and passing it to
printfFmt
. After accepting the format string argument, we
parse a Maybe Format
from it, providing the parsed value to
our PrintfType
function to calculate the rest of our type
signature.
Tip
The _
syntax that appears in this function introduces an
anonymous hole. When the value of a hole is forced by type checking
(resolved), the compiler fills in its value for you. Holes defined with
_
will result in an "unsolved holes" compile error if
unresolved.
printf : (fmt : String) -> PrintfType (parseFormat (unpack fmt))
printf fmt = printfFmt _ ""
We can call printf
as expected, with the number of and
types of the arguments being determined by the provided format
string:
printf "%s %s%s %3d %d" "Hello" "world" "!" 1 23 == "Hello world! 001 23"
In fact, our printf
is an improvement over a simple
C-style printf1, it will fail to compile if you
attempt to provide arguments to an invalid format string, which we can
demonstrate by trying to apply a padding modifier to a string
specifier:
Tip
failing
blocks have an additional feature, they will
trigger a compiler error if their contents do compile
successfully, so we can use them to verify that incorrect use of our
function results in a compiler error.
failing
invalidSpecifier : String
invalidSpecifier = printf "Hello %s %3s" "world" "!"
It will also fail to compile if too many arguments are passed (in Idris, it isn't desirable to fail when passed too few arguments, due to partial application being a core idiom of the language), or the arguments are of an incorrect type:
failing
tooManyArguments : String
tooManyArguments = printf "%s %s" "Hello" "World" "!"
failing
wrongArgumentType : String
wrongArgumentType = printf "%s %s" "Hello" 1
Working With Runtime Format Strings
Since we'll be talking about situations where the compiler doesn't have insight into the value of the format string, we'll define a little helper function to hide a value from the type checker.
Note
The evaluator Idris uses for expanding terms at the type level does
not have an IO context available to it, so round tripping through
IO
with unsafePerformIO
results in the wrapped
term being opaque during compile time.
blackbox : a -> a
blackbox x = unsafePerformIO $ pure x
Our printf
function works just fine when provided a
format string that's known at compile time, but as written, it isn't
really usable with format strings that are provided via IO at
runtime.
When the format string isn't known to the compiler at compile time,
it can't expand the PrintfFmt
function, leaving it unable
to compute the rest of the type signature.
failing
runtimeFormatString : String
runtimeFormatString = printf (blackbox "%s %s") "Hello" "World"
We can construct a variant of printf
that will work with
runtime format strings by defining a custom, heterogeneous list-like
structure, and making a fallible version of our printf
function based on this data structure:
data PrintfInput : Type where
INum : (d : Nat) -> (next : PrintfInput) -> PrintfInput
IStr : (s : String) -> (next : PrintfInput) -> PrintfInput
IEnd : PrintfInput
Now we need a printfFmt
analog that works with our
PrintfInput
type. It accepts a
fmt : Maybe Format
, the corresponding function of type
PrintfType fmt
, and the argument inputs as a
PrintfInput
structure. We use the same two-variants trick
as before to handle the Maybe
, and the rest is basic
pattern matching to make sure the type of the input matches up with the
type of the format specifier, and passing that input in as the next
argument to the formatting function.
printfFmtInput :
(fmt : Maybe Format) -> (fn : PrintfType fmt) -> (input : PrintfInput)
-> Maybe String
printfFmtInput Nothing fn input = Nothing
printfFmtInput (Just x) fn input = printfFmtInput' x fn input
where
printfFmtInput' :
(fmt' : Format) -> (fn : PrintfType' fmt') -> (input : PrintfInput)
-> Maybe String
-- Handle cases where the input type matches up with the formatter type
printfFmtInput' (Number next) fn (INum d y) =
printfFmtInput' next (fn d) y
printfFmtInput' (PaddedNumber digits next) fn (INum d y) =
printfFmtInput' next (fn d) y
printfFmtInput' (Str next) fn (IStr s y) =
printfFmtInput' next (fn s) y
printfFmtInput' (Literal literal next) fn input =
printfFmtInput' next fn input
-- Handle cases where the input type does not match up with the formatter type
printfFmtInput' End fn IEnd = Just fn
printfFmtInput' (Number next) fn _ = Nothing
printfFmtInput' (PaddedNumber digits next) fn _ = Nothing
printfFmtInput' (Str next) fn _ = Nothing
printfFmtInput' End fn _ = Nothing
Our printfInput
function calls printf
to
produce the formatting function, then passes that formatting function to
printfFmtInput
along with the provided input structure.
printfInput : (fmt : String) -> (input : PrintfInput) -> Maybe String
printfInput fmt input = printfFmtInput _ (printf fmt) input
Now that it's all put together, we can now use
printfInput
with format strings that are generated via
IO.
printfInput (blackbox "%s %s") (IStr "Hello" (IStr "World" IEnd)) == Just "Hello World"
Being fully honest, I'm not entirely sure what use cases this particular function would make sense for, but it provides a first taste of working with type signatures that depend on values that aren't known until runtime, which is a concept we will expand upon in future entries in this series.
Conclusions
We've made a fun little function that works much like C's
printf
function or Rust's
println!
/format!
macros, but without relying
on magic variadic mechanisms that introduce unsafe behavior and require
special, individualized support from the compiler to check, and without
requiring macros.
Avoiding using a macro also gives us a lot of neat bonuses, like
printf
automatically playing nice with partial application
and currying, with no extra effort on our part.
Note
The the (List _) x
construction here in explicit type
annotation, needed because of an ambiguity in list-type literals
introduced by importing Data.Vect
for the addenda
map (printf "Hello %s") (the (List _) ["Alice", "Bob"]) == ["Hello Alice", "Hello Bob"]
However, the interface this function presents isn't quite ideal, we could do a lot better.
For instance, since we make the type signature one that isn't possible to satisfy in the case where the format string is invalid, there isn't actually a compiler error if you provide an invalid format string, but don't provide any arguments.
This isn't a huge deal, since the produced function isn't going to doing anything if you don't have any arguments, but it constitutes a minor annoyance. You can also quite easily end up surprising yourself later with this if you are making use of partial application.
printf "Hello %s %3s"
Another annoyance here is the error messages, which are quite
confusing and opaque, at least at the time of writing, taking another
look at our invalidSpecifer
failing example:
failing
invalidSpecifier : String
invalidSpecifier = printf "Hello %s %3s" "world" "!"
The error message we get if we try to have this outside of a failing block, at time of writing, reads like:
[ build ] 1/3: Building LessMacrosMoreTypes.Printf (src/LessMacrosMoreTypes/Printf.md)
[ build ] Error: While processing right hand side of invalidSpecifier. When unifying:
[ build ] String
[ build ] and:
[ build ] ?argTy -> String
[ build ] Mismatch between: String and ?argTy -> String.
[ build ]
[ build ] LessMacrosMoreTypes.Printf:359:20--359:53
[ build ] 355 | Another annoyance here is the error messages, which are quite confusing and opaque, at least at the time of writing, taking another look at our `invalidSpecifer` failing example:
[ build ] 356 |
[ build ] 357 | ```idris
[ build ] 358 | invalidSpecifier : String
[ build ] 359 | invalidSpecifier = printf "Hello %s %3s" "world" "!"
[ build ] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This correctly identifies the printf
call as the source
of the error, but doesn't zero in on the format string or the specifier,
and the error message here is just a completely generic one about
mismatch during unification that doesn't really tell us about
what went wrong.
Some of the fault here lies on the compiler, Idris 2 is currently immature in a lot of ways, to be expected from a pre-1.0 compiler, and error messages are very much one of them, but the way we have written this function also isn't giving the compiler much help.
The need for a bespoke data structure for PrintfInput
also isn't ideal.
We will delve in ways to rectify all of these complaints in future entries in this series.
Addenda
Selecting from a list of format strings with matching type signatures
As long as the format strings are known at compile time,
PrintfType
will fully expand to a function type signature
that contains no references to the string. For example,
PrintfType (parseFormat (unpack "%s"))
expands to
String -> String
.
We can take advantage of this by indexing a container type by the
PrintfType
of a "template" format string, and then our
container can hold any formatting function of compatible type signature,
like so:
formatChoices : Vect 3 (PrintfType (parseFormat (unpack "%s")))
formatChoices = [printf "Hello %s!", printf "Nice to meet you %s", printf "\"%s\""]
We can then index out a single format function from this container
(it doesn't have to be a Vect
, it could also be a map
indexed by the format string itself, for instance):
let f = index 1 formatChoices in f "Alice" == "Nice to meet you Alice"
Or apply all of the contained formatting functions to the same string:
(formatChoices <*> replicate _ "World")
== ["Hello World!", "Nice to meet you World", "\"World\""]
Ignoring for a second the special handling for format strings in modern compilers↩︎