Type Safe Variadic printf

While C can provide "convenient" string formatting by having hideously unsafe variadics, and dynamic languages, like python, can do the same, many type safe languages, such as Rust, are forced to provide such functionality through the use of a macro. Dependently typed languages, like Idris, can provide a printf like formatting interface, while maintaining both memory and type safety, without the need for macros. We will explore this by implementing a simplified version of printf in Idris from scratch.

This article is inspired by an exercise from chapter 6 of Type Driven Development with Idris, and is written as a literate Idris file, with the source available here.

Game plan

Our goal is to provide a printf function that can be called much like it's C equivalent:

Note

As this is a literate Idris document, and we haven't defined our printf function yet, we have to use a failing block to ask the compiler to check that this code parses, and syntax highlight it for us, but not attempt to actually compile it.

failing
  example : String
  example = printf "%s %d %2d" "hello" 1 2

Idris lacks a dedicated facility for variadics, but we can call functions in type signatures, which gives us an in for implementing them.

Idris allows us to manipulate types as first class values, and we can use the runtime values of previous arguments to the function we are declaring as arguments to later type-level functions. These type-level functions can, furthermore, recurse into themselves to produce type signatures of varying length and content.

Tip

Idris requires that these type level functions be total for them to be expanded at compile time, meaning they are known to the compiler to return a value in finite time for all possible inputs.

This allows us to produce variadic functions, so long as the number of arguments and their types can be determined from one of the arguments of the function before the variadic portion.

To get our variadic printf function, we can parse our format string into a data structure, then pass that data structure into a type-level function that calculates the rest of the type signature of our printf function based on contents of the format string.

Parsing a Format String

First, we need a data structure to describe our format string. We define the Format data type, with constructors for each of the format specifiers we will be supporting, as well as a constructor to hold literal components.

data Format : Type where
  ||| A slot that should be filled in with a number
  Number : (next : Format) -> Format
  ||| A slot that should be filled  in with a number, padded to a certain number
  ||| of digits
  PaddedNumber : (digits : Nat) -> (next : Format) -> Format
  ||| A slot that should be filled in with a string
  Str : (next : Format) -> Format
  ||| A literal component of the format string that should not be interpolated
  Literal : (literal : String) -> (next : Format) -> Format
  ||| The end of the format string
  End : Format

We'll need to be able to parse numbers for our PaddedNumber constructor, so we'll write a little helper function to handle that component of parsing. We'll simply keep pulling off characters, converting them to integers by manipulating their ordinal values, until we hit a non-digit.

Warning

On Idris 2's chez scheme backend, these are unicode ordinals, where the digit characters are in numerical order, but the values are backend dependent and this code is not guaranteed to work properly on other backends.

parseNumber : (xs : List Char) -> (acc : Nat) -> (Nat, List Char)
parseNumber [] acc = (acc, [])
parseNumber (x :: xs) acc = 
  if isDigit x
    then let value = cast $ (ord x) - (ord '0')
    in parseNumber xs (acc * 10 + value)
    else (acc, x :: xs)

We'll also want another one to scoop up a literal into a List Char, continuing until we hit the end of the string or the next %. This is optional, we could have each literal consist of an individual char, but we will go ahead and group them together.

parseLiteral : (xs : List Char) -> (List Char, List Char)
parseLiteral [] = ([], [])
parseLiteral ('%' :: xs) = ([], '%' :: xs)
parseLiteral (x :: xs) = 
  let (literal, rest) = parseLiteral xs
  in (x :: literal, rest)

Next, parse our format string into an instance of the Format data structure. The specifics of the parsing aren't really material to the main point of this article, but we use a basic pattern matching approach, calling into our helper functions as appropriate.

We return a Maybe Format to signal failure in the case of an invalid format string, and trigger an error when the programmer attempts to use an invalid format string.

parseFormat : (xs : List Char) -> Maybe Format
parseFormat [] = Just End
-- A `%` has to come before a specifier
parseFormat ('%' :: []) = Nothing
parseFormat ('%' :: (x :: xs)) = 
  if isDigit x
    -- Invoke parseNumber to get our padding specifier
    then let (digits, rest) = parseNumber (x :: xs) 0
    in case rest of
      -- A padding specifier has to come before something
      [] => Nothing
      ('d' :: ys) => do
        rest <- parseFormat ys
        Just $ PaddedNumber digits rest
      -- A padding specifier is only valid before a number specifier
      (y :: ys) => Nothing
    -- Parse as an unpadded specifier
    else case x of
      'd' => do
        rest <- parseFormat xs
        Just $ Number rest
      's' => do
        rest <- parseFormat xs
        Just $ Str rest
      -- Any other character here is an invalid specifier
      _ => Nothing
parseFormat (x :: xs) = 
  let (literal, rest) = parseLiteral (x :: xs)
  in do
    rest <- parseFormat rest
    Just $ Literal (pack literal) rest

Calculating a Type From a Format String

Since our parser returns a Maybe Format, we will define two variants of our PrintfType type-level function. One that handles the inner Format, and another that handles a Maybe Format, returning a type signature that is impossible to satisfy in the Nothing case, and calling our first variant in the Just case.

The structure of this type-level function is rather simple, we just pattern match on our Format argument, and prepend the corresponding argument type to the rest of the type signature, computed through a recursive call on the next component of the Format. When we hit the End, we return the String type, which is the final return value of our function.

Tip

The -> that appears in the body of the function is actually the exact same -> that appears in the type signature, in both cases it is the constructor for function types. Idris uses the same language for type signatures and for function bodies, so anything you can use in one place, you can use in the other.

PrintfType' : Format -> Type
PrintfType' (Number next) =
  (num : Nat) -> PrintfType' next
PrintfType' (PaddedNumber digits next) =
  (num : Nat) -> PrintfType' next
PrintfType' (Str next) = 
  (str : String) -> PrintfType' next
PrintfType' (Literal literal next) = 
  PrintfType' next
PrintfType' End = String
PrintfType : Maybe Format -> Type
PrintfType Nothing = Void -> String
PrintfType (Just x) = PrintfType' x

printf

With the Format Structure

Now, we need to be able to produce our formatting function from a Maybe Format. We'll use the same two variants trick as above, but put the variant that operates directly on Formats in a where block, since we won't need to call it from anywhere else.

We build the function argument by argument, pattern matching on the Format, producing a lambda that accepts the argument type demanded by the current specifier. Then recurse into our inner printfFmt' function to handle the rest of the specifiers and arguments, applying the appropriate string conversion to the argument and tacking it to the end of the accumulator string.

The accumulator string is returned verbatim when we hit End.

Note

Much like Haskell and other functional programming languages, Idris, semantically, only actually has functions of one argument. Multi-argument functions actually accept one argument, returning another function that handles the rest of the arguments.

printfFmt : (fmt : Maybe Format) -> (acc : String) -> PrintfType fmt
printfFmt Nothing acc = 
  \void => absurd void
printfFmt (Just x) acc = printfFmt' x acc
  where 
    printfFmt' : (fmt : Format) -> (acc : String) -> PrintfType' fmt
    printfFmt' (Number next) acc = 
      \i => printfFmt' next (acc ++ show i)
    printfFmt' (PaddedNumber digits next) acc = 
      \i => printfFmt' next (acc ++ left_pad digits '0' (show i))
    printfFmt' (Str next) acc =
      \str => printfFmt' next (acc ++ str)
    printfFmt' (Literal literal next) acc = 
      printfFmt' next (acc ++ literal)
    printfFmt' End acc = acc

With a Format String

printf is easily defined as a simple wrapper, converting the provided string to a Format and passing it to printfFmt. After accepting the format string argument, we parse a Maybe Format from it, providing the parsed value to our PrintfType function to calculate the rest of our type signature.

Tip

The _ syntax that appears in this function introduces an anonymous hole. When the value of a hole is forced by type checking (resolved), the compiler fills in its value for you. Holes defined with _ will result in an "unsolved holes" compile error if unresolved.

printf : (fmt : String) -> PrintfType (parseFormat (unpack fmt))
printf fmt = printfFmt _ ""

We can call printf as expected, with the number of and types of the arguments being determined by the provided format string:

Unit Test
printf "%s %s%s %3d %d" "Hello" "world" "!" 1 23 == "Hello world! 001 23"

In fact, our printf is an improvement over a simple C-style printf1, it will fail to compile if you attempt to provide arguments to an invalid format string, which we can demonstrate by trying to apply a padding modifier to a string specifier:

Tip

failing blocks have an additional feature, they will trigger a compiler error if their contents do compile successfully, so we can use them to verify that incorrect use of our function results in a compiler error.

failing
  invalidSpecifier : String
  invalidSpecifier = printf "Hello %s %3s" "world" "!"

It will also fail to compile if too many arguments are passed (in Idris, it isn't desirable to fail when passed too few arguments, due to partial application being a core idiom of the language), or the arguments are of an incorrect type:

failing
  tooManyArguments : String
  tooManyArguments = printf "%s %s" "Hello" "World" "!"
failing
  wrongArgumentType : String
  wrongArgumentType = printf "%s %s" "Hello" 1

Working With Runtime Format Strings

Since we'll be talking about situations where the compiler doesn't have insight into the value of the format string, we'll define a little helper function to hide a value from the type checker.

Note

The evaluator Idris uses for expanding terms at the type level does not have an IO context available to it, so round tripping through IO with unsafePerformIO results in the wrapped term being opaque during compile time.

blackbox : a -> a
blackbox x = unsafePerformIO $ pure x

Our printf function works just fine when provided a format string that's known at compile time, but as written, it isn't really usable with format strings that are provided via IO at runtime.

When the format string isn't known to the compiler at compile time, it can't expand the PrintfFmt function, leaving it unable to compute the rest of the type signature.

failing
  runtimeFormatString : String
  runtimeFormatString = printf (blackbox "%s %s") "Hello" "World"

We can construct a variant of printf that will work with runtime format strings by defining a custom, heterogeneous list-like structure, and making a fallible version of our printf function based on this data structure:

data PrintfInput : Type where
  INum : (d : Nat) -> (next : PrintfInput) -> PrintfInput
  IStr : (s : String) -> (next : PrintfInput) -> PrintfInput
  IEnd : PrintfInput

Now we need a printfFmt analog that works with our PrintfInput type. It accepts a fmt : Maybe Format, the corresponding function of type PrintfType fmt, and the argument inputs as a PrintfInput structure. We use the same two-variants trick as before to handle the Maybe, and the rest is basic pattern matching to make sure the type of the input matches up with the type of the format specifier, and passing that input in as the next argument to the formatting function.

printfFmtInput :
  (fmt : Maybe Format) -> (fn : PrintfType fmt) -> (input : PrintfInput) 
  -> Maybe String
printfFmtInput Nothing fn input = Nothing
printfFmtInput (Just x) fn input = printfFmtInput' x fn input
  where
    printfFmtInput' :
      (fmt' : Format) -> (fn : PrintfType' fmt') -> (input : PrintfInput)
      -> Maybe String
    -- Handle cases where the input type matches up with the formatter type
    printfFmtInput' (Number next) fn (INum d y) = 
      printfFmtInput' next (fn d) y
    printfFmtInput' (PaddedNumber digits next) fn (INum d y) =
      printfFmtInput' next (fn d) y
    printfFmtInput' (Str next) fn (IStr s y) =
      printfFmtInput' next (fn s) y
    printfFmtInput' (Literal literal next) fn input =
      printfFmtInput' next fn input
    -- Handle cases where the input type does not match up with the formatter type
    printfFmtInput' End fn IEnd = Just fn
    printfFmtInput' (Number next) fn _ = Nothing
    printfFmtInput' (PaddedNumber digits next) fn _ = Nothing
    printfFmtInput' (Str next) fn _ = Nothing
    printfFmtInput' End fn _ = Nothing

Our printfInput function calls printf to produce the formatting function, then passes that formatting function to printfFmtInput along with the provided input structure.

printfInput : (fmt : String) -> (input : PrintfInput) -> Maybe String
printfInput fmt input = printfFmtInput _ (printf fmt) input 

Now that it's all put together, we can now use printfInput with format strings that are generated via IO.

Unit Test
printfInput (blackbox "%s %s") (IStr "Hello" (IStr "World" IEnd)) == Just "Hello World"

Being fully honest, I'm not entirely sure what use cases this particular function would make sense for, but it provides a first taste of working with type signatures that depend on values that aren't known until runtime, which is a concept we will expand upon in future entries in this series.

Conclusions

We've made a fun little function that works much like C's printf function or Rust's println!/format! macros, but without relying on magic variadic mechanisms that introduce unsafe behavior and require special, individualized support from the compiler to check, and without requiring macros.

Avoiding using a macro also gives us a lot of neat bonuses, like printf automatically playing nice with partial application and currying, with no extra effort on our part.

Note

The the (List _) x construction here in explicit type annotation, needed because of an ambiguity in list-type literals introduced by importing Data.Vect for the addenda

Unit Test
map (printf "Hello %s") (the (List _) ["Alice", "Bob"]) == ["Hello Alice", "Hello Bob"]

However, the interface this function presents isn't quite ideal, we could do a lot better.

For instance, since we make the type signature one that isn't possible to satisfy in the case where the format string is invalid, there isn't actually a compiler error if you provide an invalid format string, but don't provide any arguments.

This isn't a huge deal, since the produced function isn't going to doing anything if you don't have any arguments, but it constitutes a minor annoyance. You can also quite easily end up surprising yourself later with this if you are making use of partial application.

Unit Test
printf "Hello %s %3s"

Another annoyance here is the error messages, which are quite confusing and opaque, at least at the time of writing, taking another look at our invalidSpecifer failing example:

failing
  invalidSpecifier : String
  invalidSpecifier = printf "Hello %s %3s" "world" "!"

The error message we get if we try to have this outside of a failing block, at time of writing, reads like:

[ build ] 1/3: Building LessMacrosMoreTypes.Printf (src/LessMacrosMoreTypes/Printf.md)
[ build ] Error: While processing right hand side of invalidSpecifier. When unifying:
[ build ]     String
[ build ] and:
[ build ]     ?argTy -> String
[ build ] Mismatch between: String and ?argTy -> String.
[ build ] 
[ build ] LessMacrosMoreTypes.Printf:359:20--359:53
[ build ]  355 | Another annoyance here is the error messages, which are quite confusing and opaque, at least at the time of writing, taking another look at our `invalidSpecifer` failing example:
[ build ]  356 | 
[ build ]  357 | ```idris
[ build ]  358 | invalidSpecifier : String
[ build ]  359 | invalidSpecifier = printf "Hello %s %3s" "world" "!"
[ build ]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This correctly identifies the printf call as the source of the error, but doesn't zero in on the format string or the specifier, and the error message here is just a completely generic one about mismatch during unification that doesn't really tell us about what went wrong.

Some of the fault here lies on the compiler, Idris 2 is currently immature in a lot of ways, to be expected from a pre-1.0 compiler, and error messages are very much one of them, but the way we have written this function also isn't giving the compiler much help.

The need for a bespoke data structure for PrintfInput also isn't ideal.

We will delve in ways to rectify all of these complaints in future entries in this series.

Addenda

Selecting from a list of format strings with matching type signatures

As long as the format strings are known at compile time, PrintfType will fully expand to a function type signature that contains no references to the string. For example, PrintfType (parseFormat (unpack "%s")) expands to String -> String.

We can take advantage of this by indexing a container type by the PrintfType of a "template" format string, and then our container can hold any formatting function of compatible type signature, like so:

formatChoices : Vect 3 (PrintfType (parseFormat (unpack "%s")))
formatChoices = [printf "Hello %s!", printf "Nice to meet you %s", printf "\"%s\""]

We can then index out a single format function from this container (it doesn't have to be a Vect, it could also be a map indexed by the format string itself, for instance):

Unit Test
let f = index 1 formatChoices in f "Alice" == "Nice to meet you Alice"

Or apply all of the contained formatting functions to the same string:

Unit Test
(formatChoices <*> replicate _ "World")
    == ["Hello World!", "Nice to meet you World", "\"World\""]

  1. Ignoring for a second the special handling for format strings in modern compilers↩︎