Compiling to 1#
Spenser Bauman
What is 1#?
1# is a Turing Tarpit language made to be both extremely minimal and homoiconic. 1# is used by the Indiana University logic group to teach recursion theory.
The language itself consists of sequences of 1s and #s which can be used to denote the 5 instructions for a text register machine. The text register machine consists of a program counter and an infinite set of registers capable of holding the characters ‘1 and ‘#’.
So, what does the instruction set look like? All instructions are of the form \( 1^n \#^m \), where \( 1^n \) denotes n repetitions of the character ‘1’.
- \( 1^n \# \) : Append a ‘1’ to the end of register n
- \( 1^n \#\# \) : Append a ‘#’ to end of register n
- \( 1^n \# \# \# \) : Move forward n instructions
- \( 1^n \# \# \# \# \) : Move backward n instructions
- \( 1^n \# \# \# \# \# \) : Case on the contents of register n
- If register n is empty, move forward one instruction
- If the first character of register n is a ‘1’, delete that character and advance two instructions
- If the first character of register n is a ‘#’, delete that character and advance three instructions
See, very minimal. So minimal, in fact, that 1# is a huge pain to use for any program consisting of more than a few instruction. So, having already signed up for a class which uses 1#, what can be done to make our lives easier?
The DSL Approach
Rather than come up with a whole new language that can be converted to 1#, we’ll leverage an existing language and programmatically generate our 1# programs. This is hardly novel, even for 1#, but this time we choose to embed our 1# generation language in Haskell. The choice of Haskell is quite intentional, the safety gained from using Haskell will save debugging time down the line; we have enough problem dealing with 1#, so lets avoid introducing bugs via our meta-language as well.
The rest of this post will go into the details of the Haskell DSL. The full source can be found at:
What is a 1# Program?
From the perspective of Haskell, what is a 1# program? Well, a literal formulation from the definition above will be somewhat cumbersome to deal with, particularly ensuring that we never mess up any of the relative jump instructions. Instead, we choose to represent jump locations as labels and convert them to relative jumps later.
1 newtype Reg = Reg { regIndex :: Int }
2 deriving Show
3
4 data Instruction
5 = Add1 Reg
6 | AddH Reg
7 | Label String
8 | Jump String
9 | Case Reg
10 | Nop
11 | Comment String
12 deriving Show
13
14 type Instructions = [Instruction]
Now, our program is simply a stream of instructions, with labels placed at
points in the code which may be targeted by absolute jump instructions.
Note the additional Nop
and Comment
constructors. Nop
is a boring
implementation detail, while Comment
is used to insert notes comments into the
generated 1# program (someone has to grade the result).
The actual conversion process to 1# is straightforward, so lets worry about making some useful coding constructs out of this simple language definition. Manually constructing a stream of instructions is a slightly lower level of abstraction than we are shooting for.
Lets consider the program which moves the contents of register 2 to register 1.
1 11##### ;; case on R2
2 111111### ;; R2 is empty, so go forward 6 (end)
3 111### ;; Got a 1, go forward 3
4 1## ;; Got a #, write a # to R1
5 1111#### ;; Go back to the beginning
6 1# ;; Write a 1 to R1
7 111111#### ;; Go back to the beginning
We can encode this manually as, but the result isn’t much prettier.
1 program = [ Label "START"
2 , Case (Reg 2)
3 , Jump "END"
4 , Jump "GOT ONE"
5 , AddH (Reg 1)
6 , Label "GOT ONE"
7 , Add1 (Reg 1)
8 , Jump "START"
9 , Label "END" ]
Instead, we would like to write something resembling the logical structure of the 1# program, without manually constructing everything. In particular
- Manually managing register numbers is error prone, we want something like variables.
- Manually inserting labels at the correct location is tedious and spelling mistakes are easy.
- We want something that can be composed easily to make larger programs without worrying about register use and label naming.
Instead, we would like to express our move function as a loop over the contents of the input register, taking the appropriate action on each iteration. Perhaps something like this
1 move :: Reg -> Reg -> OneHash ()
2 move source target = loop' source (add1 target) (addh target)
where we delegate the register and label management to the OneHash
type.
Well, OneHash
has to keep enough state to handle register management and
produce a stream of instructions, so leveraging the State
and Writer
monads
is a good place to start.
1 data OneHashState = OneHashState
2 { counter :: !Int
3 , temps :: [Reg]
4 } deriving (Show)
5
6 newtype OneHash a = OneHash { unHash :: StateT OneHashState (Writer [Instruction]) a }
7 deriving (Functor, Monad)
Now, we have a monad stack which manages a OneHashState
and an output
instruction stream.
A OneHashState
consists of a counter for label name generation and a list of
unused registers.
Defining the OneHash
type as a Monad allows the use of do
-notation to
compose our 1# programs.
Now, we can define some nice helper functions.
1 add1, addh :: Reg -> OneHash ()
2 add1 r = tell [Add1 r]
3 addh r = tell [AddH r]
4
5 newLabelName :: String -> OneHash Label
6 newLabelName nm = do
7 st <- get
8 put $ st { counter = counter st + 1 }
9 return $ MkL $ "LABEL_" ++ nm ++ show (counter st)
10
11 emitLabel :: Label -> OneHash ()
12 emitLabel l = tell [Label $ name l]
13
14 jump :: Label -> OneHash ()
15 jump l = tell [Jump $ name l]
16
17 namedLabel :: String -> OneHash (OneHash ())
18 namedLabel nm = do
19 l <- newLabelName nm
20 emitLabel l
21 return $ jump l
22
23 -- Generates a label which can be jumped to using the returned OneHash action.
24 label :: OneHash (OneHash ())
25 label = namedLabel ""
The only thing surprising here is label
action, which generates a new label at
the current point in the instruction stream and returns a new action which will
jump to that location.
This saves the user from manually handling labels and invoking jump
(only
label
and namedLabel
are exported by the defining module).
Because labels are emitted at the same location where their names are generated,
how do we produce a forward jump?
We don’t want the user to have to write something like this
1 f = do
2 l <- newLabelName ""
3 jump l
4 {- code -}
5 emitLabel l
6 {- more code #-}
where the user must generate a new label name and emit it separately, this is messy and error prone. Someone could easily use the wrong label, or accidentally emit the same label twice.
Instead, we define loops using the monadic fix point operation mfix
, which
allows us to define f
as
1 f = mfix $ \l -> do
2 l
3 {- code -}
4 label
5
6 {- Or using RecursiveDo -}
7
8 f = mdo
9 l
10 {- code -}
11 l <- label
Much less complicated. Now that we have the ability to perform forward jumps, we can define a case operation similar to the one in 1#, but without manually creating/jumping to code blocks and without fall through behaviour. The definition is fairly concise
1 cases :: Reg -> OneHash () -> OneHash () -> OneHash () -> OneHash ()
2 cases r c1 c2 c3 = void $ mfix $ \ ~(j1, j2, jend) -> do
3 tell [Case r] >> j1 >> j2 >> c3 >> jend
4 liftA3 (,,) (label <* c1 <* jend) (label <* c2) label
The code generates three labels and uses the case operation as a jump table.
The jump table is immediately followed by code generated by the three OneHash
actions passed to cases.
An additional, useful operation which allows jumping to the top and bottom of
a code segment is withLabels
1 withLabels :: (OneHash () -> OneHash () -> OneHash ()) -> OneHash ()
2 withLabels body = void $ mfix $ \ ~(start, end) ->
3 (,) <$> label <* body start end <*> label
The function passed to withLabels
is supplied with jump operations which
target labels placed just before and immediately after the code produced by
the given function when invoked.
withLabels
provides a simple way to produce looping abstractions, where the
input to withLabels
expects a continue
and break
action.
1 loop :: Reg -- Loop control register
2 -> (OneHash () -> OneHash () -> OneHash ()) -- Got a 1
3 -> (OneHash () -> OneHash () -> OneHash ()) -- Got a #
4 -> OneHash ()
5 loop r one hash = withLabels $ \cont break ->
6 cases r break (one cont break >> cont) (hash cont break >> cont)
7
8 while :: Reg -> OneHash () -> OneHash ()
9 while r m = withLabels $ \start end -> cases r end noop noop >> m >> start
10
11 k² :: a -> b -> c -> a
12 k² = const . const
13
14 loop' :: Reg -> OneHash () -> OneHash () -> OneHash ()
15 loop' r = loop r `on` k²
This gets us the nice looping construct defined above.
What about registers?
Alright, so looping is cool, but if we are programmatically generating a lot of code, how can we be sure we are not erroneously reusing registers?