This document describes a pattern matcher, unimaginatively called match, the source code for which is available at:
http://www.cs.indiana.edu/chezscheme/match/match.ss
A match expression looks a lot like a case expression, except that the keys are replaced with a pattern to be matched against the input. A match expression has the general form below.
(match input-expr clause)
Each clause consists of an input pattern, an optional guard, and a set of expressions.
[input-pattern expr1 expr2 ...]
or
[input-pattern (guard guard-expr ...) expr1 expr2 ...]
As with case, the input expression is evaluated to produce the input value, and the first clause the input value matches, if any, is selected. The output expressions of the selected clause are evaluated in sequence, and the value of the last expression is returned.
An input value matches a clause if it fits the clause's pattern and passes the clause's guards, if any. Patterns may contain symbolic constants, which must match exactly, and pattern variables, which match any input. Pattern variables are prefixed by commas; symbolic constants are not.
(match '(a 17 37)
[(a ,x) 1]
[(b ,x ,y) 2]
[(a ,x ,y) 3]) 3
The first clause fails to match because there are three items in the input list, and the pattern has only two. The second clause fails because b does not match a.
In the output expression, the values of the pattern variables are bound to the corresponding pieces of input.
(match '(a 17 37)
[(a ,x) (- x)]
[(b ,x ,y) (+ x y)]
[(a ,x ,y) (* x y)]) 629
When followed by an ellipsis (...), a pattern variable represents a sequence of input values.
(match '(a 17 37) [(a ,x* ...) x*]) (17 37)
By convention, we place a * suffix on each pattern variable that matches a sequence of input expressions. This is just a convention, however, and not part of the syntax of match.
Ellipses can follow a structured pattern containing one or more pattern variables.
(match '(say (a time) (stitch saves) (in nine))
[(say (,x* ,y*) ...) (append x* y*)]) (a stitch in time saves nine)
Ellipses can be nested, producing sequences of sequences of values.
(match '((a b c d) (e f g) (h i) (j))
[((,x* ,y** ...) ...)
(list x* y**)]) ((a e h j) ((b c d) (f g) (i) ()))
Recursion is frequently required while processing an input expression with match. Here is a simple definition of length using match.
(define length
(lambda (ls)
(match ls
[() 0]
[(,x . ,x*) (add1 (length x*))])))
Using ellipses may make this more clear.
(define length
(lambda (ls)
(match ls
[() 0]
[(,x ,x* ...) (add1 (length x*))])))
Here is a more realistic example of recursion. It also illustrates the use of guards and the use of error to signal match errors.
(define simple-eval
(lambda (x)
(match x
[,i (guard (integer? i)) i]
[(+ ,x* ...) (apply + (map simple-eval x*))]
[(* ,x* ...) (apply * (map simple-eval x*))]
[(- ,x ,y) (- (simple-eval x) (simple-eval y))]
[(/ ,x ,y) (/ (simple-eval x) (simple-eval y))]
[,x (error 'simple-eval "invalid expression ~s" x)])))
Try out simple-eval using Chez Scheme's new-cafe with simple-eval as the evaluation procedure:
> (new-cafe simple-eval)
>> (+ 1 2 3)
6
>> (+ (- 0 1) (+ 2 3))
4
>> (- 1 2 3)
Error in simple-eval: invalid expression (- 1 2 3).
Type (debug) to enter the debugger.
>>
Unlike case and cond, match does not have an explicit else clause. In fact, if a clause of the form
[else expr]
is used, it matches (only) the symbol else. To achieve the effect of a catch-all clause, simple-eval uses the pattern ,x which matches any input.
An even simpler version of the above uses match's "catamorphism" feature to perform the recursion automatically.
(define simple-eval
(lambda (x)
(match x
[,i (guard (integer? i)) i]
[(+ ,[x*] ...) (apply + x*)]
[(* ,[x*] ...) (apply * x*)]
[(- ,[x] ,[y]) (- x y)]
[(/ ,[x] ,[y]) (/ x y)]
[,x (error 'simple-eval "invalid expression ~s" x)])))
In the second version of simple-eval, the explicit recursive calls are gone. Instead, the pattern variables have been written as ,[var]; this tells match to recur on the matching subpart of the input before evaluating the output expressions of the clause. Parentheses may be used in place of brackets, i.e., ,(var); we use brackets for readability.
Here is another definition of length, this time using the catamorphism feature.
(define length
(lambda (x)
(match x
[() 0]
[(,x . ,[y]) (+ y 1)])))
Since we usually use match to write translators from one language to another, we usually need to build an output expression, rather than return an output value. For example, the following converts let expressions into equivalent lambda applications.
(define translate
(lambda (x)
(match x
[(let ((,var* ,expr*) ...) ,body ,body* ...)
`((lambda ,var* ,body ,@body*) ,@expr*)]
[,x (error 'translate "invalid expression: ~s" x)])))
(translate '(let ((x 3) (y 4)) (+ x y))) ((lambda (x y) (+ x y)) 3 4)
This procedure uses Scheme's quasiquote (backquote), unquote (comma), and unquote-splicing (comma-at) to piece together the output form. These are not described here, but are described in The Scheme Programming Language.
One useful feature of quasiquote not described in these documents is a match extension that allows ellipses to be used in place of unquote-splicing, which often leads to more readable code.
(define translate
(lambda (x)
(match x
[(let ((,var* ,expr*) ...) ,body ,body* ...)
`((lambda ,var* ,body ,body* ...) ,expr* ...)]
[,x (error 'translate "invalid expression: ~s" x)])))
Within each subform followed by an ellipsis, each comma-prefixed item must be a list and all such items within the same subform must have the same length. The following procedure extracts two lists from its input and merges them to form a list of pairs.
(define (f x)
(match x
[((,a ...) (,b ...)) `((,a . ,b) ...)]))
(f '((1 2 3) (a b c))) ((1 . a) (2 . b) (3 . c))
It fails if the two lists have different lengths.
(f '((1 2 3) (a b))) error
The following also fails because one of the items is not even a list.
(define (f x)
(match x
[(,a ,b ...) `((,a ,b) ...)]))
(f '(1 2 3 4)) error
A subform followed by an ellipses may be contained with a larger subform that is also followed by an ellipsis. In this case, each comma-prefixed item must be a list of lists, each such item must have the same length, and the corresponding sublists of each such item must have the same lengths. This requirement generalizes to comma-prefixed items nested within more than two levels of ellispis-followed subforms.
The following procedure illustrates two levels of nested ellipses in both the input and output. It converts unnamed let expressions into direct lambda applications, where the let has been generalized to allow an implicit begin in each right-hand-side expression.
(define (f x)
(match x
[(let ([,x ,e1 ...] ...) ,b1 ,b2 ...)
`((lambda (,x ...) ,b1 ,b2 ...)
(begin ,e1 ...) ...)]))
(f '(let ((x (write 'one) 3) (y (write 'two) 4)) (list x y)))
((lambda (x y) (list x y))
(begin (write 'one) 3)
(begin (write 'two) 4))
In the output, a subform may be followed directly by two or more ellipses; the requirements are the same as for nested ellipses, but the result is flattened rather than nested, as illustrated by the following example.
(define (f x)
(match x
[(foo (,x ...) ...)
`(list (car ,x) ... ...)]))
(f '(foo (a) (b c d e) () (f g)))
(list (car a) (car b) (car c) (car d) (car e) (car f) (car g))
Sometimes it is useful to explicitly name the operator in a "cata" subpattern. Whereas ,[id ...] recurs to the top of the current match, ,[cata -> id ...] recurs to cata. cata must evaluate to a procedure that accepts one argument, the input expression, and returns as many values as there are identifiers following the ->.
This allows processing to be split into several mutually recursive procedures, as in the following parser for the simple language defined by the grammar below.
<Prog> | ![]() | (program <Stmt>* <Expr>) |
<Stmt> | ![]() | (if <Expr> <Stmt> <Stmt>) |
| | (set! <var> <Expr>) | |
<Expr> | ![]() | <var> |
| | <integer> | |
| | (if <Expr> <Expr> <Expr>) | |
| | (<Expr> <Expr>*) |
(define parse
(lambda (x)
(define Prog
(lambda (x)
(match x
[(program ,[Stmt -> s*] ... ,[Expr -> e])
`(begin ,s* ... ,e)]
[,x (error 'parse "invalid program ~s" x)])))
(define Stmt
(lambda (x)
(match x
[(if ,[Expr -> e] ,[Stmt -> s1] ,[Stmt -> s2])
`(if ,e ,s1 ,s2)]
[(set! ,v ,[Expr -> e])
(guard (symbol? v))
`(set! ,v ,e)]
[,x (error 'parse "invalid statement ~s" x)])))
(define Expr
(lambda (x)
(match x
[,v (guard (symbol? v)) v]
[,n (guard (integer? n)) n]
[(if ,[e1] ,[e2] ,[e3])
`(if ,e1 ,e2 ,e3)]
[(,[rator] ,[rand*] ...) `(,rator ,rand* ...)]
[,x (error 'parse "invalid expression ~s" x)])))
(Prog x)))
(parse '(program (set! x 3) (+ x 4))) (begin (set! x 3) (+ x 4))
A mentioned above, the operator specified in the cata syntax can be any expression. We can make use of this to pass along an environment, if needed, say to handle a version of the language above extended with a let expression that creates bindings that might shadow keywords.
<Prog> | ![]() | (program <Stmt>* <Expr>) |
<Stmt> | ![]() | (if <Expr> <Stmt> <Stmt>) |
| | (set! <var> <Expr>) | |
<Expr> | ![]() | <var> |
| | <integer> | |
| | (if <Expr> <Expr> <Expr>) | |
| | (let ((var <Expr>)) <Expr>) | |
| | (<Expr> <Expr>*) |
In the following version of parse, the output is more structured, with an explicit call keyword so that there will not be any confusion in the output between a call to the value of an if variable and a if expression keyed by the if keyword.
(define parse
(lambda (x)
(define Prog
(lambda (x)
(match x
[(program ,[Stmt -> s*] ... ,[(Expr '()) -> e])
`(begin ,s* ... ,e)]
[,x (error 'parse "invalid program ~s" x)])))
(define Stmt
(lambda (x)
(match x
[(if ,[(Expr '()) -> e] ,[Stmt -> s1] ,[Stmt -> s2])
`(if ,e ,s1 ,s2)]
[(set! ,v ,[(Expr '()) -> e])
(guard (symbol? v))
`(set! ,v ,e)]
[,x (error 'parse "invalid statement ~s" x)])))
(define Expr
(lambda (env)
(lambda (x)
(match x
[,v (guard (symbol? v)) v]
[,n (guard (integer? n)) n]
[(if ,[e1] ,[e2] ,[e3])
(guard (not (memq 'if env)))
`(if ,e1 ,e2 ,e3)]
[(let ([,v ,[e]]) ,[(Expr (cons v env)) -> body])
(guard (not (memq 'let env)) (symbol? v))
`(let ([,v ,e]) ,body)]
[(,[rator] ,[rand*] ...)
`(call ,rator ,rand* ...)]
[,x (error 'parse "invalid expression ~s" x)]))))
(Prog x)))
(parse
'(program (begin
(let ([if (if x list values)]) (let ([if (if x list values)])
(if 1 2 3)))) (call if 1 2 3)))
When recurring via a cata call without an explicit operator, e.g., within the Expr if and application cases, the value of Expr's env argument does not change, which is good, since nothing needs to be added to the environment at those call sites. The value of Expr's x argument does not change either, so the clause that reads
[,v (guard (symbol? v)) v]
should not be replaced by one that reads
[,v (guard (symbol? v)) x]
or nested expressions will not be handled properly. In general, the values of any variables bound outside a match expression do not change when a cata call is used to recur directly to the match itself.
In some cases, a procedure that uses match will need to return multiple values. Programs return multiple values using the values procedure and receive them using the let-values syntactic form. These features are described in The Scheme Programming Language. Here is a simple example adapted from The Scheme Programming Language.
(define split
(lambda (ls)
(if (or (null? ls) (null? (cdr ls)))
(values ls '())
(let-values ([(odds evens) (split (cddr ls))])
(values (cons (car ls) odds)
(cons (cadr ls) evens))))))
(split '(a b c d e f)) (a c e)
(b d f)
The values procedure is used to return multiple values; it takes an arbitrary number of arguments and returns them as the values. If passed one value, values behaves like the identity. let-values is like let, except that each binding binds zero or more variables to zero or more return values.
The pattern matcher cata syntax can also be used to receive multiple values. When making implicit recursive calls using the cata (,[]) syntax, one can include zero or more variables between the brackets (after the ->, if one is present), each representing one of the expected return values. split may thus be defined using match as follows.
(define split
(lambda (ls)
(match ls
[() (values '() '())]
[(,x) (values `(,x) '())]
[(,x ,y . ,[odds evens])
(values (cons x odds)
(cons y evens))])))
(split '(a b c d e f)) (a c e)
(b d f)