Idiomdrottning’s homepage

A glex/acetone example

Glex is a limited and clueless lexer that has zero idea about context, and acetone is a strange parser that doesn’t look anything like a traditional BNF string-rewriting parser. At first glance they look like they’d be beyond useless. So here’s a worked example.

Let’s say we wanna transform this string:

"John Elliot Smith Jr.
1245 Sadsack Boulevard
Ghostville, XY 123 45"

into this sexp:

((name-part
   (first-name "John")
   (first-name "Elliot")
   (last-name "Smith")
   (suffix "Jr."))
 (street-address
  (street-number "1245")
  (street-name "Sadsack" "Boulevard"))
 (zip-part
  (town "Ghostville")
  (state-code "XY")
  (zip-code "123 45")))

This is going to be the lexer:

(glex
 (word word)
 ((or "Sr." "Jr.") suffix)
 ((+ space) space)
 ("," comma)
 (newline eol)
 ((: (+ numeric) (* (: (+ space) (+ numeric)))) number))

Next I’ll go through all the handlers for the parser.

First the import line:

(import brev mdg glex)

Acetone is already part of brev♥︎

A parameter for keeping track of which kind of line we’re in:

(define linestate (make-parameter '(init name-part street-address zip-part)))
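The handlers further down advance this parameter with (car (linestate (cdr (linestate)))), which only works if calling the parameter with an argument stores the new list and returns it. Here is a minimal sketch of that stepping in plain Scheme, with an ordinary variable standing in for the parameter. The names states and advance! are made up for this illustration and are not part of brev.

```scheme
;; Plain-Scheme stand-in for the linestate parameter (hypothetical names).
(define states '(init name-part street-address zip-part))

;; Drop the current state and return the new current one,
;; mirroring (car (linestate (cdr (linestate)))).
(define (advance!)
  (set! states (cdr states))
  (car states))

(define first-state (car states))  ; the state before any input is seen
(define second-state (advance!))   ; after the first token
(define third-state (advance!))    ; after the first eol
```

The head of the list is always the current state, so "which line are we on" is just (car states).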

A parameter to keep track of which was the most recent name (since we want to mark the last name separate from all the first names):

(define seen-name (make-parameter #f))

A parameter for what to call words in the zip part:

(define zip-tag (make-parameter 'town))

The following fallback isn’t used, but in general with match-generics, put fallbacks first and special cases last. Same goes for the clauses in glex, by the way.

(define (transform _ (tag x))
  (list tag x))

Just strip out whitespace (with a body-less handler):

(define (transform _ ('space _)))

The two preceding handlers follow the pattern for the transformers in this particular program. They take two arguments: a linestate (such as 'init, 'name-part, 'street-address or 'zip-part) and a list of a tag and a value.

Up next is our generic eol handler.

If you don’t understand what open and close do, they’re acetone’s way of opening and closing parens. Acetone’s parse procedure is sort of like a map, except that where a mapping procedure only transforms one value into one value, parse can transform one value into many values or no value at all, or even insert opening and closing parentheses in the underlying list. The latter is done by #:open and #:close.

So in this specific example, when we find eol, we advance the linestate parameter, close the old linestate, and open the new one. For example, when the linestate moves from street-address to zip-part, it sends the equivalent of “)(zip-part”. I know that’s weird and backwards. But that’s the magic of acetone.

(define (transform ls ('eol _))
  (with (car (linestate (cdr (linestate))))
   (values #:close #:open it)))
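To make the open/close idea concrete, here is a minimal sketch in plain Scheme of how a flat stream with open and close markers can be folded into a nested list. This is not acetone’s actual implementation. The procedure name nest is hypothetical, and the plain symbols open and close stand in for #:open and #:close. It also closes any lists still open at the end of input, which is an assumption on my part, made because the example string has no trailing newline and yet zip-part ends up closed.

```scheme
;; Fold a flat list containing the marker symbols open and close
;; into a nested list. A sketch only, not how acetone does it.
(define (nest flat)
  ;; stack holds partial lists in reverse order, innermost first.
  (let loop ((items flat) (stack (list '())))
    (cond
     ((null? items)
      (if (null? (cdr stack))
          (reverse (car stack))
          ;; Assumption: lists still open at end of input get closed.
          (loop '(close) stack)))
     ((eq? (car items) 'open)
      (loop (cdr items) (cons '() stack)))
     ((eq? (car items) 'close)
      ;; Finish the innermost list and add it to its parent.
      (let ((done (reverse (car stack)))
            (outer (cadr stack)))
        (loop (cdr items) (cons (cons done outer) (cddr stack)))))
     (else
      (loop (cdr items)
            (cons (cons (car items) (car stack)) (cdr stack)))))))

;; The equivalent of "(street-address ... )(zip-part ..." as a flat stream.
(define nested
  (nest '(open street-address (street-number "1245")
          close open zip-part (town "Ghostville"))))
```

Note that the value right after an open marker simply becomes the first element of the new list, which is why sending a close, an open and then a tag reads as “)(zip-part”.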

Here is where the seen-name parameter comes in handy. If we’re in the name part, we don’t wanna write the name right away because we don’t know if it’s the last name or one of the many first names. So we just stash it away.

If we already have one stashed away and find a new name, we know that the previously stashed name has got to be a first name, so we send it as a first name and stash away the new name.

(define (transform 'name-part ('word name))
  (with (seen-name)
    (seen-name name)
    (when it
      (list 'first-name it))))

If we find a suffix or an eol in the name part, we send the stashed-away name as a last name:

(define (transform 'name-part (and sf ('suffix _)))
  (aif (seen-name)
       (begin (seen-name #f)
              (values (list 'last-name it) sf))
       sf))

(define (transform 'name-part ('eol _))
  (aif (seen-name)
       (->>* (transform 'n '(eol n))
             (values (list 'last-name it)))
       (transform 'n '(eol n))))
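The stash-one-behind trick in the name-part handlers can be looked at in isolation. Here is a minimal sketch in plain Scheme, without parameters or acetone, that holds back one name at a time and only decides its tag once it knows whether another name follows. The procedure name tag-names is hypothetical, and it works on a plain list of strings rather than on lexer tokens.

```scheme
;; Tag every name except the final one as first-name and the final
;; one as last-name, by always holding back the most recent name.
(define (tag-names names)
  (let loop ((rest names) (stashed #f) (out '()))
    (if (null? rest)
        ;; End of the names: whatever is still stashed is the last name.
        (reverse (if stashed
                     (cons (list 'last-name stashed) out)
                     out))
        ;; A new name arrived: the stashed one must be a first name.
        (loop (cdr rest)
              (car rest)
              (if stashed
                  (cons (list 'first-name stashed) out)
                  out)))))

(define tagged (tag-names '("John" "Elliot" "Smith")))
```

In the real handlers the end-of-names signal is a suffix or an eol rather than the end of a list, but the delay works the same way.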

Now for the street address stuff. I didn’t implement apartment numbers for this little toy, but we could use an implementation similar to the suffix/eol one above. Here, I just trust that eol is gonna be the end of the street names.

So first, send the equivalent of “(street-number num)(street-name “:

(define (transform 'street-address ('number num))
  (values (list 'street-number num) #:open 'street-name))

Then just strip the tags off of the name parts in the street address because “(street-name” is already open.

(define (transform 'street-address ('word name)) name)

Finally, if we see eol, do its normal transformation but then prepend a close, so taken together we send the equivalent of “))(zip-part”, i.e. an extra close paren compared to the default eol handler:

(define (transform 'street-address ('eol _))
  (->>* (transform 'n '(eol n)) (values #:close)))

Now for the zip part handlers, saving the simplest part for last. The tag for the words is stored in the zip-tag parameter and starts out as 'town, and then after we see a comma it changes to 'state-code.

(define (transform 'zip-part ('word name))
  (list (zip-tag) name))

(define (transform 'zip-part ('number name))
  (list 'zip-code name))

The comma is stripped away, it just changes the zip-tag parameter:

(define (transform 'zip-part ('comma _))
  (zip-tag 'state-code) (void))
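Taken together, the three zip-part handlers amount to a little state machine with one switch. Here is a minimal sketch of the same behavior in plain Scheme, with the tag carried as a loop variable instead of a parameter. The procedure name tag-zip-line is hypothetical, and the token list in the example is hand-written in the (tag value) shape the handlers receive, not captured from glex.

```scheme
;; Tag words as town until a comma is seen, then as state-code.
;; Numbers become zip-code; commas and everything else are dropped.
(define (tag-zip-line tokens)
  (let loop ((rest tokens) (tag 'town) (out '()))
    (if (null? rest)
        (reverse out)
        (let ((kind (car (car rest)))
              (value (cadr (car rest))))
          (cond
           ((eq? kind 'comma)
            (loop (cdr rest) 'state-code out))
           ((eq? kind 'word)
            (loop (cdr rest) tag (cons (list tag value) out)))
           ((eq? kind 'number)
            (loop (cdr rest) tag (cons (list 'zip-code value) out)))
           (else
            (loop (cdr rest) tag out)))))))

(define zip-line
  (tag-zip-line '((word "Ghostville") (comma ",") (space " ")
                  (word "XY") (space " ") (number "123 45"))))
```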

Handling init is the trickiest of all since the name handler sends void:

(define (transform 'init x)
  (let* ((ls (car (linestate (cdr (linestate)))))
         (res (transform ls x)))
    (if (eq? res (void))
        (values #:open ls)
        (values #:open ls res))))

Since we know it always sends void with this name stuff, we could hardcode it as follows (still calling transform, since that call is what stashes the name):

(define (transform 'init x)
  (let* ((ls (car (linestate (cdr (linestate)))))
         (res (transform ls x)))
    (values #:open ls)))

Conversely, if we knew it never sent void, we could do the ->>* thing from the eol handlers, or, another way to write that would be:

(define (transform 'init x)
  (let* ((ls (car (linestate (cdr (linestate)))))
         (res (transform ls x)))
    (values #:open ls res)))

But that’s for another program since in this case, we know it sends void.

Here is the wrapper that makes sure the handlers can dispatch on the linestate:

(define (handle el)
  (transform (car (linestate)) el))

And here is the monstrosity in use:

(pp
 (parse
  handle
  ((glex
    (word word)
    ((or "Sr." "Jr.") suffix)
    ((+ space) space)
    ("," comma)
    (newline eol)
    ((: (+ numeric) (* (: (+ space) (+ numeric)))) number))
   "John Elliot Smith Jr.
1245 Sadsack Boulevard
Ghostville, XY 123 45")))

You can get a repo of this source code with:

git clone https://idiomdrottning.org/a-glex-acetone-example