\input texinfo   @c -*-texinfo-*-
@c $Id: parsing-strings.texi,v 1.12 2005/10/06 15:52:06 ashawley Exp $
@c This document was started with the ``GNU Sample Texts''
@c available in the GNU Texinfo Manual.  It has been modified in
@c the following notable ways:
@c * Removed the use of version.texi.
@c * Modified the GNU FDL Copying notice by changing the Front- and
@c   Back-Cover Texts.
@c * Made a link to the GNU FDL located at the FSF Web site.
@c * Mentioned Texinfo as the document's typesetting language in the
@c   Copying section and added a link to the GNU Texinfo Web site.
@c * Mentioned the document's source file as a hyperlink and mentioned
@c   the document's ``official'' Internet location.
@c * Added RCS tags to be printed in the document's Copying section.
@c * Displayed the Copying section after the first menu to avoid
@c   confusion in the Info output.
@c * Removed the include of fdl.texi.
@comment %**start of header
@setfilename parsing-strings.info
@settitle Parsing Strings with @acronym{MIT}/@acronym{GNU} Scheme
@comment %**end of header

@copying
Copyleft @copyright{} 2005 Aaron S. Hawley

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the @acronym{GNU} Free Documentation License,
Version 1.1 or any later version published by the Free Software
Foundation; with no Invariant Sections, with the Front-Cover Texts
being ``Free Documentation,'' and with the Back-Cover Texts as in (a)
below.  A copy of the license is available from the Free Software
Foundation Web site at @url{http://www.fsf.org/licenses/fdl.html}.

(a) The Back-Cover Text is: ``You have freedom to copy and modify
this free document, as you would free software.''
@end quotation

The document was typeset with
@uref{http://www.texinfo.org/, GNU Texinfo}.

The document's source file is @uref{parsing-strings.texi}.  It is
available from @indicateurl{http://agave.garden.org/~aaronh/scheme/}.
$Date: 2005/10/06 15:52:06 $ $Revision: 1.12 $
@end copying

@titlepage
@title Parsing Strings with Scheme
@subtitle Using the @code{*parser} library of @acronym{MIT}/@acronym{GNU} Scheme
@author Aaron S. Hawley (@email{ashawley(at)gnu.uvm.edu})
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@ifnottex
@node Top
@top Parsing Strings with Scheme

This manual introduces parsing in @acronym{MIT}/@acronym{GNU} Scheme
by giving a tutorial on how to parse string literals.  Impatient
readers can skip to the complete source code of the final parser
implementation in @ref{parse-string.scm}.  The tutorial can then
serve to explain confusing aspects of the source code file.

@menu
* Introduction::
* Getting Started::
* The Tutorial::
* Conclusion::
* parse-string.scm::
* Parsing with read::
@end menu

@insertcopying
@end ifnottex

@node Introduction
@chapter Introduction

@iftex
This manual introduces parsing in @acronym{MIT}/@acronym{GNU} Scheme
by giving a tutorial on how to parse string literals.  Impatient
readers can skip to the complete source code of the final parser
implementation in @ref{parse-string.scm}.  The tutorial can then
serve to explain confusing aspects of the source code file.
@end iftex

The article begins and concludes with arguments for writing a parser
in the Scheme programming language.  After the tutorial's scope and
intended audience are introduced, pointers are given to technical and
other relevant sources of documentation useful for understanding the
tutorial.  Information on executing the examples is provided for
readers viewing the tutorial from inside an Emacs editor.
@menu
* Motivation::
* Overview::
* Assumptions::
* External References::
* Note to Readers Using Emacs::
@end menu

@node Motivation
@section Motivation

The fun in using the @dfn{Scheme programming language} doesn't come
from only repeating the same tired pedagogical examples, like writing
a recursive solution for printing Fibonacci numbers or writing yet
another Scheme interpreter.  Scheme can accomplish worthwhile tasks.
And contrary to rumor, some Scheme implementations come with
libraries to help programmers complete such tasks.  These libraries
are often as high-level, graceful and useful as the Scheme language
itself.  One such Scheme implementation, which is also documented
quite well, is @acronym{MIT}/@acronym{GNU} Scheme (hereafter
``@acronym{MIT} Scheme'').

@node Overview
@section Overview

The following is a tutorial on writing a parser in @acronym{MIT}
Scheme.  @dfn{Parsers} are defined in computer science as software
that can extract information from a stream of data.  For example,
reading the words in a sentence requires a parser.  Reading the
structure and keywords of computer software code requires a parser.
Besides @acronym{MIT} Scheme, the human brain has parsing facilities
for reading words and for identifying objects (like shapes or
colors).

Most introductions to parsing start with infix arithmetic expressions
like @samp{1 + 1}.  Instead, we shall introduce parsing with a
simpler construct, the @dfn{string literal}.  String literals are any
number of characters found between two quotation marks (``"'').
Special scenarios usually exist for representing a quotation mark
character inside of string literals.  Those are explained later.

By parsing only string literals, the tutorial covers what is more
often termed @dfn{lexing} by classical computer science.  A
@dfn{lexer} lexes by converting a stream of data into @dfn{lexemes}.
Lexemes are the individual tokens matched by a parser.  In this
tutorial, the lexemes are string literals.
When a literate human brain reads sentences, it lexes words.  The
distinction between lexing and parsing likely exists because
conventional programming languages needed the tasks separated to
handle parsing and to handle @dfn{left recursion} for parsing
languages that use @dfn{infix notation}.  The distinction is not
acknowledged any further in this document.

To create a parser there must be a defined @dfn{grammar}.  A grammar
describes some desired thing that can be parsed.  The desired thing
may be a valid sentence in the English language, but is a string
literal in this tutorial.  Only with a well-defined grammar can the
task of designing a parser be accomplished.

@node Assumptions
@section Assumptions

The tutorial assumes you understand the basic concepts of a grammar
and how to represent one.  Two common and related notations for
grammars, @dfn{Regular Expressions} and @dfn{Backus-Naur Form}
(@acronym{BNF}), are useful to know in this tutorial.  It is assumed
you are familiar with programming in Scheme, too.

@node External References
@section External References

Information on the Scheme programming language as provided by
@acronym{MIT} Scheme, and on how to install and use @acronym{MIT}
Scheme, is available elsewhere as free documentation.
@xref{Overview, Overview, Overview, mit-scheme-ref, MIT/GNU Scheme Reference Manual},
or @ref{Introduction, Introduction, Introduction, mit-scheme-user, MIT/GNU Scheme User's Manual},
respectively.

Developed since at least the 1980s, @acronym{MIT} Scheme is a
complete programming environment.  It is now distributed as
@dfn{free software}, was adopted as a @dfn{@acronym{GNU} package},
and has been used for decades to teach programming.  More information
on @acronym{MIT} Scheme is available at the
@uref{http://www.gnu.org/software/mit-scheme/, MIT Scheme Web site}.
@node Note to Readers Using Emacs
@section Note to Readers Using Emacs

Readers viewing the tutorial in the @acronym{GNU} Info format from
within @acronym{GNU} Emacs or the Edwin editor (the Emacs-like editor
that comes with @acronym{MIT} Scheme) are encouraged to evaluate each
piece of example code with the command @kbd{C-x C-e}.

For Emacs, evaluating Scheme expressions in @command{Info-mode}
requires running an @dfn{inferior} Scheme process in a separate
buffer and then binding the keys to the @code{scheme-send-last-sexp}
and @code{scheme-send-definition} commands.

@example
M-x run-scheme RET
C-x b RET
M-x local-set-key RET C-x C-e scheme-send-last-sexp RET
M-x local-set-key RET C-M-x scheme-send-definition RET
@end example

@c Alternatively, one can run the following Emacs Lisp code to make the
@c same definitions.
@c @lisp
@c (progn
@c   (run-scheme "scheme")
@c   (switch-to-buffer "*info*")
@c   (make-variable-buffer-local 'eval-last-sexp)
@c   (defalias 'eval-last-sexp 'scheme-send-last-sexp)
@c   ;; (local-set-key "" 'scheme-send-last-sexp)
@c   (make-variable-buffer-local 'eval-defun)
@c   (defalias 'eval-defun 'scheme-send-definition))
@c   ;; (local-set-key "\230" 'scheme-send-definition)
@c @end lisp

Typing @kbd{C-x C-e} with the point (cursor) just after the last
parenthesis of a Scheme expression evaluates the expression in the
@file{*scheme*} buffer and prints its value.  Typing @kbd{C-M-x}
anywhere inside a Scheme definition evaluates the definition.

@node Getting Started
@chapter Getting Started

@menu
* The MIT Scheme Parser::
* Using the Parser::
@end menu

@node The MIT Scheme Parser
@section The @acronym{MIT} Scheme Parser

In @acronym{MIT} Scheme, the parser and matcher are provided as
separate libraries to avoid conflicting with the default standard
Scheme language provided by @acronym{MIT} Scheme.  The same holds for
creating parsers in other programming languages.
Writing parsers in other programming languages often requires writing
in a separate language that must be run through a separate tool.  In
@acronym{MIT} Scheme, a parser is written entirely in a Scheme-like
syntax, does not require any separate utilities and can be combined
with any other @acronym{MIT} Scheme code or features of @acronym{MIT}
Scheme.

@node Using the Parser
@section Using the Parser

The parsing and matching libraries of @acronym{MIT} Scheme are called
the @dfn{star-parser} (@code{*parser}) and @dfn{star-matcher}
(@code{*matcher}).  The syntaxes of @code{*parser} and
@code{*matcher} are inspired by, and therefore similar to, Regular
Expressions and @acronym{BNF}.  This makes @acronym{MIT} Scheme's
parser and matcher languages high-level and simple to use.

The matcher returns true (@code{#t}) on success and false
(@code{#f}) for a failure to match.  The parser differs from this by
returning a vector containing each successfully found token.  Apart
from the parser's enhancements for modifying the parsed value, the
syntaxes of the parser and matcher systems are almost entirely
interchangeable.

To use the parser in @acronym{MIT} Scheme you must load the
@code{*parser} option using the following:

@lisp
(load-option '*parser)
@end lisp

@node The Tutorial
@chapter The Tutorial

@menu
* String Literals::
* An Initial Parser::
* Explanation::
* Testing the Parser::
* Matching an Escaped Quotation Mark::
* Removing the Start and End Quotation Marks::
* Matching Other Escape Characters::
* Fine-tuning the Return Value More::
* Comparing the Parser with `read'::
* Parsing All Strings::
@end menu

@node String Literals
@section String Literals

The @dfn{string literal} is an object in Scheme and many other
programming languages that represents a string of characters.
Here are four examples of string literals (Scheme comments to the
right of an example, after the semicolons, describe it):

@example
"foo"
"foo bar"
""           ;; The empty string
"\"foo\""    ;; The string "foo"
@end example

Looking at these examples, a parser of string literals would need to
find characters that start with a quotation mark character and end
with a quotation mark character.  The actual string literal between
the quotation marks could contain zero or more characters.

Notice that putting quotation mark characters in a string literal
requires @dfn{escaping} them with the @dfn{backslash character} (\).
This is an idiom understood in most programming languages including
Scheme, and will later be understood by the parser.

@node An Initial Parser
@section An Initial Parser

At first, the parser overlooks the escape character issue and
therefore ignores string literals containing quotation marks.
Instead, any characters between quotation marks are accepted by the
parser.  We could describe this simple parser as a procedure that:

@enumerate
@item
looks for a quotation mark,
@item
reads as many characters as possible without reading a quotation
mark,
@item
and then finds a quotation mark.
@end enumerate

This parser is described with @acronym{MIT} Scheme in the following
definition:

@lisp
;; parse-string : parser-buffer -> ( vector | #f )
(define parse-string
  (*parser
   (seq (match "\"")
        (* (match (not-char #\")))
        (match "\""))))
@end lisp

Notice that @code{parse-string} is not defined as a typical Scheme
function with specified function arguments (historically called a
``defun''), nor is it assigned a @code{lambda} expression.  It is not
immediately clear how to use the @code{parse-string} function created
with the @code{*parser}.  Instead of a function, @code{parse-string}
looks like a variable assigned the value resulting from the
@code{*parser} syntax.
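
Despite appearances, the value bound to @code{parse-string} can be
inspected like any other Scheme value.  The following is a minimal
sketch, assuming the @code{*parser} option loads as described in
@ref{Using the Parser}; it repeats the definition above only so that
it can be evaluated on its own.

@lisp
(load-option '*parser)

;; Same definition as above, repeated so the sketch is self-contained.
(define parse-string
  (*parser
   (seq (match "\"")
        (* (match (not-char #\")))
        (match "\""))))

;; The *parser form expands into a procedure, so this check
;; answers true.
(procedure? parse-string)   ;; => #t
@end lisp
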
The single argument to @code{*parser} is called a ``syntax'' even
though it appears @code{*parser} is a function taking a single
argument.  According to the documentation in
@ref{Parser Language, MIT Scheme Documentation, Parser Language, mit-scheme-ref, MIT/GNU Scheme Reference Manual},
@code{*parser} is implemented as a @dfn{macro}.

What does it mean that @code{parse-string} is defined with a macro?
The parser definition @code{parse-string} could be thought of as a
description of another parser.  The @code{*parser} macro uses the
parser language syntax in the sub-expression to create a
corresponding larger (thus ``macro'') parser function.

The @emph{macro} parser created handles the specific details of
reading and matching input, and is applied to a @dfn{parser buffer},
a buffer of data that parser functions are capable of operating on
(this is what is hinted at in the source code comment before the
definition of @code{parse-string}).  Intelligently, all of these
complicated details of actually parsing the data in the buffer are
abstracted and hidden.

Before the @code{parse-string} parser function is applied to a parser
buffer and tested, the parser language syntax is explained.

@node Explanation
@section Explanation

Inside the above @code{*parser} expression is a @code{seq}
expression.  The @code{seq} expression guarantees the sub-expressions
are matched @dfn{sequentially} on the data.  The @code{seq} used
above matches a quotation mark followed by anything but a quotation
mark followed by a closing quotation mark.  The order of this
sequence is critical.

The @code{match} expressions, as intuition hints, match their
sub-expression.  If the sub-expression is a string, then the data are
matched against the string, but other @code{*matcher} expressions are
allowed in @code{match}.  For instance, the @code{not-char}
expression matches any character other than the character provided to
@code{not-char}.
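
To see @code{not-char} on its own, the same expression can be handed
to @code{*matcher}, which yields a procedure that only answers true
or false.  This is a minimal sketch rather than part of the
tutorial's parser: the name @code{match-non-quote} is hypothetical,
and the sketch assumes the @code{*parser} option has been loaded and
uses the @code{string->parser-buffer} procedure introduced in
@ref{Testing the Parser}.

@lisp
(load-option '*parser)

;; match-non-quote : parser-buffer -> boolean
;; Hypothetical helper used only for illustration.  It succeeds
;; when the next character in the buffer is not a quotation mark.
(define match-non-quote
  (*matcher (not-char #\")))

(match-non-quote (string->parser-buffer "a"))    ;; => #t
(match-non-quote (string->parser-buffer "\""))   ;; => #f
@end lisp

Wrapping the same @code{not-char} expression in @code{match} inside
@code{*parser} additionally records the matched character in the
result vector.
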
The star (@code{*}) expression matches its sub-expression zero or
more times.  In the parser above, the @code{*} expression matches any
character that is not a quotation mark, zero or more times.

For those familiar with @acronym{BNF}, the source code for the parser
maps quite closely to a @acronym{BNF} representation of the grammar.
The syntax differs in its use of prefix (``Polish'') notation, as
adopted generally by Scheme.

@example
<string> ::= "\"" <char>* "\""
<char>   ::= "a" | "b" | "c" | @dots{} | (anything but a "\"" @enddots{}.)
@end example

Not even Extended @acronym{BNF} (@acronym{EBNF}) can easily capture
the grammar the @code{parse-string} parser accepts, without adding
the ellipsis ``hack''.

@node Testing the Parser
@section Testing the Parser

The small parser above attempts to match a simple string literal.
Testing the parser requires having a parser buffer.  Fortunately, the
@code{string->parser-buffer} procedure can create parser buffers from
strings.  The parser buffer created by @code{string->parser-buffer}
can then be passed as an argument to the @code{parse-string}
procedure.

@lisp
(parse-string (string->parser-buffer "\"foo\""))
 => #("\"" "f" "o" "o" "\"")     ;; "foo"
(parse-string (string->parser-buffer "foo"))
 => #f
(parse-string (string->parser-buffer "foo\""))
 => #f
(parse-string (string->parser-buffer "\"foo"))
 => #f
(parse-string (string->parser-buffer "\"\""))
 => #("\"" "\"")                 ;; ""
(parse-string (string->parser-buffer ""))
 => #f
(parse-string (string->parser-buffer "\"foo\" \"foo\""))
 => #("\"" "f" "o" "o" "\"")     ;; "foo"
(parse-string (string->parser-buffer "bar \"foo\""))
 => #f
(parse-string (string->parser-buffer "\"foo\" bar"))
 => #("\"" "f" "o" "o" "\"")     ;; "foo"
@end lisp

The tests of the initial parser display some noteworthy behavior.
When a leading quotation mark is not found right away, or a closing
quotation mark is never found, the parser fails and returns false.
When there are extra characters, including further string literals,
after the string literal, the parsing succeeds and ignores the
trailing characters.  On success, the parser function returns a
vector with an element for each successful match made by a
@code{match} expression.  Each element is a string holding the text
matched by one of the three uses of @code{match} in
@code{parse-string}.  To the right of the result, in Scheme comments,
is a more readable version of the result.

These tests, although maybe not exhaustive, have verified that the
@code{parse-string} parser works for simple string literals.  The
tutorial will improve the parser by having it accept a broader
definition of string literals and by tuning the values returned from
parsing successfully.

@node Matching an Escaped Quotation Mark
@section Matching an Escaped Quotation Mark

The next logical step is to allow quotation marks to exist in a
string literal.  To have the parser accept quotation characters in
strings, we need to match ``\"'', an @emph{escaped} quotation mark.
When writing an escaped quotation mark in Scheme, quotation marks
@emph{and} backslashes need to be escaped.  The escaped quotation
mark is represented in a string value as ``\\\"'' in Scheme: two
backslashes to make a backslash, and another backslash to escape the
quotation mark.

Where the parser used to match zero or more non-quotation mark
characters, it now needs to match both escaped-quotation characters
and non-quotation characters.  The ``both'' relation can be handled
with the @code{alt} parser expression.  Unlike @code{seq}, @code{alt}
tries its subexpressions in order and succeeds with the first one
that matches.

@lisp
;; parse-string : parser-buffer -> ( vector | #f )
(define parse-string
  (*parser
   (seq (match "\"")
        (* (alt (match "\\\"")
                (match (not-char #\"))))
        (match "\""))))
@end lisp

Tests of the new definition confirm that it works.
@lisp
(parse-string (string->parser-buffer "\"\\\"foo\\\"\""))
 => #("\"" "\\\"" "f" "o" "o" "\\\"" "\"")     ;; "\"foo\""
@end lisp

Running the above test on the old definition of @code{parse-string}
would return a different value and an incorrect parsing of the string
literal.

@lisp
(parse-string (string->parser-buffer "\"\\\"foo\\\"\""))
 => #("\"" "\\" "\"")     ;; "\"
@end lisp

Incorrectly, it returns only the starting quotation mark and the
escaping backslash, and then ends prematurely on the escaped
quotation mark of the string literal.

@node Removing the Start and End Quotation Marks
@section Removing the Start and End Quotation Marks

The parser matches the leading and ending quotation mark characters
and returns a vector value containing the delimiting quotation marks.
Really, only the contained string of characters should be returned.
To avoid this annoyance we can use the @code{noise} expression.  The
@code{noise} parser expression is equivalent to the @code{match}
expression, except the match isn't included in the returned vector
value.  The use of @code{match} is replaced with @code{noise} for the
start and ending quotation marks.

@lisp
;; parse-string : parser-buffer -> ( vector | #f )
(define parse-string
  (*parser
   (seq (noise "\"")
        (* (alt (match "\\\"")
                (match (not-char #\"))))
        (noise "\""))))
@end lisp

Here are some tests to make sure it worked.

@lisp
(parse-string (string->parser-buffer "\"foo\""))
 => #("f" "o" "o")                     ;; foo
(parse-string (string->parser-buffer "\"\\\"foo\\\"\""))
 => #("\\\"" "f" "o" "o" "\\\"")       ;; \"foo\"
@end lisp

@node Matching Other Escape Characters
@section Matching Other Escape Characters

The parser now allows the quotation mark and removes the delimiting
quotation marks, but it is returning any escaped quotation marks
(@samp{\\\"}) with their backslashes.  The backslash before the quote
should be removed in the returned value.  This should even be
generalized to all escaped characters.
For instance, the backslash character itself needs to be escaped by
another backslash character.  The rule cannot be generalized to every
character, however.  Some backslash character sequences represent
special characters with specific meanings in computing.  These
include the @dfn{newline} (@samp{\n}) and the @dfn{tab} (@samp{\t})
characters.

To accept the sequence of a backslash character followed by either
another backslash character or a quotation mark, the escaping
backslash character is discarded using @code{noise}.

@lisp
;; parse-string : parser-buffer -> ( vector | #f )
(define parse-string
  (*parser
   (seq (noise "\"")
        (* (alt (seq (noise "\\")
                     (alt (match "\\")
                          (match "\"")))
                (match (not-char #\"))))
        (noise "\""))))
@end lisp

In the following tests, the resulting vector has strings printed by
@acronym{MIT} Scheme with escaped quotation mark characters.  The
backslashes used for escaping in the original string are no longer
present in the parsed string.

@lisp
(parse-string (string->parser-buffer "\"\\\"foo\\\"\""))
 => #("\"" "f" "o" "o" "\"")     ;; "foo"
(parse-string (string->parser-buffer "\"\\\\\""))
 => #("\\")                      ;; \
@end lisp

@node Fine-tuning the Return Value More
@section Fine-tuning the Return Value More

Another common expectation of a parser is the ability to determine
the returned value of the parsed input.  The parser currently returns
a vector composed of each individually matched element.  This is not
a useful return value for use by other programs.  Usually, the atomic
``token'' of the grammar should be the return value, not some
representation determined by the underlying @acronym{MIT} Scheme
parsing system.  In this tutorial, a token should be the matched
string literal, not each result coming from a @code{match}
expression.

Provided by the @code{*parser}, the @code{encapsulate} parser
expression can modify a return value by applying a function to the
vector returned by a parser function.
For parsing string literals, @code{encapsulate} will need a function
for creating a single string from each of the matched string elements
in the vector.

To convert a vector of strings, the vector first needs to be
converted to a list with the function @code{vector->list}.  Then, the
list of strings created from the vector is converted into a single
string.  Lists of strings can't be converted into strings
automatically in Scheme (lists of characters can).  Reducing a list
of strings to a single string is done with the higher-order function
@code{reduce}.

The @code{reduce} procedure is often introduced with the example of
adding a list of numbers together.

@lisp
(reduce + 0 '(1 2 3 4))
 => 10
@end lisp

Instead of addition, the list of strings created with
@code{vector->list} will be concatenated by the @code{reduce}
function to ``sum'' the strings into a single final string.

@lisp
(lambda (v)
  (reduce string-append "" (vector->list v)))
@end lisp

This function can be inserted in the parser as part of the
@code{encapsulate} expression.

@lisp
;; parse-string : parser-buffer -> ( vector | #f )
(define parse-string
  (*parser
   (encapsulate (lambda (v)
                  (reduce string-append "" (vector->list v)))
     (seq (noise "\"")
          (* (alt (seq (noise "\\")
                       (alt (match "\\")
                            (match "\"")))
                  (match (not-char #\"))))
          (noise "\"")))))
@end lisp

This allows @code{parse-string} to give a clear result.

@lisp
(parse-string (string->parser-buffer "\"foo\""))
 => #("foo")
@end lisp

The procedure given to @code{encapsulate} can alternatively be
defined as a named function.  It will be called
@code{vector-string->string}.

@lisp
;; vector-string->string : string vector -> string
(define (vector-string->string v)
  (reduce string-append "" (vector->list v)))
@end lisp

With this definition we can define a more concise parser composed of
reusable parts.
@lisp
;; parse-string : parser-buffer -> ( vector | #f )
(define parse-string
  (*parser
   (encapsulate vector-string->string
     (seq (noise "\"")
          (* (alt (seq (noise "\\")
                       (alt (match "\\")
                            (match "\"")))
                  (match (not-char #\"))))
          (noise "\"")))))
@end lisp

@node Comparing the Parser with `read'
@section Comparing the Parser with @code{read}

Ironically, the utility of this parsing tutorial emphasized in the
introduction (@pxref{Introduction}) was misleading.  The example
parser can successfully parse string literals.  The parser is not an
astonishing breakthrough.  It is actually duplicating what is already
available in Scheme.

Programming languages, including Scheme, parse string literals all
the time.  Scheme, unlike other programming languages, makes the
internal Scheme parser available to users, even for use in their own
programs.  Therefore, somewhere in Scheme there exists a string
literal parser with the exact capability of this tutorial's parser.

The @code{read} procedure available in @acronym{MIT} Scheme and all
Scheme implementations can parse Scheme ``objects'' (a string literal
is a Scheme object).  Instead of reading from a parser buffer,
@code{read} reads from a port.  A port can be created from a string
with @acronym{MIT} Scheme's @code{open-input-string} procedure.  So,
the tutorial's parser is entirely unoriginal research.

@lisp
(read (open-input-string "\"foo\""))
 => "foo"
@end lisp

The parser defined in this tutorial is still useful, though.  For
instance, some programming tasks require parsers for parsing
@emph{only} strings (or numbers or some other value).  Scheme's
@code{read} function doesn't satisfy such a requirement.  It accepts
objects other than strings, including symbols, characters, numbers,
lists and all other valid Scheme values and expressions.  It also
returns the objects as their Scheme types (strings as string type,
symbols as symbol type, @dots{}) and not as a vector of strings.
@lisp
(read (open-input-string "foo"))
 => foo
@end lisp

A string literal parser written with @acronym{MIT} Scheme's
@code{*parser} syntax would return false if it met these objects.
Further, @code{read} does not return false for failed matches but on
occasion will give a parse error.

@lisp
(read (open-input-string ")"))
 => PARSE-ERROR: Unmatched close paren #\)
@end lisp

The parser for literal strings could be rewritten using @code{read}.
We've added the code for doing just that at the end of this tutorial
in @ref{Parsing with read}.

Writing a parser with @acronym{MIT} Scheme's @code{*parser} syntax,
and not with Scheme's @code{read}, is a useful exercise.  The parser
is taken further by having it match and return all string literals
available in the parser buffer, allowing @dfn{whitespace} characters
to exist between each string literal.

@node Parsing All Strings
@section Parsing All Strings

To match all string literals, the entire parser need only match one
or more string literals.  The plus (@code{+}) expression is similar
to the star (@code{*}) expression used elsewhere in the parser,
except that @code{+} must match @emph{one} or more subexpressions
rather than @emph{zero} or more.

The @code{+} expression is wrapped around the @code{encapsulate}
expression.  If it were placed inside instead, the
@code{encapsulate} expression would concatenate all the matched
string literals into one string rather than keeping them separate.
The matching of whitespace is likewise kept outside of the
@code{encapsulate} expression so that it is not ``encapsulated'' into
the returned value.

@lisp
;; parse-all-strings : parser-buffer -> ( vector | #f )
(define parse-all-strings
  (*parser
   (+ (alt (noise " ")
           (encapsulate vector-string->string
             (seq (noise "\"")
                  (* (alt (seq (noise "\\")
                               (alt (match "\\")
                                    (match "\"")))
                          (match (not-char #\"))))
                  (noise "\"")))))))
@end lisp

The following tests show how @code{parse-all-strings} returns a
vector element for each individual literal string matched.
@lisp
(parse-all-strings (string->parser-buffer "\"\""))
 => #("")
(parse-all-strings (string->parser-buffer ""))
 => #f
(parse-all-strings (string->parser-buffer "\"foo\"\"foo\""))
 => #("foo" "foo")
(parse-all-strings (string->parser-buffer "\"foo\" \"bar\""))
 => #("foo" "bar")
@end lisp

Really, we don't need to duplicate the entire code of
@code{parse-string} in @code{parse-all-strings}.  The
@dfn{modularity} of the @acronym{MIT} Scheme parser language, as with
the Scheme programming language generally, allows us to reuse a
parser by naming it inside another parser.

@lisp
;; parse-all-strings : parser-buffer -> ( vector | #f )
(define parse-all-strings
  (*parser
   (+ (alt (noise " ")
           parse-string))))
@end lisp

@node Conclusion
@chapter Conclusion

The parsing capabilities in @acronym{MIT} Scheme are powerful yet
succinct, and they are useful.  Below are some of the qualities that
parsers created with @acronym{MIT} Scheme exhibit.

@table @dfn
@item Modular parsing
A parser can be reused inside another parser, or separated into
smaller manageable pieces.  The macro language adds features such as
seamless, implicit error handling to a parser, while the completed
parser still clearly represents its grammar.

@item Clear syntax
Instead of specifying @dfn{start condition} flags for enforcing
parsing sequences or being limited by Regular Expressions, the
@acronym{MIT} Scheme parsing language can be used to write
human-readable parsers.  This benefits the original author and others
needing to make modifications or improvements to the parser.  Complex
Regular Expressions are notorious for being difficult for other
programmers to read, maintain and modify.  The parser syntax is
verbose but is more powerful and lucid than Regular Expressions.

@item Intelligent technology
In commonly used programming languages, matching input requires
reading data into a variable and then operating on the variable.
@acronym{MIT} Scheme requires only specifying the match and what to
do with matched values.
The detailed implementation that accomplishes the parsing of input is
hidden.  The resulting parser is optimized by @acronym{MIT} Scheme
for performance and scale by reading input in smaller sequences to
avoid bounds on the input size or the need to @dfn{backtrack} over
the input.  This makes a parser created with @acronym{MIT} Scheme not
only useful and simple but extremely powerful.

@item Intelligent philosophy
There exists the slight possibility for a parser written in
@acronym{MIT} Scheme and generated by @acronym{MIT} Scheme to be
inadequate for any number of technical reasons, including parsing
accuracy or runtime performance.  There are two possible solutions.
One could @dfn{hack} the parser to correct for this situation by
writing some or all of the parser in low-level Scheme code.
Alternatively, one could investigate ways to improve @acronym{MIT}
Scheme's underlying parser generation macro.  Because @acronym{MIT}
Scheme is @dfn{free software}, access is given to the source code for
making or suggesting improvements to the package.
@end table

@node parse-string.scm
@appendix @file{parse-string.scm}

The following is the entire source code for the final version of the
string literal parser written in @acronym{MIT} Scheme.

@lisp
(load-option '*parser)

;; vector-string->string : string vector -> string
(define (vector-string->string v)
  (reduce string-append "" (vector->list v)))

;; parse-string : parser-buffer -> ( vector | #f )
(define parse-string
  (*parser
   (encapsulate vector-string->string
     (seq (noise "\"")
          (* (alt (seq (noise "\\")
                       (alt (match "\\")
                            (match "\"")))
                  (match (not-char #\"))))
          (noise "\"")))))

;; parse-all-strings : parser-buffer -> ( vector | #f )
(define parse-all-strings
  (*parser
   (+ (alt (noise " ")
           parse-string))))
@end lisp

@node Parsing with read
@appendix Parsing with @code{read}

Here is how to define a string literal parser by using the
@code{read} function provided by Scheme, as mentioned in
@ref{Comparing the Parser with `read'}.
@lisp
;; parse-string : [input-port] -> ( string | #f )
;; Reads one object from PORT (default: the current input port)
;; and returns it only if it is a string.
(define (parse-string #!optional port)
  (let* ((port (if (default-object? port)
                   (current-input-port)
                   port))
         (object (read port)))
    (if (string? object)
        object
        #f)))
@end lisp

@c (parse-string (open-input-string "\"foo\""))

A version of the function @code{parse-all-strings} that was
introduced in @ref{Parsing All Strings} can also be rewritten using
@code{read}.  It is left to the reader as an exercise.

@c @lisp
@c (define (parse-all-strings #!optional port)
@c   (let ((object (parse-string port)))
@c     (if (string? object)
@c         (cons object (parse-all-strings port))
@c         '())))
@c @end lisp

@bye

@c Local Variables:
@c compile-command: "makeinfo --html --output parsing-strings.html --no-split --no-headers parsing-strings.texi && texi2pdf parsing-strings.texi"
@c End:

@c (set! compile-command "makeinfo --html --output parsing-strings.html --no-split --no-headers parsing-strings.texi")