abstract class TopDown::Parser
 
  - TopDown::Parser
 - TopDown::CharReader
 - Reference
 - Object
 
Overview
TopDown::Parser is the main class to derive for building a parser.
Basis
A minimal parser could be:
class Parser < TopDown::Parser
  root :expression
  syntax :expression do
  end
end
which will parse anything but EOF.
root indicate the starting point of parsing, the rest is to do inside syntax.
Syntax are like functions, in which, any code can be added.
Inside it, #parse & parse! can be used to define what to parse.
syntax :expression do
  a = parse('a')
  parse(/( )+/)
  foo = parse!("foo")
  {a, foo}
end
"a   foo" # ~> {'a', "foo"}
"a   bar" # ~> Unexpected character 'b', expected 'foo'
"ab"      # ~> Unexpected character 'b', expected expression
NOTE 
In this documentation, "foo" # ~> result is used as a shortcut for
Parser.new("foo").parse # => result
In the above, if parse!("foo") fail to parse, it raises an error.
But parse('a') don't raises, instead it use a break (break Fail.new) to stop the current sequence to be parsed.
This failure is caught by the root :expression, that why it next raises an exception.
This difference is important because is on what is based TopDown, using #parse let a change to the surrounding context to handle the failure, whereas
parse! permit the raise directly, leading to better errors.
For example, using parse! above:
syntax :expression do
  a = parse!('a')
  parse!(/( )+/)
  foo = parse!("foo")
  {a, foo}
end
"ab" # ~> Unexpected character 'b', expected pattern /( )+/
Gives a more precise error.
Repeat, union, maybe
repeat, union, maybe allow interesting things to be parsed. They take both a block, in which parse failure can occur.
syntax :expression do
  repeat do
    parse('a')
    parse!('b')
  end
  maybe { parse('c') }
  parse!('d')
  union do
    parse('e')
    parse('é')
    parse('è')
  end
  "ok"
end
This is equivalent to parse /(ab)*c?d[eéè]/, in a bit more verbose. However, doing so allow to insert code between and retrieve
only needed result, storing in array, or to insert some logic.
For example:
syntax :expression do
  array = [] of {Char, Char}
  repeat do
    array << {parse('a'), parse!('b')}
  end
  have_c = maybe { parse('c') }
  parse!('d')
  if have_c
    e = union do
      parse('e')
      parse('é')
      parse('è')
    end
  end
  {array, e}
end
Here, we store each ab in an array and accept the union 'e'|'é'|'è' only if the optional 'c' have been parsed.
"ababd" # ~> {[{'a', 'b'}, {'a', 'b'}], nil}
"cde"   # ~> {[], 'e'}
"dé"    # ~> Unexpected character 'é', expected 'EOF'
"abac"  # # Unexpected character 'c', expected 'b'
NOTE 
When using repeat, union, and maybe, we always use #parse at first (without exclamation), because they break on failure,
which is caught by the repeat/maybe/union. They know so they should continue parsing without a failure.
Infix
infix can be used inside a union. It permits to parse operators with precedence easily.
syntax :expression do
  union do
    parse(/\d+/).to_i
    infix 30 "**" { left() ** parse!(:expression) }
    infix 20 '*' { left() * parse!(:expression) }
    infix 20 '/' { left() / parse!(:expression) }
    infix 10 '+' { left() + parse!(:expression) }
    infix 10 '-' { left() - parse!(:expression) }
  end
end
Infix are treated specially by the union. They are parsed after any other member of the union, the left() result is updated each time.
They are parsed in function of their precedence (first argument), higher precedence mean grouped first.
"3*6+6*4" # ~> 42
Precedence:
current_precedence() is initially zero, it changes when entering in infix. This is this value that guide the recursion for parsing operators.
Its value can be changed at specific places, so parsing order can be entirely controlled.
For example, assuming we want to parse a ternary if _ ? _ : _ that have a higher precedence that classical operators (e.g. "1+1?2:3+3" gets parsed as (1 + (1?2:3)) + 3.
We can do the following:
syntax(:expression) do
  union do
    parse(/\d+/).to_i
    infix 10, :plus
    infix 90, :ternary_if
  end
end
syntax :plus, '+' do
  left() + parse!(:expression)
end
syntax :ternary_if, '?' do
  cond = left()
  then_arm = parse!(:expression)
  parse!(':')
  else_arm = parse!(:expression)
  cond != 0 ? then_arm : else_arm
end
"1+1?2:3+3" # ~> 6
However the following is not what we except:
"1+1?2+2:3+3" # => Unexpected character '+', expected ':'
This happens because the + inside the then_arm have lower precedence, it wants so
let its left with higher precedence finish before parsing itself. So the then_arm finish
but we except a : right after, so it fails (got '+', expected ':').
To fix that, we can set the current_precedence() for the then arm to 0:
then_arm = parse!(:expression, with_precedence: 0)
"1+1?2+2:3+3" # ~> 8
  Defined in:
ast.crhooks.cr
parser/parselets.cr
parser/parser.cr
parser/syntax_error.cr
parser/tokens.cr
Macro Summary
- 
        ast(ast_class, *args, at = nil, **options)
        
          
TODO docs
 - begin_location
 - 
        capture(&)
        
          
Captures all characters parsed inside the block.
 - 
        contexts(*contexts)
        
          
TODO docs
 - 
        current_precedence
        
          
Returns the current precedence.
 - 
        end_word
        
          
Allows to match a word only if a non-alphanumeric follows
 - 
        forward_fail(result)
        
          
Returns the given result, or
breakthe current sequence if result is aFail. - 
        infix(precedence, parselet, associativity = "right", &block)
        
          
TODO docs
 - left
 - 
        maybe(parselet = nil, &)
        
          
Parses the sequence inside the block, returns
nilif it fails. - 
        maybe_union(&block)
        
          
Equivalent to `maybe { union { ...
 - 
        no_skip(&)
        
          
Prevents the
skiprules to be triggered inside the given block. - 
        parse(parselet, with_precedence = nil, options = nil, &block)
        
          
Parses the given parselet (
ParseletLiteral), and yield/return the according result. - 
        parse!(parselet, error = nil, at = nil, with_precedence = nil, options = nil, &block)
        
          
Similar to
Parser.parse, but raisesSyntaxErrorif the parsing fail. - 
        partial_capture(&block)
        
          
Captures chosen characters parsed inside the block.
 - 
        peek(parselet, with_precedence = nil, options = nil, &block)
        
          
TODO docs parse parselet if succeed, don't move the cursor and return parselet result else, don't move the cursor and break a failure
 - 
        prefix(precedence, parselet, &block)
        
          
Equivalent to
parse(parselet, with_precedence: precedence) - 
        repeat(parselet = nil, &)
        
          
Repeatedly parses the sequence inside the block until it fails.
 - 
        repeat(parselet = nil, *, separator, &)
        
          
Repeatedly parses the sequence inside the block until it fails, with a separator parselet between each iteration.
 - 
        repeat_count(*, separator = nil)
        
          
TODO docs
 - repeat_n(count, parselet = nil, &block)
 - 
        repeat_to_a(type, *, separator = nil, &)
        
          
TODO docs
 - 
        repeat_to_h(type, *, separator = nil, &)
        
          
TODO docs
 - 
        repeat_to_s(*, separator = nil, &block)
        
          
TODO docs
 - 
        repeat_union(&block)
        
          
Equivalent to `repeat { union { ...
 - 
        root(parselet)
        
          
Defines the main syntax to parse.
 - 
        sequence(&)
        
          
Allows to group a sequence of parsing (inside a union for example).
 - 
        skip(&members)
        
          
Defines what is ignored during the parsing.
 - 
        syntax(syntax_name, *prefixs, end! = nil, &block)
        
          
Defines a syntax that can be called with
parse(SymbolLiteral). - 
        token(token_name, parselet = nil, context = nil, &block)
        
          
Defines a token to parse.
 - 
        tokens(&block)
        
          
This macro allows to define how tokens are parsed.
 - 
        union(&members)
        
          
Tries to parse each member of the union, returns the result of the first that succeed.
 - 
        union!(error = nil, at = nil, &members)
        
          
Similar to
Parser.union, but raisesSyntaxErrorif the parsing fail. 
Instance Method Summary
- 
        #each_token(&) : Nil
        
          
Yields successively parsed tokens.
 - 
        #hook_could_not_parse_any_token(got : Char, expected : Nil)
        
          
Override this method to modify the default error message when any token can be parsed.
 - 
        #hook_expected_any_character_but(got : Char, expected : Char)
        
          
Override this method to modify the default error message when
parse!(not('a'))fail. - 
        #hook_expected_any_character_but_range(got : Char, expected : Range(Char, Char))
        
          
Override this method to modify the default error message when
parse!(not('a'..'z'))fail. - 
        #hook_expected_any_in_range(got : Char, expected : Range(Char, Char))
        
          
Override this method to modify the default error message when
parse!('a'..'z')fail. - 
        #hook_expected_any_token_but(got : Token | Nil, expected : String)
        
          
Override this method to modify the default error message when
parse!(not(["TOKEN"]))fail. - 
        #hook_expected_character(got : Char, expected : Char)
        
          
Override this method to modify the default error message when
parse!('a')fail. - 
        #hook_expected_pattern(got : Char, expected : Regex)
        
          
Override this method to modify the default error message when
parse!(/regex/)fail. - 
        #hook_expected_syntax(got : Char, expected : Symbol)
        
          
Override this method to modify the default error message when
parse!(:syntax)fail. - 
        #hook_expected_token(got : Token | Nil, expected : String)
        
          
Override this method to modify the default error message when
parse!(["TOKEN"])fail. - 
        #hook_expected_word(got : Char, expected : String)
        
          
Override this method to modify the default error message when
parse!("string")fail. - #hook_union_failed(got : Char, expected : Nil)
 - 
        #location : Location
        
          
Returns the current cursor location.
 - 
        #location=(location : Location)
        
          
Move the cursor to the new location.
 - 
        #parse
        
          
Parses the source contained in this parser.
 - 
        #raise_syntax_error(message : String, at location : Location = self.location, source : String = self.source)
        
          
Raises a
SyntaxErrorat current#locationand currentsource. - 
        #raise_syntax_error(message : String, at location : Range(Location, Location), source : String = self.source)
        
          
Raises a
SyntaxErrorat current#locationand currentsource. - 
        #raise_syntax_error(message : String, at location : Range(Location, Nil), source : String = self.source)
        
          
Raises a
SyntaxErrorat current#locationand currentsource. - 
        #source=(source : String)
        
          
Modifies the source and reset the cursor
#locationto zero. - 
        #tokens
        
          
Returns the array of parsed tokens.
 
Instance methods inherited from class TopDown::CharReader
  
  
    
      each_char(& : Char -> )
    each_char, 
    
  
    
      location : Location
    location, 
    
  
    
      location=(location : Location)
    location=, 
    
  
    
      next_assci_char? : Char | Nil
    next_assci_char?, 
    
  
    
      next_char : Char
    next_char, 
    
  
    
      peek_char
    peek_char, 
    
  
    
      previous_assci_char? : Char | Nil
    previous_assci_char?, 
    
  
    
      source : String
    source, 
    
  
    
      source=(source : String)
    source=
    
  
    
  Constructor methods inherited from class TopDown::CharReader
  
  
    
      new(source : String)
    new
    
  
    
    
  
    
    
    
    
  
    
    
    
    
  
Macro Detail
Returns the Location at the beginning of the syntax.
The given location is after skipping characters.
Captures all characters parsed inside the block.
Returns a String
NOTE  skip characters are still captured, use partial_capture instead.
capture do
  repeat(/ +/) do
    parse(/\w+/)
  end
end
"a   bc d e"  # ~> "a   bc d e"
"Hello World" # ~> "Hello World"
        Returns the current precedence.
Initially zero.
Changes inside an infix or when with_precedence is used.
Allows to match a word only if a non-alphanumeric follows
parse("foo") { end_word }
Equivalent to: (but faster)
parse(/foo\b/) { nil }
"foo bar" # ~> "foo"
"foobar"  # ~> Unexpected character 'b', expected 'EOF'
NOTE  end_word returns nil.
Returns the given result, or break the current sequence if result is a Fail.
This macro is the recommended ways to handle Fail.
result = Array.new(3) do
  parse("a")
end
typeof(result) # => Array(String) | TopDown::Parser::Fail
result = forward_fail(result)
typeof(result) # => Array(String)
        In the context of an infix member, returns the left result of the current union.
Returns nil in a context other than inside an infix.
Each time a union member is successfully parsed, left value is updated.
As a union continue parsing infix until possible,
left allows to retrieve the subsequent results.
syntax(:exp) do
  union do
    parse('a')
    infix(1, :comma)
  end
end
syntax(:comma) do
  l = left() # left returns either
  r = parse(:exp)
  "#{l}, #{r}"
end
        Parses the sequence inside the block, returns nil if it fails.
x = parse('1').to_i
y =
  maybe do
    parse('+')
    parse!('1').to_i
  end
parse!(';')
x + (y || 0)
"1;"   # ~> 1
"1+1;" # ~> 2
"1+*;" # ~> "Unexpected character '*', expected '1'"
"1*;"  # ~> "Unexpected character '*', expected ';'"
        Parses the given parselet (ParseletLiteral), and yield/return the according result.
failure:
If the given parselet fails to parse, it break the current sequence with a Fail. Failure is catch by the surrounding context.
- inside an 
union, tell the union that member have fail. The union tries to parse the next member. - inside a 
maybe, the maybe will returnnil. - inside a 
repeat, make the repeat to stop. - inside a 
syntax, the syntax is considered to fail, it will in turnbreakor raises. 
block:
A block could be given to let return the value of the block.
precedence:
with_precedence (NumberLiteral) changes the current_precedence, the given parselet will be parsed as if the contained syntax have this precedence.
Allow to handle multi-precedence for ternary-or-more operator.
Similar to Parser.parse, but raises SyntaxError if the parsing fail.
error: allows to customize error message, it can be:
- a 
StringLiteral: Following percent interpolations are available:%{got}: The character or token causing the failure.%{expected}: The expectedChar,String,Regex,Tokenor syntax name (Symbol).
 - a 
ProcLiteraltaking two arguments: 'got' and 'expected', with types explained above. 
at: indicates where the error is raised, it can be a Location, Range(Location, Location) or Range(Location, Nil).
failure:
Because it raises, this shouldn't be used as first parselet inside a union, maybe, repeat and syntax.
This would raises before the surrounding context could try an other solution.
However, this should generally be used everywhere else, to allow more localized errors.
Captures chosen characters parsed inside the block.
Yields a String::Builder in which characters to capture can be added.
Returns a String.
partial_capture do |io|
  repeat(/ +/) do
    io << parse(/\w+/)
  end
end
"a   bc d e"  # ~> "abcde"
"Hello World" # ~> "HelloWorld"
        TODO docs parse parselet if succeed, don't move the cursor and return parselet result else, don't move the cursor and break a failure
Equivalent to parse(parselet, with_precedence: precedence)
Repeatedly parses the sequence inside the block until it fails.
Returns nil. Results should be either collected by capture or stored inside a variable or array.
x = parse('1').to_i
repeat do
  parse('+')
  x += parse!('1').to_i
end
parse!(';')
x
"1;"         # ~> 1
"1+1+1+1+1;" # ~> 5
"1+1+1+*;"   # ~> "Unexpected character '*', expected '1'"
"1*;"        # ~> "Unexpected character '*', expected ';'"
        Repeatedly parses the sequence inside the block until it fails, with a separator parselet between each iteration.
Returns nil. Results should be either collected by capture or stored inside a variable or array.
x = 0
parse('(')
repeat(',') do
  x += parse('1').to_i
end
parse!(')')
x
"()"          # ~> 0
"(1,1,1,1,1)" # ~> 5
"(1,1,)"      # ~> "Unexpected character ',', expected ')'"
"(11)"        # ~> "Unexpected character '1', expected ')'"
        Defines the main syntax to parse.
parselet (ParseletLiteral): the syntax name or parselet to be the root.
Allows to group a sequence of parsing (inside a union for example).
union do
  sequence do
    parse("a")
    parse!("b")
  end
  parse("2")
        Defines what is ignored during the parsing.
It is triggered every time prior a parselet is parsed.
It works similarly to the members of a Parser.union, in which each members are consumed while possible.
class Parser < TopDown::Parser
  skip do
    # Skips spaces:
    parse ' '
    parse '\n'
    parse '\t'
    parse '\r'
    # Line comments:
    parse "//" { repeat { parse not('\n') } }
    # Block comments:
    parse "/*" do
      repeat { parse not("*/") }
      parse "*/"
    end
  end
end
        Defines a syntax that can be called with parse(SymbolLiteral).
syntax_name (SymbolLiteral): The identifier of the syntax.
*prefix (ParseletLiteral): A list of parselet, to parse at the beginning of the syntax, results are yielded in the given block.
block: The body of the syntax, work like a usual body function.
class Parser < TopDown::Parser
  syntax :expression do
    a = parse('a')
    parse(/( )+/)
    foo = parse!("foo")
    {a, foo}
  end
end
        Defines a token to parse. Can be used only inside Parser.tokens.
Each token have a token_name (StringLiteral), and is parsed according to parselet.
The block indicate the value of the resulting Token.
- If no parselet provided, it deduced from token_name.
 - If no block provided, token value is nil. (use 
&.itselfto actually keep the parsed string) 
Example:
class Parser < TopDown::Parser
  tokens do
    token("+")                            # Parses '+', produces Token[+]
    token("hey")                          # Parses "hey", produces Token[hey]
    token("new_line", '\n')               # Parses '\n', produces Token[new_line]
    token("int", /\d+/) { |v| v.to_i }    # Parses /\d+/, produces Token[int:<int_value>]
    token("string", :tk_string, &.itself) # Parses syntax :tk_string, produces Token[string:<string_value>]
  end
end
        This macro allows to define how tokens are parsed.
Each line of the block correspond a token.
class Parser < TopDown::Parser
  tokens do
    token("+")
    token("-")
    token("**")
    token("*")
    token("/")
    token("new_line", '\n')
  end
end
Use Parser.token to define how a token is parsed
The order of token matter, because there are parsed in this order.
Hence, if * is moved before **, the result will be two token * parsed instead of one **.
In addition, characters are never skipped while parsing tokens (and all inner syntax calls), see no_skip.
Tries to parse each member of the union, returns the result of the first that succeed.
union do
  parse('1')
  parse('2')
  sequence do
    parse('a')
    parse('b')
    parse!('c')
  end
  parse('a')
end
"1"   # ~> '1'
"abc" # ~> 'c'
"ab*" # ~> "Unexpected character '*', expected 'c'"
"a"   # ~> 'a'
"*"   # ~> "Could not parse syntax ':main'
members:
Members are delimited by each expression in the Expressions of the given block.
NOTE  a block begin/end doesn't group a member since it is inlined by the crystal parser. Use sequence instead.
failure:
If all members of the union fail, the union is considered to fail, and will break the current sequence
infix:
infix members could be added to a union.
infix members are always treated after each normal members, in the order there are defined.
A union act as follow:
- Tries to parse each normal member.
 - When one succeed, store the result. 
leftallows to retrieve this result. - Tries to parse 
infixmembers whose precedence is greater than current precedence (initial is zero). - Inner of 
infixis executed, it potentially triggers parsing of this union recursively, but current precedence is sets to theinfixprecedence. - When one fully succeed, store the result. 
leftis updated. - Repeats step 3-5) until there no more infix.
 - Returns last result.
 
This is mainly the top-down operator precedence algorithm, also known as precedence climbing.
Similar to Parser.union, but raises SyntaxError if the parsing fail.
Instance Method Detail
Override this method to modify the default error message when any token can be parsed.
Override this method to modify the default error message when parse!(not('a')) fail.
Override this method to modify the default error message when parse!(not('a'..'z')) fail.
Override this method to modify the default error message when parse!('a'..'z') fail.
Override this method to modify the default error message when parse!(not(["TOKEN"])) fail.
Override this method to modify the default error message when parse!('a') fail.
Override this method to modify the default error message when parse!(/regex/) fail.
Override this method to modify the default error message when parse!(:syntax) fail.
Override this method to modify the default error message when parse!(["TOKEN"]) fail.
Override this method to modify the default error message when parse!("string") fail.
Returns the current cursor location.
The location can be used later to raise an error at that point.
loc = self.location
parse('(')
exp = parse!(:expression)
parse!(')', error: "Unterminated parenthesis expression", at: loc)
exp
        Move the cursor to the new location.
The location should be well formed, otherwise error display won't be right.
It is recommended to always use a location got by self.location.
This methods is used to backtrack the parser.
However, prefer using Parser.union, Parser.maybe, and Parser.repeat over manual backtracks.
Parses the source contained in this parser.
Returns the result of the root syntax.
Expects eof after parsing root syntax.
Raises SyntaxError if fail to parse.
Raises a SyntaxError at current #location and current source.
location could be a range between two Location. A endless range mean a location up to self.location.
Raises a SyntaxError at current #location and current source.
location could be a range between two Location. A endless range mean a location up to self.location.
Raises a SyntaxError at current #location and current source.
location could be a range between two Location. A endless range mean a location up to self.location.