class Chem::PullParser

Overview

A pull parser to read a text (ASCII) document by consuming one token at a time.

The parser reads the document line by line, storing the current line in an internal buffer. Tokens (runs of consecutive non-whitespace characters) are consumed one after another via the #consume_token method, but only from the current line. The current token can be obtained as a string via the #str methods, or interpreted as a primitive type via the specialized #float and #int methods. It can also be interpreted as a custom type via the #parse method. If parsing fails, these methods either return nil or raise ParseException, depending on the variant.

After creating an instance, the parser must be positioned at the first line before consuming a token. Do so by calling #consume_line, using the yielding method #each_line, or checking for end of file via #eof?. Upon reading a new line, the cursor is reset and the current token is set to nil, so #consume_token must be called before parsing. If the current token is not set, the parsing methods return nil or raise an exception. Alternatively, use the convenience methods #next_f, #next_i, or #next_s, which consume the next token and return the interpreted value.

The #next_* methods always advance the cursor, so take care when calling them repeatedly: invoking #next_s? twice returns the next two strings (if available), not the same value twice. For instance, calling #next_i? after #next_f? reads the next token instead of re-interpreting the current one as an integer. To re-interpret the current token, use the non-advancing methods instead (#int rather than #next_i).
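
The difference between advancing and non-advancing methods can be sketched as follows. This is a minimal illustration that uses only methods documented on this page; the `require "chem"` line assumes the shard is named `chem`, and the variable names are made up for the example.

```crystal
require "chem" # assumption: the shard providing Chem::PullParser is named "chem"

pull = Chem::PullParser.new IO::Memory.new("42 abc\n")
pull.consume_line
pull.consume_token      # the current token is now "42"

as_int = pull.int?      # => 42
as_str = pull.str?      # => "42" (same token, the cursor did not move)
advanced = pull.next_i? # => nil ("abc" was consumed and is not an integer)
current = pull.str?     # => "abc" (the current token was set by #next_i?)
```

Here #int? and #str? inspect the same token, whereas #next_i? moves on to "abc" before interpreting it.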

Examples

pull = PullParser.new IO::Memory.new("abc def\n1234 5.6 abc\n")
pull.consume_token.str?        # => nil (line not read)
pull.consume_line              # place the parser at the first line
pull.str?                      # => nil (current token is nil)
pull.next_s?                   # => "abc"
pull.str?                      # => "abc" (current token was set by #next_s?)
pull.next_s?                   # => "def"
pull.next_s?                   # => nil (end of line)
pull.next_s                    # raises ParseException (no token can be consumed)
pull.consume_line              # place the parser at the second line
pull.next_i                    # => 1234
pull.next_f                    # => 5.6
pull.consume_token.str?        # => "abc"
pull.int?                      # => nil
pull.float?                    # => nil
pull.parse &.chars.sum(&.-('a')) # => 3
pull.consume_token.str?        # => nil (end of line)
pull.consume_line              # place the parser at the end of IO
pull.consume_token.str?        # => nil (current line is nil)

Additionally, the cursor can be manually placed on the current line via the #at methods. This is useful for parsing fixed-column formats such as PDB. The non-question variants will raise if the cursor is out of bounds.

pull = PullParser.new IO::Memory.new("abc123.45def\nABCDEF 5.16\n")
pull.consume_line
pull.at(3, 6)     # returns the parser itself
pull.str          # => "123.45"
pull.float        # => 123.45
pull.at(9).str    # => "d"
pull.at(0, 3).str # => "abc"
pull.at(100, 5)   # raises ParseException

Defined in:

chem/pull_parser.cr

Constructor Detail

def self.new(io : IO) #

Creates a PullParser which will consume the contents of io.


[View source]

Instance Method Detail

def at(index : Int, message : String = "Cursor out of current line") : self #

Sets the cursor to the character at index in the current line. Raises ParseException with the given message if index is out of bounds.


[View source]
def at(start : Int, count : Int, message : String = "Cursor out of current line") : self #

Sets the cursor at start, spanning up to count characters in the current line (fewer if the line is too short). Raises ParseException with the given message if start is out of bounds.


[View source]
def at(range : Range, message : String = "Cursor out of current line") : self #

Sets the cursor at range in the current line. Raises ParseException with the given message if range is out of bounds.


[View source]
def at?(start : Int, count : Int) : self #

Sets the cursor at start, spanning up to count characters in the current line (fewer if the line is too short). If start is out of bounds, the current token will be set to nil.


[View source]
def at?(index : Int) : self #

Sets the cursor to the character at index in the current line. If index is out of bounds, the current token will be set to nil.


[View source]
def at?(range : Range) : self #

Sets the cursor at range in the current line. If range is out of bounds, the current token will be set to nil.
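
A short sketch of the non-raising #at? variants, based on the behavior described above. The `require "chem"` line assumes the shard is named `chem`, and the variable names are illustrative only.

```crystal
require "chem" # assumption: the shard providing Chem::PullParser is named "chem"

pull = Chem::PullParser.new IO::Memory.new("abc123.45def\n")
pull.consume_line

inside = pull.at?(3, 6).float? # => 123.45
clipped = pull.at?(9, 10).str? # => "def" (only three characters are left)
outside = pull.at?(100).str?   # => nil (out of bounds, so the token is unset)
```

Because #at? returns the parser itself, it can be chained with the question-mark parsing methods to read optional columns without rescuing exceptions.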


[View source]
def bool(message : String = "Invalid boolean") : Bool #

Parses and returns the boolean represented by the current token. Raises ParseException with the given message if the token is not set or it is not a valid boolean.

Valid boolean values include "true", "t", "false", and "f". Parsing is case insensitive so "true", "True", and "TRUE" are valid.


[View source]
def bool(if_blank default : Bool) : Bool #

Parses and returns the boolean represented by the current token. Returns the given default value if the token is blank. Raises ParseException if the token is not set or it is not a valid boolean.

Valid boolean values include "true", "t", "false", and "f". Parsing is case insensitive so "true", "True", and "TRUE" are valid.


[View source]
def bool? : Bool | Nil #

Parses and returns the boolean represented by the current token, or nil if the token is not set or it is not a valid boolean.

Valid boolean values include "true", "t", "false", and "f". Parsing is case insensitive so "true", "True", and "TRUE" are valid.
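
The boolean rules above can be illustrated with the following sketch. It assumes the shard is named `chem`, and uses "xyz" as an arbitrary token that is not expected to be a valid boolean.

```crystal
require "chem" # assumption: the shard providing Chem::PullParser is named "chem"

pull = Chem::PullParser.new IO::Memory.new("TRUE f xyz\n")
pull.consume_line

upper = pull.consume_token.bool?   # => true (parsing is case insensitive)
short = pull.consume_token.bool?   # => false ("f" is a valid boolean)
invalid = pull.consume_token.bool? # => nil ("xyz" is not a valid boolean)
```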


[View source]
def char(message : String = "Empty token") : Char #

Returns the first character of the current token. Raises ParseException with the given message if the token is not set.


[View source]
def char? : Char | Nil #

Returns the first character of the current token, or nil if it is not set.


[View source]
def consume(count : Int) : self #

Sets the current token to the next count characters in the current line. If there are no more characters, the token will be empty.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.consume_line
pull.consume(4).str?  # => "123 "
pull.consume(2).str?  # => "45"
pull.consume(20).str? # => "6"
pull.consume(10).str? # => nil

[View source]
def consume(& : Char -> Bool) : self #

Sets the current token to the next characters in the current line for which the given block is truthy. If the block is always falsey, the token will be empty.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.consume_line
pull.consume(&.alphanumeric?).str? # => "123"
pull.consume(&.alphanumeric?).str? # => nil
pull.consume(&.whitespace?).str?   # => " "
pull.consume(&.alphanumeric?).str? # => "456"
pull.consume(&.alphanumeric?).str? # => nil

[View source]
def consume_line : self #

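Reads the next line from the enclosed IO and sets it as the current line. The cursor is reset and the current token is set to nil. At the end of the IO, the current line is set to nil.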


[View source]
def consume_token : self #

Sets the current token to the next consecutive non-whitespace characters in the current line.

pull = PullParser.new IO::Memory.new("abc def\n1234 56 789\n")
pull.consume_token.str? # => nil
pull.consume_line       # place the parser at the first line
pull.consume_token.str? # => "abc"
pull.consume_token.str? # => "def"
pull.consume_token.str? # => nil
pull.consume_line       # place the parser at the second line
pull.consume_token.str? # => "1234"
pull.consume_token.str? # => "56"
pull.consume_token.str? # => "789"
pull.consume_token.str? # => nil
pull.consume_line       # place the parser at the end of IO
pull.consume_token.str? # => nil

[View source]
def consume_until(char : Char) : self #

Sets the current token to the next characters up to the first occurrence of char or end of line in the current line.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.consume_line
pull.consume_until(' ').str? # => "123"
pull.consume_until('4').str? # => " "
pull.consume_until('x').str? # => "456"
pull.consume_until('x').str? # => nil

[View source]
def consume_until(& : Char -> Bool) : self #

Sets the current token to the next characters in the current line up to the first character for which the given block is truthy, or up to the end of line. At the end of line, the token will be empty.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.consume_line
pull.consume_until(&.whitespace?).str?    # => "123"
pull.consume_until(&.in_set?("0-9")).str? # => " "
pull.consume_until(&.whitespace?).str?    # => "456"
pull.consume_until(&.alphanumeric?).str?  # => nil

[View source]
def current_line : String | Nil #

Returns the current line, or nil if it is not set.


[View source]
def each_line(& : String -> ) : Nil #

Yields each line in the enclosed IO.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.each_line { |line| puts line }

Prints out:

123 456
789

Note that the current line will also be yielded if it is set.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.consume_line # reads and sets the current line
pull.current_line # => "123 456"
pull.each_line { |line| puts line }

Prints out:

123 456
789
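
Since #each_line positions the parser at each line in turn, tokens can presumably be consumed inside the block. The sketch below assumes exactly that, along with a shard named `chem`; the "label x y" record layout and the `records` array are hypothetical.

```crystal
require "chem" # assumption: the shard providing Chem::PullParser is named "chem"

# Hypothetical input: one "label x y" record per line.
pull = Chem::PullParser.new IO::Memory.new("a 1.0 2.5\nb 3.0 4.5\n")

records = [] of {String, Float64, Float64}
pull.each_line do
  # The #next_* methods read successive tokens from the line just yielded.
  records << {pull.next_s, pull.next_f, pull.next_f}
end
```

The raising variants are used here so that a malformed record stops the loop with a ParseException rather than silently producing nil values.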

[View source]
def eof? : Bool #

Returns true at the end of file, otherwise false.

NOTE This method attempts to read a line from the enclosed IO if the current line is not set, so calling #consume_line after this could inadvertently discard a line.


[View source]
def eol? : Bool #

Returns true if the current token is at the end of line, otherwise false.

If no current line is set (at the beginning or end of file), it returns true as well.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.eol? # => true (no current line)
pull.consume_line
pull.eol?               # => false (beginning of line)
pull.consume_token.str? # => "123"
pull.eol?               # => false
pull.consume_token.str? # => "456"
pull.eol?               # => false
pull.consume_token.str? # => nil
pull.eol?               # => true

[View source]
def error(message : String) : NoReturn #

Raises ParseException with the given message. The exception will hold the location of the current line and token if set.

The current token is accessible via the named substitution %{token}.


[View source]
def expect(expected : Char, message : String = "Expected %{expected}, got %{actual}") : self #

Checks if the current token is one character long and equals expected, else raises ParseException.

If message is given, it is used as the parse error. Use "%{expected}" and "%{actual}" as placeholders for the expected and actual values.

pull = PullParser.new IO::Memory.new("1 2 3 4 5 6\n789\n")
pull.consume_line
pull.consume_token
pull.expect('1').char? # => '1'
pull.expect 'a'        # raises ParseException ('1' != 'a')
pull.consume_line
pull.expect 'a' # raises ParseException (empty token)

[View source]
def expect(expected : Range(Char | Nil, Char | Nil), message : String = "Expected %{actual} to be within %{expected}") : self #

Checks if the current token is one character long and it is within the given range of characters, else raises ParseException.

If message is given, it is used as the parse error. Use "%{expected}" and "%{actual}" as placeholders for the expected and actual values.

pull = PullParser.new IO::Memory.new("a b c d e f")
pull.consume_line
pull.consume_token.expect('a'..'c').char? # => 'a'
pull.consume_token.expect('a'..'c').char? # => 'b'
pull.consume_token.expect('a'..'c').char? # => 'c'
pull.consume_token.expect 'a'..'c'        # raises ParseException ('d' not in 'a'..'c')
pull.consume_line
pull.expect 'a'..'c' # raises ParseException (empty token)

[View source]
def expect(expected : Enumerable(Char), message : String = "Expected %{expected}, got %{actual}") : self #

Checks if the current token is one character long and equals any of the expected characters, else raises ParseException.

If message is given, it is used as the parse error. Use "%{expected}" and "%{actual}" as placeholders for the expected and actual values.

pull = PullParser.new IO::Memory.new("1 a 2 b 3 c\n789\n")
pull.consume_line
pull.consume_token.expect({'1', '2', '3'}).char? # => '1'
pull.consume_token.expect({'1', '2', '3'})       # raises ParseException ('a' is not '1', '2', or '3')
pull.consume_line
pull.expect({'1', '2', '3'}) # raises ParseException (empty token)

[View source]
def expect(expected : String, message : String = "Expected %{expected}, got %{actual}") : self #

Checks if the current token equals expected, else raises ParseException.

If message is given, it is used as the parse error. Use "%{expected}" and "%{actual}" as placeholders for the expected and actual values.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.consume_line
pull.consume_token
pull.expect("123").str? # => "123"
pull.expect "abc"       # raises ParseException (123 != abc)
pull.consume_line
pull.expect "456" # raises ParseException (empty token)

[View source]
def expect(pattern : Regex, message : String = "Expected %{actual} to match %{expected}") : self #

Checks if the current token matches pattern, else raises ParseException.

If message is given, it is used as the parse error. Use "%{expected}" and "%{actual}" as placeholders for the expected and actual values.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.consume_line
pull.consume_token
pull.expect(/[0-9]+/).str? # => "123"
pull.expect /[a-z]+/       # raises ParseException (123 does not match [a-z]+)
pull.consume_line
pull.expect /[0-9]+/ # raises ParseException (empty token)

NOTE The entire token is returned even if the match is partial.

pull = PullParser.new IO::Memory.new("123abc\n789\n")
pull.consume_line
pull.consume_token
pull.expect(/[a-z]+/).str? # => "123abc"
pull.expect(/[0-9]+/).str? # => "123abc"

Use anchors to ensure full match.

pull = PullParser.new IO::Memory.new("123abc\n789\n")
pull.consume_line
pull.consume_token.expect /^[a-z]+$/ # raises ParseException

[View source]
def expect(expected : Enumerable(String), message : String = "Expected %{expected}, got %{actual}") : self #

Checks if the current token equals any of expected, else raises ParseException.

If message is given, it is used as the parse error. Use "%{expected}" and "%{actual}" as placeholders for the expected and actual values.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.consume_line
pull.consume_token
pull.expect(["123", "456"]).str? # => "123"
pull.consume_token
pull.expect(["123", "456"]).str? # => "456"
pull.expect(["abc", "def"])      # raises ParseException ("456" != "abc" or "def")
pull.consume_token
pull.expect(["123", "456"]) # raises ParseException (empty token)

[View source]
def expect_next(expected : String, message : String = "Expected %{expected}, got %{actual}") : self #

Same as #expect but advances to the next token first.


[View source]
def expect_next(pattern : Regex, message : String = "Expected %{actual} to match %{expected}") : self #

Same as #expect but advances to the next token first.


[View source]
def expect_next(expected : Enumerable(String), message : String = "Expected %{expected}, got %{actual}") : self #

Same as #expect but advances to the next token first.
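
As a rough illustration, #expect_next can validate a keyword and the value that follows it in one pass. The sketch assumes a shard named `chem`; the "natoms 3" input and the variable names are made up.

```crystal
require "chem" # assumption: the shard providing Chem::PullParser is named "chem"

pull = Chem::PullParser.new IO::Memory.new("natoms 3\n")
pull.consume_line

# Advances to "natoms" and checks it; the parser itself is returned.
keyword = pull.expect_next("natoms").str? # => "natoms"
# Advances to "3", checks that it is all digits, then interprets it.
count = pull.expect_next(/^\d+$/).int # => 3
```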


[View source]
def float(message : String = "Invalid real number") : Float64 #

Parses and returns the floating-point number represented by the current token. Raises ParseException with the given message if the token is not set or it is not a valid float representation.


[View source]
def float(if_blank default : Float64) : Float64 #

Parses and returns the floating-point number represented by the current token. Returns the given default value if the token is blank. Raises ParseException if the token is not set or it is not a valid float representation.


[View source]
def float? : Float64 | Nil #

Parses and returns the floating-point number represented by the current token, or nil if the token is not set or it is not a valid float representation.


[View source]
def int(message : String = "Invalid integer") : Int32 #

Parses and returns the integer represented by the current token. Raises ParseException with the given message if the token is not set or it is not a valid number.


[View source]
def int(if_blank default : Int32) : Int32 #

Parses and returns the integer represented by the current token. Returns the given default value if the token is blank. Raises ParseException if the token is not set or it is not a valid number.
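
The if_blank overload is handy for fixed-column records where a field may be left empty. The following sketch assumes a shard named `chem` and a hypothetical record whose columns 4 to 9 contain only spaces.

```crystal
require "chem" # assumption: the shard providing Chem::PullParser is named "chem"

# Hypothetical fixed-column record: "ATOM", six spaces, then "42".
pull = Chem::PullParser.new IO::Memory.new("ATOM      42\n")
pull.consume_line

missing = pull.at(4, 4).int(if_blank: -1)  # => -1 (columns 4-7 are blank)
present = pull.at(10, 2).int(if_blank: -1) # => 42
```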


[View source]
def int? : Int32 | Nil #

Parses and returns the integer represented by the current token, or nil if the token is not set or it is not a valid number.


[View source]
def internal_parse(& : Bytes -> T | Nil) : T | Nil forall T #

Yields the current token if set and returns the parsed value.


[View source]
def io : IO #

Returns the enclosed IO.


[View source]
def line : String | Nil #

Returns the current line if set, else nil.

pull = PullParser.new IO::Memory.new("123 456\n")
pull.consume_line
pull.line # => "123 456"
pull.consume_token
pull.line # => "123 456"
pull.consume_token
pull.line # => "123 456"
pull.consume_line
pull.line # => nil

[View source]
def line!(message : String = "Expected a line") : String #

Returns the current line. Raises ParseException with the given message at the start or end of file.

pull = PullParser.new IO::Memory.new("123 456\n")
pull.consume_line
pull.line! # => "123 456"
pull.consume_token
pull.line! # => "123 456"
pull.consume_token
pull.line! # => "123 456"
pull.consume_line
pull.line! # raises ParseException

[View source]
def next_bool(message : String) : Bool #

Reads the next token in the current line, and interprets it via #bool, which raises ParseException with the given message at the end of line or if the token is an invalid representation.


[View source]
def next_bool : Bool #

Reads the next token in the current line, and interprets it via #bool, which raises ParseException at the end of line or if the token is an invalid representation.


[View source]
def next_bool? : Bool | Nil #

Reads the next token in the current line, and interprets it via #bool?, which returns nil at the end of line or if the token is an invalid representation.


[View source]
def next_f(message : String) : Float64 #

Reads the next token in the current line, and interprets it via #float, which raises ParseException with the given message at the end of line or if the token is an invalid representation.


[View source]
def next_f : Float64 #

Reads the next token in the current line, and interprets it via #float, which raises ParseException at the end of line or if the token is an invalid representation.


[View source]
def next_f? : Float64 | Nil #

Reads the next token in the current line, and interprets it via #float?, which returns nil at the end of line or if the token is an invalid representation.


[View source]
def next_i(message : String) : Int32 #

Reads the next token in the current line, and interprets it via #int, which raises ParseException with the given message at the end of line or if the token is an invalid representation.


[View source]
def next_i : Int32 #

Reads the next token in the current line, and interprets it via #int, which raises ParseException at the end of line or if the token is an invalid representation.


[View source]
def next_i? : Int32 | Nil #

Reads the next token in the current line, and interprets it via #int?, which returns nil at the end of line or if the token is an invalid representation.


[View source]
def next_s(message : String) : String #

Reads the next token in the current line, and interprets it via #str, which raises ParseException with the given message at the end of line or if the token is an invalid representation.


[View source]
def next_s : String #

Reads the next token in the current line, and interprets it via #str, which raises ParseException at the end of line or if the token is an invalid representation.


[View source]
def next_s? : String | Nil #

Reads the next token in the current line, and interprets it via #str?, which returns nil at the end of line or if the token is an invalid representation.


[View source]
def parse(type : Bool.class) : Bool #

Parses and returns the current token as Bool by calling #bool. Raises ParseException if parsing fails.


[View source]
def parse(type : Float64.class) : Float64 #

Parses and returns the current token as Float64 by calling #float. Raises ParseException if parsing fails.


[View source]
def parse(type : Int32.class) : Int32 #

Parses and returns the current token as Int32 by calling #int. Raises ParseException if parsing fails.


[View source]
def parse(type : String.class) : String #

Parses and returns the current token as String by calling #str. Raises ParseException if parsing fails.


[View source]
def parse(message : String = "Could not parse %{token} at %{loc_with_file}", & : String -> T | Nil) : T forall T #

Yields the current token and returns the parsed value. Raises ParseException with the given message if no token is set or the block returns nil.


[View source]
def parse?(type : Bool.class) : Bool | Nil #

Parses and returns the current token as Bool by calling #bool?, or nil if parsing fails.


[View source]
def parse?(type : Float64.class) : Float64 | Nil #

Parses and returns the current token as Float64 by calling #float?, or nil if parsing fails.


[View source]
def parse?(type : Int32.class) : Int32 | Nil #

Parses and returns the current token as Int32 by calling #int?, or nil if parsing fails.


[View source]
def parse?(type : String.class) : String | Nil #

Parses and returns the current token as String by calling #str?, or nil if parsing fails.


[View source]
def parse?(& : String -> T | Nil) : T | Nil forall T #

Yields the current token if set and returns the parsed value.


[View source]
def parse_if_present(message : String = "Could not parse %{token} at %{loc_with_file}", default : _ = nil, & : String -> _) #

Yields the current token if set and returns the parsed value. Raises ParseException with the given message if the block returns nil. If no token is set, returns default.


[View source]
def parse_next(message : String = "Could not parse %{token} at %{loc_with_file}", & : String -> T | Nil) : T forall T #

Yields the next token if present and returns the parsed value. Raises ParseException with the given message if no token is found or the block returns nil.


[View source]
def parse_next?(& : String -> T | Nil) : T | Nil forall T #

Yields the next token if present and returns the parsed value.
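
A block-based parse can be sketched with hexadecimal numbers, using String#to_i? from the standard library as the conversion. The shard name `chem` and the input are assumptions made for the example.

```crystal
require "chem" # assumption: the shard providing Chem::PullParser is named "chem"

pull = Chem::PullParser.new IO::Memory.new("ff zz\n")
pull.consume_line

# The block receives the token as a String; a nil result signals failure.
valid = pull.parse_next { |str| str.to_i?(16) }    # => 255
invalid = pull.parse_next? { |str| str.to_i?(16) } # => nil ("zz" is not hexadecimal)
```

With #parse_next, a nil block result raises ParseException instead, so the question-mark variant is the one to use when invalid input is expected.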


[View source]
def parse_next_if_present(message : String = "Could not parse %{token} at %{loc_with_file}", default : _ = nil, & : String -> _) #

Yields the next token if present and returns the parsed value. Raises ParseException with the given message if the block returns nil. If no token is found, returns default.


[View source]
def peek : Char | Nil #

Returns the character after the current token without consuming it, or nil at the end of line.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.consume_line
pull.peek               # => '1' (first character at the beginning of line)
pull.peek               # => '1' (`#peek` does not consume characters)
pull.consume_token.str? # => "123"
pull.peek               # => ' ' (after the current token)
pull.consume_token.str? # => "456"
pull.peek               # => nil (end of line)

[View source]
def rest_of_line : String #

Returns the rest of the line (after the current token), or an empty string if cursor is at the end of line/file.

NOTE The cursor will span the entire returned string.

pull = PullParser.new IO::Memory.new("123 456\n789\n")
pull.consume_line
pull.next_s?      # => "123"
pull.rest_of_line # => " 456"
pull.str?         # => " 456"
pull.next_s?      # => nil
pull.rest_of_line # => ""

[View source]
def rewind_line : self #

Sets the cursor at the beginning of the current line if set.


[View source]
def skip_blank_lines : self #

Discards blank lines.


[View source]
def skip_whitespace #

Skips whitespace characters.

The cursor is placed at the first non-whitespace character, but it is not consumed.

pull = PullParser.new IO::Memory.new("  123 456\n789\n")
pull.consume_line
pull.skip_whitespace
pull.str? # => nil (token size is zero)
pull.line # => "  123 456" (the line itself is unchanged)

[View source]
def str(message : String = "Empty token") : String #

Returns the current token as a string. Raises ParseException with the given message if the token is not set.


[View source]
def str? : String | Nil #

Returns the current token as a string, or nil if it is not set.


[View source]
def token : Bytes | Nil #

Returns the bytes of the current token if it is set.


[View source]