class HTML5::Tokenizer
- HTML5::Tokenizer
- Reference
- Object
Overview
Tokenizer returns a stream of HTML Tokens
Defined in:
html5/token.cr
Constructors
-
.new(r : IO, context_tag : String)
returns a new HTML5 Tokenizer for the given IO Reader, for tokenizing an existing element's InnerHTML fragment.
-
.new(r : IO)
returns a new HTML5 Tokenizer for the given IO Reader.
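As a minimal sketch of the two constructors listed above: the `require "html5"` path is an assumption (it is not stated on this page), and `IO::Memory` from the standard library stands in for whatever IO the caller actually has.

```crystal
require "html5" # assumed require path for the shard

# Tokenize markup read from any IO.
doc_tokenizer = HTML5::Tokenizer.new(IO::Memory.new("<p>Hello</p>"))

# Tokenize the InnerHTML of an existing element; the second argument
# is that element's tag name (context_tag).
fragment_tokenizer = HTML5::Tokenizer.new(IO::Memory.new("a<b"), "script")
```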
Instance Method Summary
-
#allow_cdata=(val : Bool)
allow_cdata sets whether or not the tokenizer recognizes <![CDATA[foo]]> as the text "foo".
-
#buffered
buffered returns a slice containing data that has been buffered but not yet tokenized.
- #eof? : Bool
- #exception? : Exception?
-
#max_buf=(n : Int32)
sets a limit on the amount of data buffered during tokenization.
-
#next
scans the next token and returns its type.
-
#next_is_not_raw_text
next_is_not_raw_text instructs the tokenizer that the next token should not be considered as 'raw text'.
-
#raw : Bytes
raw returns the unmodified text of the current token.
-
#tag_attr : Tuple(Bytes | Nil, Bytes | Nil, Bool)
tag_attr returns the lower-cased key and unescaped value of the next unparsed attribute for the current tag token, and whether there are more attributes.
-
#tag_name : Tuple(Bytes | Nil, Bool)
tag_name returns the lower-cased name of a tag token (the "img" out of an <img> tag) and whether the tag has attributes.
-
#text : Bytes | Nil
text returns the unescaped text of a text, comment or doctype token.
-
#token : Token
token returns the current Token.
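The methods summarized above are typically combined in a scan loop. The following is a sketch under stated assumptions, not the shard's documented usage: the `require "html5"` path, the `HTML5::TokenType::Error` member, and the exact meaning of `eof?` and `exception?` after an error token are not confirmed by this page.

```crystal
require "html5" # assumed require path for the shard

tokenizer = HTML5::Tokenizer.new(IO::Memory.new(%(<p class="x">Hello</p>)))

loop do
  # next scans the next token and returns its type.
  type = tokenizer.next

  # Assumption: an Error token type signals both end of input and failures.
  if type == HTML5::TokenType::Error
    # Assumption: eof? is true when the input simply ran out.
    break if tokenizer.eof?
    # Assumption: exception? holds the underlying error, if any.
    if ex = tokenizer.exception?
      raise ex
    end
    break
  end

  # raw returns Bytes, so wrap it in a String for printing.
  puts "#{type}: #{String.new(tokenizer.raw)}"
end
```

Because `raw`, `text`, `tag_name`, and `tag_attr` return `Bytes` rather than `String`, callers that want strings need to convert explicitly, as the loop does with `String.new`.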
Constructor Detail
returns a new HTML5 Tokenizer for the given IO Reader, for tokenizing an existing element's InnerHTML fragment. context_tag is that element's tag, such as "div" or "iframe".
For example, how the InnerHTML "a<b" is tokenized depends on whether it is for a
or a