class HTML5::Parser
Inheritance: HTML5::Parser < Reference < Object
Overview
A parser implements the HTML5 parsing algorithm: https://html.spec.whatwg.org/multipage/syntax.html#tree-construction
Defined in:
html5/parser.cr
Constructors
Instance Method Summary
- #acknowledge_self_closing_tag
  Section 12.2.5
- #add_child(n : Node)
  add_child adds a child node n to the top element, and pushes n onto the stack of open elements if it is an element node.
- #add_element
  add_element adds a child element based on the current token.
- #add_formatting_element
  Section 12.2.4.3
- #add_text(text : String)
  add_text adds text to the preceding node if it is a Text Node, or else it calls add_child with a new Text Node.
- #adjusted_current_node
  Section 12.2.4.2
- #clear_active_formatting_elements
  Section 12.2.4.3
- #clear_stack_to_context(s : Scope)
  clear_stack_to_context pops elements off the stack of open elements until a scope-defined element is found.
- #doc : Node
  doc is the document root element.
- #element_in_scope(s : Scope, *match_tags)
  element_in_scope is like pop_until, except that it doesn't modify the stack of open elements.
- #foster_parent(n : Node)
  foster_parent adds a child node according to the foster parenting rules.
- #fragment : Bool
  fragment is whether the parser is parsing an HTML fragment.
- #generate_implied_end_tags(*exceptions)
  generate_implied_end_tags pops nodes off the stack of open elements as long as the top node has a tag name of dd, dt, li, optgroup, option, p, rb, rp, rt or rtc.
- #has_self_closing_token : Bool
  Self-closing tags like <br/> are treated as start tags, except that has_self_closing_token is set while they are being processed.
- #has_self_closing_token=(has_self_closing_token : Bool)
  Self-closing tags like <br/> are treated as start tags, except that has_self_closing_token is set while they are being processed.
- #in_body_end_tag_formatting(atom : Atom::Atom, tag_name : String)
  Performs the adoption agency algorithm. TODO: this is a fairly literal line-by-line translation of that algorithm.
- #in_body_end_tag_other(atom : Atom::Atom, tag_name : String)
  Performs the "any other end tag" algorithm for in_body_im.
- #in_foreign_content
  Section 12.2.6
- #index_of_element_in_scope(s, *match_tags)
  index_of_element_in_scope returns the index in @oe (the stack of open elements) of the highest element whose tag is in match_tags and that is in scope.
- #oe=(arr : Array(Node))
- #parse
- #parse_current_token
  Runs the current token through the parsing routines until it is consumed.
- #parse_generic_raw_text_elements
  parse_generic_raw_text_elements implements the generic raw text element parsing algorithm defined in section 12.2.6.2.
- #parse_implied_token(t : TokenType, atom : Atom::Atom, data : String)
  Parses a token as though it had appeared in the parser's input.
- #pop_until(s : Scope, *match_tags : Atom::Atom)
  pop_until pops elements off the stack of open elements, up to and including the highest element whose tag is in match_tags, provided there is no higher element in the scope's stop tags (as defined in section 12.2.4.2).
- #reconstruct_active_formatting_elements
  Section 12.2.4.3
- #reset_insertion_mode
  Section 12.2.4.1, "reset the insertion mode".
- #set_original_im
  set_original_im sets the insertion mode to return to after completing a text or "in table text" insertion mode.
- #should_foster_parent
  should_foster_parent returns whether the next node to be added should be foster parented.
- #top : Node
Constructor Detail
Instance Method Detail
add_child adds a child node n to the top element, and pushes n onto the stack of open elements if it is an element node.
add_text adds text to the preceding node if it is a Text Node, or else it calls add_child with a new Text Node
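The interaction between add_child and add_text can be illustrated with a minimal sketch. It assumes a simplified node model. SketchNode and TreeBuilderSketch are hypothetical stand-ins, not the shard's HTML5::Node or HTML5::Parser. Element nodes appended with add_child become the new top of the stack. Consecutive text is merged into a trailing text node instead of producing sibling text nodes.

```crystal
# Hypothetical stand-in for HTML5::Node: a kind, a payload and children.
class SketchNode
  getter kind : Symbol               # :element or :text
  property data : String             # tag name, or the text itself
  getter children = [] of SketchNode # child nodes in document order

  def initialize(@kind : Symbol, @data : String)
  end
end

# Hypothetical tree builder holding only a stack of open elements.
class TreeBuilderSketch
  getter oe : Array(SketchNode) # stack of open elements, bottom first

  def initialize(root : SketchNode)
    @oe = [root]
  end

  def top : SketchNode
    @oe.last
  end

  # Appends n to the top element; element nodes are also pushed onto the stack.
  def add_child(n : SketchNode) : Nil
    top.children << n
    @oe << n if n.kind == :element
  end

  # Merges text into the top element's last child if that is a text node,
  # otherwise adds a new text node via add_child.
  def add_text(text : String) : Nil
    return if text.empty?
    last = top.children.last?
    if last && last.kind == :text
      last.data += text
    else
      add_child(SketchNode.new(:text, text))
    end
  end
end

body = SketchNode.new(:element, "body")
builder = TreeBuilderSketch.new(body)
builder.add_text("Hello, ")
builder.add_text("world") # merged into the existing text node
builder.add_child(SketchNode.new(:element, "p"))
builder.add_text("inside p") # goes into <p>, now the top element
```

Merging text here keeps the tree free of adjacent text siblings, which is what a DOM built from the same markup would contain.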
clear_stack_to_context pops elements off the stack of open elements until a scope-defined element is found.
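The following is a minimal sketch of clearing the stack back to a table-related context. It assumes the stack of open elements is modeled as an Array(String) of tag names. ContextScope and CONTEXT_TAGS are hypothetical names, not the real Scope type. The context tag sets follow the HTML specification's "clear the stack back to a table / table body / table row context" steps. The context element itself stays on the stack.

```crystal
# Hypothetical scope names; the parser's real Scope type may differ.
enum ContextScope
  Table
  TableBody
  TableRow
end

# Tags at which popping stops, per context (from the HTML spec).
CONTEXT_TAGS = {
  ContextScope::Table     => %w(html table template),
  ContextScope::TableBody => %w(html tbody tfoot thead template),
  ContextScope::TableRow  => %w(html tr template),
}

# Pops tag names off `stack` until its top is one of the scope's context tags.
def clear_stack_to_context(stack : Array(String), s : ContextScope) : Nil
  stop = CONTEXT_TAGS[s]
  while (last = stack.last?) && !stop.includes?(last)
    stack.pop
  end
end

stack = %w(html body table tbody tr td span)
clear_stack_to_context(stack, ContextScope::TableBody)
# stack is now ["html", "body", "table", "tbody"]
```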
element_in_scope is like pop_until, except that it doesn't modify the stack of open elements
foster_parent adds a child node according to the foster parenting rules. Section 12.2.6.1, "foster parenting"
generate_implied_end_tags pops nodes off the stack of open elements as long as the top node has a tag name of dd, dt, li, optgroup, option, p, rb, rp, rt or rtc. If exceptions are specified, nodes with that name will not be popped off.
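A minimal sketch of generate_implied_end_tags follows. It assumes the stack is an Array(String) of tag names. It also passes exceptions as an Array(String) rather than the real method's splat of names. Both choices are simplifications for illustration. Popping stops at the first tag that has no implied end tag, or at one listed in the exceptions.

```crystal
# Tags whose end tags may be implied (from the description above).
IMPLIED_END_TAGS = %w(dd dt li optgroup option p rb rp rt rtc)

# Pops implied-end-tag elements off the top of `stack`, stopping at the first
# tag that is not in IMPLIED_END_TAGS or that is named in `exceptions`.
def generate_implied_end_tags(stack : Array(String), exceptions : Array(String) = [] of String) : Nil
  while (last = stack.last?) && IMPLIED_END_TAGS.includes?(last) && !exceptions.includes?(last)
    stack.pop
  end
end

stack = %w(html body ul li p)
generate_implied_end_tags(stack, ["li"])
# "p" is popped; "li" is an exception, so popping stops there
```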
Self-closing tags like <br/> are treated as start tags, except that has_self_closing_token is set while they are being processed.
Performs the adoption agency algorithm. TODO: this is a fairly literal line-by-line translation of that algorithm. Once the code successfully parses the comprehensive test suite, we should refactor this code to be more idiomatic.
Performs the "any other end tag" algorithm for in_body_im. "Any other end tag" handling is from section 12.2.6.5, "The rules for parsing tokens in foreign content": https://html.spec.whatwg.org/multipage/syntax.html#parsing-main-inforeign
index_of_element_in_scope returns the index in @oe (the stack of open elements) of the highest element whose tag is in match_tags and that is in scope. If no matching element is in scope, it returns -1.
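The scope search shared by index_of_element_in_scope and element_in_scope can be sketched under simplifying assumptions. A scope is reduced to a plain list of stop tags, and the stack is an Array(String) of tag names. TABLE_SCOPE_STOP_TAGS is a hypothetical constant, and the real parser's namespace checks are omitted. The search runs from the top of the stack downward. A match is checked before a stop tag, so a tag that is both matches.

```crystal
# Hypothetical: table scope reduced to its stop tags, as in the pop_until example.
TABLE_SCOPE_STOP_TAGS = %w(html table)

# Returns the index of the highest tag in match_tags that is in scope, or -1.
def index_of_element_in_scope(stack : Array(String), stop_tags : Array(String), *match_tags : String) : Int32
  (stack.size - 1).downto(0) do |i|
    tag = stack[i]
    return i if match_tags.includes?(tag)   # found before any stop tag
    return -1 if stop_tags.includes?(tag)   # scope boundary reached first
  end
  -1
end

# Like index_of_element_in_scope, but only reports whether a match exists.
def element_in_scope(stack : Array(String), stop_tags : Array(String), *match_tags : String) : Bool
  index_of_element_in_scope(stack, stop_tags, *match_tags) != -1
end

stack = %w(html body font table b i u)
index_of_element_in_scope(stack, TABLE_SCOPE_STOP_TAGS, "i")    # 5
index_of_element_in_scope(stack, TABLE_SCOPE_STOP_TAGS, "font") # -1, "table" is higher
```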
parse_generic_raw_text_elements implements the generic raw text element parsing algorithm defined in 12.2.6.2. https://html.spec.whatwg.org/multipage/parsing.html#parsing-elements-that-contain-only-text
TODO: since the specification officially treats the RAWTEXT and RCDATA states as part of the tokenizer, the tokenizer needs to be made to handle both states.
Parses a token as though it had appeared in the parser's input.
pop_until pops elements off the stack of open elements, up to and including the highest element whose tag is in match_tags, provided there is no higher element in the scope's stop tags (as defined in section 12.2.4.2). It returns whether or not there was such an element. If there was not, pop_until leaves the stack unchanged.
For example, the set of stop tags for table scope is: "html", "table". If the stack were ["html", "body", "font", "table", "b", "i", "u"], then pop_until(tableScope, "font") would return false, because the stop tag "table" sits higher on the stack than "font". However, pop_until(tableScope, "i") would return true and the stack would become ["html", "body", "font", "table", "b"].
If an element's tag is in both the stop tags and match_tags, then the stack will be popped and the function returns true (provided, of course, there was no higher element in the stack that was also in the stop tags). For example, pop_until(tableScope, "table") returns true and leaves: ["html", "body", "font"]
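The pop_until example above can be reproduced with a minimal sketch. It assumes the stack is an Array(String) of tag names and that a scope is just its list of stop tags. TABLE_SCOPE_STOP_TAGS is a hypothetical stand-in for tableScope. Because matching is checked before stop tags, a tag that is in both lists is popped and the method returns true.

```crystal
# Hypothetical: table scope reduced to its stop tags, as in the example above.
TABLE_SCOPE_STOP_TAGS = %w(html table)

# Pops up to and including the highest tag in match_tags, unless a stop tag
# is found first. Returns true if something was popped; otherwise leaves
# the stack unchanged and returns false.
def pop_until(stack : Array(String), stop_tags : Array(String), *match_tags : String) : Bool
  (stack.size - 1).downto(0) do |i|
    tag = stack[i]
    if match_tags.includes?(tag)
      stack.pop(stack.size - i) # drop the match and everything above it
      return true
    end
    return false if stop_tags.includes?(tag)
  end
  false
end

stack = %w(html body font table b i u)
pop_until(stack, TABLE_SCOPE_STOP_TAGS, "font")  # false, stack unchanged
pop_until(stack, TABLE_SCOPE_STOP_TAGS, "i")     # true
pop_until(stack, TABLE_SCOPE_STOP_TAGS, "table") # true
```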
set_original_im sets the insertion mode to return to after completing a text or "in table text" insertion mode. Section 12.2.4.1, "using the rules for".
should_foster_parent returns whether the next node to be added should be foster parented.
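A minimal sketch of the should_foster_parent decision is shown below. It assumes the parser state is reduced to a foster-parenting flag and the tag name of the top element. These are hypothetical parameters, since the real method reads them from the parser. Per the HTML specification's foster parenting rules, misnested content is foster parented only when that flag is set and the insertion target is a table, tbody, tfoot, thead or tr element.

```crystal
# Targets for which foster parenting applies (from the HTML spec).
FOSTER_PARENT_TAGS = %w(table tbody tfoot thead tr)

# Hypothetical signature: the real method reads these values from parser state.
def should_foster_parent(foster_parenting : Bool, top_tag : String) : Bool
  foster_parenting && FOSTER_PARENT_TAGS.includes?(top_tag)
end

should_foster_parent(true, "tr")  # true: text directly inside <tr> is misnested
should_foster_parent(true, "td")  # false: <td> may contain text
should_foster_parent(false, "tr") # false: foster parenting is not enabled
```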