class HTML5::Parser

Overview

A parser implements the HTML5 parsing algorithm: https://html.spec.whatwg.org/multipage/syntax.html#tree-construction

Defined in:

html5/parser.cr

Constructors

Instance Method Summary

Constructor Detail

def self.new(r : IO, **opts)

Instance Method Detail

def acknowledge_self_closing_tag

Section 12.2.5


def add_child(n : Node)

add_child adds a child node n to the top element, and pushes n onto the stack of open elements if it is an element node.
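The behaviour described above can be sketched with a simplified node type. This is only an illustration: DemoNode and add_child_sketch are hypothetical names, and the library's own Node and add_child may do more (for example, foster parenting).

```crystal
# Hypothetical, simplified node type; not the library's Node.
class DemoNode
  getter name : String
  getter? element : Bool
  getter children = [] of DemoNode

  def initialize(@name : String, @element : Bool = true)
  end
end

# Appends n to the node on top of the stack, then pushes n onto the
# stack only if it is an element node.
def add_child_sketch(stack : Array(DemoNode), n : DemoNode) : Nil
  stack.last.children << n
  stack << n if n.element?
end

html = DemoNode.new("html")
body = DemoNode.new("body")
stack = [html]
add_child_sketch(stack, body)
add_child_sketch(stack, DemoNode.new("hi", element: false))
# stack holds html and body; the text node is a child of body only
```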


def add_element

add_element adds a child element based on the current token.


def add_formatting_element

Section 12.2.4.3


def add_text(text : String)

add_text adds text to the preceding node if it is a text node; otherwise it calls add_child with a new text node.
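A minimal sketch of this merge-or-append rule, assuming a simplified node type. SketchNode and add_text_sketch are hypothetical names used only for illustration; the real method operates on the parser's own state.

```crystal
# Hypothetical, simplified node type; not the library's Node.
class SketchNode
  property data : String
  getter? text_node : Bool
  getter children = [] of SketchNode

  def initialize(@data : String, @text_node : Bool = false)
  end
end

# Extends the last child of parent if it is a text node; otherwise
# appends a new text node.
def add_text_sketch(parent : SketchNode, text : String) : Nil
  last_child = parent.children.last?
  if last_child && last_child.text_node?
    last_child.data += text
  else
    parent.children << SketchNode.new(text, text_node: true)
  end
end

body = SketchNode.new("body")
add_text_sketch(body, "Hello, ")
add_text_sketch(body, "world")
# body now has a single text child whose data is "Hello, world"
```

Merging adjacent text this way keeps the tree free of runs of consecutive text nodes.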


def adjusted_current_node

Section 12.2.4.2


def clear_active_formatting_elements

Section 12.2.4.3.


def clear_stack_to_context(s : Scope)

clear_stack_to_context pops elements off the stack of open elements until a scope-defined element is found.
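As a rough illustration, the stack can be modelled as an array of tag names. The names below are hypothetical, and the stop tags shown are the ones the HTML specification gives for "clear the stack back to a table context"; how the library's Scope type encodes them is not shown here.

```crystal
# Assumed stop tags for a table context (table, template, html per the spec).
TABLE_CONTEXT_STOP_TAGS = ["html", "table", "template"]

# Pops tag names until the top of the stack is one of stop_tags.
# The stop element itself is left on the stack.
def clear_stack_to_context_sketch(stack : Array(String), stop_tags : Array(String)) : Nil
  until stack.empty? || stop_tags.includes?(stack.last)
    stack.pop
  end
end

open_elements = ["html", "body", "table", "tbody", "tr"]
clear_stack_to_context_sketch(open_elements, TABLE_CONTEXT_STOP_TAGS)
# open_elements is now ["html", "body", "table"]
```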


def doc : Node

doc is the document root element.


def element_in_scope(s : Scope, *match_tags)

element_in_scope is like pop_until, except that it doesn't modify the stack of open elements.


def foster_parent(n : Node)

foster_parent adds a child node according to the foster parenting rules. Section 12.2.6.1, "foster parenting".


def fragment : Bool

fragment indicates whether the parser is parsing an HTML fragment.


def generate_implied_end_tags(*exceptions)

generate_implied_end_tags pops nodes off the stack of open elements as long as the top node has a tag name of dd, dt, li, optgroup, option, p, rb, rp, rt or rtc. If exceptions are specified, nodes with that name will not be popped off.
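The popping rule can be sketched over an array of tag names. This is an illustration only: the function name is hypothetical, and it takes the stack and an exceptions array as explicit arguments, whereas the real method works on the parser's own stack and takes a splat.

```crystal
# Tag names with implied end tags, as listed in the description above.
IMPLIED_END_TAGS = ["dd", "dt", "li", "optgroup", "option", "p", "rb", "rp", "rt", "rtc"]

# Pops tag names off the top of the stack while they are in
# IMPLIED_END_TAGS and not listed in exceptions.
def generate_implied_end_tags_sketch(stack : Array(String), exceptions : Array(String) = [] of String) : Nil
  until stack.empty?
    top = stack.last
    break unless IMPLIED_END_TAGS.includes?(top)
    break if exceptions.includes?(top)
    stack.pop
  end
end

all_popped = ["html", "body", "ul", "li", "p"]
generate_implied_end_tags_sketch(all_popped)
# all_popped is now ["html", "body", "ul"]

li_kept = ["html", "body", "ul", "li", "p"]
generate_implied_end_tags_sketch(li_kept, ["li"])
# li_kept is now ["html", "body", "ul", "li"]
```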


def has_self_closing_token : Bool

Self-closing tags like <hr/> are treated as start tags, except that has_self_closing_token is set while they are being processed.


def has_self_closing_token=(has_self_closing_token : Bool)

Self-closing tags like <hr/> are treated as start tags, except that has_self_closing_token is set while they are being processed.


def in_body_end_tag_formatting(atom : Atom::Atom, tag_name : String)

TODO: This is a fairly literal line-by-line translation of the adoption agency algorithm. Once the code successfully parses the comprehensive test suite, it should be refactored to be more idiomatic.


def in_body_end_tag_other(atom : Atom::Atom, tag_name : String)

in_body_end_tag_other performs the "any other end tag" algorithm for in_body_im. It also covers the "any other end tag" handling from section 12.2.6.5, "The rules for parsing tokens in foreign content": https://html.spec.whatwg.org/multipage/syntax.html#parsing-main-inforeign


def in_foreign_content

Section 12.2.6


def index_of_element_in_scope(s, *match_tags)

index_of_element_in_scope returns the index in @oe of the highest element whose tag is in match_tags and that is in scope. If no matching element is in scope, it returns -1.


def oe=(arr : Array(Node))

def parse

def parse_current_token

parse_current_token runs the current token through the parsing routines until it is consumed.


def parse_generic_raw_text_elements

parse_generic_raw_text_elements implements the generic raw text element parsing algorithm defined in 12.2.6.2. https://html.spec.whatwg.org/multipage/parsing.html#parsing-elements-that-contain-only-text

TODO: The specification treats both the RAWTEXT and RCDATA states as part of the tokenizer, so the tokenizer needs to be made to handle both states.


def parse_implied_token(t : TokenType, atom : Atom::Atom, data : String)

parse_implied_token parses a token as though it had appeared in the parser's input.


def pop_until(s : Scope, *match_tags : Atom::Atom)

pop_until pops the stack of open elements at the highest element whose tag is in match_tags, provided there is no higher element in the scope's stop tags (as defined in section 12.2.4.2). It returns whether or not there was such an element. If there was not, pop_until leaves the stack unchanged.

For example, the set of stop tags for table scope is: "html", "table". If the stack was: ["html", "body", "font", "table", "b", "i", "u"] then pop_until(tableScope, "font") would return false, but pop_until(tableScope, "i") would return true and the stack would become: ["html", "body", "font", "table", "b"]

If an element's tag is in both the stop tags and match_tags, then the stack will be popped and the function returns true (provided, of course, there was no higher element in the stack that was also in the stop tags). For example, pop_until(tableScope, "table") returns true and leaves: ["html", "body", "font"]
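The examples above can be reproduced with a small sketch that models the stack as an array of tag names and a scope as an array of stop tags. All names here are hypothetical stand-ins; the real method takes a Scope and Atom::Atom values and works on the parser's own stack.

```crystal
# Assumed stop tags for table scope, as given in the example above.
TABLE_SCOPE_STOP_TAGS = ["html", "table"]

# Walks down from the top of the stack. Returns the index of the first
# element whose tag is in match_tags, or -1 if a stop tag is reached
# first or nothing matches. A tag in both sets counts as a match.
def index_in_scope_sketch(stack : Array(String), stop_tags : Array(String), *match_tags : String) : Int32
  (stack.size - 1).downto(0) do |i|
    tag = stack[i]
    return i if match_tags.includes?(tag)
    return -1 if stop_tags.includes?(tag)
  end
  -1
end

# Pops the matched element and everything above it. Leaves the stack
# unchanged and returns false when there is no match in scope.
def pop_until_sketch(stack : Array(String), stop_tags : Array(String), *match_tags : String) : Bool
  i = index_in_scope_sketch(stack, stop_tags, *match_tags)
  return false if i == -1
  stack.pop(stack.size - i)
  true
end

elements = ["html", "body", "font", "table", "b", "i", "u"]
pop_until_sketch(elements, TABLE_SCOPE_STOP_TAGS, "font")  # false, stack unchanged
pop_until_sketch(elements, TABLE_SCOPE_STOP_TAGS, "i")     # true
# elements is now ["html", "body", "font", "table", "b"]
pop_until_sketch(elements, TABLE_SCOPE_STOP_TAGS, "table") # true
# elements is now ["html", "body", "font"]
```

Checking match_tags before the stop tags is what lets a tag that appears in both sets be popped, as in the final "table" case.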


def reconstruct_active_formatting_elements

Section 12.2.4.3.


def reset_insertion_mode

Section 12.2.4.1, "reset the insertion mode".


def set_original_im

set_original_im sets the insertion mode to return to after completing a text or inTableText insertion mode. Section 12.2.4.1, "using the rules for".


def should_foster_parent

should_foster_parent returns whether the next node to be added should be foster parented.


def top : Node
