module HTML5::StreamingHandler
Overview
StreamingHandler is a callback interface for SAX-style streaming HTML parsing.
Include this module in a class and pass an instance to HTML5.stream to receive
events as the HTML5 parser constructs the document tree. Events are emitted in
document order as the parser processes tokens, so you don't have to wait for
the full document to be parsed.
The parser still builds the full DOM tree internally (required by the HTML5 spec for correct handling of misnested markup), but your handler receives events incrementally as nodes are created.
Example
class MyHandler
  include HTML5::StreamingHandler

  def on_element_open(tag : String, attrs : Array(HTML5::Attribute), namespace : String)
    puts "Open: <#{tag}>"
  end

  def on_element_close(tag : String, namespace : String)
    puts "Close: </#{tag}>"
  end

  def on_text(text : String)
    puts "Text: #{text}" unless text.strip.empty?
  end
end

handler = MyHandler.new
HTML5.stream(io, handler)
Defined in:
html5/streaming.cr
Instance Method Summary
- #on_comment(text : String)
  Called when a comment node is added to the tree.
- #on_doctype(data : String)
  Called when a doctype node is added to the tree.
- #on_document_end(doc : Node)
  Called when parsing is complete.
- #on_element_close(tag : String, namespace : String)
  Called when an element is closed (popped from the stack of open elements).
- #on_element_open(tag : String, attrs : Array(Attribute), namespace : String)
  Called when an element node is added to the tree.
- #on_text(text : String)
  Called when a text node is added to the tree.
Instance Method Detail
#on_document_end(doc : Node)
Called when parsing is complete. The final document Node is provided
for any post-processing that needs the full tree.
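As a rough illustration of using this callback as a "finish" hook, the sketch below buffers text events and assembles the result only once parsing is reported complete. TextCollector and its result getter are hypothetical names. So that the snippet runs without the parser, the class does not include HTML5::StreamingHandler, the doc parameter is left untyped, and the events are fed in by hand; with the real library the handler would include the module and be passed to HTML5.stream.

```crystal
# Hypothetical handler: buffers text chunks and joins them when parsing ends.
# In real use this class would `include HTML5::StreamingHandler`.
class TextCollector
  getter result : String? = nil
  @chunks = [] of String

  def on_text(text : String)
    @chunks << text
  end

  # `doc` is untyped here only so the sketch can pass nil;
  # the real callback receives the final document Node.
  def on_document_end(doc)
    @result = @chunks.join
  end
end

collector = TextCollector.new
# Hand-fed events standing in for what the parser would emit.
collector.on_text("Hello, ")
collector.on_text("world")
collector.on_document_end(nil)
puts collector.result # prints "Hello, world"
```

Joining in on_document_end rather than in on_text avoids assuming anything about how the parser splits text into chunks.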
#on_element_close(tag : String, namespace : String)
Called when an element is closed (popped from the stack of open elements).
Note: void elements like <br> and <img> trigger both an
#on_element_open and an #on_element_close call.
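Because void elements produce a matching pair of calls, a handler can track nesting depth with a plain counter and no special case for tags such as <br>. The following is a minimal sketch of that idea, not part of the library: DepthTracker is a hypothetical name, the class omits the include and leaves attrs untyped so it runs without the parser, and the events for `<p>a<br>b</p>` are fed in by hand in the order the note above implies.

```crystal
# Hypothetical handler: tracks current and maximum element nesting depth.
# In real use this class would `include HTML5::StreamingHandler`.
class DepthTracker
  getter depth = 0
  getter max_depth = 0

  # `attrs` is untyped here only so the sketch can pass a placeholder array;
  # the real callback receives Array(HTML5::Attribute).
  def on_element_open(tag : String, attrs, namespace : String)
    @depth += 1
    @max_depth = @depth if @depth > @max_depth
  end

  def on_element_close(tag : String, namespace : String)
    @depth -= 1
  end
end

tracker = DepthTracker.new
no_attrs = [] of String # placeholder for an empty attribute list

# Hand-fed events for <p>a<br>b</p>; <br> gets both an open and a close.
tracker.on_element_open("p", no_attrs, "")
tracker.on_element_open("br", no_attrs, "")
tracker.on_element_close("br", "")
tracker.on_element_close("p", "")

puts tracker.depth     # prints 0
puts tracker.max_depth # prints 2
```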
#on_element_open(tag : String, attrs : Array(Attribute), namespace : String)
Called when an element node is added to the tree.
tag is the lower-cased tag name, attrs holds the element's attributes,
and namespace is empty for HTML elements or "math"/"svg" for foreign content.
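One way to use the namespace argument is to tell HTML elements apart from foreign content. The sketch below tallies opened elements per namespace, mapping the empty string to an "html" bucket. NamespaceCounter is a hypothetical name; as an assumption made to keep the snippet runnable without the parser, the class omits the include, leaves attrs untyped, and is driven by hand-fed events.

```crystal
# Hypothetical handler: counts opened elements per namespace.
# In real use this class would `include HTML5::StreamingHandler`.
class NamespaceCounter
  getter counts = Hash(String, Int32).new(0)

  # `attrs` is untyped here only so the sketch can pass a placeholder array;
  # the real callback receives Array(HTML5::Attribute).
  def on_element_open(tag : String, attrs, namespace : String)
    # An empty namespace means a regular HTML element.
    key = namespace.empty? ? "html" : namespace
    @counts[key] += 1
  end
end

counter = NamespaceCounter.new
no_attrs = [] of String # placeholder for an empty attribute list

# Hand-fed events standing in for what the parser would emit.
counter.on_element_open("div", no_attrs, "")
counter.on_element_open("svg", no_attrs, "svg")
counter.on_element_open("circle", no_attrs, "svg")
counter.on_element_open("math", no_attrs, "math")

puts counter.counts["html"] # prints 1
puts counter.counts["svg"]  # prints 2
puts counter.counts["math"] # prints 1
```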