class Sanitize::Policy::HTMLSanitizer

Overview

This policy serves as a good default configuration that should fit most typical use cases for HTML sanitization.

Configurations

It comes in three different configurations with different sets of supported HTML tags.

They only differ in the default configuration of allowed tags and attributes. The transformation behaviour is otherwise the same.

Common Configuration

.common: Accepts most standard tags and thus allows using a good amount of HTML features (see COMMON_SAFELIST).

This is the recommended default configuration and should work for typical use cases unless strong restrictions on allowed content are required.

sanitizer = Sanitize::Policy::HTMLSanitizer.common
sanitizer.process(%(<a href="javascript:alert('foo')">foo</a>))        # => %(foo)
sanitizer.process(%(<p><a href="foo">foo</a></p>))                     # => %(<p><a href="foo" rel="nofollow">foo</a></p>)
sanitizer.process(%(<img src="foo.jpg">))                              # => %(<img src="foo.jpg">)
sanitizer.process(%(<table><tr><td>foo</td><td>bar</td></tr></table>)) # => %(<table><tr><td>foo</td><td>bar</td></tr></table>)

NOTE Neither this configuration nor any other accepts <html>, <head>, or <body> tags by default. In order to use #process_document they need to be added explicitly to accepted_attributes.
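A minimal sketch of enabling the document-level tags. It assumes the shard is loaded with `require "sanitize"` and that adding empty entries to accepted_attributes is enough for #process_document to keep these tags; the input string is illustrative only.

```crystal
require "sanitize"

sanitizer = Sanitize::Policy::HTMLSanitizer.common

# Assumption: an empty attribute set accepts the tag itself
# without any tag-specific attributes.
{"html", "head", "body"}.each do |tag|
  sanitizer.accepted_attributes[tag] = Set(String).new
end

puts sanitizer.process_document("<html><head></head><body><p>foo</p></body></html>")
```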

Basic Configuration

.basic: This set accepts some basic tags including paragraphs, headlines, lists, and images (see BASIC_SAFELIST).

sanitizer = Sanitize::Policy::HTMLSanitizer.basic
sanitizer.process(%(<a href="javascript:alert('foo')">foo</a>))        # => %(foo)
sanitizer.process(%(<p><a href="foo">foo</a></p>))                     # => %(<p><a href="foo" rel="nofollow">foo</a></p>)
sanitizer.process(%(<img src="foo.jpg">))                              # => %(<img src="foo.jpg">)
sanitizer.process(%(<table><tr><td>foo</td><td>bar</td></tr></table>)) # => %(foo bar)

Inline Configuration

.inline: Accepts only a limited set of inline tags (see INLINE_SAFELIST).

sanitizer = Sanitize::Policy::HTMLSanitizer.inline
sanitizer.process(%(<a href="javascript:alert('foo')">foo</a>))        # => %(foo)
sanitizer.process(%(<p><a href="foo">foo</a></p>))                     # => %(<a href="foo" rel="nofollow">foo</a>)
sanitizer.process(%(<img src="foo.jpg">))                              # => %()
sanitizer.process(%(<table><tr><td>foo</td><td>bar</td></tr></table>)) # => %(foo bar)

Attribute Transformations

Attribute transformations are identical in all three configurations, but the more advanced transforms only apply if the respective attribute is allowed in accepted_attributes. You can therefore add elements and attributes to the lower-tier sets and get the same attribute validation. For example, .inline doesn't include <img> tags, but when img is added to accepted_attributes, the policy validates <img> tags the same way as in .common.
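As a rough sketch of extending a lower-tier set, the following adds img to an .inline instance. It assumes the shard is loaded with `require "sanitize"`; the chosen attribute set and the sample inputs are illustrative only.

```crystal
require "sanitize"

sanitizer = Sanitize::Policy::HTMLSanitizer.inline

# Allow <img> with a small attribute set on top of the inline safelist.
sanitizer.accepted_attributes["img"] = Set{"src", "alt", "width", "height"}

# Both inputs now go through the image and size validation described
# in the sections on image tags and size attributes.
puts sanitizer.process(%(<img src="foo.jpg" width="100%">))
puts sanitizer.process(%(<img alt="no source">))
```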

URL Sanitization

This transformation applies to attributes that contain a URL (configurable through #url_attributes).

The same URISanitizer is used for any URL attributes.
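A short sketch of adjusting the set of URL attributes, assuming the shard is loaded with `require "sanitize"`. The poster attribute is used purely for illustration; none of the default safelists accepts a tag that carries it.

```crystal
require "sanitize"

sanitizer = Sanitize::Policy::HTMLSanitizer.common

# Inspect which attributes are currently treated as URLs.
puts sanitizer.url_attributes

# Hypothetical addition: also run URL sanitization on poster attributes.
sanitizer.url_attributes << "poster"
```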

Anchor Tags

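For <a> tags with a href attribute, there are two transforms:

rel="nofollow" is added (can be disabled with #add_rel_nofollow).
rel="noopener" is added if the tag also has a target attribute (can be disabled with #add_rel_noopener).

Anchor tags that have none of the href, name, or id attributes are stripped.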

NOTE name and id attributes are not in any of the default sets of accepted attributes, so they can only be used when explicitly enabled.
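The rel handling can be switched off per instance. A minimal sketch, assuming the shard is loaded with `require "sanitize"`; disabling nofollow is shown only as an example and is not a recommendation.

```crystal
require "sanitize"

sanitizer = Sanitize::Policy::HTMLSanitizer.common

# Stop adding rel="nofollow"; rel="noopener" handling is left untouched.
sanitizer.add_rel_nofollow = false

puts sanitizer.process(%(<a href="foo">foo</a>))
```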

Image Tags

<img> tags are stripped if they don't have a src attribute.

Size Attributes

If a tag has width or height attributes, the values are validated to be numerical or percent values. By default, these attributes are only accepted for <img> tags.

Alignment Attribute

The align attribute is validated against allowed values for this attribute: center, left, right, justify, char. If the value is invalid, the attribute is stripped.

Classes

class attributes are filtered to accept only classes described by #valid_classes. String values need to match the class name exactly, regex values need to match the entire class name.

class is accepted as a global attribute in the default configuration, but #valid_classes is empty by default, so no class values are accepted until it is configured.

All classes can be accepted by adding the match-all regular expression /.*/ to #valid_classes.
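A small sketch of restricting classes, assuming the shard is loaded with `require "sanitize"`. The class names and the regular expression are made up for illustration.

```crystal
require "sanitize"

sanitizer = Sanitize::Policy::HTMLSanitizer.common

# "note" must match exactly; the regex must match the entire class name.
sanitizer.valid_classes = Set{"note", /language-\w+/}

puts sanitizer.process(%(<p class="note language-crystal evil">foo</p>))
```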

Defined in:

policy/html_sanitizer.cr
policy/html_sanitizer/safelist.cr

Constant Summary

BASIC_SAFELIST = INLINE_SAFELIST.merge({"blockquote" => Set {"cite"}, "br" => Set(String).new, "h1" => Set(String).new, "h2" => Set(String).new, "h3" => Set(String).new, "h4" => Set(String).new, "h5" => Set(String).new, "h6" => Set(String).new, "hr" => Set(String).new, "img" => Set {"alt", "src", "longdesc", "width", "height", "align"}, "li" => Set(String).new, "ol" => Set {"start"}, "p" => Set {"align"}, "pre" => Set(String).new, "ul" => Set(String).new})

Compatible with basic Markdown features.

COMMON_SAFELIST = BASIC_SAFELIST.merge({"dd" => Set(String).new, "del" => Set {"cite"}, "details" => Set(String).new, "dl" => Set(String).new, "dt" => Set(String).new, "div" => Set(String).new, "ins" => Set {"cite"}, "kbd" => Set(String).new, "q" => Set {"cite"}, "ruby" => Set(String).new, "rp" => Set(String).new, "rt" => Set(String).new, "s" => Set(String).new, "samp" => Set(String).new, "strike" => Set(String).new, "sub" => Set(String).new, "summary" => Set(String).new, "sup" => Set(String).new, "table" => Set(String).new, "time" => Set {"datetime"}, "tbody" => Set(String).new, "td" => Set(String).new, "tfoot" => Set(String).new, "th" => Set(String).new, "thead" => Set(String).new, "tr" => Set(String).new, "tt" => Set(String).new, "var" => Set(String).new})

Accepts most standard tags and thus allows using a good amount of HTML features.

INLINE_SAFELIST = {"a" => Set {"href", "hreflang"}, "abbr" => Set(String).new, "acronym" => Set(String).new, "b" => Set(String).new, "code" => Set(String).new, "em" => Set(String).new, "i" => Set(String).new, "strong" => Set(String).new, "*" => Set {"dir", "lang", "title", "class"}}

Only limited elements for inline text markup.

Constructors

Instance Method Summary

Instance methods inherited from class Sanitize::Policy::Whitelist

accepted_attributes : Hash(String, Set(String))
accepted_attributes=(accepted_attributes : Hash(String, Set(String)))
global_attributes : Set(String)
transform_attributes(name : String, attributes : Hash(String, String)) : String | CONTINUE | STOP
transform_tag(name : String, attributes : Hash(String, String)) : String | CONTINUE | STOP
transform_text(text : String) : String | Nil

Constructor methods inherited from class Sanitize::Policy::Whitelist

new(accepted_attributes : Hash(String, Set(String)))

Instance methods inherited from class Sanitize::Policy

block_tag?(name)
block_whitespace : String
block_whitespace=(block_whitespace : String)
process(html : String | XML::Node) : String
process_document(html : String | XML::Node) : String
transform_tag(name : String, attributes : Hash(String, String)) : String | Processor::CONTINUE | Processor::STOP
transform_text(text : String) : String | Nil

Constructor Detail

def self.basic : HTMLSanitizer

Creates an instance which accepts more basic tags including paragraphs, headlines, lists, and images (see BASIC_SAFELIST).


def self.common : HTMLSanitizer

Creates an instance which accepts even more standard tags and thus allows using a good amount of HTML features (see COMMON_SAFELIST).

Unless you need tight restrictions on allowed content, this is the recommended default.


def self.inline : HTMLSanitizer

Creates an instance which accepts a limited set of inline tags (see INLINE_SAFELIST).



Instance Method Detail

def accept_tag(tag : String, attributes : Set(String) = Set(String).new) #

def add_rel_nofollow : Bool

Adds rel="nofollow" to every <a> tag with a href attribute.


def add_rel_nofollow=(add_rel_nofollow : Bool)

Adds rel="nofollow" to every <a> tag with a href attribute.


def add_rel_noopener : Bool

Adds rel="noopener" to every <a> tag with both a href and a target attribute.


def add_rel_noopener=(add_rel_noopener : Bool)

Adds rel="noopener" to every <a> tag with both a href and a target attribute.


def append_attribute(attributes, attribute, value)

def no_links

Removes the anchor tag (<a>) from the list of accepted tags.

NOTE This doesn't reject attributes with URL values for other tags.
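A brief sketch of a link-free configuration, assuming the shard is loaded with `require "sanitize"`. The sample input is illustrative; as the note says, src on the image is still subject to the usual URL handling only.

```crystal
require "sanitize"

sanitizer = Sanitize::Policy::HTMLSanitizer.basic

# Drop <a> from the accepted tags; other tags keep their URL attributes.
sanitizer.no_links

puts sanitizer.process(%(<p><a href="foo">foo</a> <img src="foo.jpg"></p>))
```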


def transform_attributes(tag : String, attributes : Hash(String, String)) : String | CONTINUE | STOP

def transform_classes(tag, attributes)

def transform_tag_a(attributes)

def transform_tag_img(attributes)

def transform_uri(tag, attributes, attribute, uri : URI) : String | Nil

def transform_url_attribute(tag, attributes, attribute, value)

def transform_url_attributes(tag, attributes)

def uri_sanitizer : Sanitize::URISanitizer

Configures the URISanitizer to use for sanitizing URL attributes.


def uri_sanitizer=(uri_sanitizer : Sanitize::URISanitizer)

Configures the URISanitizer to use for sanitizing URL attributes.


def url_attributes : Set(String)

Configures which attributes are considered to contain URLs. If empty, URL sanitization is disabled.

Default value: Set{"src", "href", "action", "cite", "longdesc"}.


def url_attributes=(url_attributes : Set(String))

Configures which attributes are considered to contain URLs. If empty, URL sanitization is disabled.

Default value: Set{"src", "href", "action", "cite", "longdesc"}.


def valid_class?(tag, klass, valid_classes)

def valid_classes : Set(String | Regex)

Configures which classes are valid for class attributes.

String values need to match the class name exactly, regex values need to match the entire class name.

Default value: empty


def valid_classes=(valid_classes : Set(String | Regex))

Configures which classes are valid for class attributes.

String values need to match the class name exactly, regex values need to match the entire class name.

Default value: empty


def valid_classes=(classes)
