module Luce

Overview

Parses text in a Markdown-like format, building an abstract syntax tree (AST) that can then be rendered to HTML.

If you are only interested in rendering Markdown to HTML, please refer to the README, which explains the use of Luce.to_html.

The main entry point to the library is Document, which encapsulates the parsing process, converting Markdown text into a tree of Node (Array(Node)).

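Two main parsing mechanics are used:

- Blocks, representing top-level elements, implemented via BlockSyntax subclasses, such as headers, paragraphs, blockquotes, and code blocks.
- Inlines, representing chunks of text within a block that carry special meaning, implemented via InlineSyntax subclasses, such as links, emphasis, and inline code.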

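Looking closely at Document.new(), a few other concepts merit a mention:

- ExtensionSet, which provides configurations for common Markdown flavors.
- Resolver, which aids in resolving links and images.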

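If you are looking to extend the library to support custom formatting, you may want to:

- Implement your own InlineSyntax subclasses.
- Implement your own BlockSyntax subclasses.
- Instruct the library to use them, either by creating a new ExtensionSet from one of the existing flavors with your syntaxes added, or by passing your syntaxes to Document or Luce.to_html as parameters.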

Defined in:

luce.cr
luce/ast.cr
luce/block_parser.cr
luce/block_syntaxes/block_html_syntax.cr
luce/block_syntaxes/block_syntax.cr
luce/block_syntaxes/block_tag_block_html_syntax.cr
luce/block_syntaxes/blockquote_syntax.cr
luce/block_syntaxes/code_block_syntax.cr
luce/block_syntaxes/dummy_block_syntax.cr
luce/block_syntaxes/empty_block_syntax.cr
luce/block_syntaxes/fenced_blockquote_syntax.cr
luce/block_syntaxes/fenced_code_block_syntax.cr
luce/block_syntaxes/header_syntax.cr
luce/block_syntaxes/header_with_id_syntax.cr
luce/block_syntaxes/horizontal_rule_syntax.cr
luce/block_syntaxes/list_syntax.cr
luce/block_syntaxes/long_block_html_syntax.cr
luce/block_syntaxes/ordered_list_syntax.cr
luce/block_syntaxes/ordered_list_with_checkbox_syntax.cr
luce/block_syntaxes/other_tag_block_html_syntax.cr
luce/block_syntaxes/paragraph_syntax.cr
luce/block_syntaxes/setext_header_syntax.cr
luce/block_syntaxes/setext_header_with_id_syntax.cr
luce/block_syntaxes/table_syntax.cr
luce/block_syntaxes/unordered_list_syntax.cr
luce/block_syntaxes/unordered_list_with_checkbox_syntax.cr
luce/charcode.cr
luce/document.cr
luce/emojis.cr
luce/extension_set.cr
luce/html_renderer.cr
luce/inline_parser.cr
luce/inline_syntaxes/autolink_extension_syntax.cr
luce/inline_syntaxes/autolink_syntax.cr
luce/inline_syntaxes/code_syntax.cr
luce/inline_syntaxes/color_swatch_syntax.cr
luce/inline_syntaxes/delimiter_syntax.cr
luce/inline_syntaxes/email_autolink_syntax.cr
luce/inline_syntaxes/emoji_syntax.cr
luce/inline_syntaxes/emphasis_syntax.cr
luce/inline_syntaxes/escape_syntax.cr
luce/inline_syntaxes/image_syntax.cr
luce/inline_syntaxes/inline_html_syntax.cr
luce/inline_syntaxes/inline_syntax.cr
luce/inline_syntaxes/line_break_syntax.cr
luce/inline_syntaxes/link_syntax.cr
luce/inline_syntaxes/strikethrough_syntax.cr
luce/inline_syntaxes/tag_syntax.cr
luce/inline_syntaxes/text_syntax.cr
luce/legacy_emojis.cr
luce/patterns.cr
luce/util.cr

Constant Summary

INDICATOR_FOR_CHECKED_CHECK_BOX = "\u200B\u200B"

Invisible string used as a placeholder for a checked checkbox.

DEPRECATED This string is no longer used internally. It will be removed in a future version.

INDICATOR_FOR_UNCHECKED_CHECK_BOX = "\u200B"

Invisible string used as a placeholder for an unchecked checkbox.

DEPRECATED This string is no longer used internally. It will be removed in a future version.

VERSION = "0.3.0"

Class Method Detail

def self.blockquote_fence_pattern : Regex

The fence line that opens or closes a fenced blockquote.


def self.blockquote_pattern : Regex

The line starts with >, optionally followed by one space.
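The rule can be sketched with a regex that approximates it. BLOCKQUOTE_LINE and blockquote_content are hypothetical names, not part of Luce, and the regex (up to three leading spaces, then >, then at most one optional space) is an assumption rather than Luce's actual pattern.

```crystal
# Hypothetical approximation of the blockquote rule (not Luce's source):
# up to three leading spaces, ">", then at most one optional space.
BLOCKQUOTE_LINE = /^[ ]{0,3}>[ ]?(.*)$/

# Returns the quoted content with the marker (and one optional space)
# removed, or nil when the line is not a blockquote line.
def blockquote_content(line : String) : String?
  if md = BLOCKQUOTE_LINE.match(line)
    md[1]
  end
end

p blockquote_content("> quoted")      # => "quoted"
p blockquote_content(">  two spaces") # => " two spaces"
p blockquote_content("plain")         # => nil
```

Only one space after > is consumed, so any further indentation stays part of the quoted content.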


def self.code_fence_pattern : Regex

Fenced code block.
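A sketch of what a fence line looks like, assuming CommonMark-style fences. CODE_FENCE and fence_info are hypothetical, and the regex is an approximation, not Luce's source; it does not enforce the CommonMark rule that a backtick fence's info string may not contain backticks.

```crystal
# Hypothetical approximation of a code fence line (not Luce's source):
# up to three spaces, then at least three backticks or three tildes.
# Whatever follows the fence is treated as the info string.
CODE_FENCE = /^[ ]{0,3}(`{3,}|~{3,})(.*)$/

# Returns {fence, info_string} for a fence line, or nil otherwise.
def fence_info(line : String) : Tuple(String, String)?
  if md = CODE_FENCE.match(line)
    {md[1], md[2].strip}
  end
end

p fence_info("~~~ruby") # => {"~~~", "ruby"}
p fence_info("~~")      # => nil
```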


def self.dummy_pattern : Regex

A pattern which should never be used.

It exists only to satisfy the non-nullability of pattern methods.


def self.empty_pattern : Regex

The line is empty or contains only whitespace.


def self.header_pattern : Regex

Leading (and optional trailing) # characters define ATX-style headers.

The line starts with 1-6 unescaped # characters, which must be followed by a space or the end of the line. The line may end with any number of # characters.
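The header rule above can be approximated as follows. ATX_HEADER and atx_header are hypothetical names and the regex is an assumption, not Luce's actual pattern; in particular, edge cases such as a line consisting only of closing # runs are not handled the way CommonMark specifies.

```crystal
# Hypothetical approximation (not Luce's source): up to three spaces,
# 1-6 "#", then either end of line or whitespace plus the heading text,
# with an optional closing run of "#" that must be preceded by whitespace.
# "[#]" is used instead of "#" because "#{" starts interpolation in
# Crystal regex literals.
ATX_HEADER = /^ {0,3}([#]{1,6})(?:[ \t]+(.*?))?(?:[ \t]+[#]+)?[ \t]*$/

# Returns {level, text} for an ATX header line, or nil otherwise.
def atx_header(line : String) : Tuple(Int32, String)?
  if md = ATX_HEADER.match(line)
    {md[1].size, md[2]? || ""}
  end
end

p atx_header("## Title ##") # => {2, "Title"}
p atx_header("#Title")      # => nil
```

A closing # run attached directly to the text (as in "# Foo#") is kept as part of the heading text, because the sketch requires whitespace before the closing sequence.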


def self.hr_pattern : Regex

Three or more hyphens, asterisks, or underscores on a line by themselves.

Note that a line like ---- is valid both as a horizontal rule and as a setext header underline. In case of a tie, the setext header should win.
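A sketch of one way to resolve the tie, under stated assumptions: HR and SETEXT are approximations of the two rules (not Luce's source), and the hypothetical classify helper checks the setext rule first whenever the line directly follows paragraph text.

```crystal
# Hypothetical approximations (not Luce's source).
# HR: three or more of the same "-", "*" or "_", optionally space-separated.
HR = /^ {0,3}([-*_])[ \t]*\1[ \t]*\1(?:\1|[ \t])*$/
# SETEXT: an unbroken run of "=" or "-", optionally followed by whitespace.
SETEXT = /^[ ]{0,3}(=+|-+)[ \t]*$/

# Classifies a line; setext is tried first so it wins a tie with HR,
# but only when there is paragraph text above to turn into a header.
def classify(line : String, after_paragraph : Bool) : Symbol
  if after_paragraph && SETEXT.match(line)
    :setext
  elsif HR.match(line)
    :hr
  else
    :other
  end
end

p classify("----", true)  # => :setext
p classify("----", false) # => :hr
```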


def self.indent_pattern : Regex

A line indented four spaces. Used for code blocks and lists.
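A minimal sketch of stripping one level of code indentation. INDENT and strip_code_indent are hypothetical, and accepting "up to three spaces followed by a tab" as equivalent to four spaces is an assumption about how tabs are treated, not something stated above.

```crystal
# Hypothetical approximation (not Luce's source): four spaces, or up to
# three spaces followed by a tab, count as one level of code indentation.
INDENT = /^(?:    | {0,3}\t)(.*)$/

# Returns the line with one level of indentation removed, or nil when the
# line is not indented enough to belong to an indented code block.
def strip_code_indent(line : String) : String?
  if md = INDENT.match(line)
    md[1]
  end
end

p strip_code_indent("    puts 1") # => "puts 1"
p strip_code_indent("   puts 1")  # => nil
```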


def self.ol_pattern : Regex

A line starting with a number followed by a list marker, such as 123.

The marker may be preceded by up to three spaces and followed by any number of spaces or tabs.
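An ordered-list line can be approximated with named groups. OL and ordered_item are hypothetical, and the details (1-9 digits, "." or ")" as the delimiter, whitespace required before any content) are assumptions modeled on CommonMark, not Luce's actual regex or its numbered groups.

```crystal
# Hypothetical approximation (not Luce's source): up to three spaces,
# 1-9 digits, "." or ")", then end of line or whitespace plus content.
OL = /^(?<indent>[ ]{0,3})(?<number>\d{1,9})(?<delim>[.)])(?:[ \t]+(?<content>.*))?$/

# Returns {start_number, content} for an ordered-list line, or nil.
def ordered_item(line : String) : Tuple(Int32, String)?
  if md = OL.match(line)
    {md["number"].to_i, md["content"]? || ""}
  end
end

p ordered_item("1. first")      # => {1, "first"}
p ordered_item("   42) answer") # => {42, "answer"}
p ordered_item("1.first")       # => nil
```

Capping the number at nine digits keeps it within Int32, so to_i cannot overflow.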


def self.ol_with_checkbox_pattern : Regex

Similar to .ol_pattern, but with a GitHub-style checkbox ([ ], [x], or [X]) following the number.

The checkbox is captured by group [5], and .ol_pattern's groups [4], [5], and [6] are each shifted two places, becoming [6], [7], and [8].


def self.ol_with_possible_checkbox_pattern : Regex

Similar to .ol_with_checkbox_pattern, but the checkbox is optional.

TODO This is temporary tech debt. I think we will collapse .ol_pattern and .ol_with_checkbox_pattern into this one pattern.


def self.render_html(nodes : Array(Node)) : String

Renders nodes to an HTML string.


def self.setext_pattern : Regex

A series of = or - characters on the line following the header text defines a setext-style header.
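A sketch of how an underline maps to a header level: a run of = gives a level 1 header and a run of - gives level 2. SETEXT and setext_level are hypothetical, and the regex is an approximation rather than Luce's source.

```crystal
# Hypothetical approximation of a setext underline (not Luce's source):
# up to three spaces, then a run of "=" or a run of "-", then optional
# trailing spaces or tabs.
SETEXT = /^[ ]{0,3}(=+|-+)[ \t]*$/

# Returns the header level implied by an underline ("=" gives 1, "-" gives 2),
# or nil when the line is not a setext underline.
def setext_level(underline : String) : Int32?
  if md = SETEXT.match(underline)
    md[1].starts_with?('=') ? 1 : 2
  end
end

p setext_level("=====") # => 1
p setext_level("---")   # => 2
p setext_level("-=-")   # => nil
```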


def self.table_pattern : Regex

A line of hyphens separated by at least one pipe.
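A minimal sketch of recognizing such a delimiter row. The hypothetical table_delimiter_row? helper splits on pipes instead of using one regex, and it allows optional alignment colons around the hyphens; both choices are assumptions (modeled on GitHub Flavored Markdown tables), and escaped pipes are not handled.

```crystal
# Hypothetical approximation (not Luce's source): one delimiter cell is
# a run of hyphens, optionally wrapped in alignment colons and whitespace.
DELIMITER_CELL = /^[ \t]*:?-+:?[ \t]*$/

# True when the line contains at least one pipe and every pipe-separated
# cell (ignoring a leading and a trailing pipe) is a delimiter cell.
def table_delimiter_row?(line : String) : Bool
  return false unless line.includes?('|')
  cells = line.strip.lchop('|').rchop('|').split('|')
  !cells.empty? && cells.all? { |cell| DELIMITER_CELL.match(cell) }
end

p table_delimiter_row?("| --- | :---: |") # => true
p table_delimiter_row?("---")             # => false
p table_delimiter_row?("| a | b |")       # => false
```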


def self.to_html(markdown : String, block_syntaxes = Array(BlockSyntax).new, inline_syntaxes = Array(InlineSyntax).new, extension_set : ExtensionSet | Nil = nil, link_resolver : Resolver | Nil = nil, image_link_resolver : Resolver | Nil = nil, inline_only : Bool = false, encode_html : Bool = true, with_default_block_syntaxes : Bool = true, with_default_inline_syntaxes : Bool = true) : String

Converts the given string of Markdown to HTML.
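A minimal usage sketch, assuming the luce shard is declared as a dependency in shard.yml. It passes only the markdown argument and the inline_only parameter from the signature above; the expectation that inline_only output has no paragraph wrapper is an assumption based on its name, so the exact HTML is not shown here.

```crystal
require "luce"

# Full conversion: the text is parsed as block content (here, one paragraph).
html = Luce.to_html("Hello *Markdown*")
puts html

# inline_only: true parses the text as inline content only, so no
# block-level wrapper such as <p> is expected around the result.
fragment = Luce.to_html("Hello *Markdown*", inline_only: true)
puts fragment
```

The remaining parameters (extension_set, the syntax arrays, the resolvers) all have defaults, so they can be added as named arguments only when needed.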


def self.ul_pattern : Regex

A line starting with one of these markers: -, *, +.

May have up to three leading spaces before the marker and any number of spaces or tabs after.

Contains a dummy group at [2], so that groups in .ul_pattern and .ol_pattern match up; in both, [2] is the length of the number that begins the list marker.
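To illustrate why the dummy group helps, here is a sketch with two hypothetical approximations, UL and OL (not Luce's real regexes), whose first three groups line up: [1] the indentation, [2] the digits of the marker (always empty for bullets), and [3] the marker character. Because the numbering matches, the hypothetical marker_info helper can read either kind of match.

```crystal
# Hypothetical approximations (not Luce's source). Group [2] is an empty
# dummy in UL so that [1], [2] and [3] mean the same thing in both.
UL = /^([ ]{0,3})()([*+-])(?:[ \t]+(.*))?$/
OL = /^([ ]{0,3})(\d{1,9})([.)])(?:[ \t]+(.*))?$/

# Returns {indent_width, marker_char, number_length} from either match.
def marker_info(md : Regex::MatchData) : Tuple(Int32, String, Int32)
  {md[1].size, md[3], md[2].size}
end

p marker_info(UL.match("- item").not_nil!)     # => {0, "-", 0}
p marker_info(OL.match("  10. item").not_nil!) # => {2, ".", 2}
```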


def self.ul_with_checkbox_pattern : Regex

Similar to .ul_pattern, but with a GitHub-style checkbox ([ ], [x], or [X]) following the list marker.

The checkbox is captured by group [5], and .ul_pattern's groups [4], [5], and [6] are each shifted two places, becoming [6], [7], and [8].
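A sketch of reading a task-list line. TASK_ITEM and task_item are hypothetical; the regex is an approximation (not Luce's source) and uses named groups rather than the numbered groups described above, so box here plays the role of Luce's group [5].

```crystal
# Hypothetical approximation of a bullet item with a GitHub-style checkbox.
# Named groups are used instead of Luce's numbered groups.
TASK_ITEM = /^[ ]{0,3}[*+-][ \t]+\[(?<box>[ xX])\](?:[ \t]+(?<content>.*))?$/

# Returns {checked, content} for a task-list line, or nil otherwise.
def task_item(line : String) : Tuple(Bool, String)?
  if md = TASK_ITEM.match(line)
    {md["box"] != " ", md["content"]? || ""}
  end
end

p task_item("- [x] done") # => {true, "done"}
p task_item("* [ ] todo") # => {false, "todo"}
p task_item("- plain")    # => nil
```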


def self.ul_with_possible_checkbox_pattern : Regex

Similar to .ul_with_checkbox_pattern, but the checkbox is optional.

TODO This is temporary tech debt. I think we will collapse .ul_pattern and .ul_with_checkbox_pattern into this one pattern.