module Luce
Overview
Parses text in a Markdown-like format, building an AST that can then be rendered to HTML.
If you are only interested in rendering Markdown to HTML, please refer
to the README, which explains the use of Luce.to_html.
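As a minimal sketch of that entry point, the snippet below calls Luce.to_html with only the required markdown argument, then again with the inline_only parameter taken from the signature listed in the Class Method Summary. It assumes the shard is installed and loadable as "luce"; the description of what inline_only does is inferred from its name, not confirmed by this page.

```crystal
require "luce"

markdown = "Hello *Markdown*"

# Only the markdown argument is required; every other parameter has a default.
html = Luce.to_html(markdown)
puts html

# inline_only appears in the documented signature. Judging by its name it
# restricts parsing to inline syntaxes (an assumption, not verified here).
inline_html = Luce.to_html(markdown, inline_only: true)
puts inline_html
```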
The main entry point to the library is the Document, which encapsulates the
parsing process, converting Markdown text into a tree of Node (Array(Node)).
Two main parsing mechanics are used:
- Blocks, representing top-level elements like headers, paragraphs, blockquotes,
  code blocks, ... implemented via BlockSyntax subclasses.
- Inlines, representing chunks of text within a block with special meaning, like
  links, emphasis, inlined code, ... implemented via InlineSyntax subclasses.
Looking closely at Document.new, a few other concepts merit a mention:
- ExtensionSet, which provides configurations for common Markdown flavors
- Resolver, which aids in resolving links and images.
If you want to extend the library to support custom formatting, you will probably want to:
- Implement your own InlineSyntax subclasses
- Implement your own BlockSyntax subclasses
- Instruct the library to use those by:
  - Creating a new ExtensionSet from one of the existing flavors, adding your syntaxes
  - Passing your syntaxes to Document or Luce.to_html as parameters.
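The last step can be sketched with the extension_set and inline_syntaxes parameters from the Luce.to_html signature. The constant Luce::ExtensionSet::GITHUB_WEB is an assumption about how the built-in flavors are exposed, so check the ExtensionSet documentation for the actual names. The empty array stands in for instances of your own InlineSyntax subclasses, which are not shown here.

```crystal
require "luce"

# Placeholder: instances of your own InlineSyntax subclasses would go here.
custom_inline_syntaxes = [] of Luce::InlineSyntax

# Assumption: GITHUB_WEB is one of the flavor constants on ExtensionSet.
html = Luce.to_html(
  "~~old~~ new",
  inline_syntaxes: custom_inline_syntaxes,
  extension_set: Luce::ExtensionSet::GITHUB_WEB
)
puts html
```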
Defined in:
luce.cr
luce/ast.cr
luce/block_parser.cr
luce/block_syntaxes/block_html_syntax.cr
luce/block_syntaxes/block_syntax.cr
luce/block_syntaxes/block_tag_block_html_syntax.cr
luce/block_syntaxes/blockquote_syntax.cr
luce/block_syntaxes/code_block_syntax.cr
luce/block_syntaxes/dummy_block_syntax.cr
luce/block_syntaxes/empty_block_syntax.cr
luce/block_syntaxes/fenced_blockquote_syntax.cr
luce/block_syntaxes/fenced_code_block_syntax.cr
luce/block_syntaxes/header_syntax.cr
luce/block_syntaxes/header_with_id_syntax.cr
luce/block_syntaxes/horizontal_rule_syntax.cr
luce/block_syntaxes/list_syntax.cr
luce/block_syntaxes/long_block_html_syntax.cr
luce/block_syntaxes/ordered_list_syntax.cr
luce/block_syntaxes/ordered_list_with_checkbox_syntax.cr
luce/block_syntaxes/other_tag_block_html_syntax.cr
luce/block_syntaxes/paragraph_syntax.cr
luce/block_syntaxes/setext_header_syntax.cr
luce/block_syntaxes/setext_header_with_id_syntax.cr
luce/block_syntaxes/table_syntax.cr
luce/block_syntaxes/unordered_list_syntax.cr
luce/block_syntaxes/unordered_list_with_checkbox_syntax.cr
luce/charcode.cr
luce/document.cr
luce/emojis.cr
luce/extension_set.cr
luce/html_renderer.cr
luce/inline_parser.cr
luce/inline_syntaxes/autolink_extension_syntax.cr
luce/inline_syntaxes/autolink_syntax.cr
luce/inline_syntaxes/code_syntax.cr
luce/inline_syntaxes/color_swatch_syntax.cr
luce/inline_syntaxes/delimiter_syntax.cr
luce/inline_syntaxes/email_autolink_syntax.cr
luce/inline_syntaxes/emoji_syntax.cr
luce/inline_syntaxes/emphasis_syntax.cr
luce/inline_syntaxes/escape_syntax.cr
luce/inline_syntaxes/image_syntax.cr
luce/inline_syntaxes/inline_html_syntax.cr
luce/inline_syntaxes/inline_syntax.cr
luce/inline_syntaxes/line_break_syntax.cr
luce/inline_syntaxes/link_syntax.cr
luce/inline_syntaxes/strikethrough_syntax.cr
luce/inline_syntaxes/tag_syntax.cr
luce/inline_syntaxes/text_syntax.cr
luce/legacy_emojis.cr
luce/patterns.cr
luce/util.cr
Constant Summary
- INDICATOR_FOR_CHECKED_CHECK_BOX = "\u200B\u200B"
  Invisible string used as a placeholder for a checked checkbox.
  DEPRECATED This string is no longer used internally. It will be removed in a future version.
- INDICATOR_FOR_UNCHECKED_CHECK_BOX = "\u200B"
  Invisible string used as a placeholder for an unchecked checkbox.
  DEPRECATED This string is no longer used internally. It will be removed in a future version.
- VERSION = "0.3.0"
Class Method Summary
- .blockquote_fence_pattern : Regex
  Fenced blockquotes.
- .blockquote_pattern : Regex
  The line starts with > with one optional space after.
- .code_fence_pattern : Regex
  Fenced code block.
- .dummy_pattern : Regex
  A pattern which should never be used.
- .empty_pattern : Regex
  The line contains only whitespace or is empty.
- .header_pattern : Regex
  Leading (and trailing) # define atx-style headers.
- .hr_pattern : Regex
  Three or more hyphens, asterisks or underscores by themselves.
- .indent_pattern : Regex
  A line indented four spaces.
- .ol_pattern : Regex
  A line starting with a number like 123.
- .ol_with_checkbox_pattern : Regex
  Similar to .ol_pattern but with a GitHub style checkbox '[ ]'|'[x]'|'[X]' following the number.
- .ol_with_possible_checkbox_pattern : Regex
  Similar to .ol_with_checkbox_pattern but the checkbox is optional.
- .render_html(nodes : Array(Node)) : String
  Render nodes to HTML.
- .setext_pattern : Regex
  A series of = or - (on the next line) define setext-style headers.
- .table_pattern : Regex
  A line of hyphens separated by at least one pipe.
- .to_html(markdown : String, block_syntaxes = Array(BlockSyntax).new, inline_syntaxes = Array(InlineSyntax).new, extension_set : ExtensionSet | Nil = nil, link_resolver : Resolver | Nil = nil, image_link_resolver : Resolver | Nil = nil, inline_only : Bool = false, encode_html : Bool = true, with_default_block_syntaxes : Bool = true, with_default_inline_syntaxes : Bool = true) : String
  Converts the given string of Markdown to HTML.
- .ul_pattern : Regex
  A line starting with one of these markers: -, *, +.
- .ul_with_checkbox_pattern : Regex
  Similar to .ul_pattern, but with a GitHub style checkbox '[ ]'|'[x]'|'[X]' following the marker.
- .ul_with_possible_checkbox_pattern : Regex
  Similar to .ul_with_checkbox_pattern but the checkbox is optional.
Class Method Detail
.dummy_pattern : Regex
A pattern which should never be used.
It just satisfies non-nullability of pattern methods.
.header_pattern : Regex
Leading (and trailing) # define atx-style headers.
Starts with 1-6 unescaped # characters which must not be followed
by a non-space character. Line may end with any number of # characters.
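To make the atx-style rule concrete, here is a rough sketch of a matcher that follows the description above. ATX_SKETCH and atx_header are hypothetical names, and the regex is a simplified stand-in rather than the one returned by .header_pattern.

```crystal
# Simplified stand-in for an ATX header matcher; not Luce's actual regex.
# [#] is used instead of a bare # so the literal never looks like interpolation.
ATX_SKETCH = /^ {0,3}([#]{1,6})(?:[ \t]+(.*?))?(?:[ \t]+[#]*)?[ \t]*$/

# Returns {level, text} for a header line, or nil when the line is not a header.
def atx_header(line : String) : Tuple(Int32, String)?
  if md = ATX_SKETCH.match(line)
    {md[1].size, md[2]? || ""}
  end
end

p atx_header("## Title ##")   # level 2, trailing # characters dropped
p atx_header("#hashtag")      # nil: # is followed by a non-space character
p atx_header("####### seven") # nil: more than six # characters
```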
.hr_pattern : Regex
Three or more hyphens, asterisks or underscores by themselves.
Note that a line like ---- is valid as both HR and SETEXT. In
case of a tie, SETEXT should win.
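The tie-break can be sketched as a check order: test the setext form first and fall back to the horizontal rule. Both regexes below are simplified stand-ins, not Luce's own patterns, and the previous_is_paragraph flag is a hypothetical way to model "there is text above to underline".

```crystal
# Simplified stand-ins; not the regexes returned by .hr_pattern or .setext_pattern.
HR_SKETCH     = /^ {0,3}([-*_])[ \t]*(?:\1[ \t]*){2,}$/
SETEXT_SKETCH = /^ {0,3}(=+|-+)[ \t]*$/

# Classifies a line, letting the setext reading win whenever both could apply.
# previous_is_paragraph is a hypothetical flag supplied by the caller.
def classify(line : String, previous_is_paragraph : Bool) : Symbol
  if previous_is_paragraph && line.matches?(SETEXT_SKETCH)
    :setext_underline
  elsif line.matches?(HR_SKETCH)
    :horizontal_rule
  else
    :other
  end
end

p classify("----", true)  # => :setext_underline
p classify("----", false) # => :horizontal_rule
p classify("***", true)   # => :horizontal_rule
```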
.ol_pattern : Regex
A line starting with a number like 123.
May have up to three leading spaces before the marker and any number of spaces or tabs after.
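A minimal sketch of that rule, using a hypothetical OL_SKETCH regex and ordered_item helper. The regex only approximates the description above and is not the one Luce uses; the nine-digit cap is an added assumption to keep the number within Int32.

```crystal
# Simplified stand-in for an ordered-list marker; not Luce's actual regex.
OL_SKETCH = /^ {0,3}(\d{1,9})\.[ \t]*(.*)$/

# Returns {number, content} for an ordered list item, or nil for other lines.
def ordered_item(line : String) : Tuple(Int32, String)?
  if md = OL_SKETCH.match(line)
    {md[1].to_i, md[2]}
  end
end

p ordered_item("  123. buy milk")  # => {123, "buy milk"}
p ordered_item("    1. too deep")  # => nil (four leading spaces)
p ordered_item("no marker")        # => nil
```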
.ol_with_checkbox_pattern : Regex
Similar to .ol_pattern but with a GitHub style checkbox
'[ ]'|'[x]'|'[X]' following the number.
The checkbox will be grabbed by group [5] and .ol_pattern's groups
[4], [5], and [6] are all shifted 2 places to be [6], [7], and [8].
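The group shift is a general property of regexes: adding a capture group pushes every later group to a higher index. The sketch below shows this with two hypothetical, much smaller patterns. Their group numbers do not match Luce's; only the shifting effect is the point.

```crystal
# Simplified stand-ins with far fewer groups than Luce's real patterns.
PLAIN_SKETCH    = /^ {0,3}(\d{1,9})\.[ \t]+(.*)$/
CHECKBOX_SKETCH = /^ {0,3}(\d{1,9})\.[ \t]+(\[[ xX]\])[ \t]+(.*)$/

plain = PLAIN_SKETCH.match("1. write docs").not_nil!
boxed = CHECKBOX_SKETCH.match("1. [x] write docs").not_nil!

puts plain[2] # content is group 2 when there is no checkbox group
puts boxed[2] # the checkbox now occupies group 2
puts boxed[3] # the content has shifted to group 3
```

Code that reads groups by index therefore has to know which of the two patterns produced the match.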
.ol_with_possible_checkbox_pattern : Regex
Similar to .ol_with_checkbox_pattern but the checkbox is optional.
TODO This is temporary tech debt. I think we will collapse
.ol_pattern and .ol_with_checkbox_pattern into this one pattern.
.setext_pattern : Regex
A series of = or - (on the next line) define setext-style headers.
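.to_html(markdown : String, block_syntaxes = Array(BlockSyntax).new, inline_syntaxes = Array(InlineSyntax).new, extension_set : ExtensionSet | Nil = nil, link_resolver : Resolver | Nil = nil, image_link_resolver : Resolver | Nil = nil, inline_only : Bool = false, encode_html : Bool = true, with_default_block_syntaxes : Bool = true, with_default_inline_syntaxes : Bool = true) : String
Converts the given string of Markdown to HTML.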
.ul_pattern : Regex
A line starting with one of these markers: -, *, +.
May have up to three leading spaces before the marker and any number of spaces or tabs after.
Contains a dummy group at [2], so that groups in .ul_pattern and
.ol_pattern match up; in both, [2] is the length of the number
that begins the list marker.
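The dummy-group idea can be sketched as follows: an empty capture group in the unordered pattern keeps later groups at the same indexes as in the ordered pattern, so one piece of code can read either match. OL_ALIGNED, UL_ALIGNED and list_item are hypothetical names, and the regexes are simplified stand-ins for Luce's.

```crystal
# Simplified stand-ins; group 2 holds the digits for ordered items and is an
# empty dummy group for unordered ones, so groups 3 and 4 line up in both.
OL_ALIGNED = /^( {0,3})(\d{1,9})(\.)[ \t]*(.*)$/
UL_ALIGNED = /^( {0,3})()([-*+])[ \t]*(.*)$/

# Reads the same group indexes regardless of which pattern matched.
def list_item(line : String) : NamedTuple(digits: String, marker: String, content: String)?
  md = OL_ALIGNED.match(line) || UL_ALIGNED.match(line)
  return nil unless md
  {digits: md[2], marker: md[3], content: md[4]}
end

p list_item("12. twelve")
p list_item("- dash item")
p list_item("plain text")
```

For the unordered item the digits field is an empty string, which gives a length of zero for the number that begins the marker.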
.ul_with_checkbox_pattern : Regex
Similar to .ul_pattern, but with a GitHub style checkbox
'[ ]'|'[x]'|'[X]' following the marker.
The checkbox will be grabbed by group [5] and .ul_pattern's groups
[4], [5], and [6] are all shifted 2 places to be [6], [7], and [8].
.ul_with_possible_checkbox_pattern : Regex
Similar to .ul_with_checkbox_pattern but the checkbox is optional.
TODO This is temporary tech debt. I think we will collapse
.ul_pattern and .ul_with_checkbox_pattern into this one pattern.