class Regex
Overview
A Regex represents a regular expression, a pattern that describes the contents of strings. A Regex can determine whether or not a string matches its description, and extract the parts of the string that match.
A Regex can be created using the literal syntax, in which it is delimited by forward slashes (/):
/hay/ =~ "haystack" # => 0
/y/.match("haystack") # => Regex::MatchData("y")
See Regex literals in the language reference.
Interpolation works in regular expression literals just as it does in string literals. Be aware that an exception is raised at runtime if the resulting string is not a valid regular expression.
x = "a"
/#{x}/.match("asdf") # => Regex::MatchData("a")
x = "("
/#{x}/ # raises ArgumentError
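When the interpolated fragment comes from untrusted input, the runtime error can be turned into a nil result. The following is a minimal sketch; `compile_fragment` is a hypothetical helper, not part of Regex, and it relies only on the ArgumentError shown above.

```crystal
# Hypothetical helper (not part of Regex): compiles a fragment into a Regex,
# returning nil instead of raising when the fragment is not a valid pattern.
def compile_fragment(fragment : String) : Regex?
  /#{fragment}/
rescue ArgumentError
  nil
end

valid = compile_fragment("a")   # a usable Regex
invalid = compile_fragment("(") # nil, because "(" is not a valid pattern
```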
When we check to see if a particular regular expression describes a string, we can say that we are performing a match or matching one against the other. If we find that a regular expression does describe a string, we say that it matches, and we can refer to a part of the string that was described as a match.
Here "haystack" does not contain the pattern /needle/, so it doesn't match:
/needle/.match("haystack") # => nil
Here "haystack" contains the pattern /hay/, so it matches:
/hay/.match("haystack") # => Regex::MatchData("hay")
Regex methods that perform a match usually return a truthy value if there was a match and nil if there was no match. After performing a match, the special variable $~ will be an instance of Regex::MatchData if it matched, nil otherwise.
When matching a regular expression using #=~ (either String#=~ or Regex#=~), the returned value will be the index of the first match in the string if the expression matched, nil otherwise.
/stack/ =~ "haystack" # => 3
"haystack" =~ /stack/ # => 3
$~ # => Regex::MatchData("stack")
/needle/ =~ "haystack" # => nil
"haystack" =~ /needle/ # => nil
$~ # raises Exception
When matching a regular expression using #match (either String#match or Regex#match), the returned value will be a Regex::MatchData if the expression matched, nil otherwise.
/hay/.match("haystack") # => Regex::MatchData("hay")
"haystack".match(/hay/) # => Regex::MatchData("hay")
$~ # => Regex::MatchData("hay")
/needle/.match("haystack") # => nil
"haystack".match(/needle/) # => nil
$~ # raises Exception
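Because #match returns nil when nothing matches, the result is usually checked before it is used. A small sketch of that pattern follows; `first_word` is a hypothetical helper introduced only for illustration.

```crystal
# Hypothetical helper: returns the first run of word characters, or nil
# when the text contains none.
def first_word(text : String) : String?
  if md = /\w+/.match(text)
    md[0] # index 0 is the full match
  end
end

found = first_word("  hello world") # "hello"
missing = first_word("!!!")         # nil
```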
Regular expressions have their own language for describing strings.
Many programming languages and tools implement their own regular expression language, but Crystal uses PCRE2, a popular C library, with JIT compilation enabled for providing regular expressions. Here we give a brief summary of the most basic features of regular expressions - grouping, repetition, and alternation - but the feature set of PCRE2 extends far beyond these, and we don't attempt to describe it in full here. For more information, refer to the PCRE2 documentation, especially the full pattern syntax or syntax quick reference.
NOTE Prior to Crystal 1.8 the compiler expected regex literals to follow the original PCRE pattern syntax. The following summary applies to both PCRE and PCRE2.
The regular expression language can be used to match much more than just the static substrings in the above examples. Certain characters, called metacharacters, are given special treatment in regular expressions, and can be used to describe more complex patterns. To match metacharacters literally in a regular expression, they must be escaped by being preceded with a backslash (\). Regex.escape will do this automatically for a given String.
A group of characters (often called a capture group or subpattern) can be identified by enclosing it in parentheses (()). The contents of each capture group can be extracted on a successful match:
/a(sd)f/.match("_asdf_") # => Regex::MatchData("asdf" 1:"sd")
/a(sd)f/.match("_asdf_").try &.[1] # => "sd"
/a(?<grp>sd)f/.match("_asdf_") # => Regex::MatchData("asdf" grp:"sd")
/a(?<grp>sd)f/.match("_asdf_").try &.["grp"] # => "sd"
Capture groups are indexed starting from 1. Methods that accept a capture group index will usually also accept 0 to refer to the full match. Capture groups can also be given names, using the (?<name>...) syntax, as in the previous example.
Following a match, the special variables $N (e.g., $1, $2, $3, ...) can be used to access a capture group. Trying to access an invalid capture group will raise an exception. Note that it is possible to have a successful match with a nil capture:
/(spice)(s)?/.match("spice") # => Regex::MatchData("spice" 1:"spice" 2:nil)
$1 # => "spice"
$2 # => raises Exception
This can be mitigated by using the nilable version of the above: $N? (e.g., $1?, $2?, $3?, ...). Changing the above to use $2? instead of $2 would return nil. $2?.nil? would return true.
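The same nilable access is available on the match object itself. The sketch below assumes Regex::MatchData#[]?, the nilable counterpart of #[] from the MatchData API, which is not described in this section.

```crystal
md = /(spice)(s)?/.match("spice")

# #[]? returns nil for a group that did not take part in the match,
# where #[] would raise.
first = md.try &.[1]?  # "spice"
second = md.try &.[2]? # nil
```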
A character or group can be repeated or made optional using an asterisk (* - zero or more), a plus sign (+ - one or more), integer bounds in curly braces ({n,m}) (at least n, no more than m), or a question mark (?) (zero or one).
/fo*/.match("_f_") # => Regex::MatchData("f")
/fo+/.match("_f_") # => nil
/fo*/.match("_foo_") # => Regex::MatchData("foo")
/fo{3,}/.match("_foo_") # => nil
/fo{1,3}/.match("_foo_") # => Regex::MatchData("foo")
/fo*/.match("_foooooooo_") # => Regex::MatchData("foooooooo")
/fo{,3}/.match("_foooo_") # => nil
/f(op)*/.match("fopopo") # => Regex::MatchData("fopop" 1:"op")
/foo?bar/.match("foobar") # => Regex::MatchData("foobar")
/foo?bar/.match("fobar") # => Regex::MatchData("fobar")
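Bounded repetition is often combined with anchors to validate a whole string. This is a small sketch under stated assumptions: the phone-like format is hypothetical, and \A, \z and \d are standard PCRE2 syntax (start of string, end of string, digit) that this summary does not otherwise cover.

```crystal
# Hypothetical format: exactly three digits, a dash, exactly four digits.
phone = /\A\d{3}-\d{4}\z/

ok = phone.matches?("555-1234")   # true
bad = phone.matches?("5555-1234") # false, four leading digits exceed {3}
```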
Alternatives can be separated using a vertical bar (|). Any single character can be represented by dot (.). When matching only one character, specific alternatives can be expressed as a character class, enclosed in square brackets ([]):
/foo|bar/.match("foo") # => Regex::MatchData("foo")
/foo|bar/.match("bar") # => Regex::MatchData("bar")
/_(x|y)_/.match("_x_") # => Regex::MatchData("_x_" 1:"x")
/_(x|y)_/.match("_y_") # => Regex::MatchData("_y_" 1:"y")
/_(x|y)_/.match("_(x|y)_") # => nil
/_._/.match("_x_") # => Regex::MatchData("_x_")
/_[xyz]_/.match("_x_") # => Regex::MatchData("_x_")
/_[a-z]_/.match("_x_") # => Regex::MatchData("_x_")
/_[^a-z]_/.match("_x_") # => nil
/_[^a-wy-z]_/.match("_x_") # => Regex::MatchData("_x_")
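Alternation, character classes and capture groups combine naturally. The sketch below uses a hypothetical file-name pattern; the anchors \A and \z are standard PCRE2 syntax and are an assumption beyond this summary.

```crystal
# Lowercase letters or underscores, a literal dot, then one of the listed
# extensions, captured as group 1.
image = /\A[a-z_]+\.(jpe?g|png)\z/

ext = image.match("photo.jpeg").try &.[1] # "jpeg"
other = image.match("notes.txt")          # nil, "txt" is not an alternative
```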
Regular expressions can be defined with these 3 optional flags:
- i: ignore case (Regex::Options::IGNORE_CASE)
- m: multiline (Regex::Options::MULTILINE)
- x: extended (Regex::Options::EXTENDED)
/asdf/ =~ "ASDF" # => nil
/asdf/i =~ "ASDF" # => 0
/^z/i =~ "ASDF\nZ" # => nil
/^z/im =~ "ASDF\nZ" # => 5
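The examples above cover i and m. As a rough illustration of x, the sketch below relies on the PCRE2 extended mode ignoring unescaped whitespace in the pattern; the time-like pattern is hypothetical.

```crystal
plain = /(\d{1,2}) : (\d{2})/   # spaces must appear literally in the input
spaced = /(\d{1,2}) : (\d{2})/x # spaces in the pattern are ignored

without_flag = plain.match("9:30")   # nil
with_flag = spaced.match("9:30")
hours = with_flag.try &.[1]          # "9"
minutes = with_flag.try &.[2]        # "30"
```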
PCRE2 supports other encodings, but Crystal strings are UTF-8 only, so Crystal regular expressions are also UTF-8 only (by default).
PCRE2 optionally permits named capture groups (named subpatterns) to not be unique. Crystal exposes the name table of a Regex as a Hash of Int32 => String (see #name_table), and requires named capture groups to have unique names within a single Regex.
Included Modules
- Regex::PCRE2
Defined in:
json/any.cr
regex.cr
regex/match_data.cr
yaml/any.cr
Constant Summary
-
SPECIAL_CHARACTERS = {' ', '.', '\\', '+', '*', '?', '[', '^', ']', '$', '(', ')', '{', '}', '=', '!', '<', '>', '|', ':', '-'}
List of metacharacters that need to be escaped.
See Regex.needs_escape? and Regex.escape.
Constructors
-
.literal(pattern : String, *, i : Bool = false, m : Bool = false, x : Bool = false) : self
Creates a new Regex instance from a literal consisting of a pattern and the named parameter modifiers.
- .new(source : String, options : Options = Options::None)
-
.union(patterns : Enumerable(Regex | String)) : self
Union.
-
.union(*patterns : Regex | String) : self
Union.
Class Method Summary
-
.error?(source) : String | Nil
Determines Regex's source validity.
-
.escape(str) : String
Returns a String constructed by escaping any metacharacters in str.
-
.needs_escape?(char : Char) : Bool
Returns true if char needs to be escaped, false otherwise.
-
.needs_escape?(str : String) : Bool
Returns true if str needs to be escaped, false otherwise.
-
.supports_compile_options?(options : CompileOptions) : Bool
Returns true if the regex engine supports all options flags when compiling a pattern.
-
.supports_match_options?(options : MatchOptions) : Bool
Returns true if the regex engine supports all options flags when matching a pattern.
Instance Method Summary
-
#+(other) : Regex
Union.
-
#==(other : Regex)
Equality.
-
#===(other : String)
Case equality.
- #===(other : JSON::Any)
- #===(other : YAML::Any)
-
#=~(other : String) : Int32 | Nil
Match.
-
#=~(other) : Nil
Match.
-
#capture_count : Int32
Returns the number of (named & non-named) capture groups.
- #clone
-
#dup
Returns a shallow copy of this object.
- #hash(hasher)
-
#inspect(io : IO) : Nil
Convert to String in literal format.
-
#match(str : String, pos : Int32 = 0, options : Regex::MatchOptions = :none) : MatchData | Nil
Match at character index.
-
#match(str, pos, _options) : MatchData | Nil
Match at character index.
DEPRECATED Use the overload with Regex::MatchOptions instead.
-
#match(str, pos = 0, *, options) : MatchData | Nil
Match at character index.
DEPRECATED Use the overload with Regex::MatchOptions instead.
-
#match!(str : String, pos : Int32 = 0, *, options : Regex::MatchOptions = :none) : MatchData
Matches a regular expression against str.
-
#match_at_byte_index(str : String, byte_index : Int32 = 0, options : Regex::MatchOptions = :none) : MatchData | Nil
Match at byte index.
-
#match_at_byte_index(str, byte_index, _options) : MatchData | Nil
Match at byte index.
DEPRECATED Use the overload with Regex::MatchOptions instead.
-
#match_at_byte_index(str, byte_index = 0, *, options) : MatchData | Nil
Match at byte index.
DEPRECATED Use the overload with Regex::MatchOptions instead.
-
#matches?(str : String, pos : Int32 = 0, options : Regex::MatchOptions = :none) : Bool
Match at character index.
-
#matches?(str, pos, _options) : Bool
Match at character index.
DEPRECATED Use the overload with Regex::MatchOptions instead.
-
#matches?(str, pos = 0, *, options) : Bool
Match at character index.
DEPRECATED Use the overload with Regex::MatchOptions instead.
-
#matches_at_byte_index?(str : String, byte_index : Int32 = 0, options : Regex::MatchOptions = :none) : Bool
Match at byte index.
-
#matches_at_byte_index?(str, byte_index, _options) : Bool
Match at byte index.
DEPRECATED Use the overload with Regex::MatchOptions instead.
-
#matches_at_byte_index?(str, byte_index = 0, *, options) : Bool
Match at byte index.
DEPRECATED Use the overload with Regex::MatchOptions instead.
-
#name_table : Hash(Int32, String)
Returns a Hash where the values are the names of capture groups and the keys are their indexes.
-
#options : Options
Returns a Regex::CompileOptions representing the optional flags applied to this Regex.
- #source : String
-
#to_s(io : IO) : Nil
Convert to String in subpattern format.
Instance methods inherited from module Regex::PCRE2
- finalize
Class methods inherited from module Regex::PCRE2
- jit_stack
- match_context : Pointer(LibPCRE2::MatchContext)
- supports_compile_flag?(options)
- supports_match_flag?(options)
- version : String
- version_number : Tuple(Int32, Int32)
Instance methods inherited from class Reference
- ==(other : self)
- ==(other : JSON::Any)
- ==(other : YAML::Any)
- ==(other)
- dup
- hash(hasher)
- initialize
- inspect(io : IO) : Nil
- object_id : UInt64
- pretty_print(pp) : Nil
- same?(other : Reference) : Bool
- same?(other : Nil)
- to_s(io : IO) : Nil
Constructor methods inherited from class Reference
- new
Instance methods inherited from class Object
- ! : Bool
- !=(other)
- !~(other)
- ==(other)
- ===(other : JSON::Any)
- ===(other : YAML::Any)
- ===(other)
- =~(other)
- as(type : Class)
- as?(type : Class)
- class
- dup
- hash(hasher)
- hash
- in?(collection : Object) : Bool
- in?(*values : Object) : Bool
- inspect(io : IO) : Nil
- inspect : String
- is_a?(type : Class) : Bool
- itself
- nil? : Bool
- not_nil!(message)
- not_nil!
- pretty_inspect(width = 79, newline = "\n", indent = 0) : String
- pretty_print(pp : PrettyPrint) : Nil
- responds_to?(name : Symbol) : Bool
- tap(&)
- to_json(io : IO) : Nil
- to_json : String
- to_pretty_json(indent : String = " ") : String
- to_pretty_json(io : IO, indent : String = " ") : Nil
- to_s(io : IO) : Nil
- to_s : String
- to_yaml(io : IO) : Nil
- to_yaml : String
- try(&)
- unsafe_as(type : T.class) forall T
Class methods inherited from class Object
- from_json(string_or_io, root : String)
- from_json(string_or_io)
- from_yaml(string_or_io : String | IO)
Constructor Detail
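.literal(pattern : String, *, i : Bool = false, m : Bool = false, x : Bool = false) : self
Creates a new Regex instance from a literal consisting of a pattern and the named parameter modifiers.
.new(source : String, options : Options = Options::None)
Creates a new Regex out of the given source String.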
Regex.new("^a-z+:\\s+\\w+") # => /^a-z+:\s+\w+/
Regex.new("cat", Regex::CompileOptions::IGNORE_CASE) # => /cat/i
options = Regex::CompileOptions::IGNORE_CASE | Regex::CompileOptions::EXTENDED
Regex.new("dog", options) # => /dog/ix
.union(patterns : Enumerable(Regex | String)) : self
Union. Returns a Regex that matches any of patterns.
All capture groups in the patterns after the first one will have their indexes offset.
re = Regex.union([/skiing/i, "sledding"])
re.match("Skiing") # => Regex::MatchData("Skiing")
re.match("sledding") # => Regex::MatchData("sledding")
re = Regex.union({/skiing/i, "sledding"})
re.match("Skiing") # => Regex::MatchData("Skiing")
re.match("sledding") # => Regex::MatchData("sledding")
.union(*patterns : Regex | String) : self
Union. Returns a Regex that matches any of patterns.
All capture groups in the patterns after the first one will have their indexes offset.
re = Regex.union(/skiing/i, "sledding")
re.match("Skiing") # => Regex::MatchData("Skiing")
re.match("sledding") # => Regex::MatchData("sledding")
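The index offset mentioned above can be seen by joining two patterns that each contain one capture group. A minimal sketch, using Regex::MatchData#[]? for nilable access (assumed from the MatchData API, not described in this section):

```crystal
re = Regex.union(/(a)x/, /(b)y/)
md = re.match("by")

# The group from the second pattern is now group 2; group 1 belongs to the
# first pattern, which did not take part in this match.
first = md.try &.[1]?  # nil
second = md.try &.[2]? # "b"
```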
Class Method Detail
.error?(source) : String | Nil
Determines Regex's source validity. If the source is valid, nil is returned. If it is not, a String containing the error message is returned.
Regex.error?("(foo|bar)") # => nil
Regex.error?("(foo|bar") # => "missing ) at 8"
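One possible use is to screen a batch of candidate sources before compiling any of them. The sketch below is illustrative only; the list of sources is hypothetical.

```crystal
sources = ["(foo|bar)", "(foo|bar", "[a-z"]

# Sources for which Regex.error? returns nil are valid; the rest are not.
valid, invalid = sources.partition { |source| Regex.error?(source).nil? }

regexes = valid.map { |source| Regex.new(source) }
```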
.escape(str) : String
Returns a String constructed by escaping any metacharacters in str.
string = Regex.escape("*?{}.") # => "\\*\\?\\{\\}\\."
/#{string}/ # => /\*\?\{\}\./
.needs_escape?(char : Char) : Bool
Returns true if char needs to be escaped, false otherwise.
Regex.needs_escape?('*') # => true
Regex.needs_escape?('@') # => false
.needs_escape?(str : String) : Bool
Returns true if str needs to be escaped, false otherwise.
Regex.needs_escape?("10$") # => true
Regex.needs_escape?("foo") # => false
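.supports_compile_options?(options : CompileOptions) : Bool
Returns true if the regex engine supports all options flags when compiling a pattern.
.supports_match_options?(options : MatchOptions) : Bool
Returns true if the regex engine supports all options flags when matching a pattern.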
Instance Method Detail
#+(other) : Regex
Union. Returns a Regex that matches either of the operands.
All capture groups in the second operand will have their indexes offset.
re = /skiing/i + /sledding/
re.match("Skiing") # => Regex::MatchData("Skiing")
re.match("sledding") # => Regex::MatchData("sledding")
#==(other : Regex)
Equality. Two regexes are equal if their sources and options are the same.
/abc/ == /abc/i # => false
/abc/i == /ABC/i # => false
/abc/i == /abc/i # => true
#===(other : String)
Case equality. This is equivalent to #match or #=~ but only returns true or false. Used in case expressions. The special variable $~ will contain a Regex::MatchData if there was a match, nil otherwise.
a = "HELLO"
b = case a
    when /^[a-z]*$/
      "Lower case"
    when /^[A-Z]*$/
      "Upper case"
    else
      "Mixed case"
    end
b # => "Upper case"
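Since $~ is set by a successful case equality check, capture groups are available inside the matching branch through $1 and friends. A short sketch; `describe` is a hypothetical helper, and \A, \z, \d and \w are standard PCRE2 syntax.

```crystal
# Hypothetical helper: classifies the input and reuses the captured text.
def describe(input : String) : String
  case input
  when /\A(\d+)\z/
    "number #{$1}"
  when /\A(\w+)\z/
    "word #{$1}"
  else
    "other"
  end
end

number = describe("42")  # "number 42"
word = describe("abc")   # "word abc"
other = describe("a b")  # "other"
```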
#=~(other : String) : Int32 | Nil
Match. Matches a regular expression against other and returns the starting position of the match if other is a matching String, otherwise nil. $~ will contain a Regex::MatchData if there was a match, nil otherwise.
/at/ =~ "input data" # => 7
/ax/ =~ "input data" # => nil
#=~(other) : Nil
Match. When the argument is not a String, always returns nil.
/at/ =~ 7 # => nil
/at/ =~ nil # => nil
#capture_count : Int32
Returns the number of (named & non-named) capture groups.
/(?:.+)/.capture_count # => 0
/(?<foo>.+)/.capture_count # => 1
/(.)/.capture_count # => 1
/(.)|(.)/.capture_count # => 2
#dup
Returns a shallow copy of this object.
This allocates a new object and copies the contents of self into it.
#inspect(io : IO) : Nil
Convert to String in literal format. Returns the source as a String in Regex literal format, delimited by forward slashes (/), with any optional flags included.
/ab+c/ix.inspect # => "/ab+c/ix"
#match(str : String, pos : Int32 = 0, options : Regex::MatchOptions = :none) : MatchData | Nil
Match at character index. Matches a regular expression against String str. Starts at the character index given by pos if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.
/(.)(.)(.)/.match("abc").try &.[2] # => "b"
/(.)(.)/.match("abc", 1).try &.[2] # => "c"
/(.)(.)/.match("クリスタル", 3).try &.[2] # => "ル"
#match(str, pos, _options) : MatchData | Nil
Match at character index. Matches a regular expression against String str. Starts at the character index given by pos if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.
/(.)(.)(.)/.match("abc").try &.[2] # => "b"
/(.)(.)/.match("abc", 1).try &.[2] # => "c"
/(.)(.)/.match("クリスタル", 3).try &.[2] # => "ル"
DEPRECATED Use the overload with Regex::MatchOptions instead.
#match(str, pos = 0, *, options) : MatchData | Nil
Match at character index. Matches a regular expression against String str. Starts at the character index given by pos if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.
/(.)(.)(.)/.match("abc").try &.[2] # => "b"
/(.)(.)/.match("abc", 1).try &.[2] # => "c"
/(.)(.)/.match("クリスタル", 3).try &.[2] # => "ル"
DEPRECATED Use the overload with Regex::MatchOptions instead.
#match!(str : String, pos : Int32 = 0, *, options : Regex::MatchOptions = :none) : MatchData
Matches a regular expression against str. This starts at the character index pos if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise raises Regex::Error. $~ will contain the same value if matched.
/(.)(.)(.)/.match!("abc")[2] # => "b"
/(.)(.)/.match!("abc", 1)[2] # => "c"
/(.)(タ)/.match!("クリスタル", 3)[2] # raises Regex::Error
#match_at_byte_index(str : String, byte_index : Int32 = 0, options : Regex::MatchOptions = :none) : MatchData | Nil
Match at byte index. Matches a regular expression against String str. Starts at the byte index given by byte_index if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.
/(.)(.)(.)/.match_at_byte_index("abc").try &.[2] # => "b"
/(.)(.)/.match_at_byte_index("abc", 1).try &.[2] # => "c"
/(.)(.)/.match_at_byte_index("クリスタル", 3).try &.[2] # => "ス"
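Character and byte indexes differ as soon as the string contains multi-byte characters. The sketch below converts one into the other with String#char_index_to_byte_index, which belongs to the String API rather than to Regex and is assumed here.

```crystal
str = "クリスタル"

# Each of these characters takes 3 bytes in UTF-8, so character index 3
# corresponds to byte index 9.
byte_index = str.char_index_to_byte_index(3)

by_char = /(.)(.)/.match(str, 3)
by_byte = if byte_index
            /(.)(.)/.match_at_byte_index(str, byte_index)
          end

char_result = by_char.try &.[0] # "タル"
byte_result = by_byte.try &.[0] # "タル"
```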
#match_at_byte_index(str, byte_index, _options) : MatchData | Nil
Match at byte index. Matches a regular expression against String str. Starts at the byte index given by byte_index if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.
/(.)(.)(.)/.match_at_byte_index("abc").try &.[2] # => "b"
/(.)(.)/.match_at_byte_index("abc", 1).try &.[2] # => "c"
/(.)(.)/.match_at_byte_index("クリスタル", 3).try &.[2] # => "ス"
DEPRECATED Use the overload with Regex::MatchOptions instead.
#match_at_byte_index(str, byte_index = 0, *, options) : MatchData | Nil
Match at byte index. Matches a regular expression against String str. Starts at the byte index given by byte_index if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.
/(.)(.)(.)/.match_at_byte_index("abc").try &.[2] # => "b"
/(.)(.)/.match_at_byte_index("abc", 1).try &.[2] # => "c"
/(.)(.)/.match_at_byte_index("クリスタル", 3).try &.[2] # => "ス"
DEPRECATED Use the overload with Regex::MatchOptions instead.
#matches?(str : String, pos : Int32 = 0, options : Regex::MatchOptions = :none) : Bool
Match at character index. It behaves like #match, but returns a Bool value. It neither returns MatchData nor assigns it to the $~ variable.
/foo/.matches?("bar") # => false
/foo/.matches?("foo") # => true
# `$~` is not set even if last match succeeds.
$~ # raises Exception
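Because it only answers yes or no, #matches? fits well inside predicates. A small sketch with a hypothetical word list; \A is standard PCRE2 syntax for the start of the string.

```crystal
words = ["apple", "banana", "avocado"]

# Keep only the words that start with "a".
starts_with_a = words.select { |word| /\Aa/.matches?(word) }
```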
#matches?(str, pos, _options) : Bool
Match at character index. It behaves like #match, but returns a Bool value. It neither returns MatchData nor assigns it to the $~ variable.
/foo/.matches?("bar") # => false
/foo/.matches?("foo") # => true
# `$~` is not set even if last match succeeds.
$~ # raises Exception
DEPRECATED Use the overload with Regex::MatchOptions instead.
#matches?(str, pos = 0, *, options) : Bool
Match at character index. It behaves like #match, but returns a Bool value. It neither returns MatchData nor assigns it to the $~ variable.
/foo/.matches?("bar") # => false
/foo/.matches?("foo") # => true
# `$~` is not set even if last match succeeds.
$~ # raises Exception
DEPRECATED Use the overload with Regex::MatchOptions instead.
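#matches_at_byte_index?(str : String, byte_index : Int32 = 0, options : Regex::MatchOptions = :none) : Bool
Match at byte index. It behaves like #match_at_byte_index, but returns a Bool value. It neither returns MatchData nor assigns it to the $~ variable.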
#matches_at_byte_index?(str, byte_index, _options) : Bool
Match at byte index. It behaves like #match_at_byte_index, but returns a Bool value. It neither returns MatchData nor assigns it to the $~ variable.
DEPRECATED Use the overload with Regex::MatchOptions instead.
#matches_at_byte_index?(str, byte_index = 0, *, options) : Bool
Match at byte index. It behaves like #match_at_byte_index, but returns a Bool value. It neither returns MatchData nor assigns it to the $~ variable.
DEPRECATED Use the overload with Regex::MatchOptions instead.
#name_table : Hash(Int32, String)
Returns a Hash where the values are the names of capture groups and the keys are their indexes. Non-named capture groups will not have entries in the Hash. Capture groups are indexed starting from 1.
/(.)/.name_table # => {}
/(?<foo>.)/.name_table # => {1 => "foo"}
/(?<foo>.)(?<bar>.)/.name_table # => {2 => "bar", 1 => "foo"}
/(.)(?<foo>.)(.)(?<bar>.)(.)/.name_table # => {4 => "bar", 2 => "foo"}
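The name table can be paired with a match to collect captured text by group name. A rough sketch with a hypothetical date-like pattern; it uses only #name_table and index access on the match.

```crystal
re = /(?<year>\d{4})-(?<month>\d{2})/
fields = {} of String => String

if md = re.match("released 2024-05")
  # name_table maps each group index to its name.
  re.name_table.each do |index, name|
    fields[name] = md[index]
  end
end
```

After this runs, fields["year"] is "2024" and fields["month"] is "05".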
#options : Options
Returns a Regex::CompileOptions representing the optional flags applied to this Regex.
/ab+c/ix.options # => Regex::CompileOptions::IGNORE_CASE | Regex::CompileOptions::EXTENDED
/ab+c/ix.options.to_s # => "IGNORE_CASE | EXTENDED"
#to_s(io : IO) : Nil
Convert to String in subpattern format. Produces a String which can be embedded in another Regex via interpolation, where it will be interpreted as a non-capturing subexpression in another regular expression.
re = /A*/i # => /A*/i
re.to_s # => "(?i-msx:A*)"
"Crystal".match(/t#{re}l/) # => Regex::MatchData("tal")
re = /A*/ # => /A*/
re.to_s # => "(?-imsx:A*)"
"Crystal".match(/t#{re}l/) # => nil