class String
- String
- Reference
- Object
Overview
A String represents an immutable sequence of UTF-8 characters.
A String is typically created with a string literal, enclosing UTF-8 characters in double quotes:
"hello world"
See String literals in the language reference.
A backslash can be used to denote some characters inside the string:
"\"" # double quote
"\\" # backslash
"\e" # escape
"\f" # form feed
"\n" # newline
"\r" # carriage return
"\t" # tab
"\v" # vertical tab
You can use a backslash followed by a u and four hexadecimal digits to denote a Unicode codepoint:
"\u0041" # == "A"
Alternatively, you can use curly braces and specify up to six hexadecimal digits (0 to 10FFFF):
"\u{41}" # == "A"
A string can span multiple lines:
"hello
world" # same as "hello\n world"
Note that in the above example, trailing and leading spaces, as well as newlines, end up in the resulting string. To avoid this, you can split a string into multiple lines by joining multiple literals with a backslash:
"hello " \
"world, " \
"no newlines" # same as "hello world, no newlines"
Alternatively, a backslash followed by a newline can be inserted inside the string literal:
"hello \
world, \
no newlines" # same as "hello world, no newlines"
In this case, leading whitespace is not included in the resulting string.
If you need to write a string that has many double quotes, parentheses, or similar characters, you can use alternative literals:
# Supports double quotes and nested parentheses
%(hello ("world")) # same as "hello (\"world\")"
# Supports double quotes and nested brackets
%[hello ["world"]] # same as "hello [\"world\"]"
# Supports double quotes and nested curlies
%{hello {"world"}} # same as "hello {\"world\"}"
# Supports double quotes and nested angles
%<hello <"world">> # same as "hello <\"world\">"
To create a String with embedded expressions, you can use string interpolation:
a = 1
b = 2
"sum = #{a + b}" # => "sum = 3"
This ends up invoking Object#to_s(IO) on each expression enclosed by #{...}.
If you need to dynamically build a string, use String.build or IO::Memory.
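As a small sketch of both approaches (the words array and the comma-separated output are purely illustrative), String.build yields a String::Builder that you append to with <<, while IO::Memory is a general-purpose in-memory IO whose contents are read back with to_s:

```crystal
words = ["alpha", "beta", "gamma"]

# String.build yields a String::Builder; everything appended becomes the result.
built = String.build do |io|
  words.each_with_index do |word, i|
    io << ", " if i > 0
    io << word
  end
end

# IO::Memory is an in-memory IO; to_s returns what has been written to it.
memory = IO::Memory.new
words.each_with_index do |word, i|
  memory << ", " if i > 0
  memory << word
end

p built       # => "alpha, beta, gamma"
p memory.to_s # => "alpha, beta, gamma"
```

String.build is the more direct choice when the only goal is a String; IO::Memory is useful when the same code must also be able to write to other IOs (files, sockets).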
Non UTF-8 valid strings
A string might end up being composed of bytes that form an invalid byte sequence according to UTF-8. This can happen if the string is created via one of the constructors that accept bytes, or when getting a string from String.build or IO::Memory. No exception will be raised, but every byte that doesn't start a valid UTF-8 byte sequence is interpreted as though it encodes the Unicode replacement character (U+FFFD) by itself. For example:
# here 255 is not a valid byte value in the UTF-8 encoding
string = String.new(Bytes[255, 97])
string.valid_encoding? # => false
# The first char here is the unicode replacement char
string.chars # => ['�', 'a']
One can also create strings with specific byte values in them by using octal and hexadecimal escape sequences:
# Octal escape sequences
"\101" # => "A"
"\12" # => "\n"
"\1" # string with one character with code point 1
"\377" # string with one byte with value 255
# Hexadecimal escape sequences
"\x41" # => "A"
"\xFF" # string with one byte with value 255
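A quick way to confirm what these escapes produce is to compare them and inspect the resulting bytes. This sketch uses only String#==, #bytesize, #valid_encoding?, #chars and the Char::REPLACEMENT constant:

```crystal
a_octal = "\101"  # octal 101 == 65 == 'A'
a_hex = "\x41"    # hex 41 == 65 == 'A'
ff_octal = "\377" # a single raw byte 255
ff_hex = "\xFF"   # the same single raw byte 255

p a_octal == "A"         # => true
p a_hex == "A"           # => true
p ff_octal == ff_hex     # => true
p ff_hex.bytesize        # => 1
p ff_hex.valid_encoding? # => false

# The lone 0xFF byte is read back as the replacement character U+FFFD.
p ff_hex.chars == [Char::REPLACEMENT] # => true
```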
The reason for allowing strings that aren't valid UTF-8 is that the world is full of content that isn't properly encoded, and having a program raise an exception or stop because of this is not good. It's better for programs to be resilient and show a replacement character when incoming data contains an error.
Note that this interpretation only applies to methods inside Crystal; calling #to_slice or #to_unsafe, e.g. when passing a string to a C library, will expose the invalid UTF-8 byte sequences. In particular, Regex's underlying engine may reject strings that are not valid UTF-8, or it may invoke undefined behavior on invalid strings. If this is undesired, #scrub can be used to replace the offending byte sequences first.
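A minimal sketch of cleaning such a string before handing it to C code or Regex (the byte values are arbitrary examples). #scrub returns a new String in which invalid bytes are replaced, by default with U+FFFD, and it also accepts a custom replacement Char:

```crystal
# 255 (0xFF) can never appear in valid UTF-8.
raw = String.new(Bytes[104, 105, 255, 33]) # "hi", one invalid byte, "!"

p raw.valid_encoding? # => false
p raw.bytesize        # => 4

# Default replacement is Char::REPLACEMENT (U+FFFD).
clean = raw.scrub
p clean.valid_encoding? # => true
p clean == "hi\uFFFD!"  # => true

# A custom replacement character can be passed instead.
p raw.scrub('?') # => "hi?!"
```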
Included Modules
- Comparable(String)
Defined in:
luce/util.cr
Instance Method Summary
- #dedent(length : Int32 = 4) : Luce::DedentedText
  Removes up to length characters of leading whitespace.
  DEPRECATED Luce is removing its custom extensions. Use Luce.dedent_string(String, Int32) instead. Will be removed with Luce v1.0
- #indentation : Int32
  Calculates the length of this String's indentation.
  DEPRECATED Luce is removing its custom extensions. Use Luce.string_indentation(String) instead. Will be removed with Luce v1.0
- #last(n : Int32 = 1)
  DEPRECATED Luce is removing its custom extensions. Use String#[-1] (default behaviour) or String#[String#size - n..] instead. Will be removed with Luce v1.0
- #prepend_space(width : Int32) : String
  Adds width spaces to the beginning of this string.
  DEPRECATED Luce is removing its custom extensions. Use String#insert(0, " " * width) instead. Will be removed with Luce v1.0
- #replace_all_mapped(pattern : Regex, replace : Proc(Regex::MatchData, String)) : String
  Replaces all substrings that match pattern with a computed string.
  DEPRECATED Luce is removing its custom extensions. Use String#gsub(Regex, &) instead. Will be removed with Luce v1.0
- #split_map_join(pattern : Regex, on_match : Proc(Regex::MatchData, String) | Nil = nil, on_non_match : Proc(String, String) | Nil = nil) : String
  Splits the string, converts its parts, and combines them into a new string.
  DEPRECATED Luce is removing its custom extensions. Use Luce.string_split_map_join instead. Will be removed with Luce v1.0
- #to_lines : Array(Luce::Line)
  Converts this string to an array of Luce::Line.
  DEPRECATED Luce is removing its custom extensions. Use String#lines.map { |line| Luce::Line.new(line) } instead. Will be removed with Luce v1.0
Instance Method Detail
#dedent(length : Int32 = 4) : Luce::DedentedText
Removes up to length characters of leading whitespace.
DEPRECATED Luce is removing its custom extensions. Use Luce.dedent_string(String, Int32) instead. Will be removed with Luce v1.0
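The exact behaviour of Luce.dedent_string (how it treats tabs, and what its result contains) is not described on this page, so the following is only a hypothetical, stdlib-only sketch of the idea: strip at most length leading spaces or tabs, counting each as one character. The name dedent_sketch is made up, and unlike #dedent it returns a plain String rather than a Luce::DedentedText:

```crystal
# Hypothetical helper, not Luce's implementation: removes up to `length`
# leading spaces/tabs, counting each one as a single character
# (tabs are NOT expanded to tab stops here).
def dedent_sketch(text : String, length : Int32 = 4) : String
  count = 0
  text.each_char do |c|
    break if count >= length || !(c == ' ' || c == '\t')
    count += 1
  end
  text[count..]
end

p dedent_sketch("      code") # => "  code"
p dedent_sketch("  code")     # => "code"
p dedent_sketch("code")       # => "code"
```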
#indentation : Int32
Calculates the length of this String's indentation.
The behaviour of tabs: https://spec.commonmark.org/0.30/#tabs
DEPRECATED Luce is removing its custom extensions. Use Luce.string_indentation(String) instead. Will be removed with Luce v1.0
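The linked CommonMark section treats a tab as advancing to the next tab stop, with tab stops every 4 columns. Assuming that is the rule #indentation applies (Luce's actual code is not shown here), a hypothetical stdlib-only sketch of the calculation could look like this; indentation_sketch is an illustrative name:

```crystal
# Hypothetical helper, not Luce's implementation. Assumes the CommonMark rule
# that a tab advances to the next tab stop, with tab stops every 4 columns.
def indentation_sketch(line : String) : Int32
  width = 0
  line.each_char do |c|
    case c
    when ' '
      width += 1
    when '\t'
      width += 4 - (width % 4) # jump to the next multiple of 4
    else
      break # first non-indentation character ends the scan
    end
  end
  width
end

p indentation_sketch("    text")  # => 4
p indentation_sketch("\ttext")    # => 4
p indentation_sketch("  \ttext")  # => 4
p indentation_sketch(" \t  text") # => 6
p indentation_sketch("text")      # => 0
```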
#last(n : Int32 = 1)
DEPRECATED Luce is removing its custom extensions. Use String#[-1] (default behaviour) or String#[String#size - n..] instead. Will be removed with Luce v1.0
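A short sketch of the two suggested replacements. One detail worth noting: indexing with -1 returns a Char, whereas a range index returns a String, so the two forms are not interchangeable when a String is expected:

```crystal
s = "hello"

# Default case (n = 1): a negative index returns the last Char, not a String.
p s[-1]   # => 'o'
# An endless range gives the last character as a String instead.
p s[-1..] # => "o"

# General case: the last n characters as a String (assumes 0 <= n <= s.size).
n = 3
p s[(s.size - n)..] # => "llo"
```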
#prepend_space(width : Int32) : String
Adds width spaces to the beginning of this string.
DEPRECATED Luce is removing its custom extensions. Use String#insert(0, " " * width) instead. Will be removed with Luce v1.0
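A minimal sketch of the suggested replacement, alongside two equivalent stdlib spellings. Because strings are immutable, String#insert returns a new string and leaves the receiver unchanged:

```crystal
width = 2
s = "item"

# insert(0, ...) returns a new String; s itself is not modified.
padded = s.insert(0, " " * width)
p padded # => "  item"
p s      # => "item"

# Equivalent ways to get the same result:
p " " * width + s         # => "  item"
p s.rjust(s.size + width) # => "  item"
```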
#replace_all_mapped(pattern : Regex, replace : Proc(Regex::MatchData, String)) : String
Replaces all substrings that match pattern with a computed string.
Creates a new string in which the non-overlapping substrings that match pattern (the ones iterated by pattern.all_matches(self)) are replaced by the result of calling replace on the corresponding Regex::MatchData object.
DEPRECATED Luce is removing its custom extensions. Use String#gsub(Regex, &) instead. Will be removed with Luce v1.0
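A short sketch of the suggested replacement. String#gsub(Regex, &) yields both the matched substring and its Regex::MatchData, and the block's value replaces each non-overlapping match, which covers what replace received in the deprecated method:

```crystal
text = "2 apples and 10 pears"

# The block receives the matched text and the MatchData for each match.
doubled = text.gsub(/\d+/) do |_matched, match|
  (match[0].to_i * 2).to_s
end
p doubled # => "4 apples and 20 pears"

# Capture groups are available through the MatchData as well.
swapped = "John Smith".gsub(/(\w+) (\w+)/) { |_, m| "#{m[2]}, #{m[1]}" }
p swapped # => "Smith, John"
```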
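#split_map_join(pattern : Regex, on_match : Proc(Regex::MatchData, String) | Nil = nil, on_non_match : Proc(String, String) | Nil = nil) : String
Splits the string, converts its parts, and combines them into a new string.
DEPRECATED Luce is removing its custom extensions. Use Luce.string_split_map_join instead. Will be removed with Luce v1.0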
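This page gives neither the signature nor the exact semantics of Luce.string_split_map_join. Assuming the method behaves like Dart's String.splitMapJoin (an assumption, suggested only by the Dart-style all_matches wording used elsewhere in these docs), a hypothetical stdlib-only sketch maps every match with on_match, every gap between matches with on_non_match, and concatenates the pieces in order. split_map_join_sketch is an illustrative name:

```crystal
# Hypothetical helper modeled on Dart's String.splitMapJoin; not Luce's code.
# Each match is passed to on_match, each gap between matches (including the
# leading and trailing ones, possibly empty) to on_non_match. A nil callback
# leaves that piece unchanged.
def split_map_join_sketch(str : String, pattern : Regex,
                          on_match : Proc(Regex::MatchData, String)? = nil,
                          on_non_match : Proc(String, String)? = nil) : String
  String.build do |io|
    pos = 0 # character index where the next unprocessed gap starts
    while m = pattern.match(str, pos)
      # Simplification: an empty match would never advance pos, so reject it.
      raise ArgumentError.new("pattern must not match an empty string") if m.begin == m.end
      gap = str[pos...m.begin]
      io << (on_non_match ? on_non_match.call(gap) : gap)
      io << (on_match ? on_match.call(m) : m[0])
      pos = m.end
    end
    tail = str[pos..]
    io << (on_non_match ? on_non_match.call(tail) : tail)
  end
end

result = split_map_join_sketch("a1b22c", /\d+/,
  ->(m : Regex::MatchData) { "[#{m[0]}]" },
  ->(s : String) { s.upcase })
p result # => "A[1]B[22]C"
```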
#to_lines : Array(Luce::Line)
Converts this string to an array of Luce::Line.
DEPRECATED Luce is removing its custom extensions. Use String#lines.map { |line| Luce::Line.new(line) } instead. Will be removed with Luce v1.0
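The suggested replacement builds on String#lines, which splits on line breaks and, by default (chomp: true), drops the terminators. Since Luce::Line's constructor is not documented on this page, this sketch only exercises the stdlib half and leaves the Luce call as a comment quoted from the deprecation notice:

```crystal
text = "# Title\n\nSome paragraph\n"

# chomp defaults to true, so "\n" is stripped from each entry;
# the empty line in the middle is kept, the final newline adds no extra entry.
lines = text.lines
p lines # => ["# Title", "", "Some paragraph"]

# Keeping the line terminators instead:
p text.lines(chomp: false) # => ["# Title\n", "\n", "Some paragraph\n"]

# With Luce loaded, the deprecation notice suggests:
# text.lines.map { |line| Luce::Line.new(line) }
```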