class String
Overview
A String represents an immutable sequence of UTF-8 characters.
A String is typically created with a string literal, enclosing UTF-8 characters in double quotes:
"hello world"
See String literals in the language reference.
A backslash can be used to denote some characters inside the string:
"\"" # double quote
"\\" # backslash
"\e" # escape
"\f" # form feed
"\n" # newline
"\r" # carriage return
"\t" # tab
"\v" # vertical tab
You can use a backslash followed by a u and four hexadecimal digits to denote a Unicode code point:
"\u0041" # == "A"
Or you can use curly braces and specify up to six hexadecimal digits (0 to 10FFFF):
"\u{41}" # == "A"
A string can span multiple lines:
"hello
      world" # same as "hello\n      world"
Note that in the above example trailing and leading spaces, as well as newlines, end up in the resulting string. To avoid this, you can split a string into multiple lines by joining multiple literals with a backslash:
"hello " \
"world, " \
"no newlines" # same as "hello world, no newlines"
Alternatively, a backslash followed by a newline can be inserted inside the string literal:
"hello \
     world, \
     no newlines" # same as "hello world, no newlines"
In this case, leading whitespace is not included in the resulting string.
If you need to write a string that has many double quotes, parentheses, or similar characters, you can use alternative literals:
# Supports double quotes and nested parentheses
%(hello ("world")) # same as "hello (\"world\")"
# Supports double quotes and nested brackets
%[hello ["world"]] # same as "hello [\"world\"]"
# Supports double quotes and nested curlies
%{hello {"world"}} # same as "hello {\"world\"}"
# Supports double quotes and nested angles
%<hello <"world">> # same as "hello <\"world\">"
To create a String with embedded expressions, you can use string interpolation:
a = 1
b = 2
"sum = #{a + b}" # "sum = 3"
This ends up invoking Object#to_s(IO) on each expression enclosed by #{...}.
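As a minimal sketch of how interpolation reaches `to_s(IO)`, the example below defines a custom type and interpolates it. The `Point` struct is hypothetical and exists only for illustration.

```crystal
# Hypothetical type used only for illustration.
struct Point
  def initialize(@x : Int32, @y : Int32)
  end

  # Interpolation appends this object to the string being built
  # by calling this overload.
  def to_s(io : IO) : Nil
    io << "(" << @x << ", " << @y << ")"
  end
end

point = Point.new(1, 2)
message = "point = #{point}"
puts message # prints: point = (1, 2)
```

Defining `to_s(io : IO)` rather than an argument-less `to_s` lets the object write directly into the destination without allocating an intermediate string.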
If you need to dynamically build a string, use String.build or IO::Memory.
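The two approaches can be sketched as follows. The variable names are illustrative only.

```crystal
# String.build yields an IO; everything appended to it becomes
# the resulting string.
csv = String.build do |io|
  3.times do |i|
    io << ',' unless i == 0
    io << i
  end
end
puts csv # prints: 0,1,2

# IO::Memory is an in-memory IO whose contents can be turned
# into a String afterwards.
buffer = IO::Memory.new
buffer << "hello"
buffer << ' ' << "world"
greeting = buffer.to_s
puts greeting # prints: hello world
```

`String.build` fits best when the only goal is the final string, while `IO::Memory` is handy when the same buffer also needs to be passed to code that expects a general IO.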
Non UTF-8 valid strings
A string might end up being composed of bytes which form an invalid byte sequence according to UTF-8. This can happen if the string is created via one of the constructors that accept bytes, or when getting a string from String.build or IO::Memory. No exception will be raised, but every byte that doesn't start a valid UTF-8 byte sequence is interpreted as though it encodes the Unicode replacement character (U+FFFD) by itself. For example:
# here 255 is not a valid byte value in the UTF-8 encoding
string = String.new(Bytes[255, 97])
string.valid_encoding? # => false
# The first char here is the unicode replacement char
string.chars # => ['�', 'a']
You can also create strings with specific byte values in them by using octal and hexadecimal escape sequences:
# Octal escape sequences
"\101" # => "A"
"\12" # => "\n"
"\1" # string with one character with code point 1
"\377" # string with one byte with value 255
# Hexadecimal escape sequences
"\x41" # => "A"
"\xFF" # string with one byte with value 255
Strings that don't hold a valid UTF-8 sequence are allowed because the world is full of content that isn't properly encoded, and a program that raises an exception or stops on such input is fragile. It's better for programs to be resilient and show a replacement character where incoming data is malformed.
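To make the difference between a raw byte escape and a code point escape concrete, the sketch below compares the two. It only inspects the strings, and the variable names are illustrative.

```crystal
raw = "\xFF"       # one byte with value 255, not valid UTF-8
encoded = "\u00FF" # the code point U+00FF, stored as two UTF-8 bytes

raw.bytes             # => [255]
raw.valid_encoding?   # => false
raw.chars.size        # => 1 (the replacement character U+FFFD)

encoded.bytes           # => [195, 191]
encoded.valid_encoding? # => true
```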
Note that this interpretation only applies to methods inside Crystal; calling #to_slice or #to_unsafe, e.g. when passing a string to a C library, will expose the invalid UTF-8 byte sequences. In particular, Regex's underlying engine may reject strings that are not valid UTF-8, or it may invoke undefined behavior on invalid strings. If this is undesired, #scrub could be used to remove the offending byte sequences first.
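A minimal sketch of cleaning a string with `#scrub` before handing it to code that requires valid UTF-8. By default the invalid bytes are replaced with U+FFFD. Passing an empty string as the replacement to drop them entirely is an assumption about the accepted argument, so check it against your Crystal version.

```crystal
string = String.new(Bytes[255, 97])
string.valid_encoding? # => false

# Default: each invalid byte becomes the replacement character.
replaced = string.scrub
replaced.valid_encoding? # => true
replaced.chars.size      # => 2

# Assumption: an empty replacement removes the invalid bytes.
removed = string.scrub("")
removed # => "a"
```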
Included Modules
- Comparable(ReQL::AbstractValue)
- Comparable(String)
Defined in:
reql/executor/abstract_value.cr
Instance Method Summary
- #<=>(other : ReQL::AbstractValue)
  The comparison operator.
Instance methods inherited from class Object
- !=(other : RethinkDB::DSL::R)
- %(other : RethinkDB::DSL::R)
- &(other : RethinkDB::DSL::R)
- *(other : RethinkDB::DSL::R)
- +(other : RethinkDB::DSL::R)
- -(other : RethinkDB::DSL::R)
- /(other : RethinkDB::DSL::R)
- <(other : RethinkDB::DSL::R)
- <=(other : RethinkDB::DSL::R)
- ==(other : RethinkDB::DSL::R)
- >(other : RethinkDB::DSL::R)
- >=(other : RethinkDB::DSL::R)
- |(other : RethinkDB::DSL::R)
Instance Method Detail
The comparison operator. Returns 0 if the two objects are equal, a negative number if this object is considered less than other, a positive number if this object is considered greater than other, or nil if the two objects are not comparable.
Subclasses define this method to provide class-specific ordering.
The comparison operator is usually used to sort values:
# Sort in a descending way:
[3, 1, 2].sort { |x, y| y <=> x } # => [3, 2, 1]
# Sort in an ascending way:
[3, 1, 2].sort { |x, y| x <=> y } # => [1, 2, 3]
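As a sketch of how a class can provide its own ordering, the example below defines `<=>` on a hypothetical `Version` struct and includes `Comparable` to derive the other comparison operators. Neither the type nor its fields come from this library.

```crystal
# Hypothetical type used only for illustration.
struct Version
  include Comparable(Version)

  getter major : Int32
  getter minor : Int32

  def initialize(@major : Int32, @minor : Int32)
  end

  # Orders by major first, then by minor.
  def <=>(other : Version) : Int32
    cmp = major <=> other.major
    cmp == 0 ? minor <=> other.minor : cmp
  end
end

versions = [Version.new(1, 2), Version.new(0, 9), Version.new(1, 0)]
sorted = versions.sort
labels = sorted.map { |v| "#{v.major}.#{v.minor}" }
puts labels.join(", ") # prints: 0.9, 1.0, 1.2

older = Version.new(1, 0) < Version.new(1, 2) # => true
```

Including `Comparable(Version)` means only `<=>` has to be written by hand; `<`, `<=`, `==`, `>` and `>=` are defined in terms of it.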