class String

Overview

A String represents an immutable sequence of UTF-8 characters.

A String is typically created with a string literal, enclosing UTF-8 characters in double quotes:

"hello world"

See String literals in the language reference.

A backslash can be used to denote some characters inside the string:

"\"" # double quote
"\\" # backslash
"\e" # escape
"\f" # form feed
"\n" # newline
"\r" # carriage return
"\t" # tab
"\v" # vertical tab

You can use a backslash followed by an u and four hexadecimal characters to denote a unicode codepoint written:

"\u0041" # == "A"

Or you can use curly braces and specify up to six hexadecimal numbers (0 to 10FFFF):

"\u{41}" # == "A"

A string can span multiple lines:

"hello
      world" # same as "hello\n      world"

Note that in the above example trailing and leading spaces, as well as newlines, end up in the resulting string. To avoid this, you can split a string into multiple lines by joining multiple literals with a backslash:

"hello " \
"world, " \
"no newlines" # same as "hello world, no newlines"

Alternatively, a backslash followed by a newline can be inserted inside the string literal:

"hello \
     world, \
     no newlines" # same as "hello world, no newlines"

In this case, leading whitespace is not included in the resulting string.

If you need to write a string that has many double quotes, parentheses, or similar characters, you can use alternative literals:

# Supports double quotes and nested parentheses
%(hello ("world")) # same as "hello (\"world\")"

# Supports double quotes and nested brackets
%[hello ["world"]] # same as "hello [\"world\"]"

# Supports double quotes and nested curlies
%{hello {"world"}} # same as "hello {\"world\"}"

# Supports double quotes and nested angles
%<hello <"world">> # same as "hello <\"world\">"

To create a String with embedded expressions, you can use string interpolation:

a = 1
b = 2
"sum = #{a + b}" # "sum = 3"

This ends up invoking Object#to_s(IO) on each expression enclosed by #{...}.

If you need to dynamically build a string, use String#build or IO::Memory.

Non UTF-8 valid strings

A string might end up being composed of bytes which form an invalid byte sequence according to UTF-8. This can happen if the string is created via one of the constructors that accept bytes, or when getting a string from String.build or IO::Memory. No exception will be raised, but every byte that doesn't start a valid UTF-8 byte sequence is interpreted as though it encodes the Unicode replacement character (U+FFFD) by itself. For example:

# here 255 is not a valid byte value in the UTF-8 encoding
string = String.new(Bytes[255, 97])
string.valid_encoding? # => false

# The first char here is the unicode replacement char
string.chars # => ['�', 'a']

One can also create strings with specific byte value in them by using octal and hexadecimal escape sequences:

# Octal escape sequences
"\101" # # => "A"
"\12"  # # => "\n"
"\1"   # string with one character with code point 1
"\377" # string with one byte with value 255

# Hexadecimal escape sequences
"\x41" # # => "A"
"\xFF" # string with one byte with value 255

The reason for allowing strings that don't have a valid UTF-8 sequence is that the world is full of content that isn't properly encoded, and having a program raise an exception or stop because of this is not good. It's better if programs are more resilient, but show a replacement character when there's an error in incoming data.

Note that this interpretation only applies to methods inside Crystal; calling #to_slice or #to_unsafe, e.g. when passing a string to a C library, will expose the invalid UTF-8 byte sequences. In particular, Regex's underlying engine may reject strings that are not valid UTF-8, or it may invoke undefined behavior on invalid strings. If this is undesired, #scrub could be used to remove the offending byte sequences first.

Included Modules

Defined in:

crystal_on_steroids/string/remove.cr
crystal_on_steroids/string/squish.cr
crystal_on_steroids/string/truncation.cr

Instance Method Summary

Instance methods inherited from class Object

in?(another_object) in?, presence presence, presence_in(another_object) presence_in, present? present?, to_param to_param, to_query(namespace)
to_query
to_query

Class methods inherited from class Object

❨╯°□°❩╯︵┻━┻ ❨╯°□°❩╯︵┻━┻

Instance Method Detail

def remove(*patterns) #

Alters the string by removing all occurrences of the patterns.

str = "foo bar test"
str.remove(" test", /bar/)         # => "foo "
str                                # => "foo "

source: Rails ActiveSupport


[View source]
def squish #

Remove first and last whitespace and reduce to one all the others in the same sentence

" foo   bar    \n   \t   boo".squish # => "foo bar boo"

source: Rails ActiveSupport


[View source]
def truncate(truncate_at, options = {} of Symbol => String) #

Truncates a given text after a given size if text is longer than size:

"Once upon a time in a world far far away".truncate(27)
# => "Once upon a time in a wo..."

Pass a string or regexp :separator to truncate text at a natural break:

"Once upon a time in a world far far away".truncate(27, separator: " ")
# => "Once upon a time in a..."

The last characters will be replaced with the :omission string (defaults to "...") for a total size not exceeding size:

"And they found that many people were sleeping better.".truncate(25, omission: "... (continued)")
# => "And they f... (continued)"

source: Rails ActiveSupport


[View source]
def truncate_words(words_count, options = {} of Symbol => String) #

Truncates a given text after a given number of words (words_count):

"Once upon a time in a world far far away".truncate_words(4)
# => "Once upon a time..."

Pass a string :separator to specify a different separator of words:

"Once<br>upon<br>a<br>time<br>in<br>a<br>world".truncate_words(5, separator: "<br>")
# => "Once<br>upon<br>a<br>time<br>in..."

The last characters will be replaced with the :omission string (defaults to "..."):

"And they found that many people were sleeping better.".truncate_words(5, omission: "... (continued)")
# => "And they found that many... (continued)"

source: Rails ActiveSupport


[View source]