class String
- String
- Reference
- Object
Overview
A String represents an immutable sequence of UTF-8 characters.
A String is typically created with a string literal, enclosing UTF-8 characters in double quotes:
"hello world"
See String literals in the language reference.
A backslash can be used to denote some characters inside the string:
"\"" # double quote
"\\" # backslash
"\e" # escape
"\f" # form feed
"\n" # newline
"\r" # carriage return
"\t" # tab
"\v" # vertical tab
You can use a backslash followed by a u and four hexadecimal digits to denote a Unicode code point:
"\u0041" # == "A"
Or you can use curly braces and specify up to six hexadecimal digits (0 to 10FFFF):
"\u{41}" # == "A"
A string can span multiple lines:
"hello
  world" # same as "hello\n  world"
Note that in the above example trailing and leading spaces, as well as newlines, end up in the resulting string. To avoid this, you can split a string into multiple lines by joining multiple literals with a backslash:
"hello " \
"world, " \
"no newlines" # same as "hello world, no newlines"
Alternatively, a backslash followed by a newline can be inserted inside the string literal:
"hello \
world, \
no newlines" # same as "hello world, no newlines"
In this case, leading whitespace is not included in the resulting string.
If you need to write a string that has many double quotes, parentheses, or similar characters, you can use alternative literals:
# Supports double quotes and nested parentheses
%(hello ("world")) # same as "hello (\"world\")"
# Supports double quotes and nested brackets
%[hello ["world"]] # same as "hello [\"world\"]"
# Supports double quotes and nested curlies
%{hello {"world"}} # same as "hello {\"world\"}"
# Supports double quotes and nested angles
%<hello <"world">> # same as "hello <\"world\">"
To create a String with embedded expressions, you can use string interpolation:
a = 1
b = 2
"sum = #{a + b}" # "sum = 3"
This ends up invoking Object#to_s(IO) on each expression enclosed by #{...}.
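Because interpolation goes through to_s(IO), a type can control how it appears inside a string by defining that method. The following is a minimal sketch; the Point struct is hypothetical and exists only for illustration:

```crystal
# Hypothetical value type, used only to illustrate interpolation.
struct Point
  def initialize(@x : Int32, @y : Int32)
  end

  # Interpolation writes the object into an IO through this method.
  def to_s(io : IO) : Nil
    io << '(' << @x << ", " << @y << ')'
  end
end

point = Point.new(1, 2)
message = "at #{point}"
message # => "at (1, 2)"
```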
If you need to dynamically build a string, use String.build or IO::Memory.
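As a rough sketch of both approaches, the example below appends pieces to the IO yielded by String.build and then does the same with an explicit IO::Memory. The variable names and the sample data are made up for illustration:

```crystal
# Build a string by appending to the IO yielded by String.build.
csv = String.build do |io|
  io << "id" << ',' << "name" << '\n'
  io << 1 << ',' << "Ada"
end
csv # => "id,name\n1,Ada"

# The same idea with an explicit in-memory buffer.
buffer = IO::Memory.new
buffer << "sum = " << (1 + 2)
text = buffer.to_s
text # => "sum = 3"
```

String.build is usually the more convenient choice when the whole string is produced in one place; an IO::Memory can be passed around to code that expects an IO.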
Non UTF-8 valid strings
A string might end up being composed of bytes which form an invalid byte sequence according to UTF-8. This can happen if the string is created via one of the constructors that accept bytes, or when getting a string from String.build or IO::Memory. No exception will be raised, but every byte that doesn't start a valid UTF-8 byte sequence is interpreted as though it encodes the Unicode replacement character (U+FFFD) by itself. For example:
# here 255 is not a valid byte value in the UTF-8 encoding
string = String.new(Bytes[255, 97])
string.valid_encoding? # => false
# The first char here is the unicode replacement char
string.chars # => ['�', 'a']
You can also create strings containing specific byte values by using octal and hexadecimal escape sequences:
# Octal escape sequences
"\101" # => "A"
"\12" # => "\n"
"\1" # string with one character with code point 1
"\377" # string with one byte with value 255
# Hexadecimal escape sequences
"\x41" # => "A"
"\xFF" # string with one byte with value 255
The reason for allowing strings that don't have a valid UTF-8 sequence is that the world is full of content that isn't properly encoded, and having a program raise an exception or stop because of this is not good. It's better if programs are more resilient, but show a replacement character when there's an error in incoming data.
Note that this interpretation only applies to methods inside Crystal; calling #to_slice or #to_unsafe, e.g. when passing a string to a C library, will expose the invalid UTF-8 byte sequences. In particular, Regex's underlying engine may reject strings that are not valid UTF-8, or it may invoke undefined behavior on invalid strings. If this is undesired, #scrub could be used to remove the offending byte sequences first.
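A minimal sketch of that clean-up step, reusing the invalid string from the earlier example. It assumes #scrub is called without arguments, in which case each offending byte is replaced with the Unicode replacement character:

```crystal
# Byte 255 is never valid in UTF-8; 97 is ASCII "a".
raw = String.new(Bytes[255, 97])
raw.valid_encoding? # => false

# Replace the invalid byte so the result is valid UTF-8.
clean = raw.scrub
clean.valid_encoding? # => true
clean.chars           # => ['�', 'a']
```

After scrubbing, the string can be handed to code that requires valid UTF-8, at the cost of losing the original byte values.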
Included Modules
- Comparable(String)
Defined in:
inflector/string.cr
Constant Summary
-
SQUISH_REGEX =
/[[:space:]]+/
-
Performs a destructive squish. See String#squish.
str = " foo bar \n \t boo"
str.squish! # => "foo bar boo"
str # => "foo bar boo"
Instance Method Summary
-
#as_underscore
The reverse of +camelize+.
-
#blank?
A string is blank if it's empty or contains only whitespace:
-
#camelize(first_letter = :upper)
By default, +camelize+ converts strings to UpperCamelCase.
-
#classify
Creates a class name from a plural table name like Rails does for table names to models.
-
#dasherize
Replaces underscores with dashes in the string.
-
#deconstantize
Removes the rightmost segment from the constant expression in the string.
-
#demodulize
Removes the module part from the constant expression in the string.
-
#foreign_key(separate_class_name_and_id_with_underscore = true)
Creates a foreign key name from a class name.
-
#humanize(capitalize = true)
Capitalizes the first word, turns underscores into spaces, and strips a trailing '_id' if present.
-
#pluralize(count = nil, locale = :en)
Returns the plural form of the word in the string.
-
#singularize(locale = :en)
The reverse of +pluralize+, returns the singular form of a word in a string.
- #squish
-
#tableize
Creates the name of a table like Rails does for models to table names.
- #titlecase
-
#titleize
Capitalizes all the words and replaces some characters in the string to create a nicer looking title.
-
#upcase_first
Converts just the first character to uppercase.
Instance Method Detail
The reverse of +camelize+. Makes an underscored, lowercase form from the expression in the string.
+as_underscore+ will also change '::' to '/' to convert namespaces to paths.
"ActiveModel".as_underscore # => "active_model"
"ActiveModel::Errors".as_underscore # => "active_model/errors"
A string is blank if it's empty or contains only whitespace:
"".blank? # => true
" ".blank? # => true
"\t\n\r".blank? # => true
" blah ".blank? # => false
Unicode whitespace is supported:
"\u00a0".blank? # => true
Returns true or false.
By default, +camelize+ converts strings to UpperCamelCase. If the argument to camelize is set to :lower then camelize produces lowerCamelCase.
+camelize+ will also convert '/' to '::' which is useful for converting paths to namespaces.
"active_record".camelize # => "ActiveRecord"
"active_record".camelize(:lower) # => "activeRecord"
"active_record/errors".camelize # => "ActiveRecord::Errors"
"active_record/errors".camelize(:lower) # => "activeRecord::Errors"
Creates a class name from a plural table name like Rails does for table names to models. Note that this returns a string and not a class. (To convert to an actual class follow +classify+ with +constantize+.)
"ham_and_eggs".classify # => "HamAndEgg"
"posts".classify # => "Post"
Replaces underscores with dashes in the string.
"puni_puni".dasherize # => "puni-puni"
Removes the rightmost segment from the constant expression in the string.
"Net::HTTP".deconstantize # => "Net"
"::Net::HTTP".deconstantize # => "::Net"
"String".deconstantize # => ""
"::String".deconstantize # => ""
"".deconstantize # => ""
See also +demodulize+.
Removes the module part from the constant expression in the string.
"ActiveRecord::CoreExtensions::String::Inflections".demodulize # => "Inflections"
"Inflections".demodulize # => "Inflections"
"::Inflections".demodulize # => "Inflections"
"".demodulize # => ""
See also +deconstantize+.
Creates a foreign key name from a class name. +separate_class_name_and_id_with_underscore+ sets whether the method should put '_' between the name and 'id'.
"Message".foreign_key # => "message_id"
"Message".foreign_key(false) # => "messageid"
"Admin::Post".foreign_key # => "post_id"
Capitalizes the first word, turns underscores into spaces, and strips a trailing '_id' if present. Like +titleize+, this is meant for creating pretty output.
The capitalization of the first word can be turned off by setting the optional parameter +capitalize+ to false. By default, this parameter is true.
"employee_salary".humanize # => "Employee salary"
"author_id".humanize # => "Author"
"author_id".humanize(capitalize: false) # => "author"
"_id".humanize # => "Id"
Returns the plural form of the word in the string.
If the optional parameter +count+ is specified, the singular form will be returned if count == 1. For any other value of +count+ the plural will be returned.
If the optional parameter +locale+ is specified, the word will be pluralized as a word of that language. By default, this parameter is set to :en. You must define your own inflection rules for languages other than English.
"post".pluralize # => "posts"
"octopus".pluralize # => "octopi"
"sheep".pluralize # => "sheep"
"words".pluralize # => "words"
"the blue mailman".pluralize # => "the blue mailmen"
"CamelOctopus".pluralize # => "CamelOctopi"
"apple".pluralize(1) # => "apple"
"apple".pluralize(2) # => "apples"
"ley".pluralize(:es) # => "leyes"
"ley".pluralize(1, :es) # => "ley"
The reverse of +pluralize+, returns the singular form of a word in a string.
If the optional parameter +locale+ is specified, the word will be singularized as a word of that language. By default, this parameter is set to :en. You must define your own inflection rules for languages other than English.
"posts".singularize # => "post"
"octopi".singularize # => "octopus"
"sheep".singularize # => "sheep"
"word".singularize # => "word"
"the blue mailmen".singularize # => "the blue mailman"
"CamelOctopi".singularize # => "CamelOctopus"
"leyes".singularize(:es) # => "ley"
Creates the name of a table like Rails does for models to table names. This method uses the +pluralize+ method on the last word in the string.
"RawScaledScorer".tableize # => "raw_scaled_scorers"
"ham_and_egg".tableize # => "ham_and_eggs"
"fancyCategory".tableize # => "fancy_categories"
Capitalizes all the words and replaces some characters in the string to create a nicer looking title. +titleize+ is meant for creating pretty output. It is not used in the Rails internals.
+titleize+ is also aliased as +titlecase+.
"man from the boondocks".titleize # => "Man From The Boondocks"
"x-men: the last stand".titleize # => "X Men: The Last Stand"
Converts just the first character to uppercase.
"what a Lovely Day".upcase_first # => "What a Lovely Day"
"w".upcase_first # => "W"
"".upcase_first # => ""