struct String::Grapheme
Overview
Grapheme represents a Unicode grapheme cluster, the smallest functional unit of a writing system. This is also called a user-perceived character.
In the Latin alphabet, most graphemes consist of a single Unicode codepoint (equivalent to Char). But a grapheme can also consist of a sequence of codepoints, which combine into a single unit.
For example, the string "e\u0301" consists of two characters, the Latin small letter e and the combining acute accent ´. Together, they form a single grapheme: é.
That same grapheme could alternatively be described by a single codepoint, \u00E9 (Latin small letter e with acute). But the number of possible combinations is far greater than the number of directly available codepoints.
"e\u0301".size # => 2
"é".size # => 1
"e\u0301".grapheme_size # => 1
"é".grapheme_size # => 1
Such combinations of codepoints are common in some non-Latin scripts. They are also often used with emoji to create customized combinations. For example, the thumbs up sign 👍 (U+1F44D) combined with an emoji modifier such as U+1F3FC assigns a skin tone to the emoji.
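The emoji case can be checked the same way as the accent example above. The following is a minimal sketch; the variable names are illustrative only, and it assumes the modifier is attached to the preceding emoji as Unicode Standard Annex #29 describes.

```crystal
# Thumbs up sign (U+1F44D) followed by the emoji modifier U+1F3FC.
thumbs_up = "\u{1F44D}\u{1F3FC}"

# String#size counts each codepoint (Char) separately.
codepoint_count = thumbs_up.size

# String#grapheme_size counts user-perceived characters.
grapheme_count = thumbs_up.grapheme_size

puts codepoint_count # => 2
puts grapheme_count  # => 1
```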
Instances of this type can be acquired via String#each_grapheme or String#graphemes.
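As a short sketch of both entry points (the sample string and variable names are illustrative only):

```crystal
# "e" + combining acute accent, followed by a plain "a".
text = "e\u0301a"

# String#graphemes collects all grapheme clusters into an array.
clusters = text.graphemes

# String#each_grapheme yields each grapheme cluster in order.
sizes = [] of Int32
text.each_grapheme do |grapheme|
  sizes << grapheme.size
end

p clusters.size # => 2
p sizes         # => [2, 1]
```

String#each_grapheme avoids building an intermediate array, which may be preferable when the clusters are only visited once.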
The algorithm to determine boundaries between grapheme clusters is specified in the Unicode Standard Annex #29.
EXPERIMENTAL: The grapheme API is still under development. Join the discussion at #11610.
Defined in:
string/grapheme/grapheme.cr
string/grapheme/properties.cr
Instance Method Summary
- #==(other : self) : Bool
  Returns true if other is equivalent to self.
- #bytesize : Int32
  Returns the number of bytes in the UTF-8 representation of this grapheme cluster.
- #inspect(io : IO) : Nil
  Appends a representation of this grapheme cluster to io.
- #size : Int32
  Returns the number of characters in this grapheme cluster.
- #to_s(io : IO) : Nil
  Appends the characters in this grapheme cluster to io.
- #to_s : String
  Returns the characters in this grapheme cluster.
Instance methods inherited from struct Struct
==, hash, inspect, pretty_print, to_s
Instance methods inherited from struct Value
==, dup
Instance methods inherited from class Object
!, !=, !~, ==, ===, =~, as, as?, class, dup, hash, in?, inspect, is_a?, itself, nil?, not_nil!, pretty_inspect, pretty_print, responds_to?, tap, to_json, to_pretty_json, to_s, to_yaml, try, unsafe_as
Class methods inherited from class Object
from_json, from_yaml
Instance Method Detail
def ==(other : self) : Bool
Returns true if other is equivalent to self.
Two graphemes are considered equivalent if they contain the same sequence of codepoints.
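One consequence of this definition is that two graphemes which render identically can still compare as different. A minimal sketch, with illustrative variable names:

```crystal
# "e" followed by the combining acute accent (two codepoints).
decomposed = "e\u0301".graphemes.first

# Latin small letter e with acute (one codepoint).
precomposed = "\u00E9".graphemes.first

# Same codepoint sequence on both sides.
same_sequence = decomposed == "e\u0301".graphemes.first

# Different codepoint sequences, although both display as é.
different_sequence = decomposed == precomposed

p same_sequence      # => true
p different_sequence # => false
```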
def bytesize : Int32
Returns the number of bytes in the UTF-8 representation of this grapheme cluster.
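The byte count can differ from the character count because UTF-8 encodes non-ASCII codepoints in more than one byte. A small sketch using the two forms of é from the overview (variable names are illustrative only):

```crystal
# "e" (1 byte) followed by U+0301 (2 bytes in UTF-8).
decomposed = "e\u0301".graphemes.first

# U+00E9 (2 bytes in UTF-8).
precomposed = "\u00E9".graphemes.first

p decomposed.size      # => 2
p decomposed.bytesize  # => 3
p precomposed.size     # => 1
p precomposed.bytesize # => 2
```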