struct String::Grapheme


Grapheme represents a Unicode grapheme cluster, which describes the smallest functional unit of a writing system. This is also called a user-perceived character.

In the Latin alphabet, most graphemes consist of a single Unicode codepoint (equivalent to Char). But a grapheme can also consist of a sequence of codepoints that combine into a single unit.

For example, the string "e\u0301" consists of two characters, the Latin small letter e and the combining acute accent ´. Together, they form a single grapheme: é. That same grapheme could alternatively be described by a single codepoint, \u00E9 (Latin small letter e with acute). But the number of possible combinations is far larger than the number of directly available codepoints.

"e\u0301".size # => 2
"é".size       # => 1

"e\u0301".grapheme_size # => 1
"é".grapheme_size       # => 1

This combination of codepoints is common in some non-Latin scripts. It's also often used with emojis to create customized combinations. For example, the thumbs up sign 👍 (U+1F44D) combined with an emoji modifier such as U+1F3FC (medium-light skin tone) assigns a colour to the emoji.
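
As a minimal sketch of the emoji case, the snippet below builds the thumbs up sign followed by the U+1F3FC modifier and compares the codepoint count with the grapheme count. The variable name `thumbs` is only illustrative, and the result assumes the grapheme segmentation follows the current Unicode rules for emoji modifiers.

```crystal
# Thumbs up sign (U+1F44D) followed by an emoji modifier (U+1F3FC).
thumbs = "\u{1F44D}\u{1F3FC}"

thumbs.size          # => 2 (two codepoints)
thumbs.grapheme_size # => 1 (one user-perceived character)
```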

Instances of this type can be acquired via String#each_grapheme or String#graphemes.
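
The following sketch shows both ways of obtaining graphemes from a string. The sample string `text` is an assumption chosen for illustration; it ends in a decomposed é, so its last grapheme spans two codepoints.

```crystal
# "cafe" followed by a combining acute accent: 5 codepoints, 4 graphemes.
text = "cafe\u0301"

# String#graphemes returns all grapheme clusters as an array.
clusters = text.graphemes
clusters.size         # => 4
clusters.map(&.size)  # => [1, 1, 1, 2]

# String#each_grapheme yields each cluster in turn.
text.each_grapheme do |grapheme|
  puts "#{grapheme}: #{grapheme.size} codepoint(s)"
end
```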

The algorithm to determine boundaries between grapheme clusters is specified in the Unicode Standard Annex #29.

EXPERIMENTAL: The grapheme API is still under development. Join the discussion at #11610.

Instance Method Summary

Instance methods inherited from struct Struct

==(other) : Bool, hash(hasher), inspect(io : IO) : Nil, pretty_print(pp) : Nil, to_s(io : IO) : Nil

Instance methods inherited from struct Value

==(other : JSON::Any), ==(other : YAML::Any), dup

Instance methods inherited from class Object

! : Bool, !=(other), !~(other), ==(other), ===(other : JSON::Any), ===(other : YAML::Any), =~(other), as(type : Class), as?(type : Class), class, dup, hash(hasher), in?(collection : Object) : Bool, in?(*values : Object) : Bool, inspect(io : IO) : Nil, inspect : String, is_a?(type : Class) : Bool, itself, nil? : Bool, not_nil!, pretty_inspect(width = 79, newline = "\n", indent = 0) : String, pretty_print(pp : PrettyPrint) : Nil, responds_to?(name : Symbol) : Bool, tap(&), to_json(io : IO) : Nil, to_json : String, to_pretty_json(indent : String = " ") : String, to_pretty_json(io : IO, indent : String = " ") : Nil, to_s(io : IO) : Nil, to_s : String, to_yaml(io : IO) : Nil, to_yaml : String, try(&), unsafe_as(type : T.class) forall T

Class methods inherited from class Object

from_json(string_or_io, root : String), from_yaml(string_or_io : String | IO)

Instance Method Detail

def ==(other : self) : Bool

Returns true if other is equivalent to self.

Two graphemes are considered equivalent if they contain the same sequence of codepoints.
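
Because equivalence is based on the codepoint sequence, two graphemes that render identically can still compare as different. The sketch below illustrates this with the decomposed and precomposed forms of é; the variable names are only illustrative.

```crystal
decomposed  = "e\u0301".graphemes.first # e + combining acute accent
precomposed = "\u00E9".graphemes.first  # single codepoint é

# Same appearance, different codepoint sequences.
decomposed == precomposed               # => false

# Identical codepoint sequences compare as equal.
decomposed == "e\u0301".graphemes.first # => true
```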

def bytesize : Int32

Returns the number of bytes in the UTF-8 representation of this grapheme cluster.
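
A short sketch contrasting bytesize with size for a decomposed é. In UTF-8, the letter e takes one byte and the combining acute accent (U+0301) takes two.

```crystal
grapheme = "e\u0301".graphemes.first

grapheme.bytesize # => 3 (1 byte for e, 2 bytes for U+0301)
grapheme.size     # => 2 (two codepoints)
```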

def inspect(io : IO) : Nil

Appends a representation of this grapheme cluster to io.

def size : Int32

Returns the number of characters in this grapheme cluster.

def to_s(io : IO) : Nil

Appends the characters in this grapheme cluster to io.

def to_s : String

Returns the characters in this grapheme cluster.
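
As a brief illustration of both to_s overloads, the sketch below converts a grapheme back into a string, once directly and once by appending to an IO via String.build. The returned string keeps the original codepoints; no normalization is applied.

```crystal
grapheme = "e\u0301".graphemes.first

# Returns the cluster's characters as a new String.
str = grapheme.to_s
str == "e\u0301" # => true
str.size         # => 2

# Appends the cluster's characters to an IO.
built = String.build { |io| grapheme.to_s(io) }
built == str     # => true
```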
