struct Char

Overview

A Char represents a Unicode code point. It occupies 32 bits.

It is created by enclosing a UTF-8 character in single quotes.

'a'
'z'
'0'
'_'
'あ'

You can use a backslash to denote some characters:

'\'' # single quote
'\\' # backslash
'\e' # escape
'\f' # form feed
'\n' # newline
'\r' # carriage return
'\t' # tab
'\v' # vertical tab

You can use a backslash followed by a u and four hexadecimal characters to denote a Unicode codepoint:

'\u0041' # == 'A'

Or you can use curly braces and specify up to six hexadecimal digits:

'\u{41}' # == 'A'

See Char literals in the language reference.
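
As a minimal sketch of the literal forms described above, the following compares the plain, four-digit, and braced spellings of the same codepoint. The variable names are illustrative only.

```crystal
# Three spellings of the same code point, U+0041.
plain = 'A'
four_digit = '\u0041'
braced = '\u{41}'

puts plain == four_digit # true
puts plain == braced     # true
puts braced.ord          # 65

# Codepoints above U+FFFF do not fit in four digits and need the braced form.
puts '\u{1F600}'.ord # 128512
```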

Included Modules

Comparable(Char)
Steppable

Defined in:

char.cr
char/reader.cr
primitives.cr

Constant Summary

MAX = 1114111.unsafe_chr: The maximum character.
MAX_CODEPOINT = 1114111: The maximum valid codepoint for a character.
REPLACEMENT = '�': The replacement character, used on invalid UTF-8 byte sequences.
ZERO = '\0': The character representing the end of a C string.
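
A short sketch of how these constants relate to codepoint values, using only `#ord` from this page:

```crystal
puts Char::MAX.ord         # 1114111 (U+10FFFF)
puts Char::MAX_CODEPOINT   # 1114111
puts Char::ZERO.ord        # 0
puts Char::REPLACEMENT.ord # 65533 (U+FFFD)
```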

Instance Method Summary

#!=(other : Char) : Bool
Returns true if self's codepoint is not equal to other's codepoint.
#+(str : String) : String
Concatenates this char and string.
#+(other : Int) : Char
Returns a char that has this char's codepoint plus other.
#-(other : Char) : Int32
Returns the difference of the codepoint values of this char and other.
#-(other : Int) : Char
Returns a char that has this char's codepoint minus other.
#<(other : Char) : Bool
Returns true if self's codepoint is less than other's codepoint.
#<=(other : Char) : Bool
Returns true if self's codepoint is less than or equal to other's codepoint.
#<=>(other : Char)
The comparison operator.
#==(other : Char) : Bool
Returns true if self's codepoint is equal to other's codepoint.
#===(byte : Int)
Returns true if the codepoint is equal to byte ignoring the type.
#>(other : Char) : Bool
Returns true if self's codepoint is greater than other's codepoint.
#>=(other : Char) : Bool
Returns true if self's codepoint is greater than or equal to other's codepoint.
#alphanumeric? : Bool
Returns true if this char is a letter or a number according to Unicode.
#ascii? : Bool
Returns true if this char is an ASCII character (codepoint is in (0..127))
#ascii_alphanumeric? : Bool
Returns true if this char is an ASCII letter or number ('0' to '9', 'a' to 'z', 'A' to 'Z').
#ascii_control? : Bool
Returns true if this char is an ASCII control character.
#ascii_letter? : Bool
Returns true if this char is an ASCII letter ('a' to 'z', 'A' to 'Z').
#ascii_lowercase? : Bool
Returns true if this char is a lowercase ASCII letter.
#ascii_number?(base : Int = 10) : Bool
Returns true if this char is an ASCII number in specified base.
#ascii_uppercase? : Bool
Returns true if this char is an ASCII uppercase letter.
#ascii_whitespace? : Bool
Returns true if this char is an ASCII whitespace.
#bytes : Array(UInt8)
Returns this char's bytes as encoded by UTF-8, as an Array(UInt8).
#bytesize : Int32
Returns the number of UTF-8 bytes in this char.
#clone
#control? : Bool
Returns true if this char is a control character according to Unicode.
#downcase(options = Unicode::CaseOptions::None) : Char
Returns the downcase equivalent of this char.
#downcase(options = Unicode::CaseOptions::None, &)
Yields each char for the downcase equivalent of this char.
#dump(io)
Appends a representation of self as an ASCII-compatible Crystal char literal, wrapped in single quotes, to the given io.
#dump : String
Returns a representation of self as an ASCII-compatible Crystal char literal, wrapped in single quotes.
#each_byte(&) : Nil
Yields each of the bytes of this char as encoded by UTF-8.
#hash(hasher)
See Object#hash(hasher)
#hex? : Bool
Returns true if this char is an ASCII hex digit ('0' to '9', 'a' to 'f', 'A' to 'F').
#in_set?(*sets : String) : Bool
Returns true if this char is matched by the given sets.
#inspect(io : IO) : Nil
Appends a representation of self as a Crystal char literal, wrapped in single quotes, to the given io.
#inspect : String
Returns a representation of self as a Crystal char literal, wrapped in single quotes.
#letter? : Bool
Returns true if this char is a letter.
#lowercase? : Bool
Returns true if this char is a lowercase letter.
#mark? : Bool
Returns true if this char is a mark character according to Unicode.
#number? : Bool
Returns true if this char is a number according to Unicode.
#ord : Int32
Returns the codepoint of this char.
#pred : Char
Returns the predecessor codepoint before this one.
#printable?
Returns true if this char is a printable character.
#step(*, to limit = nil, exclusive : Bool = false, &)
Performs a #step in the direction of the limit.
#step(*, to limit = nil, exclusive : Bool = false)
Performs a #step in the direction of the limit.
#succ : Char
Returns the successor codepoint after this one.
#to_f : Float64
Returns the integer value of this char as a float if it's an ASCII char denoting a digit, raises otherwise.
#to_f32 : Float32
See also: #to_f.
#to_f32? : Float32?
See also: #to_f?.
#to_f64 : Float64
Same as #to_f.
#to_f64? : Float64?
Same as #to_f?.
#to_f? : Float64?
Returns the integer value of this char as a float if it's an ASCII char denoting a digit, nil otherwise.
#to_i(base : Int = 10) : Int32
Returns the integer value of this char if it's an ASCII char denoting a digit in base, raises otherwise.
#to_i16(base : Int = 10)
See also: #to_i.
#to_i16?(base : Int = 10)
See also: #to_i?.
#to_i32(base : Int = 10) : Int32
Same as #to_i.
#to_i32?(base : Int = 10) : Int32?
Same as #to_i?.
#to_i64(base : Int = 10)
See also: #to_i.
#to_i64?(base : Int = 10)
See also: #to_i?.
#to_i8(base : Int = 10)
See also: #to_i.
#to_i8?(base : Int = 10)
See also: #to_i?.
#to_i?(base : Int = 10) : Int32?
Returns the integer value of this char if it's an ASCII char denoting a digit in base, nil otherwise.
#to_s(io : IO) : Nil
Appends this char to the given IO.
#to_s : String
Returns this char as a string containing this char as a single character.
#to_u16(base : Int = 10)
See also: #to_i.
#to_u16?(base : Int = 10)
See also: #to_i?.
#to_u32(base : Int = 10)
See also: #to_i.
#to_u32?(base : Int = 10)
See also: #to_i?.
#to_u64(base : Int = 10)
See also: #to_i.
#to_u64?(base : Int = 10)
See also: #to_i?.
#to_u8(base : Int = 10)
See also: #to_i.
#to_u8?(base : Int = 10)
See also: #to_i?.
#unicode_escape(io : IO) : Nil
Appends the Unicode escape sequence representing this character to the given io.
#unicode_escape : String
Returns the Unicode escape sequence representing this character.
#upcase(options = Unicode::CaseOptions::None) : Char
Returns the upcase equivalent of this char.
#upcase(options = Unicode::CaseOptions::None, &)
Yields each char for the upcase equivalent of this char.
#uppercase? : Bool
Returns true if this char is an uppercase letter.
#whitespace? : Bool
Returns true if this char is a whitespace according to Unicode.

Instance methods inherited from module `Steppable`

Instance methods inherited from module `Comparable(Char)`

Instance methods inherited from struct `Value`

Instance methods inherited from class `Object`

Class methods inherited from class `Object`

Instance Method Detail

def !=(other : Char) : Bool #

Returns true if self's codepoint is not equal to other's codepoint.

[View source]

def +(str : String) : String #

Concatenates this char and string.

'f' + "oo" # => "foo"

[View source]

def +(other : Int) : Char #

Returns a char that has this char's codepoint plus other.

'a' + 1 # => 'b'
'a' + 2 # => 'c'

[View source]

def -(other : Char) : Int32 #

Returns the difference of the codepoint values of this char and other.

'a' - 'a' # => 0
'b' - 'a' # => 1
'c' - 'a' # => 2

[View source]

def -(other : Int) : Char #

Returns a char that has this char's codepoint minus other.

'c' - 1 # => 'b'
'c' - 2 # => 'a'

[View source]

def <(other : Char) : Bool #

Returns true if self's codepoint is less than other's codepoint.

[View source]

def <=(other : Char) : Bool #

Returns true if self's codepoint is less than or equal to other's codepoint.

[View source]

def <=>(other : Char) #

The comparison operator.

Returns the difference of the codepoint values of self and other. The result is negative, 0, or positive depending on whether self's codepoint is less than, equal to, or greater than other's codepoint.

'a' <=> 'c' # => -2
'z' <=> 'z' # => 0
'c' <=> 'a' # => 2

[View source]

def ==(other : Char) : Bool #

Returns true if self's codepoint is equal to other's codepoint.

[View source]

def ===(byte : Int) #

Returns true if the codepoint is equal to byte ignoring the type.

'c'.ord       # => 99
'c' === 99_u8 # => true
'c' === 99    # => true
'z' === 99    # => false

[View source]

def >(other : Char) : Bool #

Returns true if self's codepoint is greater than other's codepoint.

[View source]

def >=(other : Char) : Bool #

Returns true if self's codepoint is greater than or equal to other's codepoint.
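
The comparison operators on this page all compare codepoints. A brief sketch covering each of them:

```crystal
puts 'a' < 'b'  # true
puts 'b' <= 'b' # true
puts 'a' == 'a' # true
puts 'a' != 'b' # true
puts 'b' > 'a'  # true
puts 'a' >= 'b' # false

# Ordering follows codepoints, so uppercase ASCII sorts before lowercase.
puts 'Z' < 'a' # true (90 < 97)
```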

[View source]

def alphanumeric? : Bool #

Returns true if this char is a letter or a number according to Unicode.

'c'.alphanumeric? # => true
'8'.alphanumeric? # => true
'.'.alphanumeric? # => false

[View source]

def ascii? : Bool #

Returns true if this char is an ASCII character (codepoint is in (0..127))
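
For illustration, a small sketch of the boundary at codepoint 127:

```crystal
puts 'a'.ascii?      # true
puts '\u007F'.ascii? # true  (codepoint 127, the last ASCII value)
puts '\u0080'.ascii? # false (codepoint 128)
puts 'あ'.ascii?      # false
```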

[View source]

def ascii_alphanumeric? : Bool #

Returns true if this char is an ASCII letter or number ('0' to '9', 'a' to 'z', 'A' to 'Z').

'c'.ascii_alphanumeric? # => true
'8'.ascii_alphanumeric? # => true
'.'.ascii_alphanumeric? # => false

[View source]

def ascii_control? : Bool #

Returns true if this char is an ASCII control character.

This includes the C0 control codes (U+0000 through U+001F) and the Delete character (U+007F).

('\u0000'..'\u001F').each do |char|
  char.ascii_control? # => true
end

'\u007F'.ascii_control? # => true
'\u0080'.ascii_control? # => false

[View source]

def ascii_letter? : Bool #

Returns true if this char is an ASCII letter ('a' to 'z', 'A' to 'Z').

'c'.ascii_letter? # => true
'á'.ascii_letter? # => false
'8'.ascii_letter? # => false

[View source]

def ascii_lowercase? : Bool #

Returns true if this char is a lowercase ASCII letter.

'c'.ascii_lowercase? # => true
'ç'.ascii_lowercase? # => false
'G'.ascii_lowercase? # => false
'.'.ascii_lowercase? # => false

[View source]

def ascii_number?(base : Int = 10) : Bool #

Returns true if this char is an ASCII number in specified base.

Base can be from 2 to 36 with digits from '0' to '9' and 'a' to 'z' or 'A' to 'Z'.

'4'.ascii_number?     # => true
'z'.ascii_number?     # => false
'z'.ascii_number?(36) # => true

[View source]

def ascii_uppercase? : Bool #

Returns true if this char is an ASCII uppercase letter.

'H'.ascii_uppercase? # => true
'Á'.ascii_uppercase? # => false
'c'.ascii_uppercase? # => false
'.'.ascii_uppercase? # => false

[View source]

def ascii_whitespace? : Bool #

Returns true if this char is an ASCII whitespace.

' '.ascii_whitespace?  # => true
'\t'.ascii_whitespace? # => true
'b'.ascii_whitespace?  # => false

[View source]

def bytes : Array(UInt8) #

Returns this char's bytes as encoded by UTF-8, as an Array(UInt8).

'a'.bytes # => [97]
'あ'.bytes # => [227, 129, 130]

[View source]

def bytesize : Int32 #

Returns the number of UTF-8 bytes in this char.

'a'.bytesize # => 1
'好'.bytesize # => 3

[View source]

def clone #

[View source]

def control? : Bool #

Returns true if this char is a control character according to Unicode.
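
A minimal sketch contrasting this method with #ascii_control?. It assumes, as the ranges listed under #ascii_control? suggest, that the C1 control codes (U+0080 through U+009F) count as control characters here but not there.

```crystal
puts '\n'.control?     # true
puts '\u0085'.control? # true (a C1 control code, outside ASCII)
puts 'a'.control?      # false
puts ' '.control?      # false

# The ASCII-only check rejects the same C1 code.
puts '\u0085'.ascii_control? # false
```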

[View source]

def downcase(options = Unicode::CaseOptions::None) : Char #

Returns the downcase equivalent of this char.

Note that this only works for characters whose downcase equivalent yields a single codepoint. There are a few characters, like 'İ', that when downcased result in multiple characters (in this case: 'i' and the dot mark).

For a more correct method see the method that receives a block.

'Z'.downcase # => 'z'
'x'.downcase # => 'x'
'.'.downcase # => '.'

[View source]

def downcase(options = Unicode::CaseOptions::None, &) #

Yields each char for the downcase equivalent of this char.

This method takes into account the possibility that a downcase version of a char might result in multiple chars, like for 'İ', which results in 'i' and a dot mark.
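
The block form can be sketched as follows. The arrays `lowered` and `single` are illustrative names. 'İ' is written as the escape '\u0130' so the example does not depend on how the source file is normalized.

```crystal
# Collect every char yielded for the downcase equivalent of U+0130 ('İ').
lowered = [] of Char
'\u0130'.downcase { |c| lowered << c }
puts lowered.size         # 2: 'i' followed by a combining dot mark
puts lowered.first == 'i' # true

# A char with a single-codepoint equivalent yields exactly once.
single = [] of Char
'Z'.downcase { |c| single << c }
puts single == ['z'] # true
```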

[View source]

def dump(io) #

Appends a representation of self as an ASCII-compatible Crystal char literal, wrapped in single quotes, to the given io.

Non-printable characters (see #printable?) and non-ASCII characters (codepoints larger than U+007F) are escaped.

'a'.dump      # => "'a'"
'\t'.dump     # => "'\\t'"
'あ'.dump      # => "'\\u3042'"
'\u0012'.dump # => "'\\u0012'"
'😀'.dump      # => "'\\u{1F600}'"

See #unicode_escape for the format used to escape characters without a special escape sequence.

#inspect only escapes non-printable characters.

[View source]

def dump : String #

Returns a representation of self as an ASCII-compatible Crystal char literal, wrapped in single quotes.

Non-printable characters (see #printable?) and non-ASCII characters (codepoints larger than U+007F) are escaped.

'a'.dump      # => "'a'"
'\t'.dump     # => "'\\t'"
'あ'.dump      # => "'\\u3042'"
'\u0012'.dump # => "'\\u0012'"
'😀'.dump      # => "'\\u{1F600}'"

See #unicode_escape for the format used to escape characters without a special escape sequence.

#inspect only escapes non-printable characters.

[View source]

def each_byte(&) : Nil #

Yields each of the bytes of this char as encoded by UTF-8.

puts "'a'"
'a'.each_byte do |byte|
  puts byte
end
puts

puts "'あ'"
'あ'.each_byte do |byte|
  puts byte
end

Output:

'a'
97

'あ'
227
129
130

[View source]

def hash(hasher) #

See Object#hash(hasher)

[View source]

def hex? : Bool #

Returns true if this char is an ASCII hex digit ('0' to '9', 'a' to 'f', 'A' to 'F').

'5'.hex? # => true
'a'.hex? # => true
'F'.hex? # => true
'g'.hex? # => false

[View source]

def in_set?(*sets : String) : Bool #

Returns true if this char is matched by the given sets.

Each parameter defines a set; the character is matched against the intersection of those sets. In other words, it needs to match all sets.

If a set starts with a ^, it is negated. The sequence c1-c2 means all characters between and including c1 and c2 and is known as a range.

The backslash character \ can be used to escape ^ or - and is otherwise ignored unless it appears at the end of a range or set.

'l'.in_set? "lo"          # => true
'l'.in_set? "lo", "o"     # => false
'l'.in_set? "hello", "^l" # => false
'l'.in_set? "j-m"         # => true

'^'.in_set? "\\^aeiou" # => true
'-'.in_set? "a\\-eo"   # => true

'\\'.in_set? "\\"    # => true
'\\'.in_set? "\\A"   # => false
'\\'.in_set? "X-\\w" # => true

[View source]

def inspect(io : IO) : Nil #

Appends a representation of self as a Crystal char literal, wrapped in single quotes, to the given io.

Non-printable characters (see #printable?) are escaped.

'a'.inspect      # => "'a'"
'\t'.inspect     # => "'\\t'"
'あ'.inspect      # => "'あ'"
'\u0012'.inspect # => "'\\u0012'"
'😀'.inspect      # => "'😀'"

See #unicode_escape for the format used to escape characters without a special escape sequence.

#dump additionally escapes all non-ASCII characters.

[View source]

def inspect : String #

Returns a representation of self as a Crystal char literal, wrapped in single quotes.

Non-printable characters (see #printable?) are escaped.

'a'.inspect      # => "'a'"
'\t'.inspect     # => "'\\t'"
'あ'.inspect      # => "'あ'"
'\u0012'.inspect # => "'\\u0012'"
'😀'.inspect      # => "'😀'"

See #unicode_escape for the format used to escape characters without a special escape sequence.

#dump additionally escapes all non-ASCII characters.

[View source]

def letter? : Bool #

Returns true if this char is a letter.

All codepoints in the Unicode General Category L (Letter) are considered a letter.

'c'.letter? # => true
'á'.letter? # => true
'8'.letter? # => false

[View source]

def lowercase? : Bool #

Returns true if this char is a lowercase letter.

'c'.lowercase? # => true
'ç'.lowercase? # => true
'G'.lowercase? # => false
'.'.lowercase? # => false

[View source]

def mark? : Bool #

Returns true if this char is a mark character according to Unicode.
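
As a small illustration, combining diacritics are mark characters, while the base letters they attach to are not:

```crystal
puts '\u0301'.mark? # true  (COMBINING ACUTE ACCENT)
puts 'a'.mark?      # false
puts '1'.mark?      # false
```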

[View source]

def number? : Bool #

Returns true if this char is a number according to Unicode.

'1'.number? # => true
'a'.number? # => false

[View source]

def ord : Int32 #

Returns the codepoint of this char.

The codepoint is the integer representation. The Universal Coded Character Set (UCS) standard, commonly known as Unicode, assigns names and meanings to numbers; these numbers are called codepoints.

For values below and including 127 this matches the ASCII codes and thus its byte representation.

'a'.ord      # => 97
'\0'.ord     # => 0
'\u007f'.ord # => 127
'☃'.ord      # => 9731

[View source]

def pred : Char #

Returns the predecessor codepoint before this one.

This can be used for iterating a range of characters (see Range#each).

'b'.pred # => 'a'
'ぃ'.pred # => 'あ'

This does not always return codepoint - 1. There is a gap in the range of Unicode scalars: The surrogate codepoints U+D800 through U+DFFF.

'\uE000'.pred # => '\uD7FF'

Raises OverflowError for Char::ZERO.

#succ returns the successor codepoint.

[View source]

def printable? #

Returns true if this char is a printable character.

There is no universal definition of printable characters in Unicode. For the purpose of this method, all characters with a visible glyph and the ASCII whitespace (' ') are considered printable.

This means characters which are #control? or #whitespace? (except for ' ') are non-printable.
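
A brief sketch of the rule above, restricted to cases that follow directly from it:

```crystal
puts 'a'.printable?  # true
puts ' '.printable?  # true  (the one whitespace treated as printable)
puts '\n'.printable? # false (control character)
puts '\t'.printable? # false (control character)
```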

[View source]

def step(*, to limit = nil, exclusive : Bool = false, &) #

Performs a #step in the direction of the limit. For instance:

'd'.step(to: 'a').to_a # => ['d', 'c', 'b', 'a']
'a'.step(to: 'd').to_a # => ['a', 'b', 'c', 'd']

[View source]

def step(*, to limit = nil, exclusive : Bool = false) #

Performs a #step in the direction of the limit. For instance:

'd'.step(to: 'a').to_a # => ['d', 'c', 'b', 'a']
'a'.step(to: 'd').to_a # => ['a', 'b', 'c', 'd']

[View source]

def succ : Char #

Returns the successor codepoint after this one.

This can be used for iterating a range of characters (see Range#each).

'a'.succ # => 'b'
'あ'.succ # => 'ぃ'

This does not always return codepoint + 1. There is a gap in the range of Unicode scalars: The surrogate codepoints U+D800 through U+DFFF.

'\uD7FF'.succ # => '\uE000'

Raises OverflowError for Char::MAX.

#pred returns the predecessor codepoint.

[View source]

def to_f : Float64 #

Returns the integer value of this char as a float if it's an ASCII char denoting a digit, raises otherwise.

'1'.to_f # => 1.0
'8'.to_f # => 8.0
'c'.to_f # raises ArgumentError

[View source]

def to_f32 : Float32 #

See also: #to_f.

[View source]

def to_i16?(base : Int = 10) #

See also: #to_i?.

[View source]

def to_i32(base : Int = 10) : Int32 #

Same as #to_i.
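
A minimal sketch of the raising variant, mirroring the #to_i? examples on this page. The message printed in the rescue branch is illustrative.

```crystal
puts '8'.to_i     # 8
puts 'f'.to_i(16) # 15
puts 'z'.to_i(36) # 35

# Unlike #to_i?, the raising variant does not return nil for non-digits.
begin
  'c'.to_i
rescue ArgumentError
  puts "'c' is not a base-10 digit"
end
```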

[View source]

def to_i32?(base : Int = 10) : Int32? #

Same as #to_i?.

[View source]

def to_i64(base : Int = 10) #

See also: #to_i.

[View source]

def to_i64?(base : Int = 10) #

See also: #to_i?.

[View source]

def to_i8(base : Int = 10) #

See also: #to_i.

[View source]

def to_i8?(base : Int = 10) #

See also: #to_i?.

[View source]

def to_i?(base : Int = 10) : Int32? #

Returns the integer value of this char if it's an ASCII char denoting a digit in base, nil otherwise.

'1'.to_i?     # => 1
'8'.to_i?     # => 8
'c'.to_i?     # => nil
'1'.to_i?(16) # => 1
'a'.to_i?(16) # => 10
'f'.to_i?(16) # => 15
'z'.to_i?(16) # => nil

[View source]

def to_s(io : IO) : Nil #

Appends this char to the given IO.

This appends this char's bytes as encoded by UTF-8 to the given IO.
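
One way to see this in action is with String.build, which yields an IO and returns what was written to it. The variable name `str` is illustrative.

```crystal
str = String.build do |io|
  'a'.to_s(io)
  'あ'.to_s(io)
end

puts str          # aあ
puts str.bytesize # 4 (1 byte for 'a', 3 bytes for 'あ')
```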

[View source]

def to_s : String #

Returns this char as a string containing this char as a single character.

'a'.to_s # => "a"
'あ'.to_s # => "あ"

[View source]

def to_u16(base : Int = 10) #

See also: #to_i.

[View source]

def to_u16?(base : Int = 10) #

See also: #to_i?.

[View source]

def to_u32(base : Int = 10) #

See also: #to_i.

[View source]

def to_u32?(base : Int = 10) #

See also: #to_i?.

[View source]

def to_u64(base : Int = 10) #

See also: #to_i.

[View source]

def to_u64?(base : Int = 10) #

See also: #to_i?.

[View source]

def to_u8(base : Int = 10) #

See also: #to_i.

[View source]

def to_u8?(base : Int = 10) #

See also: #to_i?.

[View source]

def unicode_escape(io : IO) : Nil #

Appends the Unicode escape sequence representing this character to the given io.

The codepoints are expressed as hexadecimal digits with uppercase letters. Unicode escapes always use the four digit style for codepoints U+FFFF and lower, adding leading zeros when necessary. Higher codepoints have their digits wrapped in curly braces and no leading zeros.

'a'.unicode_escape      # => "\\u0061"
'\t'.unicode_escape     # => "\\u0009"
'あ'.unicode_escape      # => "\\u3042"
'\u0012'.unicode_escape # => "\\u0012"
'😀'.unicode_escape      # => "\\u{1F600}"

[View source]

def unicode_escape : String #

Returns the Unicode escape sequence representing this character.

'a'.unicode_escape      # => "\\u0061"
'\t'.unicode_escape     # => "\\u0009"
'あ'.unicode_escape      # => "\\u3042"
'\u0012'.unicode_escape # => "\\u0012"
'😀'.unicode_escape      # => "\\u{1F600}"

[View source]

def upcase(options = Unicode::CaseOptions::None) : Char #

Returns the upcase equivalent of this char.

Note that this only works for characters whose upcase equivalent yields a single codepoint. There are a few characters, like 'ﬄ', that when upcased result in multiple characters (in this case: 'F', 'F', 'L').

For a more correct method see the method that receives a block.

'z'.upcase # => 'Z'
'X'.upcase # => 'X'
'.'.upcase # => '.'

[View source]

def upcase(options = Unicode::CaseOptions::None, &) #

Yields each char for the upcase equivalent of this char.

This method takes into account the possibility that an upcase version of a char might result in multiple chars, like for 'ﬄ', which results in 'F', 'F' and 'L'.

'z'.upcase { |v| puts v } # prints 'Z'
'ﬄ'.upcase { |v| puts v } # prints 'F', 'F', 'L'

[View source]

def uppercase? : Bool #

Returns true if this char is an uppercase letter.

'H'.uppercase? # => true
'Á'.uppercase? # => true
'c'.uppercase? # => false
'.'.uppercase? # => false

[View source]

def whitespace? : Bool #

Returns true if this char is a whitespace according to Unicode.

' '.whitespace?  # => true
'\t'.whitespace? # => true
'b'.whitespace?  # => false

[View source]
