struct Char
Overview
A Char represents a Unicode code point.
It occupies 32 bits.
It is created by enclosing a UTF-8 character in single quotes.
'a'
'z'
'0'
'_'
'あ'
You can use a backslash to denote some characters:
'\'' # single quote
'\\' # backslash
'\e' # escape
'\f' # form feed
'\n' # newline
'\r' # carriage return
'\t' # tab
'\v' # vertical tab
You can use a backslash followed by a u and four hexadecimal characters to denote a Unicode codepoint:
'\u0041' # == 'A'
Or you can use curly braces and specify up to six hexadecimal digits:
'\u{41}' # == 'A'
See Char literals in the language reference.
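As a small illustration of the literal forms above, the following sketch compares the plain, four-digit, and curly-brace spellings of one codepoint. The variable names are made up for this example and are not part of the API.

```crystal
# Three spellings of the same codepoint, U+0041.
plain = 'A'
four_digits = '\u0041'
braces = '\u{41}'

plain == four_digits  # => true
four_digits == braces # => true
plain.ord             # => 65

# Codepoints above U+FFFF do not fit in four hex digits,
# so they need the curly-brace form.
grinning = '\u{1F600}'
grinning.ord      # => 128512
grinning.bytesize # => 4
```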
Included Modules
- Comparable(Char)
- Steppable
Defined in:
- char.cr
- char/reader.cr
- primitives.cr
Constant Summary
- MAX = 1114111.unsafe_chr
  The maximum character.
- MAX_CODEPOINT = 1114111
  The maximum valid codepoint for a character.
- REPLACEMENT = '�'
  The replacement character, used on invalid UTF-8 byte sequences.
- ZERO = '\0'
  The character representing the end of a C string.
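The constants can be inspected directly. This is only a quick sketch of how they relate to each other, using nothing beyond the constants listed above plus #ord and Int#to_s.

```crystal
Char::MAX.ord                        # => 1114111
Char::MAX.ord == Char::MAX_CODEPOINT # => true
Char::MAX_CODEPOINT.to_s(16)         # => "10ffff"
Char::REPLACEMENT.ord                # => 65533 (U+FFFD)
Char::ZERO.ord                       # => 0
```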
Instance Method Summary
- #!=(other : Char) : Bool
  Returns true if self's codepoint is not equal to other's codepoint.
- #+(str : String) : String
  Concatenates this char and string.
- #+(other : Int) : Char
  Returns a char that has this char's codepoint plus other.
- #-(other : Char) : Int32
  Returns the difference of the codepoint values of this char and other.
- #-(other : Int) : Char
  Returns a char that has this char's codepoint minus other.
- #<(other : Char) : Bool
  Returns true if self's codepoint is less than other's codepoint.
- #<=(other : Char) : Bool
  Returns true if self's codepoint is less than or equal to other's codepoint.
- #<=>(other : Char)
  The comparison operator.
- #==(other : Char) : Bool
  Returns true if self's codepoint is equal to other's codepoint.
- #===(byte : Int)
  Returns true if the codepoint is equal to byte, ignoring the type.
- #>(other : Char) : Bool
  Returns true if self's codepoint is greater than other's codepoint.
- #>=(other : Char) : Bool
  Returns true if self's codepoint is greater than or equal to other's codepoint.
- #alphanumeric? : Bool
  Returns true if this char is a letter or a number according to Unicode.
- #ascii? : Bool
  Returns true if this char is an ASCII character (codepoint is in 0..127).
- #ascii_alphanumeric? : Bool
  Returns true if this char is an ASCII letter or number ('0' to '9', 'a' to 'z', 'A' to 'Z').
- #ascii_control? : Bool
  Returns true if this char is an ASCII control character.
- #ascii_letter? : Bool
  Returns true if this char is an ASCII letter ('a' to 'z', 'A' to 'Z').
- #ascii_lowercase? : Bool
  Returns true if this char is a lowercase ASCII letter.
- #ascii_number?(base : Int = 10) : Bool
  Returns true if this char is an ASCII number in the specified base.
- #ascii_uppercase? : Bool
  Returns true if this char is an ASCII uppercase letter.
- #ascii_whitespace? : Bool
  Returns true if this char is an ASCII whitespace.
- #bytes : Array(UInt8)
- #bytesize : Int32
  Returns the number of UTF-8 bytes in this char.
- #clone
- #control? : Bool
  Returns true if this char is a control character according to Unicode.
- #downcase(options = Unicode::CaseOptions::None) : Char
  Returns the downcase equivalent of this char.
- #downcase(options = Unicode::CaseOptions::None, &)
  Yields each char for the downcase equivalent of this char.
- #dump(io)
  Returns a representation of self as an ASCII-compatible Crystal char literal, wrapped in single quotes.
- #dump : String
  Returns a representation of self as an ASCII-compatible Crystal char literal, wrapped in single quotes.
- #each_byte(&) : Nil
  Yields each of the bytes of this char as encoded by UTF-8.
- #hash(hasher)
- #hex? : Bool
  Returns true if this char is an ASCII hex digit ('0' to '9', 'a' to 'f', 'A' to 'F').
- #in_set?(*sets : String) : Bool
  Returns true if this char is matched by the given sets.
- #inspect(io : IO) : Nil
  Returns a representation of self as a Crystal char literal, wrapped in single quotes.
- #inspect : String
  Returns a representation of self as a Crystal char literal, wrapped in single quotes.
- #letter? : Bool
  Returns true if this char is a letter.
- #lowercase? : Bool
  Returns true if this char is a lowercase letter.
- #mark? : Bool
  Returns true if this char is a mark character according to Unicode.
- #number? : Bool
  Returns true if this char is a number according to Unicode.
- #ord : Int32
  Returns the codepoint of this char.
- #pred : Char
  Returns the predecessor codepoint before this one.
- #printable?
  Returns true if this char is a printable character.
- #step(*, to limit = nil, exclusive : Bool = false, &)
  Performs a #step in the direction of the limit.
- #step(*, to limit = nil, exclusive : Bool = false)
  Performs a #step in the direction of the limit.
- #succ : Char
  Returns the successor codepoint after this one.
- #to_f : Float64
  Returns the integer value of this char as a float if it's an ASCII char denoting a digit, raises otherwise.
- #to_f32 : Float32
  See also: #to_f.
- #to_f32? : Float32?
  See also: #to_f?.
- #to_f64 : Float64
  Same as #to_f.
- #to_f64? : Float64?
  Same as #to_f?.
- #to_f? : Float64?
  Returns the integer value of this char as a float if it's an ASCII char denoting a digit, nil otherwise.
- #to_i(base : Int = 10) : Int32
  Returns the integer value of this char if it's an ASCII char denoting a digit in base, raises otherwise.
- #to_i16(base : Int = 10)
  See also: #to_i.
- #to_i16?(base : Int = 10)
  See also: #to_i?.
- #to_i32(base : Int = 10) : Int32
  Same as #to_i.
- #to_i32?(base : Int = 10) : Int32?
  Same as #to_i?.
- #to_i64(base : Int = 10)
  See also: #to_i.
- #to_i64?(base : Int = 10)
  See also: #to_i?.
- #to_i8(base : Int = 10)
  See also: #to_i.
- #to_i8?(base : Int = 10)
  See also: #to_i?.
- #to_i?(base : Int = 10) : Int32?
  Returns the integer value of this char if it's an ASCII char denoting a digit in base, nil otherwise.
- #to_s(io : IO) : Nil
  Appends this char to the given IO.
- #to_s : String
  Returns this char as a string containing this char as a single character.
- #to_u16(base : Int = 10)
  See also: #to_i.
- #to_u16?(base : Int = 10)
  See also: #to_i?.
- #to_u32(base : Int = 10)
  See also: #to_i.
- #to_u32?(base : Int = 10)
  See also: #to_i?.
- #to_u64(base : Int = 10)
  See also: #to_i.
- #to_u64?(base : Int = 10)
  See also: #to_i?.
- #to_u8(base : Int = 10)
  See also: #to_i.
- #to_u8?(base : Int = 10)
  See also: #to_i?.
- #unicode_escape(io : IO) : Nil
  Returns the Unicode escape sequence representing this character.
- #unicode_escape : String
  Returns the Unicode escape sequence representing this character.
- #upcase(options = Unicode::CaseOptions::None) : Char
  Returns the upcase equivalent of this char.
- #upcase(options = Unicode::CaseOptions::None, &)
  Yields each char for the upcase equivalent of this char.
- #uppercase? : Bool
  Returns true if this char is an uppercase letter.
- #whitespace? : Bool
  Returns true if this char is a whitespace according to Unicode.
Instance methods inherited from module Steppable
- step(*, to limit = nil, by step, exclusive : Bool = false, &) : Nil
- step(*, to limit = nil, by step, exclusive : Bool = false)
Instance methods inherited from module Comparable(Char)
- <(other : T) : Bool
- <=(other : T)
- <=>(other : T)
- ==(other : T)
- >(other : T) : Bool
- >=(other : T)
- clamp(min, max)
- clamp(range : Range)
Instance methods inherited from struct Value
- ==(other : JSON::Any)
- ==(other : YAML::Any)
- ==(other)
- dup
Instance methods inherited from class Object
- ! : Bool
- !=(other)
- !~(other)
- ==(other)
- ===(other : JSON::Any)
- ===(other : YAML::Any)
- ===(other)
- =~(other)
- as(type : Class)
- as?(type : Class)
- class
- dup
- hash(hasher)
- hash
- in?(collection : Object) : Bool
- in?(*values : Object) : Bool
- inspect(io : IO) : Nil
- inspect : String
- is_a?(type : Class) : Bool
- itself
- nil? : Bool
- not_nil!
- pretty_inspect(width = 79, newline = "\n", indent = 0) : String
- pretty_print(pp : PrettyPrint) : Nil
- responds_to?(name : Symbol) : Bool
- tap(&)
- to_json(io : IO) : Nil
- to_json : String
- to_pretty_json(indent : String = " ") : String
- to_pretty_json(io : IO, indent : String = " ") : Nil
- to_s(io : IO) : Nil
- to_s : String
- to_yaml(io : IO) : Nil
- to_yaml : String
- try(&)
- unsafe_as(type : T.class) forall T
Class methods inherited from class Object
- from_json(string_or_io, root : String)
- from_json(string_or_io)
- from_yaml(string_or_io : String | IO)
Instance Method Detail
#!=(other : Char) : Bool
Returns true if self's codepoint is not equal to other's codepoint.
#+(str : String) : String
Concatenates this char and string.
'f' + "oo" # => "foo"
#+(other : Int) : Char
Returns a char that has this char's codepoint plus other.
'a' + 1 # => 'b'
'a' + 2 # => 'c'
#-(other : Char) : Int32
Returns the difference of the codepoint values of this char and other.
'a' - 'a' # => 0
'b' - 'a' # => 1
'c' - 'a' # => 2
#-(other : Int) : Char
Returns a char that has this char's codepoint minus other.
'c' - 1 # => 'b'
'c' - 2 # => 'a'
#<(other : Char) : Bool
Returns true if self's codepoint is less than other's codepoint.
#<=(other : Char) : Bool
Returns true if self's codepoint is less than or equal to other's codepoint.
#<=>(other : Char)
The comparison operator.
Returns the difference of the codepoint values of self and other.
The result is negative, 0, or positive depending on whether self's codepoint is
less than, equal to, or greater than other's codepoint.
'a' <=> 'c' # => -2
'z' <=> 'z' # => 0
'c' <=> 'a' # => 2
#==(other : Char) : Bool
Returns true if self's codepoint is equal to other's codepoint.
#===(byte : Int)
Returns true if the codepoint is equal to byte, ignoring the type.
'c'.ord       # => 99
'c' === 99_u8 # => true
'c' === 99    # => true
'z' === 99    # => false
#>(other : Char) : Bool
Returns true if self's codepoint is greater than other's codepoint.
#>=(other : Char) : Bool
Returns true if self's codepoint is greater than or equal to other's codepoint.
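The comparison operators above carry no examples of their own, so here is a brief sketch of how they behave. It assumes only what is described above: ordering is by codepoint, and sort and clamp build on <=> through Comparable.

```crystal
'a' < 'b'  # => true
'b' <= 'b' # => true
'a' != 'b' # => true
'z' > 'A'  # => true (122 > 65)

# Sorting relies on <=>, so chars are ordered by codepoint:
# uppercase ASCII letters sort before lowercase ones.
['c', 'A', 'b'].sort # => ['A', 'b', 'c']

# clamp is inherited from Comparable(Char).
'z'.clamp('a', 'f') # => 'f'
```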
#alphanumeric? : Bool
Returns true if this char is a letter or a number according to Unicode.
'c'.alphanumeric? # => true
'8'.alphanumeric? # => true
'.'.alphanumeric? # => false
#ascii? : Bool
Returns true if this char is an ASCII character (codepoint is in 0..127).
#ascii_alphanumeric? : Bool
Returns true if this char is an ASCII letter or number ('0' to '9', 'a' to 'z', 'A' to 'Z').
'c'.ascii_alphanumeric? # => true
'8'.ascii_alphanumeric? # => true
'.'.ascii_alphanumeric? # => false
#ascii_control? : Bool
Returns true if this char is an ASCII control character.
This includes the C0 control codes (U+0000 through U+001F) and the
Delete character (U+007F).
('\u0000'..'\u001F').each do |char|
  char.ascii_control? # => true
end
'\u007F'.ascii_control? # => true
#ascii_letter? : Bool
Returns true if this char is an ASCII letter ('a' to 'z', 'A' to 'Z').
'c'.ascii_letter? # => true
'á'.ascii_letter? # => false
'8'.ascii_letter? # => false
#ascii_lowercase? : Bool
Returns true if this char is a lowercase ASCII letter.
'c'.ascii_lowercase? # => true
'ç'.ascii_lowercase? # => false
'G'.ascii_lowercase? # => false
'.'.ascii_lowercase? # => false
#ascii_number?(base : Int = 10) : Bool
Returns true if this char is an ASCII number in the specified base.
Base can be from 0 to 36 with digits from '0' to '9' and 'a' to 'z' or 'A' to 'Z'.
'4'.ascii_number?     # => true
'z'.ascii_number?     # => false
'z'.ascii_number?(36) # => true
#ascii_uppercase? : Bool
Returns true if this char is an ASCII uppercase letter.
'H'.ascii_uppercase? # => true
'Á'.ascii_uppercase? # => false
'c'.ascii_uppercase? # => false
'.'.ascii_uppercase? # => false
#ascii_whitespace? : Bool
Returns true if this char is an ASCII whitespace.
' '.ascii_whitespace?  # => true
'\t'.ascii_whitespace? # => true
'b'.ascii_whitespace?  # => false
#bytes : Array(UInt8)
Returns this char's bytes as encoded by UTF-8, as an Array(UInt8).
'a'.bytes # => [97]
'あ'.bytes # => [227, 129, 130]
#bytesize : Int32
Returns the number of UTF-8 bytes in this char.
'a'.bytesize # => 1
'好'.bytesize # => 3
#control? : Bool
Returns true if this char is a control character according to Unicode.
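A short sketch of the Unicode control check. It assumes "control character" means the Unicode general category Cc, which covers the C0 codes, Delete, and the C1 codes (U+0080 through U+009F).

```crystal
'\n'.control?     # => true  (U+000A, a C0 control code)
'\u007F'.control? # => true  (Delete)
'\u0085'.control? # => true  (a C1 control code, outside ASCII)
'a'.control?      # => false
' '.control?      # => false
```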
#downcase(options = Unicode::CaseOptions::None) : Char
Returns the downcase equivalent of this char.
Note that this only works for characters whose downcase equivalent yields a single codepoint. There are a few characters, like 'İ', that when downcased result in multiple characters (in this case: 'i' and the dot mark).
For a more correct method see the method that receives a block.
'Z'.downcase # => 'z'
'x'.downcase # => 'x'
'.'.downcase # => '.'
#downcase(options = Unicode::CaseOptions::None, &)
Yields each char for the downcase equivalent of this char.
This method takes into account the possibility that a downcase version of a char might result in multiple chars, like for 'İ', which results in 'i' and a dot mark.
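The block form can be sketched by collecting the yielded chars into an array. The array names here are illustrative, and the result for 'İ' follows the description above ('i' followed by the combining dot above, U+0307).

```crystal
# 'İ' (U+0130) downcases to two codepoints.
lowered = [] of Char
'\u0130'.downcase { |char| lowered << char }
lowered      # => ['i', '\u0307']
lowered.size # => 2

# Most chars yield exactly one char.
single = [] of Char
'Z'.downcase { |char| single << char }
single # => ['z']
```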
#dump(io)
#dump : String
Returns a representation of self as an ASCII-compatible Crystal char literal,
wrapped in single quotes.
Non-printable characters (see #printable?) and non-ASCII characters
(codepoints larger than U+007F) are escaped.
'a'.dump      # => "'a'"
'\t'.dump     # => "'\\t'"
'あ'.dump      # => "'\\u3042'"
'\u0012'.dump # => "'\\u0012'"
'😀'.dump      # => "'\\u{1F600}'"
See #unicode_escape for the format used to escape characters without a
special escape sequence.
- #inspect only escapes non-printable characters.
#each_byte(&) : Nil
Yields each of the bytes of this char as encoded by UTF-8.
puts "'a'"
'a'.each_byte do |byte|
  puts byte
end
puts
puts "'あ'"
'あ'.each_byte do |byte|
  puts byte
end
Output:
'a'
97

'あ'
227
129
130
#hex? : Bool
Returns true if this char is an ASCII hex digit ('0' to '9', 'a' to 'f', 'A' to 'F').
'5'.hex? # => true
'a'.hex? # => true
'F'.hex? # => true
'g'.hex? # => false
#in_set?(*sets : String) : Bool
Returns true if this char is matched by the given sets.
Each parameter defines a set; the character is matched against the intersection of those sets, so it must match all of them.
If a set starts with a ^, it is negated. The sequence c1-c2 means all characters between and including c1 and c2 and is known as a range.
The backslash character \ can be used to escape ^ or - and is otherwise ignored unless it appears at the end of a range or set.
'l'.in_set? "lo"          # => true
'l'.in_set? "lo", "o"     # => false
'l'.in_set? "hello", "^l" # => false
'l'.in_set? "j-m"         # => true
'^'.in_set? "\\^aeiou" # => true
'-'.in_set? "a\\-eo"   # => true
'\\'.in_set? "\\"    # => true
'\\'.in_set? "\\A"   # => false
'\\'.in_set? "X-\\w" # => true
#inspect(io : IO) : Nil
#inspect : String
Returns a representation of self as a Crystal char literal, wrapped in single
quotes.
Non-printable characters (see #printable?) are escaped.
'a'.inspect      # => "'a'"
'\t'.inspect     # => "'\\t'"
'あ'.inspect      # => "'あ'"
'\u0012'.inspect # => "'\\u0012'"
'😀'.inspect      # => "'😀'"
See #unicode_escape for the format used to escape characters without a
special escape sequence.
- #dump additionally escapes all non-ASCII characters.
#letter? : Bool
Returns true if this char is a letter.
All codepoints in the Unicode General Category L (Letter) are considered
a letter.
'c'.letter? # => true
'á'.letter? # => true
'8'.letter? # => false
#lowercase? : Bool
Returns true if this char is a lowercase letter.
'c'.lowercase? # => true
'ç'.lowercase? # => true
'G'.lowercase? # => false
'.'.lowercase? # => false
#mark? : Bool
Returns true if this char is a mark character according to Unicode.
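A minimal sketch of the mark check, assuming "mark" refers to the Unicode general category M, which includes combining accents.

```crystal
'\u0301'.mark? # => true  (combining acute accent)
'\u0307'.mark? # => true  (combining dot above)
'a'.mark?      # => false
'1'.mark?      # => false
```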
#number? : Bool
Returns true if this char is a number according to Unicode.
'1'.number? # => true
'a'.number? # => false
#ord : Int32
Returns the codepoint of this char.
The codepoint is the integer representation. The Universal Coded Character Set (UCS) standard, commonly known as Unicode, assigns names and meanings to numbers; these numbers are called codepoints.
For values below and including 127 this matches the ASCII codes and thus its byte representation.
'a'.ord      # => 97
'\0'.ord     # => 0
'\u007f'.ord # => 127
'☃'.ord      # => 9731
#pred : Char
Returns the predecessor codepoint before this one.
This can be used for iterating a range of characters (see Range#each).
'b'.pred # => 'a'
'ぃ'.pred # => 'あ'
This does not always return codepoint - 1. There is a gap in the
range of Unicode scalars: the surrogate codepoints U+D800 through U+DFFF.
'\uE000'.pred # => '\uD7FF'
Raises OverflowError for Char::ZERO.
- #succ returns the successor codepoint.
#printable?
Returns true if this char is a printable character.
There is no universal definition of printable characters in Unicode.
For the purpose of this method, all characters with a visible glyph and the
ASCII whitespace (' ') are considered printable.
This means characters which are #control? or #whitespace? (except for ' ')
are non-printable.
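The rule above can be sketched with a few cases: visible glyphs and the plain space are printable, while control characters are not.

```crystal
'a'.printable?  # => true
'あ'.printable?  # => true
' '.printable?  # => true  (the one whitespace exception)
'\n'.printable? # => false (control character)
'\t'.printable? # => false (control character)
```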
#step(*, to limit = nil, exclusive : Bool = false, &)
#step(*, to limit = nil, exclusive : Bool = false)
Performs a #step in the direction of the limit. For instance:
'd'.step(to: 'a').to_a # => ['d', 'c', 'b', 'a']
'a'.step(to: 'd').to_a # => ['a', 'b', 'c', 'd']
#succ : Char
Returns the successor codepoint after this one.
This can be used for iterating a range of characters (see Range#each).
'a'.succ # => 'b'
'あ'.succ # => 'ぃ'
This does not always return codepoint + 1. There is a gap in the
range of Unicode scalars: the surrogate codepoints U+D800 through U+DFFF.
'\uD7FF'.succ # => '\uE000'
Raises OverflowError for Char::MAX.
- #pred returns the predecessor codepoint.
#to_f : Float64
Returns the integer value of this char as a float if it's an ASCII char denoting a digit, raises otherwise.
'1'.to_f # => 1.0
'8'.to_f # => 8.0
'c'.to_f # raises ArgumentError
#to_f? : Float64?
Returns the integer value of this char as a float if it's an ASCII char denoting a digit,
nil otherwise.
'1'.to_f? # => 1.0
'8'.to_f? # => 8.0
'c'.to_f? # => nil
#to_i(base : Int = 10) : Int32
Returns the integer value of this char if it's an ASCII char denoting a digit in base, raises otherwise.
'1'.to_i     # => 1
'8'.to_i     # => 8
'c'.to_i     # raises ArgumentError
'1'.to_i(16) # => 1
'a'.to_i(16) # => 10
'f'.to_i(16) # => 15
'z'.to_i(16) # raises ArgumentError
#to_i?(base : Int = 10) : Int32?
Returns the integer value of this char if it's an ASCII char denoting a digit
in base, nil otherwise.
'1'.to_i?     # => 1
'8'.to_i?     # => 8
'c'.to_i?     # => nil
'1'.to_i?(16) # => 1
'a'.to_i?(16) # => 10
'f'.to_i?(16) # => 15
'z'.to_i?(16) # => nil
#to_s(io : IO) : Nil
Appends this char to the given IO.
This appends this char's bytes as encoded by UTF-8 to the given IO.
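As a sketch of the IO overload, String.build provides an IO to append to. The variable name result is illustrative only.

```crystal
result = String.build do |io|
  'a'.to_s(io)  # appends 1 byte
  'あ'.to_s(io) # appends 3 bytes (UTF-8)
end

result          # => "aあ"
result.bytesize # => 4
```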
#to_s : String
Returns this char as a string containing this char as a single character.
'a'.to_s # => "a"
'あ'.to_s # => "あ"
#unicode_escape(io : IO) : Nil
#unicode_escape : String
Returns the Unicode escape sequence representing this character.
The codepoints are expressed as hexadecimal digits with uppercase letters.
Unicode escapes always use the four digit style for codepoints U+FFFF
and lower, adding leading zeros when necessary. Higher codepoints have their
digits wrapped in curly braces and no leading zeros.
'a'.unicode_escape      # => "\\u0061"
'\t'.unicode_escape     # => "\\u0009"
'あ'.unicode_escape      # => "\\u3042"
'\u0012'.unicode_escape # => "\\u0012"
'😀'.unicode_escape      # => "\\u{1F600}"
#upcase(options = Unicode::CaseOptions::None) : Char
Returns the upcase equivalent of this char.
Note that this only works for characters whose upcase equivalent yields a single codepoint. There are a few characters, like 'ffl', that when upcased result in multiple characters (in this case: 'F', 'F', 'L').
For a more correct method see the method that receives a block.
'z'.upcase # => 'Z'
'X'.upcase # => 'X'
'.'.upcase # => '.'
#upcase(options = Unicode::CaseOptions::None, &)
Yields each char for the upcase equivalent of this char.
This method takes into account the possibility that an upcase version of a char might result in multiple chars, like for 'ffl', which results in 'F', 'F' and 'L'.
'z'.upcase { |v| puts v } # prints 'Z'
'ffl'.upcase { |v| puts v } # prints 'F', 'F', 'L'
#uppercase? : Bool
Returns true if this char is an uppercase letter.
'H'.uppercase? # => true
'Á'.uppercase? # => true
'c'.uppercase? # => false
'.'.uppercase? # => false
#whitespace? : Bool
Returns true if this char is a whitespace according to Unicode.
' '.whitespace?  # => true
'\t'.whitespace? # => true
'b'.whitespace?  # => false