Crystal detects Emoji symbols in String

The problem is to identify Unicode sequences that take up several characters and many bytes but render as a single symbol. Emoji are a typical example of such symbols.

subject = "🇺🇦 Ukraine"
puts subject.size
> 10
puts subject.bytesize
> 16
puts subject.chars
> ['🇺', '🇦', ' ', 'U', 'k', 'r', 'a', 'i', 'n', 'e']
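The size difference comes from the flag itself: it is built from two regional indicator symbols, U (U+1F1FA) and A (U+1F1E6), and each of them takes 4 bytes in UTF-8. A small sketch that prints the codepoints and their UTF-8 lengths:

```crystal
flag = "🇺🇦"

# The flag is two Chars (codepoints), each a regional indicator symbol
flag.each_char do |char|
  # Char#ord gives the codepoint, Char#bytesize its UTF-8 length
  puts "U+#{char.ord.to_s(16).upcase} (#{char.bytesize} bytes)"
end

puts flag.size     # number of Chars (codepoints)
puts flag.bytesize # number of UTF-8 bytes
```

This prints `U+1F1FA (4 bytes)`, `U+1F1E6 (4 bytes)`, then `2` and `8`, which accounts for the extra character and the extra bytes in the `size` and `bytesize` results above.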

To spare developers from building a big Regexp² to detect such characters, Crystal introduced String::Grapheme¹, which splits a string into grapheme clusters (user-perceived characters).

puts subject.grapheme_size
> 9
puts subject.graphemes
> [String::Grapheme("🇺🇦"), String::Grapheme(' '), String::Grapheme('U'), String::Grapheme('k'), String::Grapheme('r'), String::Grapheme('a'), String::Grapheme('i'), String::Grapheme('n'), String::Grapheme('e')]

The result matches the number of symbols actually rendered: the flag counts as a single grapheme.
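Flags are not the only case. As a sketch, assuming a Crystal version that ships String::Grapheme, the same comparison can be made for a letter followed by a combining accent and for an emoji with a skin tone modifier (these sample strings are illustrative, not from the original example):

```crystal
samples = {
  "flag"             => "🇺🇦",
  "combining accent" => "e\u0301",     # "e" followed by U+0301 COMBINING ACUTE ACCENT
  "skin tone emoji"  => "👍\u{1F3FD}", # thumbs up + medium skin tone modifier
}

samples.each do |label, text|
  # size counts Chars, bytesize counts UTF-8 bytes, grapheme_size counts rendered symbols
  puts "#{label}: size=#{text.size} bytesize=#{text.bytesize} grapheme_size=#{text.grapheme_size}"
end
```

Each sample has a `size` of 2 but a `grapheme_size` of 1, so `grapheme_size` is the value to use when counting what the user actually sees.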

Here is an example of how graphemes can be used. The original code, which works with chars:

result = ""
subject.each_char_with_index do |c, index|
  result += "<" if index == 2
  result += c
  result += ">" if index == 8
end
puts result
> 🇺🇦< Ukrain>e

It converts to something very similar that iterates over graphemes instead of chars:

index = 0
result = ""
subject.each_grapheme do |symbol|
  result += "<" if index == 2
  result += symbol.to_s
  result += ">" if index == 8
  index += 1
end
puts result
> 🇺🇦 <Ukraine>
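The manual `index` counter can be dropped by iterating over `graphemes` with `each_with_index`, and `String.build` avoids allocating a new string on every `+=`. A minimal sketch; `wrap_graphemes` and its parameters are hypothetical names chosen for illustration:

```crystal
# Hypothetical helper: wraps graphemes start_index..end_index (inclusive) in < >
def wrap_graphemes(text : String, start_index : Int32, end_index : Int32) : String
  String.build do |io|
    text.graphemes.each_with_index do |grapheme, index|
      io << '<' if index == start_index
      io << grapheme.to_s
      io << '>' if index == end_index
    end
  end
end

puts wrap_graphemes("🇺🇦 Ukraine", 2, 8)
```

With the same positions as above, this prints `🇺🇦 <Ukraine>`. Building into a single `String::Builder` keeps the work linear in the length of the text, which matters more for longer strings than for this short example.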

References

¹ https://crystal-lang.org/api/1.3.2/String/Grapheme.html