Update support for Unicode 15.1

Merged Brett Walker requested to merge 28-update-support-for-unicode-16-0 into main
7 unresolved threads

Upgrade to start using Unicode 15.1 emojis. Note that while 16.0 was just released, noto-emoji doesn't yet support it. Various things start breaking without the proper fallback images, and it's better to wait until 16 is a little more prevalent.

  • load the Unicode emoji-test.txt file as the main basis for emoji data
  • load the gemojione data, adding as aliases
  • load any additional aliases
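
The load order above can be sketched roughly as follows. This is a minimal illustration, not the gem's code: `SketchIndex`, `add_primary` and `add_alias` are hypothetical names, and the rule that an alias never overrides an existing shortcode is an assumption that matches the `:cow:` behaviour discussed in this merge request.

```ruby
# Hypothetical, simplified index. Names and structure are illustrative only.
class SketchIndex
  Emoji = Struct.new(:codepoints, :alpha_code, :aliases)

  def initialize
    @by_codepoints = {}
    @by_alpha_code = {}
  end

  # Step 1: primary data (stands in for entries parsed from emoji-test.txt).
  def add_primary(codepoints, alpha_code)
    emoji = Emoji.new(codepoints, alpha_code, [])
    @by_codepoints[codepoints] = emoji
    @by_alpha_code[alpha_code] = emoji
  end

  # Steps 2 and 3: attach an alias to an emoji that is already indexed.
  # Assumption: unknown emojis are skipped and taken shortcodes are not replaced.
  def add_alias(codepoints, alpha_code)
    emoji = @by_codepoints[codepoints]
    return false if emoji.nil? || @by_alpha_code.key?(alpha_code)

    emoji.aliases << alpha_code
    @by_alpha_code[alpha_code] = emoji
    true
  end

  def find_by_alpha_code(alpha_code)
    @by_alpha_code[alpha_code]
  end
end

index = SketchIndex.new
index.add_primary('🐄', ':cow:')
index.add_primary('🐮', ':cow_face:')
index.add_alias('🐄', ':cow2:') # old gemojione name for 🐄, kept as an alias
index.add_alias('🐮', ':cow:')  # rejected here: :cow: already belongs to 🐄
```

Loading the primary data first is what lets the Unicode-derived name win when an older shortcode collides with it.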

Closes #28 (closed)

Edited by Brett Walker

Activity

  • Brett Walker added 3 commits

  • mentioned in issue #4 (closed)

  • Gemojione shortcodes are added as aliases to the Unicode emojis. This provides backward compatibility. However, there are some minor differences between the shortcodes of gemojione and the Unicode emojis.

    For example, :cow: in gemojione is 🐮, while in Unicode it is 🐄, which is :cow2: in gemojione. Now :cow_face: will give 🐮.

    I think this difference is acceptable. It's worth a minor change in emoji to better align with the Unicode CLDR names. And the emojis are not vastly different.

    Differences generated using a temporary spec:

        # Temporary spec: prints each gemojione shortname that now resolves to a different emoji.
        it 'check alpha_codes', :aggregate_failures do
          db = File.open(TanukiEmoji::Db::Gemojione::DATA_FILE, 'r:UTF-8') do |file|
            JSON.parse(file.read, symbolize_names: true)
          end

          db.each do |_emoji_name, emoji_data|
            emoji = TanukiEmoji.find_by_alpha_code(emoji_data[:shortname])
            next if emoji.nil? # guard: skip shortnames that are not in the index

            # Report only when neither the main codepoints nor an alternate matches.
            if emoji.codepoints != emoji_data[:moji] && !emoji.codepoints_alternates.include?(emoji_data[:moji])
              puts "#{emoji.codepoints} (#{emoji.alpha_code}) VS #{emoji_data[:moji]} (#{emoji_data[:shortname]})"
              puts emoji.inspect
              puts emoji_data
              puts
            end
            # expect(emoji.codepoints).to eq emoji_data[:moji]
          end
        end

    differences

    📅 (:calendar:) VS 📆 (:calendar:)
    #<TanukiEmoji::Character: 📅 (1f4c5) :calendar: aliases: [":date:"]>
    {:unicode=>"1F4C6", :unicode_alternates=>[], :name=>"tear-off calendar", :shortname=>":calendar:", :category=>"objects", :aliases=>[], :aliases_ascii=>[], :keywords=>["schedule", "object", "office"], :moji=>"📆"}
    
    🐪 (:camel:) VS 🐫 (:camel:)
    #<TanukiEmoji::Character: 🐪 (1f42a) :camel: aliases: [":dromedary_camel:"]>
    {:unicode=>"1F42B", :unicode_alternates=>[], :name=>"bactrian camel", :shortname=>":camel:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["animal", "hot", "nature", "bactrian", "camel", "hump", "desert", "central asia", "heat", "water", "hump day", "wednesday", "sex", "wildlife"], :moji=>"🐫"}
    
    🐈 (:cat:) VS 🐱 (:cat:)
    #<TanukiEmoji::Character: 🐈 (1f408) :cat: aliases: [":cat2:"]>
    {:unicode=>"1F431", :unicode_alternates=>[], :name=>"cat face", :shortname=>":cat:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["animal", "meow", "halloween", "vagina", "cat"], :moji=>"🐱"}
    
    🐄 (:cow:) VS 🐮 (:cow:)
    #<TanukiEmoji::Character: 🐄 (1f404) :cow: aliases: [":cow2:"]>
    {:unicode=>"1F42E", :unicode_alternates=>[], :name=>"cow face", :shortname=>":cow:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["animal", "beef", "ox"], :moji=>"🐮"}
    
    🐕 (:dog:) VS 🐶 (:dog:)
    #<TanukiEmoji::Character: 🐕 (1f415) :dog: aliases: [":dog2:"]>
    {:unicode=>"1F436", :unicode_alternates=>[], :name=>"dog face", :shortname=>":dog:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["animal", "friend", "nature", "woof", "dog", "pug"], :moji=>"🐶"}
    
    🐎 (:horse:) VS 🐴 (:horse:)
    #<TanukiEmoji::Character: 🐎 (1f40e) :horse: aliases: [":racehorse:"]>
    {:unicode=>"1F434", :unicode_alternates=>[], :name=>"horse face", :shortname=>":horse:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["animal", "brown", "wildlife"], :moji=>"🐴"}
    
    🐁 (:mouse:) VS 🐭 (:mouse:)
    #<TanukiEmoji::Character: 🐁 (1f401) :mouse: aliases: [":mouse2:"]>
    {:unicode=>"1F42D", :unicode_alternates=>[], :name=>"mouse face", :shortname=>":mouse:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["animal", "nature"], :moji=>"🐭"}
    
    ✏️ (:pencil:) VS 📝 (:pencil:)
    #<TanukiEmoji::Character: ✏️ (270f-fe0f) :pencil: aliases: [":pencil2:"]>
    {:unicode=>"1F4DD", :unicode_alternates=>[], :name=>"memo", :shortname=>":pencil:", :category=>"objects", :aliases=>[":memo:"], :aliases_ascii=>[], :keywords=>["documents", "paper", "station", "write", "work", "office"], :moji=>"📝"}
    
    🐖 (:pig:) VS 🐷 (:pig:)
    #<TanukiEmoji::Character: 🐖 (1f416) :pig: aliases: [":pig2:"]>
    {:unicode=>"1F437", :unicode_alternates=>[], :name=>"pig face", :shortname=>":pig:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["animal", "oink"], :moji=>"🐷"}
    
    🐇 (:rabbit:) VS 🐰 (:rabbit:)
    #<TanukiEmoji::Character: 🐇 (1f407) :rabbit: aliases: [":rabbit2:"]>
    {:unicode=>"1F430", :unicode_alternates=>[], :name=>"rabbit face", :shortname=>":rabbit:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["animal", "nature", "wildlife"], :moji=>"🐰"}
    
    🛰️ (:satellite:) VS 📡 (:satellite:)
    #<TanukiEmoji::Character: 🛰️ (1f6f0-fe0f) :satellite: aliases: [":satellite_orbital:"]>
    {:unicode=>"1F4E1", :unicode_alternates=>[], :name=>"satellite antenna", :shortname=>":satellite:", :category=>"objects", :aliases=>[], :aliases_ascii=>[], :keywords=>["communication", "object"], :moji=>"📡"}
    
    ☃️ (:snowman:) VS ⛄ (:snowman:)
    #<TanukiEmoji::Character: ☃️ (2603-fe0f) :snowman: aliases: [":snowman2:"]>
    {:unicode=>"26C4", :unicode_alternates=>["26C4-FE0F"], :name=>"snowman without snow", :shortname=>":snowman:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["christmas", "cold", "season", "weather", "winter", "xmas", "holidays", "snow"], :moji=>"⛄"}
    
    🐅 (:tiger:) VS 🐯 (:tiger:)
    #<TanukiEmoji::Character: 🐅 (1f405) :tiger: aliases: [":tiger2:"]>
    {:unicode=>"1F42F", :unicode_alternates=>[], :name=>"tiger face", :shortname=>":tiger:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["animal", "wildlife", "roar", "cat"], :moji=>"🐯"}
    
    🚆 (:train:) VS 🚋 (:train:)
    #<TanukiEmoji::Character: 🚆 (1f686) :train: aliases: [":train2:"]>
    {:unicode=>"1F68B", :unicode_alternates=>[], :name=>"Tram Car", :shortname=>":train:", :category=>"travel", :aliases=>[], :aliases_ascii=>[], :keywords=>["tram", "rail", "transportation", "travel", "train"], :moji=>"🚋"}
    
    ☂️ (:umbrella:) VS ☔ (:umbrella:)
    #<TanukiEmoji::Character: ☂️ (2602-fe0f) :umbrella: aliases: [":umbrella2:"]>
    {:unicode=>"2614", :unicode_alternates=>["2614-FE0F"], :name=>"umbrella with rain drops", :shortname=>":umbrella:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["rain", "weather", "sky", "cold"], :moji=>"☔"}
    
    🐋 (:whale:) VS 🐳 (:whale:)
    #<TanukiEmoji::Character: 🐋 (1f40b) :whale: aliases: [":whale2:"]>
    {:unicode=>"1F433", :unicode_alternates=>[], :name=>"spouting whale", :shortname=>":whale:", :category=>"nature", :aliases=>[], :aliases_ascii=>[], :keywords=>["animal", "nature", "ocean", "sea", "wildlife", "tropical", "whales"], :moji=>"🐳"}
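
    The comparison behind this list can be reproduced without the gem. The sketch below is only an illustration: `changed_shortcodes` is a hypothetical helper, and the two hashes hold a handful of entries taken from the output above rather than the real data files.

```ruby
# Illustrative subset of the data shown above, keyed by shortcode.
gemojione = { ':cow:' => '🐮', ':cow2:' => '🐄', ':dog:' => '🐶', ':dog2:' => '🐕' }
unicode_index = {
  ':cow:' => '🐄', ':cow2:' => '🐄', ':cow_face:' => '🐮',
  ':dog:' => '🐕', ':dog2:' => '🐕', ':dog_face:' => '🐶'
}

# Returns the shortcodes that exist in both maps but point to different emojis.
def changed_shortcodes(old_map, new_map)
  old_map.each_with_object({}) do |(code, old_emoji), changes|
    new_emoji = new_map[code]
    next if new_emoji.nil? || new_emoji == old_emoji

    changes[code] = { was: old_emoji, now: new_emoji }
  end
end

changed_shortcodes(gemojione, unicode_index).each do |code, change|
  puts "#{change[:now]} (#{code}) VS #{change[:was]} (#{code})"
end
```

    Shortcodes such as :cow2: that still resolve to the same emoji are not reported, which is why only the bare names appear in the list.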
  • Brett Walker added 6 commits

  • 4 Warnings
    :warning: This merge request is definitely too big (59224 lines changed), please split it into multiple merge requests.
    :warning: cc86c389: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines.
    :warning: 1e89c66d: The commit subject must contain at least 3 words. For more information, take a look at our Commit message guidelines.
    :warning: a24a59a6: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines.

    Reviewer roulette

    Changes that require review have been detected! A merge request is normally reviewed by both a reviewer and a maintainer in its primary category and by a maintainer in all other categories.

    To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.

    To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.

    Once you've decided who will review this merge request, mention them as you normally would! Danger does not automatically notify them for you.

    Reviewer: No reviewer available
    Maintainer: @kerrizor (UTC-6, 1 hour behind @digitalmoksha)

    If needed, you can retry the :repeat: danger-review job that generated this comment.

    Generated by :no_entry_sign: Danger

    Edited by ****
  • Brett Walker added 2 commits

  • 🤖 GitLab Bot 🤖 resolved all threads

  • Brett Walker changed title from Draft: Resolve "Update support for Unicode 16.0" to Draft: Resolve "Update support for Unicode 15.1"

  • Brett Walker changed the description

  • Brett Walker added 4 commits

  • Brett Walker added 2 commits

    • 7d24f923 - Update to noto-emoji v2.042 for Unicode 15.1
    • 5ca08e87 - Work in progress

  • Brett Walker added 1 commit

    • a4f9c2d5 - Use Unicode emoji-test.tx as primary data

  • Brett Walker requested review from @brodock

  • Brett Walker added 2 commits

    • 1a7d5821 - Sort by all indexd emojis rather than main emoji
    • 68dc730b - Don’t add missing gemojione emojis

  • The current version of tanuki_emoji uses gemojione's 3.3.0 index. By adding its shortcodes as aliases, we maintain backward compatibility. gemojione is no longer maintained, and the data it's based on, emojione, has moved to emoji-toolkit, which has different licenses.

    Their 8.0 license does say that non-artwork is under the MIT license:

    JoyPixels Non-Artwork

    Applies to the Javascript, JSON, PHP, CSS, HTML files, and everything else not covered under the artwork license above, found in both the emoji-toolkit and emoji-assets repos. License: MIT Complete Legal Terms: https://opensource.org/license/mit/

    However, it's better at this time not to tie ourselves to that data. So we will not be using gemojione's final index data.

  • Brett Walker added 6 commits

    • 25cd0954 - Add support for Unicdoe 15.1 emojis
    • 92c27d86 - Update to noto-emoji v2.042 for Unicode 15.1
    • 464fb516 - Use Unicode emoji-test.tx as primary data
    • 50dc695f - Sort by all indexd emojis rather than main emoji
    • b3a7995c - Don’t add missing gemojione emojis
    • 2551577b - Update README

  • Brett Walker added 1 commit

    • bd59f812 - Add changed aliases from 13.1 and 14.0

  • Brett Walker added 3 commits

  • Brett Walker marked this merge request as ready

  • Brett Walker changed title from Draft: Resolve "Update support for Unicode 15.1" to Update support for Unicode 15.1

  • Brett Walker changed milestone to %17.5

  • added emoji label

  • Brett Walker changed the description

  • Brett Walker added 10 commits

    • d8233551 - Add support for Unicdoe 15.1 emojis
    • bf5b4f85 - Update to noto-emoji v2.042 for Unicode 15.1
    • 1882e952 - Use Unicode emoji-test.tx as primary data
    • 91131998 - Sort by all indexd emojis rather than main emoji
    • 7cacd43a - Don’t add missing gemojione emojis
    • 4b76b356 - Update README
    • 83284eae - Add changed aliases from 13.1 and 14.0
    • 2e26e46a - Remove unicode versioning dataset
    • 91df8b55 - Add gemojione spec file
    • f51227e2 - Add additional specs

  • Brett Walker added 7 commits

    • a24a59a6 - Use Unicode emoji-test.tx as primary data
    • 368aefdd - Sort by all indexd emojis rather than main emoji
    • 7ee9b441 - Don’t add missing gemojione emojis
    • 1e89c66d - Update README
    • ee6e4b1e - Remove unicode versioning dataset
    • 76c6f62a - Add gemojione spec file
    • 6d02f499 - Add additional specs

  • Brett Walker mentioned in merge request !30 (closed)

  • Brett Walker mentioned in merge request !68 (closed)

4 4
5 5 module TanukiEmoji
6 6 module Db
7 # Emoji Unicode Version database
  • # frozen_string_literal: true

    require 'strscan'
    require 'date'
    require 'i18n'
    require_relative 'emoji_data'

    module TanukiEmoji
      module Db
        # Reads and extract content from emoji-test.txt
        class EmojiTestParser
    • This is the parser that handles the emoji-test.txt file, which we're now using as the base Unicode single source of truth.

      This uses the same parsing style as emoji_data_parser.rb
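
      As a rough illustration of what such a parser has to extract, the sketch below splits one data line of emoji-test.txt into its fields. It is not the code from this merge request: `parse_emoji_test_line` and `LINE` are hypothetical names, and a single regular expression is swapped in for the `strscan`-based style so the example stays short. The sample lines follow the layout of the Unicode file (codepoints, qualification status, then a comment with the emoji, its version and its name).

```ruby
# Sketch only: matches data lines such as
#   1F600 ; fully-qualified # 😀 E1.0 grinning face
LINE = /\A(?<codepoints>[0-9A-F ]+?)\s*;\s*(?<status>[a-z-]+)\s*#\s*(?<emoji>\S+)\s+E(?<version>\d+\.\d+)\s+(?<name>.+)\z/

def parse_emoji_test_line(line)
  line = line.strip
  # Blank lines and comment lines (including group/subgroup headers) carry no emoji.
  return nil if line.empty? || line.start_with?('#')

  match = LINE.match(line)
  return nil unless match

  {
    # "263A FE0F" becomes "263a-fe0f", the form shown in the inspect output of this MR.
    codepoints: match[:codepoints].split.map(&:downcase).join('-'),
    status: match[:status],
    emoji: match[:emoji],
    version: match[:version],
    name: match[:name]
  }
end

sample = [
  '# group: Smileys & Emotion',
  '1F600                                                  ; fully-qualified     # 😀 E1.0 grinning face',
  "263A FE0F                                              ; fully-qualified     # \u263A\uFE0F E0.6 smiling face"
]

sample.each do |line|
  parsed = parse_emoji_test_line(line)
  p parsed if parsed
end
```

      The status field lets a caller separate fully-qualified sequences from the other variants listed in the file.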

  •   require 'strscan'
      require 'date'
    + require_relative 'emoji_data'

      module TanukiEmoji
        module Db
          # Reads and extract content from emoji-data.txt and its metadata
          class EmojiDataParser
    -       DATA_FILE = 'vendor/unicode/emoji-data.txt'
  • # frozen_string_literal: true

    require 'strscan'
    require 'date'
    require 'i18n'
    require_relative 'emoji_data'

    module TanukiEmoji
      module Db
        # Reads and extract content from emoji-test.txt
        class EmojiTestParser
          DATA_FILE = "#{::TanukiEmoji::Db::UNICODE_DATA_DIR}/emoji-test.txt"

          # https://www.unicode.org/reports/tr51/#Versioning
          EMOJI_UNICODE_VERSION = {
  •       JSON.parse(file.read, symbolize_names: true)
        end

    -   db.each do |emoji_name, emoji_data|
    -     emoji = Character.new(emoji_name.to_s,
    -                           codepoints: emoji_data[:moji],
    -                           alpha_code: emoji_data[:shortname],
    -                           description: emoji_data[:name],
    -                           category: emoji_data[:category])
    +   db.each do |_emoji_name, emoji_data|
  • 112 111 end
    113 112
    114 113 def load_data_files
    114 Db::EmojiTestParser.new(index: self).load!
    115 115 Db::Gemojione.new(index: self).load!
    116 Db::UnicodeVersion.new(index: self).load!
  • Brett Walker requested review from @kerrizor

  • Brett Walker added 1 commit

  • Kerri Miller approved this merge request

  • Brett Walker added 16 commits

  • 4 Warnings
    :warning: This merge request is definitely too big (59225 lines changed), please split it into multiple merge requests.
    :warning: ae84f8f7: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines.
    :warning: 45bb4143: The commit subject must contain at least 3 words. For more information, take a look at our Commit message guidelines.
    :warning: ccac90b7: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines.

    Reviewer roulette

    Changes that require review have been detected! A merge request is normally reviewed by both a reviewer and a maintainer in its primary category and by a maintainer in all other categories.

    To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.

    To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.

    Once you've decided who will review this merge request, mention them as you normally would! Danger does not automatically notify them for you.

    Reviewer: No reviewer available
    Maintainer: @kerrizor (UTC-7, 2 hours behind @digitalmoksha)

    If needed, you can retry the :repeat: danger-review job that generated this comment.

    Generated by :no_entry_sign: Danger

    Edited by ****
  • Brett Walker added 1 commit

  • Brett Walker added 1 commit

  • Brett Walker added 1 commit

  • mentioned in issue #28 (closed)

  • Gabriel Mazetto approved this merge request

  • mentioned in commit ba7d4cd1

  • mentioned in issue #17 (closed)

  • Brett Walker mentioned in merge request !73 (merged)
