Gitaly branch names returning ASCII-8BIT and mixing with UTF-8 in views

A number of users have reported 500 errors when loading certain views, such as project settings:

Encoding::CompatibilityError (incompatible character encodings: ASCII-8BIT and UTF-8):
  app/views/projects/edit.html.haml:329:in `_app_views_projects_edit_html_haml__1608548703085490935_70048676486980'
  app/controllers/projects_controller.rb:26:in `edit'
  lib/gitlab/i18n.rb:39:in `with_locale'
  lib/gitlab/i18n.rb:45:in `with_user_locale'
  app/controllers/application_controller.rb:307:in `set_locale'
  lib/gitlab/performance_bar/peek_performance_bar_with_rack_body.rb:16:in `call'
  lib/gitlab/middleware/multipart.rb:93:in `call'
  lib/gitlab/request_profiler/middleware.rb:14:in `call'
              %li Project visibility level will be changed to match namespace rules when transfering to a group.
  lib/gitlab/database/load_balancing/rack_middleware.rb:37:in `call'
  lib/gitlab/middleware/go.rb:16:in `call'
  lib/gitlab/etag_caching/middleware.rb:11:in `call'
  lib/gitlab/middleware/rails_queue_duration.rb:20:in `call'
  lib/gitlab/metrics/rack_middleware.rb:29:in `block in call'
  lib/gitlab/metrics/transaction.rb:49:in `run'
  lib/gitlab/metrics/rack_middleware.rb:29:in `call'
  lib/gitlab/middleware/readonly_geo.rb:30:in `call'
  lib/gitlab/request_context.rb:18:in `call'

It turns out that Gitaly is returning ASCII-8BIT data:

irb(main):090:0> p.repository.branch_names.map(&:encoding)
=> [#<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>]
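For reference, here is a minimal sketch of how a string tagged ASCII-8BIT ends up raising this error once it is mixed with UTF-8 text. The sample strings and the `concat_result` helper are made up for illustration and are not taken from the affected project. Ruby only raises when both operands contain non-ASCII bytes, which may be why only some projects and views are hit:

```ruby
# Hypothetical sample data, not taken from the affected project.
ascii_branch     = "master".b       # ASCII-only bytes, tagged ASCII-8BIT
non_ascii_branch = "caf\u00E9".b    # non-ASCII bytes, tagged ASCII-8BIT
view_text        = "R\u00E9sum\u00E9: " # UTF-8 text containing non-ASCII characters

# Hypothetical helper: returns the encoding of the concatenation,
# or the exception class if Ruby refuses to combine the two strings.
def concat_result(left, right)
  (left + right).encoding
rescue Encoding::CompatibilityError => e
  e.class
end

# ASCII-only binary data is compatible with UTF-8, so this succeeds.
p concat_result(view_text, ascii_branch)

# Non-ASCII binary data plus non-ASCII UTF-8 text cannot be combined.
p concat_result(view_text, non_ascii_branch)

# Encoding.compatible? reports the same thing without raising.
p Encoding.compatible?(view_text, non_ascii_branch)
```

The second call is the same failure mode as the Encoding::CompatibilityError in the trace above.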

We verified this theory by bypassing the Gitaly branch_names feature with a monkey patch in the console:

irb(main):091:0> module Gitlab
irb(main):092:1>   module Git
irb(main):093:2>     class Repository
irb(main):094:3>       def branch_names
irb(main):095:4>         branches.map(&:name)
irb(main):096:4>       end
irb(main):097:3>     end
irb(main):098:2>   end
irb(main):099:1> end
=> :branch_names
irb(main):100:0> p.repository.expire_branches_cache
=> nil
irb(main):101:0> p.repository.branch_names.map(&:encoding)
=> [#<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>]

After that, the affected views loaded without errors.

For now, we suggest:

  1. Turn off the Gitaly feature flags for anything that retrieves strings (e.g. branch names)
  2. Fix either Gitaly or the gRPC handling to return UTF-8
  3. Expire the branch names cache for all projects
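
As a rough illustration of what suggestion 2 could look like on the Ruby side, the sketch below re-tags each returned name as UTF-8 and replaces any bytes that are not valid UTF-8. The `to_utf8` helper and the sample names are hypothetical; this is not the actual Gitaly client code, and it assumes branch names are normally UTF-8 on disk:

```ruby
# Hypothetical helper: returns a UTF-8 copy of a string that came back
# tagged as ASCII-8BIT. The input string is left untouched.
def to_utf8(name)
  utf8 = name.dup.force_encoding(Encoding::UTF_8)
  # If the bytes are not valid UTF-8, replace the invalid sequences
  # (with U+FFFD by default) so later concatenation cannot fail.
  utf8.valid_encoding? ? utf8 : utf8.scrub
end

# Hypothetical sample of what the client might return.
raw_names = ["master".b, "caf\u00E9".b, "bad-\xFF-bytes".b]

branch_names = raw_names.map { |name| to_utf8(name) }

p branch_names.map(&:encoding)
p branch_names.all?(&:valid_encoding?)
```

Using `force_encoding` rather than `encode` only changes the tag on bytes that are already UTF-8, which avoids a conversion attempt from binary that would fail on any non-ASCII byte.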

/cc: @dblessing, @lbot, @andrewn, @eReGeBe