Gitaly branch names returning ASCII-8BIT and mixing with UTF-8 in views
A number of users have reported 500 errors when loading certain views, such as project settings:
Encoding::CompatibilityError (incompatible character encodings: ASCII-8BIT and UTF-8):
app/views/projects/edit.html.haml:329:in `_app_views_projects_edit_html_haml__1608548703085490935_70048676486980'
app/controllers/projects_controller.rb:26:in `edit'
lib/gitlab/i18n.rb:39:in `with_locale'
lib/gitlab/i18n.rb:45:in `with_user_locale'
app/controllers/application_controller.rb:307:in `set_locale'
lib/gitlab/performance_bar/peek_performance_bar_with_rack_body.rb:16:in `call'
lib/gitlab/middleware/multipart.rb:93:in `call'
lib/gitlab/request_profiler/middleware.rb:14:in `call'
%li Project visibility level will be changed to match namespace rules when transfering to a group.
lib/gitlab/database/load_balancing/rack_middleware.rb:37:in `call'
lib/gitlab/middleware/go.rb:16:in `call'
lib/gitlab/etag_caching/middleware.rb:11:in `call'
lib/gitlab/middleware/rails_queue_duration.rb:20:in `call'
lib/gitlab/metrics/rack_middleware.rb:29:in `block in call'
lib/gitlab/metrics/transaction.rb:49:in `run'
lib/gitlab/metrics/rack_middleware.rb:29:in `call'
lib/gitlab/middleware/readonly_geo.rb:30:in `call'
lib/gitlab/request_context.rb:18:in `call'
It turns out that Gitaly is returning ASCII-8BIT data:
irb(main):090:0> p.repository.branch_names.map(&:encoding)
=> [#<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>]
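For reference, a minimal sketch of why this matters. Ruby only raises Encoding::CompatibilityError when both strings contain non-ASCII bytes, so an ASCII-8BIT branch name like "master" concatenates with UTF-8 text without trouble, while one with multibyte characters does not. That may be why only certain projects hit the error. The helper name `mixes_cleanly?` and the sample strings are made up for illustration and are not taken from the GitLab code base.

```ruby
# Hypothetical helper: returns true if the two strings can be concatenated
# without Ruby raising an encoding error, false otherwise.
def mixes_cleanly?(left, right)
  left + right
  true
rescue Encoding::CompatibilityError
  false
end

utf8_text     = "d\u00E9faut: "       # UTF-8 text containing a non-ASCII character
ascii_branch  = "master".b            # ASCII-8BIT, but only 7-bit bytes
binary_branch = "caf\xC3\xA9".b       # ASCII-8BIT, holding UTF-8 bytes for a non-ASCII name

puts mixes_cleanly?(utf8_text, ascii_branch)   # 7-bit binary string mixes fine
puts mixes_cleanly?(utf8_text, binary_branch)  # non-ASCII on both sides raises
```

The first call succeeds and the second hits the same class of error as the stack trace above.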
We verified this theory by turning off the Gitaly branch_names feature:
irb(main):091:0> module Gitlab
irb(main):092:1> module Git
irb(main):093:2> class Repository
irb(main):094:3> def branch_names
irb(main):095:4> branches.map(&:name)
irb(main):096:4> end
irb(main):097:3> end
irb(main):098:2> end
irb(main):099:1> end
=> :branch_names
irb(main):100:0> p.repository.expire_branches_cache
=> nil
irb(main):101:0> p.repository.branch_names.map(&:encoding)
=> [#<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>]
With branch names coming back as UTF-8, the affected views loaded again.
For now, we suggest:
- Turn off the Gitaly feature flags for anything that retrieves strings (e.g. branch names)
- Fix either Gitaly or the gRPC handling to return UTF-8
- Expire the branch names cache across the board
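As a rough sketch of the second suggestion, strings coming back over gRPC could be re-tagged as UTF-8 before they reach the views. The method name `utf8_branch_name` is hypothetical and this is not the actual GitLab or Gitaly code. It assumes branch names are normally valid UTF-8 bytes, and it falls back to replacing invalid bytes where they are not.

```ruby
# Hypothetical helper, for illustration only.
# Re-tags raw bytes as UTF-8 without modifying the caller's string.
def utf8_branch_name(raw)
  name = raw.dup.force_encoding(Encoding::UTF_8)
  # If the bytes are not valid UTF-8, replace the bad ones with U+FFFD
  # so the result can still be mixed with other UTF-8 text.
  name.valid_encoding? ? name : name.scrub
end

raw = "caf\xC3\xA9".b                 # ASCII-8BIT bytes, as seen in the console above
fixed = utf8_branch_name(raw)

puts fixed.encoding                   # encoding of the converted copy
puts raw.encoding                     # the original string is left untouched
```

Using force_encoding rather than encode is deliberate here: the bytes are already UTF-8 and only the encoding tag is wrong, so no transcoding is needed.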
/cc: @dblessing, @lbot, @andrewn, @eReGeBe