Passwords are secretly truncated because of lack of validation and improper Bcrypt hashing

HackerOne report #890462 by yyyyyyyan on 2020-06-03, assigned to @dcouture:

Summary

Passwords are secretly truncated, that is, MODIFIED without the user's knowledge, because of missing validation and improper (length-limited) Bcrypt hashing.

GitLab uses Bcrypt to hash passwords, which is great: Bcrypt is widely regarded as one of the strongest password hashing algorithms available.
The problem is that Bcrypt limits its input to 72 bytes, while GitLab's password policy accepts passwords of up to 128 characters.
What happens, then, if a user sets a password longer than 72 bytes but no longer than 128 characters?
The password passes GitLab's validation, but Bcrypt cannot hash an input that long. In that case the Bcrypt library (not only in Ruby, but in many implementations) simply truncates the password to 72 bytes and ignores the rest, effectively modifying the original password. Worse, since the library applies the same truncation when checking the password at login, the user will probably never know that the password they chose and log in with is not the one whose hash is stored in the database.
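
The effect can be illustrated without a Bcrypt library. The sketch below only simulates the 72-byte cut described above by slicing the UTF-8 bytes. `bcrypt_effective_input` is a hypothetical helper name used for illustration, not part of any Bcrypt API.

```python
BCRYPT_MAX_BYTES = 72  # input limit described in this report


def bcrypt_effective_input(password: str) -> bytes:
    """Return the bytes a truncating Bcrypt implementation would actually hash.

    Hypothetical helper: it simulates the truncation, it does not call Bcrypt.
    """
    return password.encode("utf-8")[:BCRYPT_MAX_BYTES]


alphabet = "abcdefghijklmnopqrstuvwxyz"
chosen = (alphabet * 5)[:128]   # 128-character password, accepted by the policy
shorter = chosen[:72]           # only its first 72 characters

# Both inputs collapse to the same 72 bytes, so they would yield the same hash.
same_input = bcrypt_effective_input(chosen) == bcrypt_effective_input(shorter)
print(len(chosen), len(shorter), same_input)  # 128 72 True
```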

The easy solution may seem to be simply fixing GitLab's password policy and validation, but that is not quite right, for two reasons:

First, long passwords are good. Since the famous xkcd comic, people have started to understand the importance of long passwords, so they should always be encouraged.

The second reason is more important and more subtle. Suppose you decide your users don't need or don't use long passwords, and you only change the password validation. GitLab's validation must count characters (because that is what makes sense to the end user choosing and typing a password), while Bcrypt works with bytes. UTF-8, the default encoding, may use up to 4 bytes per character. That makes it easy (especially for people like me, who don't live in the US and routinely use non-ASCII characters) to create a password shorter than 72 characters but longer than 72 bytes. Take a password made of "á" repeated 50 times, for example: it is 50 characters long, easily passing the validation, but takes 100 bytes in UTF-8, so Bcrypt would truncate it to only 36 characters. And this is a mild example. Someone whose characters are all 4-byte encoded would have an effective maximum password length of 18 characters (72 / 4), which is low for someone who cares about security.
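
The character/byte arithmetic above can be checked with a few lines of Python. This is a minimal sketch: `surviving_characters` is a hypothetical helper that counts how many whole characters fit in the first 72 UTF-8 bytes, and it assumes the precomposed form of "á" (U+00E1).

```python
BCRYPT_MAX_BYTES = 72  # input limit described in this report


def surviving_characters(password: str) -> int:
    """Count how many whole characters fit in the first 72 UTF-8 bytes."""
    used = 0
    count = 0
    for ch in password:
        size = len(ch.encode("utf-8"))
        if used + size > BCRYPT_MAX_BYTES:
            break
        used += size
        count += 1
    return count


two_byte = "\u00e1" * 50   # "á" (precomposed), 2 bytes each in UTF-8
four_byte = "𩸽" * 128      # 4 bytes each in UTF-8

for pw in (two_byte, four_byte):
    # characters, UTF-8 bytes, characters that survive truncation
    print(len(pw), len(pw.encode("utf-8")), surviving_characters(pw))
# 50 100 36
# 128 512 18
```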

What to do, then? The most commonly recommended way to deal with this is to pre-hash users' passwords with another secure algorithm (usually SHA-256 or SHA-512) before passing them to Bcrypt. Dropbox does that and recommends it, Django classifies this option as "considered to be the most secure algorithm", and even the official package for .NET is designed with this option available. Contrary to what it may seem, using a secure hash function to pre-process the password is secure, as cryptographer Thomas Pornin explains:

It can be shown that if bcrypt(SHA-256(password)) is broken, then either the password was guessed, or some security characteristic of SHA-256 has been proven false. [..] SHA-256 is considered to be a secure hash function.
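
As a rough sketch of the pre-hashing step only (the Bcrypt call itself is left out), the snippet below reduces a password of any length to a fixed 44-byte value using Python's standard library. `prehash` is a hypothetical name, and Base64-encoding the digest is an assumption of this sketch: it keeps the value free of raw NUL bytes, which some Bcrypt implementations treat as the end of the input.

```python
import base64
import hashlib

BCRYPT_MAX_BYTES = 72  # input limit described in this report


def prehash(password: str) -> bytes:
    """Compress a password of any length into 44 ASCII bytes.

    The result is what would be handed to Bcrypt instead of the raw password.
    """
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest)


long_a = "a" * 128
long_b = "a" * 127 + "b"   # differs only in the last character

# Without pre-hashing, both passwords share the same first 72 bytes.
raw_collide = (long_a.encode("utf-8")[:BCRYPT_MAX_BYTES]
               == long_b.encode("utf-8")[:BCRYPT_MAX_BYTES])

# With pre-hashing, every character influences the value passed to Bcrypt.
prehashed_collide = prehash(long_a) == prehash(long_b)

print(len(prehash(long_a)), raw_collide, prehashed_collide)  # 44 True False
```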

One way to implement this change without annoying users is to keep the current algorithm as a fallback when validating a login. That is, when someone tries to log in, GitLab first checks the password with the new algorithm (SHA + Bcrypt). If that fails, it checks with the old (current) algorithm. If the old algorithm succeeds, the password is re-hashed with the new algorithm and stored in the database. Of course, this is a simplified suggestion. The focus here is reporting the bug :-).
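
The fallback flow described above could look roughly like the sketch below. It is not GitLab's code: to stay runnable with the standard library alone, Bcrypt is replaced by a stand-in (`slow_hash`, PBKDF2 with an explicit 72-byte cut to mimic the truncation), and `legacy_hash`, `new_hash`, `log_in` and the `record` dictionary are all hypothetical names.

```python
import base64
import hashlib
import hmac
import os


def slow_hash(data: bytes, salt: bytes) -> bytes:
    """Stand-in for Bcrypt (assumption): cut to 72 bytes, then run PBKDF2."""
    return hashlib.pbkdf2_hmac("sha256", data[:72], salt, 10_000)


def legacy_hash(password: str, salt: bytes) -> bytes:
    """Current scheme: the raw password goes straight into the slow hash."""
    return slow_hash(password.encode("utf-8"), salt)


def new_hash(password: str, salt: bytes) -> bytes:
    """New scheme: SHA-256 pre-hash (Base64-encoded), then the slow hash."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return slow_hash(base64.b64encode(digest), salt)


def log_in(record: dict, password: str) -> bool:
    """Try the new scheme first, fall back to the old one and upgrade."""
    salt, stored = record["salt"], record["hash"]
    if hmac.compare_digest(new_hash(password, salt), stored):
        return True
    if hmac.compare_digest(legacy_hash(password, salt), stored):
        # Old hash matched: re-hash with the new scheme and store it.
        record["hash"] = new_hash(password, salt)
        return True
    return False


salt = os.urandom(16)
full = "abcdefghijklmnopqrstuvwxyz" * 4   # 104 characters, over the 72-byte limit
record = {"salt": salt, "hash": legacy_hash(full, salt)}

print(log_in(record, full))        # True: accepted by the fallback, then upgraded
print(log_in(record, full[:72]))   # False: the truncated password no longer works
```

One caveat of this simplified flow: the upgrade stores whatever the user typed at that moment, so if the first login after the change used the truncated form, that shorter password is the one that gets re-hashed.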

Steps to reproduce

1. On gitlab.com, log in with your user
2. Open the settings to change your password

Example 1:
3. Change your password to abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx (128 characters long, GitLab's current limit)
4. Login with your user and the password abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrst (72 characters and bytes long, Bcrypt's limit)

You've just logged in with a password different from the one you just set.

Example 2:
3. Change your password to 𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽 (128 characters long, GitLab's current limit)
4. Login with your user and the password 𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽𩸽 (18 characters and 72 bytes long, Bcrypt's limit)

You've just logged in with a password 110 characters shorter than the one you just set.

Impact

Although unintentional, users are being lied to. Many passwords are being modified (and weakened, by being shortened) without their owners knowing. This affects not only security but also the trust users place in the platform, since the platform fails to acknowledge the current behavior. On the technical side, it also makes users' accounts more vulnerable to brute-force attacks, since a password can be shortened drastically (as in Example 2 above). Remember, this doesn't affect only people who use long passwords, since, as shown above, non-ASCII characters can take up to 4 bytes each!

What is the current bug behavior?

GitLab accepts passwords that Bcrypt cannot fully process (over 72 bytes long), meaning that:
Users' passwords may be modified before being hashed and saved in the database, without disclosure;
Some users' accounts may be accessed using a different (and shorter) password than the one chosen by the user.

What is the expected correct behavior?

GitLab accepts long passwords;
The password gets hashed as is, without any modification, and then saved in the database;
No password other than the one chosen by a user can be used to log in to that user's account.