GetRawChanges protocol makes wrong assumptions about encodings
```protobuf
message GetRawChangesRequest {
  Repository repository = 1;
  string from_revision = 2;
  string to_revision = 3;
}
```
The convention in gitaly-proto is to declare revisions as `bytes` so that they are encoding-agnostic. Whether the `string` revisions above cause a problem depends on how GetRawChanges gets called.
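To make the difference concrete, here is a minimal Go sketch. It assumes only that proto3 requires `string` fields to hold valid UTF-8 (the proto3 language guide states this). `rawChangesRequest` and `revisionsFitStringFields` are hypothetical, hand-written stand-ins, not gitaly-proto generated code, and the example assumes Git lets a ref name contain the Latin-1 byte 0xE9.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// rawChangesRequest is a hypothetical stand-in for a GetRawChangesRequest
// whose revisions are declared as `bytes` instead of `string`.
type rawChangesRequest struct {
	FromRevision []byte
	ToRevision   []byte
}

// revisionsFitStringFields reports whether both revisions could also be sent
// in proto3 `string` fields, which must contain valid UTF-8.
func revisionsFitStringFields(req rawChangesRequest) bool {
	return utf8.Valid(req.FromRevision) && utf8.Valid(req.ToRevision)
}

func main() {
	// 0xE9 is Latin-1 "é"; a lone 0xE9 byte is not valid UTF-8.
	req := rawChangesRequest{
		FromRevision: []byte("refs/heads/master"),
		ToRevision:   []byte("refs/heads/caf\xe9"),
	}
	fmt.Println("fits string fields:", revisionsFitStringFields(req))
}
```

With `bytes` fields the request carries the revision exactly as Git stores it; with `string` fields this request could not be represented at all.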
```protobuf
message GetRawChangesResponse {
  message RawChange {
    enum Operation {
      // ...
    }
    string blob_id = 1;
    int64 size = 2;
    string new_path = 3;
    string old_path = 4;
    Operation operation = 5;
    string raw_operation = 6;
    int32 old_mode = 7;
    int32 new_mode = 8;
  }
  repeated RawChange raw_changes = 1;
}
```
This is the bigger problem: by declaring `new_path` and `old_path` as `string`, GetRawChangesResponse assumes that all paths in a Git repo are UTF-8. I have not verified this (yet), but I believe the assumption is false. Git won't stop a user from creating a file with a non-UTF-8 name, and the moment we try to build a GetRawChangesResponse that reports on such a file, it will blow up.
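A short Go sketch of the two options a `string`-typed path leaves us, assuming (as above) that a proto3 `string` must be valid UTF-8. `toStringField` is a hypothetical helper, not Gitaly code: it either rejects the path (the "blow up" case) or, if we sanitize with `strings.ToValidUTF8`, silently merges distinct filenames. Only a `bytes` field keeps them exact.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// toStringField is a hypothetical check mirroring the proto3 rule that a
// `string` field must hold valid UTF-8. It errors instead of mangling the path.
func toStringField(path []byte) (string, error) {
	if !utf8.Valid(path) {
		return "", fmt.Errorf("path %q is not valid UTF-8", path)
	}
	return string(path), nil
}

// sanitize replaces each run of invalid bytes with U+FFFD, which avoids the
// error but loses information about the original filename.
func sanitize(path []byte) string {
	return strings.ToValidUTF8(string(path), "\uFFFD")
}

func main() {
	// Two distinct filenames as Git could store them: Latin-1 "café.txt" and "cafè.txt".
	a := []byte("caf\xe9.txt")
	b := []byte("caf\xe8.txt")

	if _, err := toStringField(a); err != nil {
		fmt.Println("string field rejected:", err)
	}

	// Sanitizing makes both names identical, so the response would be wrong.
	fmt.Println("sanitized names collide:", sanitize(a) == sanitize(b))

	// As raw bytes (what a `bytes` field would carry) they stay distinct.
	fmt.Println("raw names distinct:", string(a) != string(b))
}
```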
The first step here is to verify my claim that we can encounter non-UTF-8 filenames.
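One way to check a real repository, sketched in Go: feed it the output of `git diff --raw -z <from> <to>`. With `-z`, Git prints paths verbatim instead of quoting them per `core.quotePath`, so the bytes are exactly what is stored. `nonUTF8Paths` is a hypothetical helper, and the sample input below is hand-written with placeholder object IDs, not captured from a real repository.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// nonUTF8Paths scans `git diff --raw -z` output and returns every path that is
// not valid UTF-8. Each record is ":<modes> <ids> <status>\0<path>\0", with a
// second path for renames (R) and copies (C).
func nonUTF8Paths(raw []byte) [][]byte {
	var bad [][]byte
	fields := bytes.Split(raw, []byte{0})
	for i := 0; i < len(fields); {
		header := fields[i]
		if len(header) == 0 || header[0] != ':' {
			i++ // e.g. the empty field after the final NUL
			continue
		}
		// The status is the last space-separated token, e.g. "M" or "R086".
		parts := bytes.Fields(header)
		status := parts[len(parts)-1]
		nPaths := 1
		if status[0] == 'R' || status[0] == 'C' {
			nPaths = 2 // old path, then new path
		}
		for j := 1; j <= nPaths && i+j < len(fields); j++ {
			if p := fields[i+j]; !utf8.Valid(p) {
				bad = append(bad, p)
			}
		}
		i += 1 + nPaths
	}
	return bad
}

func main() {
	// Hand-written sample: a modified Latin-1 filename and a rename to one.
	raw := []byte(":100644 100644 bcd1234 0123456 M\x00caf\xe9.txt\x00" +
		":100644 100644 abcd123 1234567 R086\x00old.txt\x00new\xe8.txt\x00")
	for _, p := range nonUTF8Paths(raw) {
		fmt.Printf("non-UTF-8 path: %q\n", p)
	}
}
```

If any real repository produces a non-empty result here, the `string` path fields in RawChange cannot represent that change.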
cc @vsizov