cwstring: string conversions might fail
Summary
cwstring intermittently fails to convert between AnsiString and UnicodeString.
System Information
- Operating system: macOS Ventura 13.6.1
- Processor architecture: x86-64
- Compiler version: trunk
Steps to reproduce
Running the example project fails intermittently. For me it failed regularly when run under the debugger from the Lazarus IDE, but always worked when run from a terminal.
See also forum thread.
Example Project
program TestSetCodePage;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}
  cthreads,
  cwstring,
  {$ENDIF}
  Classes,
  SysUtils;

const
  TextToWrite: string = 'äöüß';

var
  rbs: RawByteString;
  i: Integer;

begin
  rbs := TextToWrite;
  WriteLn('Original CodePage = ', StringCodePage(rbs), ' Length = ', Length(rbs));
  for i := 1 to Length(rbs) do
    Write(IntToHex(Ord(rbs[i])), ' ');
  WriteLn();
  if Length(UnicodeString(rbs)) <> 4 then
    Halt(1);

  SetCodePage(rbs, 1252);
  WriteLn('Converted CodePage = ', StringCodePage(rbs), ' Length = ', Length(rbs));
  for i := 1 to Length(rbs) do
    Write(IntToHex(Ord(rbs[i])), ' ');
  WriteLn();
  if Length(rbs) <> 4 then
    Halt(1);
end.
What is the current bug behavior?
Converting rbs to UnicodeString returns a string with Length() = 8.
The bug was introduced with commit bf3ced76 by @mvancanneyt.
Before that commit, RawByteStrings were used and cast to PAnsiChar when calling iconv_open. This ensured the strings were null-terminated.
Now ShortStrings are used, and the cast to PAnsiChar is applied to the pointer to the first string element. This no longer guarantees null termination, so iconv_open may fail and return -1 whenever the byte following the string data happens to be non-zero.
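The difference between the two casts can be seen in isolation with a small sketch. This is not the cwstring code: the program name, the variables and the 'JUNK' filler are made up for illustration, and the filler only simulates whatever bytes happen to follow the charset name in memory.

```pascal
program ShortStringTerminator;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  ss: ShortString;
  rbs: RawByteString;

begin
  // Put known non-zero bytes after the text, then shrink the logical length.
  // SetLength on a ShortString only changes the length byte; the old
  // characters stay in the buffer.
  ss := 'UTF-8JUNK'#0;
  SetLength(ss, 5);

  rbs := 'UTF-8';

  // AnsiString-family strings carry a hidden trailing #0, so this cast
  // yields a valid C string.
  WriteLn('RawByteString as C string: ', StrLen(PAnsiChar(rbs)), ' chars');

  // A ShortString has a length byte but no terminator, so a C-style scan
  // runs past the 5 logical characters into the leftover bytes.
  WriteLn('ShortString as C string:   ', StrLen(PAnsiChar(@ss[1])), ' chars');
end.
```

A C function such as iconv_open sees the second pointer the same way StrLen does, so the charset name it receives depends on memory content the caller does not control.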
What is the expected (correct) behavior?
Converting rbs to UnicodeString returns a string with Length() = 4.
Possible fixes
Either return to using RawByteString or manually append #0 to both strings. The attached patch (fpcsrc-16-32-25.patch) does the latter.
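As a rough sketch of the second option, assuming the names stay ShortStrings: the helper WithTerminator below is hypothetical and is not taken from the attached patch. It only illustrates that appending #0 makes the pointer to the first element safe to hand to a C function.

```pascal
program TerminateShortString;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: returns a copy of AName with an explicit #0 appended,
// so that @Result[1] points at a valid C string. A 255-character input would
// lose the terminator to truncation; charset names are far shorter than that.
function WithTerminator(const AName: ShortString): ShortString;
begin
  Result := AName + #0;
end;

var
  CharsetName, CName: ShortString;

begin
  // Simulate stale bytes following the logical content of the string.
  CharsetName := 'UTF-8JUNK';
  SetLength(CharsetName, 5);

  CName := WithTerminator(CharsetName);

  // The C-style length now matches the logical length of the name.
  WriteLn(StrLen(PAnsiChar(@CName[1])), ' = ', Length(CharsetName));
end.
```

Returning to RawByteString avoids the extra copy and the length limit, at the cost of reintroducing managed strings at the call site.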