fcl-pdf: Parsing of octal escapes in strings is not compliant with the PDF specification
Summary
The code in fppdfscanner.pp that parses octal escapes in strings is not compliant with the PDF specification. The current implementation assumes that an octal escape (\ddd) always has three digits, but the specification allows one, two, or three digits (https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf, 7.3.4.2, pages 15-16).
System Information
- Operating system: Linux (Kubuntu 24.04)
- Processor architecture: x86-64
- Compiler version: Free Pascal Compiler version 3.3.1-18107-g994ebf55ec-dirty [2025/07/06] for x86_64
- Device: Computer
Steps to reproduce
- Create a sample program that loads a PDF file (an example is provided below)
- Compile it and run it on a PDF file that contains octal escapes with fewer than three digits (the example file I happened to come across is Get_Started_With_Smallpdf.pdf)
- The program terminates with an unhandled exception because the octal escape is considered invalid, although the file itself is valid according to the specification
Example Project
program test_pdf;
{$Mode DelphiUnicode}
{$H+}
uses
  Classes,
  fppdfobjects,
  fppdfparser;
var
  InputFileName: UnicodeString;
  PdfFile: TPDFDocument;
  PdfParser: TPDFParser;
  FileStream: TFileStream;
begin
  if ParamCount < 1 then begin
    WriteLn('Usage: test_pdf <filename>');
    Exit;
  end;
  InputFileName := ParamStr(1);
  PdfFile := TPDFDocument.Create;
  try
    FileStream := TFileStream.Create(InputFileName, fmOpenRead);
    try
      PdfParser := TPDFParser.Create(FileStream);
      try
        PdfParser.ResolveContentStreams := False;
        PdfParser.ResolveToUnicodeCMaps := False;
        PdfParser.ParseDocument(PdfFile);
      finally
        PdfParser.Free;
      end;
    finally
      FileStream.Free;
    end;
    WriteLn('Parsed!');
  finally
    PdfFile.Free;
  end;
end.
What is the current bug behavior?
Currently the program terminates with an unhandled exception:
An unhandled exception occurred at $00000000004C93CB:
EPDFScanner: Invalid octal character: "35%"
$00000000004C93CB
$00000000004C9459
$00000000004636B1
$00000000004637BE
$0000000000460682
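The message follows from how the scanner builds the number: for the escape \35 followed by the character %, it always reads three characters after the backslash and converts the string &35% as an octal number. The following is a minimal standalone sketch of that conversion only (it does not use fppdfscanner, and the program name is made up for illustration):

```pascal
program octal_conversion_demo;
{$mode objfpc}{$H+}
uses
  SysUtils;
begin
  // '&' is the Free Pascal prefix for octal numbers in string-to-integer conversion.
  // Two digits on their own convert fine: octal 35 = 3*8 + 5 = 29.
  WriteLn(StrToIntDef('&35', -1));   // prints 29
  // With the following byte '%' appended, the conversion fails and the
  // default value is returned, which the scanner reports as an error.
  WriteLn(StrToIntDef('&35%', -1));  // prints -1
end.
```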
What is the expected (correct) behavior?
The PDF file is parsed and the program exits without errors.
Possible fixes
Current code that parses octal strings (https://gitlab.com/freepascal.org/fpc/source/-/blob/main/packages/fcl-pdf/src/fppdfscanner.pp?ref_type=heads#L669):
'0'..'9':
  begin
    if FSource.IsEOF then
      DoError(senEOFWhileScanningString,SErrEOFWhileScanningString);
    aChar3:=Char(FSource.GetByte());
    if FSource.IsEOF then
      DoError(senEOFWhileScanningString,SErrEOFWhileScanningString);
    aChar4:=Char(FSource.GetByte());
    aOctal:=StrToIntDef('&'+aChar2+aChar3+aChar4,-1);
    if (aOctal=-1) or (aOctal>=256) then
      DoError(senInvalidOctalCharacter,SErrInvalidOctalCharacter,[aChar2+aChar3+aChar4]);
    AddToToken(aOctal and $FF)
  end
A potential fix is to parse octal escapes with the following code instead (the patch file is fppdfscanner.pp.patch):
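'0'..'9': // note: '8' and '9' also match this label although they are not octal digits
  begin
    // Accumulate one to three octal digits; i must be declared as a local Integer
    aOctal := 0;
    for i := 1 to 3 do begin
      aOctal := aOctal * 8 + Ord(aChar2) - Ord('0');
      if aOctal >= 256 then
        // This is really an invalid octal string rather than an invalid octal character
        DoError(senInvalidOctalCharacter, SErrInvalidOctalCharacter, [ aChar2 ]);
      if FSource.IsEOF then
        DoError(senEOFWhileScanningString,SErrEOFWhileScanningString);
      aChar2 := Char(FSource.GetByte());
      // Stop at the first character that is not an octal digit
      if not (aChar2 in ['0'..'7']) then
        Break;
    end;
    // Un-read the last character read, which is not part of the escape
    FSource.Previous;
    AddToToken(aOctal);
  end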
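To check the one-to-three-digit rule independently of the scanner, here is a minimal standalone sketch. DecodeOctalEscape and Check are hypothetical helpers written for this report, not part of fcl-pdf. The inputs include the three examples from section 7.3.4.2 of the specification (\0053, \053, \53) and the sequence from the error message above. Unlike the patch, this sketch masks the value to one byte instead of raising an error, on the assumption that the specification's rule that high-order overflow is ignored should be followed literally; which behavior fits fcl-pdf better is open to discussion.

```pascal
program octal_escape_sketch;
{$mode objfpc}{$H+}

// Hypothetical helper, not part of fcl-pdf. Decodes the octal escape whose
// first digit is at S[Index] (the character right after the backslash).
// Consumes one to three digits in '0'..'7' and leaves Index on the first
// character that was not consumed.
function DecodeOctalEscape(const S: AnsiString; var Index: Integer): Byte;
var
  Value, Count: Integer;
begin
  Value := 0;
  Count := 0;
  while (Count < 3) and (Index <= Length(S)) and (S[Index] in ['0'..'7']) do
  begin
    Value := Value * 8 + Ord(S[Index]) - Ord('0');
    Inc(Index);
    Inc(Count);
  end;
  // Assumption: overflow above one byte is discarded rather than reported.
  Result := Value and $FF;
end;

// Decodes the escape at the start of S and prints the byte and the next index.
procedure Check(const S: AnsiString);
var
  P: Integer;
  B: Byte;
begin
  P := 1;
  B := DecodeOctalEscape(S, P);
  WriteLn(S, ' -> byte ', B, ', next index ', P);
end;

begin
  Check('35%');   // byte 29, next index 3: '%' is left for the string scanner
  Check('053');   // byte 43 ('+'), next index 4
  Check('53');    // byte 43 ('+'), next index 3
  Check('0053');  // byte 5, next index 4: the trailing '3' is a separate character
end.
```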