fcl-pdf: Parsing of octal escapes in strings is not compliant with the PDF specification

Summary

The code in fppdfscanner.pp that parses octal escape sequences in literal strings is not compliant with the PDF specification. The current implementation assumes that an octal escape always has three digits, but the specification allows one, two, or three digits (https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf, section 7.3.4.2, pages 15-16).

System Information

  • Operating system: Linux (Kubuntu 24.04)
  • Processor architecture: x86-64
  • Compiler version: Free Pascal Compiler version 3.3.1-18107-g994ebf55ec-dirty [2025/07/06] for x86_64
  • Device: Computer

Steps to reproduce

  1. Create a sample program that loads a PDF file (an example is provided below).
  2. Compile it and run it on a PDF file that contains octal escapes shorter than three digits (an example file I stumbled across: Get_Started_With_Smallpdf.pdf).
  3. The program crashes because the octal escape is considered invalid, although the file itself is valid according to the specification.

Example Project

program test_pdf;

{$Mode DelphiUnicode}
{$H+}

uses 
    Classes,
    fppdfobjects,
    fppdfparser;

var InputFileName: UnicodeString;
    PdfFile: TPDFDocument;
    PdfParser: TPDFParser;
    FileStream: TFileStream;
begin 
    if ParamCount < 1 then begin 
        WriteLn('Usage: test_pdf <filename>');
        Exit;
    end;

    InputFileName := ParamStr(1);

    PdfFile := TPDFDocument.Create;
    try
        FileStream := TFileStream.Create(InputFileName, fmOpenRead);
        try
            PdfParser := TPDFParser.Create(FileStream);
            try 
                PdfParser.ResolveContentStreams := False;
                PdfParser.ResolveToUnicodeCMaps := False;
                PdfParser.ParseDocument(PdfFile);
            finally 
                PdfParser.Free;
            end;
        finally
            FileStream.Free;
        end;

        WriteLn('Parsed!');
    finally
        PdfFile.Free;
    end;
end.

What is the current bug behavior?

Currently the program crashes with:

An unhandled exception occurred at $00000000004C93CB:
EPDFScanner: Invalid octal character: "35%"
  $00000000004C93CB
  $00000000004C9459
  $00000000004636B1
  $00000000004637BE
  $0000000000460682

What is the expected (correct) behavior?

The PDF file is parsed and the program exits without errors.

Possible fixes

Current code that parses octal strings (https://gitlab.com/freepascal.org/fpc/source/-/blob/main/packages/fcl-pdf/src/fppdfscanner.pp?ref_type=heads#L669):

        '0'..'9':
            begin
            if FSource.IsEOF then
              DoError(senEOFWhileScanningString,SErrEOFWhileScanningString);
            aChar3:=Char(FSource.GetByte());
            if FSource.IsEOF then
              DoError(senEOFWhileScanningString,SErrEOFWhileScanningString);
            aChar4:=Char(FSource.GetByte());
            aOctal:=StrToIntDef('&'+aChar2+aChar3+aChar4,-1);
            if (aOctal=-1) or (aOctal>=256) then
              DoError(senInvalidOctalCharacter,SErrInvalidOctalCharacter,[aChar2+aChar3+aChar4]);
            AddToToken(aOctal and $FF)
            end

A potential fix is to parse octal escapes with the following code instead (the patch file is fppdfscanner.pp.patch):

        '0'..'9':
            begin
            aOctal := 0;

            for i := 1 to 3 do begin 
              aOctal := aOctal * 8 + Ord(aChar2) - Ord('0');

              if aOctal >= 256 then 
                // Strictly speaking this is an invalid octal escape, not an invalid octal character
                DoError(senInvalidOctalCharacter, SErrInvalidOctalCharacter, [ aChar2 ]);

              if FSource.IsEOF then 
                DoError(senEOFWhileScanningString,SErrEOFWhileScanningString);
              
              aChar2 := Char(FSource.GetByte());

              if not (aChar2 in ['0'..'7']) then
                Break;
            end;

            FSource.Previous;
            AddToToken(aOctal);
            end
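
To check the digit-counting logic outside the scanner, here is a minimal standalone sketch of the same idea. DecodeOctalEscape and Show are hypothetical helpers written for this report and are not part of fcl-pdf; the input is a plain string rather than FSource. The sketch reads up to three octal digits and stops at the first non-octal character. Like the patch above, it treats values above 255 as an error, which is an assumption about the desired behavior rather than something required by the scanner.

```pascal
program octal_escape_demo;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper, not part of fcl-pdf.
// Decodes the octal escape whose first digit is at S[Idx], i.e. the
// character right after the backslash. Consumes one to three octal digits
// and leaves Idx on the first character that was not consumed.
function DecodeOctalEscape(const S: AnsiString; var Idx: Integer): Integer;
var
  Count: Integer;
begin
  Result := 0;
  Count := 0;
  while (Count < 3) and (Idx <= Length(S)) and (S[Idx] in ['0'..'7']) do
  begin
    Result := Result * 8 + Ord(S[Idx]) - Ord('0');
    Inc(Idx);
    Inc(Count);
  end;
  if Count = 0 then
    raise Exception.Create('Expected an octal digit');
  // Assumption: mirror the patch above and reject values that do not fit a byte.
  if Result > 255 then
    raise Exception.CreateFmt('Octal escape out of byte range: %d', [Result]);
end;

// Hypothetical helper: decodes one escape from the start of S and prints it.
procedure Show(const S: AnsiString);
var
  Idx, Value: Integer;
begin
  Idx := 1;
  Value := DecodeOctalEscape(S, Idx);
  WriteLn('"', S, '" -> byte ', Value, ', next index ', Idx);
end;

begin
  Show('35%');   // two digits followed by '%': byte 29, next index 3
  Show('0053');  // three digits, trailing '3' is left alone: byte 5, next index 4
  Show('53');    // two digits at end of input: byte 43, next index 3
  Show('245');   // three digits: byte 165, next index 4
end.
```

The first input is the sequence from the error message above: with a fixed three-character read, the '%' is pulled into the escape, while the loop stops after '35' and leaves '%' for the next token.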