Skip to content
GitLab
  • Menu
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
    • Switch to GitLab Next
  • Sign in / Register
  • FPC Source FPC Source
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 1,281
    • Issues 1,281
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
  • Merge requests 56
    • Merge requests 56
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
    • Test Cases
  • Deployments
    • Deployments
    • Releases
  • Packages & Registries
    • Packages & Registries
    • Package Registry
    • Container Registry
    • Infrastructure Registry
  • External wiki
    • External wiki
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • FPC
  • FPCFPC
  • FPC SourceFPC Source
  • Issues
  • #37833
Closed
Open
Created Sep 28, 2020 by FPC Admin account@fpc_adminOwner

Output to text streams other than real disk files goes unbuffered

Original Reporter info from Mantis: croco
  • Reporter name: Andrey "Croco" Stolyarov

Description:

Output to text streams, including stdout, is done without buffering, unless it is redirected to a real disk file. This significantly raises the program's execution time.

Steps to reproduce:

To reliably reproduce the effect, one can actually use any program that
does text output (including the output to stdout) of a notable volume. To
make a simple test, a program that outputs numbers from 0 to 100000000 (100
millions), one per line, will do:

program million;
var
    i: longint;
begin
    for i:=0 to 100000000 do
        writeln(i)
end.

100 mils is choosen because this takes measurable amount of time (tens of
seconds), but not too much in order not to fall asleep.

With the version of RTL present in current repositories, the following
measurements are observed. When the output goes to a plain disk file:

crocodil@frock:~/FPC$ time ./mil > file

real 0m28.787s
user 0m18.256s
sys 0m9.532s

So it took less than 29 seconds of real (wall clock) time, of which the
kernel had to spend 9.5 seconds. If the same thing is done using a
pipeline, the figures become different (please note the time command is
only applied to ./mil, not to the whole pipeline):

crocodil@frock:~/FPC$ time ./mil | cat > file

real 1m15.093s
user 0m26.912s
sys 2m0.156s

The "wall clock" time actually raised less than 2.5 times, but please note
the system time which is more that the "real". This means the poor kernel
had to utilize more than one CPU to serve this process and, roughly
speaking, spent 13 times more time.

This is because for the first run, the output (on the level of syscalls) is
done like this (the following are excerpts from what strace command shows):

write(1, "0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n"..., 256) = 256
write(1, "\n89\n90\n91\n92\n93\n94\n95\n96\n97\n98\n9"..., 256) = 256
write(1, "5\n156\n157\n158\n159\n160\n161\n162\n16"..., 256) = 256

which seems reasonable (altough 256 is not the best buffer size -- well,
4096 is used by most of existing software; but 256 is still acceptable).
But in case the output goes anywhere but a real disk file, the output is
done like this:

write(1, "0\n", 2) = 2
write(1, "1\n", 2) = 2
write(1, "2\n", 2) = 2
[...]
write(1, "160\n", 4) = 4
write(1, "161\n", 4) = 4
write(1, "162\n", 4) = 4
[...]
write(1, "87337\n", 6) = 6
write(1, "87338\n", 6) = 6
write(1, "87339\n", 6) = 6
[...]

This means every writeln within the program causes its own system call
write(2), that is, there's no output buffering in effect. The funniest
thing here is that in case the output is redirected to /dev/null, it works
slower that for a disk file:

crocodil@frock:~/FPC$ time ./mil > /dev/null

real 0m38.543s
user 0m20.356s
sys 0m18.184s

This is ONLY because of total number of syscalls; every single syscall in
this case returns almost immediately, because there's nothing to do for it;
but the costs of switching contexts between the process and the kernel are
still in effect.

This all was not the case for the past releases, e.g., if we compile the
same program with fpc 3.2.0, we've got the following figures:

crocodil@frock:~/FPC$ time ./mil320 > file

real 0m31.800s
user 0m21.064s
sys 0m10.136s

crocodil@frock:~/FPC$ time ./mil320 | cat > file

real 0m21.423s
user 0m16.672s
sys 0m19.272s

crocodil@frock:~/FPC$ time ./mil320 > /dev/null

real 0m17.139s
user 0m16.592s
sys 0m0.544s

Please specifically note less than one second system time in the last case.

Additional information:

It looks like that this misbehaviour was introduced during the work on
ticket https://bugs.freepascal.org/view.php?id=37716 when the function
Do_IsDevice got changed despite that it had no relationship to the bug
reported in that ticket, and despite that I strolngly suggested not to do
this particular change.

The function Do_IsDevice was not broken, it just was (and still is)
misnamed. It was relatively safe not to touch it at all (as from SeekEof
it was called in error, and now in SeekEof there's no call to Do_IsDevice).
The "fix" for non-broken function lead to the obvious misbehaviour, as only
the streams that go to a human user may go unbuffered (and even these,
actually, should be buffered, but in a manner different from other
streams). Pipes, sockets and devices (like, e.g., /dev/null) are NOT of
this kind, only TTYs are; it is common practice in Unix world to pass
gigabytes of data through pipes and/or through standard input/output
streams, so unbuffering them all is a catastrophe, which can not even be
worked around anyhow from within the pascal program. Effectively this
makes Free Pascal totally unusable for implementing a large kind of
programs known as "unix filters".

Please note that before the "fix", Do_IsDevice was implemented as isatty(3)
-- that is, it issued ioctl(2) syscall to determine whether the stream is
connected to TTY or not.

It is very easy to fix this misbehaviour: roll back the Do_IsDevice
implementation to its previous state, and you're done.

What I also strongly suggest is to rename Do_IsDevice into Do_IsConsole or
the like, so that it will not do more confusion than it already did.

Mantis conversion info:

  • Mantis ID: 37833
  • OS: Linux
  • Build: fresh sources from git
  • Version: 3.3.1
  • Fixed in revision: 49324 (#9f6651fd)
  • Monitored by: » croco (Andrey "Croco" Stolyarov), » @xhajt03 (Tomas Hajny)
  • Target version: 4.0.0
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
Time tracking