ld.so: Scan the marker on all components: the executable and the shared libraries it depends on.
When performing symbol lookup for references in an object without single global definition:
Disallow copy relocations against protected data symbols in an object with single global definition.
Disallow non-zero symbol values of undefined function symbols, which are used as the function pointer, against protected function symbols in an object with single global definition.
GCC x86 can use local access for protected symbols today without breaking anything.
All accesses to protected definitions are local access.
In executable, all accesses to defined symbols are local access.
They already work.
All global function pointers, whose function bodies aren't locally defined, must use GOT.
It can be done without a marker.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
Some architectures (e.g. i386/ppc32) require a GOT register because they don't support PC-relative instructions. These are legacy architectures. We can leave them unchanged.
Other architectures can default to "use GOT to take the address of an external default visibility function" in -fno-pic mode.
We can add an option -fdirect-access-external-function for rare users who want the original -fno-pic behavior.
There may be a small cost because taking the address of an external default visibility global variable is more frequent,
though I don't think it can be a bottleneck for anything.
We can add ld warnings when R_*_COPY is present.
Users can add an ld option to suppress the warning. No marker is needed.
Once the ld warning is prevalent, we can add a glibc ld.so warning for R_*_COPY.
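A minimal sketch of the condition such a warning would key on, assuming the dynamic relocation records (e.g. the contents of .rela.dyn on x86-64) have already been read into memory. The helper name count_copy_relocs is hypothetical; it is not an existing ld or ld.so function.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: count R_X86_64_COPY entries in an array of RELA
   records. A nonzero result is the case the proposed warning is about. */
size_t count_copy_relocs(const Elf64_Rela *rela, size_t n)
{
    size_t count = 0;

    assert(rela != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* The relocation type lives in the low 32 bits of r_info. */
        if (ELF64_R_TYPE(rela[i].r_info) == R_X86_64_COPY)
            count++;
    }
    return count;
}
```

Other architectures would need their own R_*_COPY constant in place of R_X86_64_COPY.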
Branches to undefined symbols may use PLT.
The 2018 R_X86_64_PLT32 scheme for call/jmp foo has already done this.
I hope folks can focus on functions/canonical PLT entries as the first step because it will give an immediate performance boost. Once canonical PLT entries are eliminated, we can safely build software with ld -Bsymbolic-non-weak-functions (drop many R_*_JUMP_SLOT and some R_*_GLOB_DAT caused by taking function addresses; convert some absolute relocations to R_*_RELATIVE).
There are no R_X86_64_COPY in executable without using -fPIC nor -fPIE.
ld and ld.so should work together to detect the issue caused by R_X86_64_COPY
at compile-time and/or run-time.
R_X86_64_COPY removal won't happen overnight. We need ways to detect and
mitigate the potential R_X86_64_COPY related issues before R_X86_64_COPY
is completely removed.
eliminate copy relocations for -fno-pic (all architectures; if got indirection cost is a concern, opt out the legacy architectures (i386/ppc32))
We can do 1 and 2 immediately.
After we do 2, we can let ld default to warn for canonical PLT entries (st_shndx==0,st_value!=0).
When the ld warning has been there for a while, let ld.so warn for canonical PLT entries.
Distribution-wide default ld -Bsymbolic-non-weak-functions is safe after 2.
Copy relocations are a bit subtle because some badly written assembly files may have problems. Some users may prefer performance despite copy relocations on architectures without x86-64 GOTPCRELX/ppc64 TOC optimization.
After we do 3, we can let ld default to warn for R_*_COPY.
When the ld warning has been there for a while, let ld.so warn for R_*_COPY.
Note that many action items can be parallelized.
I don't think compiler/assembler need any marker.
Many assembly files are written with good -fPIC/-fPIE in mind.
They should not need a marker like .note.GNU-stack
make -fno-pic default to -fno-direct-access-external-data for most architectures. Some users may prefer performance despite copy relocations on architectures without x86-64 GOTPCRELX/ppc64 TOC optimization. They can opt out.
make ld default to warn for R_*_COPY
make glibc ld.so warn for R_*_COPY
GCC: treat STV_PROTECTED similar to STV_HIDDEN
GCC aarch64/arm/x86/...: allow direct access relocations on protected symbols in -fpic mode.
GNU ld: treat STV_PROTECTED similar to STV_DEFAULT in -Bsymbolic mode
GNU ld aarch64/x86: allow direct access relocations on protected data symbols in -shared mode.
GNU ld x86: disallow copy relocations on protected data symbols. (I think canonical PLT entries on protected symbols have been disallowed.)
After elimination of canonical PLT entries, we can safely enable distribution-wide default ld -Bsymbolic-non-weak-functions.
This will improve performance for lots of software, especially for short-lived processes where relocation symbol lookup takes a significant portion.
x2 is a non-PIE executable which has nothing to do with HAVE_LD_PIE_COPYRELOC.
A marker provides a way to identify issues with R_X86_64_COPY at link-time as
well as run-time. We have used the marker for CET enabling successfully.
gcc -fno-pic -O2 -o x2 x.c libbar.so -Wl,-rpath,. has problems on all architectures
gcc -fpie -O2 -o x2 x.c libbar.so -Wl,-rpath,. has problems only on x86-64. It is related to HAVE_LD_PIE_COPYRELOC
Many distributions configure GCC with --enable-default-pie.
CET has size cost and performance cost. Both SHSTK and IBT have interaction with many applications (stack manipulation, setjmp, JIT, etc). It is good to use an opt-in strategy.
Eliminating canonical PLT entries for -fno-pic has zero cost for most software (taking the address is rare, even rarer after SROA/indirect-to-direct call optimization/inlining/etc), e.g. a bootstrapped clang is byte identical.
For copy relocation elimination, many groups who prefer not to handle GNU_PROPERTY want to benefit from it as well.
Many assembly files are PIC aware. They should not add new markers to enable optimizations.
OK, ultimately I think I'd prefer to see these things fixed instead of keeping the current state as-is. If you want GNU_PROPERTY, I think it is fine as long as it is optional. For example, I can imagine *BSD/Fuchsia/perhaps other ELF OSes just wanting to get rid of copy relocations/canonical PLT entries without having to deal with GNU_PROPERTY.
Not all GCC binaries are built with --enable-default-pie. Even if they are, they still support -no-pie. R_X86_64_COPY removal should be done for PIE and non-PIE.
R_X86_64_COPY removal on Linux will be done piece by piece. At link-time and run-time, we need to know which .o/.so are R_X86_64_COPY free. We need to track it for both assembly sources as well as high level language sources.
Once (a) HAVE_LD_PIE_COPYRELOC is fixed and (b) x86-64 -fno-pic defaults to -fno-direct-access-external-data, pure C/C++ software will be free of R_X86_64_COPY.
(Note: -fpic/-fpie default to -fno-direct-access-external-data)
What remains is a small number of pieces of software with bad assembly (I think most have good assembly).
The ld warning/error (think of the binutils configure option --enable-textrel-warning=warning) can expose them. Once an ld with the warning/error is prevalent, glibc ld.so can start to warn as well.
At run-time, R_X86_64_COPY on symbol foo in executable is a problem only when the shared library, which
defines foo, doesn't expect R_X86_64_COPY. Before R_X86_64_COPY is completely removed, a marker on such
shared libraries will help ld.so issue an error only when necessary. Otherwise R_X86_64_COPY removal on
Linux may be too difficult to happen.
We also need to make sure that a simple rebuild of a shared object with an updated toolchain does not break ABI because the object is no longer compatible with R_X86_64_COPY relocations in a main program.
Regarding the variable case: How does the proposal change shared object behaviors? I cannot find any which can make executable R_X86_64_COPY incompatible. If folks feel that a GNU PROPERTY is useful for copy relocations, I may not object, but I think it is good to make the function case and STV_PROTECTED fixes separate.
If a rebuilt shared lib that formerly used normal ELF lookup for data object 'foo' now uses local lookup for 'foo', it will break any existing executable that uses COPY for 'foo'. Formerly the copy in the executable prevailed and was used by both the executable and the shared lib. If the shared lib is then updated to use local lookup, it uses the copy in the shared lib while the executable still uses its own copy, so you have two copies and trouble.
So, that cannot be done by default, it's an ABI change.
Remove copy relocation, add canonical function address and optimize locally defined symbol access:
All accesses to protected definitions are local access.
In executable, all accesses to defined symbols are local access.
All global function pointers, whose function bodies aren't locally defined, must use GOT.
All read/write accesses to symbols, which aren't locally defined, must use GOT.
Branches to undefined symbols may use PLT.
So, I agree with all these items, I think. I would even go so far as to say that these are
the intended ELF rules and any deviation from that is actually a bug or at least a quality
of implementation issue.
These should be enforced by
Compiler: Add a compiler option, -fsingle-global-definition
But I don't really see why you would need this? The compiler needs to know if the compilation model
is for an executable or a shared library, but otherwise every item of your list can be inferred by
the compiler right now in such a way that -fsingle-global-definition wouldn't make a difference.
(e.g. if it sees a reference without definition, and currently compiles as shared lib it can't infer
that the definition will be in this component, with or without -fsingle-global-definition).
I think your wish for markers in assembler and linker is over-engineering things, but if you want
to put in the work for that ...
Also, in the linked presentation I still don't see reasons for a compiler flag. Can you spell out the specific
changes that the flag would enable in the compiler, and, for each of these changes also say why you think
that that shouldn't simply be done always, even without the flag?
For instance, I will argue that always emitting GOT access for global undefined data should be done even
without the flag, i.e. we can rely on linker relaxation for performance?
The function call is unaffected and went through the PLT. The two data references are, as expected, indirect via the GOT. Compare that to a build without the -fsingle-global-definition:
The first, absolute address (0x401060) is undecorated in the output, but refers to the PLT slot for QTimer::timeout(QTimer::QPrivateSignal). The QCoreApplication::self symbol appears unmodified, but it was subject to copy relocation: it is accessed directly and its absolute address points into the .bss section, not the .got.
The .o files show a change in relocation too: with -fsingle-global-definition, both are R_X86_64_REX_GOTPCRELX, whereas without they are R_X86_64_32S and R_X86_64_PC32.
Conclusion: this is working really well and is doing exactly what I wanted it to.
I think -fno-direct-access-external-data is clearer and more specific. With -fsingle-global-definition, it isn't really clear to a casual user what it could mean. Does "global" mean STB_GLOBAL or the combination of all components (exe and dso)?
When taking an external function address in -fno-pic code, I suggest -fno-direct-access-external-function (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593). Actually, for many arches I suggest that we just use GOT by default, no need for a toggle.