compiler: function inline failure
Some functions fail to get inlined despite being explicitly declaimed inline. The steps to reproduce were reported by Fade. The benchmark also shows other performance bottlenecks, but the failure to inline these two functions appears to be a bug.
> (ql:quickload 'ironclad)
;>compile and load this file
(in-package cl-user)

(defparameter *what-we-already-have* (make-hash-table :test #'equal))

(defparameter *aggregate-storage-directory*
  (ensure-directories-exist
   (merge-pathnames
    (format nil "Pobrane/series/Flashpoint/")
    (user-homedir-pathname))))

(defun sha1-file (path)
  (let ((sha1 (ironclad:make-digest 'ironclad:sha1)))
    (with-open-file (stream path :element-type '(unsigned-byte 8))
      (ironclad:update-digest sha1 stream)
      (values (ironclad:byte-array-to-hex-string (ironclad:produce-digest sha1)) path))))

(defun load-pool (&key (path *aggregate-storage-directory*))
  ;; for every file in the given directory (path), calculate its hash
  ;; and put it in the special table
  (loop for file in (uiop:directory-files path)
        for (hash path) = (multiple-value-list (sha1-file file))
        :do (progn
              (format t "~&~A :: ~A" hash path)
              (setf (gethash hash *what-we-already-have*) path)
              (format t " [Done]"))))
I profiled the run with perf, which yielded the following result (entries contributing less than 1% are not included in the screenshot):
ecl_function_dispatch is a known bottleneck (implementing fast gf dispatch is planned after the next release, which would be a huge win). _pthread_getspecific is interesting too, but it is probably mostly about managing the environments, which are separate for each thread.
What caught my attention are two functions, L20rol32 and L15mod32_, both of which are among the top five. Looking into the ironclad source code, we see:
(declaim #+ironclad-fast-mod32-arithmetic (inline rol32 ror32)
         (ftype (function ((unsigned-byte 32) (unsigned-byte 5)) (unsigned-byte 32)) rol32 ror32))
(defun rol32 (a s)
  (declare (type (unsigned-byte 32) a) (type (integer 0 32) s))
  #+(and ccl x86-64)
  (ccl::rol32 a s)
  #+cmu
  (kernel:32bit-logical-or #+little-endian (kernel:shift-towards-end a s)
                           #+big-endian (kernel:shift-towards-start a s)
                           (ash a (- s 32)))
  #+ecl
  (ffi:c-inline (a s)
                (:uint32-t :uint8-t)
                :uint32-t
                "(#0 << #1) | (#0 >> (32 - #1))"
                :one-liner t
                :side-effects nil)
  #+sbcl
  (sb-rotate-byte:rotate-byte s (byte 32 0) a)
  #-(or (and ccl x86-64) cmu ecl sbcl)
  (logior (ldb (byte 32 0) (ash a s)) (ash a (- s 32))))
and
(declaim #+ironclad-fast-mod32-arithmetic (inline mod32+)
         (ftype (function ((unsigned-byte 32) (unsigned-byte 32)) (unsigned-byte 32)) mod32+))
(defun mod32+ (a b)
  (declare (type (unsigned-byte 32) a b))
  #+ecl
  (ffi:c-inline (a b)
                (:uint32-t :uint32-t)
                :uint32-t
                "#0 + #1"
                :one-liner t
                :side-effects nil)
  #+(and ccl x86-64)
  (ccl::mod32+ a b)
  #-(or ecl (and ccl x86-64))
  (ldb (byte 32 0) (+ a b)))
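To make the intended semantics of the two functions concrete, here is a minimal sketch built from the portable fallback branches of the two listings above. The names ref-rol32 and ref-mod32+ are hypothetical, chosen only so the sketch does not clash with ironclad's own definitions; the bodies are the `#-` branches, not the ECL `ffi:c-inline` ones.

```lisp
;; Portable reference versions, taken from the fallback (#-) branches above.
;; The REF- prefix is hypothetical and not part of ironclad.
(defun ref-rol32 (a s)
  "Rotate the 32-bit value A left by S bits."
  (declare (type (unsigned-byte 32) a) (type (integer 0 32) s))
  (logior (ldb (byte 32 0) (ash a s))   ; low 32 bits of the left shift
          (ash a (- s 32))))            ; bits that wrapped around the top

(defun ref-mod32+ (a b)
  "Add A and B modulo 2^32."
  (declare (type (unsigned-byte 32) a b))
  (ldb (byte 32 0) (+ a b)))

(ref-rol32 #x80000001 1)   ; => 3, the top bit wraps around to bit 0
(ref-rol32 #x12345678 8)   ; => #x34567812
(ref-mod32+ #xFFFFFFFF 1)  ; => 0, the sum wraps around
```

Both bodies are a single expression over fixed-width integers, which is why they look like natural inlining candidates.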
*features* contains IRONCLAD-FAST-MOD32-ARITHMETIC, and both functions are clear inlining candidates (they are even declaimed inline!), yet they are still compiled into separate functions.
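As a sanity check that the feature really controls the inline declaration, the following sketch reads the rol32 declaim form from a string with and without the feature in `*features*`. The helper name read-declaim is hypothetical; it uses only the standard reader, so it shows what the compiler is handed, not what ECL then does with it.

```lisp
;; READ-DECLAIM is a hypothetical helper for illustration only.
(defun read-declaim (feature-present-p)
  "Read ironclad's DECLAIM form with or without the feature in *FEATURES*."
  (let ((*features* (if feature-present-p
                        (adjoin :ironclad-fast-mod32-arithmetic *features*)
                        (remove :ironclad-fast-mod32-arithmetic *features*)))
        (*package* (find-package :cl-user)))
    (read-from-string
     "(declaim #+ironclad-fast-mod32-arithmetic (inline rol32 ror32)
               (ftype (function ((unsigned-byte 32) (unsigned-byte 5))
                                (unsigned-byte 32))
                      rol32 ror32))")))

(second (read-declaim t))          ; => (INLINE ROL32 ROR32)
(first (second (read-declaim nil))) ; => FTYPE, the inline spec was skipped
```

With the feature present, as in the `*features*` list reported below, the reader hands the compiler an explicit inline declamation, so the missing inlining cannot be attributed to the reader conditional.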
;;; newest ecl-develop as of 2018-09-19
CL-USER> (lisp-implementation-version)
"16.1.3"
CL-USER> (ext:lisp-implementation-vcs-id)
"0476bca326ede723e883fe06c19c2cb748b1deec"
CL-USER> (software-type)
"Linux"
CL-USER> (software-version)
"4.15.0-34-generic"
CL-USER> (machine-type)
"x86_64"
CL-USER> *features*
(:IRONCLAD-FAST-MOD64-ARITHMETIC :IRONCLAD-FAST-MOD32-ARITHMETIC :SWANK
:SERVE-EVENT :PROFILE :QUICKLISP :ASDF-PACKAGE-SYSTEM :ASDF3.1 :ASDF3 :ASDF2
:ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :WALKER :CDR-1 :CDR-5
:LINUX :FORMATTER :CDR-7 :ECL-WEAK-HASH :LITTLE-ENDIAN :ECL-READ-WRITE-LOCK
:LONG-LONG :UINT64-T :UINT32-T :UINT16-T :RELATIVE-PACKAGE-NAMES :LONG-FLOAT
:UNICODE :DFFI :CLOS-STREAMS :CMU-FORMAT :UNIX :ECL-PDE :DLOPEN :CLOS :THREADS
:BOEHM-GC :ANSI-CL :COMMON-LISP :IEEE-FLOATING-POINT :CDR-14 :PREFIXED-API
:FFI :X86_64 :COMMON :ECL)
Some additional notes.