
FFI and non-ASCII characters

These are three related issues, discovered while developing some handwritten FFI bindings.

base-char corruption across FFI boundary with :cstring

I have this straightforward binding:

(ffi:def-function ("_DrawText" draw-text-raw)
    ((text :cstring)
     (pos-x :int)
     (pos-y :int)
     (font-size :int)
     (color (* color-raw)))
  :returning :void)

(declaim (ftype (function (string fixnum fixnum fixnum color) null) draw-text))
(defun draw-text (text pos-x pos-y font-size color)
  (draw-text-raw text pos-x pos-y font-size (color-pointer color)))

It calls the following C wrapper:

void _DrawText(const char *text, int posX, int posY, int fontSize,
               Color *color) {
  Color stack = *color;
  DrawText(text, posX, posY, fontSize, stack);
}

where the inner DrawText comes from Raylib. Under ECL, passing the string Café corrupts the é; if I print the received text to standard output from within this C wrapper, it shows:

Caf\351

If I print out each character and its associated int code separately, we see:

C: 67
a: 97
f: 102
\351: -23
\0: 0

Note that the char-code of #\é is 233. Thus -23 looks suspiciously like the result of 233 wrapping around in an 8-bit signed char somewhere (233 - 256 = -23). If you instead create a string literal "Café" in C and print its contents in the same way, you see:

C: 67
a: 97
f: 102
\303: -61
\251: -87
\0: 0

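So it seems that :cstring only correctly handles characters that are also standard-char. The é does fit in a single 8-bit char, but it arrives as its raw code (233, the Latin-1 byte) rather than as the two-byte UTF-8 sequence that the C literal contains. If this is intended, then at least it should be documented.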
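The negative numbers in both listings are consistent with each byte being read through a signed 8-bit char. A minimal C sketch of that arithmetic follows. The helper name as_signed_byte is hypothetical and is not part of ECL or Raylib.

```c
/* Hypothetical helper: the value a byte takes when viewed as a signed
   8-bit char. Written with plain arithmetic so the result does not
   depend on implementation-defined narrowing conversions. */
int as_signed_byte(unsigned int code)
{
    unsigned int byte = code & 0xFFu;   /* keep only the low 8 bits */
    return byte >= 128u ? (int)byte - 256 : (int)byte;
}
```

With this, 233 (the char-code of #\é, octal \351) maps to -23, while 0xC3 and 0xA9 (octal \303 and \251) map to -61 and -87, matching the two listings. ASCII values such as 67 are unchanged.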

Condition signalled when passing CJK characters

While debugging all this, I reviewed all of CL's character and string types.

[Image: string-and-chars]

For ECL we see that #\a, #\é, and #\涅 live in separate character categories. If we pass 涅槃 instead of Café to the code above, an interesting condition is signalled:

Cannot coerce string 涅槃 to a base-string

This implies that the FFI logic expects at least a string of base-char, although we've already demonstrated that a true base-char value such as #\é gets corrupted.
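For comparison, the bytes that a C string literal holds for these characters are their UTF-8 encoding, so a conversion that worked for both #\é and CJK characters would need to encode each code point roughly as sketched below. This is not ECL's code. The helper utf8_encode is hypothetical, and whether ECL should perform such a conversion for :cstring is exactly the open question of this report.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into
   `out`, which must have room for 4 bytes. Returns the number of bytes
   written, or 0 for values that are not encodable code points
   (surrogates, or anything above 0x10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80UL) {                       /* ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {                      /* two bytes, e.g. U+00E9 */
        out[0] = (unsigned char)(0xC0UL | (cp >> 6));
        out[1] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 2;
    }
    if (cp < 0x10000UL) {                    /* three bytes, e.g. CJK */
        if (cp >= 0xD800UL && cp <= 0xDFFFUL)
            return 0;                        /* surrogates are invalid */
        out[0] = (unsigned char)(0xE0UL | (cp >> 12));
        out[1] = (unsigned char)(0x80UL | ((cp >> 6) & 0x3FUL));
        out[2] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {                  /* four bytes */
        out[0] = (unsigned char)(0xF0UL | (cp >> 18));
        out[1] = (unsigned char)(0x80UL | ((cp >> 12) & 0x3FUL));
        out[2] = (unsigned char)(0x80UL | ((cp >> 6) & 0x3FUL));
        out[3] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 4;
    }
    return 0;
}
```

Code point 233 (#\é) yields the two bytes 0xC3 0xA9, the same \303 \251 pair that the C literal printed earlier. Code points above 0x7FF, which include the CJK range, need three bytes each, so they cannot fit in a one-byte-per-character base-string at all.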

Potentially distantly related: https://github.com/clasp-developers/clasp/issues/1595

ffi:with-foreign-string oddly truncates

A related set of bindings:

(ffi:def-function ("_DrawTextEx" draw-text-ex-raw)
    ((font (* font-raw))
     (text (* :char))
     (position (* vector2-raw))
     (font-size :float)
     (spacing :float)
     (tint (* color-raw)))
  :returning :void)

(declaim (ftype (function (font string vector2 real real color)) draw-text-ex))
(defun draw-text-ex (font text position font-size spacing tint)
  "Draw text using a `font' and additional parameters."
  (ffi:with-foreign-string (ctext text)
    (draw-text-ex-raw (font-pointer font)
                      ctext
                      (vector2-pointer position)
                      (float font-size)
                      (float spacing)
                      (color-pointer tint))))

Here I attempt to use (* :char) instead, converting the Lisp string to a foreign string manually via ffi:with-foreign-string. However, the C side only receives:

C: 67
\0: 0

It seems that only the first character of the original string survives. This occurs whether or not the original string consists entirely of standard-char.

Here's the relevant function within ECL:

(defun convert-to-foreign-string (string-designator)
  "Syntax: (convert-to-foreign-string string-designator)

Converts a Lisp string to a foreign string. Memory should be freed
with free-foreign-object."
  (let ((lisp-string (string string-designator))
        (foreign-type '(* :char)))
    (c-inline (lisp-string foreign-type) (t t) t
       "{
        cl_object lisp_string = #0;
        cl_index size = lisp_string->base_string.fillp;
        cl_object output = ecl_allocate_foreign_data(#1, size+1);
        memcpy(output->foreign.data, lisp_string->base_string.self, size);
        output->foreign.data[size] = '\\0';
        @(return) = output;
        }"
        :one-liner nil
        :side-effects t)
    ))

Looking at the embedded C, it might be a mistake to rely on base_string.fillp (the fill pointer, I'm assuming), since it may not be meaningful if the string is a non-adjustable simple-string with a NIL fill pointer. Note that this particular line was written in 2006! Quite an old bug 😄
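Another reading of the same embedded C, offered only as a hypothesis: if the Lisp string is stored with 32-bit characters rather than one byte per character (an assumption about ECL's internal representation that this report does not confirm), then a memcpy of `size` bytes copies just the first character plus its zero padding. The standalone sketch below imitates that situation. The function copy_as_bytes is a hypothetical stand-in and does not use any ECL API.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the embedded C above: treats a buffer of
   32-bit characters as if it held one byte per character, copying
   `size` BYTES (not characters) and appending a terminator.
   The caller must free() the result. Returns NULL if malloc fails. */
char *copy_as_bytes(const uint32_t *chars, size_t size)
{
    char *out = malloc(size + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, chars, size);   /* copies size bytes, i.e. size/4 characters */
    out[size] = '\0';
    return out;
}
```

On a little-endian machine such as the one in this report, calling this with the four characters of Café (codes 67, 97, 102, 233) and a size of 4 copies the bytes 67, 0, 0, 0, which C reads as the one-character string "C". That matches the truncated output above, but it remains a guess until checked against ECL's actual string layout.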

Thank you for reading this report and for any guidance you can offer.


     VERSION "24.5.10"
      VCS-ID "c0720610ddd12d709508c99f25ac56bf8c73e0a2"
          OS "Linux"
  OS-VERSION "6.15.6-arch1-1"
MACHINE-TYPE "x86_64"
    FEATURES (:SLYNK :SERVE-EVENT :ASDF-PACKAGE-SYSTEM :ASDF3.1 :ASDF3 :ASDF2
              :ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :WALKER
              :CDR-6 :GRAY-STREAMS-MODULE :CDR-1 :CDR-5 :LINUX :FORMATTER
              :CDR-7 :ECL-WEAK-HASH :LITTLE-ENDIAN :ECL-READ-WRITE-LOCK
              :LONG-LONG :UINT64-T :UINT32-T :UINT16-T :COMPLEX-FLOAT
              :LONG-FLOAT :UNICODE :CLOS-STREAMS :CMU-FORMAT :UNIX :ECL-PDE
              :DLOPEN :CLOS :THREADS :BOEHM-GC :ANSI-CL :COMMON-LISP
              :FLOATING-POINT-EXCEPTIONS :IEEE-FLOATING-POINT
              :PACKAGE-LOCAL-NICKNAMES :CDR-14 :PREFIXED-API :FFI :X86_64
              :COMMON :ECL)
Edited by Colin Woodbury