Wireshark does not display Arabic, Greek, and some other characters correctly
Summary
Two-byte characters whose UTF-8 representation begins with a byte whose fifth bit (counting from the most significant bit) is 1, that is, (byte & 0x08) == 0x08, are displayed as U+FFFD REPLACEMENT CHARACTER.
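To make the condition above concrete, here is a minimal Python sketch that checks which characters fall into the affected class. The helper name `is_affected` is hypothetical and not part of Wireshark; it only restates the condition from this summary and does not reproduce Wireshark's display code.

```python
def is_affected(ch: str) -> bool:
    """Hypothetical helper: True if `ch` encodes to exactly two UTF-8 bytes
    and the lead byte has bit 0x08 set (the condition described above)."""
    encoded = ch.encode("utf-8")
    return len(encoded) == 2 and (encoded[0] & 0x08) == 0x08


# Arabic alef with hamza, Greek alpha, Latin e-acute, Cyrillic zhe, ASCII A
for ch in ["\u0623", "\u03b1", "\u00e9", "\u0436", "A"]:
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "), is_affected(ch))
```

Under this condition the affected lead bytes are 0xC8-0xCF and 0xD8-0xDF, which covers Arabic (lead bytes 0xD8-0xDB) and Greek (0xCD-0xCF) but not, for example, basic Cyrillic (0xD0-0xD1).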
Steps to reproduce
- Open signaling_trace.pcap
- Look at the SMS text entry in packet 1.
What is the current bug behavior?
- Lots of replacement characters are displayed:
SMS text: ���������� ������������, ���������� ������������, �������������� .txt
What is the expected correct behavior?
- The text أوقات الصلاة, اتجاه القبلة, المساجد .txt (which is also displayed correctly through "Copy -> Value").
Sample capture file
Relevant logs and/or screenshots
Build information
3.5.0 (v3.5.0rc0-321-gc4d19650)
Compiled (64-bit) with Qt 5.15.2, with libpcap, with POSIX capabilities (Linux), with libnl 3, with GLib 2.66.3, with zlib 1.2.11, with SMI 0.4.8, with c-ares 1.17.0, with Lua 5.1.5, with GnuTLS 3.6.15 and PKCS #11 support, with Gcrypt 1.8.7, with MIT Kerberos, with MaxMind DB resolver, with nghttp2 1.41.0, with brotli, with LZ4, with Zstandard, with Snappy, with libxml2 2.9.10, with QtMultimedia, without automatic updates, with SpeexDSP (using system library).
Running on Linux 5.9.9-200.fc33.x86_64, with Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz (with SSE4.2), with 15943 MB of physical memory, with locale en_US.utf8, with light display mode, without HiDPI, with libpcap version 1.9.1 (with TPACKET_V3), with GnuTLS 3.6.15, with Gcrypt 1.8.7, with brotli 1.0.9, with zlib 1.2.11, binary plugins supported (21 loaded).
Built using gcc 10.2.1 20201125 (Red Hat 10.2.1-9).