Segfault for short-lived client app using OpenTelemetry with OpenSSL
When running some Tango client apps that complete in a very short time, we noticed that they crash after OpenTelemetry is enabled with traces sent over HTTPS.
A simple example: the first run waits 0.5 seconds before exiting, and the second waits 2.5 seconds. The fast one tends to crash:
$ export TANGO_TELEMETRY_ENABLE=on
$ export TANGO_TELEMETRY_TRACES_EXPORTER=http
$ export TANGO_TELEMETRY_TRACES_ENDPOINT=https://my-telemetry-backend:443/v1/traces
$ python -c "print('starting...'); import tango; import time;
print('creating proxy...');
dp = tango.DeviceProxy('sys/tg_test/1');
print('sleeping...');
time.sleep(0.5);
print('awake')"
starting...
creating proxy...
sleeping...
awake
zsh: segmentation fault (core dumped)
$ python -c "print('starting...'); import tango; import time;
print('creating proxy...');
dp = tango.DeviceProxy('sys/tg_test/1');
print('sleeping...');
time.sleep(2.5);
print('awake')"
starting...
creating proxy...
sleeping...
awake
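Since the fast case only "tends to" crash, it can help to repeat both runs and record how each process ended. The following is a minimal sketch of such a harness, not part of the original report: the helper names (`run_client`, `describe_exit`) and the run count are made up for illustration, and it assumes the `TANGO_TELEMETRY_*` variables above are already exported so the child processes inherit them.

```python
import signal
import subprocess
import sys

# Client body taken from the one-liners above; {delay} is filled in per run.
CLIENT_TEMPLATE = """
import time
import tango
dp = tango.DeviceProxy('sys/tg_test/1')
time.sleep({delay})
"""


def describe_exit(returncode):
    """Return a readable description of a subprocess return code."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_client(source, runs=1):
    """Run `source` in fresh interpreters and return the list of return codes."""
    codes = []
    for _ in range(runs):
        result = subprocess.run([sys.executable, "-c", source], capture_output=True)
        codes.append(result.returncode)
    return codes


def main():
    # Same two delays as in the shell session above; 5 runs each is arbitrary.
    for delay in (0.5, 2.5):
        source = CLIENT_TEMPLATE.format(delay=delay)
        for code in run_client(source, runs=5):
            print(f"sleep {delay}s: {describe_exit(code)}")


if __name__ == "__main__":
    main()
```

A run that hits the problem should be reported as killed by `SIGSEGV`, which makes it easier to compare how often each delay crashes.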
Backtrace, taken from a simple script doing the same thing:
$ gdb --args python test_telem_quick_crash.py
...
(gdb) r
Starting program: /home/user/.conda/envs/tango-10-telemetry/bin/python test_telem_quick_crash.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffd9039700 (LWP 751173)]
[New Thread 0x7fffd8838700 (LWP 751174)]
cpp span
[New Thread 0x7fffd3fff700 (LWP 751175)]
[New Thread 0x7fffd37fe700 (LWP 751176)]
[Thread 0x7fffd37fe700 (LWP 751176) exited]
Thread 4 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd3fff700 (LWP 751175)]
0x00007ffff7bb9e56 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007ffff7bb9e56 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
#1 0x00007fffe6f3643a in CRYPTO_THREAD_read_lock () from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../.././././libcrypto.so.3
#2 0x00007fffe701ce45 in RAND_get_rand_method () from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../.././././libcrypto.so.3
#3 0x00007fffe701d767 in RAND_status () from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../.././././libcrypto.so.3
#4 0x00007fffe6a8b997 in Curl_ossl_ctx_init () from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../../././././libcurl.so.4
#5 0x00007fffe6a8f4df in ossl_connect_common () from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../../././././libcurl.so.4
#6 0x00007fffe6a925ab in ssl_cf_connect () from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../../././././libcurl.so.4
#7 0x00007fffe6a1f2a6 in cf_setup_connect () from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../../././././libcurl.so.4
#8 0x00007fffe6a16f16 in cf_hc_connect () from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../../././././libcurl.so.4
#9 0x00007fffe6a1bd99 in Curl_conn_connect () from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../../././././libcurl.so.4
#10 0x00007fffe6a5c082 in multi_runsingle () from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../../././././libcurl.so.4
#11 0x00007fffe6a5e193 in curl_multi_perform () from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../../././././libcurl.so.4
#12 0x00007fffe7f5ae75 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<opentelemetry::v1::ext::http::client::curl::HttpClient::MaybeSpawnBackgroundThread()::{lambda(opentelemetry::v1::ext::http::client::curl::HttpClient*)#1}, opentelemetry::v1::ext::http::client::curl::HttpClient*> > >::_M_run() ()
from /home/user/.conda/envs/tango-10-telemetry/lib/python3.11/site-packages/tango/../../.././././libopentelemetry_http_client_curl.so
#13 0x00007fffe8bd5b6d in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#14 0x00007ffff7bb51ca in start_thread () from /lib64/libpthread.so.0
#15 0x00007ffff70868d3 in clone () from /lib64/libc.so.6
(I have a fix that corrects some of the clean-up when Python is exiting; MR pending...)