Performance optimization potential / Rust port

First of all, thanks for building and maintaining this library!

Because it makes integration into LibrePCB easier, I created a Rust port of stepreduce: https://github.com/dbrgn/stepreduce-rs

After verifying that its output is identical to that of your C++ version, I benchmarked the two implementations. To my surprise, the Rust version was 3.5x faster than the C++ version, and 5x faster after some further optimization. (The benchmark results are shown in the README file.)

To rule out issues with compilation flags: is this the right way to build the library with the relevant optimizations enabled?

cmake -DCMAKE_BUILD_TYPE=Release . && make

In any case, I tried to find out where the speed difference comes from. Based on some web searches and LLM prompts, I suspect that the main bottleneck is the regex implementation in libstdc++ (std::regex). Replacing it could yield a significant performance gain, if you think it's worth optimizing.
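
To illustrate what such a replacement could look like, here is a minimal sketch in C++17. I'm assuming a pattern along the lines of `#(\d+)=(.*);` for STEP entity lines; I haven't mapped this to the actual expressions used in stepreduce, and the names `EntityLine` and `parse_entity_line` are hypothetical, made up for this example.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical result type (not part of stepreduce): the numeric entity id
// and the text between '=' and the trailing ';'.
struct EntityLine {
    unsigned long id;
    std::string_view body;  // view into the input line, no copy is made
};

// Parse a line shaped like "#<digits>=<body>;" without std::regex.
// Returns std::nullopt if the line does not have that shape.
// Assumption: no whitespace around '#', the id, or '='. Real STEP files
// may need a more lenient parser.
std::optional<EntityLine> parse_entity_line(std::string_view line) {
    // Trim trailing whitespace, including '\r' from CRLF files.
    while (!line.empty() && (line.back() == ' ' || line.back() == '\t' ||
                             line.back() == '\r' || line.back() == '\n')) {
        line.remove_suffix(1);
    }
    // Shortest valid line is "#1=;" (4 characters).
    if (line.size() < 4 || line.front() != '#' || line.back() != ';') {
        return std::nullopt;
    }

    const std::size_t eq = line.find('=');
    if (eq == std::string_view::npos || eq < 2) {
        return std::nullopt;  // no '=' or no digits between '#' and '='
    }

    // The id must consist of digits only, spanning exactly [1, eq).
    unsigned long id = 0;
    const char* first = line.data() + 1;
    const char* last = line.data() + eq;
    const auto [ptr, ec] = std::from_chars(first, last, id);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }

    // Body is everything after '=' up to, but excluding, the final ';'.
    return EntityLine{id, line.substr(eq + 1, line.size() - eq - 2)};
}
```

With something like this, `parse_entity_line("#12=CARTESIAN_POINT('',(0.,0.,0.));")` would yield id 12 and the body `CARTESIAN_POINT('',(0.,0.,0.))`, using only `std::string_view` and `std::from_chars`. Whether this actually helps would of course have to be confirmed with a profiler.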

That's all, so feel free to close this issue. Have a nice weekend! 🙂