Time required for ccomps to finish rises exponentially with graph size making it appear non-responsive
I am running ccomps on 16 large all different non-directed graphs (input file size 14–129 MB, 273,700–2,688,198 nodes, 137,257–1,367,285 edges). The number of subgraphs is very close to the number of edges. In all cases ccomps outputs in less than a minute probably almost all subgraphs and then hangs for over an hour spending 100% CPU time. I experimented with smaller graph sizes, and it appears that that time spent by ccomps rises exponentially as the graph size increases. A connected components algorithm based on a BFS has a time complexity of O(|V| + |E|), so it appears that ccomps does something incorrectly after outputting the graph.
| Nodes | Edges | Time |
|---|---|---|
| 3996 | 1998 | 0m0.550s |
| 7997 | 3998 | 0m2.674s |
| 14,987 | 7503 | 0m13.658s |
| 29,927 | 15,004 | 1m15.495s |
| 59,886 | 30,004 | 5m53.489s |
| 284,032 | 142,476 | > 90m |
Steps to reproduce
Run ccomps on the file https://www.balab.aueb.gr/~dds/ec_pairs-12.gv
Expected Behaviour
ccomps should terminate soon after outputting the last subgraph.
Actual Behaviour
ccomps outputs most of the results (probably minus last unflushed buffer contents) and then appears to hang. Deleting most lines from the input reduces the runtime as shown in the above table.
OS Version
Debian, Red Hat Linux, and Windows 11 Cygwin.
Graphviz Version
12.1.0, 8.0.5 (20230508.1809), 2.50.0 (20211204.2007), 2.43.0.
Additional info
The unproductive time appears to be spent in the write.c:write_body edge iteration loop, after all subgraphs have been written. I guess each edge output test iterates over all generated subgraphs, thereby resulting in the exponential complexity. Removing the useless part from the code path, has ccomps finish correctly the processing of the table's last data set (284,032 nodes, 142,476 edges, yielding 141,555 subgraphs) in just 9s. The same speedup can be also achieved (for slightly different output) by running ccomps -x.