display connections
The underlying data has connections between entities. The same holds for groups of entities, so we can compute the number of connections between two facets.
The computation counts either the overlap of two facets or the number of records in a many-to-many table between the indices of content types. Because a facet can represent multiple content types, counting the overlap can involve summing over the content types. Because of this heterogeneity, counting the connections is not trivial and must be implemented separately for different types of nodes.
At least the following different connections exist:
- records in many-to-many tables between main content types
- overlap between facets of the same content type
- connections between single entities of the same type via two many-to-many tables (e.g. co-authors)
- overlap between single entities over their associated entities (works, groups, years)
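The counting strategies listed above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function names, the representation of a facet as a dict from content type to a set of entity ids, and the representation of a many-to-many table as (left_id, right_id) pairs are all assumptions made for the example. The co-author case is approximated here by traversing one hypothetical authorship table twice.

```python
from collections import defaultdict


def facet_overlap(facet_a, facet_b):
    """Count entities shared by two facets, summed over content types.

    Assumed shape: each facet maps a content type name to a set of entity ids.
    """
    shared_types = facet_a.keys() & facet_b.keys()
    return sum(len(facet_a[t] & facet_b[t]) for t in shared_types)


def count_link_records(links, left_ids, right_ids):
    """Count many-to-many rows joining an entity in left_ids to one in right_ids.

    Assumed shape: links is an iterable of (left_id, right_id) rows.
    """
    return sum(1 for left, right in links if left in left_ids and right in right_ids)


def co_occurrence(links):
    """Count, per pair of left-side entities, how many right-side entities they share.

    With (author_id, work_id) rows this yields co-author counts.
    """
    lefts_by_right = defaultdict(set)
    for left, right in links:
        lefts_by_right[right].add(left)

    counts = defaultdict(int)
    for lefts in lefts_by_right.values():
        ordered = sorted(lefts)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                counts[(a, b)] += 1
    return dict(counts)
```

For instance, `co_occurrence` applied to authorship rows returns a dict keyed by sorted author pairs, which could serve directly as weighted edges between single entities.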
Regardless of how they are computed, the connections can be used to lay out the entities. The main benefit and goal is that strongly connected facets end up near each other. This can be done either within a single facet group or across multiple facet groups. The former is simpler and can improve the arrangement of facets in an otherwise group-driven layout. The latter is more complicated for the UI because it must break up facet groups in order to let facets flow near each other across groups.
Regardless of how they are computed, the number of connections can be very high. In the extreme case of a complete graph, the number of edges is (n * (n - 1)) / 2,
where n is the number of nodes. For example, 1,000 nodes give 499,500 possible edges. Therefore we cannot expect the connections to be computed between every node pair on every request.
A solution is to cache the computed connections as stratum edges. The edges are built one by one or in small sets, per client request. The client may want to begin with the (super)nodes that seem to have the most connections.
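A lazy edge cache along these lines might look like the sketch below. The class `EdgeCache`, its methods, and the `compute` callback are hypothetical names introduced for illustration. A real implementation would presumably persist the edges rather than keep them in an in-memory dict.

```python
def max_edges(n):
    """Number of edges in a complete undirected graph with n nodes."""
    return n * (n - 1) // 2


class EdgeCache:
    """Connection counts computed on demand and remembered per node pair."""

    def __init__(self, compute):
        # compute is an assumed callable: (node_a, node_b) -> connection count
        self._compute = compute
        self._edges = {}  # frozenset({a, b}) -> count, so (a, b) equals (b, a)

    def get(self, a, b):
        """Return the cached count for a pair, computing it on first access."""
        key = frozenset((a, b))
        if key not in self._edges:
            self._edges[key] = self._compute(a, b)
        return self._edges[key]

    def get_batch(self, node, others):
        """Edges from one node to a small set of others, as one request might ask."""
        return {other: self.get(node, other) for other in others if other != node}

    def __len__(self):
        return len(self._edges)
```

Only the pairs a client actually asks for are ever computed, so the cache grows with usage instead of with `max_edges(n)`.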
Given that the connections are allowed to affect the layout, it is preferable that the layout stays reasonably stable, because familiarity helps the user keep context. Most force-driven graph layout algorithms involve randomness, so the layout cannot be initialized from scratch client-side. Seedable pseudo-randomness does not help either, because the layout process is chaotic: even a single added edge can completely reshape the final arrangement. Likely the only solution is to build the layout server-side, provide the client with initial coordinates for the nodes, and use a layout algorithm that can start from known initial positions.
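To illustrate what starting from known positions could mean, here is a small deterministic relaxation step. It is a simplified spring model loosely in the style of Fruchterman-Reingold, not a recommendation of a specific algorithm. The function name `relax_layout` and its parameters `k` and `step` are assumptions for the example, and it expects distinct initial positions and edge endpoints that all appear in `positions`.

```python
import math


def _delta(pos, a, b):
    """Vector from b to a and its length (never zero, to avoid division by zero)."""
    dx = pos[a][0] - pos[b][0]
    dy = pos[a][1] - pos[b][1]
    return dx, dy, math.hypot(dx, dy) or 1e-9


def relax_layout(positions, edges, iterations=50, k=1.0, step=0.1):
    """Refine node coordinates from given starting positions, without randomness.

    positions: dict node -> (x, y), e.g. coordinates provided by the server
    edges: dict (node_a, node_b) -> connection count used as attraction weight
    """
    pos = {node: [float(x), float(y)] for node, (x, y) in positions.items()}
    nodes = sorted(pos)  # fixed iteration order keeps the result reproducible
    for _ in range(iterations):
        disp = {node: [0.0, 0.0] for node in nodes}
        # Every pair of nodes repels, more strongly when close together.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy, dist = _delta(pos, a, b)
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Connected nodes attract, scaled by the number of connections.
        for (a, b), weight in edges.items():
            dx, dy, dist = _delta(pos, a, b)
            force = weight * dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Cap each move at `step` so one new edge cannot throw the layout around.
        for node in nodes:
            length = math.hypot(disp[node][0], disp[node][1])
            if length > 0:
                scale = min(length, step) / length
                pos[node][0] += disp[node][0] * scale
                pos[node][1] += disp[node][1] * scale
    return {node: (x, y) for node, (x, y) in pos.items()}
```

Because the function takes the starting coordinates as input and contains no random choices, the same positions and edges always produce the same result, and a small number of iterations keeps nodes close to where the user last saw them.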
In conclusion, the challenges are three-fold:
- computing the connections from heterogeneous data
- solving the conflict between grouped facets and associated facets
- rendering the connections in a force-driven but stable manner