Capture Polaris metadata when generating graph.json
As outlined in !102 (closed), we want to capture various bits of metadata when running polaris. The immediate use case is putting this metadata into graph.json when running polaris learn, but the same metadata should be available to other polaris steps.
How will it be saved?
The metadata should be in the graph.json file in its own node:
{
"metadata": { [stuff goes here] },
"nodes": { [rest as normal] }
}
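A minimal sketch of how that layout could be produced in Python, assuming the nodes are already available as a dictionary. The function name build_graph_document and the sample values are hypothetical, used only for illustration:

```python
import json


def build_graph_document(metadata, nodes):
    """Wrap the metadata and the usual nodes into one top-level dictionary."""
    return {"metadata": metadata, "nodes": nodes}


# Placeholder values for illustration only; real content comes from polaris learn.
document = build_graph_document(
    metadata={"satellite_name": "ExampleSat"},
    nodes={"example_node": {}},
)

# Serialize with indentation so the file stays readable by humans.
serialized = json.dumps(document, indent=2)
```

Keeping the metadata under its own top-level key means existing readers that only look at "nodes" should keep working unchanged.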
What metadata do we want?
The requirements were discussed in !102 (closed). Here's what we agreed on:
- date of generation of the graph
- satellite name: the same name that is callable from polaris fetch
- start date / end date of the processed data
- information on the creator/originator of the file (so we know if it's a human or an automated process)
- Execution information:
  - command line or details of the call (polaris learn...)
  - path to the mlflow logs OR some reduced information about all the models' hyperparameters (eventually add that to each node in the graph; maybe not in the metadata scope)
  - how long it took to process (to make one think twice before erasing it)
Optional, good to have eventually:
- number of nodes (for optimization)
- number of edges (for optimization)
- checksum of the file without the checksum line (weird)
- (... not an exhaustive list)
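As a rough sketch of how these fields could be gathered, here is one possible shape in Python. All field names (generated_at, data_start, and so on) and both helper functions are assumptions for illustration, not an agreed schema. The checksum is computed over the document with the checksum entry removed, which is one way to handle the "checksum without the checksum line" item:

```python
import hashlib
import json
import sys
from datetime import datetime, timezone


def collect_metadata(satellite_name, data_start, data_end, creator, duration_seconds):
    """Gather the agreed metadata fields into a plain dictionary."""
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "satellite_name": satellite_name,
        "data_start": data_start,
        "data_end": data_end,
        "creator": creator,
        # Command line of the current process, e.g. the polaris learn call.
        "command_line": " ".join(sys.argv),
        "duration_seconds": duration_seconds,
    }


def add_checksum(document):
    """Return a copy of the document with a SHA-256 checksum in its metadata.

    Any existing checksum entry is dropped before hashing, so the digest
    never depends on itself.
    """
    metadata = {k: v for k, v in document["metadata"].items() if k != "checksum"}
    unsigned = {**document, "metadata": metadata}
    # Sorted keys and fixed separators give a stable byte representation.
    canonical = json.dumps(unsigned, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**unsigned, "metadata": {**metadata, "checksum": digest}}
```

Verifying a file would then amount to running add_checksum on the loaded document and comparing the result with what was read.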
How will it be coded?
- Let's add a Graph class which manages the creation of a graph file.
  - The code in !102 (closed), which was meant to spark discussion, added a PolarisHeatmap class. However, PolarisHeatmap is not an explicit name for dealing with graph files.
  - HeatMap is one possible input out of many to create a graph. As such, graph outputs deserve their own class (potential other inputs: several other graph files, a graph database, a networkx Graph object, any kind of linked data service...)
- As suggested in #56, there is an interesting central use for the PolarisMetadata class.
- The class file should live in the polaris/learn/data directory. If it becomes crowded, we can consider moving the data folder to just polaris/data.
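To make the proposal a bit more concrete, here is a minimal sketch of what a Graph class holding a PolarisMetadata instance might look like. The fields on PolarisMetadata and the to_json/save methods are assumptions for discussion only; neither !102 nor #56 fixes this interface:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PolarisMetadata:
    """Hypothetical metadata container; the real field set is still open."""
    satellite_name: str = ""
    data_start: str = ""
    data_end: str = ""


@dataclass
class Graph:
    """Owns the graph content and knows how to write the graph file."""
    metadata: PolarisMetadata = field(default_factory=PolarisMetadata)
    nodes: dict = field(default_factory=dict)

    def to_json(self):
        """Serialize metadata and nodes using the layout proposed above."""
        document = {"metadata": asdict(self.metadata), "nodes": self.nodes}
        return json.dumps(document, indent=2)

    def save(self, path):
        """Write the serialized graph to the given file path."""
        with open(path, "w", encoding="utf-8") as graph_file:
            graph_file.write(self.to_json())
```

With this split, a heatmap (or any other input) would only need to fill in nodes, while the metadata handling stays in one place and can be reused by other polaris steps.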