mechanism to maintain current information about node and service availability -- adapter side
This issue depends on nunet-infra#27 (moved), which implements the centralized (for now) registry of services. We want this 'registry' to contain information that is as up to date as possible and, most importantly, to ensure that services in the platform do not call unavailable services. It is advisable to read the description of that issue before going on with this one -- the two issues describe platform components which will work together.
Main functionalities to be implemented:
- Unavailable nodes should be deleted from the table/database as soon as they de-register or their unavailability is discovered;
- The list of nodes and services should contain metadata about each service, or at least a reference to where the metadata can be obtained.
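The two requirements above can be illustrated with a small in-memory table. This is only a sketch under assumptions: the names `RegistryEntry`, `Registry`, `metadata` and `metadata_ref` are hypothetical and do not come from nunet-infra#27; the real registry may store entries quite differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RegistryEntry:
    """One row of the (hypothetical) service table."""
    service_name: str
    endpoint: str
    metadata: Optional[dict] = None      # inline metadata, if available
    metadata_ref: Optional[str] = None   # or a reference to where it can be fetched

    def __post_init__(self) -> None:
        # Each entry must carry metadata or at least a reference to it.
        if self.metadata is None and self.metadata_ref is None:
            raise ValueError("entry needs metadata or a metadata reference")


class Registry:
    """Minimal in-memory table keyed by endpoint (assumption: endpoints are unique)."""

    def __init__(self) -> None:
        self._entries: Dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        self._entries[entry.endpoint] = entry

    def deregister(self, endpoint: str) -> None:
        # Delete immediately on de-registration or discovered unavailability.
        self._entries.pop(endpoint, None)

    def lookup(self, service_name: str) -> List[RegistryEntry]:
        return [e for e in self._entries.values() if e.service_name == service_name]
```

Keying the table by endpoint keeps deletion a single dictionary operation, so an unavailable node disappears from lookups as soon as it is reported.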
Maintaining current information can be done in two ways. It may be that we will have to implement both ways so that they overlap to some extent (this is a decentralized system anyway):
- decentralized option (implemented on adapter side #38 (closed)):
  - the registry of nunet-infra#27 (moved) implements two gRPC calls: callService and returnService;
  - when an adapter asks for the list of available services (by service name), the registry returns an ordered list (how the list is ordered is determined by the registry -- see below);
  - the adapter loops through the list and pings each service in order by asking it to return its metadata;
  - if correct metadata is returned and it indicates that the service is available, the required call is issued;
  - if there is no response within some TTL, the service is marked 'unavailable' and the adapter informs the registry. The registry then decides what to do with that information (it can issue an additional check or simply delete the indicated endpoint from the database/table).
- centralized option (to be implemented on the registry side -- of nunet-infra#27 (moved)):
  - the registry periodically runs health checks on all services in the table/list and updates their metadata;
  - if a service does not respond within a predetermined TTL, the service is deregistered;
  - the list of services is ordered (e.g. by the time it takes for a service to respond to the health check, or by some other criterion).
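One pass of the centralized option could look roughly like the sketch below. It is an illustration under assumptions, not the registry's actual code: `health_check_sweep`, `probe` and the plain `dict` table are hypothetical names, the probe is passed in as a callable so that no particular gRPC API is implied, and ordering by response time is just the example criterion mentioned above.

```python
import time
from typing import Callable, Dict, List, Optional


def health_check_sweep(
    table: Dict[str, dict],
    probe: Callable[[str, float], Optional[dict]],
    ttl_seconds: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Probe every endpoint once; return surviving endpoints, fastest first.

    `probe(endpoint, ttl)` is a hypothetical callable that returns fresh
    metadata, or returns None / raises TimeoutError when there is no response.
    """
    latencies: Dict[str, float] = {}
    for endpoint in list(table):  # copy keys: the table is modified while iterating
        started = clock()
        try:
            metadata = probe(endpoint, ttl_seconds)
        except TimeoutError:
            metadata = None
        elapsed = clock() - started
        if metadata is None or elapsed > ttl_seconds:
            del table[endpoint]  # no response within the TTL: deregister
            continue
        table[endpoint] = metadata  # refresh the stored metadata
        latencies[endpoint] = elapsed
    # Order the list by how long each service took to answer the health check.
    return sorted(latencies, key=lambda e: latencies[e])
```

The registry would call this function on a timer; the returned order is what an adapter would receive when it asks for services by name.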
Both the centralized and the decentralized options could work, but the decentralized one is preferred.
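The adapter-side loop of the decentralized option can be sketched as follows. All names here (`call_first_available`, `get_metadata`, `call_service`, `report_unavailable`, `ServiceMetadata`) are hypothetical stand-ins for whatever the adapter and the registry actually expose; the transport is injected as plain callables. One further assumption: a service that answers but reports itself unavailable is skipped without being reported, since the description only covers the no-response case.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class ServiceMetadata:
    endpoint: str
    available: bool


def call_first_available(
    endpoints: Iterable[str],
    get_metadata: Callable[[str, float], Optional[ServiceMetadata]],
    call_service: Callable[[str], str],
    report_unavailable: Callable[[str], None],
    ttl_seconds: float = 2.0,
) -> Optional[str]:
    """Walk the ordered list from the registry and call the first live service."""
    for endpoint in endpoints:
        try:
            # "Ping" the service by asking it to return its metadata.
            metadata = get_metadata(endpoint, ttl_seconds)
        except TimeoutError:
            metadata = None
        if metadata is None:
            # No response within the TTL: tell the registry, try the next one.
            report_unavailable(endpoint)
            continue
        if not metadata.available:
            # Assumption: responding-but-busy services are skipped, not reported.
            continue
        return call_service(endpoint)
    return None  # nothing in the list could be reached
```

Because the adapter only reports and the registry decides what to do with the report, this flow stays compatible with a registry that also runs its own periodic checks.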