Scheduling control flow - retry to schedule application

**Implement retry with delay in the scheduling control flow for the case when a suitable cluster for the given application is not available yet.**

Proposed workflow

Create app
The scheduler does not match any cluster for the given app (because there is no cluster yet)
The scheduler raises a RESOURCE_NOT_FOUND error (with an error code under 100) and keeps the given app in the PENDING state.

from enum import Enum


class ReasonCode(Enum):
    INTERNAL_ERROR = 1  # Default error

    INVALID_RESOURCE = 10  # Invalid values in the Manifest
    CLUSTER_NOT_REACHABLE = 11  # Connectivity issue with the Kubernetes deployment
    RESOURCE_NOT_FOUND = 12  # Scheduler did not find a suitable resource
    NO_SUITABLE_RESOURCE = 50  # Scheduler issue
    # Codes of 100 and above will cause the controller to delete the resource directly

# scheduler error handler (code snippet)
async def error_handler(self, app, error=None):
    # `reason` is assumed to be derived from `error` earlier in the handler (omitted here)
    if reason.code.value >= 100:
        app.status.state = ApplicationState.DELETING
    elif reason.code is ReasonCode.RESOURCE_NOT_FOUND:
        app.status.state = ApplicationState.PENDING
    else:
        app.status.state = ApplicationState.FAILED

The scheduler loop tries to process the app again (in the PENDING state with the RESOURCE_NOT_FOUND reason), but there is still no suitable cluster for the given app

Only a limited number of retries is permitted

app.status.scheduler_retries -= 1

! The scheduler should store the number of remaining retries in status.scheduler_retries to prevent overloading the queue
- If the number of retries is exceeded, the scheduler raises UnsuitableDeploymentError("No cluster available") and sets the FAILED status on the given app
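
The retry budget described above could be sketched roughly as follows. This is only an illustration under assumptions: `Status` and `consume_retry` are hypothetical stand-ins, not existing Krake code, and the sketch assumes that `scheduler_retries` counts the retries that are still left.

```python
from dataclasses import dataclass
from enum import Enum


class ApplicationState(Enum):
    PENDING = "PENDING"
    FAILED = "FAILED"


class UnsuitableDeploymentError(Exception):
    """Raised when no cluster was found within the retry budget."""


@dataclass
class Status:
    """Hypothetical stand-in for the application status."""

    state: ApplicationState = ApplicationState.PENDING
    scheduler_retries: int = 5  # remaining retries (assumption)


def consume_retry(status: Status) -> None:
    """Spend one scheduling retry, or fail the app if none are left."""
    if status.scheduler_retries <= 0:
        status.state = ApplicationState.FAILED
        raise UnsuitableDeploymentError("No cluster available")
    status.scheduler_retries -= 1
    status.state = ApplicationState.PENDING
```

With the default of 5, the app stays PENDING for five further attempts, and the sixth call marks it FAILED and raises the error.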

kind: Application
api: kubernetes
metadata:
  changed:       # timestamp when spec was changed

status:
  
  scheduled:     # timestamp when app was bound to cluster
  scheduled_to:  # cluster where the app was scheduled to
  running_on:    # cluster where the app is currently running
  scheduler_retries: int = 5  # the number of retries to prevent overloading of scheduler queue

Create a suitable cluster
The scheduler loop tries to process the app again; this time there is a suitable cluster for it
The scheduler sets app.status.scheduled_to for the given app, removes the RESOURCE_NOT_FOUND reason from the app, and sets the proper state for the app
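
The successful path could look roughly like the sketch below. The names `Status` and `bind` are hypothetical, and `SCHEDULED` is only an assumed name for "the proper state"; the proposal itself does not say which state is meant.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ApplicationState(Enum):
    PENDING = "PENDING"
    SCHEDULED = "SCHEDULED"  # assumed name for "the proper state"


@dataclass
class Status:
    """Hypothetical stand-in for the application status."""

    state: ApplicationState = ApplicationState.PENDING
    reason: Optional[str] = None
    scheduled: Optional[datetime] = None
    scheduled_to: Optional[str] = None


def bind(status: Status, cluster: str) -> None:
    """Bind the app to a cluster and clear the pending reason."""
    status.scheduled_to = cluster
    status.scheduled = datetime.now(timezone.utc)  # when the app was bound
    status.reason = None  # drop RESOURCE_NOT_FOUND
    status.state = ApplicationState.SCHEDULED
```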

! Consider the scenario with application cluster constraints and updating the app

! Apply delayed queuing to the scheduler work queue (it is already supported in Krake)
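
To illustrate what delayed queuing means here, the following is a minimal asyncio sketch. `DelayedQueue` is a hypothetical stand-in written for this example and does not reproduce Krake's own work queue or its API.

```python
import asyncio


class DelayedQueue:
    """Hypothetical minimal work queue whose put() accepts a delay."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def put(self, item, delay=0.0):
        """Enqueue the item now, or after `delay` seconds."""
        if delay > 0:
            loop = asyncio.get_running_loop()
            # Schedule the enqueue instead of blocking the caller.
            loop.call_later(delay, self._queue.put_nowait, item)
        else:
            self._queue.put_nowait(item)

    async def get(self):
        """Wait for the next available item."""
        return await self._queue.get()


async def demo():
    queue = DelayedQueue()
    await queue.put("app-later", delay=0.05)  # retried app, re-queued with delay
    await queue.put("app-now")
    return [await queue.get(), await queue.get()]
```

Re-queuing a PENDING app with a delay lets the scheduler handle other work between retries instead of spinning on an app that has no cluster yet.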

Edited Nov 15, 2019 by Matej Feder