|
|
* **sla_latency**: The proposed SLA latency values measured in seconds
|
|
|
* **max_episode_steps**: The number of timesteps within an episode (timesteps are arbitrarily grouped into episodes)
|
|
|
* **app_endpoint**: The external URL that is used to route external requests to our app and stress the application
|
|
|
* **stress_script_name**: The stress script name in the scripts directory used to stress the application
|
|
|
* **prometheus_host**: The URL of the Prometheus endpoint
|
|
|
* **prometheus_latency_metric_name**: The PromQL query for the latency
|
|
|
|
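These parameters live in YAML files under the `marl-k8s/marl_k8s/configs` directory (see the Environment configuration section below). Purely as an illustration of the kinds of values involved, here is a hypothetical configuration expressed as a Python dict; every value is an assumption, not taken from the actual files:

```py
# Hypothetical illustration only: the real configuration is stored in YAML files
# under marl-k8s/marl_k8s/configs, and none of the values below come from them.
example_global_config = {
    "sla_latency": 0.5,               # proposed SLA latency in seconds
    "max_episode_steps": 20,          # timesteps per episode
    "app_endpoint": "http://my-app.example.local/",   # external URL used to stress the app
    "stress_script_name": "stress_app.sh",            # script in the scripts directory
    "prometheus_host": "http://prometheus.example.local:9090",
    "prometheus_latency_metric_name": "my_app_latency_seconds",  # PromQL query / metric name
}
```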
|
|
|
|
|
# Actions definition
|
|
|
The action represents a proposed increase or decrease of the CPU threshold and is one of the following (see the sketch after this list):
|
|
|
* **0**: Decrease the CPU threshold by the threshold step
|
|
|
* **1**: Do nothing
|
|
|
* **2**: Increase the CPU threshold by the threshold step
|
|
|
|
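A minimal sketch of how the three actions above might translate into a threshold update, assuming a `threshold_step` configuration value and hypothetical bounds on the threshold (the names and bounds are illustrative, not taken from the project's code):

```py
def apply_action(current_threshold: float, action: int, threshold_step: float) -> float:
    """Map an agent action (0, 1, 2) to a new CPU threshold (illustrative sketch)."""
    delta = (action - 1) * threshold_step          # 0 -> -step, 1 -> no change, 2 -> +step
    return min(max(current_threshold + delta, 1.0), 100.0)  # keep the threshold within 1-100%
```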
|
|
# Reward definition
|
|
|
The reward has two main parts that are based on:
|
|
|
|
|
|
- **The pods (number of resources)**: The higher the resource usage, the lower the reward the agent receives. The best-case scenario is a single pod, for which the reward is the maximum (100); the worst-case scenario is the maximum number of pods, for which the reward is the minimum (0).
|
|
|
|
|
|
The linear function for our reward therefore is:
|
|
|
```py
Reward = -100 / (max_pod - 1) * num_pods + 100 * max_pod / (max_pod - 1)
```
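As a quick sanity check of the formula, a small helper (names assumed) evaluates it at the two boundary cases: one pod gives the maximum reward of 100 and `max_pod` pods gives the minimum of 0.

```py
def pod_reward(num_pods: int, max_pod: int) -> float:
    # Linear reward from the formula above: 100 at num_pods == 1, 0 at num_pods == max_pod
    return -100 / (max_pod - 1) * num_pods + 100 * max_pod / (max_pod - 1)

assert abs(pod_reward(1, 10) - 100) < 1e-9   # best case: a single pod
assert abs(pod_reward(10, 10)) < 1e-9        # worst case: the maximum number of pods
```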
|
|
|
|
|
|
- **The latency**: Ideally, the closer the application stays to the SLA, the higher the reward. Since we want our agents to preemptively respond to potential SLA breaches, we consider 80% of the SLA latency to be the most rewarded point. In contrast to the pods, the reward here follows an exponential curve around that maximum point.
|
|
|
|
|
|
The reward is calculated based on the following formula:
|
|
|
```py
if current_latency / sla_latency < 0.8:
    Reward = 100 * e^(-0.3 * d * (0.8 - current_latency / sla_latency)^2)

if current_latency / sla_latency > 0.8:
    Reward = 100 * e^(-5 * d * (0.8 - current_latency / sla_latency)^2)
```
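For illustration, the same piecewise formula written as runnable Python, with `e^x` expressed as `math.exp(x)`. The shaping parameter `d` is part of the formula but its value is not given in this section, so it is left as an argument; all names here are assumptions rather than the project's actual code:

```py
import math

def latency_reward(current_latency: float, sla_latency: float, d: float) -> float:
    """Piecewise-exponential reward peaking at 80% of the SLA latency (illustrative sketch)."""
    ratio = current_latency / sla_latency
    if ratio < 0.8:
        # gentle decay while the latency is comfortably below the SLA
        return 100 * math.exp(-0.3 * d * (0.8 - ratio) ** 2)
    # much steeper decay as the latency approaches and exceeds the SLA
    return 100 * math.exp(-5 * d * (0.8 - ratio) ** 2)
```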
|
|
|
|
|
|
Each part of the reward can contribute to the total reward with a different weight. In this case each part contributes equally.
|
|
|
|
|
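Assuming equal weights of 0.5 for both parts, the combination could be sketched as:

```py
def total_reward(pods_part: float, latency_part: float,
                 pods_weight: float = 0.5, latency_weight: float = 0.5) -> float:
    # Weighted sum of the two reward parts; equal weights by default, as described above
    return pods_weight * pods_part + latency_weight * latency_part
```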
|
# Environment execution
|
|
|
Install the environment:
|
|
|
```bash
pip3 install -e marl-k8s
```
|
|
|
Initialize and interact with the RL environment:
|
|
|
```py
import gym
import marl_k8s  # importing the package registers the custom environments with gym

env = gym.make('k8s-parallel-env-v0')

env.reset()

# Action '2' (increase the CPU threshold) for all agents
actions = [2 for agent in env.agents]
results = env.step(actions)

print(results)

env.close()
```
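Building on the snippet above, a rough sketch of a longer interaction loop; it assumes the same list-of-actions interface and samples random actions purely for illustration (the episode length placeholder would normally come from `max_episode_steps`):

```py
import random

import gym
import marl_k8s  # registers the custom environments with gym

env = gym.make('k8s-parallel-env-v0')

for episode in range(3):
    env.reset()
    for step in range(10):  # placeholder length; normally driven by max_episode_steps
        # pick a random action (0, 1 or 2) for every agent
        actions = [random.choice([0, 1, 2]) for agent in env.agents]
        results = env.step(actions)
        print(episode, step, results)

env.close()
```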
|
|
|
|
|
|
# Environment configuration
|
|
|
Currently, the environment configuration can be tweaked through the YAML files inside the `marl-k8s/marl_k8s/configs` directory. The registered environment (when created with the `gym.make()` function) uses the three service app config files inside the above-mentioned directory. Alternatively, an object of the `K8sParallelEnvV0` class can be created directly by providing the global configuration and microservice configuration files, or a new environment can be registered in the same way as in the [init file](https://gitlab.com/netmode/k8s-marl-autoscaler/-/blob/main/marl-k8s/marl_k8s/__init__.py).
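As a hedged illustration of that last alternative, a manual registration could look roughly like the following. The `register()` call is standard gym API, but the entry point module path, the environment id, and the keyword-argument names for the configuration files are placeholders; the actual names are in the init file linked above:

```py
from gym.envs.registration import register

# Hypothetical registration sketch: the entry point module path and the kwargs
# keys below are placeholders, not the actual names used by the project.
register(
    id='k8s-custom-env-v0',
    entry_point='marl_k8s.envs:K8sParallelEnvV0',
    kwargs={
        'global_config_file': 'path/to/global_config.yaml',
        'microservice_config_files': ['path/to/service_a.yaml'],
    },
)
```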
|
|
\ No newline at end of file |