Draft: Spike retrieving HTTP messages from the ZAP database to resolve memory issues
What does this MR do?
This MR is a spike to understand whether the HTTP request/response messages for each alert can be retrieved from the ZAP database, and whether doing so has any impact on the memory used by DAST. Only the fields required by DAST are loaded from the database.
What are the relevant issue numbers?
gitlab-org/gitlab#223827 (closed)
Why this change is expected to use less memory
Currently, DAST uses the following process to get each message: all of the alerts are retrieved in one call, then for each alert a separate call is made to ZAP for the associated message.
sequenceDiagram
DAST->>ZAP: GET alerts
ZAP-->>DAST: alerts[]
DAST->>ZAP: GET message/1
ZAP-->>DAST: message(1)
DAST->>ZAP: GET message/2
ZAP-->>DAST: message(2)
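The call pattern in the diagram can be sketched in Python as follows. This is a minimal illustration only: `FakeZap`, `get_alerts`, `get_message`, and the dictionary keys are hypothetical stand-ins, not the real ZAP client API or the DAST code.

```python
from typing import Dict, List


class FakeZap:
    """Hypothetical stand-in for a ZAP API client; it only counts round trips."""

    def __init__(self, alert_count: int) -> None:
        self.calls = 0
        self._alert_count = alert_count

    def get_alerts(self) -> List[Dict]:
        self.calls += 1
        return [{"alert_id": i, "message_id": i}
                for i in range(1, self._alert_count + 1)]

    def get_message(self, message_id: int) -> Dict:
        self.calls += 1
        # A real message would carry the full request and response bodies.
        return {"id": message_id, "request_body": "...", "response_body": "..."}


def collect_alerts_with_messages(zap: FakeZap) -> List[Dict]:
    alerts = zap.get_alerts()  # one call for all alerts
    for alert in alerts:
        # One extra call per alert; the full message is attached to the alert.
        alert["message"] = zap.get_message(alert["message_id"])
    return alerts  # every alert and every full message is now held at once


zap = FakeZap(alert_count=3)
alerts = collect_alerts_with_messages(zap)
print(zap.calls)  # 1 alerts call + 3 message calls
```

With N alerts this pattern makes N + 1 calls and keeps all N messages, bodies included, in memory together.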
This leads to the following spike in Python memory usage at the end of a scan:
There are a few issues this MR attempts to solve:
- Reduce the number of API calls back and forth between DAST and ZAP. Messages should be able to be retrieved in batches.
- Each message returned from ZAP contains the entire request body and response body. DAST doesn't expose HTTP bodies, so these only make the problem worse. Consider a single response body that is 100 MB. It is loaded from the database into ZAP memory, returned in the API response, and then loaded into Python memory. This is a lot of wasted memory.
- All of the alerts are held in memory at the same time in Python. This means that all of the associated HTTP messages (and their bodies) are held in memory at the same time.
- (Not confirmed) When the Java process requests memory from the OS to expand the heap, it will not give the memory back to the OS until the process has finished. The same applies to the Python process. This means that garbage collection isn't that helpful; we need to ensure that we don't use too much memory in the first place.
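The first three points can be illustrated with a small sketch: fetch messages in batches, keep only the fields DAST needs, and yield results one at a time instead of building one large list. Everything named here is an assumption made for illustration: `REQUIRED_FIELDS`, the field names, `fetch_batch`, and the batch size are hypothetical and are not taken from this MR's code. Ideally the batch fetch itself would select only the required columns from the database, so that bodies never leave ZAP's storage; the sketch drops them on the Python side only to stay self-contained.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical set of fields DAST needs; HTTP bodies are deliberately excluded.
REQUIRED_FIELDS = ("id", "request_header", "response_header")


def batched(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` IDs."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def iter_messages(
    message_ids: Iterable[int],
    fetch_batch: Callable[[List[int]], List[Dict]],
    batch_size: int = 100,
) -> Iterator[Dict]:
    """Yield trimmed messages lazily, so at most one batch is held at a time."""
    for batch in batched(message_ids, batch_size):
        for message in fetch_batch(batch):
            yield {field: message[field] for field in REQUIRED_FIELDS}


# Demo with a fake fetcher that records how many batch calls were made.
calls: List[List[int]] = []


def fake_fetch_batch(ids: List[int]) -> List[Dict]:
    calls.append(list(ids))
    return [
        {
            "id": i,
            "request_header": "GET / HTTP/1.1",
            "response_header": "HTTP/1.1 200 OK",
            "response_body": "x" * 10,  # stands in for a large body
        }
        for i in ids
    ]


messages = list(iter_messages(range(1, 6), fake_fetch_batch, batch_size=2))
print(len(calls), len(messages))  # 3 batch calls for 5 messages
```

The demo collects the generator into a list only to inspect the result. A consumer that wants the memory benefit would process each yielded message and discard it before requesting the next.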
Does this MR meet the acceptance criteria?
- Changelog entry added
- Documentation created/updated for GitLab EE, if necessary
- Documentation created/updated for this project, if necessary
- Documentation reviewed by technical writer or follow-up review issue created
- Tests added for this feature/bug
- Job definition updated, if necessary
  - Job definition example
  - Vendored CI Templates (also in CE)
- Conforms to the code review guidelines
- Conforms to the Go guidelines
- Security reports checked/validated by reviewer
Edited by Cameron Swords