Optimize memory usage of object-relational mapping in DAST
Problem to solve
The merged MR gitlab-org/security-products/dast!239 changes how ZAP HTTP messages are retrieved: they are now read directly from the ZAP database instead of through the ZAP API. This change was made to reduce the memory required to run a DAST scan.
This introduces the need to map results returned from the ZAP database into Python types. Currently, results from the ZAP database are returned as Python-wrapped Java types, which are not substitutable for native Python types. For example, jpype.JString cannot be used in place of a standard Python string.
Currently, a naive implementation converts every value returned from the database into a Python string. This could be improved by converting each value to the matching Python type. For example, when a database query returns a JInteger, storing it as an int rather than a string is more memory-efficient, and it is also more conceptually correct.
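A minimal sketch of type-specific conversion follows. It picks a converter from each column's declared type instead of always calling str(). It assumes the column types are read with JDBC's ResultSetMetaData.getColumnType(), which returns java.sql.Types codes. It also assumes the JPype-wrapped values accept Python's int(), float() and str() conversions. CONVERTERS and convert_row are hypothetical names and are not part of the DAST codebase.

```python
from typing import Any, Callable, Dict, List, Sequence

# java.sql.Types codes, as returned by ResultSetMetaData.getColumnType().
INTEGER, BIGINT, SMALLINT = 4, -5, 5
DOUBLE = 8
VARCHAR, CHAR, LONGVARCHAR = 12, 1, -1

# Hypothetical table: JDBC column type -> Python converter.
# Assumption: wrapped JPype values accept int()/float()/str().
CONVERTERS: Dict[int, Callable[[Any], Any]] = {
    INTEGER: int,
    BIGINT: int,
    SMALLINT: int,
    DOUBLE: float,
    VARCHAR: str,
    CHAR: str,
    LONGVARCHAR: str,
}


def convert_row(values: Sequence[Any], column_types: Sequence[int]) -> List[Any]:
    """Convert one row of raw database values to native Python types.

    SQL NULL (None) is kept as None. Column types without a registered
    converter fall back to str(), matching the current naive behaviour.
    """
    row: List[Any] = []
    for value, column_type in zip(values, column_types):
        if value is None:
            row.append(None)
        else:
            row.append(CONVERTERS.get(column_type, str)(value))
    return row
```

For example, convert_row(["42", "GET", None], [INTEGER, VARCHAR, VARCHAR]) returns [42, 'GET', None]. Because the choice is made per column rather than by inspecting each value, every row in a result set is converted consistently.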
Impact
On 64-bit CPython, it takes less memory to store an integer than the equivalent string:
>>> import sys
>>> sys.getsizeof('123456789')
58
>>> sys.getsizeof(int('123456789'))
28
However, the difference is about 30 bytes per value, so it would take a very large number of strings for this to make a material difference to how much memory ZAP uses.
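The sketch below puts the per-value saving in perspective. It totals sys.getsizeof() over one million distinct nine-digit identifiers, once held as strings and once as ints. The identifiers are made up for illustration. The exact figure depends on the CPython version and platform, so no output is shown. sys.getsizeof() measures only each object itself, and the list holding the values costs the same pointer per element either way.

```python
import sys
from typing import Iterable


def total_size(values: Iterable[object]) -> int:
    """Sum sys.getsizeof() over the values (the container itself is excluded)."""
    return sum(sys.getsizeof(v) for v in values)


# One million distinct nine-digit identifiers (illustrative data only).
ids = range(100_000_000, 101_000_000)
as_strings = [str(i) for i in ids]
as_ints = list(ids)

saved = total_size(as_strings) - total_size(as_ints)
print(f"{saved / 1024 / 1024:.1f} MiB saved across {len(as_ints):,} values")
```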
Potentially more important is that engineers get the appropriate type straight away, without needing a conversion step in the first place. It may be worth investigating a Python ORM tool to run the queries and handle the mapping to our internal objects.
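A full ORM such as SQLAlchemy would generate the queries and the mapping itself, and it would still need checking whether a given ORM can work with the ZAP database. The sketch below only shows the intended end state: the driver returns native types, and each row maps straight onto an internal object. It uses Python's built-in sqlite3 as a stand-in for the ZAP database. The history table, its columns and the HttpMessage class are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HttpMessage:
    """Hypothetical internal object for one stored HTTP message."""
    history_id: int
    method: str
    uri: str
    status_code: int


def fetch_messages(conn: sqlite3.Connection) -> List[HttpMessage]:
    # sqlite3 returns INTEGER columns as int and TEXT columns as str, so
    # each row maps straight onto the dataclass with no conversion step.
    rows = conn.execute(
        "SELECT history_id, method, uri, status_code FROM history ORDER BY history_id"
    )
    return [HttpMessage(*row) for row in rows]


# In-memory stand-in database with a hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history ("
    "history_id INTEGER PRIMARY KEY, method TEXT, uri TEXT, status_code INTEGER)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [(1, "GET", "https://example.com/", 200), (2, "POST", "https://example.com/login", 302)],
)

messages = fetch_messages(conn)
for message in messages:
    print(message)
```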