Investigate using ML to generate data for API inputs

Problem

API Security performs security testing of various API technologies. A common starting point is an API specification document such as an OpenAPI document or GraphQL schema. While these documents provide the names and basic types of input fields, they often do not include any example data. Without realistic example values, API Security often cannot call an API correctly, because the input data it generates fails validation.

Proposal

Investigate using machine learning to help discover correct input values. How would this work? In many cases, a human can look at the name of an input field and guess the expected data format. For example, given fields named address1, address2, address3, city, state, and zip, a human would recognize that together these inputs expect a postal address. Machine learning could potentially be used to make the same inferences.
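The name-based inference described above can be sketched as a small classifier that maps a field name to a semantic type and then to an example value. This is a minimal sketch under stated assumptions, not a proposed implementation: a character-trigram nearest-neighbour lookup stands in for a trained model, and `TRAINING_DATA`, `EXAMPLE_VALUES`, `infer_type`, and `example_inputs` are hypothetical names with illustrative data.

```python
import math
import re
from collections import Counter

# Hypothetical labeled examples: field names mapped to a semantic type.
# A real model would be trained on many field names harvested from API specs.
TRAINING_DATA = {
    "address_line": "street_address",
    "street": "street_address",
    "address": "street_address",
    "city": "city",
    "town": "city",
    "state": "state",
    "province": "state",
    "zip": "postal_code",
    "zipcode": "postal_code",
    "postal_code": "postal_code",
    "email": "email",
    "email_address": "email",
}

# Hypothetical example value to send for each semantic type.
EXAMPLE_VALUES = {
    "street_address": "123 Main St",
    "city": "Springfield",
    "state": "IL",
    "postal_code": "62701",
    "email": "user@example.com",
}


def normalize(name: str) -> str:
    """Split camelCase, drop digits/separators, lowercase: 'address1' -> 'address'."""
    name = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)
    name = re.sub(r"[^A-Za-z]+", " ", name)
    return " ".join(name.lower().split())


def trigrams(text: str) -> Counter:
    """Character trigrams, padded so word boundaries also contribute features."""
    padded = f"  {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def infer_type(field_name: str, threshold: float = 0.5):
    """Return the semantic type of the most similar known field name, or None."""
    features = trigrams(normalize(field_name))
    best_label, best_score = None, 0.0
    for example, label in TRAINING_DATA.items():
        score = cosine(features, trigrams(normalize(example)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


def example_inputs(field_names):
    """Build example values for the fields whose semantic type could be inferred."""
    values = {}
    for name in field_names:
        label = infer_type(name)
        if label is not None:
            values[name] = EXAMPLE_VALUES[label]
    return values


print(example_inputs(["address1", "address2", "address3", "city", "state", "zip"]))
```

Fields that match nothing closely (for example `quantity`) return `None` and are left for other strategies. A lookup like this is cheap enough to run on a CPU; a learned model would replace `infer_type` while keeping the same input and output shape.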

Machine learning could also help discover correct input values by examining data returned by API operations and inferring which returned values map to an input field. This would work by finding conceptually similar fields, comparing their field name, type, and operation name.
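Matching response data to input fields can be sketched as a weighted similarity over the three signals mentioned above: field name, type, and operation name. This is a minimal sketch, assuming token-level Jaccard similarity as a cheap stand-in for a learned similarity model; the `Field` dataclass, `WEIGHTS`, `suggest_value`, and the sample data are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str       # field name, e.g. "userId"
    type: str       # basic type from the API specification, e.g. "integer"
    operation: str  # operation the field belongs to, e.g. "getUser"


# Hypothetical hand-picked weights for each signal; a trained model would learn these.
WEIGHTS = {"name": 0.6, "type": 0.25, "operation": 0.15}


def tokens(identifier: str) -> set:
    """Split camelCase / snake_case identifiers into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier)
    return {t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(wanted: Field, seen: Field) -> float:
    """Weighted similarity of an input field and a field seen in a response."""
    score = WEIGHTS["name"] * jaccard(tokens(wanted.name), tokens(seen.name))
    score += WEIGHTS["type"] * (1.0 if wanted.type == seen.type else 0.0)
    score += WEIGHTS["operation"] * jaccard(tokens(wanted.operation), tokens(seen.operation))
    return score


def suggest_value(wanted: Field, observed, threshold: float = 0.5):
    """Pick a value from observed (Field, value) response pairs for an input field."""
    best_value, best_score = None, 0.0
    for seen, value in observed:
        score = similarity(wanted, seen)
        if score > best_score:
            best_value, best_score = value, score
    return best_value if best_score >= threshold else None


# Values captured from earlier API responses (hypothetical data).
observed = [
    (Field("id", "integer", "createUser"), 42),
    (Field("name", "string", "createUser"), "alice"),
    (Field("id", "integer", "createOrder"), 7),
]
print(suggest_value(Field("userId", "integer", "getUser"), observed))
```

Here both `id` fields match `userId` equally on name and type; the operation-name signal is what favours the value returned by `createUser` over the one from `createOrder`.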

One limitation of this concept is that inference must run on a CPU rather than a GPU or dedicated ML chip, which limits how complex the model can be.