API Analysis
API Analysis is a system for taking an API specification (OpenAPI/GraphQL) and figuring out how to call each operation/query/mutation (operation) correctly. This includes identifying dependencies between operations. Once that understanding is available, the system provides a mechanism for generating data on request for an operation.

The goal is to give our DAST API and API Fuzzing tools the ability to call each operation under test successfully, multiple times. For example, when testing an operation that deletes data, first call the operation to create the data that will be deleted. This ensures the main code path is correctly tested.

- Two components
  - API Exploration
  - API Oracle

### API Exploration

API Exploration performs the core analysis for gaining an understanding of how to call operations in an API. This component is intended to run as a job in a project pipeline. The component should be able to perform a partial run based on a pre-existing API Exploration artifact and an updated OpenAPI document.

* Tech: Preference for C#, .NET 5
* Input:
  * Required: OpenAPI document or GraphQL schema
  * Optional: API Exploration output artifact
* Output:
  * An artifact that contains an understanding of the API suitable for API Oracle and API Exploration usage
* Docker image that runs as a job in a CI Pipeline
* Can take advantage of multiple CPUs to run faster
* Requires a deployed instance of the API with some amount of pre-populated data
* Will need to support the same authentication methods as DAST API/API Fuzzing. If written in C#, could re-use the existing code.

- Identify create-read-update-delete (CRUD) for each operation
  - Guess at first (GET == Read, POST == Create)
  - Once we can call the operation, verify the CRUD guess
- Identify the shape of each operation input argument
  - Is the input required to be unique?
  - Must the input value already exist?
  - Data type (int64, string, enum, object, etc.)
  - Data format (name, name_with_spaces, credit_card, address, city, state, url, etc.)
  - If it is an id field, what is it an ID of? (possibly a pointer to a set of operations)
- Collect known data for input arguments that cannot be generated
  - For example, a store might have item categories that cannot be created or updated
- Map operation input to operation output across an API
  - This intelligence can be used by the analyzers when performing checks such as persistent XSS
  - Needed to understand the dependencies between operations
- Identify dependencies between operations (have to call operation Y before X)
- Figure out how to call each operation defined by an API
- Ability to perform a differential exploration (identify new/changed operations, and call them)

#### Petstore Example

<details><summary>Click to expand</summary>

Example based on [the Swagger Petstore example API](https://petstore3.swagger.io/#/).

Operations:

- PUT /pet - Update an existing pet
  - Input:
    - id: int
    - name: string
    - category: object
      - id: int
      - name: string
    - tags: array
      - item: object
        - id: int
        - name: string
    - status: enum
- POST /pet - Add a new pet to the store
  - Input:
    - name: string
    - category: object
      - id: int
      - name: string
    - tags: array
      - item: object
        - id: int
        - name: string
    - status: enum
  - Output:
    - id: int
    - name: string
    - category: object
      - id: int
      - name: string
    - tags: array
      - item: object
        - id: int
        - name: string
    - status: enum
- GET /pet/findByStatus - Find Pets by status
  - Input:
    - status: enum (available, pending, sold)
  - Output:
    - array
      - id: int
      - name: string
      - category: object
        - id: int
        - name: string
      - tags: array
        - item: object
          - id: int
          - name: string
      - status: enum
- GET /pet/{petId}
  - Input:
    - petId: int
  - Output:
    - id: int
    - name: string
    - category: object
      - id: int
      - name: string
    - tags: array
      - item: object
        - id: int
        - name: string
    - status: enum
- DELETE /pet/{petId}
  - Input:
    - petId: int

_What would the system figure out:_

1. We know all the input arguments for `GET /pet/findByStatus`
1. `GET /pet/findByStatus` returns input arguments needed to call:
   1. `PUT /pet`
   1. `GET /pet/{petId}`
   1. `DELETE /pet/{petId}`
1. Data returned by `GET /pet/findByStatus` provides the basic shape of the input arguments for:
   1. `PUT /pet`
   1. `POST /pet`
1. The system would call `GET /pet/findByStatus` first, then each operation it understands as a result
1. The system would initially assign CRUD values to each operation based on method type, then verify them once each operation can be called.
   1. Make sure a Create operation creates something
   1. Make sure a Delete operation deletes something
1. The system would create a dependency graph for each operation:
   1. `PUT /pet` depends on data created by `POST /pet`
      1. Needs to know a valid `petId`
      1. Needs to know a valid category id/name
   1. `POST /pet` depends on data returned by `GET /pet/findByStatus`
      1. Needs to know a valid category id/name
   1. `GET /pet/findByStatus` has no dependencies
   1. `GET /pet/{petId}` depends on data created by `POST /pet` or returned by `GET /pet/findByStatus`
   1. `DELETE /pet/{petId}` depends on data created by `POST /pet` or returned by `GET /pet/findByStatus`
1. The system would collect all categories and tags it encounters
1. The system would expand knowledge of argument shapes
   1. Check if strings/names can contain spaces
   1. ???

</details>

### API Oracle

The API Oracle is a library that can be called from the DAST API/API Fuzzing engines. The oracle provides information about an operation in an API and also provides the data needed to call the operation. When an operation has a dependency on another operation, the oracle may call the dependency operation as needed. To support this, a callback may be provided to handle authentication.

* Language: Preference for C#, .NET 5 (must be callable from C#)
* Library (assembly) that can be called from DAST API/API Fuzzing engine
* Input:
  * Required: OpenAPI document or GraphQL schema
  * Required: API Exploration output artifact
* Requires a deployed instance of the API

- Oracle provides the shape of operation arguments
- Oracle provides data needed to correctly call an operation/query/mutation
- Able to call operation/query/mutation dependencies to generate required inputs if needed
- Callback to inject authentication into requests made by Oracle component
- Prefer creating new data over modifying existing data

#### Petstore Example

<details><summary>Click to expand</summary>

Based on the information collected by the API Exploration component, the oracle could be asked to provide data to call an operation.

1. DAST API: I want to call `DELETE /pet/{petId}` 500 times
1. Oracle:
   1. Loop 500 times
      1. Call `POST /pet`
         1. Use known category ids and tag ids
   1. Returns:
      1. CRUD Type: Delete
      1. Shape of `petId`:
         1. Type: int64
         1. Unique
         1. Existing
      1. Array of created `petId`s

</details>
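
The two Petstore examples can be tied together with a small sketch. It is illustrative only: it is written in Python for brevity even though the stated preference is C#, and every name in it (`DEPENDENCIES`, `call_order`, `Oracle`, `prepare_delete_pet`, `PreparedInputs`, `make_fake_petstore`, and the `send` and `add_auth` callbacks) is hypothetical rather than part of any existing component. The sketch assumes that where an operation can depend on either `POST /pet` or `GET /pet/findByStatus`, the graph records `POST /pet`, in line with the preference for creating new data. An in-memory fake stands in for the deployed API.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from itertools import count
from typing import Callable, Dict, List, Set

# Dependency graph from the Petstore example: operation -> operations needed first.
# Assumption: for "POST /pet or GET /pet/findByStatus" alternatives, POST /pet is
# recorded, following the "prefer creating new data" guideline.
DEPENDENCIES: Dict[str, Set[str]] = {
    "GET /pet/findByStatus": set(),
    "POST /pet": {"GET /pet/findByStatus"},
    "PUT /pet": {"POST /pet"},
    "GET /pet/{petId}": {"POST /pet"},
    "DELETE /pet/{petId}": {"POST /pet"},
}


def call_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which each operation follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


@dataclass
class PreparedInputs:
    """What the oracle hands back for an operation under test."""
    crud_type: str
    shape: Dict[str, object]
    pet_ids: List[int] = field(default_factory=list)


class Oracle:
    """Hypothetical oracle: creates the data needed to call an operation."""

    def __init__(
        self,
        send: Callable[[str, dict, dict], dict],
        add_auth: Callable[[dict], None],
        known_category_ids: List[int],
    ) -> None:
        if not known_category_ids:
            raise ValueError("at least one known category id is required")
        self._send = send
        self._add_auth = add_auth
        self._known_category_ids = known_category_ids

    def prepare_delete_pet(self, times: int) -> PreparedInputs:
        """Create `times` pets so DELETE /pet/{petId} can be called that often."""
        result = PreparedInputs(
            crud_type="Delete",
            shape={"type": "int64", "unique": True, "existing": True},
        )
        for i in range(times):
            headers: dict = {}
            self._add_auth(headers)  # caller-supplied authentication callback
            category_id = self._known_category_ids[i % len(self._known_category_ids)]
            body = {"name": f"pet-{i}", "category": {"id": category_id}, "status": "available"}
            created = self._send("POST /pet", body, headers)
            result.pet_ids.append(created["id"])
        return result


def make_fake_petstore() -> Callable[[str, dict, dict], dict]:
    """In-memory stand-in for the deployed API; only POST /pet is supported."""
    ids = count(1)

    def send(operation: str, body: dict, headers: dict) -> dict:
        if operation != "POST /pet":
            raise ValueError(f"unsupported operation: {operation}")
        return {**body, "id": next(ids)}

    return send


if __name__ == "__main__":
    print(call_order(DEPENDENCIES))
    oracle = Oracle(
        send=make_fake_petstore(),
        add_auth=lambda headers: headers.update({"Authorization": "Bearer example"}),
        known_category_ids=[1, 2],
    )
    prepared = oracle.prepare_delete_pet(times=5)
    print(prepared.crud_type, prepared.shape, prepared.pet_ids)
```

A caller such as DAST API would then issue one `DELETE /pet/{petId}` request per entry in `pet_ids`. Passing `send` and `add_auth` in as callbacks keeps the oracle independent of the HTTP stack and of the authentication method in use.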