Provide isolation in image integration test examples

Problem to solve

Copy fixtures dir before running image tests (#352464 - closed) added support for copying the entire fixtures directory each time the image integration test suite is executed. This change provides isolation of fixture files between separate runs of the entire test suite, however, it does not provide isolation for each example.

To understand the impact of this, let's assume we have the following image_spec.rb:

context "with go-modules" do
  let(:project) { "go-modules/default" }

  it_behaves_like "successful scan"
end

The above test causes the analyzer to be executed once and populates the fixtures directory with the following generated artifacts and reports:

.
└── tmp/
    └── qa-63192/
        └── fixtures/
            └── go-modules/
                └── default/
                    ├── -rw-r--r--  1 adam  staff   413 May 19 16:14 go.mod                                // fixture file
                    ├── -rw-r--r--  1 adam  staff  2278 May 19 16:14 go.sum                                // fixture file
                    ├── -rw-r--r--  1 adam  staff  4213 May 19 16:15 cyclonedx-go-go.json                  // generated artifact
                    ├── -rw-r--r--  1 adam  staff   132 May 19 16:15 sbom-manifest.json                    // generated artifact
                    └── -rw-r--r--  1 adam  staff 11418 May 19 16:15 gl-dependency-scanning-report.json    // generated report

The gl-dependency-scanning-report.json in the above has a size of 11418 bytes and contains both vulnerabilities and dependency_files:

Click to expand
{
  "version": "14.0.4",
  "vulnerabilities": [
    {
      "id": "2ed8c35d5b30adb3a8cf1de0ee6e46453dcfea98d8e308b43de35415c5972980",
      "category": "dependency_scanning",
      "name": "Allocation of File Descriptors or Handles Without Limits or Throttling",
      "details": {
        "vulnerable_package": {
          "type": "text",
          "name": "Vulnerable Package",
          "value": "github.com/minio/minio:v0.0.0-20180419184637-5a16671f721f"
        }
      }
    },
  ],
  "dependency_files": [
    {
      "path": "go.sum",
      "package_manager": "go",
      "dependencies": [
        {
          "package": {
            "name": "github.com/astaxie/beego"
          },
          "version": "v1.10.0"
        }
      ]
    }
  ],
  "scan": {
    "scanner": {
      "id": "gemnasium",
      "name": "Gemnasium",
      "url": "https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium",
      "vendor": {
        "name": "GitLab"
      },
      "version": "2.38.0"
    },
    "type": "dependency_scanning",
    "start_time": "2022-05-23T08:10:01",
    "end_time": "2022-05-23T08:10:04",
    "status": "success"
  }
}

If we update the above spec to include a second context as follows, and run the test suite again:

context "with go-modules" do
  let(:project) { "go-modules/default" }

  context "when not using DS_EXCLUDED_PATHS" do
    it_behaves_like "successful scan"
  end

  context "when using DS_EXCLUDED_PATHS" do
    let(:variables) { { "DS_EXCLUDED_PATHS": "/go.sum" } }

    it_behaves_like "successful scan"
  end
end

This causes the analyzer to be run twice, once when DS_EXCLUDED_PATHS is not configured, and once when DS_EXCLUDED_PATHS is configured. An unfortunate side effect of executing two tests is that the second test will overwrite the gl-dependency-scanning-report.json from the first test, which we can see by looking at the file size for the gl-dependency-scanning-report.json report in the fixtures directory and noticing that when we ran a single test, it was 11418 bytes, but now that we're running two tests, it's only 445 bytes:

.
└── tmp/
    └── qa-7854/
        └── fixtures/
            └── go-modules/
                └── default/
                    ├── -rw-r--r--  1 adam  staff   413 May 19 16:22 go.mod                                // fixture file      
                    ├── -rw-r--r--  1 adam  staff  2278 May 19 16:22 go.sum                                // fixture file      
                    ├── -rw-r--r--  1 adam  staff  4213 May 19 16:23 cyclonedx-go-go.json                  // generated artifact
                    ├── -rw-r--r--  1 adam  staff   132 May 19 16:23 sbom-manifest.json                    // generated artifact
                    └── -rw-r--r--  1 adam  staff   445 May 19 16:23 gl-dependency-scanning-report.json    // generated report  

Also, the gl-dependency-scanning-report.json in the above does not contain any vulnerabilities or dependency_files:

{
  "version": "14.0.4",
  "vulnerabilities": [],
  "scan": {
    "scanner": {
      "id": "gemnasium",
      "name": "Gemnasium",
      "url": "https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium",
      "vendor": {
        "name": "GitLab"
      },
      "version": "2.38.0"
    },
    "type": "dependency_scanning",
    "start_time": "2022-05-23T08:15:05",
    "end_time": "2022-05-23T08:15:10",
    "status": "success"
  }
}

What's happening is the second context "when using DS_EXCLUDED_PATHS" in the above test has overwritten the gl-dependency-scanning-report.json report produced by the first test in context "when not using DS_EXCLUDED_PATHS".

Proposal

In order to provide test isolation and prevent tests from interfering with each other, we can change the name of top level directory that represents the entire test suite to test-* instead of qa-*, and then each time the analyzer is executed, we copy over only the fixture files for the specific test being run, and name it after the rspec example it represents, such as running-image-with-test-project-with-go-modules-behaves-like-successful-scan-creates-a-report:

.
└── tmp/
    └── test-63192/
        └── go-modules/
            └── default/
                ├── running-image-with-test-project-with-go-modules-behaves-like-successful-scan-creates-a-report/
                │   ├── go.mod
                │   ├── go.sum
                │   ├── cyclonedx-go-go.json
                │   ├── sbom-manifest.json
                │   └── gl-dependency-scanning-report.json
                └── running-image-with-test-project-with-go-modules-when-excluding-go.sum-with-ds_excluded_paths-behaves-like-successful-scan-creates-a-report/
                    ├── go.mod
                    ├── go.sum
                    ├── cyclonedx-go-go.json
                    ├── sbom-manifest.json
                    └── gl-dependency-scanning-report.json

I believe we can obtain the rspec example description by adding some additional code to the image_spec.rb, as follows:

description = ""
RSpec.configure do |config|
  config.before :example do |x|
    description = x.metadata[:full_description].downcase.gsub(" ", "-")
  end
end

describe "running image" do
  let(:fixtures_dir) { File.expand_path("../qa/fixtures", __dir__) }

  context "with test project" do
    let(:scan) do
      GitlabSecure::IntegrationTest::DockerRunner.run_with_cache(
        image_name, target_dir, description,
        command: command,
        script: script,
        offline: offline,
        variables: global_vars.merge(variables))
    end

    let(:report) { scan.report }

    context "with go-modules" do
      let(:project) { "go-modules/default" }

      it_behaves_like "successful scan"

      context "when excluding go.sum with DS_EXCLUDED_PATHS" do
        let(:variables) { { "DS_EXCLUDED_PATHS": "/go.sum" } }

        it_behaves_like "successful scan"
      end
    end
  end
end

This would have the following behaviour:

  1. Tests won't interfere with each other.
  2. Each test fixture is named after the rspec example it represents, making it easy to figure out which fixture directory corresponds to which test.
  3. We'll end up producing a complete copy of the qa directory for each test. For gemnasium v3, this is currently 5.6MB and there are 31 individual contexts so we'd end up creating around 31*5.6MB = 173MB of temporary files each time the spec is run, for gemnasium alone.

I'm sure we can optimize the process and reduce the amount of space required by not copying over the expectation directory each time, perhaps we could even copy over only the exact fixtures directory related to the test being run.

Implementation Plan

  1. Update the Fixtures class:

    • Add new arguments to the initialize method:

      -      def initialize(original_target_dir)
      +      def initialize(relative_fixtures_dir, original_target_dir, description)
    • Place fixture files in tmp/test-* instead of tmp/qa-*

    • Copy the fixtures for the specific test from the original qa/fixtures/<path/to/test> directory to a directory named after the rspec example. For example, let's assume we start with the following project structure:

      context "with go-modules" do
        let(:project) { "go-modules/default" }
      
        it_behaves_like "successful scan"
      
        it_behaves_like "recorded report" do
          let(:recorded_report) { parse_expected_report(project) }
        end
      
        context "when excluding go.sum with DS_EXCLUDED_PATHS" do
          let(:variables) { { "DS_EXCLUDED_PATHS": "/go.sum" } }
      
          it_behaves_like "successful scan"
        end
      end

      This will now produce two separate fixture directories, named after each rspec context they represent:

      .
      └── gemnasium/
          ├── qa/
          │   ├── scripts
          │   ├── expect
          │   └── fixtures/
          │       └── go-modules/
          │           └── default/
          │               ├── go.mod
          │               └── go.sum
          └── tmp/
              └── test-63192/
                  └── go-modules/
                      └── default/
                          ├── running-image-with-test-project-with-go-modules-behaves-like-successful-scan-creates-a-report/
                          │   ├── go.mod
                          │   ├── go.sum
                          │   ├── cyclonedx-go-go.json
                          │   ├── sbom-manifest.json
                          │   └── gl-dependency-scanning-report.json
                          └── running-image-with-test-project-with-go-modules-when-excluding-go.sum-with-ds_excluded_paths-behaves-like-successful-scan-creates-a-report/
                              ├── go.mod
                              ├── go.sum
                              ├── cyclonedx-go-go.json
                              ├── sbom-manifest.json
                              └── gl-dependency-scanning-report.json
    • Add a new relative_expectation_dir argument to the assert_files_match_expectations method, which will be passed from any spec that uses the recorded CycloneDX files or recorded SBOM manifest shared examples

    • Update unit tests in the integration-test project to test the new behaviour

    • Update run_with_cache in the DockerRunner class to accept new arguments:

      • fixtures_dir
      • expectations_dir
      • description

      This has been implemented by: Provide isolation in image integration test exa... (gitlab-org/security-products/analyzers/integration-test!29 - merged).

  2. Update the following image spec files:

    These spec files need to be updated to set let(:relative_expectation_dir) { project } for any SBOM Manifest or CycloneDX SBOMs contexts. They also need to pass a description to DockerRunner.run_with_cache containing the rspec example name.

    This has been implemented by Provide isolation in image integration test exa... (gitlab-org/security-products/analyzers/gemnasium!340 - merged).

/cc @fcatteau @gonzoyumo

Edited by Adam Cohen