nodejs-scan should work with directories that contain symlinks

Problem to solve

nodejs-scan v3.1.0 fails to scan any repository in which a compatible file (for example, a .js file) is a symlink, as in this project layout:

nodejs-scan-debugging on  main [?] via 
❯ ls -la
total 32
drwxr-xr-x   8 james  staff   256 29 Jul 15:26 .
drwxr-xr-x  24 james  staff   768 29 Jul 15:17 ..
drwxr-xr-x  12 james  staff   384 28 Jul 18:17 .git
-rw-r--r--   1 james  staff   532 28 Jul 11:50 main.js
lrwxr-xr-x   1 james  staff     7 28 Jul 11:50 main_sym.js -> main.js

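To reproduce the failure locally, the following Python sketch recreates the layout above in a fresh directory. It is only a sketch. The helper name make_symlink_fixture is hypothetical, and the contents of main.js are a placeholder because the original 532-byte file is not included in this report.

```python
import os
from pathlib import Path


def make_symlink_fixture(root: Path) -> Path:
    """Create a tiny project containing main.js and a relative symlink to it.

    Hypothetical helper for reproducing this issue. The main.js content is a
    placeholder; the original file is not part of the report.
    """
    project = root / "nodejs-scan-debugging"
    project.mkdir(parents=True, exist_ok=True)
    (project / "main.js").write_text("console.log('hello');\n")
    # Relative link, mirroring `main_sym.js -> main.js` in the listing above.
    os.symlink("main.js", project / "main_sym.js")
    return project
```

Calling make_symlink_fixture(Path(tempfile.mkdtemp())) and pointing analyzer-run at the returned directory should reproduce the crash. Running `ln -s main.js main_sym.js` by hand produces the same layout.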
The analyser crashes with the following output:

❯ analyzer-build && analyzer-run ../../tests/nodejs-scan-debugging
tag: nodejs-scan:master
[+] Building 1.2s (15/15) FINISHED
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 37B  0.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 2B  0.0s
 => [internal] load metadata for docker.io/library/python:3.10-alpine  1.1s
 => [internal] load metadata for docker.io/library/golang:1.17-alpine  1.1s
 => [internal] load build context  0.0s
 => => transferring context: 3.48kB  0.0s
 => [stage-1 1/5] FROM docker.io/library/python:3.10-alpine@sha256:a746f64081fca7d6368935750ffcbf04d447cb0131408c60cbf1a4392981890a  0.0s
 => [build 1/4] FROM docker.io/library/golang:1.17-alpine@sha256:844031724987d525bd99857b3b8c00f99ff003241afdc5d1ee121d81eb4b8301  0.0s
 => => resolve docker.io/library/golang:1.17-alpine@sha256:844031724987d525bd99857b3b8c00f99ff003241afdc5d1ee121d81eb4b8301  0.0s
 => CACHED [stage-1 2/5] RUN pip install ruamel.yaml==0.16.12 njsscan==0.3.1  0.0s
 => CACHED [stage-1 3/5] RUN apk --no-cache add git ca-certificates gcc libc-dev  0.0s
 => CACHED [build 2/4] WORKDIR /go/src/app  0.0s
 => CACHED [build 3/4] COPY . .  0.0s
 => CACHED [build 4/4] RUN CHANGELOG_VERSION=$(grep -m 1 '^## v.*$' "CHANGELOG.md" | sed 's/## v//') &&         PATH_TO_MODULE=`go list -m` &&         go build -ldflags="-X '$PATH_  0.0s
 => CACHED [stage-1 4/5] COPY --chown=root:root --from=build /go/src/app/analyzer /  0.0s
 => CACHED [stage-1 5/5] COPY .njsscan .njsscan  0.0s
 => exporting to image  0.0s
 => => exporting layers  0.0s
 => => writing image sha256:389d1d5e1db1bf345b01e7294b84393eff14b7e6b048811865bd10a7b79019b1  0.0s
 => => naming to docker.io/library/nodejs-scan:master  0.0s
image: nodejs-scan:master
[INFO] [NodeJsScan] [2022-07-28T01:50:52Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/command@v1.8.0/command.go:76] ▶ GitLab NodeJsScan analyzer v3.1.0
[INFO] [NodeJsScan] [2022-07-28T01:50:52Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/command@v1.8.0/run.go:125] ▶ Detecting project
[INFO] [NodeJsScan] [2022-07-28T01:50:52Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/command@v1.8.0/run.go:147] ▶ Found relevant files in project, analyzing entire repository
[INFO] [NodeJsScan] [2022-07-28T01:50:52Z] [/go/pkg/mod/gitlab.com/gitlab-org/security-products/analyzers/command@v1.8.0/run.go:159] ▶ Running analyzer
[DEBU] [NodeJsScan] [2022-07-28T01:50:52Z] [/go/src/app/loadRuleset.go:21] ▶ /tmp/app/.gitlab/sast-ruleset.toml not found, ruleset support will be disabled.
[DEBU] [NodeJsScan] [2022-07-28T01:50:53Z] [/go/src/app/analyze.go:40] ▶ /usr/local/bin/njsscan --config .njsscan --json --output /tmp/njsscan.json /tmp/app
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/semgrep/semgrep_main.py", line 336, in main
    target_manager = TargetManager(
  File "<attrs generated init semgrep.target_manager.TargetManager>", line 24, in __init__
  File "/usr/local/lib/python3.10/site-packages/semgrep/target_manager.py", line 483, in __attrs_post_init__
    self.targets = [
  File "/usr/local/lib/python3.10/site-packages/semgrep/target_manager.py", line 484, in <listcomp>
    Target(
  File "<attrs generated init semgrep.target_manager.Target>", line 7, in __init__
  File "/usr/local/lib/python3.10/site-packages/semgrep/target_manager.py", line 338, in validate_path
    raise FilesNotFoundError(paths=tuple([value]))
semgrep.error.FilesNotFoundError: File not found: /tmp/app/main_sym.js

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/njsscan", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/njsscan/__main__.py", line 77, in main
    ).scan()
  File "/usr/local/lib/python3.10/site-packages/njsscan/njsscan.py", line 44, in scan
    result = scanner.scan()
  File "/usr/local/lib/python3.10/site-packages/libsast/scanner.py", line 65, in scan
    self.options).scan(valid_paths)
  File "/usr/local/lib/python3.10/site-packages/libsast/core_sgrep/semantic_sgrep.py", line 40, in scan
    sgrep_out = invoke_semgrep(paths, self.scan_rules)
  File "/usr/local/lib/python3.10/site-packages/libsast/core_sgrep/helpers.py", line 50, in invoke_semgrep
    ) = semgrep_main.main(
  File "/usr/local/lib/python3.10/site-packages/semgrep/semgrep_main.py", line 347, in main
    raise SemgrepError(e)
semgrep.error.SemgrepError: File not found: /tmp/app/main_sym.js

[FATA] [NodeJsScan] [2022-07-28T01:50:53Z] [/go/src/app/main.go:28] ▶ open /tmp/njsscan.json: no such file or directory

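Note the semgrep.error.SemgrepError: File not found: /tmp/app/main_sym.js exception. semgrep reports the symlink as missing even though its target, main.js, sits in the same directory.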
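To check whether another repository is likely to hit the same crash, a quick Python sketch like the following lists every symlink under a project root. The function name find_symlinks is hypothetical. It deliberately flags all symlinks rather than only the files njsscan would scan, because this report does not pin down exactly which paths semgrep rejects.

```python
import os


def find_symlinks(root: str) -> list[str]:
    """Return paths (relative to root) of every symlink under root, skipping .git."""
    hits = []
    # os.walk does not follow symlinked directories by default (followlinks=False),
    # but it still lists them in dirnames, so they are checked below.
    for dirpath, dirnames, filenames in os.walk(root):
        # Don't descend into Git metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                hits.append(os.path.relpath(full, root))
    return sorted(hits)
```

Run against the layout above, this should report only main_sym.js.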

This problem was surprising because the underlying scanner (njsscan) had not been upgraded between the v3.0.0 and v3.1.0 releases of our analyser, so we weren't expecting any change in scanner behaviour. Further investigation revealed that semgrep is installed as a transitive dependency of njsscan, via a library called libsast. njsscan does not pin libsast to a specific version, so rebuilding the analyser's Docker image can pull in a newer libsast release. That is exactly what happened here.

  • v3.0.0 of our nodejs-scan analyser was built with libsast 1.5.0, which pulls in semgrep 0.80.0
  • v3.1.0 of our nodejs-scan analyser was built with libsast 1.5.2, which pulls in semgrep 0.104.0

Because the v3.1.0 Docker image was built recently and the upstream scanner doesn't pin libsast, the rebuild also silently upgraded semgrep.

semgrep 0.80.0 simply filtered out “invalid” target files and carried on, whereas 0.104.0 raises an exception (FilesNotFoundError, re-raised as SemgrepError) that aborts the whole scan.
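The difference between the two semgrep versions can be illustrated with a toy model. This is not semgrep's actual code. The rule for what counts as “invalid” is an assumption here (it simply rejects symlinks, matching what this report observed), and FilesNotFoundError below is a local stand-in for semgrep.error.FilesNotFoundError.

```python
from pathlib import Path
from typing import Callable, Iterable, List


class FilesNotFoundError(Exception):
    """Local stand-in for semgrep.error.FilesNotFoundError (illustration only)."""


def is_valid_target(path: Path) -> bool:
    # Assumed validity rule for this illustration: a regular file that is not a symlink.
    return path.is_file() and not path.is_symlink()


def filter_targets(paths: Iterable[Path],
                   valid: Callable[[Path], bool] = is_valid_target) -> List[Path]:
    """Old-style behaviour: silently drop invalid targets and keep going."""
    return [p for p in paths if valid(p)]


def validate_targets(paths: Iterable[Path],
                     valid: Callable[[Path], bool] = is_valid_target) -> List[Path]:
    """New-style behaviour: abort on the first invalid target."""
    checked = []
    for p in paths:
        if not valid(p):
            raise FilesNotFoundError(f"File not found: {p}")
        checked.append(p)
    return checked
```

In this model the old behaviour scans main.js and skips main_sym.js. The new behaviour aborts the whole run, so njsscan never writes /tmp/njsscan.json and the analyser fails with the FATA line shown above.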

Proposal

An upstream issue has been filed: https://github.com/ajinabraham/njsscan/issues/99

In the meantime, we can pin libsast to 1.5.0 in the analyser's Dockerfile, which should restore the old, working behaviour.
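Because the root cause is an unpinned transitive dependency, a smoke check that runs inside the built image could fail fast if libsast or semgrep drift again. This is a minimal sketch that assumes Python 3.8+ for importlib.metadata. check_pins is a hypothetical helper name, and the expected versions are the ones listed above for the working v3.0.0 image.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List

# Known-good versions from the v3.0.0 image, as listed in this issue.
EXPECTED = {"libsast": "1.5.0", "semgrep": "0.80.0"}


def check_pins(expected: Dict[str, str],
               lookup: Callable[[str], str] = version) -> List[str]:
    """Return one message per package whose installed version differs from expected."""
    problems = []
    for name, want in expected.items():
        try:
            got = lookup(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {want})")
            continue
        if got != want:
            problems.append(f"{name}: installed {got}, expected {want}")
    return problems
```

Calling check_pins(EXPECTED) in a build or test step, and failing when the returned list is non-empty, should have flagged the silent libsast 1.5.2 / semgrep 0.104.0 upgrade before the v3.1.0 image shipped.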

References