Add Spamcheck anti-spam engine to omnibus-gitlab packages
## Details

Request to include GitLab's [Spamcheck](https://gitlab.com/gitlab-org/spamcheck) anti-spam engine in omnibus-gitlab installations. It also includes a spam classifier, which is an obfuscated Python script along with a TensorFlow model for classification.

* **URL:**
  * Spam Classifier: https://gitlab.com/gitlab-com/gl-security/engineering-and-research/automation-team/ml-spam-detection/spam-classifier
  * Spamcheck: https://gitlab.com/gitlab-org/spamcheck
* **License:**
  * Spam Classifier: Proprietary license, obfuscated code - https://gitlab.com/gitlab-com/gl-security/engineering-and-research/automation-team/ml-spam-detection/spam-classifier/-/blob/main/LICENSE
  * Spamcheck: https://gitlab.com/gitlab-org/spamcheck - MIT license
* **How does it integrate into GitLab (service, built-in feature)?**
  * Both the spam classifier and Spamcheck need to be running. The former listens on a socket. The latter listens on two TCP endpoints (for gRPC and REST connections) so that GitLab Rails can communicate with it.
  * On creating a new issue, GitLab Rails communicates with Spamcheck via gRPC for a verdict on whether the issue is spam or ham.
  * Spamcheck communicates with spam-classifier to classify the incoming issue and returns a verdict to GitLab Rails (`ALLOW`, `BLOCK`, `CONDITIONAL_ALLOW`, `DISALLOW`, `NOOP`)
* **Does it need to run under a specific user or have specific permissions?**: No
* **What are the concerns for running it behind a firewall, proxy, etc?**: It is self-contained, but requires 3 ports (gRPC, REST, and metrics endpoints) to be available.
* **Does it have any additional compilation or runtime requirements beyond what is already used within omnibus?**
  * Spamcheck requires `libtensorflow_lite` for compilation
  * Spam-classifier requires Python 3.9 runtime for execution

## Running (on the same node where GitLab runs)

### Spam-classifier

1. Download and extract the tarball from `https://glsec-spamcheck-ml-artifacts.storage.googleapis.com/spam-classifier/0.2.0/linux.tar.gz`
1. Run the following command

   ```
   python3 dist/preprocess.py
   ```

### Spamcheck (on a different terminal)

1. Clone the spamcheck repo and change to the target directory
1. Ensure the dependencies are present
   1. Golang runtime
   1. `make`
   1. `libtensorflow_lite` - https://www.tensorflow.org/lite/guide/build_cmake
1. Set `GOPATH`

   ```shell
   export GOPATH=${HOME}/go
   ```

1. Update `PATH` to include the Golang binary path

   ```shell
   export PATH="$PATH:$(go env GOPATH)/bin"
   ```

1. Build the binary

   ```shell
   make build
   ```

1. Copy the example config

   ```shell
   cp config/config.toml.example config/config.toml
   ```

1. Change `modelPath` in the config file to point to the `model.tflite` file from the extracted spam-classifier tarball.
1. Run Spamcheck

   ```shell
   make run
   ```

## Testing (on the node where GitLab runs)

### Command line (on a different terminal)

1. Create a file `spam.json` with the following content

   ```json
   {
     "title": "fifa xxx porn stream fifa xxx porn stream",
     "description": "fifa xxx porn stream fifa xxx porn stream",
     "user_in_project": false,
     "project": {
       "project_id": 14,
       "project_path": "spamtest/hello"
     },
     "user": {
       "emails": [{"email": "mr_stupendous@hotmail.com", "verified": true}],
       "username": "MrStupendous",
       "org": "GitLab"
     },
     "created_at": "2021-01-01T10:00:00Z",
     "updated_at": "2021-01-01T11:00:00Z"
   }
   ```

1. Create a file `ham.json` with the following content

   ```json
   {
     "title": "Sign up page not working",
     "description": "Sign up page not working when accessed from mobile",
     "user_in_project": true,
     "project": {
       "project_id": 14,
       "project_path": "spamtest/hello"
     },
     "user": {
       "emails": [{"email": "mr_stupendous@hotmail.com", "verified": true}],
       "username": "MrStupendous",
       "org": "GitLab"
     },
     "created_at": "2021-01-01T10:00:00Z",
     "updated_at": "2021-01-01T11:00:00Z"
   }
   ```

1. Download and install [`grpcurl`](https://github.com/fullstorydev/grpcurl)
1. Run the following commands

   ```shell
   # Pass the spam.json file to the endpoint and see `BLOCK` verdict
   $ grpcurl -plaintext -d "$(cat spam.json)" localhost:8001 spamcheck.SpamcheckService/CheckForSpamIssue

   # Pass the ham.json file to the endpoint and see `ALLOW` verdict
   $ grpcurl -plaintext -d "$(cat ham.json)" localhost:8001 spamcheck.SpamcheckService/CheckForSpamIssue
   ```

### Web UI (on a different terminal)

1. Go to the Admin > Settings > Reporting page in the GitLab instance, and update the external spamcheck settings as follows:
   1. Check the `Enable Spam Check via external API endpoint` checkbox
   1. Use `grpc://localhost:8001` as the URL
   1. Leave the API key empty

   ![image](/uploads/f5f1c27736e03eb9bea5163e1d935c4e/image.png)

1. Create a project in the GitLab instance.
1. As a different user (who is not a member of the project), create an issue in the project with the following text as subject and description: `fifa xxx porn stream fifa xxx porn stream`.
1. See that issue creation has been blocked.
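
The `spam.json` and `ham.json` payloads used in the command-line test differ only in `title`, `description`, and `user_in_project`, so they can be generated rather than written by hand. The following is a minimal sketch, not part of Spamcheck itself: the helper names `build_payload`, `grpcurl_command`, and `write_payloads` are hypothetical, and the field values are copied from the examples in this issue. It only writes the files and assembles the `grpcurl` arguments; it does not contact Spamcheck.

```python
import json
from pathlib import Path

# Fields shared by both sample payloads (values copied from the examples in this issue).
COMMON_FIELDS = {
    "project": {"project_id": 14, "project_path": "spamtest/hello"},
    "user": {
        "emails": [{"email": "mr_stupendous@hotmail.com", "verified": True}],
        "username": "MrStupendous",
        "org": "GitLab",
    },
    "created_at": "2021-01-01T10:00:00Z",
    "updated_at": "2021-01-01T11:00:00Z",
}


def build_payload(title, description, user_in_project):
    """Hypothetical helper: return one issue payload as a dict."""
    return {
        "title": title,
        "description": description,
        "user_in_project": user_in_project,
        **COMMON_FIELDS,
    }


def grpcurl_command(payload, address="localhost:8001"):
    """Return the grpcurl argument list for a payload without running it."""
    return [
        "grpcurl", "-plaintext", "-d", json.dumps(payload),
        address, "spamcheck.SpamcheckService/CheckForSpamIssue",
    ]


SPAM_TEXT = "fifa xxx porn stream fifa xxx porn stream"
PAYLOADS = {
    "spam.json": build_payload(SPAM_TEXT, SPAM_TEXT, False),
    "ham.json": build_payload(
        "Sign up page not working",
        "Sign up page not working when accessed from mobile",
        True,
    ),
}


def write_payloads(directory="."):
    """Write spam.json and ham.json into the given directory."""
    for name, payload in PAYLOADS.items():
        Path(directory, name).write_text(json.dumps(payload, indent=2) + "\n")
```

Calling `write_payloads()` creates both files in the current directory, after which the `grpcurl` commands from the command-line test can be run unchanged. The list returned by `grpcurl_command` can be passed to `subprocess.run` if `grpcurl` is installed and Spamcheck is listening on the assumed address.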