Add Spamcheck anti-spam engine to omnibus-gitlab packages
## Details
Request to include GitLab's [Spamcheck](https://gitlab.com/gitlab-org/spamcheck) anti-spam engine in omnibus-gitlab installations.
The request also covers the spam classifier, an obfuscated Python script shipped with a TensorFlow model for classification.
* **URL:**
  * Spam Classifier: https://gitlab.com/gitlab-com/gl-security/engineering-and-research/automation-team/ml-spam-detection/spam-classifier
  * Spamcheck: https://gitlab.com/gitlab-org/spamcheck
* **License:**
  * Spam Classifier: Proprietary license, obfuscated code - https://gitlab.com/gitlab-com/gl-security/engineering-and-research/automation-team/ml-spam-detection/spam-classifier/-/blob/main/LICENSE
  * Spamcheck: MIT license - https://gitlab.com/gitlab-org/spamcheck
* **How does it integrate into GitLab (service, built-in feature)?**
  * Both spam-classifier and Spamcheck need to be running. The former listens on a socket. The latter listens on two TCP endpoints (gRPC and REST) so that GitLab Rails can communicate with it.
  * When a new issue is created, GitLab Rails asks Spamcheck over gRPC for a verdict on whether the issue is spam or ham.
  * Spamcheck communicates with spam-classifier to classify the incoming issue and returns a verdict to GitLab Rails (`ALLOW`, `BLOCK`, `CONDITIONAL_ALLOW`, `DISALLOW`, or `NOOP`).
* **Does it need to run under a specific user or have specific permissions?**: No
* **What are the concerns for running it behind a firewall, proxy, etc?**: It is self-contained, but requires 3 ports (gRPC, REST, and metrics endpoints) to be available.
* **Does it have any additional compilation or runtime requirements beyond what is already used within omnibus?**
  * Spamcheck requires `libtensorflow_lite` for compilation.
  * Spam-classifier requires the Python 3.9 runtime for execution.
## Running (on the same node where GitLab runs)
### Spam-classifier
1. Download and extract the tarball from `https://glsec-spamcheck-ml-artifacts.storage.googleapis.com/spam-classifier/0.2.0/linux.tar.gz`
1. Run the following command:
```shell
python3 dist/preprocess.py
```
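The download and extraction in step 1 can be scripted. This is a minimal sketch, assuming `curl` and `tar` are installed. The function name `fetch_classifier` and the default destination directory are illustrative and not part of the Spamcheck tooling.

```shell
# Hypothetical helper: download the spam-classifier tarball and extract it.
# $1 = tarball URL
# $2 = destination directory (optional, defaults to ./spam-classifier)
fetch_classifier() {
  url="$1"
  dest="${2:-spam-classifier}"
  mkdir -p "$dest" || return 1
  # -f fails on HTTP errors, -sS hides progress but keeps errors, -L follows redirects
  curl -fsSL "$url" -o "$dest/linux.tar.gz" || return 1
  tar -xzf "$dest/linux.tar.gz" -C "$dest"
}
```

For example, `fetch_classifier https://glsec-spamcheck-ml-artifacts.storage.googleapis.com/spam-classifier/0.2.0/linux.tar.gz` followed by `cd spam-classifier && python3 dist/preprocess.py`, assuming the tarball unpacks a top-level `dist/` directory as the command above suggests.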
### Spamcheck (on a different terminal)
1. Clone spamcheck repo and change to the target directory
1. Ensure the dependencies are present:
   1. Golang runtime
   1. `make`
   1. `libtensorflow_lite` - https://www.tensorflow.org/lite/guide/build_cmake
1. Set `GOPATH`
```shell
export GOPATH=${HOME}/go
```
1. Update `PATH` to include Golang binary path
```shell
export PATH="$PATH:$(go env GOPATH)/bin"
```
1. Build the binary
```shell
make build
```
1. Copy example config
```shell
cp config/config.toml.example config/config.toml
```
1. Change `modelPath` in the config file to point to the `model.tflite` file from the extracted spam-classifier tarball.
1. Run Spamcheck
```shell
make run
```
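The `modelPath` change in the steps above can also be scripted. This is a rough sketch, assuming the copied config contains a line of the form `modelPath = "..."`. The exact layout of `config.toml.example` is not shown here, so check the file before relying on it. `set_model_path` is a hypothetical helper.

```shell
# Hypothetical helper: point modelPath at the extracted model.tflite.
# $1 = config file, $2 = path to model.tflite
# Assumes the key appears (possibly indented) as: modelPath = "..."
# The path must not contain '|' or '&', which are special in the sed expression.
set_model_path() {
  config="$1"
  model="$2"
  # Use | as the sed delimiter because the path contains slashes.
  sed "s|^\([[:space:]]*modelPath[[:space:]]*=\).*|\1 \"$model\"|" "$config" > "$config.tmp" &&
    mv "$config.tmp" "$config"
}
```

Run it before `make run`, for example `set_model_path config/config.toml /path/to/model.tflite`, where the second argument is wherever the tarball was extracted.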
## Testing (on the node where GitLab runs)
### Command line (on a different terminal)
1. Create a file `spam.json` with the following content
```json
{
"title": "fifa xxx porn stream fifa xxx porn stream",
"description": "fifa xxx porn stream fifa xxx porn stream",
"user_in_project": false,
"project": {
"project_id": 14,
"project_path": "spamtest/hello"
},
"user": {
"emails": [{"email": "mr_stupendous@hotmail.com", "verified": true}],
"username": "MrStupendous",
"org": "GitLab"
},
"created_at": "2021-01-01T10:00:00Z",
"updated_at": "2021-01-01T11:00:00Z"
}
```
1. Create a file `ham.json` with the following content
```json
{
"title": "Sign up page not working",
"description": "Sign up page not working when accessed from mobile",
"user_in_project": true,
"project": {
"project_id": 14,
"project_path": "spamtest/hello"
},
"user": {
"emails": [{"email": "mr_stupendous@hotmail.com", "verified": true}],
"username": "MrStupendous",
"org": "GitLab"
},
"created_at": "2021-01-01T10:00:00Z",
"updated_at": "2021-01-01T11:00:00Z"
}
```
1. Download and install [`grpcurl`](https://github.com/fullstorydev/grpcurl)
1. Run the following commands
```shell
# Pass the spam.json file to the endpoint and see `BLOCK` verdict
$ grpcurl -plaintext -d "$(cat spam.json)" localhost:8001 spamcheck.SpamcheckService/CheckForSpamIssue
# Pass the ham.json file to the endpoint and see `ALLOW` verdict
$ grpcurl -plaintext -d "$(cat ham.json)" localhost:8001 spamcheck.SpamcheckService/CheckForSpamIssue
```
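To script this check rather than read the output by eye, the verdict can be pulled out of the response. This is a minimal sketch, assuming `grpcurl` prints a JSON object containing a `"verdict"` field. The response shape is an assumption here, so inspect the real output first. `extract_verdict` is a hypothetical helper.

```shell
# Hypothetical helper: read a grpcurl JSON response on stdin and print the verdict.
# Assumes the response contains a field like: "verdict": "BLOCK"
extract_verdict() {
  sed -n 's/.*"verdict"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p'
}

# Usage (requires Spamcheck listening on localhost:8001):
#   verdict=$(grpcurl -plaintext -d "$(cat spam.json)" localhost:8001 \
#     spamcheck.SpamcheckService/CheckForSpamIssue | extract_verdict)
#   [ "$verdict" = "BLOCK" ] && echo "spam.json was blocked"
```

The helper prints nothing when no `"verdict"` field is present. JSON output from protobuf messages can omit fields that hold their default value, so an empty result is not necessarily an error.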
### Web UI (on a different terminal)
1. Go to Admin > Settings > Reporting page in the GitLab instance, and update the external spamcheck settings as follows:
   1. Check the `Enable Spam Check via external API endpoint` checkbox.
   1. Use `grpc://localhost:8001` as the URL.
   1. Leave the API key field empty.

1. Create a project in the GitLab instance.
1. As a different user (who is not a member of the project) create an issue in the project with the following text as subject and description: `fifa xxx porn stream fifa xxx porn stream`.
1. See that issue creation has been blocked.