Add Spamcheck anti-spam engine to omnibus-gitlab packages
Details
Request to include GitLab's Spamcheck anti-spam engine in omnibus-gitlab installations
It also includes a spam classifier, which is an obfuscated Python script along with a TensorFlow model for classification.
-
URL:
-
License:
- Spam Classifier: Proprietary license, obfuscated code - https://gitlab.com/gitlab-com/gl-security/engineering-and-research/automation-team/ml-spam-detection/spam-classifier/-/blob/main/LICENSE
- Spamcheck: https://gitlab.com/gitlab-org/spamcheck - MIT license
-
How does it integrate into GitLab (service, built-in feature)?
- Both spam-classifier and Spamcheck need to be running. The former listens on a socket. The latter listens on two TCP endpoints (for gRPC and REST connections) so that GitLab Rails can communicate with it.
- On creating a new issue, GitLab Rails will communicate with Spamcheck via gRPC for a verdict on whether the issue is spam or ham.
- Spamcheck communicates with spam-classifier to classify the incoming issue and returns a verdict to GitLab Rails (ALLOW, BLOCK, CONDITIONAL_ALLOW, DISALLOW, NOOP).
-
Does it need to run under a specific user or have specific permissions?: No
-
What are the concerns for running it behind a firewall, proxy, etc.?: It is self-contained, but requires three ports (the gRPC, REST, and metrics endpoints) to be available.
-
Does it have any additional compilation or runtime requirements beyond what is already used within omnibus?
- Spamcheck requires libtensorflow_lite for compilation
- Spam-classifier requires the Python 3.9 runtime for execution
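Since the firewall answer above depends on the listening ports being reachable, a quick probe can confirm this on the node. The following is a minimal Python sketch and not part of Spamcheck itself. The gRPC port 8001 is taken from the testing steps in this issue; the REST and metrics port numbers are not stated here, so they are left for the reader to add from their own config.toml. The helper names are hypothetical.

```python
import socket

# gRPC port 8001 comes from the testing steps in this issue.
# Add "rest" and "metrics" entries with the ports from your own config.toml.
SPAMCHECK_PORTS = {"grpc": 8001}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports: dict) -> dict:
    """Map each endpoint name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `check_ports("localhost", SPAMCHECK_PORTS)` after starting Spamcheck returns a dictionary such as `{"grpc": True}` when the endpoint is reachable.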
Running (on the same node where GitLab runs)
Spam-classifier
-
Download and extract the tarball from
https://glsec-spamcheck-ml-artifacts.storage.googleapis.com/spam-classifier/0.2.0/linux.tar.gz
-
Run the following command
python3 dist/preprocess.py
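The download-and-extract step above can also be scripted. This is a minimal sketch using only the Python standard library; the function names and the target directory are illustrative assumptions, and only the tarball URL comes from this issue.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the step above.
TARBALL_URL = (
    "https://glsec-spamcheck-ml-artifacts.storage.googleapis.com"
    "/spam-classifier/0.2.0/linux.tar.gz"
)


def download(url: str, dest: Path) -> Path:
    """Stream the file at url to dest and return dest."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def extract(tarball: Path, target_dir: Path) -> list:
    """Extract a .tar.gz into target_dir and return the member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support safer extraction filters.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
    return names
```

For example, `extract(download(TARBALL_URL, Path("linux.tar.gz")), Path("spam-classifier"))` fetches and unpacks the archive, after which `python3 dist/preprocess.py` can be run from wherever the `dist` directory was extracted.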
Spamcheck (on a different terminal)
-
Clone spamcheck repo and change to the target directory
-
Ensure the dependencies are present
- Golang runtime
- make
- libtensorflow_lite (build instructions: https://www.tensorflow.org/lite/guide/build_cmake)
-
Set GOPATH
export GOPATH=${HOME}/go
-
Update PATH to include the Golang binary path
export PATH="$PATH:$(go env GOPATH)/bin"
-
Build the binary
make build
-
Copy example config
cp config/config.toml.example config/config.toml
-
Change modelPath in the config file to point to the model.tflite file from the extracted spam-classifier tarball.
-
Run Spamcheck
make run
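The modelPath change described above can be done by hand, or with a small script. The sketch below is an assumption-laden illustration: it assumes the key appears in config/config.toml as a single `modelPath = "..."` line, which has not been checked against config.toml.example, and it edits the text directly so the rest of the file is left untouched. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed layout: a line of the form  modelPath = "<something>"
_MODEL_PATH_LINE = re.compile(r"^([ \t]*modelPath[ \t]*=[ \t]*).*$", re.MULTILINE)


def set_model_path(config_text: str, model_path: str) -> str:
    """Return config_text with the first modelPath assignment pointing at model_path."""
    # Escape backslashes and quotes so the value stays a valid TOML basic string.
    escaped = model_path.replace("\\", "\\\\").replace('"', '\\"')
    new_text, count = _MODEL_PATH_LINE.subn(
        lambda m: m.group(1) + '"' + escaped + '"', config_text, count=1
    )
    if count == 0:
        raise ValueError("no modelPath key found in config")
    return new_text


def update_config_file(config_file: Path, model_path: str) -> None:
    """Rewrite config_file in place with the new modelPath value."""
    config_file.write_text(set_model_path(config_file.read_text(), model_path))
```

A call such as `update_config_file(Path("config/config.toml"), "/path/to/model.tflite")` would then be run before `make run`, with the path replaced by the real location of the extracted model.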
Testing (on the node where GitLab runs)
Command line (on a different terminal)
- Create a file spam.json with the following content:
{
  "title": "fifa xxx porn stream fifa xxx porn stream",
  "description": "fifa xxx porn stream fifa xxx porn stream",
  "user_in_project": false,
  "project": {
    "project_id": 14,
    "project_path": "spamtest/hello"
  },
  "user": {
    "emails": [{"email": "mr_stupendous@hotmail.com", "verified": true}],
    "username": "MrStupendous",
    "org": "GitLab"
  },
  "created_at": "2021-01-01T10:00:00Z",
  "updated_at": "2021-01-01T11:00:00Z"
}
- Create a file ham.json with the following content:
{
  "title": "Sign up page not working",
  "description": "Sign up page not working when accessed from mobile",
  "user_in_project": true,
  "project": {
    "project_id": 14,
    "project_path": "spamtest/hello"
  },
  "user": {
    "emails": [{"email": "mr_stupendous@hotmail.com", "verified": true}],
    "username": "MrStupendous",
    "org": "GitLab"
  },
  "created_at": "2021-01-01T10:00:00Z",
  "updated_at": "2021-01-01T11:00:00Z"
}
- Download and install grpcurl
- Run the following commands
# Pass the spam.json file to the endpoint and see `BLOCK` verdict
$ grpcurl -plaintext -d "$(cat spam.json)" localhost:8001 spamcheck.SpamcheckService/CheckForSpamIssue
# Pass the ham.json file to the endpoint and see `ALLOW` verdict
$ grpcurl -plaintext -d "$(cat ham.json)" localhost:8001 spamcheck.SpamcheckService/CheckForSpamIssue
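For repeated testing, the two request bodies and the grpcurl invocations above can be generated from a script. This is a minimal sketch, assuming grpcurl is on the PATH and Spamcheck is listening on localhost:8001 as in the steps above. The helper names are hypothetical, and the output is returned unparsed because the exact response format is not described in this issue.

```python
import json
import subprocess

GRPC_ADDR = "localhost:8001"
METHOD = "spamcheck.SpamcheckService/CheckForSpamIssue"


def issue_payload(title: str, description: str, user_in_project: bool) -> dict:
    """Build a request body with the same fields as spam.json and ham.json."""
    return {
        "title": title,
        "description": description,
        "user_in_project": user_in_project,
        "project": {"project_id": 14, "project_path": "spamtest/hello"},
        "user": {
            "emails": [{"email": "mr_stupendous@hotmail.com", "verified": True}],
            "username": "MrStupendous",
            "org": "GitLab",
        },
        "created_at": "2021-01-01T10:00:00Z",
        "updated_at": "2021-01-01T11:00:00Z",
    }


def grpcurl_command(payload: dict) -> list:
    """Argument list equivalent to the grpcurl commands shown above."""
    return ["grpcurl", "-plaintext", "-d", json.dumps(payload), GRPC_ADDR, METHOD]


def check(payload: dict) -> str:
    """Run grpcurl and return its raw stdout (needs grpcurl and a running Spamcheck)."""
    result = subprocess.run(
        grpcurl_command(payload), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `check(issue_payload("Sign up page not working", "Sign up page not working when accessed from mobile", True))` corresponds to the ham.json request; passing the argument list directly to `subprocess.run` avoids shell quoting problems with the JSON body.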
Web UI (on a different terminal)
-
Go to Admin > Settings > Reporting page in the GitLab instance, and update the external spamcheck settings as follows:
- Check the Enable Spam Check via external API endpoint checkbox
- Use grpc://localhost:8001 as the URL
- No need to fill in any API key
-
Create a project in the GitLab instance.
-
As a different user (who is not a member of the project), create an issue in the project with the following text as both the subject and the description:
fifa xxx porn stream fifa xxx porn stream
-
See that issue creation has been blocked.