Commit f118768d authored by Johan Bloemberg

Merge branch '57-implement-security-header-check-worker' into 'master'

Resolve "Implement security header check worker."

Closes #57

See merge request failmap/admin!18
parents e1cef456 b36c8f2c
.envrc
# create/load python3 virtualenv when entering directory
layout python3
# enable DEBUG mode by default
export DEBUG=1
.gitignore
@@ -29,4 +29,5 @@ vendor/theHarvester/
vendor/Google Chrome.app/
failmap_dataset*
failmap_testdataset*
failmap_debug_dataset*
\ No newline at end of file
dev_db/
failmap_debug_dataset*
.gitlab-ci.yml
@@ -67,8 +67,13 @@ test_postgres: &test_postgres_template
check:
<<: *test_template
stage: check
before_script:
- pip install tox
- apt-get update -qq
- apt-get install -yqq shellcheck
script:
- tox -e check
- shellcheck tests/*.sh tools/*.sh
dataset:
<<: *test_template
@@ -128,10 +133,16 @@ test_build:
image: docker:git
before_script:
- apk add --no-cache curl
script:
# build docker image without explicit tag to test building
# build docker image to test building
- docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN registry.gitlab.com
- docker build .
- docker build . -t admin
# run simple smoketests to verify Docker image is sane
- tests/docker.sh docker
# run on MR
except: [master]
Dockerfile
@@ -6,13 +6,19 @@ RUN pip install virtualenv
RUN virtualenv /pyenv
# install requirements separately as they change less often than the source, improving caching
COPY requirements*.txt /
RUN /pyenv/bin/pip install -r requirements.txt
RUN /pyenv/bin/pip install -r requirements.deploy.txt
COPY requirements*.txt /source/
RUN /pyenv/bin/pip install -r /source/requirements.txt
RUN /pyenv/bin/pip install -r /source/requirements.deploy.txt
# install the app
COPY . /source/
RUN /pyenv/bin/pip install /source/
# copy all relevant files for python installation
COPY ./failmap_admin/ /source/failmap_admin/
COPY ./setup.py /source/setup.py
COPY ./setup.cfg /source/setup.cfg
COPY ./MANIFEST.in /source/MANIFEST.in
# Install app by linking source into virtualenv. This is against convention
# but allows the source to be overwritten by a volume during development.
RUN /pyenv/bin/pip install -e /source/
# switch to lightweight base image for distribution
FROM python:3-slim
@@ -22,11 +28,16 @@ RUN /bin/bash -c 'mkdir -p /usr/share/man/man{1..8}'
# install dependent libraries (remove cache to prevent inclusion in layer)
RUN apt-get update && \
apt-get install -yqq libxml2 libmysqlclient18 mysql-client postgresql postgresql-contrib mime-support && \
apt-get install -yqq libxml2 libmysqlclient18 mysql-client postgresql \
postgresql-contrib mime-support python-watchdog python-setuptools && \
rm -rf /var/lib/apt/lists/*
ADD tools/autoreload.sh /usr/local/bin/autoreload
RUN chmod a+x /usr/local/bin/autoreload
# install built application
COPY --from=build /pyenv /pyenv
COPY --from=build /source /source
# expose relevant executable(s)
RUN ln -s /pyenv/bin/failmap-admin /usr/local/bin/
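To try the image locally, a quick sanity check could look like this (a sketch using the `admin` tag from the CI configuration; `failmap-admin help` just lists the available subcommands):

    docker build . -t admin
    docker run --rm admin failmap-admin help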
README.md
@@ -26,13 +26,9 @@ Download and install below system requirements to get started:
- [git](https://git-scm.com/downloads) (download and install)
- [python3](https://www.python.org/downloads/) (download and install)
- [direnv](https://direnv.net/) (download and install, then follow [setup instructions](https://direnv.net/))
- [rabbitmq](http://www.rabbitmq.com/download.html) (for scanners only, download and install)
After installing the above tools, all following steps use the command line:
sudo easy_install pip # installs pip, a python package manager, with the command pip3
- Tox (`pip3 install --user tox`)
- [direnv](https://direnv.net/) (optional, download and install, then follow [setup instructions](https://direnv.net/), see Direnv section below)
- [Docker](https://docs.docker.com/engine/installation/) (optional, recommended, follow instructions to install.)
# Obtaining the software
@@ -44,16 +40,88 @@ In a directory of your choosing:
# enter the directory of the downloaded software
cd admin
# sets Debug to true in this folder. Do not change the settings.py file.
Use direnv to manage the environment (see the Direnv section below). It manages the Python virtualenv and the `DEBUG` setting required for local development.
direnv allow
# Quickstart
For the quickstart we assume Docker, as it offers the most complete environment for development. This project aims to be environment-agnostic, and development without Docker is possible; see the Development section below.
The commands below result in a Failmap installation suitable for testing and development. It is
capable of handling thousands of URLs while remaining modestly responsive.
If you need a faster, more robust installation, please [contact us](mailto:[email protected]).
Ensure Docker is installed and running. Execute the following command to bring up the environment:
docker-compose up
This will build and start all required components and dependencies to run a complete instance of the Failmap project (grab some coffee; the first time, this takes a while).
You can now visit the [map website](http://127.0.0.1:8000/) and/or the
[admin website](http://127.0.0.1:8000/admin/) at http://127.0.0.1:8000 (credentials: admin:faalkaart).
To stop the environment, simply press [CTRL]+[C].
The environment is aware of code changes in the `failmap_admin` folder. Services are automatically restarted to reflect the latest changes.
It is possible to start the environment in the background using:
docker-compose up -d
It can be shut down using `docker-compose down`.
There is a command-line application available to perform administrative tasks. To run it:
docker-compose exec admin failmap-admin
The `failmap-admin` command is referred to throughout this documentation; when using the Docker environment, always prepend `docker-compose exec admin` to the command.
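For example, to run database migrations (the `migrate` command is described further below) inside the Docker environment:

    docker-compose exec admin failmap-admin migrate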
To view (and follow) all logs run:
docker-compose logs -f
To see all running components:
$ docker-compose ps
Name Command State Ports
---------------------------------------------------------------------------------------------
admin_admin_1 /usr/local/bin/autoreload ... Up 0.0.0.0:8000->8000/tcp
admin_broker_1 docker-entrypoint.sh redis ... Up 0.0.0.0:5672->5672/tcp, 6379/tcp
admin_database_1 docker-entrypoint.sh mysqld Up 0.0.0.0:3306->3306/tcp
admin_loaddata_1 /usr/local/bin/failmap-adm ... Exit 0
admin_migrate_1 /usr/local/bin/failmap-adm ... Exit 0
admin_worker_1 /usr/local/bin/autoreload ... Up 8000/tcp
The platform consists of two external dependencies, `broker` (Redis) and `database` (MySQL), and two main components, `admin` (web frontend and administrative environment) and `worker` (asynchronous task executor).
Two tasks run at startup: `migrate` (database schema management) and `loaddata` (test and development data loading).
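The one-off tasks can be re-run on demand, for example to reload the development fixtures (a sketch relying on standard docker-compose behaviour):

    docker-compose run --rm loaddata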
Most docker-compose commands can be run against individual components, e.g.:
docker-compose logs -f worker
For more information, consult the Docker Compose [documentation](https://docs.docker.com/compose/) or:
docker-compose -h
# Development
To perform non-Docker development, make sure all requirements are installed, then run the following script to set up a development instance:
tools/dev_setup.sh
After this run:
# finally start the development server
failmap-admin runserver
Now visit the [map website](http://127.0.0.1:8000/) and/or the
[admin website](http://127.0.0.1:8000/admin/) at http://127.0.0.1:8000 (credentials: admin:faalkaart).
The setup script performs the following steps:
# download even more requirements needed to run this software
pip3 install -e .
@@ -61,7 +129,7 @@ If you need a faster, more robust installation, please [contact us](mailto:hosti
failmap-admin migrate
# create a user to view the admin interface
failmap-admin createsuperuser
failmap-admin loaddata development
# loads a series of sample data into the database
failmap-admin load-dataset testdata
@@ -69,12 +137,6 @@ If you need a faster, more robust installation, please [contact us](mailto:hosti
# calculate the scores that should be displayed on the map
failmap-admin rebuild-ratings
# finally start the development server
failmap-admin runserver
Now visit the [map website](http://127.0.0.1:8000/) and/or the
[admin website](http://127.0.0.1:8000/admin/) at http://127.0.0.1:8000
# Scanning services (beta)
Todo: add celery beat information
docker-compose.yml
# docker-compose configuration to build a local development installation of the failmap platform.
version: "3"
services:
# message broker to distribute tasks
broker:
image: redis
# this container's logging is not very interesting during development
logging:
driver: none
# Not configuring persistent storage for broker. Restarting will cause all unfinished
# tasks to be forgotten, instead of lingering around.
ports:
- 5672:5672
# stateful storage
database:
image: mysql
# this container's logging is not very interesting during development
logging:
driver: none
environment:
MYSQL_ALLOW_EMPTY_PASSWORD:
MYSQL_ROOT_PASSWORD: "${DB_ROOT_PASSWORD:-secret}"
MYSQL_DATABASE: "${DB_NAME:-failmap}"
MYSQL_USER: "${DB_USER:-failmap}"
MYSQL_PASSWORD: "${DB_PASSWORD:-failmap}"
ports:
- 3306:3306
# Configure database to persist across restarts of the development environment.
volumes:
- ./dev_db/:/var/lib/mysql/
# task executor
worker:
image: failmap/admin
build:
context: .
dockerfile: "${PWD}/Dockerfile"
links:
- broker
- database:mysql
# celery dislikes running as root
user: nobody
environment:
BROKER: redis://broker:6379/0
# BROKER: amqp://guest:[email protected]:5672
DJANGO_DATABASE: production
# mount current source into container to allow changes to propagate without container rebuild
volumes:
- .:/source/
# let celery be a little more informative regarding console messages
tty: true
# use watchdog to provide auto-restart functionality on code changes
entrypoint: [ '/usr/local/bin/autoreload', 'failmap-admin']
command: [ "celery", "worker", "-l", "info", "-c", "6" ]
# web interface
admin:
image: failmap/admin
build:
context: .
dockerfile: "${PWD}/Dockerfile"
links:
- broker
- database:mysql
environment:
BROKER: redis://broker:6379/0
# BROKER: amqp://guest:[email protected]:5672
DJANGO_DATABASE: production
UWSGI_PYTHON_AUTORELOAD: "yes"
# mount current source into container to allow changes to propagate without container rebuild
volumes:
- .:/source/
ports:
- 8000:8000
# Django decides what to log based on the type of console
tty: true
entrypoint: [ '/usr/local/bin/autoreload', 'failmap-admin']
command: [ "runuwsgi" ]
# migrate
migrate:
image: failmap/admin
build:
context: .
dockerfile: "${PWD}/Dockerfile"
links:
- database:mysql
environment:
DJANGO_DATABASE: production
tty: true
command: [ "migrate" ]
loaddata:
image: failmap/admin
build:
context: .
dockerfile: "${PWD}/Dockerfile"
links:
- database:mysql
environment:
DJANGO_DATABASE: production
tty: true
command: [ "loaddata", "development", "testdata" ]
import logging
from collections import defaultdict
from django.conf import settings
log = logging.getLogger(__name__)
# log database settings during init for debug purposes
log.info('Database settings: {ENGINE}, {NAME}, {USER}, {HOST}'.format_map(
defaultdict(str, **settings.DATABASES['default'])))
Application-specific fixtures (user accounts, settings) for different environments.
# Development
Contains an admin user for use during development. Username: `admin`, password: `faalkaart`.
failmap-admin loaddata development
- model: auth.user
pk: 1
fields:
password: pbkdf2_sha256$36000$AVp6H74Dcqmy$5L1bnfpvg06UCs9XRrC+5lXjx4KUhHEWtEMgFrXeggo=
last_login: null
is_superuser: true
username: admin
first_name: ''
last_name: ''
email: [email protected]
is_staff: true
is_active: true
date_joined: 2017-10-30 14:32:44.614938+00:00
groups: []
user_permissions: []
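The password value above is a standard Django PBKDF2 hash. A replacement hash can be generated with Django's own helper (assuming a configured Django settings module):

    from django.contrib.auth.hashers import make_password

    # Prints a pbkdf2_sha256$... string usable as the fixture's password value.
    print(make_password('faalkaart'))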
"""Management command base classes."""
import json
import logging
import time
from collections import Counter
import celery.exceptions
import kombu.exceptions
from celery.result import AsyncResult, GroupResult, ResultSet
from django.conf import settings
from django.core.management.base import BaseCommand
from failmap_admin.celery import app
log = logging.getLogger(__name__)
@@ -29,14 +37,37 @@ class TaskCommand(BaseCommand):
"""
task = None
# it is an anti-pattern to instantiate empty lists/dicts as class attributes,
# but since management commands are contained in their own invocation this can fly
args = list()
kwargs = dict()
def _add_arguments(self, parser):
"""Method to allow subclasses to add command specific arguments."""
pass
def add_arguments(self, parser):
"""Add common argument for Celery tasks."""
parser.add_argument('-m', '--method', choices=['direct', 'sync', 'async'])
self.mutual_group = parser.add_mutually_exclusive_group()
parser.add_argument('-m', '--method', default='direct',
choices=['direct', 'sync', 'async'],
help='Execute the task directly or on remote workers.')
parser.add_argument('-i', '--interval', default=5, type=int,
help="Interval between status reports (sync only).")
self.mutual_group.add_argument('-t', '--task_id', default='',
help="Report status for task ID and return result (if available).")
self._add_arguments(parser)
def compose(self, *args, **options):
"""Placeholder to allow subclass to compose a task(set) if task is not specified."""
raise NotImplementedError()
def handle(self, *args, **options):
"""Command handle logic, eg: logging."""
# set django loglevel based on `-v` argument
verbosity = int(options['verbosity'])
root_logger = logging.getLogger('')
@@ -47,12 +78,78 @@ class TaskCommand(BaseCommand):
elif verbosity == 0:
root_logger.setLevel(logging.ERROR)
self.interval = options['interval']
if options['task_id']:
# return self.wait_for_result(GroupResult(options['task_id']))
# this currently doesn't work
raise NotImplementedError('needs to be added')
# output resulting dict as JSON object as that plays nice with
# tools like jq for output parsing
def serialize(value):
"""Convert value into JSON serializable output."""
# recursively output exception trace
if isinstance(value, Exception):
error = {
'error': value.__class__.__name__,
'message': str(value)
}
if value.__cause__:
error['cause'] = serialize(value.__cause__)
return error
else:
return value
return json.dumps([serialize(r) for r in self.run_task(*args, **options)])
def run_task(self, *args, **options):
# try to compose task if not specified
if not self.task:
self.task = self.compose(*args, **options)
# execute task based on selected method
if options['method'] == 'sync':
self.task.apply_async().get()
elif options['method'] == 'async':
task_id = self.task.apply_async()
log.info('Task %s scheduled for execution.', task_id)
if options['method'] in ['sync', 'async']:
# verify if broker is accessible (eg: might not be started in dev. environment)
try:
app.connection().ensure_connection(max_retries=3)
except kombu.exceptions.OperationalError:
log.warning(
'Connection with task broker %s unavailable, tasks might not be starting.',
settings.BROKER_URL)
task_id = self.task.apply_async(args=self.args, kwargs=self.kwargs)
log.info('Task scheduled for execution.')
log.debug("Task ID: %s", task_id.id)
# wrap a single task in a resultset to not have 2 ways to handle results
if not isinstance(task_id, ResultSet):
task_id = ResultSet([task_id])
if options['method'] == 'sync':
return self.wait_for_result(task_id)
else:
# if async return taskid to allow query for status later on
return task_id.id
else:
# by default execute the task directly without involving celery or a broker
self.task()
# By default execute the task directly without involving celery or a broker.
# Return all results without raising exceptions.
return self.task.apply(*self.args, **self.kwargs).get(propagate=False)
def wait_for_result(self, task_id):
"""Wait for all (sub)tasks to complete and return result."""
# wait for all tasks to be completed
while not task_id.ready():
# get latest intermediate state update from results backend
try:
task_id.collect(timeout=0)
except BaseException:
pass
# show intermediate status
log.info('Task status: %s', dict(Counter([t.state for t in task_id.results])))
time.sleep(self.interval)
log.info('Final task status: %s', dict(Counter([t.state for t in task_id.results])))
# return final results, don't reraise exceptions
return task_id.get(propagate=False)
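Since results are rendered as a JSON list, the output of a synchronous run can be piped through jq; a hypothetical invocation (the command name is illustrative):

    failmap-admin scan-security-headers --method sync | jq '.[]'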
failmap_admin/celery.py
@@ -3,11 +3,10 @@
# http://oddbird.net/2017/03/20/serializing-things/
# http://docs.celeryproject.org/en/latest/userguide/security.html
# Kept for reference, when (if) moving to celery.
# from __future__ import absolute_import
import os
from celery import Celery
import celery.exceptions
from celery import Celery, Task
from django.conf import settings
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "failmap_admin.settings")
@@ -22,3 +21,24 @@ app.autodiscover_tasks([app for app in settings.INSTALLED_APPS if app.startswith
@app.task(bind=True)
def debug_task(self):
print('Request: {0!r}'.format(self.request))
class ExceptionPropagatingTask(Task):
"""Task baseclass that propagates exceptions down the chain as results."""
def __call__(self, *args, **kwargs):
"""Wrap task run to propagate Exception down the chain and to reraise exception if it is passed as argument."""
# If any of the arguments is an Exception, reraise it, adding the current task for context.
for arg in args:
if isinstance(arg, Exception):
raise Exception('failed because parent task failed') from arg
# Catch any exception from the task and return it as an 'result'.
try:
return Task.__call__(self, *args, **kwargs)
except celery.exceptions.Retry:
# Do not return a retry exception as it is raised when a task is retried.
# If the task keeps failing eventually a MaxRetriesExceededError will come.
raise
except Exception as e:
return e
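Tasks opt into this behaviour through Celery's standard `base` argument; a minimal sketch (the task body and the `do_scan` helper are hypothetical):

    @app.task(base=ExceptionPropagatingTask)
    def scan_url(previous_result):
        # If an upstream task in the chain failed, __call__ reraises before this body runs;
        # any exception raised here is returned as the task's result instead of propagating.
        return do_scan(previous_result)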
@@ -185,8 +185,7 @@ def timeline(url):
# take the last moment of the date, so the scan will have happened at the correct time
datetimes2 = [x.replace(hour=23, minute=59, second=59, microsecond=999999, tzinfo=pytz.utc)
for x in datetimes]
datetimes2 = list(set(datetimes2))
datetimes2.sort()
datetimes2 = sorted(set(datetimes2))
# if the last datetime2 is today, then just reduce it to NOW to cause less confusion in
# the dataset (don't place ratings in the future).
@@ -350,21 +349,21 @@ def rate_timeline(timeline, url):
these_ratings = {}
if endpoint.id in endpoint_ratings.keys():
for rating in endpoint_ratings[endpoint.id]:
if type(rating) == TlsQualysScan:
if isinstance(rating, TlsQualysScan):
these_ratings['tls_qualys_scan'] = rating
if type(rating) == EndpointGenericScan:
if isinstance(rating, EndpointGenericScan):
if rating.type == 'Strict-Transport-Security':
these_ratings['Strict-Transport-Security'] = rating
if type(rating) == EndpointGenericScan:
if isinstance(rating, EndpointGenericScan):
if rating.type == 'X-Content-Type-Options':
these_ratings['X-Content-Type-Options'] = rating
if type(rating) == EndpointGenericScan:
if isinstance(rating, EndpointGenericScan):
if rating.type == 'X-Frame-Options':
these_ratings['X-Frame-Options'] = rating
if type(rating) == EndpointGenericScan:
if isinstance(rating, EndpointGenericScan):
if rating.type == 'X-XSS-Protection':
these_ratings['X-XSS-Protection'] = rating
if type(rating) == EndpointGenericScan:
if isinstance(rating, EndpointGenericScan):
if rating.type == 'plain_https':
these_ratings['plain_https'] = rating
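Since each `EndpointGenericScan` branch simply keys the rating by its `type`, the repeated checks could be collapsed into one (an equivalent sketch, not part of this change):

    if isinstance(rating, TlsQualysScan):
        these_ratings['tls_qualys_scan'] = rating
    elif isinstance(rating, EndpointGenericScan) and rating.type in (
            'Strict-Transport-Security', 'X-Content-Type-Options',
            'X-Frame-Options', 'X-XSS-Protection', 'plain_https'):
        these_ratings[rating.type] = rating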
@@ -757,8 +756,7 @@ def significant_times(organization=None, url=None):
# take the last moment of the date, so the scan will have happened at the correct time
datetimes2 = [x.replace(hour=23, minute=59, second=59, microsecond=999999, tzinfo=pytz.utc)
for x in datetimes]
datetimes2 = list(set(datetimes2))
datetimes2.sort()
datetimes2 = sorted(set(datetimes2))
# if the last datetime2 is today, then just reduce it to NOW to cause less confusion in
# the dataset (don't place ratings in the future).
failmap_admin/scanners/management/commands/scan_security_headers.py
import json
import logging
from django.core.exceptions import ObjectDoesNotExist
from django.core.management.base import BaseCommand
from failmap_admin.app.management.commands._private import TaskCommand
from failmap_admin.organizations.models import Organization
from failmap_admin.scanners.scanner_security_headers import compose_scan_organizations
from failmap_admin.organizations.models import Organization, Url
from failmap_admin.scanners.scanner_security_headers import (scan_organization,
scan_organization_celery)
from failmap_admin.scanners.state_manager import StateManager
log = logging.getLogger(__name__)
logger = logging.getLogger(__package__)
class Command(TaskCommand):
"""Remove all organization and url ratings, then rebuild them from scratch."""
class Command(BaseCommand):
help = 'Scan for http sites that don\'t have https'
help = __doc__
def handle(self, *args, **options):
organizations = StateManager.create_resumed_organizationlist(scanner="Security Headers")
for organization in organizations:
StateManager.set_state("Security Headers", organization.name)
scan_organization_celery(organization)
return
def _add_arguments(self, parser):
"""Add command specific arguments."""
self.mutual_group.add_argument('-o', '--organizations', nargs='*',
help="Perform scans on these organizations (default is all).")
def compose(self, *args, **options):
"""Compose set of tasks based on provided arguments."""
# select specified or all organizations to be scanned
if options['organizations']:
organizations = list()
for organization_name in options['organizations']:
try:
organizations.append(Organization.objects.get(name__iexact=organization_name))
except Organization.DoesNotExist as e:
raise Exception("Failed to find organization '%s' by name" % organization_name) from e
else:
organizations = Organization.objects.all()
# compose set of tasks to be executed
return compose_scan_organizations(organizations)
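Combined with the generic options inherited from `TaskCommand`, this enables invocations such as (command and organization names illustrative):

    failmap-admin scan-security-headers --organizations Amsterdam --method async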
failmap_admin/scanners/scanner_security_headers.py
@@ -3,146 +3,54 @@ Check if the https site uses HSTS to tell the browser the site should only be re
(useful until browsers do https by default, instead of by choice)
"""
import logging
import random
import time
from datetime import datetime
import celery
import pytz
from celery import Celery
import requests
from celery import Celery, Task, group
from celery.task import task
from requests import ConnectionError, ConnectTimeout, HTTPError, ReadTimeout, Timeout
from failmap_admin.celery import app
from failmap_admin.celery import ExceptionPropagatingTask, app
from failmap_admin.organizations.models import Url
from failmap_admin.scanners.endpoint_scan_manager import EndpointScanManager
from failmap_admin.scanners.models import EndpointGenericScanScratchpad
from .models import Endpoint