Address issues found by the quick security evaluation
PeerDB Search had a quick security evaluation and some issues were found. Based on the issues, I propose the following actions:
- PeerDB uses raw HTML in text fields and we display it as-is on the frontend. The assumption is that stored data is trusted and all HTML is escaped/sanitized. But:
  - We do not document this at all. We should warn users that they have to sanitize data.
  - The default tooling for inserting data should apply some standard HTML sanitization, which users can disable or modify. Currently no sanitization is done by default.
  - I would not add an additional sanitization step on the frontend. We should just assume data is already sanitized. We could make sure PeerDB works with a strict (no inline code) Content Security Policy header and issue that header.
- Currently we keep search queries only in memory and do not persist them in the database. We should persist them in the database and keep just a small in-memory cache with expiration (search queries are immutable, so caching is easy).
- We barely limit the size/complexity of a search query (only through the POST request size). Maybe we should?
- We do not do any rate limiting at an API level.
  - We should limit the number of queries one can make per IP in a given time period. (In the future, we can also limit based on access token.)
  - We should decide whether we just return an error code when a client hits the limit, in which case we have to add retry code to the frontend (ideally human use from the frontend should never hit limits, but this might not hold if an IP is shared by multiple users). Or should we have soft limiting, where requests that arrive too fast are just slowed down (until hard limiting kicks in on really too many requests)? This could simplify the frontend.
- There are no limits on how many requests one can open in parallel. Maybe just limiting requests per IP addresses this as well.
- The HTTP server does not have any timeouts for request handling. Ideally, instead of fixed timeouts, I would want a limit on how fast the client has to send the request and read the response, because a rate adapts to large request and response payloads automatically.
- We should support authenticated access to Elasticsearch and document it as the default setup, so that users start from good defaults. Documentation (and deployment) should use two users: one with read-only permissions for the backend, and one with write permissions for indexing scripts.
Full report: report_ngie-peerdbsearch.pdf