v4: Investigate a PoW-based solution to protect the GraphQL API from DoS
Since GraphQL uses a single URL and is much more flexible, it is harder to protect the API against DoS attacks. The rate limiter of v3 applied different limits based on the URL and HTTP verb, which is no longer possible. Therefore, we currently require authentication to access the API.
In our application's use case, it is quite common to receive a large number of requests in a short time from a single IP address. Organizations often use NAT, so all users on the network share one or a few IP addresses. As a consequence, we have to set high limits even for sensitive endpoints like guest user creation.
Proposal
Proof of Work (PoW) is a method that limits the number of requests a client can send by requiring it to solve a computationally expensive challenge before it is granted access. By limiting the number of requests per solved challenge, it reduces the effectiveness of DoS attacks.
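To illustrate the asymmetry that makes PoW useful, here is a minimal hashcash-style sketch. It is a generic example, not Altcha's scheme, and the function names and challenge format are our own. The client searches for a nonce such that SHA-256 of `challenge:nonce` starts with `difficulty` zero bits, which takes about 2^difficulty attempts on average, while the server verifies a solution with a single hash.

```python
import hashlib
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # A non-zero byte uses bit_length() low bits; the rest of its 8 bits are zeros.
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce; expected cost is about 2**difficulty hashes."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the solution."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


challenge = secrets.token_hex(16)  # random challenge issued by the server
nonce = solve(challenge, difficulty=16)
print(verify(challenge, nonce, difficulty=16))  # True
```

Each extra bit of difficulty doubles the client's expected work while leaving the server's verification cost unchanged.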
PoW is often used in modern solutions against DoS attacks and bots as a replacement for CAPTCHAs. The most common solution on the web is probably Cloudflare's Turnstile (SaaS), which combines PoW with other signals to verify clients. An open-source alternative is Altcha, which is based solely on PoW.
We could apply Altcha to our API as follows:
- The unauthenticated client requests a challenge from a public API endpoint.
- The backend generates a challenge using Altcha's backend library and sends it to the client.
- The client solves the challenge using Altcha's frontend library.
- The client sends the solution to a public API endpoint.
- The backend verifies the solution and, if it is valid, responds with a JWT.
- The client can now access the GraphQL API using the JWT for authentication. Access is still limited because no user ID is assigned.
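The challenge/solve/verify part of the flow above can be sketched as follows. This does not use Altcha's libraries; it is a hand-written sketch of the same general idea as we understand it: the server hides a random number from a bounded range behind a salted SHA-256 hash and signs the hash with an HMAC, so it does not need to store issued challenges. The field names, `HMAC_KEY`, and all function names are hypothetical. The client part would run in the browser in practice and is written in Python here only to keep the sketch in one language. Issuing the JWT and rejecting reused or expired challenges are not shown.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical server secret; in practice it would come from configuration.
HMAC_KEY = secrets.token_bytes(32)


def create_challenge(max_number: int) -> dict:
    """Server: hide a random number in 0..max_number behind a hash and sign it."""
    salt = secrets.token_hex(12)
    number = secrets.randbelow(max_number + 1)
    challenge = hashlib.sha256(f"{salt}{number}".encode()).hexdigest()
    signature = hmac.new(HMAC_KEY, challenge.encode(), hashlib.sha256).hexdigest()
    return {"salt": salt, "challenge": challenge,
            "max_number": max_number, "signature": signature}


def solve_challenge(c: dict) -> Optional[int]:
    """Client: try every number up to max_number until the hash matches."""
    for n in range(c["max_number"] + 1):
        if hashlib.sha256(f"{c['salt']}{n}".encode()).hexdigest() == c["challenge"]:
            return n
    return None


def verify_solution(c: dict, number: int) -> bool:
    """Server: check that we issued the challenge (HMAC) and that the number matches."""
    expected_sig = hmac.new(HMAC_KEY, c["challenge"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_sig, c["signature"]):
        return False
    digest = hashlib.sha256(f"{c['salt']}{number}".encode()).hexdigest()
    return hmac.compare_digest(digest, c["challenge"])


c = create_challenge(max_number=50_000)
n = solve_challenge(c)
print(verify_solution(c, n))  # True
```

Signing the challenge keeps the backend stateless during the handshake; only after a successful `verify_solution` would the backend issue the anonymous JWT.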
Advantages
- The challenge complexity can be adjusted. Combined with rate limiting, it could be dynamically increased for suspicious clients. It could also take server load into consideration.
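Adjustable complexity could take the form of a small policy function that picks the search range for the next challenge. Everything in this sketch is hypothetical: the name `choose_max_number`, its inputs (a per-client request count, for example from the existing rate limiter, and a normalized server load), and all constants.

```python
def choose_max_number(recent_requests: int, server_load: float) -> int:
    """Pick the search range (difficulty) for the next challenge.

    recent_requests: challenges this client (e.g. its IP) requested in the current window.
    server_load: 0.0 (idle) to 1.0 (saturated), e.g. normalized CPU usage.
    """
    BASE = 50_000          # hypothetical default search range
    CEILING = 5_000_000    # hypothetical cap so slow devices can still finish
    FREE_REQUESTS = 20     # hypothetical requests per window at base difficulty
    STEP = 10              # hypothetical: double the range every STEP extra requests

    # Suspicious clients (many requests) get exponentially harder challenges;
    # the exponent is capped because CEILING limits the result anyway.
    doublings = min(max(0, recent_requests - FREE_REQUESTS) // STEP, 10)
    # Clamp load into [0, 1]; a saturated server makes challenges up to 4x harder.
    load = min(max(server_load, 0.0), 1.0)
    load_factor = 1 + 3 * load
    return min(CEILING, int(BASE * 2 ** doublings * load_factor))


print(choose_max_number(recent_requests=0, server_load=0.0))  # 50000
```

Because NAT puts many legitimate users behind one IP address, a policy like this only raises the cost per request for busy addresses instead of blocking them outright.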
- Invisible to the user. No interaction required.
Disadvantages
- Complex challenges take a long time to solve on older devices. When old devices need to be supported, the complexity has to be reduced, making the protection less effective.
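The trade-off for older devices follows from simple arithmetic. Assuming a challenge like the Altcha-style sketch (a secret number drawn uniformly from 0..max_number and found by linear search), the client needs (max_number + 2) / 2 hashes on average and max_number + 1 in the worst case. The function names are hypothetical, and any hash rate passed in is a placeholder, not a measurement.

```python
from typing import Tuple


def expected_solve_seconds(max_number: int, hashes_per_second: float) -> Tuple[float, float]:
    """Return (expected, worst-case) solve time for a linear search over 0..max_number."""
    # The secret is uniform in 0..max_number; finding n costs n + 1 hashes.
    expected_hashes = (max_number + 2) / 2
    worst_hashes = max_number + 1
    return expected_hashes / hashes_per_second, worst_hashes / hashes_per_second


def max_number_for_budget(budget_seconds: float, hashes_per_second: float) -> int:
    """Largest max_number whose expected solve time fits the budget on this device."""
    # Invert expected_hashes = (max_number + 2) / 2.
    return max(0, int(2 * budget_seconds * hashes_per_second) - 2)
```

For a fixed time budget, `max_number` scales linearly with the slowest supported device's hash rate. Supporting a device ten times slower means a ten times smaller range, which also makes each challenge ten times cheaper for an attacker with fast hardware.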
- More complex to implement than simple rate limiting (requires changes in both backend and client).