Detection of JWTs in the various Secret Scanners
Motivation
This issue tackles the detection of JWTs (JSON Web Tokens) in GitLab's security scanners.
The scanners use regular expressions to decide whether a string is a secret.
However, JWTs are tricky to match because their length and encoded contents vary.
Proposal
After wrapping my head around this, I'd like to propose a generic pattern:
(?:eyJ[0-9a-zA-Z_-]{7,}\.[0-9a-zA-Z_-]{7,}\.[0-9a-zA-Z_-]+)
Let me explain why I think "eyJ" is a good keyword to start with:
JWTs consist of three parts:
- Header (Base64-URL encoded JSON)
- Payload (Base64-URL encoded JSON)
- Signature (Base64-URL encoded cryptographic signature)
The parts are separated by dots (.).
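The three-part structure described above can be illustrated with a minimal sketch that builds a token and splits it back apart. The helper names (b64url_encode, b64url_decode, split_jwt) and the example payload are hypothetical, the signature bytes are a placeholder rather than a real HMAC, and nothing here verifies a signature; it only shows how the header, payload, and signature map to dot-separated, unpadded Base64-URL segments.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode one Base64-URL segment; JWTs strip the '=' padding, so restore it."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded Base64-URL, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def split_jwt(token: str) -> tuple[dict, dict, bytes]:
    """Split a compact JWT into decoded header, decoded payload, and raw signature bytes."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    signature = b64url_decode(signature_b64)
    return header, payload, signature


# Hypothetical example token; the signature segment is a placeholder, not a real HMAC.
header_part = b64url_encode(b'{"alg":"HS256","typ":"JWT"}')
payload_part = b64url_encode(b'{"sub":"1234567890","name":"Jane Doe"}')
signature_part = b64url_encode(b"not-a-real-signature")
token = ".".join([header_part, payload_part, signature_part])

header, payload, signature = split_jwt(token)
print(header_part)  # eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
print(header)       # {'alg': 'HS256', 'typ': 'JWT'}
```

Note how the encoded header already starts with "eyJ", which is the property the proposed pattern relies on.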
The header typically contains metadata about the token, such as the signing algorithm (alg) and token type (typ). Since the header is a JSON object encoded in Base64-URL format (and is normally serialized without whitespace), it starts like this:
{"[a-zA-Z]
That is, {" followed by at least one letter.
The Base64 encoding of each of these strings always starts with "eyJ".
Here's a short snippet to verify:
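import base64
import string
# Generate '{"' followed by each ASCII letter, i.e. every string matching {"[a-zA-Z]
variations = [f'{{"{char}' for char in string.ascii_letters]
# Encode each variation in Base64-URL, the alphabet JWTs use
encoded_variations = [base64.urlsafe_b64encode(variant.encode()).decode() for variant in variations]
# Print results and confirm that every encoding starts with "eyJ"
print("\n".join(encoded_variations))
print(all(encoded.startswith("eyJ") for encoded in encoded_variations))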
Hence I think it's safe to assume "eyJ" is a good way to start the pattern.
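The snippet above only tries letters, and the reason it works lies in the bits: { is 0x7B and " is 0x22, and Base64 turns the first 18 bits of the input into the first three output characters. Those 18 bits are the 16 bits of {" plus the top two bits of the third byte, so any third byte whose top two bits are 01 (0x40 to 0x7F, which covers all ASCII letters) yields "eyJ". A small sketch that checks this exhaustively over ASCII, and shows that a space after { (valid JSON, but not compact) breaks the prefix:

```python
import base64

# Collect every ASCII character that, placed after '{"', yields an "eyJ" prefix.
eyj_chars = [
    chr(code)
    for code in range(128)
    if base64.urlsafe_b64encode(b'{"' + bytes([code])).startswith(b"eyJ")
]

# The prefix depends only on the top two bits of the third byte, so this is exactly 0x40-0x7F.
print(eyj_chars == [chr(c) for c in range(0x40, 0x80)])  # True

# Whitespace after '{' changes the prefix, so "eyJ" assumes compact JSON.
print(base64.urlsafe_b64encode(b'{ "alg"')[:3])  # b'eyA'
```

In other words, "eyJ" holds for headers serialized without whitespace, which is the common case, but a header such as { "alg": ... } would not be caught by this anchor.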
- [0-9a-zA-Z_-]{7,} → the rest of the Base64-URL encoded header (the minimum length of 7 is arbitrary; it could be longer, but it ensures we don't mistakenly match short Base64 strings).
- \. → the dots (.) separate the JWT header, payload, and signature.
- The same character class repeats for the payload ([0-9a-zA-Z_-]{7,}) and the signature ([0-9a-zA-Z_-]+).
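To see the pattern in action, here is a minimal sketch that runs it with Python's re module over a hypothetical config snippet. The variable names and sample input are made up for illustration; this is not how GitLab's scanners invoke their rules.

```python
import re

# The pattern proposed above, compiled once so it can be reused across scanned files.
JWT_PATTERN = re.compile(r"(?:eyJ[0-9a-zA-Z_-]{7,}\.[0-9a-zA-Z_-]{7,}\.[0-9a-zA-Z_-]+)")

# Hypothetical input: one line with a JWT-shaped value, one line with short "eyJ" noise.
sample = (
    'AUTH_TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9'
    ".eyJzdWIiOiIxMjM0NTY3ODkwIn0"
    '.c2lnbmF0dXJlLXBsYWNlaG9sZGVy"\n'
    "short=eyJhYg.eyJ.x\n"
)

# The group is non-capturing, so findall returns the whole matched token.
matches = JWT_PATTERN.findall(sample)
print(matches)
```

Only the full token on the first line is reported; the second line fails because fewer than seven characters follow "eyJ" before the first dot. Note that the pattern does not check that the segments decode to JSON, so any dot-separated Base64-URL string starting with "eyJ" and meeting the length limits will match, and because the last segment uses + rather than {7,}, any non-empty signature is accepted.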
@dbolkensteyn Would it make sense to think about a general pattern like the above for JWTs?
This would bring a huge benefit, as my test results over here look kinda scary, but accurate.
WDYT?