# Meep Meep! A story of certificate (un)verification 🔏📜🔍❌

This article discusses the lack of certificate checking done by ACMEv2 clients,
as well as the lack of provision in the ACMEv2 protocol specification to
encourage any checking. It explores the implications of this, and
demonstrates why we should probably be doing some additional checks in our
ACMEv2 clients.

The project is called "Meep Meep", because that's the sound a roadrunner
makes.  The author couldn't think of a cleverer name for something related to
ACME. There's already a Go client library called
[`meepmeep`](https://github.com/calavera/meepmeep), last worked on in 2018,
proving that all ideas are derivative.

## In brief

ACMEv2, best known for its use with [Let's Encrypt](https://letsencrypt.org/),
is a protocol designed to make it relatively simple to get certificates. The
protocol has a number of verification steps in it, and requirements for the
server and client to meet.  However, once a certificate issuance has been
agreed, and a certificate downloaded by the client, the requirements run out.

Summarising what happens:

1. The client establishes trust in the service ✔️
2. The service establishes trust in the client ✔️
3. The service establishes trust in the identities claimed by the client ✔️
4. The client **does not** establish trust in the certificates it's given ❌

To see the evidence, you can jump straight to this article's [survey](#a-survey)
of clients. To see how to deal with it, here's a [proposed
enhancement](https://gitlab.com/microsec-public/acme.sh/-/commit/c2d7e7e60ae3f136b2266f4c801b97b0273d2856)
to one such client.

### So what?

This means that the certificates we get from the likes of Let's Encrypt aren't
checked to see if they contain what we _expect_. That means, when we come to
use them in our web servers etc, they might not work, or might not be as secure
as we asked for. Some quick examples of potential abuses:

- The certificate we get has domains in its SAN that are different to what we
  requested
- We want a certificate for a TLS Server, but the certificate we get also has
  code signing in its Key Usage
- The signature type and length don't match what we were expecting
- We asked for the OCSP Must Staple extension, but didn't get it
  - Which has [already come to some people's attention](https://community.letsencrypt.org/t/certbot-enhance-must-staple/94107)

### Whose problem is that?

There's nothing stopping us from checking these things ourselves, and, quite
possibly, this was left intentionally out of the scope of ACMEv2. However, the
author believes that some best-practices should be established for verifying
certificates, something that perhaps all clients _should_ perform, and
something that might be worth considering as forming part of any future
versions of the ACME spec.

This isn't just about checking that issued certificates are _valid_, but also
ensuring that they contain _what was requested_.

### What do I do?

There are a few things:

1. Carry on as you are, it's probably not _that_ big of a deal.
1. Have a conversation in your DevSecOps team about how you check that the
   certificates you get contain what you expect, and how you might want to
   strengthen that capability.
1. Read the rest of this document and think about what you can do to effect a
   change to close the loop of "trust, but verify".


## In depth

There are several ways to get certificates from a Public Key Infrastructure
(PKI). Manually setting up your own PKI by using `openssl`, or even using it to
make self-signed certificates, is possible, but not usually very convenient or
useful. Copying and pasting CSRs into services that will then give you a
downloadable certificate was once the most common way of getting legitimate
certificates for the web. Then there are enrollment protocols like SCEP and EST,
used on some networking and mobile devices. But by far the most prolific
enrollment protocol is now ACMEv2, thanks to Let's Encrypt. ACMEv2 is not
Let's Encrypt, but Let's Encrypt had a big role to play in shaping ACMEv2.

Let's Encrypt's purpose is to give a free and easy way for people to make
websites safer by providing certificates to site owners. Then, there's no
reason to use HTTP... [most of the time](https://https.cio.gov/guide/#are-federally-operated-certificate-revocation-services-crl-ocsp-also-required-to-move-to-https).

### The process

The basic steps for ACMEv2 are:

1. Create an account with the service provider (usually Let's Encrypt).
1. Generate an order for some certificates with the service provider.
1. Give the service provider a way to verify that the identities you wish to
   use in the certificates are owned and controlled by you.
1. Request and obtain the certificates.

A sub-set of this process is used for renewals of certificates, and revocation
of certificates is also possible. AJ ONeal has written a [step by step node.js example](https://coolaj86.com/articles/lets-encrypt-v2-step-by-step/) of how to interact with Let's Encrypt using ACMEv2.

### The sticking point: Getting certs

It's the last step in the above simplified break-down of ACMEv2 where things
_could_ go wrong. A client has proven who they are and that they control the
identities (domains) they want to represent, and has generated a Certificate
Signing Request (CSR) for the very same. The issuer will check this CSR to
make sure it contains what was agreed, and then issue a certificate. Clients
download that issued certificate and that's it.

#### CSRs vs certs

CSRs are _very_ similar to certificates. They're an
[ASN.1](https://www.itu.int/en/ITU-T/asn1/Pages/introduction.aspx) encoded data
structure with a bunch of fields, one of which is a public key, and another is a
signature. Assuming a chain of trust can be established with that public key,
then the origin of the ASN.1 data structure, and its integrity, are verifiable.

A CSR is signed by its creator, and the private key used to sign it is owned by
the creator. There's no _chain_ at this point - CSRs are self-signed. The main
things that can be known by verifying a CSR signature are that:

1. The CSR wasn't modified by anyone who isn't the key-holder
1. Some cryptographic effort was used to generate a key and sign it
1. The public key in the CSR is correctly paired with the private key used to
sign the CSR.

That's why ACMEv2 ties CSRs to accounts, as well as orders with verified
challenges. More than the CSR alone is required to determine if the contents
of a CSR are something that should be put into a certificate and issued to the
requester.

So what about a certificate? In a nutshell, when creating a certificate, some
fields are copied from a CSR into a new ASN.1 structure, which has some
additional fields in it as well. So a verified certificate proves that:

1. The certificate issuer accepted some of the fields from a CSR into the
certificate.
1. The issuer has provided their details in the certificate (creating a chain)
1. The public key from the CSR is presented in the certificate
1. The certificate could only have been created and signed by the issuer, and
hasn't been modified by anybody else.

If a chain of trust stretches back to the issuer, and the issuer's identity
information and public key is available, then there is trust that they chose to
issue this cert for the requester. Therefore, there is trust in the identity of
the holder of the certificate. How that's done is not an exercise for this
article, but you can [have some fun with
`openssl`](https://www.openssl.org/docs/man1.1.1/man1/verify.html) if you want.

So a certificate is very similar to a CSR, except it's got a few extra and
different fields, and is usually signed by a different key.


#### The big reveal

A question came up in my workplace recently: "Can you modify a CSR without
holding the private key?" Of course not. Not without invalidating the
signature, at least. But that question was meant to take us towards a more
important one: "Can I use my CSR to control what goes into the certificate I
get?"

The answer to that is... not really. At least, not with any guarantee.

The fact is, requesters are at the mercy of the issuer to take a CSR and use
the information in it to build an appropriate certificate. If they don't, it
could be for one of several reasons:

1. The requester wanted something they're not allowed to have
   - The validity period is not something the requester can choose
1. The requested capability isn't part of the product
   - Wildcard domains, code signing capabilities, etc, aren't always included
     in the same product
1. The capabilities granted exceed those the certificate requires
   - A web server shouldn't be able to sign code. If an ACME server provided
     such capabilities unexpectedly, the web server could be next on the list
     to compromise, thanks to its excessive privileges.
1. The issuer got compromised and somebody is mangling certs for nefarious
   purposes
   - A DoS on servers by giving out garbage certs, for example

The last two are quite exciting... and unlikely. But does that mean it won't
ever happen? Further, by doing better checking of issued certs, DevSecOps
teams can squash issues relating to misunderstandings of what is being
requested/purchased from a provider.

Most of these issues can be mitigated in other ways. For example, code signing
certs probably shouldn't share the same root of trust as web servers, and
different trust stores should be used for each. But, the principle of
defence-in-depth tells us that we shouldn't rely on a single mitigation.

##### A closer look

If you're interested in going deeper still, here's a comparison of some (but
not all) of the components in taking a CSR and creating a signed certificate
from it.

![CSR to cert](csr-to-cert.png)

You can generate similar outputs to these with `openssl (req|x509) -in <infile> -noout -text`.

###### Sections

CSRs and certificates have different headers, obviously, which are shown in
grey here.

In blue is the public key of the CSR or certificate owner.

In orange, is the identity information of the certificate, or the subject.
This is the "who", and for ACMEv2 we have proven ownership of these
identifying subjects, to the CA, through one of ACMEv2's challenges. This
information appears in two places: "Subject" and "Subject Alternative Name".
The latter is newer, and now required for modern browsers and TLS libraries to
verify certs.

In yellow are usage extensions. In a CSR, these take the form of _requested_
extensions. These denote capabilities the requester wants encoded into the
certificate. In other words, these are the activities the requester wants to
perform, with the identity and key information provided. For a certificate,
these are the activities _allowed_ to be performed. It's up to clients
interpreting certificates to enforce these.

Then, in pinkish-purple, is the signature. For a CSR, this is self-signed
by the owner. For the certificate, this is signed by the issuing CA, the
public key for which can be used to verify the certificate, but must be found
elsewhere.

Finally, in green, and only in the certificate, is extra information added by
the CA, such as the validity period of the certificate, identifying
information for the issuer's key, how to perform revocation lookups, and so
on.

###### Copied/modified data from CSR to cert

So there's a lot more in a certificate than a request, but _some_ of the
request makes it into the certificate. Arrowed lines indicate some of that
transfer from CSR to cert.

The solid line indicates an essential cryptographic part. If the public key in
the certificate does not match the public key we used in the request, then the
certificate is completely useless, as the owner will not possess the correct
private key to do any cryptographic activities backed by the cert.

The dashed lines indicate identifying information. These are necessary in
order to use a certificate to identify the requester in the way they want.
However, there may be some deviations. For example, some CAs will add an extra
SAN entry for the `www.` prefix of a domain, even if one wasn't included in
the request. This is not the case for Let's Encrypt, however.

The dotted line indicates that the usage extensions and constraints that were
requested are, hopefully, reflected in the certificate, but additional
constraints, policy information, and other information may be added. This is
probably fine, but if the certificate gets unexpected capabilities, that might
pose a problem for the owner.

If ACMEv2 clients are responsible for generating the CSRs (which they usually
are), then it seems logical that they also check that the issued certificate
respects the CSR to a reasonable extent. For externally generated CSRs, it
becomes more of a grey area.
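
To make that concrete, here is a minimal Python sketch of such a comparison. It assumes the CSR and the certificate have already been parsed into a simple record; `CertFields`, `compare_csr_to_cert` and the `tolerated_extra_usages` parameter are names invented for this illustration, not part of any ACMEv2 client. The three checks map onto the solid, dashed and dotted lines in the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertFields:
    """Hypothetical, already-parsed view of a CSR or an issued certificate."""
    public_key_der: bytes                          # DER-encoded public key
    san_dns_names: frozenset = frozenset()         # DNS names in the SAN
    extended_key_usages: frozenset = frozenset()   # e.g. {"serverAuth"}


def compare_csr_to_cert(csr, cert, tolerated_extra_usages=frozenset()):
    """Return a list of deviations; an empty list means none were found."""
    problems = []

    # Solid line: a different key makes the certificate useless to us.
    if csr.public_key_der != cert.public_key_der:
        problems.append("public key differs from the one in the CSR")

    # Dashed lines: identities, compared case-insensitively.
    requested = {name.lower() for name in csr.san_dns_names}
    issued = {name.lower() for name in cert.san_dns_names}
    problems += [f"requested name missing: {n}" for n in sorted(requested - issued)]
    problems += [f"unrequested name added: {n}" for n in sorted(issued - requested)]

    # Dotted line: capabilities we did not ask for and do not tolerate.
    extra = cert.extended_key_usages - csr.extended_key_usages - tolerated_extra_usages
    problems += [f"unrequested usage granted: {u}" for u in sorted(extra)]

    return problems
```

What counts as a tolerable deviation is a policy decision. A deployment that is happy for its CA to add `clientAuth`, for example, could pass `tolerated_extra_usages=frozenset({"clientAuth"})`.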

#### This has nothing to do with ACMEv2

The issue detailed in this article is actually related to requesters and
Certificate Authorities. This issue can exist regardless of protocol: ACME,
SCEP, EST, some in-house solution or even manual CSR submissions. But, because
ACMEv2 is so prolific, and has a well defined and extensible set of
verification challenges, it seems right to look to it to see how we could do a
_little bit more_ on the client side.

Ultimately, only the application that takes the certificate, or the clients
connecting to the application, can know if the certificate is suitable. But if
there was a way of specifying to the ACMEv2 client exactly what is required,
then the client could provide some assurance that those requirements had been
met. The parent application could then do further checks, if needed.

##### A note on Certificate Transparency

Approaches such as [Certificate
Transparency](https://www.certificate-transparency.org/) go some way towards
providing checks against misbehaving CAs. However, the main focus in that, so
far, is to prevent a rogue or compromised CA from issuing certificates to
people or businesses who don't own the domains they're claiming to.

However, there are other deficiencies or deviations that could find their way
into certificates without immediately alerting those watching the CT logs.
Further, ACMEv2 clients aren't doing anything to check these anyway (at least
not the ones looked at in this article). While it's great that browsers can
conduct extra checks like this, it seems like a good idea for a certificate
enrollment client to do some checks of its own, too.

### A survey

The ACMEv2 specification, [RFC8555](https://tools.ietf.org/html/rfc8555),
doesn't mention any need to verify the certificate that is returned. So, if
clients don't check the cert, they're still following the spec. But maybe they
_should_ check the cert. Below is a sample of ACMEv2 clients and an assessment
of what certificate verification they perform:

| Client | Verifies cert? | Remarks |
| ------ | -------- | ------- |
| [`acme.sh`](https://github.com/acmesh-official/acme.sh) | ❌ [Writes response straight to file](https://github.com/acmesh-official/acme.sh/blob/f2d350002e7c387fad9777a42cf9befe34996c35/acme.sh#L4784) | Also fetches the chain, but doesn't verify that either. |
| [`getssl`](https://github.com/srvrco/getssl) | ❌ [Straight to file](https://github.com/srvrco/getssl/blob/f211f581f8ae3b7187924c67400dc0b564f0d4b0/getssl#L1440) | Fetches chain |
| [`acmez`](https://github.com/mholt/acmez/) | ❌ [Fetches chain, doesn't verify](https://github.com/mholt/acmez/blob/dc9c5f05ed1ecfd68f0597a7bf9ea2603433665c/acme/certificate.go#L42) | Used by `caddy` for automated certs. |
| [`certbot`](https://github.com/certbot/certbot) | ❌ [Reads response](https://github.com/certbot/certbot/blob/79297ef5cbb39e7d66cfa21039ea4b962e5619a5/acme/acme/client.py#L761) then [writes to file](https://github.com/certbot/certbot/blob/79297ef5cbb39e7d66cfa21039ea4b962e5619a5/certbot/certbot/_internal/main.py#L1152) | Some people would like [better must-staple behaviour](https://community.letsencrypt.org/t/certbot-enhance-must-staple/94107), which falls under the topic of this article |
| [`acme-client`](https://github.com/unixcharles/acme-client/) | ❌ [Copies response as cert pem](https://github.com/unixcharles/acme-client/blob/ded2cebc31ec9bcc3b18261d5bd413108d43435d/lib/acme/client.rb#L131) | Ruby client, e.g. used by [`gitlab`](https://gitlab.com/gitlab-org/gitlab/-/blob/7524bf7f0c806c97dd42657e3ca9d9dc2d5e2091/Gemfile.lock#L7) for giving LE certs to pages. Incidentally, `gitlab` ✔️  [does have some certificate tests](https://gitlab.com/gitlab-org/gitlab/-/blob/7524bf7f0c806c97dd42657e3ca9d9dc2d5e2091/app/models/pages_domain.rb#L22). |

There are [many more clients](https://letsencrypt.org/docs/client-options/), so
PRs to this table are welcome.

So, using GitLab's validation of certificates obtained via `acme-client` for
Pages as an example of an application that does some due diligence on the
certs it receives, the ACMEv2 client itself leaves the job of checking the
certificate to the parent application. That separation of duties is what
really comes into question in this article.

### What next?

In the brief section of this article, I listed three things that we could do,
so let's expand on those here.

#### Carry on as you are, it's probably not _that_ big of a deal.

The truth is, a web server will probably complain if the certificate loaded
into it is completely nonsensical. And, hopefully, you already have decent
healthchecks in place, so that you can stage a cert and only deploy it _after_
proving it doesn't break things.

Not breaking things doesn't mean everything is fine, but if the thing you care
about most is people being able to securely connect to your website, then so
long as their browser trusts the cert you've got, you're good in that regard.

#### Have a conversation within your DevSecOps team

Certificate issuance and renewal is, hopefully, part of your deployment
process, and updates are scheduled somehow.

It makes sense to ensure that you check your certificates before putting them
to use. Just because you were issued one, doesn't mean it will work. Ensure
there's enough coupling between the certificate issuance/renewal process, and
the services that rely on them. That way, if one day Let's Encrypt has a crazy
moment, you'll catch it, and keep using your old cert until they fix the
problem. If certificate enrollment clients helped with this, that would be
great.
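
One way to get that coupling is to make deployment conditional on a check. The sketch below is an illustration in Python rather than a feature of any existing client: `deploy_if_acceptable` and its `check` callback are hypothetical names, and what the check actually inspects is left to the caller. If the check fails, the live file is left untouched, so the old cert keeps being served.

```python
import os
import tempfile
from typing import Callable


def deploy_if_acceptable(new_cert_pem: str, live_path: str,
                         check: Callable[[str], bool]) -> bool:
    """Replace the live certificate only if `check` accepts the new one.

    Returns True if the new certificate was deployed, False if the old
    one was kept.
    """
    if not check(new_cert_pem):
        return False  # keep using the old certificate

    # Write to a temporary file in the same directory, then rename it over
    # the live file, so readers never see a half-written certificate.
    directory = os.path.dirname(os.path.abspath(live_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_cert_pem)
        os.replace(tmp_path, live_path)
    except Exception:
        os.unlink(tmp_path)  # clean up the staging file on failure
        raise
    return True
```

In practice the service would still need to be told to reload the certificate, and the `check` would ideally include the healthchecks mentioned earlier.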

#### Strengthen the specification

There are a few ways in which the specification could be strengthened.

##### Additional operational considerations

[RFC8555 section 11.4](https://tools.ietf.org/html/rfc8555#section-11.4)
addresses potentially malformed certificate chains, and how the contents
should be verified to only be a bundle of PEM-encoded certificates, to
mitigate possible private key replacement attacks.
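
That check is simple enough to sketch. The Python below follows the rule as stated in section 11.4 (reject the file if `-----BEGIN` is ever followed by anything other than `CERTIFICATE`). The function name is made up for this example, and rejecting an empty bundle is an added assumption rather than something the RFC spells out.

```python
import re

_BEGIN = "-----BEGIN"
_BEGIN_CERT = _BEGIN + " CERTIFICATE-----"


def contains_only_certificates(pem_bundle: str) -> bool:
    """True if every PEM block in the bundle is a certificate."""
    # Find every place a PEM block starts, whatever its label.
    starts = [m.start() for m in re.finditer(re.escape(_BEGIN), pem_bundle)]
    # Assumption: a bundle with no PEM blocks at all is also rejected.
    if not starts:
        return False
    # Each block must start with exactly the certificate boundary.
    return all(pem_bundle.startswith(_BEGIN_CERT, pos) for pos in starts)
```

Note that this only inspects the PEM labels. It says nothing about whether the certificates inside are valid or form a chain.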

Section 11 may benefit from an additional sub-section that encourages the
certificate chain to be verified in full, and the certificate contents and
capabilities to be compared to those expected by the client, or based on the
CSR.

##### Additional expectations

A future protocol version could include an indication of expected properties
of the certificate. For example, certificate duration, exact key usage
requirements, etc, could be stated. If the server cannot, or will not, abide
by these, then it can reject the order, rather than signing a certificate that
would ultimately be rejected by the client.

Similarly, definition of these expectations would form the basis for exactly
what checks a client would need to perform.
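
As a thought experiment, such expectations might be captured in a small record that the client checks the issued certificate against. Everything in this Python sketch is hypothetical: the `Expectations` record, its two fields and the `unmet_expectations` function are invented for illustration and do not correspond to anything in RFC8555 or in an existing client.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Expectations:
    """Hypothetical statement of what the requester will accept."""
    max_lifetime: timedelta   # longest acceptable validity period
    key_usages: frozenset     # exact set required, e.g. {"digitalSignature"}


def unmet_expectations(exp: Expectations, not_before: datetime,
                       not_after: datetime, key_usages) -> list:
    """Return the expectations the issued certificate fails to meet."""
    problems = []
    if not_after <= not_before:
        problems.append("validity period is empty or reversed")
    elif not_after - not_before > exp.max_lifetime:
        problems.append("certificate lifetime exceeds the expected maximum")
    if frozenset(key_usages) != exp.key_usages:
        problems.append("key usage differs from the exact set expected")
    return problems
```

The same record could, in principle, be what a server consults before deciding whether to accept or reject an order.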

This could also be addressed in the specification of CSRs, and could then apply
regardless of protocol, although implementations of enrollment protocols would
at least need to indicate whether they support these CSR extensions. This is
similar in nature to the identification and handling of [`critical`
extensions](https://tools.ietf.org/html/rfc5280#section-4.2) in X.509
certificates.

#### Strengthen the clients

Standards be damned, there's nothing stopping clients from offering "hardened"
modes where some of these checks are performed now. In fact, the author has
[forked `acme.sh`](https://gitlab.com/microsec-public/acme.sh) to provide
[an example of such hardening](https://gitlab.com/microsec-public/acme.sh/-/commit/c2d7e7e60ae3f136b2266f4c801b97b0273d2856).
The modification performs the following:

- Verifies the issued certificate and chain against the system's trust store,
  or a specified root of trust.
  - No specific consideration for alt chains is given, or how the root is
    obtained if it's not in the trust store.
- Verifies the certificate has the purpose "TLS Server" set
  - This may be excessive, or need parameterising, because while Let's Encrypt
    and similar services are expected to provide certs for TLS servers, not
    all uses of ACMEv2 may require it.
- Checks that the common name in the cert matches the main requested domain
  - This fails to account for certificates with _only_ a SAN and no subject/DN,
    which [is possible](https://tools.ietf.org/html/rfc5280#section-4.2.1.6)
- Checks that the list of requested domains exactly matches the list of names
  in the SAN
  - To ensure that no entries are missing or modified, and that no unexpected
    entries have been added
- Performs an OCSP query based on the certificate's responder URL
  - Assumes one exists, and will fail if it doesn't, which may be excessive
- If OCSP must-staple was specified when forming the CSR, checks that
  [`status_request`](https://tools.ietf.org/html/rfc7633#section-3) is present
  in the certificate's TLS features list.
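
To give a flavour of the exact-match SAN check, here is a Python sketch. It is not the shell code from the fork. It assumes the certificate has been dumped with `openssl x509 -in <infile> -noout -text`, and that the SAN entries appear on the line following the `X509v3 Subject Alternative Name` header, which is how OpenSSL is assumed to print them. Both function names are made up for this example.

```python
import re


def san_dns_names(openssl_text: str) -> set:
    """Extract DNS names from the SAN section of openssl's text output."""
    # The header line may end with "critical"; the names are on the next line.
    match = re.search(
        r"X509v3 Subject Alternative Name:[^\n]*\n([^\n]*)", openssl_text)
    if match is None:
        return set()
    entries = [entry.strip() for entry in match.group(1).split(",")]
    # Keep only DNS entries, lower-cased for case-insensitive comparison.
    return {e[len("DNS:"):].lower() for e in entries if e.startswith("DNS:")}


def san_matches_request(openssl_text: str, requested: list) -> bool:
    """True only if the SAN DNS names are exactly the requested domains."""
    return san_dns_names(openssl_text) == {d.lower() for d in requested}
```

Comparing sets means ordering doesn't matter, but a missing, altered or extra name makes the check fail. A client with access to a proper X.509 library would parse the extension directly rather than scrape text.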

It's around 50 additional lines of `ash` compatible shell scripting, or less
than 1% additional code. In terms of compatibility, it relies only on
`openssl`, basic `grep` and basic `sed` for its checks. Clients built around
more sophisticated languages could do more robust and sophisticated checking
more easily.

A good potential next step from this is, after some validation by the
community, formulating some updates to the [Let's Encrypt Integration
Guide](https://github.com/letsencrypt/website/blob/master/content/en/docs/integration-guide.md),
so that more clients might adopt such enhancements.

## Conclusions

Certificate enrollment protocols have made the adoption of PKI easier, but
there are still gaps in the way we trust the interactions between clients and
CAs. This article showed how one enrollment protocol - ACMEv2 - misses out on
an opportunity to ensure clients get the certificates they expect, and as a
result, clients supporting ACMEv2 happily accept whatever they're given,
without extra checks.

At this stage, we have a talking point more than we have a critical security
issue. A broken or misbehaving CA _could_ take advantage of this lack of
checks, but then, there are much worse things that could happen in such a
scenario. Nevertheless, having stricter checks, and standardising them, might
reveal such issues faster, and lead to fewer downtime notifications.

It's quite possible that there's another RFC, standard, or best practices
document out there that addresses this issue, but has been overlooked by the
author. In which case, get in touch, so that this article can be updated to
give those materials the spotlight.

## About the author

Steve started writing technology guides in the early 2000s and has been
involved in tech, teaching and innovation ever since. Today, Steve is CTO at
[MICROSEC](https://usec.io/), leading a team creating the next generation of
cyber-security for IoT and smart nations.

## Acknowledgements

The author would like to thank [MICROSEC](https://usec.io/) colleagues Ahnaf
Siddiqi, Ragavan Kalatharan and Shazina Zaini for being part of the discussions
that led to this exploration.

Also thanks to Phil from [ISRG](https://www.abetterinternet.org/)/[Let's
Encrypt](https://letsencrypt.org/) for answering my security support query and
recommending that I engage with the [client developer
community](https://community.letsencrypt.org/c/client-dev) as well as taking a
look at the integration guide.

### List of contributors

People who contribute changes to this work can add themselves here:

- Steve Kerrison

## License

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png