Extract data + metadata for the entire PDB

We need a pipeline that processes the entire PDB and extracts useful data + metadata.

We should probably use mmCIF files for this (see #13 #14).

Processing a non-trivial number of PDB files takes a long time. We could convert all PDBs into a faster binary format such as HDF5, but that would make it harder to distribute our code to others.
The pipeline should be reasonably easy to run on new PDBs, so that we can always fetch a PDB from the RCSB website when it is not available locally.
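
The "use the local copy, otherwise fetch from RCSB" idea could look roughly like the sketch below. This is only an illustration, not a proposed implementation: the function names (`get_mmcif`, `local_path`), the flat cache directory layout, and the injectable `download` argument are all assumptions made for the example. The download URL follows the public `files.rcsb.org/download/<ID>.cif` pattern.

```python
from pathlib import Path
from urllib.request import urlopen

# Public RCSB download endpoint for mmCIF files.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.cif"


def local_path(pdb_id, cache_dir):
    """Where a structure would live locally (hypothetical flat layout)."""
    return Path(cache_dir) / f"{pdb_id.lower()}.cif"


def _download_from_rcsb(pdb_id):
    """Fetch the raw mmCIF bytes for one entry from the RCSB website."""
    url = RCSB_URL.format(pdb_id=pdb_id.upper())
    with urlopen(url, timeout=30) as response:
        return response.read()


def get_mmcif(pdb_id, cache_dir, download=None):
    """Return the path to a local mmCIF file, downloading it only if missing.

    `download` is a callable taking a PDB ID and returning bytes; it exists
    so the network step can be swapped out (e.g. in tests).
    """
    path = local_path(pdb_id, cache_dir)
    if path.exists():
        return path  # already available locally, no network access needed
    if download is None:
        download = _download_from_rcsb
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(download(pdb_id))
    return path
```

With something like this in place, the rest of the pipeline would only ever deal with local paths, e.g. `get_mmcif("4hhb", "data/mmcif")`, regardless of whether the file was already on disk.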
