Determine ADP Infrastructure Work Scope for 2023
DAHR has a CLIR grant running until July 2023. The existing architecture supporting the ADP application includes three separate LAMP stacks: one "development" environment and two "production" environments. These stacks provide a frontend to a single FileMaker Pro system. All components are hosted on-premises. The current architecture can be diagrammed:
As part of the scope, the DAHR/ADP contractor would like to add an additional LAMP stack. We would promote the existing "development" system to "staging" (it is used as part of the production data pipeline) and introduce a new "development" system which can be safely used for development purposes.
Additionally, the existing LAMP systems are critically out-of-date. We know that the following major dependencies are End-of-Life:
- CentOS 5 (EOL: Mar. 31, 2017)
- PHP 5.6 (EOL: Dec. 31, 2018)
- MySQL 5.5 (EOL: Dec. 31, 2018)
There is a need to identify and update other out-of-date dependencies, especially those which are not receiving security updates or have known vulnerabilities.
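As a starting point for that inventory, the check could be sketched as below. This is a minimal sketch, not an existing tool: the `EOL_DATES` table contains only the three dates listed above, and the function names are hypothetical. Other dependencies would be added to the table as they are identified.

```python
from datetime import date

# End-of-life dates as listed in this document.
# Additional dependencies would be added here as they are identified.
EOL_DATES = {
    "CentOS 5": date(2017, 3, 31),
    "PHP 5.6": date(2018, 12, 31),
    "MySQL 5.5": date(2018, 12, 31),
}


def days_past_eol(eol, today):
    """Return how many days `today` is past `eol` (negative if still supported)."""
    return (today - eol).days


def eol_report(eol_dates, today):
    """Return (name, days_past_eol) pairs for end-of-life entries, most overdue first."""
    overdue = [
        (name, days_past_eol(eol, today))
        for name, eol in eol_dates.items()
        if eol < today
    ]
    return sorted(overdue, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, days in eol_report(EOL_DATES, date.today()):
        print(f"{name}: {days} days past end-of-life")
```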
These updates are urgent, and it is highly desirable to complete them with the contractor's engagement during the CLIR project timeline. To avoid duplicated work, CLIR deliverables should not be developed for the existing PHP and MySQL versions. Additionally, the needed upgrades (particularly for PHP and MySQL) have application-side implications that will require work which the contractor is uniquely placed to carry out.
The goals of this work are:
- provide the requested development system as quickly as feasible;
- upgrade dependencies to current versions and resolve current security risks;
- establish sustainable infrastructure and a model for ongoing maintenance.
Questions
- How easily can we upgrade the PHP code to support PHP 8 and a currently supported MySQL release (5.7 or 8.0)?
- Will it be necessary to provide a development system supporting PHP 5 in the short term, or can we update to PHP 8 as "stage 1"?
- Which version of the Yii framework is in use? Are other PHP library dependencies under current support?
- How do we use shell access on the current VMs?
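One way to get a first answer to the PHP 8 question is to scan the codebase for calls to functions that no longer exist in newer PHP versions. The following is a rough sketch under stated assumptions: the `REMOVED_FUNCTIONS` table is a small illustrative subset rather than a complete list, the function names `find_removed_calls` and `scan_tree` are hypothetical, and the regular expression is a heuristic that can produce false positives (for example, a user-defined function with the same name).

```python
import re
from pathlib import Path

# Illustrative subset of functions removed from PHP, keyed to the
# version that removed them. This is not a complete list.
REMOVED_FUNCTIONS = {
    "mysql_connect": "7.0",
    "mysql_query": "7.0",
    "mysql_fetch_assoc": "7.0",
    "ereg": "7.0",
    "split": "7.0",
    "each": "8.0",
    "create_function": "8.0",
}

# Match a bare call such as `each(`, but not `$x->each(`, `Foo::each(`,
# `$each(`, or a longer name such as `foreach (` or `preg_split(`.
CALL_PATTERN = re.compile(
    r"(?<![\w>:$])(" + "|".join(map(re.escape, REMOVED_FUNCTIONS)) + r")\s*\(",
    re.IGNORECASE,  # PHP function names are case-insensitive
)


def find_removed_calls(source):
    """Return (line_number, function_name, removed_in) for each suspect call."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL_PATTERN.finditer(line):
            name = match.group(1).lower()
            findings.append((lineno, name, REMOVED_FUNCTIONS[name]))
    return findings


def scan_tree(root):
    """Scan every *.php file under `root`; return {path: findings} for files with hits."""
    results = {}
    for path in sorted(Path(root).rglob("*.php")):
        findings = find_removed_calls(path.read_text(errors="replace"))
        if findings:
            results[str(path)] = findings
    return results
```

A scan like this only flags removed functions. Behavioral changes between PHP 5 and PHP 8, and the compatibility of the framework version in use, would still need to be reviewed with the contractor.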
On-premises Updates (vSphere)
The approach involving the least architectural change is to upgrade the existing vSphere systems to the latest versions. We would deploy a new system supporting PHP 8 and a current MySQL database (either 5.7 or 8, depending on application considerations). In a later phase, but before the end of the current grant, we would replace the current systems following the model of the new development system.
This approach has the virtue of limiting structural change. The network topology would remain (more or less) the same, as would the technologies used at the infrastructure layer. Developer access and software supply chain approaches (such as they are) would also be unchanged.
Issues
- Due to the age of the software involved, it's not feasible to provision a new development system in vSphere that closely resembles the existing stack. Any rebuild would be more or less from scratch.
- There are risks associated with lost knowledge and lack of documentation around the existing systems. It will be hard to accurately estimate this work.
- The existing database synchronization scripts likely need to be replaced. While simple solutions are available in AWS, this vSphere option defers resolution of this issue.
- The existing model for maintenance is reactive and requires sustained attention for routine tasks, leading to deferred maintenance. Continuing to build on this model is a missed opportunity.
Cloud Deployment
A more forward-looking approach would be to move the DAHR infrastructure into the AWS cloud. In general, the Library is moving its web application workloads in this direction.
General Cloud Issues
Developer On-boarding
The contractor maintaining the PHP code is accustomed to having SSH access to the running "development" and "production" systems. Providing this kind of access to either of the proposed cloud environments may not give the developer the same capabilities. The systems involved may or may not have stable IP addresses or be open to SSH traffic. They would likely not persist changes made in the running environment for any predictable duration, and may not have common utilities installed for developer use.
The contractor would need to be on-boarded to a new approach and any concerns about their ability to do their existing work would need to be addressed.
Networking
The current data pipeline runs from an on-premises FileMaker system which synchronizes in near real-time to the MySQL "development" database. "development" then synchronizes to the two "production" systems in sequence on a nightly schedule.
In order to keep this FileMaker-driven process (and assuming we will not move FileMaker into cloud infrastructure), we would need to establish VPC access to the database for the FileMaker system so data can move off campus.
Database
AWS provides a managed MySQL database solution as part of the RDS product. In either architecture, we would benefit from moving away from bundled servers running both PHP and MySQL, toward more isolated systems. RDS also supports delayed replication to "read replicas", which may be a drop-in replacement for the current database synchronization processes. The RDS product supports MySQL 5.7 (planned end-of-life October 2023) and 8.0.
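To make the read replica option more concrete, the sketch below builds the request parameters for the boto3 RDS client call `create_db_instance_read_replica` and the SQL statement that sets a replication delay through the RDS stored procedure `mysql.rds_set_source_delay`. The instance identifiers, instance class, and delay value are hypothetical placeholders, and whether delayed replication actually suits the nightly schedule is an open question this sketch does not settle.

```python
def replica_request(source_id, replica_id, instance_class):
    """Build keyword arguments for boto3's rds.create_db_instance_read_replica."""
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
        "DBInstanceClass": instance_class,
    }


def delay_statement(delay_seconds):
    """Build the SQL that sets the replication delay; it is run on the replica itself."""
    if not isinstance(delay_seconds, int) or delay_seconds < 0:
        raise ValueError("delay_seconds must be a non-negative integer")
    return f"CALL mysql.rds_set_source_delay({delay_seconds});"


if __name__ == "__main__":
    # Hypothetical identifiers and sizes; the real values are not yet decided.
    request = replica_request(
        source_id="adp-staging",
        replica_id="adp-production-1",
        instance_class="db.t3.medium",
    )
    print(request)
    print(delay_statement(3600))

    # With AWS credentials configured, the request would be submitted as:
    #   import boto3
    #   boto3.client("rds").create_db_instance_read_replica(**request)
```

Keeping the request construction separate from the AWS call lets the configuration be reviewed or tested without touching a live account.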
Containerization & Kubernetes Deployment
Containerizing is the most radical architectural change on the table, but has the advantage that infrastructure provisioning, maintenance, and the software supply chain can follow established models worked out on Project Surfliner, Cylinders, and other DLD cloud systems.
This process would involve first creating a container image for the PHP application and webserver. In the immediate term, this image might run either PHP 5 or PHP 8, depending on the timeline for application updates.
System Dependencies and Software Supply Chain
- DLD supports existing deployments of the official PHP container image, which is based on Debian Linux.
- Additional system dependencies would be managed by a version-controlled Dockerfile, which would also import the project code.
- Future updates to OS, PHP, and system dependencies would be automated using DLD's existing renovatebot configurations.
Virtualized Deployment with EC2
A virtualized deployment using the AWS EC2 offering is a less radical approach. In this model, the existing VMs are ported to run in AWS (and updated accordingly). This approach is in line with goals to move workloads to the cloud, but shares many problems with the on-premises update.
System Dependencies and Software Supply Chain
- AMI maintenance?
- Can we automate system dependency updates?
- Code deployment?
Infrastructure Provisioning and Monitoring
- Identify a suitable Infrastructure as Code approach for EC2 deployments.
Developer On-boarding
- System access would be substantially the same as the existing setup.
- SSH keys would need to be delivered, and firewall rules configured to allow broad SSH access.



