The concept of reproducibility has been the keystone of both ancient and modern scientific methods. In spite of this, digital science has recently been taken to task to improve its failing record of repeatable experimentation. A plethora of digital archives has appeared in response, yet the community has not defined the end goal. There exists no means of comparing or evaluating digital archives or the quality of preserved software, and thus no means of knowing whether the tools serve that goal. This work provides a metric for evaluating software sustainability and uses it to define a metric for evaluating and comparing interactive software archives.
Active Curation of Artifacts is Changing the Way Digital Libraries will Operate. 4th Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4, 2016). slides pdf
Software Provenance: Track the Reality not the Virtual Machine. Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS, 2018). slides [acm] pdf
The growing use of computers and massive storage by individuals is driving interest in digital preservation. The scientific method demands accountability through digital reproducibility, adding another strong motivation for preservation. However, data alone can become obsolete if the interactivity of the software required to interpret the data is lost. Virtual machines (VMs) may preserve interactivity, but they do so at the cost of obscuring what lies within. Occam, instead, builds VMs on the fly while storing and distributing well-described software packages. Thus, the system can track the exact components inside VMs without storing the machines themselves, allowing software to be repeatably built and executed. For Occam to recreate VMs, it needs to know exactly what software was used within. Through this tracking, such software can even be modified and rebuilt. Occam keeps track of all such components in manifests, allowing anybody to know exactly what is in each VM and where each component originated.
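The manifest idea described above can be illustrated with a minimal sketch. This is not Occam's actual manifest format or API: the `Component` fields, the `build_manifest` function, the package names, and the `example.org` origins are all hypothetical, chosen only to show how a list of well-described components (with their origins) could be derived from dependency descriptions rather than from a stored VM image.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Component:
    """One well-described software package (hypothetical fields, not Occam's schema)."""
    name: str
    version: str
    origin: str               # where the source was obtained
    dependencies: tuple = ()  # names of the components this one needs

    def identifier(self) -> str:
        # Derive an ID from the description so identical descriptions share an ID.
        text = "|".join([self.name, self.version, self.origin, ",".join(self.dependencies)])
        return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def build_manifest(root: str, catalog: dict) -> list:
    """List every component needed to run `root`, with dependencies listed first."""
    ordered, visiting, done = [], set(), set()

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle detected at {name!r}")
        if name not in catalog:
            raise KeyError(f"no description available for {name!r}")
        visiting.add(name)
        for dep in catalog[name].dependencies:
            visit(dep)
        visiting.discard(name)
        done.add(name)
        component = catalog[name]
        ordered.append({
            "id": component.identifier(),
            "name": component.name,
            "version": component.version,
            "origin": component.origin,
        })

    visit(root)
    return ordered


# Hypothetical catalog of described packages (names and URLs are placeholders).
catalog = {
    "libc": Component("libc", "2.27", "https://example.org/libc.tar.gz"),
    "python": Component("python", "3.6", "https://example.org/python.tar.gz", ("libc",)),
    "simulator": Component("simulator", "1.0", "https://example.org/simulator.git",
                           ("python", "libc")),
}

manifest = build_manifest("simulator", catalog)
for entry in manifest:
    print(entry["id"], entry["name"], entry["version"], entry["origin"])
```

Because the manifest is computed from the descriptions, the same inputs always yield the same component list, which is the property that would let an environment be rebuilt later instead of being archived as an opaque image.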
Supporting Long-term Reproducible Software Execution. Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS, 2018). slides [acm] pdf
The recent widespread realization that software experiments are not as easily replicated as once believed has brought software execution preservation into the scientific spotlight. As a result, scientists, institutions, and funding agencies have been pushing for the development of methodologies and tools that preserve software artifacts. Despite current efforts, long-term reproducibility still eludes us.
In this paper, we present the requirements for software execution preservation and discuss how to improve long-term reproducibility in science. In particular, we discuss why preserving binaries and pre-built execution environments is not enough, and why preserving the ability to replicate results is not the same as preserving software for reproducible science. Finally, we show how these requirements are supported by Occam, an open curation framework that fully preserves software and its dependencies from source to execution, promoting transparency, longevity, and re-use. Specifically, Occam provides the ability to automatically deploy workflows in a fully functional environment that can not only run them but also make them easily replicable.
Composing, Reproducing, and Sharing Simulations. 4th Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4, 2016). slides pdf
Open Curation and Repeatability for Scientific Artifact Evaluation. Science Gateways 2017. [figshare]
Supporting Long-term Reproducible Software Execution. Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS, 2018). pdf
Artifact Execution Curation for Repeatability in Artifact Evaluation. 2018. [figshare]