Software Defined Cyberinfrastructure
Ian Foster, Ben Blaiszik, Kyle Chard and Ryan Chard
Argonne National Laboratory &amp, The University of Chicago, The University of Chicago, Computation Institute, University of Chicago and Argonne National Lab, Victoria University of Wellington

Within and across thousands of science labs, researchers and students struggle to manage data produced in experiments, simulations, and analyses. Largely manual research data lifecycle management processes mean that much time is wasted, research results are often irreproducible, and data sharing and reuse remain rare. In response, we propose a new approach to data lifecycle management in which researchers are empowered to define the actions to be performed at individual storage systems when data are created or modified: actions such as analysis, transformation, copying, and publication. We term this approach software-defined cyberinfrastructure because users can implement powerful data management policies by deploying rules to local storage systems, much as software-defined networking allows users to configure networks by deploying rules to switches. We argue that this approach can enable a new class of responsive distributed storage infrastructure that will accelerate research innovation by allowing any researcher to associate data workflows with data sources, whether local or remote, for such purposes as data ingest, characterization, indexing, and sharing. We report on early experiments with this approach in the context of experimental science, in which a simple if-trigger-then-action (IFTA) notation is used to define rules.