Abstract
Modern workflow systems enable scientists to run ensemble simulations at unprecedented scales and levels of complexity, allowing them to study system sizes that were previously out of reach. However, as a result of these new capabilities, science teams also face unprecedented data volumes that they cannot analyze in a timely fashion with their existing tools and methodologies. In this paper we describe ongoing development work to create an integrated, data-intensive scientific workflow and analysis environment that lets researchers easily create and execute complex simulation studies and provides them with several scalable methods for analyzing the resulting data. The capabilities of the new environment are demonstrated on a use case in building energy modeling. As part of the PNNL research initiative PRIMA (Platform for Regional Integrated Modeling and Analysis), the team performed an initial 3-year study of building energy demands for the US Eastern Interconnect domain, and is now planning to extend the study to predict demand for the complete century. In the 3-year study the team simulated 2,000 individual building types for 100 independent, climatically similar regions (600,000 individual runs), raising their data demands from a few MB to 400 GB.