
The NSF's TeraGrid supercomputing infrastructure consists of large clusters of computers located at eleven centers in the US.

by Suzanne Jacoby, Jeff Kantor, Tim Axelrod and Anna Spitz
LSST E-News

We’ve just received word that the Data Management (DM) team has been awarded significant resources on the TeraGrid, the National Science Foundation’s supercomputing infrastructure consisting of large clusters of computers located at eleven centers in the US. This award will allow the team to perform its most ambitious test to date as it practices processing the massive amounts of data LSST will produce.

As it carries out its 10-year survey, LSST will produce over 15 terabytes of raw astronomical data each night (30 terabytes processed), resulting in a database catalog of 22 petabytes and an image archive of 100 petabytes.

How much data?
A megabyte = 10⁶ (1,000,000) bytes
A gigabyte = 10⁹ bytes
A terabyte = 10¹² bytes
A petabyte = 10¹⁵ bytes
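For readers who want to see how those volumes add up, here is a quick back-of-envelope sketch in Python (the per-night and archive figures come from the article above; the script itself is purely illustrative):

    # Back-of-envelope arithmetic using the data volumes quoted above.
    TB = 10**12   # terabyte, in bytes
    PB = 10**15   # petabyte, in bytes

    raw_per_night = 15 * TB        # raw data produced each night
    processed_per_night = 30 * TB  # processed data each night
    catalog_total = 22 * PB        # final database catalog
    archive_total = 100 * PB       # final image archive

    # Nights of raw data needed to fill one petabyte:
    print(round(PB / raw_per_night))   # ~67 nights
    # Final image archive expressed in terabytes:
    print(archive_total // TB)         # 100,000 TB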

During the LSST design & development phase, the DM group has been developing a software framework and science codes with the scalability and robustness necessary to process this unprecedented data stream.

In order to test the scalability of the software, the LSST Data Management team has performed a series of Data Challenges — targeted demonstrations of the processing software, with each challenge encompassing tasks of incrementally larger scope and complexity, building toward the final production code that will be used during operations. The Data Challenges to date have been performed on a fairly modest TeraGrid allocation and on High Performance Computing (HPC) clusters hosted at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, an LSST Institutional Member.

With the current data challenge, DC3b, the DM team will process 10 TB of data from an existing astronomical survey and 47 TB of a simulated LSST data set. DC3b will be carried out as a series of three incrementally more demanding performance tests, culminating in the production of science data products from an archive's worth of data at a scale of 15% of the operational DM System. The goals of these tests are to verify the code for correctness and robustness, to understand its performance, and to create a large dataset that astronomers can use to plan science projects for the LSST.

“Although we use images from previous surveys, our heavy reliance on simulated images drives the need for 1.5 million core hours on the TeraGrid for the next stage to be conducted over the next few months,” comments Tim Axelrod, the LSST DM System Scientist.

In January 2010, the LSST Data Management project submitted its proposal to the TeraGrid program requesting infrastructure for DM design and development. Several lead scientists and engineers on the DM team developed the proposal under the leadership of NCSA, which has a long history of involvement in the TeraGrid. The allocation runs from April 2010 through March 2011, and the allocated TeraGrid infrastructure will be provided by systems at NCSA, TACC, LONI, and Purdue.

Mike Freemon, Infrastructure Lead for DM and Project Manager at NCSA, says the team was awarded its full request of TeraGrid resources, both CPU hours and data storage: 1.51M Service Units (CPU-hours), 400 TB of dual-copy mass storage, and 20 TB of spinning disk storage.

NCSA has led the effort to provide infrastructure for DC3b, which in addition to the TeraGrid allocation includes contributions from SLAC, SDSC, IN2P3, Caltech, Purdue, and the REDDnet project at Vanderbilt University. This architecture includes data production and archiving capabilities, database scaling test resources, and, for the first time, resources to replicate and serve the input and output data to scientific users in the LSST Science Collaborations for validation and experimentation.

And if the TeraGrid proposal had not been successful, what were the options? Tim tells us DC3b could be run on a fast PC, but it would take 1.5 million hours, roughly 170 years of continuous computing!
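As a quick sanity check on that figure, the arithmetic is simple (a minimal sketch; the only input is the 1.5 million core-hours quoted above):

    # Convert the 1.5 million core-hours awarded on the TeraGrid into
    # wall-clock time on a single processor running around the clock.
    core_hours = 1.5e6
    hours_per_year = 24 * 365.25            # ~8,766 hours in a year

    years_on_one_pc = core_hours / hours_per_year
    print(f"{years_on_one_pc:.0f} years")   # ~171 years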