LSST Awarded Time on TeraGrid
by Suzanne Jacoby, Jeff Kantor, Tim Axelrod and Anna Spitz, LSST E-News
We’ve just received word that the Data Management (DM) team has been awarded significant resources on the TeraGrid, the National Science Foundation’s supercomputing infrastructure consisting of large clusters of computers located at eleven centers in the US. This award will allow the team to perform its most ambitious test to date as it practices processing the massive amounts of data LSST will produce.
As it carries out its 10-year survey, LSST will produce over 15 terabytes of raw astronomical data each night (30 terabytes processed), resulting in a database catalog of 22 petabytes and an image archive of 100 petabytes.
How much data?
A megabyte is 10⁶ or 1,000,000 bytes
A gigabyte = 10⁹ bytes
A terabyte = 10¹² bytes
A petabyte = 10¹⁵ bytes.
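For a rough sense of how the nightly numbers add up to the archive totals quoted above, here is a back-of-envelope sketch in Python. The assumption of roughly 365 observing nights per year for ten years is an illustrative simplification, not an LSST specification.

```python
# Rough back-of-envelope check of the survey data volumes quoted above.
# Assumes ~365 observing nights per year for 10 years; the real survey
# loses nights to weather and maintenance, so treat this as illustrative only.

TB = 10**12                      # bytes in a terabyte
PB = 10**15                      # bytes in a petabyte

nights = 365 * 10                # ~3,650 nights over the 10-year survey
raw_per_night = 15 * TB          # >15 TB of raw images per night
processed_per_night = 30 * TB    # ~30 TB of processed images per night

raw_total = nights * raw_per_night
processed_total = nights * processed_per_night

print(f"raw images:       ~{raw_total / PB:.0f} PB")        # ~55 PB
print(f"processed images: ~{processed_total / PB:.0f} PB")  # ~110 PB, consistent
                                                            # with a ~100 PB image archive
```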
During the LSST design and development phase, the DM group has been developing a software framework and science codes with the scalability and robustness necessary to process this unprecedented data stream.
To test the scalability of the software, the LSST Data Management team has performed a series of Data Challenges: targeted demonstrations of the processing software, with each challenge encompassing tasks of incrementally larger scope and complexity, building toward the final production code that will be used during operations. Data Challenges to date have been performed on a fairly modest TeraGrid allocation and on High Performance Computing (HPC) clusters hosted at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, an LSST Institutional Member.
With the current data challenge, DC3b, the DM team will process 10 TB of data from an existing astronomical survey and 47 TB of a simulated LSST data set. DC3b will be carried out as a series of three incrementally more demanding performance tests, culminating in the production of science data products from an archive's worth of data at a scale of 15% of the operational DM System. The goals of these tests are to verify the code for correctness and robustness, to understand its performance, and to create a large dataset that astronomers can use to plan science projects for LSST.
“Although we use images from previous surveys, our heavy reliance on simulated images drives the need for 1.5 million core hours on the TeraGrid for the next stage to be conducted over the next few months,” comments Tim Axelrod, the LSST DM System Scientist.
In January 2010, the LSST Data Management project submitted a proposal to the TeraGrid program requesting infrastructure for DM design and development. Several lead scientists and engineers on the DM team developed the proposal under the leadership of NCSA, which has a long history of involvement in the TeraGrid. The allocation period runs from April 2010 through March 2011, and the allocated TeraGrid infrastructure will be provided by systems at NCSA, TACC, LONI, and Purdue.
Mike Freemon, Infrastructure Lead for DM and Project Manager at NCSA, says the team's proposal was awarded its full request of TeraGrid resources, both CPU hours and data storage: 1.51 million Service Units (CPU-hours), 400 TB of dual-copy mass storage, and 20 TB of spinning disk storage.
NCSA has led the effort to provide infrastructure for DC3b, which in addition to the TeraGrid allocation includes contributions from SLAC, SDSC, IN2P3, Caltech, Purdue, and the REDDnet project at Vanderbilt University. This architecture includes data production and archiving capabilities, database scaling test resources, and, for the first time, resources to replicate and serve the input and output data to scientific users in the LSST Science Collaborations for validation and experimentation.
And if the TeraGrid proposal had not been successful, what were the options? Tim tells us DC3b could be run on a fast PC, but it would take 1.5 million hours, about 200 years!
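The "200 years" figure is straightforward arithmetic on the 1.5 million core-hours mentioned above. The sketch below repeats that calculation and, for contrast, shows how the same budget shrinks when spread across a few thousand cores; the 2,000-core figure is purely illustrative, not the actual DC3b job size.

```python
# Convert the DC3b compute budget into wall-clock time.
core_hours = 1.5e6                 # ~1.5 million core-hours awarded for DC3b
hours_per_year = 24 * 365          # 8,760 hours in a year

# One fast PC grinding away on a single core around the clock:
years_on_one_pc = core_hours / hours_per_year
print(f"single PC: ~{years_on_one_pc:.0f} years")     # ~171 years, i.e. roughly 200

# Spread across, say, 2,000 TeraGrid cores running in parallel
# (an illustrative figure, not the actual DC3b configuration):
cores = 2000
days_on_cluster = core_hours / cores / 24
print(f"{cores} cores: ~{days_on_cluster:.0f} days")  # ~31 days
```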