Bug repellent for supercomputers proves effective

Lawrence Livermore National Laboratory (LLNL) researchers have used the Stack Trace Analysis Tool (STAT), a highly scalable, lightweight tool, to debug a program running more than one million MPI processes on the IBM Blue Gene/Q (BGQ)-based Sequoia supercomputer.

The achievement is a significant milestone in LLNL's multi-year collaboration with the University of Wisconsin-Madison (UW) and the University of New Mexico (UNM) to ensure supercomputers run more efficiently.

STAT, a 2011 R&D 100 Award winner, has played a significant role in scaling up the Sequoia supercomputer. It has helped both early access users and system integrators quickly isolate a wide range of errors, including particularly perplexing issues that only manifested at extremely large scales of up to 1,179,648 compute cores. During the Sequoia scale-up, bugs in applications as well as defects in system software and hardware have manifested themselves as failures in applications. It is important to diagnose such errors quickly so they can be reported to experts who can analyze them in detail and ultimately solve the problem.

"STAT has been indispensable in this capacity, helping the multi-disciplined integration team keep pace with the aggressive system scale-up schedule," said LLNL computer scientist Greg Lee.

"While testing a subsystem of Blue Gene/Q, my test program consistently failed only when scaled to 1,179,648 MPI processes. Although the test program was simple, the sheer scale at which this program ran made debugging efforts highly challenging. But when I applied STAT, it quickly revealed that one particular rank process was consistently stuck in a system call," said Dong Ahn, a computer scientist in Livermore Computing.

Based on this finding, a system expert took a close look at the compute core on which this rank process was running and discovered a hardware defect. "Replacing the component suddenly got the entire Sequoia system back to life," Ahn said. "Putting this exercise into perspective, this error was due to a defect in a tiny hardware unit, the decrementor, of a single hardware thread out of a total of 4.7 million hardware threads. I felt it was like finding a needle in a haystack over a coffee break."
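The kind of diagnosis Ahn describes, one rank stuck in a system call among more than a million, can be illustrated with a minimal sketch that groups processes by identical call stacks and reports the smallest group. This is not STAT's implementation or API, which this article does not describe. The function names, frame names and rank data below are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_ranks_by_trace(traces):
    """Map each distinct call stack (a tuple of frames) to the sorted ranks showing it."""
    groups = defaultdict(list)
    for rank, frames in traces.items():
        groups[tuple(frames)].append(rank)
    return {stack: sorted(ranks) for stack, ranks in groups.items()}


def smallest_groups(groups, limit=1):
    """Return the `limit` least-populated groups; a lone outlier is often the culprit."""
    ordered = sorted(groups.items(), key=lambda item: (len(item[1]), item[1]))
    return ordered[:limit]


# Hypothetical data: eight ranks, seven waiting in a barrier and rank 5 stuck in a system call.
traces = {rank: ["main", "run_test", "MPI_Barrier"] for rank in range(8)}
traces[5] = ["main", "run_test", "write", "syscall"]

groups = group_ranks_by_trace(traces)
outlier_stack, outlier_ranks = smallest_groups(groups)[0]
print(" > ".join(outlier_stack), outlier_ranks)
```

With this made-up input, the eight traces collapse into two groups, and the group containing only rank 5 stands out at once. The same grouping idea is what makes a single misbehaving process visible without reading every trace individually.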

Sequoia delivers 20 petaflops of peak performance and was ranked No. 1 on the June 2012 TOP500 list. It is currently ranked No. 2, behind Oak Ridge National Laboratory's Titan.

LLNL plans to use Sequoia's impressive computational capability to advance understanding of fundamental physics and engineering questions that arise in the National Nuclear Security Administration's (NNSA) program to ensure the safety, security and effectiveness of the United States' nuclear deterrent without testing. Sequoia also will support NNSA/DOE programs at LLNL that focus on nonproliferation, counterterrorism, energy, security, health and climate change.

As LLNL takes delivery of the Sequoia system and works to move it into production, computer scientists will migrate applications that have been running on earlier systems to this newer architecture. This is a period of intense activity for LLNL's application teams as they gain experience with the new hardware and software environment.

"Having a highly effective debugging tool that scales to the full system is vital to the installation and acceptance process for Sequoia. It is critical that our development teams have a comprehensive parallel debugging tool set as they iron out the inevitable issues that come up with running on a new system like Sequoia," said Kim Cupps, leader of the Livermore Computing Division at LLNL.

STAT is particularly important for LLNL because supercomputer simulations are essential in virtually every mission area of the Laboratory. The tool also has been used at other sites and proved to be effective on a wide range of supercomputer platforms, including Linux clusters and Cray systems.

The team is actively pursuing further optimization of STAT technologies and is exploring commercialization strategies. More information about STAT, including a link to the source code, is available on the Web.

More Information

ASC Sequoia

"Early science runs prepare Lawrence Livermore National Lab's Sequoia for national security missions," LLNL news release, Nov. 9, 2012

"Venturing into the heart of high-performance computing simulations," Science & Technology Review, September 2012

Founded in 1952, Lawrence Livermore National Laboratory provides solutions to our nation's most important national security challenges through innovative science, engineering and technology. Lawrence Livermore National Laboratory is managed by Lawrence Livermore National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration.

Anne Stark | EurekAlert!