
Bug repellent for supercomputers proves effective

15.11.2012
Lawrence Livermore National Laboratory (LLNL) researchers have used the Stack Trace Analysis Tool (STAT), a highly scalable, lightweight tool, to debug a program running more than one million MPI processes on the IBM Blue Gene/Q (BGQ)-based Sequoia supercomputer.

This achievement marks a significant milestone in LLNL's multi-year collaboration with the University of Wisconsin, Madison (UW) and the University of New Mexico (UNM) to ensure supercomputers run more efficiently.
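STAT gets its name from its method. In broad terms, it gathers a stack trace from every process and merges traces that share a call path into a prefix tree whose nodes record which MPI ranks reached them, so a million processes collapse into a few distinct behaviors and a lone outlier stands out. The Python below is a minimal sketch of that merging idea under simplifying assumptions, not STAT's implementation: the traces are hard-coded rather than collected, it ignores the tree-based overlay network STAT uses to gather data at scale, and the function names (`merge_traces`, `outlier_paths`) and frame names (`solve`, `read_timer`) are hypothetical.

```python
from collections import defaultdict

# A stack trace is a tuple of frame names, outermost frame first.
Trace = tuple[str, ...]


def merge_traces(traces: dict[int, Trace]) -> dict[Trace, set[int]]:
    """Merge per-rank traces into a call-path prefix tree.

    Each key is a call-path prefix (one node of the tree); its value is the
    set of MPI ranks whose stack passes through that node.
    """
    tree: dict[Trace, set[int]] = defaultdict(set)
    for rank, trace in traces.items():
        for depth in range(1, len(trace) + 1):
            tree[trace[:depth]].add(rank)
    return dict(tree)


def outlier_paths(traces: dict[int, Trace], max_ranks: int = 1) -> dict[Trace, set[int]]:
    """Group ranks by full call path and keep only paths shared by few ranks."""
    classes: dict[Trace, set[int]] = defaultdict(set)
    for rank, trace in traces.items():
        classes[trace].add(rank)
    return {path: ranks for path, ranks in classes.items() if len(ranks) <= max_ranks}


if __name__ == "__main__":
    # Hypothetical snapshot: 8 ranks waiting in a barrier, except rank 5,
    # which is stuck inside a system call.
    traces = {rank: ("main", "solve", "MPI_Barrier") for rank in range(8)}
    traces[5] = ("main", "solve", "read_timer", "syscall")

    for path, ranks in sorted(merge_traces(traces).items()):
        print(" > ".join(path), sorted(ranks))
    print("outliers:", outlier_paths(traces))
```

Keying on shared prefixes keeps the output proportional to the number of distinct call paths rather than the number of processes, which is what lets a single anomalous rank surface.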

Playing a significant role in scaling up Sequoia, STAT, a 2011 R&D 100 Award winner, has helped both early-access users and system integrators quickly isolate a wide range of errors, including particularly perplexing issues that appeared only at extremely large scales of up to 1,179,648 compute cores. During the Sequoia scale-up, bugs in applications as well as defects in system software and hardware all surfaced as application failures. Diagnosing such errors quickly is important so that they can be reported to experts who can analyze them in detail and ultimately solve the problem.

"STAT has been indispensable in this capacity, helping the multidisciplinary integration team keep pace with the aggressive system scale-up schedule," said LLNL computer scientist Greg Lee.

"While testing a subsystem of Blue Gene/Q, my test program consistently failed only when scaled to 1,179,648 MPI processes. Although the test program was simple, the sheer scale at which it ran made debugging highly challenging. But when I applied STAT, it quickly revealed that one particular MPI rank was consistently stuck in a system call," said Dong Ahn, a computer scientist in Livermore Computing.

Based on this finding, a system expert took a close look at the compute core on which this rank was running and discovered a hardware defect. "Replacing the component suddenly brought the entire Sequoia system back to life," Ahn said. "Putting this exercise into perspective, this error was due to a defect in a tiny hardware unit, the decrementer, of a single hardware thread out of a total of 4.7 million hardware threads. I felt it was like finding a needle in a haystack over a coffee break."
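The "4.7 million hardware threads" figure follows from the 1,179,648 cores in Ahn's test run, given that each Blue Gene/Q compute core supports four hardware threads:

```latex
\underbrace{1{,}179{,}648}_{\text{compute cores}} \times \underbrace{4}_{\text{hardware threads per core}} = 4{,}718{,}592 \approx 4.7 \times 10^{6}
```

The faulty decrementer therefore belonged to one thread out of roughly 4.7 million.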

Sequoia has a peak performance of 20 petaflops and ranked No. 1 on the June 2012 TOP500 list. It is currently ranked No. 2, behind Oak Ridge National Laboratory's Titan.

LLNL plans to use Sequoia's impressive computational capability to advance understanding of fundamental physics and engineering questions that arise in the National Nuclear Security Administration's (NNSA) program to ensure the safety, security and effectiveness of the United States' nuclear deterrent without testing. Sequoia also will support NNSA/DOE programs at LLNL that focus on nonproliferation, counterterrorism, energy, security, health and climate change.

As LLNL takes delivery of the Sequoia system and works to move it into production, computer scientists will migrate applications that have been running on earlier systems to this newer architecture. This is a period of intense activity for LLNL's application teams as they gain experience with the new hardware and software environment.

"Having a highly effective debugging tool that scales to the full system is vital to the installation and acceptance process for Sequoia. It is critical that our development teams have a comprehensive parallel debugging tool set as they iron out the inevitable issues that come up with running on a new system like Sequoia," said Kim Cupps, leader of the Livermore Computing Division at LLNL.

STAT is particularly important for LLNL because supercomputer simulations are essential in virtually every mission area of the Laboratory. The tool has also been used at other sites and has proved effective on a wide range of supercomputer platforms, including Linux clusters and Cray systems.

The team is actively pursuing further optimization of STAT technologies and is exploring commercialization strategies. More information about STAT, including a link to the source code, is available on the Web.

More Information
STAT
ASC Sequoia
"Early science runs prepare Lawrence Livermore National Lab's Sequoia for national security missions," LLNL news release, Nov. 9, 2012
"Venturing into the heart of high-performance computing simulations," Science & Technology Review, September 2012

Founded in 1952, Lawrence Livermore National Laboratory provides solutions to our nation's most important national security challenges through innovative science, engineering and technology. Lawrence Livermore National Laboratory is managed by Lawrence Livermore National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration.

Anne Stark | EurekAlert!
Further information:
http://www.llnl.gov
