Bug repellent for supercomputers proves effective

15.11.2012
Lawrence Livermore National Laboratory (LLNL) researchers have used the Stack Trace Analysis Tool (STAT), a highly scalable, lightweight tool, to debug a program running more than one million MPI processes on the IBM Blue Gene/Q (BGQ)-based Sequoia supercomputer.

The debugging run is a significant milestone in LLNL's multi-year collaboration with the University of Wisconsin-Madison (UW) and the University of New Mexico (UNM) to ensure supercomputers run more efficiently.

STAT, a 2011 R&D 100 Award winner, has played a significant role in scaling up the Sequoia supercomputer. It has helped both early access users and system integrators quickly isolate a wide range of errors, including particularly perplexing issues that only manifested at extremely large scales of up to 1,179,648 compute cores. During the Sequoia scale-up, bugs in applications as well as defects in system software and hardware have manifested themselves as failures in applications. It is important to diagnose errors quickly so they can be reported to experts who can analyze them in detail and ultimately solve the problem.

"STAT has been indispensable in this capacity, helping the multi-disciplined integration team keep pace with the aggressive system scale-up schedule," said LLNL computer scientist Greg Lee.

"While testing a subsystem of Blue Gene/Q, my test program consistently failed only when scaled to 1,179,648 MPI processes. Although the test program was simple, the sheer scale at which this program ran made debugging efforts highly challenging. But when I applied STAT, it quickly revealed that one particular rank process was consistently stuck in a system call," said Dong Ahn, a computer scientist in Livermore Computing.
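The article does not describe how STAT works internally. Its general approach is to gather stack traces from all processes and group together the processes that share the same call path, so that a lone outlier stands out. The toy Python sketch below illustrates that grouping idea only. It is not STAT's code or API: the function names, the frame names and the sample data are all made up for illustration.

```python
from collections import defaultdict


def group_ranks_by_trace(traces):
    """Map each distinct call path to the sorted list of ranks that share it."""
    groups = defaultdict(list)
    for rank, frames in traces.items():
        groups[tuple(frames)].append(rank)
    return {path: sorted(ranks) for path, ranks in groups.items()}


def outliers(groups, max_size=1):
    """Return call paths shared by at most max_size ranks, smallest groups first."""
    small = [(path, ranks) for path, ranks in groups.items() if len(ranks) <= max_size]
    return sorted(small, key=lambda item: (len(item[1]), item[1]))


# Made-up sample data: every rank waits in a barrier except rank 5,
# which sits in a (hypothetical) system call.
sample = {rank: ["main", "run_test", "MPI_Barrier"] for rank in range(8)}
sample[5] = ["main", "run_test", "write_log", "sys_write"]

groups = group_ranks_by_trace(sample)
for path, ranks in outliers(groups):
    print(" > ".join(path), "ranks:", ranks)
```

With a million processes, the same idea reduces the problem from reading a million traces to reading a handful of groups, and the group containing a single rank is the natural place to look first.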

Based on this finding, a system expert took a close look at the compute core on which this rank process was running and discovered a hardware defect. "Replacing the component suddenly got the entire Sequoia system back to life," Ahn said. "Putting this exercise into perspective, this error was due to a defect in a tiny hardware unit, the decrementor, of a single hardware thread out of a total of 4.7 million hardware threads. I felt it was like finding a needle in a haystack over a coffee break."
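The figure of 4.7 million hardware threads can be checked against the core count quoted earlier. The short calculation below assumes four hardware threads per compute core, which is the Blue Gene/Q design figure but is not stated in the article itself.

```python
cores = 1_179_648        # compute cores, as quoted in the article
threads_per_core = 4     # assumption: four hardware threads per Blue Gene/Q compute core

hardware_threads = cores * threads_per_core
print(f"{hardware_threads:,} hardware threads")
print(f"about {hardware_threads / 1e6:.1f} million")
```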

Sequoia delivers 20 petaflops of peak performance and was ranked No. 1 on the June 2012 TOP500 list. It is currently ranked No. 2, behind Oak Ridge National Laboratory's Titan.

LLNL plans to use Sequoia's impressive computational capability to advance understanding of fundamental physics and engineering questions that arise in the National Nuclear Security Administration's (NNSA) program to ensure the safety, security and effectiveness of the United States' nuclear deterrent without testing. Sequoia also will support NNSA/DOE programs at LLNL that focus on nonproliferation, counterterrorism, energy, security, health and climate change.

As LLNL takes delivery of the Sequoia system and works to move it into production, computer scientists will migrate applications that have been running on earlier systems to this newer architecture. This is a period of intense activity for LLNL's application teams as they gain experience with the new hardware and software environment.

"Having a highly effective debugging tool that scales to the full system is vital to the installation and acceptance process for Sequoia. It is critical that our development teams have a comprehensive parallel debugging tool set as they iron out the inevitable issues that come up with running on a new system like Sequoia," said Kim Cupps, leader of the Livermore Computing Division at LLNL.

STAT is particularly important for LLNL because supercomputer simulations are essential in virtually every mission area of the Laboratory. The tool has also been used at other sites and has proved effective on a wide range of supercomputer platforms, including Linux clusters and Cray systems.

The team is actively pursuing further optimization of STAT technologies and is exploring commercialization strategies. More information about STAT, including a link to the source code, is available on the Web.

More Information
STAT
ASC Sequoia
"Early science runs prepare Lawrence Livermore National Lab's Sequoia for national security missions," LLNL news release, Nov. 9, 2012
"Venturing into the heart of high-performance computing simulations," Science & Technology Review, September 2012

Founded in 1952, Lawrence Livermore National Laboratory provides solutions to our nation's most important national security challenges through innovative science, engineering and technology. Lawrence Livermore National Laboratory is managed by Lawrence Livermore National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration.

Anne Stark | EurekAlert!
Further information:
http://www.llnl.gov
