MRNet: A Multicast/Reduction Network
News and Recent Developments
December 2015 - Version 5.0.1 has been released. See below for details.
July 2015 - Version 5.0.0 has been released.
March 2014 - Version 4.1.0 has been released.
April 2012 - Version 4.0.0 has been released.
A recent paper on using MRNet on Cray XT systems:
Michael J. Brim, Luiz DeRose, Barton P. Miller, Ramya Olichandran, and Philip C. Roth,
"MRNet: A Scalable Infrastructure for the Development of Parallel Tools and Applications",
Cray User Group 2010, Edinburgh, Scotland, May 2010.
[PDF]
A recent paper on lightweight TBON infrastructure:
Emily R. Jacobson, Michael J. Brim, and Barton P. Miller,
"A Lightweight Library for Building Scalable Tools",
Para 2010: State of the Art in Scientific and Parallel Computing, Reykjavik, Iceland, June 2010.
[PDF]
A recent paper on scalable TBON reliability:
Dorian C. Arnold and Barton P. Miller,
"Scalable Failure Recovery for High-performance Data Aggregation",
International Parallel and Distributed Processing Symposium (IPDPS),
Atlanta, April 2010.
[PDF]
Overview
MRNet is a software overlay network that provides efficient multicast
and reduction communications for parallel and distributed tools and systems.
MRNet uses a tree of processes between the tool's front-end and back-ends to
improve group communication performance. These internal processes are also
used to distribute many important tool activities, reducing
data analysis time and keeping tool front-end loads manageable.
MRNet-based tool components communicate across logical channels called
streams. At MRNet internal processes, filters are bound to these streams to
synchronize and aggregate dataflows. Using filters, MRNet can efficiently
compute averages, sums, and other more complex aggregations and analyses
on tool data. MRNet also supports facilities that allow tool developers
to dynamically load new tool-specific filters into the system.
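The filter idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not use the MRNet API, and the names `reduce_up`, `tree`, and `values` are hypothetical. It shows how applying the same aggregation at each internal process yields the same result the front-end would compute on its own, while the front-end only handles one value per child.

```python
from typing import Callable, Dict, List

# Hypothetical process tree: each non-leaf process maps to its children.
# Processes that do not appear as keys are back-ends (leaves).
Tree = Dict[str, List[str]]


def reduce_up(tree: Tree, node: str, leaf_values: Dict[str, float],
              filt: Callable[[List[float]], float]) -> float:
    """Aggregate back-end values toward the front-end.

    `filt` stands in for a filter bound to a stream: it runs at every
    internal process (and at the front-end) on the values received
    from that process's children.
    """
    children = tree.get(node)
    if not children:
        # Back-end: contributes its own value to the stream.
        return leaf_values[node]
    # Internal process: collect one value per child, then aggregate.
    return filt([reduce_up(tree, c, leaf_values, filt) for c in children])


# Illustrative layout: one front-end, two internal processes, four back-ends.
tree = {"fe": ["i0", "i1"], "i0": ["b0", "b1"], "i1": ["b2", "b3"]}
values = {"b0": 1.0, "b1": 2.0, "b2": 3.0, "b3": 4.0}

total = reduce_up(tree, "fe", values, sum)
maximum = reduce_up(tree, "fe", values, max)
print(total, maximum)
```

Sums and maxima compose directly across levels. An average does not when subtrees differ in size, so a sketch like this would carry a (sum, count) pair upward and divide only at the front-end.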
Key Features:
- Flexible Organization: MRNet process tree organization is specified in a
configuration file that can specify common network layouts like k-ary and
k-nomial trees, or custom layouts tailored to the system(s) running the tool.
- Scalable, Flexible Data Aggregation: MRNet's built-in filters provide
efficient computation of averages, sums, concatenation, and other common data
reductions. Custom filters can be loaded dynamically into the network to perform
tool-specific aggregation operations.
- High-bandwidth Communication: MRNet transfers data within the
tool system using an efficient, packed binary representation. Zero-copy data
paths are used whenever possible to reduce the cost of transferring data through
internal processes.
- Scalable Multicast: MRNet supports efficient message multicast
to reduce the cost of issuing control requests from the tool front-end to its
back-ends.
- Multiple Concurrent Data Channels: MRNet supports multiple
logical streams of data between tool components. Data aggregation and message
multicast take place within the context of a data stream, and multiple
operations (both upward and downward) can be active simultaneously.
- Open Source Licensing.
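To make the k-ary layout and multicast features above more concrete, here is a minimal sketch of building a complete k-ary process tree and pushing a message from the front-end down to every back-end. It is an assumption-laden illustration, not MRNet's configuration file syntax or API: `kary_tree`, `back_ends`, and `multicast` are hypothetical helper names, and processes are identified by plain integers.

```python
from typing import Dict, List


def kary_tree(fanout: int, depth: int) -> Dict[int, List[int]]:
    """Return a parent -> children map for a complete k-ary tree.

    Node 0 plays the front-end; nodes at level `depth` are back-ends.
    """
    if fanout < 1 or depth < 0:
        raise ValueError("fanout must be >= 1 and depth must be >= 0")
    tree: Dict[int, List[int]] = {}
    level = [0]
    next_id = 1
    for _ in range(depth):
        new_level: List[int] = []
        for parent in level:
            kids = list(range(next_id, next_id + fanout))
            next_id += fanout
            tree[parent] = kids
            new_level.extend(kids)
        level = new_level
    return tree


def back_ends(tree: Dict[int, List[int]]) -> List[int]:
    """Leaves are nodes that appear as a child but never as a parent."""
    children = {c for kids in tree.values() for c in kids}
    return sorted(c for c in children if c not in tree)


def multicast(tree: Dict[int, List[int]], root: int = 0) -> Dict[int, int]:
    """Forward a message down the tree; return the hop count per node."""
    hops = {root: 0}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for child in tree.get(node, []):
            hops[child] = hops[node] + 1
            frontier.append(child)
    return hops


tree = kary_tree(fanout=4, depth=2)
leaves = back_ends(tree)
hops = multicast(tree)
print(len(leaves), len(tree[0]), max(hops.values()))
```

In this layout the front-end sends to only 4 children, yet all 16 back-ends receive the message after 2 hops. That is the trade the tree makes: a little added latency in exchange for a bounded fan-out at every process.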
Software and Manuals
MRNet Version 5.0.1, December 2015.
MRNet Version 5.0.0, July 2015.
MRNet Version 4.1.0, March 2014.
MRNet Version 4.0.0, April 2012.
MRNet Version 3.1.0, June 2011.
MRNet Version 3.0.1, December 2010.
MRNet Version 3.0, August 2010.
MRNet Version 2.1, May 2009.
MRNet Version 2.0, July 2008.
MRNet Version 1.2, March 2007.
MRNet Version 1.1, April 2005.
MRNet Version 1.0, September 2003.
Publications
UW Publications
- Benjamin Welton, Evan Samanas, and Barton P. Miller, "Mr. Scan: Extreme Scale Density-Based Clustering using a Tree-Based Network of GPGPU Nodes", Supercomputing 2013 (SC2013), Denver, CO, November 2013. [PDF]
- Michael J. Brim, Luiz DeRose, Barton P. Miller, Ramya Olichandran, and Philip C. Roth, "MRNet: A Scalable Infrastructure for the Development of Parallel Tools and Applications", Cray User Group 2010, Edinburgh, Scotland, May 2010. [PDF]
- Emily R. Jacobson, Michael J. Brim, and Barton P. Miller, "A Lightweight Library for Building Scalable Tools", Para 2010: State of the Art in Scientific and Parallel Computing, Reykjavik, Iceland, June 2010. [PDF]
- Dorian C. Arnold and Barton P. Miller, "Scalable Failure Recovery for High-performance Data Aggregation", International Parallel and Distributed Processing Symposium (IPDPS), Atlanta, April 2010. [PDF]
- Dorian C. Arnold, Gary D. Pack and Barton P. Miller, "Tree-based Overlay Networks for Scalable Applications", 11th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2006), Rhodes, Greece, April 2006. [ PDF ]
- Philip C. Roth and Barton P. Miller, "On-line Automated Performance Diagnosis on Thousands of Processes", ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'06), New York City, March 2006. [ PDF ]
- Philip C. Roth, Dorian C. Arnold, and Barton P. Miller, "Benchmarking the MRNet Distributed Tool Infrastructure: Lessons Learned", 2004 High-Performance Grid Computing Workshop, Santa Fe, New Mexico, April 2004. [ PDF ]
- Philip C. Roth, Dorian C. Arnold, and Barton P. Miller, "MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools", SC2003, Phoenix, Arizona, November 2003. [ PDF ]
- MRNet Poster. SC2004, Pittsburgh, PA, November 2004. [ PDF ]
Joint/External Publications
- Dong H. Ahn, Bronis R. de Supinski, Ignacio Laguna, Greg L. Lee, Ben Liblit, Barton P. Miller, and Martin Schulz, "Scalable Temporal Order Analysis for Large Scale Debugging", Supercomputing 2009 (SC2009), Portland, OR, November 2009. [PDF]
- Gregory L. Lee, Dong H. Ahn, Dorian C. Arnold, Bronis R. de Supinski, Matthew Legendre,
Barton P. Miller, Martin Schulz, and Ben Liblit,
"Lessons Learned at 208K: Towards Debugging Millions of Cores",
Supercomputing 2008 (SC2008), Austin, TX, November 2008.
[ PDF ]
- Dong H. Ahn, Dorian C. Arnold, Bronis R. de Supinski, Gregory Lee,
Barton P. Miller, and Martin Schulz,
"Overcoming Scalability Challenges for Tool Daemon Launching",
37th International Conference on Parallel Processing (ICPP-08),
Portland, Oregon, September 2008.
[ PDF ]
- Aroon Nataraj, Allen D. Malony, Alan Morris, Dorian C. Arnold, and Barton P. Miller,
"In Search of Sweet-Spots in Parallel Performance Monitoring", IEEE Cluster 2008,
Tsukuba, Japan, September 2008.
[ PDF ]
- Aroon Nataraj, Allen D. Malony, Alan Morris, Dorian C. Arnold, and Barton P. Miller,
"A Framework for Scalable, Parallel Performance Monitoring using TAU and MRNet",
International Workshop on Scalable Tools for High-End Computing (STHEC 2008),
Island of Kos, Greece, June 2008.
[ PDF ]
- Dorian C. Arnold, Dong H. Ahn, Bronis R. de Supinski, Gregory Lee, Barton P. Miller, and Martin Schulz, "Stack Trace Analysis for Large Scale Applications", International Parallel & Distributed Processing Symposium, Long Beach, California, March 2007. [ PDF ]
- Martin Schulz, Dong Ahn, Andrew Bernat, Bronis R. de Supinski, Steven Y. Ko, Gregory Lee, and Barry Rountree, "Scalable Dynamic Binary Instrumentation for Blue Gene/L", ACM SIGARCH Computer Architecture News 33(5), pp. 9-14, December 2005.
Contact Information:
Paradyn Project
Computer Sciences Department
University of Wisconsin
1210 West Dayton Street
Madison, WI 53706
E-mail: mrnet@cs.wisc.edu
FAX: +1 608-262-9777