Table of Contents
List of Examples
MRNet is a customizable, high-throughput communication software system for parallel tools and applications with a master/slave architecture. MRNet reduces the cost of these tools' activities by incorporating a tree of processes between the tool's front-end and back-ends. MRNet uses these internal processes to distribute many important tool activities, reducing analysis time and keeping tool front-end loads manageable.
MRNet-based tools send data between front-end and back-ends on logical flows of data called streams. MRNet internal processes use filters to synchronize and aggregate data sent to the tool's front-end. Using filters to manipulate data in parallel as it passes through the network, MRNet can efficiently compute averages, sums, and other more complex aggregations on back-end data.
Several features make MRNet especially well-suited as a general facility for building scalable parallel tools:
Table of Contents
For this discussion, $MRNET_ROOT
is the location of the
top-level directory of the MRNet distribution and
$MRNET_ARCH
is a string describing the platform (OS and
architecture) as discovered by autoconf. For the installation instructions, it is
assumed that the current working directory is $MRNET_ROOT
.
MRNet has been developed to be highly portable; we expect it to run properly on all common Unix-based as well as Microsoft Windows platforms. This being said, we have successfully built and tested MRNet using GCC version 3.3 compilers on the following systems:
Our build system attempts to native system compilers where appropriate, for instance, xlc and xlC in AIX environments.
Here we list the third party tools that MRNet uses and needs for proper installation:
MRNet uses GNU autoconf to discover the platform specific configuration parameters. The script that does this auto-configuration is called configure.
UNIX>
./configure --help
shows all possible options of the command. Below, we display the MRNet-specific ones:
--with-libfldir Directory containing flex library
./configure without any options should give reasonable results, but the user may specify certain options. For example,
UNIX>
./configure CXX=g++ CXXFLAGS=-O3 --with-libfldir=/usr/local/lib
instructs the configure script to use g++ for the C++
compiler with level 3 optimization, very verbose output
and /usr/local/lib/libfl.a
as the location of the flex library.
To build the MRNet toolkit by type:
UNIX>
make
After a successful build, the following files will be present:
$MRNET_ROOT/lib/$MRNET_ARCH/libmrnet.a
: MRNet
API library$MRNET_ROOT/lib/$MRNET_ARCH/libxplat.a
: A library that exports platform dependent routines to MRNet$MRNET_ROOT/bin/$MRNET_ARCH/mrnet_commnode
: MRNet
internal communcation nodeTyping:
UNIX>
make mrnet_tests
builds the mrnet test files. In addition to those files above, you will also generate:
$MRNET_ROOT/bin/$MRNET_ARCH/mrnet_topgen
: MRNet
topology file generator$MRNET_ROOT/bin/$MRNET_ARCH/*_[FE,BE]
: MRNet test front-end and back-end programs$MRNET_ROOT/lib/$MRNET_ARCH/test_DynamicFilters.so
: Shared object used in tests of dynamic filter loading.$MRNET_ROOT/bin/$MRNET_ARCH/mrnet_tests.sh
: A shell script that runs the test programs and checks for errors in an automated fashion.
UNIX>
mrnet_tests.sh [ -l | -r <hostfile> | -a <hostfile> | -f <sharedobject>
-l
flag is used to run all tests using only topologies that create processes on the local machine. The -r
flag runs tests using remote machines specified in the file whose name immediately follows this flag. To run test both locally and remotely, use the -a
flag and specify a hostfile to use. To run the programs that test MRNet's ability to dynamically load filters, you must specify the absolute location of the shared object test_DynamicFilters.so
produced when the tests were built.
$PATH
. For testing dynamic filters, the filesystem containing the shared object must be available to all the host machines participating in the test.Table of Contents
The MRNet distribution has two main components: libmrnet.a
, a
library that is linked into a tool's front-end and back-end components, and
mrnet_commnode, a program that runs on intermediate nodes
interposed between the application front-end and back-ends. libmrnet.a
exports an API (See Chapter 5, The MRNet C++ API Reference) that enables
I/O interaction between the front-end and groups of back-ends via MRNet. The
primary purpose of mrnet_commnode is to distribute data
processing functionality across multiple computer hosts and to implement
efficient and scalable group communications. The following sub-sections describe
the lower-level components of the MRNet API in more detail.
An MRNet end-point represents a tool or application process or node. In particular, they represent the back-end processes in the system. The front-end can communicate in a unicast or multicast fashion with one or more of these end-points as described below.
MRNet uses communicators to represent groups of network end-points. Like communicators in MPI, MRNet communicators provide a handle that identifies a set of end-points for point-to-point, multicast or broadcast communications. MPI applications typically have a non-hierarchical layout of potentially identical processes. In contrast, MRNet enforces a tree-like layout of all processes, rooted at the tool front-end. Accordingly, MRNet communicators are created and managed by the front-end, and communication is only allowed between a tool's front-end and its back-ends, i.e. back-ends cannot interact with each other directly via MRNet.
A stream is a logical channel that connects the front-end to the end-points of a communicator. All tool-level communication via MRNet must use these streams. Streams carry data packets downstream, from the front-end toward the back-ends, and upstream, from the back-ends toward the front-end. Upward streams are expected to carry data of a specific type allowing data aggregation operations to be associated with a stream. The type is specified using a format string (See Appendix A, MRNET Format Strings) similar to those used in C formatted I/O primitives, e.g. a packet whose data is described by the format string "%d %d %f %s" contains two integers followed by a float then a character string. MRNet expands the standard specification to allow for specifiers that describe arrays of integers and floats.
Data Aggregation is the process of merging multiple input data packets and transforming them into one or more output packets. Though it is not necessary for the aggregation to result in less or even different data, aggregations that reduce or modify data values are most common. MRNet uses data filters to aggregate data packets. Filters specify an operation to perform and the type of the data expected on the bound stream. Filter instances are bound to a stream at stream creation. MRNet uses two types of filters: synchronization filters and transformation filters. Synchronization filters organize data packets from downstream nodes into synchronized waves of data packets, while transformation filters operate on the synchronized data packets yielding one or more output packets.
Filters operate on data flowing upstream in the network. Synchronization filters receive packets one at a time and do not output any packets until the specified synchronization criteria has occurred. Transformation filters input the group of synchronized packets, perform some type of data transformation on the data contained in the packets and output one or more packets. A distinction between synchronization and transformation filters is that synchronization filters are independent of the packet data type, but transformation filters operate on packets of a specific type. Synchronization filters provide a mechanism to deal with the asynchronous rrival of packets from children nodes; the synchronizer collects packets and typically aligns them into waves, passing an entire wave onward at the same time. Therefore, synchronization filters do no data transformation and can operate on packets in a type-independent fashion. MRNet currently supports three synchronization modes:
Transformation filters combine data from multiple packets by performing an aggregation that yields one or more new data packets. Since transformation filters are expected to perform computational operations on data packets, there is a type requirement for the data packets to be passed to this type of filter: the data format string of the stream's packets and the filter must be the same. Transformation operations must be synchronous, but can carry state from one transformation to the next using static storage structures. MRNet provides several transformation filters that should be of general use:
the section called “Creating and Using MRNet Filter Shared Object Files” describes facilities a tool developer may use to add new filters to the provided set.
Table of Contents
A complete description of the MRNet API is in Chapter 5, The MRNet C++ API Reference. This section offers a brief overview only. Using
libmrnet.a
, a tool can leverage a system of internal
processes, instances of the mrnet_commnode
program, as a
communication substrate. After instantiation of the MRNet network (discussed in
the section called “MRNet Instantiation”, the front-end and back-end processes
are connected by the internal processes. The connection topology and host
assignment of these processes is determined by a configuration file, thus the
geometry of MRNet's process tree can be customized to suit the physical topology
of the underlying hardware resources. While MRNet can generate a variety of
standard topologies, users can easily specify their own topologies; see Chapter 6, MRNET Process-tree Topologies for further discussion.
The MRNet API, provided by libmrnet, contains network, end-point, communicator, and stream objects that a tool's front-end and back-end use for communication. The network object is used to instantiate the MRNet network and access end-point objects that represent available tool back-ends. The communicator object is a container for groups of end-points, and streams are used to send data to the end-points in a communicator.
Example 4.1. MRNet Front-end Sample Code
front_end_main(){ 1. Network * net; 2. Communicator * comm; 3. Stream * stream; 4. Packet *packet; 5. float result; 6. net = new Network(config_file, backend_exe, backend_argv); 7. comm = net->get_broadcast_communicator( ); 8. stream = net->new_Stream(comm, FMAX_FIL); 9. stream->send(tag, "%d", FLOAT_MAX_INIT); 10. stream->recv(&tag, &Packet) 11. Network::unpack(Packet, "%f", result); }
A simplified version of code from an example tool front-end is shown in Example 4.1, “MRNet Front-end Sample Code”. In the front-end code, after some variable definitions in lines 1-5, in line 6, an instance of the MRNet network is created using the topology specification in config_file. In line 7, the newly created network object is queried for an auto-generated broadcast communicator that contains all available end-points. In line 8, this communicator is used to established a stream for which the MRNet internal processes will use a filter that finds the maximum floating point data value of the data sent upstream. The front-end then might send one or more initialization messages to the backends; in our example code on line 9, we broadcast an integer initializer; the tag parameter is meant to be an application specific value specifying the nature of the message being transmitted. After the send operation, the front-end performs a blocking recv. This call returns a tag and a packet. Finally, line 11 calls unpack to deserialize the floating point value result from the packet data structure.
Example 4.2. MRNet Back-end Sample Code
back_end_main(){ 1. Stream * stream; 2. Packet * packet; 3. int val, tag; 4. Network * net = new Network( ); 5. net->recv(&tag, &packet, &stream); 6. Network::unpack(packet, "%d", &val ); 7. if( val == FLOAT_MAX_INIT ){ 8. stream->send("%f", rand_float); } }
Example 4.2, “MRNet Back-end Sample Code” shows the code for the back-end that
reciprocates the actions of the front-end. Each tool back-end
first connects to the appropriate internal process, via the backend's version of the network constructor, Network::Network()
in line 4. While the front-end makes a stream-specific recv call, the back-ends
make a stream-anonymous recv that returns the tag sent by the front-end, the packet containing the actual data sent, and a stream object representing the stream that the front-end has established. Finally, each back-end sends a scalar floating point value
upstream toward the front-end.
A complete example of MRNet code can be found in Appendix D, A Complete MRNet Example.
While conceptually simple, creating and connecting the internal processes is complicated by interactions with the various job scheduling systems. In the simplest environments, we can launch jobs manually using facilities like rsh or ssh. In more complex environments, it is necessary to submit all requests to a job management system. In this case, we are constrained by the operations provided by the job manager (and these vary from system to system). We currently support two modes of instantiating MRNet-based tools.
In the first mode of process instantiation, MRNet creates the internal and back-end processes, using the specified MRNet topology configuration to determine the hosts on which the components should be located. First, the front-end consults the configuration and uses rsh or ssh to create internal processes for the first level of the communication tree on the appropriate hosts. Upon instantiation, the newly created processes establish a network connection to the process that created it. The first activity on this connection is a message from parent to child containing the portion of the configuration relevant to that child. The child then uses this information to begin instantiation of the sub-tree rooted at that child. When a sub-tree has been established, the root of that sub-tree sends a report to its parent containing the end-points accessible via that sub-tree. Each internal node establishes its children processes and their respective connections sequentially. However, since the various processes are expected to run on different compute nodes, sub-trees in different branches of the network are created in concurrently, maximizing the efficiency of network instantiation.
In the second mode of process instantiation, MRNet relies on a process management system to create some or all of the MRNet processes. This mode accommodates tools that require their back-ends to create, monitor, and control the application processes. For example, IBM's POE uses environment variables to pass information, such as the process' rank within the application's global MPI communicator, to the MPI run-time library in each application process. In cases like this, MRNet cannot provide back-end processes with the environment necessary to start MPI application processes. As a result, MRNet creates its internal processes recursively as in the first instantiation mode, but does not instantiates any back-end processes. MRNet then starts the tool back-ends using the process management system to ensure they have the environment needed to create application processes successfully. When starting the back-ends, MRNet must provide them with the information needed to connect to the MRNet internal process tree, such as the leaf processes' host names and connection port numbers. This information is provided via the environment, using shared filesystems or other information services as available on the target system.
Table of Contents
All classes are included in the MRN
namespace.
For this discussion, we do not explicitly include reference to the
namespace; for example, when we reference the class Network
, we are
implying the class MRN::Network
.
In MRNet, there are five top-level classes: Network
,
EndPoint
, Communicator
,
Stream
, and Event
.
The Network class contains primarily
static methods that allow one to instantiate, and destroy MRNet
process trees and to query instantiated trees for information.
Application back-ends are referred to as end-points and are
encapsulated by objects of type EndPoint. The Communicator
class is used to reference a group of EndPoints and can be used
to establish MRNet Streams for unicast, multicast or broadcast
communications via the MRNet infrastructure. The Event class represents
the interface used by MRNet to report errors and other events of interest
to the user level. (This feature is in development mode.)
The public members of these classes are detailed below.
void Network::Network( | iconfig_filename, | |
ibackend_exe, | ||
iargv) ; |
const char * | iconfig_filename; |
const char * | ibackend_exe; |
const char ** | iargv; |
Network::Network()
is a constructor method that is used to instantiate the MRNet process tree.iconfig_filename
is the path to a configuration file that describes the desired process tree topology.ibackend_exe
is the path to the executable to be used for the application's back-end processes. Finally,iargv
is a null terminated list of arguments to pass to the back-end application upon creation.When this function returns without error, all MRNet internal processes and the application back-end processes will have been instantiated using rsh or ssh depending on the setting of the environment variable
MRNET_RSH
. Error conditions may be determined via the functionsNetwork::fail()
.Note
If it is necessary to run the rsh with a utility like runauth to non-interactively authenticate the unattended remote process, that command may be specified using theMRNET_RUNAUTH
environment variable.
void Network::~Network( | ) ; |
Network::~Network
is used to tear down the
MRNet process tree. When this function is called, each node in the
MRNet configuration sends a control message to its immediate children
informing them of the "delete network" request. After delivering this
message, the process itself terminates. Note: if the application back-ends
have not already terminated, invoking this method will cause them to
terminate.
int Network::recv( | otag, | |
opacket, | ||
ostream, | ||
iblocking = true) ; |
int * | otag; |
Packet * * | opacket; |
Stream * * | ostream; |
bool | iblocking = true; |
This function is used to invoke a stream-anonymous receive operation. Any packet available (addressed to any stream) will be returned (in roughly FIFO ordering) via the output parameters passed in.otag
will be filled in with the integer tag value that was passed by the correspondingStream::send()
operation.opacket
is the packet that was received and must be passed to theStream::unpack
described below for deserialization of the data contents. A pointer to the stream to which the packet was addressed will be returned inostream
.iblocking
is used signal whether this call should block or return if data is not immediately available; it defaults to a blocking call. A return value of -1 indicates an error, 0 indicates no packets were available, and 1 indicates success.
void Network::print_error( | error_msg) ; |
const char * | error_msg; |
Network::print_error()
prints a message to stderr describing the last error encountered during a MRNet library call. It first prints the null-terminated string,error_msg
followed by a colon then the actual error message followed by a newline.
bool Network::fail( | ) ; |
Network::fail()
returns true if the network has experienced
a failure condition and false otherwise.
bool Network::good( | ) ; |
Network::good()
returns false if the network has experienced
a failure condition and true otherwise.
void Network::get_SocketFd( | ofd_array, | |
ofd_array_size) ; |
int ** | ofd_array; |
int * | ofd_array_size; |
Network::get_SocketFd()
is used to notify an application of all the file descriptors MRNet is using for network communication. This version of the function is expected to be called from front-end applications and returns an array of sizeofd_array_size
entries in the output arrayofd_array
.
int Network::get_SocketFd( | ) ; |
Network::get_SocketFd()
is used to notify an application of
all the file descriptors MRNet is using for network communication. This version
of the function is expected to be called from the back-end applications and
returns the single filedescriptor MRNet uses to connect the back-end to its
upstream parent.
Instances of communicators are network specific, so their creation methods are functions of an instantiated MRNet network object.
Communicator * Network::new_ Communicator( | ) ; |
This function returns a pointer to a new Communicator object. The object
initially contains no endpoints. Use
Communicator::add_EndPoint( )
to populate the communicator.
Communicator * Network::new_ Communicator( | icomm) ; |
Communicator & | icomm; |
This function returns a pointer to a new Communicator object that initially
contains the set of endpoints contained in icomm
.
Communicator * Network::new_ Communicator( | iendpoints) ; |
std::vector <EndPoint *> & | iendpoints; |
This function returns a pointer to a new Communicator object that initially
contains the set of endpoints contained in iendpoints
.
Communicator * Network::get_ BroadcastCommunicator( | ) ; |
This function returns a pointer to a default communicator containing all the endpoints available in the system. Multiple calls to this function return the same pointer to the broadcast communicator object created at network instantiation. This object should not be deleted.
int Communicator::add_EndPoint( | hostname, | |
port) ; |
const char * | hostname; |
unsigned short | port; |
This function is used to add a new EndPoint object to the set contained by the communicator. The original set of endpoints contained by the communicator is tested to see if it already contains the potentially new endpoint. If so, the function silently returns successfully. This function fails if there exists no endpoint defined byhostname
:port
. This function returns 0 on success, -1 on failure.
int Communicator::add_EndPoint( | endpoint) ; |
EndPoint & | endpoint; |
This function is similar to the add_EndPoint()
above except that it takes an explicit EndPoint object instead of
hostname and port parameters. Success and failure conditions are
exactly as stated above. This function also returns 0 on success and
-1 on failure.
unsigned int Communicator::size( | ) ; |
This function returns the number of endpoints contained in the communicator.
const char * Communicator::get_HostName( | idx) ; |
unsigned int | idx; |
This function returns a character string identifying the hostname of the endpoint at positionidx
in the set contained by the communicator. A return value of NULL signals thatidx>
is out of range.
unsigned short Communicator::get_Port( | idx) ; |
unsigned int | idx; |
This function returns an unsigned short identifying the connection port of the endpoint at positionidx
in the set contained by the communicator. A return value of NULL signals thatidx>
is out of range.
unsigned int Communicator::get_Id( | idx) ; |
unsigned int | idx; |
This function returns an unsigned int that is used by MRNet to uniquely identify the endpoint at positionidx
in the set contained by the communicator. A return value of NULL signals thatidx>
is out of range.
Instances of streams are network specific, so their creation methods are functions of an instantiated MRNet network object.
Stream * Network::new_Stream( | comm, | |
iupstream_tfilter_id = TFILTER_NULL, | ||
iupstream_sfilter_id = SFILTER_WAITFORALL, | ||
idownstream_tfilter_id = TFILTER_NULL) ; |
Communicator * | comm; |
int | iupstream_tfilter_id = TFILTER_NULL; |
int | iupstream_sfilter_id = SFILTER_WAITFORALL; |
int | idownstream_tfilter_id = TFILTER_NULL; |
Network::new_Stream()
creates a MRNet stream object attached to the endpoints specified by thecomm
argument. The second argumentiupstream_tfilter_id
specifies the transformation filter to apply to data flowing upstream from the application back-ends toward the front-end; the default value is the "Null Filter".iupstream_sfilter_id
specifies the synchronization filter to apply to upstream packets; the default value is the "Wait-for-all".idownstream_tfilter_id
allows the user to specify a filter to apply to downstream data flows; the default value is the "Null Filter". For a complete discussion of MRNet Filters, see the section called “Filters”
Stream * Network::get_Stream( | iid) ; |
int | iid; |
Network::get_Stream()
returns the stream identified byiid
or -1 on failure.
int Stream::send( | tag, | |
format_str, | ||
...) ; |
int | tag; |
const char * | format_str; |
This function invokes a data output operation on the calling stream.tag
is an integer identifier that is expected to classify the data in the packet to be transmitted across the stream.format_str
is a format string describing the data in the packet (See Appendix A, MRNET Format Strings for a full description.) On success,Stream::send()
returns 0; on failure, -1.
int Stream::recv( | otag, | |
opacket, | ||
iblocking = true) ; |
int * | otag; |
Packet * * | opacket; |
bool | iblocking = true; |
Stream::recv()
invokes a stream-specific receive operation. Packets addressed to the calling stream will be returned in strictly FIFO ordering via the output parameters passed in.otag
will be filled in with the integer tag value that was passed by the correspondingStream::send()
operation.opacket
is the recieved Packet that must be passed to theStream::unpack
for deserialization of its data elements.iblocking
determines whether the recv call should block or return if data is not immediately available; it defaults to a blocking call. A return value of -1 indicates an error, 0 indicates no packets were available, and 1 indicates success.
static int Stream::unpack( | ipacket, | |
iformat_str, | ||
...) ; |
Packet * | ipacket; |
const char * | iformat_str; |
This function operates similarily to C'ssscanf
. It takes a packet,ipacket
, that was returned by a previous call toStream::recv()
.iformat_str
is a format string describing the datatypes expected in the packet returned byStream::recv()
(See Appendix A, MRNET Format Strings for a full description.) On success,Stream::unpack()
returns 0; on failure, -1.
int Stream::flush( | ) ; |
This function commits a flush of all packets currently buffered by the stream pending an output operation. A successful return indicates that all packets on the calling stream have been passed to the operating system kernel for network transmission.
int Network::load_FilterFunc( | iso_file, | |
ifunc, | ||
iis_transformation_filter=true) ; |
const char * | iso_file; |
const char * | ifunc; |
bool | iis_transformation_filter=true; |
This method, used for loading new filter operations into the network
is conveniently similar to the conventional dlopen()
facilities for opening a shared object and dynamically
loading symbols defined within.
iso_file
is the path to a shared object file that contains
the filter function to be loaded and ifunc
is the name of the
function to be loaded. The last parameter iis_transformation_filter
defaults to true and can usually be omitted since the common case
is to load transformation, not synchronization, filters. Additionally,
the shared object file must contain a const char * symbol named
by the string formed by the concatenation of the filter function name and
the suffix "_format_string". For instance, if the filter function is named
my_filter_func
, the shared object must define a symbol
const char *my_filter_func_format_string
.
The value of this string will be the MRNet format string describing the
format of data that the filter can operate on. A value of ""
denotes that the filter can operate on data of arbitrary type.
On success, Network::load_FilterFunc()
returns the id
of the newly loaded filter which may be used in subsequent calls to
Network::new_Stream()
. A value of -1 is returned
on failure.
To properly demonstrate MRNet filters, we must discuss MRNet's packet
abstraction in more detail. A packet encapsulates a chunk of formatted
data, usually created as the result of a Stream::send()
call. Let's say a packet was created using the format string
"%s %d"
describing a null-terminated string, followed
by a 32-bit integer, the packet is said to contain 2 data elements, of
those types respectively. In the packet class, the operator[]
is overloaded so that packet[i]
conveniently returns the
ith data element in the packet. Furthermore,
the data element abstraction contains accessor functions to return proper
data values. In the above example, packet[0].get_string() returns the value
of the string in the packet and packet[1].get_int32_t() returns the value of
the integer that represents the second data element. It is the filter
developers responsibility to properly define the format of the data
expected by and determine how to access respective elements.
Appendix B, MRNET Data Elements contains the full listings of data types
and accessor functions available to MRNet filter functions.
A filter function has the following signature
void filter_name( | ipackets_in, | |
ipackets_out, | ||
ilocal_storage) ; |
std::vector<Packet> & | ipackets_in; |
std::vector<Packet> & | ipackets_out; |
void ** | ilocal_storage; |
ipackets_in
is the reference to a vector of packets serving
as input to the filter function. ipackets_out
is the
reference to a vector into which output packets should be placed.
ilocal_storage
may be used to define and maintain
filter-instance specific state.
For each filter function defined in a shared object file, there must be a
const char * symbol named
by the string formed by the concatenation of the filter function name and
the suffix "_format_string". For instance, if the filter function is named
my_filter_func
, the shared object must define a symbol
const char *my_filter_func_format_string
.
The value of this string will be the MRNet format string describing the
format of data that the filter can operate on. A value of ""
denotes that the filter can operate on data of arbitrary value.
This topic currently pertains to usage with the GNU C++ compiler only. We will update the topic to discuss using other compilers as well.
Since we use the C facility dlopen()
to dynamically
load new filter functions, all symbols to be exported this way must be
"extern C'd". That is, the symbol definitions must fall with the statements
extern "C"{
and
}
The file that contains the filter functions and format strings may be
compiled with the GNU compiler options "-fPIC -shared -rdynamic"
to produce a valid shared object.
A front-end that will dynamically load filters must be built
with the GNU compiler options "-Wl,-E"
to notify the linker
export global symbols externally.
Currently, MRNet has primitive support for asynchronous error notification based on events. Whenever errors, failures, or other significant events take place at internal or back-end nodes, an event is propagated toward the front-end. This event is placed in a queue which the user can access to determine the systems state. Eventually, MRNet will migrate to error semantics this will give the user more immediate and responsive error notification, including the use of callbacks into the user's space. For now, we describe the portion of the event infrastructure relevant to the user.
static bool Event::have_Event( | ) ; |
Event::have_Event
checks the MRNet event queue returning
true if new events exist and false, otherwise.
static Event * Event::get_NextEvent( | ) ; |
Event::get_NextEvent
returns the next event in the
MRNet event queue. A pointer to the event is returned if the queue is
non-empty, otherwise, NULL is returned. This function removes the event
it returns from the MRNet event queue.
EventType Event::get_Type( | ) ; |
Event::get_Type
returns the type of the event.
Appendix C, MRNET Event Types defines all the valid event types.
const std::string &Event::get_HostName( | ) ; |
Event::get_HostName
returns the name of the host
on which the event occured.
unsigned short Event::get_Port( | ) ; |
Event::get_Port
returns the MRNet process identifier
of the process in which the event occured.
const std::string & Event::get_Description( | ) ; |
Event::get_Description
returns a string containing
a human-readable description of the event.
Table of Contents
MRNet allows a tool to specify a node allocation and process connectivity tailored to its computation and communication requirements and to the system where the tool will run. Choosing an appropriate MRNet configuration can be difficult due to the complexity of the tool's own activity and its interaction with the system. This section describes how users may define their own process topologies, and the mrnet_topgen tool provided by MRNet to facilitate the process.
The first parameter to the Network::new_Network()
function is the name of an MRNet topology file. This file defines the
topological layout of the front-end, internal nodes, and back-end MRNet and
tool processes. In the syntax of the topology file, the
hostname:id
tuple represents a process with MRNet id
id
running on hostname
. It is important
to note that the id
is of symbolic value only and does not
reflect a port or process number associated with the system. A line in the
topology file is always of the form:
hostname1:0 => hostname1:1 hostname1:2 ;
meaning process on hostname1
with MRNet id, 0,
has two children, with MRNet ids, 1 and 2, respectively, running on the
same host. MRNet will parse the topology file without error if the file
properly defines a tree, in the mathematical sense (i.e. a tree must have
a single root, no cycles, full connection, and no node can be its own
descendant).
When the MRNet test programs are built, a topology generator program,
$MRNET_ROOT/bin/$MRNET_ARCH/mrnet_topgen
, will also
be created. The usage of this program is:
mrnet_topgen <infile> <outfile> <befile> <num_backends> <fan-out>
infile
is a machine file containing a set of
MRNet host/process identifiers in the format, hostname:id
,
described above. outfile
is the name of the MRNet topology
file to be created by the generator. befile
must be
specified, but can be ignored. Finally, num_backends
and
fan-out
define the number of backends and the fan-out to
be used at the front-end and all internal nodes, respectively.
The specified input machine file must contain enough unique host/process
tuples to support the entire process tree. Currently, this program
can only build completely balanced trees with the same fan-out at each
parent node.
Following the % character introducing a conversion there may be a number of flag characters. u, h, l, and a are special modifiers meaning unsigned, short, long and array, respectivley. The full set of conversions are:
c | Matches a signed 8-bit character |
uc | Matches an unsigned 8-bit character |
ac | Matches an array of signed 8-bit characters |
auc | Matches an array of unsigned 8-bit characters |
hd | Matches a signed 16-bit decimal integer |
uhd | Matches an unsigned 16-bit decimal integer |
ahd | Matches an array of signed 16-bit decimal integers |
auhd | Matches an array of unsigned 16-bit decimal integers |
d | Matches a signed 32-bit decimal integer |
ud | Matches an unsigned 32-bit decimal integer |
ad | Matches an array of signed 32-bit decimal integers |
aud | Matches an array of unsigned 32-bit decimal integers |
ld | Matches a signed 64-bit decimal integer |
uld | Matches an unsigned 64-bit decimal integer |
ald | Matches an array of signed 64-bit decimal integers |
auld | Matches an array of unsigned 64-bit decimal integers |
f | Matches a 32-bit floating-point number |
af | Matches an array of 32-bit floating-point numbers |
lf | Matches a 64-bit floating-point number |
alf | Matches an array of 64-bit floating-point numbers |
s | Matches a null-terminated character string. |
as | Matches an array of null-terminated character strings. |
Data Type | Read Accessor Function | Write Accessor Function |
CHAR_T | char get_char( ) | void set_char( char ) |
UCHAR_T | unsigned char get_char( ) | void set_char( unsigned char ) |
INT16_T | int16_t get_int16_t( ) | void set_int16_t( int16_t ) |
UINT16_T | uint16_t get_uint16_t( ) | void set_uint16_t( uint16_t ) |
INT32_T | int32_t get_int32_t( ) | void set_int32_t( int32_t ) |
UINT32_T | uint32_t get_uint32_t( ) | void set_uint32_t( uint32_t ) |
INT64_T | int64_t get_int64_t( ) | void set_int64_t( int64_t ) |
UINT64_T | uint64_t get_uint64_t( ) | void set_uint64_t( uint64_t ) |
FLOAT_T | float get_float( ) | void set_float( float ) |
DOUBLE_T | double get_double( ) | void set_double( double ) |
STRING_T | char * get_string( ) | void set_string( char * ) |
*_ARRAY_T | void * get_array( DataType *, uint32_t * ) | void set_array( void *, DataType, uint32_t ) |
EBADCONFIG | Configuration/Initialization Error |
ESYSTEM | Failed System/Library Call |
EPACKING | Failure while packing/unpacking Packet data |
EFMTSTR | Format string mismatch Error |
EPROTOCOL | Internal Protocol Error |
UNKNOWN_EVENT | Unknown Event |
Example D.1. A Complete MRNet Front-End
#include "mrnet/MRNet.h" using namespace MRN; int main( int argc, char **argv ) { const char * dummy_argv=NULL; int num_packets=10; unsigned int num_packets_to_recv=0; if( argc != 3 ){ fprintf(stderr, "Usage: %s <topology file> <backend_exe>\n", argv[0]); exit(-1); } set_OutputLevel(1); Network * net = new Network( argv[1], argv[2], &dummy_argv ); if( net->fail() ){ net->error_str(argv[0]); exit(-1); } MRN::Stream * stream = net->new_Stream( net->get_BroadcastCommunicator(), MRN::TFILTER_NULL, MRN::SFILTER_DONTWAIT ); stream->send( 7777, "%d", num_packets ); stream->flush(); //each backend will send num_packets packets ... num_packets_to_recv = net->get_BroadcastCommunicator()->size() * num_packets; while( num_packets_to_recv > 0 ){ int retval, tag, recv_val; Packet *packet; //blocking receive ... if( stream->recv(&tag, &packet, true) == -1 ) return -1; if( Stream::unpack( packet, "%d", &recv_val ) == -1 ) return -1; num_packets_to_recv--; } return 0; }
Example D.2. A Complete MRNet Back-End
#include "mrnet/MRNet.h" using namespace MRN; int main( int argc, char** argv ){ Stream * stream; Packet * packet=NULL; int tag; int num_packets, sleep_time; set_OutputLevel(1); if( argc != 4 ){ fprintf(stderr, "usage: %s parent_hostname parent_port my_rank\n", argv[0]); exit( -1 ); } const char* parHostname = argv[argc-3]; Port parPort = (Port)strtoul( argv[argc-2], NULL, 10 ); Rank myRank = (Rank)strtoul( argv[argc-1], NULL, 10 ); Network * net = new Network( parHostname, parPort, myRank ); if( net->fail() ) return -1; if( net->recv( &tag, &packet, &stream ) != 1) return -1; assert( tag == 7777 ); Stream::unpack( packet, "%d", &num_packets ); for( unsigned int i=0; i<num_packets; i++ ){ if( stream->send( tag, "%d", rand() ) == -1 ) return -1; if( stream->flush( ) == -1 ) return -1; } return 0; }
Example D.3. An MRNet Topology File
nutmeg:0 => c01:0 c02:0 c03:0 c04:0 ; c03:0 => c05:0 ; c04:0 => c06:0 c07:0 c08:0 c09:0 ; c08:0 => c10:0 ; c09:0 => c11:0 ; # nutmeg # | # | # ------- # /| |\ # / | | \ # / | | \ # / | | \ # c01 c02 c03 c04 # | | # c05 | # ------- # / | | \ # / | | \ # / | | \ # c06 c07 c08 c09 # | | # c10 c11
Example D.4. An MRNet Filter: Integer Maximum
//Must Declare the format of data expected by the filter const char * tfilter_int_max_format_str="%d"; void tfilter_int_max( const std::vector < Packet > &packets_in, std::vector < Packet > &packets_out ) { // initialize result to the value in first packet int result = packets_in[0][0].get_int32_t(); for( unsigned int i = 1; i < packets_in.size( ); i++ ) { Packet cur_packet = packets_in[i]; // 1st and only data element in packet is an int ("%d") result = max( result, cur_packet[0].get_int32_t() ); } Packet new_packet( packets_in[0].get_StreamId( ), packets_in[0].get_Tag( ), "%d", result ); packets_out.push_back( new_packet ); }