Load Balancing to Achieve High-Performance Parallel and Distributed Systems

Exploitable results

ArrayTracer is a high-level, low-overhead Parallel Performance Analysis (PPA) tool that is easy to use and allows the level of trace detail (granularity) to be tuned. It is designed specifically for the development and performance analysis of large parallel applications on workstation networks and on distributed parallel machines with shared memory. By tracing data and code flow, it provides information for optimizing application code for the target system, and results from the trace can be used as input to algorithms. ArrayTracer provides programmers with many flexible trace options, including tracing of specific data during specified computation phases; tracing at elementary data object level; code tracing at higher levels; and flow control tracing with adjustable granularity. Whenever possible, ArrayTracer computes events rather than tracing them. This reduces both the run-time overhead introduced by the tracing process itself and the distortion of program behaviour that tracing causes. Traces are produced at application run-time by trace commands inserted into the application's source code. This approach, rare for PPA tools, eliminates the need to correlate trace data and to map the correlation results onto the corresponding application-level events. It also minimizes program distortion compared with trace techniques that rely on trace data from a separate system monitor. Such techniques are also more time-consuming, as they interrupt application execution to extract trace information from memory, and they may affect the control and data flow.
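As a rough illustration of trace commands placed in application source code with tunable granularity, the sketch below uses an entirely hypothetical `Tracer` class; none of these names or levels come from ArrayTracer itself, whose actual interface is not described here. The `access_range` call stands in for the idea of computing events rather than tracing them: a regular access pattern is recorded once and expanded afterwards, so nothing is logged per element while the loop runs.

```python
from dataclasses import dataclass, field

# Hypothetical granularity levels (assumption; not ArrayTracer's real options).
OFF, PHASE, ELEMENT = 0, 1, 2


@dataclass
class Tracer:
    level: int = PHASE
    events: list = field(default_factory=list)

    def phase(self, name):
        """Code-level trace command: record entry into a computation phase."""
        if self.level >= PHASE:
            self.events.append(("phase", name))

    def access(self, array_name, index):
        """Element-level trace command: record one access to a data object."""
        if self.level >= ELEMENT:
            self.events.append(("access", array_name, index))

    def access_range(self, array_name, start, stop, step=1):
        """Record a regular access pattern as a single event.

        The individual accesses can be recomputed later from
        (start, stop, step), so the loop itself is not instrumented.
        """
        if self.level >= ELEMENT:
            self.events.append(("access_range", array_name, start, stop, step))


def expand(event):
    """Post-processing step: compute the per-element events from a range event."""
    _, name, start, stop, step = event
    return [("access", name, i) for i in range(start, stop, step)]


def scale(data, factor, tracer):
    """Example application routine with trace commands inserted by hand."""
    tracer.phase("scale")
    tracer.access_range("data", 0, len(data))
    return [x * factor for x in data]


if __name__ == "__main__":
    tracer = Tracer(level=ELEMENT)
    scale([1, 2, 3], 2, tracer)
    for event in tracer.events:
        print(event)
```

Because the events are emitted from the application's own source, each one already names the application-level object it refers to, which is why no later correlation step is needed in this kind of design.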

In large, distributed computer systems, such as those used in on-line transaction processing (OLTP), every transaction is served by one of the participating processors. Load balancing is thus crucial, since imbalance greatly decreases overall system performance. CLUE, a CLUstering Environment tool, now provides automatic workload characterization and balancing. Using a series of parameters, CLUE uniquely identifies transactions with similar resource requirements. It then compresses the information without losing the essential data required for effective workload control. This has not previously been possible, which has been a barrier to using such information in dynamic routing and scheduling algorithms designed to balance network load automatically. CLUE then applies workload clustering algorithms to gather transactions into different classes with similar resource demands and referencing behaviour.
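The text does not say which clustering algorithms CLUE uses, so the following is only a minimal sketch of the general idea, using plain k-means as a stand-in. The two features per transaction (CPU milliseconds and I/O operations), the sample values and all function names are assumptions made for illustration. The resulting centroids act as a compressed description of the workload: one representative resource profile per class instead of one record per transaction.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids (deterministic)."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a class is empty
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign(points, centroids), centroids


if __name__ == "__main__":
    # Hypothetical transactions: (CPU ms, I/O operations).
    transactions = [(2, 1), (3, 1), (2, 2), (40, 30), (42, 28), (38, 33)]
    labels, centroids = kmeans(transactions, [transactions[0], transactions[3]])
    print(labels)
    print(centroids)
```

A router or scheduler could then work with the class label of an incoming transaction and the class centroid, rather than with the full per-transaction measurements.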

TPsim is designed to simulate distributed database systems and on-line transaction processing (OLTP) systems, as used in large commercial or financial networks. Various parameters that describe the target system are defined by the user, as part of a simple high-level description of the system and its workload. This description is then processed to create and configure the simulation of the run-time environment. TPsim is written in C, on top of a simulation support library and a description language parser. The description language provides a mechanism to describe performance-related characteristics of system nodes, storage devices and communication media; data file allocation across the network; and the service requirements of different classes of transaction programs and users. It uses a hierarchical interconnection model for the system, where nodes can be grouped into local area network (LAN)-based clusters, which in turn can be interconnected through a hierarchy of backbone networks. The parser uses primitives from the support library to initialize the run-time environment for the simulation and to set up the configuration of the model system supplied by the user. The simulation driver then runs user-supplied routines that emulate the system under test. The whole package offers a complete simulation environment that includes configuration of the test system, execution of testing steps and automatic collection of test data. TPsim libraries cover transaction routing, central processing unit (CPU) scheduling, input/output (I/O) scheduling, concurrency control, buffer management and logging algorithms.
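The hierarchical interconnection model can be pictured as a tree of networks, with LAN clusters as leaves and backbone networks above them. The sketch below is not TPsim's description language or its C library, neither of which is shown here; it is a hypothetical Python model with invented names, assuming a single top-level backbone, that works out which networks a message would cross between two LAN clusters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Network:
    name: str
    parent: Optional["Network"] = None  # enclosing backbone; None at the top level


def ancestors(net):
    """Return the network itself followed by each enclosing backbone, bottom-up."""
    chain = []
    while net is not None:
        chain.append(net)
        net = net.parent
    return chain


def route(src_lan, dst_lan):
    """Networks crossed from src_lan to dst_lan.

    Climb from the source to the lowest network shared by both sides,
    then descend to the destination. Assumes both LANs share one root.
    """
    up = ancestors(src_lan)
    down = ancestors(dst_lan)
    common = next(n for n in up if n in down)
    return up[: up.index(common) + 1] + list(reversed(down[: down.index(common)]))


if __name__ == "__main__":
    # Hypothetical topology: two LANs on a regional backbone, one directly on the WAN.
    wan = Network("wan")
    regional = Network("regional-a", wan)
    lan1 = Network("lan-1", regional)
    lan2 = Network("lan-2", regional)
    lan3 = Network("lan-3", wan)
    print([n.name for n in route(lan1, lan2)])
    print([n.name for n in route(lan1, lan3)])
```

In a simulator of this kind, each network on the returned path would contribute its own latency and bandwidth parameters to the cost of a remote request.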
