University of Arkansas at Little Rock
Operating Systems
CPSC 7321
Dr. Coskun Bayrak
Final Project
By
Frank McCown
November 20, 2001
The objective of this project was the development of a distributed system which could take a given problem and solve it by dividing it into smaller tasks which would then be solved by workers in parallel[1]. In order to accomplish this, the system was divided into three roles: the Task Master process, the Worker processes, and the Task Bag object. The Task Master process is responsible for the problem definition. It places sub-tasks into a “Task Bag” where autonomous Worker processes then pull tasks out of the Task Bag, complete the tasks, and return the answers to the Task Bag. The Task Master pulls solutions from the Task Bag and assembles the final result.
Any problem that can be divided for parallel implementation would suffice for this system. Examples include searching for a text string in multiple files, finding prime numbers, matrix multiplication, and computing fractal images. Matrix multiplication was chosen for the implementation of this project.
There were two remote method invocation architectures that were available for implementing this project: Java RMI and CORBA. Java RMI is ideal when all processes in the distributed system are implemented with only Java[2]. When another programming language is needed in the system, RMI is not possible. CORBA allows distributed systems to be built using other high level languages like C++ as long as there is an IDL mapping. Because of this, CORBA was chosen to implement the system. Although the Task Master and Workers are implemented with Java, other users of the system may implement Task Masters or Workers in other languages as well.
The CORBA architecture allows processes to move physical locations in very a transparent way. The Task Master and the Worker processes may reside on the same computer or on separate computers as long as the computers are connected over a network and the CORBA ORB is accessible to all processes. There is no code changes or configuration parameters needed to change the physical location of the processes.
There are several free implementations of CORBA ORBs available. The ORBacus ORB by Iona Technologies was chosen since it contained an IDL-to-Java converter and ample amounts of documentation and support if needed[3]. The ORBacus ORB also contains an Event Service which could be used instead of callbacks for event communication between the Task Master and Workers.
The Task Bag concept originated from the Linda system at Yale University where it was implemented as distributed shared memory[4] [5]. This project moves the Task Bag from distributed shared memory to a remote object. All Workers have equal access to the Task Bag object as they would if it were used as distributed memory.
The Common Object Request Broker Architecture (CORBA) is a middleware technology developed by the Object Management Group (OMG), a not-profit consortium of over 800 companies that develops open specifications for interoperable enterprise applications[6]. CORBA works very similar to Java RMI except the client and server may be implemented in any programming language for which there’s an IDL mapping. IDL (interface definition language) is a neutral declarative language that allows a programmer to specify the interface for a remote object. The IDL is mapped into the specific programming language used to implement the object. This mapping is called the skeleton. The client uses a stub, the mapping of IDL into the programming language used by the client.
Clients and servers communicate using an Object Request Broker (ORB). The ORB uses the Internet InterORB Protocol (IIOP) for communication. Actually many vendors of ORBs use proprietary protocols for communication, although to be CORBA compliant they must also use IIOP. This allows ORBs from different vendors to be compatible.
The Task Bag distributed system executes using a master/slave paradigm. The master process (Task Master) breaks the master tasks into equal, smaller sub-tasks and then allows the slaves (Workers) to solve the sub-tasks autonomously. The Task Bag is the repository for all work from the Task Master and all answers from the Workers. This separation hides the Task Master from directly working with the Workers. The Task Master behaves the same whether there is 1, 10, or 1000 Workers working on the sub-tasks.
Matrix multiplication was chosen for the implementation of this project since the Workers can do matrix multiplication operations in parallel. In order to compute the result for row R and column C, the Worker only needs to know the R row values from the first matrix and the C column values from the second matrix. The implemented system allows the user to enter integer values in two 4 by 4 matrices. The resulting matrix will also be 4 by 4. If the intention of the project had been to implement a useful matrix calculator instead of demonstrating the concepts of parallel computation, the system would be more useful to not put any restraints on the number of rows in columns in the source matrices.
The following examines the role of the Task Bag, Task Master, and Worker in more detail.
The Task Bag is an object that serves as a repository for Pairs. A Pair may contain a task description or data needed for performing a task. A Pair consists of two strings: a key and value. The key is any unique identifier for a value, for example a task name or number. The value is the actual task description or data.
There are three operations which may be performed on the Task Bag:
The following three methods can be used for performing these operations:
The Task Master process provides a GUI for entering values of two 4 by 4 matrices. The GUI also allows the user to see the results of the Workers as they finish their tasks. Each Worker is shown in a table along with their status (busy or not) and the problem they are working on. A result matrix is also visible which dynamically displays calculated results as they are returned by the Workers. The user may start and stop the computations and clear the result matrix using a Control Panel with 3 command buttons.
Figure 1 shows the Task Master application interface.
As Workers become available, their name is added to the Workers table. When Worker are busy solving a task, their status changes to “BUSY” and the task is displayed. As each Worker completes its task, its results are shown in the Results Matrix.
The Worker processes may be started on the same computer as the Task Master or on separate computers. There are no limits on the number of Workers that can work on tasks from the Task Bag. Each Worker will extract data from the Task Bag indicating the work that should be done and data from the Task Bag needed to do the computations. The result will also be sent back to the Task Bag. The Task Bag notifies workers when there are more tasks available. This requires the Workers to register themselves with the Task Bag when they startup and deregister when they shutdown.
Each Worker identifies itself using a unique name. This name is obtained by default using the computer’s host name. The user may also specify the name manually (useful when 2 workers are executing on the same computer) using the –n option when starting up a Worker.
When a Worker is busy with a task it will indicate the task name in its window. When it is not busy working a task, it will display “Not working”. Figure 2 shows the Worker application window.
There are three types of Pairs that will be used within the distributed system to communicate tasks, data, and results. They are Task Pairs, Data Pairs, and Result Pairs. Task Pairs indicate the next task to be done. Data Pairs indicate matrix data needed by the Workers to perform their operations, and Result Pairs are used to indicate the calculated results.
In order for the Worker to begin its work, there must be a Task Pair available in the Task Bag. The Task Master is responsible for placing the first Task Pair in the Task Bag. When the Task Bag receives a Task Pair, it will immediately send a notification to all free Workers that a new task is available. The first free Worker to respond will remove the Task Pair and replace it with the next Task Pair indicating the next task that needs to be done. If another free Worker also requests the Task Pair before the first Worker replaces it, it will simply fail to read the Task Pair and wait for another Task Pair notification from the Task Bag. In this way there will never be more than one Task Pair in the Task Bag. When there are no more tasks to be done (all matrix calculations have been performed), the last Worker will not replace the Task Pair. Since there will be no more Task Pairs, there will be no more work for Workers, and all processing will cease.
The key for every Task Pair will be “Next Task”. For matrix multiplication, the Task Pair value will consist of a value indicating the row and column of the result matrix needing to be computed. There is no need to enforce an ordering of these Task Pairs, so the implemented system places the tasks in from left to right, top to bottom.
Here is an example of typical pairIn/pairOut calls:
pairOut(“Next Task”, “1 1”) // call by Task Master to start computing row 1, column 1 of result matrix
task = pairIn(“Next Task”) // call by some free Worker
pairOut(“Next Task”, “1 2”) // call by same Worker to replace Task Pair
task = pairIn(“Next Task”) // call by another free Worker
pairOut(“Next Task”, “1 3”) // call by same Worker to replace Task Pair
…
task = pairIn(“Next Task”) // call by another Worker. If task received is “4 4”, it will not replace Task Pair.
In order for a Worker to perform a task, it needs to read the row and column information from the Task Bag. For example, if the task “1 3” is retrieved from the Task Bag, the Worker needs to know the values in row 1 from matrix A and the values from column 3 in matrix B. This entails the use of Data Pairs.
Initially the Task Master will supply all Data Pairs needed for the Workers to the Task Bag. Each Data Pair indicates the row from matrix A or column from matrix B and the values in that row or column. For example, suppose the user supplied the following matrices:
Matrix A = Matrix B =
The Task Master would supply the data like so:
pairOut(“A1”, “1 2 3 4”)
pairOut(“A2”, “5 6 7 8”)
…
pairOut(“B3”, “7 3 9 5”)
pairOut(“B4”, “6 2 8 4”)
When the worker receives a task like “1 3”, it will extract the needed data like so:
row = readPair(“A1”) // get row 1 from matrix A à “1 2 3 4”
col = readPair(“B3”) // get col 3 from matrix B à “7 3 9 5”
Notice the readPair was used instead of pairIn since the row and column data will need to be used by other Workers to compute other tasks.
Once the Data Pairs have been retrieved from the Task Bag, the Worker can then calculate a result. The answer then needs to be placed back into the Task Bag using a Result Pair. The Result Pair uses the result matrix row and column as the key and the calculated result as the value. For example:
pairOut(“1 3”, 56) // send the answer 56 to the Task Bag
When a Result Pair is received by the Task Bag, it notifies the Task Master of the new result. The Task Master will retrieve the result using a pairIn call to remove the answer from the Task Bag and display the result to its result matrix for the user to see.
There are two approaches that could have been used in implementing this project: polling and callbacks. Polling would require the Task Master to constantly poll the Task Bag for Result Pairs and Workers to poll for Task Pairs. Both the Task Master and Workers would continually poll after some specified amount of time, say 2 seconds, before polling again. Polling has several disadvantages. Not only does it waste network resources by making repeated calls to the Task Bag, it also slows down the Task Bag by forcing it to respond to unnecessary calls. Furthermore, there may be new Result Pairs or Task Pairs that arrive in the Task Bag between delays in the polling. This means there is action needing to be performed but no process aware of it. Decreasing the delay time reduces this problem but increases the previously mentioned problem of wasting network resources.
Callbacks allow the Task Master and Workers to be notified of new Result Pairs and Task Pairs when they arrive. This not only reduces network resources but ensures that Task Bag data will be retrieved as soon as computationally possible. The drawback with this approach lies in the implementation. The Task Bag must maintain information about which Workers are available for work, and in order not to bother busy Workers, it also needs to maintain “busy” state information for each Worker. When Workers are made available or cease processing, the Task Bag also needs to be notified. This increases the overhead necessary for the Task Bag.
When comparing the advantages and disadvantages to polling and callbacks, it can be argued that although callbacks require more overhead, the savings on network resources (which are typically in much shorter supply) make callbacks the preferred implementation.
Another type of callback-style event handling is the use of the CORBA Event Services[7]. Event Services allow consumers/producers to consume and produce events. The Task Master could register with the service as a consumer for Result Pair events. When the producer, the Task Bag, receives a Result Pair, it would produce an event, and the service would send it to the Task Master.
Figure 3 shows a high-level view of the Task Bag system. The Task Master is responsible for placing Data Pairs and the first Task Pair into the Task Bag. Workers may then extract Task Pairs and Data Pairs from the Task Bag and place Result Pairs and Task Pairs back into the Task Bag. There may be any number of Workers in the system.
Figure 3 – Relationship between Task Master, Task Bag, and Workers.
Figure 4 – Typical ordering of events.
Figure 4 shows the order of events between the Task Master, Task Bag, and a single Worker. Both the Task Master and Task Bag must register for notification. These events may happen at the same time and are therefore given the same order number. It should be noted that the Task Master is responsible for creating the Task Bag. It must register with the Task Bag to receive notifications of Result Pairs, but there is no need to deregister because when the Task Master process goes down, so will the Task Bag.
The Worker will not necessarily deregister after producing a Result Pair. A continually active Worker will continue to receive Task Pair notifications, obtains Data Pairs, and produce Result Pairs until the computations are finished. The Worker process will deregister when it is terminated.
These are the Java classes that make up the Task Master, Task Bag, and Workers.
TaskMaster – Displays the GUI, registers the Task Bag object with CORBA, registers Task Bag with Naming Service, registers for notification (using callbacks) with the Task Bag, and performs matrix multiplication tasks.
TaskMasterFrame – Actual GUI implementation consisting of ControlPanel, 3 MatrixPanels, and WorkerPanel.
ControlPanel – Displays control panel with 3 command buttons.
MatrixPanel – Displays the three matrix table and identifiers.
MatrixTable – Table implementation of matrices.
WorkerPanel – Displays worker table.
TaskMasterControllerCallbacksImpl – Implementation of remote TaskMasterControllerCallbacks object for receiving notifications of Result Pairs and Worker (de)registrations.
TaskBagImpl – Implementation of remote TaskBag object. Maintains a hash for storing Worker notification objects and for storing Pairs.
Worker – Displays Worker window, obtains reference to Task Bag remote object using Naming Service, registers for notification (using callbacks) with the Task Bag,
TaskReadyCallbackImpl – Implementation of remote TaskReadyCallback object for receiving notifications of Task Pairs.
project.conf – Properties file needed by the ORB to determine location of Naming Service and port. Used by TaskMaster, Worker, and Naming Service.
The TaskBag.idl file below was used to define the Task Bag interface as well as the interfaces for the remote callback objects to be used by the Task Bag to notify Task Master and Worker processes of events.
module tasks
{
interface TaskReadyCallback
{
oneway void taskReady();
};
interface TaskMasterControllerCallbacks
{
oneway void taskStarted(in string workerName, in string task);
oneway void taskFinished(in string workerName, in string task);
oneway void addWorker(in string workerName);
oneway void removeWorker(in string workerName);
};
interface TaskBag
{
oneway void pairOut(in string key, in string value);
oneway void pairOutTask(in string key, in string value);
oneway void pairOutResult(in string key, in string value, in string workerName);
string pairIn(in string key);
string pairInTask(in string key, in string workerName);
string readPair(in string key);
void clear();
void registerTaskMasterCallbacks(in TaskMasterControllerCallbacks callback);
void addWorker(in string workerName, in TaskReadyCallback callback);
void removeWorker(in string workerName);
};
};
The “oneway” keyword was used for all non-blocking methods. That is, methods that don’t need to return any values and allow the caller to continue processing without waiting for the method to finish. This is necessary in systems where two processes act as both client and server, like in the case of using callbacks. Since the Task Bag must make a taskReady call to the Worker and the Worker must then make a pairIn call to the Task Bag, the taskReady call must be oneway or the call to pairIn will hang since the Task Bag is still waiting for the taskReady call to complete. Another way of getting around this problem is to make the Task Bag and Worker multithreaded where each thread would handle the separate calls.
Notice there are three types of pairOut methods and two types of pairIn methods.
The pairOut method is used for Data Pairs. The pairOutTask and pairInTask methods are used for Task Pairs. pairOutResult is used for Result Pairs. The need for different types of pairOut methods is because of the notification system employed. A new Task Pair means that all Workers should be notified of new tasks. New Result Pairs require notification to the Task Master so the results can be obtained. The reason an extra workerName parameter is added to pairOutTask and pairInTask is so the Task Master can update its worker table when a worker starts a task and is finished with a task.
The clear method is provided to remove all Pairs from the Task Bag. The registerTaskMasterCallbacks, addWorker, and removeWorker methods are provided for event notification registration purposes.
The Task Master and Workers needs the JDK 1.3, ORBacus for Java, and the project files to be installed before the system can be executed.
The JDK 1.3 can be obtained from http://java.sun.com. Make sure C:\<java_home>\bin is in your PATH environment variable so the java command can be executed from any directory location on the computer.
ORBacus can be downloaded from http://www.orbacus.com or may be installed from the orbacus.zip file provided. After you have installed ORBacus or unzipped the files, make sure C:\<orbacus_home>\lib\OB.jar, OBNaming.jar, and OBUtil.jar are included in the CLASSPATH environment variable.
Unzip the project zip file TaskBag.zip in a directory called C:\ualr\TaskBag or any directory name of your choosing.
In order to run the system, the processes must be started in this order:
The Naming Service is a process used by CORBA to determine where remote objects are located. Because it is used by the Task Master and Workers to find the Task Bag remote object, it must be running before starting those processes.
Use this command to start the ORBacus Name Service in Windows:
start java com.ooc.CosNaming.Server -ORBconfig
project.conf
This will launch the service in a console window. The project.conf file is needed by the Naming Service to determine what computer the service is running on and which port to be listening.
The line
ooc.orb.service.NameService=corbaloc::fmccown:5000/NameService
in project.conf must be changed to make the Naming Service work on the installed computer. Simply change “fmccown” to the host computer name the Name Service is started on.
On the same computer the Naming Service was started, open a command-line widow and move to the C:\ualr\TaskBag directory. Start the Task Master in Windows with the command:
java TaskMaster
This will launch the Task Master window. There will be no Workers in the workers table.
On the same computer as the TaskMaster or on a different computer, open a command-line widow and move to the C:\ualr\TaskBag directory where you unzipped the project zip file. Start the Worker with the command:
java Worker
or
java
Worker –n WorkerName
in order to explicitly set the Worker’s name.
If the Worker is not started after the Task Master, it will not be able to access the Task Bag. If it is working properly, the name of the Worker should appear in the worker table in the Task Master window.
As each Worker is started, its name will appear in the workers table of the Task Master window. Once there is at least one worker available, click the “Add Tasks” button to start the computations. When the results matrix is complete, the values may be erased from the matrix by clicking the “Clear Results Matrix” button. Then the tasks may be started again by pressing the “Add Tasks” button. If the “Remove Tasks” button is pressed while computations are taking place, the Task Bag will be emptied of all Pairs, and the processing will cease. It can be restarted by pressing “Add Tasks”.
When Workers are shutdown, their names will be removed from the worker table.
The Task Bag distributed System performs parallel execution of matrix multiplication using the slave/worker paradigm introduced in the Linda distributed system. Although this system performs matrix multiplication, it would be very straightforward to use the same architecture to perform other types of parallel execution. The implementation in CORBA hides the networking complexities of remote method invocation and allows the slave/worker processes to be built in other languages besides Java.
[1] http://www.cdk3.net/ig/projects, Coulouris, et. al. Distributed Systems: Concepts and Design, 3rd Edition, Addison-Wesley, Pearson Education 2001.
[2] Silberschatz, et. al., Applied Operating Systems Concepts, 1st Edition, 2000, pp.507-517.
[3] http://www.orbacus.com
[4] Ahuja, et. al., Linda and Friends, IEEE Computer, August 1986, pp. 26-34.
[5] Carriero and Gelertner, How to Write Parallel Programs: A Guide to the Perplexed, ACM Computing Surveys, Vol. 21, No. 3, Sept. 1989.
[6] http://www.omg.org
[7] Ibid.