Asgard   -  Project Description
Clustersysteme der InformatikdiensteID-Clustersysteme ETH Logo

Parallelization of Low-Communication Processes

Researchers

Project Description

General Parallelization Aspects: Failsafe and Scalable Setup

Tapping the large amounts of idle time on numerous workstations and parallelizing large, but simply structured computational efforts is a cheap method of advancing to the limits of computationally feasible tasks. Known systems like PVM are able to function under quite general conditions, however, they achieve considerable complexity. The goal of this project at the Seminar for Applied Mathematics (SAM) of ETH was to parallelize an arbitrary number of workstations for distributed computations with low communication. In particular, we want to exploit the simple situation of a large computational effort that splits naturally into highly independent subtasks producing low data traffic. It was possible to come up with a small but robust system taking advantage of this simple, yet important situation. A typical class of problems of this type are exhaustive-search problems.

The following flexible and fail-safe concept has been realized on a network of Unix-based workstations. There is a varying set of independent processes running on (not necessarily) different machines: One master process and several client processes. The master assigns tasks to any connecting client, keeps track of the assigned tasks and processes the reported results. Each client tries to connect to the master, receives a task, disconnects and starts its computation. After successful execution, the result is reported back to the master and a new task is immediately received. The only fixed parameter of this system is the master's internet addres; the number of client processes and machines taking part can increase or decrease at any time without need for notifying a central instance. The system can survive almost any failure it may encounter: breakdown of a client process (or machine), breakdown of the entire network, even breakdown of the master process itself. If a critical breakdown (e.g. master's machine) lasts shorter than a certain amount of time (e.g. 2 hours), the system will recover without human interaction.

In any case, the central mechanism for managing the results is immune against all typical hardware or software breakdown situations and would even survive a (local) media failure on the master's file system. Furthermore, much care has been taken of using as little resources as possible. The clients are designed to use no filesystem at all, so even if a client fails or if its machine is shut down, there will be no trace left. The parallelization software uses exclusively the standard C libraries contained in every Unix installation. Network load is kept small since there is no permanent connection between the processes.

Current Application: Sieving Primes

Currently, we are applying our parallelization concept by running sieving techniques for locating and counting clusters of prime numbers. Whereas the distribution of primes seems to be fairly regular (if the Riemann hypothesis is true), the distribution of twin primes and longer clusters is largely unknown and characterized by large-scale anomalies. Efficient sieving techniques involving the Chinese remainder theorem are being used. At the upper end of currently feasible computations we can locate/count dense clusters of 18 primes in the range of 10^24. Locating the only expected repetition of the pattern of the primes from 13 to 83 (or its mirror image) in the range 10^24 truely amounts to finding the proverbial needle in the haystack. If the search - an estimated 20 months using the idle time of 20 workstations of SAM - is successful, we could claim a world record that wouldn't be easy to break. Dense clusters of 19 primes in the patterns of 13 ... 89, 37 ... 113 or their mirror images are not expected to repeat/occur below 10^26.

Technical Aspects

Software

The current implementation takes advantage of the mathematical software package PARI, written in C by the Bordeaux group C. Batut, K. Belabas, D. Bernardi, H. Cohen and M. Olivier, e-mail pari@math.u-bordeaux.fr. Features of the package are arbitrary-precision arithmetics with integer, real and complex numbers, many special functions, general symbolic computations, computational number theory and computational algebra; it may well be useful for other projects and for teaching. It is freely available from http://www.parigp-home.de/. Due to the advanced features of PARI the entire algorithm (without the parallelization) can be formulated with less than 30 lines of code. Near-optimum speed is achieved by implementing the innermost loop (the sieve involving only single-precision integers) in C.

Hardware

Currently we are using the following hardware:
Number of systemsname/type/op.systemrelative speed
4spinoza/intel/Linux200
5..9sophokles/Sun/SunOS100
6heron/Sun/SunOS50
4archimedes/Sun/SunOS20

As a prerequisite, PARI would have to be installed on each client machine or cluster (requiring disk space of some 10 MBytes). The processes running on the clients use little memory (<1 MByte) and are given the lowest possible priority. Tests on the SAM hardware have confirmed that the project does not disturb other users.

In order to speed up the search we are soliciting support from other hardware owners at ETH, e.g. during the summer break from departments with many student machines, or from the Beowulf cluster, etc.

Publications and Presentations

This is the start of the project. Therefore, no presentations were held and no publications were made until today.

Valid HTML 4.0! Valid CSS!