Parallelization of Low-Communication Processes
Researchers
Project Description
General Parallelization Aspects: Failsafe and Scalable Setup
Tapping the large amounts of idle time on numerous workstations and
parallelizing large, but simply structured computational efforts is
a cheap method of advancing to the limits of computationally feasible
tasks. Known systems like PVM are able to function under quite general
conditions, however, they achieve considerable complexity. The goal
of this project at the Seminar
for Applied Mathematics (SAM) of ETH
was to parallelize an arbitrary number of workstations for distributed
computations with low communication. In particular, we want to exploit the
simple situation of a large computational effort that splits naturally into
highly independent subtasks producing low data traffic. It was possible
to come up with a small but robust system taking advantage of this simple,
yet important situation. A typical class of problems of this type are
exhaustive-search problems.
The following flexible and fail-safe concept has been realized on a network
of Unix-based workstations. There is a varying set of independent processes
running on (not necessarily) different machines: One master process and
several client processes. The master assigns tasks to any connecting client,
keeps track of the assigned tasks and processes the reported results. Each
client tries to connect to the master, receives a task, disconnects and
starts its computation. After successful execution, the result is reported
back to the master and a new task is immediately received. The only fixed
parameter of this system is the master's internet addres; the number of
client processes and machines taking part can increase or decrease at any
time without need for notifying a central instance. The system can survive
almost any failure it may encounter: breakdown of a client process (or
machine), breakdown of the entire network, even breakdown of the master
process itself. If a critical breakdown (e.g. master's machine) lasts shorter
than a certain amount of time (e.g. 2 hours), the system will recover without
human interaction.
In any case, the central mechanism for managing the results is immune against
all typical hardware or software breakdown situations and would even survive
a (local) media failure on the master's file system. Furthermore, much care
has been taken of using as little resources as possible. The clients are
designed to use no filesystem at all, so even if a client fails or if its
machine is shut down, there will be no trace left. The parallelization
software uses exclusively the standard C libraries contained in every Unix
installation. Network load is kept small since there is no permanent
connection between the processes.
Current Application: Sieving Primes
Currently, we are applying our parallelization concept by running sieving
techniques for locating and counting clusters of prime numbers. Whereas the
distribution of primes seems to be fairly regular (if the Riemann hypothesis
is true), the distribution of twin primes and longer clusters is largely
unknown and characterized by large-scale anomalies. Efficient sieving
techniques involving the Chinese remainder theorem are being used. At the
upper end of currently feasible computations we can locate/count dense
clusters of 18 primes in the range of 10^24. Locating the only expected
repetition of the pattern of the primes from 13 to 83 (or its mirror image)
in the range 10^24 truely amounts to finding the proverbial needle in the
haystack. If the search - an estimated 20 months using the idle time of 20
workstations of SAM - is successful, we could claim a world record that
wouldn't be easy to break. Dense clusters of 19 primes in the patterns of
13 ... 89, 37 ... 113 or their mirror images are not expected to
repeat/occur below 10^26.
Technical Aspects
Software
The current implementation takes advantage of the mathematical
software package PARI,
written in C by the Bordeaux group C. Batut, K. Belabas, D. Bernardi,
H. Cohen and M. Olivier, e-mail
pari@math.u-bordeaux.fr. Features
of the package are arbitrary-precision arithmetics with integer, real and
complex numbers, many special functions, general symbolic computations,
computational number theory and computational algebra; it may well be
useful for other projects and for teaching. It is freely available from
http://www.parigp-home.de/. Due
to the advanced features of PARI the entire
algorithm (without the parallelization) can be formulated with less than 30
lines of code. Near-optimum speed is achieved by implementing the innermost
loop (the sieve involving only single-precision integers) in C.
Hardware
Currently we are using the following hardware:
| Number of systems | name/type/op.system | relative speed |
| 4 | spinoza/intel/Linux | 200 |
| 5..9 | sophokles/Sun/SunOS | 100 |
| 6 | heron/Sun/SunOS | 50 |
| 4 | archimedes/Sun/SunOS | 20 |
As a prerequisite, PARI would have to be installed on each client machine or
cluster (requiring disk space of some 10 MBytes). The processes running on
the clients use little memory (<1 MByte) and are given the lowest possible
priority. Tests on the SAM hardware have confirmed that the project does not
disturb other users.
In order to speed up the search we are soliciting support from other
hardware owners at ETH, e.g. during the summer break from departments with
many student machines, or from the Beowulf cluster, etc.
Publications and Presentations
This is the start of the project. Therefore, no presentations were
held and no publications were made until today.
|