Parallelization of Low-Communication Processes
1. Project Description
1.1. General Parallelization
Tapping the large amounts of idle time of numerous workstations and
parallelizing large, but simply structured computational efforts is
a cheap method of advancing to the limits of computationally feasible
tasks. Known systems like PVM are able to function under quite general
conditions, however, they achieve considerable complexity. The goal
of this project at the Seminar for Applied Mathematics (SAM) of ETH
is to parallelize an arbitrary number of workstations for distributed
computations with low communication. In particular, we want to exploit
the simple situation of a large computational effort that naturally splits
into highly independent subtasks producing low data traffic. It was
possible to come up with a small but robust system taking advantage of
this simple, yet important situation. A typical class of problems of
this type are exhaustive-search problems.
The following flexible and fail-safe concept has been implementeded on a
network of Unix-based workstations, including clusters of workstations like
the Asgard cluster of ETH. There is a varying set of independent processes
running on (not necessarily) different machines: One master process and
several client processes. The master assigns tasks to any connecting
client, keeps track of the assigned tasks and processes the reported
results. Each client tries to connect to the master, receives a task,
disconnects and starts its computation. After completion of a task the
result is reported back to the master, and a new task is immediately
assigned to the client.
The only fixed parameter of this system is the master's internet address;
the number of client processes and machines taking part can increase or
decrease at any time without the need of notifying a central instance. The
system can survive almost any failure it may encounter: breakdown of a
client process (or machine), breakdown of the entire network, even the
breakdown of the master process itself. If a critical breakdown (e.g. the
master's machine) lasts shorter than, say, 2 hours the system will recover
without human interaction. In any case, the central mechanism for managing
the results is immune against all typical hardware or software breakdown
situations and would even survive a (local) media failure on the master's
In addition, much care has been taken of using as little resources as
possible. The clients are designed to use no file system at all, so even
if a client fails or if its machine is shut down, there will be no trace
left. The parallelization software exclusively uses the standard C++
libraries contained in every Unix installation. Network load is kept
small since there is no permanent connection between the processes.
Description of the Software: Poor Man's
1.3. Current Application: Clusters of Primes
Currently, we are applying our parallelization concept to an algorithm
involving sieving techniques for locating and counting clusters of
prime numbers. Whereas the distribution of primes seems to be fairly
regular (if the Riemann hypothesis is true), the distribution of twin
primes and longer clusters is largely unknown and is characterized by
large-scale anomalies. Collecting experimental data on these anomalies
is one of the reasons for the interest in clusters of primes.
Onother challenge of finding clusters of prime numbers is the unproven
prime-k-tuple hypothesis: Any pattern of integers that is not forbidden
by simple divisibility considerations (such as, e.g., the pattern
c=[x, x+2, x+4] -- at least one of its elements is divisible by 3) occurs
infinitely often in the sequence of primes. A proof is currently out of
reach; not even for the simplest case, the twin prime hypothesis, a
proof is in sight. In contrast, the observed average densities of
prime-k-tuples in the accessible range are in perfect agreement with the
densities conjectured by Hardy and Littlewood  in 1922:
where Hc is the Hardy-Littlewood
constant associated with the constellation c, and |c| is the number
of elements in the constellation c.
/ ((log x)|c|
As an example, consider the pattern c=[x, x+4, x+6,..., x+60, x+66, x+70],
consisting of 18 elements, such that for x=13 the sequence of consecutive
primes beginning at 13 and ending at 83 is obtained. For the corresponding
Hardy-Littlewood constant the value Hc=6723654.312 is obtained. Equ. (1)
implies that the expected number of occurrences of the pattern c (as
well as of its inverse) in a large interval near a much larger x
is given by Hc / (log x)18. Integration of this average
density yields the following expected frequencies HL(x) (referred to as
the Hardy-Littlewood count, HL count for short):
Therefore, one could ''hope`` that the above prime pattern c would
repeat for some x in the range of 3e24, and also that its mirror image
occurs in the same range. One could even be lucky to have occurrences
for smaller values of x, but also, with bad luck, the search might
have to be pushed much higher.
The basic ingredients in our search algorithm are the Chinese remainder
theorem to exclude divisibility by small primes (e.g. p <= 53), sieving
techniques to exclude divisibility by intermediate primes (53 < p <= 641),
and a probabilistic primality test (Miller-Rabin) for the remaining
large primes p>641. Finally, candidates for new clusters of large
primes are ''hardened`` by rigorous primality proofs. An account of
a similar algorithm was given by Tony Forbes in .
Searching the natural numbers up to 3e24 for clusters of 18 primes among
71 consecutive integers turned out to be at the upper end of currently
feasible computations. Locating the two expected occurrences truely
amounts to finding the proverbial needle in the haystack. To get
an idea of the sheer size of the number 3e24: It is in the order of
magnitude of the total number of cells in all living humans on Earth.
Immagine to look for a particular subset of 100 cells within this set!
Preliminary experiments with the idle time of 20 workstations of the
Seminar for Applied Mathematics SAM were carried out for testing purposes.
The time necessary in this setting was estimated as 2 years -- with luck,
otherwise it could be 5 years as well. If the search is successful
within a reasonable timspan, we could claim a world record that wouldn't
be easy to break. Dense clusters of 19 primes in the patterns of [13...89],
[37...113] or their mirror images are expected to repeat/occur only in the
range of 1e26.
The current implementation takes advantage of the mathematical software
package PARI, written in C++ by the Bordeaux group C. Batut, K. Belabas, D.
Bernardi, H. Cohen and M. Olivier, e-mail email@example.com. Features
of the package are arbitrary-precision arithmetics with integer, real and
complex numbers, many special functions, general symbolic computations,
computational number theory and computational algebra; it may well be
useful for other projects and for teaching. It is available for free from
. Due to the advanced features of PARI the entire algorithm (without
the parallelization) can be formulated with less than 30 lines of code.
Near-optimum speed is achieved by implementing the innermost loop (the
sieve involving only single-precision integers) in C++.
As a prerequisite for our implementation, PARI must be installed on each
client machine or cluster (requiring disk space of some 10 MBytes). The
processes running on the clients use little memory (<1 MByte) and are
given the lowest possible priority. Tests on 20 workstations of SAM
have confirmed that the project will not noticeably disturb other users.
2. Results and Future Projects
The project was given the idle time of 432 processors, i.e. the idle
time of 90% of Asgard's processors. The authors express their gratitude
to the steering committee and to the Asgard operator team for this
generous allotment and for the continuous support. Since our project uses
little memory and is always running on lowest priority (nice +19), it
does not noticeably channel off resources from other users: from a busy
processor our project is getting at most 6% CPU time.
In the past 12 months the project was using an estimated portion of
6% to 30% of Asgard's capacity. The high turn-out was mainly achieved
shortly before shut-downs of Asgard, when most users had withdrawn their
processes. Due to the robustness of our algorithms we were able to have
our processes stopped by the operator -- without any loss of data. The
restart after the recovery of the system is a matter of a few minutes.
During the upgrading (including renaming of the nodes) of September 2000
and during several periods of hardware problems in 2001 (February,
May/June) the robustness was also essential: there were no losses except
for the time lost when the machine was down.
All clusters of primes considered so far are repetitions of patterns of
consecutive small primes or the mirror images of such patterns. Clusters
repeating the pattern of the sequence of primes
pn, pn+1, ..., pm
will be said to be clusters of Pattern [pn...pm], e.g.
[227,229,233,239,241] is of Pattern [17...31]; an inverse constellation,
e.g., [1289,1291,1297,1301,1303] will be said to be of Pattern [-31..-17],
thus introducing the obvious concept of negative primes.
- We approched our ''prime`` target, the clusters of 18 primes among 71
consecutive integers, by searching blocks of size 1e24. 432 processors of
the Beowulf Cluster and 20 workstations of SAM were involved. The search
has been completed up to 2.9999949836e24. Results:
Pattern Block Begin End Success Initial element HLcount
-83..-13 1 3. 8.-19. 9.2000
-83..-13 2 19. 9.-26.10.2000
-83..-13 3 29.10.-20.11.2000 13.11,12:24 2845372542509911868266807 0.880
13...83 1 19.12.-23. 1.2001
13...83 2 23. 1.-27. 2.2001 31.1.2001 1906230835046648293290043 0.673
13...83 3 27. 2.-26. 3.2001
- Minimal clusters of 17 primes among 67 consecutive integers can exist
in the patterns [13...79], [17...83] or in the mirror images of these
patterns. We searched the range up to 3.3335450206e23 for all 4 patterns.
As the table below shows, the distribution is very uneven, e.g., one of the
patterns does not occur at all. However, the total number (18) of tuples
present agrees fairly well with the expected number of 19.256. Among these
clusters (listed below by their initial elements), 12 are new, 4 have been
discovered earlier (see below), and 2 are in the range of small primes.
Pattern [ 13...79], 26. 3.-30. 4.2001, 6 tuples, HLcount = 4.316
Pattern [-79..-13], 2. 5.-29. 6.2001, 4 tuples, HLcount = 4.316
1620784518619319025971 J. Waldvogel 1997, Tony Forbes 1998
2639154464612254121531 J. Waldvogel 1998, Tony Forbes 1998
3259125690557440336631 Tony Forbes 1998, J. Waldvogel 1998
Pattern [ 17...83], 17. 8.-25.10.2001, 8 tuples, HLcount = 5.312
53947453971035573715707 Tony Forbes 1998
Pattern [-83..-17], 2. 7.-17. 8.2001, 0 tuples, HLcount = 5.312
- In order to demonstrate the capability of our algorithm to handle even
larger numbers, we screened the interval [1e30 - 1e20 , 1e30 + 99e20]
for 14-tuples in the pattern c=[11...61]. With the corresponding HL
constant Hc = 50975.35252 we expect about
50975.35252 * 1e22 / (log(1e30))^14 = 9.05
such 14-tuples in the above interval; actually 12 are present, where
the first one happens to be a 15-tuple in the pattern of [11...67]
(we also indicate the difference patterns of consecutive primes):
999999999900000000000000000000 = 1e30 - 1e20, lower search limit
1 999999999962618227626700812281 114 2 4 2 4 6 2 6 4 2 4 6 6 2 6 30
2 1000000001044178961179268851051 22 2 4 2 4 6 2 6 4 2 4 6 6 2 18
3 1000000001544051614464292419601 162 2 4 2 4 6 2 6 4 2 4 6 6 2 126
4 1000000001553601074663653211311 52 2 4 2 4 6 2 6 4 2 4 6 6 2 18
5 1000000001772437688818681781011 48 2 4 2 4 6 2 6 4 2 4 6 6 2 82
6 1000000003068759599025980926181 54 2 4 2 4 6 2 6 4 2 4 6 6 2 112
7 1000000004930964950164522054901 112 2 4 2 4 6 2 6 4 2 4 6 6 2 60
8 1000000005644941246959007679801 22 2 4 2 4 6 2 6 4 2 4 6 6 2 106
9 1000000005832631360266813468481 52 2 4 2 4 6 2 6 4 2 4 6 6 2 78
10 1000000006672161724368529625351 82 2 4 2 4 6 2 6 4 2 4 6 6 2 112
11 1000000007541367760266886291861 150 2 4 2 4 6 2 6 4 2 4 6 6 2 286
12 1000000008282508019026959814211 240 2 4 2 4 6 2 6 4 2 4 6 6 2 148
1000000009900000000000000000000 = 1e30 + 99e20, upper search limit
Prime numbers have been one of the favorite objects of research of
classical mathematics, and various computations involving primes have
been done already by Euclid. With the advent of modern computing
machines the size of the accessible numbers has dramatically increased,
and ever more impressive -- though mostly futile -- results have been
obtained. Only with the rise of modern cryptology  primes had grown
up from objects mainly existing in the brains of mathematicians to
real-world objects with eminently important applications.
This is only a partial explanation for the ongoing quest for all kinds
of prime number records. The new book of prime number records by
Paulo Ribenboim  has 541 (the 100th prime!) pages and is in its third
edition. Among hundreds of records there is the largest known prime,
the largest known value of , the largest known maximal gap, etc.,
and last not least, the longest known dense cluster of large primes.
The discovery of a dense cluster of 18 primes in the range of 3e24 on
November 13, 2000 received immediate coverage in the web journal of ETH
(ethlife, December 6, 2000, Reference ). The news about this
computation, being 30 to 100 times harder than previous computations in
the same field, were also announced in the number theory press, e.g. .
Dense clusters of primes receive particular attention on the website ,
continuously actualized by Tony Forbes. The successes of our implementation
bear the danger of monopolizing this site and taking away all the fun!
2.3. Future projects
The main property of computations involving prime numbers is that they
can go on forever. This is not our goal, however. It was mentioned that
finding a large cluster of 19 primes (among 77 consecutive integers)
is by far harder than the 18er, perhaps 30 times the effort. Therefore,
this is out of reach for the current implementation.
If the idle time of Asgard is still available to our project in the
current rate, we plan to further exploit the excellent performance
of our system in the directions mentioned below. The computation
time for each item is expected to be in the order of months.
- Generate a fair number (30..100) of 16ers, Patterns [11...71] and
[-71..-11], in order to investigate the regularity/irregularities
of the distribution of this instance of a long dense cluster.
- Search for exotic long consellations, such as 9 twin primes within
105 consecutive integers in order to support the prime-k-tuple
- Search for exotic shorter patterns, e.g. 10 consecutive primes
with mutual difference 210 to support the hypothesis that in some
range of the natural numbers 210 is the most abundant difference
of consecutive primes.
 Tony Forbes: Prime clusters and Cunningham chains. Math. of Comp. 68,
 G.H. Hardy and J.E. Littlewood: Some problems of Partitio Numerorum III.
Acta Math. 44, 1922, 1-70.
 Paulo Ribenboim: The New Book of Prime Number Records, 3rd ed.
Springer 1996, 541 pp.
 R.L. Rivest, A. Shamir, L. Adleman: A method for obtaining digital
signatures and public-key cryptosystems. Communications of the
ACM 21, 1978, 121-126.
Die tägliche Webzeitung der ETH. 6. Dezember 2000, Archiv.
ETH-Mathematiker stellen Weltrekord auf. Prozessoren malochten
 Tony Forbes' prime-k-tuple website
 Website of the number-theoretic software package PARI (freeware)
 Number theory news