CES 713 - Assignment #1
Due 10 pm, Tuesday, January 22, 2013. Submit by email.

1. (problem from P. Pacheco, Parallel Programming with MPI)
A more accurate alternative to the trapezoidal rule is Simpson's rule. The basic idea is to approximate the graph of f(x) by arcs of parabolas rather than line segments. Suppose that p < q are real numbers, and let r be the midpoint of the segment [p,q]. If we let h = (q-p)/2, then the equation for the parabola passing through the points (p,f(p)),(r,f(r)), and (q,f(q)) is

Equation 1

If we integrate this from p to q, we get

Equation 2

Thus, if we use the same notation that we used in our discussion of the trapezoidal rule and assume that n, the number of subintervals of [a,b] , is even, we can approximate

Equation 3


a. a serial program and
b. a parallel program that uses Simpson's rule to estimate the integral of Equation 4
You can assume that the number of processes is less than n/2. However, do not assume that n/2 divided by the number of processes is an integer. Aim to assign the same amount of work to all processes, as far as that is reasonably possible.

2. A good random number generator has to pass strict quality tests. One family of tests involves checking whether certain ordered sequences of numbers are present in the output with the expected frequency. In this problem, you will write a serial and then a parallel program to perform a simple test of this kind.

Use a random number generator of your choice to generate a random sequence of integers 0 and 1. Let the sequence length be N. Now determine how many times a subsequence of n identical integers appears in it (for example, for n=4, you would be looking for subsequences 0000 and 1111).

(a) Write a serial program to perform this task.

(b) On the basis of the serial program, write a parallel program to perform this task.Use the divide and conquer algorithm, which in this case simply means having the master (zero) process generate the long random sequence, then parceling out the data to other processes for analysis. Use collective communications as much as possible to achieve efficiency in collecting and distributing the data. Remember to account for subsequences of 00's and 11's which straddle the data distributed to the processes (you do not need to use collective communications for this part). For simplicity, you can assume that N/p is an integer. You can also assume that n < N/p, which avoids the possibility that a subsequence you are looking for will be spread across more than 2 processes.

(c) Based on your results, briefly discuss whether your random number generator functions properly.

3. For this part of the assignment, you will study the performance of MPI communications routines, using two parallel computing clusters of your choice on SHARCNET. Try to select clusters with different architectures to study differences in performance.

(a) Measure the time to send an MPI message by using code segment of the form (in pseudocode):

Process 0 - master
MPI_Send(message, to process 1);
MPI_Recv(tworker, from process 1);
printf("Master time = %d", tmaster);
printf("Worker time = %d", tworker);

Process 1 - worker
MPI_Recv(message, from process 0)
MPI_Send(tworker, to Process 0)

Select a method for determining the time, make sure it is sufficiently precise. gettimeofday works pretty well but choose something else if you prefer.

Experiment with sending messages of different sizes to obtain a good estimate for the time of message transfers. Plot the time for sending a message against the size of the message and fit a line to the results.

(b) Repeat above for MPI_Bcast and MPI_Scatter routines, picking 2 values for the number of processes (for example 4 and 16). Study the average, minimum and maximum communication times among worker processes. Estimate how the communication time scales with the number of processes.

NOTE : On SHARCNET communication performance may be affected by the distribution of processes in the hardware, most importantly by whether they run on the same processor (in its different cores), or separate processors. Consult output of sqjobs to find out which nodes your processes are running on. Consult "man sqsub" to look at the options at the time of job submission which allow to keep better control over this. For plotting, use MATLAB, Excel, or any other tool you like. You don't have to write your own code to do the plot. In order to use more than 8 processors, you will need to have SHARCNET user level 1. If you still have level 0, please take the seminar (offered every Monday) to upgrade your level.