Week 8: More Coding
April 28, 2023
Hi everyone! This week I continued working on MPI communication.
An update on all the MPI errors from last week… good news and bad news! I’ve figured out how to get it running on Google Colab with two cores, but after building Open MPI from source and redownloading everything, I still get the same error when I run my code in the terminal and in VSCode. I’ll probably try a different laptop that runs Linux, since I just realized 2 cores is not enough. For example, if I want to divide the channel into 2 subsections, you would think (or I originally thought) 2 cores would be enough, but I also need one master node, so that makes 3 cores minimum. With 2 cores I can only afford to divide the channel into 1 section, which doesn’t do anything…
But onto the code!
These are the parameters for my parallelization method. nrank is the number of processes; I’m using 2 just so I can test-run my code on Colab. x_sub0 is the size of a subsection without the overlap region (shown in green in the diagram). x_sub is x_sub0 + 2 because there are two extra columns of “ghost nodes”, one on the left and one on the right.
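To make that concrete, here’s a minimal mpi4py sketch of the setup. The channel size and the way x_sub0 is derived from it are placeholders, not necessarily what my actual code does; only nrank, x_sub0, and x_sub are the names described above.

```python
# Run with something like: mpiexec -n 2 python lbm_parallel.py  (filename is a placeholder)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nrank = comm.Get_size()      # number of processes (2 for my Colab test runs)

nx, ny = 400, 100            # full channel size (placeholder values)
n_sub = nrank - 1            # number of subsections; rank 0 is the master node
x_sub0 = nx // n_sub         # subsection width without the overlap region
x_sub = x_sub0 + 2           # plus one ghost-node column on each side
```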
The two main methods for communication are send_f and send_overlap.
send_f sends the green part of the diagram from the master node, which appears in the first if statement as rank 0, to the subprocesses. The second if statement handles the subprocesses receiving the data. The master knows which section of the channel to send where because of the line x0 = (r-1)*x_sub0, which sets the starting x coordinate based on the rank r it is about to send to. In the comm.Send line, dest=r means the data is sent to destination node r. In the comm.Recv line, source=0 because the subprocesses always receive their data from the master node.
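Based on that description, send_f looks roughly like the sketch below. It builds on the setup sketch above, the arrays are simplified to a single 2-D field (a real distribution function would also carry a velocity-direction axis), and the temporary receive buffer is just a workaround for mpi4py wanting contiguous buffers, so this isn’t line-for-line my code.

```python
def send_f(comm, rank, nrank, f, f_sub, x_sub0):
    """Distribute each worker's subsection (the green region) of the full
    array f, held by the master, into the workers' local arrays f_sub
    (which have one ghost column on each side)."""
    if rank == 0:
        for r in range(1, nrank):
            x0 = (r - 1) * x_sub0                        # where rank r's block starts
            block = np.ascontiguousarray(f[:, x0:x0 + x_sub0])
            comm.Send(block, dest=r)                     # send to destination node r
    if rank > 0:
        buf = np.empty((f_sub.shape[0], x_sub0), dtype=f_sub.dtype)
        comm.Recv(buf, source=0)                         # always coming from the master
        f_sub[:, 1:-1] = buf                             # fill the interior, not the ghosts
```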
send_overlap sends the blue part of the diagram. It’s pretty much the same, but with a few extra lines to specify which column is sent in which direction: edgeR and edgeL are the right and left edge columns, sent to rank+1 and rank-1 respectively.
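Here’s a sketch of send_overlap under the same assumptions. One thing I’ll flag: to keep the blocking Send/Recv calls from deadlocking in this sketch, I stagger the exchanges by rank parity and simply skip a neighbour that doesn’t exist; the real code may order this differently.

```python
def send_overlap(comm, rank, nrank, f_sub):
    """Exchange ghost columns with neighbouring workers: edgeR goes to
    rank+1 and edgeL goes to rank-1, and each ghost column is filled
    with the neighbour's adjacent edge column."""
    if rank == 0:
        return                                      # the master holds no subsection

    edgeL = np.ascontiguousarray(f_sub[:, 1])       # leftmost interior column
    edgeR = np.ascontiguousarray(f_sub[:, -2])      # rightmost interior column
    recvL = np.empty_like(edgeL)                    # will fill the left ghost column
    recvR = np.empty_like(edgeR)                    # will fill the right ghost column
    has_left = rank - 1 >= 1                        # is there a worker to the left?
    has_right = rank + 1 <= nrank - 1               # is there a worker to the right?

    if rank % 2 == 1:                               # odd ranks send first...
        if has_right:
            comm.Send(edgeR, dest=rank + 1)
            comm.Recv(recvR, source=rank + 1)
            f_sub[:, -1] = recvR
        if has_left:
            comm.Send(edgeL, dest=rank - 1)
            comm.Recv(recvL, source=rank - 1)
            f_sub[:, 0] = recvL
    else:                                           # ...even ranks receive first
        if has_left:
            comm.Recv(recvL, source=rank - 1)
            comm.Send(edgeL, dest=rank - 1)
            f_sub[:, 0] = recvL
        if has_right:
            comm.Recv(recvR, source=rank + 1)
            comm.Send(edgeR, dest=rank + 1)
            f_sub[:, -1] = recvR
```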
For both of these methods I’m using comm.Send and comm.Recv rather than their lowercase counterparts comm.send and comm.recv. The uppercase versions work on buffer-like objects, whereas the lowercase ones use pickle-based communication. I chose Send and Recv because they work better for the type of data being communicated: NumPy arrays. The method parallel_comm ties it all together by calling the two methods.
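parallel_comm is then just a thin wrapper, something like this (the usage underneath is a placeholder that follows the setup sketch above):

```python
def parallel_comm(comm, rank, nrank, f, f_sub, x_sub0):
    """Do all the communication for one step: hand out the subsections,
    then swap the overlap columns between neighbouring workers."""
    send_f(comm, rank, nrank, f, f_sub, x_sub0)
    send_overlap(comm, rank, nrank, f_sub)


# Placeholder usage on each rank:
f = np.zeros((ny, nx)) if rank == 0 else None       # full field lives on the master
f_sub = np.zeros((ny, x_sub)) if rank > 0 else None  # local subsection with ghost columns
parallel_comm(comm, rank, nrank, f, f_sub, x_sub0)
```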
Finally, I cleaned up the long block of spaghetti code that made up the body of my Lattice Boltzmann simulation. The steps are now divided into separate methods, so the main method lbm_step looks a lot cleaner. I won’t show the specifics of each method, but the important part is the order of the steps: collision, communication, streaming, and calculating the fluid variables.
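As a skeleton it’s basically this; the class and helper names are stand-ins, and only lbm_step and the order of the calls are the point here.

```python
class LBMChannel:
    # Skeleton only: the four helpers stand in for the real methods.
    def collision(self): ...
    def communication(self): ...       # parallel_comm under the hood
    def stream(self): ...
    def fluid_variables(self): ...     # density and velocity moments

    def lbm_step(self):
        # One time step, in the order that matters:
        self.collision()
        self.communication()
        self.stream()
        self.fluid_variables()
```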
Now that most of my code is complete, I have to tie it all together and get it running on something with more than 2 cores. And fix the bugs that will definitely come up.
Thanks for reading, see you next week!