HPL with OpenMPI Stalls on a Beowulf Cluster with Gigabit Ethernet

Well, let me explain the situation first.

Situation: I have a Beowulf cluster with two compute nodes, node-1 [192.168.2.10] and node-2 [192.168.2.11]. node-1 acts as the NFS server, providing a common home directory for both compute nodes. The NIS server is also on node-1, allowing uniform login and authentication on all compute nodes.

Both compute nodes are connected to each other through a Gigabit Ethernet switch. This is how my Beowulf cluster is organized. I have installed OpenMPI-1.4.3 on both nodes under /usr/local. The environment variables for runtime linking and loading of the libraries are set in the .bashrc file of the common login ID on the cluster.

HPL was compiled with the following libraries and software:

1) hpl-2.0.tar.gz

2) GotoBLAS2-1.13_bsd.tar.gz

3) OpenMPI-1.4.3

4) gcc 4.1.2-42

5) CentOS 5.2

To run MPI programs, I created the following hostfile [hostfile.txt]:

192.168.2.10 slots=8
192.168.2.11 slots=8

Both systems [node-1, node-2] are dual-socket quad-core machines, hence 8 slots each.
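The hostfile and the -np value have to agree: two nodes with 8 slots each give 16 processes. A minimal sketch, assuming a POSIX shell and awk, that writes the same hostfile and derives -np from its slots= entries. The total_slots helper is hypothetical, not part of OpenMPI, and the script only prints the launch line instead of running it.

```shell
# Write the two-node hostfile from this post: 2 sockets x 4 cores = 8 slots per node.
cat > hostfile.txt <<'EOF'
192.168.2.10 slots=8
192.168.2.11 slots=8
EOF

# Hypothetical helper: add up the slots=N entries of an OpenMPI hostfile,
# skipping blank lines and '#' comment lines.
total_slots() {
  awk 'NF == 0 || $1 ~ /^#/ { next }
       {
         for (i = 2; i <= NF; i++)
           if ($i ~ /^slots=/) { split($i, kv, "="); n += kv[2] }
       }
       END { print n + 0 }' "$1"
}

NP=$(total_slots hostfile.txt)
# Print (rather than run) the launch line so it can be checked first.
echo "mpiexec -np $NP -hostfile hostfile.txt ./xhpl"
```

Deriving -np this way means that adding a node to the hostfile cannot silently leave the process count stale.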

For a simple test, I ran a hello-world program with the following command:

mpiexec -np 16 -hostfile hostfile.txt ./helloWorldMPI

It runs perfectly. But when I run HPL with the following command:

mpiexec -np 16 -hostfile hostfile.txt ./xhpl

It simply stalls. I can see 34% to 40% usage on each core of both systems (using the top command), but it stays that way and no output ever appears on screen, even for a small problem size such as N=200. It runs forever, until I feel like killing the job.

But if I use MPICH-2 instead of OpenMPI to run HPL on the same system configuration, the computation completes and the output comes out in perfect form.

Solution: As usual, I went through a lot of material and forums related to OpenMPI and HPL. Some of them are listed below. One of them talks about HPL_NO_MPI_DATATYPE [see the thread].

I tried this, with no luck. After a lot of trial and error, I came to the conclusion that because each compute node in my cluster has multiple Ethernet ports [eth0, eth1, eth2], OpenMPI gets confused during HPL's MPI communication [send/receive] about which port to use for packet transfer. A plain hello-world program exchanges little or no data between the nodes, which would explain why it ran fine while HPL hung. This understanding took me to the Mailing List Archives, which talk about the MCA flag “btl_tcp_if_include eth0”. So I decided to give it a try, and surprisingly it solved my problem. The final command is as follows:

mpiexec -mca btl_tcp_if_include eth0 -np 16 -hostfile hostfile.txt ./xhpl
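Two parts of this fix can be scripted. First, eth0 is only the right choice if it is the port plugged into the Gigabit switch on both nodes; the cluster addresses here are 192.168.2.x, so the interface holding such an address is the one to pick. Second, OpenMPI also reads MCA parameters from a per-user file, $HOME/.openmpi/mca-params.conf, and since the home directory is shared over NFS in this setup, one file would cover both nodes. A minimal sketch, assuming the `ip` tool from iproute is installed; iface_for_prefix and write_mca_conf are hypothetical helper names.

```shell
# Hypothetical helper: print the interface(s) whose IPv4 address starts with
# the given prefix. Expects `ip -o -4 addr show` output on stdin, where
# field 2 is the interface name and field 4 is ADDRESS/PREFIXLEN.
iface_for_prefix() {
  awk -v p="$1" '$3 == "inet" && index($4, p) == 1 { print $2 }'
}

# Hypothetical helper: write an OpenMPI MCA parameter file that restricts the
# TCP BTL to one interface, the file form of "-mca btl_tcp_if_include IFACE".
# Note: this overwrites any existing mca-params.conf in that directory.
write_mca_conf() {
  mkdir -p "$1" || return 1
  printf 'btl_tcp_if_include = %s\n' "$2" > "$1/mca-params.conf"
}

# Usage on node-1 (keep the trailing dot so 192.168.20.x does not match):
#   IFACE=$(ip -o -4 addr show | iface_for_prefix 192.168.2.)
#   write_mca_conf "$HOME/.openmpi" "$IFACE"
#   mpiexec -np 16 -hostfile hostfile.txt ./xhpl
```

Because btl_tcp_if_include takes interface names, this assumes the switch-facing port has the same name (here eth0) on every node.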

Ahh… what a relief, thank God! After such a long and irritating round of trial and error, I am relaxed now :D. As I found almost no mention of this issue on the web, I decided to write it down. I hope this will help others too…

Posted in computer, debug, install, Passion, Science, troubleshoot, Uncategorized

JASMINE REVOLUTION against CORRUPT POLITICIANS: LOKPAL BILL, India 2011

For 42 years, India has been starving for a bill like the Lokpal Bill. Now that Anna Hazare is fighting for it, the spineless politicians of our country say it is a premature step toward bringing this bill to the country.

SHAME ON THEIR ATTITUDE.

I believe that this time, at the first step, it is not a fight between corruption and the people. Actually, it is a fight between a creature called the Politician and the fearless people of our country.

I am waiting for a JASMINE REVOLUTION against politicians in India this time.

[ Note: 1) I consider all politicians to be in the same category. All of them did nothing, in order to keep themselves immune from accountability.
2) If, even after 40 years, politicians say “we need to discuss”, “we are in the process”, “we have taken numerous steps towards it”, it proves that they are lying; hence they are liars, as they always were. ]

Posted in latest news, Story, Uncategorized, update

GDB – libexpat.so.1 not found : solved

Here is yet another post about CentOS 5.5. This morning, while I was trying to start gdb with the default compiler, gcc 4.1.2, which comes with CentOS 5.5, it crashed with the following descriptive error:

gdb: error while loading shared libraries: libexpat.so.1: cannot open shared object file: No such file or directory

After a 10-minute search I found an expat RPM for CentOS 5.5 from

I downloaded it and tried to install it with the “rpm -ivh” command, but it failed because of an expat-devel dependency.

Then I used the command: yum install expat-1.95.8-8.3.el5_5.3.i386.rpm

It went fine. But when I tried to start gdb, the same problem persisted. I found “libexpat.so” in “/usr/lib” on my system, but no “libexpat.so.1” there.

Then I did a simple trick with the ln command: ln -s /usr/lib/libexpat.so /usr/lib/libexpat.so.1
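If a binary is missing more than one shared library, fixing them one at a time is slow. A minimal sketch that filters ldd output down to the libraries the loader cannot find; missing_libs is a hypothetical helper name, and ldd and ldconfig are the standard glibc tools.

```shell
# Hypothetical helper: read ldd output on stdin and print the name of every
# library reported as "=> not found".
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Usage:
#   ldd "$(command -v gdb)" | missing_libs
#   /sbin/ldconfig -p | grep libexpat    # expat sonames the loader cache knows
# After creating a symlink (or installing a package), rerun the first command;
# no output means ldd resolved every dependency.
```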

Wow… now it’s like Makhan [butter]. The problem is solved. I hope this will help you too…

Posted in computer, debug, install, Passion, Science, troubleshoot, Uncategorized

Installing QT : GLIBCXX_3.4.9 not found solved

I was trying to install the latest version of Qt on CentOS 5.4, and it went smoothly. But at the end of the installation there was a message saying that some applications may not run because of a library dependency. I ignored that message and decided to proceed. Unfortunately, when I tried to launch Qt Creator from the shell, it failed because of a libstdc++ library dependency, with the following message:

Failed to load core: /opt/qtsdk-2010/lib/qtcreator/plugins/Nokia/libCore.so: Cannot load library /opt/qtsdk-2010/lib/qtcreator/plugins/Nokia/libCore.so: (/usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by /opt/qtsdk-2010/lib/qtcreator/plugins/Nokia/../../libBotan.so.1))

I explored and found that my system has libstdc++.so.6.0.8, which comes with the base compiler of the system [i.e. gcc 4.1.2], and it does not contain the symbol “GLIBCXX_3.4.9”.

$ readelf   -s libstdc++.so.6.0.8 | grep GLIBCXX_3.4.9
[ no output ]
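The same check can be wrapped so it returns an exit status, which is handy for testing several libraries in a script. A minimal sketch that does not need binutils: version names such as GLIBCXX_3.4.9 are stored as NUL-terminated strings inside the library, so splitting the file on NUL bytes and matching a whole line keeps GLIBCXX_3.4.1 from also matching GLIBCXX_3.4.10. has_symbol_version is a hypothetical helper; unlike readelf -s, it does not show which symbols carry the version.

```shell
# Hypothetical helper: succeed if the shared library $1 contains the symbol
# version string $2 (e.g. GLIBCXX_3.4.9) as a complete NUL-terminated string.
has_symbol_version() {
  tr '\000' '\n' < "$1" | grep -qxF "$2"
}

# Usage:
#   for lib in /usr/lib64/libstdc++.so.6.0.8 /usr/local/gcc-4.4.0/lib/libstdc++.so.6.0.11; do
#     if has_symbol_version "$lib" GLIBCXX_3.4.9; then echo "$lib: yes"; else echo "$lib: no"; fi
#   done
```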

Solution: Installing a newer libstdc++ [libstdc++.so.6.0.9] is the perfect solution, but I chose to install gcc 4.4.0 under /usr/local and have Qt Creator use its library, libstdc++.so.6.0.11, which contains the symbol:

$ readelf  -s libstdc++.so.6.0.11 | grep  GLIBCXX_3.4.9

[ output ]
30: 00050bc0    35 FUNC    GLOBAL DEFAULT   11 _ZNSt6__norm15_List_node_@@GLIBCXX_3.4.9
84: 000859e0   943 FUNC    WEAK   DEFAULT   11 _ZSt16__ostream_insertIwS@@GLIBCXX_3.4.9
112: 0006a560   328 FUNC    WEAK   DEFAULT   11 _ZNSt13basic_istreamIwSt1@@GLIBCXX_3.4.9
185: 00085050   506 FUNC    WEAK   DEFAULT   11 _ZNSt13basic_ostreamIwSt1@@GLIBCXX_3.4.9

Here are the steps I followed.

Steps:
1) Install gcc 4.4.0 under /usr/local (here, /usr/local/gcc-4.4.0).
[ download link gcc 4.4.0 ]
2) Point the dynamic loader at the new library before running qtcreator:
export LD_LIBRARY_PATH=/usr/local/gcc-4.4.0/lib:$LD_LIBRARY_PATH
3) Run qtcreator from the same shell.
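The export in step 2 leaves a trailing colon when LD_LIBRARY_PATH was empty beforehand, and ld.so reads an empty entry as the current directory. A minimal sketch of a helper (prepend_ld_path is a hypothetical name) that only adds the separator when there is something to separate. The gcc path is the one from step 2; depending on how GCC was configured, the 64-bit libstdc++ may be installed under lib64 instead of lib.

```shell
# Hypothetical helper: put directory $1 in front of LD_LIBRARY_PATH, adding a
# ':' only if LD_LIBRARY_PATH was already non-empty.
prepend_ld_path() {
  LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export LD_LIBRARY_PATH
}

# Usage:
#   prepend_ld_path /usr/local/gcc-4.4.0/lib
#   qtcreator
# Or scope the change to qtcreator alone instead of the whole shell:
#   LD_LIBRARY_PATH=/usr/local/gcc-4.4.0/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} qtcreator
```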

This solved my problem. I hope this will help you too.

Posted in computer, debug, install, Science, troubleshoot, Uncategorized

Car route to Sinhgad, Pune is closed for development work

The road to Sinhgad, a favorite weekend picnic spot for Puneites [people from Pune, India], is closed for development work. I visited the place yesterday, 13th Feb, but I could not take the vehicle road to reach the hilltop and had to take the trekking route instead. It took me two and a half hours to walk/trek from the base to the top.

If you plan to visit Sinhgad by the car route, I would suggest postponing your trip for 3 months.

In the meantime, if you are interested in trekking to the top, opt for a morning slot, because trekking at noon will be very tough.

There are direct PMPML buses to Sinhgad from Saniwarwada, Pune, in the morning.

Posted in latest news, travel, update

2010 in review

The stats helper monkeys at WordPress.com mulled over how this blog did in 2010, and here’s a high level summary of its overall blog health:

Healthy blog!

The Blog-Health-o-Meter™ reads: This blog is doing awesome!

Crunchy numbers

A Boeing 747-400 passenger jet can hold 416 passengers. This blog was viewed about 3,600 times in 2010. That’s about 9 full 747s.

In 2010, there were 7 new posts, growing the total archive of this blog to 11 posts. There were 9 pictures uploaded, taking up a total of 284kb. That’s about a picture per month.

The busiest day of the year was January 5th with 54 views. The most popular post that day was Near Death Experience : A stupid helmet owner’s argument in court.

Where did they come from?

The top referring sites in 2010 were openclexperiments.blogspot.com, forums.amd.com, google.com, stackoverflow.com, and google.co.in.

Some visitors came searching, mostly for debug opencl, opencl debug, opencl debugger, opencl debugging, and opencl ubuntu lucid.

Attractions in 2010

These are the posts and pages that got the most views in 2010.

1) Near Death Experience : A stupid helmet owner’s argument in court (December 2009), 1 comment
2) Debugging OpenCL Program with gdb (November 2009), 11 comments
3) Cannot compute suffix of object files : gcc 4.4.3 and gcc 4.5.1 installation problem solved (June 2010), 2 comments
4) Enabling Multilanguage Support : CentOS 5.5 (August 2010)
5) How to install freemind-0.8.1 on fedora 8 (July 2010)

Posted in Uncategorized

Installing PyOpenCL CentOS 5.4

1) ./bootstrap.sh --prefix=/usr/local/boost-1.42 --exec-prefix=/usr/local/bin --libdir=/usr/local/boost-1.42/lib --includedir=/usr/local/boost-1.42/include --with-python=/usr/bin/python2.4 --with-python-version=2.4

2) ./bjam

3) ./bjam install

4) export LD_LIBRARY_PATH=/usr/local/boost-1.42/lib:$LD_LIBRARY_PATH

5) ./configure.py --python-exe=/usr/bin/python2.4 --boost-inc-dir=/usr/local/boost-1.42/include --boost-lib-dir=/usr/local/boost-1.42/lib --boost-compiler=/usr/bin/gcc34 --boost-python-libname=boost_python --boost-thread-libname=boost_thread --cl-enable-gl --cl-inc-dir=/usr/local/cuda/include/CL --cl-inc-dir=/usr/local/cuda/include --cl-lib-dir=/usr/local/cuda/lib --cl-lib-dir=/usr/local/cuda/lib64

6) make
6.1) If your LAN uses a proxy: export http_proxy=http://xxx.x.xxx.xx:yyyy
where xxx.x.xxx.xx is the IP address of your LAN's proxy server,
and yyyy is the port on which the server listens for HTTP requests.
Example: $ export http_proxy=http://190.10.190.25:8080

7) make install
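configure.py in step 5 is given library names (boost_python, boost_thread), and the linker resolves such a name by looking for libboost_python.so (or .a) inside --boost-lib-dir. Depending on the Boost build layout, the installed files may carry extra suffixes such as -mt, in which case the names passed to configure.py have to match. A minimal pre-flight sketch; have_lib is a hypothetical helper, and the directory and names are the ones from steps 1 and 5.

```shell
# Hypothetical helper: succeed if directory $1 holds lib$2.so or lib$2.a,
# i.e. the file the linker would look for when given -l$2.
have_lib() {
  for f in "$1/lib$2.so" "$1/lib$2.a"; do
    [ -e "$f" ] && return 0
  done
  echo "missing: lib$2.so (or .a) in $1" >&2
  return 1
}

# Usage, before step 5:
#   for name in boost_python boost_thread; do
#     have_lib /usr/local/boost-1.42/lib "$name"
#   done
#   ls /usr/local/boost-1.42/lib | grep boost_    # shows the actual file names
```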

More on PyOpenCL :
1) http://wiki.tiker.net/PyOpenCL
2) http://mathema.tician.de/software/pyopencl
3) http://documen.tician.de/pyopencl/

Posted in Uncategorized