Polymake & OpenMP
Posted: 27 Dec 2011, 23:54
I've been working on getting OpenMP to work inside polymake with mixed results. So far there are two major issues with OpenMP and polymake.
The first problem looks to me like a bug in GCC's OpenMP implementation. If polymake is compiled with CFLAGS=-fopenmp CXXFLAGS=-fopenmp LDFLAGS=-fopenmp to enable OpenMP, something as simple as
Code:
#pragma omp parallel for
for (unsigned long int i = 0; i < 1000000000; ++i) {
    d = sqrt((double)i);
}
pasted into some function (for example lp_projection in the polytope app) does indeed use multiple threads when executed. The problem is that the load on each core stays below 80% and the runtime increases to serial_runtime*number_of_threads. I tried to build a simple test case that simulates what is going on in polymake, a Perl script that calls a function inside a shared library that uses OpenMP, but I wasn't able to reproduce the problem. I suppose the procedure isn't that simple in polymake.
I have only tested this with gcc version 4.6.1 so far. I considered giving icc a shot, but building polymake with it already failed at the ./configure stage, so I gave up on that. I found that the problem does not occur if I use polymake as a callable library, so for me this isn't that much of an issue right now.
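For anyone who wants to compare serial and parallel runtime outside polymake, a standalone check could look like the sketch below. It is not the test case from the post: the names sqrt_sum and time_seconds are made up for illustration, and the loop uses a reduction instead of assigning to a shared d so that the compiler cannot discard the work. It measures wall-clock time with std::chrono, because CPU time is summed over all threads and is not a useful measure of parallel speedup.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical helper: sum of sqrt(i) for i in [0, n).
// reduction(+:sum) gives every thread a private accumulator, so no thread
// writes to a shared variable inside the loop. Without -fopenmp the pragma
// is ignored and the loop simply runs serially.
double sqrt_sum(unsigned long n, bool parallel)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) if(parallel)
    for (unsigned long i = 0; i < n; ++i) {
        sum += std::sqrt(static_cast<double>(i));
    }
    return sum;
}

// Hypothetical helper: runs sqrt_sum once, stores its value in *result,
// and returns the elapsed wall-clock time in seconds.
double time_seconds(unsigned long n, bool parallel, double* result)
{
    assert(result != nullptr);
    const auto start = std::chrono::steady_clock::now();
    *result = sqrt_sum(n, parallel);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_seconds(n, false, &r) and time_seconds(n, true, &r) with the same n, once in a plain executable and once from code loaded by polymake, would show whether the slowdown follows the compiler or the way the code is loaded.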
The other issue is the smart objects used in polymake. As far as I understand it, almost every access to those shared objects can trigger a change to the alias set, even read-only access. This creates problems if multiple threads need access to one of those objects, because it will most likely leave the alias set in an inconsistent state after a parallel section. I didn't see an obvious way to get a 'dumb' reference to the data encapsulated in a smart object when poking around in polymake's internals, so I created very crude wrappers for the Vector and Matrix classes. They just forward access to the memory location of the shared object's data. It's an ugly hack, and I wouldn't be surprised if it breaks things in other places because my understanding of the polymake internals is very limited, but it seems to work, at least for the specific case where I tested it. Obviously things will fall apart if the memory allocated by the shared object/array gets relocated.
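The general idea of such a wrapper can be sketched without any polymake types. In the example below std::vector stands in for polymake's Vector, and RawView, make_view and dot are hypothetical names; this is not the attached wrapper and it does not show how to obtain the pointer from a real polymake object. The view is built once, outside the parallel region, and the threads then read only through the raw pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning, read-only view: just a raw pointer and a length.
// It becomes invalid as soon as the owning container reallocates its storage.
template <typename T>
class RawView {
public:
    RawView(const T* data, std::size_t size) : data_(data), size_(size) {}
    const T& operator[](std::size_t i) const
    {
        assert(i < size_);
        return data_[i];
    }
    std::size_t size() const { return size_; }
private:
    const T* data_;
    std::size_t size_;
};

// Build the view once, before entering any parallel region.
// std::vector is only a stand-in for the real container here.
template <typename T>
RawView<T> make_view(const std::vector<T>& v)
{
    return RawView<T>(v.data(), v.size());
}

// Dot product whose parallel loop touches the inputs only through the views,
// so the owning objects are never accessed from more than one thread.
double dot(const RawView<double>& a, const RawView<double>& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    const long n = static_cast<long>(a.size());
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        sum += a[k] * b[k];
    }
    return sum;
}
```

The view must not outlive the container, and the container must not be resized or copied-on-write while a view exists, which is the same relocation caveat mentioned above.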
I've attached the wrapper and a version of the lp_projection function that uses OpenMP. It gives a reasonable parallel speedup when used via the callable library mechanism.