I have an 8-core machine, so vanilla Python increasingly annoys me by running roughly 200*8 times slower than optimized C code (I use OpenMP), rather than merely 200 times slower than C.
To use more cores, I occasionally use these parallel map functions in Python:
threadmap
spawns a number of threads equal to the number of cores, and distributes work across threads.
forkmap
requires Cygwin, Mac, or Unix, and does the same thing with fork() and pipe(), basically to get around GIL restrictions in regular Python (CPython). A better solution is probably to use a Python implementation without a GIL, such as IronPython, but FFI support for CPython libraries seems spotty in these other implementations.
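To make the two approaches concrete, here is a minimal sketch of how such functions might be written. It is not the code of the threadmap and forkmap modules themselves: the signatures, the nthreads/nprocs parameters, and the round-robin split of the work are all assumptions made for illustration. The thread version still runs under the GIL, so it mostly helps when func releases the GIL (I/O, or C extensions that release it); the fork version runs each slice of the work in a separate process and sends pickled results back through a pipe.

```python
import os
import pickle
import threading


def threadmap(func, items, nthreads=None):
    """Sketch: map func over items using one thread per core (order preserved)."""
    items = list(items)
    nthreads = nthreads or os.cpu_count() or 1  # assumed default: one per core
    results = [None] * len(items)

    def worker(start):
        # Thread number `start` handles items start, start + nthreads, ...
        for i in range(start, len(items), nthreads):
            results[i] = func(items[i])

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def forkmap(func, items, nprocs=None):
    """Sketch: same idea with fork() and pipe(); needs a Unix-like OS."""
    items = list(items)
    nprocs = nprocs or os.cpu_count() or 1  # assumed default: one per core
    children = []
    for k in range(nprocs):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: compute its slice and pickle it into the pipe.
            os.close(read_fd)
            status = 0
            try:
                chunk = [(i, func(items[i])) for i in range(k, len(items), nprocs)]
                with os.fdopen(write_fd, "wb") as out:
                    pickle.dump(chunk, out)
            except BaseException:
                status = 1
            finally:
                os._exit(status)  # never fall back into the parent's code path
        # Parent process: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = [None] * len(items)
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as inp:
            chunk = pickle.load(inp)  # raises EOFError if the child failed
        os.waitpid(pid, 0)
        for i, value in chunk:
            results[i] = value
    return results
```

Both functions can then be called like the builtin map(), e.g. forkmap(f, range(100)). Because the child processes are created with fork(), func itself does not need to be picklable, only its return values do.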
Merely using parallel map() functions is rather limiting; OpenMP-style syntax should probably be added to Python at some point, after the GIL is removed. (As the number of processor cores goes to infinity, either the GIL will be removed or Python will become irrelevant, as languages with similar syntax and fine-grained locking take over.)
There are libraries which implement parallelism by using spawn() instead of fork() to get Windows compatibility without Cygwin, but I couldn't figure out exactly how these libraries spawn new copies of my code. Given the lack of documentation on this point, I used other solutions.