>That is, if one processor writes to memory in a cache-line shared by another processor, they must stop whatever they are doing to synchronize the dirty cache lines with RAM. Thus, updating reference counts would flood the memory bus with traffic and be much worse than the GIL.
I don't understand. Isn't this going to happen if you have multiple threads running even if the GIL is blocking them from running? I'm not a hardware expert, but I'm not sure how constant locking would prevent cache synchronization just because they weren't truly running in parallel.
I am fairly certain that constant synchronization (locking) because of the GIL would negatively impact cache performance, especially since well-designed multithreaded applications avoid locking for as long as possible.
I believe his argument is that it would reduce thrashing between the caches. With the GIL, ownership of a cache line containing the reference count for any given object will only have to be transferred at most once per timeslice. If multiple threads were concurrently accessing a Python object, the line would be ping-ponging back and forth between caches much more frequently.
EDIT: Also, "stop whatever they are doing to synchronize the dirty cache lines with RAM" is not a very good way to describe what is going on. Often you don't have to hit RAM at all; the caches just synchronize between each other. It is still pretty bad for performance though.
>I believe his argument is that it would reduce thrashing between the caches. With the GIL, ownership of a cache line containing the reference count for any given object will only have to be transferred at most once per timeslice.
Ah. Makes sense.
>just synchronize between each other
Yes, but that's bad because that cache line is 'stuck' for all processors while the synchronization is occurring, if I'm not mistaken...
In general at least the two processors with the conflict will either have to block for a bit or switch to another hardware thread when write conflicts are occurring. There are lots of architecture tricks people pull to try to mitigate the impact but the reality of the matter is frequently mutating shared state (e.g. reference counts) makes it extremely difficult to have good performance with threads running in parallel.
> I don't understand. Isn't this going to happen if you have multiple threads running even if the GIL is blocking them from running?
I don't think it would. If there is a GIL (Global Interpreter Lock), only one thread of the process can be scheduled to run at any time. As the poster (Sturla) says, Python threads are native OS threads, so they should be scheduled by the OS kernel (right?). A good scheduler would use affinity scheduling and schedule all threads of the Python program on the same processor/core every time to get benefits from cached data and code. I believe modern kernels (Linux, MacOS, Solaris, probably Windows as well) use this kind of affinity scheduling, so if we're lucky the Python process gets scheduled on the same processor every time and there will be no need for any cache synchronization.
> I'm not a hardware expert, but I'm not sure how constant locking would prevent cache synchronization just because they weren't truly running in parallel.
I'm not sure if you misunderstood the mail. The constant locking would only be used if they were running in parallel.
Anyway, if you have a GIL you don't need the kind of locking described in the mail. You only need to do explicit locking on shared data structures when you read or update the contents of those data structures. If you have reference counting, threads that run in parallel, and no GIL, you would have to lock even if you are just assigning a reference to such a data structure to a new variable. If you have a GIL, you are certain that only one thread at a time is updating the reference count. That is indeed what the GIL is: one coarse lock for all data (and the interpreter) instead of fine-grained locks for every data structure.
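To make the point about assignment concrete, here is a minimal sketch, assuming CPython (other implementations may not use reference counting at all). It uses `sys.getrefcount` to show that merely binding a second name to an object changes the count stored with that object, which is the write that would need protecting without a GIL. The function name `refcount_delta` is made up for illustration.

```python
import sys


def refcount_delta():
    """Return how an object's refcount changes when it gains and loses a name."""
    data = ["shared", "payload"]
    before = sys.getrefcount(data)

    alias = data                      # no copy: one more reference to the same list
    after = sys.getrefcount(data)

    del alias                         # dropping the name writes to the count again
    restored = sys.getrefcount(data)

    return after - before, restored - before


print(refcount_delta())
```

Neither statement touches the list's contents, yet both modify memory belonging to the object.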
(I don't know Python very well, I just answer from general knowledge of computer architecture and language implementation. But I've read about the Python GIL several times, since it's the most discussed GIL of any language.)
I don't think there's anything to prevent more than one thread from being scheduled at any time. They just block when trying to run concurrently because of the GIL.
>I'm not sure if you misunderstood the mail. The constant locking would only be used if they were running in parallel.
No, I'm saying that the GIL is constant locking. You still have two threads being run concurrently on (possibly) two separate cores accessing the same cache lines. They just cannot actually run in parallel. I have no idea how the GIL time slices between the two threads, so what I'm saying is completely possible.
However, below my original post meastham correctly pointed out the GIL does prevent cache thrashing where updates to shared memory might go back and forth multiple times unnecessarily. So it's not as bad as I was imagining.
Yes, you are right that the OS can schedule two Python threads to run at the same time, it's just that one of them will only run a few instructions and then block, just not any Python instructions. I hadn't really thought that through thoroughly, thanks for pointing it out.
Ah, I see now what you meant with constant locking. I interpreted your words as "constant" as in happening all the time as would be the case with fine grained locks instead of one long-lasting, global lock.
Rubinius (in the current 2.0 betas) has removed the GIL as well. In fact, the only Ruby that still has a GIL is MRI. MacRuby, JRuby and Rubinius (as of 2.0) all have threading without a GIL.
"I want to point out one more time that the language doesn't require the GIL -- it's only the CPython virtual machine that has historically been unable to shed it."
I don't want to get into a software hipster flamewar about the definition of 'mainstream', but CPython is the Standard Python. When I click on "Windows Installer" on python.org, I get CPython.
Python is widely used as a scripting language. If you are on Windows, they will probably ship with their own Python. Even if not, they will probably have to use something to glue the main program to the interpreter, which will probably not be compatible with any Python implementation.
Well, if Python runs really fast for the clued-in users who use PyPy and pretty slow for the non-clueful, then I'm providing my users with a bad experience. I need to be able to reliably deliver fast code without futzing around with custom installs of Python just to deliver an app.
Actually, it runs well enough in CPython unless you are doing something terrible with your code. What it doesn't do is execute more than one thread within the same process. If you don't need to share in-memory data between your threads (which is something you really should avoid, for your own sake), you can use the same mechanism you use to create a thread to create a new process.
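As a rough sketch of "the same mechanism": in the standard library, `threading.Thread` and `multiprocessing.Process` accept the same `target=` and `args=` arguments, so the worker class can be swapped. The function names and the prime-counting workload below are made up for illustration. The one real difference shows up in how results come back: threads can write into an ordinary list, while processes need something explicitly shared, here a `multiprocessing.Array`.

```python
import multiprocessing
import threading


def count_primes(limit, results, slot):
    """CPU-bound work: count primes below `limit` and store the result."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    results[slot] = count


def run_workers(worker_cls, results, limits):
    """Start one worker per limit; worker_cls is Thread or Process."""
    workers = [
        worker_cls(target=count_primes, args=(limit, results, slot))
        for slot, limit in enumerate(limits)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return list(results[:])


if __name__ == "__main__":
    limits = [20_000, 20_000]

    # Threads share memory, so a plain list works, but the GIL serializes them.
    print(run_workers(threading.Thread, [0] * len(limits), limits))

    # Processes do not share memory, so results go into a shared int array.
    shared = multiprocessing.Array("i", len(limits))
    print(run_workers(multiprocessing.Process, shared, limits))
```

The `if __name__ == "__main__":` guard matters for the process version, since some platforms start child processes by re-importing the main module.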
Can anyone shed light on where the GIL winds up hurting you?
According to this paper [1]: "Thus, in all cases, the single global lock semantics seem fundamentally compatible with both lock-based and transactional memory implementations."
If you leave aside C-extensions and libraries, CPython is a pretty bad language for number crunching. This is just not the use-case it tries to solve. I am glad they favor simplicity over execution speed.
Removing the GIL might be useful for a faster implementation with JIT compiler though.
This is not exactly true. We can alleviate the GIL issue by releasing it inside C extensions, true, but it is still there nonetheless. Incidentally, one of the big advantage of the GIL is to make C extensions easier to write.
As long as you are not creating or destroying the Python objects you expect to give back to the Python side of your program, you don't have to care much about it.
> CPython is a pretty bad language for number crunching
Perhaps, but thanks to numpy, scipy and a host of other amazing libraries it still ties with matlab as the go to language for number crunching among everyone I know who crunches numbers for a living.
Reference counts could be stored separately from objects and migrated to the thread that modified them. Or several reference counts for the same object could be used. Has this been tried?
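For what the "several reference counts" idea might look like, here is a toy model in plain Python. It is not how CPython or any real interpreter works, and the class name `ShardedRefCount` is invented. Each thread updates only its own counter, so no two threads write to the same slot; the true count is the sum over all slots.

```python
import threading


class ShardedRefCount:
    """Toy model: one counter per thread instead of one shared counter."""

    def __init__(self):
        self._shards = {}                 # thread id -> that thread's local count
        self._lock = threading.Lock()     # guards adding shards and summing them

    def _add(self, delta):
        tid = threading.get_ident()
        if tid not in self._shards:
            with self._lock:
                self._shards.setdefault(tid, 0)
        self._shards[tid] += delta        # only the owning thread writes this slot

    def incref(self):
        self._add(1)

    def decref(self):
        self._add(-1)

    def total(self):
        """The real count: needs every shard, so this is the expensive part."""
        with self._lock:
            return sum(self._shards.values())


def worker(rc, increfs, decrefs):
    for _ in range(increfs):
        rc.incref()
    for _ in range(decrefs):
        rc.decref()


rc = ShardedRefCount()
threads = [threading.Thread(target=worker, args=(rc, 1000, 400)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(rc.total())
```

The catch the sketch exposes is in `total()`: deciding that an object is dead means consulting every thread's counter, which moves the synchronization cost to deallocation rather than removing it.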
If the GIL bites you, it's most likely a warning that your program is badly written, independent of the GIL issue.
Ah, that old line again.
Translation: "We really don't like to even think about changing this crappy design that we started with in the first place, because we can just explain ourselves out of it by coming up with suitable language goals that don't actually require concurrent access to the interpreter. Not accessing the interpreter concurrently is one of our language goals because you can do everything else. So, if you think you still need to get rid of GIL then you're just a bad programmer and your programs are badly written because hey, we just defined the universe you're playing in."
The question is not and never has been "Does the GIL have undesirable characteristics?" It has always been "can someone produce an implementation that is missing the GIL and actually better, while meeting all the needs CPython has?" So far, the answer is no, despite rather a lot of smart people trying.
(Also note that many people have succeeded by dropping the second clause. Many non-CPython Python implementations don't have a GIL. But they aren't CPython, which in particular means that extensions written for CPython don't work in them, which is really the key thing that distinguishes CPython from just generic "Python".)
I don't think that he is arguing that the GIL isn't a limitation, just that the fundamental limitation it imposes can't be removed without also changing the threading model or the garbage collector. It's really impossible to run threads in parallel with any sort of performance when they're all constantly generating a huge amount of cache coherency traffic by updating reference counts.
He's not saying that. He's saying that the GIL isn't a limitation for certain kinds of application, the kinds that Python is usually used for. For the kinds of applications where the GIL would be a limitation, Python has another limitation as well: slow performance, and performance is usually the reason to run things in parallel.
With PyPy the performance will get better, and they also have a GC, so that hindrance is removed. I don't really know if PyPy has a GIL; I would guess that they don't.
"With PyPy the performance will get better, and they also have a GC, so that hindrance is removed. I don't really know if PyPy has a GIL; I would guess that they don't."
Ok, the PyPy FAQ says: "Yes, PyPy has a GIL. Removing the GIL is very hard. The first problem is that our garbage collectors are not re-entrant."
Is it really necessary for the GC to be re-entrant to run the interpreter in parallel? Couldn't you have the interpreter running in parallel and then, when there is a need to run the GC, take a global GC lock that prevents all threads from running, i.e. a stop-the-world GC? The application runs for a longer time than the GC, right? So it would be a win and a step in the right direction? I believe the early Java mark-and-sweep GC was like that, and then later Sun developed several different kinds of concurrent and parallel GCs.
> Official PyPy Status Blog
Oh I read that every time they write something. :) But I started reading it in late 2010 and I haven't gone back to the archives, I guess it's time to do that. Thanks for the links.
The GIL is a problem; that has been acknowledged every time it has been brought up. There have been past attempts to remove the GIL, but they slowed down the interpreter and the patch wasn't merged.
Removing the GIL is massive work and will make the interpreter complex. Meanwhile, you have gevent, multiprocessing, C extensions... to work around the limitations.
The most frustrating thing about Python is its community's complete denial about what a joke their concurrency situation is. Truth is python is not truly multi-threaded, and no, claiming that multi-process is the way to do parallel computation across the board is not a sane argument at all. It's religious zeal. My company is currently using it for web apps, and that's proving a pain (i.e. having to use proxy servers for database access to minimize connections across all the python process instances). Using python for anything more serious, like a message queuing system for example is even more prohibitive. People in charge should wake up and start taking serious steps about it. I guess PyPy is the biggest hope. Meanwhile in the JVM world..
Does running Java threads on a single-processor machine make it not truly multi-threaded, then? Python has multi-threading. Due to the GIL, only one of the threads executes at a time, which isn't very different from running multiple threads on a single-processor machine. It facilitates concurrency, not parallelism.
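A small sketch of the concurrency half of that distinction, using `time.sleep` as a stand-in for a blocking I/O call (the function names are made up). A sleeping thread releases the GIL, so several waits overlap even though only one thread runs Python code at any instant.

```python
import threading
import time


def wait_for_io(delay):
    """Stand-in for a blocking call; the GIL is released while sleeping."""
    time.sleep(delay)


def run_concurrently(n_threads, delay):
    """Run n_threads blocking waits at once and return the elapsed wall time."""
    threads = [
        threading.Thread(target=wait_for_io, args=(delay,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_concurrently(4, 0.2)
print(f"4 waits of 0.2s took {elapsed:.2f}s")
```

The elapsed time should be close to one wait rather than four. Replacing the sleep with a pure-Python CPU loop would not be expected to show the same overlap under the GIL.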
> claiming that multi-process is the way to do parallel computation across the board is not a sane argument at all.
If you have n processing units, anything greater than n isn't parallel. Spawning 100 threads in a JVM doesn't give you 100 parallel workers (assuming the JVM mapped those 100 JVM threads to 100 system threads).
Multi processes make perfect sense for parallel jobs. They do fine with nothing shared and message passing. They are problematic when the jobs need resource sharing.
> having to use proxy servers for database access to minimize connections across all the python process instances
That doesn't sound like pain. Some systems intentionally have kept db connection pooling outside the application server. Application server talks to the manager and manager delegates to the database.
> Using python for anything more serious, like a message queuing system for example is even more prohibitive.
Celery works great. Thank you.
For custom needs, there is gearman, then there is zeromq...and they are not written in Python, and I don't care, and that works for me.
> Meanwhile in the JVM world..
Then why not stick to the JVM world rather than crib and whine in Python world.
The most frustrating thing about the ferrari community is that they are in complete denial about what a joke their affordable practical sedan story is.
Have you ever heard Guido speak about the issue? He and a number of others don't think it's one worth solving. Really.
Yea it may be a lot of work to create a new GC implementation and change the threading model, but if you want the language to progress that's the way forward.