Their interpreters are single-threaded. You know that massively scalable webapp you wrote with Ruby (Python/Lua/a bunch of other trendy dynamic languages)? Well, it won't scale, thanks to Ruby (Python/Lua/a bunch of other trendy dynamic languages).
Shame of it is, Java has proper POSIX thread mapping.
But also, the latest Guile has proper thread handling inside the interpreter, even with continuations!
Discussion (39)
Since when is that a secret?
They don't advertise it, Ryan.
This doesn't mean that you can't write "massively scalable webapps" in these languages, just that real parallelism needs to happen at the process level instead of at the thread level.
Memory is cheap!
Also, I'd add that it's a little silly to claim a single-threaded web app won't scale simply because it's single-threaded. There are tons of things to worry about before you even consider threading, and even then there are plenty of relatively easy ways to deal with the scalability challenges a single-threaded interpreter presents.
Yeah. And processes have advantages over threads.
But threads are more scalable than processes. When you're really tight for resources threads are better.
Of course. It's silly. I retract it... no, threads are no use... much better to use processes all the time.
How foolish of me.
Hey now. No need to get all pissy.
I'm not saying threads are useless. I'm just saying that, all things considered, single-threadedness does not in itself mean that a language isn't scalable.
Brian raises a very good point: if you're really tight for resources, it's probably cheaper to add more resources. Programmers are hideously expensive. Hardware is cheap.
No need to get pissy? Why, because you said it was silly, so that must be all right?
And threads are easier to program. You can just use shared data protected with mutexes. A Java Hashtable, say. Java will scale really well because its threads are sensible.
If you don't do that you've got all the issues of serializing that data to some backend. Nasty business. Especially when there is a large bunch of people who don't really understand the issue.
One of the consistent themes on the mod_python list (where I happen to hang out a bit) is "how do I share this data that I've got - how can I get to shared memory?" which shows you that programmers want to do that kind of thing.
Programmer scalability goes both ways.
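For readers on the Python side of this argument, the "shared data protected with mutexes" idea might look roughly like the sketch below. It is not from the thread: `SharedTable`, `hammer`, and the `"hits"` key are hypothetical names, and a plain `threading.Lock` stands in for the synchronization that Java's `Hashtable` does internally.

```python
import threading


class SharedTable:
    """A dict guarded by one mutex (hypothetical helper, for illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def increment(self, key):
        # The read-modify-write happens under the lock, so no update is lost.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1


def hammer(table, key, times):
    for _ in range(times):
        table.increment(key)


table = SharedTable()
threads = [threading.Thread(target=hammer, args=(table, "hits", 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(table.get("hits"))  # 4 threads x 1000 increments each
```

The lock makes the shared table correct, which is the ease-of-programming half of the argument. Whether the threads also run in parallel is a separate question that depends on the interpreter.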
Where are you from, nic? I've gotten the feeling from past claims and comments that you might be in Europe somewhere. If so, that might explain some things. "Silly" isn't considered an insult where I come from, and it wasn't intended as such.
That said, if you think multi-threaded programming is easy, you're either a genius or a fool. No offense intended.
You think this:
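class MyServlet extends HttpServlet {
    // Hashtable synchronizes get() and put() internally, so request threads can share it
    Hashtable things = new Hashtable();
    public void doGet(...) {
        PrintWriter out = response.getWriter();
        out.println("<html><body>" + things.get(request.getParameter("someparam")) + "</body></html>");
    }
}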
(plus the stuff to update it which is no worse than getting it)
is hard?
Fool it is.
It's not secret, nor dirty. As cheap as new hardware is these days, it's much easier to scale web applications by throwing a few new servers up when things get crazy.
The advantages in interpreted languages usually outweigh the difficulties (not impossibilities) in scaling. I'd much rather use a pretty language that doesn't wear me out and makes for very agile code and need a few extra servers than want to rip my eyeballs out programming more Java. Just because the industry likes it most does not make it better.
Of course, I can make the argument:
1. Given, Myspace
2. Given, Backpack
QED.
In Python, at least, this is no secret. It's acknowledged all the time in comp.lang.python that the GIL (Global Interpreter Lock) can cause these kinds of issues.
Whether it's a problem or not depends on how poorly processes are implemented in your OS, of course.
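A rough way to see the GIL effect mentioned above is to time the same CPU-bound work run back to back and then in threads. This is only a sketch: `count_down`, `run_sequential`, and `run_threaded` are made-up names, the workload size is arbitrary, and the numbers depend on the machine and the interpreter build. On a standard CPython build the two timings tend to come out similar, since only one thread executes bytecode at a time.

```python
import threading
import time


def count_down(n):
    # Pure-Python CPU work with no I/O, so the GIL is held almost continuously.
    while n > 0:
        n -= 1
    return n


def run_sequential(n, jobs):
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, jobs):
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential = run_sequential(200_000, 4)
threaded = run_threaded(200_000, 4)
print(f"sequential: {sequential:.3f}s  threaded: {threaded:.3f}s")
```

Threads that spend their time waiting on I/O behave differently, because CPython releases the lock around blocking calls.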
The issue here is that threads are a good scaling tool but they're pretty useless because of the implementations of those languages' interpreters. There is nothing in the languages that makes them inherently unthreadsafe. Guile has shown that you can do a thread-safe interpreter in a purely dynamic language quite easily, despite all the other Schemes being single-threaded.
So, fix the implementation of the interpreters and it gives us all another scaling option.
Ruby 2.0 will have support for native threads, and it's possible to compile Ruby 1.8 and 1.9 to use native threads as well, but this doesn't change the fact that you seem to have a very naïve attitude about the difficulties of concurrent programming.
"[your massively scalable webapp] won't scale thanks to ruby [...]"
Granted it may remain a secret during your first week of Ruby discovery that its threads are green.
Granted, when you eventually land on page 127 (out of 800) of the pickaxe and you read: "Finally, if your machine has more than one processor, Ruby threads won’t take advantage of that fact—because they run in one process, and in a single native thread, they are constrained to run on one processor at a time.", you can be a little disappointed as a beginner.
However, an experienced programmer who just wrote a "massively scalable webapp" will not be much impressed by the fact that his carefully crafted threaded scheme will not be run on native OS threads ( let alone the fact that no programmer is able to achieve a "massively scalable webapp" after his first week of programming, even with Ruby ;-) ).
I have always found that deciding on the right data modeling and object design when dealing with memory-managed environments was the trickiest aspect of a scalable system. The memory managers/GC have always taught me humility the hard way as far as scalability is concerned.
The green threading models have almost never been a problem by themselves. The only case I have really experienced is with synchronous IO with databases on heavy requests, and after all it was more a design weakness, which was solved by using a Linda system.
I do not really understand why you see native threading as a significant scalability lever.
Multithreading is a design and modeling aid, to achieve flexibility and/or responsiveness at the human perception level. This is true whatever the implementation, hardware or green.
Multithreading is NOT a performance tool. It is often seen as such by beginners who like the idea that their programs spread magically on multiple processor cores. But it is really poor man's scaling-in, pertinent only for simple CPU intensive tight loops. (unless you run your apps on 144-way Sun Fire E25ks ! :D)
A "massively scalable webapp" will have been designed to scale-out nicely on multiple servers (read hundreds). Which means it will equally-well scale inside a single machine on multiple cores before having to resort to multiple boxes. In my view, the correct way to massively scale today is to scale out. For that, you use clustering, distributed caches, clever DB models, disk arrays, and load balancers. All these on reasonably-priced hardware (not too cheap but no Sun/HP/IBM/SGI).
Sorry for the slight digression, in a nutshell:
Trivial (limited) scaling-in using threads for trivial transactions/processing, YES !
Attempting to massively scale non-trivial apps by the means of multithreading: NONSENSE !
Multithreading is always hard to get right on large apps, and for all the pain you hardly gain O(log n) for an n-way system.
You will sooner or later need process-parallelizing anyway for serious load multiplication, so design your whole architecture for it upfront !
I can't speak for the others, but Lua is thread-safe. I'll grant you that the stock portable interpreter doesn't support threads directly, because it uses only ANSI C library functions, but it's trivial to make it support multithreading.
@zed I'm losing track of this - but I'm not talking about green threads per se. I'm talking about the interpreter loop. Ruby does not have a thread-safe interpreter. It can only evaluate code in one thread at a time.
I see threads as a performance tool precisely because they're easy to reason about and use.
Hey, firstly... you didn't forget Lua!!! Everyone else does :D
My next point (defending Lua, because that's what I do)... No, Lua does not have built-in threading. Lua is written in clean C, and the standard C has no concept of threads.
However it is trivial to: 1) Use a thread library 2) (In the case of embedded scripting) add thread management to the main application.
It happens that I am planning on using Lua for a web app already. No matter what, it's a whole lot lighter than PHP.
Um, you can use OS-level threads just fine from Python. There is just a little thing called the Global Interpreter Lock that makes them not useful for scaling across CPUs. Just fire up N processes, where N is the number of CPUs in your system, and you are set.
Threads are overrated anyway:
http://cleverdevil.org/computing/30/needled-by-threads
Oh, and try to scale across *machines* with threads. Go ahead, I dare you! If you use processes for scaling, rather than threads, scaling across machines can be much more natural.
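The "fire up N processes" advice above can be sketched with nothing but the standard library. Everything named here is an assumption made for illustration: `WORKER` is a toy job that sums squares, `run_in_processes` is a hypothetical helper, and a real web app would start server processes behind a load balancer instead of one-shot workers.

```python
import os
import subprocess
import sys

# Hypothetical worker: each interpreter sums the squares below the number it is given.
WORKER = "import sys; print(sum(i * i for i in range(int(sys.argv[1]))))"


def run_in_processes(inputs):
    # One separate interpreter per input, each with its own interpreter lock.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(n)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for n in inputs
    ]
    # All workers are already running; now collect their answers in order.
    return [int(p.communicate()[0]) for p in procs]


n_workers = os.cpu_count() or 1
results = run_in_processes([1000] * n_workers)
print(n_workers, "processes, each returned", results[0])
```

Because the workers share nothing and talk only through pipes, the same shape carries over to workers on other machines, which is the point made about scaling across machines.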
Have you seen Azul? It awesomely scales threads.
And I'm getting grumpy about this now because I think people are willfully misunderstanding my point.
Threads are a useful scaling and abstraction tool, saying simply "they are not" is not good enough. There are too many multithreading applications out there (every Java servlet container in the world) to say "threads are useless do it another way".
I agree there are challenges with scaling threaded apps beyond 1 machine but I think the same answers that work for process based apps work for threaded apps and with threads you make each machine give you more.... And in a situation where you have just one machine a threaded app works really well and you can simply scale the app by scaling the hardware. Nice. Managers love it. Simple techies love it (because it IS simple).
So come on, stop being disingenuous. Python, Ruby and Lua WOULD all be better if they could evaluate more than one thread of execution at a time: Guile can do it and Java can do it so it isn't that difficult.
I don't think anyone's saying threading is useless. All we're saying is that it's not the lowest-hanging fruit when it comes to optimization, and it's a lot harder than you think to write reliable multi-threaded code.
Yes, native threading would make many scripting languages more powerful, but threading is not a silver bullet and it comes with a lot of caveats.
Actually, eliminating the GIL in Python would make it *much* more difficult to write extension modules in C. Right now, it's trivially easy, which is why you see such a great abundance of Python bindings out there. You eliminate the GIL, and writing Python extension modules could be as complicated as using Java's JNI, which is a total headache. Personally, that's not a tradeoff I am willing to make. Python is like glue for all the great native libraries out there, and I like that.
I would much prefer the Python and Ruby communities to spend time stealing ideas about concurrency from Erlang (message passing, shared nothing, lightweight processes), rather than Java (threads, shared-memory).
Your claim is false: this isn't a dirty secret, it's a conscious design decision that is well-known by anyone who has taken more than a casual glance at either language. Deal with it.
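As a rough picture of the Erlang-style alternative mentioned above (message passing, shared nothing), here is a minimal Python sketch. It is an assumption-laden stand-in: a thread with two `queue.Queue` mailboxes plays the part of a lightweight process, `worker` is a hypothetical name, and nothing here provides Erlang's isolation or parallelism.

```python
import queue
import threading


def worker(inbox, outbox):
    # The worker owns its state; nothing is shared except the two mailboxes.
    total = 0
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more work
            break
        total += message
    outbox.put(total)


inbox, outbox = queue.Queue(), queue.Queue()
thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()
for n in range(1, 6):
    inbox.put(n)
inbox.put(None)
thread.join()
result = outbox.get()
print(result)  # 1 + 2 + 3 + 4 + 5
```

No lock appears in the worker because the running total is never visible to any other thread; the queues do the synchronization.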
This is just not true. Have you even looked at Guile? Guile does proper POSIX threading with no lock, and writing extensions for it is not hard.
And I'm not talking about native threading!!!!!
Agh! I'm suggesting that those languages are poor-er for not being able to interpret more than one thread at a time.
You can do threaded apps in Python, Ruby and Lua. But you can't run more than one thread at once because they have an interpreter lock (as Jonathan says). It is not difficult, though, to get rid of the lock without having the madness that is JNI.
GCJ does it and has CNI (much nicer) and Guile does it and Guile's extensions are not hard to write.
If it's so easy, why not spend some time writing some patches for the Python and Ruby interpreters instead of bitching about it on the internet?
Bottom line: sure, it's possible to do what you are asking for, but it would introduce complexity into the languages (and VMs), and would take a lot of time. Honestly, the issue isn't nearly big enough of a deal to justify the associated complexity and work costs, at least not to me.
Again, I would prefer that time be spent doing things like improving the quality of the standard libraries, implementing Erlang-like concurrency, and improving raw performance.
If you're so worried about the standard libraries why not spend time doing something to improve them instead of arguing with me?
It's a bullshit argument.
Hitler would have agreed with this claim.
(this discussion has officially been Godwinned and must now end as per the Law of the Internets)
Thank you, this was very fun to read... :)
Shared state, preemptive multithreading is positively *evil*. Use asynchronous events instead.
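A minimal sketch of the "asynchronous events" style, using Python's `asyncio` as one possible event loop. The names `handle_request` and `main` and the delays are made up for illustration, and `asyncio.sleep` stands in for waiting on a socket or a database.

```python
import asyncio


async def handle_request(name, delay):
    # await hands control back to the event loop instead of blocking a thread.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Three "requests" are in flight at once on a single thread.
    return await asyncio.gather(
        handle_request("a", 0.03),
        handle_request("b", 0.01),
        handle_request("c", 0.02),
    )


responses = asyncio.run(main())
print(responses)
```

Control only changes hands at an `await`, so the code between two awaits cannot be preempted by another handler. That is the property that makes this style easier to reason about than preemptive threads, though it does nothing for CPU-bound work.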
both native threads and events have their places. native threads just become more important because of the prevalence of multi-core CPUs.
I was annoyed to discover that buried deep in the OCaml documentation was the information that OCaml code cannot execute in parallel, despite being a compiled language (and generally a pretty fast one, too). Hopefully they fix that eventually.
Well, you should agree with me then. Another language with a poor implementation that they don't really admit to.
All I have to say to this is... so?
If you're working with un-RESTful applications then it can matter a lot how many concurrent requests you can service on a single box. The databases are all parallel, the webservers are all either highly parallel or driven by asynchrony, and yet only one thread at a time (without separate interpreters) can execute in the interpreter of the most popular dynamic languages.
So if you were to end up doing something interpreter expensive (say, doing a lot of language featured cache lookups or something) then you'd be seriously limiting your performance when you didn't need to.
Of course there are ways to mitigate, and to some people it just doesn't matter.
But I still believe it's a dirty little secret rather than a design decision. Well, maybe it's a design decision in as much as the designers go "ooooh that looks a bit hard I won't bother doing that".
Wow, where did interesting conversations like this go?
Heh. This problem bit a colleague in the arse yet again only last month.
Rendered some HTML through an XSLT from a django based, fcgi app. The XSLT had a document() call back to the same server.
Oh dear. It doesn't work. He spends a day looking at it.
He asks me. I change the fcgi server to prefork and it magically works.
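The post doesn't say exactly why the original server setup hung. One plausible reading is that the nested `document()` request could not be served while the outer request was still blocking. The toy model below only illustrates that shape and is not the actual django/fcgi setup: a `ThreadPoolExecutor` stands in for the server's workers, and `render_page` and `serve` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def render_page(pool):
    # The handler makes a nested request to the same "server",
    # the way document() in the XSLT called back to the same host.
    inner = pool.submit(lambda: "<included/>")
    return "<html>" + inner.result(timeout=0.5) + "</html>"


def serve(workers):
    pool = ThreadPoolExecutor(max_workers=workers)
    try:
        return pool.submit(render_page, pool).result(timeout=2)
    except TimeoutError:
        # The nested request never got a free worker in time.
        return None
    finally:
        pool.shutdown(wait=True)


single = serve(workers=1)  # the only worker is busy waiting on itself
pair = serve(workers=2)    # a second worker picks up the nested request
print(single, pair)
```

In this model the fix is simply having another worker free to take the nested request, which is what prefork provides with separate processes.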