Why parallelism is hard

I read a presentation from the folks at Pervasive about their new DataRush product (http://www.pervasivedatarush.com/), along with an accompanying article (http://www.theserverside.com/tt/articles/article.tss?l=PervasiveDataRush). The presentation and article state what most of the hardware folks have been saying for a while: computers will include many more CPU cores, and software had better start taking advantage of them.

They state that “Parallel programming is hard”, but they don’t talk about why it is hard. Parallel programming becomes hard when two threads need to modify the same piece of data. The challenge then becomes finding ways to ensure that the two threads do not interfere with each other. This is traditionally done by having each thread obtain and release a set of locks. Deciding when locks are obtained and released is left up to the programmer, and if done improperly it can cause bugs that are very hard to find. This is why parallel programming is hard, and the articles from Pervasive don’t seem to address it.

It is also unclear whether DataRush will ever be able to take advantage of the SPE processors on the IBM Cell CPU. A dataflow approach isn’t incompatible with programming the Cell, but it is unclear how DataRush will be able to support this using their 100% pure Java model.

I see transactional memory as a much more promising technology. You can find a good introduction to transactional memory at http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=444. The basic idea is that programmers define a transaction block (like a scope) that does not need to concern itself with locking. The execution of the block might finish successfully or it might get ‘rolled back’. This is very similar to how database transactions work, and I would expect transactional memory to have a similar set of concerns.
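
To make the transaction-block idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how any real transactional memory system works: the names VersionedCell, atomically and run_demo are hypothetical, and the commit step uses a small internal lock where a real system would rely on hardware or runtime support. The point is that the code inside the "transaction" never touches a lock itself. It works on a snapshot, and if another thread committed in the meantime, its result is thrown away (rolled back) and the block is run again.

```python
import threading


class VersionedCell:
    """A single shared value with a version counter (hypothetical helper)."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        # Assumption: this lock stands in for what a real TM runtime would do.
        # It guards only the snapshot and commit steps, never the user's block.
        self._commit_lock = threading.Lock()

    def read(self):
        # Return a consistent (value, version) snapshot.
        with self._commit_lock:
            return self._value, self._version

    def try_commit(self, expected_version, new_value):
        # Commit succeeds only if nobody else committed since our snapshot.
        with self._commit_lock:
            if self._version != expected_version:
                return False  # conflict: the caller's work is discarded
            self._value = new_value
            self._version += 1
            return True


def atomically(cell, update):
    """Run update(old) -> new as a transaction; return the number of retries."""
    retries = 0
    while True:
        old, version = cell.read()
        new = update(old)  # the "transaction block": no locks held here
        if cell.try_commit(version, new):
            return retries
        retries += 1  # rolled back, so run the block again


def run_demo(threads=4, increments=1000):
    """Increment one shared counter from several threads; return the total."""
    cell = VersionedCell(0)

    def worker():
        for _ in range(increments):
            atomically(cell, lambda v: v + 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.read()[0]


if __name__ == "__main__":
    print(run_demo())  # prints 4000
```

The programmer writing the lambda passed to atomically never decides when to take or release a lock, which is exactly the part that tends to go wrong. The cost is that the block may run more than once, so, as with database transactions, it should not have side effects that cannot be undone.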
Another approach to the problem is being taken by a company called RapidMind (www.rapidmind.net). They have developed a toolkit that seems to operate at the data-type level and allows small operations to be done concurrently on GPUs or on Cell SPE-type processors.