Sign up for our newsletter and get the latest HPC news and analysis.

Transactional memory pros and cons from ACM Queue

Multicoreinfo.com has a pair of pointers to ACM Queue articles pro and con transactional memory in parallel programming for multicore systems. From the (mostly) pro article it is clear that the emphasis is on improved programmer productivity, which is important because it means that (correct) parallel versions of software are available faster

More and more people have come to the conclusion that locking is the wrong approach to solving the consistency issue. This is especially true for programmers who are not intimately familiar with all the problems of parallel programming (which means almost everybody).

…The problem of consistency is nothing new in the world of computers. In fact, it has been central to the entire solution in one particular area: databases….The solution in the database world is transactions.

The concept of the transaction is something that falls out of most programming tasks quite naturally. If all changes that are made as part of a transaction are made available atomically all at once, the order in which the changes are added to the transaction does not matter. The lack of a requirement to perform the operations in a particular order helps tremendously. All that is needed is to remember to modify the data sets always as part of a transaction and not in a quick-and-dirty, direct way.

…Fortunately, this is not needed. The concept of TM (transactional memory) has been defined without this restriction. Maurice Herlihy and J. Eliot B. Moss in their 1993 paper1 describe a hardware implementation that can be implemented on top of existing cache coherency protocols reasonably easily.

The con article mostly focuses on the performance impacts of software transactional memory, an intermediate step between where we are today and hardware supported TM. The title gives its focus away: “Software Transactional Memory: why is it only a research toy?”

TM (transactional memory)1 is a concurrency control paradigm that provides atomic and isolated execution for regions of code. TM is considered by many researchers to be one of the most promising solutions to address the problem of programming multicore processors. Its most appealing feature is that most programmers only need to reason locally about shared data accesses, mark the code region to be executed transactionally, and let the underlying system ensure the correct concurrent execution. This model promises to provide the scalability of fine-grain locking, while avoiding common pitfalls of lock composition such as deadlock. In this article we explore the performance of a highly optimized STM and observe that the overall performance of TM is significantly worse at low levels of parallelism, which is likely to limit the adoption of this programming paradigm.

But the closing section of the full paper indicates that the authors are…skeptical…that even a hardware implementation will help

Based on our results, we believe that the road ahead for STM is quite challenging. Lowering the overhead of STM to a point where it is generally appealing is a difficult task, and significantly better results have to be demonstrated. If we could stress a single direction for further research, it is the elimination of dynamically unnecessary read and write barriers—possibly the single most powerful lever toward further reduction of STM overheads. Given the difficulty of similar problems explored by the research community such as alias analysis, escape analysis, and so on, this may be an uphill battle. Because the argument for TM hinges upon its simplicity and productivity benefits, we are deeply skeptical of any proposed solutions to performance problems that require extra work by the programmer.

We observed that the TM programming model itself, whether implemented in hardware or software, introduces complexities that limit the expected productivity gains, thus reducing the current incentive for migration to transactional programming and the justification at present for anything more than a small amount of hardware support.

I read both of them, though admittedly at more of a skim. I’ll be reading both of these papers carefully on the plane tomorrow.

Comments

  1. This model promises to provide the scalability of fine-grain locking, while avoiding common pitfalls of lock ……… I bet!

  2. No James.. this is true .. I can prove you that .. Emailed you!

Resource Links: