Parallel Lounge: Parallel Computing Blog for Engineers, Scientists, Analysts


What is the Cost of Performance?

Posted by David Gibson



As data sets grow and algorithms become more complex, engineers, scientists and analysts need more performance. The first step is often to rewrite the algorithm, originally coded in MATLAB® or another very high level language (VHLL), in a lower-level language such as C, C++, or Fortran. A typical project may take several months and yield a 5-10X performance gain on a typical workstation (Option 1 below).

Option 1: Port to C++

  • ~6 person-months (~$120,000)
  • 5-10X gain in performance on a single processor
  • Calendar time to solution: 6 months
  • Does not scale beyond a single processor without further work

Note that the serial programming effort in Option 1 does not scale “for free” beyond a single processor. So while the 5-10X gain is a perfectly acceptable target for many projects, those needing more performance will need a different approach. To go further, one can turn to clustered servers. This typically involves some degree of parallel programming in the relatively low-level message-passing paradigm (MPI), sometimes combined with shared-memory threading such as OpenMP within each node.

We recently surveyed 25 organizations about their MPI-based development efforts; the histograms below show team size and project length. While parallel programming projects vary widely, it is common to see teams of several engineers working for 1-2 years.

[Figure: histograms of team size and project length from the survey of 25 organizations]

So, let’s consider a fairly typical example: the required computing power outstrips a single desktop, and the decision is made to develop an MPI-based application that runs on an HPC cluster.

Option 2: Port to C++, with message passing (MPI)
  • Total incremental investment: $1,000,000
    • ~48 person-months (~$900,000)
    • 60X gain on a 128-core server (~$100,000)
  • Calendar time to solution: 12-18 months
  • Scales with more hardware, if higher performance is desired
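
Part of the reason Option 2 takes so many person-months is that message passing makes the programmer spell out how data is split, sent, received and recombined. The toy sketch below imitates that pattern in Python, using threads and queues as stand-ins for MPI ranks and messages. It is not MPI and, being thread-based, is not meant to run faster than the serial version; the function names are invented for this illustration.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk, compute a partial sum of squares, send it back."""
    chunk = inbox.get()                      # explicit "receive"
    partial = sum(x * x for x in chunk)
    outbox.put((rank, partial))              # explicit "send"


def parallel_sum_of_squares(data, n_workers=4):
    """Scatter data to workers, then gather and reduce their partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()

    # Scatter: the programmer chooses the partitioning (here, strided slices).
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])

    # Gather and reduce: collect exactly one message per worker.
    total = 0
    for _ in range(n_workers):
        _rank, partial = results.get()
        total += partial

    for t in threads:
        t.join()
    return total


def serial_sum_of_squares(data):
    """The one-line serial version, for comparison."""
    return sum(x * x for x in data)


if __name__ == "__main__":
    data = list(range(1000))
    print(parallel_sum_of_squares(data), serial_sum_of_squares(data))
```

Even for a one-line serial computation, the message-passing version needs a partitioning scheme, matching sends and receives, and a reduction step, each of which has to be designed and debugged.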

Recently, a new programming paradigm has become available: using existing VHLL code developed on a desktop, but extended to HPC server clusters with the Star-P software platform. This approach eliminates the C/MPI programming and instead requires some incremental coding in the familiar MATLAB environment, leveraging much of the application’s existing code base. One can learn the handful of tags and commands within several days, and within a few weeks typical codes can be parallelized to run on the cluster. For a number of reasons, the processor utilization and compute efficiency may not be as high as with a “hand-tuned” custom MPI code, so a somewhat larger server may be necessary. (Fortunately, hardware is cheap, and getting cheaper.)

Here’s how the numbers come out for the typical case:

Option 3: Star-P extends MATLAB® to HPC Servers w/o MPI
  • Total incremental investment: $270,000
    • 1 person-month (~$20,000)
    • 60X gain on a 256-core server (~$200,000)
    • Star-P license ($50,000)
  • Calendar time to solution: 1 month
  • Scales with more hardware, if higher performance is desired
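
The arithmetic behind Options 2 and 3 can be checked with a short script. The sketch below is in Python, chosen only for illustration; the `Option` class, its field names and the `tradeoff` helper are made up for this example. The only inputs are the dollar figures, core counts and 60X gain quoted in the two lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One route to the target speedup (illustrative container)."""
    name: str
    labor_usd: int
    hardware_usd: int
    license_usd: int
    speedup: int   # gain factor quoted in the post
    cores: int     # server size quoted in the post

    @property
    def total_usd(self) -> int:
        return self.labor_usd + self.hardware_usd + self.license_usd

    @property
    def usd_per_x(self) -> float:
        """Dollars spent per unit of speedup."""
        return self.total_usd / self.speedup

    @property
    def speedup_per_core(self) -> float:
        return self.speedup / self.cores


# Figures copied from the Option 2 and Option 3 lists.
MPI = Option("C++ with MPI", 900_000, 100_000, 0, 60, 128)
STAR_P = Option("Star-P", 20_000, 200_000, 50_000, 60, 256)


def tradeoff(base: Option, alt: Option) -> dict:
    """Labor saved by `alt`, and what it costs extra in hardware and licenses."""
    labor_saved = base.labor_usd - alt.labor_usd
    extra_non_labor = (alt.hardware_usd + alt.license_usd) - (
        base.hardware_usd + base.license_usd
    )
    return {
        "labor_saved_usd": labor_saved,
        "extra_hardware_and_license_usd": extra_non_labor,
        "net_saved_usd": labor_saved - extra_non_labor,
    }


if __name__ == "__main__":
    for opt in (MPI, STAR_P):
        print(
            f"{opt.name}: total ${opt.total_usd:,}, "
            f"${opt.usd_per_x:,.0f} per X, "
            f"{opt.speedup_per_core:.2f}X per core"
        )
    print(tradeoff(MPI, STAR_P))
```

Both servers work out to the same price per core, so the extra $100,000 of hardware in Option 3 comes entirely from needing twice as many cores to reach the same 60X.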

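So this new programming model lets us trade higher hardware cost for savings in labor cost and calendar time. In the example above, roughly $150,000 of additional hardware and licensing replaces about $880,000 of labor and cuts the calendar time from 12-18 months to 1 month. Because many projects are constrained by calendar time and available technical resources, a solution such as Star-P offers a way to radically transform the “cost of performance” equation.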

Furthermore, this assessment covers only the short-term cost of the parallel port. In fact, most software costs lie in maintaining a code over time. On that measure, the VHLL benefits of Star-P (faster and hence cheaper software development) will keep paying off, whereas the MPI-based approach will continue to cost substantially more.

I am curious to hear your feedback on the argument laid out here, and how it may relate to your projects:
  • What do you do to increase performance for codes written in MATLAB®, Python, R, and other VHLLs?
  • How long do your parallel ports take, with what size team?
  • What are your thoughts on the notion of trading hardware efficiency for calendar time and labor costs?


