Got news (hi Audra!) via email from NVIDIA today that they have just released a new OpenCL performance profiler for Windows and Linux. According to the company, key features include:
- Profiling of actual hardware signals, kernel efficiency, and instruction issue rate
- Timing of memory copies between system memory and GPU dedicated memory
- Customizable graphs to help developers focus in on problem areas
- Basic auto-analysis to reveal warp serialization problems
- Easy import/export to CSV for custom analysis
You can download the profiler now if you are a GPU Computing Registered Developer (here’s a link to the development site), or you can just wait until the next rev of the CUDA Toolkit. Goodness.
I also got some news from the company today about a new GPU programming best practices guide.
NVIDIA has also prepared a helpful OpenCL Best Practices Guide designed to help OpenCL developers programming for the CUDA architecture implement high-performance parallel algorithms and understand best practices for GPU computing.
The guide includes chapters on the following topics, among others:
- GPU Computing with OpenCL
- Performance Metrics
- Memory Optimizations
- NDRange Optimizations
- Instruction Optimizations
- Control Flow
- Performance Optimization Strategies
You can download the guide here [PDF].