Libraries that are tuned to the underlying hardware architecture can increase performance tremendously. Higher level libraries such at the Intel Data Analytics Acceleration Library (Intel DAAL) can assist the developer with highly tuned algorithms for data analysis as well as machine learning. Intel DAAL functions can be called within other, more comprehensive frameworks that deal with the various types of data and storage, increasing the performance and lowering the development time of a wide range of applications.
Have you ever wondered why your HPC installation is not performing as you had envisioned ? You ran small simulations. You spec’d out the CPU speed, the network speed and the disk drive speed. You optimized your application and are taking advantage of new architectures. But now as you scale the installation, you realize that the storage system is not performing as expected. Why ? You bought the latest disk drives and expect even better than linear performance from the last time you purchased a storage system. Read how you can get increased efficiency of your storage system.
“When designing an application that contains many threads and less cores than threads, it is important to understand what is the optimal number of threads that should be assigned to a core. This value should be parameterized, in order to easily run tests to determine which is the optimum value for a given machine. One thread per core on the Intel Xeon Phi processor will give the highest performance per thread. When the number of threads per core is set at two or four, the individual thread performance may be lower, but the aggregate performance will be greater.”
As data center sprawl is now understood to be expensive and may not deliver performance increases for all types of applications, new technologies are coming to the rescue. A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing – hence “field-programmable”. While the use of GPUs and HPC accelerators are generally understood today, there are a number of misconceptions about FPGAs that need to be understood.
A workflow to support genomic sequencing requires a collaborative effort between many research groups and a process from initial sampling to final analysis. Learn the 4 steps involved in pre-processing.
The Intel Xeon Phi processor supports different types of memory, and can organize this into three types of memory mode. The new processor from Intel contains two type of memory, MCDRAM and DDR memory. These different memory subsystems are complimentary but can be used in different ways, depending on the application that is being executed. “By using these two types of memory in the same system gives flexibility to the overall system and will show an increase in performance for almost any application.”
Demonstrating Asetek’s adaptability to any data center cooling need, HPC installations from around the world are currently on display at SC16 in Salt Lake City, Utah November 14-17. Servers from these installations featuring Asetek liquid cooling will be on display including servers installed at Oakforest-PACS, the highest Performance Supercomputer System in Japan.
FPGAs will become increasing important for organizations that have a wide range of applications that can benefit from performance increases. Rather than a brute force method to increasing performance in a data center by purchasing and maintaining racks of hardware and associated costs, FPGAs may be able to equal and exceed the performance of additional servers, while reducing costs as well.
To get maximum parallelization for an application, not only must the application be developed to take advantage of multiple cores, but should also have the code in place to keep a number of threads working on each core. A modern processor architecture, such as the Intel Xeon Phi processor, can accommodate at least 4 threads for each core. “On the Intel Xeon Phi processor, each of the threads per core is known as a hyper-thread. In this architecture, all of the threads on a core progress through the pipeline simultaneously, producing results much more quickly than if just one thread was used. The processor decides which thread should progress, based on a number of factors, such as waiting for data from memory, instruction availability, and stalls.”
“Designing a new generation of hardware with such high performance needs to make sure that developers understand the basics, and are familiar with the architecture of a new system. Single thread performance with the Intel Xeon Phi processor is significantly better than previous designs. In addition, in order to speed up performance even more, vector processing, where applicable is critical in application performance. With two vector processing units (VPUs) per core, applications can execute two 512-bit vector multiply-add instructions per cycle. Each of these cores can deliver 32 double precision operations per clock cycle. The VPU executes all of the floating point operations as well as legacy instructions from SSE to AVX to the new AVX-512 instructions.”