“With the current Lustre Performance Monitoring Tool (LMT) no longer in active development, and the current version incompatible with DNE-based Lustre 2.5 deployments, there is a critical need for a new set of tools delivering the same basic Lustre performance metrics with the added ability to work compatibly with contemporary releases of Lustre.”
The ISC High Performance conference brings out the very best in international supercomputing, and this year will be no exception with a host of content focused on HPC in Asia. Singapore has been a hotbed of HPC activity of late, so we caught up with Marek Michalewicz, who was recently named Chief Executive Officer at A*STAR Computational Resource Centre.
“The combination of the ephemeral nature of the cloud and directly addressable archives such as S3 suggests novel methods for using the Lustre HSM interface. Persistent data sets in the cloud need to be managed independently from an ephemeral filesystem and compute resources. Managing datasets in the cloud could, for example, involve importing data from Amazon’s S3 back into a freshly-created Lustre filesystem, performing I/O intensive computations, and then persisting the datasets back to S3 before terminating the filesystem and compute resources. Alternatives for archive formats will also be discussed. AWS S3 will be used for concrete examples, but the general methods should be applicable to other cloud environments as well.”
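For readers who want to picture that lifecycle, here is a minimal sketch of the import, compute, persist, terminate sequence the abstract describes. It is only an illustration under loud assumptions: a plain Python dict stands in for the S3 archive, and the `EphemeralFS` class and `run_job` function are hypothetical names invented here. Nothing in it calls AWS or the Lustre HSM interface.

```python
# Hypothetical stand-ins: a dict plays the role of the persistent archive
# (S3 in the talk), and EphemeralFS plays the role of a freshly created
# Lustre filesystem. No real AWS or Lustre calls are made.

class EphemeralFS:
    """A throwaway filesystem that loses its contents when terminated."""

    def __init__(self):
        self.files = {}
        self.alive = True

    def terminate(self):
        self.files.clear()
        self.alive = False


def run_job(archive, compute):
    """Import from the archive, compute, persist back, then tear down."""
    fs = EphemeralFS()

    # 1. Import the persistent data set into the fresh filesystem.
    fs.files.update(archive)

    # 2. Run the I/O-intensive work against the filesystem copy.
    for name, data in list(fs.files.items()):
        fs.files[name] = compute(data)

    # 3. Persist the results back to the archive.
    archive.update(fs.files)

    # 4. Terminate the filesystem; only the archive keeps the data.
    fs.terminate()
    return fs


archive = {"dataset/a": b"abc"}
fs = run_job(archive, bytes.upper)
```

The point of the sketch is the ordering: the archive is updated before the filesystem is torn down, so the data set outlives the filesystem and compute resources that processed it.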
Today’s large-scale compute clusters take a lot of preparation for shipping. “Logistics planning for the KAUST shipment started months in advance, and it took 12 Cray personnel almost five days to prepare and pack the system. The KAUST shipment consisted of 184 boxes, weighed 127 tons, and filled an entire Boeing 747 airplane — plus part of another. Ten semi-trucks were required to transport Shaheen II from Cray’s manufacturing facility in Chippewa Falls, Wisconsin, to the airport in Chicago.”
“Large scale HPC IO is usually done either with a file per process or to a single shared file. Single shared file IO does not scale well in Lustre compared to file per process. This presentation from Cray’s Patrick Farrell will give details, examine the reasons for this, and explore existing and potential solutions. Group locks and a new feature, lock ahead, will be discussed in the context of strided IO.”
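To make the strided access pattern concrete, the sketch below computes which byte offsets each process writes when all processes interleave fixed-size blocks in one shared file. The function names, the block and region sizes, and the idea of counting writers per fixed-size region are assumptions chosen for illustration. This is a simplified picture and does not model Lustre's actual locking.

```python
def strided_offsets(rank, nprocs, block_size, nblocks):
    """Byte offsets one rank writes in a strided shared-file layout."""
    return [(i * nprocs + rank) * block_size for i in range(nblocks)]


def writers_per_region(nprocs, block_size, nblocks, region_size):
    """Map each region index of the shared file to the ranks writing in it."""
    regions = {}
    for rank in range(nprocs):
        for offset in strided_offsets(rank, nprocs, block_size, nblocks):
            regions.setdefault(offset // region_size, set()).add(rank)
    return regions


KIB = 1024
MIB = 1024 * KIB

# Illustrative values: 4 processes, 256 KiB blocks, 2 blocks each, 1 MiB regions.
rank1 = strided_offsets(1, 4, 256 * KIB, 2)
shared = writers_per_region(4, 256 * KIB, 2, 1 * MIB)
```

With these illustrative numbers every 1 MiB region of the shared file is written by all four ranks, whereas in a file-per-process layout each file has exactly one writer. That interleaving of writers is the setting in which the talk discusses group locks and lock ahead.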