Tilera revamps chip in attempt to scale memory wall


News from The Register’s Timothy Prickett Morgan: Tilera has updated the Tile64 chip it announced last year

The Tile64 chip announced last year and the TilePro64 and TilePro36 kickers announced this week are not based on any existing processor cores and their associated instruction sets. The chips embody a new core that was designed from the ground up to take advantage of mesh networking on each core. This creates a large pool of compute resources that can be dedicated to running a single instance of Linux and its applications or carved up on the fly into virtual Linux images, each isolated from other virtualized slices.

…The Tile64 design is clever in a number of ways, which means it might see some use in IT devices near you someday soon. First, it does not use a bus architecture to talk to peripherals or to have processors and cache memory talk to each other. The iMesh network allows point-to-point communication between the chips and does away with bus architectures, which require high clock speeds and lots of energy to deliver bandwidth and scale.
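To make the contrast with a bus concrete, here is a minimal sketch of dimension-ordered (XY) routing, the standard technique behind point-to-point on-chip mesh networks of this kind. The tile numbering, mesh width, and routing details are illustrative assumptions, not Tilera's actual iMesh implementation:

```python
# Toy model of XY routing on an 8x8 mesh of tiles (illustrative only;
# not Tilera's actual iMesh design). Each message travels along the X
# dimension first, then along Y, hopping tile to tile.

def xy_route(src, dst, width=8):
    """Return the path of (x, y) tiles from src to dst, X first, then Y."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    path = [(sx, sy)]
    x, y = sx, sy
    while x != dx:                 # travel along X first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # then along Y
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

# Worst case on an 8x8 mesh: corner to opposite corner is 14 hops.
hops = len(xy_route(0, 63)) - 1
```

The point of the mesh is that while tile 0 talks to tile 63, every other pair of tiles can be exchanging messages on their own links at the same time; a shared bus would serialize all of that traffic, which is why buses need high clock speeds to deliver comparable bandwidth.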

The article is interesting for the technical details it provides on the chip: informative, but within the understanding of an ordinary mortal (unlike much of the discussions in the silicon community).

The TilePro upgrades focus on getting data into the cores more efficiently: TilePro adds another mesh network for cache coherency and doubles the L1 caches.

The chips also implement a feature called “hash for home,” which spreads data over the caches on the chip, eliminating hot spots where cores keep hammering the same caches. The new chips also add instructions specifically for handling video and audio data (important for the streaming appliances that will use the chip) and other instructions for moving and copying data in memory.
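The idea behind “hash for home” can be sketched in a few lines: each cache line's home tile is chosen by hashing its address, so consecutive lines land on different tiles instead of piling onto one cache. The hash function, line size, and tile count below are assumptions for illustration; the real TilePro mapping is not public at this level of detail:

```python
# Toy model of hash-for-home cache placement (illustrative assumptions,
# not Tilera's actual hardware hash).

LINE_SIZE = 64   # bytes per cache line (assumed)
TILES = 64       # tiles sharing the distributed cache

def home_tile(addr):
    """Pick a home tile for the cache line containing addr."""
    line = addr // LINE_SIZE
    # Mix the bits a little so strided access patterns still spread out.
    h = (line ^ (line >> 6) ^ (line >> 12)) & 0xFFFFFFFF
    return h % TILES

# A 4 KB sequential buffer ends up homed across many different tiles,
# rather than hammering a single tile's cache:
homes = {home_tile(a) for a in range(0, 4096, LINE_SIZE)}
```

Without the hash (e.g. homing lines by their low address bits alone), a hot 4 KB structure could concentrate on a handful of tiles; spreading the lines evens out both cache occupancy and mesh traffic.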

The memory controllers on the TilePro chips also support memory striping – akin to RAID striping on disks – to reduce bottlenecks, and a direct memory access feature to put data into cache memory without going through main memory. All of these and a number of other features have boosted power consumption by 5 percent, but the performance per watt of the chips has nearly doubled.
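The striping analogy to RAID 0 works out like this: consecutive chunks of the physical address space alternate between controllers, so a single sequential stream keeps all of them busy at once. The stripe size and controller count below are assumed for illustration, not taken from the TilePro specification:

```python
# Toy model of memory striping across controllers, analogous to RAID 0
# striping across disks (chunk size and controller count are assumed).

CONTROLLERS = 4
STRIPE = 4096  # bytes per stripe (assumed)

def controller_for(addr):
    """Map a physical address to the controller that serves it."""
    return (addr // STRIPE) % CONTROLLERS

# A 64 KB sequential scan cycles through all four controllers in turn,
# instead of queuing every request on a single controller:
seq = [controller_for(a) for a in range(0, 65536, STRIPE)]
```

Without striping, a linear scan of a large array would serialize behind one controller while the others sit idle; interleaving turns one memory stream into four parallel ones, which is where the bottleneck reduction comes from.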

The showstopper for almost all HPC applications: no floating-point math. But it is refreshing to see new memory designs being tried in practice.