Doug Eadline explores the idea of tossing inexpensive cluster nodes when they go down instead of repairing them.
Designing a cluster with slower processors means that the node cost can be quite low. Of course, there are interconnect issues, but for the sake of argument, let's assume we are building a cluster out of small Mini ITX motherboards with on-board GigE (for instance, the Intel DH57JG). These boards cost on the order of $100. If we assume disk-less nodes and add a processor, memory, and a small case, the node cost may reach $400 (or less). The question I ask myself is: at what node price does repairing a failed node stop being worth the trouble? The node might well be fixable, but given the total cost of ownership (TCO), the cheapest approach may be to turn it off and forget about it. For a small number of nodes this may seem extreme, but in a cluster with 10,000 nodes, losing a small amount of computing power may not be that significant.
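The trade-off can be sketched as simple arithmetic. In the minimal example below, only the roughly $400 node cost and the 10,000-node cluster size come from the text; the repair hours, hourly labor rate, and parts cost are hypothetical placeholders, and the rule "repair only if it costs less than a new node" is one assumed way to frame the decision, not the author's method.

```python
# Illustrative sketch only. NODE_COST (~$400) comes from the text;
# repair hours, labor rate, and parts cost are hypothetical inputs.
NODE_COST = 400.0  # approximate cost of one disk-less Mini ITX node


def repair_cost(repair_hours: float, hourly_rate: float, parts_cost: float) -> float:
    """Hypothetical cost of fixing a failed node: labor plus parts."""
    return repair_hours * hourly_rate + parts_cost


def worth_repairing(repair_hours: float, hourly_rate: float,
                    parts_cost: float, node_cost: float = NODE_COST) -> bool:
    """Assumed rule: repair only if fixing the node is cheaper than a new one."""
    return repair_cost(repair_hours, hourly_rate, parts_cost) < node_cost


def capacity_lost(failed_nodes: int, total_nodes: int) -> float:
    """Fraction of compute capacity lost if failed nodes are simply switched off."""
    return failed_nodes / total_nodes


if __name__ == "__main__":
    # Two hours at a hypothetical $75/hour plus $50 in parts: $200, under $400.
    print(worth_repairing(2, 75, 50))
    # Four hours at $75/hour plus $150 in parts: $450, more than a new node.
    print(worth_repairing(4, 75, 150))
    # Ten dead nodes in a 10,000-node cluster.
    print(f"{capacity_lost(10, 10_000):.1%} of capacity lost")
```

With these made-up figures, the first repair is worth doing and the second is not, while abandoning ten nodes out of 10,000 costs only 0.1% of the cluster's capacity.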
Full article at Linux Mag.