d2-LBDR: Distance-driven routing to handle permanent failures in 2D mesh NoCs

With the advent of deep sub-micron technology, fault-tolerant solutions are needed to keep many-core chips operative. In NoCs, Logic Based Distributed Routing (LBDR) proved to be a flexible routing framework for 2D meshes with link and router faults. However, to provide full coverage, LBDR requires a module named FORKS which replicates some messages. This imposes the use of virtual cut-through switching and a complex router arbiter, increasing excessively the router cost, mainly in buffer area. Also, some failure combinations require the use of a non-trivial dynamic reconfiguration strategy to avoid deadlocks.

We propose d2-LBDR which adds, on every router, a distance register to the closest failure. This enables the support of more failure combinations without an excessive implementation cost. Indeed, we restore the use of wormhole switching, keeping router architecture simple, while achieving the same fault coverage as the best LBDR version, without requiring complex switching strategies nor any dynamic reconfiguration strategy. Results show that a small area overhead (3%) is enough for the implementation of a fully flexible routing method without any limiting support case when compared with LBDR. d2-LBDR reduces area overhead over the best LBDR approach (300% overhead against 3%) while preserving fault coverage. Results show d2-LBDR performance equal to LBDR.