On-chip PowerGrid EMIR Prediction and Failure Fixing


Table of contents
  1. Brief of Electromigration Effects
  2. GenAI Accelerated Fast EMIR Prediction
  3. GenAI Accelerated Fast EMIR Failure Mitigation
  4. Publications

Brief of Electromigration

Electromigration

Electromigration (EM) is the gradual displacement of metal atoms in a conductor caused by the flow of current through it. When the current density is high enough, momentum transfer from the conducting electrons (the "electron wind") repeatedly dislodges atoms from the lattice and drifts them along the direction of electron flow. Joule heating raises the wire temperature and accelerates this process. The result is both ‘vacancies’ and ‘deposits’. The vacancies can grow into voids that eventually break circuit connections, causing open circuits, while the deposits can grow into hillocks that eventually bridge neighboring wires, causing short circuits.
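The paragraph above says EM becomes a problem once the current density is high enough. A common way to quantify this, which this page does not itself state, is Black's equation, $\mathrm{MTTF} = A\,J^{-n}\exp\left(E_a/(kT)\right)$. It ties the median time to failure to the current density $J$ and the temperature $T$. The sketch below is illustrative only. The wire dimensions, the constant $A$, the exponent $n = 2$ and the activation energy $E_a = 0.9$ eV are placeholder values, not data for any real process, and `current_density` and `black_mttf` are hypothetical helper names.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant (eV/K)


def current_density(current_a: float, width_m: float, thickness_m: float) -> float:
    """J = I / (w * h) in A/m^2 for a rectangular wire cross-section."""
    return current_a / (width_m * thickness_m)


def black_mttf(j: float, a_const: float, n: float, ea_ev: float, temp_k: float) -> float:
    """Black's equation: MTTF = A * J^(-n) * exp(Ea / (k * T)).

    a_const, n and ea_ev are technology-fitted constants; the values used
    below are illustrative placeholders, not data for a real process.
    """
    return a_const * j ** (-n) * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


# Illustrative numbers: 1 mA through a 1 um x 0.2 um wire at 105 C (378.15 K).
j = current_density(1e-3, 1e-6, 0.2e-6)
ratio = black_mttf(2 * j, 1.0, 2.0, 0.9, 378.15) / black_mttf(j, 1.0, 2.0, 0.9, 378.15)
print(f"J = {j:.3e} A/m^2, MTTF ratio when J doubles: {ratio:.3f}")
```

Because $J$ enters as $J^{-n}$, doubling the current density with $n = 2$ cuts the predicted MTTF to one quarter. The exponential (Arrhenius) term captures the acceleration from higher temperature, including Joule heating.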

On-chip PowerGrid EMIR

The on-chip power distribution network (PDN) is a mesh-structured network that delivers power from the top metal layers down to the devices. Because PDN wires carry large, unidirectional currents, PDNs are particularly vulnerable to EM-induced failures. EM gradually increases wire resistance over time, so after years of aging the IR drop can grow until the supply voltage at some nodes falls below the required threshold. This makes it difficult to design a reliable PDN within a tight area budget.
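To illustrate how EM aging turns into IR drop, the following sketch solves a small resistive mesh by nodal analysis with Gauss-Seidel iteration. This is the kind of iterative, physics-based computation that the ML models discussed later aim to bypass. It assumes a uniform 3 x 3 mesh of 1-ohm segments, a single ideal pad at one corner, identical 1 mA loads, and EM aging modeled simply as doubling one segment's resistance. `neighbors`, `solve_mesh` and `worst_ir_drop` are hypothetical helper names, and production PDN tools typically use sparse direct or preconditioned iterative solvers on far larger grids.

```python
def neighbors(r, c, n, g_h, g_v):
    """Yield (row, col, conductance) for each mesh neighbor of node (r, c)."""
    if c > 0:
        yield r, c - 1, g_h[r][c - 1]
    if c < n - 1:
        yield r, c + 1, g_h[r][c]
    if r > 0:
        yield r - 1, c, g_v[r - 1][c]
    if r < n - 1:
        yield r + 1, c, g_v[r][c]


def solve_mesh(n, g_h, g_v, load_a, vdd, pads, tol=1e-12, max_iter=100_000):
    """Gauss-Seidel nodal analysis of an n x n resistive power mesh.

    g_h[r][c]: conductance (S) between nodes (r, c) and (r, c + 1)
    g_v[r][c]: conductance (S) between nodes (r, c) and (r + 1, c)
    load_a:    current (A) drawn by every non-pad node
    pads:      set of (r, c) nodes tied to an ideal supply at vdd
    """
    v = [[vdd] * n for _ in range(n)]
    for _ in range(max_iter):
        max_delta = 0.0
        for r in range(n):
            for c in range(n):
                if (r, c) in pads:
                    continue
                # KCL at node k: sum_j g_j * (v_j - v_k) = load_a
                g_sum = weighted = 0.0
                for rr, cc, g in neighbors(r, c, n, g_h, g_v):
                    g_sum += g
                    weighted += g * v[rr][cc]
                new_v = (weighted - load_a) / g_sum
                max_delta = max(max_delta, abs(new_v - v[r][c]))
                v[r][c] = new_v
        if max_delta < tol:
            break
    return v


def worst_ir_drop(v, vdd):
    """Largest supply drop (V) over all nodes."""
    return max(vdd - x for row in v for x in row)


# 3 x 3 mesh of 1-ohm segments, one pad in the corner, 1 mA per load node.
n, vdd = 3, 0.8
g_h = [[1.0] * (n - 1) for _ in range(n)]
g_v = [[1.0] * n for _ in range(n - 1)]
fresh = worst_ir_drop(solve_mesh(n, g_h, g_v, 1e-3, vdd, {(0, 0)}), vdd)

# Hypothetical EM aging: the segment next to the pad doubles its resistance.
g_h_aged = [row[:] for row in g_h]
g_h_aged[0][0] = 0.5
aged = worst_ir_drop(solve_mesh(n, g_h_aged, g_v, 1e-3, vdd, {(0, 0)}), vdd)
print(f"worst IR drop: fresh {fresh * 1e3:.2f} mV, aged {aged * 1e3:.2f} mV")
```

Raising the resistance of a segment near the pad increases the drop at every node fed through it, which is the aging mechanism described above in miniature.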

Left: electromigration. Middle: an example on-chip power grid of an ARM core. Right: the EMIR map of this on-chip power grid.

On-chip PowerGrid EMIR Prediction Using Generative ML Models

AEs, VAEs, and CVAEs: CNN-based encoder-decoder structures are widely used in image generation tasks. This architecture underlies several generative models, including the plain autoencoder (AE), the generative adversarial network (GAN), and the variational autoencoder (VAE) with its conditional variant (CVAE). The key idea behind using these models is to treat prediction of the power grid EMIR distribution as a one-shot image generation task, bypassing the iterative solving process of physics-based models.
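To make the contrast concrete, the two routes can be written in standard notation. This is a sketch under assumptions. $\mathbf{u}(t)$ are the grid node voltages, $\mathbf{v}(t)$ is the EMIR (drop) map, $\mathbf{c}$ stands for the stacked input feature maps of the grid (which maps are used is not specified here), $D_\theta$ and $q_\phi$ are the decoder and encoder, and the CVAE prior is simplified to $\mathcal{N}(\mathbf{0},\mathbf{I})$, a common choice rather than necessarily the one used in this work.

$$
\begin{aligned}
\text{Physics-based:}\quad & \mathbf{G}(t)\,\mathbf{u}(t) = \mathbf{b}, \qquad \mathbf{v}(t) = V_{DD}\,\mathbf{1} - \mathbf{u}(t) \quad \text{(solved iteratively for every design and aging time } t\text{)}\\
\text{One-shot generation:}\quad & \hat{\mathbf{v}} = D_\theta(\mathbf{z}, \mathbf{c}), \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\\
\text{CVAE training loss:}\quad & \mathcal{L}(\theta, \phi) = -\,\mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{v}, \mathbf{c})}\!\left[\log p_\theta(\mathbf{v} \mid \mathbf{z}, \mathbf{c})\right] + D_{\mathrm{KL}}\!\left(q_\phi(\mathbf{z} \mid \mathbf{v}, \mathbf{c}) \,\|\, \mathcal{N}(\mathbf{0}, \mathbf{I})\right)
\end{aligned}
$$

Minimizing $\mathcal{L}$ maximizes a lower bound on $\log p_\theta(\mathbf{v} \mid \mathbf{c})$. For a plain AE there is no sampling and no KL term, so only a reconstruction loss (for example, mean squared error) remains. In every case, inference is a single forward pass rather than an iterative solve.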

On-chip PowerGrid EMIR Prediction Using VAE-based Models

We also investigated other generative models such as transformers, which are better suited to real-time sequential data, for example predicting CPU thermal behavior from a command sequence. However, for image-like inputs such as on-chip power grid EMIR maps, a CNN-based encoder-decoder model still outperforms Vision Transformer (ViT)-based models.

ML-accelerated PowerGrid EMIR Failure Fixing

  • Problem formulation: Starting from the power grid information at T = 0, we predict the EMIR drop at the target aging lifetime T = t. Beyond prediction, our goal is to fix the revealed EMIR-drop failures by resizing the widths of the power grid interconnect trees with a minimal increase in metal area. This can be formulated as a nonlinear optimization problem.

  • Main Innovations: The optimization relies on two computationally intensive quantities. Both are obtained from the ML model instead of repeated physics-based analysis, which significantly speeds up power grid EMIR failure fixing:

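    • $v(t,s)$: The aging-aware EMIR, obtained directly as the output of the ML prediction model from the previous step.
    • $\frac{\partial v(t,s)}{\partial s}$: The sensitivity of the aging-aware EMIR to the solution point $s$. It is obtained as a by-product of model inference by back-propagating the gradient of the output EMIR with respect to the input resistance, which is then carried over to $s$ through the dependence of each wire's resistance on its width.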
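One way to write the problem formulation above as an explicit program is sketched here. This is an assumed form, not necessarily the exact objective used in this work. $s_j$ is the width of wire $j$, $L_j$ and $h_j$ are its length and thickness, $\rho$ is the metal resistivity, $V_{\mathrm{limit}}$ is the allowed IR drop, and EM aging is treated as a multiplicative resistance factor that does not depend on $s_j$ (a simplification, since widening a wire also lowers its current density).

$$
\begin{aligned}
\min_{\mathbf{s}}\quad & \sum_j L_j\, s_j && \text{(total metal area)}\\
\text{s.t.}\quad & v_k(t, \mathbf{s}) \le V_{\mathrm{limit}} && \forall k \ \text{(every grid node)}\\
& s_{\min} \le s_j \le s_{\max} && \forall j
\end{aligned}
$$

$$
\frac{\partial v_k(t,\mathbf{s})}{\partial s_j}
= \frac{\partial v_k(t,\mathbf{s})}{\partial R_j}\,\frac{\partial R_j}{\partial s_j},
\qquad
R_j = \frac{\rho\, L_j}{s_j\, h_j}
\;\Rightarrow\;
\frac{\partial R_j}{\partial s_j} = -\frac{R_j}{s_j}
$$

The first factor is the back-propagated gradient of the output EMIR with respect to the input resistance. The second is closed-form and remains $-R_j/s_j$ when $R_j$ is scaled by an aging factor independent of $s_j$.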
ML-accelerated fast on-chip power grid EMIR prediction and fixing framework.
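A toy version of the fixing loop is sketched below. It assumes a single 1-D rail fed from a pad at one end in place of the full grid, and an analytical formula in place of the trained ML model. `Segment`, `far_end_drop`, `drop_sensitivity` and `fix_rail` are hypothetical names, and the sheet resistance, loads and aging factors are made-up values. The greedy widening rule is a simple stand-in for the nonlinear optimizer, not the method used in this work. The two quantities from the list above appear explicitly: `far_end_drop` plays the role of $v(t,s)$, and `drop_sensitivity` forms $\partial v/\partial s$ by the chain rule, where the factor $\partial v/\partial R_j$ stands in for what back-propagation through the ML model would provide.

```python
from dataclasses import dataclass

SHEET_RES_OHM_SQ = 0.05  # hypothetical sheet resistance of the rail metal (ohm/square)


@dataclass
class Segment:
    length_um: float  # fixed segment length
    width_um: float   # design variable s_j
    aging: float      # hypothetical EM resistance multiplier at target time t (>= 1)


def resistance(seg):
    # R_j(t) = R_sheet * L_j / s_j, scaled by the aging multiplier
    return SHEET_RES_OHM_SQ * seg.length_um / seg.width_um * seg.aging


def segment_currents(loads_a):
    # Pad at the left end: segment j carries every load at nodes j..N-1
    out, total = [], 0.0
    for i in reversed(loads_a):
        total += i
        out.append(total)
    return out[::-1]


def far_end_drop(segs, loads_a):
    # Stand-in for v(t, s): the worst IR drop, reached at the far end of the rail
    return sum(resistance(s) * i for s, i in zip(segs, segment_currents(loads_a)))


def drop_sensitivity(segs, loads_a):
    # Chain rule: dv/ds_j = (dv/dR_j) * (dR_j/ds_j) = I_j * (-R_j / s_j)
    return [i * (-resistance(s) / s.width_um)
            for s, i in zip(segs, segment_currents(loads_a))]


def fix_rail(segs, loads_a, limit_v, step_um=0.1, max_iter=1000):
    """Greedy sensitivity-guided widening; returns True once v <= limit_v."""
    for _ in range(max_iter):
        if far_end_drop(segs, loads_a) <= limit_v:
            return True
        grad = drop_sensitivity(segs, loads_a)
        # Widen the segment with the largest drop reduction per unit of added area
        j = max(range(len(segs)), key=lambda k: -grad[k] / segs[k].length_um)
        segs[j].width_um += step_um
    return far_end_drop(segs, loads_a) <= limit_v


segs = [Segment(10.0, 1.0, a) for a in (1.0, 1.2, 1.0, 1.0)]
loads = [0.01] * 4
print(f"before: {far_end_drop(segs, loads) * 1e3:.1f} mV")
ok = fix_rail(segs, loads, limit_v=0.04)
print(f"fixed={ok}, after: {far_end_drop(segs, loads) * 1e3:.1f} mV")
```

Choosing the segment with the best drop reduction per unit of added area is what keeps the metal-area increase small. A real flow would also enforce the width bounds $s_{\max}$ and check every grid node, not only the far end of one rail.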