# Efficiency, Stability, and Reliability Implications of Unbalanced Current Sharing Among Distributed On-Chip Voltage Regulators

Longfei Wang, S. Karen Khatamifard, Orhun Aras Uzun, Ulya R. Karpuzcu, and Selçuk Köse, Member, IEEE

Abstract-Power delivery networks with distributed on-chip voltage regulators (VRs) serve as an effective way for fast localized voltage regulation within modern microprocessors. Without careful consideration of the interactions among the distributed VRs and the power grid, unbalanced current sharing (CS) among those regulators may, however, lead to efficiency degradations, stability, and reliability issues, and even malfunctions of the regulators. This paper is a first attempt to investigate the efficiency, stability, and reliability implications of unbalanced CS among distributed on-chip VRs. Benefits of balanced CS are demonstrated with concrete examples, showing the necessity of an appropriate current balancing scheme. An adaptive reference voltage control method and the corresponding control algorithms specifically for distributed on-chip VRs are proposed to balance the CS among regulators at different locations. The proposed techniques successfully balance the CS among distributed VRs and can be applied to different regulator types. Simulation results based on practical microprocessor setups confirm the efficiency, stability, and reliability implications.

*Index Terms*—Current sharing (CS), distributed on-chip voltage regulator (VR), power delivery network (PDN), power efficiency, reliability, stability.

## I. INTRODUCTION

**E**FFICIENT, stable, and reliable operation of power delivery networks (PDNs) is crucial for sustaining the high-performance and low-power design targets of modern large-scale integrated circuits (ICs). Thermal design power (TDP) of microprocessors has increased over generations and can go beyond 100 W [1]. The peak power of a microprocessor can, however, be 1.5 times the TDP rating [2]. Even small power conversion efficiency degradations within such power-hungry ICs lead to tremendous power loss, resulting in higher heat dissipation. Meanwhile, the complexity and the large component count incur serious stability and reliability concerns.

Manuscript received October 4, 2016; revised January 13, 2017, March 31, 2017, and June 22, 2017; accepted July 28, 2017. Date of publication September 4, 2017; date of current version October 23, 2017. This work was supported in part by the National Science Foundation CAREER Award under Grant CCF-1350451, in part by the National Science Foundation Award under Grant CCF-1421988, and in part by the Cisco Research Award. (*Corresponding author: Longfei Wang.*)

L. Wang, O. A. Uzun, and S. Köse are with the Department of Electrical Engineering, University of South Florida, Tampa, FL 33620 USA (e-mail: longfei@mail.usf.edu; orhunuzun@mail.usf.edu; kose@usf.edu).

S. K. Khatamifard and U. R. Karpuzcu are with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: khata006@umn.edu; ukarpuzc@umn.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2017.2742944

Voltage regulators (VRs), an essential part of PDNs that includes the commonly used buck, switched capacitor (SC), and low-dropout (LDO) regulators, have been moved from off-chip placements to on-chip implementations to save board area and to enable efficient, fast, and secure localized voltage regulation [3]–[5]. Distributed on-chip voltage regulation has recently become an emerging research field where multiple on-chip VRs are connected in parallel and distributed across the power grid to supply current across the whole die [6]–[13]. Previous work mainly focuses on the efficiency improvement of stand-alone VRs [3] and that of the PDNs as a whole [14]. The implications of the complex interactions among on-chip VRs and the power grid have, however, been typically overlooked. Although distributed on-chip voltage regulation offers appealing benefits, complex interactions among regulators and the power grid may lead to significant efficiency, stability, and reliability issues. Among the various implications of distributed on-chip voltage regulation, unbalanced current sharing (CS), if not carefully controlled, can nullify the previously proposed efficiency enhancement benefits or even shorten the lifetime of the chip.

The unbalanced CS problem has been widely studied in the conventional power electronics field for multiphase interleaving buck regulators [15], [16]. Little attention has, however, been paid to this problem within the microelectronics field for distributed on-chip voltage regulation, and, to the best of our knowledge, the efficiency, stability, and reliability implications of unbalanced CS within distributed on-chip PDNs have not yet been investigated.

VRs within distributed on-chip PDNs are connected to a passive mesh network [1], which supplies the required current to the load circuits. Several factors may lead to unbalanced CS within distributed on-chip power delivery systems that consist of multiple parallel VRs. These factors include mismatches in the component values and control loop mismatches, which are common factors leading to the unbalanced current within conventional centralized multiphase regulators [15], [16]. Specific to distributed on-chip PDNs, the power grid parasitic impedance among the VRs and load circuits, although quite small, may have significant variations based on the placement of the VRs and the load circuits. Therefore, even with perfectly matched components and control loops among different distributed on-chip VRs, the variations of the power grid resistance among individual VRs and load circuits may lead to nonnegligible mismatch and severe CS problems.




Fig. 1. On-chip PDN with distributed VRs.

The contribution of this paper is threefold. First, the unbalanced CS problem is presented with extensive simulations in both Cadence Virtuoso and VoltSpot [18]. Power efficiency, stability, and reliability implications of the unbalanced CS within distributed on-chip PDNs are investigated. Theoretical derivations and simulation results lead to the observation that unbalanced CS can adversely affect the important design concerns, which necessitates an efficient current balancing scheme. Second, an adaptive reference voltage control mechanism is proposed as the current balancing scheme for distributed on-chip VRs to dynamically modulate the reference voltage of each individual VR. Circuit implementations are analyzed for the proposed control algorithm and preliminary simulations are performed to verify the effectiveness. Finally, an IBM POWER8-like [17] microprocessor simulation platform is constructed in VoltSpot [18] to study the implications of the unbalanced CS problem in practical applications. Extensive simulations based on several benchmarks are performed and simulation results confirm the benefits of balanced CS. Although the analyses are conducted assuming a homogeneous PDN with buck regulators, without loss of generality, the proposed technique can be easily applied to heterogeneous PDNs that house different regulator types.

## II. UNBALANCED CURRENT SHARING PROBLEM

An on-chip PDN model with distributed VRs is shown in Fig. 1. The inputs of the distributed VRs are connected to a global power grid that is connected to the package through the dedicated C4 pads. The outputs of the distributed on-chip VRs provide the required current at the target voltage level to the local power grid that feeds the load circuits. The global ground distribution provides the ground plane for the load circuits and is connected to the package through the dedicated GND C4 pads. The global and local power grids and the global ground distribution are composed of orthogonal metal lines connected with vias [1]. With a first-order approximation, these power grids can be modeled as a resistive mesh where the effective resistance between any two nodes on the power grid depends on the distance between the two nodes [19], [20]. The effective resistance mismatch between the distributed VRs with only local voltage regulation loops may cause unbalanced CS among the VRs and may even cause VR malfunctions.

Fig. 2. Unbalanced CS between two identical distributed on-chip buck regulators. (a) Inductor currents of two identical regulators supplying a total load current of 1 A. (b) Zoomed-in view of the inductor current profiles at steady state. (c) Inductor currents of two identical regulators supplying a total load current of 2 A; one inductor current saturates because a single regulator can supply a maximum load current of 1.27 A. (d) Zoomed-in view of the inductor current.

To demonstrate the unbalanced CS problem, two sets of simulations are performed. First, two identical buck regulators providing localized voltage regulation are designed and simulated in Cadence Virtuoso using the IBM 130-nm CMOS process. The input voltage of the buck regulator is 3.3 V and the output voltage is 1 V. The switching frequency is 140 MHz with a 5-nH inductor. The peak-to-peak current ripple on the inductor is about 1 A and the load regulation is 0.02%/A. Each regulator has a maximum load current supply capability of 1.27 A. The on-chip power grid is designed as a resistive mesh using the design parameters of the respective metal layers in [21]. Second, a buck regulator model is extracted and included in VoltSpot [18] for PDN simulations with a large number of on-chip VRs. An IBM POWER8-like processor with 96 identical distributed regulators is used in the simulations. The detailed VoltSpot simulation setup is explained in Section VII. Simulation results demonstrating the unbalanced CS problem in both Cadence Virtuoso and VoltSpot are summarized in this section.

#### A. Large Current Variations

The load current supplied by a buck regulator is the average value of the respective inductor current. The inductor currents of the two regulators when the total load current is 1 A are shown in Fig. 2(a) and (b). Due to the difference in the effective resistance seen by the two regulators, they have different average inductor current values of 328.7 and 671.3 mA. With unbalanced CS, one regulator supplies more than twice the output current of the other. With a larger effective resistance mismatch, the difference can be even larger.



Fig. 3. Unbalanced CS among 96 identical distributed on-chip VRs within IBM POWER8-like microprocessor.

The output current values of the 96 identical distributed on-chip VRs within an IBM POWER8-like microprocessor chip for the application lu\_ncb are shown in Fig. 3. The detailed simulation setup is explained in Section VII. In this simulation, 96 on-chip VRs are evenly distributed across the chip. As can be seen from Fig. 3, large current variations occur among these on-chip VRs. The highest current supplied by one VR is nearly 2.5 A and the lowest is around 0.5 A, a $5\times$ difference between the highest and the lowest on-chip VR currents.

#### B. Voltage Regulator Malfunctions

For the same two buck regulators at the same physical locations on the power grid as in Section II-A, with a higher total load current of 2 A, the inductor current distribution between the two regulators is shown in Fig. 2(c) and (d). As can be seen from Fig. 2(c) and (d), the difference between the two regulator inductor currents gradually becomes larger, and at steady state, one inductor becomes saturated and provides a constant current. For the saturated regulator, the pull-up pMOS transistor is always ON, leading to 100% duty cycle operation and the malfunction of the VR. When the total load current is equally shared between the two, the malfunction of the VRs could be avoided, as the current supplied by each VR is less than the maximum VR current capability.

Note that, in Fig. 3, the on-chip VR model is included in VoltSpot for current distribution simulations and no limit is set for the maximum current that an individual VR can provide. If the output current capability of a VR is designed to be 1.5 A, there would be more than ten on-chip VRs that enter this saturation point in this simulation, leading to chipwide VR malfunctions. As overcurrent protection schemes are implemented for most VRs, VR malfunctions can be avoided. However, an overloaded VR can still suffer an output voltage drop [22], which is not acceptable. Furthermore, as one VR supplies $5\times$ the current of another, the resulting high current density can create local hotspots at the VR and even destroy the VR and the nearby functional blocks.

With unbalanced CS, each on-chip VR needs to be designed for the worst-case scenario to be able to supply the highest possible current with high efficiency. The size of the power MOSFETs needs to be increased as compared with a design targeting the total load current divided by N for N distributed VRs, which may introduce extra power and area overhead, as power MOSFETs can occupy a large percentage of the total VR area.

Fig. 4. Conventional buck regulator, SC regulator, and LDO efficiency curves.

## III. EFFICIENCY IMPLICATIONS OF UNBALANCED CURRENT SHARING

Power conversion efficiency curves for the conventional buck, SC, and LDO regulators are shown in Fig. 4. Consider two identical distributed on-chip buck or SC regulators with each design optimized at  $I_o/2$  for a total load current of  $I_o$ . With balanced CS, each buck or SC regulator operates at the optimum design point, providing maximum efficiency. With unbalanced CS, one regulator provides lower current  $I_1$  while the other one provides higher current  $I_2$ . As can be seen from Fig. 4, any variation in the load current from the optimum load current point leads to an unavoidable power efficiency loss. For LDOs, the efficiency is determined by

$$\eta_{\text{LDO}} = \frac{I_o V_o}{(I_o + I_q) V_i} \tag{1}$$

where $I_o$ is the output current of the LDO, $I_q$ is the quiescent current, and $V_i$ and $V_o$ are the input and output voltages. With balanced CS, each LDO provides $I_o/2$ current and the total efficiency is $(I_o V_o/2)/[(I_o/2 + I_q)V_i] = I_o V_o/[(I_o + 2I_q)V_i]$. With unbalanced CS, one of the LDOs provides $I_1$ current and the other one provides $I_2$ current with $I_1 + I_2 = I_o$. Since MOS transistors have a nearly constant quiescent current with respect to the load current [23], the total efficiency can be expressed as $(I_1 + I_2)V_o/[(I_1 + I_2 + 2I_q)V_i]$, which is the same as the balanced CS case. Theoretically, there is no significant efficiency degradation due to unbalanced CS for LDOs; however, larger currents induced by the unbalanced CS do adversely affect the reliability, as will be discussed in Section V.
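The LDO argument above can be checked numerically. The following is a minimal sketch, assuming a constant quiescent current per LDO as stated in the text; the function name and all numerical values are hypothetical and chosen only for illustration.

```python
def ldo_pair_efficiency(i1, i2, i_q, v_o, v_i):
    """Total efficiency of two parallel LDOs supplying i1 and i2 (in A).

    Each LDO draws its load current plus a constant quiescent current i_q
    from the input rail, following (1).
    """
    p_out = (i1 + i2) * v_o
    p_in = (i1 + i_q) * v_i + (i2 + i_q) * v_i
    return p_out / p_in


# Hypothetical operating point (illustrative values only).
V_I, V_O, I_Q, I_O = 1.2, 1.0, 1e-3, 1.0

eta_balanced = ldo_pair_efficiency(0.5 * I_O, 0.5 * I_O, I_Q, V_O, V_I)
eta_unbalanced = ldo_pair_efficiency(0.3 * I_O, 0.7 * I_O, I_Q, V_O, V_I)
print(eta_balanced, eta_unbalanced)
```

Under this assumption the two printed values coincide up to floating-point rounding, because only the sum $I_1 + I_2$ enters the expression.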

Buck regulators will be the focus throughout this paper; however, the proposed techniques can also be tailored for SC and LDO regulators. The regulator loss model and optimum efficiency discussions are provided in Section III-A. The extra power loss and the efficiency degradation induced by unbalanced CS for the general case of N identical distributed on-chip regulators are theoretically explored in Section III-B.

#### A. Regulator Loss Model and Efficiency

The simplified schematic of a synchronous buck regulator is shown in Fig. 5. It is composed of the high-side (Q1) and low-side (Q2) power MOSFETs for synchronous rectification, LC filter with parasitic resistances  $R_{\text{DCR}}$  and  $R_{\text{ESR}}$ , and a feedback control path.



Fig. 5. Simplified schematic of synchronous buck regulator.

The simplified power loss model in [25] is enhanced by including the conduction loss of the capacitor equivalent series resistance (ESR) ( $P_{\text{ESR}}$ ) for the power loss analysis in synchronous buck regulators

$$P_{\rm loss} = R_{\rm eq} \cdot i_{\rm rms}^2 + P_{\rm ESR} + A \cdot f \tag{2}$$

where $R_{\text{eq}}$ is the regulator equivalent resistance, $i_{\text{rms}}$ is the inductor rms current, $A$ is the switching power loss factor, and $f$ is the regulator switching frequency. A detailed power loss analysis and expressions for $R_{\text{eq}}$, $P_{\text{ESR}}$, and $A$ can be found in [24] and [25].

Power conversion efficiency can be written as

$$\eta = \frac{P_{\text{out}}}{P_{\text{out}} + P_{\text{loss}}}. \tag{3}$$

Since  $P_{\text{ESR}}$  is independent of the regulator output current  $I_o$ , by setting  $\partial \eta / \partial I_o = 0$ , the maximum efficiency for the continuous conduction mode (CCM) operation is obtained as [25]

$$\eta_{\max} = \frac{1}{1 + 2\frac{R_{\text{eq}}}{V_o} \cdot I_{o\_\text{opt}}} \tag{4}$$

at the optimum load current of

$$I_{o\_\text{opt}} = \sqrt{\frac{A \cdot f + P_{\text{ESR}}}{R_{\text{eq}}} + \frac{1}{12}I_{p-p}^2} \tag{5}$$

where  $V_o$  and  $I_{p-p}$  are, respectively, the regulator output voltage and the inductor peak-to-peak current.
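As an illustrative check of (2)–(5), the sketch below evaluates the loss model and confirms that the efficiency at $I_{o\_\text{opt}}$ equals $\eta_{\max}$. It assumes the standard CCM relation $i_{\text{rms}}^2 = I_o^2 + I_{p-p}^2/12$ for a triangular inductor ripple, which is the relation implied by (5). The function names and parameter values are hypothetical, apart from the 140-MHz switching frequency borrowed from Section II.

```python
import math


def buck_loss(i_o, r_eq, p_esr, a_sw, f_sw, i_pp):
    """Power loss per (2), with i_rms^2 = I_o^2 + I_pp^2 / 12 in CCM."""
    i_rms_sq = i_o ** 2 + i_pp ** 2 / 12.0
    return r_eq * i_rms_sq + p_esr + a_sw * f_sw


def buck_efficiency(i_o, v_o, r_eq, p_esr, a_sw, f_sw, i_pp):
    """Efficiency per (3)."""
    p_out = v_o * i_o
    return p_out / (p_out + buck_loss(i_o, r_eq, p_esr, a_sw, f_sw, i_pp))


def optimum_load_current(r_eq, p_esr, a_sw, f_sw, i_pp):
    """Optimum load current per (5)."""
    return math.sqrt((a_sw * f_sw + p_esr) / r_eq + i_pp ** 2 / 12.0)


def max_efficiency(v_o, r_eq, i_o_opt):
    """Maximum efficiency per (4)."""
    return 1.0 / (1.0 + 2.0 * r_eq / v_o * i_o_opt)


# Hypothetical parameters (not taken from the paper or its references).
PARAMS = dict(r_eq=0.2, p_esr=1e-3, a_sw=1e-10, f_sw=140e6, i_pp=0.1)
V_O = 1.0

i_opt = optimum_load_current(**PARAMS)
eta_opt = max_efficiency(V_O, PARAMS["r_eq"], i_opt)
print(i_opt, eta_opt, buck_efficiency(i_opt, V_O, **PARAMS))
```

Evaluating `buck_efficiency` slightly above or below `i_opt` gives a lower value than `eta_opt`, which is the efficiency penalty that unbalanced CS imposes on each regulator.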

#### B. Efficiency Degradation of Distributed Regulators With Unbalanced Current Sharing

Consider two identical buck regulators and assume the total load current supplied by these two regulators is  $I_o$  and each regulator design is optimized at  $I_o/2$ . With unbalanced CS, the load currents supplied by the two regulators are, respectively,  $I_1$  and  $I_2$  for regulators 1 and 2. CS ratios (CSRs) for the two regulators are

$$\text{CSR}_1 = \frac{I_1}{I_o}, \quad \text{CSR}_2 = \frac{I_2}{I_o}. \tag{6}$$

According to (2), for CCM operations, the extra power loss induced by the unbalanced CS for two regulators as compared with the balanced case is

$$P_{\text{loss}_2}^{\text{ex}} = R_{\text{eq}} \cdot I_o^2 \cdot \left( \text{CSR}_1^2 + \text{CSR}_2^2 - \frac{1}{2} \right) \tag{7}$$

and $P_{\text{loss}_2}^{\text{ex}} = 0$ if and only if $\text{CSR}_1 = \text{CSR}_2 = 1/2$; otherwise, $P_{\text{loss}_2}^{\text{ex}} > 0$, which means that unbalanced CS leads to extra power loss.

Efficiency degradation due to unbalanced CS can be written as

$$\eta_{\text{deg}_2} = \eta_{\max}|_{I_{o\_\text{opt}} = \frac{I_o}{2}} - \frac{V_o}{\frac{V_o}{\eta_{\max}|_{I_{o\_\text{opt}} = \frac{I_o}{2}}} + R_{\text{eq}} \cdot I_o \cdot \left(\text{CSR}_1^2 + \text{CSR}_2^2 - \frac{1}{2}\right)} \tag{8}$$

where $\eta_{\max}|_{I_{o\_\text{opt}} = I_o/2}$ is the maximum efficiency at the optimum load current of $I_o/2$. Note that $\eta_{\text{deg}_2} = 0$ for balanced CS.

Equations (7) and (8) can be generalized for N identical distributed on-chip VRs with each design optimized at  $I_o/N$  for a total load current of  $I_o$  as explained in the following.

The extra power loss induced by unbalanced CS with  $CSR_i$  for the *i*th regulator is

$$P_{\text{loss}_N}^{\text{ex}} = R_{\text{eq}} \cdot I_o^2 \cdot \left(\sum_{i=1}^N \text{CSR}_i^2 - \frac{1}{N}\right). \tag{9}$$

The total efficiency degradation induced by unbalanced CS is

$$\eta_{\text{deg}_N} = \eta_{\max}|_{I_{o\_\text{opt}} = \frac{I_o}{N}} - \frac{V_o}{\frac{V_o}{\eta_{\max}|_{I_{o\_\text{opt}} = \frac{I_o}{N}}} + R_{\text{eq}} \cdot I_o \cdot \left(\sum_{i=1}^N \text{CSR}_i^2 - \frac{1}{N}\right)}. \tag{10}$$

Note that (9) and (10) can be applied to a wide range of load current. As the phase shedding technique [26], [27] for conventional multiphase converters and the converter gating technique [5] for distributed on-chip VRs are well developed to enhance the light load efficiency and achieve a high efficiency over a wide load range, the number of active VRs  $N_{\text{active}}$  can be dynamically changed to make sure that each regulator can operate at the optimal efficiency point under various load conditions with balanced CS. Thus, (9) and (10) hold for extra power loss and efficiency degradation calculations under a wide load range.
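Equations (9) and (10) translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, $R_{\text{eq}}$ and $V_o$ are assumed values, and $\eta_{\max}$ is computed from (4) so that it stays consistent with the assumed $R_{\text{eq}}$. The 450-mA total load current corresponds to the two-regulator example in this section.

```python
def csr_spread(csr):
    """Sum of squared CSRs minus 1/N; zero only for balanced sharing."""
    if abs(sum(csr) - 1.0) > 1e-9:
        raise ValueError("CSR values must sum to 1")
    return sum(c * c for c in csr) - 1.0 / len(csr)


def extra_power_loss(csr, i_o, r_eq):
    """Extra power loss per (9) for N = len(csr) regulators."""
    return r_eq * i_o ** 2 * csr_spread(csr)


def efficiency_degradation(csr, i_o, r_eq, v_o, eta_max):
    """Efficiency degradation per (10)."""
    return eta_max - v_o / (v_o / eta_max + r_eq * i_o * csr_spread(csr))


# Two regulators sharing 450 mA; R_EQ and V_O are assumed values.
I_O, R_EQ, V_O = 0.45, 0.2, 1.0
ETA_MAX = 1.0 / (1.0 + 2.0 * R_EQ / V_O * (I_O / 2))  # from (4)

for csr in ([0.5, 0.5], [0.4, 0.6], [0.3, 0.7]):
    print(csr,
          extra_power_loss(csr, I_O, R_EQ),
          efficiency_degradation(csr, I_O, R_EQ, V_O, ETA_MAX))
```

Both quantities vanish for the balanced case and grow as the CSR values move away from $1/N$.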

As an example, using the design parameters in [3] for the fully integrated buck regulator, the extra power loss and the efficiency degradation are evaluated for the two- and three-regulator cases with different CSR values. Each regulator is optimized at 225 mA, and the total load currents are, respectively, 450 and 675 mA for the two- and three-regulator cases. As can be seen from Fig. 6, as CSR deviates from the balanced CS point ($\text{CSR}_1 = 0.5$ for the two-regulator case and $\text{CSR}_1 = \text{CSR}_2 = 1/3$ for the three-regulator case), the additional power loss and the efficiency degradation increase rapidly. Moreover, the highest extra power loss and efficiency degradation points for the three-regulator case are worse than those for the two-regulator case. It is difficult to visually demonstrate the extra power loss and the efficiency degradation when the number of regulators increases beyond three. With more regulators and a larger output current, however, the highest extra power loss and efficiency degradation further increase. This indicates that significant attention should be paid to guaranteeing proper CS among the distributed on-chip VRs that are widely used in high-performance microprocessors.



Fig. 6. Unbalanced CS-induced extra power loss and efficiency degradation as a function of  $CSR_i$  for N identical distributed on-chip VRs. (a) Extra power loss, N = 2. (b) Efficiency degradation, N = 2. (c) Extra power loss, N = 3. (d) Efficiency degradation, N = 3.

## IV. STABILITY IMPLICATIONS OF UNBALANCED CURRENT SHARING

Stable operation of the stand-alone on-chip VR as well as the whole PDN is the basis for every other performance metric. Oscillations can occur due to an unstable internal feedback loop of a single VR or interactions among different VRs. The stability issue, if not properly addressed, can adversely affect important design aspects, including line and load regulations, making other performance enhancing techniques useless.

Stability implications of unbalanced CS are explored for both individual on-chip VRs and the PDN as a whole in this section. To evaluate the effects of unbalanced CS on individual on-chip VRs, the state-space averaging method [28] is applied to obtain the various important transfer functions of closed-loop synchronous buck regulators while considering parasitic impedances. For the stability of the whole PDN, the implications of unbalanced CS can be examined by analyzing the Y-parameter model of the individual on-chip VRs based on the recently proposed hybrid stability framework for PDNs [8].

#### A. Stability of Individual On-Chip Voltage Regulators

The state-space expression for a conventional voltage mode controlled buck regulator with diode rectification and g-parameters has been explored in [29]. For the synchronous buck regulator operating in CCM, as shown in Fig. 5, the open-loop g-parameter set can be written as

$$\begin{bmatrix} Y_{i_{o}} & T_{oi_{o}} \\ G_{io_{o}} & -Z_{o_{o}} \end{bmatrix} = \frac{\begin{bmatrix} \frac{D^2s}{L} & \frac{D(1+sR_{\text{ESR}}C)}{LC} \\ \frac{D(1+sR_{\text{ESR}}C)}{LC} & -\frac{(R_E+sL)(1+sR_{\text{ESR}}C)}{LC} \end{bmatrix}}{s^2 + s\frac{R_E+R_{\text{ESR}}}{L} + \frac{1}{LC}} \tag{11}$$

$$\begin{bmatrix} G_{\rm ci} \\ G_{\rm co} \end{bmatrix} = \frac{\begin{bmatrix} \frac{D U_E s}{L} \\ \frac{U_E(1+sR_{\rm ESR}C)}{LC} \end{bmatrix}}{s^2 + s\frac{R_E + R_{\rm ESR}}{L} + \frac{1}{LC}} + \begin{bmatrix} I_o \\ 0 \end{bmatrix} \tag{12}$$

where

$$R_E = R_{\text{DCR}} + R_{\text{on}\_hs}D + R_{\text{on}\_ls}(1-D) \tag{13}$$

$$U_E = V_i + (R_{\text{on}\_ls} - R_{\text{on}\_hs})I_o. \tag{14}$$

 $Y_{i_o}$ ,  $T_{oi_o}$ ,  $G_{io_o}$ ,  $Z_{o_o}$ ,  $G_{ci}$ ,  $G_{co}$ , and D are, respectively, the open-loop input admittance, the output to input current transfer function, the input to output voltage transfer function, the output impedance, the control to input current transfer function, the control to output voltage transfer function, and the duty cycle of the buck regulator.

The line and load regulation capabilities of a buck regulator can be examined by analyzing the closed-loop input to output voltage transfer function  $G_{io_c}$  and the output impedance  $Z_{o_c}$ , respectively. To achieve a stable line and load regulation, all poles of the corresponding transfer function need to lie within the left-half of the s-plane. The closed-loop g-parameters can be obtained based on the open-loop g-parameters and the relationship demonstrated in [29]. Assuming Type III compensation [30], the characteristic equation of  $G_{io_c}$  and  $Z_{o_c}$  is

$$CLs^{2} + (CG_{a}G_{cc}G_{se}U_{E}R_{ESR} + CR_{ESR} + CR_{E})s + G_{a}G_{cc}G_{se}U_{E} + 1 = 0 \tag{15}$$

where  $G_{se}$ ,  $G_{cc}$ , and  $G_a$  are, respectively, the sensing gain of the output voltage, the transfer function of the error amplifier (EA) and compensator, and the pulse width modulation generator gain. Typically,  $G_{se}$  and  $G_a$  are constant. As some of the coefficients are a function of  $I_o$ , solutions of (15) change as  $I_o$  changes. For N identical distributed on-chip VRs with unbalanced CS, some of the parallel on-chip VRs will supply more current while others will supply less, leading to the movement of system poles. As the stability is affected by the right-half-plane (RHP) poles, we define a CSR- and N-dependent function S(CSR, N) as

$$S(\text{CSR}, N) = \begin{cases} \max_{i=1,\dots,n} \{\text{Re}(p_i)\}, & \max_{i=1,\dots,n} \{\text{Re}(p_i)\} < 0\\ \min_{i=1,\dots,j} \{\text{Re}(p_i^+)\}, & \text{otherwise} \end{cases} \tag{16}$$

where $n$, $j$, $p_i$ $(i = 1,\dots,n)$, and $p_i^+$ $(i = 1,\dots,j)$ are, respectively, the total number of system poles, the total number of RHP (or 0) poles, the $i$th system pole, and the $i$th RHP (or 0) pole. $|S(\text{CSR}, N)|$ indicates either how close the system is to being unstable (for $\max_{i=1,\dots,n} \{\text{Re}(p_i)\} < 0$) or how far the system has gone beyond the marginally stable point (otherwise). The system is stable if $S(\text{CSR}, N) < 0$ and unstable otherwise.
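Given a set of closed-loop poles, the metric in (16) reduces to a short routine. The sketch below is a hypothetical helper, not the authors' implementation; it takes the poles as already computed, for example from the roots of (15) at a given operating point, and the pole values in the usage lines are illustrative only.

```python
def stability_metric(poles):
    """Evaluate S per (16) from an iterable of (possibly complex) poles.

    Returns the largest real part when every pole lies strictly in the
    left-half plane, and otherwise the smallest real part among the poles
    with a nonnegative real part.
    """
    real_parts = [complex(p).real for p in poles]
    worst = max(real_parts)
    if worst < 0:
        return worst
    return min(r for r in real_parts if r >= 0)


# Illustrative pole sets (hypothetical values).
s_stable = stability_metric([-2 + 1j, -2 - 1j, -5])
s_unstable = stability_metric([-1, 0.5, 3])
print(s_stable, s_unstable)
```

A negative return value corresponds to a stable system and a nonnegative one to an unstable or marginally stable system, matching the sign convention stated above.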

Using design parameters similar to those in [3], $S(\text{CSR}_i, N)$ for the $i$th VR within N identical distributed on-chip VRs is plotted as a function of $\text{CSR}_i$ and N in Fig. 7. It can be seen from Fig. 7 that, for a fixed N, $S(\text{CSR}_i, N)$ increases as $\text{CSR}_i$ increases. Note that although all $\text{CSR}_i$ values are plotted even for large N in Fig. 7 for completeness, due to the maximum current supply capability of a single VR, the inductor current of an individual VR can become saturated, in which case the CCM model is no longer valid. The output voltage can drop [22] for large N and $\text{CSR}_i$ values, for example, N = 80 and $\text{CSR}_i = 0.5$. Also, as N becomes large, $S(\text{CSR}_i, N)$ approaches the unstable region from the stable one as $\text{CSR}_i$ increases, indicating the negative effects of unbalanced CS on the stability and proper operation of an individual VR.

Fig. 7. Stability of individual on-chip VR as a function of $CSR_i$ and N.

#### B. Stability of the Power Delivery Network

A sufficient condition for stability checking of the PDN is proposed in [8] based on the hybrid stability framework. This condition consists of a complementary way of using either passivity evaluation or system gain evaluation for linear time-invariant systems. By satisfying either one of these two conditions, the stability of the PDN can be guaranteed. For stability checking using the system gain condition, a Z-parameter model of the passive subnetwork is needed for evaluation. The passive subnetwork model can vary for different applications or design requirements, which makes it difficult to evaluate the general effects of unbalanced CS on the stability of the PDN. However, the passivity evaluation does shed light on this point.

The synchronous buck regulator system is approximated as a linear continuous-time time-invariant system through the state-space averaging method [31]. Thus, the passivity criterion [8] can be applied, which is given by

$$\lambda_{\min}(j\omega_k) = \min_{i=1,\dots,N; \, j=1,2} \left\{ \lambda_j \left( \mathbf{Y}_i(j\omega_k) + \mathbf{Y}_i^H(j\omega_k) \right) \right\} \tag{17}$$

where $\lambda_{\min}(j\omega_k)$ is the minimum eigenvalue over all N VRs at $\omega_k$, $\mathbf{Y}_i$ is the Y-parameter matrix of the $i$th VR, and $H$ denotes the complex conjugate transpose. The passivity condition is met for the VRs if $\lambda_{\min}(j\omega_k) \ge 0$.

The Y-parameter model for the $i$th VR can be obtained through the closed-loop g-parameters. Note that the Y-parameter model is a function of the individual VR output current $I_o$; thus, unbalanced CS affects both the model and $\lambda_{\min}(j\omega_k)$. Using the same design parameters as in Section IV-A, $\lambda_{\min}^i(j\omega_k)$ is examined for the $i$th VR under different $\text{CSR}_i$ and N values in Fig. 8, where

$$\lambda_{\min}^{i}(j\omega_{k}) = \min_{j=1,2} \left\{\lambda_{j} \left( \mathbf{Y}_{i}(j\omega_{k}) + \mathbf{Y}_{i}^{H}(j\omega_{k}) \right) \right\}. \tag{18}$$
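The eigenvalue tests in (17) and (18) can be sketched with NumPy as follows. This is a minimal illustration that assumes each VR is described by a $2 \times 2$ complex Y-parameter matrix evaluated at a single frequency point; the function names and the matrices in the usage lines are hypothetical and do not come from the regulator model.

```python
import numpy as np


def lambda_min_single(y):
    """Minimum eigenvalue of Y + Y^H for one VR, per (18)."""
    y = np.asarray(y, dtype=complex)
    hermitian_part = y + y.conj().T
    # eigvalsh returns the real eigenvalues of a Hermitian matrix.
    return float(np.linalg.eigvalsh(hermitian_part).min())


def lambda_min_all(y_list):
    """Minimum over all VRs at one frequency point, per (17)."""
    return min(lambda_min_single(y) for y in y_list)


# Hypothetical Y matrices at a single frequency (illustrative values only).
Y_PASSIVE = [[1 + 1j, 0.2], [0.2, 0.5 - 2j]]
Y_ACTIVE = [[1.0, 0.0], [0.0, -0.3 + 1j]]

print(lambda_min_single(Y_PASSIVE))
print(lambda_min_single(Y_ACTIVE))
print(lambda_min_all([Y_PASSIVE, Y_ACTIVE]))
```

A nonnegative result from `lambda_min_all` indicates that the passivity condition holds at that frequency point; sweeping $\omega_k$ would require evaluating the Y-parameter model at each frequency.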

Fig. 8. $\lambda_{\min}^{i}(j\omega_{k})$ as a function of $f_{k}$ under different values of $\text{CSR}_i$ and N. $\lambda_{\min}^{i}(j\omega_{k})$ shifts rightward as $N \cdot \text{CSR}_{i}$ increases, demonstrating the adverse effects of unbalanced CS on VR passivity.

$\lambda_{\min}^{i}(j\omega_{k})$ remains negative for $f_{k} < 10$ MHz and positive for $f_{k} > 100$ MHz. As $I_{o}$ supplied by the $i$th VR increases (i.e., as $N \cdot \text{CSR}_{i}$ increases), $\lambda_{\min}^{i}(j\omega_{k})$ shifts rightward, rendering the following:

$$\lambda_{\min}(j\omega_k)|_{\omega_k \le \omega_{k0}} = \min_{i=1,\dots,N} \{\lambda^i_{\min}(j\omega_k)\} = \lambda^i_{\min}(j\omega_k)|_{\text{CSR}_i = \text{CSR}_{\max}} \tag{19}$$

where

$$\lambda_{\min}(j\omega_{k0}) = 0, \quad \text{CSR}_{\max} = \max_{i=1,\dots,N} \text{CSR}_i. \tag{20}$$

For example, at $f_k = 45$ MHz with balanced CS (for any N), $\lambda_{\min}(j\omega_k)|_{\omega_k=9\pi \cdot 10^7} > 0$ and the passivity condition is satisfied. However, in an unbalanced CS case (e.g., N = 20, $\text{CSR}_i = 0.1$), $\lambda_{\min}(j\omega_k)|_{\omega_k=9\pi \cdot 10^7} < 0$, which pushes the originally passive point into the potentially unstable region, indicating the adverse effects of unbalanced CS on the stability of the whole PDN.

## V. RELIABILITY IMPLICATIONS OF UNBALANCED CURRENT SHARING

Electromigration (EM)-induced wear-out dictates the lifetime of each component of the PDN. EM results in gradual mass transport in metal conductors along the direction of an applied electric field, which, in turn, may cause open or short circuits. The metal wires in the PDN are particularly vulnerable to EM as they experience unidirectional currents [32], and such constant stress reveals EM failures faster. EM grows with current density *J*.

Black's equation [33] captures the mean time to failure (MTTF) due to EM

$$\text{MTTF} = A J^{-n}\exp\left(\frac{E_a}{kT}\right) \tag{21}$$

where $A$ is a constant that depends on the geometry, $E_a$ is the EM activation energy, $k$ is Boltzmann's constant, $n$ is a material-specific constant, and $T$ is the temperature. Following [18], Black's equation can be adjusted to consider current crowding and Joule heating as

$$\text{MTTF} = A(cJ)^{-n} \exp\left[\frac{Q}{k(T + \Delta T)}\right] \tag{22}$$

where both Q and c are material-specific constants.



Fig. 9. MTTF as a function of  $CSR_i$ .

Consider N identical distributed on-chip VRs, each of which is optimized for a load current of $I_o/N$, where $I_o$ represents the total load current. Since $J$ is directly related to $\text{CSR}_i$ at a specific $I_o$, the MTTF of the metal wire at the output of the $i$th regulator can be expressed in terms of $\text{CSR}_i$ as

$$\text{MTTF}_i = A'(c\,\text{CSR}_i)^{-n} \exp\left[\frac{Q}{k(T + \Delta T)}\right] \tag{23}$$

where A' is a constant that depends on the geometry and  $I_o$ .

For the same example in [3], for the two- and three-regulator cases with total load currents of 450 and 675 mA, respectively, Fig. 9 shows how $\text{MTTF}_i$ for the $i$th regulator changes due to unbalanced CS. Fig. 9 captures the impact of unbalanced CS on MTTF under EM per (23). We report how the MTTF varies as a function of CSR, where n = 1.8, Q = 0.8 eV, c = 10, and $\Delta T = 40$ °C [34]. We observe that differences in CSR can result in notable differences in MTTF. The MTTF at CSR = 0.5 (0.33), which corresponds to perfect load balance, is 5 years at 65 °C for the two (three) regulator case. For the two-regulator case, both regulators would have the same MTTF of 5 years at CSR = 0.5. If CSR assumes a value higher than 0.5 for one of the regulators, its MTTF quickly decreases below 5 years. The other regulator's CSR in this case remains lower than 0.5 and hence yields an MTTF of more than 5 years. In this case, one of the regulators would fail much earlier than the other. Better load balance (i.e., CSR = 0.5 for the two-regulator case) mitigates this adverse effect on reliability. Fig. 9 reveals a similar trend for the three-VR case.
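The CSR dependence in (23) can be illustrated by scaling from the balanced operating point. The sketch below assumes that the exponential term in (23) is identical for the two cases being compared (same $T$ and $\Delta T$), so that only the $(c\,\text{CSR}_i)^{-n}$ factor changes; the function name is hypothetical, while n = 1.8 and the 5-year balanced MTTF are the values quoted above.

```python
def mttf_scaled(csr, csr_ref, mttf_ref, n=1.8):
    """Scale a reference MTTF to another CSR using the power law in (23).

    Assumes the temperature-dependent exponential term is unchanged, so
    MTTF(csr) / MTTF(csr_ref) = (csr / csr_ref) ** (-n).
    """
    return mttf_ref * (csr / csr_ref) ** (-n)


# Two-regulator case: 5 years at the balanced point CSR = 0.5.
for csr in (0.4, 0.5, 0.6):
    print(csr, mttf_scaled(csr, csr_ref=0.5, mttf_ref=5.0))
```

The regulator carrying the larger share falls below the 5-year reference, while the lightly loaded one exceeds it, which is the asymmetry described in the text.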

## VI. ADAPTIVE REFERENCE VOLTAGE CONTROL

The implications of unbalanced CS on power efficiency, stability, reliability, and overall functionality of the chip were demonstrated in the preceding sections. Balanced CS is beneficial to maintain the overall PDN performance. An adaptive reference voltage control method designed specifically for distributed on-chip VRs is proposed to balance the CS. The proposed technique is scalable to different numbers of distributed on-chip VRs and can be used for different types of VRs. The control algorithm is explained, and circuit implementation and simulations are presented to verify the effectiveness of the proposed techniques. Practical concerns are also addressed in this section.

## A. Adaptive V<sub>ref</sub> Control Mechanism

Consider two identical distributed VRs connected to the same power grid. The simplified model is shown in Fig. 10 with the power grid effective resistance included between any two connection nodes within the grid. With a large number of VDD C4 pads, the input voltage of the VRs $V_i$ can be considered ideal and constant. To perform a steady-state analysis with multiple VRs, suppose $V_{o1} = V_{o2}$; then $I_3 = 0$, and $R_{eff3}$ can be treated as an open circuit. When $V_{o1} = V_{o2}$, to make $I_1 = I_2$ for balanced CS, $R_{eff1}$ and $R_{eff2}$ have to be equal. However, in practice, due to the location variations of the VRs with respect to the load, $R_{eff1}$ and $R_{eff2}$ can hardly be equal, which means that a difference between $V_{o1}$ and $V_{o2}$ is unavoidable if $I_1 = I_2$ is to be achieved for balanced CS. In fact, the effective resistances $R_{eff1}$, $R_{eff2}$, and $R_{eff3}$ are very small, making balanced CS possible with quite small variations of $V_{o1}$ and $V_{o2}$ and with negligible effects on the regulated output voltage $V_o$.

Fig. 10. Simplified model of two identical distributed on-chip VRs with power grid effective resistances.

Based on the above-mentioned analyses, an adaptive reference voltage $V_{ref}$ control mechanism that is tailored specifically for distributed on-chip VRs is proposed. A system-level block diagram of the proposed adaptive $V_{ref}$ control method is shown in Fig. 11, and the $V_{ref}$ control algorithm is presented in Fig. 12 for N identical distributed on-chip VRs. The proposed adaptive $V_{ref}$ control block consists of an average current sensor within each VR, two comparators with N inputs each (N comparator) [35] to determine the maximum and minimum currents, a *current\_mismatch* decision block, and a $V_{ref}$ control logic. For each iteration, the average current value of each VR for that cycle is obtained through the average current sensor and represented by the respective output voltage $V_{\text{sense}_i}$ (i = 1, ..., N). The maximum and minimum values of $V_{\text{sense}_i}$ (i = 1, ..., N) are determined by the N comparator [35]. The difference between the maximum and minimum current is compared with a *current\_mismatch* value by the *current\_mismatch* decision block. The processed outputs of the N comparator and the *current\_mismatch* decision block serve as the control signals for the $V_{ref}$ control logic for multilevel $V_{ref}$ generation through the switch network and resistor string. Mismatch between the maximum and minimum average inductor current indicates unbalanced CS. If the mismatch is larger than the threshold *current\_mismatch*, the proposed $V_{ref}$ control algorithm is triggered and the corresponding reference voltages are adjusted. The *current\_mismatch* value is added as an option to adjust the desired accuracy for the current matching among the VRs and to eliminate constant toggling during steady state where all the VR output currents are close to each other. If the optimal load current [$I_{o_{opt}}$ in (5)] a single VR can supply is in the range of several hundred mA, a few mA of the threshold value can be considered as balanced current. A threshold value of 30 mA is used in the simulations. Too small a threshold value can lead to toggling reference voltages at steady state.

Fig. 11. System-level block diagram of the proposed $V_{ref}$ control method and multilevel $V_{ref}$ generator for N identical distributed on-chip VRs.

Fig. 12. Flowchart of the proposed adaptive $V_{ref}$ control algorithm.

By increasing (decreasing) $V_{ref}$ of an individual on-chip VR, the output current supplied by that VR will increase (decrease). $V_{ref_max}$ and $V_{ref_min}$ in Fig. 12 denote the reference voltages for the on-chip VRs with the maximum and minimum average inductor current, respectively. Once the difference between the maximum and minimum average inductor current values is greater than *current\_mismatch*, $V_{ref_max}$ is decreased by a voltage step to decrease the output current supplied by the VR that provides the maximum output current, and $V_{ref_min}$ is increased by a voltage step to increase the output current supplied by the VR that provides the minimum output current. The reference voltages of the other VRs remain unchanged.

Note that the $V_{\text{ref}}$ control loop waits *n* clock cycles before changing $V_{\text{ref}}$ again. This allows the VR's voltage regulation feedback loop to respond before any change is made to $V_{\text{ref}}$ in the next step. Making the reference control loop slower than the VR's voltage regulation feedback loop improves the stability of the overall system.
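The update rule described above can be sketched behaviorally as follows. This is a minimal sketch under stated assumptions, not the circuit-level implementation: each VR is modeled as an ideal source of value `gain * vref` (a gain of 2 mirrors the 0.5 V reference and 1 V output used in the simulations) in series with a single hypothetical effective resistance to one shared load node, the inter-regulator resistance is omitted, and the wait of *n* clock cycles is represented by recomputing the steady state between updates. The function names and the resistance values are illustrative only.

```python
def vr_currents(vrefs, r_effs, i_load, gain=2.0):
    """Steady-state VR currents for ideal sources feeding one load node."""
    v_outs = [gain * v for v in vrefs]
    g_sum = sum(1.0 / r for r in r_effs)
    # Kirchhoff's current law at the shared load node gives its voltage.
    v_load = (sum(v / r for v, r in zip(v_outs, r_effs)) - i_load) / g_sum
    return [(v - v_load) / r for v, r in zip(v_outs, r_effs)]


def balance_vrefs(r_effs, i_load, vref_init=0.5, v_step=1e-3,
                  current_mismatch=0.03, max_iters=1000):
    """Apply the max/min reference update until the spread is in threshold.

    Returns the final reference voltages, the final currents, and the
    number of reference updates that were applied.
    """
    vrefs = [vref_init] * len(r_effs)
    for updates in range(max_iters):
        currents = vr_currents(vrefs, r_effs, i_load)
        i_max, i_min = max(currents), min(currents)
        if i_max - i_min <= current_mismatch:
            return vrefs, currents, updates
        # list.index returns the smallest index on ties, which mirrors the
        # tie-breaking choice of the digital logic described in the text.
        vrefs[currents.index(i_max)] -= v_step
        vrefs[currents.index(i_min)] += v_step
    raise RuntimeError("did not converge; v_step may be too coarse")


if __name__ == "__main__":
    # Hypothetical two-VR case: 0.2 and 0.3 ohm paths, 450 mA total load.
    vrefs, currents, updates = balance_vrefs([0.2, 0.3], i_load=0.45)
    print(updates, [round(v, 3) for v in vrefs],
          [round(i, 3) for i in currents])
```

With these hypothetical values the initial split is 270 mA and 180 mA, and the loop stops after four updates with currents of roughly 238 mA and 212 mA. If the step is made so coarse that one update changes the spread by more than the threshold, the loop can toggle instead of settling, which is the behavior the *current\_mismatch* threshold is meant to prevent.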

As compared with [6], the proposed method does not rely on equalizing duty cycles to balance the CS, and thus can be applied to most regulator types that need a reference voltage to operate. Furthermore, as the reference voltage of each VR is adjusted individually with respect to an initial reference voltage, the power noise on the local power grids is less affected by localized load fluctuations.

Fig. 13. Schematic of the average current sensor.

## B. Adaptive V<sub>ref</sub> Control Implementation

Circuit-level implementation of the proposed adaptive $V_{\text{ref}}$ control method is analyzed in this section. Although a buck regulator is adopted for demonstration, the proposed $V_{\text{ref}}$ control method can be applied to other regulator types by adopting an appropriate current sensor for that regulator type, as the proposed method is a general way of modulating $V_{\text{ref}}$ to balance the current.

1) Average Current Sensor: The schematic of the average current sensor [36] is shown in Fig. 13. When the sampling clock $\phi$ becomes high, the drain voltages of the power MOSFET and the sense MOSFET are equalized by the operational amplifier. The inductor current from the power MOSFET is mirrored to the sense MOSFET and a corresponding voltage $V_{\text{sense}}$ that is proportional to the inductor current is generated as output. $V_{\text{sense}}$ is maintained when $\phi$ becomes low. By replacing the ramp signal in Fig. 5 with the symmetrical triangular waveform shown in Fig. 11, a clock signal $\phi'$ can be generated to sample the instantaneous inductor current value in the middle of the inductor energizing or deenergizing phase, which corresponds to the average inductor current value [36]. As *n* clock cycles need to be skipped before taking the next sample of the average inductor current, the frequency $f_{\phi}$ of the actual sampling clock signal $\phi$ needs to be $f_{\phi'}/(n+1)$.

2) N Comparator: The schematic of the N comparator [35] for maximum and minimum current decision is shown in Fig. 14. $V_{\text{sense}_i}$ (i = 1, ..., N) from the output of the average current sensor serves as the input of the N comparator. For the N comparator for maximum current decision, the tail current provided by transistor $M_{\text{tail}}$ is divided into each branch equally when the same voltage is given to all inputs. $M_i$ (i = 1, ..., N) devices are biased and sized appropriately $[(W/L)_{M_{\text{tail}}} = N(W/L)_{M_i}]$ to reflect this distribution. The voltage input $V_{\text{sense}_i}$ determines the portion of the tail current that passes through each branch. Since the sum of the currents from all the branches must be equal to the tail current provided by the $M_{\text{tail}}$ device, the branch with the highest input voltage gets the largest portion of the tail current. The branch currents are then mirrored and a high-resistance output node is formed using the $M_i$ (i = 1, ..., N) devices. Since the $M_i$ (i = 1, ..., N) devices are biased for 1/N of the tail current, the output voltage becomes logic high when a branch gets more than 1/N of the tail current, which is true for the branch with the highest voltage, and logic low if a branch gets less than 1/N of the tail current. The high-resistance node provides high gain at the output, but further cascading may be needed to provide rail-to-rail outputs. An input voltage difference of less than 1 mV can be distinguished by cascading three stages in the simulations. In the case where the input voltages are very close to each other, this comparator may give incorrect outputs where more than one current is indicated as minimum or maximum. To handle this case, the outputs of the N comparator $V_{\max_i}$ and $V_{\min_i}$ (i = 1, ..., N) are processed by a digital logic to generate $V'_{\max_i}$ and $V'_{\min_i}$ (i = 1, ..., N) to control the *current\_mismatch* decision block and the $V_{ref}$ control logic shown in Fig. 11. If there is more than one maximum or minimum current, the digital logic simply selects the VR with the smaller *i* as the one that supplies the maximum or minimum current. The N comparator for minimum current decision can be implemented as a complement of the N comparator for maximum current decision shown in Fig. 14.

Fig. 14. Schematic of the analog N comparator for maximum and minimum current decision.
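The tie-breaking step performed by the digital logic can be illustrated with a small behavioral model. This is a sketch under assumptions rather than a description of the actual circuit: the finite resolution of the analog comparator is represented by a hypothetical `resolution` parameter (1 mV here, used only as a stand-in), and the function names are illustrative.

```python
def one_hot_select(flags):
    """Keep only the lowest-index asserted flag, as the digital logic does."""
    selected = [False] * len(flags)
    for i, flag in enumerate(flags):
        if flag:
            selected[i] = True
            break
    return selected


def n_comparator(v_sense, resolution=1e-3):
    """Behavioral max/min decision with an assumed finite input resolution.

    Returns two one-hot lists: the processed maximum and minimum selections.
    """
    v_hi, v_lo = max(v_sense), min(v_sense)
    # Inputs closer together than the resolution cannot be told apart,
    # so several raw outputs may be asserted at the same time.
    raw_max = [v_hi - v < resolution for v in v_sense]
    raw_min = [v - v_lo < resolution for v in v_sense]
    return one_hot_select(raw_max), one_hot_select(raw_min)


if __name__ == "__main__":
    # VR 1 and VR 2 are within the assumed resolution of each other.
    sel_max, sel_min = n_comparator([0.3002, 0.3005, 0.2500])
    print(sel_max, sel_min)
```

In this example the first two inputs are indistinguishable at the assumed resolution, so the model selects the first VR as the maximum and the third as the minimum.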

3) Current\_Mismatch Decision: The schematic of the *current\_mismatch* decision block is shown in Fig. 15. The processed outputs of the N comparator $V'_{\max_i}$ and $V'_{\min_i}$ (i = 1, ..., N) are fed to 2N transmission gates as selection signals for the maximum and minimum value of $V_{\text{sense}}$ (i = 1, ..., N). The maximum and minimum values of $V_{\text{sense}}$ serve as the inputs of the *current\_mismatch* comparator as, respectively, $V_{\text{max}}$ and $V_{\text{min}}$ to generate the enable signal EN for the subsequent $V_{ref}$ control logic. An intentional input transistor size mismatch is introduced for the *current\_mismatch* comparator, with a larger transistor connected to $V_{\min}$ than to $V_{\text{max}}$, to achieve the offset voltage $V_{offset}$ that corresponds to the *current\_mismatch* value. Only when $V_{\text{max}} - V_{\text{min}} > V_{\text{offset}}$ will the EN signal be active. As *current\_mismatch* does not need to be accurate as long as it is larger than $\Delta(\Delta I)$, as will be discussed next, practical circuit implementations considering process variations have negligible impacts on the circuit function.

Fig. 15. Schematic of the *current\_mismatch* decision block.

4) Multilevel  $V_{ref}$  Generation: The proposed multilevel  $V_{ref}$  generator is composed of a  $V_{ref}$  control logic, a bandgap voltage reference, and a simple resistor string digital-to-analog converter (DAC) as shown in Fig. 11. There are two resistors with large resistance  $R_b$  at the top and the bottom of the string and a few resistors with smaller resistance  $R_s$  connected in the middle to generate the desired  $V_{refs}$ .  $V'_{max_i}$ ,  $V'_{min_i}$  (i = 1, ..., N), EN, and a clock signal, which is a delayed version of  $\phi$ , are given to the  $V_{ref}$  control logic. This logic determines how the reference voltages for each VR should behave according to the algorithm in Fig. 12. The logic can be implemented completely in Verilog and synthesized.

The reference voltage generation requires analog implementation, and this implementation can be a resistor string DAC. The voltage step level that can achieve the desired *current\_mismatch* value is the LSB of the DAC. The goal of the adaptive $V_{ref}$ control method is to achieve $\Delta I = I_{\text{max}} - I_{\text{min}} < current\_mismatch$. If $\Delta I = \Delta I_0$ without $V_{ref}$ control and one voltage step changes $\Delta I$ by $\Delta(\Delta I)$, the number of bits for the DAC ($N_{\text{DAC}}$) that is fine enough for balanced CS can be estimated as $N_{\text{DAC}} > \log_2(\Delta I_0 / \Delta(\Delta I))$. A 7-bit DAC is used to achieve a 30-mA *current\_mismatch* value with a voltage step of 1 mV in the simulations. In the case where a large number of VRs and a high-resolution DAC are needed, a charge pump can be utilized for each phase after the $V_{ref}$ control logic for DAC implementation to avoid possible routing problems induced by the resistor string.
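The bit-count estimate above can be evaluated with a short helper. The sketch below simply returns the smallest integer satisfying $N_{\text{DAC}} > \log_2(\Delta I_0 / \Delta(\Delta I))$; the function name and the example currents are hypothetical and are not taken from the reported simulations, and a practical design would likely add margin beyond this lower bound.

```python
import math


def min_dac_bits(delta_i0, delta_per_step):
    """Smallest integer N_DAC with N_DAC > log2(delta_i0 / delta_per_step).

    delta_i0:       current mismatch without V_ref control (A)
    delta_per_step: change in mismatch caused by one voltage step (A)
    """
    if delta_i0 <= 0 or delta_per_step <= 0:
        raise ValueError("both current values must be positive")
    ratio = delta_i0 / delta_per_step
    # floor(...) + 1 keeps the inequality strict even for exact powers of two.
    return max(1, math.floor(math.log2(ratio)) + 1)


if __name__ == "__main__":
    # Hypothetical values: 90 mA initial mismatch, 16 mA change per step.
    print(min_dac_bits(0.090, 0.016))
```

For the hypothetical 90 mA initial mismatch and 16 mA change per step, the ratio is 5.625 and the bound evaluates to 3 bits.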

## C. Simulation Verifications

To demonstrate the effectiveness of the proposed control method, two and three identical distributed on-chip VR cases are simulated. The power grid parameters are provided in Section VII. Simulation results with constant dc load current are shown in Figs. 16 and 17, respectively, for the two and three VR cases. In the simulations, an ideal $V_{ref} = 0.5$ V is used to realize a 1 V output voltage. A $V_{ref}$ step of 1 mV is used in the simulations. The proposed adaptive $V_{ref}$ control method begins to operate at 5 $\mu$s. As can be seen from Figs. 16(a) and (c) and 17(a) and (c), for stand-alone VRs operating without proper $V_{ref}$ control, large inductor current variations occur among those VRs. After the proposed $V_{ref}$ control mechanism is applied, as seen from Figs. 16(a) and (b) and 17(a) and (b), the unbalanced current converges quickly to the balanced one for both the two and three VR cases. Also, as can be seen from Figs. 16(d) and 17(d), only small variations of the reference voltage lead to a good inductor current match while the proper operation of the VRs is maintained. Simulation results with a fast changing sinusoidal and a step current load are shown in Fig. 18. In the simulations, the frequency of the sinusoidal wave is ten times the VR switching frequency. As can be seen from Fig. 18, the proposed $V_{ref}$ control method works well under changing load currents.

Fig. 16. Simulation results with and without the proposed adaptive $V_{\text{ref}}$ control scheme for two identical distributed on-chip VRs. (a) Inductor currents before and after the proposed $V_{\text{ref}}$ control is applied. (b) Zoomed-in view of balanced CS showing the effectiveness of the proposed $V_{\text{ref}}$ control method. (c) Zoomed-in view of unbalanced CS without the proposed $V_{\text{ref}}$ control. (d) $V_{\text{refs}}$ signal change showing the operation of the proposed $V_{\text{ref}}$ control method.

Fig. 17. Simulation results with and without the proposed adaptive $V_{\text{ref}}$ control scheme for three identical distributed on-chip VRs. (a) Inductor currents before and after the proposed $V_{\text{ref}}$ control is applied. (b) Zoomed-in view of balanced CS showing the effectiveness of the proposed $V_{\text{ref}}$ control method. (c) Zoomed-in view of unbalanced CS without the proposed $V_{\text{ref}}$ control. (d) $V_{\text{refs}}$ signal change showing the operation of the proposed $V_{\text{ref}}$ control method.



Fig. 18. Simulation results with sinusoidal and step load current for three identical distributed on-chip VRs. (a) Sinusoidal load current applied at 2  $\mu$ s. (b) Step load current waveform applied. (c) Balanced inductor currents under sinusoidal current load. (d) Balanced inductor currents under step current load. (e) Zoomed-in view of balanced inductor currents near the rising edge of the step current load. (f) Zoomed-in view of balanced inductor currents near the falling edge of the step current load.

## D. Practical Concerns

Considering the practical implementations of the $V_{ref}$ control method, there are parasitic impedances between the generated reference voltage and the corresponding EA introduced by the distribution wires. The impedance of the distribution wires among different VRs can be different. Also, there can be VR component and control loop mismatches. Considering these effects, simulations are performed by introducing wire resistances and capacitances as well as VR component and loop delay mismatches to verify the effectiveness of the proposed method. A 1-mm distribution wire is assumed in the simulations. Based on the IBM 130 nm process, the parasitic resistance and capacitance are, respectively, around 70 $\Omega$ and 230 fF. A 10% mismatch is introduced among the VRs regarding distribution wire impedance, L, C, $R_{\text{DCR}}$, $R_{\text{ESR}}$, and Q1 and Q2 size. A 5-ns control loop delay difference is introduced among the phases. The simulation results for three phases are shown in Fig. 19. As can be seen from the simulation results, the proposed method remains effective under these mismatches.

## VII. CASE STUDY: IBM POWER8-LIKE MICROPROCESSOR

## A. Benchmarks

All the benchmarks used in the simulations are from SPLASH2x [37]. The selected benchmarks represent typical application domains and features. Eight threads are involved in the simulations, and analysis is limited to the region-of-interest of the benchmarks.



Fig. 19. Simulation results with and without the proposed adaptive  $V_{\rm ref}$  control scheme for three distributed on-chip VRs under distribution wire and VR mismatches. (a) Inductor currents before and after the proposed  $V_{\rm ref}$  control is applied. (b) Zoomed-in view of balanced CS showing the effectiveness of the proposed  $V_{\rm ref}$  control method under distribution wire and VR mismatches. (c) Zoomed-in view of unbalanced CS without the proposed  $V_{\rm ref}$  control. (d)  $V_{\rm refs}$  signal change showing the operation of the proposed  $V_{\rm ref}$  control method.

## B. Architecture

An IBM POWER8-like [17] processor is modeled to quantitatively characterize unbalanced CS effects. The technology and architecture parameters of the processor are summarized in Table I. The schematic of a core is shown in Fig. 20(a), which contains a private L2, an instruction scheduling unit, an execution unit, a load store unit (LSU), and an instruction fetch unit (IFU). L1 data cache is a part of LSU, while L1 instruction cache resides inside IFU. Fig. 20(b) shows the whole chip floor plan, which contains 8 cores, 96 identical on-chip regulators, shown as little squares, network-on-chip, and memory controller.

## C. Simulation Framework

Dynamic power traces are collected by integrating the MR2 [38] version of McPAT [39] into the SNIPER6.0 [40] microarchitectural simulator. Then, we calculate the static power of each unit based on its temperature and area. We use the equation from [41] to capture the temperature dependence of static power. The static power of the whole chip is calibrated such that it takes less than 30% of the total chip power at 80 °C. Hotspot6.0 [42] is used to find the transient temperature across the chip. The transient temperature (output of Hotspot) is used to calculate the static power (input to Hotspot), so we iteratively run Hotspot and update the static power numbers until they converge. Default parameters of Hotspot are used. VoltSpot is deployed to capture the current distribution among VRs at different locations, and the method from [18] is followed to generate cycle-accurate power traces. One sample contains 2K cycles, and 200 samples are obtained at equal intervals for each application. The first 1K cycles are used for warm-up and the rest for analysis. Four clock cycles are used as the power trace sampling interval.
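The temperature and static power iteration described above is a fixed-point computation, which can be sketched with toy models. The sketch below is only an illustration of the loop structure: the lumped thermal resistance stands in for Hotspot, the exponential leakage expression stands in for the model of [41], and every numeric parameter is a hypothetical placeholder rather than a value used in this work.

```python
import math


def converge_static_power(p_dynamic, p_static_ref=20.0, t_ref=80.0,
                          leak_coeff=0.01, t_ambient=45.0, r_thermal=0.3,
                          tol=1e-6, max_iters=100):
    """Iterate temperature <-> static power until the static power settles.

    Toy stand-ins (assumptions, not the actual tools):
      temperature  = t_ambient + r_thermal * (p_dynamic + p_static)
      static power = p_static_ref * exp(leak_coeff * (temperature - t_ref))
    Returns (temperature in deg C, static power in W).
    """
    p_static = p_static_ref
    for _ in range(max_iters):
        # "Thermal simulation" step: total power in, temperature out.
        temperature = t_ambient + r_thermal * (p_dynamic + p_static)
        # "Leakage model" step: temperature in, updated static power out.
        p_next = p_static_ref * math.exp(leak_coeff * (temperature - t_ref))
        if abs(p_next - p_static) < tol:
            return temperature, p_next
        p_static = p_next
    raise RuntimeError("static power iteration did not converge")


if __name__ == "__main__":
    temperature, p_static = converge_static_power(p_dynamic=100.0)
    print(f"T = {temperature:.2f} C, static power = {p_static:.2f} W")
```

With these placeholder values each pass changes the static power estimate by only a few percent of the previous change, so the loop settles within a handful of iterations. A stronger temperature dependence or a larger thermal resistance would slow convergence.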

TABLE I

Technology and Architecture Parameters

| Category     | Parameter       | Value                                      |
|--------------|-----------------|--------------------------------------------|
| Technology   | Technology node | 22 nm                                      |
| Technology   | Frequency       | 4.0 GHz                                    |
| Technology   | TDP             | 150 W                                      |
| Technology   | Area            | 441 mm$^2$                                 |
| Technology   | Vdd             | 1.03 V                                     |
| Architecture | # cores         | 8                                          |
| Architecture | Issue width     | 8                                          |
| Architecture | Register files  | 64 architected FRF, 32 architected IRF     |
| Architecture | L1-I cache      | 32 KB, 8-way, 64 B, LRU, 1-cycle hit       |
| Architecture | L1-D cache      | 64 KB, 8-way, 64 B, LRU, 1-cycle hit       |
| Architecture | L2 cache        | 512 KB, 8-way, 128 B, LRU, 11-cycle hit    |
| Architecture | L3 cache        | 64 MB, 8-way, 128 B, LRU, 30-cycle hit     |



Fig. 20. Chiplet simplified floorplan. (a) Core. (b) Chip.

## D. Power Grid and Voltage Regulator Properties

In the VoltSpot configurations, the on-chip power grid is designed as a resistive mesh using metal width, pitch, and thickness parameters similar to those in [21] for the global, intermediate, and local PDN layers. The unit power grid resistance is around 8 m$\Omega$ and the total power grid size is 345 × 345. The effective resistance between any two nodes can be estimated using the equations in [19] and [20].

LDOs used in the IBM POWER8 microprocessor and fully integrated voltage regulators (FIVRs) used in the Intel Haswell microprocessor are two state-of-the-art on-chip power delivery solutions. It is demonstrated in [13] that the FIVR-based power delivery scheme is more advantageous with a large number of cores due to its high efficiency over a wide conversion ratio. The growing momentum and benefits of distributed on-chip voltage regulation, together with the advantages of FIVR, motivate us to investigate distributed buck regulators in the simulation setups.

Ninety-six identical on-chip VRs, each with an area of 0.04 mm<sup>2</sup>, are distributed across the chip in the simulations as shown in Fig. 20(b). The optimal placement of LDOs is first investigated in [43] to meet the IR-drop constraint. To avoid any adversely biased analysis in our simulations, we mimic the algorithm proposed in [44], where a voltage-noise-minimizing technique is proposed to determine the locations of the C4 pads across several benchmarks. We use this algorithm to determine the optimal locations of the on-chip VRs that would minimize the voltage noise. Since the resulting maximum voltage noise only decreases by less than 0.4% with the optimal placement as compared with the uniform distribution, we adopt the uniform placement of the VRs to simplify the analysis. These on-chip regulators are calibrated to match the conversion efficiency of the FIVR design in Intel's Haswell processor [27], as it is one of the most efficient regulators in industry. Efficiency curves in [27] are picked for calibration, and each VR provides around 1-A load current with an optimum efficiency of about 90%. The calibrated efficiency curve is shown in Fig. 21. The on-chip VR is modeled as an ideal supply voltage in series with an *RLC* network in VoltSpot [18] simulations. Simpler *RL* and *RC*-based models have previously been used, respectively, in [1], [45], and [46] to model VRs. The proposed adaptive $V_{ref}$ control method can be applied to balance the CS.

Fig. 21. Calibrated efficiency curve for the on-chip VR.

Fig. 22. Power saving and regulator power loss saving with balanced CS for different applications.

Simulation results showing the power saving and regulator power loss saving with balanced CS for different applications are shown in Fig. 22. Power saving up to 1 W and VR power loss saving up to 8% are observed. Note that balancing the current may lead to extra power losses on the power-grid resistors. A net power saving is obtained because the power saving induced by balanced CS can be much larger than the extra power loss on the power-grid resistors. For a general case of N distributed VRs with a total load current $NI_{o_opt}$ and any $CSR_i$ (i = 1, ..., N) for the *i*th VR, as $CSR_i$ moves further from the balanced CS point, balancing the CS may introduce more loss on the power-grid parasitic resistors; however, the power saving induced by balanced CS also increases, as can be seen from Fig. 6 and (9). With a large number of VRs deployed, distributed load currents are supplied by adjacent VRs, which effectively reduces the distance VR output currents travel to balance others. Furthermore, the effective resistance between two nodes on the power grid does not increase linearly with distance [19], [20]. Even over quite a large distance, the effective resistance can be only a few times the unit power-grid resistance. All these factors contribute to the power savings seen in Fig. 22. More importantly, with balanced CS, VR malfunctions can be avoided and stability and reliability are enhanced.

## VIII. CONCLUSION

Efficiency, stability, and reliability implications of unbalanced CS among distributed on-chip VRs are investigated in this paper both theoretically and through extensive simulations. A current balancing scheme that can be applied to most regulator types is proposed. A simple relationship between the individual VR output current and its corresponding $V_{ref}$ is identified for balanced CS, and an adaptive $V_{ref}$ control method based on this relationship is proposed. The proposed method generates and modulates $V_{ref}$ for each regulator to balance the output current. The implementation of the method is analyzed and simulations are presented to verify its effectiveness. Regulator power loss saving up to 8%, enhanced system stability, and several years of MTTF improvement are verified through practical case studies.

## REFERENCES

- [1] I. Vaisband *et al.*, On-Chip Power Delivery and Management, 4th ed. Cham, Switzerland: Springer, 2016.
- [2] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 5th ed. San Francisco, CA, USA: Morgan Kaufmann, 2011.
- [3] C. Huang and P. K. T. Mok, "An 84.7% efficiency 100-MHz package bondwire-based fully integrated buck converter with precise DCM operation and enhanced light-load efficiency," *IEEE J. Solid-State Circuits*, vol. 48, no. 11, pp. 2595–2607, Nov. 2013.
- [4] S. Chong and P. K. Chan, "A 0.9-μA quiescent current output-capacitorless LDO regulator with adaptive power transistors in 65-nm CMOS," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 4, pp. 1072–1081, Apr. 2013.
- [5] O. A. Uzun and S. Köse, "Converter-gating: A power efficient and secure on-chip power delivery system," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 2, no. 2, pp. 169–179, Jun. 2014.
- [6] J. F. Bulzacchelli *et al.*, "Dual-loop system of distributed microregulators with high DC accuracy, load response time below 500 ps, and 85-mV dropout voltage," *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 863–874, Apr. 2013.
- [7] I. Vaisband and E. G. Friedman, "Stability of distributed power delivery systems with multiple parallel on-chip LDO regulators," *IEEE Trans. Power Electron.*, vol. 31, no. 8, pp. 5625–5634, Aug. 2016.
- [8] S. Lai, B. Yan, and P. Li, "Localized stability checking and design of IC power delivery with distributed voltage regulators," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 32, no. 9, pp. 1321–1334, Sep. 2013.
- [9] Z. Zeng, X. Ye, Z. Feng, and P. Li, "Tradeoff analysis and optimization of power delivery networks with on-chip voltage regulation," in *Proc. IEEE/ACM Design Autom. Conf.*, Jun. 2010, pp. 831–836.
- [10] S. Köse and E. G. Friedman, "Distributed on-chip power delivery," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 2, no. 4, pp. 704–713, Dec. 2012.
- [11] S. Köse, S. Tam, S. Pinzon, B. McDermott, and E. G. Friedman, "Active filter-based hybrid on-chip DC–DC converter for point-of-load voltage regulation," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 21, no. 4, pp. 680–691, Apr. 2012.
- [12] S. Köse and E. G. Friedman, "On-chip point-of-load voltage regulator for distributed power supplies," in *Proc. ACM 20th Symp. Great Lakes Symp. (VLSI)*, May 2010, pp. 377–380.
- [13] A. Paul *et al.*, "System-level power analysis of a multicore multipower domain processor with on-chip voltage regulators," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 12, pp. 3468–3476, Dec. 2016.

- [14] W. Lee, Y. Wang, and M. Pedram, "Optimizing a reconfigurable power distribution network in a multicore platform," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 34, no. 7, pp. 1110–1123, Jul. 2015.
- [15] J. A. Abu-Qahouq, "Analysis and design of *N*-phase current-sharing autotuning controller," *IEEE Trans. Power Electron.*, vol. 25, no. 6, pp. 1641–1651, Jun. 2010.
- [16] G. Eirea and S. R. Sanders, "Phase current unbalance estimation in multi-phase buck converters," in *Proc. IEEE PESC*, Jun. 2006, pp. 1–6.
- [17] E. J. Fluhr *et al.*, "POWER8: A 12-core server-class processor in 22 nm SOI with 7.6 Tb/s off-chip bandwidth," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2014, pp. 96–97.
- [18] R. Zhang, K. Wang, B. H. Meyer, M. R. Stan, and K. Skadron, "Architecture implications of pads as a scarce resource," in *Proc. ISCA*, Jun. 2014, pp. 373–384.
- [19] S. Köse and E. G. Friedman, "Efficient algorithms for fast *IR* drop analysis exploiting locality," *Integr., VLSI J.*, vol. 45, no. 2, pp. 149–161, 2012.
- [20] S. Köse and E. G. Friedman, "Effective resistance of a two layer mesh," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 58, no. 11, pp. 739–743, Nov. 2011.
- [21] K. Mistry *et al.*, "A 45 nm logic technology with high-k+metal gate transistors, strained silicon, 9 Cu interconnect layers, 193 nm dry patterning, and 100% Pb-free packaging," in *IEDM Tech. Dig.*, Dec. 2007, pp. 247–250.
- [22] F.-F. Ma, W.-Z. Chen, and J.-C. Wu, "A monolithic current-mode buck converter with advanced control and protection circuits," *IEEE Trans. Power Electron.*, vol. 22, no. 5, pp. 1836–1846, Sep. 2007.
- [23] (Oct. 1999). TI Application Report SLVA079. [Online]. Available: http://www.ti.com/lit/an/slva079/slva079.pdf
- [24] X. Zhou, T. G. Wang, and F. C. Lee, "Optimizing design for low voltage DC–DC converters," in *Proc. IEEE APEC*, Feb. 1997, pp. 612–616.
- [25] M. Gildersleeve, H. P. Forghani-Zadeh, and G. A. Rincon-Mora, "A comprehensive power analysis and a highly efficient, mode-hopping DC–DC converter," in *Proc. IEEE AP-ASIC*, Aug. 2002, pp. 153–156.
- [26] P. Zumel, C. Fernández, A. de Castro, and O. García, "Efficiency improvement in multiphase converter by changing dynamically the number of phases," in *Proc. IEEE PESC*, Jun. 2006, pp. 1–6.
- [27] E. A. Burton *et al.*, "FIVR—Fully integrated voltage regulators on 4th generation Intel Core SoCs," in *Proc. IEEE APEC*, Mar. 2014, pp. 432–439.
- [28] R. Middlebrook and S. Cuk, "A general unified approach to modelling switching-converter power stages," in *Proc. IEEE PESC*, Jun. 1976, pp. 18–34.
- [29] M. Hankaniemi, "Dynamical profile of switched-mode converter—Fact or fiction?" Ph.D. dissertation, Inst. Power Electron., Tampere Univ. Technol., Tampere, Finland, 2007.
- [30] C. P. Basso, Switch-Mode Power Supplies SPICE Simulations and Practical Designs. New York, NY, USA: McGraw-Hill, 2008.
- [31] B. Johansson, "DC–DC converters—Dynamic model design and experimental verification," Ph.D. dissertation, Dept. Ind. Elect. Eng. Autom., Lund Univ., Lund, Sweden, 2004.
- [32] X. Huang, T. Yu, V. Sukharev, and S. X.-D. Tan, "Physics-based electromigration assessment for power grid networks," in *Proc. 51st ACM/EDAC/IEEE Design Autom. Conf.*, Jun. 2014, pp. 1–6.
- [33] J. R. Black, "Electromigration—A brief survey and some recent results," *IEEE Trans. Electron Devices*, vol. 16, no. 4, pp. 338–347, Apr. 1969.
- [34] R. Zhang, B. H. Meyer, K. Wang, M. R. Stan, and K. Skadron, "Tolerating the consequences of multiple EM-induced C4 bump failures," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 6, pp. 2335–2344, Jun. 2016.
- [35] J. G. Delgado-Frias and W. R. Moore, *VLSI for Neural Networks and Artificial Intelligence*. New York, NY, USA: Springer, 1994.
- [36] V. R. H. Lorentz *et al.*, "Lossless average inductor current sensor for CMOS integrated DC–DC converters operating at high frequencies," *Analog Integr. Circuits Signal Process.*, vol. 62, no. 3, pp. 333–344, 2009.
- [37] C. Bienia, S. Kumar, J. P. Singh, and K. Li, "The PARSEC benchmark suite: Characterization and architectural implications," Dept. Comput. Sci., Princeton Univ., Princeton, NJ, USA, Tech. Rep. TR-811-08, Jan. 2008.
- [38] S. L. Xi, H. Jacobson, P. Bose, G.-Y. Wei, and D. Brooks, "Quantifying sources of error in McPAT and potential impacts on architectural studies," in *Proc. IEEE HPCA*, Feb. 2015, pp. 577–589.

- [39] S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi, "McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures," in *Proc. 42nd Annu. IEEE/ACM Int. Symp. Microarchitecture*, Dec. 2009, pp. 469–480.
- [40] T. E. Carlson, W. Heirman, and L. Eeckhout, "Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation," in *Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal.*, Nov. 2011, pp. 1–12.
- [41] Y. Zhang, D. Parikh, K. Sankaranarayanan, K. Skadron, and M. Stan, "HotLeakage: A temperature-aware model of subthreshold and gate leakage for architects," Dept. Comput. Sci., Univ. Virginia, Charlottesville, VA, USA, Tech. Rep. CS-2003-05, 2003, pp. 1–15.
- [42] K. Skadron, M. R. Stan, K. Sankaranarayanan, W. Huang, S. Velusamy, and D. Tarjan, "Temperature-aware microarchitecture: Modeling and implementation," *ACM Trans. Archit. Code Optim.*, vol. 1, no. 1, pp. 94–125, Mar. 2004.
- [43] T. Yu and M. D. F. Wong, "Efficient simulation-based optimization of power grid with on-chip voltage regulator," in *Proc. ASP-DAC*, Jan. 2014, pp. 531–536.
- [44] K. Wang, B. H. Meyer, R. Zhang, M. Stan, and K. Skadron, "Walking pads: Managing C4 placement for transient voltage noise minimization," in *Proc. ACM/EDAC/IEEE Design Autom. Conf. (DAC)*, Jun. 2014, pp. 1–6.
- [45] AN 574: Printed Circuit Board (PCB) Power Delivery Network (PDN) Design Methodology. Accessed on Oct. 2016. [Online]. Available: https://www.altera.com/content/dam/altera-www/global/en\_US/pdfs/literature/an/an574.pdf
- [46] R. Zhang, K. Mazumdar, B. H. Meyer, K. Wang, K. Skadron, and M. R. Stan, "Transient voltage noise in charge-recycled power delivery networks for many-layer 3D-IC," in *Proc. ISLPED*, Jul. 2015, pp. 1–6.



**Longfei Wang** received the B.S. degree in electronic information engineering from Southwest Jiaotong University, Chengdu, China, in 2010, and the M.S. degree in electrical engineering from Texas Tech University, Lubbock, TX, USA, in 2013. He is currently pursuing the Ph.D. degree in electrical engineering with the University of South Florida, Tampa, FL, USA.

His current research interests include on-chip voltage regulation and power management.



**S. Karen Khatamifard** received the B.S. degree in electrical engineering from the Sharif University of Technology, Tehran, Iran, in 2013. He is currently pursuing the Ph.D. degree with the University of Minnesota, Minneapolis, MN, USA.

He joined the ALTAI Group, University of Minnesota, in 2013. His current research interests include computer architecture.



**Orhun Aras Uzun** received the B.S. degree in electronics engineering from Istanbul Technical University, Istanbul, Turkey, in 2012. He is currently pursuing the Ph.D. degree in electrical engineering with the University of South Florida, Tampa, FL, USA.

His current research interests include on-chip voltage converters and analog/mixed signal circuit design.



**Ulya R. Karpuzcu** received the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Illinois at Urbana–Champaign, Champaign, IL, USA.

She is currently an Assistant Professor of electrical and computer engineering with the University of Minnesota, Minneapolis, MN, USA. Her current research interests include the impact of technology on computing systems, energy-efficient computer architecture, hardware reliability, and approximate computing.

Dr. Karpuzcu is a Fulbright Fellow. She was a recipient of the NSF CAREER Award.



**Selçuk Köse** (S'10–M'12) received the B.S. degree in electrical and electronics engineering from Bilkent University, Ankara, Turkey, in 2006, and the M.S. and Ph.D. degrees in electrical engineering from the University of Rochester, Rochester, NY, USA, in 2008 and 2012, respectively.

He was with TUBITAK, Ankara, Intel Corporation, Santa Clara, CA, USA, and Freescale Semiconductor, Austin, TX, USA. He is currently an Assistant Professor with the Department of Electrical Engineering, University of South Florida, Tampa, FL, USA. His current research interests include integrated voltage regulation, 3-D integration, hardware security, and green computing.

Dr. Köse was a recipient of the NSF CAREER Award, the Cisco Research Award, the USF College of Engineering Outstanding Junior Researcher Award, and the USF Outstanding Faculty Award. He is an Associate Editor of the *Journal of Circuits, Systems, and Computers* and the *Microelectronics Journal*. He has served on the Technical Program and Organization Committees of various conferences.