NVRH-LUT: A nonvolatile radiation-hardened hybrid MTJ/CMOS-based look-up table for ultralow power and highly reliable FPGA designs

Vahid JAMSHIDI*
Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

Received: 25.12.2018 • Accepted/Published Online: 15.08.2019 • Final Version: 26.11.2019

Abstract: Complementary metal oxide semiconductor (CMOS) downscaling leads to various challenges, such as high leakage current and increased radiation sensitivity. To address these challenges, hybrid MTJ/CMOS design has been considered a very promising approach thanks to the high speed, low power, good scalability, and full CMOS compatibility of magnetic tunnel junction (MTJ) devices. One important application of MTJs is building nonvolatile look-up tables (NV-LUTs) for reconfigurable logic. However, NV-LUTs face severe reliability issues in nanotechnology due to increasing process variations, reduced supply voltage, and high-energy particle strikes at sensitive nodes of CMOS circuits. This paper proposes a nonvolatile radiation-hardened look-up table (NVRH-LUT) for advanced reconfigurable logic. Compared with previous works, the proposed NVRH-LUT is fully robust against single-event upsets and single-event double-node upsets, which are among the main reliability challenges for NV-LUTs. Results show that NVRH-LUT not only provides higher reliability and a lower bit error rate but also offers low delay and low energy consumption.

Key words: Magnetic tunnel junction, nonvolatility, hybrid MTJ/CMOS logic circuits, radiation immunity, soft error, single-event upset, single-event double-node upset

1. Introduction
With the recent acceleration of advanced complementary metal oxide semiconductor (CMOS) downscaling, sensitivity to radiation effects is increasing [1, 2]. In addition, challenges such as process variation and leakage power are becoming increasingly important in today’s VLSI design [1–4]. The decrease in threshold voltage results in an exponential increase in the subthreshold leakage current [5, 6]. When an energetic particle strikes a MOS transistor in the OFF state, it induces localized ionization that can reverse (flip) the data state of a memory cell, logic gate, latch, or flip-flop, thus causing a soft error. This soft error is called a single-event upset (SEU) [1, 2].

In order to avoid CMOS downscaling limitations (due to short-channel effect control and the resulting OFF-state current increase), nanometric magnetic elements, especially magnetic tunnel junction (MTJ) devices, have recently attracted much attention because of their small dimensions, nearly zero leakage power, nonvolatility, and compatibility with CMOS processes [7]. In addition, MTJ devices are robust against particle strikes. An MTJ comprises two layers of ferromagnetic material (e.g., CoFe) separated by a thin insulating tunnel barrier (e.g., MgO). One of the ferromagnetic layers, the so-called fixed layer, has a permanently fixed spin direction. The spin direction of the other ferromagnetic layer, the so-called free layer, can be switched by a writing circuit associated with the magnetic logic device [7]. The MTJ has a low resistance (RL) when the two ferromagnetic layers are magnetized in the same direction and a high resistance (RH) when they are magnetized in opposite directions. A charge current flowing through the MTJ can switch it between the RL and RH states through the well-known spin transfer torque (STT) phenomenon [7]. The RL and RH states are nonvolatile: shutting off the power does not change the magnetization of either layer. These states can therefore represent logic ‘0’ and logic ‘1’ in digital circuits. Various studies have been conducted on MTJ-based memories [8–10] and logic circuits [11–13].

*Correspondence: vjamshidi@uk.ac.ir

This work is licensed under a Creative Commons Attribution 4.0 International License.

Figure 1. Field programmable gate array (FPGA): (a) schematic of a traditional SRAM-based FPGA, (b) single sense
amplifier STT-LUT, (c) multisense amplifier STT-LUT.

SRAM-based FPGA devices are increasingly becoming the most suitable platforms for implementing modern system applications due to the high reconfigurability, low cost, availability, and fast time to market demanded of today’s computing devices. Static random access memory (SRAM) cells are the basis of most commercial FPGAs and can be found in well-known Xilinx and Altera products. As shown in Figure 1a, SRAM cells are used in look-up tables (LUTs), the primary components of reconfigurable fabrics, to store the logic function configuration data [14]. However, the critical issues in SRAM-based FPGA devices are the increasing leakage currents in the configuration memory and their vulnerability to particle strikes. Integrating spin-transfer torque random access memory (STTRAM) into FPGAs in place of SRAM is one of the most promising solutions to these issues [15–23]. STTRAM is a promising memory technology for high-density on-chip caches due to its low leakage power and robustness against particle strikes. It also allows the configurable logic blocks (CLBs) to be completely powered off during the “idle” states of the FPGA circuit.

Existing STT-LUT designs suffer from long read delay and reduced reliability because of failures such as decision failure, write failure, and read disturb. In this paper, we propose a novel STT-LUT (called NVRH-LUT) that improves area (fewer transistors), speed (a shorter read path), and reliability (a shorter read path with fewer transistors, and robustness against SEUs) compared with previously proposed STT-LUTs.

The rest of the paper is organized as follows: Section 2 offers a background on STT-LUT design and an overview of existing STT-LUT designs. Section 3 presents the proposed approach to the design of STT-LUTs. Section 4 shows how the proposed NVRH-LUT can be used to form a full adder. Section 5 presents power, performance, and robustness comparisons of the proposed and conventional STT-LUTs. Finally, conclusions are drawn in Section 6.

2. Related works
To reduce the leakage power of SRAM-based FPGA devices and protect them against particle strikes, many practical solutions for FPGA logic circuits based on STTRAM technology have been proposed. They can be classified into two main categories: a) single-sense amplifier STT-LUTs and b) multisense amplifier STT-LUTs. The works in each category are introduced below.

Single-sense amplifier STT-LUT: Figure 1b shows the concept of a compact “single-sense amplifier STT-LUT” (SSA). An n-input STT-LUT (called LUTn for short) requires $2^n$ MTJs that store digital information. It is generally composed of three components: 1) a sense amplifier (SA) to read a selected MTJ cell in the STT-LUT and produce full swing output voltage; 2) a selection tree with volatile logic data to choose a unique MTJ cell; 3) a write circuit to program the input data in a selected MTJ cell. In this design, the MTJ cells are directly connected to the selection tree circuit and the sense amplifier is shared among them [15–21].
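As a behavioral illustration of the LUT organization described above, the sketch below models an n-input LUT as a table of $2^n$ stored bits, with each input bit steering one level of the binary selection tree toward a single cell. The function name `lut_read` and the MSB-first address convention are assumptions made for illustration; the sketch models only the logic function, not the electrical read path.

```python
def lut_read(config: list[int], inputs: list[int]) -> int:
    """Return the configuration bit addressed by `inputs` (inputs[0] is the MSB).

    `config` holds 2**n bits, one per data-MTJ. Each input bit steers one
    level of the binary selection tree, so an n-input LUT passes through
    n tree levels to reach exactly one cell.
    """
    n = len(inputs)
    if len(config) != 2 ** n:
        raise ValueError(f"a {n}-input LUT needs {2 ** n} cells, got {len(config)}")
    index = 0
    for bit in inputs:  # one selection-tree level per input
        index = (index << 1) | (bit & 1)
    return config[index]


# Example: configure a 2-input LUT as XOR (cells addressed 00, 01, 10, 11).
xor_config = [0, 1, 1, 0]
for a in (0, 1):
    for b in (0, 1):
        print(a, b, lut_read(xor_config, [a, b]))
```

Reprogramming the LUT to a different function only changes `config`; the addressing logic stays the same, which mirrors how the same selection tree serves any stored configuration.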

Multisense amplifier STT-LUT: Figure 1c shows the concept of a compact “multisense amplifier STT-LUT” (MSA). In this design, each MTJ cell requires one sense amplifier. Therefore, an n-input STT-LUT requires $2^n$ sense amplifiers, inserted between MTJ cells and the selection tree as shown in Figure 1c [22–24].

SSA designs benefit from low area due to a shared sense amplifier, but they suffer from high read delay and low reliability due to the long read path. Although MSA designs improve read speed compared with SSA designs, they suffer from high area and power consumption due to their many sense amplifiers, and from low reliability due to a larger number of sensitive nodes that can be upset (see Figure 1c). Hence, the challenging aspects of STT-LUT design are reliability, power consumption, and the speed of sensing the MTJ resistance state and converting it into a binary voltage.

Due to process variation in spintronic and semiconductor devices, several failure mechanisms can occur in an STT-LUT, such as write failure, decision failure, retention failure, and read disturb [25, 26]. These failure mechanisms are briefly discussed below:

Write failure: This failure occurs when the MTJ cell cannot be flipped to the desired state during a write operation, because the write current is insufficient ($I < I_{C0}$) or the write pulse is too short [27].

Decision failure: This failure occurs if the MTJ resistance is incorrectly sensed during a read operation. It happens in STT-MRAMs in which sense amplifiers are utilized to read the storage values of cells by comparing MTJ resistances with a reference resistance.

Retention failure: This failure happens if the state of an idle MTJ cell flips due to the inherent thermal instability in MTJ cells.

Read disturb: Read disturb occurs when the read current exceeds the MTJ switching current, accidentally flipping the state of the MTJ cell during a read operation. This happens because the read and write paths of the MTJ cell are shared.
The write failure rate can be mitigated by increasing the write pulse width or the write current [28]. Similarly, the decision failure rate can be decreased by increasing the read pulse width or the read current [28]. The retention failure rate is reduced by increasing the thermal stability factor (Δ) of MTJ cells at the device level [28]. Read disturb can be controlled by increasing the difference between the read and write currents [28], either by increasing the write current or by decreasing the read current. However, reducing the read current increases not only read latency but also decision failures [29], while increasing the write current leads to more power dissipation. Therefore, neither option alone sufficiently reduces the read disturb failure rate. Furthermore, it has recently been reported that the read disturb failure rate increases with technology downscaling [8], so read disturb failures have become more challenging than ever for the reliability of MTJ-based circuits. An efficient way to control read disturb is to separate the read and write paths of MTJ cells. For this purpose, a device called “mCell” was presented in [30]: a four-terminal element with electrically isolated read and write paths.
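The trade-offs above can be illustrated with the widely used thermal-activation model of STT switching in the sub-critical regime, $\tau = \tau_0 \exp(\Delta(1 - I/I_{C0}))$ and $P_{sw} = 1 - \exp(-t/\tau)$. This model, and the values $\Delta = 60$ and $\tau_0 = 1$ ns, are assumptions for illustration rather than parameters from this paper; the sketch only shows qualitatively why a lower read current relative to $I_{C0}$ reduces read disturb and why a larger Δ reduces retention failures.

```python
import math


def switching_probability(i_ratio: float, pulse_s: float,
                          delta: float = 60.0, tau0_s: float = 1e-9) -> float:
    """Thermally activated STT switching probability for 0 <= I/I_C0 < 1.

    i_ratio: I / I_C0; pulse_s: duration of the current (or idle time) [s];
    delta: thermal stability factor (assumed); tau0_s: attempt time (assumed).
    """
    if not 0.0 <= i_ratio < 1.0:
        raise ValueError("this model only holds in the sub-critical regime 0 <= I/I_C0 < 1")
    tau = tau0_s * math.exp(delta * (1.0 - i_ratio))
    return -math.expm1(-pulse_s / tau)  # 1 - exp(-t/tau), accurate for tiny values


# Read disturb: a 1 ns read pulse at two read-current levels.
for ratio in (0.5, 0.8):
    print(f"I/I_C0 = {ratio}: P(read disturb) = {switching_probability(ratio, 1e-9):.2e}")

# Retention: zero current over ten years.
ten_years = 10 * 365 * 24 * 3600
print(f"P(retention failure, 10 years) = {switching_probability(0.0, ten_years):.2e}")
```

Because the read current enters the exponent, even a modest reduction of $I/I_{C0}$ lowers the disturb probability by orders of magnitude, which is also why reducing the read current alone trades disturb failures for decision failures and latency.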

Different STT-LUT circuits have already been proposed in the literature (e.g., [15–24]). However, they suffer from long read delays and reduced reliability due to decision failures, write failures, and read disturbs. The NVRH-LUT proposed in the next section addresses these limitations in area, speed, and reliability.

3. The proposed STT-based lookup table

Figure 2 shows the structure of the proposed nonvolatile radiation-hardened look-up table (NVRH-LUT). As shown in Figure 2, the proposed NVRH-LUT is composed of five components: 1. read circuit (Figure 2a); 2. selection tree (Figure 2b); 3. data-MTJ array (Figure 2c); 4. reference-1T/1MTJ part (Figure 2d); 5. write circuit (Figure 2e). The function of each component is explained in detail in the following subsections.

3.1. Read circuit

The read circuit is a precharge sense amplifier (PCSA) that reads a data-MTJ cell by discharging current through both the selected data path and the reference path. The sense amplifier compares the voltage level of the data path with that of the reference path and amplifies the difference to determine the state of the data-MTJ. Conventional PCSA circuits are vulnerable to radiation effects known as SEUs [16, 19, 21]. To overcome this vulnerability, we propose a radiation-hardened PCSA (RH-PCSA) circuit. Figure 2a shows the structure of the proposed RH-PCSA, which operates in two phases:

Writing phase (precharge phase): “CLK” = ‘0’, PMOS transistors P_{1-4} are turned on and NMOS transistor N_0 is turned off; both outputs OUT and OUTB are precharged to logic ‘1’, and nodes D_{1-4} are pulled up to VDD by transistors P_{1-4}, respectively. In this phase, the write circuit can be enabled to reconfigure the data-MTJs.

Reading phase (evaluation phase): “CLK” = ‘1’, NMOS transistor N_0 is turned on so that reading currents I_0 and I_1 flow to ground. These currents differ because of the resistance difference between the selected data-MTJ and the reference-MTJ, and different currents lead to different discharge speeds. Nodes D_1 and D_2 (D_3 and D_4) are redundant nodes that behave identically and hold the same logic values.
During the reading phase, the lower resistance branch discharges the node pair \((D_1, D_2)\) (or \((D_3, D_4)\)) more quickly than \((D_3, D_4)\) (or \((D_1, D_2)\)). This discharge will continue until the corresponding output is pulled down to “Gnd” or logic ‘0’. The other output will stay in logic ‘1’. Therefore, complementary outputs will be available at two branch outputs. PMOS transistors \(PL_1\) and \(PL_2\) act as active loads to produce full swing output voltage.
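The read behavior described above can be summarized by a first-order behavioral model: in the evaluation phase each branch discharges a node capacitance through its MTJ and one series transistor, and the faster (lower-resistance) branch pulls its output to ‘0’. The capacitance, transistor resistance, trip voltage, the class name `PcsaModel`, and the choice of taking OUT on the data side are all hypothetical; the model ignores the cross-coupled latch dynamics and the redundant nodes.

```python
import math
from dataclasses import dataclass


@dataclass
class PcsaModel:
    """First-order behavioral model of a precharge sense amplifier (hypothetical values)."""
    vdd: float = 0.9            # supply voltage [V]
    c_node: float = 1e-15       # assumed node capacitance [F]
    r_transistor: float = 1e3   # assumed ON resistance of the read/balance transistor [ohm]
    v_trip: float = 0.45        # assumed voltage at which the latch resolves [V]

    def discharge_time(self, r_mtj: float) -> float:
        # RC discharge from VDD to v_trip through the MTJ plus one series transistor.
        r_branch = r_mtj + self.r_transistor
        return r_branch * self.c_node * math.log(self.vdd / self.v_trip)

    def read(self, clk: int, r_data: float, r_ref: float) -> tuple[int, int]:
        """Return (OUT, OUTB), with OUT taken on the data side (assumed convention)."""
        if clk == 0:  # precharge phase: both outputs held at logic '1'
            return 1, 1
        t_data = self.discharge_time(r_data)
        t_ref = self.discharge_time(r_ref)
        # The faster (lower-resistance) branch pulls its own output to '0'.
        return (0, 1) if t_data < t_ref else (1, 0)


sa = PcsaModel()
print(sa.read(0, 1250.0, 1875.0))  # (1, 1): precharge
print(sa.read(1, 1250.0, 1875.0))  # (0, 1): R_L stored
print(sa.read(1, 2500.0, 1875.0))  # (1, 0): R_H stored
```

With the convention of Section 3.3 (RH represents logic-1 and RL represents logic-0), the data-side output in this model equals the stored bit.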

When an energetic particle strikes one of the nodes in an STT-LUT, it can deposit charge at the struck node and flip its logic state. In the proposed RH-PCSA, however, the redundant nodes restore the correct logic value. This case is discussed, with simulation waveforms, in Section 5.2.

An important advantage of RH-PCSA is that the designer can add more redundant nodes, which increases tolerance to single-event multiple-node upsets. Figure 2f shows an RH-PCSA circuit that is robust against single-event double-node upsets (SEDUs). Our circuit-level simulations show that the RH-PCSA not only fully tolerates a single-event upset on any one of its nodes but also tolerates single-event multiple-node upsets in an STT-LUT.

### 3.2. Selection tree circuit

The selection tree is used to choose an individual data-MTJ cell for reading. As shown in Figure 3a, in SSAs the selection tree is connected in series with the MTJ cells, so the number of transistors in the read path grows with the number of inputs. These series transistors add resistance that drastically reduces the current sensing margin of the read path, which in turn increases both the read delay and the decision failure rate. To address these issues, we separate the selection tree from the read path (Figure 3b).

![Selection tree in SSA and NVRH-LUT](image)

**Figure 3.** Selection tree location: (a) reading path in a conventional SSA, (b) reading path in the proposed NVRH-LUT.

Separating the selection tree from the read path makes the read path independent of the number of inputs: regardless of how many inputs the LUT has, the read path contains only one transistor, called the reading transistor. The reading transistor is selected by the selection tree, which yields a sufficient sense current and a short read delay. This structure can therefore implement LUTs with 8 or more inputs.
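To see why removing the selection tree from the read path matters, the sketch below compares the current sensing margin $\Delta I = V_{read}/(R_L + nR_{on}) - V_{read}/(R_H + nR_{on})$ for a read path with n series transistors (taken here as one selection-tree transistor per input, as an approximation of an SSA) and for the single reading transistor of NVRH-LUT. $R_L$ and $R_H$ come from Table 2; the read voltage and transistor ON resistance are assumed values, and every device is treated as a linear resistor.

```python
def sense_margin(r_low: float, r_high: float, n_series: int,
                 r_on: float = 1e3, v_read: float = 0.2) -> float:
    """Current difference [A] between R_L and R_H cells with n_series ON transistors in the path.

    r_on and v_read are hypothetical values chosen for illustration.
    """
    r_path = n_series * r_on
    return v_read / (r_low + r_path) - v_read / (r_high + r_path)


R_L, R_H = 1.25e3, 2.5e3  # read-path resistances from Table 2 [ohm]

for n_inputs in (4, 6, 8):
    ssa = sense_margin(R_L, R_H, n_series=n_inputs)  # assumed: one tree transistor per input
    nvrh = sense_margin(R_L, R_H, n_series=1)         # single reading transistor
    print(f"n = {n_inputs}: SSA {ssa * 1e6:.1f} uA, NVRH-LUT {nvrh * 1e6:.1f} uA")
```

In this simplified picture, the SSA margin shrinks as inputs are added, while the NVRH-LUT margin stays constant.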

### 3.3. Data-MTJ array

A data-MTJ array includes the data-MTJ cells (Figure 2c). Each data-MTJ is accessed for read/write operations through a read transistor/write transistor. The RH-PCSA and the selection tree are shared by the entire data-MTJ array for both read and write operations; thus, the proposed LUT structure significantly reduces die area, and in turn cost, and also allows flexibility in the layout.

Read disturb is one of the important failure mechanisms in the STT-LUT that occurs when the state of the MTJ cell accidentally flips during a read operation. Read disturb will not occur in the proposed NVRH-LUT, because the unit storage cell used in the data-MTJ array is a spintronic device called mCell [30]. As shown in Figure 4a, mCell is a four-terminal device that has an electrically separated read path (R, R*) and write path (W-, W+).

The principle of its operation is based on a magnetic domain wall in the write path, which is displaced back and forth by positive and negative pulsed current, respectively. When the write path is magnetically coupled to the free layer of a magnetic tunnel junction, read path resistance changes as determined by the element’s tunneling magnetoresistance.
Figure 4. mCell device: (a) mCell cross-section, schematic symbol, and 3-D view. The read path of the mCell is through the (R, R*) terminals and the write path is through the (W-, W+) terminals [30]. (b) Path-1 performs a write operation of logical value of ‘1’. (c) Path-2 performs a write operation of logical value of ‘0’.

We consider an mCell as a “black box” device with four terminals. Two terminals form a write path, in which the direction of the input current changes the digital state of the device. The other two terminals form a read path that is electrically separated from the write path. The state of the device is detected as a high or low resistance through the read-path terminals. Sending a small current pulse from W- to W+ or from W+ to W- switches the mCell read-path resistance between R and R* to high (RH) or low (RL), respectively. These two resistance states are nonvolatile, i.e. they are preserved even if the supply voltage is removed. The RH and RL states can be used to represent logic-1 and logic-0 in digital circuits, respectively. The relative difference between the two resistance values is the tunnel magnetoresistance ratio (TMR), defined by Eq. 1:

\[
TMR = \frac{\Delta R}{R_L} = \frac{R_H - R_L}{R_L}.
\] (1)

Since a large TMR ratio indicates a large difference between the RL and RH resistances, it measures how easily the two states of the MTJ can be distinguished. TMR ratios as high as 600% have been observed at room temperature in CoFeB/MgO/CoFeB MTJs [31].
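Eq. 1 can be checked against the mCell parameters in Table 2: with $R_L$ = 1.25 kΩ and $R_H$ = 2.5 kΩ, the TMR is 100%. The helper names below are illustrative; the inverse helper shows what $R_H$ a 600% TMR would imply for the same $R_L$.

```python
def tmr_percent(r_low: float, r_high: float) -> float:
    """Eq. 1: TMR = (R_H - R_L) / R_L, expressed in percent."""
    return 100.0 * (r_high - r_low) / r_low


def r_high_from_tmr(r_low: float, tmr_pct: float) -> float:
    """Invert Eq. 1: R_H = R_L * (1 + TMR)."""
    return r_low * (1.0 + tmr_pct / 100.0)


print(tmr_percent(1250.0, 2500.0))     # 100.0
print(r_high_from_tmr(1250.0, 600.0))  # 8750.0
```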

3.4. Reference-1T/1MTJ part

The reference-1T/1MTJ part provides the reference against which the resistance of the data-MTJ is compared to determine its state. As shown in Figure 2d, the reference-1T/1MTJ part includes a balance transistor (Nb) and a reference-MTJ (Mref). The balance transistor matches the read transistor (Nr) so that the data and reference paths contain the same transistor resistance (path balancing). The resistance of the data-MTJ can be altered by the write circuit, while the reference-MTJ always has a fixed resistance. To obtain equal sensing margins for both stored states, the resistance of the reference-MTJ is set to \((R_H + R_L)/2\) [32].
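With the values used in Section 4 ($R_L$ = 1.25 kΩ and $R_H$ = 2.5 kΩ), the midpoint reference is 1875 Ω and leaves 625 Ω of margin on each side. The short sketch below (hypothetical helper names, resistances only, ignoring transistor variation) also shows that an off-center reference shrinks the worse of the two margins.

```python
def reference_resistance(r_low: float, r_high: float) -> float:
    """Midpoint reference resistance (R_H + R_L) / 2."""
    return (r_low + r_high) / 2


def margins(r_low: float, r_high: float, r_ref: float) -> tuple[float, float]:
    """Resistance margins (R_ref - R_L, R_H - R_ref) seen by the sense amplifier."""
    return r_ref - r_low, r_high - r_ref


def worst_margin(r_low: float, r_high: float, r_ref: float) -> float:
    """The smaller of the two margins, which limits read robustness."""
    return min(margins(r_low, r_high, r_ref))


r_ref = reference_resistance(1250.0, 2500.0)
print(r_ref)                           # 1875.0
print(margins(1250.0, 2500.0, r_ref))  # (625.0, 625.0)
for candidate in (1700.0, 1875.0, 2000.0):
    print(candidate, worst_margin(1250.0, 2500.0, candidate))
```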

It should be noted that, because the selection tree is separated from the data path, the reference part consists of only one transistor instead of the reference tree with many transistors used in previous works [15–19]. This reduces area (fewer transistors), read delay (a shorter path), and failure rate (a shorter path with fewer transistors).
3.5. Writing circuit

A write circuit is employed to program the input data into a data-MTJ cell. To lower the power consumption of the proposed NVRH-LUT, a low-cost write circuit is used that requires only 7 transistors instead of the 16 transistors of previous write circuits [33, 34]. As shown in Figure 2e, the proposed write circuit generates a sufficient bidirectional current ($> I_{C0}$, where $I_{C0}$ is the critical switching current of the data-MTJ) to switch the data-MTJ. A ‘1’ is stored in the data-MTJ when a downward current of magnitude ‘I’ is created (Figure 4b), and a ‘0’ is stored when an upward current of magnitude ‘I’ is created (Figure 4c). During the write phase, depending on the state of the NMOS transistors in the selection tree, only one write transistor ($N_w$) is selected, and the write enable signal (WE) is set to ‘1’ to activate the write circuit. The ‘In’ signal determines the direction of the writing current (downward/upward). Transistor $Ng$ is a power-gating transistor that disconnects the write circuit from the ground rail while the write circuit is idle.
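The write sequence described above can be sketched behaviorally as follows. The function `write_cell`, the mapping of ‘In’ = 1 to a downward current, and the numeric critical current are hypothetical; the sketch models only the logical outcome (which cell changes, and when a write fails because the current does not exceed $I_{C0}$), not the 7-transistor circuit itself.

```python
I_C0 = 50e-6  # hypothetical critical switching current [A]


def write_cell(cells: list[int], address: int, we: int, data_in: int,
               i_write: float, i_c0: float = I_C0) -> list[int]:
    """Return the data-MTJ contents after one write cycle (the input list is not modified)."""
    new_cells = list(cells)
    if we != 1:
        return new_cells  # write circuit idle (power-gated by Ng): nothing changes
    if abs(i_write) <= i_c0:
        return new_cells  # write failure: current does not exceed I_C0
    # 'In' sets the current direction (assumed: In = 1 -> downward, In = 0 -> upward);
    # a downward current stores '1' and an upward current stores '0'.
    downward = data_in == 1
    new_cells[address] = 1 if downward else 0  # only the selected cell's N_w conducts
    return new_cells


cells = [0] * 8
cells = write_cell(cells, address=3, we=1, data_in=1, i_write=2 * I_C0)
print(cells)  # [0, 0, 0, 1, 0, 0, 0, 0]
```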

4. Implementation of a full adder

The full adder is one of the most important components of the arithmetic logic unit (ALU) of any processor, because arithmetic operations (such as addition, subtraction, multiplication, division, comparison, and exponentiation) can be built from full adders. It is a combinational circuit that adds three 1-bit inputs (A, B, and C) and produces two 1-bit outputs (SUM and Cout). Figure 5a shows the truth table of the full adder. According to Figure 5a, a full adder requires two 3-input STT-LUTs, one for SUM and one for Cout, each consisting of 8 data-MTJs to store data. Figure 5b shows the normal operation of the proposed NVRH-LUT-based full adder. As can be seen in Figure 5b, the output signals correctly follow the truth table and present a full rail swing between 0 and VDD. The simulations were carried out at a 0.9 V supply voltage and room temperature. In our simulation, the low and high resistances of the data-MTJ are 1.25 k\(\Omega\) and 2.5 k\(\Omega\), respectively, and the resistance of the Ref-MTJ is 1875 \(\Omega\), i.e. \((R_H + R_L)/2\).
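The configuration data of the two 3-input LUTs can be derived directly from the truth table in Figure 5a. The sketch below (function name assumed) computes the 8 bits that would be written into the data-MTJs of the SUM LUT and of the Cout LUT, addressed by (A, B, C) with A as the most significant bit.

```python
from itertools import product


def full_adder_lut_contents() -> tuple[list[int], list[int]]:
    """Configuration bits for the SUM and Cout 3-input LUTs, addressed by (A, B, C)."""
    sum_bits, cout_bits = [], []
    for a, b, c in product((0, 1), repeat=3):  # (0,0,0), (0,0,1), ..., (1,1,1)
        total = a + b + c
        sum_bits.append(total & 1)    # SUM = A xor B xor C
        cout_bits.append(total >> 1)  # Cout = majority(A, B, C)
    return sum_bits, cout_bits


sum_bits, cout_bits = full_adder_lut_contents()
print("SUM :", sum_bits)   # [0, 1, 1, 0, 1, 0, 0, 1]
print("Cout:", cout_bits)  # [0, 0, 0, 1, 0, 1, 1, 1]
```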

5. Evaluating the proposed NVRH-LUT

5.1. Simulation environment

To evaluate the efficiency of the proposed method, the designs of [15], [16], [17], [18], and [19] were simulated and compared with the proposed NVRH-LUT. Simulations were performed with SPICE in 32nm CMOS technology at room temperature, and signal waveforms were analyzed with Cosmos scope software. The parameters used for the MOS transistors and the data-MTJ cells are listed in Table 1 and Table 2, respectively.

5.2. Results and discussion

5.2.1. SEU tolerance

A soft error or SEU occurs when an energetic particle strikes the FPGA (or supporting) device and flips the logic state of a sensitive CMOS node, such as a sense amplifier output.

To inject SEU faults into the simulated STT-LUTs, we used the model presented in [36], in which a single-event (SE) hit is simulated at the output (sensitive node) of the STT-LUT using a double-exponential current source described by Eq. 2:

$$I_{inj}(t) = \frac{Q_{inj}}{\tau_a - \tau_b} \left(e^{-t/\tau_a} - e^{-t/\tau_b}\right),$$ (2)
TRUTH TABLE OF FULL ADDER

<table>
<thead>
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
<th>Cout</th>
<th>SUM</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

(a)

Figure 5. The proposed NVRH-LUT-based full adder: (a) truth table, (b) normal operation.

Table 1. Parameters of 32nm technology [35].

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Supply voltage (V)</td>
<td>0.9</td>
</tr>
<tr>
<td>Effective gate length (nm)</td>
<td>25–35</td>
</tr>
<tr>
<td>Ion N (μA/μm) at 1 V</td>
<td>1000–1550</td>
</tr>
<tr>
<td>Ion P (μA/μm) at 1 V</td>
<td>500–1210</td>
</tr>
<tr>
<td>Ioff N (nA/μm)</td>
<td>0.1–200</td>
</tr>
<tr>
<td>Ioff P (nA/μm)</td>
<td>0.1–100</td>
</tr>
<tr>
<td>Gate dielectric</td>
<td>HfO2, SiON</td>
</tr>
<tr>
<td>Equivalent oxide thickness (nm)</td>
<td>0.9–1.2</td>
</tr>
<tr>
<td># of metal layers</td>
<td>6–11</td>
</tr>
<tr>
<td>Interconnect layer permittivity</td>
<td>2.4–3.0</td>
</tr>
</tbody>
</table>

Table 2. Parameters of mCell elements [30].

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Threshold current density</td>
<td>4 MA/cm²</td>
</tr>
<tr>
<td>MTJ resistance-area product</td>
<td>2 Ω·μm²</td>
</tr>
<tr>
<td>Tunnel magnetoresistance ratio</td>
<td>100%</td>
</tr>
<tr>
<td>Read path low resistance</td>
<td>1.25 kΩ</td>
</tr>
<tr>
<td>Read path high resistance</td>
<td>2.5 kΩ</td>
</tr>
<tr>
<td>Write path resistance</td>
<td>120 Ω</td>
</tr>
<tr>
<td>Length of a MTJ in the read-path [nm]</td>
<td>12.0</td>
</tr>
<tr>
<td>Space of RD-path and WR-path contact [nm]</td>
<td>8.0</td>
</tr>
<tr>
<td>Space of the two cells in the read-path [nm]</td>
<td>8.0</td>
</tr>
<tr>
<td>Width of the device [nm]</td>
<td>10.0</td>
</tr>
</tbody>
</table>

where \( Q_{inj} \) is the total amount of charge deposited at the affected node. \( \tau_a \) and \( \tau_b \) are two material-dependent time constants [36]. \( \tau_a \) is the collection time constant of the junction and \( \tau_b \) is the time constant for initially establishing the ion track.
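As a numerical sanity check of Eq. 2, the sketch below evaluates the injected current and integrates it over time; for $\tau_a > \tau_b$ the pulse is positive, peaks at $t_p = \frac{\tau_a \tau_b}{\tau_a - \tau_b}\ln(\tau_a/\tau_b)$, and its integral equals $Q_{inj}$. The charge and time constants used here (50 fC, 200 ps, 50 ps) are hypothetical values, not those used in our simulations.

```python
import math


def injected_current(t: float, q_inj: float, tau_a: float, tau_b: float) -> float:
    """Double-exponential SEU current of Eq. 2 [A] (t and taus in s, q_inj in C)."""
    return q_inj / (tau_a - tau_b) * (math.exp(-t / tau_a) - math.exp(-t / tau_b))


def deposited_charge(q_inj: float, tau_a: float, tau_b: float,
                     t_end: float, steps: int = 10_000) -> float:
    """Trapezoidal integral of injected_current from 0 to t_end [C]."""
    dt = t_end / steps
    total = 0.0
    prev = injected_current(0.0, q_inj, tau_a, tau_b)
    for k in range(1, steps + 1):
        cur = injected_current(k * dt, q_inj, tau_a, tau_b)
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total


Q, TAU_A, TAU_B = 50e-15, 200e-12, 50e-12  # hypothetical values
t_peak = TAU_A * TAU_B / (TAU_A - TAU_B) * math.log(TAU_A / TAU_B)
print(f"peak at {t_peak * 1e12:.1f} ps, "
      f"I_peak = {injected_current(t_peak, Q, TAU_A, TAU_B) * 1e6:.1f} uA")
print(f"integrated charge = {deposited_charge(Q, TAU_A, TAU_B, 20 * TAU_A) * 1e15:.3f} fC")
```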

Figure 6a shows the SEU-injection results at output nodes of the circuit presented in [17]. As shown in this figure, when the clock signal (CLK) is equal to ‘1’ and the circuit is in the evaluation phase, the effect of an energetic particle strike to a node of the SA circuit could force the associated output to settle within an incorrect state. The structure of other considered SAs is similar to [17]. Therefore, they are vulnerable to SEUs as well.
Figure 6. The results associated with SEU injection to the circuit: (a) SEU injection to ref. [17]; (b) SEU injection to the proposed NVRH-LUT.

Figure 6b shows the SEU-injection results at the output nodes of the proposed NVRH-LUT. As shown in Figure 6b, every error caused by an SEU injection was recovered after a short period of time. This recovery results from the redundant nodes, which restore the corrupted data. In addition, the MTJs used in the circuits are inherently robust against radiation particles, because their data are stored in the spin direction of the electrons rather than as electrical charge [7]. Thus, energetic particle strikes do not change their states.

5.2.2. Device count (area), delay, power, and PDP comparisons

Table 3 summarizes the results of a comprehensive comparison of different STT-LUT circuits implemented and examined. To provide a fair comparison, all LUT circuits are simulated by the SPICE circuit simulator with the 32nm CMOS library with 0.9 V nominal voltage.

In the first and second columns of Table 3, the device cost of the STT-LUT circuits is compared in terms of the number of data-MTJ cells and the number of MOS transistors employed. The proposed NVRH-LUT uses the fewest data-MTJ cells of the considered circuits (17, the same as Ref. [17]), which results in lower switching energy consumption. Its RH-PCSA requires more transistors than a conventional SA to gain SEU and SEDU tolerance; even so, its MOS transistor count in Table 3 (72) is lower than those of all other considered designs except Ref. [18] (40). In return, the proposed SA circuit has no single node or node pair at which an energetic particle strike can corrupt the output.

The third, fourth, and fifth columns compare the delay, power, and energy (PDP) of the considered STT-LUT circuits. The delay and energy of a read operation depend mainly on the SA circuit and the read path length. According to Table 3, the proposed NVRH-LUT achieves a much lower read delay than the other considered STT-LUTs, owing to the separation of the selection tree from the read path (critical path reduction). It should be noted that one of the most dominant factors in the total energy consumption of an MTJ-based circuit is the energy consumed by write operations. To reduce the energy dissipation of the write circuit, the proposed NVRH-LUT uses mCell devices and a power-gating transistor ("Ng" in Figure 2e) that disconnects the write circuit from the ground rail while it is idle. mCell devices have the potential to reduce the critical switching current density (Jc0) while maintaining data stability and improving the read signal, which are the main concerns for data-MTJ devices. The sixth column of the table compares the TMR ratios of the considered STT-LUT circuits; a high TMR ratio leads to a greater intrinsic critical switching current density (Jc0) and MTJ-cell size [37]. The seventh and eighth columns compare the radiation sensitivity of the considered STT-LUT circuits.

As the results in Table 3 reveal, the proposed NVRH-LUT not only provides the lowest delay and energy but also offers full immunity against SEUs and SEDUs, which among the previously proposed STT-LUT circuits only Ref. [19] also provides.

### Table 3. Design parameters of the proposed rad-hard NVRH-LUT in comparison with previously presented STT-LUTs (all considered as 4-input LUT).

<table>
<thead>
<tr>
<th>STT-LUT circuits</th>
<th># of MTJs</th>
<th># of MOSs</th>
<th>Delay (ps)</th>
<th>Power (μW)</th>
<th>PDP (aJ)</th>
<th>TMR (%)</th>
<th>SEU immune</th>
<th>SEDU immune</th>
</tr>
</thead>
<tbody>
<tr>
<td>Ref. [15]</td>
<td>48</td>
<td>120</td>
<td>84.72</td>
<td>0.84</td>
<td>71.16</td>
<td>100</td>
<td>NO</td>
<td>NO</td>
</tr>
<tr>
<td>Ref. [16]</td>
<td>32</td>
<td>252</td>
<td>174.6</td>
<td>1.764</td>
<td>307.99</td>
<td>700</td>
<td>YES</td>
<td>NO</td>
</tr>
<tr>
<td>Ref. [17]</td>
<td>17</td>
<td>75</td>
<td>107.9</td>
<td>0.64</td>
<td>69.05</td>
<td>100</td>
<td>NO</td>
<td>NO</td>
</tr>
<tr>
<td>Ref. [18]</td>
<td>20</td>
<td>40</td>
<td>59.8</td>
<td>0.57</td>
<td>34.08</td>
<td>100</td>
<td>NO</td>
<td>NO</td>
</tr>
<tr>
<td>Ref. [19]</td>
<td>32</td>
<td>168</td>
<td>116.3</td>
<td>1.176</td>
<td>136.76</td>
<td>400</td>
<td>YES</td>
<td>YES</td>
</tr>
<tr>
<td>Proposed NVRH-LUT</td>
<td>17</td>
<td>72</td>
<td>11.18</td>
<td>0.32</td>
<td>3.57</td>
<td>100</td>
<td>YES</td>
<td>YES</td>
</tr>
</tbody>
</table>
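The PDP column of Table 3 agrees with the product of the delay and power columns, so with delay in ps and power in μW the PDP is in attojoules (1 ps × 1 μW = 1 aJ). The sketch below recomputes it from the table's own values.

```python
# (delay [ps], power [uW], reported PDP) taken from Table 3.
TABLE3 = {
    "Ref. [15]": (84.72, 0.84, 71.16),
    "Ref. [16]": (174.6, 1.764, 307.99),
    "Ref. [17]": (107.9, 0.64, 69.05),
    "Ref. [18]": (59.8, 0.57, 34.08),
    "Ref. [19]": (116.3, 1.176, 136.76),
    "Proposed NVRH-LUT": (11.18, 0.32, 3.57),
}


def pdp_aj(delay_ps: float, power_uw: float) -> float:
    """Power-delay product in aJ: 1e-12 s x 1e-6 W = 1e-18 J."""
    return delay_ps * power_uw


for name, (delay, power, reported) in TABLE3.items():
    print(f"{name:18s} computed {pdp_aj(delay, power):7.2f} aJ, reported {reported:7.2f}")
```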

#### 5.2.3. BTI sensitivity

Bias temperature instability (BTI) is a phenomenon in silicon MOS devices in which high electric fields and/or high temperatures stress the MOSFET gate, shifting the threshold voltage (Vth) of the device [38]. BTI is classified into two categories: negative BTI (NBTI) and positive BTI (PBTI). NBTI affects PMOS transistors and PBTI affects NMOS transistors during circuit operation, degrading their threshold voltage and, in turn, their speed [39].

Because of the dynamic operation of STT-LUTs, the model presented in [40] is applied to evaluate the impact of BTI degradations in simulated STT-LUTs. According to this model, the threshold voltage (Vth) shift due to BTI is dependent on bias (Vsg) and temperature (T). The Vth shift due to BTI is modeled in Eq. 3:

$$\Delta V_{th}(t + t_0) = \Delta_1 + \Delta_2,$$ (3)

where

$$\Delta_1 = \Phi\left(A + B \log(1 + C \cdot t)\right),$$ (4)

$$\Delta_2 = \Delta V_{th}(t_0)\left(1 - \frac{k + \log(1 + C \cdot t)}{k + \log\left(1 + C \cdot (t + t_0)\right)}\right),$$ (5)

$$\Phi = \Phi_0 \exp\left(\frac{\beta V_{sg}}{T_{ox} K T}\right) \exp\left(-\frac{E_0}{K T}\right).$$ (6)

$T_{ox}$ is the oxide thickness, $K$ is the Boltzmann constant, $T$ is the temperature in Kelvin, $t_0$ is the initial time of a given cycle when the voltage $V_{sg}$ is applied, $t$ is the time duration that the voltage $V_{sg}$ is kept, and $\Delta V_{th}(t_0)$ is initial threshold voltage shift, which is the final threshold voltage shift from the previous cycle. $A$, $B$, $C$, $\beta$, $\Phi_0$, $E_0$, and $k$ are constants [40]. Under constant DC stress, the model of Eq. 7 predicts the BTI over time [40]:

$$\Delta V_{th}(t) = \Phi \left(A + B \log(1 + C t)\right). \tag{7}$$
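The cycle-by-cycle BTI model can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in [40] or in our SPICE flow. It assumes natural logarithms, the Boltzmann constant $K$ in eV/K (so $E_0$ is in eV), and a prefactor of the form $\Phi = \Phi_0 \exp(\beta V_{sg}/T_{ox}) \exp(-E_0/KT)$, which uses the bias and oxide thickness defined above. All numeric constants (`CONSTS`, `phi0`, `beta`, `e0`, the operating point) are hypothetical placeholders rather than fitted values. The sketch also checks that Eq. 3 collapses to Eq. 7 for the first cycle ($t_0 = 0$, $\Delta V_{th}(t_0) = 0$).

```python
import math

K_BOLTZMANN = 8.617e-5  # Boltzmann constant "K" in eV/K (assumes E0 is given in eV)


def phi(v_sg, t_ox, temp_k, phi0, beta, e0):
    """Assumed prefactor: Phi0 * exp(beta * Vsg / Tox) * exp(-E0 / (K * T))."""
    return phi0 * math.exp(beta * v_sg / t_ox) * math.exp(-e0 / (K_BOLTZMANN * temp_k))


def dvth_dc(t, p, a, b, c):
    """Eq. 7: Vth shift after t seconds of constant DC stress (natural log assumed)."""
    return p * (a + b * math.log(1 + c * t))


def dvth_cycle(t, t0, dvth_t0, p, a, b, c, k):
    """Eq. 3: Vth shift at t + t0 as Delta_1 + Delta_2.

    t0 is the start time of the current cycle, t the time Vsg is held,
    and dvth_t0 the shift carried over from the previous cycle.
    """
    delta_1 = p * (a + b * math.log(1 + c * t))
    delta_2 = dvth_t0 * (1 - (k + math.log(1 + c * t)) / (k + math.log(1 + c * (t + t0))))
    return delta_1 + delta_2


# Hypothetical fitting constants and operating point (placeholders, not from [40]).
CONSTS = dict(a=1.0, b=1.0, c=1e3, k=2.0)
P = phi(v_sg=1.0, t_ox=1.2, temp_k=358.0, phi0=0.05, beta=0.5, e0=0.1)

# Three consecutive 1 s stress cycles; each cycle starts from the previous shift.
shifts, shift, t0 = [], 0.0, 0.0
for _ in range(3):
    shift = dvth_cycle(t=1.0, t0=t0, dvth_t0=shift, p=P, **CONSTS)
    shifts.append(shift)
    t0 += 1.0

print([f"{1e3 * s:.2f} mV" for s in shifts])
```

Carrying `shift` into the next call is what makes the model history-dependent: the $\Delta_2$ term keeps part of the previous cycle's degradation, so the shift accumulates from cycle to cycle.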

To evaluate the sensitivity of LUT circuits to BTI-induced $V_{th}$ shifts, we modeled the LUT circuits using Monte Carlo simulation in SPICE with 10,000 simulation points and obtained their bit error rate (BER). As shown in Figure 7, the BER of NVRH-LUT is much lower than that of the other STT-LUTs considered. This decrease in BER is due to the separation of the selection tree from the read path and the smaller number of devices in the read path. The read path is a critical path of LUT circuits, and its device count plays a significant role in BER; therefore, reducing the number of read path devices reduces the BER. A notable feature of NVRH-LUT is that its critical path is independent of the number of inputs; i.e., increasing the number of NVRH-LUT inputs does not change the critical path. NVRH-LUT exhibits a 5.33% BER in the presence of BTI, and according to our simulations, its BER drops to zero with a TMR of 200%.

![Figure 7. Comparison of the BTI effects in different STT-LUT circuits.](image)
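The link between read path device count and BER can be illustrated with a toy Monte Carlo model. This is a hedged sketch under stated assumptions, not the SPICE setup behind Figure 7. Each sensing branch is assumed to be one MTJ ($R_L$ or $R_H = R_L(1 + TMR)$) in series with `n_read_devices` transistors whose $V_{th}$ carries an independent Gaussian BTI shift. A read error is counted when the relative resistance difference between the branches falls below a hypothetical `margin`. The function `mc_ber` and every parameter value are illustrative assumptions.

```python
import random


def r_on(vth, k_n=2e-3, v_gs=1.0):
    """Linear-region ON resistance 1 / (k_n * (V_GS - V_th)); k_n lumps mu*C_ox*W/L (hypothetical)."""
    return 1.0 / (k_n * (v_gs - vth))


def mc_ber(n_read_devices, n_samples=10_000, r_l=5e3, tmr=0.5, vth0=0.4,
           shift_mean=0.03, shift_sigma=0.05, margin=0.2, seed=1):
    """Fraction of Monte Carlo points whose differential read margin drops below `margin`.

    Each branch is one MTJ in series with n_read_devices transistors whose
    V_th includes an independent Gaussian BTI shift (all values hypothetical).
    """
    rng = random.Random(seed)
    r_h = r_l * (1 + tmr)
    errors = 0
    for _ in range(n_samples):
        low = r_l + sum(r_on(vth0 + rng.gauss(shift_mean, shift_sigma)) for _ in range(n_read_devices))
        high = r_h + sum(r_on(vth0 + rng.gauss(shift_mean, shift_sigma)) for _ in range(n_read_devices))
        if (high - low) / low < margin:
            errors += 1
    return errors / n_samples


for n in (2, 6):
    print(f"{n} read-path devices: BER = {mc_ber(n):.4f}")
print(f"6 read-path devices, TMR = 200%: BER = {mc_ber(6, tmr=2.0):.4f}")
```

In this toy model, every extra series transistor both adds variability and dilutes the MTJ resistance contrast, so a longer read path fails more often. Raising the TMR restores the margin, which is qualitatively consistent with the trends reported above.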

#### 5.2.4. Process variations

Process variation is a manufacturing phenomenon that causes some device parameters in a fabricated chip to differ from their design values. It becomes particularly pronounced as devices shrink to nanoscale dimensions, and it can cause significant performance degradation or even functional failure of a chip [41].

In an STT-LUT, device variations in the MOS transistors and MTJ cells can significantly alter their resistances, which increases the read delay or can even cause functional failure.

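In the linear (triode) region, the ON-state resistance of the MOS transistor is given by Eq. 8:

$$R_{T_{on}} = \frac{V_{DS}}{I_{D(\text{on})}} = \frac{1}{\mu C_{ox} \frac{W}{L}(V_{GS} - V_{th})}, \tag{8}$$

where $\mu$ is the carrier mobility and $C_{ox}$ is the gate oxide capacitance per unit area.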

According to Eq. 8, $R_{T_{on}}$ of the NMOS transistor is affected by variations in the transistor channel length ($L$), width ($W$), and threshold voltage ($V_{th}$).

The low ($R_L$, parallel-state) and high ($R_H$, antiparallel-state) resistances of the MTJ can be expressed in simplified form as follows [42]:

$$R_L = K_1 \frac{t_{ox}}{W_{MTJ} \times L_{MTJ}} e^{K_2 t_{ox}}, \tag{9}$$

where $t_{ox}$ is the oxide barrier layer thickness, $W_{MTJ} \times L_{MTJ}$ is the MTJ area, and $K_1$ and $K_2$ lump together all remaining process parameters [42].

$$R_H = R_L \times (1 + TMR). \tag{10}$$

TMR requires electrons to tunnel through the oxide barrier layer while maintaining their spin coherence; therefore, the thickness of the oxide barrier layer affects TMR. Based on [21], TMR can be written as in Eq. 11:

$$TMR = P_1 - \frac{P_2}{P_3} \left(1 - e^{-P_3 t_{ox}}\right), \tag{11}$$

where $P_1$, $P_2$, and $P_3$ are fitting parameters. Thus, $t_{ox}$ variations lead to TMR variations, which in turn cause resistance variations.

According to Eqs. 9 and 10, $R_L$ and $R_H$ of the MTJ cell are affected by variations in the MTJ length ($L_{MTJ}$), width ($W_{MTJ}$), and oxide barrier layer thickness ($t_{ox}$).

In an STT-LUT, process variations in the MOS transistors and MTJ cells can therefore significantly alter $R_{T_{on}}$, $R_L$, and $R_H$ according to Eqs. 8-10, which can increase the read delay or even cause functional failure.

Furthermore, the thickness of the MTJ free layer is susceptible to process variation, which alters the thermal stability factor ($\Delta$). As Eq. 12 shows, $\Delta$ depends linearly on the free layer thickness:

$$\Delta = \frac{E}{k_B T} = \frac{K_U V}{k_B T}, \tag{12}$$

where $E$ is the energy barrier, so that $\Delta$ is the barrier height relative to the thermal energy $k_B T$; $K_U$ is the anisotropy constant; and $V$ is the free layer volume, i.e., the product of the free layer cross-sectional area and the free layer thickness.

Based on the theoretical model in [43] and the measurements in [44], the switching probability of the MTJ can be obtained using Eq. 13:

$$P_{SW}(MTJ) = 1 - \exp\left\{-\frac{t}{\tau_0} \exp\left[-\Delta\left(1 - \frac{I_{WR}}{I_{C0}}\right)\right]\right\}, \tag{13}$$

where $P_{SW}(MTJ)$ is the probability of successful switching of the MTJ; $\Delta = E/k_B T$ is the thermal stability factor, with $E$ the energy barrier between the parallel and antiparallel states of the MTJ; $k_B$ is the Boltzmann constant; $T$ is the operating temperature in Kelvin; $I_{WR}$ is the current flowing through the MTJ write path; $t$ is the current pulse width (2 ns in our simulations); $\tau_0$ is 1 ns; and $I_{C0}$ is the minimum current required to switch the MTJ (high-to-low or low-to-high) corresponding to $\tau_0$ [37].
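A short sketch shows how a free layer thickness change propagates through Eq. 12 into the switching probability of Eq. 13. It uses the 2 ns pulse and $\tau_0$ = 1 ns stated above. The anisotropy constant ($K_U$ = 1e5 J/m^3), the 40 nm × 40 nm cross-section, and the ratio $I_{WR}/I_{C0} = 0.95$ are hypothetical values chosen only for illustration. Following Eq. 13 as written, $I_{C0}$ is held fixed while the thickness varies.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def thermal_stability(k_u, area_m2, t_free_m, temp_k=300.0):
    """Eq. 12: Delta = K_U * V / (k_B * T), with V = cross-sectional area x free layer thickness."""
    return k_u * area_m2 * t_free_m / (K_B * temp_k)


def switching_probability(delta, i_ratio, t_pulse=2e-9, tau0=1e-9):
    """Eq. 13 with i_ratio = I_WR / I_C0; t_pulse = 2 ns and tau0 = 1 ns as in the text."""
    return 1.0 - math.exp(-(t_pulse / tau0) * math.exp(-delta * (1.0 - i_ratio)))


# Hypothetical device: K_U = 1e5 J/m^3, 40 nm x 40 nm free layer, I_WR = 0.95 * I_C0 (I_C0 fixed).
AREA = 40e-9 * 40e-9
for t_nm in (1.2, 1.3, 1.4):
    delta = thermal_stability(1e5, AREA, t_nm * 1e-9)
    print(f"t_free = {t_nm} nm: Delta = {delta:.1f}, P_SW = {switching_probability(delta, 0.95):.3f}")
```

Because $\Delta$ sits in the exponent of Eq. 13, a few-percent thickness change shifts $P_{SW}$ by a noticeable relative amount in this regime.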

Thus, variation in the free layer thickness can significantly alter the switching probability of the MTJ and cause write failures. We have considered five major sources of process variation affecting the MOS transistors and MTJ cells: a) variations in transistor area, b) variations in transistor threshold voltage ($V_{th}$), c) variations in the thickness of the MTJ free layer, d) variations in MTJ cross-sectional area, and e) variations in the thickness of the oxide barrier layer ($t_{ox}$).

To evaluate the sensitivity of LUT circuits to manufacturing variations from these PV sources, we used Monte Carlo simulation in SPICE with 10,000 simulation points. As shown in Figure 8, the BER of NVRH-LUT is much lower than that of the other STT-LUTs considered. This decrease in BER is due to the separation of the selection tree from the read path and the smaller number of read path devices, which in turn reduces the number of critical path transistors subject to degradation. NVRH-LUT exhibits an 8.63% BER in the presence of PV sources, and according to our simulations, its BER drops to zero with a TMR of 200%.

![Figure 8. Comparison of the PV effects in different STT-LUT circuits.](image)
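The Monte Carlo procedure over the five PV sources can be sketched as follows. This is a simplified illustration, not the SPICE netlists behind Figure 8. Several assumptions are made. Source (a) is treated as a W/L variation. All nominal values, relative spreads, and fitting constants are hypothetical. A read error is a relative branch-resistance margin below `READ_MARGIN`, and a write error is a switching probability below `WRITE_TARGET`. $I_{C0}$ is assumed to scale linearly with $\Delta$, a common first-order STT approximation. The names `pv_monte_carlo`, `branch_resistance`, and `write_ok` are hypothetical helpers.

```python
import math
import random

# Hypothetical nominal values (MTJ dimensions in nm) and relative 1-sigma spreads for PV sources (a)-(e).
NOMINAL = {"w_over_l": 2.0, "vth": 0.40, "t_free": 1.3, "mtj_area": 1600.0, "t_ox": 0.85}
SIGMA_REL = {"w_over_l": 0.05, "vth": 0.05, "t_free": 0.03, "mtj_area": 0.05, "t_ox": 0.02}

MU_COX, V_GS = 300e-6, 1.0                  # Eq. 8 (A/V^2, V), hypothetical
K1, K2 = 3.75e4, 6.5                        # Eq. 9 fitting constants, hypothetical
P1, P2, P3 = 1.5, 0.5, 1.0                  # Eq. 11 fitting constants, hypothetical
K_U, K_B, TEMP = 1e5, 1.380649e-23, 300.0   # J/m^3, J/K, K
T_PULSE, TAU0 = 2e-9, 1e-9                  # pulse width and tau_0 as stated in the text
I_RATIO_NOM = 1.15                          # nominal I_WR / I_C0, hypothetical
READ_MARGIN, WRITE_TARGET = 0.3, 0.999      # hypothetical pass/fail criteria


def sample(rng, name, scale):
    """Draw one Gaussian sample of a PV parameter (scale=0 returns the nominal value)."""
    nom = NOMINAL[name]
    return rng.gauss(nom, scale * SIGMA_REL[name] * nom)


def branch_resistance(rng, scale, high, n_read_devices):
    """One MTJ (Eqs. 9-11) in series with n_read_devices ON transistors (Eq. 8)."""
    t_ox = sample(rng, "t_ox", scale)
    r_l = K1 * t_ox / sample(rng, "mtj_area", scale) * math.exp(K2 * t_ox)
    tmr = P1 - (P2 / P3) * (1 - math.exp(-P3 * t_ox))
    r_mtj = r_l * (1 + tmr) if high else r_l
    r_tr = sum(1.0 / (MU_COX * sample(rng, "w_over_l", scale) * (V_GS - sample(rng, "vth", scale)))
               for _ in range(n_read_devices))
    return r_mtj + r_tr


def thermal_stability(t_free_nm, area_nm2):
    """Eq. 12 with V = area x thickness, converted from nm^3 to m^3."""
    return K_U * area_nm2 * t_free_nm * 1e-27 / (K_B * TEMP)


DELTA_NOM = thermal_stability(NOMINAL["t_free"], NOMINAL["mtj_area"])


def write_ok(rng, scale):
    """Eq. 13, assuming I_C0 scales linearly with Delta."""
    delta = thermal_stability(sample(rng, "t_free", scale), sample(rng, "mtj_area", scale))
    i_ratio = I_RATIO_NOM * DELTA_NOM / delta
    p_sw = 1 - math.exp(-(T_PULSE / TAU0) * math.exp(-delta * (1 - i_ratio)))
    return p_sw >= WRITE_TARGET


def pv_monte_carlo(n_samples=10_000, n_read_devices=2, scale=1.0, seed=7):
    """Return (read BER, write BER) over n_samples Monte Carlo points."""
    rng = random.Random(seed)
    read_err = write_err = 0
    for _ in range(n_samples):
        r_low = branch_resistance(rng, scale, False, n_read_devices)
        r_high = branch_resistance(rng, scale, True, n_read_devices)
        read_err += (r_high - r_low) / r_low < READ_MARGIN
        write_err += not write_ok(rng, scale)
    return read_err / n_samples, write_err / n_samples


read_ber, write_ber = pv_monte_carlo()
print(f"read BER = {read_ber:.4f}, write BER = {write_ber:.4f}")
```

Setting `scale=0.0` removes all variation and reproduces the nominal design. Scaling the spreads up or down shows how strongly each failure mode depends on the variation budget.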

6. Conclusions
In this paper, we proposed a novel nonvolatile radiation-hardened LUT circuit, called NVRH-LUT, based on mCell-MRAM devices. NVRH-LUT uses the proposed RH-PCSA, which is robust against SEUs and SEDUs for injected charges ranging from –200 fC to +200 fC. As established in the literature, this range is typical of the charge that radiation particles striking NV-LUT circuits can inject. A comparison with previously proposed NV-LUTs revealed that NVRH-LUT not only improves reliability and reduces BER but also offers low delay, area, power, and energy consumption. NVRH-LUT can thus be a promising candidate for power-gated and reliable FPGA designs.

References


