# Multiple Transient Faults in Combinational Logic with Placement Considerations

Georgios Ioannis Paliaroutis Department of Electrical and Computer Engineering University of Thessaly Volos 38221, Greece gepaliar@uth.gr Pelopidas Tsoumanis Department of Electrical and Computer Engineering University of Thessaly Volos 38221, Greece petsouma@uth.gr Nestor Evmorfopoulos Department of Electrical and Computer Engineering University of Thessaly Volos 38221, Greece nestevmo@uth.gr

George Dimitriou Department of Electrical and Computer Engineering, and Department of Computer Science University of Thessaly Volos 38221, Greece dimitriu@uth.gr Georgios I. Stamoulis Department of Electrical and Computer Engineering, and Department of Computer Science University of Thessaly Volos 38221, Greece georges@uth.gr

Abstract—Integrated circuit susceptibility to radiation-induced faults remains a major reliability concern. The continuous downscaling of device feature size and the reduction in supply voltage in CMOS technology tend to worsen the problem. Thus, the evaluation of Soft Error Rate (SER) in the presence of multiple transient faults is necessary and remains an open research field. The tool presented in this paper is based on Monte-Carlo simulations and, by combining the modeling of the masking mechanisms with placement information, provides an accurate SER estimation.

Keywords—Masking mechanisms, multiple transient faults, sensitive regions, soft error rate, temperature, vulnerability

# I. INTRODUCTION

Reliability has always been a challenge in the design of VLSI circuits, all the more so in recent years, as chips have become more vulnerable to radiation-induced hazards [1], [2]. Alpha particles emitted from radioactive impurities in package material and high-energy particles from cosmic radiation may strike the silicon of an IC, resulting in *transient faults* (TFs). Errors of this kind are non-destructive: they do not damage the device but may disturb its proper operation. In critical systems, however, this can cause unexpected behavior, so identifying the impact of such errors on circuit function is imperative. These errors are called soft errors, and SER is the metric that indicates the degree of a circuit's susceptibility to radiation-induced faults.

Most existing work on soft-error modeling is based on the three natural masking phenomena that mitigate SER, i.e. logical, electrical and timing masking [3], [4]. The authors in [5], [6] base their work on probabilistic models and statistical methods for the estimation of SER. However, modern chips tend to be more susceptible to high-energy particle strikes due to technology downscaling: the reduced distance between cells has increased the occurrence of *multiple transient faults* (MTFs) arising from a single particle strike [7]-[11]. Recent research in this field therefore focuses on the estimation of SER in the presence of *single event multiple transients* (SEMTs). In [8], heavy-ion experiments are conducted to characterize SEMTs. The authors in [11], [12] introduce the identification of the sensitive zones of the cells for the estimation of SER. Some approaches assume that SEMTs occur at the outputs of physically adjacent gates, which are identified by examining the netlist [13], [14]. Nevertheless, relying only on logic-level netlists to determine a circuit's error sites, while neglecting the layout-level adjacency of the cells, may lead to inaccurate results. Other approaches provide a more realistic and accurate SER estimation by taking the circuit layout into account [11], [12], [15]-[18].

This paper presents a detailed overview of SER analysis for combinational logic and focuses on the modeling and handling of SEMTs originating from a particle strike. An accurate SER estimation is obtained through Monte-Carlo simulations that take layout information into account. The rest of the paper is organized as follows: Section II introduces the basics of transient faults; Section III presents the proposed methodology and its algorithm; Section IV shows the experimental results on the benchmarks; and, finally, Section V concludes this work.

# II. TRANSIENT FAULTS BASICS

In this section, we present the modeling of voltage pulses using a SPICE simulator, the sensitive zones of a cell, and the way the masking effects prevent a TF from propagating through the ICs, thereby mitigating its impact on their functionality.

# A. SPICE Simulations and Sensitive Regions

TFs are mainly caused by high-energy neutrons that strike a transistor's depletion region. The result is a current pulse that appears at the gate output as a voltage drop. To model pulse generation and characterize each gate's sensitivity, SPICE simulations should be performed for each cell. In particular, current pulses are injected into both nMOS and pMOS transistors for all input combinations of each gate, and the output pulse is observed. For the fault injection process, a script parses the *ISCAS'89* SPICE netlists and inserts current pulses on random transistor nodes. Subsequently, the propagation of the generated pulses is examined, and an adequate number of simulations is performed so as to obtain an estimation of SER. The characterization offers an overview of the TFs and shows how the initial duration of a glitch attenuates while passing through the ICs, considering various capacitive loads on the output of the affected gates. Simulations show that the generated faults tend to be filtered as capacitance increases.

The critical charge required to change the logic state of a gate has decreased significantly due to technology downscaling. Thus, the electron-hole pairs generated by a particle that hits a sensitive transistor can change the logic state of a gate. However, a transient pulse emerges at the gate's output only if the high-energy particle affects a sensitive region. The aforementioned SPICE simulation analysis shows that, for every input combination, the sensitive regions are the transistors that are off [11], [12], [18]. To identify the sensitive zones on the circuit layout, the GDSII (Graphic Data System) file of each cell has been utilized [15]. These are binary files containing layout information of ICs; hence a parser, written in C, has been included in the proposed tool to extract the precise location of the transistors' diffusions for each gate.

# B. Masking Mechanisms

There are three mechanisms that prevent transient faults from propagating through a circuit and, subsequently, resulting in soft errors [5]. The first, *logical masking*, occurs when the propagation of a glitch is blocked by an on-path gate whose output value is determined by one or more of its other inputs. For example, if one input of an OR gate is at logic 1, its output will always be logic 1 regardless of the other input values, so any glitch that arrives at another input is masked. Similarly, the output of an AND gate will be at logic 0 if at least one input is at logic 0. *Electrical masking* is the second factor that prevents a TF from reaching the memory elements and, thus, becoming a soft error. The generated pulses are electrically masked due to the electrical properties of the cells they propagate through, since they are attenuated at each stage. To model the propagation of a pulse, a simple linear function that depends on the gate's delay has been used; a slow gate contributes more to electrical masking than a fast gate. Finally, the third factor that contributes to the elimination of such disturbances is *timing masking*. This mechanism is associated with the memory elements and their latching window, which is the time interval, determined by the setup and hold times, during which the input signal must be stable to be reliably latched. Therefore, a TF that reaches a *flip-flop* (FF) outside the latching window is masked.
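The three mechanisms described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the gate set, the function names, the piecewise-linear attenuation law (a pulse narrower than the gate delay is filtered, a pulse wider than twice the delay passes unchanged) and the overlap-based latching rule are all assumptions chosen only to make the ideas concrete.

```python
# Illustrative sketch only: the gate set, the attenuation law and the
# overlap rule below are assumptions, not the tool's actual implementation.

# Controlling input values: one such value on any input fixes the output.
CONTROLLING_VALUE = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def logically_masked(gate_type, side_inputs):
    """Return True if a side input holds the gate's controlling value."""
    controlling = CONTROLLING_VALUE.get(gate_type)
    return controlling is not None and controlling in side_inputs


def attenuate(width, gate_delay):
    """Assumed piecewise-linear attenuation of a pulse through one gate."""
    if width <= gate_delay:
        return 0.0                      # too narrow: filtered out
    if width >= 2.0 * gate_delay:
        return float(width)             # wide enough: passes unchanged
    return 2.0 * (width - gate_delay)   # in between: linearly shortened


def latched(arrival, width, clock_edge, setup, hold):
    """Return True if the pulse overlaps [clock_edge - setup, clock_edge + hold]."""
    window_start = clock_edge - setup
    window_end = clock_edge + hold
    return arrival < window_end and arrival + width > window_start
```

Under these assumptions, a glitch on one input of an OR gate whose other input is 1 is logically masked, a 30 ps pulse leaves a gate with a 20 ps delay as a 20 ps pulse, and a pulse that ends before the latching window opens is masked by timing. A larger gate delay shortens the pulse more, which matches the observation that slow gates contribute more to electrical masking.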

# III. METHODOLOGY DESCRIPTION

The proposed framework is based on Monte-Carlo simulations, as this technique provides more accurate results than other probabilistic methods, even though it is regarded as old-fashioned and time-consuming. Emphasis is also placed on MTFs and on the modeling of the three masking phenomena (logical, electrical and timing masking) that affect the probability that a TF will become a soft error.

# A. Behavior of Multiple Transient Faults

An MTF occurs when a particle hit affects an area of the chip and produces glitches on adjacent cells [17]. The outputs of several gates may therefore change, depending on how many sensitive transistors are influenced by the hit. The surface affected by a particle hit is modeled as an oval whose size follows the average affected area, which depends on the particle's energy [12].

The *DEF* (Design Exchange Format) files of the corresponding *ISCAS'89* benchmark circuits, which describe the position and placement orientation of each logic cell on the circuit layout, are parsed. Together with the GDSII files, they make it possible to identify the position of each transistor, and hence the sensitive regions of each gate, on the die area. This process is crucial, since the cells regarded as affected by a particle strike are those whose inactive transistors lie within the oval affected area [11], [15], [16].
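The selection of affected cells described above can be sketched as a point-in-ellipse test. This is a simplified illustration under stated assumptions: the oval is taken to be an axis-aligned ellipse, each diffusion is reduced to its centre point, and the `Transistor` record and function names are hypothetical rather than taken from the tool.

```python
from dataclasses import dataclass


@dataclass
class Transistor:
    gate: str      # instance name of the owning cell (hypothetical field)
    kind: str      # "nMOS" or "pMOS"
    x: float       # x coordinate of the diffusion centre
    y: float       # y coordinate of the diffusion centre
    is_off: bool   # True if the transistor is off for the current inputs


def inside_oval(x, y, cx, cy, rx, ry):
    """Point-in-ellipse test for an axis-aligned ellipse centred at (cx, cy)."""
    return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0


def affected_gates(transistors, cx, cy, rx, ry):
    """Names of gates with at least one off transistor inside the struck oval."""
    hit = set()
    for t in transistors:
        if t.is_off and inside_oval(t.x, t.y, cx, cy, rx, ry):
            hit.add(t.gate)
    return sorted(hit)
```

In this sketch a transistor that is on is ignored even when it lies inside the oval, reflecting the observation that only off transistors are sensitive.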

# B. Proposed Algorithm

To identify a circuit's vulnerability to transient faults, a topological analysis is presented that is based on dividing the circuit layout into several smaller, equal parts called grids [15]. Algorithm 1 summarizes the proposed framework for the evaluation of SER.
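As a rough illustration of the layout division, the following sketch maps a placement coordinate to a grid number. The row-major numbering, the die origin at (0, 0) and the function names are assumptions; the paper does not specify how grids are numbered.

```python
def grid_index(x, y, die_width, die_height, cols, rows):
    """Row-major index of the grid that contains point (x, y)."""
    col = min(int(x / die_width * cols), cols - 1)   # clamp the right edge
    row = min(int(y / die_height * rows), rows - 1)  # clamp the top edge
    return row * cols + col


def cells_by_grid(cells, die_width, die_height, cols, rows):
    """Group cell names by grid; cells maps name -> (x, y) placement origin."""
    groups = {}
    for name, (x, y) in cells.items():
        index = grid_index(x, y, die_width, die_height, cols, rows)
        groups.setdefault(index, []).append(name)
    return groups
```

Grouping cells this way lets the per-grid loop of the algorithm inject strikes and accumulate SER one grid at a time.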

| Algorithm 1: Compute SER                                   |
|------------------------------------------------------------|
| ESTIMATION OF SER(ISCAS'89)                                |
| /* parse the corresponding DEF file of the ISCAS'89 circuit */ |
| CREATE-CIRCUIT(def file);                                  |
| /* identify the diffusions' coordinates of each gate */    |
| PARSE-GDSII(gdsii file);                                   |
| FIND-DELAY(circuit); /* find the delay of each gate */     |
| function LOGICAL-MASKING(circuit)                          |
| initially divide circuit into a number of grids            |
| for each *grid* in *def* do                                |
| /* find which gates are affected by a particle hit */      |
| function ERRORS-GEN(grid)                                  |
| /* find which diffusions of each gate are affected */      |
| SENSITIVE-REGION(error, gate, radius);                     |
| return *errors*                                            |
| end function                                               |
| for 10000 simulations do                                   |
| for each *node* in *def* do                                |
| for *i* = 0 to *errors* do                                 |
| compute temp-error-state[i]                                |
| /* examine if error affects off or on transistors */       |
| end for                                                    |
| function ELECTRICAL-MASKING(node)                          |
| compute error-width[node]                                  |
| end function                                               |
| function TIMING-MASKING(node)                              |
| compute error-time[node]                                   |
| end function                                               |
| end for                                                    |
| end for                                                    |
| TOTAL-LATCHING-PROB();                                     |
| end for                                                    |
| OVERALL-SER();                                             |
| EXPERIMENTAL-RESULTS();                                    |
| end function                                               |

The algorithm starts by parsing the DEF and GDSII files of the benchmark under simulation in order to record the circuit's connectivity and identify the precise position of the gates and their nMOS and pMOS diffusions. The implemented tool is based on a simple gate-level simulator and uses the method of logical effort in the FIND-DELAY function to determine the gates' delays, since it is a straightforward delay estimation technique. A key point of the proposed implementation is the treatment of MTF propagation with respect to the three masking effects. In particular, each pulse that originates from a single particle strike and appears at the output of an affected cell propagates through the circuit along with its own logical, electrical and timing masking information. The circuit is then divided into a number of grids, which depends on its size, and random particle strikes are generated by the ERRORS-GEN function, producing multiple errors. Prior to the modeling of the masking mechanisms, the affected transistors are extracted. This step is necessary for the identification of the sensitive zones, which takes the current gate input values into account. In order to examine each error separately and determine those that will be captured by the memory elements, three tables, one for each masking effect, are kept for each circuit node. Their size changes dynamically and depends on the number of MTFs generated by a particle strike. Specifically, temp-error-state is used for logical masking, whereas error-width and error-time are used for electrical and timing masking, respectively.
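The logical-effort delay estimate mentioned above can be sketched with the standard logical-effort formula d = g·h + p, where g is the logical effort, h = C_load / C_in is the electrical effort and p is the parasitic delay, all in units of a process constant τ. The textbook values of g and p used below, and the function and table names, are illustrative assumptions; the values the tool uses for the Nangate cells are not given in the paper.

```python
# Textbook logical-effort parameters (g, p); assumed for illustration only,
# not the values characterised for any particular cell library.
LOGICAL_EFFORT = {
    "INV": (1.0, 1.0),
    "NAND2": (4.0 / 3.0, 2.0),
    "NOR2": (5.0 / 3.0, 2.0),
}


def gate_delay(gate_type, c_load, c_in, tau=1.0):
    """Delay d = tau * (g * h + p), with electrical effort h = c_load / c_in."""
    g, p = LOGICAL_EFFORT[gate_type]
    h = c_load / c_in
    return tau * (g * h + p)
```

For instance, an inverter driving four times its own input capacitance has a delay of 5τ in this model.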

The masking information is used to estimate the total latching probability for each simulation: the input logic state of each FF is first compared with the state estimated for logical masking, and the three effects are then combined in the TOTAL-LATCHING-PROB function. The OVERALL-SER function subsequently estimates the circuit SER from the latching probabilities of all simulations. At the end of the simulation, various results and statistics are extracted to evaluate the vulnerability of the circuit to radiation-induced errors.

# C. Reconvergent Transient Faults

Another crucial factor that affects the propagation of faults is the handling of reconvergent pulses. The tool takes into account a TF that follows multiple paths which reconverge at a subsequent gate. As mentioned before, electrical masking is the mechanism that models the pulses according to the delay of each gate. Therefore, when two or more pulses of the same transient fault with the same direction reconverge at a cell, the output pulse is the sum of the input pulses. For overlapping pulses with opposite directions, on the other hand, the resulting pulse at the output of the gate depends on the gate's type and controlling value [11].
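One possible reading of the same-direction case is sketched below. It interprets the "sum" of reconverging pulses as the union of their time intervals, so that overlapping pulses merge into a single wider pulse; this interpretation, the (start, end) representation and the function name are assumptions, and the opposite-direction case, which depends on gate type, is not covered.

```python
def merge_same_direction(pulses):
    """Merge overlapping (start, end) pulses of the same direction."""
    merged = []
    for start, end in sorted(pulses):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous pulse: extend it to cover both.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

Pulses that do not overlap in time are kept separate in this sketch, since each would still have to be checked against the latching window on its own.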

# IV. RESULTS AND DISCUSSION

The proposed tool is implemented in C, and all simulations are performed on a Linux machine with an Intel Core i7-3770 processor at 3.4 GHz and 8 GB of RAM. The experiments are conducted on a set of *ISCAS'89* benchmark circuits synthesized with the 45 nm Nangate Open Cell Library [20]. The inputs of the tool are the DEF and GDSII files of the corresponding benchmarks.

The graph of Fig. 1 shows the vulnerability of the benchmark s1423 by presenting the SER estimation for each grid. Some areas appear to be more susceptible than others, which allows designers to reconsider the placement process in order to mitigate SER.



Fig. 1. Estimated SER of each grid for s1423 benchmark circuit.

Fig. 2 shows the degree to which the masking phenomena affect SER for some grids of the circuit s15850. In particular, logical and electrical masking have a greater impact on the mitigation of SER than timing masking. Also, grid 60 is expected to be less vulnerable than grid 42, since almost all of its errors are completely masked. The SER estimation depends on the type of the affected transistors as well. When a particle strikes an inactive pMOS transistor, the generated pulse is greater, since the parasitic bipolar effect is worse than in an nMOS transistor. Thus, the results in Fig. 3, in combination with those of Fig. 2, give a more detailed view of the grids' susceptibility.



Fig. 2. Selected grids of s15850 and the percentage of the injected errors that are masked logically, electrically and by timing.

In particular, Fig. 3 shows in how many of the 100 simulations, i.e. particle strikes, the number of affected pMOS transistors exceeds the number of affected nMOS transistors and vice versa. It also presents the number of simulations in which the particle hit has no impact on the circuit, as well as the SER of each grid. The SER of grid 25 is greater than that of grid 31, even though the corresponding percentages of unmasked errors are nearly equal. The explanation is that more pMOS transistors are affected in the former grid than in the latter.



Fig. 3. Number of affected transistors for 100 simulations for some grids of s15850 with the corresponding SER values.

The modeling of the single event transient (SET) pulse width is a key factor, as the width is a function of operating temperature [19]. As the temperature increases, the pulses become wider, which leads to a greater SER. Fig. 4 shows the estimated SER at three different temperatures. For the same 45 nm technology, the generated pulses widen with temperature, which explains why the SER at 100°C is higher than in the other two cases.



Fig. 4. Estimated SER at three different operating temperatures.

Table I presents the SER estimation for some benchmarks along with the execution time. SER is expressed in FIT (Failures In Time), and the temperature is held constant at 25°C. Two types of simulation are compared: the first considers MTFs, while the second considers only TFs. The results have been verified against SPICE, with a maximum deviation of 10%; the verification was carried out only for the small benchmarks, since SPICE simulation is too time-consuming for large-scale circuits.

TABLE I. EVALUATION OF SER OBTAINED FROM THE PROPOSED TOOL FOR SOME BENCHMARK CIRCUITS FOR TWO TYPES OF ANALYSIS AS WELL AS THE CORRESPONDING EXECUTION TIME

| Benchmark | # of cells | SER, proposed MTF analysis (FIT) | SER, TF analysis (FIT) | Difference<br>(%) | Execution<br>time |
|-----------|------------|----------|-------------|-------------------|-------------------|
| s27       | 13         | 0.001501 | 0.000632    | 57.9              | 9 sec.            |
| s298      | 166        | 0.004012 | 0.002581    | 35.6              | 22 sec.           |
| s1423     | 991        | 0.017326 | 0.012636    | 27.1              | ~ 2 min.          |
| s9234     | 6,983      | 0.018341 | 0.011705    | 36.1              | ~ 17 min.         |
| s15850    | 12,101     | 0.049041 | 0.040283    | 17.8              | ~ 35 min.         |
| s35932    | 21,243     | 0.011752 | 0.010972    | 6.63              | ~ 58 min.         |
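The Difference column of Table I appears consistent with the relative difference between the two SER columns, taken with respect to the MTF result. This formula is inferred from the tabulated numbers and is not stated explicitly in the text. The short check below recomputes it from the table; the dictionary and function names are illustrative.

```python
# Rows of Table I: benchmark -> (SER with MTFs, SER with TFs, tabulated difference in %).
TABLE_I = {
    "s27": (0.001501, 0.000632, 57.9),
    "s298": (0.004012, 0.002581, 35.6),
    "s1423": (0.017326, 0.012636, 27.1),
    "s9234": (0.018341, 0.011705, 36.1),
    "s15850": (0.049041, 0.040283, 17.8),
    "s35932": (0.011752, 0.010972, 6.63),
}


def relative_difference(ser_mtf, ser_tf):
    """Assumed definition: difference relative to the MTF result, in percent."""
    return (ser_mtf - ser_tf) / ser_mtf * 100.0


def max_mismatch(table):
    """Largest gap between the recomputed and the tabulated difference."""
    return max(abs(relative_difference(m, t) - d) for m, t, d in table.values())
```

With this definition the recomputed values agree with the tabulated ones to within about 0.1 percentage points, which is compatible with rounding of the published SER figures.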

# V. CONCLUSION

The proposed tool provides an accurate SER estimation by examining the propagation of each error separately, which is not taken into consideration by other studies, and produces statistics regarding the vulnerable areas of the circuits. The results, which can be exploited by industry, show the relationship between a circuit's topology and its vulnerability, as well as how the type and characteristics of each gate affect the SER estimation.

# References

- [1] P. Hazucha and C. Svensson, "Impact of CMOS technology scaling on the atmospheric neutron soft error rate," in *IEEE Transactions on Nuclear Science*, vol. 47, no. 6, pp. 2586-2594, Dec. 2000.
- [2] N. Seifert et al., "Radiation-induced soft error rates of advanced CMOS bulk devices," in *Proceedings of the IEEE International Reliability Physics Symposium (IRPS)*, Mar. 2006, pp. 217-225.
- [3] M. Zhang and N. R. Shanbhag, "Soft-Error-Rate-Analysis (SERA) methodology," in *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 25, no. 10, pp. 2140-2155, Oct. 2006.
- [4] M. Anglada, R. Canal, J. L. Aragón and A. González, "MASkIt: Soft error rate estimation for combinational circuits," 2016 IEEE 34th International Conference on Computer Design (ICCD), Scottsdale, AZ, 2016, pp. 614-621.
- [5] F. Wang and V. D. Agrawal, "Soft error rate determination for nanoscale sequential logic," in *Proceedings of the 11th International Symposium on Quality Electronic Design (ISQED)*, 2010, pp. 225-230.
- [6] A. C.-C. Chang, R. H.-M. Huang and C. H.-P. Wen, "CASSER: A closed-form analysis framework for statistical soft error rate," in *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 21, no. 10, pp. 1837-1848, Oct. 2013.
- [7] B. T. Kiddie, W. H. Robinson and D. B. Limbrick, "Single-Event Multiple-Transients (SEMT): Circuit characterization and analysis," in *IEEE Workshop Silicon Errors in Logic-System Effects (SELSE)*, Palo Alto, CA, USA, 2013.
- [8] A. Evans, M. Glorieux, D. Alexandrescu, C. B. Polo and V. Ferlet-Cavrois, "Single event multiple transient (SEMT) measurements in 65 nm bulk technology," 2016 16th European Conference on Radiation and Its Effects on Components and Systems (RADECS), 2016, pp. 1-6.
- [9] D. Rossi, M. Omana, F. Toma and C. Metra, "Multiple transient faults in logic: an issue for next generation ICs?," 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT), Oct. 2005, pp. 352-360.
- [10] P. R. Nanditha, S. Sarik and M. P. Desai, "On the likelihood of multiple bit upsets in logic circuits." arXiv preprint arXiv:1401.1003, 2014.
- [11] H. Huang and C. H.-P. Wen, "Layout-Based soft error rate estimation framework considering multiple transient faults—From device to circuit level," in *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 35, no. 4, pp. 586-597, Apr. 2016.
- [12] M. Ebrahimi, H. Asadi and M. B. Tahoori, "A layout-based approach for multiple event transient analysis," 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC), 2013, pp. 1-6.
- [13] N. Miskov-Zivanov and D. Marculescu, "Multiple transient faults in combinational and sequential circuits: A systematic approach," in *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 29, no. 10, pp. 1614-1627, Oct. 2010.
- [14] M. Fazeli, S. N. Ahmadian, S. G. Miremadi, H. Asadi and M. B. Tahoori, "Soft error rate estimation of digital circuits in the presence of Multiple Event Transients (METs)," in *Design, Automation & Test in Europe Conference & Exhibition (DATE)*, Mar. 2011, pp. 1-6.
- [15] G. I. Paliaroutis, P. Tsoumanis, N. Evmorfopoulos, G. Dimitriou and G. I. Stamoulis, "A placement-aware soft error rate estimation of combinational circuits for multiple transient faults in CMOS technology," 2018 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Chicago, IL, USA, 2018, pp. 1-6.
- [16] Y. Du and S. Chen, "A novel layout-based single event transient injection approach to evaluate the soft error rate of large combinational circuits in Complimentary Metal-Oxide-Semiconductor bulk technology," in *IEEE Transactions on Reliability*, vol. 65, no. 1, pp. 248-255, Mar. 2016.
- [17] J. Li and J. Draper, "Accelerated Soft-Error-Rate (SER) estimation for combinational and sequential circuits," in ACM Transactions on Design Automation of Electronic Systems, vol. 22, no. 3, article 57, May 2017.
- [18] X. Cao et al., "A layout-based soft error vulnerability estimation approach for combinational circuits considering Single Event Multiple Transients (SEMTs)," in *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, May 2018.
- [19] M. J. Gadlage *et al.*, "The effect of elevated temperature on digital single event transient pulse widths in a bulk CMOS technology," in 2009 IEEE International Reliability Physics Symposium (IRPS), Montreal, QC, 2009, pp. 170-173.
- [20] Nangate 45nm Open Cell Library, Nangate Inc., 2009. [Online]. Available: http://www.nangate.com/