

# Detecting and Correcting the Multiple Errors in Video Coding System

## <sup>1</sup>Kunkala Suman, <sup>2</sup>K Dhanunjaya

<sup>1</sup>M.tech / ECE, Audisankara College of Engineering and Technology, Gudur, A.P, INDIA <sup>2</sup>Professor & Head of the department of ECE, Audisankara College of Engineering and Technology, Gudur, A.P, INDIA

## ABSTRACT

Motion Estimation (ME) is the process of determining the motion vectors that describe the transformation from one 2D image to another, usually between adjacent frames in a video sequence. ME is a critical part of any video coding system, because an error in ME degrades the video quality. To test ME in a video coding system, an Error Detection and Correction Architecture (EDCA) is designed based on the Residue-and-Quotient (RQ) code. Multiple bit errors in the processing element (PE), i.e., the key component of a ME, can be detected and the original data can be recovered effectively by using the EDCA design. Experimental results indicate that the proposed design can detect multiple bit errors and effectively recover the data with reduced area. The area is reduced from 1665 to 345.

Keywords: Processing Element, Error detection, Error correction, Motion estimation, Residue-and-quotient (RQ) code

## **INTRODUCTION**

Multimedia applications are becoming more flexible and reliable with advances in semiconductor, digital signal processing, and communication technologies. Video compression standards include MPEG-1, MPEG-2, and MPEG-4, of which MPEG-4 is the most advanced. Video compression is essential in many applications to reduce the amount of data required to transmit or store video. ME is of primary concern because it removes the temporal redundancy between successive frames in a video coding system, and it is also the most time-consuming stage. ME is considered the most computationally intensive unit [1].

A regular 4x4 arrangement of PEs constitutes a ME. Advances in VLSI technology facilitate the integration of a large number of PEs on a single chip, and arranging these PEs as an array accelerates computation.

Testing of PEs is essential, as an error in a PE affects the video quality and signal-to-noise ratio. The numerous PEs in a ME can be tested concurrently using Concurrent Error Detection (CED) methods [8]. In this method, different operations are performed on the same operand, and an error is detected when those operations produce conflicting results. Concurrent fault simulation is essentially an event-driven simulation in which the fault-free circuit and the faulty circuits are simulated together [6].

\*Address for correspondence:

jtinureddy@gmail.com

Design for Testability (DFT) techniques are required in order to improve the quality and reduce the test cost of the digital circuit, while at the same time simplifying the test, debug, and diagnosis tasks [3]. Logic built-in self-test (BIST) is a design-for-testability technique in which a portion of a circuit on a chip, board, or system is used to test the digital logic circuit itself [4]. A BIST technique for testing logic circuits can be online or offline. Online BIST is performed when the functional circuitry is in normal operational mode. In concurrent online BIST, testing is conducted simultaneously with normal functional operation. The functional circuitry is usually implemented with coding techniques [5].

Any input pattern or sequence of input patterns that produces a different output response in a faulty circuit from that of the fault-free circuit is a test vector, or sequence of test vectors, that will detect the faults. The goal of test generation is to find an efficient set of test vectors that detects all faults considered for that circuit. Because a given set of test vectors is usually capable of detecting many faults in a circuit, fault simulation is typically used to evaluate the fault coverage obtained by that set of test vectors.

Because of the diversity of VLSI defects, it is difficult to generate tests for real defects. Fault models are necessary for generating and evaluating a set of test vectors. Generally, a good fault model should satisfy two criteria: (1) It should accurately reflect the behavior of defects, and (2) it should be computationally efficient in terms of fault simulation and test pattern generation.

The rest of this paper is organized as follows. Section 2 introduces the proposed EDCA, the fault model definition, the blocks in the architecture, and the test method. Section 3 evaluates the performance in terms of area and delay to demonstrate the feasibility of the proposed EDCA for ME testing applications. Conclusions are drawn in Section 4.

## **PROPOSED EDCA DESIGN**

The proposed EDCA scheme shown in Fig. 1 consists of two major blocks, i.e., the error detection circuit (EDC) and the data recovery circuit (DRC), which detect the errors and recover the corresponding data in a specific circuit under test (CUT) [9]. The test code generator (TCG) in the architecture uses the concepts of the RQ code to generate the corresponding test codes for error detection and data recovery.



Figure 1. Conceptual View of the proposed EDCA design

The output from the circuit under test is compared with the test code values in the EDC. The output of EDC indicates the occurrence of error. DRC is in charge of recovering data from TCG. Additionally, a selector is enabled to select the error-free data or data-recovery results.

A ME consists of many PEs incorporated in a 1-D or 2-D array for video encoding applications. A PE generally consists of two ADDs (i.e., an 8-b ADD and a 12-b ADD) and an accumulator (ACC) [2]. The 8-b ADD (a pixel has 8-b data) is used to estimate the addition of the current pixel (Cur\_pixel) and the reference pixel (Ref\_pixel). Additionally, a 12-b ADD and an ACC are required to accumulate the results from the 8-b ADD in order to determine the sum of absolute differences (SAD) value for video encoding applications. Fig. 2 shows the proposed EDCA circuit design for a specific $PE_i$ of a ME [9]. This architecture consists of blocks that generate the residue and quotient values that are used to detect the errors.

## **Fault Model**

A more practical approach is to select specific test patterns based on circuit structural information and a set of fault models. This approach is called structural testing. Structural testing saves time and improves test efficiency.

A stuck-at fault affects the state of logic signals on lines in a logic circuit, including primary inputs (PIs), primary outputs (POs), internal gate inputs and outputs, fanout stems (sources), and fanout branches. A stuck-at fault makes the faulty signal line appear to be stuck at a constant logic value, either logic 0 or logic 1, referred to as stuck-at-0 (SA0) or stuck-at-1 (SA1), respectively. The stuck-at fault model can also be applied to sequential circuits; however, high-fault-coverage test generation for sequential circuits is much more difficult than for combinational circuits. The stuck-at (SA) model must be adopted to cover actual failures in the interconnect data bus between PEs. An SA fault in a ME architecture can cause errors in computing SAD values [7]. The resulting computational error $e$ is assumed here to be equal to $SAD' - SAD$, where $SAD'$ denotes the SAD value computed with SA faults.
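The error definition above can be illustrated with a minimal Python sketch. The helper name `apply_stuck_at`, the 12-bit sample value, and the choice of faulty line are all hypothetical; they only show how a single stuck line turns a fault-free SAD into SAD' and yields e = SAD' - SAD.

```python
def apply_stuck_at(value: int, bit: int, stuck_to: int) -> int:
    """Return `value` with line `bit` forced to `stuck_to` (0 for SA0, 1 for SA1)."""
    if stuck_to not in (0, 1):
        raise ValueError("stuck_to must be 0 or 1")
    mask = 1 << bit
    return (value | mask) if stuck_to else (value & ~mask)


# Hypothetical fault-free 12-bit SAD value (1332), chosen only for illustration.
sad = 0b0101_0011_0100

# Assumed fault: SA1 on bit 6 of the PE output bus.
sad_faulty = apply_stuck_at(sad, bit=6, stuck_to=1)

# Error magnitude as defined in the text: e = SAD' - SAD.
e = sad_faulty - sad
```

A fault only produces an error when the fault-free line holds the opposite value; forcing bit 2 of this sample to 1, for instance, leaves it unchanged.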



Figure 2. Proposed EDCA design

#### **Residue-And-Quotient Code Generation**

Earlier codes such as parity codes and Berger codes were used to detect a single-bit error. The residue code evolved next; it is generally a separable arithmetic code that estimates the residue of the given data and appends it to the data [10]. This code is also capable of detecting only a single-bit error. Error detection logic based on residue codes is simple and easy to implement. For instance, assume that $N$ denotes an integer, $N_1$ and $N_2$ represent data words, and $m$ refers to the modulus. A separate residue code is one in which $N$ is coded as a pair $(N, |N|_m)$. Notably, $|N|_m$ is the residue of $N$ modulo $m$. However, only a single-bit error can be detected based on the residue code, and error correction is not possible by using residue codes. Therefore, this work presents a quotient code to assist the residue code in detecting multiple bit errors and recovering data. The mathematical model of the RQ code is described as follows. Assume that binary data $X$ is expressed as

$$X = \{b_{n-1}b_{n-2} \dots b_2 b_1 b_0\} = \sum_{j=0}^{n-1} b_j 2^j \tag{1}$$

The RQ code for $X$ modulo $m$ is expressed as $R = |X|_m$ and $Q = \lfloor X/m \rfloor$, respectively [9]. Notably, $\lfloor i \rfloor$ denotes the largest integer not exceeding $i$.

According to the above RQ code expression, the corresponding circuit design of the RQCG can be realized. In order to simplify the complexity of circuit design, the implementation of the module is generally dependent on the addition operation. Additionally, based on the concept of residue code, the following definitions shown can be applied to generate the RQ code for circuit design.

Definition 1:

$$\left| N_1 + N_2 \right|_m = \Big| \left| N_1 \right|_m + \left| N_2 \right|_m \Big|_m \tag{2}$$

Definition 2: Let $N_j = n_1 + n_2 + \dots + n_j$; then

$$\left| N_j \right|_m = \Big| \left| n_1 \right|_m + \left| n_2 \right|_m + \dots + \left| n_j \right|_m \Big|_m \tag{3}$$
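As a quick numerical check of Definition 1, take the illustrative values $N_1 = 250$, $N_2 = 120$ and an assumed modulus $m = 63$ (the text does not fix $m$ at this point). Both sides give the same residue:

$$\left| 250 + 120 \right|_{63} = \left| 370 \right|_{63} = 55, \qquad \Big| \left| 250 \right|_{63} + \left| 120 \right|_{63} \Big|_{63} = \left| 61 + 57 \right|_{63} = \left| 118 \right|_{63} = 55$$

The outer modulo on the right-hand side is needed here because the sum of the two residues, 118, exceeds the modulus.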

To accelerate the circuit design of RQCG, the binary data shown in (1) can generally be divided into two parts:

$$\begin{aligned} X &= \sum_{j=0}^{n-1} b_j 2^j \\ &= \left(\sum_{j=0}^{k-1} b_j 2^j\right) + \left(\sum_{j=k}^{n-1} b_j 2^{j-k}\right) 2^k \\ &= Y_0 + Y_1 2^k \end{aligned} \tag{4}$$

Significantly, the value of $k$ is equal to $n/2$, and $Y_0$ and $Y_1$ are expressed as decimal values. If the modulus is $m = 2^k - 1$, then the residue code of $X$ modulo $m$ is given by

$$\begin{aligned} R &= |X|_m \\ &= |Y_0 + Y_1|_m = |Z_0 + Z_1|_m = (Z_0 + Z_1)\,\alpha \end{aligned} \tag{5}$$

$$\begin{aligned} Q &= \left\lfloor \frac{X}{m} \right\rfloor \\ &= \left\lfloor \frac{Y_0 + Y_1}{m} \right\rfloor + Y_1 = \left\lfloor \frac{Z_0 + Z_1}{m} \right\rfloor + Z_1 + Y_1 \\ &= Z_1 + Y_1 + \beta \end{aligned} \tag{6}$$

where

$$\alpha(\beta) = \begin{cases} 0(1), & \text{if } Z_0 + Z_1 = m \\ 1(0), & \text{if } Z_0 + Z_1 < m \end{cases}$$

Since the value of Y<sub>0</sub> + Y<sub>1</sub> is generally greater than the modulus *m*, the equations in (5) and (6) must be simplified further to replace the complex modulo operation with a simple addition operation by using the parameters Z<sub>0</sub>, Z<sub>1</sub>, $\alpha$ and $\beta$ [10].

Based on (5) and (6), the corresponding circuit design of the RQCG is easily realized by using simple adders (ADDs). Namely, the RQ code can be generated with low complexity and little hardware cost.
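The adder-only generation in (5) and (6) can be sketched in Python as follows. This is a behavioural illustration, not the authors' circuit. The text does not define Z<sub>0</sub> and Z<sub>1</sub>, so the sketch assumes they are the low k bits and the carry-out of Y<sub>0</sub> + Y<sub>1</sub>. The function name `rq_code_adders` and the choice k = 6 (m = 63) in the usage note are also assumptions.

```python
def rq_code_adders(x: int, k: int) -> tuple[int, int]:
    """Residue and quotient of x modulo m = 2**k - 1 using only masks, shifts, and adds.

    Assumes 0 <= x < 2**(2*k), so that the carry Z1 is a single bit.
    """
    if not 0 <= x < (1 << (2 * k)):
        raise ValueError("x must fit in 2*k bits")
    m = (1 << k) - 1
    y0 = x & m             # low k bits of X, as in equation (4)
    y1 = x >> k            # remaining high bits of X
    s = y0 + y1            # first adder: Y0 + Y1
    z0 = s & m             # assumed meaning of Z0: low k bits of Y0 + Y1
    z1 = s >> k            # assumed meaning of Z1: carry out of Y0 + Y1
    t = z0 + z1            # second adder: Z0 + Z1, which never exceeds m
    beta = 1 if t == m else 0
    alpha = 1 - beta
    r = t * alpha          # equation (5)
    q = z1 + y1 + beta     # equation (6)
    return r, q
```

For example, `rq_code_adders(2124, 6)` splits 2124 into Y<sub>0</sub> = 12 and Y<sub>1</sub> = 33 and returns the residue 45 and quotient 33, which agrees with 2124 = 63 × 33 + 45.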

## **Test Code Generation**

TCG is the main block of the proposed EDCA design. The Test Code Generation (TCG) design relies on the ability of the RQCG circuit to generate the corresponding test codes in order to detect errors and recover data. The specific $PE_i$ estimates the absolute difference between the Cur\_pixel of the search area and the Ref\_pixel of the current macro block. The SAD value for a macro block of size *NxN* can be evaluated as

$$SAD = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| X_{ij} - Y_{ij} \right| \tag{7}$$

and the corresponding test codes $R_T$ and $Q_T$ are derived as

$$\begin{aligned} R_T &= \left| \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} (X_{ij} - Y_{ij}) \right|_m \\ &= \Big| \left| (X_{00} - Y_{00}) \right|_m + \left| (X_{01} - Y_{01}) \right|_m + \dots + \left| (X_{(N-1)(N-1)} - Y_{(N-1)(N-1)}) \right|_m \Big|_m \\ &= \Big| \left| (q_{x00} m + r_{x00}) - (q_{y00} m + r_{y00}) \right|_m + \dots + \left| (q_{x(N-1)(N-1)} m + r_{x(N-1)(N-1)}) - (q_{y(N-1)(N-1)} m + r_{y(N-1)(N-1)}) \right|_m \Big|_m \\ &= \Big| \left| r_{x00} - r_{y00} \right|_m + \left| r_{x01} - r_{y01} \right|_m + \dots + \left| r_{x(N-1)(N-1)} - r_{y(N-1)(N-1)} \right|_m \Big|_m \\ &= \Big| \left| r_{00} \right|_m + \left| r_{01} \right|_m + \dots + \left| r_{(N-1)(N-1)} \right|_m \Big|_m \end{aligned} \tag{8}$$

$$\begin{aligned} Q_T &= \left\lfloor \frac{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} (X_{ij} - Y_{ij})}{m} \right\rfloor \\ &= \left\lfloor \frac{(X_{00} - Y_{00}) + (X_{01} - Y_{01}) + \dots + (X_{(N-1)(N-1)} - Y_{(N-1)(N-1)})}{m} \right\rfloor \\ &= \left\lfloor \frac{(q_{x00} m - q_{y00} m) + (q_{x01} m - q_{y01} m) + \dots}{m} + \frac{(r_{x00} - r_{y00}) + (r_{x01} - r_{y01}) + \dots}{m} \right\rfloor \\ &= \left\lfloor (q_{x00} - q_{y00}) + (q_{x01} - q_{y01}) + \dots + \frac{(r_{x00} - r_{y00}) + (r_{x01} - r_{y01}) + \dots}{m} \right\rfloor \\ &= q_{00} + q_{01} + \dots + q_{(N-1)(N-1)} + \left\lfloor \frac{r_{00} + r_{01} + \dots + r_{(N-1)(N-1)}}{m} \right\rfloor \end{aligned} \tag{9}$$

Here $q_{xij}$, $r_{xij}$ and $q_{yij}$, $r_{yij}$ denote the quotient and residue of $X_{ij}$ and $Y_{ij}$ modulo $m$, while $q_{ij}$ and $r_{ij}$ abbreviate $q_{xij} - q_{yij}$ and $r_{xij} - r_{yij}$.
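The final lines of the two derivations above say that the test code can be accumulated from per-pixel residues and quotients rather than from the full SAD. A minimal Python sketch of that accumulation follows. As a simplifying assumption, it takes the RQ code of each absolute difference directly, so every term is non-negative. The function names, the 2x2 sample block, and m = 63 are hypothetical.

```python
def tcg_test_code(cur, ref, m):
    """Accumulate per-pixel RQ codes into the test code (R_T, Q_T)."""
    r_sum = 0
    q_sum = 0
    for cur_row, ref_row in zip(cur, ref):
        for x, y in zip(cur_row, ref_row):
            # RQ code of one absolute difference (assumed per-pixel term).
            q_ij, r_ij = divmod(abs(x - y), m)
            q_sum += q_ij
            r_sum += r_ij
    r_t = r_sum % m              # residue of the accumulated residues
    q_t = q_sum + r_sum // m     # accumulated quotients plus the residue carry
    return r_t, q_t


def sad(cur, ref):
    """Reference SAD computed directly, used only to check the test code."""
    return sum(abs(x - y) for cr, rr in zip(cur, ref) for x, y in zip(cr, rr))


# Hypothetical 2x2 block; the absolute differences are 190, 123, 2, and 252.
cur_block = [[200, 17], [90, 255]]
ref_block = [[10, 140], [88, 3]]
M = 63  # assumed modulus 2**6 - 1

r_t, q_t = tcg_test_code(cur_block, ref_block, M)
```

In this sample the per-pixel residues sum to exactly 63, so the residue wraps to 0 and one unit carries into the quotient, giving a test code that satisfies 63 × Q_T + R_T = 567 = SAD.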

#### **Error Detection Process**

The Error Detection Circuit (EDC) performs error detection in the specific PE<sub>i</sub>, as shown in Fig. 2. This block compares the output of the TCG, i.e., ($R_T$ and $Q_T$), with the output of RQCG1, i.e., ($R_{PEi}$ and $Q_{PEi}$), in order to detect the occurrence of an error. If $R_{PEi} \neq R_T$ or $Q_{PEi} \neq Q_T$, an error in $PE_i$ is detected. The EDC output is then generated as a 0/1 signal to indicate that the tested $PE_i$ is error-free (0) or erroneous (1).
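The comparison performed by the EDC can be sketched in Python as below. The sketch assumes the flag is raised when either the residue or the quotient disagrees with the test code. The names `rq_code` and `edc`, the sample SAD of 2124, and m = 63 are hypothetical.

```python
def rq_code(value: int, m: int) -> tuple[int, int]:
    """Return (residue, quotient) of value modulo m."""
    return value % m, value // m


def edc(pe_output: int, r_t: int, q_t: int, m: int) -> int:
    """Return 1 if the PE output disagrees with the test code, else 0."""
    r_pe, q_pe = rq_code(pe_output, m)
    return int(r_pe != r_t or q_pe != q_t)


M = 63                           # assumed modulus 2**6 - 1
r_t, q_t = rq_code(2124, M)      # test code for a fault-free SAD of 2124

flag_ok = edc(2124, r_t, q_t, M)                # fault-free PE output
flag_bits = edc(2124 ^ 0b101000, r_t, q_t, M)   # two bits flipped in the PE output
flag_multiple = edc(2124 + M, r_t, q_t, M)      # error equal to m itself
```

The last case shows why the quotient is checked as well: an error that is a multiple of m leaves the residue unchanged, so a residue-only comparison would miss it, while the quotient comparison still raises the flag.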

#### **Error Correction Process**

The original data is recovered in the Data Recovery Circuit (DRC) during error correction process, by separating the RQ code from the TCG. The data recovery is possible by implementing the mathematical model as

$$\begin{aligned} SAD &= m \times Q_T + R_T \\ &= (2^j - 1) \times Q_T + R_T \\ &= 2^j \times Q_T - Q_T + R_T \end{aligned} \tag{10}$$

The proposed EDCA design executes the error detection and data recovery operations simultaneously. Additionally, either the error-free data from the tested $PE_i$ or the data-recovery result from the DRC is selected by a multiplexer (MUX) and passed to the next specific $PE_{i+1}$ for subsequent testing.
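Equation (10) replaces the multiplication by m with a shift and a subtraction, and the MUX then chooses between the PE output and the recovered value. A minimal Python sketch of both steps follows. The function names and the sample values (j = 6, R_T = 45, Q_T = 33) are hypothetical.

```python
def drc_recover(r_t: int, q_t: int, j: int) -> int:
    """Rebuild SAD from the test code for m = 2**j - 1 with a shift, a subtract, and an add."""
    return (q_t << j) - q_t + r_t


def mux(pe_output: int, recovered: int, error_flag: int) -> int:
    """Pass the PE output when error_flag is 0, otherwise the recovered data."""
    return recovered if error_flag else pe_output


J = 6                                         # assumed exponent, so m = 63
recovered = drc_recover(r_t=45, q_t=33, j=J)  # (33 << 6) - 33 + 45

# Hypothetical faulty PE output of 2148 is replaced by the recovered value.
next_pe_input = mux(pe_output=2148, recovered=recovered, error_flag=1)
```

Because the recovery depends only on the test code and never on the PE output, it can run in parallel with detection, which matches the simultaneous operation described above.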

| Cur_pixel | 0   | 1   | 2   | 3   | Ref_pixel | 0 | 1 | 2 | 3 |
|-----------|-----|-----|-----|-----|-----------|---|---|---|---|
| 0         | 128 | 128 | 64  | 255 | 0         | 1 | 1 | 2 | 3 |
| 1         | 128 | 64  | 255 | 64  | 1         | 1 | 2 | 3 | 4 |
| 2         | 64  | 255 | 64  | 128 | 2         | 2 | 3 | 4 | 5 |
| 3         | 255 | 64  | 128 | 128 | 3         | 3 | 4 | 5 | 5 |
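If the tabulated Cur\_pixel and Ref\_pixel values are read as one sample 4x4 macro block, the test code can be worked out by hand. The modulus $m = 63$ used here is an assumption, since the text does not state which modulus was used. The row-wise sums of absolute differences are 568, 501, 497, and 558, so

$$SAD = 568 + 501 + 497 + 558 = 2124, \qquad Q_T = \left\lfloor \frac{2124}{63} \right\rfloor = 33, \qquad R_T = \left| 2124 \right|_{63} = 45$$

and the recovery model reproduces the original value: $63 \times 33 + 45 = 2124$.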

## **RESULTS AND DISCUSSIONS**

Residue-and-quotient code is used in the existing error detection architecture that has 16 Processing Elements (PEs) and 16 Test Code Generation (TCG) blocks for computing the test codes for each pixel value in the macro block [10]. Though this architecture can detect multiple bit errors, the area of the architecture is high and the overall delay is high. To overcome this disadvantage, the proposed architecture is used which has a single PE and a TCG for computing the test codes for the pixel values.

Verification of the proposed design is performed by using VHDL. The performance of the proposed EDCA design is estimated in terms of its area. In this architecture, the pixel values of the 4x4 macro block are read out for code generation under control of the clock signal. At each triggering edge of the clock, one 8-bit pixel value each from ref\_pixel and cur\_pixel is taken and given to the TCG and PE blocks for code generation. The code generated by the test code generation block is compared with the RQCG output in the EDC block, whose output indicates the error. Because the code for each 8-bit pixel value is generated on the clock signal, a separate PE and TCG for each of the 16 elements of the 4x4 macro block is not needed. This reduces the overall area and delay of the proposed architecture.
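The clocked reuse of a single PE and a single TCG can be pictured with a small behavioural model in Python. This is only a sketch of the idea, not the VHDL design. The class name `SerialEdca`, its methods, the four sample pixel pairs, and m = 63 are hypothetical.

```python
class SerialEdca:
    """Behavioural model of one PE and one TCG shared across all pixels of a macro block."""

    def __init__(self, m: int):
        self.m = m
        self.sad = 0      # PE accumulator
        self.r_acc = 0    # TCG residue accumulator
        self.q_acc = 0    # TCG quotient accumulator

    def clock(self, cur_pixel: int, ref_pixel: int) -> None:
        """Process one pixel pair, standing in for one triggering clock edge."""
        diff = abs(cur_pixel - ref_pixel)
        self.sad += diff
        q, r = divmod(diff, self.m)
        self.q_acc += q
        self.r_acc += r

    def error_flag(self) -> int:
        """Compare the RQ code of the PE accumulator with the accumulated test code."""
        r_t = self.r_acc % self.m
        q_t = self.q_acc + self.r_acc // self.m
        return int((self.sad % self.m, self.sad // self.m) != (r_t, q_t))


edca = SerialEdca(m=63)   # assumed modulus 2**6 - 1
for cur_pixel, ref_pixel in [(128, 1), (64, 2), (255, 3), (64, 4)]:
    edca.clock(cur_pixel, ref_pixel)   # same hardware reused on every cycle

flag_before = edca.error_flag()   # fault-free accumulation
edca.sad ^= 0b11000               # inject a hypothetical two-bit error into the PE accumulator
flag_after = edca.error_flag()
```

The model keeps only three accumulators regardless of the macro block size, which mirrors the area argument made above: more pixels cost more clock cycles rather than more PE and TCG instances.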

The overall test strategy is based on RQ code generation, which can detect multiple bit errors with reduced delay and a much smaller area. This reduction in area allows the proposed architecture to be implemented in VLSI in an area-efficient way. By using single-clock triggering, the number of macro blocks can be extended with reduced area for testing the PEs of a ME.

## CONCLUSION

The motion estimation process is performed by the video coding system to find the motion vector pointing to the best prediction macro block in a reference frame or field. An error occurring in ME can degrade the video quality. To make the coding system efficient, an Error Detection and Correction Architecture (EDCA) is designed for detecting errors and recovering the data of the PEs in a ME. Based on the RQ code, an RQCG-based TCG design is developed to generate the corresponding test codes, which can detect multiple bit errors and recover data. Experimental results indicate that the proposed EDCA design can effectively detect errors and recover data in the PEs of a ME with reduced area and delay.

## REFERENCES

- [1] Y. W. Huang, B. Y. Hsieh, S. Y. Chien, S. Y. Ma, and L. G. Chen, "Analysis and complexity reduction of multiple reference frames motion estimation in H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 4, pp. 507-522, Apr. 2006.
- [2] S. Surin and Y. H. Hu, "Frame-level pipeline motion estimation array processor," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 2, pp. 248-251, Feb. 2001.
- [3] M. Y. Dong, S. H. Yang, and S. K. Lu, "Design-for-testability techniques for motion estimation computing arrays," in Proc. Int. Conf. Commun., Circuits Syst., May 2008, pp. 1188-1191.
- [4] J. F. Lin, J. C. Yeh, R. F. Hung, and C. W. Wu, "A built-in self repair design for RAMs with 2-D redundancy," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 6, pp. 742-745, Jun. 2005.
- [5] C. L. Hsu, C. H. Cheng, and Y. Liu, "Built-in self detection/correction architecture for motion estimation computing arrays," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 2, pp. 319-324, Feb. 2010.
- [6] S. Bayat-Sarmadi and M. A. Hasan, "On concurrent detection of errors in polynomial basis multiplication," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 4, pp. 413-426, Apr. 2007.
- [7] D. K. Park, H. M. Cho, S. B. Cho, and J. H. Lee, "A fast motion estimation algorithm for SAD optimization in sub-pixel," in Proc. Int. Symp. Integr. Circuits, Sep. 2007, pp. 528-531.
- [8] C. W. Chiou, C. C. Chang, C. Y. Lee, T. W. Hou, and J. M. Lin, "Concurrent error detection and correction in Gaussian normal basis multiplier over GF(2<sup>m</sup>)," IEEE Trans. Comput., vol. 58, no. 6, pp. 851-857, Jun. 2009.
- [9] C. H. Cheng, Y. Liu, and C. L. Hsu, "Design of an error detection and data recovery architecture for motion estimation testing applications," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 4, Apr. 2012.
- [10] L. Breveglieri, P. Maistri, and I. Koren, "A note on error detection in an RSA architecture by means of residue codes," in Proc. IEEE Int. Symp. On-Line Testing, Jul. 2006, pp. 176-177.

## **AUTHORS' BIOGRAPHY**



**Kunkala Suman** received a B.Tech degree in 2013 from VEL Tech. University, Chennai, and is now pursuing an M.Tech at Audisankara College of Engineering and Technology, Gudur, A.P, INDIA.



**Mr. K Dhanunjaya** received his B.Tech degree in Electronics & Communication Engineering from G. Pulla Reddy Engineering College, Kurnool, AP, in 1998 and his M.Tech in ECE from Jawaharlal Nehru Technological University Kakinada in 2001. He is currently pursuing a Ph.D in low-power VLSI design at Jawaharlal Nehru Technological University Anantapur. He has 14 years of teaching experience and is presently working as Professor & Head of the Department of ECE, Audisankara College of Engineering and Technology (Autonomous), affiliated to JNTUA, Gudur. He is a lifetime member of IETE and ISTE and a member of IEEE.