PARTIAL UPDATE ALGORITHMS AND ECHO DELAY ESTIMATION
Kirill Sakhnov, Ekaterina Verteletskaya, Boris Simak

In this paper, we introduce methods for extracting an echo delay between speech signals using adaptive filtering algorithms. Time delay estimation is an initial step for many speech processing applications. Conventional techniques that estimate the time difference of arrival between two signals are based on finding the peak of the generalized cross-correlation between the signals. To achieve good precision and stability in the estimate, the input sequences have to be multiplied by an appropriate weighting function. Typically, the weighting functions depend on the signals' power spectra, which are generally unknown and have to be estimated in advance. An implementation of time delay estimation via the adaptive least mean squares algorithm is analogous to estimating the Roth generalized cross-correlation weighting function. The parameters estimated with the adaptive filter have a smaller variance, because the filter avoids the need for spectrum estimation. In the following, we discuss proportionate and partial-update adaptive techniques and consider their performance in terms of delay estimation.


Introduction
Time delay estimation (TDE) has always been, and remains, a popular research topic. It finds application in many areas of electrical engineering [1][2][3][4]. As technology advances and data transmission moves increasingly toward packet-switching concepts, the traditional echo problem remains important. A key issue in echo analysis is the round-trip delay of the network. The main problem with IP-based networks is that the round-trip delay can never be reduced below its fundamental limit. There is always a delay of at least two to three packet sizes (50 to 80 ms) [5], which can make the existing network echo more audible [6].
A number of efforts have been made to improve the TDE precision. Various methods based on the generalized cross-correlation (GCC) were recently proposed [7][8][9][10]. The GCC algorithms essentially apply a pre-filter to obtain a modified signal spectrum for optimal time delay estimation. Specifying the filter's characteristic requires a priori knowledge of the statistics of the received signals, and the efficiency of the algorithms decreases considerably when little or no such knowledge is available.
Since B. Widrow proposed an adaptive filtering technique based on least mean squares (LMS) [11][12][13], adaptive filter theory has also been applied to delay estimation. An adaptive implementation of time delay estimation via Widrow's LMS algorithm is usually referred to as TDLMS. Compared with the GCC algorithms, adaptive filtering techniques do not require a priori information about the signal statistics, because the signal spectrum no longer has to be estimated. The adaptive filtering algorithms determine the time delay iteratively. Comparative studies of the LMS and the generalized cross-correlation are available in [14], [15]. Generally, the time-domain implementation of any adaptive filter is associated with a high computational complexity, which depends directly on the length of the adaptive filter [16]. To reduce the computational load of the TDLMS, we propose using adaptive filtering algorithms with reduced computational complexity [17][18][19].
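For orientation, the GCC idea discussed above can be sketched in a few lines. This is only an illustration, assuming a single-frame periodogram as the spectrum estimate and the Roth weighting (division by the auto-spectrum of the reference signal). The function name `gcc_roth_delay`, the regularization constant `eps`, and the synthetic white-noise test signal are assumptions of this sketch and are not taken from [7]-[10].

```python
import numpy as np

def gcc_roth_delay(x, d, max_lag, eps=1e-12):
    """Estimate the delay of d relative to x (in samples) with Roth-weighted GCC."""
    nfft = 2 * len(x)                       # zero-padding avoids circular wrap-around
    X = np.fft.rfft(x, nfft)
    D = np.fft.rfft(d, nfft)
    cross = D * np.conj(X)                  # single-frame cross-spectrum estimate
    roth = cross / (np.abs(X) ** 2 + eps)   # Roth weighting: divide by auto-spectrum of x
    r = np.fft.irfft(roth, nfft)            # weighted cross-correlation
    return int(np.argmax(r[:max_lag]))      # peak position over non-negative lags

# Synthetic example (assumption): the echo is an attenuated, delayed copy of x.
rng = np.random.default_rng(1)
x = rng.standard_normal(4000)
true_delay = 37
d = np.zeros_like(x)
d[true_delay:] = 0.5 * x[:-true_delay]
gcc_delay = gcc_roth_delay(x, d, max_lag=128)
```

In practice the spectra would be averaged over several frames, which is exactly the estimation step that the adaptive approach avoids.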

Time Domain Adaptive Techniques
Traditionally, the NLMS algorithm serves as the reference in echo canceller implementations [13]. Basically, the NLMS algorithm is a simple extension of Widrow's LMS algorithm [12]. From adaptive filter theory it follows directly that the delay can be estimated by selecting the largest value in the adaptive filter weight vector, w. Only one issue has to be taken into account: the adaptive filter needs some time to converge to its optimal performance. The existing adaptive algorithms differ from each other in their convergence properties and in their computational and memory requirements. Robust, fast-converging algorithms are primarily used in acoustic echo cancellation applications, and they consume a lot of computational resources. In our case, it is not necessary to apply such complex algorithms, because the adaptive filter is not used directly for echo cancellation, but for delay estimation. Therefore, reduced-complexity adaptive filtering algorithms became the subject of our interest.
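The peak-picking idea described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `nlms_delay_estimate`, the step size 0.5, the regularization constant `eps`, and the synthetic single-reflection echo are assumptions, and the peak is located by magnitude so that a negative dominant tap is also found.

```python
import numpy as np

def nlms_delay_estimate(x, d, L, mu=0.5, eps=1e-6):
    """Identify the echo path with NLMS; return (delay in samples, weights)."""
    w = np.zeros(L)
    for n in range(L - 1, len(x)):
        x_vec = x[n - L + 1:n + 1][::-1]            # [x(n), x(n-1), ..., x(n-L+1)]
        e = d[n] - w @ x_vec                        # a priori error
        w += mu * e * x_vec / (x_vec @ x_vec + eps) # normalized LMS update
    return int(np.argmax(np.abs(w))), w             # index of the dominant tap

# Synthetic example (assumption): white far-end signal, single reflection.
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
true_delay = 37
d = np.zeros_like(x)
d[true_delay:] = 0.5 * x[:-true_delay]
d += 1e-3 * rng.standard_normal(len(x))             # small measurement noise
est_delay, w_nlms = nlms_delay_estimate(x, d, L=64)
```

The estimate is only meaningful once the filter has converged, which is the issue raised in the paragraph above.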

Proportionate Adaptive Filtering
The proportionate normalized least mean squares (PNLMS) algorithm proposed in [20] has been developed for use especially
in the telephone network environment. For hybrid echo cancellers, it is reasonable to assume that the echo path is sparse, i.e., many coefficients of the impulse response (IR) are close to zero. Although multiple-reflection echo paths have been studied [17], a typical echo path impulse response in practical communication networks has only one reflection, which means that all the active coefficients occupy one contiguous region of the whole echo span. Proportionate approaches achieve their higher convergence rate by exploiting the fact that the active part of the network echo path is usually much shorter (4-8 ms) than the 64-128 ms of the whole echo path that has to be covered by the adaptive filter. For voice transmission over a packet-switching network, these numbers may be considerably larger [5]. In the PNLMS algorithm, an individual adaptive step-size parameter is assigned to each filter coefficient. The step sizes are calculated from the last estimate of the filter weights in such a way that a larger coefficient receives a larger increment. As a result, the convergence rate increases, because the active taps are adjusted faster than the non-active ones. Therefore, for a sparse IR, the PNLMS algorithm converges much faster than the NLMS. This feature is an advantage especially when long echo delays have to be estimated. The PNLMS algorithm can be described using the following equations [21]:
$$e(n) = d(n) - \mathbf{x}^T(n)\,\mathbf{w}(n-1) \tag{1}$$
$$\mathbf{w}(n) = \mathbf{w}(n-1) + \mu_0\,\frac{\mathbf{G}(n-1)\,\mathbf{x}(n)\,e(n)}{\mathbf{x}^T(n)\,\mathbf{G}(n-1)\,\mathbf{x}(n)} \tag{2}$$
$$\mathbf{G}(n-1) = \mathrm{diag}\{g_1(n-1), \ldots, g_L(n-1)\} \tag{3}$$
where $e(n)$ is the error between the desired signal $d(n)$ and the filter output, $\mathbf{G}(n-1)$ is a diagonal matrix adjusting the step-size parameters, and $\mu_0$ is an overall step-size parameter. The diagonal elements of $\mathbf{G}(n)$ are estimated as follows:
$$g_l(n) = \frac{\gamma_l(n)}{\frac{1}{L}\sum_{i=1}^{L}\gamma_i(n)}, \quad 1 \le l \le L \tag{4}$$
$$\gamma_l(n) = \max\{\rho\,\max[\delta_p, |w_1(n)|, \ldots, |w_L(n)|],\ |w_l(n)|\} \tag{5}$$
Parameters $\delta_p$ and $\rho$ are positive numbers with typical values $\delta_p = 0.01$ and $\rho = 5/L$. The first term in (5), $\rho$, prevents $w_l(n)$ from stalling when it is much smaller than the largest coefficient, and $\delta_p$ regularizes the update when all coefficients are zero at initialization.
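The proportionate gain rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `pnlms_gains` and `pnlms`, the small constant `eps` that guards the division, and the synthetic sparse impulse response are not from the paper; the parameter values follow the typical values quoted in the text.

```python
import numpy as np

def pnlms_gains(w, rho, delta_p):
    """Diagonal of the gain matrix G computed from the current weights."""
    abs_w = np.abs(w)
    gamma = np.maximum(rho * max(delta_p, abs_w.max()), abs_w)
    return gamma / gamma.mean()                 # normalize to unit mean gain

def pnlms(x, d, L, mu0=0.1, rho=None, delta_p=0.01, eps=1e-8):
    rho = 5.0 / L if rho is None else rho
    w = np.zeros(L)
    for n in range(L - 1, len(x)):
        x_vec = x[n - L + 1:n + 1][::-1]        # [x(n), ..., x(n-L+1)]
        g = pnlms_gains(w, rho, delta_p)        # gains from the previous weights
        e = d[n] - w @ x_vec
        w = w + mu0 * g * x_vec * e / (x_vec @ (g * x_vec) + eps)
    return w

# Synthetic sparse echo path (assumption): 8 active taps out of 128.
rng = np.random.default_rng(2)
L = 128
h = np.zeros(L)
h[40:48] = [0.1, 0.3, 0.8, -0.5, 0.3, -0.2, 0.1, 0.05]
x = rng.standard_normal(6000)
d = np.convolve(x, h)[:len(x)]
w_pnlms = pnlms(x, d, L)
pnlms_delay = int(np.argmax(np.abs(w_pnlms)))
```

With all weights at zero, every gain equals one and the first iteration is an ordinary NLMS step; larger taps then receive proportionally larger steps.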
Besides sparse system identification, which is a vital requirement for fast-converging adaptive filters, there is another requirement that concerns the implementation of the adaptive filter: the algorithm should have a reasonable power consumption. Unfortunately, the PNLMS algorithm has several drawbacks. One of them is a 50 % increase in computational complexity compared with the NLMS algorithm. Furthermore, the PNLMS algorithm converges slowly after its fast initial start, because the small coefficients converge slowly [22]. The increased computational complexity can be reduced by selective partial updating. In turn, the slow convergence of the PNLMS in the steady state can be improved by switching from the PNLMS to the NLMS equations once the fast initial convergence has been achieved [23].

Partial-Update Adaptive Filtering
The partial-update algorithms can be seen to exploit the sparseness of the echo path in two different ways. When the unknown system's impulse response is sparse, many of the adaptive filter's weights can be approximated by zero. Alternatively, sparseness may be present in the weight update vector as a consequence of the distribution of the input samples in the (L × 1) input vector, $\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-L+1)]^T$. In both cases, exploiting the sparseness can reduce the complexity and improve the performance of the adaptive algorithm [24], [25]. Some of the first work on partial-update algorithms was done by Douglas [26], who presented periodic and sequential updating schemes for the Max-NLMS algorithm. However, these partial-update algorithms show slow convergence compared with the full-update algorithms, because their updating schemes are inconsistent. More recently, the partial-updating concept was developed further by Aboulnasr [27], which led to the M-Max NLMS algorithm and a supporting convergence analysis [28]. Another block-updating scheme for the NLMS algorithm was studied by Schertler [29]. The most recent work was published by Dogancay and Tanrikulu, who consider partial-update approaches for the more robust affine projection algorithm (APA) [30], [31].

M-Max NLMS
The algorithm selects a specified number of coefficients that provide the largest reduction in the mean squared error per iteration [32]. Only M out of the total L filter coefficients are updated. These M coefficients are the ones associated with the M largest values of $|x(n-i+1)|$, $i = 1, \ldots, L$. The update equation for this algorithm is
$$w_i(n) = \begin{cases} w_i(n-1) + \mu\,\dfrac{e(n)\,x(n-i+1)}{\mathbf{x}^T(n)\,\mathbf{x}(n)}, & \text{if } |x(n-i+1)| \text{ is one of the } M \text{ largest values} \\ w_i(n-1), & \text{otherwise} \end{cases} \tag{6}$$
where $e(n) = d(n) - \mathbf{x}^T(n)\,\mathbf{w}(n-1)$ is the a priori error and $\mu$ is the step-size parameter. One of the features of the M-Max NLMS algorithm is that it reduces the complexity of the adaptive filter by selectively updating the coefficients while keeping the performance close to that of the full-update NLMS algorithm. We present misalignment curves for the algorithm in the section on experimental results.
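The selection rule described above can be sketched as follows. This is an illustration only: the function name `m_max_nlms`, the step size, the constant `eps`, and the synthetic echo are assumptions. The M largest input magnitudes are found with `np.argpartition`, which avoids a full sort.

```python
import numpy as np

def m_max_nlms(x, d, L, M, mu=0.5, eps=1e-6):
    """NLMS in which only the M taps with the largest |x(n-i+1)| are updated."""
    w = np.zeros(L)
    for n in range(L - 1, len(x)):
        x_vec = x[n - L + 1:n + 1][::-1]                     # [x(n), ..., x(n-L+1)]
        e = d[n] - w @ x_vec
        sel = np.argpartition(np.abs(x_vec), L - M)[L - M:]  # indices of the M largest |x|
        w[sel] += mu * e * x_vec[sel] / (x_vec @ x_vec + eps)
    return w

# Synthetic example (assumption): white input, single reflection at 37 samples.
rng = np.random.default_rng(3)
x = rng.standard_normal(6000)
true_delay = 37
d = np.zeros_like(x)
d[true_delay:] = 0.5 * x[:-true_delay]
w_mmax = m_max_nlms(x, d, L=64, M=16)
mmax_delay = int(np.argmax(np.abs(w_mmax)))
```

Setting `M = L` reduces the sketch to the full-update NLMS, which is a convenient sanity check.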

Selective-partial-update NLMS
Unlike the M-Max NLMS, this algorithm has a block structure. Its objective is the same: it reduces the computational cost by updating only a subset of the filter coefficients. First, the input vector x(n) and the coefficient vector w(n) are arranged into K blocks of length $M = L/K$, where M is an integer, as in
$$\mathbf{x}(n) = [\mathbf{x}_1^T(n)\ \ \mathbf{x}_2^T(n)\ \ \cdots\ \ \mathbf{x}_K^T(n)]^T \tag{7}$$
$$\mathbf{w}(n) = [\mathbf{w}_1^T(n)\ \ \mathbf{w}_2^T(n)\ \ \cdots\ \ \mathbf{w}_K^T(n)]^T \tag{8}$$
The coefficient vector's blocks $\mathbf{w}_1(n), \mathbf{w}_2(n), \ldots, \mathbf{w}_K(n)$ represent candidate subsets that can be updated during the current iteration. For a single-block updating scheme, the constrained minimization problem, which is solved by the NLMS algorithm, can be written as
$$\min_{1 \le i \le K}\ \min_{\mathbf{w}_i(n)} \|\mathbf{w}_i(n) - \mathbf{w}_i(n-1)\|_2^2 \quad \text{subject to} \quad \mathbf{w}^T(n)\,\mathbf{x}(n) = d(n) \tag{9}$$
where $d(n)$ is the desired signal. The block to be updated is the one with the smallest squared-Euclidean-norm update [30]. According to (9), the minimum-norm correction of block i is proportional to $\mathbf{x}_i(n)/\|\mathbf{x}_i(n)\|_2^2$, so the selected block is the one whose input sub-vector $\mathbf{x}_i(n)$ has the largest squared Euclidean norm. The corresponding update is written in terms of $\mathbf{x}_{I_B}(n)$ and $\mathbf{w}_{I_B}(n)$, which are defined in (12).
The computational and memory requirements of the selective-partial-update NLMS algorithm are almost identical to those of the selective-block-update algorithm proposed in [28]. Nevertheless, the simulation results in the next section show that this approach does not lead to a reasonable trade-off between performance and simplicity. The algorithm's efficiency is lower than that of the M-Max NLMS algorithm. As an alternative, the sparse-partial-update NLMS algorithm applies a more relevant selection criterion.
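A single-block version of the scheme above can be sketched as follows. This is a hedged illustration: the function name `spu_nlms`, the constant `eps`, and the synthetic echo are assumptions, the block is chosen as the one with the largest input energy (the choice that yields the smallest-norm update for a single block), and the step size 0.1 is borrowed from the experimental section.

```python
import numpy as np

def spu_nlms(x, d, L, K, mu=0.1, eps=1e-6):
    """Selective-partial-update NLMS sketch: one block of length L/K per iteration."""
    if L % K != 0:
        raise ValueError("L must be divisible by K")
    M = L // K
    w = np.zeros(L)
    for n in range(L - 1, len(x)):
        x_vec = x[n - L + 1:n + 1][::-1]                      # [x(n), ..., x(n-L+1)]
        e = d[n] - w @ x_vec
        block_energy = (x_vec.reshape(K, M) ** 2).sum(axis=1) # ||x_i(n)||^2 per block
        i = int(np.argmax(block_energy))                      # block with largest energy
        blk = slice(i * M, (i + 1) * M)
        w[blk] += mu * e * x_vec[blk] / (block_energy[i] + eps)
    return w

# Synthetic example (assumption): white input, single reflection at 37 samples.
rng = np.random.default_rng(4)
x = rng.standard_normal(8000)
true_delay = 37
d = np.zeros_like(x)
d[true_delay:] = 0.5 * x[:-true_delay]
w_spu = spu_nlms(x, d, L=64, K=8)
spu_delay = int(np.argmax(np.abs(w_spu)))
```

Only M multiplications are spent on the weight update per iteration, at the price of slower convergence than the full update.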

Sparse-partial-update NLMS
This algorithm uses a so-called sparse-partial (SP) weight selection criterion [33]. The adaptive filter weights to be updated are selected according to the largest product of x(n) and w(n). The SP-NLMS single-block update equations are given by (14). Hongyang and Dyba recently suggested a generalization for updating B blocks out of K [17], given by (16) and (17).

Simple-partial-update PNLMS
The approach is based on the proportionate technique combined with partial updating of the adaptive filter coefficients. The algorithm exploits the sparseness of the communication channel to speed up the initial convergence and employs a partial-update scheme to reduce the computational complexity. The selection procedure follows the estimated magnitude of the channel's impulse response. The S-PNLMS algorithm for a single-block update is defined as follows. Arrange x(n) and w(n) into K blocks of length $M = L/K$ in the same way as in (7) and (8). Then let $\mathbf{G}_i(n)$ denote the corresponding $M \times M$ block of the diagonal weighting matrix $\mathbf{G}(n)$. The recursion for updating the adaptive filter weights is given by (18), together with the block selection rule. This rule differs from the one used with the SPU-PNLMS algorithm [30]. The simulations show that the S-PNLMS performs similarly to the SP-NLMS and outperforms the SPU-PNLMS algorithm. Its misalignment curves are presented in the next section. The S-PNLMS update equations for updating B blocks out of K are given by (19).
Further, we provide comparison results for the presented algorithms and demonstrate their performance in estimating the predefined echo delay. Table 2 illustrates the computational complexity of the full-update algorithms and shows the savings achieved by the partial-update schemes. The only downside is that finding the M largest outputs or inputs requires sorting the output or input values. If a fast sorting algorithm is chosen [34], only $2\log_2(L) + 2$ comparisons are required. For large L and small M, which is appropriate for a sparse impulse response, large computational savings are expected.
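The reason the sorting cost stays logarithmic can be sketched with a running sorted window: when the input vector slides by one sample, only one value leaves and one value enters, and both positions are found by binary search. The class below is a hypothetical illustration built on the Python standard library and is not the algorithm of [34]; it tracks the sorted values only, and mapping them back to tap positions is left out.

```python
import bisect
from collections import deque

class RunningSortedWindow:
    """Keep the last `length` values sorted, updating one sample at a time."""

    def __init__(self, length):
        self.length = length
        self.window = deque()      # values in arrival order
        self.sorted_vals = []      # the same values in ascending order

    def push(self, value):
        if len(self.window) == self.length:
            oldest = self.window.popleft()
            idx = bisect.bisect_left(self.sorted_vals, oldest)  # binary search
            del self.sorted_vals[idx]                           # drop the outgoing sample
        self.window.append(value)
        bisect.insort(self.sorted_vals, value)                  # binary search + insert

    def m_largest(self, M):
        return self.sorted_vals[-M:]

win = RunningSortedWindow(length=4)
for v in [5, 1, 4, 2, 3, 9, 0]:
    win.push(abs(v))
largest_two = win.m_largest(2)   # [3, 9]
```

The smallest of the M largest values can serve as a threshold: taps whose input magnitude reaches it are the ones to update.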

Results of experiments
To evaluate the performance of the algorithms, we implemented an adaptive filter in MATLAB. The filter has to estimate the predefined echo path impulse responses specified in the ITU-T Recommendation [35]. The overall step-size parameter, $\mu_0$, is chosen to be 0.1. The control parameters $\rho$ and $\delta_p$ are chosen to be 0.001 and 0.01, respectively. For simplicity, a double-talk situation is not considered. In the first part of the experiment, we look at the misalignment curves of the M-Max-, SPU-, SP- and S-PNLMS algorithms. They are illustrated in Fig. 1 below. The SPU updating scheme produces the worst results. The proposed S-criterion considerably outperforms it, especially in terms of the initial convergence speed. The rest of the algorithms have nearly the same convergence and tracking performance. All the algorithms except the M-Max-PNLMS show poor results when M equals 64. This can be explained by the fact that the active part of the IR is approximately 16 ms long. This value corresponds to 128 samples at a sampling frequency of 8 kHz; therefore, 64 samples are not enough to cover the active region completely. Owing to its different selection criterion, the M-Max-PNLMS algorithm deals relatively well with this problem. The Max-updating formula does not take the sparse character of the IR into account; it performs the selection according to the distribution of the values of the input vector. On the other hand, its drawback is a lower initial convergence speed compared with the SP-PNLMS algorithm. The second part of our experiment concerns the performance of the adaptive algorithms versus those based on the generalized cross-correlation function. They are compared in the context of time delay estimation.
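Two quantities used in this discussion can be made concrete with a short sketch. The misalignment is assumed here to be the normalized coefficient error in decibels, which is the usual definition but is not stated explicitly in the text; the helper names `misalignment_db` and `ms_to_samples` are hypothetical.

```python
import numpy as np

def misalignment_db(h, w):
    """Normalized misalignment between the true IR h and the estimate w, in dB."""
    return 20.0 * np.log10(np.linalg.norm(h - w) / np.linalg.norm(h))

def ms_to_samples(duration_ms, fs_hz):
    """Convert a duration in milliseconds to a number of samples."""
    return int(round(duration_ms * 1e-3 * fs_hz))

# Active region of the impulse response quoted above: 16 ms at 8 kHz.
active_len = ms_to_samples(16, 8000)   # 128 samples

# An estimate that is off by 10 % in every coefficient gives -20 dB.
h = np.array([0.0, 0.8, -0.5, 0.3])
example_db = misalignment_db(h, 0.9 * h)
```

With 128 active samples, a block of M = 64 coefficients can cover at most half of the active region, which is the effect described above.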
Table 2. Comparison in computational complexity

Conclusion
This paper is a comparative study of partial-update algorithms and their application to time delay estimation. When delivering a VoIP service in a packet-switching network, it is important to keep the echo delay under control. The increased transmission delay associated with packet data transmission can make an otherwise negligible echo more annoying. Therefore, we suggest using an echo assessment algorithm based on reduced-complexity partial-update adaptive filters. If the estimated echo is considerably delayed, it can be audible to the user. As a countermeasure, additional attenuation has to be placed in the affected channel in order to activate an echo canceller that removes the echo. The experiments show that these algorithms perform reliably. Their precision suffers only at the initial stage, when the adaptive filter's coefficients have not yet converged to their optimum values. According to the ITU-T Recommendation G.168, this period should not last more than one second. Because the generalized cross-correlation algorithms operate in the frequency domain and take advantage of the fast Fourier transform, further computational savings can also be achieved for the adaptive filters. This can be done with multi-delay filters, which outperform their time-domain counterparts in terms of convergence rate and complexity. Therefore, multi-delay filters and their implementation aspects are the next subject of our research in adaptive filtering theory.

Acknowledgement
Research described in the paper was supervised by Prof. Ing. B. Simak, CSc., FEL CTU in Prague, and supported by the Czech Technical University grant SGS10/275/OHK3/3T/13 and by the Ministry of Education, Youth and Sports of the Czech Republic under the research program MSM 6840770014.
Table 3. Mean values of the estimated echo delays