

Measures for Fast Algorithms



According to the number of arithmetic operations required, there are many other fast Fourier transform algorithms that are still more effective. Most of them are based on polynomial algebra and number theory. An in-depth discussion of these algorithms is given by Blahut [10]. However, the mere number of arithmetic operations is not the only measure of an efficient algorithm. We must also consider a number of other factors.

Access to the data requires additional operations. Consider the simple example of the addition of two vectors. There, besides the addition itself, the following operations are performed: the addresses of the appropriate elements must be calculated, the two elements are read into registers, and the result of the addition is written back to memory. Depending on the architecture of the hardware used, these five operations constitute a significant overhead which may take much more time than the addition itself. Consequently, an algorithm with a complicated scheme for accessing the elements of a vector might add considerable overhead to the arithmetic operations. In effect, a simpler algorithm with more arithmetic operations but less address-computation overhead may be faster.

Another factor for rating algorithms is the amount of storage space needed. This includes not only the space for the code but also the storage space required for intermediate results or tables of constants. For example, a so-called in-place FFT algorithm, which can perform the Fourier transform on an image without using an intermediate storage area for the image, is very advantageous. Often there is a trade-off between storage space and speed. Many integer FFT algorithms, for example, precalculate the complex phase factors $w_N^v$ and store them in statically allocated tables.

To a large extent the efficiency of algorithms depends on the computer architecture used to implement them. If multiplication is performed either in software or by a microcoded instruction, it is much slower than addition or memory access. In this case, the aim of fast algorithms is to reduce the number of multiplications, even at the cost of more additions or a more complex memory access. Such a strategy makes no sense on some modern high-speed architectures where pipelined floating-point addition and multiplication take just one clock cycle. The faster the operations on the processor, the more memory access becomes the bottleneck. Fast algorithms must now consider effective memory access schemes. It is crucial that as many computations as possible can be performed with one and the same set of data. In this way, these data can be kept in a fast intermediate storage area, known as the memory cache, and no access to the much slower general memory (RAM) is required.


72                                                                                        2 Image Representation

 

After this detailed discussion of the algorithm, we can now estimate the number of operations necessary. At each stage of the composition, N/2 complex multiplications and N complex additions are carried out. In total we need N/2 ld N complex multiplications and N ld N complex additions. A deeper analysis shows that we can save even more multiplications. In the first two composition steps only trivial multiplications by 1 or i occur (compare Fig. 2.22). For further steps the number of trivial multiplications decreases by a factor of two. If our algorithm could avoid all the trivial multiplications, the number of multiplications would be reduced to (N/2)(ld N − 3).

The FFT algorithm is a classic example of a fast algorithm. The computational savings are enormous. For a 512-element vector, only 1536 complex multiplications are needed, instead of the 262 144 required by the direct calculation according to Eq. (2.29). The number of multiplications has been reduced by a factor of 170.

Using the FFT algorithm, the discrete Fourier transform can no longer be regarded as a computationally expensive operation, since only a few operations are necessary per element of the vector. For a vector with 512 elements, only 3 complex multiplications and 9 complex additions, corresponding to 12 real multiplications and 24 real additions, need to be computed per pixel.
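These operation counts can be checked with a small sketch; the helper name `fft_ops` is ours and simply encodes the formulas derived above:

```python
import math

def fft_ops(N):
    """Radix-2 FFT operation counts as derived above (N a power of two).

    Multiplication count assumes all trivial factors (1, -1, +/-i) are skipped.
    """
    ld = int(math.log2(N))
    mults = (N // 2) * (ld - 3)   # (N/2)(ld N - 3) complex multiplications
    adds = N * ld                 # N ld N complex additions
    return mults, adds

mults, adds = fft_ops(512)
print(mults, adds)               # 1536 and 4608: 3 mults, 9 adds per element
print(512 ** 2 // mults)         # direct DFT needs 262144 mults: a factor of 170
```

The per-element figures follow by dividing both counts by N = 512.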

 

2.5.4 Radix-4 Decimation-in-Time FFT‡

Having worked out one fast algorithm, we still do not know whether the algorithm is optimal or whether even more efficient algorithms can be found. Actually, we have applied only one special case of the divide-and-conquer strategy. Instead of parting the vector into two pieces, we could have chosen any other partition, say P Q-dimensional vectors, if N = PQ. This type of algorithm is called a Cooley-Tukey algorithm [10].
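The general P-way split can be sketched as follows. We assume the unnormalized DFT convention $\hat g_v = \sum_n g_n w_N^{-nv}$ for brevity (a normalization prefactor would scale both sides equally); the function names are ours:

```python
import cmath

def dft(g):
    """Direct DFT with kernel exp(-2*pi*i*n*v/N) (no normalization)."""
    N = len(g)
    return [sum(g[n] * cmath.exp(-2j * cmath.pi * n * v / N) for n in range(N))
            for v in range(N)]

def cooley_tukey_step(g, P):
    """One general Cooley-Tukey step: split g into P interleaved pieces of
    length Q = N/P, transform each, and recombine with twiddle factors."""
    N = len(g)
    Q = N // P
    parts = [dft(g[p::P]) for p in range(P)]   # P partial DFTs of length Q
    return [sum(cmath.exp(-2j * cmath.pi * p * v / N) * parts[p][v % Q]
                for p in range(P))
            for v in range(N)]

g = [1.0, -2.0, 0.5, 3.0, 1.5, -1.0]           # N = 6 = 3 * 2
assert all(abs(x - y) < 1e-12 for x, y in zip(cooley_tukey_step(g, 3), dft(g)))
```

P = 2 reproduces the radix-2 decimation-in-time step discussed before.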

Another partition often used is the radix-4 FFT algorithm. We can decompose a vector into four components:

$$\hat{g}_v \;=\; \sum_{n=0}^{N/4-1} g_{4n}\, w_N^{-4nv}
\;+\; w_N^{-v} \sum_{n=0}^{N/4-1} g_{4n+1}\, w_N^{-4nv}
\;+\; w_N^{-2v} \sum_{n=0}^{N/4-1} g_{4n+2}\, w_N^{-4nv}
\;+\; w_N^{-3v} \sum_{n=0}^{N/4-1} g_{4n+3}\, w_N^{-4nv}.$$
For simpler equations, we will use similar abbreviations as for the radix-2 algorithm and denote the partial transformations by ${}^0\hat{g}, \ldots, {}^3\hat{g}$. Making use of the symmetry of $w_N^v$, the transformations into the quarters of each of the vectors are given by


 

$$\begin{aligned}
\hat{g}_v &= {}^0\hat{g}_v + w_N^{-v}\,{}^1\hat{g}_v + w_N^{-2v}\,{}^2\hat{g}_v + w_N^{-3v}\,{}^3\hat{g}_v \\
\hat{g}_{v+N/4} &= {}^0\hat{g}_v - \mathrm{i}\, w_N^{-v}\,{}^1\hat{g}_v - w_N^{-2v}\,{}^2\hat{g}_v + \mathrm{i}\, w_N^{-3v}\,{}^3\hat{g}_v \\
\hat{g}_{v+N/2} &= {}^0\hat{g}_v - w_N^{-v}\,{}^1\hat{g}_v + w_N^{-2v}\,{}^2\hat{g}_v - w_N^{-3v}\,{}^3\hat{g}_v \\
\hat{g}_{v+3N/4} &= {}^0\hat{g}_v + \mathrm{i}\, w_N^{-v}\,{}^1\hat{g}_v - w_N^{-2v}\,{}^2\hat{g}_v - \mathrm{i}\, w_N^{-3v}\,{}^3\hat{g}_v
\end{aligned}$$


2.5 Fast Algorithms for Unitary Transforms                                      73

 

 

Figure 2.24: Signal flow diagram of the radix-2 decimation-in-frequency FFT algorithm for N = 8.

 

or, in matrix notation,

$$\begin{bmatrix} \hat{g}_v \\ \hat{g}_{v+N/4} \\ \hat{g}_{v+N/2} \\ \hat{g}_{v+3N/4} \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -\mathrm{i} & -1 & \mathrm{i} \\ 1 & -1 & 1 & -1 \\ 1 & \mathrm{i} & -1 & -\mathrm{i} \end{bmatrix}
\begin{bmatrix} {}^0\hat{g}_v \\ w_N^{-v}\,{}^1\hat{g}_v \\ w_N^{-2v}\,{}^2\hat{g}_v \\ w_N^{-3v}\,{}^3\hat{g}_v \end{bmatrix}.$$
To compose the 4-tuple of output elements, 12 complex additions and 3 complex multiplications are needed. We can reduce the number of additions further by decomposing the matrix into two simpler matrices:


$$\begin{bmatrix} \hat{g}_v \\ \hat{g}_{v+N/4} \\ \hat{g}_{v+N/2} \\ \hat{g}_{v+3N/4} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -\mathrm{i} \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & \mathrm{i} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & -1 \end{bmatrix}
\begin{bmatrix} {}^0\hat{g}_v \\ w_N^{-v}\,{}^1\hat{g}_v \\ w_N^{-2v}\,{}^2\hat{g}_v \\ w_N^{-3v}\,{}^3\hat{g}_v \end{bmatrix}.$$
The first matrix multiplication yields intermediate results that can be used for several operations in the second stage. In this way, we save four additions. We can apply this decomposition recursively $\log_4 N$ times. As for the radix-2 algorithm, only trivial multiplications are needed in the first composition step. At all other stages, multiplications occur for 3/4 of the points. In total, $3/4\, N(\log_4 N - 1) = 3/8\, N(\operatorname{ld} N - 2)$ complex multiplications and $2N \log_4 N = N \operatorname{ld} N$ complex additions are necessary for the radix-4 algorithm. While the number of additions remains equal, 25 % fewer multiplications are required than for the radix-2 algorithm.
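The radix-4 composition can be verified numerically. The sketch below (our naming, unnormalized DFT kernel so that normalization cancels from the identity) recombines four length-N/4 transforms according to the four butterfly equations:

```python
import cmath

def dft(g):
    """Direct DFT with kernel exp(-2*pi*i*n*v/N) (no normalization)."""
    N = len(g)
    return [sum(g[n] * cmath.exp(-2j * cmath.pi * n * v / N) for n in range(N))
            for v in range(N)]

def radix4_step(g):
    """One radix-4 decimation-in-time composition step (assumes 4 divides N)."""
    N = len(g)
    parts = [dft(g[p::4]) for p in range(4)]   # 0g^, 1g^, 2g^, 3g^, each length N/4
    out = [0j] * N
    for v in range(N // 4):
        w = cmath.exp(-2j * cmath.pi * v / N)  # w_N^{-v}
        a, b, c, d = (parts[0][v], w * parts[1][v],
                      w ** 2 * parts[2][v], w ** 3 * parts[3][v])
        out[v]              = a + b + c + d
        out[v + N // 4]     = a - 1j * b - c + 1j * d
        out[v + N // 2]     = a - b + c - d
        out[v + 3 * N // 4] = a + 1j * b - c - 1j * d
    return out

g = [0.5, 1.0, -2.0, 3.5, 0.0, 1.5, -1.0, 2.0]
assert all(abs(x - y) < 1e-12 for x, y in zip(radix4_step(g), dft(g)))
```

Note that the multiplications by i in the second and fourth output rows are trivial (a swap of real and imaginary parts), which is why only 3 complex multiplications per 4-tuple count.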

 

2.5.5 Radix-2 Decimation-in-Frequency FFT‡

The decimation-in-frequency FFT is another example of a Cooley-Tukey algorithm. This time, we break the N-dimensional input vector into the first N/2 and the second N/2 components. This partition breaks the output vector into its even and odd components:



 

$$\begin{aligned}
\hat{g}_{2v} &= \sum_{n=0}^{N/2-1} \left(g_n + g_{n+N/2}\right) w_{N/2}^{-nv} \\
\hat{g}_{2v+1} &= \sum_{n=0}^{N/2-1} w_N^{-n}\left(g_n - g_{n+N/2}\right) w_{N/2}^{-nv}.
\end{aligned} \qquad (2.88)$$


A recursive application of this partition results in a bit reversal of the elements in the output vector, but not the input vector. As an example, the signal flow graph for N = 8 is shown in Fig. 2.24. A comparison with the decimation-in-time flow graph (Fig. 2.22) shows that all steps are performed in reverse order. Even the elementary butterfly operations of the decimation-in-frequency algorithm are the inverse of the butterfly operations in the decimation-in-time algorithm.
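Equation (2.88) can be checked directly. The sketch below (our naming, unnormalized DFT kernel) performs one decimation-in-frequency step:

```python
import cmath

def dft(g):
    """Direct DFT with kernel exp(-2*pi*i*n*v/N) (no normalization)."""
    N = len(g)
    return [sum(g[n] * cmath.exp(-2j * cmath.pi * n * v / N) for n in range(N))
            for v in range(N)]

def dif_step(g):
    """One decimation-in-frequency step per Eq. (2.88) (assumes even N)."""
    N = len(g)
    h = N // 2
    even_in = [g[n] + g[n + h] for n in range(h)]
    odd_in = [cmath.exp(-2j * cmath.pi * n / N) * (g[n] - g[n + h])
              for n in range(h)]                 # twiddle w_N^{-n} before the sub-DFT
    even, odd = dft(even_in), dft(odd_in)        # two DFTs of length N/2
    out = [0j] * N
    out[0::2], out[1::2] = even, odd             # interleave into even/odd outputs
    return out

g = [1.0, -0.5, 2.0, 0.25, -1.0, 3.0, 0.5, -2.0]
assert all(abs(x - y) < 1e-12 for x, y in zip(dif_step(g), dft(g)))
```

In contrast to decimation in time, the butterfly here is applied before the sub-transforms, which is why the bit reversal appears on the output side.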

 

2.5.6 Multidimensional FFT Algorithms‡

Generally, there are two possible ways to develop fast algorithms for multidimensional discrete Fourier transforms. Firstly, we can decompose the multidimensional DFT into 1-D DFTs and use fast algorithms for them. Secondly, we can generalize the approaches of the 1-D FFT to higher dimensions. In this section, we show examples of both ways.

 

Decomposition into 1-D Transforms. A two-dimensional DFT can be broken up into two one-dimensional DFTs because of the separability of the kernel. In the 2-D case of Eq. (2.37), we obtain


$$\hat{g}_{u,v} = \frac{1}{\sqrt{MN}} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} g_{m,n} \exp\left(-\frac{2\pi\mathrm{i}\, nv}{N}\right) \exp\left(-\frac{2\pi\mathrm{i}\, mu}{M}\right). \qquad (2.89)$$

The inner summation forms M 1-D DFTs of the rows, the outer N 1-D DFTs of the columns, i.e., the 2-D FFT is computed as M row transforms followed by N column transforms:


 

Row transform
$$\tilde{g}_{m,v} = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} g_{m,n} \exp\left(-\frac{2\pi\mathrm{i}\, nv}{N}\right)$$

Column transform
$$\hat{g}_{u,v} = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} \tilde{g}_{m,v} \exp\left(-\frac{2\pi\mathrm{i}\, mu}{M}\right).$$

In an analogous way, a W-dimensional DFT can be composed of W one-dimensional DFTs.
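The row-column decomposition can be sketched as follows. For brevity we use the unnormalized kernel (the prefactors of Eq. (2.89) would rescale both sides equally); the function names are ours:

```python
import cmath

def dft(g):
    """Direct 1-D DFT with kernel exp(-2*pi*i*n*v/N) (no normalization)."""
    N = len(g)
    return [sum(g[n] * cmath.exp(-2j * cmath.pi * n * v / N) for n in range(N))
            for v in range(N)]

def dft2_separable(img):
    """2-D DFT computed as M row transforms followed by N column transforms."""
    rows = [dft(row) for row in img]                 # M row transforms
    cols = list(zip(*rows))                          # transpose to access columns
    return list(zip(*[dft(list(c)) for c in cols]))  # N column transforms, transpose back

def dft2_direct(img):
    """Direct double-sum 2-D DFT for comparison."""
    M, N = len(img), len(img[0])
    return [[sum(img[m][n] * cmath.exp(-2j * cmath.pi * (m * u / M + n * v / N))
                 for m in range(M) for n in range(N))
             for v in range(N)] for u in range(M)]

img = [[1.0, 2.0, 0.0, -1.0],
       [0.5, -2.0, 1.5, 3.0],
       [2.0, 0.0, -1.0, 1.0]]
a, b = dft2_separable(img), dft2_direct(img)
assert all(abs(a[u][v] - b[u][v]) < 1e-9 for u in range(3) for v in range(4))
```

In practice, each 1-D `dft` call would be replaced by a fast 1-D FFT.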

 

Multidimensional Decomposition. A decomposition is also directly possible in multidimensional spaces. We will demonstrate such algorithms with the simple case of a 2-D radix-2 decimation-in-time algorithm.

We decompose an M × N matrix into four submatrices by taking only every second pixel in every second line (Fig. 2.25). This decomposition yields

                                 
$$\begin{bmatrix} \hat{g}_{u,v} \\ \hat{g}_{u,v+N/2} \\ \hat{g}_{u+M/2,v} \\ \hat{g}_{u+M/2,v+N/2} \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}
\begin{bmatrix} {}^{0,0}\hat{g}_{u,v} \\ w_N^{-v}\,{}^{0,1}\hat{g}_{u,v} \\ w_M^{-u}\,{}^{1,0}\hat{g}_{u,v} \\ w_M^{-u} w_N^{-v}\,{}^{1,1}\hat{g}_{u,v} \end{bmatrix}.$$



 

0 1 0 1 0 1 0 1
2 3 2 3 2 3 2 3
0 1 0 1 0 1 0 1
2 3 2 3 2 3 2 3
0 1 0 1 0 1 0 1
2 3 2 3 2 3 2 3
0 1 0 1 0 1 0 1
2 3 2 3 2 3 2 3

 

Figure 2.25: Decomposition of an image matrix into four partitions for the 2-D radix-2 FFT algorithm.

 

The superscripts in front of $\hat{g}$ denote the corresponding partial transformation. The 2-D radix-2 algorithm is very similar to the 1-D radix-4 algorithm. In a similar manner as for the 1-D radix-4 algorithm (Section 2.5.4), we can reduce the number of additions from 12 to 8 by factorizing the matrix:

                                
                                                            
$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix}$$

 

The 2-D radix-2 algorithm for an N × N matrix requires $(3/4)N^2 \operatorname{ld} N$ complex multiplications, 25 % fewer than the separation into two 1-D radix-2 FFTs. However, the multidimensional decomposition has the disadvantage that the memory access pattern is more complex than for the 1-D Fourier transform. With the partition into 1-D transforms, the access to memory becomes local, yielding a higher cache hit rate than with the distributed access of the multidimensional decomposition.
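One 2-D radix-2 composition step can be sketched as follows (our naming, unnormalized kernel), recombining the four partial transforms of the interleaved sub-images:

```python
import cmath

def dft2(img):
    """Direct unnormalized 2-D DFT."""
    M, N = len(img), len(img[0])
    return [[sum(img[m][n] * cmath.exp(-2j * cmath.pi * (m * u / M + n * v / N))
                 for m in range(M) for n in range(N))
             for v in range(N)] for u in range(M)]

def radix2_2d_step(img):
    """One 2-D radix-2 composition from the four subsampled partial transforms."""
    M, N = len(img), len(img[0])
    # Partial transforms of the four interleaved sub-images g[2m+p][2n+q]
    part = {(p, q): dft2([row[q::2] for row in img[p::2]])
            for p in (0, 1) for q in (0, 1)}
    out = [[0j] * N for _ in range(M)]
    for u in range(M // 2):
        for v in range(N // 2):
            wu = cmath.exp(-2j * cmath.pi * u / M)   # w_M^{-u}
            wv = cmath.exp(-2j * cmath.pi * v / N)   # w_N^{-v}
            a = part[(0, 0)][u][v]
            b = wv * part[(0, 1)][u][v]
            c = wu * part[(1, 0)][u][v]
            d = wu * wv * part[(1, 1)][u][v]
            out[u][v]                   = a + b + c + d
            out[u][v + N // 2]          = a - b + c - d
            out[u + M // 2][v]          = a + b - c - d
            out[u + M // 2][v + N // 2] = a - b - c + d
    return out

img = [[1.0, 0.5, -1.0, 2.0],
       [0.0, 3.0, 1.0, -2.0],
       [2.0, -1.0, 0.5, 1.5],
       [-0.5, 1.0, 2.0, 0.0]]
a, b = radix2_2d_step(img), dft2(img)
assert all(abs(a[u][v] - b[u][v]) < 1e-9 for u in range(4) for v in range(4))
```

The scattered access to `part[(p, q)]` for each output 4-tuple illustrates the less cache-friendly memory pattern mentioned above.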

 

2.5.7 Transformation of Real Images‡

So far, we have only discussed the Fourier transform of complex-valued signals. The same algorithms can also be used for real-valued signals. They are less efficient, however, because the Fourier transform of a real-valued signal is Hermitian (Section 2.3.5) and thus only half of the Fourier coefficients are independent. This corresponds to the fact that half of the signal, namely the imaginary part, is zero.

It is obvious that another factor of two in computational speed can be gained for the DFT of real data. The easiest way to do so is to compute two real 1-D sequences at once. This concept can easily be applied to the DFT of images, because many 1-D DFTs must be computed. Thus we can put the first row x in the real part and the second row y in the imaginary part, yielding the complex vector z = x + iy. From the symmetry properties discussed in Section 2.3.5, we infer that the transforms of the real and imaginary parts map in Fourier space



 

to the Hermitian and anti-Hermitian parts. Thus the Fourier transforms of the two real M-dimensional vectors are given by

$$\hat{x}_v = \frac{1}{2}\left(\hat{z}_v + \hat{z}^*_{N-v}\right), \qquad \mathrm{i}\hat{y}_v = \frac{1}{2}\left(\hat{z}_v - \hat{z}^*_{N-v}\right). \qquad (2.90)$$
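Equation (2.90) can be checked with a short sketch (our naming; it holds for any linear DFT normalization, so we use the unnormalized kernel):

```python
import cmath

def dft(g):
    """Direct DFT with kernel exp(-2*pi*i*n*v/N) (no normalization)."""
    N = len(g)
    return [sum(g[n] * cmath.exp(-2j * cmath.pi * n * v / N) for n in range(N))
            for v in range(N)]

def two_real_dfts(x, y):
    """DFTs of two real vectors from one complex DFT of z = x + i*y, Eq. (2.90)."""
    N = len(x)
    zh = dft([a + 1j * b for a, b in zip(x, y)])
    # -v % N maps v -> N - v (and v = 0 -> 0, using periodicity of the DFT)
    xh = [0.5 * (zh[v] + zh[-v % N].conjugate()) for v in range(N)]
    yh = [-0.5j * (zh[v] - zh[-v % N].conjugate()) for v in range(N)]
    return xh, yh

x = [1.0, 2.0, -1.0, 0.5]
y = [0.0, -3.0, 2.0, 1.0]
xh, yh = two_real_dfts(x, y)
assert all(abs(a - b) < 1e-12 for a, b in zip(xh, dft(x)))
assert all(abs(a - b) < 1e-12 for a, b in zip(yh, dft(y)))
```

Applied row-pair by row-pair, this halves the number of 1-D DFTs needed for a real image.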

 

2.6 Further Readings‡

 

The classical textbook on the Fourier transform, and still one of the best, is Bracewell [11]. An excellent source for various transforms is the "Handbook on Transforms" by Poularikas [141]. For the basics of linear algebra, especially unitary transforms, the reader is referred to one of the modern textbooks on linear algebra, e.g., Meyer [125], Anton [4], or Lay [106]. It is still worthwhile to read the historical article of Cooley and Tukey [22] about the discovery of the first fast Fourier transform algorithm. The monograph of Blahut [10] covers a variety of fast algorithms for the Fourier transform.


 

 

