Measures for Fast Algorithms

⇐ ПредыдущаяСтр 15 из 85Следующая ⇒

According to the number of arithmetic operations required, there are many other fast Fourier transform algorithms which are still more eﬀ ec- tive. Most of them are based on polynomial algebra and number theory. An in-depth discussion of these algorithms is given by Blahut [10]. How- ever, the mere number of arithmetic operations is not the only measure for an eﬃ cient algorithm. We must also consider a number of other factors.

Access to the data requires additional operations. Consider the sim- ple example of the addition of two vectors. There, besides addition, the following operations are performed: the addresses of the appropriate el- ements must be calculated; the two elements are read into registers, and the result of these additions is written back to the memory. Depending on the architecture of the hardware used, these ﬁ ve operations consti- tute a signiﬁ cant overhead which may take much more time than the addition itself. Consequently, an algorithm with a complicated scheme to access the elements of a vector might add a considerable overhead to the arithmetic operations. In eﬀ ect, a simpler algorithm with more arithmetic operations but less overhead to compute addresses may be faster.

Another factor for rating algorithms is the amount of storage space needed. This not only includes the space for the code but also storage space required for intermediate results or tables for constants. For ex- ample, a so-called in-place FFT algorithm, which can perform the Fourier transform on an image without using an intermediate storage area for the image, is very advantageous. Often there is a trade-oﬀ between stor- age space and speed. Many integer FFT algorithms, for example, precal-

culate the complex phase factors w^v and store them in statically allo-

cated tables.

To a large extent the eﬃ ciency of algorithms depends on the com- puter architecture used to implement them. If multiplication is per- formed either in software or by a microcoded instruction, it is much slower than addition or memory access. In this case, the aim of fast al- gorithms is to reduce the number of multiplications even at the cost of more additions or a more complex memory access. Such a strat- egy makes no sense on some modern high-speed architectures where pipelined ﬂ oating-point addition and multiplication take just one clock cycle. The faster the operations on the processor, the more the memory access becomes the bottleneck. Fast algorithms must now consider ef- fective memory access schemes. It is crucial that as many computations as possible can be performed with one and the same set of data. In this way, these data can be kept in a fast intermediate storage area, known as the memory cache, and no direct access to the much slower general memory (RAM) is required.

72 2 Image Representation

−

After this detailed discussion of the algorithm, we can now estimate the number of operations necessary. At each stage of the composition, N/2 complex multiplications and N complex additions are carried out. In total we need N/2 ldN complex multiplications and N ldN complex additions. A deeper analysis shows that we can save even more multi- plications. In the ﬁ rst two composition steps only trivial multiplications by 1 or i occur (compare Fig. 2.22). For further steps the number of triv- ial multiplications decreases by a factor of two. If our algorithm could avoid all the trivial multiplications, the number of multiplications would be reduced to (N/2)(ld N 3).

The FFT algorithm is a classic example of a fast algorithm. The com- putational savings are enormous. For a 512-element vector, only 1536 instead of 262 144 complex multiplications are needed compared to the direct calculation according to Eq. (2.29). The number of multiplications has been reduced by a factor 170.

Using the FFT algorithm, the discrete Fourier transform can no longer be regarded as a computationally expensive operation, since only a few operations are necessary per element of the vector. For a vector with 512 elements, only 3 complex multiplications and 8 complex additions, corresponding to 12 real multiplications and 24 real additions, need to be computed per pixel.

2.5.4 Radix-4 Decimation-in-Time FFT‡

Having worked out one fast algorithm, we still do not know whether the algo- rithm is optimal or if even more eﬃ cient algorithms can be found. Actually, we have applied only one special case of the divide-and-conquer strategy. Instead of parting the vector into two pieces, we could have chosen any other partition, say P Q-dimensional vectors, if N PQ. This type of algorithms is called a Cooley-Tukey algorithm [10].

Another partition often used is the radix-4 FFT algorithm. We can decompose a vector into four components:

N/4− 1 N/4− 1

gˆ _v =

. g4nw− 4nv + w− v. g4n+1w− 4nv

n=0

N/4− 1

n=0

N/4− 1

+ w− 2v. g4n+2w− 4nv + w− 3v

. g4n+3w− 4nv.

n=0

N N N

n=0

···

For simpler equations, we will use similar abbreviations as for the radix-2 algo- rithm and denote the partial transformations by 0gˆ, , 3 gˆ. Making use of the symmetry of w^v, the transformations into quarters of each of the vectors are

given by

gˆ v = 0gˆ v + w− v 1gˆ v + w− 2v 2gˆ v + w− 3v 3gˆ v

gˆ v+N/4 = 0gˆ v − iw− v 1gˆ v − w− 2v 2gˆ u + iw− 3v 3gˆ v

gˆ v+N/2 = 0gˆ v − w− v 1gˆ v + w− 2v 2gˆ v − w− 3v 3gˆ v

gˆ v+3N/4 = 0gˆ v + iw− v 1gˆ v − w− 2v 2gˆ v − iw− 3v 3gˆ v

2.5 Fast Algorithms for Unitary Transforms 73

g0 000

g1 001

g2 010

g3 011

g4 100

g5 101

g6 110

g7 111

g0 000

g4 100

g2 010

g6 110

g1 001

g5 101

g3 011

g7 111

Figure 2.24: Signal ﬂ ow diagram of the radix-2 decimation-in-frequency FFT algorithm for N = 8.

or, in matrix notation,

 gˆ _v

  1 1 1 1  

0gˆ _v 

gˆ 1 − i − 1 i

w− v 1gˆ

 

 =  1 − 1 1 − 1   w⁻²^v 2gˆ _v







v+N/4

  

gˆ _v₊₃_N_/₄   1 i − 1 − i

  w− 3v 3gˆ v 

. 

To compose 4-tuple elements of the vector, 12 complex additions and 3 complex multiplications are needed. We can reduce the number of additions further by decomposing the matrix into two simpler matrices:

 gˆ _v   1 0 1 0   1 0 1 0  

0gˆ _v 

gˆ 0 1 0 − i

1 0 − 1 0

w⁻^v 1gˆ

 

 =  1 0 − 1 0   0 1 0 1   w⁻²^v 2gˆ _v







 

v+N/4

  

gˆ v+3N/4 

 0 1 0 i

  0 1 0 − 1   w⁻³^v 3gˆ _v 

− = − =

. 

The ﬁ rst matrix multiplication yields intermediate results which can be used for several operations in the second stage. In this way, we save four addi- tions. We can apply this decomposition recursively log₄ N times. As for the radix-2 algorithm, only trivial multiplications in the ﬁ rst composition step are needed. At all other stages, multiplications occur for 3/4 of the points. In total, 3/4N(log₄ N 1) 3/8N(ldN 2) complex multiplications and 2N log₄ N NldN complex additions are necessary for the radix-4 algorithm. While the number of additions remains equal, 25 % fewer multiplications are required than for the radix-2 algorithm.

2.5.5 Radix-2 Decimation-in-Frequency FFT‡

The decimation-in-frequency FFT is another example of a Cooley-Tukey algo- rithm. This time, we break the N-dimensional input vector into N/2 ﬁ rst and N/2 second components. This partition breaks the output vector into its even and odd components:

74 2 Image Representation

N/2− 1

N/2

gˆ ₂_v =

n.=0

N/2− 1

(gn + gn+N/2)w− nv

N/2

(2.88)

gˆ 2v+1 =

n.=0

WN− n(gn − gn+N/2)w− nv.

A recursive application of this partition results in a bit reversal of the elements in the output vector, but not the input vector. As an example, the signal ﬂ ow graph for N 8 is shown in Fig. 2.24. A comparison with the decimation-in-time ﬂ ow graph (Fig. 2.22) shows that all steps are performed in reverse order. Even the elementary butterﬂ y operations of the decimation-in-frequency algorithm are the inverse of the butterﬂ y operation in the decimation-in-time algorithm.

2.5.6 Multidimensional FFT Algorithms‡

Generally, there are two possible ways to develop fast algorithms for multidi- mensional discrete Fourier transforms. Firstly, we can decompose the multidi- mensional DFT into 1-D DFTs and use fast algorithms for them. Secondly, we can generalize the approaches of the 1-D FFT for higher dimensions. In this section, we show examples for both possible ways.

Decomposition into 1-D Transforms. A two-dimensional DFT can be bro- ken up in two one-dimensional DFTs because of the separability of the kernel. In the 2-Dcase Eq. (2.37), we obtain

1 M− 1  N− 1

. 2π inv Σ 

. 2π imu Σ



m=0

n=0

gˆ _{u, v} = √ MN .  . g_m,_n exp − N exp − M . (2.89)

The inner summation forms M 1-D DFTs of the rows, the outer N 1-D DFTs of the columns, i. e., the 2-DFFT is computed as M row transformations followed by N column transformations

Row transform

g˜ _m,_v = 1 . g_m,_n exp.− 2 π i nv Σ

N− 1

N n=0 N

M− 1

Column transform

gˆ _{u, v} = 1 . g˜ _m,_v exp.− 2 π i m u Σ .

M m=0 M

In an analogous way, a W -dimensional DFT can be composed of W one-dimen- sional DFTs.

Multidimensional Decomposition. A decomposition is also directly pos- sible in multidimensional spaces. We will demonstrate such algorithms with the simple case of a 2-Dradix-2 decimation-in-time algorithm.

We decompose an M N matrix into four submatrices by taking only every second pixel in every second line (Fig. 2.25). This decomposition yields

  



0, 0



 gˆ

gˆ u, v

1 1 1 1

 =  1 1 − 1 − 1  

 

1 − 1 1 − 1

gˆ u, v

. 

 



w− v 0, 1gˆ

u, v+N/2

u, v

 

gˆ u+M/2, v+N/2 

 1 − 1 − 1 1   w⁻^uw⁻^v 1, 1gˆ _{u, v} 

2.5 Fast Algorithms for Unitary Transforms 75

0	1	0	1	0	1	0	1
2	3	2	3	2	3	2	3
0	1	0	1	0	1	0	1
2	3	2	3	2	3	2	3
0	1	0	1	0	1	0	1
2	3	2	3	2	3	2	3
0	1	0	1	0	1	0	1
2	3	2	3	2	3	2	3

Figure 2.25: Decomposition of an image matrix into four partitions for the 2-D radix-2 FFT algorithm.

The superscripts in front of Gˆ denote the corresponding partial transformation. The 2-D radix-2 algorithm is very similar to the 1-D radix-4 algorithm. In a similar manner as for the 1-D radix-4 algorithm (Section 2.5.4), we can reduce the number of additions from 12 to 8 by factorizing the matrix:

 

   

1 1 1 1 1 0 1 0 1 1 0 0

  1 1 − 1 − 1  =  1 0 − 1 0   0 0 1 1  .

 1 − 1 1 − 1   0 1 0 1   1 − 1 0 0 

1 − 1 − 1 1   0 1 0 − 1   0 0 1 − 1 

The 2-D radix-2 algorithm for an N N matrix requires (3/4N2) ld N complex multiplications, 25 % fewer than the separation into two 1-Dradix-2 FFTs. How- ever, the multidimensional decomposition has the disadvantage that the mem- ory access pattern is more complex than for the 1-DFourier transform. With the partition into a 1-Dtransform, the access to memory becomes local, yielding a higher cache hit rate than with the distributed access of the multidimensional decomposition.

2.5.7 Transformation of Real Images‡

So far, we have only discussed the Fourier transform of complex-valued signals. The same algorithms can be used also for real-valued signals. Then they are less eﬃ cient, however, because the Fourier transform of a real-valued signal is Hermitian (Section 2.3.5) and thus only half of the Fourier coeﬃ cients are independent. This corresponds to the fact that also half of the signal, namely the imaginary part, is zero.

= +

It is obvious that another factor two in computational speed can be gained for the DFT of real data. The easiest way to do so is to compute two real 1-D sequences at once. This concept can easily be applied to the DFT of images, because many 1-D DFTs must be computed. Thus we can put the ﬁ rst row x in the real part and the second row y in the imaginary part and yield the complex vector z x i y. From the symmetry properties discussed in Section 2.3.5, we infer that the transforms of the real and imaginary parts map in Fourier space

76 2 Image Representation

to the Hermitian and anti-Hermitian parts. Thus the Fourier transforms of the two real M-dimensional vectors are given by

xˆ _v = 1/2(zˆ _v + zˆ N∗ − v ), iyˆ _v = 1/2(zˆ _v − zˆ N∗ − v ). (2.90)

2.6 Further Readings‡

The classical textbook on the Fourier transform — and still one of the best — is Bracewell [11]. An excellent source for various transforms is the “Handbook on Transforms” by Poularikas [141]. For the basics of linear algebra, especially unitary transforms, the reader is referred to one of the modern textbooks on linear algebra, e. g., Meyer [125], Anton [4], or Lay [106]. It is still worthwhile to read the historical article of Cooley and Tukey [22] about the discovery of the ﬁ rst fast Fourier transform algorithm. The monograph of Blahut [10] covers a variety of fast algorithms for the Fourier transform.

⇐ Предыдущая 10 11 12 13 141516 17 18 19 Следующая ⇒

Последнее изменение этой страницы: 2019-05-04; Просмотров: 216; Нарушение авторского права страницы