

7.2 World and Camera Coordinates



7.2.1 Definition

Basically, the position of objects in 3-D space can be described in two different ways (Fig. 7.1). First, we can use a coordinate system which is related to the scene observed. These coordinates are called world coordinates and denoted as X' = [X1', X2', X3']T. The X1' and X2' coordinates describe the horizontal and the vertical positions, respectively.

 


B. Jähne, Digital Image Processing. Copyright © 2002 by Springer-Verlag.

ISBN 3-540-67754-2. All rights of reproduction in any form reserved.



 


 

Figure 7.1: Illustration of world and camera coordinates.

 

 

Sometimes, an alternative convention with non-indexed coordinates X' = [X', Y', Z']T is more convenient. Both notations are used in this book.

A second system, the camera coordinates X = [X1, X2, X3]T, can be fixed to the camera observing the scene. The X3 axis is aligned with the optical axis of the camera system (Fig. 7.1). Physicists are familiar with such considerations. It is common to discuss physical phenomena in different coordinate systems. In elementary mechanics, for example, motion is studied with respect to two observers, one at rest, the other moving with the object.

Transition from world to camera coordinates generally requires a translation and a rotation. First, we shift the origin of the world coordinate system to the origin of the camera coordinate system by the translation vector T (Fig. 7.1). Then we change the orientation of the shifted system by rotations about suitable axes so that it coincides with the camera coordinate system. Mathematically, translation can be described by vector subtraction and rotation by multiplication of the coordinate vector with a matrix:


 

7.2.2 Rotation


X = R ( X ' − T ).                                                 (7.1)


Rotation of a coordinate system has two important features. It does not change the length or norm of a vector and it keeps the coordinate system orthogonal. A transformation with these features is known in linear algebra as an orthonormal transform.

The coefficients in a transformation matrix have an intuitive meaning. This can be seen when we apply the transformation to unit vectors Ēp in the direction of the coordinate axes. With Ē1, for instance, we obtain


              | a11  a12  a13 | | 1 |   | a11 |
Ē'1 = A Ē1 =  | a21  a22  a23 | | 0 | = | a21 | .                    (7.2)
              | a31  a32  a33 | | 0 |   | a31 |


Thus, the columns of the transformation matrix give the coordinates of the base vectors in the new coordinate system. Knowing this property, it is easy to formulate the orthonormality condition that has to be met by the rotation matrix R:


R^T R = I    or    Σ (m = 1..3)  r_km r_lm = δ_kl ,                  (7.3)


where I denotes the identity matrix, whose elements are one and zero on diagonal and non-diagonal positions, respectively. Using Eq. (7.2), this equation simply states that the transformed base vectors remain orthogonal:

Ē'k^T Ē'l = δ_kl .                                                   (7.4)

Eq. (7.3) leaves three matrix elements independent out of nine. Unfortunately, the relationship between the matrix elements and three parameters to describe rotation turns out to be quite complex and nonlinear. A common procedure involves the three Eulerian rotation angles (φ, θ, ψ). A lot of confusion exists in the literature about the definition of the Eulerian angles. We follow the standard mathematical approach. We use right-hand coordinate systems and count rotation angles positive in the counterclockwise direction. The rotation from the shifted world coordinate system to the camera coordinate system is decomposed into three steps (see Fig. 7.2, [53]).

1. Rotation about the X3' axis by the angle φ, X'' = R_φ X':

         |  cos φ   sin φ   0 |
   R_φ = | −sin φ   cos φ   0 |                                      (7.5)
         |    0       0     1 |

2. Rotation about the X1'' axis by θ, X''' = R_θ X'':

         | 1     0        0    |
   R_θ = | 0   cos θ    sin θ  |                                     (7.6)
         | 0  −sin θ    cos θ  |

3. Rotation about the X3''' axis by ψ, X = R_ψ X''':

         |  cos ψ   sin ψ   0 |
   R_ψ = | −sin ψ   cos ψ   0 |                                      (7.7)
         |    0       0     1 |



 



Figure 7.2: Rotation of world coordinates X' to camera coordinates X using the three Eulerian angles (φ, θ, ψ) with successive rotations about (a) the X3', (b) the X1'', and (c) the X3''' axes.

 

Cascading the three rotations, R = R_ψ R_θ R_φ, yields the matrix

|  cos ψ cos φ − cos θ sin φ sin ψ     cos ψ sin φ + cos θ cos φ sin ψ    sin θ sin ψ |
| −sin ψ cos φ − cos θ sin φ cos ψ    −sin ψ sin φ + cos θ cos φ cos ψ    sin θ cos ψ | .
|          sin θ sin φ                         −sin θ cos φ                  cos θ    |
 

The inverse transformation from camera coordinates to world coordinates is given by the transpose of the above matrix. Since matrix multiplication is not commutative, rotation is also not commutative. Therefore, it is important not to interchange the order in which rotations are performed.
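The rotation cascade and the orthonormality condition Eq. (7.3) are easy to verify numerically. The following sketch (the angle values are arbitrary illustration choices, not taken from the text) builds the elementary rotations of Eqs. (7.5)-(7.7), checks R^T R = I, compares against the closed-form matrix above, and confirms that swapping the order of rotations changes the result:

```python
# Numerical check of the Euler-angle cascade R = R_psi R_theta R_phi
# and its orthonormality (Eq. 7.3). Angles are illustrative only.
import numpy as np

def rot_z(a):
    """Rotation about the X3 axis, counterclockwise positive (Eqs. 7.5/7.7)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    """Rotation about the X1 axis (Eq. 7.6)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

phi, theta, psi = 0.3, 0.5, 0.7          # arbitrary Euler angles
R = rot_z(psi) @ rot_x(theta) @ rot_z(phi)

# Orthonormality: R^T R = I, so the inverse is just the transpose.
assert np.allclose(R.T @ R, np.eye(3))

# Compare with the closed-form cascaded matrix given in the text.
c, s = np.cos, np.sin
R_closed = np.array([
    [c(psi)*c(phi) - c(theta)*s(phi)*s(psi),
     c(psi)*s(phi) + c(theta)*c(phi)*s(psi),  s(theta)*s(psi)],
    [-s(psi)*c(phi) - c(theta)*s(phi)*c(psi),
     -s(psi)*s(phi) + c(theta)*c(phi)*c(psi), s(theta)*c(psi)],
    [s(theta)*s(phi), -s(theta)*c(phi), c(theta)]])
assert np.allclose(R, R_closed)

# Rotation does not commute: reversing the order gives a different matrix.
R_swapped = rot_z(phi) @ rot_x(theta) @ rot_z(psi)
assert not np.allclose(R, R_swapped)
```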

Rotation is only commutative in the limit of an infinitesimal rotation. Then the cosine and sine terms reduce to 1 and ε, respectively. This limit has some practical applications since minor rotational misalignments are common.

Rotation about the X3 axis, for instance, can then be described by

             |  1   ε   0 |                X1 = X1' + ε X2'
X = R_ε X' = | −ε   1   0 | X'     or      X2 = X2' − ε X1'
             |  0   0   1 |                X3 = X3' .


 

As an example we discuss the rotation of the point [X1', 0, 0]T. It is rotated into the point [X1', ε X1', 0]T, while the correct position would be [X1' cos ε, X1' sin ε, 0]T. Expanding the trigonometric functions in a Taylor series to third order yields a position error of [1/2 ε² X1', 1/6 ε³ X1', 0]T. For a 512 × 512 image (X1' ≤ 256 for centered rotation) and an error limit of less than 1/20 pixel, ε must be smaller than ±0.02 or ±1.15°. This is still a significant rotation, vertically displacing rows by up to ε X1' ≈ 5 pixels.
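The numbers in this example can be checked directly. A short sketch, assuming ε = 0.02 and X1' = 256 as in the text:

```python
# Check of the small-angle numbers: for a centered 512x512 image
# (X1' <= 256) and epsilon = 0.02 rad, the linearized rotation
# (cos -> 1, sin -> epsilon) stays within ~1/20 pixel of the exact one.
import math

eps, X1 = 0.02, 256.0

# Leading Taylor terms of the position error [X1(1 - cos e), X1(e - sin e)]:
err_x1 = 0.5 * eps**2 * X1      # ~ 1/2 e^2 X1'
err_x2 = eps**3 * X1 / 6.0      # ~ 1/6 e^3 X1'
assert abs(err_x1 - X1 * (1 - math.cos(eps))) < 1e-5
assert abs(err_x2 - (eps * X1 - X1 * math.sin(eps))) < 1e-7

# The dominant error is about 1/20 pixel at the image border:
assert err_x1 < 0.052

# The row displacement eps * X1' is nevertheless about 5 pixels:
assert round(eps * X1) == 5
```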



 

 

Figure 7.3: Image formation with a pinhole camera.

 

7.3 Ideal Imaging: Perspective Projection†












































7.3.1 The Pinhole Camera

The basic geometric aspects of image formation by an optical system are well modeled by a pinhole camera. The imaging element of this camera is an infinitesimally small hole (Fig. 7.3). The single light ray coming from a point of the object at [X1, X2, X3]T which passes through this hole meets the image plane at [x1, x2, −d']T. Through this condition an image of the object is formed on the image plane. The relationship between the 3-D world and the 2-D image coordinates [x1, x2]T is given by


x1 = − d' X1 / X3 ,      x2 = − d' X2 / X3 .                         (7.8)
The two world coordinates parallel to the image plane are scaled by the factor d'/X3. Therefore, the image coordinates [x1, x2]T contain only ratios of world coordinates, from which neither the distance nor the true size of an object can be inferred.
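This scaling property can be illustrated with a minimal sketch of Eq. (7.8); the function name and the numeric values are illustrative assumptions:

```python
# Sketch of the pinhole projection Eq. (7.8): a world point [X1, X2, X3]
# maps to the image point x_i = -d' X_i / X3. Values are illustrative.
def project_pinhole(X, d_img):
    """Perspective projection of a 3-D point onto the image plane."""
    X1, X2, X3 = X
    return (-d_img * X1 / X3, -d_img * X2 / X3)

# Doubling both the size and the distance of an object leaves its image
# unchanged: only ratios of world coordinates survive the projection.
near = project_pinhole((1.0, 2.0, 10.0), d_img=0.05)
far = project_pinhole((2.0, 4.0, 20.0), d_img=0.05)
assert abs(near[0] - far[0]) < 1e-12 and abs(near[1] - far[1]) < 1e-12
```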

A straight line in the world space is projected onto a straight line at the image plane. This important feature can be proved by a simple geometric consideration. All light rays emitted from a straight line pass through the pinhole. Consequently they all lie on a plane which is spanned by the straight line and the pinhole. This plane intersects with the image plane in a straight line.

All object points on a ray through the pinhole are projected onto a single point in the image plane. In a scene with several transparent objects, the objects are projected onto each other. Then we cannot infer the three-dimensional structure of the scene at all. We may not even be able to recognize the shape of individual objects. This example demonstrates how much information is lost by projection of a 3-D scene onto a 2-D image plane.



 

Figure 7.4: Occlusion of more distant objects and surfaces by perspective projection.

 

Most natural scenes, however, contain opaque objects. Here the observed 3-D space is essentially reduced to 2-D surfaces. These surfaces can be described by two two-dimensional functions g(x1, x2) and X3(x1, x2) instead of the general description of a 3-D scalar gray value image g(X1, X2, X3). A surface in space is completely projected onto the image plane provided that not more than one point of the surface lies on the same ray through the pinhole. If this condition is not met, parts of the surface remain invisible. This effect is called occlusion. The occluded 3-D space can be made visible if we put a point light source at the position of the pinhole (Fig. 7.4). Then the invisible parts of the scene lie in the shadow of those objects which are closer to the camera.

As long as we can exclude occlusion, we only need the depth map X3(x1, x2) to reconstruct the 3-D shape of a scene completely. One way to produce it — which is also used by our visual system — is by stereo imaging, i.e., observation of the scene with two sensors from different points of view (Section 8.2.1).

 





7.3.2 Projective Imaging

Imaging with a pinhole camera is essentially a perspective projection, because all rays must pass through one central point, the pinhole. Thus the pinhole camera model is very similar to imaging with penetrating rays, such as x-rays, emitted from a point source (Fig. 7.5). In this case, the object lies between the central point and the image plane.

The projection equation corresponds to Eq. (7.8) except for the sign:

[ X1, X2, X3 ]T  →  [ x1, x2 ]T = [ d' X1 / X3 ,  d' X2 / X3 ]T .    (7.9)



 

 

Figure 7.5: Perspective projection with x-rays.

 

 

The image coordinates divided by the image distance d' are called generalized image coordinates:

x̃1 = x1 / d' ,      x̃2 = x2 / d' .                                  (7.10)


 

Generalized image coordinates are dimensionless and denoted by a tilde. They are equal to the tangent of the angle with respect to the optical axis of the system with which the object is observed. These coordinates explicitly take the limitations of the projection onto the image plane into account. From these coordinates, we cannot infer absolute positions but know only the angle at which the object is projected onto the image plane. The same coordinates are used in astronomy. The general projection equation of perspective projection Eq. (7.9) then reduces to


X = [ X1, X2, X3 ]T  →  x̃ = [ X1 / X3 ,  X2 / X3 ]T .               (7.11)

 

We will use this simplified projection equation in all further considerations. For optical imaging, we just have to include a minus sign or, if speaking geometrically, reflect the image at the origin of the coordinate system.
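The tangent interpretation of the generalized coordinates can be sketched in a few lines (the numeric point is an arbitrary illustration):

```python
# Generalized image coordinates (Eqs. 7.10/7.11): x~_i = X_i / X3 equals
# the tangent of the angle between the viewing ray and the optical axis.
import math

X1, X2, X3 = 3.0, 0.0, 4.0          # arbitrary world point
x1_t = X1 / X3                      # Eq. (7.11), first component
angle = math.atan2(X1, X3)          # viewing angle in the X1-X3 plane
assert math.isclose(x1_t, math.tan(angle))

# Dimensionless: unchanged under any overall scaling of the scene.
assert math.isclose(x1_t, (10 * X1) / (10 * X3))
```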

 

7.3.3 Homogeneous Coordinates‡

In computer graphics, the elegant formalism of homogeneous coordinates [37, 46, 122] is used to describe all the transformations we have discussed so far, i.e., translation, rotation, and perspective projection, in a unified framework. This formalism is significant, because the whole image formation process can be expressed by a single 4 × 4 matrix.

Homogeneous coordinates are represented by a four-component column vector

X' = [ tX1', tX2', tX3', t ]T ,                                      (7.12)



 

from which ordinary three-dimensional coordinates are obtained by dividing the first three components of the homogeneous coordinates by the fourth. Any arbitrary transformation can be obtained by premultiplying the homogeneous coordinates with a 4 × 4 matrix M. In particular, we can obtain the image coordinates

x = [sx1, sx2, sx3, s]T                                (7.13)

by

x = MX.                                                      (7.14)

Since matrix multiplication is associative, we can view the matrix M as composed of many transformation matrices, performing such elementary transformations as translation, rotation around a coordinate axis, perspective projection, and scaling. The transformation matrices for the elementary transforms are readily derived:


    | 1  0  0  T1 |
T = | 0  1  0  T2 |              Translation by [T1, T2, T3]T
    | 0  0  1  T3 |
    | 0  0  0  1  |

       | 1     0        0     0 |
R_x1 = | 0   cos θ   −sin θ   0 |       Rotation about the X1 axis by θ
       | 0   sin θ    cos θ   0 |
       | 0     0        0     1 |

       |  cos φ   0   sin φ   0 |
R_x2 = |    0     1     0     0 |       Rotation about the X2 axis by φ
       | −sin φ   0   cos φ   0 |
       |    0     0     0     1 |

       | cos ψ   −sin ψ   0   0 |
R_x3 = | sin ψ    cos ψ   0   0 |       Rotation about the X3 axis by ψ
       |   0        0     1   0 |
       |   0        0     0   1 |

    | s1  0   0   0 |
S = | 0   s2  0   0 |              Scaling
    | 0   0   s3  0 |
    | 0   0   0   1 |

    | 1   0     0     0 |
P = | 0   1     0     0 |              Perspective projection
    | 0   0     1     0 |
    | 0   0  −1/d'    1 |

                                                                     (7.15)


 

Perspective projection is formulated slightly differently from the definition in Eq. (7.11). Premultiplication of the homogeneous vector

X = [tX1, tX2, tX3, t]T


with P yields

[ tX1 ,  tX2 ,  tX3 ,  t (d' − X3) / d' ]T ,                         (7.16)




Figure 7.6: Black box model of an optical system.

from which we obtain the image coordinates by division through the fourth coordinate:

[ x1, x2 ]T = [ d' X1 / (d' − X3) ,  d' X2 / (d' − X3) ]T .          (7.17)

From this equation we can see that the image plane is positioned at the origin, since if X3 = 0, both image and world coordinates are identical. The center of projection has been shifted to [0, 0, −d']T.

Complete transformations from world coordinates to image coordinates can be composed of these elementary matrices. Strat [179], for example, proposed the following decomposition:

M = CSPR z R y R x T .                                               (7.18)

The scaling S and cropping (translation) C are transformations taking place in the two-dimensional image plane. Strat [179] shows how the complete transformation parameters from camera to world coordinates can be determined in a noniterative way from a set of calibration points whose positions in the space are exactly known. In this way, an absolute calibration of the outer camera parameters position and orientation and the inner parameters piercing point of the optical axis, focal length, and pixel size can be obtained.
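A reduced version of the pipeline in Eq. (7.18) can be sketched with the elementary matrices of Eq. (7.15). The sketch below omits scaling S and cropping C, and all numeric values (d', the test point) are illustrative assumptions:

```python
# Sketch of a homogeneous-coordinate pipeline using the elementary 4x4
# matrices of Eq. (7.15). Numeric values are illustrative assumptions.
import numpy as np

def translation(T1, T2, T3):
    M = np.eye(4)
    M[:3, 3] = [T1, T2, T3]
    return M

def rot_x3(psi):
    """Rotation about the X3 axis, Eq. (7.15)."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0],
                     [0, 0, 1, 0], [0, 0, 0, 1.0]])

def perspective(d_img):
    """Projection matrix P of Eq. (7.15)."""
    M = np.eye(4)
    M[3, 2] = -1.0 / d_img
    return M

def from_homogeneous(x):
    """Divide the first three components by the fourth."""
    return x[:3] / x[3]

# Compose M = P R T (Eq. 7.18 without scaling and cropping):
d_img = 2.0
M = perspective(d_img) @ rot_x3(0.0) @ translation(0.0, 0.0, 0.0)
X = np.array([1.0, 2.0, -3.0, 1.0])      # homogeneous world point
x = from_homogeneous(M @ X)

# Eq. (7.17): x_i = d' X_i / (d' - X3); here d' = 2, X3 = -3 -> factor 2/5.
assert np.allclose(x[:2], [2 * 1 / 5, 2 * 2 / 5])
```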

 


















































7.4 Real Imaging

