

A Simple Example: Linear Regression



The fit of a straight line to a set of experimental data points (xq, yq) is a simple example of a discrete inverse problem. As illustrated in Fig. 17.6, the quantity y is measured as a function of a parameter x. In this case, our model is a straight line with two parameters, the offset a0 and the slope a1: y = a0 + a1 x. With a set of Q data points [xq, yq]^T we end up with the linear equation system


$$
\begin{bmatrix}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_Q
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}
=
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_Q
\end{bmatrix}
\qquad (17.50)
$$

which can be abbreviated by


$$\mathbf{M}\,\mathbf{p} = \mathbf{d}. \qquad (17.51)$$

The Q × 2 matrix M is denoted as the model matrix or design matrix. This matrix reflects both the type of the model (here a linear regression) and the chosen independent measuring points xq. The model or parameter vector p contains the parameters of the model to be estimated, and the data vector d contains the measured data yq.
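As a concrete illustration of Eq. (17.50), the following sketch builds the design matrix for a straight-line fit in Python/NumPy; the measuring points and data values are made up for illustration only:

```python
import numpy as np

# Hypothetical measuring points x_q and measured values y_q (illustrative only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Q x 2 design matrix M as in Eq. (17.50): a column of ones for the offset a0
# and the measuring points x_q for the slope a1.
M = np.column_stack([np.ones_like(x), x])
print(M.shape)  # (5, 2): Q = 5 equations for 2 unknowns
```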

If we have only two data points which do not coincide, x1 ≠ x2, we get an exact solution of the linear equation system. If more than two data points are available, we have more equations than unknowns. We say that the equation system is an overdetermined inverse problem. In this case, it is generally no longer possible to obtain an exact solution. We can only compute an estimate of the model parameters p_est in the sense that the deviation of the data d from the data predicted with the model, d_pre = M p_est, is minimal. This deviation can be expressed by an error vector e:


 

 









$$\mathbf{e} = \mathbf{d} - \mathbf{d}_{\text{pre}} = \mathbf{d} - \mathbf{M}\,\mathbf{p}_{\text{est}}. \qquad (17.52)$$


Error Norms


In order to minimize the error vector we need a suitable measure. We may use norms, which we discussed when using inner product vector spaces in Section 2.3.1. Generally, the Ln norm of the Q-dimensional vector e is defined as

$$\|\mathbf{e}\|_n = \left( \sum_{q=1}^{Q} |e_q|^n \right)^{1/n}. \qquad (17.53)$$


A special case is the L∞ norm

$$\|\mathbf{e}\|_\infty = \max_q |e_q|. \qquad (17.54)$$

The L2 norm is more commonly used; it is the root of the sum of the squared deviations of the error vector elements

$$\|\mathbf{e}\|_2 = \left( \sum_{q=1}^{Q} \left( d_q - d_{\text{pre},q} \right)^2 \right)^{1/2}. \qquad (17.55)$$


Higher norms weight larger deviations more heavily. The statistics of the data points determines which norm is to be taken. If the measured data points yq have a normal density (Section 3.4.2), the L2 norm must be used [124].
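A small sketch of these norms, using a hypothetical error vector whose values are chosen only for illustration:

```python
import numpy as np

e = np.array([0.3, -0.1, 0.4, -0.2])  # hypothetical error vector

def ln_norm(e, n):
    """General Ln norm of Eq. (17.53)."""
    return np.sum(np.abs(e) ** n) ** (1.0 / n)

l1   = ln_norm(e, 1)        # sum of absolute deviations
l2   = ln_norm(e, 2)        # root of the sum of squares, Eq. (17.55)
linf = np.max(np.abs(e))    # maximum norm, Eq. (17.54)
print(l1, l2, linf)
```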

 










Least Squares Solution

The overdetermined linear inverse problem is solved with a minimum L2 norm of the error vector by

$$\mathbf{p}_{\text{est}} = \left( \mathbf{M}^T \mathbf{M} \right)^{-1} \mathbf{M}^T \mathbf{d}. \qquad (17.56)$$

 

This solution can be made plausible by the following sequence of operations:

 


$$
\begin{aligned}
\mathbf{M}\,\mathbf{p}_{\text{est}} &= \mathbf{d} \\
\mathbf{M}^T \mathbf{M}\,\mathbf{p}_{\text{est}} &= \mathbf{M}^T \mathbf{d} \\
\mathbf{p}_{\text{est}} &= \left( \mathbf{M}^T \mathbf{M} \right)^{-1} \mathbf{M}^T \mathbf{d},
\end{aligned}
\qquad (17.57)
$$

provided that the inverse of M^T M exists.
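A minimal sketch of Eqs. (17.52) and (17.56) with NumPy, reusing the hypothetical straight-line data from above:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # hypothetical measuring points
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])   # hypothetical measured data
M = np.column_stack([np.ones_like(x), x])

# Least squares estimate via the normal equations, Eq. (17.56).
p_est = np.linalg.solve(M.T @ M, M.T @ y)
print(p_est)                 # estimated [a0, a1]

# Error vector and its L2 norm, Eqs. (17.52) and (17.55).
e = y - M @ p_est
print(np.linalg.norm(e))
```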

 

Geometric Illustration

Figure 17.7: Geometric illustration of the solution of a linear equation system with three unknowns using the Hough transform: a exactly solvable equation system; b overdetermined equation system with a non-unique solution.

Before we study methods for solving huge linear equation systems, it is helpful to illustrate linear equation systems geometrically. The P model parameters p span a P-dimensional vector space. This space can be regarded as the space of all possible solutions of an inverse problem with P model parameters. Now, we ask ourselves what it means to have one data point dq. According to Eq. (17.51), one data point results in one linear equation involving all model parameters:

$$\sum_{p'=1}^{P} m_{qp'}\, p_{p'} = d_q \quad\text{or}\quad \bar{\mathbf{m}}_q\, \mathbf{p} = d_q. \qquad (17.58)$$

This equation can be regarded as the scalar product of a row q of the model matrix, mq, with the model vector p. In the model space, this equation constitutes a (P − 1)-dimensional hyperplane of all vectors p which has a normal vector mq and a distance dq from the origin of the model space. Thus, the linear equation establishes a one-to-one correspondence between a data point in the data space and a (P − 1)-dimensional hyperplane in the model space. This mapping of data points into the model space is called the Hough transform, which we introduced in Section 16.5.2. Each data point reduces the space of possible solutions to a (P − 1)-dimensional hyperplane in the model space.
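For the straight-line model with two parameters, this correspondence can be made explicit: each data point (xq, yq) restricts the model space to the line a0 + a1 xq = yq in the (a0, a1) plane. A small sketch with made-up data points:

```python
import numpy as np

# Hypothetical data points (x_q, y_q); values are illustrative only.
points = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]

# Each data point maps to the line a0 = y_q - a1 * x_q in the (a0, a1) model space.
a1 = np.linspace(-1.0, 3.0, 5)      # a few sample slopes
for xq, yq in points:
    a0 = yq - a1 * xq
    print(f"point ({xq}, {yq}): a0 = {yq} - a1*{xq} ->", a0)
# The solution of the equation system is where all these lines intersect.
```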

Figure 17.7a illustrates the solution of a linear equation system with three unknowns. With three equations, three planes meet at a single point, provided that the corresponding 3 × 3 model matrix is invertible. Even in an overdetermined case, the solution need not necessarily be unique. Figure 17.7b shows a case of five planes intersecting at a line. Then, the solution is not unique, but only restricted to a line. If this line is oriented along one of the axes, the corresponding model parameter may take any value; the two other model parameters, however, are fixed. In the case of an arbitrarily oriented line, things are more complex. Then, the parameter combinations normal to the line are fixed, but the parameter combination represented by a vector in the direction of the line is not. Using the singular value decomposition [54, 143], we can solve singular linear equation systems and separate the solvable from the unsolvable parameter combinations.
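A sketch of this separation using the singular value decomposition; the rank-deficient model matrix below is a made-up example in which two parameters only appear as a sum:

```python
import numpy as np

# Made-up rank-deficient model matrix: columns 2 and 3 are identical, so only
# the sum of the corresponding parameters is determined by the data.
M = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 4.0]])
d = np.array([2.0, 3.0, 4.0, 5.0])

U, s, Vt = np.linalg.svd(M, full_matrices=False)
print(s)                      # one singular value is (numerically) zero

# Invert only the singular values above a threshold; the parameter combination
# belonging to the vanishing singular value remains undetermined.
tol = 1e-10
s_inv = np.zeros_like(s)
s_inv[s > tol] = 1.0 / s[s > tol]
p_est = Vt.T @ (s_inv * (U.T @ d))   # minimum-norm least squares solution
print(p_est)
```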

An overdetermined linear equation system that has no unique solution is not just a mathematical curiosity. It is rather a common problem in image processing. We have encountered it already, for example in motion determination with the aperture problem (Section 14.3.2).

 

17.6.5 Derivation of the Least Squares Solution‡

In this section we provide a derivation of the solution of the overdetermined discrete linear inverse problem (Eq. (17.51)) that minimizes the L2 norm of the error vector. To this end, we compute the solution explicitly by minimizing the L2 norm of the error vector e (Eq. (17.52)):


$$
\|\mathbf{e}\|_2^2 = \sum_{q'=1}^{Q} \left( d_{q'} - \sum_{p'=1}^{P} m_{q'p'}\, p_{p'} \right) \left( d_{q'} - \sum_{p''=1}^{P} m_{q'p''}\, p_{p''} \right).
$$


Factorizing the sum and interchanging the two summations yields

 

$$
\|\mathbf{e}\|_2^2 =
\underbrace{\sum_{p'=1}^{P} \sum_{p''=1}^{P} p_{p'}\, p_{p''} \sum_{q'=1}^{Q} m_{q'p'}\, m_{q'p''}}_{A}
\; - \;
\underbrace{2 \sum_{p'=1}^{P} p_{p'} \sum_{q'=1}^{Q} m_{q'p'}\, d_{q'}}_{B}
\; + \;
\sum_{q'=1}^{Q} d_{q'}\, d_{q'}. \qquad (17.59)
$$

We find a minimum for this expression by computing the partial derivatives with respect to the parameters pk that are to be optimized. Only the expressions A and B in Eq. (17.59) depend on pk:


$$
\begin{aligned}
\frac{\partial A}{\partial p_k}
&= \sum_{p'=1}^{P} \sum_{p''=1}^{P} \left( \delta_{k-p''}\, p_{p'} + \delta_{k-p'}\, p_{p''} \right) \sum_{q'=1}^{Q} m_{q'p'}\, m_{q'p''} \\
&= \sum_{p'=1}^{P} p_{p'} \sum_{q'=1}^{Q} m_{q'p'}\, m_{q'k} + \sum_{p''=1}^{P} p_{p''} \sum_{q'=1}^{Q} m_{q'k}\, m_{q'p''} \\
&= 2 \sum_{p'=1}^{P} p_{p'} \sum_{q'=1}^{Q} m_{q'p'}\, m_{q'k},
\end{aligned}
$$

$$
\frac{\partial B}{\partial p_k} = 2 \sum_{q'=1}^{Q} m_{q'k}\, d_{q'}.
$$

We add both derivatives and set them equal to zero:

$$
\frac{\partial \|\mathbf{e}\|_2^2}{\partial p_k}
= 2 \sum_{p'=1}^{P} p_{p'} \sum_{q'=1}^{Q} m_{q'k}\, m_{q'p'} - 2 \sum_{q'=1}^{Q} m_{q'k}\, d_{q'} = 0.
$$



In order to express the sums as matrix-matrix and matrix-vector multiplications, we substitute the matrix M at two places by its transpose M^T:

$$
\sum_{p'=1}^{P} p_{p'} \sum_{q'=1}^{Q} m^T_{kq'}\, m_{q'p'} - \sum_{q'=1}^{Q} m^T_{kq'}\, d_{q'} = 0
$$

and finally obtain the matrix equation

$$
\underbrace{\mathbf{M}^T}_{P \times Q}\underbrace{\mathbf{M}}_{Q \times P}\;\underbrace{\mathbf{p}_{\text{est}}}_{P}
\;=\;
\underbrace{\mathbf{M}^T}_{P \times Q}\;\underbrace{\mathbf{d}}_{Q}. \qquad (17.60)
$$

This equation can be solved if the square and symmetric P × P matrix M^T M is invertible. Then

$$
\mathbf{p}_{\text{est}} = \left( \mathbf{M}^T \mathbf{M} \right)^{-1} \mathbf{M}^T \mathbf{d}. \qquad (17.61)
$$

The matrix (M^T M)^{-1} M^T is known as the generalized inverse M^{-g} of M.
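In NumPy the generalized inverse corresponds to the Moore-Penrose pseudo-inverse; a minimal sketch, reusing the hypothetical regression data from above:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])
M = np.column_stack([np.ones_like(x), x])

# Generalized inverse computed explicitly as (M^T M)^{-1} M^T ...
M_gen = np.linalg.inv(M.T @ M) @ M.T
# ... and via the numerically more robust pseudo-inverse.
M_pinv = np.linalg.pinv(M)

print(np.allclose(M_gen, M_pinv))   # True for a full-rank M
print(M_gen @ y)                    # p_est as in Eq. (17.61)
```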

 

