5.3.1 Primal optimization problem
This leads to the following general optimization problem defining SVMs in the non-separable case, where the parameter $C \geq 0$ determines the trade-off between margin-maximization (or minimization of $\|\mathbf{w}\|^2$) and the minimization of the slack penalty $\sum_{i=1}^{m} \xi_i^p$:
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{m} \xi_i^p \tag{5.24}$$
$$\text{subject to: } \; y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i \;\wedge\; \xi_i \geq 0, \quad i \in [1, m],$$
where $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_m)^{\top}$ and $p \geq 1$. The parameter $C$ is typically determined via $n$-fold cross-validation (see section 4.5).
As in the separable case, (5.24) is a convex optimization problem since the constraints are affine and thus convex and since the objective function is convex for any $p \geq 1$. In particular, $\mathbf{w} \mapsto \|\mathbf{w}\|^2$ is convex in view of the convexity of the norm $\|\cdot\|$.
There are many possible choices for $p$ leading to more or less aggressive penalizations of the slack terms (see exercise 5.1). The choices $p = 1$ and $p = 2$ lead to the most straightforward solutions and analyses. The loss functions associated with $p = 1$ and $p = 2$ are called the hinge loss and the quadratic hinge loss, respectively. Figure 5.5 shows the plots of these loss functions as well as that of the standard zero-one loss function. Both hinge losses are convex upper bounds on the zero-one loss, thus making them well suited for optimization. In what follows, the analysis is presented in the case of the hinge loss ($p = 1$), which is the most widely used loss function for SVMs.
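To make the primal problem concrete, the hinge-loss case ($p = 1$) of (5.24) can be minimized directly by subgradient descent on its unconstrained form, $\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$. The following is a minimal sketch, not a production solver; the toy data, the function name `svm_subgradient`, and the step-size schedule are all illustrative assumptions, and in practice a QP solver or a dedicated SVM package would be used.

```python
import numpy as np

def svm_subgradient(X, y, C=1.0, n_iter=2000):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i(w.x_i + b)),
    the hinge-loss (p = 1) form of the primal problem (5.24)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, n_iter + 1):
        margins = y * (X @ w + b)
        viol = margins < 1.0                    # points with positive slack xi_i
        # Subgradient of the objective at (w, b):
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        eta = 1.0 / t                           # diminishing step size
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

# Hypothetical toy data: two well-separated clusters plus one mislabeled
# point at (0.5, 0.5), making the sample non-separable.
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 3.0],
              [-2.0, -2.0], [-3.0, -1.0], [-1.0, -2.5],
              [0.5, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1, -1])

w, b = svm_subgradient(X, y)
preds = np.sign(X @ w + b)
```

The slack variables let the mislabeled point violate the margin at a cost controlled by $C$, while the six clean points end up correctly classified.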
Figure 5.5
Both the hinge loss and the quadratic hinge loss provide convex upper bounds on the binary zero-one loss.