5.3.1 Primal optimization problem
This leads to the following general optimization problem defining SVMs in the non-separable case, where the parameter $C \geq 0$ determines the trade-off between margin-maximization (or minimization of $\|\mathbf{w}\|^2$) and the minimization of the slack penalty $\sum_{i=1}^{m} \xi_i^p$:
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{m} \xi_i^p \tag{5.24}$$
$$\text{subject to: } \; y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i \;\wedge\; \xi_i \geq 0, \quad i \in [1, m],$$
where $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_m)^{\top}$ and $p \geq 1$. The parameter $C$ is typically determined via $n$-fold cross-validation (see section 4.5).
As in the separable case, (5.24) is a convex optimization problem since the constraints are affine and thus convex and since the objective function is convex for any $p \geq 1$. In particular, $\mathbf{w} \mapsto \|\mathbf{w}\|^2$ is convex in view of the convexity of the norm $\|\cdot\|$.
There are many possible choices for $p$ leading to more or less aggressive penalizations of the slack terms (see exercise 5.1). The choices $p = 1$ and $p = 2$ lead to the most straightforward solutions and analyses. The loss functions associated with $p = 1$ and $p = 2$ are called the hinge loss and the quadratic hinge loss, respectively. Figure 5.5 shows the plots of these loss functions as well as that of the standard zero-one loss function. Both hinge losses are convex upper bounds on the zero-one loss, thus making them well suited for optimization. In what follows, the analysis is presented in the case of the hinge loss ($p = 1$), which is the most widely used loss function for SVMs.
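To make the primal problem concrete, the hinge-loss case ($p = 1$) of (5.24) can be minimized directly by subgradient descent on its unconstrained form, $\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$. The following is a minimal sketch, not a production solver; the toy data, the function name `svm_subgradient`, and the step-size schedule are all illustrative assumptions, and in practice a QP solver or a dedicated SVM package would be used.

```python
import numpy as np

def svm_subgradient(X, y, C=1.0, n_iter=2000):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i(w.x_i + b)),
    the hinge-loss (p = 1) form of the primal problem (5.24)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, n_iter + 1):
        margins = y * (X @ w + b)
        viol = margins < 1.0                    # points with positive slack xi_i
        # Subgradient of the objective at (w, b):
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        eta = 1.0 / t                           # diminishing step size
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

# Hypothetical toy data: two well-separated clusters plus one mislabeled
# point at (0.5, 0.5), making the sample non-separable.
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 3.0],
              [-2.0, -2.0], [-3.0, -1.0], [-1.0, -2.5],
              [0.5, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1, -1])

w, b = svm_subgradient(X, y)
preds = np.sign(X @ w + b)
```

The slack variables let the mislabeled point violate the margin at a cost controlled by $C$, while the six clean points end up correctly classified.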
Figure 5.5
Both the hinge loss and the quadratic hinge loss provide convex upper bounds on the binary zero-one loss.