3.1 Growth function of intervals in $R$

让 $H$ 为 $R$ 中的区间集合。 $H$ 的VC维度为2。计算其打碎系数 $Π_{H} (m), m \geq 0$ 。将你的结果与增长函数的一般界限进行比较。

3.2 Growth function and Rademacher complexity of thresholds in $R$

让 $H$ 为实数线上的阈值函数族： $H = {x \mapsto 1_{x \leq θ} : θ \in R} \cup {x \mapsto$ $1_{x \geq θ} : θ \in R}$ 。给出增长函数 $Π_{m} (H)$ 的上界。利用该上界推导出 $R_{m} (H)$ 的上界。

3.3 Growth function of linear combinations

A linearly separable labeling of a set $X$ of vectors in $R^{d}$ is a classification of $X$ into two sets $X^{+}$ and $X^{-}$ with $X^{+} =$ ${x \in X : w \cdot x > 0}$ and $X^{-} = {x \in X : w \cdot x < 0}$ for some $w \in R^{d}$ .

Let $X = {x_{1}, \dots, x_{m}}$ be a subset of $R^{d}$ .

Let ${X^{+}, X^{-}}$ be a dichotomy of $X$ and let $x_{m + 1} \in R^{d}$ . Show that ${X^{+} \cup {x_{m + 1}}, X^{-}}$ and ${X^{+}, X^{-} \cup {x_{m + 1}}}$ are linearly separable by a hyperplane going through the origin if and only if ${X^{+}, X^{-}}$ is linearly separable by a hyperplane going through the origin and $x_{m + 1}$ .
Let $X = {x_{1}, \dots, x_{m}}$ be a subset of $R^{d}$ such that any $k$ -element subset of $x$ with $k \leq d$ is linearly independent. Then, show that the number of linearly separable labelings of $X$ is $C (m, d) = 2 k = 0 \sum d - 1 (m - 1 k)$ .
- Hint: prove by induction that $C (m + 1, d) = C (m, d) + C (m, d - 1)$
Let $f_{1}, \dots, f_{p}$ be $p$ functions mapping $R^{d}$ to $R$ . Define $F$ as the family of classifiers based on linear combinations of these functions: $F = {x \mapsto sgn (k = 1 \sum p a_{k} f_{k} (x)) : a_{1}, \dots, a_{p} \in R} .$
- Define $Ψ$ by $Ψ (x) = (f_{1} (x), \dots, f_{p} (x))$ . Assume that there exists $x_{1}, \dots, x_{m} \in R^{d}$ such that every $p$ -subset of ${Ψ (x_{1}), \dots, Ψ (x_{m})}$ is linearly independent. Then, show that $Π_{F} (m) = 2 i = 0 \sum p - 1 (m - 1 i) .$

3.4 Lower bound on growth function.

Prove that Sauer’s lemma (theorem 3.17) is tight, i.e., for any set $x$ of $m > d$ elements, show that there exists a hypothesis class $H$ of VC-dimension $d$ such that $Π_{H} (m) = i = 0 \sum d (m i)$ .

3.5 Finer Rademacher upper bound. Show that a finer upper bound on the Rademacher complexity of the family $G$ can be given in terms of $E_{S} [Π (G, S)]$ , where $Π (G, S)$ is the number of ways to label the points in sample $S$ .

3.6 Singleton hypothesis class. Consider the trivial hypothesis set $H = {h_{0}}$ .

(a) Show that $R_{m} (H) = 0$ for any $m > 0$ .

(b) Use a similar construction to show that Massart’s lemma (theorem 3.7) is tight.

3.7 Two function hypothesis class. Let $H$ be a hypothesis set reduced to two functions: $H = {h_{- 1}, h_{+ 1}}$ and let $S = (x_{1}, \dots, x_{m}) \subseteq X$ be a sample of size $m$ .

(a) Assume that $h_{- 1}$ is the constant function taking value -1 and $h_{+ 1}$ the constant function taking the value +1 . What is the VC-dimension $d$ of $H$ ? Upper bound the empirical Rademacher complexity $ℜ_{S} (H)$ (Hint: express $R_{S} (H)$ in terms of the absolute value of a sum of Rademacher variables and apply Jensen’s inequality) and compare your bound with $d / m$ .

(b) Assume that $h_{- 1}$ is the constant function taking value -1 and $h_{+ 1}$ the function taking value -1 everywhere except at $x_{1}$ where it takes the value +1. What is the VC-dimension $d$ of $H$ ? Compute the empirical Rademacher complexity $R_{S} (H)$ .

3.8 Rademacher identities. Fix $m \geq 1$ . Prove the following identities for any $α \in R$ and any two hypothesis sets $H$ and $H^{'}$ of functions mapping from $X$ to $R$ :

(a) $ℜ_{m} (α H) = ∣ α ∣ ℜ_{m} (H)$ .

(b) $ℜ_{m} (H + H^{'}) = ℜ_{m} (H) + ℜ_{m} (H^{'})$ .

where $max (h, h^{'})$ denotes the function $x \mapsto x \in X max (h (x), h^{'} (x))$ (Hint: you could use the identity $max (a, b) = \frac{1}{2} [a + b + ∣ a - b ∣]$ valid for all $a, b \in R$ and Talagrand’s contraction lemma (see lemma 5.7)).

3.9 Rademacher complexity of intersection of concepts. Let $H_{1}$ and $H_{2}$ be two families of functions mapping $X$ to ${0, 1}$ and let $H = {h_{1} h_{2} : h_{1} \in H_{1}, h_{2} \in$ $H_{2}}$ . Show that the empirical Rademacher complexity of $H$ for any sample $S$ of size $m$ can be bounded as follows:

$R_{S} (H) \leq R_{S} (H_{1}) + R_{S} (H_{2})$

Hint: use the Lipschitz function $x \mapsto max (0, x - 1)$ and Talagrand’s contraction lemma.

Use that to bound the Rademacher complexity $R_{m} (U)$ of the family $U$ of intersections of two concepts $c_{1}$ and $c_{2}$ with $c_{1} \in C_{1}$ and $c_{2} \in C_{2}$ in terms of the Rademacher complexities of $C_{1}$ and $C_{2}$ .

3.10 Rademacher complexity of prediction vector. Let $S = (x_{1}, \dots, x_{m})$ be a sample of size $m$ and fix $h : X \to R$ .

(a) Denote by $u$ the vector of predictions of $h$ for $S : u = h (x_{1}) ⋮ h (x_{m})$ . Give an upper bound on the empirical Rademacher complexity $R_{S} (H)$ of $H =$ ${h, - h}$ in terms of $∥ u ∥_{2}$ (Hint: express $ℜ_{S} (H)$ in terms of the expectation of an absolute value and apply Jensen’s inequality). Suppose that $h (x_{i}) \in$ ${0, - 1, + 1}$ for all $i \in [m]$ . Express the bound on the Rademacher complexity in terms of the sparsity measure $n = ∣ {i ∣ h (x_{i}) \neq = 0} ∣$ . What is that upper bound for the extreme values of the sparsity measure?

(b) Let $F$ be a family of functions mapping $x$ to $R$ . Give an upper bound on the empirical Rademacher complexity of $F + h = {f + h : f \in F}$ and that of $F \pm h = (F + h) \cup (F - h)$ in terms of $R_{S} (F)$ and $∥ u ∥_{2}$ .

3.11 Rademacher complexity of regularized neural networks. Let the input space be $X = R^{n_{1}}$ . In this problem, we consider the family of regularized neural networks defined by the following set of functions mapping $X$ to $R$ :

$H = {x \mapsto j = 1 \sum n_{2} w_{j} σ (u_{j} \cdot x) :∥ w ∥_{1} \leq Λ^{'}, u_{j}_{2} \leq Λ, \forall j \in [n_{2}]},$

where $σ$ is an $L$ -Lipschitz function. As an example, $σ$ could be the sigmoid function which is 1-Lipschitz.

(a) Show that $ℜ_{S} (H) = \frac{Λ ^{'}}{m} E_{σ} [∥ u ∥_{2} \leq Λ sup i = 1 \sum m σ_{i} σ (u \cdot x_{i})]$ .

(b) Use the following form of Talagrand’s lemma valid for all hypothesis sets $H$ and $L$ -Lipschitz function $Φ$ :

$\frac{1}{m} E [h \in H sup i = 1 \sum m σ_{i} (Φ \circ h) (x_{i})] \leq \frac{L}{m} E [h \in H sup i = 1 \sum m σ_{i} h (x_{i})],$

to upper bound $R_{S} (H)$ in terms of the empirical Rademacher complexity of $H^{'}$ , where $H^{'}$ is defined by

$H^{'} = {x \mapsto s (u \cdot x) :∥ u ∥_{2} \leq Λ, s \in {- 1, + 1}} .$

$R_{S} (H^{'}) = \frac{Λ}{m} σ E [i = 1 \sum m σ_{i} x_{i}_{2}] .$

(d) Use the inequality $E_{y} [∥ v ∥_{2}] \leq E_{v} [∥ v ∥_{2}^{2}]$ , which holds by Jensen’s inequality to upper bound $R_{S} (H^{'})$ .

(e) Assume that for all $x \in S, ∥ x ∥_{2} \leq r$ for some $r > 0$ . Use the previous questions to derive an upper bound on the Rademacher complexity of $H$ in terms of $r$ .

3.12 Rademacher complexity. Professor Jesetoo claims to have found a better bound on the Rademacher complexity of any hypothesis set $H$ of functions taking values in ${- 1, + 1}$ , in terms of its VC-dimension VCdim $(H)$ . His bound is of the form $R_{m} (H) \leq O (\frac{VCdim ( H )}{m})$ . Can you show that Professor Jesetoo’s claim cannot be correct? (Hint: consider a hypothesis set $H$ reduced to just two simple functions.)

3.13 VC-dimension of union of $k$ intervals. What is the VC-dimension of subsets of the real line formed by the union of $k$ intervals?

3.14 VC-dimension of finite hypothesis sets. Show that the VC-dimension of a finite hypothesis set $H$ is at most $lo g_{2} ∣ H ∣$ .

3.15 VC-dimension of subsets. What is the VC-dimension of the set of subsets $I_{α}$ of the real line parameterized by a single parameter $α : I_{α} = [α, α + 1] \cup [α + 2, + \infty)$ ?

3.16 VC-dimension of axis-aligned squares and triangles.

(a) What is the VC-dimension of axis-aligned squares in the plane?

(b) Consider right triangles in the plane with the sides adjacent to the right angle both parallel to the axes and with the right angle in the lower left corner. What is the VC-dimension of this family?

3.17 VC-dimension of closed balls in $R^{n}$ . Show that the VC-dimension of the set of all closed balls in $R^{n}$ , i.e., sets of the form ${x \in R^{n} : x - x_{0}^{2} \leq r}$ for some $x_{0} \in R^{n}$ and $r \geq 0$ , is less than or equal to $n + 2$ .

3.18 VC-dimension of ellipsoids. What is the VC-dimension of the set of all ellipsoids in $R^{n}$ ?

3.19 VC-dimension of a vector space of real functions. Let $F$ be a finite-dimensional vector space of real functions on $R^{n}, dim (F) = r < \infty$ . Let $H$ be the set of hypotheses:

$H = {{x : f (x) \geq 0} : f \in F} .$

Show that $d$ , the VC-dimension of $H$ , is finite and that $d \leq r$ . (Hint: select an arbitrary set of $m = r + 1$ points and consider linear mapping $u : F \to R^{m}$ defined by: $u (f) = (f (x_{1}), \dots, f (x_{m}))$ .)

3.20 VC-dimension of sine functions. Consider the hypothesis family of sine functions (Example 3.16): ${x \to sin (ω x) : ω \in R}$ .

(a) Show that for any $x \in R$ the points $x, 2 x, 3 x$ and $4 x$ cannot be shattered by this family of sine functions.

(b) Show that the VC-dimension of the family of sine functions is infinite. (Hint: show that ${2^{- i} : i \leq m}$ can be shattered for any $m > 0$ .)

3.21 VC-dimension of union of halfspaces. Provide an upper bound on the VC-dimension of the class of hypotheses described by the unions of $k$ halfspaces.

3.22 VC-dimension of intersection of halfspaces. Consider the class $C_{k}$ of convex intersections of $k$ halfspaces. Give lower and upper bound estimates for $VCdim (C_{k})$ .

3.23 VC-dimension of intersection concepts.

(a) Let $C_{1}$ and $C_{2}$ be two concept classes. Show that for any concept class $C = {c_{1} \cap c_{2} : c_{1} \in C_{1}, c_{2} \in C_{2}}$

${\Pi }_{\mathcal{C}}\left( m\right) \leq {\Pi }_{{\mathcal{C}}_{1}}\left( m\right) {\Pi }_{{\mathcal{C}}_{2}}\left( m\right) \tag{3.53}$

(b) Let $C$ be a concept class with VC-dimension $d$ and let $C_{s}$ be the concept class formed by all intersections of $s$ concepts from $C, s \geq 1$ . Show that the VC-dimension of $C_{s}$ is bounded by $2 d s lo g_{2} (3 s)$ . (Hint: show that $lo g_{2} (3 x) <$ $9 x / (2 e)$ for any $x \geq 2$ .)

3.24 VC-dimension of union of concepts. Let $A$ and $B$ be two sets of functions mapping from $X$ into ${0, 1}$ , and assume that both $A$ and $B$ have finite VC-dimension, with $VCdim (A) = d_{A}$ and $VCdim (B) = d_{B}$ . Let $C = A \cup B$ be the union of $A$ and $B$ .

(a) Prove that for all $m, Π_{C} (m) \leq Π_{A} (m) + Π_{B} (m)$ .

(b) Use Sauer’s lemma to show that for $m \geq d_{A} + d_{B} + 2, Π_{C} (m) < 2^{m}$ , and give a bound on the VC-dimension of $C$ .

3.25 VC-dimension of symmetric difference of concepts. For two sets $A$ and $B$ , let $A Δ B$ denote the symmetric difference of $A$ and $B$ , i.e., $A Δ B = (A \cup B) - (A \cap B)$ . Let $H$ be a non-empty family of subsets of $x$ with finite VC-dimension. Let $A$ be an element of $H$ and define $H Δ A = {X Δ A : X \in H}$ . Show that

$VCdim (H Δ A) = VCdim (H)$

3.26 Symmetric functions. A function $h : {0, 1}^{n} \to {0, 1}$ is symmetric if its value is uniquely determined by the number of 1’s in the input. Let $C$ denote the set of all symmetric functions.

(a) Determine the VC-dimension of $C$ .

(b) Give lower and upper bounds on the sample complexity of any consistent PAC learning algorithm for $C$ .

(c) Note that any hypothesis $h \in C$ can be represented by a vector $(y_{0}, y_{1}, \dots, y_{n})$ $\in {0, 1}^{n + 1}$ , where $y_{i}$ is the value of $h$ on examples having precisely $i 1$ ‘s. Devise a consistent learning algorithm for $C$ based on this representation.

3.27 VC-dimension of neural networks. {#vc-dimension-of-neural-networks. .unnumbered}

Let $C$ be a concept class over $R^{r}$ with VC-dimension $d$ . A $C$ -neural network with one intermediate layer is a concept defined over $R^{n}$ that can be represented by a directed acyclic graph such as that of Figure 3.7, in which the input nodes are those at the bottom and in which each other node is labeled with a concept $c \in C$ .

The output of the neural network for a given input vector $(x_{1}, \dots, x_{n})$ is obtained as follows. First, each of the $n$ input nodes is labeled with the corresponding value $x_{i} \in R$ . Next, the value at a node $u$ in the higher layer and labeled with $c$ is obtained by applying $c$ to the values of the input nodes admitting an

::: center {width=“30%”} :::

Figure 3.7

A neural network with one intermediate layer.

edge ending in $u$ . Note that since $c$ takes values in ${0, 1}$ , the value at $u$ is in ${0, 1}$ . The value at the top or output node is obtained similarly by applying the corresponding concept to the values of the nodes admitting an edge to the output node.

(a) Let $H$ denote the set of all neural networks defined as above with $k \geq 2$ internal nodes. Show that the growth function $Π_{H} (m)$ can be upper bounded in terms of the product of the growth functions of the hypothesis sets defined at each intermediate layer.

(b) Use that to upper bound the VC-dimension of the C-neural networks (Hint: you can use the implication $m = 2 x lo g_{2} (x y) \Rightarrow m > x lo g_{2} (y m)$ valid for $m \geq 1$ , and $x, y > 0$ with $x y > 4$ ).

(c) Let $C$ be the family of concept classes defined by threshold functions $C =$ ${sgn (j = 1 \sum r w_{j} x_{j}) : w \in R^{r}}$ . Give an upper bound on the VC-dimension of $H$ in terms of $k$ and $r$ .

3.28 VC-dimension of convex combinations. Let $H$ be a family of functions mapping from an input space $X$ to ${- 1, + 1}$ and let $T$ be a positive integer. Give an upper bound on the VC-dimension of the family of functions $F_{T}$ defined by

$F = {sgn (t = 1 \sum T α_{t} h_{t}) : h_{t} \in H, α_{t} \geq 0, t = 1 \sum T α_{t} \leq 1} .$

(Hint: you can use exercise 3.27 and its solution).

3.29 Infinite VC-dimension.

(a) Show that if a concept class $C$ has infinite VC-dimension, then it is not PAC-learnable.

(b) In the standard PAC-learning scenario, the learning algorithm receives all examples first and then computes its hypothesis. Within that setting, PAC-learning of concept classes with infinite VC-dimension is not possible as seen in the previous question.

Imagine now a different scenario where the learning algorithm can alternate between drawing more examples and computation. The objective of this problem is to prove that PAC-learning can then be possible for some concept classes with infinite VC-dimension.

Consider for example the special case of the concept class $C$ of all subsets of natural numbers. Professor Vitres has an idea for the first stage of a learning algorithm $L$ PAC-learning $C$ . In the first stage, $L$ draws a sufficient number of points $m$ such that the probability of drawing a point beyond the maximum value $M$ observed be small with high confidence. Can you complete Professor Vitres’ idea by describing the second stage of the algorithm so that it PAC-learns $C$ ? The description should be augmented with the proof that $L$ can PAC-learn C.

3.30 VC-dimension generalization bound - realizable case. In this exercise we show that the bound given in corollary 3.19 can be improved to $O (\frac{d l o g ( m / d )}{m})$ in the realizable setting. Assume we are in the realizable scenario, i.e. the target concept is included in our hypothesis class $H$ . We will show that if a hypothesis $h$ is consistent with a sample $S \sim D^{m}$ then for any $ϵ > 0$ such that $m ϵ \geq 8$

$\mathbb{P}\left\lbrack {R\left( h\right) > \epsilon }\right\rbrack \leq 2{\left\lbrack \frac{2em}{d}\right\rbrack }^{d}{2}^{-{m\epsilon }/2}. \tag{3.54}$

(a) Let $H_{S} \subseteq H$ be the subset of hypotheses consistent with the sample $S$ , let $R_{S} (h)$ denote the empirical error with respect to the sample $S$ and define $S^{'}$ as another independent sample drawn from $D^{m}$ . Show that the following inequality holds for any $h_{0} \in H_{S}$ :

$P [h \in H_{S} sup R_{S} (h) - R_{S^{'}} (h) > \frac{ϵ}{2}] \geq P [B (m, ϵ) > \frac{m ϵ}{2}] P [R (h_{0}) > ϵ],$

where $B (m, ϵ)$ is a binomial random variable with parameters $(m, ϵ)$ . (Hint: prove and use the fact that $P [R_{S} (h) \geq \frac{ϵ}{2}] \geq P [R_{S} (h) > \frac{ϵ}{2} \land R (h) > ϵ]$ .)

(b) Prove that $P [B (m, ϵ) > \frac{m ϵ}{2}] \geq \frac{1}{2}$ . Use this inequality along with the result from (a) to show that for any $h_{0} \in H_{S}$

$P [R (h_{0}) > ϵ] \leq 2 P [h \in H_{S} sup R_{S} (h) - R_{S^{'}} (h) > \frac{ϵ}{2}] .$

(c) Instead of drawing two samples, we can draw one sample $T$ of size $2 m$ then uniformly at random split it into $S$ and $S^{'}$ . The right hand side of part (b) can then be rewritten as:

$P [h \in H_{S} sup R_{S} (h) - R_{S^{'}} (h) > \frac{ϵ}{2}] = T \sim D^{2 m}, T \to [S, S^{'}] P [\exists h \in H : R_{S} (h) = 0 \land R_{S^{'}} (h) > \frac{ϵ}{2}] .$

Let $h_{0}$ be a hypothesis such that $R_{T} (h_{0}) > \frac{ϵ}{2}$ and let $l > \frac{m ϵ}{2}$ be the total number of errors $h_{0}$ makes on $T$ . Show that the probability of all $l$ errors falling into $S^{'}$ is upper bounded by $2^{- l}$ .

(d) Part (b) implies that for any $h \in H$

$T \sim D^{2 m} : T \to (S, S^{'}) P [R_{S} (h) = 0 \land R_{S^{'}} (h) > \frac{ϵ}{2} ∣ R_{T} (h_{0}) > \frac{ϵ}{2}] \leq 2^{- l} .$

Use this bound to show that for any $h \in H$

$T \sim D^{2 m}; T \to (S, S^{'}) P [R_{S} (h) = 0 \land R_{S^{'}} (h) > \frac{ϵ}{2}] \leq 2^{- \frac{ϵ m}{2}} .$

(e) Complete the proof of inequality (3.54) by using the union bound to upper bound $P_{T \sim D^{2} m} : [\exists h \in H : R_{S} (h) = 0 \land R_{S^{'}} (h) > \frac{ϵ}{2}]$ . Show that we can achieve a high probability generalization bound that is of the order $O (\frac{d l o g ( m / d )}{m})$ .

3.31 Generalization bound based on covering numbers. Let $H$ be a family of functions mapping $x$ to a subset of real numbers $y \subseteq R$ . For any $ϵ > 0$ , the covering number $N (H, ϵ)$ of $H$ for the $L_{\infty}$ norm is the minimal $k \in N$ such that $H$ can be covered with $k$ balls of radius $ϵ$ , that is, there exists ${h_{1}, \dots, h_{k}} \subseteq H$ such that, for all $h \in H$ , there exists $i \leq k$ with $h - h_{i}_{\infty} = x \in X max ∣ h (x) - h_{i} (x) ∣ \leq ϵ$ . In particular, when $H$ is a compact set, a finite covering can be extracted from a covering of $H$ with balls of radius $ϵ$ and thus $N (H, ϵ)$ is finite.

Covering numbers provide a measure of the complexity of a class of functions: the larger the covering number, the richer is the family of functions. The objective of this problem is to illustrate this by proving a learning bound in the case of the squared loss. Let $D$ denote a distribution over $X \times Y$ according to which

labeled examples are drawn. Then, the generalization error of $h \in H$ for the squared loss is defined by $R (h) = E_{(x, y) \sim D} [(h (x) - y)^{2}]$ and its empirical error for a labeled sample $S = ((x_{1}, y_{1}), \dots, (x_{m}, y_{m}))$ by $R_{S} (h) = \frac{1}{m} i = 1 \sum m (h (x_{i}) - y_{i})^{2}$ . We will assume that $H$ is bounded, that is there exists $M > 0$ such that $∣ h (x) - y ∣ \leq M$ for all $(x, y) \in X \times y$ . The following is the generalization bound proven in this problem:

$\underset{S \sim {\mathcal{D}}^{m}}{\mathbb{P}}\left\lbrack {\mathop{\sup }\limits_{{h \in \mathcal{H}}}\left| {R\left( h\right) - {\widehat{R}}_{S}\left( h\right) }\right| \geq \epsilon }\right\rbrack \leq \mathcal{N}\left( {\mathcal{H},\frac{\epsilon }{8M}}\right) 2\exp \left( \frac{-m{\epsilon }^{2}}{2{M}^{4}}\right) . \tag{3.55}$

The proof is based on the following steps.

(a) Let $L_{S} = R (h) - R_{S} (h)$ , then show that for all $h_{1}, h_{2} \in H$ and any labeled sample $S$ , the following inequality holds:

$∣ L_{S} (h_{1}) - L_{S} (h_{2}) ∣ \leq 4 M h_{1} - h_{2}_{\infty} .$

(b) Assume that $H$ can be covered by $k$ subsets $B_{1}, \dots, B_{k}$ , that is $H = B_{1} \cup$ $\dots \cup B_{k}$ . Then, show that, for any $ϵ > 0$ , the following upper bound holds:

$S \sim D^{m} P [h \in H sup ∣ L_{S} (h) ∣ \geq ϵ] \leq i = 1 \sum k S \sim D^{m} P [h \in B_{i} sup ∣ L_{S} (h) ∣ \geq ϵ] .$

(c) Finally, let $k = N (H, \frac{ϵ}{8 M})$ and let $B_{1}, \dots, B_{k}$ be balls of radius $ϵ / (8 M)$ centered at $h_{1}, \dots, h_{k}$ covering $H$ . Use part (a) to show that for all $i \in [k]$ ,

$S \sim D^{m} P [h \in B_{i} sup ∣ L_{S} (h) ∣ \geq ϵ] \leq S \sim D^{m} P [∣ L_{S} (h_{i}) ∣ \geq \frac{ϵ}{2}],$

and apply Hoeffding’s inequality (theorem D.2) to prove (3.55).

Youliang Zhong

Table of Contents

Backlinks

Graph View

3.6 Exercises

3.1 Growth function of intervals in $R$

3.2 Growth function and Rademacher complexity of thresholds in $R$

3.3 Growth function of linear combinations

3.4 Lower bound on growth function.

Youliang Zhong

Table of Contents

Backlinks

Graph View

3.6 Exercises

3.1 Growth function of intervals in R

3.2 Growth function and Rademacher complexity of thresholds in R

3.3 Growth function of linear combinations

3.4 Lower bound on growth function.

3.1 Growth function of intervals in $R$

3.2 Growth function and Rademacher complexity of thresholds in $R$