We limit ourselves to two classes of outputs y_i and assign these classes the values y_i = \pm 1 . In a p -dimensional space of say p features we have a hyperplane defined as b+w_1x_1+w_2x_2+\dots +w_px_p=0.

We define a matrix \boldsymbol{X} of dimension n\times p , where n is the number of observations and p the number of features. Each observation \boldsymbol{x}_i is then a vector with p components, \boldsymbol{x}_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \dots \\ \dots \\ x_{ip} \end{bmatrix}.

If a given observation \boldsymbol{x}_i does not lie on the hyperplane, that is if the above equation is not satisfied, we have b+w_1x_{i1}+w_2x_{i2}+\dots +w_px_{ip} >0, if our output y_i=1 . In this case we say that \boldsymbol{x}_i lies on one of the sides of the hyperplane. If instead b+w_1x_{i1}+w_2x_{i2}+\dots +w_px_{ip} < 0, for the class of observations y_i=-1 , then \boldsymbol{x}_i lies on the other side.
Equivalently, for the two classes of observations we have y_i\left(b+w_1x_{i1}+w_2x_{i2}+\dots +w_px_{ip}\right) > 0.
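As a minimal numerical sketch of this condition, the snippet below uses a small hypothetical data set (the matrix X, labels y, weights w, and intercept b are all made up for illustration) and checks that y_i\left(b+\boldsymbol{w}\cdot\boldsymbol{x}_i\right) > 0 holds for every observation when the hyperplane separates the two classes:

```python
import numpy as np

# Hypothetical toy data: n = 4 observations with p = 2 features,
# chosen so that the two classes are linearly separable.
X = np.array([[2.0, 2.0],
              [3.0, 3.0],
              [-1.0, -1.0],
              [-2.0, -3.0]])
y = np.array([1, 1, -1, -1])

# An assumed hyperplane b + w_1 x_1 + w_2 x_2 = 0 separating the data.
w = np.array([1.0, 1.0])
b = 0.0

# y_i (b + w . x_i) is positive for every i exactly when each
# observation lies on the correct side of the hyperplane.
margins = y * (b + X @ w)
print(np.all(margins > 0))  # → True for this choice of w and b
```

Note that the product y_i\left(b+\boldsymbol{w}\cdot\boldsymbol{x}_i\right) folds both cases into one inequality: it is positive whenever the sign of b+\boldsymbol{w}\cdot\boldsymbol{x}_i agrees with the label y_i.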
If such a separating hyperplane exists, we can use it to construct a natural classifier: a test observation is assigned a given class depending on which side of the hyperplane it is located.
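This classifier can be sketched in a few lines: given a hyperplane, assign the class from the sign of b+\boldsymbol{w}\cdot\boldsymbol{x}. The weights, intercept, and test points below are hypothetical choices for illustration, not a fitted model:

```python
import numpy as np

def classify(X, w, b):
    """Assign class +1 or -1 according to which side of the
    hyperplane b + w . x = 0 each row of X lies on."""
    return np.where(b + X @ w >= 0, 1, -1)

# Assumed weights and test observations.
w = np.array([1.0, -2.0])
b = 0.5
X_test = np.array([[3.0, 1.0],    # b + w . x = 1.5 > 0  -> class +1
                   [0.0, 2.0]])   # b + w . x = -3.5 < 0 -> class -1
print(classify(X_test, w, b))  # → [ 1 -1]
```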