About Data Space

In the tradition of the Tkinter SVM GUI, the purpose of this website is to demonstrate how machine learning model forms are affected by the shape of the underlying dataset. By selecting a dataset or by creating one of your own, you can fit a model to the data and see how the fitted model would make decisions based on the data it has been trained on. The fitted contours display the highest likelihoods of the class the model would select.
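
Under the hood, drawing such contours amounts to fitting an estimator and evaluating its class likelihoods over a grid covering the feature space. Below is a minimal sketch of that idea in Python with scikit-learn and matplotlib; the dataset, estimator, and grid resolution are illustrative assumptions, not the application's actual code.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_moons
    from sklearn.naive_bayes import GaussianNB

    # Illustrative 2D dataset and estimator.
    X, y = make_moons(n_samples=200, noise=0.3, random_state=42)
    model = GaussianNB().fit(X, y)

    # Evaluate the fitted model's class likelihood over a grid of the feature space.
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
    probs = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

    # The shaded contours show the likelihood of the class the model would select.
    plt.contourf(xx, yy, probs.reshape(xx.shape), levels=20, cmap="RdBu", alpha=0.6)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu", edgecolors="k")
    plt.show()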

Although this is a toy example, hopefully it helps give you the intuition that the machine learning process is a model selection search for the best combination of features, algorithm, and hyperparameters, one that generalizes well in a bounded feature space.

This application is for demonstration purposes only.

Naive Bayes

Naive Bayesian models are a collection of supervised classification algorithms that apply Bayes' rule of conditional probability with the "naive" assumption of conditional independence between all pairs of features given the value of the target class. Bayesian predictions are based on the conditional likelihood of the joint probability of all features and the target class. Because each feature contributes a likelihood, the primary difference between the classifiers is the assumption each makes about the distribution of the features, as the list below and the sketch after it illustrate.

  • GaussianNB: Assumes the likelihood of the features is Gaussian, i.e. continuous values over an infinite range.
  • MultinomialNB: Features are treated as counts of a finite number of discrete events, modeled by a multinomial distribution.
  • BernoulliNB: Features are distributed according to a multivariate Bernoulli distribution, i.e. each feature is either 1 or 0.
  • ComplementNB: A modification of MultinomialNB that uses the complement of each class to compute weights, which works well for imbalanced classes.
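
All four variants share scikit-learn's fit/predict interface, so switching distributional assumptions is a one-line change. A minimal sketch, assuming non-negative count features (the data here is random and purely illustrative):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB

    rng = np.random.RandomState(0)
    X = rng.randint(0, 10, size=(100, 5))  # non-negative counts per feature
    y = rng.randint(0, 2, size=100)        # binary target

    # Same interface, different distributional assumptions about X.
    for cls in (GaussianNB, MultinomialNB, BernoulliNB, ComplementNB):
        model = cls().fit(X, y)            # BernoulliNB binarizes X at 0 by default
        print(cls.__name__, model.predict(X[:3]))
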
Hyperparameters
Priors/Class Prior · array-like shape (n_classes,)
Prior probabilities of the classes. If specified, the priors are not adjusted according to the data. (Not used with ComplementNB)
Smoothing · float
Portion of the largest variance of all features that is added to variances for calculation stability.
Alpha · float
Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
Fit Prior · bool
Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
Binarize · float or None
Threshold for binarizing (mapping to booleans) sample features. If None, input is presumed to already consist of binary vectors.
Norm · bool
Whether or not a second normalization of the weights is performed.
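
Each hyperparameter above applies only to the variants that define it. A hedged sketch of how they map onto scikit-learn's constructor arguments (the values are arbitrary examples, not recommendations):

    from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB

    GaussianNB(priors=[0.6, 0.4], var_smoothing=1e-9)  # Priors, Smoothing
    MultinomialNB(alpha=1.0, fit_prior=True)           # Alpha, Fit Prior
    BernoulliNB(alpha=1.0, binarize=0.0)               # Alpha, Binarize
    ComplementNB(alpha=1.0, norm=False)                # Alpha, Norm
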
Support Vector Machines

Support vector machines are supervised, discriminative classifiers that learn an optimal hyperplane to separate and categorize data. This hyperplane (i.e. a subspace one dimension less than the ambient feature space) maximizes the distance between classes: the algorithm selects support vectors from each group (potentially allowing some slack), then places the separating hyperplane halfway between them along the orthogonal direction, maximizing the margin. When the classes are not linearly separable, kernel functions are used to map the data into a space where the distance between points increases, improving separability between the classes.
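
A minimal sketch of those ideas with scikit-learn's SVC (the blob dataset and parameter values are illustrative assumptions):

    from sklearn.svm import SVC
    from sklearn.datasets import make_blobs

    X, y = make_blobs(n_samples=100, centers=2, random_state=7)

    # A linear SVM: the hyperplane is defined entirely by its support vectors.
    linear = SVC(kernel="linear", C=1.0).fit(X, y)
    print("support vectors per class:", linear.n_support_)

    # Support vectors sit on the margin, where the signed distance to the
    # hyperplane (the decision function) is approximately +/- 1.
    print(linear.decision_function(linear.support_vectors_).round(2))

    # When the classes are not linearly separable, a kernel remaps the space.
    rbf = SVC(kernel="rbf", gamma="scale").fit(X, y)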

Hyperparameters
C · float
Penalty parameter C of the error term.
kernel · {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed', None}
Specifies the kernel type to be used in the algorithm. It must be one of the string choices or a callable. If None is given, ‘rbf’ will be used. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples).
degree · int
Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.
gamma · float
Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
coef0 · float
Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.
shrinking · boolean
Whether to use the shrinking heuristic.
tol · float
Tolerance for stopping criterion.
class_weight · {dict, 'balanced'}
Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
max_iter · int
Hard limit on iterations within solver, or -1 for no limit.
decision_function_shape · {‘ovo’, ‘ovr’}
Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) like all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm, which has shape (n_samples, n_classes * (n_classes - 1) / 2). Internally, however, one-vs-one (‘ovo’) is always used as the multi-class strategy.
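
As above, a hedged sketch of these hyperparameters as SVC constructor arguments (the values are arbitrary examples, not recommendations):

    from sklearn.svm import SVC

    model = SVC(
        C=1.0,                          # penalty parameter of the error term
        kernel="poly", degree=3,        # polynomial kernel of degree 3
        gamma="scale", coef0=0.0,       # kernel coefficient and independent term
        shrinking=True, tol=1e-3,       # shrinking heuristic and stopping tolerance
        class_weight="balanced",        # scale C inversely to class frequency
        max_iter=-1,                    # no hard limit on solver iterations
        decision_function_shape="ovr",
    )
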
Logistic Regression

Logistic Regression is a supervised classification algorithm that models the probabilities describing the possible outcomes (classes) of a single trial using a logistic function. This method is also known as logit regression, maximum-entropy classification, or the log-linear classifier.
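
The logistic (sigmoid) function squashes a linear score into a probability between 0 and 1. A minimal sketch, assuming a one-feature binary problem, verifying that scikit-learn's predicted probability is the logistic function applied to the learned linear score:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def logistic(z):
        # The logistic function: maps any real score to a probability in (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    X = np.array([[0.5], [1.5], [3.0], [4.5]])  # illustrative single feature
    y = np.array([0, 0, 1, 1])
    model = LogisticRegression().fit(X, y)

    # For binary problems, predict_proba applies the logistic function to the
    # learned linear score w.x + b.
    z = X @ model.coef_.T + model.intercept_
    assert np.allclose(logistic(z).ravel(), model.predict_proba(X)[:, 1])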

Hyperparameters
penalty · {'l1', 'l2', 'elasticnet', 'none'}
Used to specify the norm used in the penalization. The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers support only l2 penalties. ‘elasticnet’ is only supported by the ‘saga’ solver. If ‘none’ (not supported by the liblinear solver), no regularization is applied.
dual · bool
Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.
tol · float
Tolerance for stopping criteria.
C · float
Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
fit_intercept · bool
Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
intercept_scaling · float
Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector.
class_weight · {dict, 'balanced'}
Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
solver · {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}
Algorithm to use in the optimization problem.
max_iter · int
Maximum number of iterations taken for the solvers to converge.
multi_class · {'ovr', 'multinomial', 'auto'}
If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=’liblinear’. ‘auto’ selects ‘ovr’ if the data is binary, or if solver=’liblinear’, and otherwise selects ‘multinomial’.
l1_ratio · float
The Elastic-Net mixing parameter, with 0 <= l1_ratio <=1. Only used if penalty='elasticnet'. Setting l1_ratio=0 is equivalent to using penalty='l2', while setting l1_ratio=1 is equivalent to using penalty='l1'. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
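
Again, a hedged sketch of these hyperparameters as LogisticRegression constructor arguments (an elastic-net configuration; the values are arbitrary examples, not recommendations):

    from sklearn.linear_model import LogisticRegression

    model = LogisticRegression(
        penalty="elasticnet", l1_ratio=0.5,  # blend of L1 and L2 regularization
        C=1.0,                               # inverse regularization strength
        solver="saga",                       # the only solver supporting elasticnet
        fit_intercept=True,                  # add a constant to the decision function
        class_weight="balanced",             # reweight classes by inverse frequency
        max_iter=1000, tol=1e-4,             # convergence controls
        multi_class="auto",
    )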