About Data Space

In the tradition of the Tkinter SVM GUI, the purpose of this website is to demonstrate how machine learning model forms are affected by the shape of the underlying dataset. By selecting a dataset or by creating one of your own, you can fit a model to the data and see how the fitted model would make decisions based on the data it has been trained on. The fitted contours display the highest likelihoods of the class the model would select.
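
Under the hood, drawing such contours amounts to fitting an estimator and evaluating its class likelihoods over a grid covering the feature space. Below is a minimal sketch of that idea in Python with scikit-learn and matplotlib; the dataset, estimator, and grid resolution are illustrative assumptions, not the application's actual code.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_moons
    from sklearn.naive_bayes import GaussianNB

    # Illustrative 2D dataset and estimator.
    X, y = make_moons(n_samples=200, noise=0.3, random_state=42)
    model = GaussianNB().fit(X, y)

    # Evaluate the fitted model's class likelihood over a grid of the feature space.
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
    probs = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

    # The shaded contours show the likelihood of the class the model would select.
    plt.contourf(xx, yy, probs.reshape(xx.shape), levels=20, cmap="RdBu", alpha=0.6)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu", edgecolors="k")
    plt.show()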

Although this is a toy example, hopefully it helps give you the intuition that the machine learning process is a model selection search for the best combination of features, algorithm, and hyperparameters, one that generalizes well in a bounded feature space.

This application is for demonstration purposes only.

Naive Bayes

Naive Bayesian models are a collection of supervised classification algorithms that apply Bayes' rule of conditional probability with the "naive" assumption of conditional independence between all pairs of features given the value of the target class. Bayesian predictions are based on the conditional likelihood of the joint probability of all features and the target class. Because each feature contributes a likelihood, the primary difference between the classifiers is the assumption each makes about the distribution of the features, as the list below and the sketch after it illustrate.

  • GaussianNB: Assumes the likelihood of the features is Gaussian, i.e. continuous values over an infinite range.
  • MultinomialNB: Features are treated as counts of a finite number of discrete events, modeled by a multinomial distribution.
  • BernoulliNB: Features are distributed according to a multivariate Bernoulli distribution, i.e. each feature is either 1 or 0.
  • ComplementNB: A modification of MultinomialNB that uses the complement of each class to compute weights, which works well for imbalanced classes.
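
All four variants share scikit-learn's fit/predict interface, so switching distributional assumptions is a one-line change. A minimal sketch, assuming non-negative count features (the data here is random and purely illustrative):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB

    rng = np.random.RandomState(0)
    X = rng.randint(0, 10, size=(100, 5))  # non-negative counts per feature
    y = rng.randint(0, 2, size=100)        # binary target

    # Same interface, different distributional assumptions about X.
    for cls in (GaussianNB, MultinomialNB, BernoulliNB, ComplementNB):
        model = cls().fit(X, y)            # BernoulliNB binarizes X at 0 by default
        print(cls.__name__, model.predict(X[:3]))
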
Hyperparameters
Priors/Class Prior · array-like shape (n_classes,)
Prior probabilities of the classes. If specified, the priors are not adjusted according to the data. (Not used with ComplementNB)
Smoothing · float
Portion of the largest variance of all features that is added to variances for calculation stability.
Alpha · float
Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
Fit Prior · bool
Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
Binarize · float or None
Threshold for binarizing (mapping to booleans) sample features. If None, input is presumed to already consist of binary vectors.
Norm · bool
Whether or not a second normalization of the weights is performed.
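
Each hyperparameter above applies only to the variants that define it. A hedged sketch of how they map onto scikit-learn's constructor arguments (the values are arbitrary examples, not recommendations):

    from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB

    GaussianNB(priors=[0.6, 0.4], var_smoothing=1e-9)  # Priors, Smoothing
    MultinomialNB(alpha=1.0, fit_prior=True)           # Alpha, Fit Prior
    BernoulliNB(alpha=1.0, binarize=0.0)               # Alpha, Binarize
    ComplementNB(alpha=1.0, norm=False)                # Alpha, Norm
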
Support Vector Machines

Support vector machines are supervised, discriminative classifiers that learn an optimal hyperplane to separate and categorize data. This hyperplane (i.e. a subspace one dimension less than the ambient feature space) maximizes the distance between classes: the algorithm selects support vectors from each group (potentially allowing some slack), then places the separating hyperplane halfway between them along the orthogonal direction, maximizing the margin. When the classes are not linearly separable, kernel functions are used to map the data into a space where the distance between points increases, improving separability between the classes.
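
A minimal sketch of those ideas with scikit-learn's SVC (the blob dataset and parameter values are illustrative assumptions):

    from sklearn.svm import SVC
    from sklearn.datasets import make_blobs

    X, y = make_blobs(n_samples=100, centers=2, random_state=7)

    # A linear SVM: the hyperplane is defined entirely by its support vectors.
    linear = SVC(kernel="linear", C=1.0).fit(X, y)
    print("support vectors per class:", linear.n_support_)

    # Support vectors sit on the margin, where the signed distance to the
    # hyperplane (the decision function) is approximately +/- 1.
    print(linear.decision_function(linear.support_vectors_).round(2))

    # When the classes are not linearly separable, a kernel remaps the space.
    rbf = SVC(kernel="rbf", gamma="scale").fit(X, y)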

Hyperparameters
C · float
Penalty parameter C of the error term.
kernel · {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed', None}
Specifies the kernel type to be used in the algorithm. It must be one of the string choices or a callable. If None is given, ‘rbf’ will be used. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples).
degree · int
Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.
gamma · float
Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
coef0 · float
Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.
shrinking · boolean
Whether to use the shrinking heuristic.
tol · float
Tolerance for stopping criterion.
class_weight · {dict, 'balanced'}
Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
max_iter · int
Hard limit on iterations within solver, or -1 for no limit.
decision_function_shape · {‘ovo’, ‘ovr’}
Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) like all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm, which has shape (n_samples, n_classes * (n_classes - 1) / 2). Internally, however, one-vs-one (‘ovo’) is always used as the multi-class strategy.
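
As above, a hedged sketch of these hyperparameters as SVC constructor arguments (the values are arbitrary examples, not recommendations):

    from sklearn.svm import SVC

    model = SVC(
        C=1.0,                          # penalty parameter of the error term
        kernel="poly", degree=3,        # polynomial kernel of degree 3
        gamma="scale", coef0=0.0,       # kernel coefficient and independent term
        shrinking=True, tol=1e-3,       # shrinking heuristic and stopping tolerance
        class_weight="balanced",        # scale C inversely to class frequency
        max_iter=-1,                    # no hard limit on solver iterations
        decision_function_shape="ovr",
    )
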
Logistic Regression

Logistic Regression is a supervised classification algorithm that models the probabilities describing the possible outcomes (classes) of a single trial using a logistic function. This method is also known as logit regression, maximum-entropy classification, or the log-linear classifier.
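
The logistic (sigmoid) function squashes a linear score into a probability between 0 and 1. A minimal sketch, assuming a one-feature binary problem, verifying that scikit-learn's predicted probability is the logistic function applied to the learned linear score:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def logistic(z):
        # The logistic function: maps any real score to a probability in (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    X = np.array([[0.5], [1.5], [3.0], [4.5]])  # illustrative single feature
    y = np.array([0, 0, 1, 1])
    model = LogisticRegression().fit(X, y)

    # For binary problems, predict_proba applies the logistic function to the
    # learned linear score w.x + b.
    z = X @ model.coef_.T + model.intercept_
    assert np.allclose(logistic(z).ravel(), model.predict_proba(X)[:, 1])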

Hyperparameters
penalty · {'l1', 'l2', 'elasticnet', 'none'}
Used to specify the norm used in the penalization. The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers support only l2 penalties. ‘elasticnet’ is only supported by the ‘saga’ solver. If ‘none’ (not supported by the liblinear solver), no regularization is applied.
dual · bool
Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.
tol · float
Tolerance for stopping criteria.
C · float
Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
fit_intercept · bool
Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
intercept_scaling · float
Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector.
class_weight · {dict, 'balanced'}
Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
solver · {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}
Algorithm to use in the optimization problem.
max_iter · int
Maximum number of iterations taken for the solvers to converge.
multi_class · {'ovr', 'multinomial', 'auto'}
If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=’liblinear’. ‘auto’ selects ‘ovr’ if the data is binary, or if solver=’liblinear’, and otherwise selects ‘multinomial’.
l1_ratio · float
The Elastic-Net mixing parameter, with 0 <= l1_ratio <=1. Only used if penalty='elasticnet'. Setting l1_ratio=0 is equivalent to using penalty='l2', while setting l1_ratio=1 is equivalent to using penalty='l1'. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
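
Again, a hedged sketch of these hyperparameters as LogisticRegression constructor arguments (an elastic-net configuration; the values are arbitrary examples, not recommendations):

    from sklearn.linear_model import LogisticRegression

    model = LogisticRegression(
        penalty="elasticnet", l1_ratio=0.5,  # blend of L1 and L2 regularization
        C=1.0,                               # inverse regularization strength
        solver="saga",                       # the only solver supporting elasticnet
        fit_intercept=True,                  # add a constant to the decision function
        class_weight="balanced",             # reweight classes by inverse frequency
        max_iter=1000, tol=1e-4,             # convergence controls
        multi_class="auto",
    )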