phishbench.classification
This module handles the training of classifiers.
Built-in classifiers can be found in the classifiers subpackage. Users can write custom classifiers by subclassing the BaseClassifier class.
-
phishbench.classification.train_classifiers(x_train, y_train, io_dir, verbose=1)
Trains classifiers on the provided dataset according to the configuration file.
- Parameters
x_train (array-like or sparse matrix of shape (n_samples, n_features)) – The training feature vectors
y_train (array-like of shape (n_samples)) – The training label vector
io_dir (str) – The folder that classifier files are read from and written to
verbose (int) – The verbosity level of the output to stdout. 0 prints nothing; 1 prints the classifiers being trained.
- Returns
The trained classifiers
- Return type
A list of BaseClassifier objects
-
phishbench.classification.load_classifiers(filter_classifiers=True)
Loads internal classifiers and classifiers from the working directory.
- Parameters
filter_classifiers (bool) – Whether or not to filter the loaded classifiers using the configuration file
- Returns
The loaded classifiers
- Return type
A list of subclasses of BaseClassifier
-
phishbench.classification.load_classifiers_from_module(source, filter_classifiers=True)
Loads classifiers from a module.
- Parameters
source (module) – The module to load the classifiers from
filter_classifiers (bool) – Whether or not to use the config to filter the classifiers
- Returns
The loaded classifiers
- Return type
A list of subclasses of BaseClassifier
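The source does not say how classifiers are discovered inside a module. One common approach, sketched here with a hypothetical stand-in BaseClassifier and a hypothetical helper find_classifier_classes (neither is PhishBench code), is to inspect the module's members and keep the proper subclasses of the base class. Configuration-based filtering is left out.

```python
import inspect
import types


class BaseClassifier:
    """Hypothetical stand-in for phishbench.classification.BaseClassifier."""


def find_classifier_classes(source):
    """Return the proper subclasses of BaseClassifier found in `source`."""
    found = []
    # getmembers yields (name, value) pairs sorted by name.
    for _name, member in inspect.getmembers(source, inspect.isclass):
        if issubclass(member, BaseClassifier) and member is not BaseClassifier:
            found.append(member)
    return found


class KeywordClassifier(BaseClassifier):
    """Illustrative custom classifier (no behavior needed for discovery)."""


class NotAClassifier:
    """Unrelated class that must be skipped."""


# Build a throwaway module so the example runs without files on disk.
demo_module = types.ModuleType("demo_classifiers")
demo_module.BaseClassifier = BaseClassifier
demo_module.KeywordClassifier = KeywordClassifier
demo_module.NotAClassifier = NotAClassifier

loaded = find_classifier_classes(demo_module)
print([cls.__name__ for cls in loaded])
```

The result is a list of classes, not instances, which matches the documented return type "a list of subclasses of BaseClassifier".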
-
class phishbench.classification.BaseClassifier(io_dir, save_file)
The base class for PhishBench classifiers. Custom classifiers should subclass this class.
-
__init__(io_dir, save_file)
Initializes the BaseClassifier.
- Parameters
io_dir (str) – The folder to save this classifier to.
save_file (str) – The file to save this classifier to
-
fit(x, y)
Trains the classifier on the provided dataset.
When wrapping a scikit-learn style classifier, an implementation of this function can simply store the trained underlying classifier in self.clf. Other implementations should also override predict and predict_proba.
- Parameters
x (array-like or sparse matrix of shape (n_samples, n_features)) – Training feature vectors
y (array-like of shape (n_samples)) – Target values, with 0 being legitimate and 1 being phishing
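The wrapper pattern described for fit can be sketched as follows. To keep the example runnable without PhishBench installed, it uses a hypothetical stand-in base class (BaseClassifierStandIn) and a toy scikit-learn style estimator (ToyEstimator). That the real base class delegates predict and predict_proba to self.clf, and keeps only the phishing column of the probabilities, is an assumption drawn from the description above.

```python
class ToyEstimator:
    """Toy scikit-learn style estimator: thresholds the first feature."""

    def fit(self, x, y):
        phishing = [row[0] for row, label in zip(x, y) if label == 1]
        legitimate = [row[0] for row, label in zip(x, y) if label == 0]
        # Use the midpoint between the two class means as the threshold.
        mean_phishing = sum(phishing) / len(phishing)
        mean_legitimate = sum(legitimate) / len(legitimate)
        self.threshold_ = (mean_phishing + mean_legitimate) / 2
        return self

    def predict(self, x):
        return [1 if row[0] > self.threshold_ else 0 for row in x]

    def predict_proba(self, x):
        # scikit-learn convention: one [P(class 0), P(class 1)] row per sample.
        return [[0.0, 1.0] if row[0] > self.threshold_ else [1.0, 0.0] for row in x]


class BaseClassifierStandIn:
    """Hypothetical stand-in for BaseClassifier; delegation is an assumption."""

    def __init__(self, io_dir, save_file):
        self.io_dir = io_dir
        self.save_file = save_file
        self.clf = None

    def predict(self, x):
        return self.clf.predict(x)

    def predict_proba(self, x):
        # Keep only the probability of class 1 (phishing) for each sample.
        return [row[1] for row in self.clf.predict_proba(x)]


class ThresholdClassifier(BaseClassifierStandIn):
    """Custom classifier that only needs to implement fit."""

    def fit(self, x, y):
        self.clf = ToyEstimator().fit(x, y)


x_train = [[0.1], [0.2], [0.8], [0.9]]
y_train = [0, 0, 1, 1]  # 0 = legitimate, 1 = phishing

model = ThresholdClassifier("output", "threshold.pkl")
model.fit(x_train, y_train)
predictions = model.predict([[0.05], [0.95]])
probabilities = model.predict_proba([[0.05], [0.95]])
print(predictions, probabilities)
```

Because the subclass stores a fitted estimator in self.clf, it inherits prediction behavior without overriding predict or predict_proba.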
-
fit_weighted(x, y)
Trains the classifier using weighted training. If this method is not implemented, PhishBench issues a warning to stdout and falls back to unweighted training.
When wrapping a scikit-learn style classifier, an implementation of this function can simply store the trained underlying classifier in self.clf. Other implementations should also override predict and predict_proba.
- Parameters
x (array-like or sparse matrix of shape (n_samples, n_features)) – Training feature vectors
y (array-like of shape (n_samples)) – Target values, with 0 being legitimate and 1 being phishing
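The source does not specify which weighting scheme fit_weighted is expected to use. As an illustration only, the sketch below computes "balanced" per-sample weights, the heuristic scikit-learn applies for class_weight="balanced": each class receives weight n_samples / (n_classes * class_count). The helper name balanced_sample_weights is hypothetical.

```python
from collections import Counter


def balanced_sample_weights(y):
    """Weight each sample by n_samples / (n_classes * count of its class)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    class_weight = {
        label: n_samples / (n_classes * count) for label, count in counts.items()
    }
    return [class_weight[label] for label in y]


# Three legitimate samples (0) and one phishing sample (1).
y_train = [0, 0, 0, 1]
weights = balanced_sample_weights(y_train)
print(weights)
```

Under this scheme the rare phishing sample weighs three times as much as each legitimate one, so both classes contribute equally in total. Many scikit-learn estimators accept such weights through the sample_weight argument of fit.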
-
param_search(x, y)
Performs a parameter search to find the best parameters.
- Parameters
x (array-like or sparse matrix of shape (n_samples, n_features)) – Training feature vectors
y (array-like of shape (n_samples)) – Target values, with 0 being legitimate and 1 being phishing
- Returns
The best parameters.
- Return type
dict
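How param_search explores parameters is left to each classifier. A minimal sketch of one option, an exhaustive grid search that returns the best parameters as a dict, is shown below. The helpers grid_search and accuracy_for are hypothetical, and the candidates are scored on the training data purely for brevity; a real search would normally use cross-validation.

```python
from itertools import product


def grid_search(param_grid, score):
    """Return the parameter dict with the highest score over the full grid."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params


def accuracy_for(params, x, y):
    """Accuracy of a rule that predicts phishing above params['threshold']."""
    predictions = [1 if row[0] > params["threshold"] else 0 for row in x]
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


x_train = [[0.1], [0.4], [0.6], [0.9]]
y_train = [0, 0, 1, 1]  # 0 = legitimate, 1 = phishing

best = grid_search(
    {"threshold": [0.2, 0.5, 0.8]},
    lambda params: accuracy_for(params, x_train, y_train),
)
print(best)
```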
-
predict(x)
Performs classification on x.
- Parameters
x (array-like or sparse matrix of shape (n_samples, n_features)) – The samples to generate predictions for
- Returns
The predicted class values
- Return type
array-like of shape (n_samples)
-
predict_proba(x)
Estimates the probability of each sample in x being phishing.
- Parameters
x (array-like or sparse matrix of shape (n_samples, n_features)) – The samples to generate probabilities for
- Returns
The probability of each test sample being phishing
- Return type
array-like of shape (n_samples)
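Since predict_proba returns one phishing probability per sample, class labels can be derived from it by thresholding. The sketch below assumes a cutoff of 0.5, which the source does not specify, and uses a hypothetical helper name.

```python
def labels_from_probabilities(probabilities, threshold=0.5):
    """Map phishing probabilities to labels: 1 = phishing, 0 = legitimate."""
    # Assumption: a sample at or above the threshold counts as phishing.
    return [1 if p >= threshold else 0 for p in probabilities]


phishing_probabilities = [0.05, 0.5, 0.93]
labels = labels_from_probabilities(phishing_probabilities)
print(labels)
```

Raising the threshold trades missed phishing samples for fewer false alarms on legitimate ones.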
-
load_model()
Loads the model from self.model_path.
-
save_model()
Saves the model to self.model_path.
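The source does not state the serialization format or how model_path is built. The round trip below is a sketch under two assumptions: that model_path is io_dir joined with save_file, and that the model is stored with pickle. PersistableClassifier is a hypothetical stand-in, not PhishBench code.

```python
import os
import pickle
import tempfile


class PersistableClassifier:
    """Hypothetical stand-in showing one way to persist self.clf."""

    def __init__(self, io_dir, save_file):
        self.io_dir = io_dir
        # Assumption: the model path joins the folder and the file name.
        self.model_path = os.path.join(io_dir, save_file)
        self.clf = None

    def save_model(self):
        os.makedirs(self.io_dir, exist_ok=True)
        with open(self.model_path, "wb") as handle:
            pickle.dump(self.clf, handle)

    def load_model(self):
        with open(self.model_path, "rb") as handle:
            self.clf = pickle.load(handle)


with tempfile.TemporaryDirectory() as io_dir:
    original = PersistableClassifier(io_dir, "demo.pkl")
    original.clf = {"threshold": 0.5}  # stands in for any picklable model
    original.save_model()

    restored = PersistableClassifier(io_dir, "demo.pkl")
    restored.load_model()
    restored_clf = restored.clf

print(restored_clf)
```

Pickle files can execute arbitrary code when loaded, so models should only be loaded from trusted locations.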
-