`phishbench.classification`

This module handles the training of classifiers.

Built-in classifiers can be found in the classifiers subpackage. Users can write custom classifiers by subclassing the BaseClassifier class.

phishbench.classification.train_classifiers(x_train, y_train, io_dir, verbose=1)

Train classifiers on the provided dataset according to the configuration file.

Parameters

x_train (array-like or sparse matrix of shape (n_samples, n_features)) – The training feature vectors
y_train (array-like of shape (n_samples)) – The training label vector
io_dir (str) – The folder the classifiers read from and write to
verbose (int) –
The level of verbosity of the output printed to stdout:
- 0 prints nothing.
- 1 prints each classifier as it is trained.

Returns

The trained classifiers

Return type

A list of BaseClassifier objects
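The following is a simplified sketch of the training loop described above, not PhishBench's actual implementation. It omits the configuration-file handling and assumes each classifier class can be constructed from io_dir alone; StubClassifier and train_all are hypothetical names.

```python
from typing import List, Sequence


class StubClassifier:
    """Hypothetical stand-in for a BaseClassifier subclass."""

    def __init__(self, io_dir: str):
        self.io_dir = io_dir
        self.fitted = False

    def fit(self, x, y) -> None:
        # A real classifier would train a model here.
        self.fitted = True


def train_all(classifier_classes: Sequence[type], x_train, y_train,
              io_dir: str, verbose: int = 1) -> List[object]:
    """Instantiate and train each classifier class, returning the instances."""
    trained = []
    for cls in classifier_classes:
        if verbose >= 1:
            # verbose=1: report each classifier as it is trained
            print(f"Training {cls.__name__}")
        clf = cls(io_dir)
        clf.fit(x_train, y_train)
        trained.append(clf)
    return trained
```

For example, train_all([StubClassifier], [[0.2, 1.0]], [1], "output") prints "Training StubClassifier" and returns a one-element list; with verbose=0 it prints nothing.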

phishbench.classification.load_classifiers(filter_classifiers=True)

Loads the built-in classifiers and any classifiers defined in the working directory.

Parameters: filter_classifiers (bool) – Whether to filter the loaded classifiers according to the configuration file
Returns: The loaded classifiers
Return type: A list of subclasses of BaseClassifier

phishbench.classification.load_classifiers_from_module(source, filter_classifiers=True)

Loads classifiers from a module

Parameters

source (module) – The module to load the classifiers from
filter_classifiers (bool) – Whether to filter the loaded classifiers according to the configuration file

Returns

The loaded classifiers

Return type

A list of subclasses of BaseClassifier
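Both loaders return BaseClassifier subclasses. One plausible way to collect them from a module with the standard library's inspect module is sketched below; it is not necessarily how PhishBench does it. The local BaseClassifier is a stand-in, and the enabled argument is a hypothetical substitute for the configuration-file filter.

```python
import inspect
import types
from typing import Iterable, List, Optional


class BaseClassifier:
    """Hypothetical stand-in for phishbench.classification.BaseClassifier."""


def load_from_module(source: types.ModuleType,
                     enabled: Optional[Iterable[str]] = None) -> List[type]:
    """Return the BaseClassifier subclasses found in `source`.

    When `enabled` is given, only classes whose names appear in it are
    kept, mimicking a configuration-file filter.
    """
    # getmembers returns (name, value) pairs sorted by name
    found = [
        obj for _, obj in inspect.getmembers(source, inspect.isclass)
        if issubclass(obj, BaseClassifier) and obj is not BaseClassifier
    ]
    if enabled is not None:
        allowed = set(enabled)
        found = [cls for cls in found if cls.__name__ in allowed]
    return found
```

The base class itself is excluded explicitly, since a module that imports BaseClassifier would otherwise report it as one of its own classifiers.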

class phishbench.classification.BaseClassifier(io_dir, save_file)

The base class for PhishBench classifiers. Your custom classifiers should subclass this class.

__init__(io_dir, save_file)

Initializes the BaseClassifier

Parameters

io_dir (str) – The folder to save this classifier to.
save_file (str) – The file to save this classifier to.

fit(x, y)

Trains the classifier on the provided dataset.

If the classifier wraps a scikit-learn style classifier, an implementation of this method can simply store the trained underlying classifier in self.clf. Other implementations should also override predict and predict_proba.

Parameters

x (array-like or sparse matrix of shape (n_samples, n_features)) – Training feature vectors
y (array-like of shape (n_samples)) – Target values, with 0 being legitimate and 1 being phishing
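A minimal sketch of a wrapper-style custom classifier. Because the real BaseClassifier is not reproduced here, the block defines a stand-in with the documented constructor whose default predict and predict_proba delegate to self.clf; that delegation, including keeping only the phishing column of the probability matrix, is an assumption. PriorEstimator is a toy scikit-learn style estimator used only to keep the example dependency-free, and the save file name is hypothetical.

```python
class BaseClassifier:
    """Hypothetical stand-in mirroring the documented BaseClassifier interface."""

    def __init__(self, io_dir: str, save_file: str):
        self.io_dir = io_dir
        self.save_file = save_file
        self.clf = None  # set by fit() in wrapper-style subclasses

    def fit(self, x, y):
        raise NotImplementedError

    def predict(self, x):
        return self.clf.predict(x)

    def predict_proba(self, x):
        # Assumption: keep only the phishing (class 1) column of a
        # scikit-learn style (n_samples, 2) probability matrix.
        return [row[1] for row in self.clf.predict_proba(x)]


class PriorEstimator:
    """Toy scikit-learn style estimator that always predicts the class prior."""

    def fit(self, x, y):
        self.p_phish_ = sum(y) / len(y)  # fraction of phishing (label 1) samples
        return self

    def predict_proba(self, x):
        return [[1 - self.p_phish_, self.p_phish_] for _ in x]

    def predict(self, x):
        return [int(self.p_phish_ >= 0.5) for _ in x]


class PriorClassifier(BaseClassifier):
    """Example custom classifier whose fit() only stores the trained estimator."""

    def __init__(self, io_dir: str):
        super().__init__(io_dir, "prior_classifier.pkl")

    def fit(self, x, y):
        self.clf = PriorEstimator().fit(x, y)
```

In this sketch fit only stores the trained estimator in self.clf, so PriorClassifier does not need to override predict or predict_proba.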

fit_weighted(x, y)

Trains the classifier using weighted training. If this method is not implemented, PhishBench will issue a warning to stdout and default to unweighted training.

If the classifier wraps a scikit-learn style classifier, an implementation of this method can simply store the trained underlying classifier in self.clf. Other implementations should also override predict and predict_proba.

Parameters

x (array-like or sparse matrix of shape (n_samples, n_features)) – Training feature vectors
y (array-like of shape (n_samples)) – Target values, with 0 being legitimate and 1 being phishing
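A sketch of the fallback described above, assuming an unimplemented fit_weighted raises NotImplementedError (the source does not say how a missing implementation is detected). balanced_weights shows one common weighting scheme, inverse class frequency, that a fit_weighted implementation might pass to an underlying estimator as per-sample weights; PhishBench may weight samples differently. Both function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def balanced_weights(y: Sequence[int]) -> List[float]:
    """Weight each sample by n_samples / (n_classes * count(its class))."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return [n / (k * counts[label]) for label in y]


def train(clf, x, y, weighted: bool) -> str:
    """Train clf, falling back to unweighted training when necessary.

    Returns the training mode that was actually used.
    """
    if weighted:
        try:
            clf.fit_weighted(x, y)
            return "weighted"
        except NotImplementedError:
            # Mirror the documented behavior: warn on stdout, then fall back.
            print(f"Warning: {type(clf).__name__} does not implement "
                  "fit_weighted; using unweighted training.")
    clf.fit(x, y)
    return "unweighted"
```

With labels [0, 0, 0, 1], each legitimate sample gets weight 4/6 and the single phishing sample gets 2.0, so both classes contribute equally in total.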

param_search(x, y)

Performs a parameter search to find the best parameters.

Parameters

x (array-like or sparse matrix of shape (n_samples, n_features)) – Training feature vectors
y (array-like of shape (n_samples)) – Target values, with 0 being legitimate and 1 being phishing

Returns

The best parameters.

Return type

dict
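param_search returns the best parameters as a dict. An exhaustive grid search is one way to produce that dict; the sketch below assumes a hypothetical score callback that trains with a given parameter set and returns a validation score (higher is better). For a scikit-learn wrapper, delegating to sklearn.model_selection.GridSearchCV and returning its best_params_ would be a natural alternative.

```python
import itertools
from typing import Callable, Dict, Sequence


def grid_search(grid: Dict[str, Sequence],
                score: Callable[[dict], float]) -> dict:
    """Return the parameter combination with the highest score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    # Try every combination of the candidate values.
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params
```

The number of score calls is the product of the candidate-list lengths, so large grids quickly become expensive.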

predict(x)

Performs classification on x.

Parameters: x (array-like or sparse matrix of shape (n_samples, n_features)) – The samples to generate predictions for
Returns: The predicted class values
Return type: array-like of shape (n_samples)

predict_proba(x)

Estimates the probability of each sample in x being phishing.

Parameters: x (array-like or sparse matrix of shape (n_samples, n_features)) – The samples to generate probabilities for
Returns: The probability of each sample being phishing
Return type: array-like of shape (n_samples)
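predict and predict_proba describe the same decision at two levels: a score per sample and a hard label (0 legitimate, 1 phishing). If a custom classifier only produces scores, labels can be derived by thresholding them. The 0.5 default below is an assumption, not something the source specifies, and predict_from_proba is a hypothetical helper.

```python
from typing import List, Sequence


def predict_from_proba(proba: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map P(phishing) scores to labels: 1 (phishing) if >= threshold, else 0."""
    return [1 if p >= threshold else 0 for p in proba]
```

Raising the threshold trades missed phishing emails for fewer false alarms on legitimate ones.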

load_model()

Loads the model from self.model_path.

save_model()

Saves the model to self.model_path.
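A minimal sketch of persistence with the standard pickle module. The source only says that both methods use self.model_path; building that path by joining io_dir and save_file is an assumption, and PersistentModel is a hypothetical class, not part of PhishBench.

```python
import os
import pickle


class PersistentModel:
    """Hypothetical sketch of save_model/load_model backed by pickle."""

    def __init__(self, io_dir: str, save_file: str):
        # Assumption: the model path is io_dir joined with save_file.
        self.model_path = os.path.join(io_dir, save_file)
        self.clf = None

    def save_model(self) -> None:
        # Create the output folder if needed, then serialize the model.
        os.makedirs(os.path.dirname(self.model_path) or ".", exist_ok=True)
        with open(self.model_path, "wb") as f:
            pickle.dump(self.clf, f)

    def load_model(self) -> None:
        with open(self.model_path, "rb") as f:
            self.clf = pickle.load(f)
```

Unpickling can execute arbitrary code, so only load model files from sources you trust.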
