phishbench.classification

This module handles the training of classifiers.

Built-in classifiers can be found in the classifiers subpackage. Users can write custom classifiers by subclassing the BaseClassifier class.

phishbench.classification.train_classifiers(x_train, y_train, io_dir, verbose=1)

Train classifiers on the provided dataset according to the configuration file.

Parameters
  • x_train (array-like or sparse matrix of shape (n_samples, n_features)) – The training feature vectors

  • y_train (array-like of shape (n_samples)) – The training label vector

  • io_dir (str) – The folder used for reading and writing classifier files

  • verbose (int) –

    The verbosity of the output written to stdout.

    • 0 prints nothing

    • 1 prints the classifiers being trained

Returns

The trained classifiers

Return type

A list of BaseClassifier objects
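
The following is a rough sketch of the kind of training loop that the verbose flag controls. It does not import PhishBench and is not its implementation: train_all and EchoClassifier are hypothetical names, and the assumption that each classifier is constructed from io_dir alone is made only for illustration.

```python
class EchoClassifier:
    """Hypothetical classifier that only records that it was trained."""

    def __init__(self, io_dir):
        self.io_dir = io_dir
        self.trained = False

    def fit(self, x, y):
        self.trained = True


def train_all(classifier_classes, x_train, y_train, io_dir, verbose=1):
    """Instantiate and fit each classifier class, returning the trained objects."""
    trained = []
    for cls in classifier_classes:
        if verbose >= 1:
            # verbose=1 announces each classifier; verbose=0 stays silent.
            print(f"Training {cls.__name__}")
        classifier = cls(io_dir)
        classifier.fit(x_train, y_train)
        trained.append(classifier)
    return trained


models = train_all([EchoClassifier], [[0.0], [1.0]], [0, 1], "output", verbose=0)
```

With verbose=0 the call above prints nothing and returns a list holding one trained EchoClassifier.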

phishbench.classification.load_classifiers(filter_classifiers=True)

Loads internal classifiers and classifiers from the working directory

Parameters

filter_classifiers (bool) – Whether or not to filter the classifiers using the configuration file

Returns

The loaded classifiers

Return type

A list of subclasses of BaseClassifier

phishbench.classification.load_classifiers_from_module(source, filter_classifiers=True)

Loads classifiers from a module

Parameters
  • source (module) – The module to load the classifiers from

  • filter_classifiers (bool) – Whether or not to use the config to filter the classifiers

Returns

The loaded classifiers

Return type

A list of subclasses of BaseClassifier
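
One way such module-level discovery could work is sketched below using only the standard library. This is an assumption about the general technique, not PhishBench's code: find_classifiers is a hypothetical helper, the BaseClassifier here is a stand-in, and configuration-based filtering is left out.

```python
import inspect
import types


class BaseClassifier:
    """Stand-in for phishbench.classification.BaseClassifier."""


def find_classifiers(source):
    """Return every BaseClassifier subclass exposed by the module `source`."""
    return [
        member
        for _, member in inspect.getmembers(source, inspect.isclass)
        if issubclass(member, BaseClassifier) and member is not BaseClassifier
    ]


class MyClassifier(BaseClassifier):
    pass


class Helper:
    pass


# Build a throwaway module holding the base class, one classifier,
# and one unrelated class.
demo = types.ModuleType("demo_classifiers")
demo.BaseClassifier = BaseClassifier
demo.MyClassifier = MyClassifier
demo.Helper = Helper

found = find_classifiers(demo)
print([cls.__name__ for cls in found])  # prints ['MyClassifier']
```

Note that the result is a list of classes, not instances, which matches the documented return type.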

class phishbench.classification.BaseClassifier(io_dir, save_file)

The base class for PhishBench Classifiers. Your custom classifiers should subclass this class.

__init__(io_dir, save_file)

Initializes the BaseClassifier

Parameters
  • io_dir (str) – The folder to save this classifier to.

  • save_file (str) – The file to save this classifier to

fit(x, y)

Trains the classifier on the provided dataset.

If the class wraps a scikit-learn style classifier, implementations of this function can simply store the trained underlying classifier in self.clf. Other implementations should also override predict and predict_proba.

Parameters
  • x (array-like or sparse matrix of shape (n_samples, n_features)) – Training feature vectors

  • y (array-like of shape (n_samples)) – Target values, with 0 being legitimate and 1 being phishing
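
The wrapper pattern described above can be sketched as follows. To keep the example runnable without PhishBench or scikit-learn, BaseClassifier is a stand-in that mirrors the documented interface, and MajorityEstimator is a hypothetical scikit-learn style estimator. The default predict and predict_proba shown here, which delegate to self.clf, are assumptions.

```python
class BaseClassifier:
    """Stand-in mirroring the documented interface; not the PhishBench source."""

    def __init__(self, io_dir, save_file):
        self.io_dir = io_dir
        self.save_file = save_file
        self.clf = None  # the trained scikit-learn style estimator

    def predict(self, x):
        # Assumed default: delegate to the wrapped estimator.
        return self.clf.predict(x)

    def predict_proba(self, x):
        # Assumed default: scikit-learn style estimators return one column
        # per class, so keep only the phishing (label 1) column.
        return [row[1] for row in self.clf.predict_proba(x)]


class MajorityEstimator:
    """Hypothetical estimator that always predicts the most common label."""

    def fit(self, x, y):
        self.phish_rate_ = sum(y) / len(y)
        return self

    def predict(self, x):
        label = 1 if self.phish_rate_ >= 0.5 else 0
        return [label for _ in x]

    def predict_proba(self, x):
        return [[1 - self.phish_rate_, self.phish_rate_] for _ in x]


class MajorityClassifier(BaseClassifier):
    def __init__(self, io_dir):
        super().__init__(io_dir, "majority.pkl")

    def fit(self, x, y):
        # Train the underlying estimator and keep it in self.clf.
        self.clf = MajorityEstimator().fit(x, y)


x_train = [[0.1, 3], [0.9, 7], [0.8, 5], [0.7, 6]]
y_train = [0, 1, 1, 1]  # 0 is legitimate, 1 is phishing

clf = MajorityClassifier("output")
clf.fit(x_train, y_train)
predictions = clf.predict([[0.5, 4], [0.2, 1]])
probabilities = clf.predict_proba([[0.5, 4]])
print(predictions, probabilities)  # prints [1, 1] [0.75]
```

Because fit only assigns self.clf, the subclass needs no prediction code of its own.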

fit_weighted(x, y)

Trains the classifier using weighted training. If this method is not implemented, PhishBench will issue a warning to stdout and default to unweighted training.

If the class wraps a scikit-learn style classifier, implementations of this function can simply store the trained underlying classifier in self.clf. Other implementations should also override predict and predict_proba.

Parameters
  • x (array-like or sparse matrix of shape (n_samples, n_features)) – Training feature vectors

  • y (array-like of shape (n_samples)) – Target values, with 0 being legitimate and 1 being phishing
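
The fallback to unweighted training might look roughly like the sketch below. Several things here are assumptions made for illustration: that the default fit_weighted raises NotImplementedError, the hypothetical train helper, and the "balanced" class-weight formula (n_samples / (n_classes * class_count)) used as one example of weighting.

```python
from collections import Counter


class BaseClassifier:
    """Stand-in base class; assumes the default fit_weighted raises."""

    def fit(self, x, y):
        raise NotImplementedError

    def fit_weighted(self, x, y):
        raise NotImplementedError


class PlainClassifier(BaseClassifier):
    """Implements only unweighted training."""

    def fit(self, x, y):
        self.trained_with = "fit"


class WeightedClassifier(PlainClassifier):
    """Also implements weighted training with balanced class weights."""

    def fit_weighted(self, x, y):
        counts = Counter(y)
        # Rarer classes receive proportionally larger weights.
        self.class_weights = {
            label: len(y) / (len(counts) * count)
            for label, count in counts.items()
        }
        self.trained_with = "fit_weighted"


def train(classifier, x, y, weighted=True):
    """Prefer weighted training; warn and fall back when it is unavailable."""
    if weighted:
        try:
            classifier.fit_weighted(x, y)
            return
        except NotImplementedError:
            name = type(classifier).__name__
            print(f"Warning: {name} does not implement fit_weighted; using fit")
    classifier.fit(x, y)


x_train = [[0.1], [0.2], [0.3], [0.9]]
y_train = [0, 0, 0, 1]  # imbalanced: three legitimate, one phishing

plain, weighted = PlainClassifier(), WeightedClassifier()
train(plain, x_train, y_train)     # warns, then trains unweighted
train(weighted, x_train, y_train)  # trains with class weights
```

In this toy dataset the single phishing sample receives a weight of 2.0, while each legitimate sample receives 2/3.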

Performs parameter search to find the best parameters.

Parameters
  • x (array-like or sparse matrix of shape (n_samples, n_features)) – Training feature vectors

  • y (array-like of shape (n_samples)) – Target values, with 0 being legitimate and 1 being phishing

Returns

The best parameters.

Return type

dict

predict(x)

Performs classification on x.

Parameters

x (array-like or sparse matrix of shape (n_samples, n_features)) – The samples to generate predictions for

Returns

The predicted class values

Return type

array-like of shape (n_samples)

predict_proba(x)

Estimates the probability of samples in x being phishing.

Parameters

x (array-like or sparse matrix of shape (n_samples, n_features)) – The samples to generate probabilities for

Returns

The probability of each test sample being phishing

Return type

array-like of shape (n_samples)
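
For a classifier that does not wrap a scikit-learn style estimator, predict and predict_proba have to be written by hand. The sketch below is a deliberately simple toy: RowSumClassifier is a hypothetical name, the BaseClassifier is a stand-in so the example runs without PhishBench, and the scoring rule assumes phishing samples have larger feature sums.

```python
import math


class BaseClassifier:
    """Stand-in for phishbench.classification.BaseClassifier."""

    def __init__(self, io_dir, save_file):
        self.io_dir = io_dir
        self.save_file = save_file


class RowSumClassifier(BaseClassifier):
    """Toy classifier that scores each sample by the sum of its features."""

    def fit(self, x, y):
        sums = [sum(row) for row in x]
        legit = [s for s, label in zip(sums, y) if label == 0]
        phish = [s for s, label in zip(sums, y) if label == 1]
        # Decision threshold: midpoint between the two class means.
        self.threshold = (sum(legit) / len(legit) + sum(phish) / len(phish)) / 2

    def predict_proba(self, x):
        # Logistic function of the distance from the threshold, giving
        # one phishing probability per sample.
        return [1 / (1 + math.exp(-(sum(row) - self.threshold))) for row in x]

    def predict(self, x):
        # Label a sample as phishing (1) when its probability reaches 0.5.
        return [1 if p >= 0.5 else 0 for p in self.predict_proba(x)]


clf = RowSumClassifier("output", "row_sum.pkl")
clf.fit([[0, 1], [1, 0], [3, 4], [4, 3]], [0, 0, 1, 1])

labels = clf.predict([[0, 0], [5, 5]])
probabilities = clf.predict_proba([[0, 0], [5, 5]])
print(labels)  # prints [0, 1]
```

Deriving predict from predict_proba keeps the two methods consistent with each other.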

load_model()

Loads the model from self.model_path

save_model()

Saves the model to self.model_path
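
A minimal sketch of how saving and loading could work is shown below. It assumes that model_path is io_dir joined with save_file and that the model is serialized with pickle. Neither detail is confirmed by this reference, and PickledClassifier is a hypothetical stand-in.

```python
import os
import pickle
import tempfile


class PickledClassifier:
    """Stand-in showing one possible save/load scheme."""

    def __init__(self, io_dir, save_file):
        # Assumption: the model file lives at io_dir/save_file.
        self.model_path = os.path.join(io_dir, save_file)
        self.clf = None

    def save_model(self):
        with open(self.model_path, "wb") as handle:
            pickle.dump(self.clf, handle)

    def load_model(self):
        with open(self.model_path, "rb") as handle:
            self.clf = pickle.load(handle)


with tempfile.TemporaryDirectory() as io_dir:
    original = PickledClassifier(io_dir, "demo.pkl")
    original.clf = {"threshold": 4.0}  # placeholder for a trained model
    original.save_model()

    # A fresh instance pointing at the same path recovers the model.
    restored = PickledClassifier(io_dir, "demo.pkl")
    restored.load_model()
    restored_model = restored.clf
```

Pickle files should only be loaded from trusted locations, since unpickling can execute arbitrary code.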