henchman.learning.create_model
henchman.learning.create_model(X, y, model=None, metric=None, n_splits=1, split_size=0.3, _return_df=False)
Make a model. Returns a list of scores and a fit model. A wrapper around a standard scoring workflow. Uses train_test_split when n_splits is 1 (the default); otherwise it uses TimeSeriesSplit.
In this function we trade flexibility for ease of use. Unless you want this exact validation-fitting-scoring method, it's recommended that you use the sklearn API directly.
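For readers who take the advice above and use sklearn directly, the single-split case can be sketched roughly as follows. This is an approximation of the described workflow, not henchman's actual implementation. The synthetic X and y, the random seeds, and the n_estimators value are assumptions made only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not taken from the henchman docs).
rng = np.random.RandomState(0)
X = pd.DataFrame(rng.rand(200, 4), columns=["a", "b", "c", "d"])
y = pd.Series((X["a"] + X["b"] > 1).astype(int), name="label")

# Hold out 30% of the rows, mirroring the documented split_size default.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit the model on the training portion only.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Score the held-out predictions; the metric is called as metric(y_test, preds).
preds = model.predict(X_test)
scores = [roc_auc_score(y_test, preds)]
print('Score of {:.2f}'.format(scores[0]))
```

Writing the steps out this way makes it easy to change any one of them (for example, scoring on predict_proba instead of predict), which is the flexibility the wrapper gives up.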
Parameters:
- X (pd.DataFrame) – A cleaned numeric feature matrix.
- y (pd.Series) – A column of labels.
- model – A sklearn model with fit and predict methods.
- metric – A metric which takes y_test, preds and returns a score.
- n_splits (int) – If 1, use train_test_split. Otherwise use TimeSeriesSplit with this many splits. Default value is 1.
- split_size (float) – Size of the testing set, as a fraction of the data. Default is 0.3.
- _return_df (bool) – If True, also return (X_train, X_test, y_train, y_test) after the scores and model. Not generally useful, but sometimes necessary.
Returns: A list of scores and a fit model.
Return type: (list[float], sklearn.ensemble)
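When n_splits is greater than 1, the parameters above describe a TimeSeriesSplit loop that yields one score per split. A minimal sketch of that pattern in plain sklearn is shown here. The helper name score_time_series and the synthetic data are hypothetical, and the real create_model may differ in details such as which fitted model it returns.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in data (an assumption; rows are treated as time-ordered).
rng = np.random.RandomState(0)
X = pd.DataFrame(rng.rand(120, 3), columns=["f0", "f1", "f2"])
y = pd.Series((X["f0"] > 0.5).astype(int), name="label")


def score_time_series(X, y, model, metric, n_splits=5):
    """Hypothetical helper: fit and score once per TimeSeriesSplit fold.

    Each fold trains on an initial stretch of rows and tests on the rows
    that immediately follow it, so no test row precedes its training rows.
    """
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        preds = model.predict(X.iloc[test_idx])
        scores.append(metric(y.iloc[test_idx], preds))
    # The returned model is the one fit on the last (largest) training fold.
    return scores, model


scores, fit_model = score_time_series(
    X, y,
    RandomForestClassifier(n_estimators=20, random_state=0),
    accuracy_score,
    n_splits=5)
print('Average score of {:.2f}'.format(np.mean(scores)))
```

The return shape (a list of floats plus a fit model) matches the documented return type, so the result can be summarized with np.mean in the same way as the output of create_model.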
Example
>>> from henchman.learning import create_model
>>> import numpy as np
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.metrics import roc_auc_score
>>> scores, fit_model = create_model(X, y,
...                                  RandomForestClassifier(),
...                                  roc_auc_score,
...                                  n_splits=5)
>>> print('Average score of {:.2f}'.format(np.mean(scores)))