henchman.diagnostics.warnings
henchman.diagnostics.warnings(data, corr_thresh=0.9, missing_thresh=0.1, card_thresh=50)
Warn about common dataset problems. Checks for duplicates, highly linearly correlated columns, columns with many missing values, and categorical columns with many unique values.
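Parameters:
- data (pd.DataFrame) – The dataframe to check.
- corr_thresh (float) – Warn about columns whose linear correlation exceeds this threshold (default 0.9).
- missing_thresh (float) – Warn about columns whose share of missing values exceeds this threshold (default 0.1).
- card_thresh (int) – Warn about categorical columns with more than this many unique values (default 50).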
Example
>>> from henchman.diagnostics import warnings
>>> warnings(df, corr_thresh=.5)
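To make the four checks and their thresholds concrete, the following is a minimal sketch of how similar checks could be approximated with plain pandas. It is not henchman's implementation: the function name `sketch_warnings`, the message wording, the use of absolute Pearson correlation, and the choice to treat every non-numeric column as categorical are all assumptions made for illustration.

```python
import pandas as pd


def sketch_warnings(data, corr_thresh=0.9, missing_thresh=0.1, card_thresh=50):
    """Collect warning messages for a DataFrame (illustrative only)."""
    messages = []

    # Check 1: duplicated rows.
    n_dupes = int(data.duplicated().sum())
    if n_dupes > 0:
        messages.append(f"{n_dupes} duplicated row(s)")

    # Check 2: pairs of numeric columns whose absolute Pearson
    # correlation exceeds corr_thresh (assumed correlation measure).
    corr = data.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    for i, left in enumerate(cols):
        for right in cols[i + 1:]:
            if corr.loc[left, right] > corr_thresh:
                messages.append(f"{left} and {right} are highly correlated")

    # Check 3: columns whose fraction of missing values exceeds missing_thresh.
    missing = data.isnull().mean()
    for col in missing[missing > missing_thresh].index:
        messages.append(f"{col} has many missing values")

    # Check 4: non-numeric columns (treated here as categorical, an
    # assumption) with more than card_thresh unique values.
    for col in data.select_dtypes(exclude="number").columns:
        if data[col].nunique() > card_thresh:
            messages.append(f"{col} has many unique values")

    return messages
```

Calling `sketch_warnings(df, corr_thresh=.5)` on a dataframe `df` returns a list of strings rather than printing them, which makes the effect of each threshold easy to inspect. Lowering a threshold can only add messages to the list, never remove them.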