henchman.diagnostics.warnings

henchman.diagnostics.warnings(data, corr_thresh=0.9, missing_thresh=0.1, card_thresh=50)

Warn about common dataset problems. Checks for duplicates, highly linearly correlated columns, columns with many missing values and categorical columns with many unique values.

Parameters:
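  • data (pd.DataFrame) – The dataframe to check.
  • corr_thresh (float) – Warn about columns whose linear correlation is above this threshold (default 0.9).
  • missing_thresh (float) – Warn about columns whose share of missing values is above this threshold (default 0.1).
  • card_thresh (int) – Warn about categorical columns with more unique values than this threshold (default 50).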
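
To make the role of each threshold concrete, here is a minimal sketch of the four checks written with plain pandas. It is not henchman's implementation: the function name sketch_warnings, the message wording, the use of absolute Pearson correlation, the reading of "duplicates" as duplicate rows, and the treatment of every non-numeric column as categorical are all assumptions made for illustration.

```python
import pandas as pd


def sketch_warnings(data, corr_thresh=0.9, missing_thresh=0.1, card_thresh=50):
    """Hypothetical re-creation of the four checks; not henchman's code."""
    messages = []

    # 1. Duplicates, read here as fully duplicated rows (assumption).
    n_dupes = int(data.duplicated().sum())
    if n_dupes > 0:
        messages.append(f"{n_dupes} duplicate rows")

    # 2. Pairs of numeric columns whose absolute Pearson correlation
    #    is above corr_thresh (using the absolute value is an assumption).
    corr = data.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    for i, left in enumerate(cols):
        for right in cols[i + 1:]:
            if corr.loc[left, right] > corr_thresh:
                messages.append(f"{left} and {right} are highly correlated")

    # 3. Columns whose fraction of missing values is above missing_thresh.
    missing = data.isnull().mean()
    for col in missing[missing > missing_thresh].index:
        messages.append(f"{col} has many missing values")

    # 4. Non-numeric columns (treated as categorical, an assumption)
    #    with more than card_thresh unique values.
    for col in data.select_dtypes(exclude="number").columns:
        if data[col].nunique() > card_thresh:
            messages.append(f"{col} has many unique values")

    return messages


# Small illustrative dataframe (made-up values).
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0, 10.0],   # exactly 2 * a
    "c": [None, 3.0, 1.0, 2.0, 1.0, 1.0],    # one of six values missing
    "d": ["v", "w", "x", "y", "z", "z"],     # five unique values
})

# card_thresh is lowered so the tiny example can trigger the cardinality check.
for message in sketch_warnings(df, card_thresh=3):
    print(message)
```

Returning the messages as a list, rather than printing inside the function, is only a convenience for inspecting the result in this sketch; it says nothing about how henchman reports its warnings.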

Example

>>> from henchman.diagnostics import warnings
>>> warnings(df, corr_thresh=.5)