Data Quality Engine

This is a Python-based tool for checking the sanity of data before actually analyzing it. It has a comprehensive set of rules that are easy to configure. Some of them are:

  • Missing value treatment using mean (average), median, mode, or pruning
  • Outlier treatment using mean (average), median, mode, or pruning
  • Filtering data based on complex rules, e.g. every record with salary > 1000000 and age < 21 is invalid
  • Validating data against specific formats, e.g. Email, Phone, SSN, ZipCode, DrivingLicenseNo, PassportNo, etc.
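
The rules above can be sketched in a few lines of plain Python. This is only an illustration of the ideas, not the tool's actual API: every function name, the 1.5 x IQR outlier fence, and the simplified email and ZIP patterns are assumptions made for this example.

```python
import re
import statistics

# Hypothetical helpers for illustration; not part of the tool itself.
_FILLERS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def fill_missing(values, strategy="median"):
    """Replace None entries with a statistic of the present values, or drop them."""
    present = [v for v in values if v is not None]
    if strategy == "prune":
        return present
    fill = _FILLERS[strategy](present)
    return [fill if v is None else v for v in values]


def treat_outliers(values, strategy="median", k=1.5):
    """Flag values outside k * IQR of the quartiles (an assumed rule), then
    replace them with a statistic of the remaining values, or drop them."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    inliers = [v for v in values if low <= v <= high]
    if strategy == "prune":
        return inliers
    fill = _FILLERS[strategy](inliers)
    return [v if low <= v <= high else fill for v in values]


def is_invalid(row):
    """The example rule from the list: salary > 1000000 and age < 21."""
    return row["salary"] > 1_000_000 and row["age"] < 21


# Deliberately simplified patterns; real-world formats need stricter checks.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")


def matches_format(value, pattern):
    """Return True when the whole string matches the given pattern."""
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    print(fill_missing([1, None, 3], "mean"))
    print(treat_outliers([10, 12, 11, 13, 12, 500], "prune"))
    print(is_invalid({"salary": 2_000_000, "age": 19}))
    print(matches_format("user@example.com", EMAIL_RE))
```

Pruning shortens the list, while the other strategies keep its length, so in this sketch pruning suits row-wise filtering and the replacement strategies suit column-wise cleanup.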