This Python-based tool checks the sanity of a dataset before it is analyzed. It provides a comprehensive set of configurable rules, including:
- Missing-value treatment using the mean (average), median, mode, or pruning
- Outlier treatment using the mean (average), median, mode, or pruning
- Filtering data based on compound rules, e.g. any record with salary > 1000000 and age < 21 is invalid
- Validating data against specific formats, e.g. Email, Phone, SSN, ZipCode, DrivingLicenseNo, Passport No, etc.
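
To make the kinds of checks listed above concrete, here is a minimal sketch in plain Python of three of them: median imputation of missing values, the salary/age rule, and an email format check. It does not use this tool's API. The record layout, the function names (`impute_median`, `is_invalid`, `has_valid_email`), and the deliberately simple email pattern are all assumptions made for illustration.

```python
import re
from statistics import median

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"age": 35, "salary": 52000, "email": "a@example.com"},
    {"age": None, "salary": 61000, "email": "b@example.com"},
    {"age": 19, "salary": 2500000, "email": "c@example.com"},
    {"age": 42, "salary": 58000, "email": "not-an-email"},
]


def impute_median(rows, field):
    """Replace missing (None) values of `field` with the median of the known values."""
    known = [r[field] for r in rows if r[field] is not None]
    fill = median(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def is_invalid(row):
    """Compound rule from the list: salary > 1000000 and age < 21 is invalid."""
    return row["salary"] > 1_000_000 and row["age"] < 21


# Intentionally simple pattern (something@something.something).
# It is an assumption for this sketch, not a full email grammar.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def has_valid_email(row):
    """Return True when the email field matches the simple pattern above."""
    return EMAIL_RE.match(row["email"]) is not None


# Fill missing ages first so the compound rule can compare numbers safely.
filled = impute_median(records, "age")

# Keep only rows that pass both the compound rule and the format check.
valid = [r for r in filled if not is_invalid(r) and has_valid_email(r)]

for row in valid:
    print(row)
```

In this sketch the missing age is filled before the rule is evaluated, because comparing `None` with a number would raise a `TypeError`. Pruning, the alternative treatment named in the list, would instead drop the row with the missing value.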