Wednesday, January 23, 2013

Machine Learning Using SAS Enterprise Miner: A Basic Comparison Example

This post shows how one can use SAS Enterprise Miner 12.1 ("EM"), released in August 2012, to build a large number of leading machine learning models in a short amount of time by point-and-click. The comparison is mainly meant to organize the built models, not to support any conclusion about the strength of the methods. The selected data set has ~40K observations with 12 predictor variables. For the binary target variable ATTRITE, ~16% of observations equal 1. (The data set is from a published data mining book; I forget which one.)

The following screenshot shows the 16 models built: 3 logistic regressions, 2 neural networks, 2 random forests, 1 memory-based reasoning (k-nearest neighbor) model, 1 decision tree, 2 stochastic gradient boosting models, 1 LARS regression and 4 SVM models.

Below are the comparison details of the 16 models:

The two random forest models stand above the rest in misclassification rate and KS. Notice that:
  1. These models were built without much EDA (exploratory data analysis) work.
  2. A traditional decision tree is not far behind.
  3. None of MBR, boosting, SVM or NN does very well, which I attribute to there being only a dozen input variables. However, random forest still outshines them with so few variables.
  4. The logistic regression models (the two HPREG models) perform poorly, probably also because of the default cutoff selection.
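To make the two comparison statistics and the cutoff point in item 4 concrete, here is a minimal Python sketch (standard library only) of how misclassification rate and the two-sample KS statistic are commonly computed. It is not EM's internal code: the function names, the `simulate` helper and its parameters are hypothetical, and the scores are synthetic, not the ATTRITE data or any EM model output. It illustrates three points: misclassification rate depends on the chosen cutoff, KS depends only on how the scores rank the cases, and predicting ATTRITE=0 for everyone already gives a misclassification rate equal to the event rate (~16% here). The 0.16 cutoff is only an illustrative alternative to the usual 0.5.

```python
import math
import random


def misclassification_rate(y_true, scores, cutoff=0.5):
    """Share of cases where the predicted class (score >= cutoff) differs from the 0/1 label."""
    errors = sum(1 for y, s in zip(y_true, scores) if (s >= cutoff) != (y == 1))
    return errors / len(y_true)


def ks_statistic(y_true, scores):
    """Two-sample KS: max gap between the cumulative score distributions of events and non-events.

    Only the ranking of the scores matters, so no cutoff is involved.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both events (1) and non-events (0)")
    # Sweep thresholds from the highest score down, tracking cumulative shares.
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores are moved past the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            i += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best


def simulate(n=40000, event_rate=0.16, seed=1):
    """Hypothetical synthetic labels and scores; NOT the ATTRITE data or any EM model output."""
    rng = random.Random(seed)
    y, scores = [], []
    for _ in range(n):
        label = 1 if rng.random() < event_rate else 0
        # Events get a higher latent value on average; the logistic squashes it into (0, 1).
        z = rng.gauss(0.5 if label else -1.5, 1.0)
        y.append(label)
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return y, scores


if __name__ == "__main__":
    y, scores = simulate()
    # Predicting ATTRITE=0 for everyone misclassifies exactly the events.
    print(f"always-0 baseline: {misclassification_rate(y, [0.0] * len(y)):.3f}")
    for cutoff in (0.5, 0.16):
        rate = misclassification_rate(y, scores, cutoff)
        print(f"cutoff {cutoff}: misclassification rate {rate:.3f}")
    print(f"KS (cutoff-free): {ks_statistic(y, scores):.3f}")
```

Because KS ignores the cutoff while misclassification rate does not, looking at both helps separate a model that ranks cases poorly from one that is simply being judged at an unsuitable cutoff.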
I like Enterprise Miner because I can load and set up a large number of models (sometimes >100) quickly, easily tweak and manage their subtle differences, and pick the one that best fits my business domain. Model lineage and knowledge sharing are two other reasons.