Monday, December 31, 2012

Random Forest Modeling in SAS, Several Key Aspects

In August 2012, SAS Institute shipped Release 12.1. One major modeling facility added to its machine learning and data science portfolio is random forest. On the procedure side, in SAS High-Performance Analytics Server 12.1, proc HPFOREST does the job. In SAS Enterprise Miner, the HP FOREST node is where random forests can be built.

This post illustrates several key aspects of random forest modeling that proc HPFOREST covers.

Here is the SAS code (if the data elements look like yours, that is pure coincidence):

"
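/* Fits a forest of up to 200 trees and saves the per-tree fit statistics.
   &indsn, &targetx and &input1-&input3 must be defined as global macro
   variables before the macro is called. */
%macro hpforest(Vars=);
proc hpforest data=&indsn maxtrees=200 vars_to_try=&Vars. trainfraction=0.6;
  target &targetx./level=binary;
  input &input1/level=interval;
  input &input2/level=nominal;
  input &input3/level=ordinal;
  /* Save the fit statistics; name the OOB misclassification column after the vars_to_try value */
  ods output FitStatistics = fitstats_vars&Vars.(rename=(Miscoob=VarsToTry&Vars.));
run;
%mend hpforest;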

%hpforest(vars=8);

data fitstats;
   set fitstats_vars8;
   rename Ntrees=Trees;
   label VarsToTry8   = "Vars=8";
run;


proc sgplot data=fitstats;
   title "Misclassification Rate for 200 Trees";
   series x=Trees y=VarsToTry8/lineattrs=(pattern=MediumDashDotDot thickness=4 color=brown);
   yaxis label='OOB Misclassification Rate';
run;
title;

"

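The code above reads its data set and variable lists from global macro variables that the post does not show. A minimal sketch of that setup follows; the data set name work.loans and every column name are hypothetical placeholders, to be replaced with your own before the %hpforest call.

```sas
/* Hypothetical setup: define these before calling %hpforest.
   work.loans and all column names below are placeholders, not the author's data. */
%let indsn   = work.loans;               /* training data set */
%let targetx = default_flag;             /* binary target */
%let input1  = loan_amt income age;      /* interval inputs */
%let input2  = region product_type;      /* nominal inputs */
%let input3  = risk_grade;               /* ordinal input */
```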

Subject 1: Do more trees improve the classification rate? The plot produced by the code above shows the OOB misclassification rate starting to level off at around 50 trees. Beyond 100 trees, it no longer improves.
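One way to check this reading without eyeballing the curve is to query the fitstats data set built earlier. The sketch below assumes only the two columns created there, Trees and VarsToTry8 (the OOB misclassification rate); the checkpoints 10, 50, 100 and 200 are arbitrary choices.

```sas
/* OOB misclassification rate at a few checkpoints along the curve */
proc print data=fitstats noobs label;
   where Trees in (10, 50, 100, 200);
   var Trees VarsToTry8;
   format VarsToTry8 8.4;
run;

/* Tree count(s) at which the OOB misclassification rate is lowest */
proc sql;
   select Trees, VarsToTry8 format=8.4
   from fitstats
   where VarsToTry8 = (select min(VarsToTry8) from fitstats);
quit;
```

If the rates at 100 and 200 trees agree to three or four decimals, a smaller maxtrees value is probably enough for this data.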

Subject 2: The "Loss Reduction Variable Importance Report" from random forest often does NOT tell the same story about variable importance that you get from other methods.
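To compare that report with rankings from other methods, it helps to capture it as a data set. The sketch below is hedged: it assumes the ODS table is named VariableImportance and that, for a binary target, it carries a GiniOOB column. Both names should be confirmed in the log produced by ODS TRACE ON for your release. The data set name varimp is hypothetical.

```sas
/* Sketch only: the table name VariableImportance and the column GiniOOB
   are assumptions; ODS TRACE ON writes the actual table names to the log. */
ods trace on;
proc hpforest data=&indsn maxtrees=200 vars_to_try=8 trainfraction=0.6;
   target &targetx./level=binary;
   input &input1/level=interval;
   input &input2/level=nominal;
   input &input3/level=ordinal;
   ods output VariableImportance=varimp;   /* varimp is a hypothetical name */
run;
ods trace off;

/* Rank inputs by out-of-bag loss reduction (column name assumed) */
proc sort data=varimp;
   by descending GiniOOB;
run;

proc print data=varimp(obs=10) noobs;
run;
```

Sorting by the out-of-bag measure rather than the training measure is a design choice here, on the reasoning that the training measure tends to flatter inputs the trees have overfit.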

Subject 3: Random Forest Fit Statistics, the Out-of-Bag tree steps
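Each row of the FitStatistics table saved by the code above describes the forest after one more tree has been added. A quick way to inspect those steps is to print the first few rows. The sketch assumes fitstats_vars8 exists from the earlier run, where the OOB misclassification column was renamed to VarsToTry8; the remaining columns are whatever your release writes to the table.

```sas
/* First ten tree steps of the saved fit statistics */
proc print data=fitstats_vars8(obs=10) noobs label;
run;

/* List the column names and labels actually present in the table */
proc contents data=fitstats_vars8 varnum;
run;
```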
Comments:

  1. Hi, this is very useful code.

    How do I score the test dataset?
