This post illustrates several key aspects of fitting a random forest with PROC HPFOREST.
Here is the SAS code (if the data elements look like yours, that is pure coincidence):
"
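/* Assumes &indsn, &targetx, &input1, &input2 and &input3 are defined before the macro is called. */
%macro hpforest(Vars=);
proc hpforest data=&indsn maxtrees=200 vars_to_try=&Vars. trainfraction=0.6;
target &targetx./level=binary;
input &input1/level=interval;
input &input2/level=nominal;
input &input3/level=ordinal;
/* Save the fit statistics and name the OOB misclassification column after the setting. */
ods output FitStatistics = fitstats_vars&Vars.(rename=(Miscoob=VarsToTry&Vars.));
run;
%mend hpforest;
%hpforest(vars=8);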
data fitstats;
set fitstats_vars8;
rename Ntrees=Trees;
label VarsToTry8 = "Vars=8";
run;
proc sgplot data=fitstats;
title "Misclassification Rate for 200 Trees";
series x=Trees y=VarsToTry8 / lineattrs=(pattern=MediumDashDotDot thickness=4 color=brown);
yaxis label='OOB Misclassification Rate';
run;
title;
"
Subject 1: Do more trees improve the classification rate? The plot above shows the OOB misclassification rate starting to level off at around 50 trees. Beyond 100 trees it no longer improves.
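To check the plateau numerically rather than by eye, one option is to print the OOB rate at a few forest sizes and look up where it is lowest. This is only a sketch: it assumes the fitstats data set built above, and the tree counts in the WHERE clause are arbitrary checkpoints chosen for illustration.

```sas
/* Sketch only: assumes the fitstats data set created above. */
/* The checkpoints below are arbitrary illustration values.  */
proc print data=fitstats noobs label;
   where Trees in (10, 25, 50, 100, 150, 200);
   var Trees VarsToTry8;
run;

/* Forest size(s) at which the OOB misclassification rate is lowest. */
proc sql;
   select Trees, VarsToTry8
   from fitstats
   having VarsToTry8 = min(VarsToTry8);
quit;
```

If the rates printed for 100 and 200 trees are practically the same, a smaller maxtrees value would likely be enough for this data.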
Subject 2: The "Loss Reduction Variable Importance Report" from a random forest often does NOT tell the same story about variable importance that you get from other methods.
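To compare that report against rankings from other methods, it helps to have it as a data set. The sketch below reruns the same model and saves the report through ODS OUTPUT. It assumes the ODS table is named VariableImportance and that the same &indsn, &targetx and &input1 to &input3 macro variables are defined. The data set name varimp is made up for illustration.

```sas
/* Sketch only: same model as above, with the importance report saved.  */
/* Assumes the ODS table name VariableImportance; varimp is a made-up   */
/* output data set name.                                                */
proc hpforest data=&indsn maxtrees=200 vars_to_try=8 trainfraction=0.6;
   target &targetx / level=binary;
   input &input1 / level=interval;
   input &input2 / level=nominal;
   input &input3 / level=ordinal;
   ods output VariableImportance=varimp;
run;

/* Look at the first rows of the saved report. */
proc print data=varimp(obs=10) noobs;
run;
```

With the report in a data set, it can be merged with importance measures from another model and the two rankings set side by side.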