The SAS/STAT product offers so many modeling tools that it is sometimes confusing which one covers which cases and data structures. Below is a summary diagram I took from a training course SAS offers.
Again, a picture speaks volumes. This diagram is two years old, but I believe about 90% of it still holds.
Some procedures, such as GENMOD and GLIMMIX, may be candidates for moving to the HP platform, while NLIN and MIXED already have big-data counterparts in SAS HPA: HPNLIN and HPMIXED.
Saturday, April 13, 2013
Friday, April 12, 2013
SAS Clustering Solution Overview, just One Picture
Lately, more and more contacts and friends have told me they see many SAS procedures related
to clustering but are not clear about how they relate to one another (which one does what). In a SAS
training course titled "Applied Clustering Techniques", I found a diagram that does a good
job of explaining it.
As we often say, a picture is worth a thousand words. Take a look:
SAS High Performance Text Mining: SAS HPTMINE
Currently there is one text mining procedure in SAS HPA, HPTMINE (experimental), which actually works fairly well. This post presents one working example.
The text file contains ~216K news entries, total file size ~384MB. The example runs on a Windows client with 16GB RAM.
"
proc HPTMINE data=doc2.news2;
   doc_id id2;            /*an ID variable is required*/
   variable description;  /*listing multiple variables may cause confusion*/
   parse outterms=doc2.out_terms_news reducef=2;
      /*reducef= is the frequency for term filtering: terms occurring fewer times than this are dropped*/
      /*other PARSE options: nostemming entities= stop= start= multiterm= syn= termwgt= cellwgt= outchild=*/
      /*each of these can be turned on or off; weighting (termwgt=, cellwgt=) matters most when tweaking*/
   svd k=10 outdocpro=doc2.docpro_news
      /*this is the critical math step of the whole exercise; in some cases you work on the raw frequencies directly*/
      svds=doc2.news_svds    /*singular values*/
      svdu=doc2.news_svdu    /*left singular vectors*/
      svdv=doc2.news_svdv;   /*right singular vectors*/
      /*other SVD options: max_k= tol= (tolerance for singular values) resolution=low|med|high*/
   /*to run on the grid, uncomment:
   performance host="&GRIDHOST" install="&GRIDINSTALLLOC" details;*/
run;
"
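To make the reducef= comment more concrete, here is a minimal sketch in plain Base SAS (not HPTMINE itself) of frequency-based term filtering on a tiny term-by-document count table. The data set work.term_doc, its values, and the macro variable reducef are hypothetical. The sketch assumes the rule stated in the code comment (a term whose total frequency across the collection is below the threshold is dropped); the HPTMINE documentation is the authority on the exact rule the procedure applies.

```sas
/* Hypothetical term-by-document counts: one row per (term, document) pair */
data work.term_doc;
   input term :$12. doc_id count;
   datalines;
market 1 3
market 2 1
stocks 1 2
rally 2 1
weather 3 1
stocks 3 1
;
run;

%let reducef = 2;   /* hypothetical stand-in for reducef=2 in the HPTMINE call */

/* Keep only terms whose total frequency across the collection reaches the threshold */
proc sql;
   create table work.kept_terms as
   select term,
          sum(count)             as total_freq,
          count(distinct doc_id) as n_docs
   from work.term_doc
   group by term
   having sum(count) >= &reducef
   order by term;
quit;

proc print data=work.kept_terms noobs;
run;
```

With a threshold of 2, "rally" and "weather" (one occurrence each) are dropped, while "market" (4 occurrences) and "stocks" (3) are kept.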
This procedure integrates several steps that are separate procedures in regular SAS Text Miner, which reduces the I/O traffic caused by passing data between them. The advantage of this integration is more pronounced when the input text file is huge. The integration also centralizes the logic before parallel computation is invoked to execute the job. This specific example, however, does not run on parallel nodes (the PERFORMANCE statement is commented out).
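The svd statement with k=10 compresses the term-by-document matrix into a small number of dimensions. As a rough illustration of that math only, not of HPTMINE's own algorithm (which is designed for large sparse matrices), the following SAS/IML sketch factors a tiny, made-up term-by-document matrix with CALL SVD and keeps only the first k singular triplets. It assumes SAS/IML is available; the matrix values, k=2, and the output data set work.svd_err are hypothetical. Scaling the right singular vectors by the singular values is one common way to place documents in the reduced space; it is not necessarily how OUTDOCPRO= normalizes its output.

```sas
proc iml;
   /* Hypothetical 5-term x 4-document frequency matrix (rows = terms, columns = documents) */
   A = {2 0 1 0,
        1 3 0 0,
        0 1 0 2,
        0 0 4 1,
        1 0 0 3};

   /* A = U * diag(Q) * V`, with the singular values in Q in descending order */
   call svd(U, Q, V, A);

   k  = 2;                 /* analogous to k= on the svd statement */
   Uk = U[, 1:k];          /* left singular vectors (term side), cf. svdu= */
   Qk = Q[1:k];            /* largest k singular values, cf. svds= */
   Vk = V[, 1:k];          /* right singular vectors (document side), cf. svdv= */

   docProj = Vk * diag(Qk);          /* each row: one document in the k-dimensional space */
   Ak      = Uk * diag(Qk) * Vk`;    /* rank-k approximation of A */
   relErr  = sqrt(ssq(A - Ak)) / sqrt(ssq(A));   /* relative Frobenius-norm error */

   print Qk[label="Top singular values"], docProj[label="Document projections"], relErr;

   /* save the error so it can be inspected outside IML */
   create work.svd_err var {relErr};
   append;
   close work.svd_err;
quit;
```

relErr reports how much of the matrix (in Frobenius norm) the rank-k approximation misses; raising k lowers it, which is the trade-off behind choosing k= in the real run.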
Below are some log details; the whole operation took less than 2 minutes.
Below are screenshots of the term probability table and the term-frequency matrix. The mechanics of the whole operation are very intuitive, although getting the desired outcome often requires time-consuming tweaking. The upside is that using all defaults could very well