Saturday, February 9, 2013

Moving Logistic Regression toward Big, Complex Data: SAS HPLOGISTIC Optimization Techniques

In August 2012, SAS Institute released the first version of its High-Performance Analytics Server (HPAS 12.1). Across the roughly 25 HP procedures released, new designs and changes consistently aim to help users better meet today's big data challenges in predictive modeling, with more efficient and refined algorithms among other improvements.

Proc Logistic is one of the most popular and widely used procedures in SAS for building logistic regression models. Covering all the major big-data facilities in the new Proc HPLOGISTIC would likely fill a long paper. This post focuses on one key aspect, the optimization techniques for maximum likelihood estimation, and includes some excerpts from the HPLOGISTIC user guide where its presentation and explanation are best.

Under Proc Logistic, the default optimization technique is Fisher scoring. One can change it to Newton-Raphson (NR), and one can also set the RIDGING= option to ABSOLUTE or RELATIVE; all of this is done in the MODEL statement. Under Proc HPLOGISTIC, Fisher scoring disappears entirely, and the default optimization technique is Newton-Raphson with ridging, or NRRIDG. The table below lists all the options.
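To give a feel for what "Newton-Raphson with ridging" means, here is a minimal conceptual sketch in Python/NumPy (not SAS's actual implementation, and all names are mine): take Newton steps on the logistic log-likelihood, and when the Hessian cannot be solved, inflate its diagonal with a ridge term until the step succeeds.

```python
import numpy as np

def nr_ridge_logistic(X, y, tol=1e-8, max_iter=50):
    """Conceptual NRRIDG-style sketch for logistic regression:
    Newton-Raphson steps, with a ridge added to the (negative) Hessian
    whenever it is singular."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
        grad = X.T @ (y - mu)                  # gradient of the log-likelihood
        if np.max(np.abs(grad)) < tol:         # converged
            break
        w = mu * (1.0 - mu)
        neg_hess = (X * w[:, None]).T @ X      # -Hessian = X' diag(w) X
        lam = 0.0
        while True:                            # ridge until the system solves
            try:
                step = np.linalg.solve(neg_hess + lam * np.eye(p), grad)
                break
            except np.linalg.LinAlgError:
                lam = 1e-4 if lam == 0.0 else lam * 10.0
        beta = beta + step
    return beta

# tiny demo: non-separable data, so the MLE exists
X_demo = np.column_stack([np.ones(6), np.arange(6.0)])
y_demo = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
beta_hat = nr_ridge_logistic(X_demo, y_demo)
```

The real procedure is more careful (trust-region-style control of the ridge, line searches, convergence diagnostics), but the core idea is the same: ridging rescues Newton steps when the Hessian is ill-behaved.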

While in practice (and in theory) there is little consensus as to which option fits which data conditions, the HPLOGISTIC user guide provides excellent guidelines:

          "For many optimization problems, computing the gradient takes more computer time than computing the function value. Computing the Hessian sometimes takes much more computer time and memory than computing the gradient, especially when there are many decision variables. Unfortunately, optimization techniques that do not use some kind of Hessian approximation usually require many more iterations than techniques that do use a Hessian matrix, and, as a result, the total run time of these techniques is often longer. Techniques that do not use the Hessian also tend to be less reliable. For example, they can terminate more [...]"
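The guide's cost ranking is easy to make concrete for the logistic log-likelihood. In this NumPy sketch (illustrative only, not SAS internals), the function value and gradient each cost O(np) work, while the Hessian costs O(np²) work and O(p²) memory, which is why Hessian-free techniques become attractive as the number of decision variables p grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10_000, 50                       # observations, decision variables
X = rng.standard_normal((n, p))
y = rng.integers(0, 2, n)
beta = np.zeros(p)                      # evaluate at the starting point

mu = 1.0 / (1.0 + np.exp(-X @ beta))    # O(n*p) work
loglik = np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))  # function value

grad = X.T @ (y - mu)                   # gradient: O(n*p) flops, p numbers
w = mu * (1.0 - mu)
hess = -(X * w[:, None]).T @ X          # Hessian: O(n*p**2) flops, p*p numbers

print(grad.shape, hess.shape)           # (50,) (50, 50)
```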

The time taken to compute the gradient, the function value, and (where applicable) the Hessian, along with the number of decision variables involved, are among the key factors in the choice:
  1. Second-derivative methods include TRUREG, NEWRAP, and NRRIDG. These are best for small problems for which the Hessian matrix is not expensive to compute. (This does not necessarily mean that computing the Hessian is cheap for every small problem; "small problems" still vary a lot.)
  2. If you want to replicate an old model built with Fisher scoring, you can use NRRIDG. When your target is binary, you may get identical results; otherwise the results may differ slightly (mainly in the estimated coefficients).
  3. First-derivative methods include QUANEW and DBLDOG. These are best for medium-sized problems for which the objective function and the gradient can be evaluated much faster than the Hessian. In general, QUANEW and DBLDOG require more iterations than the second-derivative methods above, but each iteration can be much faster. They require only the gradient to update an approximate Hessian, and they require slightly less memory than TRUREG or NEWRAP.
  4. Because CONGRA requires only a factor of p double-word memory, many large applications can be solved only by CONGRA. However, I personally feel the computational beauty of CONGRA may be a bit overstated.
  5. All these insights and guidelines must of course be weighed against other key aspects, such as the selection criteria. (Selection in HPLOGISTIC, by the way, has become a separate SELECTION statement under the procedure, unlike Proc Logistic, where selection is a MODEL statement option.)
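Point 2 above rests on a textbook fact: for a binary target with the canonical logit link, the observed Hessian of the log-likelihood does not depend on the response, so it equals the expected (Fisher) information, and Newton-Raphson iterations coincide with Fisher scoring. A small NumPy check of that identity (illustrative, not SAS internals; function names are mine):

```python
import numpy as np

def neg_observed_hessian(X, y, beta):
    """-d^2(loglik)/dbeta^2 for binary logit; note that y cancels out,
    so the argument is kept only to mirror the general GLM case."""
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    w = mu * (1.0 - mu)
    return (X * w[:, None]).T @ X

def fisher_information(X, beta):
    """Expected information E[-H] = X' diag(mu*(1-mu)) X."""
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    w = mu * (1.0 - mu)
    return (X * w[:, None]).T @ X

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = rng.integers(0, 2, 100)
beta = rng.standard_normal(3)
assert np.allclose(neg_observed_hessian(X, y, beta), fisher_information(X, beta))
```

For non-binary targets (or non-canonical links) the two matrices differ, which is why results there may deviate slightly from an old Fisher-scoring fit.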
While SAS remains the strongest statistical powerhouse, the big data orientation embedded in its HPAS release, hopefully made evident by this post, demonstrates its leading position among commercial machine learning solutions for tackling big data; SAS is very computational today. New HP procedures like HPLOGISTIC require the modeler to be very sensitive to, and conscious of, the data conditions, complexities, and residuals in the modeling universe at hand. The ultimate value of SAS HPAS, like many other SAS solutions and tools, lies in its productivity implications: you don't build anything from the ground up. You don't even write a line of code.

My next post on SAS logistic regression will cover selection criteria.
