Project Title: Development of Logistic Regression Model
Primary Skill: Statistical Algorithm Development
Additional Skills: Statistical knowledge of predictive modeling algorithms for handling large data is essential.
Project Description: We need a professional to develop the following predictive models for large data sets from banking, finance, insurance, telecommunications, and similar fields.
Multinomial, Ordinal, and Conditional Logistic Regression Model Features
The setup for the logistic regression development is: one response variable of multinomial type (which includes binary as a special case) or ordinal type, with several predictor variables that may be continuous or categorical. Examples of such responses are a Customer Satisfaction (CSAT) rating measured as Highly Satisfied, Satisfied, or Less Satisfied, and credit card defaulter/non-defaulter status.
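To make the setup concrete, here is a minimal sketch of how category probabilities could be computed under a baseline-category multinomial logit model. The class name MultinomialLogit, the choice of the last category as the reference, and the intercept-first coefficient layout are assumptions for illustration only, not part of the requirement.

```java
// Sketch: category probabilities for a baseline-category multinomial logit model.
// Assumption: beta[j] holds the coefficients (intercept first) of non-reference
// category j; the last category is the reference, with linear predictor 0.
final class MultinomialLogit {
    static double[] probabilities(double[][] beta, double[] x) {
        int categories = beta.length + 1;
        double[] eta = new double[categories]; // eta of the reference category stays 0
        double max = 0.0;
        for (int j = 0; j < beta.length; j++) {
            double s = beta[j][0]; // intercept
            for (int i = 0; i < x.length; i++) {
                s += beta[j][i + 1] * x[i];
            }
            eta[j] = s;
            max = Math.max(max, s);
        }
        // Subtract the largest linear predictor before exponentiating to avoid overflow.
        double[] p = new double[categories];
        double sum = 0.0;
        for (int j = 0; j < categories; j++) {
            p[j] = Math.exp(eta[j] - max);
            sum += p[j];
        }
        for (int j = 0; j < categories; j++) {
            p[j] /= sum;
        }
        return p;
    }
}
```

With a single non-reference category this reduces to the usual binary logistic probability, which is why the binary case needs no separate treatment.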
Usually the data set is large and is stored in Excel or ASCII format. Assume that one column holds the response variable, the remaining columns hold the predictors, and rows represent the cases.
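As a rough illustration of the ASCII case, the sketch below reads a delimited text source using only the JDK. It assumes a header line, the response in the first column, and purely numeric fields (categorical predictors would need to be coded beforehand). The class name DelimitedData and these layout choices are hypothetical. It also loads all rows into memory, which a real large-data implementation would probably replace with row-by-row accumulation.

```java
// Sketch: load a delimited ASCII source with a header line.
// Assumptions: first column is the response, all fields are numeric.
final class DelimitedData {
    final double[] response;     // first column of every data row
    final double[][] predictors; // remaining columns of every data row

    private DelimitedData(double[] response, double[][] predictors) {
        this.response = response;
        this.predictors = predictors;
    }

    static DelimitedData read(java.io.Reader source, String delimiter) {
        java.util.List<double[]> rows = new java.util.ArrayList<>();
        try (java.io.BufferedReader in = new java.io.BufferedReader(source)) {
            in.readLine(); // skip the header line
            String line;
            int lineNo = 1;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (line.trim().isEmpty()) {
                    continue; // tolerate blank lines
                }
                String[] fields = line.split(java.util.regex.Pattern.quote(delimiter));
                if (fields.length < 2
                        || (!rows.isEmpty() && fields.length != rows.get(0).length)) {
                    throw new IllegalArgumentException("line " + lineNo + ": unexpected column count");
                }
                double[] row = new double[fields.length];
                for (int j = 0; j < fields.length; j++) {
                    try {
                        row[j] = Double.parseDouble(fields[j].trim());
                    } catch (NumberFormatException e) {
                        throw new IllegalArgumentException(
                                "line " + lineNo + ": not numeric: " + fields[j]);
                    }
                }
                rows.add(row);
            }
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
        double[] y = new double[rows.size()];
        double[][] x = new double[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            double[] row = rows.get(i);
            y[i] = row[0];
            x[i] = java.util.Arrays.copyOfRange(row, 1, row.length);
        }
        return new DelimitedData(y, x);
    }
}
```

Rejecting malformed rows with the offending line number is one possible way to address the invalid-input behavior the test cases are expected to demonstrate.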
Develop a statistical algorithm to fit the multinomial logistic regression model, together with a corresponding Java implementation. Your algorithm should describe, among other things:
a. A computational procedure for the sum of squares (and cross-products) matrix of the regressors and related intermediate statistical results, with provision to include interaction effects between predictors.
b. A suitable numerical method for estimating the parameters of the multinomial regression model, along with a convergence criterion. Your method should be able to handle large data.
c. Computation of the relevant statistics, such as standard errors, Wald statistics, and standardized estimates along with their p-values, odds ratios with their confidence intervals, and pseudo R-squared.
d. A stepwise method for selecting the important variables in the model.
e. A final model with model validation measures such as K-fold cross-validation.
f. Test case results that demonstrate how the program behaves at extreme values, with invalid inputs, etc. Your program should work correctly for large data. Include at least one example to demonstrate the working of the program.
g. Suitable Java code corresponding to the above procedures, written without using any third-party library functions.
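For orientation on items (b) and (c), the following is a minimal sketch of Newton-Raphson estimation for the binary special case, using only the JDK. The class name BinaryLogitNewton, the convergence rule (largest absolute coefficient change below tol), and the Gauss-Jordan inversion are illustrative assumptions, not a prescribed design. The gradient and information matrix are accumulated one row at a time, so the same loop could be fed from a streamed file.

```java
// Sketch: Newton-Raphson fit of a binary logistic regression.
// Assumption: each row of x already contains a leading 1 for the intercept.
final class BinaryLogitNewton {
    /** Returns { coefficients, standard errors }. */
    static double[][] fit(double[][] x, int[] y, int maxIter, double tol) {
        int n = x.length, p = x[0].length;
        double[] beta = new double[p];
        double[][] cov = new double[p][p];
        for (int iter = 0; iter < maxIter; iter++) {
            double[] grad = new double[p];
            double[][] info = new double[p][p];
            for (int i = 0; i < n; i++) { // one pass over the data per iteration
                double eta = 0.0;
                for (int j = 0; j < p; j++) eta += x[i][j] * beta[j];
                double mu = 1.0 / (1.0 + Math.exp(-eta));
                double w = mu * (1.0 - mu);
                for (int j = 0; j < p; j++) {
                    grad[j] += x[i][j] * (y[i] - mu);
                    for (int k = 0; k < p; k++) info[j][k] += w * x[i][j] * x[i][k];
                }
            }
            cov = invert(info); // inverse information at the current iterate
            double maxStep = 0.0;
            for (int j = 0; j < p; j++) {
                double step = 0.0;
                for (int k = 0; k < p; k++) step += cov[j][k] * grad[k];
                beta[j] += step;
                maxStep = Math.max(maxStep, Math.abs(step));
            }
            if (maxStep < tol) break; // converged
        }
        // Standard errors from the information matrix at the last evaluated iterate.
        double[] se = new double[p];
        for (int j = 0; j < p; j++) se[j] = Math.sqrt(cov[j][j]);
        return new double[][] { beta, se };
    }

    /** Gauss-Jordan inversion with partial pivoting. */
    static double[][] invert(double[][] a) {
        int n = a.length;
        double[][] m = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n + i] = 1.0;
        }
        for (int c = 0; c < n; c++) {
            int piv = c;
            for (int r = c + 1; r < n; r++) {
                if (Math.abs(m[r][c]) > Math.abs(m[piv][c])) piv = r;
            }
            if (Math.abs(m[piv][c]) < 1e-12) {
                throw new ArithmeticException("singular information matrix");
            }
            double[] tmp = m[c]; m[c] = m[piv]; m[piv] = tmp;
            double d = m[c][c];
            for (int k = 0; k < 2 * n; k++) m[c][k] /= d;
            for (int r = 0; r < n; r++) {
                if (r == c) continue;
                double f = m[r][c];
                for (int k = 0; k < 2 * n; k++) m[r][k] -= f * m[c][k];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(m[i], n, inv[i], 0, n);
        return inv;
    }
}
```

From the returned arrays, a Wald z statistic is coefficient divided by standard error, and an odds ratio is Math.exp of the coefficient. The multinomial case follows the same pattern with a larger, block-structured information matrix.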
Develop a suitable algorithm and corresponding Java code to perform steps (a) to (f) for ordinal logistic regression, without using any third-party library functions.
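As a point of reference for the ordinal case, the sketch below computes category probabilities under a proportional-odds (cumulative logit) model. It assumes the parameterization logit P(Y <= j) = alpha[j] - x'beta with strictly increasing thresholds alpha, which is one common convention. The class name ProportionalOdds is hypothetical, and other ordinal link choices are possible.

```java
// Sketch: category probabilities for a proportional-odds (cumulative logit) model.
// Assumption: logit P(Y <= j) = alpha[j] - x'beta, with alpha strictly increasing.
final class ProportionalOdds {
    static double[] probabilities(double[] alpha, double[] beta, double[] x) {
        double eta = 0.0;
        for (int i = 0; i < x.length; i++) {
            eta += beta[i] * x[i];
        }
        double[] p = new double[alpha.length + 1];
        double previous = 0.0; // cumulative probability below the first category
        for (int j = 0; j < alpha.length; j++) {
            double cumulative = 1.0 / (1.0 + Math.exp(-(alpha[j] - eta)));
            p[j] = cumulative - previous; // P(Y = j) as a difference of cumulative probabilities
            previous = cumulative;
        }
        p[alpha.length] = 1.0 - previous; // highest category takes the remainder
        return p;
    }
}
```

With three ordered levels such as Less Satisfied, Satisfied, and Highly Satisfied, alpha would hold two thresholds and the method would return three probabilities.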
Develop a computation method and corresponding Java code for ROC computations and for plotting the ROC curve.
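One possible shape for the ROC computation is sketched below: cases are sorted by predicted score, tied scores are grouped into a single threshold, and the area under the curve is obtained with the trapezoidal rule. The class name RocCurve and the 0/1 label coding are assumptions for illustration. Plotting is left out, since it depends on the chosen JDK graphics approach.

```java
// Sketch: ROC points and trapezoidal AUC from predicted scores.
// Assumption: label[i] is 1 for a positive case and 0 for a negative case.
final class RocCurve {
    /** Returns rows of { falsePositiveRate, truePositiveRate }, starting at (0, 0). */
    static double[][] points(double[] score, int[] label) {
        int n = score.length;
        Integer[] order = new Integer[n];
        int positives = 0;
        for (int i = 0; i < n; i++) {
            order[i] = i;
            positives += label[i];
        }
        int negatives = n - positives;
        if (positives == 0 || negatives == 0) {
            throw new IllegalArgumentException("both classes must be present");
        }
        // Highest score first, so each step lowers the classification threshold.
        java.util.Arrays.sort(order, (a, b) -> Double.compare(score[b], score[a]));
        double[][] pts = new double[n + 1][];
        int count = 0, tp = 0, fp = 0;
        pts[count++] = new double[] {0.0, 0.0};
        for (int i = 0; i < n; i++) {
            if (label[order[i]] == 1) tp++; else fp++;
            boolean tiedWithNext = i + 1 < n && score[order[i + 1]] == score[order[i]];
            if (!tiedWithNext) { // emit one point per distinct score
                pts[count++] = new double[] {(double) fp / negatives, (double) tp / positives};
            }
        }
        return java.util.Arrays.copyOf(pts, count);
    }

    /** Area under the ROC curve by the trapezoidal rule. */
    static double auc(double[][] pts) {
        double area = 0.0;
        for (int i = 1; i < pts.length; i++) {
            area += (pts[i][0] - pts[i - 1][0]) * (pts[i][1] + pts[i - 1][1]) / 2.0;
        }
        return area;
    }
}
```

The returned points can be passed directly to whatever drawing routine is used for the plot, with the false positive rate on the horizontal axis and the true positive rate on the vertical axis.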
Timeline: The time to develop the statistical algorithms and their Java implementation should not exceed two months.
Budget: $4000-$6000