Logistic Regression

Haas School of Business
University of California, Berkeley
Professor Florian Zettelmeyer prepared this case to provide material for class discussion rather than to illustrate either
effective or ineffective handling of a business situation. This case is based on a series of cases written by Professor
Charlotte Mason. Names and data may have been disguised to assure confidentiality. The assistance of the Direct
Marketing Educational Foundation in supplying the data used for this case is gratefully acknowledged.
Copyright © 2005 by Charlotte Mason and Florian Zettelmeyer.
Predicting Response at BookBinders: Next Product To Buy
Dave Lawton, BookBinders’ marketing director, has just received a new dataset which contains the
responses of a random sample of 50,000 customers to three offers that were made to consumers
in rapid succession. Consumers were first sent an offering in the art category,
titled “The Art History of Florence.” One week later, consumers were sent an offering in the do-it-yourself
category titled “Painting Like a Pro.” Another week later, consumers received an offering in
the cooking category titled “Vegetarian Cooking for Everyone.”
The database contains the variables from the familiar database “BBB.dta” supplemented with
variables which record the response of each customer to the three offers (the new database is
called BBB_NPTB.dta). For each of the three offers the revenue and cost structure is identical:
Selling price (shipping included): $18.00
Wholesale price paid by BookBinders: $9.00
Shipping costs: $3.00
The cost of mailing an offer is $0.50. It is not economical to combine two or three offers in one
mailing since the service company that executes the mailing charges a large premium if more than
one insert is contained in an envelope. Hence, if Dave wanted to send more than one offer to
consumers, he would have to send the offers in separate mailings, each of which costs $0.50.
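
The revenue and cost figures above imply a margin per sale and a break-even response rate for each mailing. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of the case.

```python
# Unit economics taken from the case text (identical for all three offers).
SELLING_PRICE = 18.00   # shipping included
WHOLESALE_COST = 9.00
SHIPPING_COST = 3.00
MAILING_COST = 0.50     # per offer mailed


def margin_per_sale():
    """Profit on one sale before the cost of the mailing."""
    return SELLING_PRICE - WHOLESALE_COST - SHIPPING_COST


def break_even_response_rate():
    """Purchase probability at which a mailing exactly pays for itself."""
    return MAILING_COST / margin_per_sale()


print(margin_per_sale())                     # 6.0
print(round(break_even_response_rate(), 4))  # 0.0833
```

Under these numbers, a customer is worth mailing only if the predicted purchase probability exceeds roughly 8.33%.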
Dave Lawton has asked you to think about exploiting the new database with multiple offers to
increase profits. You plan to use a series of logistic regressions to build a response model. The
response model’s results will be used to “score” the 500,000 remaining customers in the database
and select customers for the full mailing campaign.
Part I: Logistic Regression
You decide to estimate three logistic regression models using “buy_art”, “buy_diy”, and
“buy_cook”, respectively, as the dependent variable and the following as predictor variables:
1. What is the expected profit from each offer if you apply the break-even rule for each product
mailing? Remember, you are calculating expected profits for applying the break-even rule
to the remaining 500,000 consumers in the database, not to the random sample.
(Suggestion: Name the predicted purchase probabilities “prob_art”, “prob_diy”, and
“prob_cok”. Name the dummies that specify which customers to mail “mail_art”, “mail_diy”,
and “mail_cok”.)
2. How many of the 500,000 consumers should be mailed one offer? two offers? three offers?
3. How much overlap is there in customer interests? What explains this overlap? Provide
evidence from the odds ratios in the three logistic regressions.
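
One possible way to organize the calculations for questions 1 and 2 is sketched below in plain Python. The probability lists are made-up placeholders standing in for the fitted values from the three logistic regressions, the function names are hypothetical, and computing expected profit from predicted probabilities (rather than from observed responses in the sample) is an assumption about how the question is meant to be read.

```python
MARGIN = 6.00                     # 18.00 - 9.00 - 3.00, from the case
MAILING_COST = 0.50               # from the case
BREAK_EVEN = MAILING_COST / MARGIN
SCALE = 500_000 / 50_000          # remaining customers / random sample size


def mail_flags(probs):
    """Dummy per customer: 1 if the predicted probability beats break-even."""
    return [1 if p > BREAK_EVEN else 0 for p in probs]


def expected_profit(probs, scale=SCALE):
    """Expected profit from mailing every customer above break-even,
    scaled from the sample to the remaining customers."""
    in_sample = sum(p * MARGIN - MAILING_COST for p in probs if p > BREAK_EVEN)
    return in_sample * scale


def offers_per_customer(prob_art, prob_diy, prob_cok):
    """Count how many customers would receive 0, 1, 2, or 3 offers."""
    counts = {0: 0, 1: 0, 2: 0, 3: 0}
    flags = zip(mail_flags(prob_art), mail_flags(prob_diy), mail_flags(prob_cok))
    for customer_flags in flags:
        counts[sum(customer_flags)] += 1
    return counts


# Hypothetical predicted probabilities for four customers (not case data).
prob_art = [0.02, 0.10, 0.30, 0.05]
prob_diy = [0.09, 0.01, 0.20, 0.04]
prob_cok = [0.15, 0.12, 0.25, 0.03]

print(mail_flags(prob_art))                               # [0, 1, 1, 0]
print(offers_per_customer(prob_art, prob_diy, prob_cok))  # {0: 1, 1: 0, 2: 2, 3: 1}
print(round(expected_profit(prob_art), 2))
```

Counts from `offers_per_customer` refer to the sample; multiplying them by the same scale factor gives the corresponding numbers for the 500,000 remaining consumers.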
Part II: Marketing With a Limited Budget
You find out that the marketing budget has been cut and that only $50,000 remains allocated to
this project. This means that you can only send out 100,000 offers. (Hint: you may make all
calculations assuming that your marketing budget permits you to target 10,000 consumers in
the random sample and then multiply the predicted profits by 10.)
Dave Lawton tells you that on the basis of your profitability calculations for the three products,
one product seems to stand out. He suggests that you focus on only that product and use the
targeting from your logistic analysis to send one offer for that product to each of 100,000 consumers.
1. For each offer, what is the average predicted purchase probability for the “best” quintile, i.e.
the best 10,000 consumers?
(Name the quintile categorical variables “quin_art”, “quin_diy”, and “quin_cok”).
2. Which product do you pick and how profitable do you expect the offer to be? How much
more profitable is offering this product compared to either of the other two products?
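
The quintile calculation can be sketched as follows. This is an illustration under stated assumptions: the probabilities are invented, the function names are hypothetical, and quintile 1 is defined here as the group with the highest predicted probabilities (statistical packages may number quintiles in the opposite direction).

```python
MARGIN = 6.00        # 18.00 - 9.00 - 3.00, from the case
MAILING_COST = 0.50  # from the case


def assign_quintiles(probs):
    """Label each customer 1 (highest probabilities) through 5 (lowest)."""
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    quintile = [0] * n
    for rank, i in enumerate(order):
        quintile[i] = rank * 5 // n + 1
    return quintile


def best_quintile_mean(probs):
    """Average predicted purchase probability within quintile 1."""
    labels = assign_quintiles(probs)
    best = [p for p, q in zip(probs, labels) if q == 1]
    return sum(best) / len(best)


def best_quintile_profit(probs, scale=10):
    """Expected profit from mailing only quintile 1, scaled by 10 as in the hint."""
    labels = assign_quintiles(probs)
    best = [p for p, q in zip(probs, labels) if q == 1]
    return scale * sum(p * MARGIN - MAILING_COST for p in best)


# Hypothetical predicted probabilities for ten customers (not case data).
probs = [0.05, 0.40, 0.10, 0.02, 0.30, 0.08, 0.01, 0.20, 0.03, 0.06]

print(assign_quintiles(probs))  # [4, 1, 2, 5, 1, 3, 5, 2, 4, 3]
print(round(best_quintile_mean(probs), 4))
print(round(best_quintile_profit(probs), 2))
```

Running the same functions on “prob_art”, “prob_diy”, and “prob_cok” in turn gives the three numbers that question 2 asks you to compare.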
Technical Note:
purch is excluded from the set of predictor variables. Including it would lead to perfect
collinearity, since purch (the number of books purchased) is equal to the sum of the number of
books purchased in the 7 categories. Because the number of purchases in each category is
included, there is no need to include the total number of purchases.
Part III: Next Product To Buy Model
You quickly realize that the approach suggested by your boss is not optimal. Instead, you
propose to your boss that you construct a “Next Product To Buy” (NPTB) model. You decide to use
the logistic regression approach to select the best product to offer.
1. For each customer, determine the maximum of the purchase probabilities for the three
products (Name this variable “max_prob”).
What is the average predicted purchase probability for the “best” quintile? (Name the
quintile categorical variable “quin_max”.)
Compare this number to the results in II.1 (part II, question 1). Does this make sense, and if so, why?
2. Calculate how many of the best 100,000 consumers should receive the art, the do-it-yourself,
and the cooking book offer, respectively. (Hint: construct a categorical variable
“bestprod” which is “1” if the maximum purchase probability is for the art book, “2” if the
maximum purchase probability is for the do-it-yourself book, and “3” if the maximum
purchase probability is for the cooking book.)
3. Calculate the expected profits from sending each of the 100,000 customers the offer that is
best for them.
4. Compare the profit calculated in III.3 with the profit from the procedure Dave Lawton
suggested (which you calculated in II.2). Dave is so impressed with your skills that he
decides to give you a bonus equal to 20% of the incremental profits you generated. How
much bonus money do you make from having taken “Database Marketing & CRM”?
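
The NPTB steps above can be sketched in plain Python. The probabilities and the profit figures passed to `bonus` are invented for illustration, the function names are hypothetical, and breaking ties in favor of the first product is an assumption the case does not specify.

```python
from collections import Counter

MARGIN = 6.00        # 18.00 - 9.00 - 3.00, from the case
MAILING_COST = 0.50  # from the case


def next_product_to_buy(prob_art, prob_diy, prob_cok):
    """Return (max_prob, bestprod); bestprod is 1 = art, 2 = DIY, 3 = cooking."""
    max_prob, bestprod = [], []
    for probs in zip(prob_art, prob_diy, prob_cok):
        top = max(probs)
        max_prob.append(top)
        bestprod.append(probs.index(top) + 1)  # ties go to the first product
    return max_prob, bestprod


def nptb_profit(max_prob, n_target, scale=10):
    """Expected profit from sending the n_target best customers their best offer."""
    top = sorted(max_prob, reverse=True)[:n_target]
    return scale * sum(p * MARGIN - MAILING_COST for p in top)


def bonus(nptb, single_product, share=0.20):
    """Bonus as a share of the incremental profit over the single-product plan."""
    return share * (nptb - single_product)


# Hypothetical predicted probabilities for four customers (not case data).
prob_art = [0.02, 0.10, 0.30, 0.05]
prob_diy = [0.09, 0.01, 0.20, 0.04]
prob_cok = [0.15, 0.12, 0.25, 0.03]

max_prob, bestprod = next_product_to_buy(prob_art, prob_diy, prob_cok)
print(bestprod)           # [3, 3, 1, 1]
print(Counter(bestprod))  # offers per product across all customers
print(round(nptb_profit(max_prob, n_target=2), 2))
```

Because “max_prob” is at least as large as each individual product probability for every customer, its best-quintile average cannot fall below the best-quintile average for any single product.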

Follow the instructions in the case. Do not include superfluous information (in particular, please do not paste the entire SPSS output from the logistic regressions); use your judgment about what is relevant and what is not.

In Part III.2 you are asked to construct a categorical variable bestprod that equals 1 if max_prob is for the art book, 2 for the DIY book, and 3 for the cookbook. To do that, go to Transform > Compute Variable and define a value for bestprod (say, 1). Then go to the bottom of the dialog and, under the include-case option, specify the condition (e.g. if max_prob = prob_art). Repeat for the remaining values, answering “yes” when prompted to change the existing variable.

