Question 1:

When mean imputation is performed on data after the data is partitioned for honest assessment, what is the most appropriate method for handling the mean imputation?

A. The sample means from the validation data set are applied to the training and test data sets.

B. The sample means from the training data set are applied to the validation and test data sets.

C. The sample means from the test data set are applied to the training and validation data sets.

D. The sample means from each partition of the data are applied to their own partition.

Correct Answer: B
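
To illustrate why the training means are reused, here is a minimal Python sketch. The helper names `fit_means` and `impute` and the `income` column are hypothetical, chosen only for this example. The means are estimated once from the training partition and then applied unchanged to the other partitions.

```python
from statistics import mean

def fit_means(rows, columns):
    """Compute per-column means from the non-missing training values only."""
    return {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}

def impute(rows, means):
    """Replace missing values (None) with the supplied means."""
    return [
        {c: (means[c] if (v is None and c in means) else v) for c, v in r.items()}
        for r in rows
    ]

# Hypothetical toy partitions; None marks a missing value.
train = [{"income": 40.0}, {"income": None}, {"income": 60.0}]
valid = [{"income": None}, {"income": 90.0}]

train_means = fit_means(train, ["income"])    # estimated on training data only
train_imputed = impute(train, train_means)
valid_imputed = impute(valid, train_means)    # validation reuses the training mean
```

Because the validation rows never contribute to the imputed value, the validation partition stays an honest stand-in for new data.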

Question 2:

In partitioning data for model assessment, which sampling methods are acceptable? (Choose two.)

A. Simple random sampling without replacement

B. Simple random sampling with replacement

C. Stratified random sampling without replacement

D. Sequential random sampling with replacement

Correct Answer: AC

Question 3:

Which SAS program will divide the original data set into 60% training and 40% validation data sets, stratified by county?

A. Option A

B. Option B

C. Option C

D. Option D

Correct Answer: C
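
The answer options for this question are not reproduced here, so the following is not the SAS program from option C. It is only a small Python sketch of the underlying idea: a 60/40 split drawn without replacement inside each stratum. The function name `stratified_split`, the seed, and the toy data are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(rows, strata_key, train_fraction, seed=12345):
    """Split rows into training and validation sets within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[row[strata_key]].append(row)

    train, valid = [], []
    for members in by_stratum.values():
        shuffled = members[:]          # copy so the input order is untouched
        rng.shuffle(shuffled)          # random order, each row used exactly once
        cut = round(train_fraction * len(shuffled))
        train.extend(shuffled[:cut])
        valid.extend(shuffled[cut:])
    return train, valid

# Hypothetical data: 10 rows in county "A", 20 rows in county "B".
rows = [{"id": i, "county": "A" if i < 10 else "B"} for i in range(30)]
train, valid = stratified_split(rows, "county", 0.6)
```

Each county contributes 60% of its own rows to training, so the county mix is the same in both partitions.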

Question 4:

Refer to the lift chart:

What does the reference line at lift = 1 correspond to?

A. The predicted lift for the best 50% of validation data cases

B. The predicted lift if the entire population is scored as event cases

C. The predicted lift if none of the population are scored as event cases

D. The predicted lift if 50% of the population are randomly scored as event cases

Correct Answer: B
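
The lift chart itself is not reproduced here. As a rough check of the answer, the sketch below computes lift as the event rate among the top-scored cases divided by the overall event rate. The function name `lift_at_depth` and the toy scores are hypothetical.

```python
def lift_at_depth(scores, targets, depth):
    """Event rate in the top `depth` fraction of scored cases over the overall rate."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    n_selected = max(1, round(depth * len(ranked)))
    selected = ranked[:n_selected]
    rate_selected = sum(t for _, t in selected) / n_selected
    rate_overall = sum(targets) / len(targets)
    return rate_selected / rate_overall

# Hypothetical scores and outcomes (1 = event, 0 = non-event).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
targets = [1, 1, 0, 1, 0, 0, 0, 0]

top_quarter = lift_at_depth(scores, targets, 0.25)  # only the best-scored cases
full = lift_at_depth(scores, targets, 1.0)          # everyone scored as an event
```

At a depth of 100% the selected group is the whole population, so its event rate equals the overall rate and the lift is exactly 1.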

Question 5:

Suppose training data are oversampled in the event group to make the number of events and non-events roughly equal. A logistic regression is run and the probabilities are output to a data set NEW and given the variable name PE. A decision rule considered is, “Classify data as an event if probability is greater than 0.5.” Also the data set NEW contains a variable TG that indicates whether there is an event (1=Event, 0=No event).

The following SAS program was used.

What does this program calculate?

A. Depth

B. Sensitivity

C. Specificity

D. Positive predictive value

Correct Answer: B
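
The SAS program referred to in the question is not reproduced here, so the sketch below does not reconstruct it. It only shows, in Python, what sensitivity means for the stated decision rule: the share of actual events (TG=1) whose probability PE exceeds 0.5. The toy values are hypothetical.

```python
def sensitivity(pe, tg, cutoff=0.5):
    """True positives divided by all actual events."""
    true_pos = sum(1 for p, t in zip(pe, tg) if t == 1 and p > cutoff)
    actual_pos = sum(1 for t in tg if t == 1)
    return true_pos / actual_pos

# Hypothetical predicted probabilities (PE) and outcomes (TG).
pe = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]
tg = [1, 1, 1, 0, 0, 0]

sens = sensitivity(pe, tg)  # two of the three events are classified as events
```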


Question 6:

Assume a $10 cost for soliciting a non-responder and a $200 profit for soliciting a responder. The logistic regression model gives a probability score named P_R on a SAS data set called VALID. The VALID data set contains the responder variable Pinch, a 1/0 variable coded as 1 for responder. Customers will be solicited when their probability score is more than 0.05.

Which SAS program computes the profit for each customer in the data set VALID?

A. Option A

B. Option B

C. Option C

D. Option D

Correct Answer: A
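
The answer options are not reproduced here, so this is not the program from option A. The Python sketch below only restates the profit rule from the question, assuming the $200 is the net profit for a solicited responder and that customers who are not solicited contribute nothing. The function name `customer_profit` is hypothetical.

```python
def customer_profit(p_r, responder, cutoff=0.05,
                    profit_responder=200.0, cost_nonresponder=10.0):
    """Profit for one customer under the 'solicit if P_R > cutoff' rule."""
    if p_r <= cutoff:
        return 0.0                    # not solicited: no cost, no profit
    if responder == 1:
        return profit_responder       # solicited responder
    return -cost_nonresponder         # solicited non-responder

# Hypothetical rows: (probability score, responder flag).
valid = [(0.30, 1), (0.10, 0), (0.02, 1), (0.01, 0)]
profits = [customer_profit(p, r) for p, r in valid]
total_profit = sum(profits)
```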

Question 7:

Refer to the exhibit:

Based upon the comparative ROC plot for two competing models, which is the champion model and why?

A. Candidate 1, because the area outside the curve is greater

B. Candidate 2, because the area under the curve is greater

C. Candidate 1, because it is closer to the diagonal reference curve

D. Candidate 2, because it shows less over fit than Candidate 1

Correct Answer: B

Question 8:

A marketing campaign will send brochures describing an expensive product to a set of customers. The cost for mailing and production per customer is $50. The company makes $500 revenue for each sale. What is the profit matrix for a typical person in the population?

A. Option A

B. Option B

C. Option C

D. Option D

Correct Answer: C

Question 9:

This question will ask you to provide missing code segments.

A logistic regression model was fit on a data set where 40% of the outcomes were events (TARGET=1) and 60% were non-events (TARGET=0). The analyst knows that the population where the model will be deployed has 5% events and 95% non-events. The analyst also knows that the company’s profit margin for correctly targeted events is nine times higher than the company’s loss for incorrectly targeted non-events.

Given the following SAS program:

What X and Y values should be added to the program to correctly score the data?

A. X=40, Y=10

B. X=.05, Y=10

C. X=.05, Y=.40

D. X=.10, Y=.05

Correct Answer: B
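
The SAS program is not reproduced here, so where X and Y appear in it is not reconstructed. The Python sketch below only works through the two quantities behind the answer, assuming a profit-to-loss ratio of 9 to 1: the prior correction from a 40% sample event rate to a 5% population rate, and the profit-based cutoff 1/(1+9) = 1/10. The function names are hypothetical.

```python
def adjust_for_priors(p_sample, sample_event_rate=0.40, population_event_rate=0.05):
    """Rescale a probability from the oversampled data to the population priors."""
    event_term = p_sample * population_event_rate / sample_event_rate
    nonevent_term = (
        (1 - p_sample) * (1 - population_event_rate) / (1 - sample_event_rate)
    )
    return event_term / (event_term + nonevent_term)

def profit_cutoff(profit_to_loss_ratio=9.0):
    """Target when p * ratio - (1 - p) > 0, that is, when p > 1 / (1 + ratio)."""
    return 1.0 / (1.0 + profit_to_loss_ratio)

cutoff = profit_cutoff()                 # 1 / 10
p_population = adjust_for_priors(0.40)   # a score equal to the sample base rate
decision = p_population > cutoff         # target this case or not
```

A case scored at the sample base rate of 0.40 maps back to the population base rate of about 0.05, which is below the 0.10 cutoff, so it would not be targeted.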

Question 10:

A company has branch offices in eight regions. Customers within each region are classified as either “High Value” or “Medium Value” and are coded using the variable name VALUE. In the last year, the total amount of purchases per customer is used as the response variable.

Suppose there is a significant interaction between REGION and VALUE. What can you conclude?

A. More high value customers are found in some regions than others.

B. The difference between average purchases for medium and high value customers depends on the region.

C. Regions with higher average purchases have more high value customers.

D. Regions with higher average purchases have more medium value customers.

Correct Answer: B

Author: CertBus