
SAS Certified Statistical Business Analyst Using SAS 9: Regression and Modeling Practice Exam


The SAS Certified Statistical Business Analyst Using SAS 9: Regression and Modeling exam (A00-240) validates your ability to use SAS 9 software for statistical analysis, with a focus on regression and modeling techniques. It is designed for people who want to use SAS 9 for data-driven decision making in business settings.


Who Should Take This Exam?

This exam is ideal for:

  • Business Analysts
  • Statisticians/Data Analysts
  • Those pursuing a career in Business Intelligence or Data Science


Exam Details 

  • Exam Code: A00-240
  • Exam Name: SAS Statistical Business Analysis Using SAS®9: Regression and Modeling Exam
  • Exam Languages: English
  • Exam Questions: 60 Questions
  • Time: 2 hours
  • Passing Score: 68%


Exam Course Outline 

The exam covers the following topics:

Domain 1: Overview of ANOVA - 10%

Verify the assumptions of ANOVA

  • Explain the central limit theorem and when it must be applied
  • Examine the distribution of continuous variables (histogram, box-and-whisker, and Q-Q plots)
  • Describe the effect of skewness on the normal distribution
  • Define H0, H1, Type I/II error, statistical power, p-value
  • Describe the effect of sample size on p-value and power
  • Interpret the results of hypothesis testing
  • Interpret histograms and normal probability charts
  • Draw conclusions about your data from histograms, box-and-whisker plots, and Q-Q plots
  • Identify the kinds of problems that may be present in the data (biased samples, outliers, extreme values)
  • For a given experiment, verify that the observations are independent
  • For a given experiment, verify the errors are normally distributed
  • Use the UNIVARIATE procedure to examine residuals
  • For a given experiment, verify all groups have equal response variance
  • Use the HOVTEST option of the MEANS statement in PROC GLM to assess whether response variances are equal across groups


Analyze differences between population means using the GLM and TTEST procedures

  • Use the GLM Procedure to perform ANOVA
  • Evaluate the null hypothesis using the output of the GLM procedure
  • Interpret the statistical output of the GLM procedure (variance derived from MSE, F value, p-value, R-squared, Levene's test)
  • Interpret the graphical output of the GLM procedure
  • Use the TTEST Procedure to compare means


Perform ANOVA post hoc test to evaluate treatment effect

  • Use the LSMEANS statement in the GLM or PLM procedure to perform pairwise comparisons
  • Use the PDIFF option of the LSMEANS statement
  • Use the ADJUST= option of the LSMEANS statement (TUKEY and DUNNETT)
  • Interpret diffograms to evaluate pairwise comparisons
  • Interpret control plots to evaluate pairwise comparisons
  • Compare and contrast pairwise t-tests with the Tukey and Dunnett comparison methods


Detect and analyze interactions between factors

  • Use the GLM procedure to produce reports that help determine the significance of the interaction between factors (MODEL statement; LSMEANS statement with the SLICE= option, also in PROC PLM; ODS SELECT)
  • Interpret the output of the GLM procedure to identify interactions between factors (p-value, F value, R-squared, Type I SS, Type III SS)


Domain 2: Understand Linear Regression - 20%

Fit a multiple linear regression model using the REG and GLM procedures

  • Use the REG procedure to fit a multiple linear regression model
  • Use the GLM procedure to fit a multiple linear regression model


Analyze the output of the REG, PLM, and GLM procedures for multiple linear regression models

  • Interpret REG or GLM procedure output for a multiple linear regression model
  • Convert models to algebraic expressions
  • Identify missing degrees of freedom
  • Identify variance due to model/error, and total variance
  • Calculate a missing F value
  • Identify the variable with the largest impact on the model
  • For output from two models, identify which model is better
  • Identify how much of the variation in the dependent variable is explained by the model
  • Draw conclusions from REG, GLM, or PLM output (about H0, model quality, and graphics)
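Several items above (missing degrees of freedom, a missing F value, the share of variation explained) come down to ANOVA-table arithmetic. The Python sketch below assumes a model with an intercept, n observations, and p predictors; the function name `fill_anova_table` and the sums of squares in the example are made up for illustration and are not SAS output.

```python
def fill_anova_table(ss_model, ss_error, n_obs, n_predictors):
    """Derive the remaining ANOVA quantities for a multiple linear regression.

    Assumes the model has an intercept, so df_model = p and df_error = n - p - 1.
    """
    df_model = n_predictors
    df_error = n_obs - n_predictors - 1
    df_total = n_obs - 1                      # equals df_model + df_error
    ss_total = ss_model + ss_error            # total variation = model + error
    ms_model = ss_model / df_model            # mean square for the model
    ms_error = ss_error / df_error            # MSE, the estimate of error variance
    r_square = ss_model / ss_total            # share of variation explained
    adj_r_square = 1 - (1 - r_square) * df_total / df_error
    return {
        "df_model": df_model,
        "df_error": df_error,
        "df_total": df_total,
        "ss_total": ss_total,
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f_value": ms_model / ms_error,       # F = MS(model) / MS(error)
        "r_square": r_square,
        "adj_r_square": adj_r_square,
    }


# Hypothetical example: 25 observations, 4 predictors
table = fill_anova_table(ss_model=60.0, ss_error=40.0, n_obs=25, n_predictors=4)
print(table["f_value"], table["r_square"])  # 7.5 0.6
```

Working backward works the same way: given an F value and MSE, the model mean square is their product, and multiplying by df_model recovers the model sum of squares.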


Use the REG or GLMSELECT procedure to perform model selection

  • Use the SELECTION option of the model statement in the GLMSELECT procedure
  • Compare the different model selection methods (STEPWISE, FORWARD, BACKWARD)
  • Enable ODS graphics to display graphs from the REG or GLMSELECT procedure
  • Identify the best models by examining the graphical output (fit criteria from the REG or GLMSELECT procedure)
  • Assign names to models in the REG procedure (multiple model statements)


Assess the validity of a given regression model through the use of diagnostic and residual analysis

  • Explain the assumptions for linear regression
  • From a set of residual plots, assess which assumption about the error terms has been violated
  • Use REG procedure MODEL statement options to identify influential observations (Student Residuals, Cook's D, DFFITS, DFBETAS)
  • Explain options for handling influential observations
  • Identify collinearity problems by examining REG procedure output
  • Use MODEL statement options to diagnose collinearity problems (VIF, COLLIN, COLLINOINT)


Domain 3: Logistic Regression - 25%

Perform logistic regression with the LOGISTIC procedure

  • Identify experiments that require analysis via logistic regression
  • Identify logistic regression assumptions
  • Explain logistic regression concepts (log odds, logit transformation, sigmoidal relationship between p and X)
  • Use the LOGISTIC procedure to fit a binary logistic regression model (MODEL and CLASS statements)
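The logit link can be illustrated in a few lines of Python. The coefficients below are hypothetical, chosen only to show how a linear predictor on the log-odds scale maps to the S-shaped curve for p, and how exponentiating a slope gives an odds ratio.

```python
import math


def logit(p):
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(eta):
    """Map a linear predictor (log odds) back to a probability: the sigmoid curve."""
    return 1 / (1 + math.exp(-eta))


# Hypothetical fitted model: logit(p) = -2.0 + 0.5 * x
intercept, slope = -2.0, 0.5
for x in (0, 4, 8):
    p = inverse_logit(intercept + slope * x)
    print(x, round(p, 3))

# exp(slope) is the multiplicative change in the odds for a one-unit increase in x
print(round(math.exp(slope), 3))
```

Equal steps in x give equal steps in log odds but unequal steps in p, which is the sigmoidal relationship the bullet refers to.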


Optimize model performance through input selection

  • Use the LOGISTIC procedure to fit a multiple logistic regression model
  • Use the SELECTION=SCORE option in the LOGISTIC procedure
  • Perform Model Selection (STEPWISE, FORWARD, BACKWARD) within the LOGISTIC procedure


Interpret the output of the LOGISTIC procedure

  • Interpret the output from the LOGISTIC procedure for binary logistic regression models (Model Convergence section, Testing Global Null Hypothesis table, Type 3 Analysis of Effects table, Analysis of Maximum Likelihood Estimates table, Association of Predicted Probabilities and Observed Responses table)
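The Association of Predicted Probabilities and Observed Responses table is built from pairs that combine one event with one non-event. A minimal Python sketch of the pair counting, the c statistic, and Somers' D follows; the function name is hypothetical, and PROC LOGISTIC may group nearby predicted probabilities before counting ties, so its counts can differ slightly from an exact comparison like this one.

```python
from itertools import product


def association_stats(probs, outcomes):
    """Concordant/discordant/tied pair counts, c statistic, and Somers' D.

    Each pair combines one event (outcome 1) with one non-event (outcome 0).
    A pair is concordant when the event has the higher predicted probability.
    """
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = discordant = tied = 0
    for p_event, p_nonevent in product(events, nonevents):
        if p_event > p_nonevent:
            concordant += 1
        elif p_event < p_nonevent:
            discordant += 1
        else:
            tied += 1
    pairs = concordant + discordant + tied
    c_stat = (concordant + 0.5 * tied) / pairs      # area under the ROC curve
    somers_d = (concordant - discordant) / pairs    # equals 2c - 1
    return concordant, discordant, tied, c_stat, somers_d


print(association_stats([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

In the example, three of the four event/non-event pairs are ordered correctly, so c is 0.75 and Somers' D is 0.5.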


Score new data sets using the LOGISTIC and PLM procedures

  • Use the SCORE statement in the PLM procedure to score new cases
  • Use the CODE statement in PROC LOGISTIC to score new data
  • Describe when you would use the SCORE statement vs the CODE statement in PROC LOGISTIC
  • Use the INMODEL/OUTMODEL options in PROC LOGISTIC
  • Explain how to score new data when you have developed a model from a biased sample
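When the training sample oversamples events, its predicted probabilities are inflated relative to the population and should be corrected before scoring. One standard correction rescales the event and non-event parts by the ratio of population to sample proportions; on the logit scale this is the same as adding log(pi1 * rho0 / (pi0 * rho1)) to the intercept. The sketch below assumes the population event rate `pi1` is known; the function name is hypothetical.

```python
def adjust_for_oversampling(p_sample, pi1, rho1):
    """Correct a predicted probability from an oversampled training set.

    p_sample: probability predicted by the model fit on the biased sample
    pi1:      proportion of events in the population (assumed known)
    rho1:     proportion of events in the training sample
    """
    pi0, rho0 = 1 - pi1, 1 - rho1
    event_part = p_sample * pi1 / rho1              # reweight toward population event rate
    nonevent_part = (1 - p_sample) * pi0 / rho0     # reweight toward population non-event rate
    return event_part / (event_part + nonevent_part)


# Hypothetical case: events are 10% of the population but 50% of the sample
print(adjust_for_oversampling(0.5, pi1=0.1, rho1=0.5))
```

A case scored 0.5 on the balanced sample maps back to roughly the population rate of 0.1.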


Domain 4: Prepare Inputs for Predictive Model Performance - 20%

Identify the potential challenges when preparing input data for a model

  • Identify problems that missing values can cause in creating predictive models and scoring new data sets
  • Identify limitations of Complete Case Analysis
  • Explain problems caused by categorical variables with numerous levels
  • Discuss the problem of redundant variables
  • Discuss the problem of irrelevant and redundant variables
  • Discuss non-linearities and the problems they create in predictive models
  • Discuss outliers and the problems they create in predictive models
  • Describe quasi-complete separation
  • Discuss the effect of interactions
  • Determine when it is necessary to oversample data


Use the DATA step to manipulate data with loops, arrays, conditional statements and functions

  • Use ARRAYs to create missing indicators
  • Use ARRAY statements, DO loops, IF statements, and explicit OUTPUT statements
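In SAS, missing indicators are usually created with one array of inputs, one array of flags, and a DO loop over both. The Python sketch below mirrors that loop rather than showing SAS syntax; the `MI_` prefix, the function name, and the use of `None` or NaN to mark missing values are assumptions made for illustration.

```python
import math


def add_missing_indicators(rows, inputs):
    """For each named input, add a 0/1 flag MI_<name> marking missing values.

    Missing is represented here as None or float('nan'); the original
    values are left unchanged so they can be imputed separately.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        for name in inputs:                   # plays the role of the DO loop over the array
            value = row.get(name)
            is_missing = value is None or (isinstance(value, float) and math.isnan(value))
            new_row["MI_" + name] = int(is_missing)
        out.append(new_row)
    return out


rows = [{"age": 34, "income": None}, {"age": float("nan"), "income": 52000}]
for r in add_missing_indicators(rows, ["age", "income"]):
    print(r["MI_age"], r["MI_income"])
```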


Improve the predictive power of categorical inputs

  • Reduce the number of levels of a categorical variable
  • Explain thresholding
  • Explain Greenacre's method
  • Cluster the levels of a categorical variable via Greenacre's method using the CLUSTER procedure
  • Convert categorical variables to continuous inputs using smoothed weight of evidence
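Smoothed weight of evidence replaces each level of a categorical input with a log-odds value that is shrunk toward the overall event rate, so rare levels do not receive extreme values. The sketch below uses one common formulation, log((events + rho1*c) / (non-events + (1 - rho1)*c)), where rho1 is the overall event proportion and c is a smoothing constant; treat the exact formula, the function name, and the value of c as assumptions rather than the course's definition.

```python
import math
from collections import Counter


def smoothed_woe(levels, outcomes, smoothing):
    """Map each level of a categorical input to a smoothed log-odds value.

    levels:    category value for each case
    outcomes:  0/1 target for each case
    smoothing: constant c; larger values pull every level harder toward
               the overall log odds
    """
    rho1 = sum(outcomes) / len(outcomes)           # overall event proportion, must be in (0, 1)
    totals = Counter(levels)
    events = Counter(lvl for lvl, y in zip(levels, outcomes) if y == 1)
    woe = {}
    for lvl, n in totals.items():
        n_event = events[lvl]                      # Counter returns 0 for unseen keys
        n_nonevent = n - n_event
        woe[lvl] = math.log((n_event + rho1 * smoothing) /
                            (n_nonevent + (1 - rho1) * smoothing))
    return woe


levels = ["A"] * 10 + ["B"] * 10
outcomes = [1] * 3 + [0] * 7 + [1] * 1 + [0] * 9
print(smoothed_woe(levels, outcomes, smoothing=10))
```

With zero counts the formula reduces to log(rho1 / (1 - rho1)), the overall log odds, which is a reasonable value for a level that appears only at scoring time.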


Screen variables for irrelevance and non-linear association using the CORR procedure

  • Explain how Hoeffding's D and Spearman statistics can be used to find irrelevant variables and non-linear associations
  • Produce Spearman and Hoeffding's D statistics using the CORR procedure (VAR and WITH statements)
  • Interpret a scatter plot of Hoeffding's D and Spearman statistics to identify irrelevant variables and non-linear associations


Screen variables for non-linearity using empirical logit plots

  • Use the RANK procedure to bin continuous input variables (GROUPS= and OUT= options; VAR and RANKS statements)
  • Interpret RANK procedure output
  • Use the MEANS procedure to calculate the sum and means for the target cases and total events (NWAY option; CLASS, VAR, OUTPUT statements)
  • Create empirical logit plots with the SGPLOT procedure
  • Interpret empirical logit plots
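An empirical logit plot bins a continuous input, then plots each bin's mean input value against the bin's log odds of the event. The Python sketch below shows the computation behind the plot. It assumes equal-frequency bins and adds sqrt(n)/2 to both the event and non-event counts so that bins with no events or no non-events stay finite; ties are not handled the way PROC RANK handles them, and the function name is hypothetical.

```python
import math


def empirical_logits(x, y, groups):
    """Return (mean x, empirical logit) for each of up to `groups` equal-frequency bins."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])   # case indices in increasing x order
    bins = [[] for _ in range(groups)]
    for rank, i in enumerate(order):
        bins[rank * groups // n].append(i)         # rank-based bin number 0..groups-1
    result = []
    for members in bins:
        if not members:
            continue
        m = len(members)
        events = sum(y[i] for i in members)
        mean_x = sum(x[i] for i in members) / m
        adj = math.sqrt(m) / 2                     # keeps the log finite for pure bins
        result.append((mean_x, math.log((events + adj) / (m - events + adj))))
    return result


x = list(range(8))
y = [0, 0, 0, 1, 0, 1, 1, 1]
print(empirical_logits(x, y, groups=2))
```

A roughly straight pattern of points suggests the input enters the logit linearly; clear curvature suggests a transformation or binning is worth considering.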


Domain 5: Measure Model Performance - 25%

Apply the principles of honest assessment to model performance measurement

  • Explain techniques to honestly assess classifier performance
  • Explain overfitting
  • Explain differences between validation and test data
  • Identify the impact of performing data preparation before data is split


Assess classifier performance using the confusion matrix

  • Explain the confusion matrix
  • Define: Accuracy, Error Rate, Sensitivity, Specificity, PV+, PV-
  • Explain the effect of oversampling on the confusion matrix
  • Adjust the confusion matrix for oversampling
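The confusion-matrix definitions and the oversampling adjustment can be written out directly. Because sensitivity and specificity condition on the actual class, they carry over from an oversampled sample unchanged; the sketch rebuilds the matrix as population proportions from an assumed population event rate `pi1` and recomputes accuracy and the predictive values. The function names and counts are hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard measures from the four cells (counts or proportions)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "sensitivity": tp / (tp + fn),   # true positive rate among actual events
        "specificity": tn / (tn + fp),   # true negative rate among actual non-events
        "pv_plus": tp / (tp + fp),       # share of predicted events that are events
        "pv_minus": tn / (tn + fn),      # share of predicted non-events that are non-events
    }


def adjust_for_priors(sensitivity, specificity, pi1):
    """Rebuild the confusion matrix as population proportions and recompute measures."""
    pi0 = 1 - pi1
    tp = pi1 * sensitivity
    fn = pi1 * (1 - sensitivity)
    tn = pi0 * specificity
    fp = pi0 * (1 - specificity)
    return confusion_metrics(tp, fn, fp, tn)


sample = confusion_metrics(tp=40, fn=10, fp=20, tn=30)   # balanced, oversampled sample
population = adjust_for_priors(sample["sensitivity"], sample["specificity"], pi1=0.1)
print(round(sample["pv_plus"], 3), round(population["pv_plus"], 3))
```

In this example PV+ drops from about 0.667 on the balanced sample to about 0.182 in a population where only 10% of cases are events.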


Model selection and validation using training and validation data

  • Divide data into training and validation data sets using the SURVEYSELECT procedure
  • Discuss the subset selection methods available in PROC LOGISTIC
  • Discuss methods to determine interactions (forward selection, with bar and @ notation)
  • Create interaction plot with the results from PROC LOGISTIC
  • Select a model using fit statistics (BIC, AIC, KS, Brier score)
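A stratified split keeps the event rate the same in the training and validation sets; in SAS this is commonly done with PROC SURVEYSELECT and a STRATA statement. The Python analogue below is only a sketch: the function name, the seed, and the training fraction are illustrative assumptions, not SAS defaults.

```python
import random
from collections import defaultdict


def stratified_split(outcomes, train_fraction, seed):
    """Return (train_indices, valid_indices), sampling separately within each outcome level."""
    rng = random.Random(seed)                 # fixed seed makes the split reproducible
    by_level = defaultdict(list)
    for i, y in enumerate(outcomes):
        by_level[y].append(i)
    train, valid = [], []
    for level, indices in sorted(by_level.items()):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:n_train])
        valid.extend(shuffled[n_train:])
    return sorted(train), sorted(valid)


outcomes = [1] * 20 + [0] * 80
train, valid = stratified_split(outcomes, train_fraction=0.7, seed=27513)
print(len(train), sum(outcomes[i] for i in train))
```

Any data preparation that learns from the data (imputation values, level clustering, weight of evidence) should then be fit on the training indices only and applied to the validation set.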


Create and interpret graphs (ROC, lift, and gains charts) for model comparison and selection

  • Explain and interpret charts (ROC, Lift, Gains)
  • Create a ROC curve (OUTROC option of the SCORE statement in the LOGISTIC procedure)
  • Use the ROC and ROCCONTRAST statements to create an overlay plot of ROC curves for two or more models
  • Explain the concept of depth as it relates to the gains chart


Establish effective decision cut-off values for scoring

  • Illustrate a decision rule that maximizes the expected profit
  • Explain the profit matrix and how to use it to estimate the profit per scored customer
  • Calculate decision cutoffs using Bayes rule, given a profit matrix
  • Determine optimum cutoff values from profit plots
  • Given a profit matrix and model results, determine the model with the highest average profit
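With a 2x2 profit matrix, Bayes rule says to decide "event" whenever the expected profit of doing so is higher. Writing P_ad for the profit when the actual class is a and the decision is d, this gives the cutoff (P00 - P01) / ((P11 - P10) + (P00 - P01)). The sketch below computes that cutoff and the average profit per scored case; the function names and profit values are hypothetical.

```python
def bayes_cutoff(profit):
    """Probability above which deciding 'event' has the higher expected profit.

    profit[(actual, decision)] holds the profit for each cell, 1 = event, 0 = non-event.
    """
    event_gain = profit[(1, 1)] - profit[(1, 0)]      # gain from deciding 1 when actual is 1
    nonevent_gain = profit[(0, 0)] - profit[(0, 1)]   # gain from deciding 0 when actual is 0
    return nonevent_gain / (event_gain + nonevent_gain)


def average_profit(probs, outcomes, profit, cutoff):
    """Average realized profit per scored case when deciding 1 for p >= cutoff."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        decision = 1 if p >= cutoff else 0
        total += profit[(y, decision)]
    return total / len(probs)


# Hypothetical mailing: a responder is worth 99, a wasted mailing costs 1
profit = {(1, 1): 99, (1, 0): 0, (0, 1): -1, (0, 0): 0}
cutoff = bayes_cutoff(profit)
print(cutoff)  # 0.01
print(average_profit([0.005, 0.02, 0.5], [0, 0, 1], profit, cutoff))
```

Comparing `average_profit` across candidate models on the same validation data, each at its own Bayes cutoff, is one way to pick the model with the highest average profit.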

