# R in Action Reading Notes (9) - Chapter 8: Regression - Regression Diagnostics

8.3 Regression diagnostics

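> fit <- lm(weight ~ height, data = women)  # simple regression of weight on height

> par(mfrow = c(2, 2))  # arrange the four diagnostic plots in a 2 x 2 grid

> plot(fit)  # Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage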
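Calling plot() on an lm object draws four panels by default. As a rough base-R sketch (not how plot.lm is implemented internally), the quantities behind those panels can be extracted directly with standard stats functions. The data frame name diag_df and the 2p/n leverage cutoff are illustrative choices, not part of the original notes.

```r
# Illustrative sketch; variable names (diag_df, p, n) are my own
fit <- lm(weight ~ height, data = women)

diag_df <- data.frame(
  fitted   = fitted(fit),          # x-axis of Residuals vs Fitted
  resid    = residuals(fit),       # y-axis of Residuals vs Fitted
  std_res  = rstandard(fit),       # used by the Q-Q and Scale-Location panels
  leverage = hatvalues(fit),       # x-axis of Residuals vs Leverage
  cooks    = cooks.distance(fit)   # Cook's distance (dashed contours in Residuals vs Leverage)
)

# Scale-Location plots sqrt(|standardized residual|) against the fitted values
diag_df$sqrt_abs_std <- sqrt(abs(diag_df$std_res))
print(round(diag_df, 3))

# A common heuristic flags hat values above 2 * p / n as high leverage
p <- length(coef(fit))
n <- nrow(women)
print(which(diag_df$leverage > 2 * p / n))
```

The hat values always sum to the number of coefficients p, which is why 2 * p / n (twice the average leverage) is a natural reference point.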

8.3.2 An enhanced approach

The car package provides the following functions for regression diagnostics:

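qqPlot(): quantile comparison plot

durbinWatsonTest(): Durbin-Watson test for autocorrelated errors

crPlots(): component-plus-residual plots

ncvTest(): score test for non-constant error variance

outlierTest(): Bonferroni outlier test

avPlots(): added-variable plots

influencePlot(): regression influence plot

scatterplot(): enhanced scatter plot

scatterplotMatrix(): enhanced scatter plot matrix

vif(): variance inflation factors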
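outlierTest() is listed above but not demonstrated in these notes. The idea behind a Bonferroni outlier test can be sketched in base R as follows (a sketch of the technique, not car's source): each studentized residual is compared with a t distribution on n - p - 1 degrees of freedom, and the two-sided p-value is multiplied by n to account for testing every observation. The model is the states model used in the following sections; names such as p_bonf and worst are illustrative.

```r
# Illustrative sketch; variable names (rs, df_t, p_bonf, worst) are my own
states <- data.frame(state.region, state.x77)
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

rs <- rstudent(fit)                       # externally studentized residuals
n <- length(rs)
df_t <- fit$df.residual - 1               # under the null, each is t with n - p - 1 df
p_unadj <- 2 * pt(-abs(rs), df = df_t)    # two-sided unadjusted p-values
p_bonf <- pmin(1, n * p_unadj)            # Bonferroni adjustment for n tests, capped at 1

worst <- which.max(abs(rs))               # the most extreme observation
print(data.frame(rstudent = rs[worst],
                 unadjusted_p = p_unadj[worst],
                 bonferroni_p = p_bonf[worst]))
```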

1. Normality

Example:

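> library(car)

> states <- data.frame(state.region, state.x77)  # row names are the state names

> fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

> qqPlot(fit, labels = row.names(states), id.method = "identify", simulate = TRUE, main = "Q-Q Plot")  # "identify" lets you click points to label them; in car 3.x labelling is set with id = list(method = "identify")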
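As a rough illustration of what qqPlot() draws (a base-R sketch, not car's implementation): for an lm fit, car's qqPlot() by default compares the studentized residuals with a t distribution on n - p - 1 degrees of freedom, and simulate = TRUE adds a bootstrap confidence envelope, which this sketch omits. The reference line here is simply y = x, and the variable names (theo, ord, ext) are illustrative.

```r
# Illustrative sketch; variable names (rs, theo, ord, ext) are my own
states <- data.frame(state.region, state.x77)
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

rs <- rstudent(fit)
n <- length(rs)
df_t <- fit$df.residual - 1               # n - p - 1 degrees of freedom
theo <- qt(ppoints(n), df = df_t)         # theoretical t quantiles, increasing
ord <- order(rs)                          # sort residuals to pair with quantiles

plot(theo, rs[ord], xlab = "t Quantiles",
     ylab = "Studentized Residuals", main = "Q-Q Plot (base R sketch)")
abline(0, 1, lty = 2)                     # reference line y = x

# Label the two most extreme residuals by state name
ext <- order(abs(rs[ord]), decreasing = TRUE)[1:2]
text(theo[ext], rs[ord][ext], labels = names(rs)[ord][ext], pos = 4, cex = 0.7)
```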

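The following function plots a histogram of the studentized residuals, overlaid with a normal curve and a kernel density curve:

residplot <- function(fit, nbreaks = 10) {

z <- rstudent(fit)  # studentized residuals

hist(z, breaks = nbreaks, freq = FALSE,

xlab = "Studentized Residual",

main = "Distribution of Errors")

rug(jitter(z), col = "brown")  # tick marks for the individual residuals

curve(dnorm(x, mean = mean(z), sd = sd(z)),

add = TRUE, col = "blue", lwd = 2)  # fitted normal curve

lines(density(z)$x, density(z)$y,

col = "red", lwd = 2, lty = 2)  # kernel density estimate

legend("topright", legend = c("Normal Curve", "Kernel Density Curve"),

lty = 1:2, col = c("blue", "red"), cex = .7)

}

residplot(fit)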
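residplot() is built on rstudent(). To show what that function computes, here is a sketch (assuming the same states model) that recomputes the externally studentized residuals t_i = e_i / (s_(i) * sqrt(1 - h_i)) from the standard leave-one-out variance formula, without refitting the model 50 times, and checks the result against rstudent(). Names such as s2_del and t_manual are illustrative.

```r
# Illustrative sketch; variable names (s2, s2_del, t_manual) are my own
states <- data.frame(state.region, state.x77)
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

e <- residuals(fit)
h <- hatvalues(fit)
n <- nrow(model.matrix(fit))
p <- ncol(model.matrix(fit))               # number of coefficients, including intercept

s2 <- sum(e^2) / (n - p)                   # usual residual variance estimate
# Residual variance with observation i deleted, obtained without refitting
s2_del <- ((n - p) * s2 - e^2 / (1 - h)) / (n - p - 1)
t_manual <- e / sqrt(s2_del * (1 - h))

print(all.equal(t_manual, rstudent(fit)))
```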

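2. Independence of errors

The car package provides durbinWatsonTest() for the Durbin-Watson test, which detects serial correlation in the errors. Its p-value is obtained by bootstrap, so it can differ slightly from run to run.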

> durbinWatsonTest(fit)

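lag Autocorrelation D-W Statistic p-value

1 -0.2006929 2.317691 0.284

Alternative hypothesis: rho != 0

The p-value (0.284) is not significant, so there is no evidence of autocorrelation in the errors. Note that the states data have no natural time ordering, so this test is more meaningful for time-dependent data.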
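The lag-1 statistics reported above follow standard formulas, sketched here in base R (assuming the residuals are taken in the data's row order, which is how the test treats them): D-W is the sum of squared successive differences of the residuals divided by the residual sum of squares, and the lag-1 autocorrelation is the sum of products of successive residuals over the same denominator. This sketch does not reproduce the bootstrap p-value; names such as dw and r1 are illustrative.

```r
# Illustrative sketch; variable names (e, dw, r1) are my own
states <- data.frame(state.region, state.x77)
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

e <- residuals(fit)
n <- length(e)

# Durbin-Watson: sum of squared successive differences / residual sum of squares
dw <- sum(diff(e)^2) / sum(e^2)
# Lag-1 autocorrelation of the residuals, in row order
r1 <- sum(e[-1] * e[-n]) / sum(e^2)

cat("lag-1 autocorrelation:", r1, " D-W statistic:", dw, "\n")
# dw ranges from 0 to 4; values near 2 suggest no first-order autocorrelation
```

Expanding the numerator shows dw = 2 * (1 - r1) minus a small end-point correction, which is why D-W near 2 corresponds to autocorrelation near 0.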

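3. Linearity

Component-plus-residual plots (also called partial residual plots), drawn with crPlots(fit) from the car package, show whether each predictor is linearly related to the response; systematic curvature in a panel suggests a nonlinear relationship.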
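car's crPlots() draws component-plus-residual plots. A rough base-R equivalent (a sketch, not crPlots() itself) uses residuals(fit, type = "partial"), which for each term j returns e_i + b_j * (x_ij - mean(x_j)); plotting these against x_j with a straight-line fit and a lowess smoother gives a similar picture. The choice of lowess() as the smoother and the variable names are assumptions for illustration.

```r
# Illustrative sketch; variable names (pr, term, x, op) are my own
states <- data.frame(state.region, state.x77)
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

# One column per term: residual plus the centered component b_j * (x_j - mean(x_j))
pr <- residuals(fit, type = "partial")

op <- par(mfrow = c(2, 2))
for (term in colnames(pr)) {
  x <- states[[term]]
  plot(x, pr[, term], xlab = term, ylab = "Component + Residual",
       main = paste("CR plot:", term))
  abline(lm(pr[, term] ~ x), lty = 2)          # linear fit to the partial residuals
  lines(lowess(x, pr[, term]), col = "red")    # smoother; curvature hints at nonlinearity
}
par(op)
```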

4. Homoscedasticity

> library(car)

> ncvTest(fit)

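Non-constant Variance Score Test

Variance formula: ~ fitted.values

Chisquare = 1.746514 Df = 1 p = 0.1863156

The score test is not significant (p = 0.186), so the constant-variance assumption is met. The suggested power transformation comes from spreadLevelPlot(), also in the car package:

> spreadLevelPlot(fit)

Suggested power transformation: 1.209626

The suggested power is close to 1, so no transformation is needed.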
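ncvTest() with its default variance formula is a score test of whether the error variance changes with the fitted values. A base-R sketch of that technique (my reconstruction of the standard Cook-Weisberg score test, not car's source): scale the squared residuals by the maximum-likelihood variance RSS / n, regress them on the fitted values, and take half the regression sum of squares as a chi-square statistic with 1 degree of freedom. Names such as u, aux, and chisq are illustrative.

```r
# Illustrative sketch; variable names (u, aux, ss, chisq) are my own
states <- data.frame(state.region, state.x77)
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

e <- residuals(fit)
n <- length(e)
u <- e^2 / (sum(e^2) / n)            # squared residuals scaled by the ML variance RSS / n
aux <- lm(u ~ fitted(fit))           # auxiliary regression on the fitted values

ss <- anova(aux)[["Sum Sq"]]         # c(regression SS, residual SS)
chisq <- ss[1] / 2                   # score statistic: half the regression sum of squares
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

cat("Chisquare =", chisq, " Df = 1  p =", p_value, "\n")
```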

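8.3.3 Global validation of linear model assumptions

The gvlma() function in the gvlma package performs a global test of the linear model assumptions, together with separate tests for skewness, kurtosis, link function, and heteroscedasticity.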

> library(gvlma)

> gvmodel<-gvlma(fit)

> summary(gvmodel)

Call:

lm(formula = Murder ~ Population + Illiteracy + Income + Frost,

data = states)

Residuals:

Min 1Q Median 3Q Max

-4.7960 -1.6495 -0.0811 1.4815 7.6210

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 1.235e+00 3.866e+00 0.319 0.7510

Population 2.237e-04 9.052e-05 2.471 0.0173 *

Illiteracy 4.143e+00 8.744e-01 4.738 2.19e-05 ***

Income 6.442e-05 6.837e-04 0.094 0.9253

Frost 5.813e-04 1.005e-02 0.058 0.9541

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.535 on 45 degrees of freedom

F-statistic: 14.73 on 4 and 45 DF, p-value: 9.133e-08

ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS

USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:

Level of Significance = 0.05

Call:

gvlma(x = fit)

Value p-value Decision

Global Stat 2.7728 0.5965 Assumptions acceptable.

Skewness 1.5374 0.2150 Assumptions acceptable.

Kurtosis 0.6376 0.4246 Assumptions acceptable.

Link Function 0.1154 0.7341 Assumptions acceptable.

Heteroscedasticity 0.4824 0.4873 Assumptions acceptable.

All p-values exceed 0.05 (Global Stat p = 0.5965), so the model satisfies the OLS regression assumptions at the 0.05 significance level.

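8.3.4 Multicollinearity

Multicollinearity can be detected with the variance inflation factor (VIF), computed by vif() in the car package. As a rule of thumb, sqrt(VIF) > 2 indicates a multicollinearity problem.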

> library(car)

> vif(fit)

Population Illiteracy Income Frost

1.245282 2.165848 1.345822 2.082547

> sqrt(vif(fit)) > 2

Population Illiteracy Income Frost

FALSE FALSE FALSE FALSE

No predictor exceeds the threshold, so multicollinearity is not a problem in this model.
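The VIF for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The base-R sketch below (assuming the same four numeric predictors, so no factor terms are involved) computes it that way and cross-checks it against the diagonal of the inverse correlation matrix of the predictors, an equivalent shortcut. Names such as vif_manual and preds are illustrative.

```r
# Illustrative sketch; variable names (preds, vif_manual) are my own
states <- data.frame(state.region, state.x77)
preds <- c("Population", "Illiteracy", "Income", "Frost")

vif_manual <- sapply(preds, function(v) {
  others <- setdiff(preds, v)
  f <- reformulate(others, response = v)   # e.g. Illiteracy ~ Population + Income + Frost
  r2 <- summary(lm(f, data = states))$r.squared
  1 / (1 - r2)                             # variance inflation factor for predictor v
})
print(round(vif_manual, 6))

# Equivalent shortcut: diagonal of the inverse correlation matrix of the predictors
print(round(diag(solve(cor(states[preds]))), 6))

# Rule of thumb from the notes: sqrt(VIF) > 2 signals a problem
print(sqrt(vif_manual) > 2)
```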