Statistical Modeling and R Software (Xue Yi): Chapter 9 Answers
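Second principal component:
Third principal component:
Fourth principal component:
Table (26): Industries ranked by principal component score
Figure (21): Scree plot of the principal components
Figure (22): Scatter plot on the first and second principal components
Exercise code and conclusions: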
industry <- data.frame(
  X1 = c(90342, 4903, 6735, 49454, 139190, 12215, 2372, 11062, 17111, 1206, 2150, 5251, 14341),
  X2 = c(52455, 1973, 21139, 36241, 203505, 16219, 6572, 23078, 23907, 3930, 5704, 6155, 13203),
  X3 = c(101091, 2035, 3767, 81557, 215898, 10351, 8103, 54935, 52108, 6126, 6200, 10383, 19396),
  X4 = c(19272, 10313, 1780, 22504, 10609, 6382, 12329, 23804, 21796, 15586, 10870, 16875, 14691),
  X5 = c(82.0, 34.2, 36.1, 98.1, 93.2, 62.5, 184.4, 370.4, 221.5, 330.4, 184.2, 146.4, 94.6),
  X6 = c(16.1, 7.1, 8.2, 25.9, 12.6, 8.7, 22.2, 41.0, 21.5, 29.5, 12.0, 27.5, 17.8),
  X7 = c(197435, 592077, 726396, 348226, 139572, 145818, 20921, 65486, 63806, 1840, 8913, 78796, 6354),
  X8 = c(0.172, 0.003, 0.003, 0.985, 0.628, 0.066, 0.152, 0.263, 0.276, 0.437, 0.274, 0.151, 1.574))
industry.pr <- princomp(industry, cor = TRUE)
summary(industry.pr)  #### principal component analysis; the first 4 components reach a cumulative contribution of 94.68%
Importance of components:
                          Comp.1    Comp.2    Comp.3     Comp.4     Comp.5
Standard deviation     1.7620762 1.7021873 0.9644768 0.80132532 0.55143824
Proportion of Variance 0.3881141 0.3621802 0.1162769 0.08026528 0.03801052
Cumulative Proportion  0.3881141 0.7502943 0.8665712 0.94683649 0.98484701
                           Comp.6      Comp.7       Comp.8
Standard deviation     0.29427497 0.179400062 0.0494143207
Proportion of Variance 0.01082472 0.004023048 0.0003052219
Cumulative Proportion  0.99567173 0.999694778 1.0000000000
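The proportions in this table follow directly from the standard deviations: each component's variance is its squared standard deviation, and the proportion is that variance divided by the total. A minimal sketch that recomputes them from the printed values; the 90% cutoff used to pick the number of components is an illustrative assumption, not something stated in the exercise.

```r
# Standard deviations copied from the summary(industry.pr) output
sdev <- c(1.7620762, 1.7021873, 0.9644768, 0.80132532,
          0.55143824, 0.29427497, 0.179400062, 0.0494143207)

eigenvalues <- sdev^2                          # variance carried by each component
prop_var    <- eigenvalues / sum(eigenvalues)  # proportion of variance
cum_prop    <- cumsum(prop_var)                # cumulative proportion

# Smallest number of components whose cumulative share reaches 90%
# (the 0.90 threshold is an assumption made for this illustration)
n_keep <- which(cum_prop >= 0.90)[1]

round(data.frame(eigenvalue = eigenvalues, prop_var, cum_prop), 4)
```

Because the analysis uses the correlation matrix of 8 variables, the eigenvalues sum to 8.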
load <- loadings(industry.pr)  #### extract the loading matrix
load
Loadings:
(R prints loadings below 0.1 in absolute value as blanks; the blank positions are not recoverable here, so each row lists its printed values in order.)
   Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
X1 -0.477 -0.296 -0.104 0.184 0.758 0.245
X2 -0.473 -0.278 -0.163 -0.174 -0.305 -0.518 0.527
X3 -0.424 -0.378 -0.156 -0.174 -0.781
X4 0.213 -0.451 0.516 0.539 0.288 -0.249 0.220
X5 0.388 -0.331 -0.321 -0.199 -0.450 0.582 0.233
X6 0.352 -0.403 -0.145 0.279 -0.317 -0.714
X7 -0.215 0.377 -0.140 0.758 -0.418 0.194
X8 -0.273 0.891 -0.322 0.122

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Proportion Var  0.125  0.125  0.125  0.125  0.125  0.125  0.125  0.125
Cumulative Var  0.125  0.250  0.375  0.500  0.625  0.750  0.875  1.000
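plot(load[, 1:2])
text(load[, 1], load[, 2], adj = c(-0.4, -0.3))
screeplot(industry.pr, npcs = 4, type = "lines")  #### scree plot of the principal components
biplot(industry.pr)  #### scatter plot on the first and second principal components
p <- predict(industry.pr)  #### compute the principal component scores and store them in p
order(p[, 1]); order(p[, 2]); order(p[, 3]); order(p[, 4])
#### order the scores by the first, second, third and fourth principal components in turn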
 [1]  5  1  3  2  4  6 13 11  9  7 12 10  8
 [1]  5  8  4  9 10  1 13 12  7 11  6  2  3
 [1]  8  1  5  3  9 12  7 10  2  6 11  4 13
 [1] 11  6  5  7 10 13 12  9  1  8  3  2  4
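Each vector above is the output of order(), which returns the row positions sorted from the lowest score to the highest; it is not the rank of each row. The sign of a principal component is also arbitrary, so whether ascending order means best or worst depends on the signs of the loadings. A small sketch of the difference, using hypothetical scores that are not taken from the exercise data:

```r
# Hypothetical scores for three items (illustrative values only)
scores <- c(a = 0.8, b = -1.5, c = 0.2)

idx          <- order(scores)                     # positions, lowest score first
ranked_names <- names(scores)[idx]                # item labels in that order
ranks        <- rank(scores)                      # rank of each item, in original position
top_first    <- order(scores, decreasing = TRUE)  # positions, highest score first
```

Here order() reports that item 2 (b) has the lowest score, while rank() reports that item a holds rank 3.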
kmeans(scale(p), 4)  #### standardize the scores and partition them into 4 clusters
K-means clustering with 4 clusters of sizes 5, 1, 4, 3

Cluster means:
      Comp.1      Comp.2     Comp.3     Comp.4     Comp.5      Comp.6
1  0.5132590 -0.03438438 -0.3405983 -0.5130031  0.2355151  0.22441040
2 -2.5699693 -1.32913757 -0.4848689 -0.9460127 -0.9000187 -0.06497950
3  0.2381581  0.72871986 -0.2995918  0.3126036 -0.4744091 -0.19709710
4 -0.3163193 -0.47127333  1.1287426  0.7535380  0.5400265 -0.08956137
       Comp.7     Comp.8
1 -0.38197798 -0.7474855
2 -0.67500209  0.4569548
3  0.09063069  0.9826915
4  0.74078975 -0.2167643

Clustering vector:
 [1] 4 3 3 4 2 1 1 1 1 3 1 3 4

Within cluster sum of squares by cluster:
[1] 19.41137  0.00000 24.49504 16.61172
 (between_SS / total_SS =  37.0 %)

Available components:
[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"
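kmeans() starts from randomly chosen centers, so the cluster labels and even the partition shown above can change from run to run. A sketch of one way to make the step repeatable, assuming a fixed seed and several random starts; the seed value and nstart = 25 are arbitrary choices for illustration, not part of the original answer.

```r
industry <- data.frame(
  X1 = c(90342, 4903, 6735, 49454, 139190, 12215, 2372, 11062, 17111, 1206, 2150, 5251, 14341),
  X2 = c(52455, 1973, 21139, 36241, 203505, 16219, 6572, 23078, 23907, 3930, 5704, 6155, 13203),
  X3 = c(101091, 2035, 3767, 81557, 215898, 10351, 8103, 54935, 52108, 6126, 6200, 10383, 19396),
  X4 = c(19272, 10313, 1780, 22504, 10609, 6382, 12329, 23804, 21796, 15586, 10870, 16875, 14691),
  X5 = c(82.0, 34.2, 36.1, 98.1, 93.2, 62.5, 184.4, 370.4, 221.5, 330.4, 184.2, 146.4, 94.6),
  X6 = c(16.1, 7.1, 8.2, 25.9, 12.6, 8.7, 22.2, 41.0, 21.5, 29.5, 12.0, 27.5, 17.8),
  X7 = c(197435, 592077, 726396, 348226, 139572, 145818, 20921, 65486, 63806, 1840, 8913, 78796, 6354),
  X8 = c(0.172, 0.003, 0.003, 0.985, 0.628, 0.066, 0.152, 0.263, 0.276, 0.437, 0.274, 0.151, 1.574))

# Principal component scores, as in p <- predict(industry.pr)
scores <- predict(princomp(industry, cor = TRUE))

set.seed(1)  # arbitrary seed, used only so that reruns give the same partition
km <- kmeans(scale(scores), centers = 4, nstart = 25)  # keep the best of 25 random starts

km$size
split(seq_len(nrow(industry)), km$cluster)  # row indices of the industries in each cluster
```

The cluster numbering itself carries no meaning, so results should be compared by membership rather than by label.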
9.2
#### enter the data as a data frame
sale <- data.frame(
  X1 = c(82.9, 88.0, 99.9, 105.3, 117.7, 131.0, 148.2, 161.8, 174.2, 184.7),
  X2 = c(92, 93, 96, 94, 100, 101, 105, 112, 112, 112),
  X3 = c(17.1, 21.3, 25.1, 29.0, 34.0, 40.0, 44.0, 49.0, 51.0, 53.0),
  X4 = c(94, 96, 97, 97, 100, 101, 104, 109, 111, 111),
  Y  = c(8.4, 9.6, 10.4, 11.4, 12.2, 14.2, 15.8, 17.9, 19.6, 20.8)
)
#### fit the linear regression
lm.sol <- lm(Y ~ X1 + X2 + X3 + X4, data = sale)
summary(lm.sol)
Output:
Call:
lm(formula = Y ~ X1 + X2 + X3 + X4, data = sale)

Residuals:
        1         2         3         4         5         6         7
 0.024803  0.079476  0.012381 -0.007025 -0.288345  0.216090 -0.142085
        8         9        10
 0.158360 -0.135964  0.082310

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.66768    5.94360  -2.973  0.03107 *
X1            0.09006    0.02095   4.298  0.00773 **
X2           -0.23132    0.07132  -3.243  0.02287 *
X3            0.01806    0.03907   0.462  0.66328
X4            0.42075    0.11847   3.552  0.01636 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2037 on 5 degrees of freedom
Multiple R-squared:  0.9988,	Adjusted R-squared:  0.9978
F-statistic:  1021 on 4 and 5 DF,  p-value: 1.827e-07
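The F-test is highly significant, and the coefficients of X1, X2 and X4 pass the t-test at the 5% level; X3 does not (p = 0.663). The fitted regression equation is:
Y = -17.66768 + 0.09006 X1 - 0.23132 X2 + 0.01806 X3 + 0.42075 X4
Here Y is the sales volume, X1 is residents' disposable income and X2 is the average price index of this class of consumer goods. The signs and sizes of the individual coefficients cannot be interpreted reliably, because the four predictors are multicollinear. We therefore use principal component regression, starting with a principal component analysis.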
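The multicollinearity claim can be checked before running the principal component analysis. A minimal sketch using two standard diagnostics: variance inflation factors, obtained as the diagonal of the inverse correlation matrix of the predictors, and the ratio of the largest to the smallest eigenvalue of that matrix. The rule of thumb that a VIF above 10 signals a problem is a common convention, not something stated in the exercise.

```r
# Predictors from the sale data frame (Y is left out on purpose)
X <- data.frame(
  X1 = c(82.9, 88.0, 99.9, 105.3, 117.7, 131.0, 148.2, 161.8, 174.2, 184.7),
  X2 = c(92, 93, 96, 94, 100, 101, 105, 112, 112, 112),
  X3 = c(17.1, 21.3, 25.1, 29.0, 34.0, 40.0, 44.0, 49.0, 51.0, 53.0),
  X4 = c(94, 96, 97, 97, 100, 101, 104, 109, 111, 111)
)

R   <- cor(X)           # correlation matrix of the predictors
vif <- diag(solve(R))   # VIF_j is the j-th diagonal element of the inverse of R

lambda      <- eigen(R, symmetric = TRUE)$values  # eigenvalues, largest first
cond_number <- max(lambda) / min(lambda)          # large values indicate near-dependence

round(vif, 1)
round(cond_number, 1)
```

The eigenvalues computed here are the squared standard deviations that princomp(..., cor = TRUE) reports for the same predictors.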
#### principal component analysis
sale.pr <- princomp(~ X1 + X2 + X3 + X4, data = sale, cor = TRUE)
summary(sale.pr, loadings = TRUE)
Importance of components:
                          Comp.1      Comp.2     Comp.3       Comp.4
Standard deviation     1.9859037 0.199906992 0.11218966 0.0603085506
Proportion of Variance 0.9859534 0.009990701 0.00314663 0.0009092803
Cumulative Proportion  0.9859534 0.995944090 0.99909072 1.0000000000

Loadings:
   Comp.1 Comp.2 Comp.3 Comp.4
X1 -0.502 -0.237  0.579  0.598
X2 -0.500  0.493 -0.610  0.367
X3 -0.498 -0.707 -0.368 -0.342
X4 -0.501  0.449  0.396 -0.626
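λ4 = 0.0603085506^2 ≈ 0.0036 ≈ 0, so the variables are multicollinear.
Next comes the principal component regression. First compute the principal component scores of the sample and store the scores on the first and second components in the data frame sale; then regress Y on these two components. The commands are as follows.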
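#### compute the sample principal component scores and run the principal component regression
pre <- predict(sale.pr)
sale$Z1 <- pre[, 1]; sale$Z2 <- pre[, 2]
lm.sol <- lm(Y ~ Z1 + Z2, data = sale)
summary(lm.sol)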
Call:
lm(formula = Y ~ Z1 + Z2, data = sale)

Residuals:
     Min       1Q   Median       3Q      Max
-0.74323 -0.29223  0.01746  0.30807  0.80849

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.03000    0.17125  81.927 1.06e-11 ***
Z1          -2.06119    0.08623 -23.903 5.70e-08 ***
Z2          -0.62409    0.85665  -0.729     0.49

Residual standard error: 0.5415 on 7 degrees of freedom
Multiple R-squared:  0.9879,	Adjusted R-squared:  0.9845
F-statistic: 285.9 on 2 and 7 DF,  p-value: 1.945e-07
The F-test is significant and Z1 is highly significant; Z2 is not (p = 0.49). The regression equation is:
Y = 14.0300 - 2.06119 Z1* - 0.62409 Z2*
#### transform back to obtain the relationship in the original coordinates
beta <- coef(lm.sol); A <- loadings(sale.pr)
x.bar <- sale.pr$center; x.sd <- sale.pr$scale
coef <- (beta[2] * A[, 1] + beta[3] * A[, 2]) / x.sd
beta0 <- beta[1] - sum(x.bar * coef)
c(beta0, coef)
 (Intercept)           X1           X2           X3           X4
-16.88460655   0.03420968   0.09376460   0.11954881   0.12360237
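The regression equation is therefore:
Y = -16.88460655 + 0.03420968 X1 + 0.09376460 X2 + 0.11954881 X3 + 0.12360237 X4
All predictor coefficients in this equation are positive, which is more reasonable than the original equation.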
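The back-transformation used to reach this equation can be written out explicitly. The notation below is introduced here for illustration: a_{jk} is the loading of X_j on component k (matrix A), and \bar{x}_j and s_j are the center and scale that princomp stores for X_j (x.bar and x.sd).

```latex
\begin{aligned}
Z_k^{*} &= \sum_{j=1}^{4} a_{jk}\,\frac{X_j - \bar{x}_j}{s_j}, \qquad k = 1, 2,\\[4pt]
\hat{Y} &= \beta_0^{*} + \beta_1^{*} Z_1^{*} + \beta_2^{*} Z_2^{*}
         = \Bigl(\beta_0^{*} - \sum_{j=1}^{4} c_j\,\bar{x}_j\Bigr) + \sum_{j=1}^{4} c_j\,X_j,\\[4pt]
c_j &= \frac{\beta_1^{*} a_{j1} + \beta_2^{*} a_{j2}}{s_j}.
\end{aligned}
```

With \beta_0^{*} = 14.0300, \beta_1^{*} = -2.06119 and \beta_2^{*} = -0.62409, the c_j are the values stored in coef and the bracketed term is beta0.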
9.3
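Put the data into a matrix to form the correlation matrix r, and run a principal component analysis on r. The scree plot and the cumulative contribution rates indicate that only two components, Comp.1 and Comp.2, need to be kept, so the number of factors is 2. Next comes the factor analysis. In the factor loading matrix some entries are close to 1 and others are close to 0.1, so the variables split into two groups:
factor1: height x1, arm length x2, upper limb length x3 and lower limb length x4, whose loadings are all close to 1 (the "length" group);
factor2: weight x5, neck circumference x6, chest circumference x7 and chest width x8, whose loadings are all large (the "width" group).
Figure (25): Scree plot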
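x <- c(1.000, 0.846, 0.805, 0.859, 0.473, 0.398, 0.301, 0.382,
       0.846, 1.000, 0.881, 0.826, 0.376, 0.326, 0.277, 0.277,
       0.805, 0.881, 1.000, 0.801, 0.380, 0.319, 0.237, 0.345,
       0.859, 0.826, 0.801, 1.000, 0.436, 0.329, 0.327, 0.365,
       0.473, 0.376, 0.380, 0.436, 1.000, 0.762, 0.730, 0.629,
       0.398, 0.326, 0.319, 0.329, 0.762, 1.000, 0.583, 0.577,
       0.301, 0.277, 0.237, 0.327, 0.730, 0.583, 1.000, 0.539,
       0.382, 0.415, 0.345, 0.365, 0.629, 0.577, 0.539, 1.000)
#### note: as entered, the x2/x8 pair is 0.277 in the second row but 0.415 in the last row, so r is not exactly symmetric
names <- c("身高x1", "手臂长x2", "上肢长x3", "下肢长x4", "体重x5",
           "颈围x6", "胸围x7", "胸宽x8")
r <- matrix(x, nrow = 8, dimnames = list(names, names))  #### build the correlation matrix
#### principal component analysis to choose the number of components; by the cumulative contribution, keep only Comp.1 and Comp.2
r.pr <- princomp(r, cor = TRUE)
summary(r.pr)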
Importance of components:
                          Comp.1     Comp.2    Comp.3    Comp.4      Comp.5
Standard deviation     2.5668691 0.80719257 0.6497815 0.4481795 0.279924016
Proportion of Variance 0.8236021 0.08144498 0.0527770 0.0251081 0.009794682
Cumulative Proportion  0.8236021 0.90504709 0.9578241 0.9829322 0.992726878
                           Comp.6      Comp.7       Comp.8
Standard deviation     0.19139613 0.146807702 1.144334e-08
Proportion of Variance 0.00457906 0.002694063 1.636875e-17
Cumulative Proportion  0.99730594 1.000000000 1.000000e+00
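The call princomp(r, cor = TRUE) treats the 8 rows of r as 8 observations, which is why the last standard deviation above is essentially zero: 8 centered observations span at most 7 dimensions. When the input is already a correlation matrix, princomp() accepts it through its covmat argument instead. A sketch of that alternative, assuming 0.415 is the intended value for the x2/x8 pair (an assumption made here only to get a symmetric matrix); its output will differ from the table above.

```r
x <- c(1.000, 0.846, 0.805, 0.859, 0.473, 0.398, 0.301, 0.382,
       0.846, 1.000, 0.881, 0.826, 0.376, 0.326, 0.277, 0.277,
       0.805, 0.881, 1.000, 0.801, 0.380, 0.319, 0.237, 0.345,
       0.859, 0.826, 0.801, 1.000, 0.436, 0.329, 0.327, 0.365,
       0.473, 0.376, 0.380, 0.436, 1.000, 0.762, 0.730, 0.629,
       0.398, 0.326, 0.319, 0.329, 0.762, 1.000, 0.583, 0.577,
       0.301, 0.277, 0.237, 0.327, 0.730, 0.583, 1.000, 0.539,
       0.382, 0.415, 0.345, 0.365, 0.629, 0.577, 0.539, 1.000)
r <- matrix(x, nrow = 8)

# Assumption: make the x2/x8 pair symmetric by using 0.415 in both positions
r[8, 2] <- r[2, 8]

pc  <- princomp(covmat = r)  # eigen-decomposition of r itself, no raw data needed
eig <- pc$sdev^2             # eigenvalues of the correlation matrix

round(cumsum(eig) / sum(eig), 4)  # cumulative proportion of variance
```

No scores are available in this form, since princomp() never sees the individual observations.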
r.load <- loadings(r.pr); r.load  #### loading matrix
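Loadings:
(R prints loadings below 0.1 in absolute value as blanks; the blank positions are not recoverable here, so each row lists its printed values in order.)
         Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
身高x1   -0.374 -0.210 -0.373 -0.352 0.613 -0.390 0.124
手臂长x2 -0.383 0.305 0.397 0.768
上肢长x3 -0.382 -0.108 0.617 -0.209 -0.310 0.566
下肢长x4 -0.377 -0.374 -0.470 -0.592 0.300 0.225
体重x5   0.344 0.364 -0.230 -0.708 0.372 -0.102 -0.213
颈围x6   0.330 0.291 -0.697 0.309 -0.198 0.171 0.396
胸围x7   0.341 0.317 0.619 -0.109 0.222 0.109 0.568
胸宽x8   0.286 -0.812 -0.140 -0.333 0.110 0.175 0.294

Cumulative Var 0.125 0.250 0.375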