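Lab Report: Cluster Analysis

Principle: K-means clustering, K-medoids clustering, hierarchical clustering, and EM (model-based) clustering.
Task: perform a cluster analysis of the iris data set.
Requirements: explore the basic characteristics of the iris data, apply several clustering methods, and state and briefly explain the main conclusions.

Analysis report:

> data(iris)
> rm(list=ls())
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 431730         929718        607591
Vcells 787605        8388608       1592403
(The Mb columns were lost in extraction.)
> data(iris)
> data<-iris
> head(data)
(Only the Species column of this output survived extraction; all six rows are setosa.)

# K-means clustering
# The assignments and the quoted column names below were stripped in extraction and are restored from context.
> newiris <- iris
> newiris$Species <- NULL
> (kc <- kmeans(newiris, 3))
> table(iris$Species, kc$cluster)
              1  2  3
  setosa      0 50  0
  versicolor 48  0  2
  virginica  14  0 36
> plot(newiris[c("Sepal.Length", "Sepal.Width")], col = kc$cluster)
> points(kc$centers[,c("Sepal.Length", "Sepal.Width")], col = 1:3, pch = 8, cex=2)
[Figure: scatter plot with Sepal.Length on the x-axis (4.5 to 8.0), points coloured by cluster, cluster centres marked with stars]

# K-medoids clustering (package cluster)
> library(cluster)
> iris.pam <- pam(iris, 3)   # call restored from the silhouette plot title; the object name iris.pam is a placeholder
> table(iris$Species, iris.pam$clustering)
              1  2  3
  setosa     50  0  0
  versicolor  0  3 47
  virginica   0 49  1
> layout(matrix(c(1,2),1,2))
> plot(iris.pam)
> layout(matrix(1))
[Figure: cluster plot on the first two principal components (left) and silhouette plot of pam(x = iris, k = 3) (right)]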
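The two cross-tabulations above can be condensed into a single purity score, which makes the K-means and K-medoids results easier to compare. The following is a minimal sketch in base R: the counts are copied from the tables in this report, and `purity` is a hypothetical helper name, not a function from any package.

```r
# Confusion tables copied from the report (rows: species, columns: cluster id).
species <- c("setosa", "versicolor", "virginica")
clusters <- as.character(1:3)

kmeans_tab <- matrix(c( 0, 50,  0,
                       48,  0,  2,
                       14,  0, 36),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(species, clusters))

pam_tab <- matrix(c(50,  0,  0,
                     0,  3, 47,
                     0, 49,  1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(species, clusters))

# Hypothetical helper: each cluster is credited with its majority species,
# and purity is the share of observations that fall in those majority cells.
purity <- function(tab) sum(apply(tab, 2, max)) / sum(tab)

kmeans_purity <- purity(kmeans_tab)  # 134 / 150
pam_purity    <- purity(pam_tab)     # 146 / 150

print(round(c(kmeans = kmeans_purity, pam = pam_purity), 3))
```

The higher K-medoids score should be read with care: the silhouette plot title shows `pam(x = iris, k = 3)`, so the Species column itself appears to have been part of the clustering input, whereas K-means was run on the four measurements only.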
The silhouette plot of pam(x = iris, k = 3) reports an average silhouette width of 0.57; cluster 2 (52 observations) has an average width of 0.41.

# Hierarchical clustering (hc)
# Object names and the hclust/cutree calls were stripped in extraction; iris.hc and iris.id are placeholders restored from context.
> iris.hc <- hclust(dist(iris[,1:4]))
> plot(iris.hc, hang = -1)
> plot(iris.hc, labels = FALSE, hang = -1)   # plclust() in the original; that function is defunct in current R
> re <- rect.hclust(iris.hc, k = 3)
> iris.id <- cutree(iris.hc, 3)
> sapply(unique(iris.id),
+        function(g) iris$Species[iris.id==g])
The output lists the species label of every observation in each cluster. It reduces to these counts:
  cluster 1: 50 setosa
  cluster 2: 23 versicolor, 49 virginica
  cluster 3: 27 versicolor, 1 virginica
> plot(iris.hc)
> rect.hclust(iris.hc,k=4,border="light grey")   # outline the 4-cluster solution in light grey
> rect.hclust(iris.hc,k=3,border="dark grey")    # outline the 3-cluster solution in dark grey
> rect.hclust(iris.hc,k=7,which=c(2,6),border="dark grey")
[Figure: Cluster Dendrogram]

# DBSCAN density-based clustering (package fpc)
> library(fpc)
# Radius parameter eps = 1, density threshold MinPts = 5
> ds1=dbscan(iris[,1:4],eps=1,MinPts=5)
> ds1
dbscan Pts=150 MinPts=5 eps=1
        1   2
border  0   1
seed   50  99
total  50 100
> ds2=dbscan(iris[,1:4],eps=4,MinPts=5)
> ds3=dbscan(iris[,1:4],eps=4,MinPts=2)
> ds4=dbscan(iris[,1:4],eps=8,MinPts=2)
> par(mfcol=c(2,2))
> plot(ds1,iris[,1:4],main="1: MinPts=5 eps=1")
> plot(ds3,iris[,1:4],main="3: MinPts=2 eps=4")
> plot(ds2,iris[,1:4],main="2: MinPts=5 eps=4")
> plot(ds4,iris[,1:4],main="4: MinPts=2 eps=8")
[Figure: 2 x 2 grid of scatter-plot matrices, one panel per DBSCAN run, titled as in the plot calls above]
> d=dist(iris[,1:4])   # distance matrix d of the data set
> max(d);min(d)        # largest and smallest pairwise distance
(The line with the maximum was lost in extraction.)
[1] 0
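The distance matrix computed above also shows directly how `eps` and `MinPts` interact: a point can only act as a DBSCAN seed (core point) if enough observations lie within `eps` of it. The sketch below counts those neighbours in base R. `neighbours_within` is a hypothetical helper, and counting the point itself toward `MinPts` is an assumption; implementations differ on this detail, so the counts need not match the seed row printed by `fpc::dbscan` exactly.

```r
# Pairwise Euclidean distances on the four measurements, as in d = dist(iris[, 1:4]).
dmat <- as.matrix(dist(iris[, 1:4]))

# Hypothetical helper: number of observations within `eps` of each point.
# ASSUMPTION: the point itself is counted (its self-distance 0 is <= eps).
neighbours_within <- function(dmat, eps) rowSums(dmat <= eps)

eps <- 1
min_pts <- 5

n_within <- neighbours_within(dmat, eps)
is_core <- n_within >= min_pts

# How many observations satisfy the core-point condition for these settings.
print(table(is_core))

# A larger radius can only add neighbours, never remove them.
n_within_wide <- neighbours_within(dmat, 4)
print(summary(n_within_wide - n_within))
```

Once `eps` is large relative to the pairwise distances, every point has many neighbours and the density condition stops discriminating, which is consistent with the single-cluster results reported for the larger `eps` values in this report.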
> library(ggplot2)
> interval=cut_interval(d,30)
> table(interval)
Counts for the 30 equal-width distance intervals, from the shortest distances to the longest (the interval labels were lost in extraction):
 88 585 876 891 831 688 543 369 379 339
335 406 458 459 465 480 468 505 349 385
321 291 187 138  97  92  78  50  18   4
> which.max(table(interval))   # function name restored from context
4
(The fourth interval holds the most pairs, 891.)
> for(i in 3:5)
+ { for(j in 1:10)
+ { ds=dbscan(iris[,1:4],eps=i,MinPts=j)
+ print(ds)
+ }
+ }
# Results of the 30 dbscan runs
All 30 combinations (eps = 3, 4, 5; MinPts = 1 to 10) give the same result, a single cluster that contains all 150 observations. The first run is representative:
dbscan Pts=150 MinPts=1 eps=3
        1
seed  150
total 150
> ds5=dbscan(iris[,1:4],eps=3,MinPts=2)
> ds6=dbscan(iris[,1:4],eps=4,MinPts=5)
> ds7=dbscan(iris[,1:4],eps=5,MinPts=9)
> par(mfcol=c(1,3))
> plot(ds5,iris[,1:4],main="1: MinPts=2 eps=3")
> plot(ds6,iris[,1:4],main="3: MinPts=5 eps=4")
> plot(ds7,iris[,1:4],main="2: MinPts=9 eps=5")
[Figure: three scatter-plot matrices side by side, one per DBSCAN run, titled as in the plot calls above]

# EM (expectation-maximization) clustering (package mclust)
> library(mclust)
> fit_EM=Mclust(iris[,1:4])
fitting ...
  |==========| 100%
> summary(fit_EM)
Gaussian finite mixture model fitted by EM algorithm
Mclust VEV (ellipsoidal, equal shape) model with 2 components:
  n = 150, df = 26 (the log-likelihood, BIC and ICL values were lost in extraction)
Clustering table:
  1   2
 50 100
> summary(fit_EM,parameters=TRUE)
(This repeats the summary above and adds "Mixing probabilities", "Means" and "Variances" sections; their numeric values were lost in extraction.)
> plot(fit_EM)   # plot the EM clustering results
Model-based clustering plots:
1: BIC
2: classification
3: uncertainty
4: density
Selection: (the plot for each option is shown below)
# Selection 1
[Figure: BIC plot]
# Selection 2
[Figure: classification plot, scatter-plot matrix with panels including Sepal.Length, Petal.Length and Petal.Width]
# Selection 3
# Selection 4
Selection: 0
> iris_BIC=mclustBIC(iris[,1:4])
fitting ...
  |==========| 100%
> iris_BICsum=summary(iris_BIC,data=iris[,1:4])
> iris_BICsum   # BIC values of the iris data for each model and number of components
Best BIC values:
          VEV,2  VEV,3  VVV,2
BIC       (values lost in extraction)
BIC diff  (values lost in extraction)
Classification table for model (VEV,2):
  1   2
 50 100
> iris_BIC
Bayesian Information Criterion (BIC):
(The table of BIC values for 1 to 9 components was lost in extraction; only a few NA entries survive.)
Top 3 models based on the BIC criterion:
VEV,2  VEV,3  VVV,2
(values lost in extraction)
> par(mfcol=c(1,1))
> plot(iris_BIC,G=1:7,col="yellow")
[Figure: BIC against the number of components for each model]
> mclust2Dplot(iris[,1:2],
+ classification=iris_BICsum$classification,
+ parameters=iris_BICsum$parameters,col="yellow")
[Figure: two-dimensional classification plot with Sepal.Length on the x-axis (5.0 to 7.5)]
> iris_Dens=densityMclust(iris[,1:2])   # density estimate for every sample
fitting ...
> iris_Dens
'densityMclust' model object: (VEV,2)
Available components:
 [1] "call"           "data"           "modelName"      "n"
 [5] "d"              "G"              "BIC"            "bic"
 [9] "loglik"         "df"             "hypvol"         "parameters"
[13] "z"              "classification" "uncertainty"    "density"
> plot(iris_Dens,iris[,1:2],col="yellow",nlevels=55)   # enter 1 or 2 at the prompt
Model-based density estimation plots:
1: BIC
2: density
Selection: (the plot for each option is shown below)
# Selection 1
[Figure: BIC against the number of components]
# Selection 2
[Figure: density contour plot with Sepal.Length on the x-axis]
Selection: 0
> plot(iris_Dens,type = "persp",col = "grey")
Model-based density estimation plots:
1: BIC
2: density
Selection: (the plot for each option is shown below)
# Selection 1
[Figure: BIC against the number of components (1 to 9)]
# Selection 2
[Figure: perspective plot of the estimated density]
Selection: 0
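The model ranking above relies on BIC, which mclust documents as 2 * logLik - df * log(n), so that larger values indicate a better model. The sketch below writes that convention out in base R. `mclust_style_bic` is a hypothetical helper, not part of the mclust package, and because the log-likelihood did not survive in this report, only the complexity penalty for the selected VEV, 2-component model (n = 150, df = 26) is reproduced.

```r
# Hypothetical helper following the BIC convention documented for mclust:
# BIC = 2 * logLik - df * log(n); larger values indicate a better model.
mclust_style_bic <- function(loglik, df, n) 2 * loglik - df * log(n)

# Values reported in the summary of the VEV, 2-component model.
n  <- 150
df <- 26

# The log-likelihood was lost from the report, so only the penalty term
# df * log(n) can be computed here (roughly 130.3).
penalty <- df * log(n)
print(penalty)
```

With the fitted object available, `fit_EM$loglik`, `fit_EM$df` and `fit_EM$n` (component names that also appear in the component listing above) could be passed to the helper and compared with `fit_EM$bic`.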