Data Mining PPT (数据挖掘PPT.ppt)
- Document ID: 14553397
- Uploaded: 2023-06-24
- Format: PPT
- Pages: 499
- Size: 6.46 MB
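Data Mining

Main contents
1. Overview
2. Data warehouses and OLAP technology
3. Data mining techniques
4. Data mining applications
5. Data mining tools
6. Data mining examples

1 Overview
1.1 Background
1.2 Definition of data mining
1.3 Basic concepts
1.4 Main functions
1.5 Data mining models
1.6 Implementation process
1.7 Applications of data mining
1.8 Future trends

1.1 Background
Since the end of the twentieth century, the volume of information worldwide has grown at an astonishing rate; by some estimates it doubles every twenty months.
Many organizations have accumulated large amounts of data (information) in their IT systems.
Current database systems handle data entry, querying, and statistics efficiently, but they cannot discover the relationships and rules hidden in the data, and they cannot predict future trends from existing data.
Data mining emerged to make full use of existing information resources by uncovering hidden knowledge in massive data sets, and it has shown great vitality.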
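Data mining was proposed when AI (artificial intelligence) turned to practical applications after the failure of the AI research projects funded in the 1980s.
It is an emerging, business-oriented branch of AI research.
The term Knowledge Discovery in Databases (KDD) first appeared in August 1989, at a workshop of the 11th International Joint Conference on Artificial Intelligence held in Detroit, USA.
KDD workshops followed in 1991, 1993, and 1994, bringing together researchers and application developers from many fields to discuss data statistics, algorithms for analyzing massive data, knowledge representation, and knowledge application.
Data mining began as the step within KDD that applies algorithms to the data, and it later came to be used as a synonym for KDD.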
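Today the two terms are often used interchangeably, and KDD is frequently called data mining, but strictly speaking they differ.
The knowledge-learning stage of KDD is what is generally called data mining, so data mining is one very important processing step within KDD.
Data mining is one of the core technologies behind recently popular fields such as customer relationship management (CRM) and business intelligence (BI).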
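Data Mining
Finding patterns in data
Describing the patterns - one way is by rules
Predicting from the rules - classification - forecasting
or Knowledge Discovery in Databases
Extracting knowledge
Representing knowledge acquired
Using the knowledge for future examples

1.2 Definition of data mining
Technical definition: data mining is the process of extracting implicit, previously unknown, yet potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random real-world application data.
Terms close in meaning to data mining include data fusion, data analysis, and decision support.
This definition has several implications:
the data source must be real, massive, and noisy; what is discovered is knowledge of interest to the user; the discovered knowledge must be acceptable, understandable, and usable; the knowledge is not required to hold universally, only to support a specific discovery problem.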
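Business definition: data mining is a new business information processing technology. Its main characteristic is that it extracts, transforms, analyzes, and otherwise models the large volume of business data held in commercial databases in order to obtain key information that supports business decisions.
In short, data mining is a class of in-depth data analysis methods.
It can therefore be described as an effective method that, in line with an enterprise's established business objectives, explores and analyzes large volumes of enterprise data to reveal hidden or unknown regularities, or to verify known ones, and then turns them into models.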
Data mining context
Business intelligence model
Levels of data analysis method
[Figure labels: hidden, shallow, surface; simple database queries, statistical analysis, data mining]

What sort of data?
Consider here only textual type data: characters or numbers
Data that has been structured in some way
Data can also be visual, aural or tactile
Pattern recognition in other data; visual analysis of data later

What data?
Data sets
Data set concerning bridges in USA
E13,A,33,CRAFTS,HIGHWAY,?,2,N,THROUGH,WOOD,?,S,WOOD
E15,A,28,CRAFTS,RR,?,2,N,THROUGH,WOOD,?,S,WOOD
E16,A,25,CRAFTS,HIGHWAY,MEDIUM,2,N,THROUGH,IRON,MEDIUM,S-F,SUSPEN
E17,M,4,CRAFTS,RR,MEDIUM,2,N,THROUGH,IRON,MEDIUM,?,SIMPLE-T
E18,A,28,CRAFTS,RR,MEDIUM,2,N,THROUGH,IRON,SHORT,S,SIMPLE-T
E19,A,29,CRAFTS,HIGHWAY,MEDIUM,2,N,THROUGH,WOOD,MEDIUM,S,WOOD
E20,A,32,EMERGING,HIGHWAY,MEDIUM,2,N,THROUGH,WOOD,MEDIUM,S,WOOD
E21,M,16,EMERGING,RR,?,2,?,THROUGH,IRON,?,?,SIMPLE-T
E23,M,1,EMERGING,HIGHWAY,MEDIUM,?,?,THROUGH,STEEL,LONG,F,SUSPEN
E22,A,24,EMERGING,HIGHWAY,MEDIUM,4,G,THROUGH,WOOD,SHORT,S,WOOD
E24,O,45,EMERGING,RR,?,2,G,?,STEEL,?,?,SIMPLE-T
E25,M,10,EMERGING,RR,?,2,G,?,STEEL,?,?,SIMPLE-T
E27,A,39,EMERGING,RR,?,2,G,THROUGH,STEEL,?,F,SIMPLE-T
E26,M,12,EMERGING,RR,MEDIUM,2,G,THROUGH,STEEL,MEDIUM,S,SIMPLE-T
E30,A,31,EMERGING,RR,?,2,G,THROUGH,STEEL,MEDIUM,F,SIMPLE-T
E29,A,26,EMERGING,HIGHWAY,MEDIUM,2,G,THROUGH,STEEL,MEDIUM,?,SUSPEN
E28,M,3,EMERGING,HIGHWAY,MEDIUM,2,G,THROUGH,STEEL,MEDIUM,S,ARCH
E32,A,30,EMERGING,HIGHWAY,?,2,G,THROUGH,IRON,MEDIUM,F,SIMPLE-T
E31,M,8,EMERGING,RR,MEDIUM,2,G,THROUGH,STEEL,MEDIUM,S,SIMPLE-T
E34,O,41,EMERGING,RR,LONG,2,G,THROUGH,STEEL,LONG,F,SIMPLE-T
E33,M,19,EMERGING,HIGHWAY,MEDIUM,?,G,THROUGH,IRON,MEDIUM,F,SIMPLE-T
E36,O,45,MATURE,HIGHWAY,?,2,G,THROUGH,IRON,SHORT,F,SIMPLE-T
E35,A,27,MATURE,HIGHWAY,MEDIUM,2,G,THROUGH,STEEL,MEDIUM,F,SIMPLE-T
E38,M,17,MATURE,HIGHWAY,?,2,G,THROUGH,IRON,MEDIUM,F,SIMPLE-T
E37,M,18,MATURE,RR,MEDIUM,2,G,THROUGH,STEEL,MEDIUM,S,SIMPLE-T
E39,A,25,MATURE,HIGHWAY,?,2,G,THROUGH,STEEL,MEDIUM,F,SIMPLE-T
E4,A,27,MATURE,AQUEDUCT,MEDIUM,1,N,THROUGH,WOOD,SHORT,S,WOOD
E40,M,22,MATURE,HIGHWAY,?,2,G,THROUGH,STEEL,MEDIUM,F,SIMPLE-T
E41,M,11,MATURE,HIGHWAY,?,2,G,THROUGH,IRON,MEDIUM,F,SIMPLE-T
E42,M,9,MATURE,HIGHWAY,LONG,2,G,THROUGH,STEEL,LONG,F,SIMPLE-T
format is simply comma separated values

Data sets
Data set concerning geotechnical parameters
format taken directly from a spreadsheet

Data structured into three parts
Relations have Attributes which have Instances
Example: Relation about cups of coffee
coffee has attributes of size, sugar, temperature etc
size has instances of small, medium, large
sugar has instances of yes or no
temperature has instances of 39.8, 54.7 or 41.0 Celsius

Data Structure
Each cup of coffee has attribute-values (records)
Instances can be numerical or nominal
Data preparation, filtering and discretization can be considerable; as much an art as science

Data
Example: Cappuccino coffee relation
[Figure annotations: missing data; attribute-value; attribute as number, or name; note this attribute; process of discretization]

Coffee?
Cappuccino coffee relation
Best rules found:
1. milkiness=over 3 ==> size=large enjoy=yes 3
2. size=large milkiness=over 3 ==> enjoy=yes 3
3. milkiness=over enjoy=yes 3 ==> size=large 3
4. milkiness=over 3 ==> enjoy=yes 3
5. milkiness=over 3 ==> size=large 3
6. size=small 3 ==> enjoy=no 3
7. size=large chocolate=ok 2 ==> milkiness=over enjoy=yes 2
8. milkiness=over chocolate=ok 2 ==> size=large enjoy=yes 2
9. size=large milkiness=over chocolate=ok 2 ==> enjoy=yes 2
10. size=large chocolate=ok enjoy=yes 2 ==> milkiness=over 2

New (test) data poses question
medium,over,?,ok,Wheatley,yes
 a b   <-- classified as
 1 0 | a = yes
 0 0 | b = no

What do we want to do?
The enjoy attribute has values of either yes or no
In the example we want to discover if there are any combinations of conditions that lead to some coffee being more enjoyable than others
Consequently, can I predict whether I will like a particular coffee?
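Before rules like these can be mined, the records have to be prepared. The slides name two preparation steps: handling missing data (written `?` in the bridge records) and discretizing numeric attributes such as temperature. The following is a minimal sketch of both steps, not the tool's own procedure. The helper names `parse_records` and `discretize`, the cut points 45.0 and 50.0, and the labels `cool`, `warm`, `hot` are assumptions made for illustration only.

```python
import csv
import io

# Three bridge records copied from the data set listing; "?" marks a missing value.
RAW = """E13,A,33,CRAFTS,HIGHWAY,?,2,N,THROUGH,WOOD,?,S,WOOD
E16,A,25,CRAFTS,HIGHWAY,MEDIUM,2,N,THROUGH,IRON,MEDIUM,S-F,SUSPEN
E21,M,16,EMERGING,RR,?,2,?,THROUGH,IRON,?,?,SIMPLE-T
"""


def parse_records(text):
    """Parse comma-separated records, turning each "?" field into None."""
    reader = csv.reader(io.StringIO(text))
    return [[None if field == "?" else field for field in row] for row in reader]


def discretize(value, cut_points, labels):
    """Return the label of the first bin whose upper cut point is >= value.

    labels must have one more entry than cut_points; values above every
    cut point fall into the last label.
    """
    for cut, label in zip(cut_points, labels):
        if value <= cut:
            return label
    return labels[-1]


records = parse_records(RAW)
missing_per_record = [row.count(None) for row in records]

# Temperatures from the coffee relation; the cut points are hypothetical.
temps = [39.8, 54.7, 41.0]
temp_labels = [discretize(t, [45.0, 50.0], ["cool", "warm", "hot"]) for t in temps]

print(missing_per_record)
print(temp_labels)
```

Where the cut points go is a judgment call, which is one reason the slides describe data preparation as much an art as a science.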
What will data mining try to do?
Data mining will try to find rules or relationships that link the data with instances of either yes or no
In other words, what is in the data that is common to the yes (or no) instances?
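In the rule listing for the coffee relation, the number before the arrow is how many records match the conditions and the number after it is how many of those also match the conclusion. A minimal sketch of that counting follows. The records in `COFFEES` are hypothetical and are NOT the slide's data, and the function names are assumptions made for illustration.

```python
# Hypothetical coffee records (not the slide's data): attribute -> value.
COFFEES = [
    {"size": "large", "milkiness": "over", "chocolate": "ok", "enjoy": "yes"},
    {"size": "large", "milkiness": "over", "chocolate": "ok", "enjoy": "yes"},
    {"size": "large", "milkiness": "over", "chocolate": "none", "enjoy": "yes"},
    {"size": "small", "milkiness": "ok", "chocolate": "ok", "enjoy": "no"},
    {"size": "small", "milkiness": "under", "chocolate": "none", "enjoy": "no"},
    {"size": "small", "milkiness": "ok", "chocolate": "none", "enjoy": "no"},
    {"size": "medium", "milkiness": "ok", "chocolate": "ok", "enjoy": "yes"},
]


def matches(record, conditions):
    """True if the record has every attribute=value pair in conditions."""
    return all(record.get(attr) == value for attr, value in conditions.items())


def rule_counts(records, antecedent, consequent):
    """Return (records matching antecedent, records matching both, confidence)."""
    covered = [r for r in records if matches(r, antecedent)]
    both = [r for r in covered if matches(r, consequent)]
    confidence = len(both) / len(covered) if covered else 0.0
    return len(covered), len(both), confidence


rule_a = rule_counts(COFFEES, {"milkiness": "over"}, {"enjoy": "yes"})
rule_b = rule_counts(COFFEES, {"size": "small"}, {"enjoy": "no"})
rule_c = rule_counts(COFFEES, {"chocolate": "ok"}, {"enjoy": "yes"})
print(rule_a, rule_b, rule_c)
```

On this toy data the first two rules hold for every record they cover, while `chocolate=ok ==> enjoy=yes` holds for only three of the four records it covers, so it would rank lower.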
How does it work?
Data mining algorithms analyse the data
Numerous types of algorithms for analysis:
inferring rules from the data
looking for patterns or associations within the data
Data for known examples: training data
Results derived from analysis can be used on new data (test data) to generate decision trees, classifications, predictions

More than just statistics
Data mining: mixture of mathematics, logic, statistics, artificial intelligence
For example, an algorithm for simple classification rules:
for each attribute,
  for each value of that attribute, make a rule as follows:
    count how often each class appears
    find the most frequent class
    make a rule that assigns that class to this attribute-value
  calculate the error rate of the rules
choose the rules with the smallest error rate
[Figure annotations: maths, stats, AI, stats, logic]

Multivariate data
Another way to see what is going on: numerical data
XY plot shows little pattern
XZ plot also shows little pattern
3D plot
YZ plot indicates a relationship
Z values have a relation with X and Y
each variable affects the relation with other variables in another dimension
many more variables are possible - multivariate

Example
Simple example of data mining method: weather data
Relation about the weather for going cycling
attribute name: values
attribute outlook: sunny, overcast, rainy
attribute temperature: real
attribute humidity: real
attribute windy: TRUE, FALSE
attribute cycle: yes, no
data from previous events:
outlook,temp,humid,wind,cycle
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
use algorithms to find patterns in data
form rules from the patterns
use rules for prediction or classification

Example
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)
Number of Leaves: 5
Size of the tree: 8
Weather relation: rules generated from training data

Example
Weather relationship data analysed for associations and displayed as a tree
one of many analysis and display options
rules induced and applied to new situations
build expert systems
strongest associations at root of tree

Example
Weather relation: new data
New (test) data
Decide not to go cycling: is that the right decision?
data: sunny,70,85,FALSE,no
=== Confusion Matrix ===
 a b   <-- classified as
 0 0 | a = yes
 0 1 | b = no
result
interesting result, why?
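The simple classification rules procedure described earlier (for each attribute and each of its values, predict the most frequent class, then keep the attribute whose rules make the fewest errors) can be tried on the weather records. This is a minimal sketch under one stated assumption: only the nominal attributes `outlook` and `windy` are considered, because the numeric ones would first need discretizing. The function name `one_r` is hypothetical and is not taken from the slides or from any tool.

```python
from collections import Counter, defaultdict

# Weather records from the example: (outlook, temp, humidity, windy, cycle).
WEATHER = [
    ("sunny", 85, 85, False, "no"),
    ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"),
    ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"),
    ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, True, "yes"),
    ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"),
    ("rainy", 75, 80, False, "yes"),
    ("sunny", 75, 70, True, "yes"),
    ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"),
    ("rainy", 71, 91, True, "no"),
]

# Assumption: nominal columns only; temperature and humidity are skipped.
ATTRIBUTES = {"outlook": 0, "windy": 3}
CLASS_INDEX = 4


def one_r(records, attributes, class_index):
    """Return (attribute name, value -> class rules, error count) with fewest errors."""
    best = None
    for name, col in attributes.items():
        # Count how often each class appears for each value of this attribute.
        counts = defaultdict(Counter)
        for rec in records:
            counts[rec[col]][rec[class_index]] += 1
        # Each value predicts its most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the records that do not belong to that majority class.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        if best is None or errors < best[2]:
            best = (name, rules, errors)
    return best


attribute, rules, errors = one_r(WEATHER, ATTRIBUTES, CLASS_INDEX)
print(attribute, rules, f"{errors}/{len(WEATHER)} errors")

# Apply the chosen rules to the new day from the example.
new_day = ("sunny", 70, 85, False)
prediction = rules[new_day[ATTRIBUTES[attribute]]]
print(prediction)
```

On these records the sketch selects `outlook`, which is also the attribute at the root of the J48 tree, and it predicts `no` for the new sunny day, in agreement with the confusion matrix.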
Value of data mining
Business uses and benefits
Somerfield Stores used data mining to explore patterns of bread buying to make better predictions about sales volumes
Leeds Building Society used data mining for identifying mortgage accounts likely to default on repayments
Norwich Union Life & Pensions used data mining techniques to develop a knowledge-based system to automate underwriting decisions
Halfords used data mining to develop methods for selecting store locations for maximum turnover
BBC used data mining for scheduling programs in order to maximise audience share
DTI funded projects on security and fraud using data mining
http://www.securityatwork.org.uk/Main/datminingCS.htm

Somerfield Stores
The initial target problem was the analysis of bread buying patterns using large volumes of data captured at the basket level and stored in their data warehouse. Because bread has a short shelf life, it is important that accurate store-level purchasing predictions can be made to ensure optimum freshness and availability. The situation is complicated by product promotions and the domino effect, where one type of bread sells out and sales are transferred to other brands. Through the use of data mining, an increased understanding of purchasing trends enables better bread availability and greater customer satisfaction. Now that the data mining process has been proved in the optimisation of bread management, Somerfield Stores are applying data mining in other areas of the business.
Ref http://www.it-innovation.soton.ac.uk/services/dm_deployment/datam_cases.shtml#bread

Main points so far
Simple examples shown; larger, more complex data sets are normal
All sorts of data can be mined