03Data-Warehousing-and-O.ppt
- 文档编号:11832308
- 上传时间:2023-06-02
- 格式:PPT
- 页数:58
- 大小:3.19MB
03Data-Warehousing-and-O.ppt
《03Data-Warehousing-and-O.ppt》由会员分享,可在线阅读,更多相关《03Data-Warehousing-and-O.ppt(58页珍藏版)》请在冰点文库上搜索。
June2,2023,DataMining:
ConceptsandTechniques,1,DataMining:
ConceptsandTechniquesChapter3,JiaweiHanDepartmentofComputerScienceUniversityofIllinoisatUrbana-Champaignwww.cs.uiuc.edu/hanj2006JiaweiHanandMichelineKamber,Allrightsreserved,June2,2023,DataMining:
ConceptsandTechniques,2,June2,2023,DataMining:
ConceptsandTechniques,3,Chapter3:
DataWarehousingandOLAPTechnology:
AnOverview,Whatisadatawarehouse?
Amulti-dimensionaldatamodelDatawarehousearchitectureDatawarehouseimplementationFromdatawarehousingtodatamining,June2,2023,DataMining:
ConceptsandTechniques,4,WhatisDataWarehouse?
Definedinmanydifferentways,butnotrigorously.AdecisionsupportdatabasethatismaintainedseparatelyfromtheorganizationsoperationaldatabaseSupportinformationprocessingbyprovidingasolidplatformofconsolidated,historicaldataforanalysis.“Adatawarehouseisasubject-oriented,integrated,time-variant,andnonvolatilecollectionofdatainsupportofmanagementsdecision-makingprocess.”W.H.InmonDatawarehousing:
Theprocessofconstructingandusingdatawarehouses,June2,2023,DataMining:
ConceptsandTechniques,5,DataWarehouseSubject-Oriented,Organizedaroundmajorsubjects,suchascustomer,product,salesFocusingonthemodelingandanalysisofdatafordecisionmakers,notondailyoperationsortransactionprocessingProvideasimpleandconciseviewaroundparticularsubjectissuesbyexcludingdatathatarenotusefulinthedecisionsupportprocess,June2,2023,DataMining:
ConceptsandTechniques,6,DataWarehouseIntegrated,Constructedbyintegratingmultiple,heterogeneousdatasourcesrelationaldatabases,flatfiles,on-linetransactionrecordsDatacleaninganddataintegrationtechniquesareapplied.Ensureconsistencyinnamingconventions,encodingstructures,attributemeasures,etc.amongdifferentdatasourcesE.g.,Hotelprice:
currency,tax,breakfastcovered,etc.Whendataismovedtothewarehouse,itisconverted.,June2,2023,DataMining:
ConceptsandTechniques,7,DataWarehouseTimeVariant,ThetimehorizonforthedatawarehouseissignificantlylongerthanthatofoperationalsystemsOperationaldatabase:
currentvaluedataDatawarehousedata:
provideinformationfromahistoricalperspective(e.g.,past5-10years)EverykeystructureinthedatawarehouseContainsanelementoftime,explicitlyorimplicitlyButthekeyofoperationaldatamayormaynotcontain“timeelement”,June2,2023,DataMining:
ConceptsandTechniques,8,DataWarehouseNonvolatile,AphysicallyseparatestoreofdatatransformedfromtheoperationalenvironmentOperationalupdateofdatadoesnotoccurinthedatawarehouseenvironmentDoesnotrequiretransactionprocessing,recovery,andconcurrencycontrolmechanismsRequiresonlytwooperationsindataaccessing:
initialloadingofdataandaccessofdata,June2,2023,DataMining:
ConceptsandTechniques,9,DataWarehousevs.HeterogeneousDBMS,TraditionalheterogeneousDBintegration:
AquerydrivenapproachBuildwrappers/mediatorsontopofheterogeneousdatabasesWhenaqueryisposedtoaclientsite,ameta-dictionaryisusedtotranslatethequeryintoqueriesappropriateforindividualheterogeneoussitesinvolved,andtheresultsareintegratedintoaglobalanswersetComplexinformationfiltering,competeforresourcesDatawarehouse:
update-driven,highperformanceInformationfromheterogeneoussourcesisintegratedinadvanceandstoredinwarehousesfordirectqueryandanalysis,June2,2023,DataMining:
ConceptsandTechniques,10,DataWarehousevs.OperationalDBMS,OLTP(on-linetransactionprocessing)MajortaskoftraditionalrelationalDBMSDay-to-dayoperations:
purchasing,inventory,banking,manufacturing,payroll,registration,accounting,etc.OLAP(on-lineanalyticalprocessing)MajortaskofdatawarehousesystemDataanalysisanddecisionmakingDistinctfeatures(OLTPvs.OLAP):
Userandsystemorientation:
customervs.marketDatacontents:
current,detailedvs.historical,consolidatedDatabasedesign:
ER+applicationvs.star+subjectView:
current,localvs.evolutionary,integratedAccesspatterns:
updatevs.read-onlybutcomplexqueries,June2,2023,DataMining:
ConceptsandTechniques,11,OLTPvs.OLAP,June2,2023,DataMining:
ConceptsandTechniques,12,WhySeparateDataWarehouse?
HighperformanceforbothsystemsDBMStunedforOLTP:
accessmethods,indexing,concurrencycontrol,recoveryWarehousetunedforOLAP:
complexOLAPqueries,multidimensionalview,consolidationDifferentfunctionsanddifferentdata:
missingdata:
DecisionsupportrequireshistoricaldatawhichoperationalDBsdonottypicallymaintaindataconsolidation:
DSrequiresconsolidation(aggregation,summarization)ofdatafromheterogeneoussourcesdataquality:
differentsourcestypicallyuseinconsistentdatarepresentations,codesandformatswhichhavetobereconciledNote:
TherearemoreandmoresystemswhichperformOLAPanalysisdirectlyonrelationaldatabases,June2,2023,DataMining:
ConceptsandTechniques,13,Chapter3:
DataWarehousingandOLAPTechnology:
AnOverview,Whatisadatawarehouse?
Amulti-dimensionaldatamodelDatawarehousearchitectureDatawarehouseimplementationFromdatawarehousingtodatamining,June2,2023,DataMining:
ConceptsandTechniques,14,FromTablesandSpreadsheetstoDataCubes,AdatawarehouseisbasedonamultidimensionaldatamodelwhichviewsdataintheformofadatacubeAdatacube,suchassales,allowsdatatobemodeledandviewedinmultipledimensionsDimensiontables,suchasitem(item_name,brand,type),ortime(day,week,month,quarter,year)Facttablecontainsmeasures(suchasdollars_sold)andkeystoeachoftherelateddimensiontablesIndatawarehousingliterature,ann-Dbasecubeiscalledabasecuboid.Thetopmost0-Dcuboid,whichholdsthehighest-levelofsummarization,iscalledtheapexcuboid.Thelatticeofcuboidsformsadatacube.,June2,2023,DataMining:
ConceptsandTechniques,15,Cube:
ALatticeofCuboids,time,item,time,item,location,time,item,location,supplier,June2,2023,DataMining:
ConceptsandTechniques,16,ConceptualModelingofDataWarehouses,Modelingdatawarehouses:
dimensions&measuresStarschema:
AfacttableinthemiddleconnectedtoasetofdimensiontablesSnowflakeschema:
Arefinementofstarschemawheresomedimensionalhierarchyisnormalizedintoasetofsmallerdimensiontables,formingashapesimilartosnowflakeFactconstellations:
Multiplefacttablessharedimensiontables,viewedasacollectionofstars,thereforecalledgalaxyschemaorfactconstellation,June2,2023,DataMining:
ConceptsandTechniques,17,ExampleofStarSchema,SalesFactTable,time_key,item_key,branch_key,location_key,units_sold,dollars_sold,avg_sales,Measures,June2,2023,DataMining:
ConceptsandTechniques,18,ExampleofSnowflakeSchema,SalesFactTable,time_key,item_key,branch_key,location_key,units_sold,dollars_sold,avg_sales,Measures,June2,2023,DataMining:
ConceptsandTechniques,19,ExampleofFactConstellation,SalesFactTable,time_key,item_key,branch_key,location_key,units_sold,dollars_sold,avg_sales,Measures,ShippingFactTable,time_key,item_key,shipper_key,from_location,to_location,dollars_cost,units_shipped,June2,2023,DataMining:
ConceptsandTechniques,20,CubeDefinitionSyntax(BNF)inDMQL,CubeDefinition(FactTable)definecube:
DimensionDefinition(DimensionTable)definedimensionas()SpecialCase(SharedDimensionTables)Firsttimeas“cubedefinition”definedimensionasincube,June2,2023,DataMining:
ConceptsandTechniques,21,DefiningStarSchemainDMQL,definecubesales_startime,item,branch,location:
dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier_type)definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city,province_or_state,country),June2,2023,DataMining:
ConceptsandTechniques,22,DefiningSnowflakeSchemainDMQL,definecubesales_snowflaketime,item,branch,location:
dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier(supplier_key,supplier_type)definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city(city_key,province_or_state,country),June2,2023,DataMining:
ConceptsandTechniques,23,DefiningFactConstellationinDMQL,definecubesalestime,item,branch,location:
dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier_type)definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city,province_or_state,country)definecubeshippingtime,item,shipper,from_location,to_location:
dollar_cost=sum(cost_in_dollars),unit_shipped=count(*)definedimensiontimeastimeincubesalesdefinedimensionitemasitemincubesalesdefinedimensionshipperas(shipper_key,shipper_name,locationaslocationincubesales,shipper_type)definedimensionfrom_locationaslocationincubesalesdefinedimensionto_locationaslocationincubesales,June2,2023,DataMining:
ConceptsandTechniques,24,MeasuresofDataCube:
ThreeCategories,Distributive:
iftheresultderivedbyapplyingthefunctiontonaggregatevaluesisthesameasthatderivedbyapplyingthefunctiononallthedatawithoutpartitioningE.g.,count(),sum(),min(),max()Algebraic:
ifitcanbecomputedbyanalgebraicfunctionwithMarguments(whereMisaboundedinteger),eachofwhichisobtainedbyapplyingadistributiveaggregatefunctionE.g.,avg(),min_N(),standard_deviation()Holistic:
ifthereisnoconstantboundonthestoragesizeneededtodescribeasubaggr
- 配套讲稿:
如PPT文件的首页显示word图标,表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
- 特殊限制:
部分文档作品中含有的国旗、国徽等图片,仅作为作品整体效果示例展示,禁止商用。设计者仅对作品中独创性部分享有著作权。
- 关 键 词:
- 03 Data Warehousing and