Kyoto Cabinet数据库资料.docx
- 文档编号:9941685
- 上传时间:2023-05-22
- 格式:DOCX
- 页数:40
- 大小:30.01KB
Kyoto Cabinet数据库资料.docx
《Kyoto Cabinet数据库资料.docx》由会员分享,可在线阅读,更多相关《Kyoto Cabinet数据库资料.docx(40页珍藏版)》请在冰点文库上搜索。
KyotoCabinet数据库资料
FundamentalSpecificationsofKyotoCabinetVersion1
Introduction
KyotoCabinetisalibraryofroutinesformanagingadatabase.Thedatabaseisasimpledatafilecontainingrecords,eachisapairofakeyandavalue.Everykeyandvalueisserialbyteswithvariablelength.Bothbinarydataandcharacterstringcanbeusedasakeyandavalue.Eachkeymustbeuniquewithinadatabase.Thereisneitherconceptofdatatablesnordatatypes.RecordsareorganizedinhashtableorB+tree.
Thefollowingaccessmethodsareprovidedtothedatabase:
storingarecordwithakeyandavalue,deletingarecordbyakey,retrievingarecordbyakey.Moreover,traversalaccesstoeverykeyareprovided.TheseaccessmethodsaresimilartoonesoftheoriginalDBM(anditsfollowers:
NDBMandGDBM)librarydefinedintheUNIXstandard.KyotoCabinetisanalternativefortheDBMbecauseofitshigherperformance.
Eachoperationofthehashdatabasehasthetimecomplexityof"O
(1)".Therefore,intheory,theperformanceisconstantregardlessofthescaleofthedatabase.Inpractice,theperformanceisdeterminedbythespeedofthemainmemoryorthestoragedevice.Ifthesizeofthedatabaseislessthanthecapacityofthemainmemory,theperformancewillseemon-memoryspeed,whichisfasterthanstd:
:
mapofSTL.Ofcourse,thedatabasesizecanbegreaterthanthecapacityofthemainmemoryandtheupperlimitis8exabytes.Eveninthatcase,eachoperationneedsonlyoneortwoseekingofthestoragedevice.
EachoperationoftheB+treedatabasehasthetimecomplexityof"O(logN)".Therefore,intheory,theperformanceislogarithmictothescaleofthedatabase.AlthoughtheperformanceofrandomaccessoftheB+treedatabaseisslowerthanthatofthehashdatabase,theB+treedatabasesupportssequentialaccessinorderofthekeys,whichrealizesforwardmatchingsearchforstringsandrangesearchforintegers.Theperformanceofsequentialaccessismuchfasterthanthatofrandomaccess.
AstheAPIisbasedonobject-orienteddesign,thehashdatabaseandthetheB+treedatabasehavesamemethodswhichinheritedfromtheupperabstractclass.Besidethem,twokindsofdatabasesareprovidedunderthesamebaseclass.TheprototypedatabaseispoweredbythestandardcontainersofSTL.Thecachedatabaseispoweredbytheoriginalimplementationofdouble-linkedhashmapwithLRUdeletionalgorithm.Alldatabaseshavepracticalutilitymethodsrelatedtotransactionandcursor.Programsforcommandlineinterfacearealsoincludedinthepackage.
KyotoCabinetrunsveryfast.Forexample,elapsedtimetostoreonemillionrecordsis0.9secondsforthehashdatabase,and1.1secondsfortheB+treedatabase.Moreover,thesizeofdatabaseisverysmall.Forexample,overheadforarecordis16bytesforthehashdatabase,and4bytesfortheB+treedatabase.Furthermore,scalabilityofKyotoCabinetisgreat.Thedatabasesizecanbeupto8EB(9.22e18bytes).
KyotoCabinetiswrittenintheC++language,andprovidedasAPIofC++,C,Java,Python,Ruby,Perl,andLua.KyotoCabinetisavailableonplatformswhichhaveAPIconformingtoC++03withtheTR1libraryextensions.KyotoCabinetisafreesoftwarelicensedundertheGNUGeneralPublicLicense.
--------------------------------------------------------------------------------
Features
ThissectiondescribesthefeaturesofKyotoCabinet.
Genealogy
TheoriginalDBMwasdevelopedbyKennethThompsonasapartoftheoriginalAT&TUNIX.Afterthat,alotoffollowersdevelopedsuchDBM-likeproductsasNDBM,SDBM,GDBM,TDB,andBerkeleyDB.In2003,IdevelopedQDBMtoreplaceGDBMforperformancereason.
In2007,TokyoCabinetwasdevelopedasthesuccessortoQDBMonthefollowingpurposes.TheywereachievedandTokyoCabinetcouldreplaceconventionalDBMproducts.
?
improvesspaceefficiency:
smallersizeofdatabasefile.
?
improvestimeefficiency:
fasterprocessingspeed.
?
improvesparallelism:
higherperformanceinmulti-threadenvironment.
?
improvesusability:
simplifiedAPI.
?
improvesrobustness:
databasefileisnotcorruptedevenundercatastrophicsituation.
?
supports64-bitarchitecture:
enormousmemoryspaceanddatabasefileareavailable.
In2009,KyotoCabinetwasdevelopedasanothersuccessortoQDBM.Comparedwiththesiblingproduct(TokyoCabinet),thefollowingadvantageswerepursued.However,theperformanceofTokyoCabinetishigherthanKyotoCabinet,atleastinsinglethreadoperations.
?
improvesspaceefficiency:
smallersizeofdatabasefile.
?
improvesparallelism:
higherperformanceinmulti-threadenvironment.
?
improvesportability:
abstractionofthelowerlayertosupportnon-POSIXsystems.
?
improvesusability:
simplifiedAPI,object-orienteddesign.
?
improvesrobustness:
databasefileisnotcorruptedevenundercatastrophicsituation.
I'llmaintainthebothofTokyoCabinetandKyotoCabinetbecausetheirvaluesaredifferent.
EffectiveImplementationofHashDatabase
KyotoCabinetuseshashalgorithmtoretrieverecords.Ifabucketarrayhassufficientnumberofelements,thetimecomplexityofretrievalis"O
(1)".Thatis,thetimerequiredforretrievingarecordisconstant,regardlessofthescaleofadatabase.Itisalsothesameaboutstoringanddeleting.Collisionofhashvaluesismanagedbyseparatechaining.Datastructureofthechainsisbinarysearchtree.Evenifabucketarrayhasunusuallyscarceelements,thetimecomplexityofretrievalis"O(logn)".
KyotoCabinetattainsperformanceimprovementinretrievalbyloadingthewholeofthebucketarrayontotheRAM.IfthebucketarrayisonRAM,itispossibletoaccessaregionofatargetrecordbyaboutonesetoffileoperationssuchas`lseek',`read',and`write'.ThebucketarraysavedinafileisnotreadintoRAMwiththe`read'callbutdirectlymappedtoRAMwiththe`mmap'call.Therefore,preparationtimeonconnectingtoadatabaseisveryshort,andtwoormoreprocessescansharethesamememorymap.
ThehashfunctionusedforhashtableisMurMurHash2.0.Ifthenumberofelementsofthebucketarrayisaboutahalfofrecordsstoredwithinadatabase,althoughitdependsoncharacteristicoftheinput,theprobabilityofcollisionofhashvaluesisabout55.3%(35.5%ifthesame,20.4%iftwice,11.0%iffourtimes,5.7%ifeighttimes).Inthatcase,itispossibletoretrievearecordbytwoorlesssetsoffileoperations.Ifitismadeintoaperformanceindex,inordertohandleadatabasecontainingonemillionofrecords,abucketarraywithhalfamillionofelementsisrequired.Thesizeofeachelementis6bytes.Thatis,if3MbytesofRAMisavailable,adatabasecontainingonemillionrecordscanbehandled.
Whenoverwritingarecordwithavaluewhosesizeisgreaterthantheexistingone,itisnecessarytomovetheregiontoanotherpositionofthefile.Becausethetimecomplexityoftheoperationdependsonthesizeoftheregionofarecord,extendingvaluessuccessivelyisinefficient.However,KyotoCabinetdealswiththisproblembyalignment.Iftheincrementaldatacanbeplacedinthepaddingregiontrailingtherecords,itisnotnecessarytomovetheregionoftherecord.
Generallyspeaking,whilesuccessionofupdating,fragmentationofavailableregionsoccurs,andthesizeofadatabasegrowsrapidly.KyotoCabinetdealswiththisproblembythefreeblockpoolandtheautodefragmentationmechanism.Ifarecordisremovedorshiftedtoanotherposition,theregionwillbetreatedasafreeblock.Thefreeblockpoolmanagesfreeblocksandreusesthebestfitregionforanewrecord.Theautodefragmentationistoshiftrecordsandfreeblocksseparately.Successivefreeblockscoalesceintoone.
UsefulImplementationofB+TreeDatabase
AlthoughtheB+treedatabaseisslowerthanthehashdatabase,itfeaturesorderingaccesstoeachrecord.Theordercanbeassignedbyusers.RecordsintheB+treedatabasearesortedandarrangedinlogicalpages.SparseindexorganizedinBtreethatismultiwaybalancedtreearemaintainedforeachpage.Thus,thetimecomplexityofretrievalandsoonis"O(logn)".Cursorisprovidedtoaccesseachrecordinorder.Thecursorcanjumptoapositionspecifiedbyakeyandcanstepforwardorbackwardfromthecurrentposition.Becauseeachpageisarrangedasdoublelinkedlist,thetimecomplexityofsteppingcursoris"O
(1)".
TheB+treedatabaseisimplementedbasedontheabovethehashdatabase.BecauseeachpageoftheB+treedatabaseisstoredaseachrecordinthehashdatabase,theB+treedatabaseinheritsefficiencyofstoragemanagementofthehashdatabase.Becausetheheaderofeachrecordissmallerandalignmentofeachpageisadjustedaccordingtothepagesize,inmostcases,thesizeofdatabasefileiscutbyhalfcomparedtooneofthehashdatabase.
AlthoughoperationsofmanypagesarerequiredtoupdatetheB+treedatabase,KyotoCabinetexpeditestheprocessbythepagecacheandreducingfileoperations.ThepagecacheisimplementedwithdoublelayeredLRUlist,whichrealizesthatfrequentlyaccessedpagesarecachedinthe"hot"listandrecentlyaccessedpagesarecachedinthe"warm"LRUlist.Ifthepagecacheworksefficientlyandthewholeofthesparseindexiscachedonmemory,itispossibletoretrievearecordbyoneorlesssetoffileoperations.
EachpageoftheB+treedatabasecanbestoredwithcompressed.Thedefaultcompressionmethodis"Deflate"byZLIB.Becauserecordsinapagehassimilarpatterns,highefficiencyofcompressionisexpectedduetotheLempel-Zivalgorithm.Incaseofhandlingtextdata,thesizeofadatabaseisreducedtoabout50%orless.Ifthescal
- 配套讲稿:
如PPT文件的首页显示word图标,表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
- 特殊限制:
部分文档作品中含有的国旗、国徽等图片,仅作为作品整体效果示例展示,禁止商用。设计者仅对作品中独创性部分享有著作权。
- 关 键 词:
- Kyoto Cabinet数据库资料 Cabinet 数据库 资料