C4.5 Tutorial
C4.5 is a software extension of the basic ID3 algorithm designed by Quinlan to address the following issues not dealt with by ID3:
∙ Avoiding overfitting the data
o Determining how deeply to grow a decision tree.
∙ Reduced error pruning.
∙ Rule post-pruning.
∙ Handling continuous attributes.
o e.g., temperature
∙ Choosing an appropriate attribute selection measure.
∙ Handling training data with missing attribute values.
∙ Handling attributes with differing costs.
∙ Improving computational efficiency.
It is installed for use on Grendel (grendel.icd.uregina.ca), but it may be set up on a local machine as follows:
C4.5 Release 8 Installation Instructions for UNIX
1. Download the C4.5 source code.
2. Decompress the archive:
1. Type "tar xvzf c4.5r8.tar.gz" (not universally supported), or, alternatively,
2. Type "gunzip c4.5r8.tar.gz" to decompress the gzip archive, and then type "tar xvf c4.5r8.tar" to extract the tar archive.
3. Change to ./R8/Src.
4. Type "make all" to compile the executables.
5. Put the executables into a "bin" subdirectory and include it in the path for command-line usage.
Manual Pages
∙ c4.5:
using the c4.5 decision tree generator.
∙ verbose c4.5:
interpreting output generated by c4.5.
∙ c4.5rules:
using the c4.5 rule generator.
∙ verbose c4.5rules:
interpreting output generated by c4.5rules.
∙ consult:
uses a decision tree to classify items.
∙ consultr:
uses a rule set to classify items.
Examples
Examples of C4.5 usage:
∙ Example 1 - Golf
o A simple, detailed example of how C4.5 and C4.5rules work.
∙ Example 2 - Sunburn
o The sunburn example revisited.
∙ Example 3 - Homonyms
o Advanced usage and a practical application of C4.5 and C4.5rules.
∙ c4.5:
using the c4.5 decision tree generator.
NAME
c4.5 - form a decision tree from a file of examples
SYNOPSIS
c4.5 [ -f filestem ] [ -u ] [ -s ] [ -p ] [ -v verb ] [ -t trials ] [ -w wsize ] [ -i incr ] [ -g ] [ -m minobjs ] [ -c cf ]
DESCRIPTION
C4.5 is a program for inducing classification rules in the form of decision trees from a set of given examples.
All files read and written by C4.5 are of the form filestem.ext, where filestem is a file name stem that identifies the induction task and ext is an extension that defines the type of file. The program expects to find at least two files: a names file filestem.names defining class, attribute and attribute value names, and a data file filestem.data containing a set of objects, each of which is described by its values of each of the attributes and its class.
The program can generate trees in two ways. In batch mode (the default), the program generates a single tree using all the available data. In iterative mode, the program starts with a randomly selected subset of the data (the window), generates a trial decision tree, adds some misclassified objects, and continues until the trial decision tree correctly classifies all objects not in the window or until it appears that no progress is being made. Since iterative mode starts with a randomly selected subset, multiple trials with the same data can be used to generate more than one tree.
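A minimal sketch of the iterative (windowing) mode, assuming a pluggable learner. The names `windowing`, `default_window_params` and `memorize` are hypothetical: `memorize` is a toy stand-in for the tree builder, and the `max_iters` cap stands in for C4.5's own "no progress" test. The defaults follow the `-w` and `-i` descriptions under OPTIONS; the integer rounding and capping the window at the number of objects are assumptions.

```python
import math
import random
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence, Set, Tuple

Example = Tuple[Hashable, Hashable]      # (attribute values, class)
Model = Callable[[Hashable], Hashable]   # maps attribute values to a class


def default_window_params(n_objects: int) -> Tuple[int, int]:
    """-w default: max(20% of objects, 2*sqrt(objects)); -i default: 20% of the window.
    Rounding down and the cap at n_objects are assumptions."""
    window = min(n_objects, max(n_objects // 5, int(2 * math.sqrt(n_objects))))
    increment = max(1, window // 5)
    return window, increment


def windowing(data: Sequence[Example],
              learn: Callable[[List[Example]], Model],
              window_size: Optional[int] = None,
              increment: Optional[int] = None,
              seed: int = 0,
              max_iters: int = 100) -> Tuple[Model, Set[int]]:
    """Grow a random window until the trial model fits every object outside it."""
    default_window, default_incr = default_window_params(len(data))
    window_size = default_window if window_size is None else window_size
    increment = default_incr if increment is None else increment

    rng = random.Random(seed)            # a different seed plays the role of another trial
    order = list(range(len(data)))
    rng.shuffle(order)
    window = set(order[:window_size])    # randomly selected initial window
    model = learn([data[i] for i in sorted(window)])
    for _ in range(max_iters):           # hypothetical stand-in for "no progress" detection
        misclassified = [i for i in order
                         if i not in window and model(data[i][0]) != data[i][1]]
        if not misclassified:
            break                        # trial model classifies all objects outside the window
        window.update(misclassified[:increment])
        model = learn([data[i] for i in sorted(window)])
    return model, window


def memorize(examples: List[Example]) -> Model:
    """Toy learner: look up inputs seen in the window, otherwise predict the majority class."""
    table = dict(examples)
    majority = Counter(cls for _, cls in examples).most_common(1)[0][0]
    return lambda x: table.get(x, majority)
```

Running `windowing(data, memorize)` with several different `seed` values mimics multiple trials on the same data, each possibly ending with a different window and therefore a different model.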
All trees generated in the process are saved in filestem.unpruned. After each tree is generated, it is pruned in an attempt to simplify it. The "best" pruned tree (selected by the program if there is more than one trial) is saved in machine-readable form in filestem.tree.
All trees produced, both pre- and post-simplification, are evaluated on the training data. If required, they can also be evaluated on unseen data in the file filestem.test.
FILE FORMATS
The names file filestem.names is a series of entries defining names of attributes, attribute values and classes. The file is free-format with the exception that the vertical bar "|" causes the remainder of that line to be ignored. Each entry is terminated by a period, which may be omitted if it is the last character of a line.
The file commences with the names of the classes, separated by commas and terminated with a period. Each name consists of a string of characters that does not include a comma, question mark or colon (unless preceded by a backslash). A period may be embedded in a name provided it is not followed by a space. Embedded spaces are also permitted, but multiple whitespace is replaced by a single space.
The rest of the file consists of a single entry for each attribute. An attribute entry begins with the attribute name followed by a colon, and then either the word "ignore" (indicating that this attribute should not be used), the word "continuous" (indicating that the attribute has real values), the word "discrete" followed by an integer n (indicating that the program should assemble a list of up to n possible values), or a list of all possible discrete values separated by commas. (The latter form for discrete attributes is recommended as it enables input to be checked.) Each entry is terminated with a period (but see above).
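To make the names-file rules concrete, here is a simplified parser sketch. It assumes one entry per line and does not handle backslash escapes; `parse_names` and the shape of its return value are hypothetical, not part of C4.5.

```python
import re
from typing import Dict, List, Tuple, Union

# "ignore" / "continuous", ("discrete", n), or an explicit list of values
AttributeSpec = Union[str, Tuple[str, int], List[str]]


def parse_names(text: str) -> Tuple[List[str], Dict[str, AttributeSpec]]:
    """Parse a C4.5-style names file (simplified: one entry per line, no escapes)."""
    entries = []
    for line in text.splitlines():
        line = line.split("|", 1)[0].strip()        # "|" starts a comment
        if line.endswith("."):
            line = line[:-1].rstrip()                # drop the terminating period
        if line:
            entries.append(re.sub(r"\s+", " ", line))  # collapse runs of whitespace
    if not entries:
        raise ValueError("names file has no entries")

    classes = [name.strip() for name in entries[0].split(",")]  # first entry: class names
    attributes: Dict[str, AttributeSpec] = {}
    for entry in entries[1:]:
        name, colon, spec = entry.partition(":")
        if not colon:
            raise ValueError(f"attribute entry without a colon: {entry!r}")
        name, spec = name.strip(), spec.strip()
        words = spec.split()
        if spec in ("ignore", "continuous"):
            attributes[name] = spec
        elif len(words) == 2 and words[0] == "discrete" and words[1].isdigit():
            attributes[name] = ("discrete", int(words[1]))  # up to n values, gathered from data
        else:
            attributes[name] = [value.strip() for value in spec.split(",")]
    return classes, attributes
```

For example, the entry `outlook: sunny, overcast, rain.` yields `['sunny', 'overcast', 'rain']`, and `temperature: continuous.` yields `'continuous'`.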
The data file filestem.data contains one line per object. Each line contains the values of the attributes in order followed by the object's class, with all entries separated by commas. The rules for valid names in the names file also hold for the names in the data file. An unknown value of an attribute is indicated by a question mark "?". If a test file filestem.test is used, it has the same format as the data file.
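A matching sketch for the data (and test) file, assuming the attribute order is taken from the names file. `parse_data` is a hypothetical helper: it maps `?` to `None` and converts the attributes listed in `continuous` to floats. Stripping an optional trailing period from the class is an assumption carried over from the names-file rules.

```python
from typing import Dict, Iterable, List, Optional, Tuple, Union

Value = Optional[Union[str, float]]
Row = Tuple[Dict[str, Value], str]   # (attribute values, class)


def parse_data(text: str, attributes: List[str],
               continuous: Iterable[str] = ()) -> List[Row]:
    """Parse comma-separated objects: attribute values in order, then the class."""
    numeric = set(continuous)
    rows: List[Row] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue                                   # skip blank lines
        *values, cls = [field.strip() for field in line.split(",")]
        if cls.endswith("."):
            cls = cls[:-1].rstrip()                    # assumption: optional terminating period
        if len(values) != len(attributes):
            raise ValueError(f"line {line_no}: expected {len(attributes)} "
                             f"attribute values, got {len(values)}")
        row: Dict[str, Value] = {}
        for name, value in zip(attributes, values):
            if value == "?":
                row[name] = None                       # unknown attribute value
            elif name in numeric:
                row[name] = float(value)
            else:
                row[name] = value
        rows.append((row, cls))
    return rows
```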
OPTIONS
Options and their meanings are:
-f filestem
Specify the filename stem (default DF).
-u
Evaluate the trees produced on unseen cases in file filestem.test.
-s
Force "subsetting" of all tests based on discrete attributes with more than two values. C4.5 will construct a test with a subset of values associated with each branch.
-p
Probabilistic thresholds used for continuous attributes (see Quinlan, 1987a).
-t trials
Set iterative mode with the specified number of trials.
-v verb
Set the verbosity level [0-3] (default 0). This option generates more voluminous output that may help to explain what the program is doing (but don't count on it); see the manual entry for verbose.
The following options are also available but need not be used except for experimentation with tree construction:
-w wsize
Set the size of the initial window (default is the maximum of 20 percent of the number of data objects and twice the square root of that number).
-i incr
Set the maximum number of objects that can be added to the window at each iteration (default is 20 percent of the initial window size).
-g
Use the gain criterion to select tests. The default uses the gain ratio criterion.
-m minobjs
In all tests, at least two branches must contain a minimum number of objects (default 2). This option allows the minimum number to be altered.
-c cf
Set the pruning confidence level (default 25%).
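The `-c` confidence level drives C4.5's pessimistic error estimate: for a leaf covering N training cases with E errors, pruning uses the upper limit U_CF(E, N) of a binomial confidence interval on the error rate instead of E/N. The sketch below computes that limit exactly by bisection on the binomial CDF. C4.5's source uses its own approximation, so its figures may differ slightly; `upper_error_rate` and `predicted_errors` are hypothetical names.

```python
import math


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_error_rate(errors: int, n: int, cf: float = 0.25, tol: float = 1e-12) -> float:
    """Find p with binomial_cdf(errors, n, p) == cf (the CDF falls as p rises)."""
    if not 0 < cf < 1:
        raise ValueError("cf must lie strictly between 0 and 1")
    if errors >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binomial_cdf(errors, n, mid) > cf:
            lo = mid          # CDF still above cf: the root lies at a higher p
        else:
            hi = mid
    return (lo + hi) / 2


def predicted_errors(errors: int, n: int, cf: float = 0.25) -> float:
    """Pessimistic error count N * U_CF(E, N) for a leaf."""
    return n * upper_error_rate(errors, n, cf)


print(round(upper_error_rate(0, 6), 3))  # 0.206, i.e. 1 - 0.25 ** (1/6)
```

A smaller `cf` gives a larger upper limit and therefore more aggressive pruning.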
FILES
c4.5
filestem.data
filestem.names
filestem.unpruned (unpruned trees)
filestem.tree (final decision tree)
filestem.test (unseen data)
∙ verbose c4.5:
interpreting output generated by c4.5.
NAME
A guide to the verbose output of the C4.5 decision tree generator
DESCRIPTION
This document explains the output of the program C4.5 when it is run with the verbosity level (option v) set to values from 1 to 3.
TREE BUILDING
Verbosity Level 1
To build a decision tree from a set of data items, each of which belongs to one of a set of classes, C4.5 proceeds as follows:
1. If all items belong to the same class, the decision tree is a leaf which is labelled with this class.
2. Otherwise, C4.5 attempts to find the best attribute to test in order to divide the data items into subsets, and then builds a subtree from each subset by recursively invoking this procedure for each one.
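The two-step recursive procedure can be sketched as follows, for discrete attributes only. `build_tree` and the `choose` callback are hypothetical names: `choose` returns the attribute to test, or `None` when no split is worthwhile. Stopping when no attributes remain is an added assumption, not part of the description above.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple, Union

Row = Tuple[Dict[str, str], str]                   # (attribute values, class)
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]   # leaf class, or (attribute, branches)
Chooser = Callable[[Sequence[Row], List[str]], Optional[str]]


def build_tree(rows: Sequence[Row], attributes: List[str], choose: Chooser) -> Tree:
    counts = Counter(cls for _, cls in rows)
    majority = counts.most_common(1)[0][0]
    # Step 1: all items share one class -> leaf (also stop if no attributes remain).
    if len(counts) == 1 or not attributes:
        return majority
    # Step 2: pick an attribute to test; None means no useful split, so make a leaf.
    attr = choose(rows, attributes)
    if attr is None:
        return majority
    remaining = [a for a in attributes if a != attr]
    branches: Dict[str, Tree] = {}
    for value in sorted({values[attr] for values, _ in rows}):
        subset = [(values, cls) for values, cls in rows if values[attr] == value]
        branches[value] = build_tree(subset, remaining, choose)  # recurse on each subset
    return (attr, branches)
```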
The best attribute to branch on at each stage is selected by determining the information gain of a split on each of the attributes. If the selection criterion being used is GAIN (option g), the best attribute is the one that divides the data items with the highest gain in information, whereas if the GAINRATIO criterion (the default) is being used (and the gain is at least the average gain across all attributes), the best attribute is the one with the highest ratio of information gain to potential information. For discrete-valued attributes, a branch corresponding to each value of the attribute is formed, whereas for continuous-valued attributes, a threshold is found, thus forming two branches. If subset tests are being used (option s), branches may be formed corresponding to a subset of values of the discrete attribute being tested.
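A sketch of the GAIN and GAINRATIO selection rules for discrete attributes. The function names are hypothetical, and unknown values, item weights and subset tests are ignored. As described above, GAINRATIO only considers attributes whose gain is at least the average gain.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Row = Tuple[Dict[str, Hashable], Hashable]   # (attribute values, class)


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_and_split_info(rows: Sequence[Row], attr: str) -> Tuple[float, float]:
    """Information gain of splitting on attr, and the split's potential information."""
    n = len(rows)
    partitions: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for values, cls in rows:
        partitions[values[attr]].append(cls)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    return entropy([cls for _, cls in rows]) - remainder, split_info


def choose_attribute(rows: Sequence[Row], attributes: List[str],
                     use_gain_ratio: bool = True, eps: float = 1e-12) -> Optional[str]:
    stats = {a: gain_and_split_info(rows, a) for a in attributes}
    candidates = [a for a in attributes if stats[a][0] > eps]
    if not candidates:
        return None                                   # no split gains information
    if not use_gain_ratio:                            # GAIN criterion
        return max(candidates, key=lambda a: stats[a][0])
    # GAINRATIO: restrict to gain >= average gain. The highest-gain candidate always
    # qualifies, and positive gain implies >= 2 branches, so split_info > 0.
    average_gain = sum(g for g, _ in stats.values()) / len(stats)
    eligible = [a for a in candidates if stats[a][0] >= average_gain - eps]
    return max(eligible, key=lambda a: stats[a][0] / stats[a][1])
```

With the class counts for outlook in Quinlan's golf data (sunny: 2 Play, 3 Don't Play; overcast: 4 Play; rain: 3 Play, 2 Don't Play), `gain_and_split_info` gives a gain of about 0.247 bits and a potential information of about 1.577 bits.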
The verbose output shows the number of items from which a tree is being constructed, as well as the total weight of these items. The weight of an item is the probability that the item would reach this point in the tree, and will be less than 1.0 for items with an unknown value of some previously tested attribute.
Shown for the best attribute are:
cut - threshold (continuous attributes only)
inf - the potential information of a split
gain - the gain in information of a split
val - the gain or the gain/inf (depending on the selection criterion)
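The cut, inf, gain and val figures for a continuous attribute can be illustrated with the sketch below. It tries a cut between each pair of adjacent distinct values, keeps the cut with the highest gain, and reports the lower of the two values as the threshold. `best_cut` is a hypothetical name; C4.5's own threshold search differs in details, so its printed figures may not match exactly.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values: Sequence[float], labels: Sequence[Hashable],
             use_gain_ratio: bool = True) -> Optional[Dict[str, float]]:
    """Return {'cut', 'inf', 'gain', 'val'} for the best binary split value <= cut."""
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    if n < 2:
        return None
    base = entropy([cls for _, cls in pairs])
    best: Optional[Dict[str, float]] = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                  # only cut between distinct values
        left = [cls for _, cls in pairs[:i]]
        right = [cls for _, cls in pairs[i:]]
        p_left, p_right = i / n, (n - i) / n
        gain = base - (p_left * entropy(left) + p_right * entropy(right))
        inf = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))  # potential info
        if best is None or gain > best["gain"]:
            best = {"cut": pairs[i - 1][0], "inf": inf, "gain": gain}
    if best is not None:
        best["val"] = best["gain"] / best["inf"] if use_gain_ratio else best["gain"]
    return best
```

For the five sunny cases of the golf data (humidity 70, 70, 85, 90, 95, with only the two 70s labelled Play), this sketch reports cut 70, with gain and inf both about 0.971, so val under the gain ratio criterion is 1.0.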
Also shown is the proportion of items at this point in the tree with an unknown value for that attribute. Items with an unknown value for the attribute being tested are distributed across all values in proportion to the relative frequency of these values in the set of items being tested.
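The fractional-weight treatment of unknown values can be sketched as below, assuming unknowns are written as `?` and each item carries a weight. `distribute` is a hypothetical helper: an item of weight w goes down every branch with weight w times that value's share of the known weight.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WeightedRow = Tuple[Dict[str, str], str, float]   # (attribute values, class, weight)


def distribute(rows: List[WeightedRow], attr: str,
               unknown: str = "?") -> Dict[str, List[WeightedRow]]:
    totals: Dict[str, float] = defaultdict(float)
    for values, _, weight in rows:
        if values[attr] != unknown:
            totals[values[attr]] += weight            # weighted frequency of each known value
    known_weight = sum(totals.values())
    if known_weight == 0:
        raise ValueError(f"no known values for {attr!r}")
    subsets: Dict[str, List[WeightedRow]] = {value: [] for value in totals}
    for values, cls, weight in rows:
        if values[attr] != unknown:
            subsets[values[attr]].append((values, cls, weight))
        else:
            # Send a fraction of the item down every branch, in proportion to its frequency.
            for value, total in totals.items():
                subsets[value].append((values, cls, weight * total / known_weight))
    return subsets
```

With three known `sunny` items, one `rain` item and one unknown, the unknown item reaches the `sunny` subset with weight 0.75 and the `rain` subset with weight 0.25.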
If no split gives a gain in information, the set of items is made into a leaf labelled with the most frequent class of items reaching this point in the tree, and the message:
no sensible splits r1/r2
is given, where r1 is the total weight of items reaching this point in the tree, and r2 is the weight of these which don't belong to the class of this leaf.
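The leaf class and the r1/r2 figures can be computed as follows. `leaf_summary` is a hypothetical helper that takes (class, weight) pairs, so fractional weights from unknown values are handled naturally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def leaf_summary(items: Iterable[Tuple[str, float]]) -> Tuple[str, float, float]:
    """Return (leaf class, r1, r2) for weighted (class, weight) pairs."""
    totals: Dict[str, float] = defaultdict(float)
    for cls, weight in items:
        totals[cls] += weight
    leaf_class = max(totals, key=totals.get)   # most frequent class, by weight
    r1 = sum(totals.values())                  # total weight reaching this point
    r2 = r1 - totals[leaf_class]               # weight not belonging to the leaf's class
    return leaf_class, r1, r2
```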
If a subtree is found to misclassify at least as many items as does replacing the subtree with a leaf, then the subtree is replaced and the following message is given:
Collapse tre
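The subtree-versus-leaf comparison in the paragraph above can be sketched as below. For simplicity it counts items rather than summing weights (an assumption; C4.5 works with item weights). `classify`, `maybe_collapse` and the fallback to the majority class for an unseen branch value are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

Row = Tuple[Dict[str, str], str]                   # (attribute values, class)
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]   # leaf class, or (attribute, branches)


def classify(tree: Tree, values: Dict[str, str], default: str) -> str:
    while not isinstance(tree, str):
        attr, branches = tree
        subtree = branches.get(values[attr])
        if subtree is None:
            return default                 # hypothetical fallback for an unseen value
        tree = subtree
    return tree


def maybe_collapse(tree: Tree, rows: List[Row]) -> Tree:
    """Replace tree by a leaf if the leaf makes no more errors on these items."""
    counts = Counter(cls for _, cls in rows)
    leaf_class, leaf_correct = counts.most_common(1)[0]
    leaf_errors = len(rows) - leaf_correct
    subtree_errors = sum(1 for values, cls in rows
                         if classify(tree, values, leaf_class) != cls)
    if subtree_errors >= leaf_errors:
        return leaf_class                  # collapse the subtree to a leaf
    return tree
```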