Complete Edition: Computer Architecture After-Class Exercises (Word format).docx
- Document ID: 614873
- Upload date: 2023-04-29
- Format: DOCX
- Pages: 17
- Size: 324.21 KB
So,
(3) By
If only one enhancement can be implemented:
So, we must select enhancements 1 and 3 to maximize performance.
1.2 Suppose there is a graphics operation that accounts for 10% of execution time in an application, and by adding special hardware we can speed it up by a factor of 18. Furthermore, we could use twice as much hardware and make the graphics operation run 36 times faster. Explain whether it is worth exploring this further architectural change.
So, it is not worth exploring such a further architectural change.
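The conclusion for exercise 1.2 can be checked with Amdahl's law. The following is a minimal sketch; the function name `amdahl_speedup` is illustrative and not taken from the original solution.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the execution time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

base = amdahl_speedup(0.10, 18)     # special graphics hardware
doubled = amdahl_speedup(0.10, 36)  # twice as much hardware
gain = doubled / base - 1.0         # relative improvement from doubling the hardware

print(f"18x hardware: {base:.4f}")
print(f"36x hardware: {doubled:.4f}")
print(f"extra gain:   {gain:.2%}")
```

Doubling the hardware raises the overall speedup from about 1.104 to about 1.108, roughly 0.3%, which is why the further change is hard to justify.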
1.3 In many practical applications that demand a real-time response, the computational workload W is often fixed. As the number of processors increases in a parallel computer, the fixed workload is distributed to more processors for parallel execution. Assume 20 percent of W must be executed sequentially, and 80 percent can be executed by 4 nodes simultaneously. What is the fixed-load speedup?
So, the fixed-load speedup is 2.5.
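The value 2.5 follows from dividing the single-processor time by the time on four nodes. A sketch of the arithmetic, normalizing the single-processor time to W and assuming the parallel part divides evenly across the 4 nodes with no overhead:

```latex
S = \frac{W}{0.2\,W + \dfrac{0.8\,W}{4}} = \frac{1}{0.2 + 0.2} = 2.5
```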
2.1 There is a model machine with nine instructions, whose frequencies are ADD (0.3), SUB (0.24), JOM (0.06), STO (0.07), JMP (0.07), SHR (0.02), CIL (0.03), CLA (0.2), and STP (0.01). The machine has several GPRs. Memory is byte addressable, with accessed addresses aligned, and the memory word width is 16 bits.
Suppose the nine instructions have the following characteristics:
- Two-operand instructions
- Two instruction lengths
- Extended coding
- Shorter instruction operand format: R (register) - R (register)
- Longer instruction operand format: R (register) - M (memory)
- Displacement memory addressing mode
A. Encode the nine instructions with Huffman coding, and give the average code length.
B. Design practical instruction codes, and give the average code length.
C. Write the two instruction word formats in detail.
D. What is the maximum offset for accessing a memory address?
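(A) Huffman coding by Huffman tree
Instruction   Frequency   Code
ADD           30%         01
SUB           24%         11
CLA           20%         10
JOM           6%          0001
STO           7%          0011
JMP           7%          0010
SHR           2%          000001
CIL           3%          00001
STP           1%          000000
So, the average code length is 2 × (0.30 + 0.24 + 0.20) + 4 × (0.06 + 0.07 + 0.07) + 5 × 0.03 + 6 × (0.02 + 0.01) = 2.61 bits.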
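The Huffman code lengths can be reproduced programmatically. The sketch below builds the tree with a priority queue and tracks only code lengths; frequencies are kept as integer percentages to avoid floating-point ties, and `huffman_lengths` is an illustrative helper name.

```python
import heapq

# Instruction frequencies in percent, as given in exercise 2.1.
FREQ = {"ADD": 30, "SUB": 24, "CLA": 20, "STO": 7, "JMP": 7,
        "JOM": 6, "CIL": 3, "SHR": 2, "STP": 1}

def huffman_lengths(freq):
    """Return the Huffman code length (in bits) of each symbol."""
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        for s in a + b:              # each merge adds one bit to these codes
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, a + b))
        tie += 1
    return lengths

lengths = huffman_lengths(FREQ)
average = sum(FREQ[s] * lengths[s] for s in FREQ) / 100
print(lengths)
print(f"average code length = {average:.2f} bits")
```

The exact bit patterns depend on how ties and left/right branches are labeled, but the average length is the same for every Huffman tree built from these frequencies.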
(B) Extended coding with two instruction lengths
Instruction   Frequency   Code
ADD           30%         00
SUB           24%         01
CLA           20%         10
JOM           6%          11000
STO           7%          11001
JMP           7%          11010
SHR           2%          11011
CIL           3%          11100
STP           1%          11101
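For the extended coding to be decodable, no 2-bit opcode may be a prefix of a 5-bit opcode, so a short opcode of 11 would collide with the 11xxx group. The sketch below checks this property and computes the average length. The assignment of 00, 01 and 10 to ADD, SUB and CLA is an assumption made for illustration; only the code lengths affect the average.

```python
# Frequencies in percent from exercise 2.1.
FREQ = {"ADD": 30, "SUB": 24, "CLA": 20, "JOM": 6, "STO": 7,
        "JMP": 7, "SHR": 2, "CIL": 3, "STP": 1}

# Assumed opcode assignment: prefix 11 is reserved for the 5-bit opcodes.
CODES = {"ADD": "00", "SUB": "01", "CLA": "10",
         "JOM": "11000", "STO": "11001", "JMP": "11010",
         "SHR": "11011", "CIL": "11100", "STP": "11101"}

def is_prefix_free(codes):
    """True when no code word is a prefix of a different code word."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

average = sum(FREQ[s] * len(CODES[s]) for s in FREQ) / 100
print(is_prefix_free(CODES))
print(f"average code length = {average:.2f} bits")
```

With three 2-bit opcodes covering 74% of executed instructions and six 5-bit opcodes covering the remaining 26%, the average is 2 × 0.74 + 5 × 0.26 = 2.78 bits, against 2.61 bits for the Huffman code.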
(C) Shorter instruction format (8 bits):
Opcode   Register   Register
2 bits   3 bits     3 bits
Longer instruction format (16 bits):
Opcode   Register   Register   Offset
5 bits   3 bits     3 bits     5 bits
(D) The maximum offset for accessing a memory address is 32 bytes.
3.1 Identify all of the data dependences in the following code. Which dependences are data hazards that will be resolved via forwarding?
ADD R2, R5, R4
ADD R4, R2, R5
SW R5, 100(R2)
ADD R3, R2, R4
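One way to approach exercise 3.1 is to list every true (read-after-write) dependence and the distance between producer and consumer. The sketch below does this for the four instructions. It assumes the classic five-stage pipeline in which the register file is written in the first half of WB and read in the second half of ID, so only dependences at distance 1 or 2 need forwarding; the tuple encoding of the program is illustrative.

```python
# (text, destination register or None, source registers) for each instruction.
PROGRAM = [
    ("ADD R2, R5, R4", "R2", ["R5", "R4"]),
    ("ADD R4, R2, R5", "R4", ["R2", "R5"]),
    ("SW R5, 100(R2)", None, ["R5", "R2"]),   # a store writes memory, not a register
    ("ADD R3, R2, R4", "R3", ["R2", "R4"]),
]

def raw_dependences(program):
    """Return (producer index, consumer index, register) for every RAW dependence."""
    deps = []
    last_writer = {}
    for i, (_, dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps

deps = raw_dependences(PROGRAM)
for producer, consumer, reg in deps:
    distance = consumer - producer
    # Assumption: distance 3 or more is covered by the split-cycle register file.
    how = "forwarding" if distance <= 2 else "register file"
    print(f"{reg}: instruction {producer + 1} -> instruction {consumer + 1} ({how})")
```

Under these assumptions the dependences on R2 from the first instruction to the second and third, and on R4 from the second to the fourth, are hazards resolved by forwarding, while the dependence on R2 from the first to the fourth is satisfied through the register file.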
3.2 How could we modify the following code to make use of a delayed branch slot?
Loop:
LW R2, 100(R3)
ADDI R3, R3, #4
BEQ R3, R4, Loop

LW R2, 100(R3)
Loop:
ADDI R3, R3, #4
BEQ R3, R4, Loop
Delayed branch slot → LW R2, 100(R3)
3.3 Consider the following reservation table for a four-stage pipeline with a clock cycle t = 20 ns.
A. What are the forbidden latencies and the initial collision vector?
B. Draw the state transition diagram for scheduling the pipeline.
C. Determine the MAL associated with the shortest greedy cycle.
D. Determine the pipeline maximum throughput corresponding to the MAL and the given t.
[Reservation table: stages s1 to s4 (rows) against clock cycles 1 to 6 (columns); the × marks are missing from this copy.]
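A. The forbidden latencies are F = {1, 2, 5}.
The initial collision vector is C = (10011).
B. The state transition diagram has a single state, 10011: latencies 3, 4, and 6 or more all lead back to it.
C. MAL (minimal average latency) = 3 clock cycles.
D. The maximum pipeline throughput is Hk = 1/(3 × 20 ns), about 16.7 million tasks per second.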
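The steps from forbidden latencies to collision vector to greedy cycle can be sketched in code. The helper names below are illustrative. The sketch follows only the greedy cycle that starts at the initial state, which happens to give the MAL for this table; in general, every greedy cycle in the state diagram should be compared.

```python
def initial_collision_vector(forbidden):
    """Bit (i - 1) is set when latency i is forbidden."""
    vector = 0
    for latency in forbidden:
        vector |= 1 << (latency - 1)
    return vector

def greedy_cycle(forbidden):
    """Take the smallest permissible latency from the initial state until a state repeats."""
    m = max(forbidden)
    init = initial_collision_vector(forbidden)
    state, seen, latencies = init, {}, []
    while state not in seen:
        seen[state] = len(latencies)
        # Smallest latency whose bit is clear; m + 1 is always permissible.
        latency = next(k for k in range(1, m + 2) if not (state >> (k - 1)) & 1)
        latencies.append(latency)
        state = (state >> latency) | init
    return latencies[seen[state]:]

vector = initial_collision_vector({1, 2, 5})
cycle = greedy_cycle({1, 2, 5})
mal = sum(cycle) / len(cycle)
throughput = 1 / (mal * 20e-9)      # tasks per second with a 20 ns clock

print(f"C = {vector:05b}, greedy cycle = {cycle}, MAL = {mal}")
print(f"throughput = {throughput / 1e6:.2f} million tasks per second")
```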
3.4 Use the following code fragment:
Loop: LW R1, 0(R2)      ; load R1 from address 0+R2
      ADDI R1, R1, #1   ; R1 = R1 + 1
      SW 0(R2), R1      ; store R1 at address 0+R2
      ADDI R2, R2, #4   ; R2 = R2 + 4
      SUB R4, R3, R2    ; R4 = R3 - R2
      BNEZ R4, Loop     ; branch to Loop if R4 != 0
Assume that the initial value of R3 is R2 + 396.
Throughout this exercise, use the classic RISC five-stage integer pipeline and assume all memory accesses take 1 clock cycle.
A. Show the timing of this instruction sequence for the RISC pipeline without any forwarding or bypassing hardware, but assuming that a register read and a write in the same clock cycle "forward" through the register file. Assume that the branch is handled by flushing the pipeline. If all memory references take 1 cycle, how many cycles does this loop take to execute?
B. Show the timing of this instruction sequence for the RISC pipeline with normal forwarding and bypassing hardware. Assume that the branch is handled by predicting it as not taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?
C. Assume the RISC pipeline with a single-cycle delayed branch and normal forwarding and bypassing hardware. Schedule the instructions in the loop, including the branch delay slot. You may reorder instructions and modify the individual instruction operands, but do not undertake other loop transformations that change the number or opcode of the instructions in the loop. Show a pipeline timing diagram and compute the number of cycles needed to execute the entire loop.
A.
- The loop iterates 396/4 = 99 times.
- Go through one complete iteration of the loop and the first instruction in the next iteration.
Total length = the length of iterations 0 through 97 (the first 98 iterations should be of the same length) + the length of the last iteration.
We have assumed the version of DLX described in Figure 3.21 (page 97) in the book, which resolves branches in MEM.
From this figure, the second iteration begins 17 clocks after the first iteration, and the last iteration takes 18 cycles to complete.
Total length = 17 × 98 + 18 = 1684 clock cycles
B.
- From this figure, the second iteration begins 10 clocks after the first iteration, and the last iteration takes 11 cycles to complete.
Total length = 10 × 98 + 11 = 991 clock cycles
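C. Reorder the instructions to:
Loop: LW R1, 0(R2)
      ADDI R2, R2, #4   ; fills the load-delay stall
      SUB R4, R3, R2
      ADDI R1, R1, #1   ; fills the stall before the branch
      BNEZ R4, Loop
      SW -4(R2), R1     ; branch delay slot; stores R1 at the original address 0+R2
From the figure, the second iteration begins 6 clocks after the first iteration, and the last iteration takes 10 cycles to complete.
Total length = 6 × 98 + 10 = 598 clock cycles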
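All three totals in exercise 3.4 use the same pattern: the first 98 iterations each start a fixed number of cycles apart, and the last iteration runs to completion. A small sketch of that arithmetic, with the per-iteration figures taken from the answers above and `total_cycles` as an illustrative helper name:

```python
def total_cycles(iterations, steady_state, last_iteration):
    """Cycles for a loop whose iterations start `steady_state` cycles apart,
    with the final iteration taking `last_iteration` cycles to drain."""
    return steady_state * (iterations - 1) + last_iteration

ITERATIONS = 396 // 4   # R2 advances by 4 per iteration until it equals R3

results = {
    "A (no forwarding, flush)": total_cycles(ITERATIONS, 17, 18),
    "B (forwarding, predict not taken)": total_cycles(ITERATIONS, 10, 11),
    "C (delayed branch, rescheduled)": total_cycles(ITERATIONS, 6, 10),
}
for case, cycles in results.items():
    print(f"{case}: {cycles} clock cycles")
```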
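3.5 Consider the following reservation table for a four-stage pipeline.
A. What are the forbidden latencies and the initial collision vector?
B. Draw the state transition diagram for scheduling the pipeline.
C. Determine the MAL associated with the shortest greedy cycle.
D. Determine the pipeline maximum throughput corresponding to the MAL.
E. According to the shortest greedy cycle, put six tasks into the pipeline and determine the actual pipeline throughput.
[Reservation table: four stages (rows) against clock cycles 1 to 7 (columns); the √ marks are missing from this copy.]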
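A. The forbidden latencies are {2, 4, 6}.
The initial collision vector is C = (101010).
B. The state transition diagram has four states:
101010 (initial): latency 1 → 111111, 3 → 101111, 5 → 101011, 7 or more → 101010
111111: latency 7 or more → 101010
101111: latency 5 → 101011, 7 or more → 101010
101011: latency 3 → 101111, 5 → 101011, 7 or more → 101010
C. The MAL associated with the shortest greedy cycle is 4 cycles.
Scheduling   Average latency
(1,7)        4
(3,5)        4
(5,3)        4
(5)          5
(3,7)        5
(5,7)        6
(7)          7
D. The maximum pipeline throughput corresponding to the MAL:
Hk = 1/(4 clock cycles)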
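The state transition diagram for exercise 3.5 can be derived mechanically from the forbidden latencies. The sketch below shifts each state right by a permissible latency and ORs in the initial collision vector; `state_diagram` is an illustrative name, and latency 7 stands for "7 or more".

```python
def state_diagram(forbidden):
    """Map each reachable collision vector to its {latency: next state} transitions."""
    m = max(forbidden)
    init = sum(1 << (k - 1) for k in forbidden)
    diagram, pending = {}, [init]
    while pending:
        state = pending.pop()
        if state in diagram:
            continue
        diagram[state] = {}
        for latency in range(1, m + 2):      # m + 1 stands for "m + 1 or more"
            if (state >> (latency - 1)) & 1:
                continue                     # this latency would cause a collision
            nxt = (state >> latency) | init
            diagram[state][latency] = nxt
            pending.append(nxt)
    return diagram

diagram = state_diagram({2, 4, 6})
for state, moves in diagram.items():
    arcs = ", ".join(f"{k} -> {nxt:06b}" for k, nxt in moves.items())
    print(f"{state:06b}: {arcs}")
```

Reading cycles off this diagram gives the schedules listed in the answer, for example (1,7) through states 101010 and 111111, and (3,5) through 101111 and 101011, both with an average latency of 4.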
E. According to the shortest greedy cycle, put six tasks into the pipeline.
The best scheduling is the greedy cycle (1,7), because:
According to (1,7) scheduling:
actual throughput Hk = 6/(1+7+1+7+1+7) = 6/(24 cycles)
According to (3,5) scheduling:
actual throughput Hk = 6/(3+5+3+5+3+7) = 6/(26 cycles)
According to (5,3) scheduling:
actual throughput Hk = 6/(5+3+5+3+5+
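
The sums in part E can be reproduced with a short helper. This sketch assumes that the trailing 7 in each sum is the seven-cycle traversal time of the last task, and that the five latencies before it are the gaps between the six initiations; `total_time` is an illustrative name.

```python
from itertools import cycle, islice

def total_time(schedule, tasks, traversal):
    """Cycles to finish `tasks` initiations that follow the latency cycle `schedule`;
    `traversal` is the time one task spends in the pipeline."""
    gaps = islice(cycle(schedule), tasks - 1)   # latencies between successive tasks
    return sum(gaps) + traversal

for schedule in [(1, 7), (3, 5)]:
    cycles = total_time(schedule, tasks=6, traversal=7)
    print(f"{schedule}: 6 tasks in {cycles} cycles")
```

The same function can be applied to any other latency cycle from the state diagram.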