Introduction to Attention Mechanism
Bo Wu, Apr. 28, 2018

Human visual attention
A particularly well-studied aspect is visual attention: many animals focus on specific parts of their visual inputs to compute adequate responses. Neural processes involving attention have been studied extensively in neuroscience and computational neuroscience [1, 2]. A similar idea, focusing on specific parts of the input, has been applied in deep learning to speech recognition, translation, image captioning, and question answering.

Attention in Deep Learning

Encoder-Decoder
The encoder encodes everything we need to know about the source sentence: it generates a single vector that fully captures the meaning of the source sentence. The decoder then generates a translation based solely on that vector.

Neural Machine Translation
Take a recurrent neural network (RNN), usually an LSTM, and encode a sentence written in language A (English). The RNN produces a hidden state, which we refer to as S; this hidden state hopefully represents all the content of the sentence. S is then supplied to the decoder, which generates the sentence in language B (German) word by word.
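A minimal sketch of this pipeline in PyTorch (the deck names no implementation; the `Seq2Seq` class and the layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder where the encoder's final hidden state S is the
    single fixed-length vector passed to the decoder."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the language-A sentence; `state` is the hidden state S.
        _, state = self.encoder(self.src_emb(src))
        # Decode the language-B sentence word by word, conditioned only on S.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # per-step logits over the target vocabulary
```

Note that `state` (the LSTM's hidden/cell pair) is the sole channel between the two networks; this is exactly the bottleneck the next slides point out.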
Image caption
(Figure: the same encoder-decoder pipeline applied to image captioning.)

Limitation
The limitation is that the only link between encoding and decoding is the fixed-length semantic vector C. This vector cannot fully represent the information of the entire sequence, and the longer the input sequence, the more serious this problem becomes.

Attention
Instead, we can extract a context vector that is a weighted summation of the encoder outputs, weighted by how relevant we think each of them is. The ingredients are the Value, the Key, and the Query.
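A sketch of this computation with dot-product scoring (the function name and shapes are illustrative; the deck only describes the weighted sum): the decoder state plays the Query, and the encoder outputs play both the Keys and the Values.

```python
import torch
import torch.nn.functional as F

def attention_context(query, enc_outputs):
    """Context vector as a weighted summation of the encoder outputs.

    query:       (hidden,)         current decoder state (Query)
    enc_outputs: (src_len, hidden) encoder outputs (Keys and Values)
    """
    scores = enc_outputs @ query        # relevance score per source position
    weights = F.softmax(scores, dim=0)  # normalized attention distribution
    return weights @ enc_outputs        # (hidden,) context vector
```

Because the weights are recomputed at every decoding step, the decoder can look at a different part of the source for each output word instead of relying on one fixed vector C.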
Attention is All You Need
This paper proposes a new, simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. It introduces a multi-head attention mechanism. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in Neural Information Processing Systems, 2017: 6000-6010.

Multi-head attention
Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions. For large values of d_k, the dot products grow large in magnitude, pushing the softmax function into regions where it has extremely small gradients. To counteract this effect, the dot products are scaled by 1/sqrt(d_k).

Self Attention (intra-attention)

Conclusion
The essence of the attention mechanism is to pick out, from the source, the information that contributes most to the target. Self-attention can be seen as a special case of general attention: in self-attention Q = K = V, and every unit in the sequence attends over all the others.
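A sketch that puts the last two points together: scaled dot-product attention applied as self-attention, with Q, K, and V all derived from the same sequence (a single head for brevity; the random projection matrices are illustrative stand-ins for learned ones):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the 1/sqrt(d_k) scaling keeps the
    softmax out of its small-gradient regions when d_k is large."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    return F.softmax(scores, dim=-1) @ V

# Self-attention: Queries, Keys, and Values all come from one sequence x.
x = torch.randn(5, 64)                                   # 5 positions, d_model = 64
W_q, W_k, W_v = (torch.randn(64, 64) for _ in range(3))  # stand-in projections
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
```

Multi-head attention would run several copies of this in parallel on separately projected Q, K, and V and concatenate the results, which is what lets the model attend to different representation subspaces at once.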
References
Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 2014: 3104-3112.
Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 3156-3164.
Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in Neural Information Processing Systems, 2017: 6000-6010.