Document ID: 17754432
Uploaded: 2023-08-03
Format: DOCX
Pages: 43
Size: 2.36 MB


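Machine Learning Final Exam

University of Chinese Academy of Sciences, examination paper

Course number: 712008Z

Course name: Machine Learning

Instructor: 卿来云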

———————————————————————————————————————————————

Name:            Student ID:            Score:

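I. Basic questions (36 points total)

1. Describe the difference between maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation. Explain why MLE is more prone to overfitting than MAP. (10 points)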
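As a small illustration of the contrast asked about in question 1, the sketch below estimates a coin's heads probability by MLE and by MAP. The coin-flip setting, the Beta(2, 2) prior and the function names are assumptions made for this example, not part of the exam. With only three tosses the MLE moves to an extreme value, while the prior pulls the MAP estimate back toward 0.5; with many tosses the two nearly agree.

```python
def bernoulli_mle(heads, tosses):
    """Maximum likelihood estimate of the heads probability."""
    return heads / tosses


def bernoulli_map(heads, tosses, a=2.0, b=2.0):
    """MAP estimate under an assumed Beta(a, b) prior with a, b > 1.

    The posterior is Beta(a + heads, b + tails); its mode is returned.
    """
    return (heads + a - 1) / (tosses + a + b - 2)


# Three heads in three tosses: the MLE claims the coin never lands tails.
small_mle = bernoulli_mle(3, 3)        # 1.0
small_map = bernoulli_map(3, 3)        # 0.8

# With 1000 tosses the prior has almost no influence.
large_mle = bernoulli_mle(600, 1000)
large_map = bernoulli_map(600, 1000)
```

The prior acts like a handful of extra pseudo-observations, which is one way to read MAP as a regularized version of MLE.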

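[Table for question 3: binary attributes A, B, C and binary label y. The extraction kept the column headers A, B, C, y and the cell values 0 1 0 1 0 1 0 1 0 1 0 1, but not the row layout.]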

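2. Before the winners of the annual Hundred Flowers Awards were announced, a professor asked 80 film students to predict who would win each of 8 awards (best director, best actor, best actress, and so on). After the results were announced, the professor computed each student's hit rate and also the outcome of a vote over all 80 students. He found that the collective vote was more accurate than the predictions of almost every individual student. Is this improvement a coincidence? Explain why. (10 points)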
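One way to reason about question 2 is a Condorcet-style calculation. The sketch below assumes a simplified setting that the exam does not state: each award is a two-way choice, every student is independently correct with the same probability, and a tied vote is settled by a coin flip. The function name is hypothetical.

```python
from math import comb


def majority_accuracy(n_voters, p_correct):
    """Probability that a simple majority of independent voters is right.

    Ties (possible when n_voters is even) count as a coin flip.
    """
    total = 0.0
    for k in range(n_voters + 1):
        prob_k = comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        if 2 * k > n_voters:
            total += prob_k
        elif 2 * k == n_voters:
            total += 0.5 * prob_k
    return total


single = majority_accuracy(1, 0.6)    # one student alone
crowd = majority_accuracy(80, 0.6)    # vote of 80 such students
```

Under these assumptions the vote is far more reliable than any single voter whenever the individual accuracy exceeds one half, and no better than a coin when it equals one half. Real students are neither independent nor equally accurate, so the numbers are only indicative.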

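3. Consider the data set in the accompanying table, where A, B and C are binary random variables and y is the binary variable to be predicted.

(a) For a new input A=0, B=0, C=1, what will a naive Bayes classifier predict for y? (10 points)

(b) Suppose you know that A, B and C are independent random variables given the class. Would other classifiers (such as logistic regression or an SVM classifier) perform better than the naive Bayes classifier? Why? (Note: this part is unrelated to the data set given above.) (6 points)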
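The mechanics behind part (a) can be sketched as follows. Because the exam's table did not survive extraction, the rows below are hypothetical toy data, so the printed prediction is not the answer to the exam question. The function name is also an assumption, and no smoothing is applied.

```python
from collections import Counter


def naive_bayes_posterior(rows, query):
    """Return {label: P(label | query)} for categorical features.

    rows is a list of (feature_tuple, label) pairs. Probabilities are plain
    relative frequencies, so an unseen feature value zeroes out a class.
    """
    label_counts = Counter(label for _, label in rows)
    scores = {}
    for label, n_label in label_counts.items():
        score = n_label / len(rows)  # class prior P(y)
        for j, value in enumerate(query):
            matches = sum(
                1 for feats, lab in rows if lab == label and feats[j] == value
            )
            score *= matches / n_label  # P(x_j = value | y)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


# Hypothetical rows of the form ((A, B, C), y); not the exam's table.
toy_rows = [
    ((0, 0, 1), 0), ((0, 1, 0), 0), ((1, 1, 0), 0),
    ((0, 0, 1), 1), ((1, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 0), 1),
]
posterior = naive_bayes_posterior(toy_rows, (0, 0, 1))
prediction = max(posterior, key=posterior.get)
```

On this toy data the class scores are 2/63 for y=0 and 3/56 for y=1, so the sketch predicts y=1 with posterior 27/43.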

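II. Regression (24 points total)

We are given a data set of N training samples [formula missing in the source], where [symbols missing in the source] are real numbers.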

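1. We first fit the data with linear regression. To test the linear regression model, we randomly select some samples as training samples and use the remaining samples as test samples. We now gradually increase the number of training samples. How do the average training error and the average test error change as the number of training samples grows? Why? (6 points)

Average training error: A. increases   B. decreases

Average test error: A. increases   B. decreases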
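The behaviour asked about in question 1 can be explored with a small simulation. This is a sketch under assumptions the exam does not specify: a true line y = 1 + 2x, Gaussian noise with standard deviation 0.5, inputs uniform on [0, 1], and a fresh test set per trial. All names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = w0 + w1 * x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    return my - w1 * mx, w1


def mse(w, xs, ys):
    """Mean squared error of the line w = (w0, w1) on the given points."""
    w0, w1 = w
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sample(n, rng):
    """Draw n points from the assumed noisy linear model."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


def average_errors(n_train, trials=500, n_test=200, seed=0):
    """Average training and test MSE over repeated random draws."""
    rng = random.Random(seed)
    train_total = test_total = 0.0
    for _ in range(trials):
        xtr, ytr = sample(n_train, rng)
        xte, yte = sample(n_test, rng)
        w = fit_line(xtr, ytr)
        train_total += mse(w, xtr, ytr)
        test_total += mse(w, xte, yte)
    return train_total / trials, test_total / trials
```

Calling `average_errors(3)` and `average_errors(50)` and comparing the two pairs shows the trend: a handful of points can be fitted almost exactly but generalize poorly, while a larger training set is harder to fit and generalizes better.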

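2. Consider the data shown in figure (a) below. At first glance these data are not well described by a linear regression model, so we use the following model instead: [model formula missing in the source], where [noise assumption missing in the source]. Suppose we estimate w by maximum likelihood. Write down the log-likelihood function and give the estimate of w. (8 points)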

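3. Consider the data shown in figure (b) below. The figure shows that this data set contains some noise. Design a linear regression model that is robust to the noise, and briefly explain why the model is robust. (10 points)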
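One possible direction for question 3, offered as a sketch rather than as the intended answer, is to replace the squared loss by the absolute loss, which corresponds to assuming Laplace rather than Gaussian noise. For the one-parameter model y = w x the least-absolute-deviations solution is a weighted median of the ratios y_i / x_i with weights |x_i|. The data and function names below are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope for the model y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def lad_slope(xs, ys):
    """Least-absolute-deviations slope for y = w * x.

    sum |y_i - w x_i| = sum |x_i| * |y_i / x_i - w|, which is minimized by
    the weighted median of the ratios y_i / x_i with weights |x_i|.
    """
    pairs = sorted((y / x, abs(x)) for x, y in zip(xs, ys) if x != 0)
    half = sum(weight for _, weight in pairs) / 2
    acc = 0.0
    for ratio, weight in pairs:
        acc += weight
        if acc >= half:
            return ratio


# Four points on the line y = 2x plus one gross outlier.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 100.0]
w_ols = ols_slope(xs, ys)   # dragged far above 2 by the outlier
w_lad = lad_slope(xs, ys)   # stays at 2.0
```

The squared loss lets a single large residual dominate the objective, whereas the absolute loss grows only linearly, so the outlier cannot pull the fit as far.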

Figure (a)      Figure (b)      [images not preserved in this copy]

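III. SVM classification (questions 1 to 5: 4 points each; question 6: 5 points; 25 points total)

The figures below show SVM decision boundaries obtained with different kernel functions or different slack parameters. Unfortunately, the careless experimenter forgot to record which model and parameters produced each figure. Match each of the models below to the correct figure.

1. [model formula missing in the source], where [parameter setting missing in the source].

2. [model formula missing in the source], where [parameter setting missing in the source].

3. [model formula missing in the source], where [parameter setting missing in the source].

4. [model formula missing in the source], where [parameter setting missing in the source].

5. [model formula missing in the source], where [parameter setting missing in the source].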

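6. Consider a linear SVM classifier with slack variables: [optimization problem missing in the source].

Below are some statements about how certain quantities change as the parameter C increases. Mark a statement "Yes" if it always holds, "No" if it never holds, and "Depends" if its correctness depends on the particular way C increases.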


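(1) [quantity missing in the source] does not increase.

(2) [quantity missing in the source] increases.

(3) [quantity missing in the source] does not decrease.

(4) More training samples will be misclassified.

(5) The margin does not increase.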

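IV. A friend who is new to machine learning is trying to predict house prices. On a data set of N = 1000 house prices he fitted a model with 533 parameters, and the model explains 99% of the variation in the data set.

1. Will this model predict next year's house prices well? Briefly explain why. (5 points)

2. If the model above cannot predict new house prices well, design a suitable model, give the estimate of its parameters, and explain why your model is reasonable. (10 points)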


Machine Learning Question Bank

I. Maximum likelihood

1. ML estimation of exponential model (10)

A Gaussian distribution is often used to model data on the real line, but is sometimes inappropriate when the data are often close to zero but constrained to be nonnegative. In such cases one can fit an exponential distribution, whose probability density function is given by

[formula missing in the source]

Given N observations x_i drawn from such a distribution:

(a) Write down the likelihood as a function of the scale parameter b.

(b) Write down the derivative of the log likelihood.

(c) Give a simple expression for the ML estimate for b.
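The three parts can be checked numerically with the sketch below. Since the density formula did not survive in this copy, the code assumes the usual scale parameterization p(x | b) = (1/b) exp(-x/b) for x >= 0; if the original used a different form, the expressions change accordingly. The function names and sample values are hypothetical.

```python
import math


def log_likelihood(b, xs):
    """log of prod_i (1/b) * exp(-x_i / b) under the assumed density."""
    return -len(xs) * math.log(b) - sum(xs) / b


def d_log_likelihood(b, xs):
    """Derivative of the log likelihood with respect to b."""
    return -len(xs) / b + sum(xs) / b**2


def mle_scale(xs):
    """Setting the derivative to zero gives the sample mean."""
    return sum(xs) / len(xs)


observations = [0.5, 1.5, 2.0, 4.0]
b_hat = mle_scale(observations)   # 2.0
```

At `b_hat` the derivative vanishes and the log likelihood is higher than at nearby values of b, which is a quick sanity check on the closed form.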

2. Repeat the exercise for the Poisson distribution: [formula missing in the source]

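II. Bayes

1. Applying Bayes' rule

Suppose that on a multiple-choice exam question a student knows the correct answer with probability p and guesses with probability 1-p. A student who knows the answer gets the question right with probability 1, and a student who guesses picks the correct answer with probability 1/m, where m is the number of choices. Given that the student answered the question correctly, find the probability that he knew the correct answer.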
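Applying Bayes' rule to this question gives P(knows | correct) = p / (p + (1 - p)/m), assuming a guess is correct with probability 1/m. A minimal sketch of the computation follows; the function name is hypothetical.

```python
def prob_knows_given_correct(p, m):
    """P(student knew the answer | student answered correctly).

    p: prior probability that the student knows the answer.
    m: number of choices; a guess is assumed correct with probability 1/m.
    """
    p_correct = p * 1.0 + (1 - p) * (1.0 / m)  # law of total probability
    return p / p_correct


example = prob_knows_given_correct(0.5, 4)   # 0.5 / 0.625 = 0.8
```

With more choices a lucky guess becomes less likely, so a correct answer is stronger evidence that the student actually knew it.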

2. Conjugate priors

Given a likelihood [formula missing] for a class of models with parameters θ, a conjugate prior is a distribution [formula missing] with hyperparameters γ, such that the posterior distribution [formula missing] belongs to the same family of distributions as the prior.

(a) Suppose that the likelihood is given by the exponential distribution with rate parameter λ: [formula missing]. Show that the gamma distribution [formula missing] is a conjugate prior for the exponential. Derive the parameter update given observations [formula missing] and the prediction distribution [formula missing].

(b) Show that the beta distribution is a conjugate prior for the geometric distribution [formula missing], which describes the number of times a coin is tossed until the first heads appears, when the probability of heads on each toss is θ. Derive the parameter update rule and prediction distribution.

(c) Suppose [formula missing] is a conjugate prior for the likelihood [formula missing]; show that the mixture prior [formula missing] is also conjugate for the same likelihood, assuming the mixture weights w_m sum to 1.

(d) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood.

Some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and the geometric distributions, and the gamma is conjugate for the exponential and for the gamma with fixed α.

(e) (Extra credit, 20) Explore the case where the likelihood is a mixture with fixed components and unknown weights; i.e., the weights are the parameters to be learned.
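The parameter updates asked for in parts (a) and (b) can be sketched in code. The parameterizations are assumptions, because the formulas are missing from this copy: the gamma prior is taken in shape/rate form, and the geometric variable counts tosses up to and including the first heads, so each observation is at least 1. Function names are hypothetical.

```python
def gamma_exponential_update(alpha, beta, xs):
    """Posterior hyperparameters for an exponential likelihood.

    Prior: rate ~ Gamma(alpha, beta) in shape/rate form (assumed).
    Likelihood: x_i ~ Exponential(rate). Each observation adds 1 to the
    shape and its value to the rate.
    """
    return alpha + len(xs), beta + sum(xs)


def beta_geometric_update(a, b, ks):
    """Posterior hyperparameters for a geometric likelihood.

    Prior: theta ~ Beta(a, b). Each k in ks is the toss on which the first
    heads appeared (k >= 1): one heads and k - 1 tails.
    """
    return a + len(ks), b + sum(k - 1 for k in ks)


gamma_post = gamma_exponential_update(2.0, 1.0, [0.5, 1.5])   # (4.0, 3.0)
beta_post = beta_geometric_update(1, 1, [1, 3, 2])            # (4, 4)
```

In both cases the posterior stays in the prior's family and the data enter only through simple sufficient statistics, which is what makes the prior conjugate.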

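III. True/False questions

(1) Given n data points, if half of them are used for training and the other half for testing, the difference between training error and test error decreases as n increases.

(2) The maximum likelihood estimate is unbiased and has the smallest variance among all unbiased estimators, so the maximum likelihood estimate has the smallest risk.

(3) Given regression functions A and B, if A is simpler than B, then A will almost certainly perform better than B on the test set.

(4) Global linear regression uses all sample points to predict the output for a new input, whereas local linear regression uses only the samples near the query point. Therefore global linear regression is computationally more expensive than local linear regression.

(5) Boosting and bagging are both methods that combine multiple classifiers by voting, and both assign each classifier a weight according to its accuracy.

(6) In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F)

While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases since the example weights become concentrated at the most difficult examples.

(7) One advantage of boosting is that it does not overfit. (F)

(8) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. (F)

(9) In regression analysis, best-subset selection can perform feature selection but is computationally expensive when the number of features is large; ridge regression and the lasso are computationally cheap, and the lasso can also perform feature selection.

(10) Overfitting is more likely when there is little training data.

(11) Gradient descent can sometimes get stuck in a local minimum, but the EM algorithm cannot.

(12) In kernel regression, the parameter that most affects the balance between overfitting and underfitting is the width of the kernel.

(13) In the AdaBoost algorithm, the weights on all the misclassified points will go up by the same multiplicative factor. (T)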
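Statement (13) can be checked against the standard discrete AdaBoost weight update. The sketch below assumes the common formulation with classifier weight alpha = 0.5 * ln((1 - err) / err) and example weights that sum to 1; the function name is hypothetical.

```python
import math


def adaboost_reweight(weights, correct):
    """One round of the AdaBoost example-weight update.

    weights: current example weights, assumed to sum to 1.
    correct: booleans, True where the weak classifier was right.
    Returns (alpha, new normalized weights).
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    z = sum(raw)  # normalizer
    return alpha, [r / z for r in raw]


old = [0.1, 0.2, 0.3, 0.4]
alpha, new = adaboost_reweight(old, [False, True, False, True])
factors = [n / o for n, o in zip(new, old)]
```

Every misclassified example is multiplied by exp(alpha) / z and every correctly classified one by exp(-alpha) / z, so `factors` contains only two distinct values. After the update the misclassified examples carry total weight 0.5.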

(14) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution ŵ on the training data. (T)

(15) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution ŵ on unseen test data. (F)
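The effect of an L2 penalty on the training error can be seen in the simplest possible case. The sketch below assumes a one-parameter model y = w x without intercept, for which the ridge solution has the closed form w = sum(x y) / (sum(x^2) + lambda). The data and names are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Minimizer of sum((y - w*x)^2) + lam * w^2 for the model y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def training_sse(w, xs, ys):
    """Sum of squared residuals on the training data."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 7.0]
lams = [0.0, 0.1, 1.0, 10.0]
errors = [training_sse(ridge_slope(xs, ys, lam), xs, ys) for lam in lams]
```

The unpenalized solution minimizes the training error by definition, and each increase in lambda moves w further from it, so `errors` never decreases along the list. Whether the test error improves depends on the data, which this sketch does not examine.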

(16) Besides the EM algorithm, gradient descent can also be used to estimate the parameters of a Gaussian mixture model. (T)

(20) Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel.

True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with kernel of degree less than or equal to two.

(21) AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined.

False! If the data is not separable by a linear combination of the weak classifiers, AdaBoost can't achieve zero training error.

(22) The L2 penalty in a ridge regression is equivalent to a Laplace prior on the weights. (F)

(23) The log-likelihood of the data will always increase through successive iterations of the expectation maximization algorithm. (F)

(24) In training a logistic regression model by maximizing the likelihood of the labels given the inputs we have multiple locally optimal solutions. (F)

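IV. Regression

1. Consider a regularized regression problem. The figure below shows the log-likelihood (mean log-probability) on the training set and on the test set for different values of the regularization parameter C, when the penalty is a quadratic regularizer. (10 points)

(1) Is the statement "As C increases, the log-likelihood on the training set in Figure 2 never increases" correct? Justify your answer.

(2) Explain why the log-likelihood on the test set in Figure 2 decreases when C takes large values.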

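2. Consider the linear regression model [model formula missing in the source], with the training data shown in the figure below. (10 points)

(1) Estimate the parameters by maximum likelihood and draw the resulting model in figure (a). (3 points)

(2) Estimate the parameters by regularized maximum likelihood, i.e., add the regularization penalty [penalty formula missing in the source] to the log-likelihood objective, and draw in figure (b) the model obtained when the parameter C is very large. (3 points)

(3) After regularization, does the variance [symbol missing in the source] of the Gaussian distribution become larger, smaller, or stay the same? (4 points)

Figure (a)      Figure (b)      [images not preserved in this copy]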

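3. Consider a regression problem on points [symbol missing in the source] in a two-dimensional input space, where [symbol missing in the source] lies in the unit square. Training and test samples are uniformly distributed in the unit square, and the outputs are generated by the model [formula missing in the source]. We use polynomial features of degree 1 to 10 and a linear regression model to learn the relationship between x and y (a higher-degree feature model contains all lower-degree features), with the squared-error loss.

(1) We train models with degree-1, degree-2, degree-8 and degree-10 features on [sample size missing in the source] samples and then evaluate them on a large independent test set. In the three columns below, mark the appropriate model(s) (a column may have more than one mark), and explain why the model you chose in the third column has small test error. (10 points)

Columns: lowest training error / highest training error / lowest test error

Linear model with degree-1 features: X
Linear model with degree-2 features: X
Linear model with degree-8 features: X
Linear model with degree-10 features: X

[The column in which each X appeared was not preserved in this copy.]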

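(2) We train models with degree-1, degree-2, degree-8 and degree-10 features on [sample size missing in the source] samples and then evaluate them on a large independent test set. In the three columns below, mark the appropriate model(s) (a column may have more than one mark), and explain why the model you chose in the third column has small test error. (10 points)

Columns: lowest training error / highest training error / lowest test error

Linear model with degree-1 features: X
Linear model with degree-2 features: (no mark)
Linear model with degree-8 features: X
Linear model with degree-10 features: X

[The column in which each X appeared was not preserved in this copy.]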

(3) The approximation error of a polynomial regression model depends on the number of training points. (T)

(4) The structural error of a polynomial regression model depends on the number of training points. (F)

4. We are trying to learn regression parameters for a data set which we know was generated from a polynomial of a certain degree, but we do not know what this degree is. Assume the data was actually generated from a polynomial of degree 5 with some added Gaussian noise (that is, [formula missing in the source]).

For training we have 100 {x, y} pairs and for testing we are using an additional set of 100 {x, y} pairs. Since we do not know the degree of the polynomial we learn two models from the data. Model A learns parameters for a polynomial of degree 4 and model B learns parameters for a polynomial of degree 6. Which of these two models is likely to fit the test data better?

Answer: Degree 6 polynomial. Since the model is a degree 5 polynomial and we have enough training data, the model we learn for a six degree polynomial will likely fit a very small coefficient for x^6. Thus, even though it is a six degree polynomial it will actually behave in a very similar way to a fifth degree polynomial, which is the correct model, leading to a better fit to the data.
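The argument in the answer can be tried out in a small simulation. Everything specific here is an assumption, since the exam gives neither the coefficients nor the noise level: a hypothetical degree-5 polynomial, inputs uniform on [-1, 1], Gaussian noise with standard deviation 0.1, and a plain normal-equations solver written out to keep the sketch dependency-free.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit(xs, ys, degree):
    """Least-squares coefficients (lowest power first) via normal equations."""
    p = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    b = [sum(y * x**i for x, y in zip(xs, ys)) for i in range(p)]
    return solve(a, b)


def predict(coef, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(c * x**k for k, c in enumerate(coef))


def heldout_mse(degree, seed=0, n=100, noise=0.1):
    """Fit on n training pairs, report MSE on n fresh test pairs."""
    rng = random.Random(seed)
    truth = [0.5, -1.0, 2.0, 1.0, -3.0, 10.0]  # hypothetical degree-5 polynomial

    def draw():
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        ys = [predict(truth, x) + rng.gauss(0, noise) for x in xs]
        return xs, ys

    xtr, ytr = draw()
    xte, yte = draw()
    coef = polyfit(xtr, ytr, degree)
    return sum((y - predict(coef, x)) ** 2 for x, y in zip(xte, yte)) / n
```

Comparing `heldout_mse(4)` with `heldout_mse(6)` on the same seed shows the degree-4 model paying for the missing x^5 term, while the degree-6 model pays only a small variance cost for its one superfluous coefficient. With much less data or much more noise the comparison could come out differently.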

5. Input-dependent noise in regression

Ordinary least-squares regression is equivalent to assuming that each data point is generated according to a linear function of the input plus zero-mean, constant-variance Gaussian noise. In many systems, however, the noise variance is itself a positive linear function of the input (which is assumed to be non-negative, i.e., x >= 0).

a) Which of the following families of probability models correctly describes this situation in the univariate case? (Hint: only one of them does.)

[The candidate model families (i) to (iii) are missing in the source.]

(iii) is correct. In a Gaussian distribution over y, the variance is determined by the coefficient of y^2; so by replacing [formula missing in the source] by [formula missing in the source] we get a variance that increases linearly with x. (Note also the change to the normalization "constant.") (i) has quadratic dependence on x; (ii) does not change the variance at all, it just renames w1.

b) Circle the plots in Figure 1 that could plausibly have been generated by some instance of the model family(ies) you chose.

(ii) and (iii). (Note that (iii) works for [condition missing in the source].) (i) exhibits a large variance at x = 0, and the variance appears independent of x.

c) True/False: Regression with input-dependent noise gives the same solution as ordinary regression for an infinite data set generated according to the corresponding model.

True. In both cases the algorithm will recover the true underlying m