Final exam

Reading Guide:

Listed below are the minimum things you should know. This is not an all-inclusive list, but you should at least be prepared to do these things:

Reinforcement Learning

  • Calculating the Q-function and finding the optimal policy
  • MDP, use of discounted reward
  • Sample: HW5
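
As a practice aid for the Q-function and discounted-reward items above, here is a minimal sketch of Q-value iteration on a tiny deterministic MDP. The states, actions, rewards, and the discount factor gamma = 0.5 are made up for illustration and are not taken from HW5.

```python
# Hypothetical toy MDP: transitions[state][action] = (next_state, reward).
transitions = {
    "s0": {"stay": ("s0", 0.0), "go": ("s1", 1.0)},
    "s1": {"stay": ("s1", 2.0), "go": ("s0", 0.0)},
}
gamma = 0.5  # assumed discount factor


def q_value_iteration(transitions, gamma, sweeps=100):
    """Repeat the Bellman backup Q(s,a) = r + gamma * max_a' Q(s',a')."""
    q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    for _ in range(sweeps):
        new_q = {}
        for s, acts in transitions.items():
            new_q[s] = {}
            for a, (s_next, reward) in acts.items():
                new_q[s][a] = reward + gamma * max(q[s_next].values())
        q = new_q
    return q


def greedy_policy(q):
    """Pick the action with the largest Q-value in each state."""
    return {s: max(acts, key=acts.get) for s, acts in q.items()}


q = q_value_iteration(transitions, gamma)
policy = greedy_policy(q)
```

Working the same backup by hand is a good check: staying in s1 forever is worth 2 / (1 - 0.5) = 4, so Q(s0, go) = 1 + 0.5 * 4 = 3, and the greedy policy is "go" in s0 and "stay" in s1.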

SVM, Multiclass classification, and Kernel SVM

  • Sample: HW6
  • Focus: what kinds of functions K(·, ·) can correspond to some feature map φ?
  • How to calculate φ(x) from x?
  • How to calculate the weight parameters (w/theta) given the decision function? You will find examples in the sample exams.
  • Impact of offset, impact of C and slack variable
  • One vs all multiclass classification, loss for multiclass classification
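
To make the kernel and feature-map items above concrete, the sketch below checks one standard case: for 2-D inputs, the kernel K(x, z) = (x · z)^2 corresponds to the feature map φ(x) = (x1^2, sqrt(2) x1 x2, x2^2). The function names and test points are chosen here for illustration only.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel on 2-D inputs: (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit feature map whose inner product reproduces poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Arbitrary example points (an assumption, not from the course material).
x, z = (1.0, 2.0), (3.0, -1.0)
k_direct = poly2_kernel(x, z)     # computed in the input space
k_feature = dot(phi(x), phi(z))   # computed in the feature space
```

Both routes should agree up to floating-point error, which is the defining property of a valid kernel: K(x, z) = φ(x) · φ(z) for some φ.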

Neural Networks

  • MLP – use, how the number of layers affects function approximation, hyperparameters
  • Activation functions
  • CNN – use, what convolution is, what a filter is, and how filters are different from and similar to the weights of an MLP
  • RNN – use, problems of RNNs, how the weights are different from and similar to those of an MLP
  • Backpropagation for a 2-layer MLP, and how backpropagation for a deeper MLP differs from it

Bias - variance, Regularization, and Cross-validation

  • How is model complexity connected to bias, variance, and test error for different ML models?
    • For example, how does variance change as the number of neighbors increases in KNN?
    • For example, how do bias and variance change with sigma in an RBF SVM?
    • For example, how do bias and variance change with decision tree depth?
  • How does cross-validation help us generalize better?


Ensemble Methods

  • How does ensembling help a model generalize better?
  • Effect of bagging/random forest on bias-variance
  • Effect of boosting on bias-variance
  • Sequential vs. parallel training in bagging and boosting
  • AdaBoost algorithm steps
  • Random forest algorithm steps
  • How does AdaBoost update weights and choose training examples for sequential training?
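
For the weight-update item above, here is a minimal sketch of one round of the common discrete AdaBoost formulation with labels in {-1, +1}. The course may use a slightly different variant, and the labels and predictions below are invented for illustration.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost round: weighted error, learner weight alpha, new weights.

    Assumes labels and predictions are -1 or +1, the weights sum to 1,
    and the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified points (y * p = -1) are scaled up, correct ones down.
    raw = [w * math.exp(-alpha * y * p)
           for w, y, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]


# Hypothetical data: four points, the weak learner gets the second one wrong.
y_true = [1, 1, -1, -1]
y_pred = [1, -1, -1, -1]
weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

With this update, the misclassified points together end up with half of the total weight after normalization, so the next weak learner is pushed to focus on them.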


Principal Component Analysis (PCA)

  • Relationship between principal components and explained variance
  • How much of the data's variance is captured by each principal component?
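
To illustrate the link between principal components and explained variance, the sketch below computes the explained-variance ratios for a small 2-D dataset using the closed-form eigenvalues of the 2x2 sample covariance matrix. The dataset and the function name are assumptions made for this example; in practice a linear-algebra library would handle the eigendecomposition.

```python
import math


def explained_variance_ratio_2d(points):
    """Fraction of total variance along each principal component (2-D only)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    trace = sxx + syy
    root = math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)
    eigenvalues = [(trace + root) / 2, (trace - root) / 2]
    # Each eigenvalue is the variance along one principal component.
    return [lam / trace for lam in eigenvalues]


# Hypothetical data, stretched mostly along the diagonal direction.
data = [(2.0, 2.0), (-2.0, -2.0), (1.0, -1.0), (-1.0, 1.0)]
ratios = explained_variance_ratio_2d(data)
```

For this dataset the covariance eigenvalues are 16/3 and 4/3, so the first principal component captures 80% of the variance and the second captures the remaining 20%.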

Midterm exam

Reading Guide:

Listed below are the minimum things you should be prepared to do. This is not an all-inclusive list, but you should at least be prepared to do these things:

Logistic Regression, Linear Regression

  • What is the cost function and what is the log-likelihood?
  • How do you obtain the gradient descent update rule from the cost function?
  • How do you get to log-likelihood from h(x)?
  • Why do we need 0-1 and perceptron loss?
  • What is minimizing least squares?

Multiclass classification

  • How is multiclass classification connected to logistic regression (the general idea)?
  • What is its loss (how do you train a multiclass classifier)?

Naive Bayes (NB)

  • How to estimate parameters for the likelihood functions using Bayes' rule? What parameters need to be calculated to obtain P(Y|X)?
  • How do you handle continuous and discrete X in NB?
  • How many independent parameters do we need to estimate to calculate the joint probabilities?
  • How does the NB assumption improve on this?
  • What are the subtleties of Naive Bayes?
  • How to get the log-likelihood from P(D | theta), and how to get the MLE and MAP estimates of theta from it?

Bias - variance, and Cross-validation

  • How is model complexity connected to bias, variance, and test error?
  • How do L1 and L2 regularization affect classifiers?
  • How does cross-validation help us generalize better?

Decision Tree

  • Run a simulation of a decision tree
  • When does overfitting happen? How to avoid overfitting in a decision tree?
  • What does the decision boundary look like?


Principal Component Analysis (PCA)

  • What are principal components? How do you find them?
  • How to get the reduced-dimension representation?
  • How do you choose the number of components?

K-means, KNN

  • What are the problems of KNN? How do you solve them?
  • What is hierarchical clustering? What do the evaluation metrics evaluate in clusters?
  • KNN decision boundaries
  • How does variance change as the number of neighbors increases in KNN?
  • Advantages and disadvantages

Sample Questions

