Coursera Machine Learning Weeks 7 and 8

I was out of the office for personal reasons. As a result I am behind on the Coursera Machine Learning lectures. Here are my notes by video and topic:

Support Vector Machines

  • Optimization Objective – yet another long equation is presented for the cost function.
  • Large Margin Intuition – apparently the decision boundary found by a support vector machine keeps a large margin, i.e. a large distance to the nearest training examples.
  • Mathematics Behind Large Margin Classification – optional; it’s basic vector math (inner products and projections), so you can safely skip it.
  • Kernels I – an introduction to the similarity function (kernel).
  • Kernels II – the SVM has a parameter called C, which appears to be the inverse of the regularization constant (roughly 1/λ).
  • Using An SVM – you need to choose C and a kernel. You can use a Gaussian kernel, a polynomial kernel, or no kernel at all (linear), and there are more specialized kernels as well. SVM software packages can sometimes do multi-class classification out of the box; if not, you can just use the one-versus-all method.
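
The Gaussian similarity function from the Kernels videos is simple enough to sketch directly. Here is my own minimal NumPy version (the lectures only give the formula f = exp(−‖x − l‖² / (2σ²)), so the function names below are mine):

```python
import numpy as np

def gaussian_kernel(x, landmark, sigma=1.0):
    """Similarity between a point x and a landmark l:
    near the landmark -> close to 1, far away -> close to 0."""
    return np.exp(-np.sum((x - landmark) ** 2) / (2.0 * sigma ** 2))

def kernel_features(x, landmarks, sigma=1.0):
    """In the lectures each training example becomes a landmark, so a
    point x is mapped to a vector of similarities to all landmarks."""
    return np.array([gaussian_kernel(x, l, sigma) for l in landmarks])
```

A smaller sigma makes the similarity fall off faster with distance, which tends toward lower bias and higher variance.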


Clustering

  • Unsupervised Learning – unsupervised learning doesn’t require you to specify the correct output for each training example. Clustering is introduced as an example of unsupervised learning along with its many applications. (In my opinion, clustering can give you misleading results. For instance, I could generate random data points with a random generator; a clustering algorithm would still find clusters, but they would be meaningless.)
  • K-Means Algorithm – k here refers to the number of clusters. The algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of the points assigned to it.
  • Optimization Objective – the cost function seems almost trivial.
  • Random Initialization
  • Choosing the Number of Clusters
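
The two alternating steps fit in a short NumPy function. This is my own sketch, not the course’s Octave code, and it keeps a centroid in place if its cluster ever becomes empty (the lectures suggest eliminating or reinitializing it instead):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means sketch: cluster assignment step, then
    move-centroid step, repeated until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    # Random initialization as in the lectures: pick k distinct examples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Cluster assignment step: index of the closest centroid per point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move-centroid step: average the points assigned to each centroid
        # (a cluster that loses all its points keeps its old centroid here).
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

Since the result depends on the random initialization, the lectures recommend running this several times and keeping the run with the lowest cost.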

Dimensionality Reduction

  • Motivation I: Data Compression – obviously, if you reduce the number of dimensions, your data takes up less space.
  • Motivation II: Visualization – and your data would be easier to visualize. (Although you might lose information.)
  • Principal Component Analysis Problem Formulation – basically we try to project the data onto a lower-dimensional (hyper)plane that minimizes the projection error.
  • Principal Component Analysis Algorithm – it’s recommended to do mean normalization first. Then we need to calculate either the Singular Value Decomposition or the eigenvectors of the covariance matrix. Singular Value Decomposition is apparently the better choice, because it is more numerically stable. And yes, there are NumPy functions (such as np.linalg.svd) that can do these computations for you.
  • Choosing the Number of Principal Components
  • Reconstruction from Compressed Representation
  • Advice for Applying PCA
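
The whole PCA recipe from this section fits in a few lines of NumPy. This is my own sketch (not the course’s Octave code), including the variance-retained ratio used for choosing the number of components and the reconstruction step:

```python
import numpy as np

def pca(X, k):
    """Mean-normalize, take the SVD of the covariance matrix,
    and project onto the first k principal components."""
    mu = X.mean(axis=0)
    Xc = X - mu                      # mean normalization
    Sigma = Xc.T @ Xc / len(X)       # covariance matrix
    U, S, _ = np.linalg.svd(Sigma)   # more stable than an eigendecomposition
    variance_retained = S[:k].sum() / S.sum()  # used for choosing k
    Z = Xc @ U[:, :k]                # compressed representation
    return Z, U[:, :k], mu, variance_retained

def reconstruct(Z, U_reduce, mu):
    """Approximate reconstruction from the compressed representation."""
    return Z @ U_reduce.T + mu
```

The usual rule of thumb from the lectures is to pick the smallest k that keeps the variance-retained ratio above some threshold, e.g. 0.99.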


By the author of NumPy Beginner's Guide, NumPy Cookbook and Instant Pygame. If you enjoyed this post, please consider leaving a comment or subscribing to the RSS feed to have future articles delivered to your feed reader.