Coursera Machine Learning Weeks 9 and 10

I am still in catch-up mode, but fortunately these seem to be the last lectures.

Anomaly Detection

  • Problem Motivation – anomaly here means the same as outlier, I think, or in certain contexts an error or defect.
  • Gaussian Distribution – a.k.a. the Normal Distribution.
  • Algorithm – the algorithm flags a data point as an outlier when its estimated probability, computed under the assumption that each feature is Gaussian, falls below a chosen threshold. (Apparently this assumption is OK. I am not so sure myself; I believe there are other/better ways to find outliers.) A sketch of the procedure follows this list.
  • Developing and Evaluating an Anomaly Detection System
  • Anomaly Detection vs. Supervised Learning
  • Choosing What Features to Use
  • Multivariate Gaussian Distribution
  • Anomaly Detection using the Multivariate Gaussian Distribution – the multivariate version is also sketched below.
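
A minimal NumPy sketch of the basic algorithm, assuming the training set is a matrix X with one example per row and that a threshold epsilon has already been picked on a cross-validation set (the function and variable names are mine, not from the course):

```python
import numpy as np

def fit_gaussian(X):
    """Estimate a per-feature mean and variance from the training set."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)
    return mu, sigma2

def probability(X, mu, sigma2):
    """p(x) as a product of independent univariate Gaussian densities."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    p = coef * np.exp(-((X - mu) ** 2) / (2.0 * sigma2))
    return p.prod(axis=1)

# Flag examples whose probability falls below the threshold epsilon:
# mu, sigma2 = fit_gaussian(X_train)
# anomalies = probability(X_cv, mu, sigma2) < epsilon
```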
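And a sketch of the multivariate variant, which captures correlations between features with a full covariance matrix instead of treating each feature independently (again, my own names and sketch):

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """Estimate the mean vector and covariance matrix of the training set."""
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / X.shape[0]
    return mu, Sigma

def multivariate_probability(X, mu, Sigma):
    """Multivariate Gaussian density p(x; mu, Sigma) for each row of X."""
    n = mu.size
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / (np.power(2.0 * np.pi, n / 2.0) * np.sqrt(np.linalg.det(Sigma)))
    exponent = -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
    return norm * np.exp(exponent)
```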

Recommender Systems

  • Problem Formulation – this lecture answers the age-old question of how to recommend movies.
  • Content Based Recommendations – based on the content of the movies. We should have values for the degree of romance or action (features) in the motion picture. For example, “Capitalism: A Love Story” is clearly a romantic movie. So it will have the value 5 out of 5 for romance.
  • Collaborative Filtering – in this scheme the content feature values can be partially missing. We try to learn those on the fly as well.
  • Collaborative Filtering Algorithm – first we initialize the movie features and user parameters to small random values. Then we minimize the cost function over the ratings and features simultaneously. Finally, we use the result to predict ratings (recommend). A sketch follows this list.
  • Vectorization: Low Rank Matrix Factorization
  • Implementational Detail: Mean Normalization – subtract each movie’s mean rating before learning and add it back when predicting, so a user with no ratings gets sensible defaults; see the second sketch after this list.
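
A minimal NumPy sketch of the collaborative filtering objective, assuming a ratings matrix Y (movies by users) and a 0/1 matrix R marking which ratings actually exist; X holds the movie features and Theta the user parameters, both initialized to small random values. The matrix names follow the course notation, but the rest is my own sketch:

```python
import numpy as np

def cofi_cost_and_grads(X, Theta, Y, R, lam):
    """Regularized collaborative filtering cost and its gradients."""
    # Vectorized predictions: the low rank product X @ Theta.T
    err = (X @ Theta.T - Y) * R          # only count ratings that exist
    cost = 0.5 * np.sum(err ** 2)
    cost += 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2))
    X_grad = err @ Theta + lam * X
    Theta_grad = err.T @ X + lam * Theta
    return cost, X_grad, Theta_grad

# Gradient descent over X and Theta simultaneously:
# for _ in range(num_iters):
#     cost, X_grad, Theta_grad = cofi_cost_and_grads(X, Theta, Y, R, lam)
#     X -= alpha * X_grad
#     Theta -= alpha * Theta_grad
```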
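Mean normalization as a small sketch: compute each movie’s mean over the ratings that exist, subtract it before learning, and add it back when predicting, so a brand-new user is predicted the mean rating rather than zero (my own variable names):

```python
import numpy as np

def normalize_ratings(Y, R):
    """Subtract each movie's mean rating, computed over existing ratings only."""
    rated_counts = R.sum(axis=1)
    Y_mean = (Y * R).sum(axis=1) / np.maximum(rated_counts, 1)
    Y_norm = (Y - Y_mean[:, None]) * R
    return Y_norm, Y_mean

# Predictions then add the mean back: X @ Theta.T + Y_mean[:, None]
```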

Large Scale Machine Learning

  • Learning With Large Datasets – having lots of training examples is problematic, because each step of batch gradient descent requires summing the cost gradient over all of them.
  • Stochastic Gradient Descent – take a somewhat random/drunk walk downhill by updating the parameters one training example at a time instead of evaluating all of them; sketched after this list together with the mini-batch variant.
  • Mini-Batch Gradient Descent – partition the training examples and use small batches (dependent on concurrency capability) to make progress faster.
  • Stochastic Gradient Descent Convergence – we hope to converge eventually, but the cost jumps around from step to step, so it’s best to make sure by plotting the cost averaged over the last thousand or so examples.
  • Online Learning – online here means on the fly.
  • Map Reduce and Data Parallelism – split the gradient summation across machines or cores and combine the partial sums; a rough sketch follows the list.
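
A minimal sketch of stochastic and mini-batch gradient descent for linear regression in NumPy (the helper names and defaults below are mine, not the course’s):

```python
import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, epochs=10, batch_size=1):
    """Stochastic (batch_size=1) or mini-batch gradient descent."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        order = np.random.permutation(m)       # shuffle once per epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)
            theta -= alpha * grad
    return theta
```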
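And a rough illustration of the map-reduce idea: split the batch gradient sum into slices, compute each slice’s partial gradient (the map step), and add the results back together (the reduce step). Here the map step runs sequentially for simplicity; on a real cluster each slice would go to a different machine or core. Everything below is my own sketch:

```python
import numpy as np

def partial_gradient(X_part, y_part, theta):
    """Map step: gradient contribution of one slice of the training set."""
    return X_part.T @ (X_part @ theta - y_part)

def mapreduce_gradient(X, y, theta, workers=4):
    """Reduce step: sum the partial gradients and divide by m."""
    parts = [partial_gradient(Xc, yc, theta)
             for Xc, yc in zip(np.array_split(X, workers),
                               np.array_split(y, workers))]
    return sum(parts) / X.shape[0]
```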

Application Example: Photo OCR

  • Problem Description and Pipeline – the problem is to recognize text in photos.
  • Sliding Windows – small rectangular patches are used to scan the photos; see the sketch after this list.
  • Getting Lots of Data and Artificial Data – this is similar to bootstrapping I guess. If you don’t have enough data, you can always mix whatever you have or add distortions and noise.
  • Ceiling Analysis: What Part of the Pipeline to Work on Next – this analysis helps you find the low hanging fruit and focus on easy wins.
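
A minimal sketch of the sliding-window scan, assuming a grayscale image stored as a 2-D NumPy array and some already-trained text/no-text classifier; both the classifier and the names below are hypothetical illustrations, not the course code:

```python
import numpy as np

def sliding_windows(image, window=(20, 20), step=4):
    """Yield (row, col, patch) for every window position in the image."""
    h, w = window
    rows, cols = image.shape
    for r in range(0, rows - h + 1, step):
        for c in range(0, cols - w + 1, step):
            yield r, c, image[r:r + h, c:c + w]

# Hypothetical usage: run the patch classifier at every position.
# detections = [(r, c) for r, c, patch in sliding_windows(img)
#               if classifier.predict(patch.reshape(1, -1))[0] == 1]
```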