# Social News in 1000 Steps – Step 10

This entry is part 10 of 14 in the series Social News

Following the previous step, I want to classify the Bing search results. I do this with scikit-learn's TfidfVectorizer and LogisticRegression classes: the TF-IDF vectorizer turns each result's title and description into a sparse feature vector, and the logistic regression model predicts whether the result is flagged 'Use'. The results seem surprisingly good. The code is as follows:

```python
import core  # project-specific module providing the database connection
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

db = core.connect()
bing_searches = db['bing_searches']
x = []
y = []

for row in bing_searches.all():
    x.append("{0}\n{1}".format(row['title'], row['description']))
    y.append(row['flag'] == 'Use')

vectorizer = TfidfVectorizer(sublinear_tf=True, stop_words='english')

train_x, test_x, train_y, test_y = train_test_split(x, y, random_state=43)
train_features = vectorizer.fit_transform(train_x)
test_features = vectorizer.transform(test_x)
lr = LogisticRegression()
lr.fit(train_features, train_y)
preds = lr.predict(test_features)
print('Test Accuracy', accuracy_score(test_y, preds))
print('Test Kappa', cohen_kappa_score(test_y, preds))
print('Confusion Matrix\n', confusion_matrix(test_y, preds))

feature_names = np.array(vectorizer.get_feature_names_out())

# lr.coef_ has shape (1, n_features), so take the first row before sorting
bottom10 = np.argsort(lr.coef_[0])[:10]
print("Possible negative keywords {}".format(feature_names[bottom10]))
```
By the author of NumPy Beginner's Guide, NumPy Cookbook and Instant Pygame.