Understanding Users Through Twitter Data and Machine Learning
Twitter is a rich source of a user's interests: the public bio, Tweets, people followed, Retweets and favorites. What if we could process all this data in real time to build awesome web apps that personalize content based on the Twitter profile?
MonkeyLearn (@monkeylearn) is a technology platform that enables this type of deep app/site customization. Here we will show you how to process a Twitter user's public data to power customization, as well as other kinds of intelligent applications.
As a prerequisite, you first need Twitter API credentials via a registered Twitter app, as well as a MonkeyLearn account and an API token.
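The snippets in this post reference credential constants by name. A minimal settings sketch with placeholders for those constants (the MonkeyLearn base URL shown is illustrative; take the real values from the full source):

# Placeholder credentials referenced by the snippets below.
TWITTER_CONSUMER_KEY = 'your-consumer-key'
TWITTER_CONSUMER_SECRET = 'your-consumer-secret'
TWITTER_ACCESS_TOKEN_KEY = 'your-access-token'
TWITTER_ACCESS_TOKEN_SECRET = 'your-access-token-secret'

MONKEYLEARN_TOKEN = 'your-monkeylearn-api-token'
MONKEYLEARN_CLASSIFIER_BASE_URL = 'https://api.monkeylearn.com/v1/categorizer/'  # illustrative
MONKEYLEARN_TOPIC_CLASSIFIER_ID = 'your-topic-classifier-id'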
Overview
- Gather information about a Twitter user, including:
- Profile information
- Tweets
- Favorites
- Analyze the text to filter on language and assign topic categories
- Create visualizations
- A pie chart of the most common topic categories
- A word cloud of the most important keywords in a category
You can get the full source code here.
Gather user data
First, we create a tweepy API object with our Twitter API key credentials:
# tweepy is used to call the Twitter API from Python
import tweepy
import re

# Authenticate to the Twitter API
auth = tweepy.OAuthHandler(TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET)
auth.set_access_token(TWITTER_ACCESS_TOKEN_KEY, TWITTER_ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
Once we have a Twitter client, we retrieve Tweets and favorites, filtering for text-heavy Tweets and calculating a quality score:
def get_tweets(api, twitter_user, tweet_type='timeline', max_tweets=200, min_words=5):
    tweets = []
    full_tweets = []
    step = 200  # Maximum value is 200.

    for start in xrange(0, max_tweets, step):
        end = start + step

        # Maximum of `step` tweets, or the remaining to reach max_tweets.
        count = min(step, max_tweets - start)
        kwargs = {'count': count}
        if full_tweets:
            last_id = full_tweets[-1].id
            kwargs['max_id'] = last_id - 1

        if tweet_type == 'timeline':
            current = api.user_timeline(twitter_user, **kwargs)
        else:
            current = api.favorites(twitter_user, **kwargs)
        full_tweets.extend(current)

    for tweet in full_tweets:
        text = re.sub(r'(https?://\S+)', '', tweet.text)

        # Calculate a "score" of tweet relevance/data quality.
        score = tweet.favorite_count + tweet.retweet_count
        if tweet.in_reply_to_status_id_str:
            score -= 15

        # Only keep tweets with at least min_words words.
        if len(re.split(r'[^0-9A-Za-z]+', text)) > min_words:
            tweets.append((text, score))

    return tweets
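For example, to gather up to 400 timeline Tweets and 200 favorites (the screen name here is just an illustration):

timeline_tweets = get_tweets(api, 'monkeylearn', tweet_type='timeline', max_tweets=400)
favorite_tweets = get_tweets(api, 'monkeylearn', tweet_type='favorites', max_tweets=200)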
In the provided source code, you'll also see us go one step further and include friends' descriptions in our content corpus.
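The exact implementation is in the full source; a minimal sketch of the idea, using tweepy's friends listing (the function name and limits are our own):

def get_friends_descriptions(api, twitter_user, max_friends=200):
    # Collect the non-empty bios of the accounts a user follows.
    descriptions = []
    for friend in tweepy.Cursor(api.friends, screen_name=twitter_user).items(max_friends):
        if friend.description:
            descriptions.append(friend.description.strip())
    return descriptions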
Filter on language
The next step is to filter the Tweets and content to English. We can do this easily using MonkeyLearn's API, classifying text in batch mode:
import requests
import json

# This is a handy function to classify a list of texts in batch mode (much faster)
def classify_batch(text_list, classifier_id):
    """
    Batch classify texts
    text_list -- list of texts to be classified
    classifier_id -- id of the MonkeyLearn classifier to be applied to the texts
    """
    results = []
    step = 250
    for start in xrange(0, len(text_list), step):
        end = start + step
        data = {'text_list': text_list[start:end]}
        response = requests.post(
            MONKEYLEARN_CLASSIFIER_BASE_URL + classifier_id + '/classify_batch_text/',
            data=json.dumps(data),
            headers={
                'Authorization': 'Token {}'.format(MONKEYLEARN_TOKEN),
                'Content-Type': 'application/json'
            })
        try:
            results.extend(response.json()['result'])
        except:
            print response.text
            raise
    return results
If you need additional language support, MonkeyLearn has a number of language classifiers, including Spanish, French and many others. Look at our source code for the filter_language() method to see how to swap in your desired language.
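If you aren't working from the full source, a minimal filter_language() might look like the sketch below, assuming a MonkeyLearn language classifier (MONKEYLEARN_LANG_CLASSIFIER_ID is a placeholder) whose top label for English text is 'English':

MONKEYLEARN_LANG_CLASSIFIER_ID = 'your-language-classifier-id'  # placeholder

def filter_language(texts, language='English'):
    # Keep only the texts classified as the given language.
    results = classify_batch(texts, MONKEYLEARN_LANG_CLASSIFIER_ID)
    return [text for text, result in zip(texts, results)
            if result[0]['label'] == language]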
Detect categories
Now that we have a list of Tweets and descriptions in English, we can use a MonkeyLearn topic classifier to categorize the text and create a histogram of the most popular categories for the user:
from collections import Counter

def category_histogram(texts, short_texts):
    # Classify the bios and tweets with MonkeyLearn's topic classifier.
    topics = classify_batch(texts, MONKEYLEARN_TOPIC_CLASSIFIER_ID)

    # The histogram will keep the counters of how many texts fall in
    # a given category.
    histogram = Counter()
    samples = {}

    for classification, text, short_text in zip(topics, texts, short_texts):
        # Join the parent and child category names in one string.
        category = classification[0]['label'] + '/' + classification[1]['label']
        probability = (classification[0]['probability'] *
                       classification[1]['probability'])

        MIN_PROB = 0.3
        # Discard texts with a predicted topic with probability lower than a threshold.
        if probability < MIN_PROB:
            continue

        # Increment the category counter.
        histogram[category] += 1

        # Store the texts by category.
        samples.setdefault(category, []).append((short_text, text))

    return histogram, samples

# Classify the expanded tweets using MonkeyLearn, return the histogram.
tweets_histogram, tweets_categorized = category_histogram(expanded_tweets, tweets_english)

# Classify the expanded bios of the followed users using MonkeyLearn, return the histogram.
descriptions_histogram, descriptions_categorized = category_histogram(expanded_descriptions, descriptions_english)
Display the most popular categories
The above histogram counts how much Tweet activity a user has in each category. Using matplotlib, we create a pie chart that shows the distribution:
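The plotting code lives in the full source; a minimal matplotlib sketch of the idea (the number of slices shown is our own choice):

import matplotlib.pyplot as plt

# Plot the most common categories as a pie chart.
top = tweets_histogram.most_common(8)
labels = [category for category, count in top]
counts = [count for category, count in top]

plt.pie(counts, labels=labels, autopct='%1.1f%%')
plt.axis('equal')  # Draw the pie as a circle.
plt.show()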
The previous pie chart represents my own interests, which is a pretty accurate breakdown given my Twitter activity. I'm a software engineer and geek, so I'm very interested in Computers & Internet/Programming. I'm also an entrepreneur, so I'm interested in Business & Finance/Small businesses too.
Extract keywords from a given category
The pie chart offers a high-level summary of a user's interests. We can dig deeper, finding specific interests within each category. To do that, we'll use our keyword extractor to highlight the most important terms in each category.
First, for each category, we'll join all the content:
joined_texts = {}
for category in tweets_categorized:
    if category not in top_categories:
        continue
    expanded = 0
    joined_texts[category] = u' '.join(map(lambda t: t[expanded], tweets_categorized[category]))
We then use MonkeyLearn to extract keywords for each category, and only keep the top 20 by relevance:
keywords = dict(zip(joined_texts.keys(), extract_keywords(joined_texts.values(), 20)))

for cat, kw in keywords.iteritems():
    top_relevant = map(
        lambda x: x.get('keyword'),
        sorted(kw, key=lambda x: float(x.get('relevance')), reverse=True)
    )
    print u"{}: {}".format(cat, u", ".join(top_relevant))
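The extract_keywords() helper comes from the full source. A minimal sketch in the spirit of classify_batch(), assuming a batch keyword-extraction endpoint shaped like the classifier one (the URL and extractor ID below are assumptions; check the full source for the real call):

MONKEYLEARN_EXTRACTOR_BASE_URL = 'https://api.monkeylearn.com/v1/extractors/'  # assumed
MONKEYLEARN_EXTRACTOR_ID = 'your-keyword-extractor-id'  # placeholder

def extract_keywords(text_list, max_keywords):
    # Batch extract keywords from a list of texts.
    data = {'text_list': text_list, 'max_keywords': max_keywords}
    response = requests.post(
        MONKEYLEARN_EXTRACTOR_BASE_URL + MONKEYLEARN_EXTRACTOR_ID + '/extract_batch_text/',
        data=json.dumps(data),
        headers={
            'Authorization': 'Token {}'.format(MONKEYLEARN_TOKEN),
            'Content-Type': 'application/json'
        })
    return response.json()['result']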
The following clouds show the keywords that represent the Computers & Internet and the Business & Finance categories respectively:
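One way to render clouds like these is the third-party wordcloud package, weighting each keyword by its relevance; a minimal sketch, with the category name and styling as our own choices:

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Weight each extracted keyword by its relevance score.
category = 'Computers & Internet/Programming'  # illustrative
frequencies = {k['keyword']: float(k['relevance']) for k in keywords[category]}

cloud = WordCloud(background_color='white').generate_from_frequencies(frequencies)
plt.imshow(cloud, interpolation='bilinear')
plt.axis('off')
plt.show()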
As another data point, you can see the pie chart and word cloud for Katy Perry, in which we identify Events & Special Occasions and Entertainment as key categories, given her career and busy event schedule.
Conclusion
Using the Twitter API and MonkeyLearn, it's simple to classify and extract relevant information from Tweets and user descriptions. Together they offer useful insights into an individual user, which can be used for a variety of applications:
- For news or content sites: allow users to log in via Twitter to quickly understand their interests and tailor your content accordingly.
- For e-commerce sites: recommend products based on a user's previous Tweets, favorites and follow graph.
We encourage you to use the Twitter API and sign up to MonkeyLearn to discover new applications with the programming language you love.
Credits
A huge thank you to Agustin Azzinari and Rodrigo Stecanella for their contributions to the source code, and to Federico Pascual and Martin Alcala Rubi for their writing and editing.
Source: https://blog.twitter.com/developer/en_us/a/2015/guest-post-understanding-users-through-twitter-data-and-machine-learning