Complete Search and Analytics based on dissecting Twitter data

  • Retrieved, preprocessed and indexed tweets using the Twitter API and Apache Solr.
  • Built a search engine based on the BM25 model, ran sentiment analysis on the retrieved tweets, and generated the tweet map and word clouds dynamically.

We crawled around 3 million tweets covering multiple cities, topics and languages using the Twitter API. The collected data was cleaned, preprocessed and then indexed using Solr. A BM25 information retrieval model was configured in Solr and its parameters were tuned to improve recall.
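
To make the ranking step concrete, here is a minimal sketch of BM25 scoring in plain Python. It is not the project's code: in the project the scoring is done inside Solr, and the function name `bm25_scores`, the sample tweets, and the parameter values `k1=1.2` and `b=0.75` (common defaults, not the tuned values, which are not stated here) are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenized document for a tokenized query.

    k1 controls term-frequency saturation and b controls length
    normalization; the values here are illustrative defaults only.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency, damped by k1 and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical sample tweets, already lowercased and tokenized.
docs = [
    "traffic jam in new york today".split(),
    "new phone launch event today".split(),
    "heavy rain and traffic in mumbai".split(),
]
scores = bm25_scores("traffic mumbai".split(), docs)
```

In this sketch the third tweet matches both query terms and ranks first, the first tweet matches only one term, and the second tweet matches none and scores zero. Lowering `b` weakens the penalty on longer tweets, which is one of the knobs that can be adjusted when tuning for recall.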

We developed a full-fledged search engine that passes the search query to this model in Solr and retrieves the search results. Each result displays the user name, tweet text, topic, city and sentiment of the tweet. A sentiment analyzer identifies the overall sentiment of each tweet and classifies it as happy, sad or neutral.
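
The three-way classification can be illustrated with a small lexicon-based sketch. The write-up does not say which sentiment analyzer the project uses, so the word lists, the function name `classify_sentiment`, and the scoring rule below are all assumptions chosen for illustration, not the project's method.

```python
# Hypothetical, deliberately tiny word lists; a real analyzer would use a
# full lexicon or a trained model.
HAPPY_WORDS = {"love", "great", "happy", "awesome", "good"}
SAD_WORDS = {"hate", "terrible", "sad", "awful", "bad"}


def classify_sentiment(text):
    """Label a tweet as 'happy', 'sad' or 'neutral' by counting lexicon hits."""
    # Lowercase each token and strip common trailing punctuation.
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    score = sum(t in HAPPY_WORDS for t in tokens) - sum(t in SAD_WORDS for t in tokens)
    if score > 0:
        return "happy"
    if score < 0:
        return "sad"
    return "neutral"


labels = [
    classify_sentiment("I love this great city!"),
    classify_sentiment("Traffic is terrible today"),
    classify_sentiment("Heading to the station"),
]
```

A label produced this way can be stored alongside each tweet at indexing time, so that it is available as a field when the search results are displayed.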

We also generate word clouds for all topics. In addition, the website displays a tweet map that integrates location-wise analysis of the data.
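
A word cloud is driven by per-topic word frequencies. The sketch below shows one way such counts could be computed; the stopword list, the `(topic, text)` input shape, and the function name `topic_word_counts` are assumptions for illustration, and the rendering step itself is left out.

```python
from collections import Counter, defaultdict

# Hypothetical minimal stopword list.
STOPWORDS = {"a", "the", "are", "in", "what", "is", "of", "to", "and"}


def topic_word_counts(tweets, top_n=50):
    """Map each topic to its top_n (word, count) pairs.

    tweets is an iterable of (topic, text) pairs.
    """
    counts = defaultdict(Counter)
    for topic, text in tweets:
        for token in text.lower().split():
            token = token.strip(".,!?:;\"'")
            # Skip empty tokens, stopwords, links and user mentions.
            if not token or token in STOPWORDS:
                continue
            if token.startswith(("http", "@")):
                continue
            counts[topic][token] += 1
    return {topic: c.most_common(top_n) for topic, c in counts.items()}


# Hypothetical sample data.
tweets = [
    ("sports", "Great match today!"),
    ("sports", "What a match, @fan"),
    ("politics", "Election results are in"),
]
word_counts = topic_word_counts(tweets)
```

Each topic's list of `(word, count)` pairs can then be passed to a word cloud renderer, with the count determining the size of each word.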

The website can be found here
The video below demonstrates how our search engine works:


Find the entire project here: