
How to Scrape Twitter Data Using C#

This tutorial explains how to scrape Twitter data using C#.

Have you heard of Twitter data scraping? It is a way to extract data from twitter.com automatically, in this case with a C# program.

Twitter data scraping provides up-to-date information such as tweets, category, profile ID, and more.

Data we can extract using C#:

  • tweet
  • category
  • profile id
  • listed count
  • friends
  • followers
  • title
  • created date
  • location
  • name
  • description
  • profile_banner_url
  • own_site_url
  • language
  • timeline_url

Screenshot of the data we will be extracting

To find the right data on a website, we first have to inspect and understand the HTML tags associated with that data.

Follow the steps below to find the tags:

  1. Open a browser (Google Chrome, Mozilla Firefox).
  2. Copy and paste the URL you want to scrape.
  3. Press F12 to view the HTML structure of the site.
  4. Find the tags for the required data.
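Once a tag has been found in the browser, it can be turned into an XPath query. The following is a minimal sketch, assuming the data sits in an `<input>` element with the id `init-data` (the element this tutorial's code looks for). The class name `TagLookupSketch`, the method `FindInitDataValue`, and the sample HTML are made up for illustration; a real page may look different.

```csharp
using System;
using HtmlAgilityPack;

public static class TagLookupSketch
{
    // Returns the "value" attribute of the first <input id="init-data">,
    // or null when the page has no such element or attribute.
    public static string FindInitDataValue(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the XPath matches nothing.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//input[@id='init-data']");
        return node?.Attributes["value"]?.Value;
    }

    public static void Main()
    {
        // Hypothetical, hand-written HTML standing in for a downloaded page.
        string sampleHtml =
            "<html><body>" +
            "<input type='hidden' id='init-data' value='sample-value'>" +
            "</body></html>";

        string value = FindInitDataValue(sampleHtml);
        Console.WriteLine(value ?? "init-data element not found");
    }
}
```

Checking for null before reading the attribute keeps the program from crashing when the tag seen in the browser is not present in the HTML the server actually returns.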

C# Code to Scrape twitter.com

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace Twitter
{
    class Program
    {
        /// <summary>
        /// Scrapes data from twitter.com
        /// and prints it in JSON format.
        /// </summary>
        /// <param name="args">Command-line arguments (not used).</param>
        static void Main(string[] args)
        {
            string url = string.Empty;
            string strHtml = string.Empty;

            Console.WriteLine("Please Enter URL :- ");
            url = Console.ReadLine();

            Console.WriteLine("Fetch Data From URL {0} ...", url);
            strHtml = GetRequest(url);

            object result = DataParse(strHtml);

            Console.WriteLine("Result :");
            Console.WriteLine(JsonConvert.SerializeObject(result, Formatting.Indented));

            // Keep the console window open until the user presses Enter.
            Console.ReadLine();
        }

        /// <summary>
        /// Sends an HTTP GET request to the given twitter.com URL
        /// and returns the response content as a string.
        /// reference 1 : https://stackoverflow.com/questions/4015324/how-to-make-http-post-web-request
        /// reference 2 : https://stackoverflow.com/questions/27108264/c-sharp-how-to-properly-make-a-http-web-get-request
        /// Lib Reference
        ///  1 : using System.Net;
        ///  2 : using System.IO;
        ///  3 : using System.Text;
        /// </summary>
        /// <param name="url">The URL of the page to download.</param>
        /// <returns>The HTML of the page.</returns>
        public static string GetRequest(string url)
        {
            string strhtml = String.Empty;

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.AutomaticDecompression = DecompressionMethods.GZip;

            // Send a browser-like User-Agent header with the request.
            request.UserAgent =
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36";

            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                strhtml = reader.ReadToEnd();
            }

            return strhtml;
        }

        /// <summary>
        /// Parses the profile data out of the HTML string
        /// and returns an object holding that data.
        /// Lib Reference
        ///  1 : using HtmlAgilityPack;
        ///  2 : using Newtonsoft.Json;
        ///  3 : using Newtonsoft.Json.Linq;
        /// </summary>
        /// <param name="strHtml">The HTML of the profile page.</param>
        /// <returns>An anonymous object with the extracted fields.</returns>
        public static object DataParse(string strHtml)
        {
            string tweet = String.Empty;
            string category = String.Empty;
            string profileid = String.Empty;
            string listed_count = String.Empty;
            string friends = String.Empty;
            string followers = String.Empty;
            string title = String.Empty;
            string createddate = String.Empty;
            string location = String.Empty;
            string name = String.Empty;
            string description = String.Empty;
            string profile_banner_url = String.Empty;
            string own_site_url = String.Empty;
            string language = String.Empty;
            string timeline_url = String.Empty;

            HtmlAgilityPack.HtmlDocument htmlDocument = new HtmlAgilityPack.HtmlDocument();
            htmlDocument.LoadHtml(strHtml);

            // The page data is stored as JSON in the value attribute of <input id="init-data">.
            HtmlNode initData = htmlDocument.DocumentNode.SelectSingleNode("//input[@id='init-data']");
            if (initData == null)
            {
                // SelectSingleNode returns null when the element is not in the page.
                throw new InvalidOperationException("No <input id=\"init-data\"> element was found in the page.");
            }

            string strJson = initData.Attributes["value"].Value.Replace("&quot;", "\"");
            JObject jObject = JObject.Parse(strJson);

            title = jObject["initialState"]["title"].Value<string>();
            name = jObject["profile_user"]["name"].Value<string>();
            category = jObject["pageName"].Value<string>();
            description = jObject["profile_user"]["description"].Value<string>();
            createddate = jObject["profile_user"]["created_at"].Value<string>();
            location = jObject["profile_user"]["location"].Value<string>();
            profileid = jObject["profile_user"]["id"].Value<string>();
            tweet = jObject["profile_user"]["statuses_count"].Value<string>();
            friends = jObject["profile_user"]["friends_count"].Value<string>();
            followers = jObject["profile_user"]["followers_count"].Value<string>();
            listed_count = jObject["profile_user"]["listed_count"].Value<string>();
            language = jObject["profile_user"]["lang"].Value<string>();
            own_site_url = jObject["profile_user"]["url"].Value<string>();
            timeline_url = jObject["timeline_url"].Value<string>();
            profile_banner_url = jObject["profile_user"]["profile_banner_url"].Value<string>();

            return new
            {
                Title = title,
                Name = name,
                Category = category,
                Description = description,
                CreatedDate = createddate,
                Location = location,
                Profileid = profileid,
                Tweet = tweet,
                Friends = friends,
                Followers = followers,
                Listed_count = listed_count,
                Language = language,
                Own_Site_url = own_site_url,
                Timeline_url = timeline_url,
                Profile_Banner_url = profile_banner_url
            };
        }
    }
}
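The parsing step indexes into the JSON with chained lookups such as `jObject["profile_user"]["name"]`, which throws if any key along the way is missing. A more forgiving variant can be sketched with Json.NET's `SelectToken`, which returns null for a path that does not exist. The helper `ReadString`, the class `SafeJsonSketch`, and the sample JSON below are hypothetical and only illustrate the idea; the key names are the ones used in this tutorial.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class SafeJsonSketch
{
    // Reads the value at the given JSON path as a string. Returns the
    // fallback instead of throwing when the path is absent or holds null.
    public static string ReadString(JObject root, string path, string fallback = "")
    {
        JToken token = root.SelectToken(path);
        if (token == null || token.Type == JTokenType.Null)
        {
            return fallback;
        }
        return token.ToString();
    }

    public static void Main()
    {
        // Hypothetical JSON shaped like the fields this tutorial reads.
        JObject data = JObject.Parse(
            "{\"pageName\":\"ProfileTweets\"," +
            "\"profile_user\":{\"name\":\"Example\",\"followers_count\":42,\"url\":null}}");

        Console.WriteLine(ReadString(data, "profile_user.name"));
        Console.WriteLine(ReadString(data, "profile_user.followers_count"));
        Console.WriteLine(ReadString(data, "profile_user.url", "(none)"));
        Console.WriteLine(ReadString(data, "initialState.title", "(missing)"));
    }
}
```

With a helper like this, one missing field yields an empty or placeholder value rather than stopping the whole extraction.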

The code above is written in C#, so you can run it in Visual Studio. It depends on the HtmlAgilityPack and Newtonsoft.Json NuGet packages.

Clarification: The code in this tutorial is for learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. It is intended only to expand programming knowledge. With this tutorial we do not encourage Twitter scraping or web scraping; it is meant to help you understand how scraping works. We are also not responsible for providing any support for this code. Users may modify it for learning purposes.
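On recent .NET versions, `WebRequest.Create` is marked obsolete and `HttpClient` is the recommended replacement. The following is a rough sketch of the same GET request using `HttpClient`, not a drop-in replacement for the tutorial's method. The names `HttpClientSketch`, `CreateClient`, and `GetRequestAsync` are made up for this example, and an `async Main` assumes C# 7.1 or later.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class HttpClientSketch
{
    // Builds a client that decompresses responses and sends the same
    // browser-like User-Agent string used in this tutorial.
    public static HttpClient CreateClient()
    {
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
        var client = new HttpClient(handler);
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36");
        return client;
    }

    // Downloads the page and returns its content as a string.
    // Throws HttpRequestException on a non-success status code.
    public static async Task<string> GetRequestAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    public static async Task Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: pass the URL to fetch as the first argument.");
            return;
        }

        using (HttpClient client = CreateClient())
        {
            string html = await GetRequestAsync(client, args[0]);
            Console.WriteLine("Downloaded {0} characters.", html.Length);
        }
    }
}
```

In a longer-running program, a single `HttpClient` instance is usually created once and reused for all requests rather than created per call.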
