RSelenium and an automatic login procedure for Facebook

Trying to monitor the traffic that my professional Facebook Page sends to my website, I had to figure out how to perform regular updates of a dashboard. I will not talk about the dashboard itself; instead I will focus on how to scrape my posts' data on Facebook. You can find many posts on the web that explain how to do it, but in my opinion they are not sufficient for a real beginner, because steps are missing. So this post aims at giving you the tools to accomplish this task with all the steps described. However, I assume you know a little R (how to install packages), and for anything out of R's scope, you can find extra information here.

First we need an R script that deals with the Facebook pages, and, for my purpose here, a way to automate the launch of the Selenium server needed by the R script.

Accessing Facebook data

To obtain the Facebook data you need to use the Facebook Graph API, which requires a user token to grant you access. To automate access to the token, this is the procedure I use:

  • automate the login on Facebook
  • go to the Graph API Explorer page to get the token
  • scrape the data from my professional Facebook Page

This is not a difficult task: RSelenium works fine for logging in to a personal Facebook page.

The R script that does the scraping job

As this post is a tutorial on how to log in to a site, and specifically to Facebook, you will need one piece of information about the targeted page (here, as the example, my professional page): its numeric ID. You only have to look it up once, using this site for instance. For my page it is 1939114249654307.
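
If you prefer to stay in R, here is a small sketch (not part of the original procedure) that asks the Graph API itself for the numeric ID, assuming you already have a valid token stored in Token (copied by hand from the Graph API Explorer, for instance) and where PageName is a placeholder for your page's username:

library(RCurl)
library(rjson)

# sketch: look up the numeric ID of a page from its username
PageName <- "YourPageName"
Lookup <- paste0("https://graph.facebook.com/v2.10/", PageName,
                 "?fields=id&access_token=", Token)
fromJSON(getURL(Lookup))$id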

If you want to try this procedure yourself, on your own Facebook page for instance, this ID is used in the R script at two locations (so make sure to change both according to your needs):

paste0("https://graph.facebook.com/v2.10/1939114249654307/feed?fields=shares&access_token=", Token)
# AND
paste0("https://graph.facebook.com/v2.10/1939114249654307/feed?fields=shares&access_token=", Token)

So the full R script (named script_updater.R) is:

rm(list=ls())
# your Facebook credentials
YourLogIN <- "email_address"
YourPassword <- "password"


library(RSelenium)
library(stringr)
library(RCurl)
library(rjson)

remDr <- remoteDriver(browserName = "phantomjs")
remDr$open()

remDr$navigate("https://fr-fr.facebook.com/login/")

# send username and password, then click the login button
remDr$findElement("id", "email")$sendKeysToElement(list(YourLogIN))
remDr$findElement("id", "pass")$sendKeysToElement(list(YourPassword))
remDr$findElement("id", "loginbutton")$clickElement()

# then we go to the API page and grab its source
remDr$navigate("https://developers.facebook.com/tools/explorer")
df <- remDr$getPageSource()

TXT <- unlist(df)
Splitted <- unlist(str_split(TXT, "\\\\"))
Splitted2 <- Splitted[grep("value=", Splitted)]
Splitted3 <- unlist(str_split(Splitted2 , '\\"'))
To_test <- Splitted3[grep("^[a-zA-Z0-9]{30,}$", Splitted3)]
if(length(To_test)<1){stop("token not retrieved")}
Token <- Splitted3[grep("^[a-zA-Z0-9]{30,}$", Splitted3)]

# for posts data
Posts <- paste0("https://graph.facebook.com/v2.10/1939114249654307/feed?access_token=", Token)

Posts_data <- getURL(Posts)
Posts_data <- do.call(rbind, fromJSON(Posts_data)$data)
Posts_data <- data.frame(Posts_data, stringsAsFactors=F)

# for shares data
Shares <- paste0("https://graph.facebook.com/v2.10/1939114249654307/feed?fields=shares&access_token=", Token)

Shares_data <- getURL(Shares)
Shares_data <- do.call(rbind, fromJSON(Shares_data)$data)
# flatten each list cell of the first column into a plain value
invisible(sapply(1:length(Shares_data[, 1]), function(i){
	Shares_data[i, 1] <<- unlist(Shares_data[i, 1])
		}))
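
What exactly ends up in Shares_data depends on the JSON returned for your page. If you only want one numeric share count per post, a sketch that works directly on the parsed JSON (same Shares URL as above) could look like this:

# sketch: one share count per post, 0 when a post was never shared
parsed <- fromJSON(getURL(Shares))$data
Share_counts <- sapply(parsed, function(p){
	if(is.null(p$shares)) 0 else p$shares$count
})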

If you want to try it yourself, make sure you have installed all the necessary packages beforehand.
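
For a one-time setup, that means something like:

install.packages(c("RSelenium", "stringr", "RCurl", "rjson"))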

Automate the use of the RSelenium package

Now that we have an R script, we will see how to automate its use. As a Mac user, I use the following procedure to make my R script run on its own:

  • I need to write an AppleScript that launches first the Selenium server, and then the R script itself
  • I need to define a cron job (to schedule the launch of the updater)

I am sure Windows or Linux users will find their own way!

The AppleScript to automate the whole procedure

An AppleScript like the following automatically launches the Selenium server (if it is your first time with Selenium, you may find this useful). It contains two instructions: first change the directory to where the selenium-server-standalone-2.53.1.jar file is located, then execute this file.

tell application "Terminal"
	activate 
	do script "cd /Users/admin/Desktop 
	java -jar selenium-server-standalone-2.53.1.jar" 
end tell

Once the server is running, the AppleScript calls the R script described above.

tell application "R"
	activate 
	cmd "source('/Users/admin/Desktop/script_updater.R', chdir = TRUE)" 
end tell

So the complete AppleScript (named MyScript.scpt) is:

tell application "Terminal"
	activate 
	do script "cd /Users/admin/Desktop 
	java -jar selenium-server-standalone-2.53.1.jar" 
end tell
delay 30
tell application "R"
	activate 
	cmd "source('/Users/admin/Desktop/script_updater.R', chdir = TRUE)" 
end tell

NB: the delay gives the Selenium server time to start up before the R script is called.
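
If a fixed 30 seconds turns out to be too short (or too long) on your machine, an alternative sketch, not part of the original script, is to let the R script itself retry the connection at startup:

# sketch: retry connecting to the Selenium server for up to one minute
library(RSelenium)
remDr <- remoteDriver(browserName = "phantomjs")
connected <- FALSE
for(i in 1:12){
	connected <- tryCatch({remDr$open(silent = TRUE); TRUE},
		error = function(e) FALSE)
	if(connected) break
	Sys.sleep(5)
}
if(!connected) stop("Selenium server not reachable")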

The cron job to automatically call the AppleScript

The cron job I defined to run the AppleScript every Monday at 12:20 is:

20 12 * * 1 osascript /Users/admin/Desktop/MyScript.scpt

If you are new to crontab, please look at this link. Basically, I use the “crontab -e” command in the Terminal to create/edit a job; in the default vi editor, press “i” (or “a”) to insert text, then “esc” followed by “:wq” to save your creation/change. You can check the result with “crontab -l”.

Et voilà!

Now I let you do what you want with your Facebook data, which are stored in Posts_data and Shares_data.

G.