Saturday, 4 June 2016

Interesting undocumented Facebook API to identify friends in any personal photos

So last week I was casually browsing through my Facebook feed and looking at a random photo posted by a friend. When I hovered over the picture, Facebook drew a rectangle over my friend's face and showed "Do you want to tag X". This caught my attention. OK, so Facebook does image recognition not only to detect faces but also to identify whose face it is. I wondered how FB does it, so I opened Chrome Dev Tools to dig into how this whole thing works.

It turns out FB has a bunch of div blocks that define the face boxes, with the identity suggestions for each face box stored in data-recognizeduids. There was also another block of HTML with the names of the identified users.

OK, so now the question is where this data comes from in the first place. It seems to be loaded when one clicks on a photo, so it must come from some ajax call made to the server at that point. To find out which call, I switched to the Network tab, which shows all the traffic between the browser and the FB servers. Now it was all about finding the needle in the haystack, as there were hundreds of requests. I started looking for the ones with relevant names and struck gold when I found a request to "/ajax/pagelet/generic.php/PhotoViewerInitPagelet".

Then I searched the response for relevant content and found the above block of HTML in it. The params to the route looked encoded, and after decoding them it turned out that the request sends a bunch of params along with the id of the photo. The request looks something like this:

<changed>&no_script_path=1&data={"fbid":"<changed>","set":"<changed>","type":"3","theater":null,"firstLoad":true,"ssid":<changed>,"av":"<changed>"}&__user=1&__a=1&__dyn=<changed>&__req=jsonp_2&__be=0&__pc=PHASED:DEFAULT&__rev=11111&__adt=2
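To make the decoding step concrete, here is a minimal Ruby sketch using only the standard library. The query string is a shortened, made-up stand-in in the same shape as the request above (the values are placeholders, not real Facebook identifiers), so treat it as an illustration rather than the exact request.

```ruby
require "uri"
require "json"

# Hypothetical, shortened query string shaped like the request above.
raw_query = "no_script_path=1&data=%7B%22fbid%22%3A%22123%22%2C%22type%22%3A%223%22%7D&__a=1"

# URI.decode_www_form splits the string on & and = and percent-decodes each piece.
params = URI.decode_www_form(raw_query).to_h

# Once decoded, the "data" parameter is itself a JSON document.
data = JSON.parse(params["data"])

puts params.keys.inspect
puts "photo id: #{data['fbid']}"
```

Decoding like this is what makes it obvious that fbid is the interesting parameter and the rest is mostly bookkeeping.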

Most of the params don't make much sense except for fbid, which looks like the id of the photo. So I played around with removing params to find the ones that are actually needed for the API to work, and found that it needs only the fbid and 2-3 other constant params. Chrome has a pretty neat feature that lets you copy a request as a cURL command, with all headers and params, so you can run it from the command line as is. Using that I replicated the request and got the same response, with the HTML containing the information about the users. So I wondered whether this API could be used to find friends in any photo, including ones that were never uploaded to FB.

But for the API to work, the photo needs to be on FB. So I wondered whether I could temporarily upload a photo to FB without publishing it and then use the API. To upload photos I used the FB Graph API. The Graph photo upload API accepts the options "temporary"=> true and "published"=> false, so that the images are uploaded temporarily and do not show up in the feed. I created an FB app, Photo Friend Finder, which uses the publish_actions and user_photos permissions to get access to the photo API. Once the photo is uploaded, the API returns the id of the photo, which is then passed to the above PhotoViewerInitPagelet API, which returns HTML containing the details of the face suggestions. Now all I needed to do was parse the response to extract the data, which is easily done with an HTML parser. I used Nokogiri in Ruby for that.
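The extraction step can be sketched roughly as follows. This is not the original script: it uses a plain regular expression from the standard library instead of Nokogiri so that it runs without extra gems, the HTML fragment is made up, and the assumption that data-recognizeduids holds a comma-separated list of ids is mine.

```ruby
# Hypothetical fragment shaped like the face-box markup described above.
SAMPLE_HTML = <<~HTML
  <div class="faceBox" data-recognizeduids="1001,1002"></div>
  <div class="faceBox" data-recognizeduids="1003"></div>
  <div class="faceBox" data-recognizeduids=""></div>
HTML

# Returns one array of suggested user ids per face box found in the HTML.
# An empty array means a face was detected but nobody was suggested.
def recognized_uids(html)
  html.scan(/data-recognizeduids="([^"]*)"/).map do |(value)|
    value.split(",").map(&:strip).reject(&:empty?)
  end
end

faces = recognized_uids(SAMPLE_HTML)
puts "faces found: #{faces.length}"
faces.each_with_index { |uids, i| puts "face #{i + 1}: #{uids.join(', ')}" }
```

With a real HTML parser such as Nokogiri the same idea becomes a CSS selector on the attribute, which is more robust than a regex against markup that changes.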

I put together a small Ruby script to do all of this. You just run the script with the path to an image as the input param, and it returns the number of people in the photo and identifies the friends. I found that this API has one limitation: it only suggests people in the photo who are in your FB friend list. I used the Koala gem to access the FB Graph API, and the JSON and Nokogiri libraries to parse the API response.
Here is the link to the script.
Script in action.

It was a lot of fun exploring and putting together the script.

Sunday, 29 March 2015

Series Torrent Downloader to Dropbox

Blog post after a lonnng time. So I like to watch movies and series a lot. I started following so many series that it became difficult to keep track of which series airs when and which episode I was on. It was almost like Monday one, Tuesday another, Wednesday a third one. And after that I had to search for it and download it from somewhere ( ;) torrents). So for a long time I had this idea that there should be some application that automatically takes care of everything: keeping track of all my series (which episode airs when), downloading the next aired episode from torrents to my Dropbox, and notifying me after that. But I didn't find time to build one. Finally I was able to put something together.

So it all started when I came across the Popcorn app, which is used for streaming movies and series from torrents. I got curious about how it works, and I came across the torrent stream library that the Popcorn app uses. It basically takes a torrent and generates a stream out of it, which you can write to a file or play in a video player. So I wondered whether I could upload the stream to some cloud storage. Dropbox provides a chunked upload feature that accepts a file in chunks, so you can break a very large file into chunks and upload it, or resume a partial upload. So I took the chunks of the stream provided by the library and uploaded them to Dropbox. It seems even Google Drive provides an API for this, but for now I have it only for Dropbox. As I am open sourcing it, someone might be interested in extending it to other cloud storage services.

So now I had the part that uploads a torrent to Dropbox. That library is in Node.js, and I am not so proficient in JS, so for the other part I used Ruby. I wrote a service that monitors all the series you specify in the configs and downloads them to Dropbox using the Node.js script. It uses the TVmaze API to get the next episode of each series that it needs to download. It essentially runs like a cron: every 2 hours it gets the next episode of a series from the API and checks whether that episode is out for download. If not, it records the episode in a DB store (I used Redis as my DB store) and sleeps. The next time it runs, it checks whether the episode recorded in the DB store is available. If it is, it downloads the file to Dropbox and then updates the next episode to download in the DB store.
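The check-and-download cycle described above can be sketched like this. It is a simplified stand-in, not the actual service: a plain Hash plays the role of Redis, and the three lambdas (next_episode, available, download) are hypothetical placeholders for the TVmaze lookup, the torrent search and the Node.js upload script.

```ruby
# Keeps track of the next episode to fetch for each series.
class EpisodeTracker
  # store:        Hash-like object (stand-in for Redis)
  # next_episode: ->(series, last_downloaded) { episode id or nil }
  # available:    ->(series, episode) { true when the episode can be downloaded }
  # download:     ->(series, episode) { performs the upload to Dropbox }
  def initialize(store:, next_episode:, available:, download:)
    @store = store
    @next_episode = next_episode
    @available = available
    @download = download
  end

  # One scheduled pass for one series.
  def run_once(series)
    pending_key = "#{series}:pending"
    last_key = "#{series}:last"

    episode = @store[pending_key] || @next_episode.call(series, @store[last_key])
    return :nothing_to_do if episode.nil?

    unless @available.call(series, episode)
      @store[pending_key] = episode # remember it and check again on the next run
      return :waiting
    end

    @download.call(series, episode)
    @store[last_key] = episode
    @store[pending_key] = @next_episode.call(series, episode)
    :downloaded
  end
end

# Demo with made-up data in place of the real API, torrent search and Dropbox.
episodes = %w[S01E01 S01E02]
released = ["S01E01"]
downloads = []

tracker = EpisodeTracker.new(
  store: {},
  next_episode: ->(_series, last) { last.nil? ? episodes.first : episodes[episodes.index(last) + 1] },
  available: ->(_series, episode) { released.include?(episode) },
  download: ->(series, episode) { downloads << "#{series} #{episode}" }
)

puts tracker.run_once("Some Show") # first episode is out, so it is downloaded
puts tracker.run_once("Some Show") # second episode is not out yet
released << "S01E02"
puts tracker.run_once("Some Show") # now it is available
puts tracker.run_once("Some Show") # no further episodes known
```

Keeping the pending episode in the store is what lets the job be stateless between runs: each pass only needs the series name to pick up where the last one stopped.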

Once both scripts were done, I thought of deploying them to my (free) Heroku instance. The challenge there is how to trigger the task at specific times in a day. Heroku does provide something called a worker dyno, but I wasn't sure whether it provides a free add-on for scheduling scripts. So I came up with this: I run a web server (I used Sinatra) with a continuous background thread that runs the above task, sleeps for x hours and then runs it again, essentially behaving like a scheduled task. With this technique I could also add a route that simply dumps the logs of the task, to monitor whether everything went well. There was one more small issue with Heroku: it shuts down your app if there is no activity for some time (I think 15 minutes). I cannot afford my task shutting down, as I have no way to track whether it shut down in the middle of a download. So to keep it up all the time, I used a service (Pingdom) that sends a ping to the app every 5 minutes. It is meant for tracking the uptime of a service. Ironically, I used it to keep my app up all the time.
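Here is a rough sketch of the background-thread trick, using only Ruby's standard library. The web server part is left out; the method name and its parameters (interval_seconds, log, max_runs) are made up for this example, and max_runs exists only so the demo terminates instead of looping forever.

```ruby
# Runs the given block repeatedly in a background thread, sleeping between runs.
# Each result (or error) is appended to the log so it can be inspected later.
def start_background_task(interval_seconds:, log:, max_runs: nil, &task)
  Thread.new do
    runs = 0
    until max_runs && runs >= max_runs
      begin
        log << "run #{runs + 1}: #{task.call}"
      rescue StandardError => e
        # A failing run should not kill the scheduler thread.
        log << "run #{runs + 1} failed: #{e.message}"
      end
      runs += 1
      sleep interval_seconds
    end
  end
end

log = []
worker = start_background_task(interval_seconds: 0.01, log: log, max_runs: 3) do
  "checked series at #{Time.now.strftime('%H:%M:%S')}"
end

# In the web app the main thread would serve requests; here we just wait.
worker.join
puts log
```

In the deployed setup the log array would be what the monitoring route returns, and the interval would be hours rather than a fraction of a second.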

I have open sourced it on GitHub. You can check it out and help me improve it. Use it at your own risk. It can be deployed in both modes: as a web server on Heroku or as a cron on some other cloud instance. To use it you need to create an app on your Dropbox, authorize it and get an access token, and then add the access token to dnode.js, the Node.js script that actually downloads the torrent to Dropbox. If you want to be notified, you can also create a Mailgun account, create a key and add it to constants.yml, which is used to send emails to your specified address. Otherwise you can comment out the line that sends the mail. You also need a Redis node in the cloud, which is used as the DB store. I used the Redis Labs service, which provides a 25 MB node for free. To add the series you want to monitor, add them to the list in constants.yml.

Sunday, 5 January 2014

Gmail Notifier for Mac OSX

Recently, I came across a gem called terminal-notifier, which allows you to send Mac OS X User Notifications. It also comes with a command-line tool. I was thinking of some interesting use case for it when an idea struck: how about an application that notifies me when I receive a mail that contains specific text or comes from a specific user?
Now a little background on how and why this app came about. In our office, on any special occasion, people bring chocolates or sweets, put them in the cafeteria, and then mail a common group to let people know. As it happens, I don't have the habit of checking my mail very frequently, and I often miss important mails at the right time. So how about a script that checks for mails containing certain words or from a specific user?
So the task was simple: check for new emails, and if they match the given criteria (in my case, the subject or the body containing the words sweets or chocolates), send a User Notification. To get the emails I used the gmail gem. This gem needs your username and password, but storing them as plaintext in the script is not smart from a security point of view. So I was looking for a simple and reliable option to store and retrieve my username and password, and I used the OS X Keychain to store the sensitive information. To retrieve the email and password I used the command-line tool "security". Initially, to store the username and password in the OS X Keychain, use,

security add-internet-password -a username -w password -s

And then to retrieve it use,

security 2>&1 find-internet-password -ga username | grep password | cut -d '"' -f2

Moving forward, once I had the username and password, I logged into the Gmail account by,

gmail = Gmail.new(username, password)

After logging in I searched through the unread mails for the search terms in both the subject and the body of each mail. Once I found a mail containing a search term, I fired a notification. As this gem didn't have a sound notification, I used one of the system sound files and played it using the afplay command as,

afplay -v 4 -q 1 /System/Library/Sounds/Glass.aiff

and then fire the user notification by calling,

TerminalNotifier.notify("Subject : #{content_to_display}", :title => 'Title')

and then marked the mail as read so that it doesn't come up the next time the cron runs. Finally, to continuously scan through my inbox, I set up this script as a cron that runs every minute,
*/1 * * * * ruby gmail_notifier.rb
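
The matching step itself (deciding whether a mail deserves a notification) can be sketched as below. This is an illustration and not the original script: the Message struct and the matches? helper are made-up names, and the sample mails stand in for what the gmail gem would return.

```ruby
# Minimal stand-in for a fetched mail; the real script gets these from the gmail gem.
Message = Struct.new(:from, :subject, :body)

# True when the subject or body contains any keyword, or the sender is on the list.
# Comparison is case-insensitive.
def matches?(message, keywords: [], senders: [])
  text = "#{message.subject} #{message.body}".downcase
  keywords.any? { |word| text.include?(word.downcase) } ||
    senders.any? { |sender| message.from.to_s.casecmp?(sender) }
end

inbox = [
  Message.new("hr@example.com", "Sweets in the cafeteria", "Help yourselves!"),
  Message.new("boss@example.com", "Status report", "Please send it by Friday."),
  Message.new("friend@example.com", "Hi", "I left some CHOCOLATES on your desk.")
]

hits = inbox.select { |mail| matches?(mail, keywords: %w[sweets chocolates]) }
hits.each { |mail| puts "Notify: #{mail.subject}" }
```

Each hit is where the notification and the sound would be triggered before the mail is marked as read.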

You can find the complete script on GitHub here.

Wednesday, 26 December 2012

A simple No-SQL key-value db using self-modifying ruby script: An interesting application of Ruby's Reflection API

I am basically a Java developer, and I recently started learning Ruby and was amazed at the ease with which you can develop in it and at the elegance of the language. The easiest way to learn any programming language is to develop simple applications in it that use various features of the language. While going through the features I came across Reflection and Metaprogramming in Ruby, a very powerful feature. It is amazing how the eval function allows one to write and execute Ruby code dynamically. So while thinking of an application for this feature I came up with the following.
It is a simple No-SQL key-value db that stores data in a Ruby hash. The first line of the script declares the hash. The following part of the script reads its own source code and uses the eval function to execute that first line and so rebuild the hash. Then, depending on the operation requested (GET/SET), it either retrieves the value associated with the key or sets a new key/value in the hash. Finally it stores the new hash back in the source code.
Here is the source code for the ruby script:
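hash={"1"=>"Narendra", "2"=>"Mangesh", "3"=>"Viru", "4"=>"Virendra", "5"=>"Genh"}
# The first line above is the data store; the script rewrites it on every SET.
err_msg="ruby #{__FILE__} <GET/SET> <key_to_search/key_to_set> <not_required/value_to_set>"
# Validate the command and the number of arguments.
if ARGV.length<2 || !["GET","SET"].include?(ARGV[0]) || (ARGV[0]=="SET" && ARGV.length!=3)
  puts err_msg
  exit 1
end
# Read this script's own source, line by line.
z=[]
File.open(__FILE__,'r'){|f| f.each_line {|l| z<<l}}
# Evaluate the first line to rebuild the stored hash.
c=eval(z[0])
if ARGV[0]=="GET"
  puts c[ARGV[1]] if c.include?(ARGV[1])
elsif ARGV[0]=="SET"
  c[ARGV[1]]=ARGV[2]
  # Replace the first line with the updated hash and write the source back.
  z[0]="hash="+c.inspect+"\n"
  File.open(__FILE__,'w'){|f| z.each{|x| f<<x}}
end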

Thursday, 13 December 2012

Hacking the Little Alchemy game with only Chrome in less than an hour

Today my friend introduced me to a cute little but addictive game named Little Alchemy.
After playing around with it for a few minutes and seeing how long it took to find new elements, I wondered how much time it would take to find them all. Not having the patience to play the whole game to discover all the elements, I thought, why not do it hacker style. So I popped open the Chrome Web Inspector and looked around in the Network tab to see what data is sent by the Little Alchemy web page. I noticed that it stores all its data in the application cache and retrieves it from there. Looking around the JS files for something interesting, I struck gold when I found the logic in the alchemy.js file. It seemed to make ajax calls for 2 files, /base/names.json and /base/base.json. I looked into these files and found that the mapping of all the elements is stored in base.json in array form, and all the names of the elements are stored in names.json. After finding this it was a piece of cake to hack together some JavaScript to display the combinations for all elements. So I opened the JavaScript console, put together this piece of code to print the combinations for all elements, and dumped the output to an HTML file. You can check the output here.

And here is the piece of javascript code

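var base,names,i,j;
// Load the element names first, then the combinations (paths as seen in alchemy.js).
$.ajax({ type: "GET", url: "/base/names.json", dataType: "json" }).done(function( data ) {
  names = data;
  $.ajax({ type: "GET", url: "/base/base.json", dataType: "json" }).done(function( data ) {
    base = data;
    for (i in base.base) {
      // Elements with an empty list cannot be made from other elements.
      if (!base.base[i] || base.base[i].length == 0)
        console.log(names.lang[i]+ " doesn't have any combination");
      for (j in base.base[i])
        console.log(names.lang[Number(base.base[i][j][0])]+" + "+names.lang[Number(base.base[i][j][1])]+" => "+names.lang[i]);
    }
  });
});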

Thursday, 16 August 2012

So damn true: "Necessity the mother of building cool stuff"

A few days back, a hardware issue left my hard drive irreparable, and all my data was lost, including software, movies and music. (Thank god I use GitHub to host all my projects.)

Now, the loss of movies and software is no big deal. But rebuilding one's song collection is difficult, as everyone has a particular taste in music. Luckily for me, all that music was also stored on my iPod. But as we all know, Apple doesn't allow one to copy music files from an iPod to a computer. So I looked around for some way to get my music from the iPod to the PC. I opened the iPod in USB storage mode and looked at where the files are stored. I found all the files, but they were stored under random (unrecognizable) names. For starters I copied all those files to my PC. Then I looked for a way to give the files some understandable names. But naming around 1000 songs by listening to each one is not the way a geek would do it. So I wrote a Java app that scanned through all the mp3 files, read the ID3 tags to get the song title, and renamed all the songs. Cool, it scanned all the files and did the task in a few minutes. Then I thought, why not modify it a little and let it scan through the entire computer, look for mp3 files, rename them using the song title, and store them in a single directory where the songs are categorized into subdirectories by album name or artist name. This way all the duplicates of a single mp3 would also be removed.
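To illustrate the tag-reading and renaming idea, here is a small Ruby sketch (the original utility is in Java, and I am not claiming it works this way internally). It only handles the old ID3v1 format, which is a fixed 128-byte block at the end of the file starting with "TAG"; ID3v2 tags at the start of a file are not covered. The helper names and the sample data are made up.

```ruby
# Parses an ID3v1 tag from raw MP3 bytes. Returns nil when no tag is present.
# Layout: "TAG", title (30 bytes), artist (30), album (30), year (4), comment (30), genre (1).
def parse_id3v1(bytes)
  return nil if bytes.nil? || bytes.bytesize < 128

  tag = bytes.b.byteslice(-128, 128)
  return nil unless tag.start_with?("TAG")

  _marker, title, artist, album = tag.unpack("a3Z30Z30Z30")
  [title, artist, album].map do |field|
    # ID3v1 text is nominally ISO-8859-1, padded with NULs or spaces.
    field.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8).strip
  end.then { |t, ar, al| { title: t, artist: ar, album: al } }
end

# Builds "<root>/<album or artist>/<title>.mp3", replacing unsafe characters.
def target_path(root, tags, categorize_by: :album)
  clean = ->(name) { name.gsub(%r{[/\\:*?"<>|]}, "_") }
  File.join(root, clean.call(tags[categorize_by]), "#{clean.call(tags[:title])}.mp3")
end

# Fake file contents: some audio bytes followed by a hand-built ID3v1 tag.
fake_mp3 = "\xFF\xFB".b * 10 +
           ["TAG", "Some Song", "Some Artist", "Some Album", "2012", "", 0].pack("a3a30a30a30a4a30C")

tags = parse_id3v1(fake_mp3)
puts tags.inspect
puts target_path("Music", tags, categorize_by: :artist)
```

Because two copies of the same song map to the same target path, moving files this way also collapses duplicates, as described above.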

So this Mp3Manager is a console utility in Java that takes as input the name of the root directory in which to store all your music files and the type of categorization to use for the subdirectories (album name or artist). It is available on GitHub.

Read the README to know how to use it. Let me know if you like it or if you would like any modifications to it.

Sunday, 22 July 2012

Fuzzy matching autocomplete library with inbuilt standalone http server in java

For the past couple of days I had this idea in mind of implementing an autocomplete that uses fuzzy matching. To fuzzily match a partial string against all the strings in a dictionary, it uses the Levenshtein distance. Now, computing the Levenshtein distance between the given string and each string in the dictionary is very inefficient. So I used the idea described here, converted it to Java and modified it to fit my needs.
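For reference, this is what the naive approach looks like, sketched in Ruby rather than Java: a standard dynamic-programming Levenshtein distance, applied to every word in the dictionary. The function names are made up for this example, and it matches whole words only, which is a simplification of real autocomplete.

```ruby
# Classic Levenshtein distance, keeping only two rows of the table in memory.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1,        # deletion
                  current[j - 1] + 1,     # insertion
                  previous[j - 1] + cost  # substitution
                 ].min
    end
    previous = current
  end
  previous.last
end

# Naive suggestion: measure the distance to every word, keep the close ones.
def naive_suggest(word, dictionary, max_typos)
  dictionary.map { |candidate| [candidate, levenshtein(word, candidate)] }
            .select { |_candidate, distance| distance <= max_typos }
            .sort_by { |candidate, distance| [distance, candidate] }
            .map(&:first)
end

puts levenshtein("kitten", "sitting")
puts naive_suggest("helo", %w[hello help world], 1).inspect
```

Every lookup here costs one full distance computation per dictionary word, which is exactly the inefficiency that motivates a smarter data structure.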

It is available in 2 modes. The first is the HTTP server mode, in which you run the standalone server and call it directly from your ajax script. It returns data in JSON format, ordered by decreasing score. For this you need to configure it by providing the name of the file containing the words for suggestion. To use it, run the server and then use this URL to get the results: http://server-ip/autoc?word=word_for_autocomplete&tf=max_typos_allowed
Specify the word_for_autocomplete and max_typos_allowed parameters.

In the second mode you embed it directly in your application by first creating a trie and then calling the getResult() method of the CHandler class. See the source code for a better understanding. A sample main method is given which uses this second mode.
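One well-known way to combine a trie with the Levenshtein distance is to walk the trie and compute one row of the distance table per node, abandoning a branch as soon as every entry in its row exceeds the allowed number of typos. The Ruby sketch below shows that technique; whether it matches the project's Java code in detail is an assumption on my part, and the class names (TrieNode, FuzzyTrie) are made up and are not the project's CHandler API.

```ruby
class TrieNode
  attr_reader :children
  attr_accessor :word

  def initialize
    @children = {}
    @word = nil # set to the full word on the node where a word ends
  end
end

class FuzzyTrie
  def initialize(words)
    @root = TrieNode.new
    words.each { |word| insert(word) }
  end

  def insert(word)
    node = @root
    word.each_char { |char| node = (node.children[char] ||= TrieNode.new) }
    node.word = word
  end

  # Returns [word, distance] pairs with distance <= max_typos, closest first.
  def search(query, max_typos)
    first_row = (0..query.length).to_a
    results = []
    @root.children.each do |char, child|
      walk(child, char, query, first_row, max_typos, results)
    end
    results.sort_by { |word, distance| [distance, word] }
  end

  private

  # Computes the Levenshtein row for this node from its parent's row.
  def walk(node, char, query, previous_row, max_typos, results)
    current_row = [previous_row[0] + 1]
    query.each_char.with_index(1) do |query_char, column|
      insert_cost = current_row[column - 1] + 1
      delete_cost = previous_row[column] + 1
      replace_cost = previous_row[column - 1] + (query_char == char ? 0 : 1)
      current_row << [insert_cost, delete_cost, replace_cost].min
    end

    results << [node.word, current_row.last] if node.word && current_row.last <= max_typos

    # No word below this node can get back under the limit, so stop here.
    return if current_row.min > max_typos

    node.children.each do |next_char, child|
      walk(child, next_char, query, current_row, max_typos, results)
    end
  end
end

trie = FuzzyTrie.new(%w[hello help world])
puts trie.search("helo", 1).inspect
```

Words sharing a prefix share the rows computed for that prefix, and whole branches are skipped early, which is where the saving over the word-by-word comparison comes from.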

You can find the source code for the project here.

If you like the idea and want to improve it or port it to other languages, fork me on GitHub.