Instagram API: data collection using location and time as parameters
UPDATE 2019: The Instagram API is much more restricted now, so the API endpoints described in this blog post are no longer available by default. The Hashtag Search API is still available, though, so the code below can be used as a skeleton for new queries.
Social network research is a popular topic nowadays. I found many interesting surveys describing the nature and patterns of the most popular networks, such as Twitter, Facebook and YouTube, but I did not see much work done on Instagram data. Although Instagram is a relatively young social network, it has a huge, dynamic and fast-growing community of users. It would be interesting to look at the social graph of this network and try to find hidden patterns. I am going to collect the data using the Instagram API and analyze it from this perspective.
This is the first post in my Instagram analysis series, and it shows how to collect the data we need to start our research.
The Instagram search API lets you get posts by the following parameters: time frame, location and number of posts from that location. However, it is rate-limited per access token to 5000 requests/hour (at the time of writing). Also, if you plan to send more than 500 requests and want your data collection script to remain stable, be prepared to handle connection errors.
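A minimal sketch in Python of how a collection script might stay within the 5000 requests/hour limit and survive transient connection errors. RateLimiter and call_with_retries are hypothetical helpers, not part of any Instagram client; the rolling one-hour window and the exponential backoff are my assumptions about a reasonable strategy, not the original code from this post.

```python
import time


class RateLimiter:
    """Hypothetical helper: keeps requests within a rolling time window (default 5000/hour)."""

    def __init__(self, max_requests=5000, window_seconds=3600,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.timestamps = []  # times of requests still inside the window

    def _prune(self, now):
        # Forget requests that are older than the window.
        self.timestamps = [t for t in self.timestamps if now - t < self.window_seconds]

    def acquire(self):
        """Block until one more request is allowed, then record it."""
        now = self.clock()
        self._prune(now)
        if len(self.timestamps) >= self.max_requests:
            # Wait until the oldest recorded request leaves the window.
            self.sleep(self.window_seconds - (now - self.timestamps[0]))
            now = self.clock()
            self._prune(now)
        self.timestamps.append(now)


def call_with_retries(func, retries=5, base_delay=1.0, sleep=time.sleep,
                      retry_on=(OSError,)):
    """Call func(); on an error in retry_on, wait base_delay * 2**attempt and retry.

    urllib's URLError and socket/connection errors are all OSError subclasses.
    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(retries):
        try:
            return func()
        except retry_on:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

In a collection loop, each HTTP call would be preceded by limiter.acquire() and wrapped as call_with_retries(lambda: fetch(url)). The injectable clock and sleep arguments only exist to make the helpers easy to test.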
I used the search API described above to get the data from the Instagram server.
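The following Python sketch shows what a location-and-time query could look like. It assumes the legacy Instagram API v1 media search endpoint (https://api.instagram.com/v1/media/search), which is no longer available by default; the parameter names lat, lng, distance, min_timestamp, max_timestamp and count are my assumptions about that endpoint, and build_search_url, search_media and time_windows are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: legacy (now retired) Instagram API v1 media search endpoint.
SEARCH_URL = "https://api.instagram.com/v1/media/search"


def build_search_url(access_token, lat, lng, min_timestamp, max_timestamp,
                     distance=1000, count=100):
    """Build a search URL for posts near (lat, lng) within a Unix-time window."""
    params = {
        "lat": lat,
        "lng": lng,
        "distance": distance,            # assumed: search radius in meters
        "min_timestamp": min_timestamp,  # assumed: window start (Unix seconds)
        "max_timestamp": max_timestamp,  # assumed: window end (Unix seconds)
        "count": count,                  # assumed: max number of posts returned
        "access_token": access_token,
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def search_media(access_token, lat, lng, min_timestamp, max_timestamp, **kwargs):
    """Fetch one response and return the list of posts under its 'data' key."""
    url = build_search_url(access_token, lat, lng, min_timestamp, max_timestamp, **kwargs)
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return payload.get("data", [])


def time_windows(start, end, step):
    """Split [start, end) into consecutive (min_timestamp, max_timestamp) windows."""
    t = start
    while t < end:
        yield t, min(t + step, end)
        t += step
```

Because each request returns a limited number of posts, a long period could be covered by looping over time_windows(start, end, 3600) and calling search_media once per window.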
Next, we can get information about the connections between users (follows and followers). Before doing that, I extract the user ids from the collected posts. I store the data in a local MongoDB instance, so I use the MongoDB aggregation framework to get the ids.
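A minimal Python sketch of the id extraction, assuming the posts live in a hypothetical instagram.posts collection and that each post stores its author's id under user.id. The pipeline groups posts by author so each id appears once. user_ids_from_mongo needs the pymongo package; user_ids_in_memory is a pure-Python equivalent, handy for checking small samples.

```python
# Assumptions: database "instagram", collection "posts", author id stored at user.id.
USER_IDS_PIPELINE = [
    {"$match": {"user.id": {"$ne": None}}},  # skip posts with a missing or null author id
    {"$group": {"_id": "$user.id"}},         # one output document per distinct author id
]


def user_ids_from_mongo(uri="mongodb://localhost:27017", db="instagram", collection="posts"):
    """Run the aggregation against a local MongoDB and return the distinct user ids."""
    from pymongo import MongoClient  # imported here so the rest works without pymongo

    client = MongoClient(uri)
    try:
        cursor = client[db][collection].aggregate(USER_IDS_PIPELINE)
        return [doc["_id"] for doc in cursor]
    finally:
        client.close()


def user_ids_in_memory(posts):
    """Same set of ids as the pipeline, in first-seen order, for plain Python dicts."""
    seen, ids = set(), []
    for post in posts:
        user_id = (post.get("user") or {}).get("id")
        if user_id is not None and user_id not in seen:
            seen.add(user_id)
            ids.append(user_id)
    return ids
```

Note that $group does not guarantee any output order, so only the set of ids should be compared between the two functions.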
Now we can collect information about the connections. Here I get the followers of each user. If you want to collect follows instead, change "followed-by" in the request to "follows".
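A Python sketch of building the request URL for either relationship type, assuming the legacy v1 endpoint layout https://api.instagram.com/v1/users/{user-id}/followed-by and .../follows. The function name relationship_url is hypothetical; switching between followers and follows is just a matter of the relation argument.

```python
import urllib.parse

API_ROOT = "https://api.instagram.com/v1"  # assumption: legacy v1 API, now retired


def relationship_url(user_id, access_token, relation="followed-by"):
    """URL listing a user's followers ("followed-by") or the accounts they follow ("follows")."""
    if relation not in ("followed-by", "follows"):
        raise ValueError("relation must be 'followed-by' or 'follows'")
    query = urllib.parse.urlencode({"access_token": access_token})
    user_part = urllib.parse.quote(str(user_id))  # escape the id for use in a path
    return f"{API_ROOT}/users/{user_part}/{relation}?{query}"
```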
The API returns at most 50 users per request, so we need to use nextCursor if we want to get all followers. The same is true for follows.
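One way to follow the cursor until every page has been read, sketched in Python. It assumes each response is JSON with the users under data and the cursor under pagination.next_cursor, sent back as a cursor query parameter; these field names are my reading of the legacy v1 API, and fetch_all_users and with_cursor are hypothetical helpers. The fetch function is injectable so the loop can be tested without network access.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def with_cursor(url, cursor):
    """Return url with its 'cursor' query parameter set to the given value."""
    parts = urllib.parse.urlsplit(url)
    query = dict(urllib.parse.parse_qsl(parts.query))
    query["cursor"] = cursor
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


def fetch_all_users(url, fetch=fetch_json):
    """Collect users from every page (up to 50 each) of a followers/follows listing."""
    users = []
    cursor = None
    while True:
        page = fetch(url if cursor is None else with_cursor(url, cursor))
        users.extend(page.get("data", []))
        cursor = page.get("pagination", {}).get("next_cursor")
        if not cursor:  # no cursor means this was the last page
            return users
```

For example, calling fetch_all_users with a followers URL of the form https://api.instagram.com/v1/users/{user-id}/followed-by?access_token=... would return the combined list of users from all pages.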
Now you know how to collect data about users and their connections on Instagram.
If you need to send many more than 5000 requests and do not want to wait an hour every time you exceed the limit, I know some tips and tricks for that. Contact me or leave a comment and I will share them with you.
In my next blog post I describe how to build a social graph based on this data and analyze it using Apache Spark and the GraphFrames library.
Read another post where I describe clustering Instagram users with the help of Apache Spark and MLlib.
As always, feel free to leave questions in the comments below.
Let me know what you think of this article on Twitter @mizvladimir or leave a comment below!