IMDb Data – Now available in Amazon S3

  • Announcement
  • Updated 1 day ago
This is an announcement for customers of the IMDb bulk data available via FTP.

We are pleased to announce that, starting today, IMDb datasets are available in Amazon S3. Using the new interface, customers can bulk-access IMDb title and name data. This more robust and reliable solution will replace the IMDb FTP sites, which will be retired on November 7, 2017 (previously September 10, 2017).


For details on the S3 solution, file format and access guidelines, see www.imdb.com/interfaces.

In our continued effort to best serve our Contributors, we are streamlining the datasets and making them available in a more useful and structured format in S3. Notably:

  • Data refresh frequency is now daily (previously weekly). 
  • IMDb title and name identifiers are included in all the files for ease of matching and linking back to IMDb.
  • The files are in tab-separated values (TSV) format.
  • The datasets we provide now include only the essential data that helps with matching and linking to an IMDb title or name.
  • To access the IMDb S3 buckets, users will need an Amazon AWS account.
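As a rough illustration of the TSV format and identifier-based matching described in the list above, here is a minimal Python sketch. The column names (title_id, title, year) and the sample rows are hypothetical and are not taken from the actual IMDb files; see www.imdb.com/interfaces for the real file layout.

```python
import csv
import io


def index_by_identifier(tsv_text, id_column):
    """Parse TSV text and map each row's identifier to the full row (a dict)."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",          # fields are separated by tabs
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    return {row[id_column]: row for row in reader}


# Hypothetical sample data; the real column names and values may differ.
SAMPLE = (
    "title_id\ttitle\tyear\n"
    "tt0000001\tExample Film\t1999\n"
    "tt0000002\tAnother Film\t2004\n"
)

titles = index_by_identifier(SAMPLE, "title_id")
print(titles["tt0000002"]["title"])
```

Keying each row on its identifier is one simple way to match records from another catalog against the data; for a real file, the same function could be given the text read from disk instead of SAMPLE.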
As part of housekeeping on the FTP site, we are deleting files we no longer update and changing the directory structure. This will take effect next week. The list data files will be available at two new locations (see below) until November 7, 2017. Please note that the directory structure is subject to change without notice. We strongly encourage FTP site users to switch to the S3 solution as soon as possible to ensure their applications continue to work without interruption.

 If you are not an IMDb Contributor and wish to obtain IMDb content for commercial use, we offer a content license.  The license grants you access to our content via an XML web service, plus the right to use the content in your product or service.  If that interests you, please email licensing@imdb.com.

 If you have any questions or concerns, please share your feedback in this thread.

 Thank you for your continued support.

sv, Official Rep

  • 25 Posts
  • 11 Reply Likes

Posted 3 months ago


Col Needham, Official Rep

  • 3824 Posts
  • 1330 Reply Likes
Official Response
Thanks for the feedback so far on this thread.  Please do continue to post and we will try to take as much as possible into account. This post answers some of the questions raised and there will be further updates based on the next round of feedback. 

On the S3 access issues, we now have a working prototype of a system which can make the same S3 data available to you via HTTP from IMDb directly without requiring any S3 registration and free from any possibility of AWS charges.  Please watch for an announcement as we convert this into production code. The only thing needed will be an ordinary IMDb user account attached to a valid email address.  We still intend to also make the data available via S3 for those people who find the AWS access tools more convenient and can stay within the free tier of AWS.    

On the general data availability, we are adding the AKA titles to the basic data set accessible to everyone.  Longer term, we are looking at the possibility of daily diff files for at least some of the data in the basic set. 

On the point about contributors, we are looking at extending the range of data available via the HTTP solution based on your contribution history and volume. For top contributors and those people using the data to help us clean it via bulk corrections, this is likely to extend far beyond the current set of data, even beyond what is on the FTP site. It is not our intention to deprive of access those people who have genuinely helped to build the data over the years and who want to continue to improve IMDb. We also aim to be able to grant specific permissions to specific customers for specific extra subsets of data as required, on a case-by-case basis. This latter part may take some time to become a fully formed solution, so please bear with us.

The background to all of this is that there is a huge multi-year technology migration project which is nearing completion at IMDb. We have too many complicated old systems around which have been slowing the overall pace of development (I add a bit more detail to this on https://getsatisfaction.com/imdb/topics/why-doesnt-imdb-staff-ever-consult-with-the-contributor-base...). The move to the new technology has been providing the opportunity to look at the way we operate different parts of the IMDb service. One of the oldest software systems is the one which publishes the FTP data, and we will soon no longer even be able to generate the .list files once the final pieces of the old IMDb system are decommissioned; at least not without rewriting all of the publication software to connect to the new system and produce an extremely difficult-to-manipulate text file format which was designed 27 years ago and has not changed in 21 years. Instead, we decided that it would be better to publish the data via a modern system (S3 and soon over https) in a modern format which can be more easily parsed. The other problem with FTP is that we have no idea how many people are using the data and for what purpose, nor do we know what additional things they may want from the data. From feedback over the years, we knew some of your requirements already, notably (a) access to the title and name constant data, (b) an easier-to-parse format, (c) information to help in matching other catalogs to IMDb, and (d) more frequent updates. We found ourselves having to guess the remaining requirements until we decided the best way forward was to move the data to a new location within the FTP sites, post an announcement on Get Satisfaction (this thread) and then wait to gather feedback before replying and figuring out what steps to take next (this reply).

We hope this helps. We have plenty to be working on in the meantime, and we will follow up as we deliver parts of the above.

Col
Founder & CEO, IMDb.com. 

sv, Official Rep

  • 25 Posts
  • 11 Reply Likes
Official Response
We are currently working on a solution for data access via an HTTP endpoint as an alternative to direct AWS S3 access. As part of this solution, we are also looking into a contributor-exclusive solution, providing extended datasets based on a person’s contribution history and volume. Given these developments, we are postponing the shutdown of the IMDb FTP sites to November 7, 2017.

Please stay tuned for more updates.  Thanks!