API/Bulk Data Access

Question · Answered · Updated 2 years ago
Hi!

We’re in the process of reviewing how we make our data available to the outside world, with the goal of making it easier for anyone to innovate and answer interesting questions with the data. If you use our current FTP solution to get data [http://www.imdb.com/interfaces], or are thinking about it, we’d love your feedback on the current process for accessing data and on what we could do to make it easier for you in the future. We have some specific questions below, but we’d be just as happy to hear about how you access and use IMDb data so we can build a better overall experience.

1. What works/doesn’t work for you with the current model?
2. Do you access entertainment data from other sources in addition to IMDb?
3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?
5. Does how you plan on using the data impact how you want to have it delivered?
6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?
7. Are our T&Cs easy for you to understand and follow?


Thanks for your time and feedback!

Regards,

Aaron
IMDb.com
Gideon, Employee
Posted 5 years ago

Alan Hall

I think it's great that you have a flat-file listing of all the data; however, I want to read it with a Perl program, and parsing your format is not a simple task. It would be much simpler if the fields were delimited by a '|' or something similar, with a code indicating movie, TV show, etc. I am looking at the actor/actress files specifically and haven't looked at the others yet. I don't suppose you already have Perl scripts available that would read the files and insert the data into MySQL?
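
Roughly what I have in mind, as a sketch (this assumes the stanza layout I see in the actor files: the name opens a record, tab-indented lines carry additional titles, and a blank line ends it; the banner and footer sections of the list would still need to be skipped):

#!/usr/bin/perl
# Sketch only; assumes the actors.list stanza layout described above.
# Emits "name|title" rows that MySQL's LOAD DATA INFILE can ingest.
use strict;
use warnings;

my $name = '';
while (my $line = <>) {
    chomp $line;
    next if $line =~ /^\s*$/;            # blank line ends a stanza
    if ($line =~ /^([^\t]+)\t+(.+)/) {   # "Name<TAB>First title"
        $name = $1;
        print "$name|$2\n";
    } elsif ($line =~ /^\t+(.+)/) {      # continuation: another title
        print "$name|$1\n";
    }
}

Something like `zcat actors.list.gz | perl flatten.pl > actors.psv` (calling the script flatten.pl here) would then feed straight into LOAD DATA INFILE.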

Simo Tuokko

I just wrote a simple Java program that does just that. It doesn't put the records into a DB, but it outputs a '|'-separated list that can then be processed with MapReduce.

I could clean it up a bit and put it up on GitHub if anyone else is interested?

Robert Ștefan Stănescu

Now this is something I would really like to see. If you would be so kind as to share it with us... much appreciated.

RyanG

1. Once a parser is built for the data files, parsing them basically just works, with one major exception: there are no primary keys, so there's a lot of babysitting the data over time (see the sketch after this list).
2. Sure. FreeDB, Wikipedia, Rotten Tomatoes, Amazon, among other sources.
3. Primary keys, did you say?
4. I like being able to pull all the data down at once and make minimal calls to the server, but an API in addition would not be unwelcome.
5. Sure.
6. JSON would be fine.
7. The T&Cs could use some updating, examples, and clarification for this mashup/integration/Web 2.0 era.
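
To illustrate the babysitting in 1: without stable IDs from IMDb, the best you can do is mint your own surrogate key from the name string, and it silently breaks whenever a name is corrected upstream. A rough sketch of what I mean, assuming the "name|title" rows from the converter discussed above:

#!/usr/bin/perl
# Illustration only: hash the name field of "name|title" rows into a
# surrogate key. It stays stable only while the name string itself
# never changes upstream, which is exactly the babysitting problem.
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

while (my $row = <>) {
    chomp $row;
    my ($name, $title) = split /\|/, $row, 2;
    next unless defined $title;                # skip malformed rows
    my $key = substr(md5_hex($name), 0, 12);   # short hex surrogate key
    print "$key|$name|$title\n";
}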

Tanner Netterville

I agree

Laurie Crist

Still agree with this comment.

Mansour Behabadi

OK. I finally decided to roll my sleeves up and do it myself. This script takes some or all of the .list.gz files and converts them to JSON:

https://github.com/oxplot/imdb2json