DBOutputFormat only inserts, does not update DB rows - alternatives?
Hi!
I am trying to export the results of Hadoop processing to a relational DB (MySQL in this case). I have tried using DBOutputFormat, but it seems a bit limited to me.
We are using Hadoop to gather statistics and post them to MySQL. The data volume is small enough that it wouldn't have much impact on our SQL server. However, the statistics are per-day, so rows would also have to be updated, not just inserted, as the day goes on.
For example: say our data shows 2000 visits to some page so far today, so we insert a row with this value into the DB. An hour later we need to update that row with the new value, not insert a new one.
Now, DBOutputFormat only allows inserts. The best solution I have found so far is to copy DBOutputFormat.java and DBConfiguration.java into my own project and change them (I can't just subclass them because of private fields and methods). But this just seems so... awkward. I am sure there must be a better way?
Can Sqoop help?
Thanks!
Andrew
-
Hi Andrew,
I'm working on building "export" ability for Sqoop. But this will be built on top of DBOutputFormat. So Sqoop will make it easier to dump HDFS files into tables, but it'll still be INSERT-based.
The short answer is that a new OutputFormat is going to be necessary to accomplish what you want to do, either subclassing DBOutputFormat or in some other way related to it. This seems like a reasonable feature to add to Hadoop and Sqoop. I'm happy to take it under advisement as a future direction for Sqoop features. If you want to help though, it could certainly come together faster :) Send me an email: aaron at cloudera -- if you're interested in hacking on the project. I can give you some advice as to how to most easily do this.
Cheers
- Aaron -
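For anyone following along, here is a minimal sketch of the kind of statement an update-aware output format could send to MySQL in place of a plain INSERT. It relies on MySQL's `INSERT ... ON DUPLICATE KEY UPDATE` syntax and assumes the target table has a primary or unique key covering the columns that identify a row. The class `UpsertQuery`, the method `build`, and the table and column names are all hypothetical; none of this is part of Hadoop or Sqoop.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Hadoop or Sqoop.
// Builds a MySQL upsert statement with one "?" placeholder per field,
// suitable for use with a JDBC PreparedStatement.
final class UpsertQuery {

    private UpsertQuery() {
    }

    /**
     * @param table        target table name
     * @param fieldNames   all columns written by the statement, in placeholder order
     * @param updateFields columns to overwrite when a row with the same key already exists
     */
    static String build(String table, String[] fieldNames, String[] updateFields) {
        StringJoiner columns = new StringJoiner(", ");
        StringJoiner placeholders = new StringJoiner(", ");
        for (String field : fieldNames) {
            columns.add(field);
            placeholders.add("?");
        }

        // VALUES(col) refers to the value the INSERT part would have written.
        StringJoiner updates = new StringJoiner(", ");
        for (String field : updateFields) {
            updates.add(field + " = VALUES(" + field + ")");
        }

        return "INSERT INTO " + table
                + " (" + columns + ") VALUES (" + placeholders + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }
}
```

With a hypothetical `page_stats` table keyed on `(page, day)`, calling `UpsertQuery.build("page_stats", new String[] {"page", "day", "visits"}, new String[] {"visits"})` yields a statement that inserts the row the first time and overwrites `visits` on later runs. Identifiers are concatenated unescaped, so this only makes sense for table and column names that come from trusted configuration.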
Hi Aaron!
I'll send you an e-mail then... I'll have to add this feature anyway because I need it, so it makes sense to do it in such a way that it makes it into core Hadoop / Sqoop.
Enjoy!
Andrew
-
Ok, mail sent... :)


