sqoop OutOfMemoryError while importing table
Hi all.
I'm trying to import a table of a few GB from MySQL to Hadoop using Sqoop, but the job fails. The job is running with 1 GB of RAM.
The stack trace is:
09/06/18 16:19:31 INFO sqoop.Sqoop: Beginning code generation
09/06/18 16:19:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM outbound_messages AS t WHERE 1 = 1
java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.Buffer.getBytes(Buffer.java:198)
at com.mysql.jdbc.Buffer.readLenByteArray(Buffer.java:318)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1375)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2369)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:451)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2076)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1451)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1787)
at com.mysql.jdbc.Connection.execSQL(Connection.java:3283)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1332)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1467)
at org.apache.hadoop.sqoop.manager.SqlManager.execute(SqlManager.java:254)
at org.apache.hadoop.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:97)
at org.apache.hadoop.sqoop.orm.ClassWriter.generate(ClassWriter.java:445)
at org.apache.hadoop.sqoop.Sqoop.generateORM(Sqoop.java:64)
at org.apache.hadoop.sqoop.Sqoop.importTable(Sqoop.java:76)
at org.apache.hadoop.sqoop.Sqoop.run(Sqoop.java:160)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.sqoop.Sqoop.main(Sqoop.java:178)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Any help will be appreciated
Thanks
-
Hi Simonluca,
Looking into this, it seems that the MySQL JDBC driver, by default, buffers the entire result set in RAM before handing any rows to the client (here, Sqoop). So even though Sqoop was executing that particular query only to examine the column types of the table (it needs just one result row), it was reading the whole thing!
This shouldn't be too hard for me to fix. I'll work on a patch.
Thanks!
- Aaron -
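A minimal sketch of the buffering behaviour described above, assuming MySQL Connector/J as the driver. By default Connector/J reads the full result set into memory; it streams rows one at a time only when the statement is forward-only, read-only, and has a fetch size of Integer.MIN_VALUE. The class and method names below (StreamingProbe, columnProbeSql, openStreaming, printColumnTypes) are hypothetical, the LIMIT 1 clause is just one illustrative way to keep a type-inspection query small, and none of this is taken from the actual Sqoop patch.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public class StreamingProbe {

    // Hypothetical helper: a probe query that asks MySQL for at most one row,
    // since only the column metadata is needed. The table name is concatenated
    // because identifiers cannot be bound as parameters; do not pass untrusted input.
    static String columnProbeSql(String table) {
        return "SELECT t.* FROM " + table + " AS t LIMIT 1";
    }

    // Ask Connector/J to stream rows instead of buffering the whole result set.
    // The driver documents this combination: forward-only, read-only,
    // fetch size Integer.MIN_VALUE.
    static PreparedStatement openStreaming(Connection conn, String sql) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }

    // Print each column's name and SQL type name using only the result set metadata.
    static void printColumnTypes(Connection conn, String table) throws SQLException {
        try (PreparedStatement stmt = openStreaming(conn, columnProbeSql(table));
             ResultSet rs = stmt.executeQuery()) {
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                System.out.println(md.getColumnName(i) + " " + md.getColumnTypeName(i));
            }
        }
    }

    public static void main(String[] args) {
        // No database is contacted here; this only shows the probe query text.
        System.out.println(columnProbeSql("outbound_messages"));
    }
}
```

To run printColumnTypes you would pass a Connection obtained from DriverManager.getConnection with your own JDBC URL and credentials. Either measure alone (streaming, or limiting the probe to one row) should keep the heap from filling while the column types are inspected.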
Aaron,
I think this post could help: http://benjchristensen.wordpress.com/...
Thanks for your work
-=[SLL]=-
-
I'd found that same one :)
-
Hi Aaron,
I'm facing the same problem now. Is the patch available yet? Or are there any workarounds I can apply?
-
Hi weiwei,
Yes, the patch is at https://issues.apache.org/jira/browse...
If this isn't already integrated in our distribution (I can't check from the machine I'm on), then it'll be done in the next release in a couple weeks. In the meantime, you can apply the patch to your own local source copy and keep moving.
Cheers,
- Aaron -
Hi Aaron,
Thank you for your prompt response. It is super cool.
-weiwei
-
Hi Aaron,
I'm still struggling with the patch. I installed Hadoop from the Cloudera distribution, version hadoop 0.18.3-14. I unpacked the hadoop-mapred-2009-07-12_13-01-18 patch and replaced all the files under /usr/lib/hadoop and /etc/hadoop/conf. I then got java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobShell.
I checked the configuration files from the patch. They look like Hadoop 0.20 rather than 0.18. Is this patch really applicable to the current Cloudera distribution? If so, do you have any documentation on how to apply a patch to the current build?
Many, many thanks.
-
Hi weiwei,
Sorry to frustrate you -- it's possible that the patch is specific to the 0.20 branch. I just tried applying it on top of 0.18 and noticed that it fails. We'll have a hadoop-20 set of RPMs out "soon" which will include this and several more features for Sqoop.
Stay tuned!
- Aaron


