Sqoop import of large tables can time out
I'm running into a problem trying to export from a large table. I see that there is a patch committed for Hadoop 0.21:
http://issues.apache.org/jira/browse/...
I'm using cloudera distribution hadoop-0.20.1+133
Will this patch be incorporated into the cloudera distribution soon?
How would I go about applying this patch myself?
-
Note: I'm using it on a Mac, so I can't use the source RPM approach.
-
Hi Jonathan,
MAPREDUCE-876 has already been added to CDH and is present in the version you have. Can you give some more detail about what you're doing?
- Aaron -
Hmm, it looks like it might be something else. Here's a summary:
I have a database with 68 tables. Sqoop is able to process most of them, but 13 of them consistently fail to copy. The log4j output reports no errors, though. Here is an example for the table SessionType:
09/10/26 18:58:41 INFO mapreduce.DataDrivenImportJob: Beginning data-driven import of SessionType
09/10/26 18:58:42 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.hadoop.sqoop.manager.DefaultManagerFactory
09/10/26 18:58:42 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.hadoop.sqoop.manager.DefaultManagerFactory
09/10/26 18:58:42 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
09/10/26 18:58:42 DEBUG sqoop.ConnFactory: Instantiated ConnManager.
09/10/26 18:58:43 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM SessionType AS t LIMIT 1
09/10/26 18:59:26 INFO mapreduce.DataDrivenImportJob: Transferred 0 bytes in 42.9181 seconds (0 bytes/sec)
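Because the log reports no errors, one way to list all the failing tables from a full run is to look for imports that finished with "Transferred 0 bytes". Below is a minimal Python sketch. It assumes the run's console output was saved as text and that every table logs the same two messages seen above ("Beginning data-driven import of ..." and "Transferred N bytes in ..."). The function names are illustrative only and are not part of Sqoop.

```python
import re
from typing import Dict, List, Optional

# Patterns copied from the Sqoop 0.20-era log lines quoted above (assumption:
# other Sqoop versions may word these messages differently).
BEGIN_RE = re.compile(r"Beginning data-driven import of (\S+)")
TRANSFER_RE = re.compile(r"Transferred (\d+) bytes in")


def bytes_per_table(log_lines: List[str]) -> Dict[str, int]:
    """Map each table named in a 'Beginning ... import of' line to the byte
    count reported by the next 'Transferred N bytes' line after it."""
    results: Dict[str, int] = {}
    current: Optional[str] = None
    for line in log_lines:
        begin = BEGIN_RE.search(line)
        if begin:
            current = begin.group(1)
            continue
        transfer = TRANSFER_RE.search(line)
        if transfer and current is not None:
            results[current] = int(transfer.group(1))
            current = None
    return results


def empty_imports(log_lines: List[str]) -> List[str]:
    """Sorted names of tables whose import reported 0 bytes transferred."""
    return sorted(t for t, n in bytes_per_table(log_lines).items() if n == 0)


if __name__ == "__main__":
    sample = [
        "09/10/26 18:58:41 INFO mapreduce.DataDrivenImportJob: Beginning data-driven import of SessionType",
        "09/10/26 18:58:43 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM SessionType AS t LIMIT 1",
        "09/10/26 18:59:26 INFO mapreduce.DataDrivenImportJob: Transferred 0 bytes in 42.9181 seconds (0 bytes/sec)",
    ]
    print(empty_imports(sample))  # ['SessionType']
```

For a real run, pass the lines of the saved log, e.g. `empty_imports(open("sqoop-run.log").read().splitlines())`, where "sqoop-run.log" is a hypothetical file name. A table that is genuinely empty in MySQL would also show up here, so check the hits against the source row counts.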
This particular table is quite small. Here are the SQL statements that create it, along with its contents:
CREATE TABLE IF NOT EXISTS `SessionType` (
`SessionTypeID` int(11) NOT NULL,
`Name` varchar(200) DEFAULT NULL,
`Description` varchar(200) DEFAULT NULL,
PRIMARY KEY (`SessionTypeID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `SessionType` (`SessionTypeID`, `Name`, `Description`) VALUES
(1, 'Practice', NULL),
(2, 'QuizVerb1', NULL),
(3, 'QuizVerbAll', NULL),
(4, 'QuizNoun1', NULL),
(5, 'QuizNounAll', NULL),
(6, 'QuizAny1', NULL),
(7, 'QuizAny2', NULL),
(8, 'QuizAll', NULL);
I have noticed that an HDFS directory is created: SessionType/_logs/history
Can I email you the contents of the log file to see if that helps diagnose the problem?
-
Hi Jonathan,
Sure, send me the logfile: aaron at cloudera. FWIW, I created the table using those two statements you pasted above, and then ran:
$ sqoop --connect jdbc:mysql://localhost/importtest --table SessionType
It imported just fine.
A few questions:
- What versions of Linux and MySQL are you using?
- What exact command line did you pass to Sqoop?
- In your email, can you send the exact, complete debug log from Sqoop?
- Are there perhaps other applications holding locks on the SessionType table? (Does 'mysqladmin status' or 'mysqladmin processlist' show anything else open?)
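To check the last question for a specific table, the output of `mysqladmin processlist` can be filtered for connections whose current statement mentions that table. This is a minimal Python sketch, assuming the usual ASCII-table layout (`+---+` border lines, one header row with an `Info` column, one `|`-delimited row per connection). The helper names are hypothetical.

```python
from typing import Dict, List


def _cells(line: str) -> List[str]:
    # Drop the outer '|' characters, then split on the inner ones.
    # Assumption: no '|' appears inside a cell (a query text could break this).
    return [cell.strip() for cell in line.strip()[1:-1].split("|")]


def parse_processlist(text: str) -> List[Dict[str, str]]:
    """Turn the table printed by 'mysqladmin processlist' into a list of dicts
    keyed by the header row (Id, User, Host, db, Command, Time, State, Info)."""
    rows = [line for line in text.splitlines() if line.strip().startswith("|")]
    if not rows:
        return []
    header = _cells(rows[0])
    return [dict(zip(header, _cells(row))) for row in rows[1:]]


def sessions_touching(text: str, table: str) -> List[Dict[str, str]]:
    """Connections whose current statement (the Info column) names the table,
    matched case-insensitively as a plain substring."""
    needle = table.lower()
    return [r for r in parse_processlist(text) if needle in r.get("Info", "").lower()]
```

For example, save `mysqladmin -u root -p processlist` to a file and call `sessions_touching(text, "SessionType")`. This only sees statements that are running right now: a connection that holds an InnoDB lock inside an open transaction but is idle shows `Sleep` with an empty `Info`, so it will not be matched.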
Finally, does it work if you use a direct-mode import? (Run Sqoop on the same node as the MySQL server, and give it the --direct argument.)
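For each failing table, the normal and direct-mode runs differ only by the `--direct` flag. This small Python sketch builds both command lines from the same arguments as the test command above. It assumes a `sqoop` launcher on the PATH that accepts `--connect`, `--table` and `--direct` as shown in this thread. The table list is hypothetical and should hold the 13 tables that fail. The script only prints the commands, so you can run them by hand on the MySQL host.

```python
import shlex
from typing import List

# Connect string from the test command above; substitute your own database.
CONNECT = "jdbc:mysql://localhost/importtest"


def sqoop_import_cmd(connect: str, table: str, direct: bool = False) -> List[str]:
    """Argument list for a single-table Sqoop import; --direct selects the
    MySQL direct-mode path, which must run on the MySQL server's node."""
    cmd = ["sqoop", "--connect", connect, "--table", table]
    if direct:
        cmd.append("--direct")
    return cmd


if __name__ == "__main__":
    # Hypothetical list: replace with the tables that consistently fail.
    failing_tables = ["SessionType"]
    for table in failing_tables:
        for direct in (False, True):
            # shlex.join (Python 3.8+) quotes any argument that needs it.
            print(shlex.join(sqoop_import_cmd(CONNECT, table, direct)))
```

Returning an argument list rather than a single string keeps the commands safe to pass to `subprocess.run` later if you want to automate the comparison.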