Recent activity
Subscribe to this feed
Jonathan Marston replied on November 20, 2009 03:50 to the question "sqoop hive import giving null query values" in Cloudera:
adding the commons-io.jar made the --direct work
i'm still getting null results with the --direct parameter
here's the command and results
sqoop --connect jdbc:mysql://localhost:3306/prod_content --username root --table AssetType --hive-import --direct --fields-terminated-by '\0001' --lines-terminated-by '\n'
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: Parsing sqoop arguments:
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --connect
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: jdbc:mysql://localhost:3306/prod_content
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --username
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: root
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --table
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: AssetType
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --hive-import
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --direct
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --fields-terminated-by
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: \0001
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --lines-terminated-by
09/11/19 22:45:02 DEBUG sqoop.ImportOptions: \n
09/11/19 22:45:02 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.hadoop.sqoop.manager.DefaultManagerFactory
09/11/19 22:45:02 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.hadoop.sqoop.manager.DefaultManagerFactory
09/11/19 22:45:02 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
09/11/19 22:45:02 DEBUG sqoop.ConnFactory: Instantiated ConnManager.
09/11/19 22:45:02 INFO sqoop.Sqoop: Beginning code generation
09/11/19 22:45:03 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
09/11/19 22:45:03 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
09/11/19 22:45:03 DEBUG orm.ClassWriter: Writing source file: ./AssetType.java
09/11/19 22:45:03 DEBUG orm.ClassWriter: Table name: AssetType
09/11/19 22:45:03 DEBUG orm.ClassWriter: Columns: AssetTypeID:4, Description:1,
09/11/19 22:45:03 DEBUG orm.ClassWriter: sourceFilename is AssetType.java
09/11/19 22:45:03 DEBUG orm.ClassWriter: Could not create directory tree for .
09/11/19 22:45:03 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop-0.20
09/11/19 22:45:03 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20/hadoop-0.20.1+152-core.jar
09/11/19 22:45:03 DEBUG orm.CompilationManager: Invoking javac with args: -sourcepath ./ -d /tmp/sqoop/compile/ -classpath /usr/lib/hadoop-0.20/conf:/usr/lib/jvm/java-6-openjdk/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-0.20.1+152-core.jar:/usr/lib/hadoop-0.20/lib/cloudera-desktop-plugins.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.3.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-io-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-0.20.1+152-fairscheduler.jar:/usr/lib/hadoop-0.20/lib/hadoop-0.20.1+152-scribe-log4j.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hsqldb.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.0.1.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.0.1.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.14.jar:/usr/lib/hadoop-0.20/lib/junit-3.8.1.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/libfb303.jar:/usr/lib/hadoop-0.20/lib/libthrift.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hadoop-0.20/hadoop-0.20.1+152-core.jar:/usr/lib/hadoop-0.20/contrib/sqoop/hadoop-0.20.1+152-sqoop.jar ./AssetType.java
Note: ./AssetType.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
09/11/19 22:45:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/AssetType.jar
09/11/19 22:45:03 DEBUG orm.CompilationManager: Scanning for .class files in directory: /tmp/sqoop/compile
09/11/19 22:45:03 DEBUG orm.CompilationManager: Considering entry: /tmp/sqoop/compile/AssetType.class
09/11/19 22:45:03 DEBUG orm.CompilationManager: Got classfile: /tmp/sqoop/compile/AssetType.class -> AssetType.class
09/11/19 22:45:03 DEBUG orm.CompilationManager: Considering entry: /tmp/sqoop/compile/AssetType.jar
09/11/19 22:45:03 DEBUG orm.CompilationManager: Finished writing jar file /tmp/sqoop/compile/AssetType.jar
09/11/19 22:45:03 INFO manager.LocalMySQLManager: Beginning mysqldump fast path import
09/11/19 22:45:03 INFO manager.LocalMySQLManager: Performing import of table AssetType from database prod_content
09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: Starting mysqldump with arguments:
09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: mysqldump
09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --skip-opt
09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --compact
09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --no-create-db
09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --no-create-info
09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --quick
09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --single-transaction
09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --user=root
09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: prod_content
09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: AssetType
09/11/19 22:45:04 DEBUG util.DirectImportUtils: Writing to filesystem: hdfs://localhost:8022
09/11/19 22:45:04 DEBUG util.DirectImportUtils: Creating destination directory AssetType
09/11/19 22:45:04 DEBUG io.SplittingOutputStream: Opening next output file: AssetType/data-00000
09/11/19 22:45:04 DEBUG manager.LocalMySQLManager: User-specified delimiters; using reparsing import
09/11/19 22:45:04 INFO manager.LocalMySQLManager: Converting data to use specified delimiters.
09/11/19 22:45:04 INFO manager.LocalMySQLManager: (For the fastest possible import, use
09/11/19 22:45:04 INFO manager.LocalMySQLManager: --mysql-delimiters to specify the same field
09/11/19 22:45:04 INFO manager.LocalMySQLManager: delimiters as are used by mysqldump.)
09/11/19 22:45:04 INFO manager.LocalMySQLManager: Transfer loop complete.
09/11/19 22:45:04 INFO manager.LocalMySQLManager: Transferred 261 bytes in 0.066 seconds (3.861 KB/sec)
09/11/19 22:45:04 INFO hive.HiveImport: Loading uploaded data into Hive
09/11/19 22:45:04 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
09/11/19 22:45:04 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
09/11/19 22:45:04 DEBUG hive.TableDefWriter: Create statement: CREATE TABLE AssetType ( AssetTypeID INT, Description STRING) COMMENT 'Imported by sqoop on 2009/11/19 22:45:04' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\01' LINES TERMINATED BY '\012' STORED AS TEXTFILE
09/11/19 22:45:04 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://localhost:8022/user/jon/AssetType' INTO TABLE AssetType
09/11/19 22:45:05 INFO hive.HiveImport: Hive history file=/tmp/jon/hive_job_log_jon_200911192245_1352574233.txt
09/11/19 22:45:08 INFO hive.HiveImport: OK
09/11/19 22:45:08 INFO hive.HiveImport: Time taken: 3.346 seconds
09/11/19 22:45:08 INFO hive.HiveImport: Loading data to table assettype
09/11/19 22:45:08 INFO hive.HiveImport: OK
09/11/19 22:45:08 INFO hive.HiveImport: Time taken: 0.22 seconds
09/11/19 22:45:08 INFO hive.HiveImport: Hive import complete.
Jonathan Marston replied on November 19, 2009 18:49 to the question "sqoop hive import giving null query values" in Cloudera:
i actually hadn't explicitly used the --direct parameter. When I tried, i got a different error message. :(
===================
sqoop --connect jdbc:mysql://localhost:3306/prod_content --username root --table AssetType --hive-import --direct --fields-terminated-by '\0001' --lines-terminated-by '\n'
09/11/19 13:43:58 INFO manager.LocalMySQLManager: Transferred 0 bytes in 765.8869 seconds (0 bytes/sec)
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/io/output/CountingOutputStream
at org.apache.hadoop.sqoop.util.DirectImportUtils.createHdfsSink(DirectImportUtils.java:85)
at org.apache.hadoop.sqoop.manager.LocalMySQLManager.importTable(LocalMySQLManager.java:450)
at org.apache.hadoop.sqoop.Sqoop.importTable(Sqoop.java:91)
at org.apache.hadoop.sqoop.Sqoop.run(Sqoop.java:175)
========================
if i try and do the import with the delimiters suggested but not use the --direct parameter, i get the same results as before with all null values.
Jonathan Marston asked a question in Cloudera on November 19, 2009 04:13:
sqoop hive import giving null query valuesI'm running into a problem using sqoop to bring data into hive. the data appears to get into hive, but running a query against the hive table yields all 'null' results. the row and column counts are correct, but all null values. I've tried this against a number of different mysql tables all with the same results. I am using CDH2 0.20.1+152-1~jaunty~cdh2 on ubuntu
here's the mysql table:
===================
CREATE TABLE IF NOT EXISTS `AssetType` (
`AssetTypeID` int(10) NOT NULL AUTO_INCREMENT,
`Description` char(50) NOT NULL,
PRIMARY KEY (`AssetTypeID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=26 ;
INSERT INTO `AssetType` (`AssetTypeID`, `Description`) VALUES
(1, 'rawaudio'),
(2, 'clearspeakaudio'),
(3, 'slowspeakaudio'),
(4, 'video'),
(5, 'thumb'),
(6, 'others'),
(8, 'videoprocessorfiles'),
(13, 'avatar'),
(14, 'word'),
(15, 'dictionary'),
(16, 'grammar'),
(17, 'featuredpicture'),
(18, 'filler'),
(19, 'demopicture'),
(20, 'wordpronunciation'),
(21, 'lineset'),
(22, 'wordpronunciationall'),
(23, 'linesetall'),
(24, 'dictionaryall'),
(25, 'grammarall');
=====================
then running the following command imports into hive
sqoop --connect jdbc:mysql://localhost:3306/prod_content --username root --table AssetType --hive-import
09/11/18 22:14:36 INFO sqoop.ImportOptions: Using Hive-specific delimiters for output. You can override
09/11/18 22:14:36 INFO sqoop.ImportOptions: delimiters with --fields-terminated-by, etc.
09/11/18 22:14:36 INFO sqoop.Sqoop: Beginning code generation
09/11/18 22:14:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
09/11/18 22:14:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
09/11/18 22:14:36 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop-0.20
09/11/18 22:14:36 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20/hadoop-0.20.1+152-core.jar
Note: ./AssetType.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
09/11/18 22:14:37 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/AssetType.jar
09/11/18 22:14:37 WARN manager.MySQLManager: It looks like you are importing from mysql on
09/11/18 22:14:37 WARN manager.MySQLManager: localhost. This transfer can be faster! Use the
09/11/18 22:14:37 WARN manager.MySQLManager: --direct option to exercise a MySQL-specific fast
09/11/18 22:14:37 WARN manager.MySQLManager: path.
09/11/18 22:14:37 INFO mapreduce.DataDrivenImportJob: Beginning data-driven import of AssetType
09/11/18 22:14:37 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
09/11/18 22:15:03 INFO mapreduce.DataDrivenImportJob: Transferred 281 bytes in 25.8179 seconds (10.8839 bytes/sec)
09/11/18 22:15:03 INFO hive.HiveImport: Removing temporary files from import process: AssetType/_logs
09/11/18 22:15:03 INFO hive.HiveImport: Loading uploaded data into Hive
09/11/18 22:15:03 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
09/11/18 22:15:03 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
09/11/18 22:15:04 INFO hive.HiveImport: Hive history file=/tmp/jon/hive_job_log_jon_200911182215_1757140011.txt
09/11/18 22:15:08 INFO hive.HiveImport: OK
09/11/18 22:15:08 INFO hive.HiveImport: Time taken: 3.628 seconds
09/11/18 22:15:08 INFO hive.HiveImport: Loading data to table assettype
09/11/18 22:15:08 INFO hive.HiveImport: OK
09/11/18 22:15:08 INFO hive.HiveImport: Time taken: 0.245 seconds
09/11/18 22:15:08 INFO hive.HiveImport: Hive import complete.
================
connecting to hive and running a query shows the right number of rows, but all the values are null
hive> select * from assettype;
OK
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
Time taken: 4.127 seconds
hive> describe extended assettype;
OK
assettypeid int
description string
Detailed Table Information Table(tableName:assettype, dbName:default, owner:jon, createTime:1258600504, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:assettypeid, type:int, comment:null), FieldSchema(name:description, type:string, comment:null)], location:hdfs://localhost:8022/user/hive/warehouse/assettype, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1,line.delim=
,field.delim=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{comment=Imported by sqoop on 2009/11/18 22:15:03})
Time taken: 3.5 seconds
===================
BUT when I open up cloudera desktop (very cool product by the way) and use the file browser to look at the file /user/hive/warehouse/assettype/part-m-00000 I can see that the contents from the mysql table are in there
1.rawaudio
2.clearspeakaudio
3.slowspeakaudio
4.video
5.thumb
6.others
====================
any thoughts?
Jonathan Marston replied on October 26, 2009 23:08 to the question "Sqoop import of large tables can time out" in Cloudera:
hmm, it looks like it might be something else. Here's a summary:
I have a database with 68 tables. Sqoop is able to process most of them, but 13 of them consistently fail to copy. The log4j output reports no errors though. Here is an example for the table SessionType
09/10/26 18:58:41 INFO mapreduce.DataDrivenImportJob: Beginning data-driven import of SessionType
09/10/26 18:58:42 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.hadoop.sqoop.manager.DefaultManagerFactory
09/10/26 18:58:42 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.hadoop.sqoop.manager.DefaultManagerFactory
09/10/26 18:58:42 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
09/10/26 18:58:42 DEBUG sqoop.ConnFactory: Instantiated ConnManager.
09/10/26 18:58:43 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM SessionType AS t LIMIT 1
09/10/26 18:59:26 INFO mapreduce.DataDrivenImportJob: Transferred 0 bytes in 42.9181 seconds (0 bytes/sec)
This particular table is quite small, here is a sql statement creating it along with it's contents
CREATE TABLE IF NOT EXISTS `SessionType` (
`SessionTypeID` int(11) NOT NULL,
`Name` varchar(200) DEFAULT NULL,
`Description` varchar(200) DEFAULT NULL,
PRIMARY KEY (`SessionTypeID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `SessionType` (`SessionTypeID`, `Name`, `Description`) VALUES
(1, 'Practice', NULL),
(2, 'QuizVerb1', NULL),
(3, 'QuizVerbAll', NULL),
(4, 'QuizNoun1', NULL),
(5, 'QuizNounAll', NULL),
(6, 'QuizAny1', NULL),
(7, 'QuizAny2', NULL),
(8, 'QuizAll', NULL);
I have noticed that there is an hdfs folder created: SessionType/_logs/history
Can I email you the contents of the log file to see if that helps diagnose the problem?
Jonathan Marston replied on October 26, 2009 01:05 to the question "Sqoop import of large tables can time out" in Cloudera:
Jonathan Marston asked a question in Cloudera on October 26, 2009 00:13:
Sqoop import of large tables can time outI'm running into a problem trying to export from a large table. I see that there is a patch commited for Hadoop 0.21
http://issues.apache.org/jira/browse/...
I'm using cloudera distribution hadoop-0.20.1+133
Will this patch be incorporated into the cloudera distribution soon?
How would I go about applying this patch myself?
Jonathan Marston replied on October 25, 2009 23:40 to the question "why can't hive load data inpath with training vm 0.3.1" in Cloudera:
Jonathan Marston asked a question in Cloudera on October 25, 2009 22:59:
sqoop needs to escape table namesI'm trying to run sqoop against my mysql database, but there are a couple of tables that it is choking on. These tables have names that cause a sql error. This could be solved by escaping the tablename in the query run by sqoop.
so my error is this:
09/10/25 18:34:00 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM Character AS t LIMIT 1
09/10/25 18:34:00 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corre
which chokes on the table named Character. The same query fails when i try to run it in phpmyadmin. but this query works
SELECT t.* FROM `Character` AS t LIMIT 1
this is similar to an earlier problem with unescaped column names.
Thanks for the tool and I'm looking forward to using it. let me know if there is any info I can offer to help.
Loading Profile...

