Get your own customer support community

Recent activity

Subscribe to this feed
  • question

    Jonathan Marston replied on November 20, 2009 03:50 to the question "sqoop hive import giving null query values" in Cloudera:

    Jonathan Marston
    adding the commons-io.jar made the --direct work

    i'm still getting null results with the --direct parameter

    here's the command and results

    sqoop --connect jdbc:mysql://localhost:3306/prod_content --username root --table AssetType --hive-import --direct --fields-terminated-by '\0001' --lines-terminated-by '\n'

    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: Parsing sqoop arguments:
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --connect
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: jdbc:mysql://localhost:3306/prod_content
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --username
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: root
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --table
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: AssetType
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --hive-import
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --direct
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --fields-terminated-by
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: \0001
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: --lines-terminated-by
    09/11/19 22:45:02 DEBUG sqoop.ImportOptions: \n
    09/11/19 22:45:02 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.hadoop.sqoop.manager.DefaultManagerFactory
    09/11/19 22:45:02 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.hadoop.sqoop.manager.DefaultManagerFactory
    09/11/19 22:45:02 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
    09/11/19 22:45:02 DEBUG sqoop.ConnFactory: Instantiated ConnManager.
    09/11/19 22:45:02 INFO sqoop.Sqoop: Beginning code generation
    09/11/19 22:45:03 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
    09/11/19 22:45:03 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
    09/11/19 22:45:03 DEBUG orm.ClassWriter: Writing source file: ./AssetType.java
    09/11/19 22:45:03 DEBUG orm.ClassWriter: Table name: AssetType
    09/11/19 22:45:03 DEBUG orm.ClassWriter: Columns: AssetTypeID:4, Description:1,
    09/11/19 22:45:03 DEBUG orm.ClassWriter: sourceFilename is AssetType.java
    09/11/19 22:45:03 DEBUG orm.ClassWriter: Could not create directory tree for .
    09/11/19 22:45:03 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop-0.20
    09/11/19 22:45:03 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20/hadoop-0.20.1+152-core.jar
    09/11/19 22:45:03 DEBUG orm.CompilationManager: Invoking javac with args: -sourcepath ./ -d /tmp/sqoop/compile/ -classpath /usr/lib/hadoop-0.20/conf:/usr/lib/jvm/java-6-openjdk/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-0.20.1+152-core.jar:/usr/lib/hadoop-0.20/lib/cloudera-desktop-plugins.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.3.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-io-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-0.20.1+152-fairscheduler.jar:/usr/lib/hadoop-0.20/lib/hadoop-0.20.1+152-scribe-log4j.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hsqldb.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.0.1.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.0.1.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.14.jar:/usr/lib/hadoop-0.20/lib/junit-3.8.1.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/libfb303.jar:/usr/lib/hadoop-0.20/lib/libthrift.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hadoop-0.20/hadoop-0.20.1+152-core.jar:/usr/lib/hadoop-0.20/contrib/sqoop/hadoop-0.20.1+152-sqoop.jar ./AssetType.java
    Note: ./AssetType.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    09/11/19 22:45:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/AssetType.jar
    09/11/19 22:45:03 DEBUG orm.CompilationManager: Scanning for .class files in directory: /tmp/sqoop/compile
    09/11/19 22:45:03 DEBUG orm.CompilationManager: Considering entry: /tmp/sqoop/compile/AssetType.class
    09/11/19 22:45:03 DEBUG orm.CompilationManager: Got classfile: /tmp/sqoop/compile/AssetType.class -> AssetType.class
    09/11/19 22:45:03 DEBUG orm.CompilationManager: Considering entry: /tmp/sqoop/compile/AssetType.jar
    09/11/19 22:45:03 DEBUG orm.CompilationManager: Finished writing jar file /tmp/sqoop/compile/AssetType.jar
    09/11/19 22:45:03 INFO manager.LocalMySQLManager: Beginning mysqldump fast path import
    09/11/19 22:45:03 INFO manager.LocalMySQLManager: Performing import of table AssetType from database prod_content
    09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: Starting mysqldump with arguments:
    09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: mysqldump
    09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --skip-opt
    09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --compact
    09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --no-create-db
    09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --no-create-info
    09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --quick
    09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --single-transaction
    09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: --user=root
    09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: prod_content
    09/11/19 22:45:03 DEBUG manager.LocalMySQLManager: AssetType
    09/11/19 22:45:04 DEBUG util.DirectImportUtils: Writing to filesystem: hdfs://localhost:8022
    09/11/19 22:45:04 DEBUG util.DirectImportUtils: Creating destination directory AssetType
    09/11/19 22:45:04 DEBUG io.SplittingOutputStream: Opening next output file: AssetType/data-00000
    09/11/19 22:45:04 DEBUG manager.LocalMySQLManager: User-specified delimiters; using reparsing import
    09/11/19 22:45:04 INFO manager.LocalMySQLManager: Converting data to use specified delimiters.
    09/11/19 22:45:04 INFO manager.LocalMySQLManager: (For the fastest possible import, use
    09/11/19 22:45:04 INFO manager.LocalMySQLManager: --mysql-delimiters to specify the same field
    09/11/19 22:45:04 INFO manager.LocalMySQLManager: delimiters as are used by mysqldump.)
    09/11/19 22:45:04 INFO manager.LocalMySQLManager: Transfer loop complete.
    09/11/19 22:45:04 INFO manager.LocalMySQLManager: Transferred 261 bytes in 0.066 seconds (3.861 KB/sec)
    09/11/19 22:45:04 INFO hive.HiveImport: Loading uploaded data into Hive
    09/11/19 22:45:04 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
    09/11/19 22:45:04 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
    09/11/19 22:45:04 DEBUG hive.TableDefWriter: Create statement: CREATE TABLE AssetType ( AssetTypeID INT, Description STRING) COMMENT 'Imported by sqoop on 2009/11/19 22:45:04' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\01' LINES TERMINATED BY '\012' STORED AS TEXTFILE
    09/11/19 22:45:04 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://localhost:8022/user/jon/AssetType' INTO TABLE AssetType
    09/11/19 22:45:05 INFO hive.HiveImport: Hive history file=/tmp/jon/hive_job_log_jon_200911192245_1352574233.txt
    09/11/19 22:45:08 INFO hive.HiveImport: OK
    09/11/19 22:45:08 INFO hive.HiveImport: Time taken: 3.346 seconds
    09/11/19 22:45:08 INFO hive.HiveImport: Loading data to table assettype
    09/11/19 22:45:08 INFO hive.HiveImport: OK
    09/11/19 22:45:08 INFO hive.HiveImport: Time taken: 0.22 seconds
    09/11/19 22:45:08 INFO hive.HiveImport: Hive import complete.
  • question

    Jonathan Marston replied on November 19, 2009 18:49 to the question "sqoop hive import giving null query values" in Cloudera:

    Jonathan Marston
    i actually hadn't explicitly used the --direct parameter. When I tried, i got a different error message. :(

    ===================

    sqoop --connect jdbc:mysql://localhost:3306/prod_content --username root --table AssetType --hive-import --direct --fields-terminated-by '\0001' --lines-terminated-by '\n'

    09/11/19 13:43:58 INFO manager.LocalMySQLManager: Transferred 0 bytes in 765.8869 seconds (0 bytes/sec)
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/io/output/CountingOutputStream
    at org.apache.hadoop.sqoop.util.DirectImportUtils.createHdfsSink(DirectImportUtils.java:85)
    at org.apache.hadoop.sqoop.manager.LocalMySQLManager.importTable(LocalMySQLManager.java:450)
    at org.apache.hadoop.sqoop.Sqoop.importTable(Sqoop.java:91)
    at org.apache.hadoop.sqoop.Sqoop.run(Sqoop.java:175)

    ========================

    if i try and do the import with the delimiters suggested but not use the --direct parameter, i get the same results as before with all null values.
  • question

    Jonathan Marston asked a question in Cloudera on November 19, 2009 04:13:

    Jonathan Marston
    sqoop hive import giving null query values
    I'm running into a problem using sqoop to bring data into hive. the data appears to get into hive, but running a query against the hive table yields all 'null' results. the row and column counts are correct, but all null values. I've tried this against a number of different mysql tables all with the same results. I am using CDH2 0.20.1+152-1~jaunty~cdh2 on ubuntu

    here's the mysql table:

    ===================

    CREATE TABLE IF NOT EXISTS `AssetType` (
    `AssetTypeID` int(10) NOT NULL AUTO_INCREMENT,
    `Description` char(50) NOT NULL,
    PRIMARY KEY (`AssetTypeID`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=26 ;

    INSERT INTO `AssetType` (`AssetTypeID`, `Description`) VALUES
    (1, 'rawaudio'),
    (2, 'clearspeakaudio'),
    (3, 'slowspeakaudio'),
    (4, 'video'),
    (5, 'thumb'),
    (6, 'others'),
    (8, 'videoprocessorfiles'),
    (13, 'avatar'),
    (14, 'word'),
    (15, 'dictionary'),
    (16, 'grammar'),
    (17, 'featuredpicture'),
    (18, 'filler'),
    (19, 'demopicture'),
    (20, 'wordpronunciation'),
    (21, 'lineset'),
    (22, 'wordpronunciationall'),
    (23, 'linesetall'),
    (24, 'dictionaryall'),
    (25, 'grammarall');

    =====================

    then running the following command imports into hive

    sqoop --connect jdbc:mysql://localhost:3306/prod_content --username root --table AssetType --hive-import

    09/11/18 22:14:36 INFO sqoop.ImportOptions: Using Hive-specific delimiters for output. You can override
    09/11/18 22:14:36 INFO sqoop.ImportOptions: delimiters with --fields-terminated-by, etc.
    09/11/18 22:14:36 INFO sqoop.Sqoop: Beginning code generation
    09/11/18 22:14:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
    09/11/18 22:14:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
    09/11/18 22:14:36 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop-0.20
    09/11/18 22:14:36 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20/hadoop-0.20.1+152-core.jar
    Note: ./AssetType.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    09/11/18 22:14:37 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/AssetType.jar
    09/11/18 22:14:37 WARN manager.MySQLManager: It looks like you are importing from mysql on
    09/11/18 22:14:37 WARN manager.MySQLManager: localhost. This transfer can be faster! Use the
    09/11/18 22:14:37 WARN manager.MySQLManager: --direct option to exercise a MySQL-specific fast
    09/11/18 22:14:37 WARN manager.MySQLManager: path.
    09/11/18 22:14:37 INFO mapreduce.DataDrivenImportJob: Beginning data-driven import of AssetType
    09/11/18 22:14:37 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
    09/11/18 22:15:03 INFO mapreduce.DataDrivenImportJob: Transferred 281 bytes in 25.8179 seconds (10.8839 bytes/sec)
    09/11/18 22:15:03 INFO hive.HiveImport: Removing temporary files from import process: AssetType/_logs
    09/11/18 22:15:03 INFO hive.HiveImport: Loading uploaded data into Hive
    09/11/18 22:15:03 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
    09/11/18 22:15:03 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM AssetType AS t LIMIT 1
    09/11/18 22:15:04 INFO hive.HiveImport: Hive history file=/tmp/jon/hive_job_log_jon_200911182215_1757140011.txt
    09/11/18 22:15:08 INFO hive.HiveImport: OK
    09/11/18 22:15:08 INFO hive.HiveImport: Time taken: 3.628 seconds
    09/11/18 22:15:08 INFO hive.HiveImport: Loading data to table assettype
    09/11/18 22:15:08 INFO hive.HiveImport: OK
    09/11/18 22:15:08 INFO hive.HiveImport: Time taken: 0.245 seconds
    09/11/18 22:15:08 INFO hive.HiveImport: Hive import complete.

    ================

    connecting to hive and running a query shows the right number of rows, but all the values are null

    hive> select * from assettype;
    OK
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    NULL NULL
    Time taken: 4.127 seconds

    hive> describe extended assettype;
    OK
    assettypeid int
    description string

    Detailed Table Information Table(tableName:assettype, dbName:default, owner:jon, createTime:1258600504, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:assettypeid, type:int, comment:null), FieldSchema(name:description, type:string, comment:null)], location:hdfs://localhost:8022/user/hive/warehouse/assettype, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1,line.delim=
    ,field.delim=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{comment=Imported by sqoop on 2009/11/18 22:15:03})
    Time taken: 3.5 seconds

    ===================

    BUT when I open up cloudera desktop (very cool product by the way) and use the file browser to look at the file /user/hive/warehouse/assettype/part-m-00000 I can see that the contents from the mysql table are in there

    1.rawaudio
    2.clearspeakaudio
    3.slowspeakaudio
    4.video
    5.thumb
    6.others

    ====================

    any thoughts?
  • question

    Jonathan Marston replied on October 26, 2009 23:08 to the question "Sqoop import of large tables can time out" in Cloudera:

    Jonathan Marston
    hmm, it looks like it might be something else. Here's a summary:

    I have a database with 68 tables. Sqoop is able to process most of them, but 13 of them consistently fail to copy. The log4j output reports no errors though. Here is an example for the table SessionType

    09/10/26 18:58:41 INFO mapreduce.DataDrivenImportJob: Beginning data-driven import of SessionType
    09/10/26 18:58:42 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.hadoop.sqoop.manager.DefaultManagerFactory
    09/10/26 18:58:42 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.hadoop.sqoop.manager.DefaultManagerFactory
    09/10/26 18:58:42 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
    09/10/26 18:58:42 DEBUG sqoop.ConnFactory: Instantiated ConnManager.
    09/10/26 18:58:43 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM SessionType AS t LIMIT 1
    09/10/26 18:59:26 INFO mapreduce.DataDrivenImportJob: Transferred 0 bytes in 42.9181 seconds (0 bytes/sec)

    This particular table is quite small, here is a sql statement creating it along with it's contents

    CREATE TABLE IF NOT EXISTS `SessionType` (
    `SessionTypeID` int(11) NOT NULL,
    `Name` varchar(200) DEFAULT NULL,
    `Description` varchar(200) DEFAULT NULL,
    PRIMARY KEY (`SessionTypeID`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    INSERT INTO `SessionType` (`SessionTypeID`, `Name`, `Description`) VALUES
    (1, 'Practice', NULL),
    (2, 'QuizVerb1', NULL),
    (3, 'QuizVerbAll', NULL),
    (4, 'QuizNoun1', NULL),
    (5, 'QuizNounAll', NULL),
    (6, 'QuizAny1', NULL),
    (7, 'QuizAny2', NULL),
    (8, 'QuizAll', NULL);

    I have noticed that there is an hdfs folder created: SessionType/_logs/history

    Can I email you the contents of the log file to see if that helps diagnose the problem?
  • question

    Jonathan Marston replied on October 26, 2009 01:05 to the question "Sqoop import of large tables can time out" in Cloudera:

    Jonathan Marston
    note, i'm using it on a mac, so i can't use the source rpm approach
  • question

    Jonathan Marston asked a question in Cloudera on October 26, 2009 00:13:

    Jonathan Marston
    Sqoop import of large tables can time out
    I'm running into a problem trying to export from a large table. I see that there is a patch commited for Hadoop 0.21

    http://issues.apache.org/jira/browse/...

    I'm using cloudera distribution hadoop-0.20.1+133

    Will this patch be incorporated into the cloudera distribution soon?

    How would I go about applying this patch myself?
  • question

    Jonathan Marston replied on October 25, 2009 23:40 to the question "why can't hive load data inpath with training vm 0.3.1" in Cloudera:

    Jonathan Marston
    I'm seeing the same problem trying to import sqoop output into hive.

    to switch to port 8022, what does your hive-site.xml file look like? doesn't sound like that will work based on your findings, but i thought i would try.
  • question

    Jonathan Marston asked a question in Cloudera on October 25, 2009 22:59:

    Jonathan Marston
    sqoop needs to escape table names
    I'm trying to run sqoop against my mysql database, but there are a couple of tables that it is choking on. These tables have names that cause a sql error. This could be solved by escaping the tablename in the query run by sqoop.

    so my error is this:

    09/10/25 18:34:00 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM Character AS t LIMIT 1
    09/10/25 18:34:00 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corre

    which chokes on the table named Character. The same query fails when i try to run it in phpmyadmin. but this query works

    SELECT t.* FROM `Character` AS t LIMIT 1

    this is similar to an earlier problem with unescaped column names.

    Thanks for the tool and I'm looking forward to using it. let me know if there is any info I can offer to help.