Recent activity
srand() asked a question in Cloudera on July 23, 2009 09:44:
Pig does not order data correctly
Hi,
I've imported data from a MySQL database using sqoop. However, when I order this data on two fields, Pig does not return the same result as MySQL (whose result is the correct one).
Here is the code I use:
grunt> A = LOAD 'hdfs://hadoopM:54310/user/hadoop/rental' USING PigStorage(',') AS (rental_id:int, rental_date:chararray, inventory_id:int, customer_id:int);
grunt> B = ORDER A BY inventory_id DESC, customer_id ASC;
grunt> C = LIMIT B 20;
grunt> DUMP C;
Here is the result with Pig:
(132,2005-05-25 21:46:54.0,3367,479)
(263,2005-05-26 15:47:40.0,1160,449)
(324,2005-05-27 01:00:04.0,3364,292)
(359,2005-05-27 06:48:33.0,1156,152)
(582,2005-05-28 11:33:46.0,4579,198)
(711,2005-05-29 03:49:03.0,4581,215)
(809,2005-05-29 19:10:20.0,2114,222)
(927,2005-05-30 12:16:40.0,1158,167)
(1084,2005-05-31 11:10:17.0,4577,12)
(1341,2005-06-15 12:26:18.0,3363,344)
(1493,2005-06-15 21:50:32.0,4581,235)
(1537,2005-06-16 00:52:51.0,4577,594)
(1625,2005-06-16 07:49:08.0,3367,39)
(1729,2005-06-16 15:29:47.0,3364,523)
(1945,2005-06-17 07:51:26.0,3366,207)
(2137,2005-06-17 21:18:28.0,1158,581)
(2149,2005-06-17 22:50:00.0,3365,333)
(2321,2005-06-18 09:42:42.0,1160,565)
(2799,2005-06-19 19:15:21.0,4579,576)
(2806,2005-06-19 19:30:48.0,2114,510)
Here is the result with MySQL:
mysql> select rental_id, rental_date, inventory_id, customer_id from rental order by inventory_id desc , customer_id asc limit 20
+-----------+---------------------+--------------+-------------+
| rental_id | rental_date         | inventory_id | customer_id |
+-----------+---------------------+--------------+-------------+
|       711 | 2005-05-29 03:49:03 |         4581 |         215 |
|      6712 | 2005-07-12 13:24:47 |         4581 |         226 |
|      1493 | 2005-06-15 21:50:32 |         4581 |         235 |
|      9701 | 2005-07-31 07:32:21 |         4581 |         401 |
|     12894 | 2005-08-19 03:49:28 |         4581 |         541 |
|     10479 | 2005-08-01 10:11:25 |         4580 |         275 |
|     15916 | 2005-08-23 17:56:01 |         4580 |         327 |
|      5274 | 2005-07-09 14:34:09 |         4579 |         108 |
|       582 | 2005-05-28 11:33:46 |         4579 |         198 |
|     12458 | 2005-08-18 11:22:53 |         4579 |         277 |
|      8289 | 2005-07-29 02:23:24 |         4579 |         459 |
|      2799 | 2005-06-19 19:15:21 |         4579 |         576 |
|     11453 | 2005-08-02 21:00:05 |         4578 |          84 |
|     12456 | 2005-08-18 11:21:51 |         4578 |          85 |
|      6664 | 2005-07-12 11:28:22 |         4578 |         351 |
|      1084 | 2005-05-31 11:10:17 |         4577 |          12 |
|      5972 | 2005-07-11 00:08:54 |         4577 |          30 |
|     12854 | 2005-08-19 02:18:51 |         4577 |         362 |
|      9644 | 2005-07-31 05:40:35 |         4577 |         441 |
|      1537 | 2005-06-16 00:52:51 |         4577 |         594 |
+-----------+---------------------+--------------+-------------+
20 rows in set (3.10 sec)
thanks
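For reference, the ordering the query asks for (inventory_id descending, then customer_id ascending) can be reproduced on a handful of the rows posted above. This is only a minimal Python sketch for checking results by hand; the `rows` list and the `order_rows` helper are made up for illustration and are not part of Pig or MySQL.

```python
# A few rows copied from the two outputs above,
# reduced to (rental_id, inventory_id, customer_id).
rows = [
    (132, 3367, 479),
    (711, 4581, 215),
    (1493, 4581, 235),
    (582, 4579, 198),
    (2799, 4579, 576),
    (1084, 4577, 12),
    (1537, 4577, 594),
]


def order_rows(rows):
    """Sort by inventory_id descending, then customer_id ascending."""
    # Negating the first key yields DESC while the second key stays ASC.
    return sorted(rows, key=lambda r: (-r[1], r[2]))


ordered = order_rows(rows)
for row in ordered:
    print(row)
```

On this sample the rows come out in the same relative order as in the MySQL result. The Pig output, by contrast, starts with inventory_id 3367 followed by 1160, and its rental_id column is ascending, so it looks as if the ORDER was not applied to the dumped rows.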
srand() replied on July 23, 2009 09:36 to the question "Pig 0.2.0 distribution" in Cloudera:
srand() replied on July 22, 2009 09:53 to the question "sqoop getting java.lang.NoClassDefFoundError when trying to import a table" in Cloudera:
srand() replied on July 21, 2009 20:41 to the question "sqoop getting java.lang.NoClassDefFoundError when trying to import a table" in Cloudera:
Hi Aaron,
Thanks for your suggestion. I hadn't set JAVA_HOME because hadoop was working well without it; I assumed each of the binaries would find the JDK by itself.
Now it gets further, but it still fails!
09/07/21 22:57:23 WARN mapred.JobClient: Error reading task outputConnection refused
09/07/21 22:57:23 WARN mapred.JobClient: Error reading task outputConnection refused
09/07/21 22:57:24 INFO mapred.JobClient: Task Id : attempt_200907211208_0006_m_000001_2, Status : FAILED
Failed to rename output with the exception: java.io.IOException: Can not get the relative path: base = hdfs://hadoopM:54310/user/hadoop/store/_temporary/_attempt_200907211208_0006_m_000001_2 child = hdfs://HadoopM:54310/user/hadoop/store/_temporary/_attempt_200907211208_0006_m_000001_2/part-00001
at org.apache.hadoop.mapred.Task.getFinalPath(Task.java:586)
at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:599)
at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:617)
at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:561)
at org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2061)
09/07/21 22:57:24 WARN mapred.JobClient: Error reading task outputConnection refused
09/07/21 22:57:24 WARN mapred.JobClient: Error reading task outputConnection refused
09/07/21 22:57:26 ERROR sqoop.Sqoop: Encountered IOException running import job: java.io.IOException: Job failed!
I have to investigate.
thanks
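One detail visible in the "Can not get the relative path" message is that the two URIs differ only in the case of the host name (hadoopM in the base, HadoopM in the child). Whether that is the actual cause of the failure is an assumption, and the sketch below is not Hadoop's code; it only shows, in Python, how a case-sensitive comparison of the authority part rejects a child path that a case-insensitive one accepts. The `is_under` helper is hypothetical.

```python
from urllib.parse import urlsplit

# Both URIs are copied from the error message above.
base = ("hdfs://hadoopM:54310/user/hadoop/store/_temporary/"
        "_attempt_200907211208_0006_m_000001_2")
child = ("hdfs://HadoopM:54310/user/hadoop/store/_temporary/"
         "_attempt_200907211208_0006_m_000001_2/part-00001")


def is_under(base, child, ignore_host_case):
    """Return True if `child` lies below `base` (hypothetical helper)."""
    b, c = urlsplit(base), urlsplit(child)
    b_host, c_host = b.netloc, c.netloc
    if ignore_host_case:
        # Host names are case-insensitive, so normalise before comparing.
        b_host, c_host = b_host.lower(), c_host.lower()
    same_authority = (b.scheme == c.scheme) and (b_host == c_host)
    return same_authority and c.path.startswith(b.path.rstrip("/") + "/")


print(is_under(base, child, ignore_host_case=False))
print(is_under(base, child, ignore_host_case=True))
```

If the mismatch does turn out to matter, using the same spelling of the host name everywhere it is configured would be the first thing to try.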
srand() asked a question in Cloudera on July 21, 2009 10:25:
sqoop getting java.lang.NoClassDefFoundError when trying to import a table
Everything works well when I use sqoop to list databases or tables on my MySQL server, but not when I try to import data into HDFS:
sqoop --connect jdbc:mysql://127.0.0.1/sakila --username root --password root --table staff
09/07/21 12:31:01 INFO sqoop.Sqoop: Beginning code generation
09/07/21 12:31:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM staff AS t WHERE 1 = 1
09/07/21 12:31:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM staff AS t WHERE 1 = 1
09/07/21 12:31:02 ERROR orm.ClassWriter: Cannot resolve SQL type -4
09/07/21 12:31:02 ERROR orm.ClassWriter: No Java type for SQL type -4
09/07/21 12:31:02 ERROR orm.ClassWriter: No Java type for SQL type -4
09/07/21 12:31:02 ERROR orm.ClassWriter: No Java type for SQL type -4
09/07/21 12:31:02 ERROR orm.ClassWriter: No Java type for SQL type -4
09/07/21 12:31:02 ERROR orm.ClassWriter: No Java type for SQL type -4
09/07/21 12:31:02 DEBUG orm.ClassWriter: Writing source file: ./staff.java
09/07/21 12:31:02 DEBUG orm.ClassWriter: Table name: staff
09/07/21 12:31:02 DEBUG orm.ClassWriter: Columns: staff_id:-6, first_name:12, last_name:12, address_id:5, picture:-4, email:12, store_id:-6, active:-7, username:12, password:12, last_update:93,
09/07/21 12:31:02 DEBUG orm.ClassWriter: Could not create directory tree for .
09/07/21 12:31:02 DEBUG orm.CompilationManager: Warning: Could not make directories for /tmp/sqoop/compile/
09/07/21 12:31:02 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
09/07/21 12:31:02 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop/hadoop-0.18.3-6cloudera0.3.0-core.jar
09/07/21 12:31:02 INFO orm.CompilationManager: Invoking javac with args: -sourcepath ./ -d /tmp/sqoop/compile/ -classpath /etc/hadoop/conf:/usr/lib/jvm/java-6-sun//lib/tools.jar:/usr/lib/hadoop:/usr/lib/hadoop/hadoop-0.18.3-6cloudera0.3.0-core.jar:/usr/lib/hadoop/lib/commons-cli-2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/commons-codec-1.3.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-net-1.4.1.jar:/usr/lib/hadoop/lib/hadoop-0.18.3-6cloudera0.3.0-fairscheduler.jar:/usr/lib/hadoop/lib/hadoop-0.18.3-6cloudera0.3.0-scribe-log4j.jar:/usr/lib/hadoop/lib/hsqldb.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-5.1.4.jar:/usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.1.3.jar:/usr/lib/hadoop/lib/libfb303.jar:/usr/lib/hadoop/lib/libthrift.jar:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/servlet-api.jar:/usr/lib/hadoop/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-ext/commons-el.jar:/usr/lib/hadoop/lib/jetty-ext/jasper-compiler.jar:/usr/lib/hadoop/lib/jetty-ext/jasper-runtime.jar:/usr/lib/hadoop/lib/jetty-ext/jsp-api.jar:/usr/lib/hadoop/hadoop-0.18.3-6cloudera0.3.0-core.jar:/usr/lib/hadoop/contrib/sqoop/hadoop-0.18.3-6cloudera0.3.0-sqoop.jar ./staff.java
java.lang.NoClassDefFoundError: com/sun/tools/javac/Main
at org.apache.hadoop.sqoop.orm.CompilationManager.compile(CompilationManager.java:166)
at org.apache.hadoop.sqoop.Sqoop.generateORM(Sqoop.java:65)
at org.apache.hadoop.sqoop.Sqoop.importTable(Sqoop.java:76)
at org.apache.hadoop.sqoop.Sqoop.run(Sqoop.java:160)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.sqoop.Sqoop.main(Sqoop.java:178)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.lang.ClassNotFoundException: com.sun.tools.javac.Main
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
... 16 more
FYI, I've enabled sqoop debug logging in /etc/hadoop/conf/log4j.properties.
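The missing class, com.sun.tools.javac.Main, ships in lib/tools.jar of a Java 6 JDK (a plain JRE does not include it), and the follow-up reply in this feed reports that JAVA_HOME had not been set. As a quick sanity check before running sqoop, something like the following Python sketch can confirm that JAVA_HOME points at a JDK. The `find_tools_jar` helper is hypothetical and only inspects the filesystem; it does not reproduce how sqoop itself locates the compiler.

```python
import os


def find_tools_jar(env=None):
    """Return the path to lib/tools.jar under JAVA_HOME, or None.

    Hypothetical helper: `env` defaults to the process environment and
    can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        # JAVA_HOME unset or empty: nothing to check.
        return None
    candidate = os.path.join(java_home, "lib", "tools.jar")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    print(find_tools_jar() or "JAVA_HOME is unset or has no lib/tools.jar")
```

A result of None suggests either that JAVA_HOME is not exported in the shell that launches sqoop or that it points at a JRE rather than a JDK.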
srand() replied on December 22, 2008 08:54 to the problem "I cannot post to blogspot anymore" in Flock:
srand() replied on December 17, 2008 15:17 to the problem "google desktop does not work with flock >= 2.0" in Flock:
srand() replied on December 17, 2008 14:43 to the problem "google desktop does not work with flock >= 2.0" in Flock:
srand() replied on December 17, 2008 09:31 to the problem "google desktop does not work with flock >= 2.0" in Flock:
srand() replied on December 17, 2008 09:12 to the problem "I cannot post to blogspot anymore" in Flock:
srand() replied on December 17, 2008 09:09 to the problem "I cannot post to blogspot anymore" in Flock:
srand() replied on December 17, 2008 09:08 to the problem "google desktop does not work with flock >= 2.0" in Flock:
srand() replied on December 16, 2008 21:00 to the problem "I cannot post to blogspot anymore" in Flock:
srand() replied on December 16, 2008 20:57 to the problem "google desktop does not work with flock >= 2.0" in Flock:
srand() replied on December 16, 2008 09:53 to the problem "google desktop does not work with flock >= 2.0" in Flock:
srand() reported a problem in Flock on November 27, 2008 15:50:
google desktop does not work with flock >= 2.0
Google Desktop was working well with Flock 1.2, but since Flock 2.0 (2.0.2 too) it no longer works.