Aula Virtual Formación Hadoop: Ejercicios Apache Spark

Foro Formación Hadoop

Ejercicios Apache Spark

Hola.

Tengo un error de syntaxis , pero no lo veo.

[cloudera@quickstart ~]$ val logs=sc.textFile("hdfs:///user/curso/weblogs/2014-03-08.log").filter(line => line.contains(".jpg")).saveAsTextFile("hdfs:///user/curso/jpgs")
bash: syntax error near unexpected token `('

Saludos

Enlace permanente

Hola Miguel,

No tienes ningún error, el problema es que estás ejecutando la sentencia fuera de la shell de Spark.

Veo que estás utilizando Scala, por lo que debes acceder a la shell poniendo:

spark-shell

Un saludo,

Enlace permanente | Mostrar mensaje anterior

Tienes razón Fernando es que tengo varias terminales abiertas y había ejecutado el RDD directamente en linux, sim embargo la he vuelto ejecutar dentro de spark-shell y me aparece este error.

scala> val logs=sc.textFile("hdfs:///user/curso/weblogs/2014-03-08.log").filter(line => line.contains(".jpg")).saveAsTextFile("hdfs:///user/curso/jpgs")
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/curso/jpgs already exists
   at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1177)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1154)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1154)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
   at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1154)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1060)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
   at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1026)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:952)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
   at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:951)
   at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1457)
   at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1436)
   at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1436)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
   at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
   at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1436)
   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
   at $iwC$$iwC$$iwC.<init>(<console>:40)
   at $iwC$$iwC.<init>(<console>:42)
   at $iwC.<init>(<console>:44)
   at <init>(<console>:46)
   at .<init>(<console>:50)
   at .<clinit>(<console>)
   at .<init>(<console>:7)
   at .<clinit>(<console>)
   at $print(<console>)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
   at org.apache.spark.repl.Main$.main(Main.scala:35)
   at org.apache.spark.repl.Main.main(Main.scala)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Saludos

Enlace permanente | Mostrar mensaje anterior

Hola Miguel,

Ahora el error es debido a que el directorio de salida en HDFS ya existe. Esto no puede pasar en el modo de escritura que estás utilizando:

Output directory hdfs://quickstart.cloudera:8020/user/curso/jpgs already exists

Un saludo,

Enlace permanente | Mostrar mensaje anterior

Novedades del sitio ►