Foro Formación Hadoop

Downloading an HDFS file to local

 
by Óscar Villa Salcedo - Monday, 26 February 2018, 6:20 PM
 

Good afternoon.

When I try to run the command hadoop fs -cat /tmp/oscar/formacionhadoop/desarrollador/cervantes/novela/quijote.txt | tail -n 50, I get the following error:

124-127.0.0.1-1500470886981:blk_1073742774_1950 file=/tmp/oscar/formacionhadoop/desarrollador/cervantes/novela/quijote.txt
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:996)

...

cat: Could not obtain block: BP-1028023124-127.0.0.1-1500470886981:blk_1073742774_1950 file=/tmp/oscar/formacionhadoop/desarrollador/cervantes/novela/quijote.txt

 

These are the services that are running:

[cloudera@quickstart init.d]$ sudo jps -m
17907 EventCatcherService
4403 ResourceManager
18113 Main --pipeline-type SERVICE_MONITORING --mgmt-home /usr/share/cmf
3280 HRegionServer start
18112 Main --pipeline-type HOST_MONITORING --mgmt-home /usr/share/cmf
2744 DataNode
9348 Jps -m
4447 Bootstrap start
2829 Bootstrap start
3397 ThriftServer start --port 9090 -threadpool --bind 0.0.0.0
27693 RunJar /usr/lib/hive/lib/hive-service-1.1.0-cdh5.12.0.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///usr/lib/hive/auxlib/hive-exec-1.1.0-cdh5.12.0-core.jar,file:///usr/lib/hive/auxlib/hive-exec-core.jar
2703 NameNode
27689 RunJar /usr/lib/hive/lib/hive-service-1.1.0-cdh5.12.0.jar org.apache.hadoop.hive.metastore.HiveMetaStore -p 9083
2797 SecondaryNameNode
2969 QuorumPeerMain /var/run/cloudera-scm-agent/process/16-zookeeper-server/zoo.cfg
2360 Main
4189 NodeManager
3141 HistoryServer --properties-file /var/run/cloudera-scm-agent/process/17-spark_on_yarn-SPARK_YARN_HISTORY_SERVER/spark-conf/spark-history-server.conf
3074 JobHistoryServer
2600 AlertPublisher

-----------------------

 

This is odd, because I ran the same command a few days ago and it gave me no problems. What could be the cause?

 

Thank you very much.

Regards,

Óscar.

Re: Downloading an HDFS file to local
by Admin Formación Hadoop - Tuesday, 27 February 2018, 11:42 AM
 

Hello Óscar,

The error indicates that HDFS cannot obtain one of the file's blocks. Have you tried restarting the HDFS services?

It looks like something has been left "corrupt". Also try deleting the file (to see whether it lets you or not) and let us know.
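
To narrow down which block is affected, the identifiers in the error message can be pulled apart and the file checked with hdfs fsck. What follows is only a minimal Python sketch: the function names parse_missing_block and fsck_command are made up for illustration, and the regular expression assumes the message has the same shape as the "Could not obtain block" line quoted in the first post.

```python
import re

# Shape assumed from the error quoted in this thread:
#   cat: Could not obtain block: <block pool>:<block> file=<path>
MISSING_BLOCK = re.compile(
    r"Could not obtain block: "
    r"(?P<pool>BP-[\w.\-]+):(?P<block>blk_\d+_\d+) file=(?P<path>\S+)"
)


def parse_missing_block(message):
    """Return the block pool, block id and file path named in the error, or None."""
    match = MISSING_BLOCK.search(message)
    return match.groupdict() if match else None


def fsck_command(path):
    """Build the fsck call that lists the blocks of `path` and their locations."""
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]


if __name__ == "__main__":
    error = (
        "cat: Could not obtain block: "
        "BP-1028023124-127.0.0.1-1500470886981:blk_1073742774_1950 "
        "file=/tmp/oscar/formacionhadoop/desarrollador/cervantes/novela/quijote.txt"
    )
    info = parse_missing_block(error)
    print(info["block"])
    # The command is only printed here; run it on the VM to see the report.
    print(" ".join(fsck_command(info["path"])))
```

Running the printed fsck command on the VM should show whether the NameNode still knows a DataNode location for that block. If it reports the block as missing or corrupt even after a restart, the file will most likely have to be uploaded again.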

Regards,

Re: Downloading an HDFS file to local
by Óscar Villa Salcedo - Tuesday, 27 February 2018, 1:34 PM
 

Hello.

Thanks for the reply.

I restarted the HDFS services and was able to launch the job. Some exceptions were thrown, but I believe the MapReduce job finished correctly.

-----------

[cloudera@quickstart wordcount]$ hadoop jar wordcount-1.jar com.formacionhadoop.wordcount.WordCountDriver /tmp/oscar/formacionhadoop/desarrollador/cervantes/novela /tmp/oscar/formacionhadoop/desarrollador/ResulWordCountNovela
18/02/27 04:17:07 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
18/02/27 04:17:08 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/02/27 04:17:09 INFO input.FileInputFormat: Total input paths to process : 13
18/02/27 04:17:09 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:952)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:690)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:879)
18/02/27 04:17:09 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:952)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:690)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:879)
18/02/27 04:17:09 INFO mapreduce.JobSubmitter: number of splits:13
18/02/27 04:17:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1519729858709_0005
18/02/27 04:17:10 INFO impl.YarnClientImpl: Submitted application application_1519729858709_0005
18/02/27 04:17:10 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1519729858709_0005/
18/02/27 04:17:10 INFO mapreduce.Job: Running job: job_1519729858709_0005
18/02/27 04:17:26 INFO mapreduce.Job: Job job_1519729858709_0005 running in uber mode : false
18/02/27 04:17:26 INFO mapreduce.Job: map 0% reduce 0%
18/02/27 04:17:55 INFO mapreduce.Job: map 15% reduce 0%
18/02/27 04:18:24 INFO mapreduce.Job: map 31% reduce 0%
18/02/27 04:18:44 INFO mapreduce.Job: map 46% reduce 0%
18/02/27 04:19:03 INFO mapreduce.Job: map 62% reduce 0%
18/02/27 04:19:22 INFO mapreduce.Job: map 77% reduce 0%
18/02/27 04:19:42 INFO mapreduce.Job: map 92% reduce 0%
18/02/27 04:20:00 INFO mapreduce.Job: map 100% reduce 0%
18/02/27 04:20:04 INFO mapreduce.Job: map 100% reduce 100%
18/02/27 04:20:04 INFO mapreduce.Job: Job job_1519729858709_0005 completed successfully
18/02/27 04:20:05 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=577523
FILE: Number of bytes written=3328251
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=3739852
HDFS: Number of bytes written=301399
HDFS: Number of read operations=42
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=13
Launched reduce tasks=1
Data-local map tasks=13
Total time spent by all maps in occupied slots (ms)=140245504
Total time spent by all reduces in occupied slots (ms)=9623040
Total time spent by all map tasks (ms)=273917
Total time spent by all reduce tasks (ms)=18795
Total vcore-milliseconds taken by all map tasks=273917
Total vcore-milliseconds taken by all reduce tasks=18795
Total megabyte-milliseconds taken by all map tasks=140245504
Total megabyte-milliseconds taken by all reduce tasks=9623040
Map-Reduce Framework
Map input records=25553
Map output records=719769
Map output bytes=6429481
Map output materialized bytes=949414
Input split bytes=2439
Combine input records=0
Combine output records=0
Reduce input groups=28468
Reduce shuffle bytes=949414
Reduce input records=719769
Reduce output records=28468
Spilled Records=1439538
Shuffled Maps =13
Failed Shuffles=0
Merged Map outputs=13
GC time elapsed (ms)=3104
CPU time spent (ms)=24140
Physical memory (bytes) snapshot=1714270208
Virtual memory (bytes) snapshot=9791086592
Total committed heap usage (bytes)=710148096
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=3737413
File Output Format Counters
Bytes Written=301399

-------------
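
One way to confirm that a run like this one ended well is to look for the "completed successfully" line and compare a few counters. The Python sketch below is only an illustration: parse_counters and job_succeeded are hypothetical helper names, and the sample text is a short excerpt copied from the output above. Comparing "Map output records" with "Reduce input records" is only meaningful when no combiner is used, which is the case here (Combine input records=0).

```python
import re

# A counter line looks like "Map output records=719769"; the name may
# contain spaces or a colon ("HDFS: Number of bytes read=3739852").
COUNTER = re.compile(r"^\s*(?P<name>[^=]+?)\s*=\s*(?P<value>\d+)\s*$")


def parse_counters(log_text):
    """Collect every 'name=value' counter line into a dict of ints."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER.match(line)
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


def job_succeeded(log_text):
    """True if the client log contains the 'completed successfully' line."""
    return re.search(r"Job job_\d+_\d+ completed successfully", log_text) is not None


if __name__ == "__main__":
    sample = """\
18/02/27 04:20:04 INFO mapreduce.Job: Job job_1519729858709_0005 completed successfully
Launched map tasks=13
Map output records=719769
Reduce input records=719769
Shuffled Maps =13
Failed Shuffles=0
"""
    counters = parse_counters(sample)
    print(job_succeeded(sample))
    # Without a combiner, every record emitted by the maps reaches the reducer.
    print(counters["Map output records"] == counters["Reduce input records"])
    print(counters["Failed Shuffles"] == 0)
```

In the run above, the 13 launched map tasks match the 13 input splits, there are no failed shuffles, and the reducer received all 719769 map output records, which fits a job that finished normally despite the warnings.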

 

Thank you very much.

 

Regards,

Óscar.

Re: Downloading an HDFS file to local
by Admin Formación Hadoop - Wednesday, 28 February 2018, 8:24 AM
 

Hello Óscar,

Yes, the job did indeed finish correctly. Those exceptions are "normal" in the VM.

Regards,