File not found error at step 2 in yarn logs

Gavin_Chou
Hi, all:
        I have a problem building a cube: the build fails at step 2.

        The following error appears in the YARN NodeManager log:

2017-06-14 11:21:08,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1497364689294_0018 transitioned from NEW to INITING
2017-06-14 11:21:08,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1497364689294_0018_01_000001 to application application_1497364689294_0018
2017-06-14 11:21:08,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1497364689294_0018 transitioned from INITING to RUNNING
2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0018_01_000001 transitioned from NEW to LOCALIZING
2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1497364689294_0018
2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.jar transitioned from INIT to DOWNLOADING
2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.splitmetainfo transitioned from INIT to DOWNLOADING
2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.split transitioned from INIT to DOWNLOADING
2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.xml transitioned from INIT to DOWNLOADING
2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta transitioned from INIT to DOWNLOADING
2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1497364689294_0018_01_000001
2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta, 1497410467000, FILE, null }
2017-06-14 11:21:08,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/q/hadoop/hadoop/tmp/nm-local-dir/nmPrivate/container_1497364689294_0018_01_000001.tokens. Credentials list:
2017-06-14 11:21:08,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta, 1497410467000, FILE, null },pending,[(container_1497364689294_0018_01_000001)],781495827608056,DOWNLOADING}
java.io.FileNotFoundException: File file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:250)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:353)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
2017-06-14 11:21:08,796 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop
2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta(->/home/q/hadoop/hadoop/tmp/nm-local-dir/filecache/18/meta) transitioned from DOWNLOADING to FAILED
2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0018_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Container container_1497364689294_0018_01_000001 sent RELEASE event on a resource request { file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta, 1497410467000, FILE, null } not present in cache.
2017-06-14 11:21:08,797 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED APPID=application_1497364689294_0018 CONTAINERID=container_1497364689294_0018_01_000001
2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0018_01_000001 transitioned from LOCALIZATION_FAILED to DONE
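
Every `LocalizedResource` line in the log above names a `file:` URI, including `job.jar` and `job.xml` in the staging directory, and the stack trace goes through `RawLocalFileSystem`. A `file:` path is read from the disk of whichever NodeManager performs the localization. As a rough aid for spotting this in longer logs, here is a minimal Python sketch that pulls the resource transitions out of NodeManager log lines. It assumes the log format shown above; `resource_transitions` and the regular expression are illustrative helpers, not part of Hadoop, and the two sample lines are copied from the log.

```python
import re
from urllib.parse import urlparse

# Matches NodeManager lines such as
#   "...LocalizedResource: Resource <uri> transitioned from <OLD> to <NEW>"
# An optional "(-><local cache path>)" suffix after the URI is skipped.
TRANSITION = re.compile(
    r"LocalizedResource: Resource (\S+?)(?:\(->\S+\))? "
    r"transitioned from (\w+) to (\w+)"
)


def resource_transitions(log_lines):
    """Yield (uri, scheme, old_state, new_state) for each resource transition."""
    for line in log_lines:
        match = TRANSITION.search(line)
        if match:
            uri, old_state, new_state = match.groups()
            yield uri, urlparse(uri).scheme, old_state, new_state


sample = [
    "2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager."
    "containermanager.localizer.LocalizedResource: Resource "
    "file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.jar "
    "transitioned from INIT to DOWNLOADING",
    "2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager."
    "containermanager.localizer.LocalizedResource: Resource "
    "file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta"
    "(->/home/q/hadoop/hadoop/tmp/nm-local-dir/filecache/18/meta) "
    "transitioned from DOWNLOADING to FAILED",
]

for uri, scheme, old_state, new_state in resource_transitions(sample):
    print(f"{scheme:5} {old_state} -> {new_state}  {uri}")
```

Applied to the excerpt above, every resource reports the `file` scheme, and only the Kylin metadata file reaches `FAILED`.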

        This error appears in the yarn-nodemanager logs of machines B and D. Shortly before it, I found a warning in the yarn-nodemanager log on machine C (Kylin is installed only on machine A):

2017-06-14 11:21:01,131 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from LOCALIZING to LOCALIZED
2017-06-14 11:21:01,146 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from LOCALIZED to RUNNING
2017-06-14 11:21:01,146 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Neither virutal-memory nor physical-memory monitoring is needed. Not running the monitor-thread
2017-06-14 11:21:01,149 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [nice, -n, 0, bash, /home/q/hadoop/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1497364689294_0017/container_1497364689294_0017_01_000002/default_container_executor.sh]
2017-06-14 11:21:05,024 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1497364689294_0017_01_000002
2017-06-14 11:21:05,025 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop IP=10.90.181.160 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1497364689294_0017 CONTAINERID=container_1497364689294_0017_01_000002
2017-06-14 11:21:05,025 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from RUNNING to KILLING
2017-06-14 11:21:05,025 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1497364689294_0017_01_000002
2017-06-14 11:21:05,028 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1497364689294_0017_01_000002 is : 143
2017-06-14 11:21:05,040 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2017-06-14 11:21:05,041 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1497364689294_0017 CONTAINERID=container_1497364689294_0017_01_000002
2017-06-14 11:21:05,041 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
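
About the warning on machine C: exit code 143 conventionally means 128 + 15, i.e. the process was terminated by SIGTERM. It follows a `Stop Container Request` and a `RUNNING to KILLING` transition for a different application (`application_1497364689294_0017`), so it most likely records a container being stopped on request rather than the localization failure itself. A small sketch of the decoding, assuming the usual 128 + signal-number convention; `describe_exit_code` is a hypothetical helper:

```python
import signal


def describe_exit_code(code):
    """Interpret a process exit code using the shell convention 128 + signal."""
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"killed by unknown signal {code - 128}"
    return f"exited with status {code}"


print(describe_exit_code(143))  # prints: killed by SIGTERM
```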

        What puzzles me is why, in step 2, Kylin makes applications on other nodes load a local file. How can I solve this?

        Here is some additional information that may help with the analysis:
                The cluster has 4 machines: A, B, C and D.
                Hadoop version 2.5.0, with Snappy support
                        NameNode: A (standby), B (active)
                        DataNode: all
                Hive version 0.13.1, recompiled for Hadoop 2
                HBase version 0.98.6, recompiled for Hadoop 2.5.0
                        Master: A (active) and B
                When I set “hbase.rootdir” in hbase-site.xml to the IP address of the active NameNode, step 2 succeeds, but the build fails at the fifth-from-last step.
                So I changed the setting to the cluster name. There are no errors in the HBase logs.
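
Since even the MR staging files (`job.jar`, `job.xml`) in the log use `file:` paths, one thing worth checking is which default filesystem the Hadoop configuration seen by Kylin on machine A resolves to. Hadoop's `core-default.xml` sets `fs.defaultFS` to `file:///`, so if the cluster's `core-site.xml` is missing or not visible to Kylin, job resources would be written to the local disk. A minimal Python sketch that reads `fs.defaultFS` from the text of a `core-site.xml`; `default_fs` is a hypothetical helper and `hdfs://mycluster` is a placeholder nameservice, not the value used in this cluster:

```python
import xml.etree.ElementTree as ET

# Value that Hadoop's core-default.xml uses when fs.defaultFS is not overridden.
HADOOP_DEFAULT_FS = "file:///"


def default_fs(core_site_xml):
    """Return the fs.defaultFS value from the text of a core-site.xml file."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.defaultFS":
            return prop.findtext("value", "").strip()
    # Not set here: Hadoop would fall back to its built-in default.
    return HADOOP_DEFAULT_FS


ha_site = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
</configuration>"""

print(default_fs(ha_site))             # prints: hdfs://mycluster
print(default_fs("<configuration/>"))  # prints: file:///
```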

Thank you

Best regards

Re: File not found error at step 2 in yarn logs

Yang
Kylin sends its metadata to the MR job via the distributed cache. The missing file
"file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta"
should be available on machines B and D before YARN starts the mappers.

As to why the file was not there... I don't know.
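
Yang's point can be pictured with a toy model of localization: the NodeManager that runs a container resolves a `file:` URI against its own disk, so a file that exists only on the Kylin machine cannot be localized anywhere else. The sketch below is a simplified model, not YARN's code; `hosts_failing_localization` and the `hdfs://mycluster/...` URI are hypothetical, and it assumes the metadata file exists only on machine A, where Kylin runs.

```python
from urllib.parse import urlparse


def hosts_failing_localization(resource_uri, nodemanager_hosts, local_files):
    """Return the hosts on which localizing resource_uri would fail.

    Toy model (an assumption, not YARN's implementation): a file: URI is
    looked up on each NodeManager's own disk, while any other scheme,
    such as hdfs:, is treated as readable from every host.
    """
    parsed = urlparse(resource_uri)
    if parsed.scheme != "file":
        return []
    return [host for host in nodemanager_hosts
            if parsed.path not in local_files.get(host, set())]


meta = ("file:/home/q/hadoop/kylin/tomcat/temp/"
        "kylin_job_meta3892468167792432608/meta")
# Assumption: Kylin wrote the metadata file on machine A only.
local_files = {"A": {urlparse(meta).path}}

print(hosts_failing_localization(meta, ["A", "B", "C", "D"], local_files))
# prints: ['B', 'C', 'D']
print(hosts_failing_localization("hdfs://mycluster/kylin/meta",
                                 ["A", "B", "C", "D"], local_files))
# prints: []
```

In this model any container scheduled on B, C or D fails the same way, which matches the failures logged on B and D, while a resource on a shared filesystem such as HDFS does not depend on which host runs the container.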

In reply to Gavin_Chou (Wed, Jun 14, 2017 at 12:12 PM).