Running Flink on YARN

  • 2020-12-29

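Apache Flink is a distributed big-data processing engine that performs stateful or stateless computation over bounded and unbounded data streams. It can be deployed in a variety of cluster environments and processes data of any scale quickly. For a deeper understanding of Flink, see the introductory Flink concepts.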

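Preparation

Install a Hadoop cluster environment; see "Hadoop 3 Docker cluster deployment" for reference.

Download the latest release from the Flink website. This post uses version 1.12.0.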

wget https://mirror.bit.edu.cn/apache/flink/flink-1.12.0/flink-1.12.0-bin-scala_2.12.tgz

tar -zxvf flink-1.12.0-bin-scala_2.12.tgz

Start Hadoop

cd /opt/hadoop/sbin/
./start-all.sh

Once you’ve made sure that the HADOOP_CLASSPATH environment variable is set, you can launch a Flink on YARN session, and submit an example job:

The commands below assume you are in the root directory of the unpacked Flink distribution (the flink-1.12.0 directory created by the tar command above).

(0) Set the HADOOP_CLASSPATH environment variable

export HADOOP_CLASSPATH=`hadoop classpath`
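Before starting the session it is worth confirming that the variable really is set, since Flink relies on it to find the Hadoop classes. The following is a minimal sketch of such a check; the function name ensure_hadoop_classpath is hypothetical and not part of Flink or Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure HADOOP_CLASSPATH is non-empty before calling Flink.
ensure_hadoop_classpath() {
  # Already set: nothing to do.
  if [ -n "${HADOOP_CLASSPATH:-}" ]; then
    return 0
  fi
  # Otherwise try to derive it from the hadoop CLI, if it is on PATH.
  if command -v hadoop >/dev/null 2>&1; then
    HADOOP_CLASSPATH="$(hadoop classpath)"
    export HADOOP_CLASSPATH
    # Succeed only if the command actually produced a value.
    [ -n "$HADOOP_CLASSPATH" ]
    return
  fi
  echo "HADOOP_CLASSPATH is not set and 'hadoop' is not on PATH" >&2
  return 1
}
```

A call such as ensure_hadoop_classpath || exit 1 at the top of a launch script stops early with a readable message instead of a class-loading error later on.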

(1) Start a YARN session

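./bin/yarn-session.sh -n 2 -tm 800 -s 1
# -n  requests 2 containers (deprecated in recent Flink releases, which allocate TaskManagers on demand; drop it if your version rejects it)
# -s  number of slots per TaskManager container
# -tm memory requested per TaskManager, in MB (800 here)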

You will see output like this (the detached-mode message appears when the session is started with -d):

2020-12-28 14:24:33,033 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface m1:45925 of application 'application_1609136262025_0002'.
JobManager Web Interface: http://m1:45925
2020-12-28 14:24:33,163 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - The Flink YARN session cluster has been started in detached mode. In order to stop Flink gracefully, use the following command:
$ echo "stop" | ./bin/yarn-session.sh -id application_1609136262025_0002
If this should not be possible, then you can also kill Flink via YARN's web interface or via:
$ yarn application -kill application_1609136262025_0002
Note that killing Flink might not clean up all job artifacts and temporary files.
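The application ID and the web interface address in this output are needed in the later steps, so it can help to pull them out of the log instead of copying them by hand. The following is a minimal sketch, assuming the output has been saved to a file; the function names extract_app_id and extract_web_ui and the file name session.log are hypothetical, not part of Flink.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: both read yarn-session.sh output on stdin.

# Print the first YARN application ID found
# (format: application_<timestamp>_<sequence>).
extract_app_id() {
  grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# Print the URL from the "JobManager Web Interface: ..." line.
extract_web_ui() {
  sed -n 's/^JobManager Web Interface: //p' | head -n 1
}
```

For example, APP_ID="$(extract_app_id < session.log)" stores the ID in a variable for the submit and stop commands.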

The ID of the YARN session created here is saved in /tmp/.yarn-properties-${user}. If the same user creates another YARN session on the same machine, this file is overwritten.
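If the terminal output is no longer available, the ID can be recovered from that file. The exact layout of the file is not shown in this post, so the sketch below only searches it for the application ID pattern; the function name saved_app_id is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the application ID recorded in a YARN properties file.
# $1: path to the file (defaults to /tmp/.yarn-properties-<current user>).
saved_app_id() {
  local file="${1:-/tmp/.yarn-properties-$(id -un)}"
  if [ ! -r "$file" ]; then
    echo "no readable properties file at $file" >&2
    return 1
  fi
  # Match the ID pattern rather than a specific key name,
  # because the key is an unknown in this sketch.
  grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' "$file" | head -n 1
}
```

Because the file is overwritten by each new session, this returns only the most recently created session for that user.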

You can now access the Flink Web Interface through the URL printed in the last lines of the command output, or through the YARN ResourceManager web UI.

The YARN cluster information is available at http://m0:18088/.

(2) Submit the example job

./bin/flink run -yid application_1609136262025_0002  ./examples/streaming/TopSpeedWindowing.jar

application_1609136262025_0002 is the application ID printed when the YARN session was started.
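Since a mistyped ID is an easy mistake here, the submit step can be wrapped in a small function that checks the ID format first. This is only a sketch: the function name submit_to_session and the FLINK_BIN variable are hypothetical, and the flink run -yid invocation is the same one used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate the ID, then submit a jar to the running session.
# $1: YARN application ID, $2: path to the job jar.
submit_to_session() {
  local app_id="$1" jar="$2"
  case "$app_id" in
    application_*_*) ;;  # looks like application_<timestamp>_<sequence>
    *)
      echo "not a YARN application ID: $app_id" >&2
      return 1
      ;;
  esac
  # FLINK_BIN is an assumption of this sketch; it defaults to ./bin/flink.
  "${FLINK_BIN:-./bin/flink}" run -yid "$app_id" "$jar"
}
```

Usage from the Flink root directory: submit_to_session application_1609136262025_0002 ./examples/streaming/TopSpeedWindowing.jar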

(3) Stop the YARN session (replace the application ID with the one printed by the yarn-session.sh command)

echo "stop" | ./bin/yarn-session.sh -id application_XXXXX_XXX
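The session log shown earlier names two ways to shut down: the graceful stop command, and yarn application -kill as a fallback. A minimal sketch that tries them in that order follows. It assumes yarn-session.sh exits with a non-zero status when the graceful stop fails, which this post does not verify; the function name stop_session and the YARN_SESSION_BIN and YARN_BIN variables are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop a Flink YARN session, falling back to a YARN kill.
# $1: YARN application ID.
stop_session() {
  local app_id="$1"
  if [ -z "$app_id" ]; then
    echo "usage: stop_session <application id>" >&2
    return 2
  fi
  # Graceful path: the same command the session log recommends.
  if echo "stop" | "${YARN_SESSION_BIN:-./bin/yarn-session.sh}" -id "$app_id"; then
    return 0
  fi
  # Fallback path. As the log warns, killing might not clean up
  # all job artifacts and temporary files.
  echo "graceful stop failed, killing $app_id via YARN" >&2
  "${YARN_BIN:-yarn}" application -kill "$app_id"
}
```

Keeping the kill as a fallback rather than the default follows the log's own advice to prefer the graceful stop.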

Congratulations! You have successfully run a Flink application by deploying Flink on YARN.
