NOTE: HDFS is required for Flink's DistributedCache, which distributes Python plans to worker nodes. We use BlueData Hadoop CDH nodes.
Remember to make sure your Python plans do not call env.execute(local=True), since that runs the plan in local mode rather than on the cluster!
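One way to catch a stray local-mode call before submitting a plan is a plain text scan of the plan source. The sketch below is only an illustration: the helper name `find_local_execute` is hypothetical and not part of Flink, and because it matches text rather than parsing Python, it will also flag commented-out calls.

```python
import re

# Matches calls such as env.execute(local=True), tolerating extra whitespace.
LOCAL_EXECUTE = re.compile(r"\.execute\(\s*local\s*=\s*True\s*\)")


def find_local_execute(source):
    """Return the 1-based line numbers in `source` that call execute(local=True).

    Hypothetical helper for illustration; it does a text match only.
    """
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if LOCAL_EXECUTE.search(line)
    ]
```

To use it, read a plan file into a string, pass it to `find_local_execute`, and fix any reported lines before running the plan on the cluster.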
On the master node:
- Install git and other useful things that we like:

      sudo yum install git bzip2 -y