@fuyi
Created August 28, 2022 06:49
Dockerfile
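# AWS Lambda base image for Python 3.8 (Amazon Linux 2)
FROM public.ecr.aws/lambda/python:3.8
# Install a Java 8 runtime for Spark's JVM, plus wget and curl
RUN yum -y install java-1.8.0-openjdk wget curl && yum clean all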
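# Python dependencies; requirements.txt must include pyspark, because SPARK_HOME below points at the pip-installed package
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt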
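# JAVA_HOME is pinned to the exact OpenJDK build yum installed (1.8.0.302.b08);
# if yum resolves a newer build, update this path to the matching directory under /usr/lib/jvm
ENV JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.302.b08-0.amzn2.0.1.x86_64/jre"
ENV PATH=${PATH}:${JAVA_HOME}/bin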
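# Use the Spark distribution that ships inside the pip-installed pyspark package
ENV SPARK_HOME="/var/lang/lib/python3.8/site-packages/pyspark"
ENV PATH=$PATH:$SPARK_HOME/bin
# The py4j zip name must match the py4j version bundled with the installed pyspark
ENV PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH
ENV PATH=$SPARK_HOME/python:$PATH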
# Bind Spark to the loopback interface
RUN mkdir -p $SPARK_HOME/conf
RUN echo "SPARK_LOCAL_IP=127.0.0.1" > $SPARK_HOME/conf/spark-env.sh
RUN chmod 777 $SPARK_HOME/conf/spark-env.sh
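# S3A connector: hadoop-aws should match the Hadoop version bundled with pyspark,
# and the AWS SDK bundle must be compatible with that hadoop-aws release
ARG HADOOP_VERSION=3.3.2
ARG AWS_SDK_VERSION=1.12.289
ARG HADOOP_JAR=https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${HADOOP_VERSION}.jar
ARG AWS_SDK_JAR=https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/${AWS_SDK_VERSION}/aws-java-sdk-bundle-${AWS_SDK_VERSION}.jar
ADD $HADOOP_JAR ${SPARK_HOME}/jars/
ADD $AWS_SDK_JAR ${SPARK_HOME}/jars/
# ADD creates files fetched from a URL with mode 600; make the jars readable by the non-root Lambda user
RUN chmod 644 ${SPARK_HOME}/jars/hadoop-aws-${HADOOP_VERSION}.jar ${SPARK_HOME}/jars/aws-java-sdk-bundle-${AWS_SDK_VERSION}.jar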
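# Custom spark-class launcher and Spark defaults (both files must be in the build context next to this Dockerfile)
COPY spark-class $SPARK_HOME/bin/spark-class
RUN chmod 777 $SPARK_HOME/bin/spark-class
COPY spark-defaults.conf $SPARK_HOME/conf/spark-defaults.conf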
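# Copy the job into LAMBDA_TASK_ROOT (/var/task) and run it directly.
# This ENTRYPOINT replaces the base image's default /lambda-entrypoint.sh.
COPY pipeline.py ${LAMBDA_TASK_ROOT}
ENTRYPOINT ["python", "/var/task/pipeline.py"]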
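The PYTHONPATH line hardcodes py4j-0.10.9-src.zip, which only resolves if the pyspark installed from requirements.txt bundles exactly that py4j release. As a hedged sanity check (a hypothetical helper, not part of the gist), a small stdlib-only script could be run inside the built image to confirm that SPARK_HOME/python/lib actually contains the zip named in PYTHONPATH. The function names `find_py4j_zip` and `check_pythonpath` are illustrative assumptions.

```python
import glob
import os
from typing import List, Optional


def find_py4j_zip(spark_home: str) -> Optional[str]:
    """Return the py4j source zip under SPARK_HOME/python/lib, or None.

    If several match, the lexicographically last one is returned.
    """
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    return matches[-1] if matches else None


def check_pythonpath(spark_home: str, pythonpath: str) -> List[str]:
    """List problems with the Spark-related PYTHONPATH entries."""
    problems = []
    # Drop empty entries (e.g. from a trailing ':' when PYTHONPATH was unset at build time).
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    py4j_zip = find_py4j_zip(spark_home)
    if py4j_zip is None:
        problems.append("no py4j-*-src.zip under " + spark_home)
    elif py4j_zip not in entries:
        problems.append("PYTHONPATH does not contain " + py4j_zip)
    # Every entry that points inside SPARK_HOME should exist on disk.
    for entry in entries:
        if entry.startswith(spark_home) and not os.path.exists(entry):
            problems.append("missing PYTHONPATH entry: " + entry)
    return problems


if __name__ == "__main__":
    # Default mirrors the SPARK_HOME ENV line in the Dockerfile above.
    home = os.environ.get("SPARK_HOME", "/var/lang/lib/python3.8/site-packages/pyspark")
    for problem in check_pythonpath(home, os.environ.get("PYTHONPATH", "")):
        print(problem)
```

Because the Dockerfile sets an ENTRYPOINT, running the check in the image means overriding it, for example `docker run --rm --entrypoint python <image> /var/task/check_spark_env.py`, assuming the script was copied to that (hypothetical) path. An empty output means the PYTHONPATH entries resolve.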