Add instructions to build Spark image

parent b5f1cb16
## Spark image build instructions
The Spark image is located in [](, but it must be rebuilt in order to upgrade the
Spark or Hadoop version, or to enable additional components such as YARN integration.
### Build a custom Spark distribution
Building a Spark distribution from source is recommended, because the Hadoop jars bundled with the Spark 2.4 releases are very old. To build:
* Clone the [git repo](
* Run `git checkout branch-2.4` to move to a 2.4 release branch
* Run the following command to build the custom distribution:
`./dev/ --name spark-hadoop28 --pip --tgz -Phadoop-2.8 -Dhadoop.version=2.8.4 -Phive -Phive-thriftserver -Pkubernetes -DskipTests`
When the script finishes, it leaves a tgz file in the git repo directory. You can now proceed with the image build.
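Before moving on, it can help to confirm that the tarball was actually produced. The sketch below is a hypothetical helper, not part of the Spark repo. It assumes the tarball follows the `spark-<version>-bin-<name>.tgz` naming pattern, where `<name>` is the value passed to `--name`; adjust the pattern if your build names the file differently.

```shell
#!/bin/sh
# find_dist_tarball is a hypothetical helper (not shipped with Spark).
# It prints every tarball in REPO_DIR that matches the assumed
# spark-<version>-bin-<name>.tgz pattern and returns non-zero if none exist.
find_dist_tarball() {
    repo_dir="$1"   # directory where the build script was run
    dist_name="$2"  # value that was passed to --name
    found=1
    for f in "$repo_dir"/spark-*-bin-"$dist_name".tgz; do
        # If nothing matches, the glob stays literal, so test for a real file.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            found=0
        fi
    done
    return "$found"
}
```

From the git repo directory, `find_dist_tarball . spark-hadoop28` should print the path of the tarball created by the command above, assuming the naming pattern holds.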
### Build the Spark image
The Spark image is built with tooling created by the []( project. That tooling uses [CEKit]( to modularize the steps that create the container image, which makes the images easier to build.
To build a Spark image, follow the [instructions]( in the git repo, then push the new image to your container registry.