lakeFS gives Git-like capabilities over your MinIO storage, allowing you to coordinate with colleagues when working on your data.
In the following example, we will use lakeFS to create a branch on your storage, commit changes to it, and then merge it to the master branch.
This example runs Postgres inside a Docker container. A production installation requires a persistent Postgres instance.
We will install lakeFS locally on your development machine. For more installation options, see the lakeFS docs.
Create a docker-compose environment file for lakeFS, replacing <minio_access_key_id>, <minio_secret_access_key> and <minio_endpoint> with their values in your MinIO installation. Run the following commands:
LAKEFS_CONFIG_FILE=./.lakefs-env
echo "LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=<minio_access_key_id>" > $LAKEFS_CONFIG_FILE
echo "LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY=<minio_secret_access_key>" >> $LAKEFS_CONFIG_FILE
echo "LAKEFS_BLOCKSTORE_S3_ENDPOINT=<minio_endpoint>" >> $LAKEFS_CONFIG_FILE
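Because this file stores a secret key, you may prefer to create it with restrictive permissions and to catch placeholders that were left unreplaced. The following is a minimal sketch of that idea, not part of lakeFS: the function name write_lakefs_env is hypothetical, and it writes the same three variables as the commands above.

```shell
# Hypothetical helper (not part of lakeFS): writes the three settings used
# above to an env file.
# Usage: write_lakefs_env <file> <access_key_id> <secret_access_key> <endpoint>
write_lakefs_env() {
  file=$1 key_id=$2 secret=$3 endpoint=$4

  # Refuse empty values and values that still look like <...> placeholders.
  for value in "$key_id" "$secret" "$endpoint"; do
    case $value in
      ''|'<'*'>')
        echo "error: missing or placeholder value: '$value'" >&2
        return 1
        ;;
    esac
  done

  # A umask of 077 makes a newly created file readable by its owner only
  # (a file that already exists keeps its current mode).
  (
    umask 077
    {
      printf 'LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=%s\n' "$key_id"
      printf 'LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY=%s\n' "$secret"
      printf 'LAKEFS_BLOCKSTORE_S3_ENDPOINT=%s\n' "$endpoint"
    } > "$file"
  )
}
```

With this in place, the file could be created with a call such as write_lakefs_env ./.lakefs-env followed by your three MinIO values.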
Then start lakeFS:
curl https://compose.lakefs.io | docker-compose --env-file $LAKEFS_CONFIG_FILE -f - up
Browse to lakeFS to create an admin user: http://127.0.0.1:8000/setup
Take note of the generated access key and secret.
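If the setup page does not load right away, lakeFS may still be starting. A generic retry loop can poll until a probe command succeeds. This is a sketch under assumptions: wait_for is a hypothetical helper, and the probe you pass to it (for example an HTTP request against the setup URL above) is your choice.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
  attempts=$1 delay=$2
  shift 2
  i=1
  while :; do
    # Success as soon as the probe command exits with status 0.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Give up once the allowed number of attempts has been used.
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}
```

For instance, wait_for 30 2 curl -fsS http://127.0.0.1:8000/setup would probe the setup page up to 30 times, two seconds apart, assuming curl is installed and the page responds once the server is up.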
We will use the lakectl binary to perform lakeFS operations. Download the distribution suitable for your operating system, and extract the lakectl binary from the tar.gz archive. Put it somewhere in your $PATH and run lakectl --version to verify.
Then run the following command to configure lakectl, entering the credentials generated during setup (the values in the output below are placeholders):
lakectl config
# output:
# Config file /home/janedoe/.lakectl.yaml will be used
# Access key ID: <LAKEFS_ACCESS_KEY_ID>
# Secret access key: <LAKEFS_SECRET_KEY>
# Server endpoint URL: http://lakefs.example.com:8000/api/v1
Verify that lakectl can access lakeFS with the command:
lakectl repo list
If no error is displayed, you are good to go. Now let's set a MinIO alias for lakeFS:
mc alias set lakefs http://s3.local.lakefs.io <LAKEFS_ACCESS_KEY_ID> <LAKEFS_SECRET_KEY>
Create a bucket directly in your MinIO installation (myminio in the command below is the mc alias of that installation). Later we will use lakeFS to enable versioning on this bucket.
mc mb myminio/example-bucket
Create a repository in lakeFS:
lakectl repo create lakefs://example-repo s3://example-bucket
Create two example files:
echo "my first file" > myfile.txt
echo "my second file" > myfile2.txt
Copy the first file to your master branch, and commit:
mc cp ./myfile.txt lakefs/example-repo/master/
lakectl commit lakefs://example-repo@master -m "my first commit"
Now let's create a branch named branch1, and copy a file to it:
lakectl branch create lakefs://example-repo@branch1 --source lakefs://example-repo@master
mc cp ./myfile2.txt lakefs/example-repo/branch1/
List master and the branch to see that the new file is visible only in the branch, while the older file is visible in both the branch and master.
mc ls lakefs/example-repo/master
# only myfile.txt should be listed
mc ls lakefs/example-repo/branch1
# both files should be listed
Now let's commit the branch, and merge it back to master:
lakectl commit lakefs://example-repo@branch1 -m "my second commit"
lakectl merge lakefs://example-repo@branch1 lakefs://example-repo@master
Now both files are accessible through master:
mc ls lakefs/example-repo/master
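The branch, commit and merge steps above can be collected into one script. The sketch below uses only the commands shown in this walkthrough. The DRY_RUN, REPO and BRANCH variables and the run and branch_workflow helpers are hypothetical names introduced for illustration. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Sketch: the branch/commit/merge steps from this walkthrough as one script.
# REPO, BRANCH and DRY_RUN are hypothetical variables, not lakeFS settings.
REPO=${REPO:-example-repo}
BRANCH=${BRANCH:-branch1}
DRY_RUN=${DRY_RUN:-1}

# In dry-run mode, print the command (argument quoting is not preserved in
# the printout); otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Steps are chained with && so that the first failure stops the workflow.
branch_workflow() {
  run lakectl branch create "lakefs://$REPO@$BRANCH" --source "lakefs://$REPO@master" &&
  run mc cp ./myfile2.txt "lakefs/$REPO/$BRANCH/" &&
  run lakectl commit "lakefs://$REPO@$BRANCH" -m "my second commit" &&
  run lakectl merge "lakefs://$REPO@$BRANCH" "lakefs://$REPO@master" &&
  run mc ls "lakefs/$REPO/master"
}

branch_workflow
```

Printing the commands first makes it easy to review the repository and branch names before anything is written to lakeFS.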