@timosalm · Last active July 28, 2020
Using S3 compatible storage with SCDF

Introduction

S3 compatible storage is currently not supported by the S3 source and sink app starters available for Spring Cloud Data Flow. I added S3 compatible storage support to the new versions of the app starters based on Spring Cloud Function, which are available here; more information can be found here.

Archive files built on Jul 8, 2020 (Commit #c397df1) are available here:

http://scdf-applications-jars.cfapps.io/s3-sink-rabbit-3.0.0-SNAPSHOT.jar
http://scdf-applications-jars.cfapps.io/s3-sink-kafka-3.0.0-SNAPSHOT.jar
http://scdf-applications-jars.cfapps.io/s3-source-rabbit-3.0.0-SNAPSHOT.jar
http://scdf-applications-jars.cfapps.io/s3-source-kafka-3.0.0-SNAPSHOT.jar

UPDATE: The applications are now also available with the release of version 2020.0.0-M2 here, e.g.

https://repo.spring.io/libs-milestone-local/org/springframework/cloud/stream/app/s3-sink-rabbit/3.0.0-M2/s3-sink-rabbit-3.0.0-M2.jar
https://repo.spring.io/libs-milestone-local/org/springframework/cloud/stream/app/s3-sink-kafka/3.0.0-M2/s3-sink-kafka-3.0.0-M2.jar
https://repo.spring.io/libs-milestone-local/org/springframework/cloud/stream/app/s3-source-rabbit/3.0.0-M2/s3-source-rabbit-3.0.0-M2.jar
https://repo.spring.io/libs-milestone-local/org/springframework/cloud/stream/app/s3-source-kafka/3.0.0-M2/s3-source-kafka-3.0.0-M2.jar

Alternatively, you can build the applications yourself. For instructions on how to build the archives, see the documentation here.
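Before you can use the archives in a stream definition, you have to register them with your SCDF server, for example via the SCDF shell. A minimal sketch, assuming the RabbitMQ variants of the milestone release and the app names s3-source and s3-sink used in the definitions below:

app register --name s3-source --type source --uri https://repo.spring.io/libs-milestone-local/org/springframework/cloud/stream/app/s3-source-rabbit/3.0.0-M2/s3-source-rabbit-3.0.0-M2.jar
app register --name s3-sink --type sink --uri https://repo.spring.io/libs-milestone-local/org/springframework/cloud/stream/app/s3-sink-rabbit/3.0.0-M2/s3-sink-rabbit-3.0.0-M2.jar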

Stream definitions

Using the S3 source with compatible storage

See the official documentation here for all available options.

s3-source --s3.common.endpoint-url=https://example.com --s3.supplier.remote-dir=/source-bucket
 --cloud.aws.credentials.accessKey=YOUR-ACCESS-KEY --cloud.aws.credentials.secretKey=YOUR-SECRET-KEY 
 --cloud.aws.stack.auto=false --cloud.aws.region.static=eu-central-1 --outputType=application/octet-stream
 --JBP_CONFIG_SPRING_AUTO_RECONFIGURATION='{enabled: false}' --SPRING_PROFILES_ACTIVE=cloud 
  • s3.common.endpoint-url: The endpoint URL of your S3 compatible storage. If not set, the default AWS endpoint for the configured region is used.
  • s3.supplier.remote-dir: The source bucket.
  • cloud.aws.region.static: Has to be set to a valid AWS S3 region, even though it is not relevant for the compatible storage.
  • cloud.aws.stack.auto: Has to be set to false to disable automatic CloudFormation stack detection.
  • outputType: Has to be set to "application/octet-stream" because of a current issue where the outgoing message content type defaults to "application/json".
  • JBP_CONFIG_SPRING_AUTO_RECONFIGURATION, SPRING_PROFILES_ACTIVE: Required because of auto-reconfiguration conflicts. Starting with v2.0, SCDF and Skipper switched to the Java-CFEnv project to auto-configure the datasource and other services on Cloud Foundry; see the documentation here. IMPORTANT: For these environment variables to take effect, you have to deploy your stream with stream deploy --properties "deployer.*.cloudfoundry.use-spring-application-json=false" my-stream (see the full example after this list) or use another option described here.
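Putting the options together, creating and deploying the source as a stream via the SCDF shell could look like the following. This is a sketch: the stream name my-stream and the log sink consuming the data are illustrative, and the endpoint, bucket, and credentials are placeholders.

stream create --name my-stream --definition "s3-source --s3.common.endpoint-url=https://example.com --s3.supplier.remote-dir=/source-bucket
 --cloud.aws.credentials.accessKey=YOUR-ACCESS-KEY --cloud.aws.credentials.secretKey=YOUR-SECRET-KEY
 --cloud.aws.stack.auto=false --cloud.aws.region.static=eu-central-1 --outputType=application/octet-stream
 --JBP_CONFIG_SPRING_AUTO_RECONFIGURATION='{enabled: false}' --SPRING_PROFILES_ACTIVE=cloud | log"
stream deploy --properties "deployer.*.cloudfoundry.use-spring-application-json=false" my-stream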

Using the S3 sink with compatible storage

See the official documentation here for all available options.

s3-sink --s3.common.endpoint-url=https://example.com --s3.consumer.bucket=/target-bucket 
 --s3.consumer.key-expression=headers.file_name
 --cloud.aws.credentials.accessKey=YOUR-ACCESS-KEY --cloud.aws.credentials.secretKey=YOUR-SECRET-KEY 
 --cloud.aws.stack.auto=false --cloud.aws.region.static=eu-central-1
 --JBP_CONFIG_SPRING_AUTO_RECONFIGURATION='{enabled: false}' --SPRING_PROFILES_ACTIVE=cloud 
  • s3.common.endpoint-url: The endpoint URL of your S3 compatible storage. If not set, the default AWS endpoint for the configured region is used.
  • s3.consumer.bucket: The target bucket.
  • s3.consumer.key-expression: A SpEL expression that is evaluated to determine the S3 object key. If you connect an S3 source directly to an S3 sink, the source file name is available in headers.file_name.
  • cloud.aws.region.static: Has to be set to a valid AWS S3 region, even though it is not relevant for the compatible storage.
  • cloud.aws.stack.auto: Has to be set to false to disable automatic CloudFormation stack detection.
  • JBP_CONFIG_SPRING_AUTO_RECONFIGURATION, SPRING_PROFILES_ACTIVE: Required because of auto-reconfiguration conflicts. Starting with v2.0, SCDF and Skipper switched to the Java-CFEnv project to auto-configure the datasource and other services on Cloud Foundry; see the documentation here. IMPORTANT: For these environment variables to take effect, you have to deploy your stream with stream deploy --properties "deployer.*.cloudfoundry.use-spring-application-json=false" my-stream (see the combined example after this list) or use another option described here.
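Connecting the source directly to the sink yields a stream that copies objects from one bucket to another. A sketch under the same assumptions as above (the stream name s3-copy, endpoints, buckets, and credentials are placeholders; source and sink can also point to different endpoints):

stream create --name s3-copy --definition "s3-source --s3.common.endpoint-url=https://example.com --s3.supplier.remote-dir=/source-bucket
 --cloud.aws.credentials.accessKey=YOUR-ACCESS-KEY --cloud.aws.credentials.secretKey=YOUR-SECRET-KEY
 --cloud.aws.stack.auto=false --cloud.aws.region.static=eu-central-1 --outputType=application/octet-stream
 --JBP_CONFIG_SPRING_AUTO_RECONFIGURATION='{enabled: false}' --SPRING_PROFILES_ACTIVE=cloud
 | s3-sink --s3.common.endpoint-url=https://example.com --s3.consumer.bucket=/target-bucket --s3.consumer.key-expression=headers.file_name
 --cloud.aws.credentials.accessKey=YOUR-ACCESS-KEY --cloud.aws.credentials.secretKey=YOUR-SECRET-KEY
 --cloud.aws.stack.auto=false --cloud.aws.region.static=eu-central-1
 --JBP_CONFIG_SPRING_AUTO_RECONFIGURATION='{enabled: false}' --SPRING_PROFILES_ACTIVE=cloud"
stream deploy --properties "deployer.*.cloudfoundry.use-spring-application-json=false" s3-copy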

Hints

  • If you want to enable debug logging for all applications that are based on Spring Integration, you can deploy your stream with:
    stream deploy --properties "app.*.logging.level.org.springframework.integration=DEBUG" my-stream
    
    See https://dataflow.spring.io/docs/resources/faq/#debuglogs for more details.