Setting up Debezium 2.0+ with Confluent Schema Registry support

Author: zhli

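Debezium is an open-source, high-performance change data capture (CDC) platform. It monitors data changes in a database, including inserts, deletes, and updates, records them as change events, and publishes them to Kafka as a real-time stream.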

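By default, every event that Debezium publishes carries verbose schema information. The schema can be left out of the messages with the following converter settings: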

key.converter.schemas.enable = false
value.converter.schemas.enable = false

With the settings above, event size drops by roughly 80%.
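
To make the effect of `schemas.enable` concrete, here is a minimal sketch that compares the JSON size of one event with and without the schema envelope. The event and its schema are hypothetical and heavily simplified (a real Debezium envelope has more fields), so the percentage it prints is only illustrative and will not match the figure above exactly.

```python
import json

# Hypothetical, heavily simplified change event for a two-column table.
payload = {
    "before": None,
    "after": {"id": 1001, "email": "alice@example.com"},
    "op": "c",
    "ts_ms": 1700000000000,
}

# Simplified Kafka Connect schema describing one table row.
row_schema = {
    "type": "struct",
    "optional": True,
    "fields": [
        {"type": "int64", "optional": False, "field": "id"},
        {"type": "string", "optional": True, "field": "email"},
    ],
}

# Simplified schema for the whole event; the name is a made-up example.
schema = {
    "type": "struct",
    "optional": False,
    "name": "demo.inventory.customers.Envelope",
    "fields": [
        dict(row_schema, field="before"),
        dict(row_schema, field="after"),
        {"type": "string", "optional": False, "field": "op"},
        {"type": "int64", "optional": True, "field": "ts_ms"},
    ],
}


def encoded_size(obj):
    """Size in bytes of the compact JSON encoding of obj."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


# schemas.enable = true: the JSON converter wraps each message as {"schema", "payload"}.
with_schema = encoded_size({"schema": schema, "payload": payload})
# schemas.enable = false: only the payload is written.
without_schema = encoded_size(payload)
saving = 1 - without_schema / with_schema

print(f"with schema: {with_schema} bytes")
print(f"without schema: {without_schema} bytes")
print(f"saved: {saving:.0%}")
```

The schema is repeated in every single message, which is why dropping it pays off more the wider the table is.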

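Apache Avro is an open-source data serialization framework that provides a compact and efficient data exchange format. Serializing events with Avro can shrink them by a further 50%.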
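
Where the extra saving comes from can be sketched by hand-encoding a flat record the way Avro's binary format does: field names are not written at all, integers become zigzag varints, and strings are length-prefixed. The sketch below also prepends the 5-byte header used by Confluent's serializers (a zero magic byte plus a 4-byte schema ID), which is how a consumer finds the schema in the registry. The record layout and the schema ID 42 are assumptions for illustration; real Debezium events are nested and use unions for nullable fields, and in practice the Avro converter does all of this for you.

```python
import json
import struct


def zigzag_varint(n: int) -> bytes:
    """Encode a 64-bit integer the way Avro encodes int and long values."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes map to small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def avro_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_customer(record: dict) -> bytes:
    """Encode a hypothetical record (id: long, email: string, ts_ms: long).

    Fields are written in schema order; no field names appear in the output.
    """
    return (
        zigzag_varint(record["id"])
        + avro_string(record["email"])
        + zigzag_varint(record["ts_ms"])
    )


def confluent_frame(schema_id: int, avro_body: bytes) -> bytes:
    """Confluent wire format: magic byte 0, big-endian 4-byte schema ID, Avro body."""
    return struct.pack(">bI", 0, schema_id) + avro_body


record = {"id": 1001, "email": "alice@example.com", "ts_ms": 1700000000000}
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")
avro_bytes = confluent_frame(42, encode_customer(record))  # 42 is a made-up schema ID

print(f"JSON without schema: {len(json_bytes)} bytes")
print(f"Avro with registry header: {len(avro_bytes)} bytes")
```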

Starting with 2.0, Debezium no longer ships with built-in Confluent Schema Registry support. If you need it, you have to build your own Debezium image.

The official image sources can be downloaded from GitHub; here we build on connect-base/2.3.

The following Confluent libraries need to be added:

  • kafka-connect-avro-converter
  • kafka-connect-avro-data
  • kafka-avro-serializer
  • kafka-schema-serializer
  • kafka-schema-registry-client
  • common-config
  • common-utils

Pay close attention to versions, because mismatched versions may turn out to be incompatible. In this setup:

  1. Kafka uses the image bitnami/kafka:3.4.0;
  2. Confluent Schema Registry uses the image confluentinc/cp-schema-registry:7.0.1.
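
Changing any of these versions also means supplying new checksums, because each `docker-maven-download` call in the Dockerfile takes the MD5 of the JAR as its last argument. A small helper along the following lines could compute them. The repository URL layout is an assumption about where the Confluent artifacts are hosted, and the function names are made up for this sketch; verify the URL against the download script before relying on it.

```python
import hashlib

# Assumption: Confluent artifacts follow the usual Maven layout under this base URL.
CONFLUENT_REPO = "https://packages.confluent.io/maven"


def confluent_jar_url(artifact: str, version: str, repo: str = CONFLUENT_REPO) -> str:
    """Build the expected download URL of an io.confluent artifact."""
    return f"{repo}/io/confluent/{artifact}/{version}/{artifact}-{version}.jar"


def md5_of_bytes(data: bytes) -> str:
    """Hex MD5 digest of an in-memory blob."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large JARs are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Print where the converter JAR for the version used in this post would be fetched from.
print(confluent_jar_url("kafka-connect-avro-converter", "7.0.1"))
```

After downloading a JAR, `md5_of_file("kafka-connect-avro-converter-7.0.1.jar")` gives the value to paste into the corresponding line of the Dockerfile.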

The Dockerfile is as follows:

FROM debezium/kafka:2.3.2.Final

LABEL maintainer="Debezium Community"

USER root
RUN microdnf -y install libaio && microdnf clean all

USER kafka

EXPOSE 8083 8778
VOLUME ["/kafka/data","/kafka/logs","/kafka/config"]

COPY --chown=kafka:kafka docker-entrypoint.sh /
COPY --chown=kafka:kafka log4j.properties $KAFKA_HOME/config/log4j.properties
COPY --chown=kafka:kafka docker-maven-download.sh /usr/local/bin/docker-maven-download
RUN chmod +x /usr/local/bin/docker-maven-download
RUN chmod +x /docker-entrypoint.sh
#
# Set up the plugins directory ...
#
ENV KAFKA_CONNECT_PLUGINS_DIR=$KAFKA_HOME/connect \
    EXTERNAL_LIBS_DIR=$KAFKA_HOME/external_libs \
    CONNECT_PLUGIN_PATH=$KAFKA_CONNECT_PLUGINS_DIR \
    MAVEN_DEP_DESTINATION=$KAFKA_HOME/libs \
    JOLOKIA_VERSION=1.7.2 \
    CONFLUENT_VERSION=7.0.1 \
    AVRO_VERSION=1.10.1 \
    GUAVA_VERSION=31.0.1-jre

RUN mkdir "$KAFKA_CONNECT_PLUGINS_DIR" "$EXTERNAL_LIBS_DIR"

#
# The `docker-entrypoint.sh` script will automatically discover the child directories
# within the $KAFKA_CONNECT_PLUGINS_DIR directory (e.g., `/kafka/connect`), and place
# all of the files in those child directories onto the Java classpath.
#
# The general recommendation is to create a separate child directory for each connector
# (e.g., "debezium-connector-mysql"), and to place that connector's JAR files
# and other resource files in that child directory.
#
# However, use a single directory for connectors when those connectors share dependencies.
# This will prevent the classes in the shared dependencies from appearing in multiple JARs
# on the classpath, which results in arcane NoSuchMethodError exceptions.
#

RUN docker-maven-download central org/jolokia jolokia-jvm "$JOLOKIA_VERSION" d489d62d1143e6a2e85a869a4b824a67

RUN docker-maven-download confluent kafka-connect-avro-converter "$CONFLUENT_VERSION" fd03a1436f29d39e1807e2fb6f8e415a && \
    docker-maven-download confluent kafka-connect-avro-data "$CONFLUENT_VERSION" d27f30e9eca4ef1129289c626e9ce1f1 && \
    docker-maven-download confluent kafka-avro-serializer "$CONFLUENT_VERSION" c72420603422ef54d61f493ca338187c && \
    docker-maven-download confluent kafka-schema-serializer "$CONFLUENT_VERSION" 9c510db58119ef66d692ae172d5b1204 && \
    docker-maven-download confluent kafka-schema-registry-client "$CONFLUENT_VERSION" 7449df1f5c9a51c3e82e776eb7814bf1 && \
    docker-maven-download confluent common-config "$CONFLUENT_VERSION" aab5670de446af5b6f10710e2eb86894 && \
    docker-maven-download confluent common-utils "$CONFLUENT_VERSION" 74bf5cc6de2748148f5770bccd83a37c && \
    docker-maven-download central org/apache/avro avro "$AVRO_VERSION" 35469fee6d74ecbadce4773bfe3a204c && \
    docker-maven-download central com/google/guava guava "$GUAVA_VERSION" bb811ca86cba6506cca5d415cd5559a7

ENV DEBEZIUM_VERSION="2.3.2.Final" \
    MAVEN_REPOS_ADDITIONAL="" \
    MAVEN_DEP_DESTINATION=$KAFKA_CONNECT_PLUGINS_DIR \
    MYSQL_MD5=dfd11bddc043c935ddb68c8207eafd75 \
    SQLSERVER_MD5=90ce09673bc2a862b8867b0f1607fb52 \
    KCRESTEXT_MD5=c28e8184a234db32b4c32780ba789c10 \
    SCRIPTING_MD5=0726417249e59e375404c1b670e9a237

RUN \
    docker-maven-download debezium mysql "$DEBEZIUM_VERSION" "$MYSQL_MD5" && \
    docker-maven-download debezium sqlserver "$DEBEZIUM_VERSION" "$SQLSERVER_MD5" && \
    docker-maven-download debezium-optional connect-rest-extension "$DEBEZIUM_VERSION" "$KCRESTEXT_MD5" && \
    docker-maven-download debezium-optional scripting "$DEBEZIUM_VERSION" "$SCRIPTING_MD5"

ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["start"]

The Dockerfile above only installs support for MySQL and SQL Server. To support other databases, refer to the official Dockerfile.
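
Once the image is running, a connector still has to be told to use the Avro converter. The sketch below assembles a registration payload for a MySQL connector with the converter and registry URL set for both key and value. Every host name, credential, topic prefix, and the connector name are hypothetical placeholders, and the helper function is made up for this example; only the property keys and class names are the ones Kafka Connect, Debezium 2.x, and the Confluent converter expect.

```python
import json

# Hypothetical address of the Schema Registry container.
SCHEMA_REGISTRY_URL = "http://schema-registry:8081"


def avro_converter_settings(registry_url: str) -> dict:
    """Converter properties for key and value, both pointing at the same registry."""
    settings = {}
    for side in ("key", "value"):
        settings[f"{side}.converter"] = "io.confluent.connect.avro.AvroConverter"
        settings[f"{side}.converter.schema.registry.url"] = registry_url
    return settings


# All values below are placeholders; adjust them to your environment.
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "topic.prefix": "demo",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.demo",
        **avro_converter_settings(SCHEMA_REGISTRY_URL),
    },
}

print(json.dumps(connector, indent=2))
```

The printed JSON can be sent with a POST request to the `/connectors` endpoint of the Kafka Connect REST API, which listens on port 8083 exposed by the image.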