scriptsbta.blogg.se

Using kafka tool create dc1 to dc2

In our previous blog, A Case for MirrorMaker 2, we discussed how enterprises rely on Apache Kafka as an essential component of their data pipelines and require that data availability and durability guarantees cover entire cluster or datacenter failures. As we discussed there, the current Apache Kafka solution with MirrorMaker 1 has known limitations in providing an enterprise-managed disaster recovery solution. MM2 (KIP-382) fixes the limitations of MirrorMaker 1 with the ability to dynamically change configurations, keep topic properties in sync across clusters, and improve performance significantly by reducing rebalances to a minimum. Moreover, handling active-active clusters and disaster recovery are use cases that MM2 supports out of the box.

MM2 is based on the Kafka Connect framework and can be viewed at its core as a combination of a Kafka source and sink connector. In a typical Connect configuration, the source-connector writes data into a Kafka cluster from an external source, and the sink-connector reads data from a Kafka cluster and writes to an external repository. As with MM1, the pattern of remote-consume and local-produce is recommended, so in the simplest source-target replication pair, the MM2 Connect cluster is paired with the target Kafka cluster. Connect internally always needs a Kafka cluster to store its state; this is called the "primary" cluster, which in this case would be the target cluster. In settings with multiple clusters across multiple datacenters in active-active configurations, it would be prohibitive to have an MM2 cluster for each target cluster; in MM2, only one Connect cluster is needed for all the cross-cluster replications between a pair of datacenters.

Now, if we simply took a Kafka source and sink connector and deployed them in tandem to do replication, the data would need to hop through an intermediate Kafka cluster. MM2 avoids this unnecessary data copying by a direct passthrough from source to sink. As in MM1, the topic name at the source is typically the same at the target cluster, and the topic is automatically created in the downstream cluster.
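A pair-wise replication flow such as dc1 to dc2 can be described in a single properties file and launched with the dedicated MirrorMaker 2 driver that ships with Kafka. This is a minimal sketch; the cluster aliases and broker addresses below are illustrative placeholders, not values from this post:

```
# mm2.properties - minimal one-way dc1 -> dc2 replication sketch.
# Cluster aliases (dc1, dc2) and broker addresses are placeholders.
clusters = dc1, dc2
dc1.bootstrap.servers = dc1-broker1:9092,dc1-broker2:9092
dc2.bootstrap.servers = dc2-broker1:9092,dc2-broker2:9092

# Enable the dc1 -> dc2 replication flow and replicate all topics.
dc1->dc2.enabled = true
dc1->dc2.topics = .*

# Keep topic configuration in sync across clusters.
sync.topic.configs.enabled = true

# Replication factor for topics MM2 creates in the target cluster.
replication.factor = 3
```

This configuration would be run with `bin/connect-mirror-maker.sh mm2.properties`, which stands up the Connect workers internally; the same flow can alternatively be deployed as connectors on an existing Connect cluster.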









