A brief description of Apache Kafka
Posted By : Lokesh Babu Sharma | 30-Aug-2019
Introduction
Apache Kafka is a stream-processing software platform built around a publish/subscribe messaging system. It was originally developed at LinkedIn, open-sourced in early 2011, and later donated to the Apache Software Foundation. Kafka is written in Java and Scala. The project aims to provide a durable, fast, scalable, and fault-tolerant publish-subscribe messaging system for handling real-time data feeds. Its storage layer is essentially a massively scalable publish/subscribe message queue, which makes it highly valuable for enterprise infrastructures that process streaming data. Kafka can also connect to external systems to import and export data via Kafka Connect, and it provides Kafka Streams, a Java stream-processing library.
Generally, Kafka is used for two broad classes of applications: building real-time streaming data pipelines that reliably move data between systems or applications, and building real-time streaming applications that transform or react to streams of data.
The core concepts of Apache Kafka are:
1. Kafka runs as a cluster on one or more servers that can span multiple datacenters.
2. Kafka stores streams of records in categories called topics.
3. Each record consists of a key, a value, and a timestamp.
Apache Kafka Architecture
Kafka has four core APIs:
1. Producer API - allows an application to publish streams of records to one or more Kafka topics (a minimal Java sketch follows this list).
2. Consumer API - allows an application to subscribe to one or more topics and process the streams of records produced to them.
3. Streams API - lets an application act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more topics.
4. Connector API - allows building and running reusable producers and consumers that connect Kafka topics to existing applications or data systems.
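As a concrete illustration of the Producer API, here is a minimal Java sketch. It assumes a broker listening on localhost:9092 and a hypothetical topic named "events"; the serializer settings and topic name are illustrative, not taken from this article.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed local broker address; adjust for your cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish a single record to the hypothetical "events" topic.
            producer.send(new ProducerRecord<>("events", "key-1", "hello kafka"));
        } // close() flushes any buffered records before returning
    }
}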
A Kafka cluster consists of several components:
1. Kafka Broker - A Kafka cluster has one or more servers (Kafka brokers), across which load is balanced. Because brokers are stateless, they use ZooKeeper to maintain cluster state. A single Kafka broker instance can handle thousands of reads and writes per second, and each broker can handle terabytes of messages without a loss in performance. Kafka topics are divided into partitions, and each partition can be placed on a separate machine, allowing multiple consumers to read from a topic in parallel.
2. Kafka ZooKeeper - ZooKeeper is a top-level Apache project that acts as a centralized service: it maintains configuration data and naming, and provides robust, flexible synchronization within distributed systems. ZooKeeper keeps track of the status of Kafka cluster nodes as well as Kafka topics, partitions, and so on. The brain of the whole system is the ZooKeeper Atomic Broadcast (ZAB) protocol.
3. Kafka Producer - Producers push data to brokers. When a new broker starts, all producers discover it and automatically begin sending messages to it. A Kafka producer does not wait for acknowledgments from the broker; it sends messages as fast as the broker can handle them.
4. Kafka Consumers - Because the Kafka broker is stateless, each consumer keeps track of how many messages it has consumed. The consumer issues an asynchronous pull request to the broker so that a buffer of bytes is ready to consume. When a consumer acknowledges a particular message offset, this implies it has consumed all prior messages (a rough sketch of this pull model follows).
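To make the pull model and offset tracking above concrete, here is a rough Java sketch using the Consumer API. The group id, topic name, and auto-commit setting are assumptions for illustration; with auto-commit enabled, the client periodically records the consumed offsets for the group.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "demo-group");              // hypothetical consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "true");          // offsets committed periodically

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                // The consumer pulls batches of records from the broker.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}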
Fundamental Concepts
5. Kafka Topics - A topic defines a stream of a particular type or classification of data. Producers publish streams of data to topics, and consumers consume that data topic by topic. Within a Kafka cluster, topic names must be unique, and there is no limit on the number of topics. Once published, data or messages cannot be updated.
6. Partitions in Kafka - Topics are broken into partitions and replicated across brokers. Within a partition, messages are stored in sequence, and each is assigned an incremental id called an offset. These offsets are meaningful only within that partition.
7. Topic Replication Factor - Each topic is replicated on other brokers. If a broker goes down, the topic's replicas on another broker keep the data available (see the sketch after this list for how partitions and the replication factor are specified).
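Topics, partitions, and the replication factor can also be managed programmatically. The following Java sketch uses Kafka's AdminClient to create a topic; the topic name, the three partitions, and the replication factor of 1 (enough for the single-broker setup shown below) are assumptions for illustration.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker

        try (AdminClient admin = AdminClient.create(props)) {
            // "events": 3 partitions, replication factor 1
            // (a factor of 1 suits a single-broker test cluster).
            NewTopic topic = new NewTopic("events", 3, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}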
Setting Up Apache Kafka
1. First, install the JDK using the following commands:
sudo apt-get update
sudo apt-get install default-jdk
2. Download and set up Kafka using the following commands:
wget http://www-us.apache.org/dist/kafka/2.2.1/kafka_2.12-2.2.1.tgz
Extract the archive file:
tar xzf kafka_2.12-2.2.1.tgz
Move it into place:
mv kafka_2.12-2.2.1 /usr/local/kafka
3. Change to the Kafka installation directory:
cd /usr/local/kafka
4. First, start ZooKeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
5. Then start the Kafka broker:
bin/kafka-server-start.sh config/server.properties
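6. Optionally, verify the installation by creating a test topic and exchanging a message with the console clients bundled with Kafka (the topic name "test" here is just an example):
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning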
Conclusion
Apache Kafka is one of the best publish/subscribe messaging systems available, and it is quite simple to set up and use.
About Author
Lokesh Babu Sharma
Lokesh works on the backend team. He believes in smart work, is good at programming, and loves to play with code. He has expertise in Java.