Setting up your own Apache Kafka cluster with Vagrant - Step by Step

This step-by-step walk-through will guide you through building an Apache Kafka cluster from the ground up, with vanilla Debian as a base on Vagrant boxes.


Apache Kafka is a distributed publish-subscribe messaging system that aims to be fast, scalable, and durable. If you want to just get up and running quickly with a cluster of Vagrant virtual machines configured with Kafka, take a look at this awesome blog post. It sets up all the VMs for you and configures each node in the cluster, in one fell swoop.

However, if you want to learn how to install and configure a Kafka cluster yourself, utilizing your own Vagrant boxes, then read on. This step-by-step walk-through will guide you through building a Kafka cluster from the ground up, with vanilla Debian as a base. Kafka requires Apache Zookeeper, a service that coordinates distributed applications. In this walk-through, we will set up our first box from scratch. We will then package that box and use it as the base box for the other nodes in the cluster. When we’re finished, we’ll have a fully functional 3-node Zookeeper and Kafka cluster. It would probably be a better practice to automate this via existing Chef recipes, but that’s hardly walk-through material. We are going to do it the simple, long-winded way. And I think you will find that it isn’t too painful. Onward!

Part I - Setting up a single Zookeeper/Kafka node, starting from a Vagrant base box

  1. Download and install VirtualBox from virtualbox.org. Note: This walk-through uses a Vagrant base box that requires VirtualBox 4.2.10. If you already have Vagrant configured to work with VMWare, there is a VMWare Fusion version of the same base box. I will point it out in step 3 below.

  2. Download and install Vagrant from vagrantup.com

  3. Initialize a new Vagrant box. This particular box is vanilla Debian from Puppet Labs. I recommend creating it in a directory with a name that accurately describes what the box represents. If you are new to Vagrant, it’s easy to get carried away and wind up with an over-abundance of VMs on your machine.

mkdir debian-cluster-node-1
cd debian-cluster-node-1
vagrant init debian-cluster-node-1 http://puppet-vagrant-boxes.puppetlabs.com/debian-70rc1-x64-vbox4210.box

Or, if you’re using Vagrant with VMWare:

vagrant init debian-cluster-node-1 http://puppet-vagrant-boxes.puppetlabs.com/debian-70rc1-x64-vf503.box

This will create a Vagrantfile in the directory. You use this file to configure your VM.

  4. Edit the Vagrantfile to your liking. It’s a good idea to bump up the memory; 2048 MB should be sufficient.
config.vm.provider :virtualbox do |vb|
  vb.customize ["modifyvm", :id, "--memory", "2048"]
end

The only other setting of note is the private IP. This allows the host (your computer’s OS) and other VMs to access your new Vagrant box via a local network IP address.

Find the line:

# config.vm.network :private_network, ip: "192.168.33.10"

Uncomment it, and change the IP address if you feel like it, otherwise just leave it as is. I set mine to 192.168.33.21. I will be referring to that IP address throughout this walk-through.
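Once the box is up (which happens in the next step), you can sanity-check the private network from your host with a quick ping, assuming you kept the 192.168.33.21 address used here:

# run this from your host OS, not inside the VM
ping -c 3 192.168.33.21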

  5. Set up the Vagrant box. Start the box:

vagrant up

The first time takes quite a while; it needs to download and unpack the box first.

Log in to the box:

vagrant ssh

Install dependencies (you only need Java, and you might want to install a text editor too):

sudo apt-get update
sudo apt-get install openjdk-7-jdk
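A quick check that the JDK actually landed; the exact version string will vary, but it should report an OpenJDK 1.7 build:

# verify the Java install before moving on
java -version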

For the following steps, change to the root user (sudo su).

  6. Download, build, and install Kafka. I’ve had issues trying to get things up and running with just the binary download, so we’ll build from source. Even the Kafka Quick Start tells you to build from source, so that’s what we’re going to do here. Don’t worry, it’s easy. Note: You don’t have to install it in /usr/local/kafka. You can put it wherever you want.

wget https://archive.apache.org/dist/kafka/kafka-0.8.0-beta1-src.tgz
mkdir /usr/local/kafka
tar -zxvf kafka-0.8.0-beta1-src.tgz
cd kafka-0.8.0-beta1-src
./sbt update
./sbt package
./sbt assembly-package-dependency
cd ../
mv kafka-0.8.0-beta1-src /usr/local/kafka
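As a sanity check that the build and the move landed where the later steps expect, list the bundled scripts; you should see kafka-server-start.sh along with the console producer and consumer scripts used below:

# confirm the scripts are where the rest of the walk-through expects them
ls /usr/local/kafka/kafka-0.8.0-beta1-src/bin/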

  7. Install Zookeeper. Note: You don't have to install it in /usr/local/zookeeper. You can put it wherever you want.

wget http://apache.claz.org/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
mkdir /usr/local/zookeeper
tar -zxvf zookeeper-3.4.6.tar.gz --directory /usr/local/zookeeper
cp /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo_sample.cfg /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo.cfg
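The sample config we just copied already sets the client port that Kafka will connect to later (2181). A quick grep confirms it before we start editing:

# should print clientPort=2181
grep clientPort /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo.cfg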

  8. Configure Zookeeper. Before configuring, create a directory for the Zookeeper data:

mkdir -p /var/zookeeper/data

Edit the Zookeeper configuration file, /usr/local/zookeeper/zookeeper-3.4.6/conf/zoo.cfg.

Change the dataDir property to the directory you created above:

dataDir=/var/zookeeper/data

Find the list of servers that’s commented out. If these lines aren’t there, add them.

#server.1=zookeeper1:2888:3888
#server.2=zookeeper2:2888:3888
#server.3=zookeeper3:2888:3888

Uncomment the server.1 property, and change “zookeeper1” to the private IP address that you assigned to this VM:

server.1=192.168.33.21:2888:3888

Important step, often forgotten! We need to create a myid file in the data directory. Zookeeper uses a file named “myid” to identify itself within the cluster. It holds a single number, 1-255. Let’s set it to 1:

echo "1" > /var/zookeeper/data/myid

  9. Configure Kafka. If you followed the above installation instructions, the config directory will be here: /usr/local/kafka/kafka-0.8.0-beta1-src/config

Edit the server.properties file. Take note of the broker.id value. Each Kafka instance will need to have a unique broker.id, just as each Zookeeper instance needs to have a distinct value in the myid file. Let’s set this to 1:

broker.id=1

Uncomment #host.name=localhost and set it to the private IP address of the VM:

host.name=192.168.33.21

Locate the zookeeper.connect property. The default setting is fine, but we will be adding more nodes as we build up the cluster. Change “localhost” to the IP address of the VM:

zookeeper.connect=192.168.33.21:2181

  10. Test the current setup. You probably want to add these to your ~/.bash_profile first:

export ZK_HOME=/usr/local/zookeeper/zookeeper-3.4.6/
export KAFKA_HOME=/usr/local/kafka/kafka-0.8.0-beta1-src/
export PATH=$ZK_HOME/bin:$KAFKA_HOME/bin:$PATH
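If you do add them to ~/.bash_profile, remember they won’t be visible in your current shell until you reload it (or log out and back in):

# pick up the new ZK_HOME, KAFKA_HOME, and PATH values in the current shell
source ~/.bash_profile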

Start Zookeeper:

sudo $ZK_HOME/bin/zkServer.sh start

Start Kafka:

sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &

Test Kafka. List topics (you should not have any to start with):

$KAFKA_HOME/bin/kafka-list-topic.sh --zookeeper 192.168.33.21:2181

Create a new topic:

$KAFKA_HOME/bin/kafka-create-topic.sh --zookeeper 192.168.33.21:2181 --replica 1 --partition 1 --topic topic-1

Produce messages to that topic from the console

$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.21:9092 --topic topic-1
Hi
My
Name
Is
Kafka

(ctrl-c to kill the console producer)

Run the console consumer to verify that the messages are there for the new topic:

$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper 192.168.33.21:2181 --topic topic-1 --from-beginning

You should see the output:

Hi
My
Name
Is
Kafka

Assuming that everything works, it’s time to package up this box so that we can use it as our new base box for the other VMs in the cluster.

On your host, find the name of your current VM:

VBoxManage list vms

Mine happens to be “vagrant_default_1399123653833_13594”

Now package it up into a box:

vagrant package --base vagrant_default_1399123653833_13594 --output debian-cluster.box

Put the box in a more easily recognizable location.

mkdir ~/boxes
mv debian-cluster.box ~/boxes
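A quick way to confirm the package step produced something sensible; the .box file is just an archive of the whole VM, so expect it to be fairly large:

# check that the packaged box exists and has a plausible size
ls -lh ~/boxes/debian-cluster.box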
  11. Shut down the VM:

vagrant halt

Part II - Adding new nodes to the cluster from the newly created base box

  1. Make a directory for a new cluster node and cd to it. “debian-cluster-node-2” sounds good to me.

vagrant init debian-cluster-node-2 ~/boxes/debian-cluster.box

  2. Edit the Vagrantfile; do NOT overwrite it with the Vagrantfile from your other box. Set the memory to 2048 and set the private IP address to something different this time. I will use this: 192.168.33.22

  3. Start up the new box and log in

vagrant up
vagrant ssh
  4. Edit the Kafka config settings. If you set $KAFKA_HOME in your ~/.bash_profile before packaging the box in Part I of this walk-through, the file will be here: $KAFKA_HOME/config/server.properties

Set the following properties:

broker.id=2
host.name=192.168.33.22

Leave the Zookeeper settings alone for now.

  5. In another terminal window, start your first Vagrant box up again and log in (Cluster Node 1):
vagrant up
vagrant ssh
  6. Start Zookeeper and Kafka:
sudo $ZK_HOME/bin/zkServer.sh start
sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &
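Optionally, check that Zookeeper came up cleanly before starting Kafka. At this point only one server entry is configured, so it will typically report standalone mode:

# quick health check on the Zookeeper instance
sudo $ZK_HOME/bin/zkServer.sh status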
  7. Go back to your newly created VM for your second cluster node (Cluster Node 2) and start Kafka:

sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &

That’s it! Your Kafka servers are now clustered together. To test, go back to the terminal window for node 1. Produce some messages to the topic that you created earlier, but this time use your new VM as the broker:
$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.22:9092 --topic topic-1
Hello
From
Broker 2

(ctrl-c)

Check to see that your messages were successfully produced:

$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper 192.168.33.21:2181 --topic topic-1 --from-beginning

You should be able to produce messages to either broker now, or you can pass in both brokers to the console producer:

$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.21:9092,192.168.33.22:9092 --topic topic-1

What about Zookeeper?

Zookeeper uses a “majority rule” strategy to make its decisions. If we were to set up a 2-server Zookeeper cluster, and 1 server died, then there would only be 1 out of 2 remaining, which is not enough to be a “majority.” With 3 servers, any 1 can fail and the remaining 2 out of 3 still form a majority. See this post for a better explanation.

Now let’s add a third node so that we can configure a 3-node Zookeeper cluster. Follow steps 1-3 above, but name this node “debian-cluster-node-3”, and give it a different private IP in the Vagrantfile. I will use 192.168.33.23. At step 4, we’ll do things a little differently, so come back here when you’ve finished steps 1-3.

  4. Edit the Kafka server properties, $KAFKA_HOME/config/server.properties. Just as we did for the second node, we set the broker.id and host.name properties:
broker.id=3
host.name=192.168.33.23

This time, since we will have a 3-node Zookeeper cluster, we will also edit the zookeeper.connect property:

zookeeper.connect=192.168.33.21:2181,192.168.33.22:2181,192.168.33.23:2181

At this time, go back and edit the server.properties file in your other two boxes and set the zookeeper.connect property to be the same as what you have here.
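If you’d rather not hand-edit all three files, here is a minimal sketch of the same change using sed. It assumes the install paths and $KAFKA_HOME setting used throughout this walk-through; double-check the file afterwards:

# run on each of the three boxes
sudo sed -i 's|^zookeeper.connect=.*|zookeeper.connect=192.168.33.21:2181,192.168.33.22:2181,192.168.33.23:2181|' $KAFKA_HOME/config/server.properties
# confirm the change took effect
grep '^zookeeper.connect=' $KAFKA_HOME/config/server.properties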

  5. Edit the Zookeeper config (for all servers). We ignored this step when setting up the second node in the cluster, because we didn’t have enough servers for a proper Zookeeper cluster yet. We’re going to have to go back and take care of that now.

In all three of your servers, open up the $ZK_HOME/conf/zoo.cfg file, and make sure you have the following:

server.1=192.168.33.21:2888:3888
server.2=192.168.33.22:2888:3888
server.3=192.168.33.23:2888:3888
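If some of these entries are missing, one way to append the whole block in one go is sketched below. It is only a convenience, so remove or comment out any existing server.N lines first to avoid duplicates, and adjust the path if you changed $ZK_HOME:

# run with sudo on each of the three boxes
sudo tee -a $ZK_HOME/conf/zoo.cfg <<EOF
server.1=192.168.33.21:2888:3888
server.2=192.168.33.22:2888:3888
server.3=192.168.33.23:2888:3888
EOF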
  6. Set the myid file for the second and third servers. Remember that one-line myid file we created on the first box? We need to do the same thing for the second and third servers, or our Zookeeper cluster will not work. Since we used “1” for the first server, let’s keep it simple for the other servers.

On your second server:

echo "2" > /var/zookeeper/data/myid

On your third server:

echo "3" > /var/zookeeper/data/myid

  7. Shut them all down, hurry! Ok, no hurry, but let’s shut down all the boxes and then bring them up one at a time, just to be sure we’re starting fresh.

For each VM

exit
vagrant halt
  8. Start them all up again:
vagrant up
vagrant ssh
  9. Start Zookeeper and Kafka on each server, Zookeeper first. It’s a good idea to start up all the Zookeeper instances before starting Kafka, so for each VM:

sudo $ZK_HOME/bin/zkServer.sh start

Now start Kafka on each node:

sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &

Your cluster should be in full swing now!
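With all three Zookeeper instances running, zkServer.sh status on each box is a handy way to confirm the ensemble actually formed; one node should report leader and the other two follower, rather than standalone:

# run on each of the three boxes
sudo $ZK_HOME/bin/zkServer.sh status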

Test again with the console producer, this time using the third node as the broker.

$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.23:9092 --topic topic-1
Hello
From
Broker
3

(ctrl-c)

And then use the console consumer to read the topic. This time, use one of your new Zookeeper nodes for the --zookeeper argument:

$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper 192.168.33.23:2181 --topic topic-1 --from-beginning

Now let’s create a new replicated topic and produce some messages to it.

$KAFKA_HOME/bin/kafka-create-topic.sh --zookeeper 192.168.33.22:2181 --replica 3 --partition 1 --topic replicated-topic-1
$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.23:9092 --topic replicated-topic-1
I
Am
A
Replicated
Topic

Now consume the new topic from one of your other servers:

$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper 192.168.33.21:2181 --topic replicated-topic-1 --from-beginning

Play around with producing to different brokers and consuming with different Zookeeper nodes. Hopefully, it all works!

You can do A LOT with Zookeeper and Kafka. The purpose of this walk-through is just to get you to a point where you can be ready to explore all of Kafka’s goodness within a clustered environment. For more information, please read the documentation.

http://kafka.apache.org/documentation.html
http://zookeeper.apache.org/doc/r3.4.6/

Cheers!
