Let’s talk about EnMasse : the open source “Messaging as a Service”

After speaking at the Red Hat Summit about JBoss AMQ and Apache Kafka using the EnMasse project, I have some weeks ahead that will be rich in sessions about this “Messaging as a Service” platform.

First of all, I’ll have a meetup on May 22nd in Naples organized by the DevDay community. It will be all about messaging (and I’m not going to speak about WhatsApp, Hangouts, … :-)) and how we are developing a “Messaging as a Service” solution running on Kubernetes and OpenShift : its name is EnMasse.

The other session will be on June 5th in Milan during an IoT meetup organized by the AgileIoT community at the Microsoft House. There, I’ll speak about EnMasse again and how it “democratizes” the IoT, giving you a fully open source solution for it : in this case I’ll show how this “Messaging as a Service” platform can run in the Azure cloud as well.

So … if you want to know more about EnMasse just pick one of these events … or both ! 🙂

 

The “impact of an individual”

Monday, May 1st, 2017 … the alarm is ringing … it’s 3:00 AM … the time has arrived … let’s wake up … finally the Red Hat Summit is going to start !

After submitting my proposal and having it accepted for a session with Christian Posta about JBoss AMQ and Apache Kafka … a few months have passed and one of the most thrilling experiences of my working life is less than 7,000 km away from me.

It’s 7:00 AM … the plane takes off and my mind goes back to last year when, as a new hire, I was reading a lot of emails about the Red Hat Summit materials and demo preparation from other “veteran” employees … I couldn’t imagine that after just one year I would be one of the guys at the next summit !

Naples … Frankfurt … and finally Boston. I start to breathe the summit just outside the airport, with a lot of Red Hat advertising boards; the company has invested a lot of money and time to give the best experience to employees, partners and customers, so that they can engage with each other, find ways to collaborate and do business, but starting from something completely free … the freedom to think, to develop and to share ideas and projects with the community.

Yes ! … because this is Red Hat … no NDAs … no restrictions … no limits … it’s all up to you … you are completely free … free to propose, to design, to collaborate and … finally to have an “impact as an individual” (cit.)

Yes ! … every person as an individual has his freedom and many individuals make a community that develops ideas … becoming the biggest “company” in the world with millions of developers. This is the way to innovation and there is no other way to go. In such a model, every single person with a different experience and a different background can contribute to any open source project, improving and enriching it. Every single person has an “impact as an individual” but, to do that, he needs to be in the right context and Red Hat is the leading company in providing it.

After landing at Logan airport I set my compass to the Boston Convention & Exhibition Center with my new friend Bolek; I have never seen him before and it’s the first time we meet in person, also because he doesn’t come from my messaging team but from the Keycloak one. This is one of the great things about this kind of big conference; you meet a lot of colleagues, customers, partners or community members from all around the world, sometimes for the first time, even if you chat with them almost every day. During this summit I’ll be in a shared accommodation with two other guys from the Keycloak team, Sebastien and Bruno, as well as one of the “gurus” of my team, Ted.

The venue is huge, people are coming for check-in or onsite registration and the staff is building the partners pavilion; it must be ready for the coming day … and it will be !

Let the summit begin ! A lot of speakers, from Red Hat and other companies, are bringing their knowledge and experience here with a common denominator … sharing what they are doing in the “open” way. Two keynotes every day and each one packed; all the rooms for the breakout sessions will be packed as well.

Announcements about JBoss AMQ 7, OpenShift, RHEL, microservices and all the other projects and related products make the attendees really enthusiastic because there are a lot of things to do. New projects will come so let’s “start something” (cit.) in the right way … “try, learn & modify” (cit.).

New announcements with “old” partners like Amazon, consolidation with other partners like Microsoft and experiences from big customers like Deutsche Bank. They trust us … they trust that Red Hat is the only company able to make an open source project reliable and usable at the enterprise level.

Attending the David Ingham and Ted Ross sessions around JBoss AMQ 7 makes me so proud to be part of this team. They have been working so hard in the last years to bring customers a powerful new messaging experience : the new broker, the new router component and the new clients. If you need to do messaging in your business there is no choice : from the hybrid to the cloud, AMQ 7 is the answer. And let me say that it will also be the pillar for the coming “baby” that is on its way : trust me … “incredible” guys are working on that, EnMasse is its name … the “messaging as a service” platform of the future.

The IoT business is something that a lot of customers are exploring too, and the “IoT Codestarter” evening event organized together with the Eclipse Foundation and Eurotech (our partner) is a great opportunity for hacking from the field, with sensors and gateways, to the Cloud : Kura and Kapua are the projects involved but I can say … pay attention to Hono as well (and not just because I’m working on that ;)).

On my side, on the last day, when it seems that everybody is tired and just wants to go back home, the room is packed. Why use JBoss AMQ 7 ? When to use Apache Kafka ? Can I use them together ? Christian and I answer these questions for the attendees. Even in this case, there is a lot of interest … “playing” around Kafka is on our radar (while I’m writing, this session has raised a lot of discussion on Twitter as well).

Last but not least, during these days I’m falling in love with Boston day by day; it seems to be ten or more cities in one. Walking through the city and alongside the river is a great experience.

Three days have passed, all the Red Hatters and the other attendees are leaving Boston with a new freshness and the certainty that more great things are coming. What does it mean ? The countdown to the next summit has already started, San Francisco is waiting for us, let’s see how every person will have his “impact as an individual” during this year.

This is open source, this is Red Hat, this is the summit … something that you can breathe every day and that makes you part of a big community where the power, as I like to say, is always in the “collaboration” !

IoT developer survey : my 2 cents one year later …

Like last year, I have decided to write a blog post about my point of view on the IoT developer survey from the Eclipse Foundation (IoT Working Group) with IEEE, Agile IoT and the IoT Council.

From my point of view, the final report always gives interesting insights into where the IoT business is going; Ian Skerrett (Vice President of Marketing at Eclipse Foundation) has already analyzed the results, available here, in a great blog post.

I just want to add 2 more cents on that …

Industry adoption …

It’s clear that industries are adopting IoT and there is a big increase in industrial automation, smart cities, energy management, building automation, transportation, healthcare and so on. IoT is becoming “real” even if, as we will see in the next paragraphs, it seems that we are still in a prototyping stage. A lot of companies are investing in it but few of them have real solutions running in the field. Finally, from my point of view, it would be great to add more information about countries because I think that there is a big difference in how and where every country is investing in IoT.

The concerns …

Security is always the big concern but, as Ian said, interoperability and connectivity are on a downward trend; I agree with him that all the available middleware solutions and the IoT connectivity platforms are solving these problems. The great news is that all of them support different open and standard protocols (MQTT, AMQP but even HTTP), which is the way to go for interoperability; at the same time we are able to connect a lot of different devices, supporting different protocols, so the connectivity problem is addressed as well.

Coming back to security, the survey shows that many more software developers are involved in building IoT solutions, also because what they mostly use is SSL/TLS and data encryption, so security at the software level. From my point of view, some security concerns should be addressed at the hardware level (using crypto-chips, TPM and so on) but this is an area where software developers lack knowledge. It’s not a surprise because we know that IoT needs a lot of different knowledge from different people, but the survey shows that in some cases the “right” people are not the ones involved in developing IoT solutions. Too many web and mobile developers are working on them, too few embedded developers with real hardware knowledge.

Languages : finally a distinction !

Last year, in my 2 cents, I asked for a distinction on which side of an IoT solution we mean when we talk about the most used programming languages. I’m happy to know that the Eclipse Foundation took up this suggestion, so this year’s survey asked about languages used on constrained devices, gateways and the cloud.

[Figure: IoT survey results on programming languages]

The results don’t surprise me : C is the most used language on “real” constrained devices, and all the other languages from Java to Python are mostly used on gateways; JavaScript fits in the cloud, mainly with NodeJS. In any case, NodeJS is not a language, so my idea is that providing only JavaScript as a possible answer was enough, also because, besides a server-side framework like NodeJS, the other possibility is using JavaScript in “function as a service” platforms (e.g. Lambda from AWS, Azure Functions and so on) that are mostly based on NodeJS. Of course, the most used language in the cloud is Java.

What about OS ?

Linux is the most used OS for both constrained devices and IoT gateways but … here a strange thing comes to my mind. On “real” constrained devices that are based on MCUs (e.g. Cortex-Mx) you can run only a few specific Linux distros (e.g. uClinux) and not a full Linux distro, so it’s strange that Linux wins on constrained devices but then, when the survey shows which distros are used, uClinux has a very low percentage. My guess is that a lot of software developers don’t know what a constrained device is 🙂

On constrained devices I expect that developers use “no OS” (programming on bare metal) or a really tiny RTOS, but not something close to Linux.

On gateways I totally agree with Linux but Windows is growing from last year.

Regarding the most used distros, the Raspbian victory shows that we are still in a prototyping stage. I can’t believe that developers are using Raspbian, and therefore the related Raspberry Pi hardware, in production ! If it’s true … I’m scared about that ! If you know which planes, trains and building automation systems are using something like that, please tell me … I have to avoid them 🙂

Regarding the protocols …

From my point of view, the presence of TCP/IP in the connectivity protocols results is misleading. TCP/IP is a protocol used on top of Ethernet and Wi-Fi that are in the same results and we can’t compare them.

Regarding communication protocols, existing know-how still leads the way; this is the reason why HTTP 1.1 is still on top and HTTP/2 is growing. MQTT is there, followed by CoAP, which surprises me considering the need for an HTTP proxy to export local traffic outside of a local devices network. AMQP is finding its own way and I think that in the medium/long term it will become a big player there.

Cloud services

In this area we should have a distinction because the question is pretty general but we know that you can use Amazon AWS or Microsoft Azure for IoT in two ways :

  • as IaaS hosting your own solution or an open source one for IoT (e.g. just using the provided virtual machines for running an IoT software stack)
  • as PaaS using the managed IoT platforms (e.g. AWS IoT, Azure IoT Hub, …)

Having Amazon AWS on the top doesn’t surprise me but we could have more details on how it is used by the IoT developers.

Conclusion

The IoT business is growing, and its adoption as well, but looking at these survey results most of the companies are still in a prototyping stage and few of them have a real IoT solution in the field.

It means that there is plenty of room for everyone to join the party ! 😀

 

Being a “remotee” in Red Hat … one year later

When I started to work at Red Hat last year (on March 1st), all my friends and relatives asked me a lot of questions about my new job … from home !

“What are your working hours ?”, “How does your manager verify that you are working ?”, “How do you share artifacts with your colleagues ?”, “What’s your daily life like without getting in touch with your colleagues ?” …

I understand that for people who don’t know what being a “remotee” means, it’s quite difficult to understand this way of spending time at home but … working.

Other people, just kidding, say “You are at home, you can do whatever you want” … but it’s absolutely not true ! It’s exactly the opposite !

I can say that every day I chat with my colleagues, in writing or by voice depending on the stuff we have to discuss. Once a week I have a video call for syncing on the work each team member has done during the last week. Sometimes, we have the chance to meet in person at conferences or on business trips (it was great for me to be in Boston in January for the F2F meeting of the whole “messaging” team).

Even if I don’t meet my colleagues in person every day, I have started to feel real friendship for them. I think that if we lived in the same place, I could do a lot of activities with them outside of the office. Of course, it’s a matter of how people are … and I was very lucky with my colleagues. By the way, having such a feeling means that you are comfortable in the team and with your “remote” colleagues.

We are an open source company, so all my artifacts are available online (on GitHub) but in any case we also share documents regarding stuff we are developing in order to have a place for getting feedback and comments.

After one year, I have these main points to share with you :

  • You have to be the manager of yourself and it’s not always simple.
  • You need a separation between being at “home” and being at the “home office”.
  • You have flexibility in your working hours but my preference is to have a full working day as if I were in a “real” office.
  • You have a manager who trusts you … for a dispersed team, trust is one of the main aspects.
  • You need to be a passionate employee about your job … you have to love it … and I’m lucky on that ! 🙂

Quite often, working from home means working more and it’s exactly the opposite of what a lot of people think. You are right there at the “office” (just a few seconds from the bed), you are right there at “home” (just a few steps from your desk) … but if you are passionate about what you do … it’s not a problem … at least for me 🙂

Of course, there are some perks about being a “remotee” :

  • I have more free time in the early morning and late afternoon because I can avoid wasting my time in a traffic jam ! Now I’m a runner who starts his day at 6:00 AM with a workout and also a father who can play with his children just “one minute” after finishing work.
  • I have time for taking my son to school and picking him up.
  • Last year I had a daughter and since July I have been seeing her every hour of each day and how she is growing.
  • When I have a break during the day I can speak with my wife or play a little bit with my children.
  • There are few distractions because when you are at your desk … you are alone 🙂 You can concentrate more on the problem you are trying to solve.
  • Two additional perks are … having a good Neapolitan coffee at lunch and watching a “The Big Bang Theory” episode after lunch 🙂

I think that in such a working environment you need two main things : being passionate and being professional.

This short post came to my mind after reading a blog post series written by a Red Hatter over the last few days, explaining how it’s possible to work in a “dispersed” team, and I think that you can read it to understand better how well “dispersed” teams work here at Red Hat.

So with this … I hope I have answered everyone’s questions ! 🙂

And now … now I’m ready … ready for the Red Hat Summit where I’ll meet in person some of my colleagues and other Red Hatters from all around the world !

“Hostpath” based volumes dynamically provisioned on OpenShift

Storage is one of the critical pieces in a Kubernetes/OpenShift deployment for those applications which need to store persistent data; a good example is represented by “stateful” applications that are deployed using Stateful Sets (previously known as Pet Sets).

In order to do that, one or more persistent volumes are manually provisioned by the cluster admin and the applications can use persistent volume claims to get access to them (read/write). Starting from the 1.2 release (as alpha), Kubernetes offers the dynamic provisioning feature, which avoids pre-provisioning by the cluster admin and allows auto-provisioning of persistent volumes when they are requested by users. In the current 1.6 release, this feature is considered stable (you can read more about that at the following link).

As described in the above link, there is a provisioner which is able to provision persistent volumes as requested by users through a specified storage class. In general, each cloud provider (Amazon Web Services, Microsoft Azure, Google Cloud Platform, …) offers some default provisioners, but for a local deployment on a single node cluster (e.g. for development purposes) there is no default provisioner for using a “hostpath” (providing a persistent volume through a local directory on the host).

There is the following project (in the “Kubernetes incubator”) which provides a library for developing a custom external provisioner and one of the examples is exactly a provisioner for using a local directory on the host for persistent volumes : the hostpath-provisioner.

In this article, I’ll explain the steps needed to have the “hostpath provisioner” working on an OpenShift cluster and what I have learned during this journey. My intention is to provide a single guide gathering information taken from various sources like the official repository.

Installing Golang

First of all, I didn’t have the Go language on my Fedora 24 and the first thing to know is that version 1.7 (or above) is needed because the provisioner uses the “context” package (added in the 1.7 release). I started by installing the default Go version provided by the Fedora 24 repositories (1.6.5) but I received the following error when trying to build the provisioner :

vendor/k8s.io/client-go/rest/request.go:21:2: cannot find package "context" in any of:
 /home/ppatiern/go/src/hostpath-provisioner/vendor/context (vendor tree)
 /usr/lib/golang/src/context (from $GOROOT)
 /home/ppatiern/go/src/context (from $GOPATH)
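
A quick way to catch this problem before building is to check the installed Go version. What follows is only a minimal Python sketch, assuming the usual `go version` output format (something like `go version go1.7.5 linux/amd64`); the function names are hypothetical and are not part of any tool mentioned here.

```python
import re

# Minimum Go release that ships the "context" package in the standard library.
MIN_GO_VERSION = (1, 7)


def parse_go_version(output):
    """Extract (major, minor) from the text printed by `go version`."""
    match = re.search(r"go(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognized `go version` output: %r" % output)
    return int(match.group(1)), int(match.group(2))


def has_context_package(output):
    """Return True when the reported Go version is 1.7 or above."""
    return parse_go_version(output) >= MIN_GO_VERSION
```

The text to pass in can be obtained, for example, with `subprocess.check_output(["go", "version"]).decode()`.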

In order to install Go 1.7 manually, after downloading the tar file from the web site, you can extract it in the following way :

tar -zxvf go1.7.5.linux-amd64.tar.gz -C /usr/local

After that, two main environment variables need to be set for having the Go compiler and runtime working fine.

  • GOROOT : the directory where Go has just been installed (i.e. /usr/local/go)
  • GOPATH : the directory with the Go workspace (where we need to create two other directories, src and bin)

Modifying the .bashrc (or the .bash_profile) file we can export such environment variables.

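# Go installation directory and Go workspace
export GOROOT=/usr/local/go
export GOPATH=$HOME/go
# make both the Go tools and the workspace binaries available on the command line
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin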

Having the GOPATH/bin in the PATH is needed as we’ll see in the next step.

Installing Glide

The provisioner project we want to build has some Go dependencies and Glide is used as the dependency manager.

It can be installed in the following way :

curl https://glide.sh/get | sh

This command downloads the needed files and builds the Glide binary, copying it into the GOPATH/bin directory (so we need to have that directory in the PATH, as already done, for using glide on the command line).

Building the “hostpath-provisioner”

First of all we need to clone the GitHub repository from here and then launch the make command from the docs/demo/hostpath-provisioner directory.

The Makefile has the following steps :

  • using Glide in order to download all the needed dependencies.
  • compiling the hostpath-provisioner application.
  • building a Docker image which contains the above application.

It means that this provisioner needs to be deployed in the cluster in order to provide the dynamic provisioning feature to the other pods/containers which need persistent volumes created dynamically.

Deploying the “hostpath-provisioner”

This provisioner is going to use a directory on the host for persistent volumes. The name of the root folder is hardcoded in the implementation and it is /tmp/hostpath-provisioner. Every time an application claims a persistent volume, a new child directory will be created under this one.

This root folder needs to be created with full read and write access :

mkdir -p /tmp/hostpath-provisioner
chmod 777 /tmp/hostpath-provisioner

In order to run the “hostpath-provisioner” in a cluster with RBAC (Role Based Access Control) enabled or on OpenShift you must authorize the provisioner.

First of all, create a ServiceAccount resource described in the following way :

apiVersion: v1
kind: ServiceAccount
metadata:
 name: hostpath-provisioner

then a ClusterRole :

kind: ClusterRole
apiVersion: v1
metadata:
  name: hostpath-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get"]

It’s needed because the controller requires authorization to perform the above API calls (e.g. listing, watching, creating and deleting persistent volumes and so on).

Let’s create a sample project for that, save the above resources in two different files (e.g. serviceaccount.yaml and openshift-clusterrole.yaml) and finally create these resources.

oc new-project test-provisioner
oc create -f serviceaccount.yaml
oc create -f openshift-clusterrole.yaml

Finally we need to provide such authorization in the following way :

oc adm policy add-scc-to-user hostmount-anyuid system:serviceaccount:test-provisioner:hostpath-provisioner
oc adm policy add-cluster-role-to-user hostpath-provisioner-runner system:serviceaccount:test-provisioner:hostpath-provisioner

The “hostpath-provisioner” example provides a pod.yaml file which describes the Pod to deploy for having the provisioner running in the cluster. Before creating the Pod we need to modify this file, setting the spec.serviceAccount property to the name of the service account, which in this case is just “hostpath-provisioner” (as described in the serviceaccount.yaml file).

kind: Pod
apiVersion: v1
metadata:
  name: hostpath-provisioner
spec:
  containers:
    - name: hostpath-provisioner
      image: hostpath-provisioner:latest
      imagePullPolicy: "IfNotPresent"
      env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
      volumeMounts:
        - name: pv-volume
          mountPath: /tmp/hostpath-provisioner
  # the service account authorized in the previous step
  serviceAccount: hostpath-provisioner
  volumes:
    - name: pv-volume
      hostPath:
        path: /tmp/hostpath-provisioner

Last steps … just create the Pod and then the StorageClass and the PersistentVolumeClaim using the provided class.yaml and claim.yaml files.

oc create -f pod.yaml
oc create -f class.yaml
oc create -f claim.yaml

Finally we have a “hostpath-provisioner” deployed in the cluster that is ready to provision persistent volumes as requested by the other applications running in the same cluster.

See the provisioner working

To check that the provisioner is really working, there is a test-pod.yaml file in the project which starts a pod claiming a persistent volume in order to create a SUCCESS file inside it.

After starting the pod :

oc create -f test-pod.yaml

we should see a SUCCESS file inside a child directory with a very long name inside the root /tmp/hostpath-provisioner.

ls /tmp/hostpath-provisioner/pvc-1c565a55-1935-11e7-b98c-54ee758f9350/
SUCCESS

It means that the provisioner has handled the claim request in the correct way, providing a volume to the test-pod in order to write the file.
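
The same check can be automated. This is a small Python sketch, assuming (as in the listing above) that every provisioned volume is a child directory whose name starts with `pvc-`; the function name is hypothetical.

```python
import os


def find_success_volumes(root="/tmp/hostpath-provisioner"):
    """Return the volume directories under root that contain a SUCCESS file."""
    found = []
    for name in sorted(os.listdir(root)):
        volume_dir = os.path.join(root, name)
        # Assumption: one child directory per claim, named pvc-<uid>.
        if os.path.isdir(volume_dir) and name.startswith("pvc-"):
            if os.path.isfile(os.path.join(volume_dir, "SUCCESS")):
                found.append(volume_dir)
    return found
```

An empty list after the test pod has completed would suggest that the claim was not served by the provisioner.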

A new “Kafka” novel : the OpenShift & Kubernetes deployment

This blog post isn’t meant to be an exhaustive tutorial on the right way to deploy Apache Kafka in an OpenShift or Kubernetes cluster; it’s just the story of my journey to a “working” deployment, to be used as a starting point and improved over time as a daily work in progress. This journey started using Apache Kafka 0.8.0, went through 0.9.0 and finally reached the current 0.10.1.0 version.

From “stateless” to “stateful”

One of the main reasons to use a platform like OpenShift/Kubernetes (let me use OS/K8S from now on) is the scalability we can have for our deployed applications. With “stateless” applications there are not so many problems in using such a platform for a Cloud deployment; every time an application instance crashes or needs to be restarted (and/or relocated to a different node), just spin up a new instance without any relationship with the previous one and your deployment will continue to work properly as before. There is no need for the new instance to have information or state related to the previous one.

It’s also true that, out there, we have a lot of different applications which need to persist state information if something goes wrong in the Cloud and they need to be restarted. Such applications are “stateful” by nature and their “story” is important so that just spinning up a new instance isn’t enough.

The main challenges we have with OS/K8S platform are :

  • pods are scaled out and scaled in through Replica Sets (or using a Deployment object)
  • pods will be assigned an arbitrary name at runtime
  • pods may be restarted and relocated (on a different node) at any point in time
  • pods should never be referenced directly by name or IP address
  • a service selects a set of pods that match specific criteria and exposes them through a well-defined endpoint

All the above considerations aren’t a problem for “stateless” applications but they are for “stateful” ones.

The difference between them is also known as the “Pets vs Cattle” meme, where “stateless” applications are just a herd of cattle and when one of them dies, you can just replace it with a new one having the same characteristics but not exactly the same (of course !); the “stateful” applications are like pets, you have to take care of them and you can’t just replace a pet if it dies 😦

Just as reference you can read about the history of “Pets vs Cattle” in this article.

Apache Kafka is one of these types of applications … it’s a pet … which needs to be handled with care. Today, we know that OS/K8S offers Stateful Sets (previously known as Pet Sets … for clear reasons!) that can be used in this scenario but I started this journey when they didn’t exist (or were not released yet), so I’d like to share with you my story, the main problems I encountered and how I solved them (you’ll see that I have “emulated” something that Stateful Sets offer today out of the box).

Let’s start with a simple architecture

Let’s start in a very simple way using a Replica Set (only one replica) for Zookeeper server and the related service and a Replica Set (with three replicas) for Kafka servers and the related service.

[Figure: reference architecture, first version]

The Kafka Replica Set has three replicas for “quorum” and leader election (and also for topic replication). The Kafka service is needed to expose the Kafka servers to clients as well. Each Kafka server may need :

  • unique ID (for Zookeeper)
  • advertised host/port (for clients)
  • logs directory (for storing topic partitions)
  • Zookeeper info (for connection)

The first approach is to use dynamic broker id generation, so that when a Kafka server starts and connects to Zookeeper, a new broker id is generated and assigned to it. The advertised host/port are just the container IP and the fixed 9092 port, while the logs directory is predefined (in the configuration file). Finally, the Zookeeper connection info is provided through the related Zookeeper service, using the environment variables that OS/K8S creates for us (ZOOKEEPER_SERVICE_HOST and ZOOKEEPER_SERVICE_PORT).
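
As an illustration of the last point, here is a minimal Python sketch of how a startup script could turn those two environment variables into the value for Kafka’s `zookeeper.connect` setting. The function name is hypothetical and this is not necessarily how my container images do it.

```python
import os


def zookeeper_connect(env=None):
    """Build the host:port string for Kafka's zookeeper.connect setting."""
    env = os.environ if env is None else env
    # These variables are injected by OS/K8S for a service named "zookeeper".
    host = env["ZOOKEEPER_SERVICE_HOST"]
    port = env["ZOOKEEPER_SERVICE_PORT"]
    return "%s:%s" % (host, port)
```

Called without arguments inside the container, it reads the real environment; passing a dictionary makes it easy to try outside the cluster.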

Let’s consider the following use case with a topic (1 partition and 3 replicas). The initial situation is having Kafka servers with broker id 1001, 1002, 1003 and the topic with current state :

  • leader : 1001
  • replicas : 1001, 1002, 1003
  • ISR : 1001, 1002, 1003

It means that clients need to connect to 1001 for sending/receiving messages for the topic and that 1002 and 1003 are followers that keep this topic replicated in order to handle failures.

Now, imagine that the Kafka server 1003 crashes and a new instance is just started. The topic description becomes :

  • leader : 1001
  • replicas : 1001, 1002, 1003 <– it’s still here !
  • ISR  : 1001, 1002 <– that’s right, 1003 is not “in-sync”

Zookeeper still sees the broker 1003 as a host for one of the topic replicas but not “in-sync” with the others. Meanwhile, the newly started Kafka server has a new auto-generated id 1004. Manual intervention with the Kafka scripts is needed (kafka-reassign-partitions.sh to change the replica assignment, then kafka-preferred-replica-election.sh for the leader election) in order to :

  • add 1004 to the replicas
  • remove 1003 from the replicas
  • elect a new leader for the replicas
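
The situation above can be modelled with a few lines of Python. It’s only a sketch for reasoning about the replica list, with a hypothetical function name; it doesn’t talk to Kafka or Zookeeper at all.

```python
def plan_replica_fix(replicas, live_brokers):
    """Return (stale, new_replicas) for a partition after brokers changed id.

    stale: assigned replicas whose broker id is no longer alive.
    new_replicas: the assignment with stale ids replaced by spare live brokers.
    """
    stale = [b for b in replicas if b not in live_brokers]
    spare = [b for b in live_brokers if b not in replicas]
    kept = [b for b in replicas if b in live_brokers]
    return stale, kept + spare[:len(stale)]
```

With replicas 1001, 1002, 1003 and live brokers 1001, 1002, 1004 it reports 1003 as stale and proposes 1001, 1002, 1004 as the new assignment, which is exactly the change that has to be applied by hand.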

[Figure: use case with auto-generated broker id]

So what does it mean ?

First of all, the new Kafka server instance needs to have the same id as the previous one and, of course, the same data, i.e. the partition replica of the topic. For this purpose, a persistent volume can be the solution, used through a claim by the Replica Set for storing the logs directories of all the Kafka servers (i.e. /kafka-logs-<broker-id>). It’s important to know that, by Kafka design, a logs directory has a “lock” file locked by the server that owns it.

To search for the “next” broker id to use, avoiding the auto-generation and getting the same data (logs directory) as the previous instance, a script (in my case a Python one) can be run on container startup before launching the related Kafka server.

In particular, the script :

  • searches for a free “lock” file in order to reuse the broker id for the new Kafka server instance …
  • … otherwise a new broker id is used and a new logs directory is created
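
The two steps above can be sketched as follows. This is not my actual script, just a minimal Python illustration under some assumptions : the logs directories are named kafka-logs-<broker-id> under a common base directory, the lock file is called `.lock`, and the running broker holds a POSIX advisory lock on it (which `fcntl.lockf` can probe on Linux). The function names and the starting id 1001 are only illustrative.

```python
import fcntl
import os
import re

LOGS_DIR_PATTERN = re.compile(r"^kafka-logs-(\d+)$")


def is_free(logs_dir):
    """Probe the ".lock" file: True when no other process holds a lock on it."""
    with open(os.path.join(logs_dir, ".lock"), "a") as handle:
        try:
            # Non-blocking attempt; it fails at once if a running broker owns the lock.
            fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(handle, fcntl.LOCK_UN)
        return True


def pick_broker_id(base_dir, first_id=1001):
    """Reuse the lowest broker id whose logs directory is free, else create a new one."""
    matches = (LOGS_DIR_PATTERN.match(name) for name in os.listdir(base_dir))
    known_ids = sorted(int(m.group(1)) for m in matches if m)
    for broker_id in known_ids:
        logs_dir = os.path.join(base_dir, "kafka-logs-%d" % broker_id)
        if is_free(logs_dir):
            return broker_id, logs_dir
    # No free directory: take the next id and create its logs directory.
    broker_id = known_ids[-1] + 1 if known_ids else first_id
    logs_dir = os.path.join(base_dir, "kafka-logs-%d" % broker_id)
    os.makedirs(logs_dir)
    return broker_id, logs_dir
```

The probe releases the lock before returning, so there is a small window in which two containers starting at the same moment could pick the same id; a real script has to take that into account.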

Using this approach, we obtain the following result for the previous use case :

  • the newly started Kafka server instance acquires the broker id 1003 (as the previous one)
  • it’s just automatically part of the replicas and ISR

use_case_locked_id

But … what about the Zookeeper side ?

In this deployment, the Zookeeper Replica Set has only one replica, and a service is needed to allow connections from the Kafka servers. What happens if Zookeeper crashes (the application or the node fails) ? The OS/K8S platform just starts a new instance (not necessarily on the same node), but what I see is that the currently running Kafka servers can’t connect to the new Zookeeper instance even if it keeps the same IP address (thanks to the service). The Zookeeper server closes the connections after an initial handshake, probably because of some information about the Kafka servers that Zookeeper stores locally. When a new instance is started, this information is lost !

Even in this case, using a persistent volume for the Zookeeper Replica Set is a solution. It stores the data directory, which will be the same for every restarted instance; the new instance just finds the Kafka servers information in the volume and accepts connections from them.

reference_architecture_1st_ver_zookeeper

When the Stateful Sets were born !

At some point, the OS/K8S platform started to offer Pet Sets, then renamed Stateful Sets (from the 1.5 Kubernetes release) : a sort of Replica Sets but for “stateful” applications. But … what do they offer ?

First of all, each “pet” has a stable hostname that is always resolved by DNS. Each “pet” is assigned a name with an ordinal index number (e.g. kafka-0, kafka-1, …) and, finally, a stable storage is linked to that hostname/ordinal index number.

It means that every time a “pet” crashes and is restarted, the new one will be the same : same hostname, same name with ordinal index number and same attached storage. The previous running situation is fully recovered and the new instance is exactly the same as the previous one. You can see this as what I tried to emulate with my scripts on container startup.
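
As a small illustration of this stable identity, the sketch below computes the DNS names that Kubernetes gives to the pods of a Stateful Set (<pod-name>.<service>.<namespace>.svc.<cluster-domain>) and derives a broker id from the ordinal index. The service name "kafka-headless" and the namespace "myproject" are hypothetical, and using the ordinal as the broker id is only one possible convention, not necessarily the one used in the project.

```python
def pod_dns_names(statefulset, service, namespace, replicas,
                  cluster_domain="cluster.local"):
    """Return the stable DNS name of every pod in a Stateful Set."""
    return [
        "%s-%d.%s.%s.svc.%s" % (statefulset, ordinal, service, namespace,
                                cluster_domain)
        for ordinal in range(replicas)
    ]


def broker_id_from_hostname(hostname):
    """Use the ordinal at the end of the pod name (kafka-2 -> 2) as the id."""
    pod_name = hostname.split(".")[0]
    return int(pod_name.rsplit("-", 1)[1])


for name in pod_dns_names("kafka", "kafka-headless", "myproject", 3):
    print(name, "-> broker id", broker_id_from_hostname(name))
```

Since the hostname survives a restart, an id computed this way survives as well, with no need to search for a free “lock” file.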

So today, my current Kafka servers deployment has :

  • a Stateful set with three replicas for Kafka servers
  • a “headless” service (so without an assigned cluster IP), which is needed to make the Stateful set work (so for DNS hostname resolution)
  • a “regular” service for providing access to the Kafka servers from clients
  • one persistent volume for each Kafka server with a claim template defined in the Stateful set declaration
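
To give an idea of how these pieces fit together, here is a hedged sketch that builds the Stateful set and the “headless” service as Python dictionaries and prints them as JSON (a format that kubectl and oc accept as well as YAML). It is not the declaration from the repo : the image name, the resource names, the port and the storage size are placeholders, and the apiVersion is the current apps/v1 one, while the releases available when this post was written used a beta version.

```python
import json


def kafka_statefulset(name="kafka", service_name="kafka-headless",
                      image="example/kafka:latest", replicas=3,
                      storage="1Gi"):
    """Stateful set with one volume claim per Kafka server (placeholders)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": service_name,  # must match the headless service
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": 9092}],
                    "volumeMounts": [{"name": "kafka-storage",
                                      "mountPath": "/kafka-logs"}],
                }]},
            },
            # One claim, and so one persistent volume, for each "pet".
            "volumeClaimTemplates": [{
                "metadata": {"name": "kafka-storage"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def headless_service(name="kafka-headless", app="kafka"):
    """Service without a cluster IP, used only for DNS resolution."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",
            "selector": {"app": app},
            "ports": [{"name": "kafka", "port": 9092}],
        },
    }


print(json.dumps([headless_service(), kafka_statefulset()], indent=2))
```

The “regular” service for the clients would look like the headless one, just without the clusterIP set to None.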

reference_architecture_statefulsets

Other than using a better implementation 🙂 … the current solution doesn’t share a single persistent volume among all the Kafka servers (with a logs directory for each of them) : each “pet” gets its own dedicated persistent storage.

It’s great to read about it but … I want to try … I want to play !

You’re right, I told you about my journey, which isn’t finished yet, but you would like to try … to play with some stuff for having Apache Kafka deployed on OS/K8S.

I called this project Barnabas, like one of the main characters of Franz Kafka’s novel “The Castle”, who was a … messenger :-). It’s part of the bigger EnMasse project, which provides a scalable messaging as a service (MaaS) infrastructure running on OS/K8S.

The repo provides different deployment types : from the “handmade” solution (based on bash and Python scripts) to the current Stateful Sets solution that I’ll improve in the coming weeks.

The great thing about that (in the context of the overall EnMasse project) is that today I’m able to use standard protocols like AMQP and MQTT to communicate with an Apache Kafka cluster (using an AMQP bridge and an MQTT gateway) for all the use cases where using Kafka makes more sense than traditional messaging brokers … which, for their part, have a lot of stories and different scenarios to tell 😉

Do you want to know more about that ? The Red Hat Summit 2017 (Boston, May 2-4) could be a good place : Christian Posta (Principal Architect, Red Hat) and I will have the session “Red Hat JBoss A-MQ and Apache Kafka : which to use ?” … so what are you waiting for ? See you there !

Vert.x and IoT in Rome : what a meetup !

Yesterday I had a great day in Rome for a meetup hosted by Meet{cast} (powered by dotnetpodcast community) and Codemotion, speaking about Vert.x and how we can use it for developing “end to end” Internet of Things solutions.

I started with a high-level introduction to Vert.x and how it works, its internals and its main usage; then I dug into some specific components useful for developing IoT applications, like the MQTT server, AMQP Proton and the Kafka client.

It was interesting to learn that even in Italy a lot of developers and companies are moving to Vert.x for developing microservices-based solutions. A lot of interesting questions came out … people seem to like it !

Finally, in order to prove the Vert.x usage in enterprise applications, I showed two real use cases that work today thanks to the above components : Eclipse Hono and EnMasse. I had little time to explain in detail how EnMasse works, the Qpid Dispatch Router component in particular, and for this reason I hope to have a future meetup on that : the AMQP router concept is quite new today ! In any case, knowing that such a scalable platform is based (also) on Vert.x was great news for the attendees.

If you are interested in knowing more about that, you can take a look at the slides and the demo. Below is the link to the video of the meetup, which is only in Italian (my apologies to my English-speaking friends :-)). I hope you’ll enjoy the content !

Of course, I had some networking with attendees after the meetup and … with some beer 🙂
