
Your enterprise container orchestrator in the cloud: deploying OpenShift on Azure

openshift_azure

In a “containerized” world running “cloud native” applications, there is the need for a container orchestrator, and Kubernetes is the best open source platform providing such features. Furthermore, Red Hat pushed this project to an enterprise-grade level by developing OpenShift (on top of Kubernetes), adding more powerful features in order to simplify the application development pipeline. Today, thanks to these platforms, you can easily move your containerized applications (or microservices based solutions) from an on-premise installation to the cloud without changing anything on the development side; it’s just a matter of having a Kubernetes or OpenShift cluster up and running and moving your containers from one side to the other.

As already announced during the latest Red Hat Summit in San Francisco, in order to improve the developer experience, Red Hat and Microsoft are working side by side to provide a managed OpenShift service (as is already available for Kubernetes with AKS); you can read more about this effort here.

But … while waiting for this awesome and really easy solution, how can you deploy an OpenShift cluster on Azure today?

Microsoft already provides some good documentation to help you do that, both for the open source OKD project (Origin Community Distribution of Kubernetes, formerly known as OpenShift Origin) and for the Red Hat productized version named OCP (OpenShift Container Platform), if you have a Red Hat subscription. Furthermore, there are two different GitHub repositories which provide templates for deploying both: openshift-origin and openshift-container-platform.

In this blog post, I’d like to describe my personal experience of deploying OKD on Azure, trying to summarize and explain all the steps needed to get OpenShift up and running in the cloud.

Get the Azure template

First of all, you have to clone the openshift-origin repository, available under the Microsoft GitHub organization, which provides the deployment template (in JSON format) for deploying OKD.

git clone https://github.com/Microsoft/openshift-origin.git

The master branch contains the most current release of OKD, including experimental items, which may cause instability. For this reason, it’s better to switch to one of the release branches in order to use a stable release, for example 3.9:

git checkout -b release-3.9 origin/release-3.9

This repository contains the ARM (Azure Resource Manager) template file for deploying the needed Azure resources, the related configurable parameters file and the Ansible scripts for preparing the nodes and deploying the OpenShift cluster.

You will come back to the artifacts provided in this repository later, because there are some other steps you need to complete as prerequisites for the final deployment.

Generate SSH key

In order to secure access to the OpenShift cluster (to the master node), an SSH key pair is needed (without a passphrase). Once you have finished deploying the cluster, you can always generate new keys that use a passphrase and replace the original ones used during the initial deployment. The following command shows how to do it using the ssh-keygen tool on Linux and macOS (to do the same on Windows, follow the instructions here).

ssh-keygen -f ~/.ssh/openshift_rsa -t rsa -N ''

The above command will store the SSH key pair, generated using the RSA algorithm (-t rsa) and without a passphrase (-N ''), in the openshift_rsa (private key) and openshift_rsa.pub (public key) files.
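
The content of the public key will be needed later as the sshPublicKey template parameter; you can simply print it with:

cat ~/.ssh/openshift_rsa.pub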

01_ssh_keygen

In order to make the SSH private key available in the Azure cloud, we are going to use the Azure Key Vault service as described in the next step.

Store SSH Private Key in Azure Key Vault

Azure Key Vault is used for storing cryptographic keys and other secrets used by cloud applications and services. It’s useful to create a dedicated resource group for hosting the Key Vault instance (this group is different from the one for deploying the OpenShift cluster).

az group create --name keyvaultgroup --location northeurope

Next step is to create the Key Vault in the above group.

az keyvault create --resource-group keyvaultgroup --name keyvaultopenshiftazure --location northeurope --enabled-for-template-deployment true

The Key Vault name must be globally unique (not just unique inside the resource group you just created).

Finally, you have to store the SSH private key in the Key Vault, to make it accessible during the deployment process.

az keyvault secret set --vault-name keyvaultopenshiftazure --name openshiftazuresecret --file ~/.ssh/openshift_rsa

Create an Azure Active Directory service principal

Because OpenShift communicates with Azure, you need to give it permission to execute operations, which is possible by creating a service principal in Azure Active Directory.

The first step is to create the resource group where we are going to deploy the OpenShift cluster. The service principal will be granted the Contributor role on this resource group.

az group create --name openshiftazuregroup --location northeurope

Finally, create the service principal.

az ad sp create-for-rbac --name openshiftazuresp --role Contributor --password mypassword --scopes /subscriptions/<subscription-id>/resourceGroups/openshiftazuregroup

From the JSON output, it’s important to take note of the appId field, which will be used as the aadClientId parameter in the deployment template.
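
If you didn’t save the output, the appId can usually be retrieved later with a query like the following (just a sketch, assuming the service principal name used above):

az ad sp list --display-name openshiftazuresp --query "[0].appId" -o tsv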

02_create_sp

Customize and deploy the Azure template

Back in the openshift-origin GitHub repository we cloned before, the azuredeploy.parameters.json file provides all the parameters for the deployment, while the corresponding allowed values are defined in the related azuredeploy.json file.

The latter file defines the deployment template: it describes all the Azure resources that will be deployed to build the OpenShift cluster, such as virtual machines for the nodes, virtual networks, storage accounts, network interfaces, load balancers, public IP addresses (for the master and infra nodes) and so on, together with the parameters and their allowed values.

Taking a look at the parameters JSON file, you can see some parameters with a “changeme” value. Of course, this means we have to assign meaningful values to them, because they are the main customizable part of the deployment:

  • openshiftClusterPrefix: a prefix used for assigning hostnames to all the nodes – master, infra and worker nodes
  • adminUsername: admin username for both operating system login and OpenShift login
  • openshiftPassword: password for the OpenShift login
  • sshPublicKey: the SSH public key generated in the previous steps (the content of the ~/.ssh/openshift_rsa.pub file)
  • keyVaultResourceGroup: the name of the resource group that contains the Key Vault
  • keyVaultName: the name of the Key Vault you created
  • keyVaultSecret: the Secret Name you used when creating the Secret (that contains the Private Key)
  • aadClientId: the Azure Active Directory Client ID you should have noted when the service principal was created
  • aadClientSecret: the Azure Active Directory service principal secret/password chosen on creation

Other useful parameters are:

  • masterVmSize, infraVmSize and nodeVmSize: size for VMs related to the master, infra and worker nodes
  • masterInstanceCount, infraInstanceCount and nodeInstanceCount: number of master, infra and worker nodes

All the deployed nodes run CentOS as the operating system.

Instead of replacing the values in the original parameters file, it’s better to make a copy and rename it, for example to azuredeploy.parameters.local.json.
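
For example:

cp azuredeploy.parameters.json azuredeploy.parameters.local.json

Then edit the copy, replacing all the “changeme” values with your own.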

After configuring the template parameters, you can deploy the cluster by running the following command from inside the repository directory (so that it can use the JSON template file and the related parameters file).

az group deployment create -g openshiftazuregroup --name myopenshiftcluster --template-file azuredeploy.json --parameters @azuredeploy.parameters.local.json

This command could take quite a long time, depending on the number of nodes and their size; at the end, it will show its output in JSON format, and the main fields (in the “outputs” section) are:

  • openshift Console Url: the URL of the OpenShift web console, which you can access using the configured adminUsername and openshiftPassword
  • openshift Master SSH: contains the information for accessing the master node via SSH
  • openshift Infra Load Balancer FQDN: contains the URL for accessing the infra node
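
If you need to read these values again later, querying the deployment outputs should work (a sketch, using the resource group and deployment names above):

az group deployment show -g openshiftazuregroup --name myopenshiftcluster --query properties.outputs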

In order to access the OpenShift web console, you can just copy and paste the “openshift Console Url” value in your preferred browser.

03_openshift_console

In the same way, you can access the OpenShift master node by just copying and pasting the “openshift Master SSH” value in a terminal.

Once on the master node, you can start using the oc tool to interact with the OpenShift cluster, for example listing the available nodes as follows.
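
For reference, the command is simply:

oc get nodes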

04_master_ssh

You may notice that these URLs don’t have really friendly names; that’s because the related variables defining them in the template are:


"infraLbPublicIpDnsLabel": "[concat('infradns', uniqueString(concat(resourceGroup().id, 'infra')))]",

"openshiftMasterPublicIpDnsLabel": "[concat('masterdns', uniqueString(concat(resourceGroup().id, 'master')))]"

Conclusion

Nowadays, as you can see, by following a bunch of simple steps and leveraging the Azure template provided by Microsoft and Red Hat, it’s not so difficult to have an OpenShift cluster up and running in the Azure cloud. If you think about all the resources that need to be deployed (nodes, virtual network, load balancers, network interfaces, storage, …), the template really simplifies the overall deployment. Of course, the developers’ life will be even easier when Microsoft and Red Hat announce the managed OpenShift offering.

Having an OpenShift cluster running in the cloud is just the beginning of the fun … so, now that you have it … enjoy developing your containerized applications!

We can have more … EnMasse !

This morning, my working day started in a different way, with interesting news from AWS re:Invent 2017, the annual Amazon conference …

The news was about Amazon MQ, a new managed message broker service based on ActiveMQ 5.x with all the goodies that it provides in terms of supported protocols like MQTT, JMS, STOMP, … and … yes … AMQP 1.0 !

It seems that this news made Clemens Vasters (from Microsoft) happy as well 🙂

Selection_078

Finally, even Amazon added support for a “real” messaging protocol which is enterprise ready and from my point of view … even IoT ready 🙂

Taking a look at the blog post about this new service, another project came to my mind … guess what? EnMasse!

We can have more : EnMasse !

What Amazon MQ provides is the possibility to create a new broker instance, access its console and create queues, topics and so on. It’s great but … we can have more!

For this reason I decided to write, for the first time, something about EnMasse, even though I have given a lot of sessions about it at different conferences in the past.

EnMasse is an open source “messaging as a service” platform which simplifies the deployment of a messaging infrastructure both “on premise” and in the Cloud. It provides scalability and elasticity in order to address all the problems we can have when the number of connected clients increases (and decreases) even reaching big numbers like in an IoT scenario.

It supports all the well-known messaging patterns (request/reply, publish/subscribe and competing consumers) and, as of today, two main protocols, AMQP 1.0 and MQTT (adding HTTP support is on the roadmap).

It provides multi-tenancy, with different tenants sharing the same infrastructure while being isolated from each other. Finally, it provides security in terms of using the TLS protocol for establishing connections (with clients and between internal components), as well as authentication using Keycloak as the identity management system.

Store and forward or … direct ?

One of the main features it provides is the support for two different messaging mechanisms, “store and forward” and “direct messaging”.

The “store and forward” mechanism is exactly what messaging brokers provide today. The broker takes ownership of the message sent by a producer before forwarding it to a consumer asking for it (connecting to a queue or a topic on the broker itself). It means that “storing” the message is the first step executed by the broker, and “forwarding” is the next one, which can happen later, only when a consumer comes online to get the message: it allows asynchronous communication between clients and time decoupling. There is always a double contract, between producer and broker and between broker and consumer, so the producer knows that the message reached the broker but not the consumer (a new message exchange in the opposite direction is needed to have something like an “acknowledgement” from the consumer).

The “direct messaging” mechanism is not something new, because it means having a sort of “direct” communication between clients: the producer is able to send a message only when the consumer is online, with a single contract between the parties. When the producer receives the “acknowledgement”, it means that the consumer has got the message. Of course, EnMasse provides this mechanism in a reliable way: it uses a network of AMQP 1.0 routers (connected in a mesh), so clients aren’t really connected directly but through this mesh. Every router, unlike a broker, doesn’t take ownership of the message but just forwards it to the next hop in the network in order to reach the destination. When a router crashes, the network is automatically re-configured to determine a new path for reaching the consumer; it means that high availability is provided in terms of “path redundancy”. Furthermore, thanks to the AMQP 1.0 protocol, a producer doesn’t receive “credits” from a router to send messages if the consumer isn’t online or can’t process more messages.

EnMasse provides these messaging mechanisms using two open source projects: Apache Qpid Dispatch Router, for the router network, and ActiveMQ Artemis (so ActiveMQ 6.x and not 5.x like in Amazon MQ) on the broker side.

enmasse_overall_view

I want to know only about “addresses” !

Compared to the new Amazon MQ service, from a developer’s point of view the interesting part is the abstraction layer that EnMasse adds on top of the underlying messaging infrastructure. You can create a new “address” using the console, specifying a type which can be:

  • queue: backed by a broker, for “store and forward”, providing the competing consumer pattern, asynchronous communication and so on.
  • topic: backed by a broker, for “store and forward” as well, but providing the publish/subscribe pattern.
  • anycast: something like a queue but in terms of “direct messaging”. A producer can send messages to such an address only when one or more consumers are listening on it, and the router network will deliver them in a competing-consumer fashion.
  • multicast: something like a topic but in terms of “direct messaging”, with a producer publishing messages to more consumers listening on the same address, so that all of them receive the same message.

Selection_081

The developer doesn’t have to worry about creating the broker, configuring the routers and so on; using the console and a few simple steps in the wizard, they will have a usable “address” for exchanging messages between clients.

Selection_082

Good for microservices …

The interesting part of the supported “direct messaging” mechanism is also, but not only, about the microservices world. EnMasse can be used as a messaging infrastructure for handling request/reply and publish/subscribe between microservices, using an enterprise protocol like AMQP 1.0.

You can read more about building an AMQP 1.0 based API and a microservices infrastructure in this article written by one of my colleagues, Jakub Scholz.

Who orchestrates? OpenShift and Kubernetes

Another aspect which makes EnMasse more appealing than other solutions is that it’s totally containerized and runs on the main container orchestration platforms like Kubernetes and the enterprise OpenShift (as well as the OpenShift Origin project). It means that your messaging based (or IoT) solution can be deployed “on premise” and then easily moved to the Cloud without any changes to your applications (maybe just the addresses used by the clients for establishing the connections).

Selection_079

Conclusion

Of course, this blog post wasn’t meant to be an exhaustive guide on what EnMasse is, but just a brief introduction that I had wanted to write for a long time. The Amazon news gave me the opportunity to show you that you can really have more than just creating a broker in the Cloud and taking care of it 🙂

 

Eclipse Hono : Virtual IoT meetup

Virtual IoT - Hono

Yesterday, thanks to the Eclipse Foundation, I had the chance to talk about Eclipse Hono as a speaker in this virtual IoT meetup, part of a meetup series focused on the Eclipse IoT projects. I was with Kai Hudalla (Chief Software Architect at Bosch SI), who is co-lead and a main contributor on Hono.

It was my first virtual meetup and a really exciting experience for me, with almost 90 “virtual” attendees and a lot of interesting questions showing the interest that developers have in this “new” project.

If you didn’t have a chance to watch the session or you want to re-watch it, you can find the recording on YouTube; the slides deck is available here as well.

A lot of fun with … AMQP, Spark, Kafka, EnMasse, MQTT, Vert.x & IoT

When I tell someone that I work for Red Hat, they say “Ah! Are you working on Linux?” … No, no, no and … no! I’m not a Linux guy, I’m not a fanboy, I’m just a daily user 🙂

Everybody knows that Red Hat is THE company which provides the best enterprise Linux distribution, well known as Red Hat Enterprise Linux (RHEL), but today Red Hat is not only about Linux. Its portfolio is huge: the cloud and containers business with the OpenShift effort, the microservices offering with Vert.x, WildFly Swarm and Spring Boot, and the IoT world with the involvement in the main Eclipse Foundation projects.

The objective of this post is just to briefly show the projects I have worked (or am working) on since last year, when I was hired on March 1st. They are not “my” projects; they are projects I’m involved in because the entire team is working on them … collaboration, you know 🙂

You could be surprised, but … there is no Linux! I’m on the messaging & IoT team, so you will only see projects about this stuff 🙂

AMQP – Apache Spark connector

This “little” component is strictly related to the “big” radanalytics.io project, which brings the power of Apache Spark for analytics (batch, real-time, machine learning, …) to OpenShift.

Because the messaging team works mainly on projects like ActiveMQ Artemis and the Qpid Dispatch Router, where the main protocol is AMQP 1.0, the idea was to develop a connector for Spark Streaming in order to ingest data through this protocol, either from queues/topics on a broker or through the router in a direct messaging fashion.

You can find the component here, and even an IoT demo here which shows how it’s possible to ingest data through AMQP 1.0 using the EnMasse project (see below) and then execute real-time streaming analytics with Spark Streaming, all running on Kubernetes and OpenShift.

AMQP – Apache Kafka bridge

Apache Kafka is one of the best technologies used today for ingesting data (e.g. in IoT-related scenarios) with high throughput. Even in this case, the idea was to provide a way for AMQP 1.0 clients and JMS clients to push messages to Apache Kafka topics without knowing the related custom protocol.

In this way, if you have such clients because you are already using a broker technology but then need some specific Kafka features (e.g. re-reading streams), you can just switch the messaging system (from the broker to Kafka) and, using the bridge, you don’t need to update or modify the clients. I showed how this is possible at the Red Hat Summit as well, and the related demo is available here.

MQTT on EnMasse

EnMasse is an open source messaging platform, with focus on scalability and performance. It can run on your own infrastructure (on premise) or in the cloud, and simplifies the deployment of messaging infrastructure.

It’s based on other open source projects like ActiveMQ Artemis and Qpid Dispatch Router supporting the AMQP 1.0 protocol natively.

In order to provide support for the MQTT protocol, we designed a way to bring “MQTT over AMQP”, so having MQTT features on top of the AMQP protocol. From the design we moved on to developing two main components:

  • the MQTT gateway, which handles connections with remote MQTT clients, translating all messages from MQTT to AMQP and vice versa;
  • the MQTT LWT (Last Will and Testament) service, which provides a way to notify all clients connected to EnMasse that another client has died suddenly, sending them its “will message”. The great thing about this service is that it works with pure AMQP 1.0 clients too, bringing the LWT feature to AMQP as well: for this reason the team is thinking about renaming it to just the AMQP LWT service.

EnMasse is great for IoT scenarios, handling a huge number of connections and ingesting a lot of data using AMQP and MQTT as protocols. I used it in all my IoT demos to show how it’s possible to integrate it with streaming and analytics frameworks. It’s also the main choice as the messaging infrastructure in the cloud for the Eclipse Hono project.

Vert.x and the IoT components

Vert.x is a great toolkit for developing reactive applications running on a JVM.

The Reactive Manifesto fits really well with IoT scenarios, where responsiveness, resiliency, elasticity and message-driven communication are the pillars of every IoT solution.

When I started working on the MQTT gateway for EnMasse using Vert.x, I decided to develop an MQTT server that just handles the communication with remote clients, providing an API for interacting with them: this component was used for bridging MQTT to AMQP (in EnMasse) but can be used in any scenario where some sort of protocol translation or integration is needed (e.g. MQTT to the Vert.x Event Bus, to Kafka, …). Pay attention: it’s not a full broker!

The other component was the Apache Kafka client, mainly developed by Julien Viet (lead on Vert.x) and then handed over to me as maintainer, to improve it and add new features after the first release.

Finally, thanks to the Google Summer of Code, during the last 2 months I have been mentoring a student who is working on developing a Vert.x native MQTT client.

As you can see, the Vert.x toolkit is really growing from an IoT perspective, as well as providing a lot of components useful for developing pure microservices based solutions.

Eclipse Hono

Eclipse Hono is a project under the big Eclipse IoT umbrella in the Eclipse Foundation. It provides service interfaces for connecting large numbers of IoT devices to a back end and interacting with them in a uniform way, regardless of the device communication protocol.

It supports scalable and secure ingestion of large volumes of sensor data by means of its Telemetry API. The Command & Control API allows for sending commands (request messages) to devices and receiving a reply to such a command from a device asynchronously, in a reliable way.

This project is mainly developed by Red Hat and Bosch; I helped design all the APIs as well as implementing the MQTT adapter, in this case too using the Vert.x MQTT server component.

Because Eclipse Hono works on top of a messaging infrastructure for exchanging messages, the main choice was using ActiveMQ Artemis and the Qpid Dispatch Router, even running them on Kubernetes and OpenShift with EnMasse.

Apache Kafka

Finally, I was involved in developing a POC named “barnabas” (a messenger character from a Franz Kafka novel :-)) in order to get Apache Kafka running on OpenShift.

Considering the stateful nature of a project like Kafka, I started when Kubernetes didn’t yet offer the StatefulSets feature, doing something similar by myself. Today, the available deployment is based on StatefulSets, and it’s a work in progress I’ll keep working on to push the POC to the next level.

Apache Kafka is a really great project which has its own use cases in the messaging world; today it’s even more powerful thanks to the new Streams API, which allows you to execute real-time streaming analytics using topics from your cluster, just running simple applications. My next step is to move my EnMasse + Spark demo to an EnMasse + Kafka (and streaming) deployment. I’m also giving my support to the Apache Kafka code.

Conclusion

The variety and heterogeneity of all the above projects gives me a lot of fun in my day-to-day work, also collaborating with different people with different backgrounds. I like learning new stuff, and the great thing is that … the things to learn are endless! 🙂

 

Yesterday’s DevDay meetup: “messaging” in Naples!

devday_00

Yesterday evening I gave a session titled “Messaging as a Service: building a scalable messaging service” during a meetup here in Naples, speaking about the EnMasse project. The event was organized by the DevDay community, which is active in my region to get in touch with developers who work with different technologies. I was very pleased to share my experience (as a contributor) of developing a messaging service running “on premise” or in the cloud.

devday_01

devday_02

Below you can find the resources for this session:

  • the video published in the DevDay official YouTube channel
  • the slides and the demo code

Last but not least, I’d like to thank Davide Cerbo (from DevDay), who invited me to join the co-working space as a guest during the day and set up this meetup in the best way. Davide … keep up this great work for the next events! 😉

IoT developer survey : my 2 cents one year later …

As I did last year, I have decided to write a blog post about my point of view on the IoT developer survey from the Eclipse Foundation (IoT Working Group) with IEEE, Agile IoT and the IoT Council.

From my point of view, the final report always gives interesting insights into where the IoT business is going; Ian Skerrett (Vice President of Marketing at the Eclipse Foundation) has already analyzed the results, available here, in a great blog post.

I just want to add my 2 cents on that …

Industry adoption …

It’s clear that industries are adopting IoT, and there is big growth in industrial automation, smart cities, energy management, building automation, transportation, healthcare and so on. IoT is becoming “real” even if, as we will see in the next paragraphs, it seems that we are still in a prototyping stage. A lot of companies are investing in it, but few of them have real solutions running in the field. Finally, from my point of view, it would be great to add more information about countries, because I think there is a big difference in how and where every country is investing in IoT.

The concerns …

Security is always the big concern but, as Ian said, interoperability and connectivity are on a downward trend; I agree with him that all the available middleware solutions and IoT connectivity platforms are solving these problems. The great news is that all of them support different open and standard protocols (MQTT, AMQP and even HTTP), which is the way to go for interoperability; at the same time, we are able to connect a lot of different devices supporting different protocols, so the connectivity problem is addressed as well.

Coming back to security, the survey shows that many more software developers are involved in building IoT solutions, also because what they mostly use are SSL/TLS and data encryption, so security at the software level. From my point of view, some security concerns should be addressed at the hardware level (using crypto chips, TPMs and so on), but this is an area where software developers lack knowledge. It’s not a surprise, because we know that IoT needs a lot of different knowledge from different people, but the survey shows that in some cases the “right” people aren’t involved in developing IoT solutions: too many web and mobile developers are working on that, too few embedded developers with real hardware knowledge.

Languages : finally a distinction !

Last year, in my 2 cents, I asked for a distinction about which side of an IoT solution we consider when ranking the most used programming languages. I’m happy to see that the Eclipse Foundation took this suggestion, so this year’s survey asked about the languages used on constrained devices, gateways and the cloud.

iot_survey

The results don’t surprise me: C is the most used language on “real” low-constrained devices, and all the other languages, from Java to Python, are mostly used on gateways; JavaScript fits in the cloud, mainly with NodeJS. In any case, NodeJS is not a language, so my idea is that providing only JavaScript as a possible answer would have been enough, also because other than using a server-side framework like NodeJS, the other possibility is using JavaScript in “function as a service” platforms (e.g. AWS Lambda, Azure Functions and so on) that are mostly based on NodeJS. Of course, the most used language in the cloud is Java.

What about OS ?

Linux is the most used OS for both constrained devices and IoT gateways but … here a strange thing comes to my mind. On “real” constrained devices based on MCUs (e.g. Cortex-M), you can run only a few specific Linux flavours (e.g. uClinux) and not a full Linux distro, so it’s strange that Linux wins on constrained devices but then, when the survey shows which distros are used, uClinux has a very low percentage. My guess is that a lot of software developers don’t know what a constrained device is 🙂

On constrained devices I expect developers to use “no OS” (programming on bare metal) or a really tiny RTOS, not something close to Linux.

On gateways I totally agree with Linux, but Windows has grown since last year.

Regarding the most used distros, the Raspbian victory shows that we are still in a prototyping stage. I can’t believe that developers are using Raspbian, and so the related Raspberry Pi hardware, in production! If it’s true … I’m scared about that! If you know which planes, trains or building automation systems are using something like that, please tell me … I have to avoid them 🙂

Regarding the protocols …

From my point of view, the presence of TCP/IP in the connectivity protocol results is misleading. TCP/IP is a protocol suite used on top of Ethernet and Wi-Fi, which appear in the same results, so we can’t really compare them.

Regarding communication protocols, the current know-how is still leading; this is the reason why HTTP 1.1 is still on top and HTTP 2.0 is growing. MQTT is there, followed by CoAP, which surprises me considering the need for an HTTP proxy to export local traffic outside of a local device network. AMQP is finding its own way and I think that in the medium/long term it will become a big player.

Cloud services

In this area we should have a distinction because the question is pretty general but we know that you can use Amazon AWS or Microsoft Azure for IoT in two ways :

  • as IaaS hosting your own solution or an open source one for IoT (i.e. just using provided virtual machines for running an IoT software stack)
  • as PaaS using the managed IoT platforms (i.e. AWS IoT, Azure IoT Hub, …)

Having Amazon AWS on top doesn’t surprise me, but it would be useful to have more details on how it is used by IoT developers.

Conclusion

The IoT business is growing and its adoption as well but looking at these survey results, most of the companies are still in a prototyping stage and few of them have a real IoT solution in the field.

It means that there is a lot of room for everyone to be invited to the party! 😀

 

A new “Kafka” novel : the OpenShift & Kubernetes deployment

This blog post isn’t meant to be an exhaustive tutorial describing the way to go for deploying Apache Kafka in an OpenShift or Kubernetes cluster, but just the story of my journey to get a “working” deployment, using it as a starting point to improve over time as a daily work in progress. This journey started with Apache Kafka 0.8.0, went through 0.9.0 and finally reached today’s 0.10.1.0 version.

From “stateless” to “stateful”

One of the main reasons to use a platform like OpenShift/Kubernetes (let me use OS/K8S from now on) is the scalability we can have for our deployed applications. With “stateless” applications there aren’t many problems in using such a platform for a Cloud deployment; every time an application instance crashes or needs to be restarted (and/or relocated to a different node), just spin up a new instance without any relationship with the previous one and your deployment will continue to work properly as before. There is no need for the new instance to have information or state related to the previous one.

It’s also true that, out there, we have a lot of different applications which need to persist state information if something goes wrong in the Cloud and they need to be restarted. Such applications are “stateful” by nature and their “story” is important, so just spinning up a new instance isn’t enough.

The main challenges we have with OS/K8S platform are :

  • pods are scaled out and scaled in through Replica Sets (or using Deployment object)
  • pods will be assigned an arbitrary name at runtime
  • pods may be restarted and relocated (on a different node) at any point in time
  • pods may never be referenced directly by the name or IP address
  • a service selects a set of pods that match specific criteria and exposes them through a well-defined endpoint

All the above considerations aren’t a problem for “stateless” applications but they are for “stateful” ones.

The difference between them is also known as the “Pets vs Cattle” meme, where “stateless” applications are just a herd of cattle: when one of them dies, you can just replace it with a new one having the same characteristics, but not exactly the same (of course!); the “stateful” applications are like pets: you have to take care of them and you can’t just replace a pet if it dies 😦

Just as reference you can read about the history of “Pets vs Cattle” in this article.

Apache Kafka is one of these types of applications … it’s a pet … which needs to be handled with care. Today, we know that OS/K8S offers Stateful Sets (previously known as Pet Sets … for clear reasons!) that can be used in this scenario, but I started this journey when they didn’t exist (or weren’t released yet), so I’d like to share with you my story, the main problems I encountered and how I solved them (you’ll see that I “emulated” something that Stateful Sets offer today out of the box).

Let’s start with a simple architecture

Let’s start in a very simple way, using a Replica Set (with only one replica) for the Zookeeper server and the related service, and a Replica Set (with three replicas) for the Kafka servers and the related service.

reference_architecture_1st_ver

The Kafka Replica Set has three replicas for “quorum” and leader election (and for topic replication). The Kafka service is needed to expose access to the Kafka servers, even to clients. Each Kafka server needs:

  • unique ID (for Zookeeper)
  • advertised host/port (for clients)
  • logs directory (for storing topic partitions)
  • Zookeeper info (for connection)

The first approach is to use dynamic broker id generation, so that when a Kafka server starts and connects to Zookeeper, a new broker id is generated and assigned to it. The advertised host/port are just the container IP and the fixed 9092 port, while the logs directory is predefined (by the configuration file). Finally, the Zookeeper connection info is provided through the related Zookeeper service, using the environment variables that OS/K8S creates for us (ZOOKEEPER_SERVICE_HOST and ZOOKEEPER_SERVICE_PORT).
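
Just to give an idea, the container startup could end up launching the broker with something like the following (a minimal sketch, not the exact command used in my images; the property names are the ones valid for Kafka 0.10):

# connect to Zookeeper through the service environment variables injected by OS/K8S,
# let the broker id be auto-generated (broker.id=-1) and advertise the container IP
exec bin/kafka-server-start.sh config/server.properties \
  --override zookeeper.connect=${ZOOKEEPER_SERVICE_HOST}:${ZOOKEEPER_SERVICE_PORT} \
  --override broker.id=-1 \
  --override advertised.host.name=$(hostname -i) \
  --override advertised.port=9092 \
  --override log.dirs=/kafka-logs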

Let’s consider the following use case, with a topic having 1 partition and 3 replicas. The initial situation is having Kafka servers with broker ids 1001, 1002 and 1003, and the topic in the following state:

  • leader : 1001
  • replicas : 1001, 1002, 1003
  • ISR : 1001, 1002, 1003

It means that clients need to connect to 1001 for sending/receiving messages for the topic, and that 1002 and 1003 are followers replicating this topic in order to handle failures.

Now, imagine that the Kafka server 1003 crashes and a new instance is just started. The topic description becomes :

  • leader : 1001
  • replicas : 1001, 1002, 1003 <– it’s still here !
  • ISR  : 1001, 1002 <– that’s right, 1003 is not “in-sync”

Zookeeper still sees broker 1003 as a host for one of the topic replicas, but it’s not “in-sync” with the others. Meanwhile, the newly started Kafka server has a new auto-generated id, 1004. A manual script execution (through kafka-preferred-replica-election.sh) is needed in order to:

  • adding 1004 to the replicas
  • removing 1003 from replicas
  • new leader election for replicas

use_case_autogenerated_id.png
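
For reference, the topic state shown above can be checked with the standard Kafka tools (assuming a topic named mytopic and Zookeeper reachable at zookeeper:2181), and the preferred leader election can be triggered in the same way:

bin/kafka-topics.sh --describe --zookeeper zookeeper:2181 --topic mytopic
bin/kafka-preferred-replica-election.sh --zookeeper zookeeper:2181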

So what does it mean ?

First of all, the new Kafka server instance needs to have the same id as the previous one and, of course, the same data, so the partition replicas of the topic. For this purpose, a persistent volume can be used (through a claim) by the Replica Set for storing the logs directories of all the Kafka servers (i.e. /kafka-logs-<broker-id>). It’s important to know that, by Kafka design, a logs directory contains a “lock” file locked by the owning server.

In order to find the “next” broker id to use, avoiding auto-generation, and to get the same data (logs directory) as the previous instance, a script (in my case a Python one) can be run on container startup, before launching the related Kafka server.

In particular, the script :

  • searches for a free “lock” file in order to reuse the broker id for the new Kafka server instance …
  • … otherwise a new broker id is used and a new logs directory is created

Using this approach, we obtain the following result for the previous use case :

  • the new started Kafka server instance acquires the broker id 1003 (as the previous one)
  • it’s just automatically part of the replicas and ISR

use_case_locked_id

But … what about the Zookeeper side?

In this deployment, the Zookeeper Replica Set has only one replica, and the service is needed to allow connections from the Kafka servers. What happens if Zookeeper crashes (the application or the node fails)? The OS/K8S platform just starts a new instance (not necessarily on the same node), but what I saw is that the currently running Kafka servers can’t connect to the new Zookeeper instance, even if it holds the same IP address (thanks to the service). The Zookeeper server closes the connections after an initial handshake, probably because of some Kafka server information that Zookeeper stores locally. Restarting a new instance, this information is lost!

Even in this case, using a persistent volume for the Zookeeper Replica Set is the solution. It’s used for storing the data directory, which will be the same for every restarted instance; the new instance just finds the Kafka server information in the volume and accepts connections from them.

reference_architecture_1st_ver_zookeeper

When the Stateful Sets were born !

At some point (from the Kubernetes 1.5 release), the OS/K8S platform started to offer Pet Sets, then renamed Stateful Sets: a sort of Replica Set, but for “stateful” applications. But … what do they offer?

First of all, each “pet” has a stable hostname that is always resolved by DNS. Each “pet” is assigned a name with an ordinal index number (e.g. kafka-0, kafka-1, …) and, finally, stable storage is linked to that hostname/ordinal index number.

It means that every time a “pet” crashes and is restarted, the new one will be the same: same hostname, same name with the ordinal index number and same attached storage. The previous running situation is fully recovered and the new instance is exactly the same as the previous one. You could see them as something that I tried to emulate with my scripts on container startup.
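
For example, with a Stateful Set named kafka and a headless service named kafka-headless in the default namespace (names assumed here just for illustration), you would get pods with stable ordinal names and stable DNS entries:

# the pods get ordinal names like kafka-0, kafka-1, kafka-2
kubectl get pods -l app=kafka

# from inside the cluster, each pod is reachable through a stable DNS name
nslookup kafka-0.kafka-headless.default.svc.cluster.local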

So today, my current Kafka servers deployment has :

  • a Stateful set with three replicas for Kafka servers
  • a “headless” service (so without an assigned cluster IP) which is needed for the Stateful Set to work (for DNS hostname resolution)
  • a “regular” service for providing access to the Kafka servers from clients
  • one persistent volume for each Kafka server with a claim template defined in the Stateful set declaration

reference_architecture_statefulsets

Other than being a better implementation 🙂 … the current solution doesn’t use a single persistent volume for all the Kafka servers (with a logs directory for each of them); instead, a dedicated persistent storage is used for each single “pet”.

It’s great to read about it but … I want to try … I want to play !

You’re right: I told you about my journey, which isn’t finished yet, but you would like to try … to play with some stuff for deploying Apache Kafka on OS/K8S.

I called this project Barnabas, after one of the main characters in Franz Kafka’s novel “The Castle”, who was a … messenger :-). It’s part of the bigger EnMasse project, which provides a scalable messaging as a service (MaaS) infrastructure running on OS/K8S.

The repo provides different deployment types : from the “handmade” solution (based on bash and Python scripts) to the current Stateful Sets solution that I’ll improve in the coming weeks.

The great thing about that (in the context of the overall EnMasse project) is that today I’m able to use standard protocols like AMQP and MQTT to communicate with an Apache Kafka cluster (using an AMQP bridge and an MQTT gateway) for all the use cases where using Kafka makes sense compared to traditional messaging brokers … which, for their part, have a lot of their own stories and scenarios to tell 😉

Do you want to know more about that? Red Hat Summit 2017 (Boston, May 2-4) could be a good place: Christian Posta (Principal Architect, Red Hat) and I will present the session “Red Hat JBoss A-MQ and Apache Kafka: which to use?” … so what are you waiting for? See you there!

IoT developer survey : my point of view

A few days ago, the Eclipse Foundation published the report of the latest IoT developer survey, sponsored by the foundation itself together with IEEE IoT and Agile IoT. The main objective of this survey is to understand the technologies preferred by developers in terms of languages, standards and operating systems; furthermore, it shows the main concerns about IoT and how companies are shipping IoT solutions today.

Great content about this report was published by Ian Skerrett (Vice President of Marketing at the Eclipse Foundation) on his blog and on SlideShare, with a summary of all the main information about it.

I’d just like to add my 2 cents and make some absolutely personal considerations about the results …

Companies are investing …

Regarding how companies are delivering IoT solutions, it’s clear that the IoT market is growing. A lot of companies already have IoT products in the field and the others are planning to develop them in the coming months. It’s not a surprise: beyond being a buzzword, IoT is a real business opportunity for all companies strictly related to embedded devices (silicon vendors, OEMs, …) and for software companies (on the cloud and application side), which are rapidly changing how they do business.

Security and interoperability : the big concerns

The result related to the main concerns about IoT is very clear: people and companies are worried about security. All the data flowing from our personal lives, or owned by companies, to the cloud needs to be protected in order to avoid someone stealing it. The concern about security is strictly related to software protocols (e.g. SSL/TLS, …) and hardware stuff (e.g. crypto chips, …), and today it seems that a very good solution isn’t available. The same goes for interoperability: having a lot of IoT standard protocols means having NO standard protocol. A lot of consortiums are trying to define standard specifications and frameworks, but … there are too many of them; all the big companies are split across different consortiums, and some of them are part of more than one: this is a big deal and, as for protocols … it means NO standard.

Developers prefer Java and C … what about JavaScript?

It’s not a surprise to see Java in first place as the preferred language and C in second place: Java is used in a lot of cloud solutions which are based on open source products, and C is the best language for developing on the device side with great performance at low cost (at least from a hardware point of view). The first strange position is JavaScript as the third most used language: I hope this position is related to its huge usage with NodeJS on the server side and not as an “embedded” language on devices … I’m scared about that.

Protocols : the current know-how is leading

Now, the protocols …

Having HTTP/1.1 as the most used protocol makes sense because today it’s the only really well-known protocol in the developer world; in order to develop and deliver an IoT solution with a quick time to market, companies leverage internal know-how, and sometimes they don’t invest in figuring out how other protocols work and whether they have other advantages. This explains this position to me, thanks to HTTP/1.1’s simplicity and its ASCII/text based nature: a lot of developers don’t like binary formats so much. The last point is that the REST architecture is a very good solution in a lot of scenarios, and HTTP/1.1 is the most used protocol (the only one?) for that.

MQTT and CoAP are used a lot thanks to the available open source projects and their simplicity; MQTT is very lightweight and works great on tiny embedded devices, CoAP tries to overcome some HTTP/1.1 disadvantages (i.e. server push, observer, …) with new features and its binary nature.

A lot of developers are scared of AMQP because, I have to admit, it’s not as simple as the previous ones, but it’s powerful and everyone should give it a try. If you want to start with it, you can find a lot of links and resources here.

I’m surprised by the fourth position of HTTP/2.0! I mean … how many developers know, love and use HTTP/2 today? I was surprised by this high position … I expected it behind “in-house, proprietary”, AMQP and XMPP. I suppose that companies are prototyping solutions using this protocol because they think that, thanks to their HTTP/1.1 knowledge, it’s quite simple to move to the next version: I think that’s totally wrong, because HTTP/2.0 is completely different from HTTP/1.1. I love it … I’ll invest in it.

OS : Linux and RTOS on bare metal

Regarding operating systems, the first position for Linux isn’t a surprise, but we have to consider it both on the server side and on the device side (and there are a lot of Linux-based embedded devices). The other OSes are only for embedded (low-constrained) devices, so their percentages don’t get any help from the cloud side. Finally, Linux is useful for IoT gateways too (as we know with Kura), even if Microsoft, for example, is investing in its Windows IoT Core and will release an IoT Gateway SDK in the coming months.

All the services in the cloud

Amazon AWS in first place as cloud services provider is not a surprise, but I don’t think it’s about their relatively new AWS IoT platform; rather, it’s about all the open source IoT stuff that developers prefer to run on Amazon VMs rather than on Azure VMs.

Conclusion

Here the great news is that the IoT market is growing and developers/companies are investing in it, trying to be on the market as soon as possible. The “bad” news is that too many different protocols and frameworks are in use, and the road to interoperability and interconnection is quite long or … infinite?

Azure IoT Hub is GA : the news !

Yesterday, the Microsoft Azure IoT Hub was released in GA !

The public preview was quite successful, with a lot of people (makers) and companies (professionals) trying it out to develop their end-to-end IoT solutions.

In a previous blog post, I already discussed its main features with a comparison with AWS IoT, the Internet of Things platform by Amazon.

Compared to that article, there are the following differences it’s important to focus on:

  • Azure IoT Hub now supports MQTT 3.1.1 natively! There is no need to use a field gateway to translate MQTT to AMQP (or HTTP) to communicate with the Hub. Now your MQTT enabled devices can connect directly to the Cloud, and you can use the SDK provided by Microsoft (with an API abstraction layer on top of MQTT) or any MQTT library (and M2Mqtt is a good choice for C# applications). Of course, the connection must always be encrypted with the SSL/TLS protocol. More information at the official documentation page here.
  • The pricing has changed: first of all, the pricing isn’t related to the number of devices (as in the public preview) but only to the total number of messages/day. The bad news is that, starting from April 1st, the S1 and S2 plans will have their price doubled. Of course, the Free plan … will still be free!
  • AMQP over WebSockets: the AMQP protocol is supported over WebSockets too (like in Event Hubs, for example).

With the first two changes above, the Azure IoT Hub offering is closer to the AWS IoT one: it supports MQTT and removed the device limit from the pricing.

The news is not only on the Cloud side but on the device side too!

In the last months, a lot of OEMs and hardware companies worked hard to support Windows 10 IoT Core and the Azure IoT Hub connection on their platforms. Today the number of Azure Certified IoT Partners has really increased!

 new_iothub_partners

It’s great to see that the Hub ecosystem is growing … now we have to wait for real IoT solutions based on it !

To start learning about Azure IoT Hub, I recommend the Azure IoT Hub Learning Path, which will guide you through all the steps needed to use the Hub in the best way.

The “hybrid” Internet of Things

The title of this blog post could sound strange to you … what’s the “hybrid” Internet of Things? Why speak about a hybrid nature related to IoT? As we’ll see, it’s not something new, but we can consider it either a new way to approach already running solutions or an old way to see new solutions in the IoT space … it’s up to you to choose the interpretation you like. In the Internet of Things nothing is new; today is just the right time for connecting “objects” on a large scale.

So … what’s “hybrid” Internet of Things ?

The hybrid cloud

For several years, cloud computing has been the most used base for a lot of enterprise architectures, and companies have decided to move all their data, computation, processing and so on from “on premise” infrastructures to the cloud itself.

The cloud seems to offer infinite storage space and scaling for computation without any concerns from the company’s point of view, since all the features can be configured to change automatically. The result could be less time spent handling “on premise” infrastructures and less (?) money to invest.

hybrid-cloud

Even if all the available cloud platforms (Microsoft Azure, Amazon AWS, IBM Bluemix, Google App Engine, …) have certifications for data storage and the related protection, there is one big concern for companies which invest in a distributed architecture … the security and privacy of data.

Data is the money of the new century, and for this reason today a lot of companies prefer to store sensitive data on their private servers but rely on the scaling and computational features of public servers in the cloud: the hybrid cloud was born in order to connect these two infrastructures.

In the hybrid architecture, the company protects sensitive data in house and leverages the computation of the public cloud, exchanging only non-sensitive data, over encrypted connections of course.

One of the most important players in the hybrid cloud is Red Hat, which leverages all the main open source projects to develop its offering.

The idea of Internet of Things

When Kevin Ashton coined the term “Internet of Things” in 1999, he wanted to describe the connection between the physical world and the Internet, speaking about “things” (not only people) on the public network.

iot

At that time, hardware was too expensive, too big, not so powerful and with poor connectivity. Nowadays we have very tiny devices with a lot of CPU power and memory, as well as all kinds of connectivity: all of this at a very low cost. These conditions favored the birth of the “maker” movement: a lot of non-professional people can now develop on embedded devices in a very simple way, with the programming language they love. Not only the “old” but powerful C/C++, but many other high-level languages like C#, Java, Python and, of course, JavaScript.

In this scenario, the main idea is connecting all these things directly to the Internet, sending data to the cloud for processing and receiving commands from the cloud itself to control the devices. Connectivity is the main required feature, provided on the embedded devices themselves if they have enough resources, or through a more powerful node called a “field gateway”.

The direct connection to the Internet means a huge number of devices connected to the big network, but it also means a lot of problems. Of course, it’s feasible for all the devices that are TCP/IP capable (via Ethernet or Wi-Fi), but not for the huge number of “old” devices with legacy connections (RS232, RS485, …), devices with PAN (Personal Area Network) support (BLE, ZigBee, Z-Wave, …) or industrial protocols (OPC-UA, Modbus, CAN, …). In these scenarios, the field gateway becomes the bridging point between the devices and the cloud.

Moving IoT in the “fog”

As we can see, in most cases there is the need to add another node at the edge of an IoT solution to bring real-world data and control to the cloud. There are too many problems to rely only on a public server infrastructure; this is true on both the device side and the cloud side.

Moving part of the intelligence to the edge of an IoT solution, close to the “T” side, is an old practice in industrial environments, but it has huge value today considering the growing number of connected devices. This approach has a well-defined name today: fog computing.

fog

There are a lot of concerns about cloud computing that the fog solves. Let’s try to summarize them!

The protocols Babel tower

The first role of a field gateway is protocol translation. This is always true: it’s true for devices which are already TCP/IP capable and can connect to the Internet, but it’s also true for legacy and low-constrained devices which need a bridging point to the cloud. Devices may be able to connect directly to the cloud from a network capabilities point of view, but they may not speak the same “language”, meaning the same protocol. Today’s Babel tower is home to a lot of protocols, and the online IoT platforms can’t speak all of them. The first need is a local translation from the device protocol to the cloud protocol, whether it’s based on TCP/IP (see MQTT, AMQP, HTTP, …) or on a personal area network (see BLE, for example).

Reducing cloud workload

Using an IoT field gateway, part of the work is done in the fog, at the edge of our complex solution, and it means reducing the cloud workload. Data centers are huge, with powerful public servers, but the resources aren’t unlimited; speaking about millions/billions/trillions of connected devices is a problem for the cloud, and reducing this number can be a very good solution. Thanks to a central node at the edge, we can establish few connections while at the same time sending data that is representative of many devices in a local network. Sometimes not all the produced data is needed for processing in the cloud, so pre-processing is executed at the edge to analyze, filter, reduce and elaborate the data to send.

Real time reaction

A lot of IoT solutions need a near real-time reaction. The connection to the cloud introduces latency: to control a device, the data is sent to the cloud through the “big net”, the server processes it and replies with a command to start an action on the device itself. There is a non-negligible round trip related to the connection and to the server workload too, because the server is serving not only one device but a huge number of them.

Offline handling

Without an Internet connection, an IoT solution which is entirely based on an online platform can’t work! Of course, all cloud providers offer an SLA (Service Level Agreement) very close to 100%, but many times we have to consider even that very low percentage of failure. With fog computing we have a local node, and we have to deal only with the local network connection. The same is true when the cloud platform is available but the connection isn’t reliable (which is like having the platform offline) or the bandwidth is low. Thanks to the central node at the edge, we are able to handle a lot of offline scenarios: we can rely on storing data locally when the connection isn’t available.

Security and privacy

In a lot of scenarios, protecting data is a must. Even if all cloud connections are encrypted and based on the SSL/TLS protocol, many companies which build IoT solutions prefer to keep data on their private servers to protect themselves and their customers (who are the data owners). Using a field gateway, we can filter the data to avoid sending all of it to the cloud; we can keep sensitive data in our local network and send only the non-sensitive data to the public servers.

Reducing price

IoT cloud platforms aren’t free: they have a cost which is related to the number of connected devices and the number of messages exchanged per hour/day. Thanks to the field gateway, we can connect many local devices using only a single connection (the gateway’s own), and thanks to pre-processing we can filter and reduce the amount of data to send: it means reducing cost. This is even more important when the traffic isn’t “free” but has a higher cost, like when using a GSM connection for example.

The “fog” computing is around us

If we think about today’s IoT market, almost 100% of the solutions are “fog” solutions and don’t rely on a pure cloud architecture. We can think about connected cars: all the sensors speak a specific protocol on a CAN bus, and a central, unique gateway is the only one having an Internet connection to send the gathered data to the cloud.

connected_cars

The same is true for industrial environments (think about a car production line), where all devices use industrial protocols like OPC-UA and need near real-time reaction (think about the robots on the production line).

production_line

The smart home is another example, made of BLE, ZigBee and Z-Wave based devices connected to the Internet through a central router. In general, smart grid solutions, like smart cities, are based on an architecture made of many local networks connected to the cloud through a single point. A last simple example is the wearable market: all wearable devices are very low constrained and aren’t TCP/IP capable, so we need a gateway to send their data to the cloud; in most cases this gateway is our smartphone!

Conclusion

All around the world, big companies speak about their great online IoT platforms (you can read about two of them in this article). Of course, we need them, because building an “on premise” IoT solution is almost impossible. It’s also true that we can’t rely on a “pure” cloud architecture. We saw that the solutions already in the field aren’t so pure; they are “fog” solutions, and this could be considered the only reliable approach we have to use for future implementations.

The future is not “pure” … it’s “hybrid” … for this reason I like to speak about “hybrid” Internet of Things !