Kubernetes

Your enterprise container orchestrator in the cloud: deploying OpenShift on Azure


In a “containerized” world running “cloud native” applications, there is the need for a container orchestrator, and Kubernetes is the best open source platform providing such features. Furthermore, Red Hat pushed this project to an enterprise-grade level by developing OpenShift (on top of Kubernetes), adding more powerful features in order to simplify the application development pipeline. Today, thanks to these platforms, you can easily move your containerized applications (or microservices based solutions) from an on-premise installation to the cloud without changing anything on the development side; it’s just a matter of having a Kubernetes or OpenShift cluster up and running and moving your containers from one side to the other.

As already announced during the latest Red Hat Summit in San Francisco, in order to improve the developer experience, Red Hat and Microsoft are working side by side to provide a managed OpenShift service (as is already available for Kubernetes with AKS); you can read more about this effort here.

But … while waiting for this awesome and really easy solution, what’s the way to deploy an OpenShift cluster on Azure today?

Microsoft already provides some good documentation to help you do that, for both the open source OKD project (Origin Community Distribution of Kubernetes, formerly known as OpenShift Origin) and the Red Hat productized version named OCP (OpenShift Container Platform), if you have a Red Hat subscription. Furthermore, there are two different GitHub repositories which provide templates for deploying both: openshift-origin and openshift-container-platform.

In this blog post, I’d like to describe my personal experience with deploying OKD on Azure, trying to summarize and explain all the steps needed to have OpenShift up and running in the cloud.

Get the Azure template

First of all, you have to clone the openshift-origin repository, available under the Microsoft GitHub organization, which provides the deployment template (in JSON format) for deploying OKD.

git clone https://github.com/Microsoft/openshift-origin.git

The master branch contains the most current release of OKD, including experimental items: this may cause instability but brings in new things. For this reason, it’s better to switch to one of the release branches in order to use a stable release, for example 3.9:

git checkout -b release-3.9 origin/release-3.9

This repository contains the ARM (Azure Resource Manager) template file for deploying the needed Azure resources, the related configurable parameters file and the Ansible scripts for preparing the nodes and deploying the OpenShift cluster.

You will come back to the artifacts provided in this repository later, because there are some other prerequisite steps to complete before the final deployment.

Generate SSH key

In order to secure access to the OpenShift cluster (to the master node), an SSH key pair is needed (without a passphrase). Once you have finished deploying the cluster, you can always generate new keys that use a passphrase and replace the original ones used during the initial deployment. The following command shows how to do it using the ssh-keygen tool on Linux and macOS (for doing the same on Windows, follow here).

ssh-keygen -f ~/.ssh/openshift_rsa -t rsa -N ''

The above command will store the SSH key pair, generated using the RSA algorithm (-t rsa) and without a passphrase (-N ''), in the ~/.ssh/openshift_rsa file (with the corresponding public key in ~/.ssh/openshift_rsa.pub).
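
You will need the content of the public key later, when filling the sshPublicKey parameter of the deployment template; you can print it as follows.

cat ~/.ssh/openshift_rsa.pub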

01_ssh_keygen

In order to make the SSH private key available in the Azure cloud, we are going to use the Azure Key Vault service as described in the next step.

Store SSH Private Key in Azure Key Vault

Azure Key Vault is used for storing cryptographic keys and other secrets used by cloud applications and services. It’s useful to create a dedicated resource group for hosting the Key Vault instance (this group is different from the one for deploying the OpenShift cluster).

az group create --name keyvaultgroup --location northeurope

The next step is to create the Key Vault in the above resource group.

az keyvault create --resource-group keyvaultgroup --name keyvaultopenshiftazure --location northeurope --enabled-for-template-deployment true

The Key Vault name must be globally unique (not just unique inside the resource group you just created).

Finally, you have to store the SSH private key in the Key Vault to make it accessible during the deployment process.

az keyvault secret set --vault-name keyvaultopenshiftazure --name openshiftazuresecret --file ~/.ssh/openshift_rsa
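
If you want to double check that the secret has been stored correctly, you can read it back using the same vault and secret names chosen above.

az keyvault secret show --vault-name keyvaultopenshiftazure --name openshiftazuresecret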

Create an Azure Active Directory service principal

Because OpenShift communicates with Azure, you need to give it permission to execute operations; this is possible by creating a service principal under Azure Active Directory.

The first step is to create the resource group where we are going to deploy the OpenShift cluster. The service principal will be granted the Contributor role on this resource group.

az group create --name openshiftazuregroup --location northeurope

Finally, the service principal creation.

az ad sp create-for-rbac --name openshiftazuresp --role Contributor --password mypassword --scopes /subscriptions/<subscription-id>/resourceGroups/openshiftazuregroup

From the JSON output, it’s important to take note of the appId field, which will be used as the aadClientId parameter in the deployment template.
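
The JSON output looks roughly like the following (the values shown here are just placeholders, and the exact set of fields may differ slightly depending on the Azure CLI version): appId is what goes into aadClientId, while the password you passed on the command line is the aadClientSecret.

{
  "appId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "displayName": "openshiftazuresp",
  "password": "mypassword",
  "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}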

02_create_sp

Customize and deploy the Azure template

Back in the openshift-origin GitHub repository we cloned before, the azuredeploy.parameters.json file provides all the parameters for the deployment. All the corresponding possible values are defined in the related azuredeploy.json file.

This latter file defines the deployment template: it describes all the Azure resources that will be deployed to build the OpenShift cluster, such as virtual machines for the nodes, virtual networks, storage accounts, network interfaces, load balancers, public IP addresses (for the master and infra nodes) and so on. It also describes the parameters with their related possible values.

Taking a look at the parameters JSON file, you’ll notice some parameters with a “changeme” value. Of course, it means that we have to assign meaningful values to these parameters, because they are the main customizable part of the deployment itself:

  • openshiftClusterPrefix: a prefix used for assigning hostnames to all nodes – master, infra and worker nodes.
  • adminUsername: admin username for both operating system login and OpenShift login
  • openshiftPassword: password for the OpenShift login
  • sshPublicKey: the SSH public key generated in the previous steps (the content of the ~/.ssh/openshift_rsa.pub file)
  • keyVaultResourceGroup: the name of the resource group that contains the Key Vault
  • keyVaultName: the name of the Key Vault you created
  • keyVaultSecret: the Secret Name you used when creating the Secret (that contains the Private Key)
  • aadClientId: the Azure Active Directory Client ID you should have noted when the service principal was created
  • aadClientSecret: the Azure Active Directory service principal secret/password chosen on creation

Other useful parameters are:

  • masterVmSize, infraVmSize and nodeVmSize: size for VMs related to the master, infra and worker nodes
  • masterInstanceCount, infraInstanceCount and nodeInstanceCount: number of master, infra and worker nodes

All the deployed nodes run CentOS as the operating system.

Instead of replacing the values in the original parameters file, it’s better to make a copy and rename it, for example to azuredeploy.parameters.local.json.
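
As a reference, a trimmed down azuredeploy.parameters.local.json could look like the following sketch (only a subset of the parameters described above is shown, and all the values are placeholders to be replaced with your own).

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "openshiftClusterPrefix": { "value": "mycluster" },
    "adminUsername": { "value": "clusteradmin" },
    "openshiftPassword": { "value": "mypassword" },
    "sshPublicKey": { "value": "ssh-rsa AAAA... user@host" },
    "keyVaultResourceGroup": { "value": "keyvaultgroup" },
    "keyVaultName": { "value": "keyvaultopenshiftazure" },
    "keyVaultSecret": { "value": "openshiftazuresecret" },
    "aadClientId": { "value": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" },
    "aadClientSecret": { "value": "mypassword" }
  }
}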

After configuring the template parameters, you can deploy the cluster by running the following command from inside the GitHub repository directory (so that the JSON template file and the related parameters file are used).

az group deployment create -g openshiftazuregroup --name myopenshiftcluster --template-file azuredeploy.json --parameters @azuredeploy.parameters.local.json

This command could take quite a long time, depending on the number of nodes and their size; at the end, it shows an output in JSON format whose main fields (in the “outputs” section) are:

  • openshift Console Url: it’s the URL of the OpenShift web console you can access using the configured adminUsername and openshiftPassword
  • openshift Master SSH: contains information for accessing to the master node via SSH
  • openshift Infra Load Balancer FQDN: contains the URL for accessing the infra node
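
If you need these values again later, you can also query the deployment outputs from the Azure CLI (a sketch, assuming the resource group and deployment names used above).

az group deployment show -g openshiftazuregroup --name myopenshiftcluster --query properties.outputs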

In order to access the OpenShift web console, you can just copy and paste the “openshift Console Url” value in your preferred browser.

03_openshift_console

In the same way, you can access the OpenShift master node by just copying and pasting the “openshift Master SSH” value in a terminal.

Once on the master node, you can start using the oc tool to interact with the OpenShift cluster.
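
For example, the following command lists the available nodes (as shown in the screenshot below).

oc get nodes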

04_master_ssh

You may notice that these URLs don’t have really friendly names; that’s because the related variables defining them in the template are:


"infraLbPublicIpDnsLabel": "[concat('infradns', uniqueString(concat(resourceGroup().id, 'infra')))]",

"openshiftMasterPublicIpDnsLabel": "[concat('masterdns', uniqueString(concat(resourceGroup().id, 'master')))]"

Conclusion

Nowadays, as you can see, by following a bunch of simple steps and leveraging the Azure template provided by Microsoft and Red Hat, it’s not so difficult to have an OpenShift cluster up and running in the Azure cloud. If you think about all the resources that need to be deployed (nodes, virtual network, load balancers, network interfaces, storage, …), the template really simplifies the overall deployment. Of course, developers’ lives will be even easier when Microsoft and Red Hat announce the managed OpenShift offering.

Having an OpenShift cluster running in the cloud is just the beginning of the fun … so, now that you have it … enjoy developing your containerized applications!


Apache Kafka on Kubernetes and OpenShift: Barnabas is dead … long live Strimzi!

Almost a year and a half ago, I started my journey of running Apache Kafka on Kubernetes and OpenShift. At that time, these container orchestration platforms were focused on “stateless” (micro)services, so there wasn’t real support for a technology like Apache Kafka, which is “stateful” by definition.

I wrote this blog post about my “investigation”, highlighting the problems to address in order to have Apache Kafka running on such platforms. The solution was trying to “mimic” something that would be added in the following months: PetSets (later renamed StatefulSets).

I created a new project named “Barnabas” (the name comes from a character in a Franz Kafka novel; he was a messenger) with the objective of helping developers have the resources (i.e. all the needed YAML files) for deploying Apache Kafka on Kubernetes and OpenShift.

A few people used “Barnabas” for their demos and proofs of concept, giving feedback and improvements; for example, the Debezium team started to use it for deploying Apache Kafka with their supported Kafka Connect connectors, and some people from the Aerogear project used it for some POCs as well.

Barnabas is dead … long live Strimzi!

Today … “Barnabas” isn’t here anymore 😦

It’s sad but it’s not really true! It just has a new name: Strimzi!


The objective here is always the same: providing a way to run an Apache Kafka cluster on Kubernetes and OpenShift. Of course, the project is open source and I hope that a new community can be born and grow around it: today I’m not the only contributor and that’s great!

The current first early release (0.1.0) provides all the YAML resources needed for deploying the Apache Kafka cluster in terms of StatefulSets (used for the broker and Zookeeper nodes), Services (so that the nodes can communicate with each other and be reached by the clients), Persistent Volume Claims (for storing Kafka logs, besides supporting “ephemeral” storage with emptyDir) and finally metrics support, in order to get metrics data from the cluster through Prometheus and show them in a Grafana dashboard.


Other than that, Strimzi also provides a way to deploy Kafka Connect alongside a Kafka cluster. In order to simplify the addition of new connectors when running on OpenShift, the deployment leverages some unique OpenShift features like “Builds” and “S2I” images.

The future … is bright

While the current release already provides a simple way to deploy the Apache Kafka cluster (“templates” are also provided for the OpenShift use case), the future is full of improvements and features we’d like to add.

First of all, we are working on moving away from these plain YAML resources towards the “operator” approach (well known in the Kubernetes world).

Two main components will be part of such an approach in a short time : a cluster controller and a topic controller.

The cluster controller, running on Kubernetes (OpenShift), is in charge of deploying an Apache Kafka cluster based on the configuration provided by the user through a “cluster” ConfigMap resource. Its main job is to watch for a ConfigMap which contains the cluster configuration (i.e. number of broker nodes, number of Zookeeper nodes, healthcheck information, broker configuration properties, metrics configuration and so on) and then deploy the cluster based on such information. During its life, the cluster controller is also in charge of checking for updates to the ConfigMap and reflecting the changes on the already deployed cluster (i.e. the user increases the number of broker nodes in order to scale up the cluster, or changes some metrics parameters, and so on).

The topic controller, also running on Kubernetes (OpenShift), provides a way to manage Kafka topics without interacting with the Kafka brokers directly, using a ConfigMap approach instead. In the same way as the cluster controller, the topic controller watches for specific ConfigMap resources which describe topics: this means that a user can create a new ConfigMap containing topic information (i.e. name, number of partitions, replication factor and so on) and the topic controller will create the topic in the cluster. As already happens with the cluster controller, the topic controller also checks for updates to the “topic” ConfigMaps, reflecting the changes to the cluster as well. Finally, this component is able to handle scenarios where topic changes don’t happen only on the ConfigMap(s) but also directly in the Kafka cluster: it runs a “3-way reconciliation” process in order to align the topic information from these different sources.

Conclusion

Having these two components will be the next step for Strimzi in the short term, but more improvements will come, related to security, authentication/authorization and automatic cluster balancing where, thanks to the metrics, a cluster balancer will be able to balance the load across the different nodes in the cluster, re-assigning partitions when needed.

If you want to know more about the Strimzi project, you can engage with us in different ways, from IRC to the mailing list, or by following the official Twitter account. Thanks to its open source nature, you can easily jump into the project providing feedback or opening issues and/or PRs … becoming a new contributor!

Looking forward to hearing from you! 😉

We can have more … EnMasse !

This morning, my working day started in a different way, with some interesting news from AWS re:Invent 2017, the annual Amazon conference …

The news was about Amazon MQ, a new managed message broker service based on ActiveMQ 5.x with all the goodies that it provides in terms of supported protocols like MQTT, JMS, STOMP, … and … yes … AMQP 1.0 !

It seems that this news made Clemens Vasters (from Microsoft) happy as well 🙂


Finally, even Amazon added support for a “real” messaging protocol which is enterprise ready and from my point of view … even IoT ready 🙂

Taking a look at the blog post about this new service, another project came to my mind … guess what? EnMasse!

We can have more : EnMasse !

What Amazon MQ provides is the possibility to create a new broker instance, access its console, and create queues, topics and so on. It’s great but … we can have more!

For this reason I decided to write, for the first time, something about EnMasse, even though I have spoken about it in a lot of sessions at different conferences in the past.

EnMasse is an open source “messaging as a service” platform which simplifies the deployment of a messaging infrastructure both “on premise” and in the cloud. It provides scalability and elasticity in order to address the problems that arise when the number of connected clients increases (and decreases), even reaching big numbers as in an IoT scenario.

It supports all the well-known messaging patterns (request/reply, publish/subscribe and competing consumers) and, as of today, two main protocols, AMQP 1.0 and MQTT (adding HTTP support is on the roadmap).

It provides multi-tenancy, with different tenants sharing the same infrastructure while being isolated from each other. Finally, it provides security, using the TLS protocol for establishing connections (with clients and between internal components), as well as authentication using Keycloak as the identity management system.

Store and forward or … direct ?

One of the main features it provides is the support for two different messaging mechanisms, “store and forward” and “direct messaging”.

The “store and forward” mechanism is exactly what messaging brokers provide today. The broker takes ownership of the message sent by a producer before forwarding it to a consumer which is asking for it (connecting to a queue or a topic on the broker itself). It means that “storing” the message is the first step executed by the broker, and “forwarding” is the next one, which can happen later, only when a consumer comes online to get the message: this allows asynchronous communication between clients and time decoupling. There is always a double contract, producer-broker and broker-consumer, so the producer knows that the message reached the broker but not the consumer (a new message exchange in the opposite direction is needed to have something like an “acknowledgement” from the consumer).

The “direct messaging” mechanism is not something new, because it means having a sort of “direct” communication between clients: the producer is able to send the message only when the consumer is online, with a single contract between the parties. When the producer receives the “acknowledgement”, it means that the consumer has got the message. Of course, EnMasse provides this mechanism in a reliable way: it uses a network of AMQP 1.0 routers (connected in a mesh), so that clients aren’t really connected in a direct way but through this mesh. Every router, unlike a broker, doesn’t take ownership of the message but just forwards it to the next hop in the network in order to reach the destination. When a router crashes, the network is automatically re-configured in order to determine a new path for reaching the consumer; it means that high availability is provided in terms of “path redundancy”. Furthermore, thanks to the AMQP 1.0 protocol, a producer doesn’t receive “credits” from a router to send messages if the consumer isn’t online or can’t process more messages.

EnMasse provides these messaging mechanisms using two open source projects : Apache Qpid Dispatch Router, for the router network, and ActiveMQ Artemis (so ActiveMQ 6.x and not 5.x like in the AmazonMQ) for the brokers side.

enmasse_overall_view

I want to know only about “addresses” !

Compared to the new Amazon MQ service, from a developer’s point of view the interesting part is the abstraction layer that EnMasse adds on top of the underlying messaging infrastructure. You can create a new “address” using the console, specifying a type which can be:

  • queue: backed by a broker, for “store and forward”, providing the competing consumers pattern, asynchronous communication and so on.
  • topic: backed by a broker, for “store and forward” as well, but providing the publish/subscribe pattern.
  • anycast: something like a queue but in terms of “direct messaging”. A producer can send messages to such an address only when one or more consumers are listening on it, and the router network will deliver them in a competing consumers fashion.
  • multicast: something like a topic but in terms of “direct messaging”, with a producer publishing messages to multiple consumers listening on the same address so that all of them receive the same message.


The developer doesn’t have to worry about creating the broker, configuring the routers and so on; using the console and a few simple steps in the wizard, they will have a usable “address” for exchanging messages between clients.


Good for microservices …

The “direct messaging” mechanism is also interesting, though not only, for the “microservices” world. EnMasse can be used as a messaging infrastructure for handling request/reply and publish/subscribe between microservices using an enterprise protocol like AMQP 1.0.

You can read more about building an AMQP 1.0 based API and a microservices infrastructure in this article written by one of my colleagues, Jakub Scholz.

Who orchestrates? OpenShift and Kubernetes

Another aspect which makes EnMasse more appealing than other solutions is that it’s totally containerized and runs on the main container orchestration platforms like Kubernetes and the enterprise OpenShift (as well as the OpenShift Origin project). It means that your messaging based (or IoT) solution can be deployed “on premise” and then easily moved to the cloud without any changes to your applications (maybe just the addresses used by the clients for establishing the connections).


Conclusion

Of course, this blog post wasn’t meant to be an exhaustive guide on what EnMasse is, but just a brief introduction that I had wanted to write for a long time. The Amazon news gave me the opportunity to show you that you can really have more than just creating a broker in the cloud and taking care of it 🙂

 

A lot of fun with … AMQP, Spark, Kafka, EnMasse, MQTT, Vert.x & IoT

When I tell someone that I work for Red Hat, they say to me “Ah! Are you working on Linux?” … No, no, no and … no! I’m not a Linux guy, I’m not a fan boy, I’m just a daily user 🙂

Everybody knows that Red Hat is THE company which provides the best enterprise Linux distribution, well known as Red Hat Enterprise Linux (RHEL), but today Red Hat is not only Linux. Its portfolio is huge: the cloud and containers business with the OpenShift effort, the microservices offering with Vert.x, WildFly Swarm and Spring Boot, and the IoT world with the involvement in the main Eclipse Foundation projects.

The objective of this post is just to briefly show the projects I have worked (or am working) on since last year, when I was hired on March 1st. They are not “my” projects; they are projects I’m involved in because the entire team is working on them … collaboration, you know 🙂

You could be surprised about that but … there is no Linux ! I’m on the messaging & IoT team, so you will see only projects about this stuff 🙂

AMQP – Apache Spark connector

This “little” component is strictly related to the “big” radanalytics.io project, which brings the power of Apache Spark for analytics (batch, real-time, machine learning, …) to OpenShift.

Because the messaging team works mainly on projects like ActiveMQ Artemis and the Qpid Dispatch Router, where the main protocol is AMQP 1.0, the idea was to develop a connector for Spark Streaming in order to ingest data through this protocol, either from queues/topics on a broker or through the router in a direct messaging fashion.

You can find the component here, and even an IoT demo here, which shows how it’s possible to ingest data through AMQP 1.0 using the EnMasse project (see below) and then execute real-time streaming analytics with Spark Streaming, all running on Kubernetes and OpenShift.

AMQP – Apache Kafka bridge

Apache Kafka is one of the best technologies used today for ingesting data (i.e. in IoT related scenarios) with a high throughput. In this case as well, the idea was to provide a way for AMQP 1.0 clients and JMS clients to push messages to Apache Kafka topics without knowing the related custom protocol.

In this way, if you have such clients because you are already using a broker technology, but then you need some specific Kafka features (i.e. re-reading streams), you can just switch the messaging system (from the broker to Kafka) and, using the bridge, you don’t need to update or modify the clients. I showed how this is possible at the Red Hat Summit as well, and the related demo is available here.

MQTT on EnMasse

EnMasse is an open source messaging platform, with focus on scalability and performance. It can run on your own infrastructure (on premise) or in the cloud, and simplifies the deployment of messaging infrastructure.

It’s based on other open source projects like ActiveMQ Artemis and Qpid Dispatch Router supporting the AMQP 1.0 protocol natively.

In order to provide support for the MQTT protocol, we designed “MQTT over AMQP”, bringing MQTT features on top of the AMQP protocol. From the design we moved on to developing two main components:

  • the MQTT gateway which handles connections with remote MQTT clients translating all messages from MQTT to AMQP and vice versa;
  • the MQTT LWT (Last Will and Testament) service which provides a way for notifying all clients connected to EnMasse that another client has suddenly died, sending them its “will message”. The great thing about this service is that it works with pure AMQP 1.0 clients, bringing the LWT feature to AMQP as well: for this reason the team is thinking of renaming it simply to the AMQP LWT service.

EnMasse is great for IoT scenarios, handling a huge number of connections and ingesting a lot of data using AMQP and MQTT as protocols. I used it in all my IoT demos to show how it’s possible to integrate it with streaming and analytics frameworks. It’s also the main choice as messaging infrastructure in the cloud for the Eclipse Hono project.

Vert.x and the IoT components

Vert.x is a great toolkit for developing reactive applications running on a JVM.

The reactive applications manifesto fits really well with IoT scenarios, where responsiveness, resiliency, elasticity and message-driven communication are the pillars of all IoT solutions.

While starting to work on the MQTT gateway for EnMasse using Vert.x, I decided to develop an MQTT server that is just able to handle communication with remote clients, providing an API for interacting with them: this component was used for bridging MQTT to AMQP (in EnMasse) but can be used in any scenario where some sort of protocol translation or integration is needed (i.e. MQTT to the Vert.x Event Bus, to Kafka, …). Pay attention: it’s not a full broker!

The other component was the Apache Kafka client, mainly developed by Julien Viet (lead on Vert.x) and then passed to me as maintainer for improving it and adding new features from the first release.

Finally, thanks to the Google Summer of Code, during the last 2 months I have been mentoring a student who is working on developing a Vert.x native MQTT client.

As you can see, the Vert.x toolkit is really growing from an IoT perspective, as well as providing a lot of components useful for developing pure microservices based solutions.

Eclipse Hono

Eclipse Hono is a project under the big Eclipse IoT umbrella in the Eclipse Foundation. It provides service interfaces for connecting large numbers of IoT devices to a back end and interacting with them in a uniform way regardless of the device communication protocol.

It supports scalable and secure ingestion of large volumes of sensor data by means of its Telemetry API. The Command & Control API allows for sending commands (request messages) to devices and receiving a reply to such a command from a device asynchronously, in a reliable way.

This project is mainly developed by Red Hat and Bosch, and I gave my support in designing all the APIs as well as implementing the MQTT adapter, in this case too using the Vert.x MQTT server component.

Because Eclipse Hono works on top of a messaging infrastructure for exchanging messages, the main choice was to use ActiveMQ Artemis and the Qpid Dispatch Router, also running them on Kubernetes and OpenShift with EnMasse.

Apache Kafka

Finally, I was involved in developing a POC named “barnabas” (a messenger character from a Franz Kafka novel :-)) in order to get Apache Kafka running on OpenShift.

Considering the stateful nature of a project like Kafka, I started when Kubernetes didn’t yet offer the StatefulSets feature, doing something similar by myself. Today, the available deployment is based on StatefulSets, and it’s a work in progress on which I’ll continue to work to push the POC to the next level.

Apache Kafka is a really great project which has its own use cases in the messaging world; today it’s even more powerful thanks to the new Streams API, which allows executing real-time streaming analytics using topics from your cluster and running simple applications. My next step is to move my EnMasse + Spark demo to an EnMasse + Kafka (and streaming) deployment. I’m also giving my support on the Apache Kafka code.

Conclusion

The variety and heterogeneity of all the above projects gives me a lot of fun in my day-to-day work, collaborating with different people with different knowledge. I like learning new stuff, and the great thing is that … things to learn are endless! 🙂

 

How to learn Kubernetes? “Kubernetes in Action” is the answer!

A few months ago I started reading, through the Manning Early Access Program (MEAP), one of the books that I think is the best resource for all the newbies who want to start studying Kubernetes and for all the experts who want to dig into its details and internals.

It’s “Kubernetes in Action”, written by Marko Luksa, one of my colleagues at Red Hat, a software engineer in the Cloud Enablement team.

First of all, I can vouch for the author! Marko has a really deep knowledge of Kubernetes and OpenShift and, luckily for us, he decided to write a book about that. He is a very nice person, always available to help you solve any problem that you are facing with Kube. I was lucky to start working with Marko when the EnMasse project was born.

Speaking about the book, it’s awesome !

After the first part introducing Docker and Kubernetes and the first steps with it, the book moves to the core concepts.

You can find all the information about what pods are, how you can deploy containers (so your applications) with them and, finally, how you can replicate pods. Then how to make applications accessible inside and outside the cluster using services, and how to use storage for having a persistence layer for your data, shared between pods and always available across restarts. Do you want to know how your application can be configured, even with sensitive data (i.e. certificates and credentials)? You will find such information in this book!

After covering all these core concepts, the last big part has the objective of giving you deeper information about Kubernetes. First of all, the new StatefulSets feature, which allows deploying stateful applications with a stable identity and data stored across restarts. Then a really interesting look at Kubernetes internals, speaking about all the components which make it up: etcd, the API server, the controller manager and all the other stuff. Managing resources is another interesting chapter, describing how you can request specific resources in terms of CPU and memory for the containers, even setting limits on them.

But one of the big advantages of using a container orchestrator like Kubernetes is the possibility to scale your infrastructure based on the load you have against the applications. You will find information about auto-scaling on CPU utilization or custom metrics as well!

The final part is enriched by best practices for developing cloud native applications which run on Kubernetes. In order to learn a new technology, knowing how it works is the main part but it’s more useful having examples and patterns provided by expert people like Marko.

Finally, the book ends by explaining how it’s possible to extend Kubernetes, defining new components and custom API objects, showing a powerful feature: its extensibility.

In conclusion, I think that this book deserves to be read and to be part of your book collection, also because after reading it you’ll become an expert on two technologies for developing cloud native applications: not only Kubernetes but OpenShift as well! 🙂

Today’s meetup … “Open sourcing the IoT: running EnMasse on Kubernetes”

Yes … I’m at the airport waiting for my flight back home and, as usual, I’d like to write something about the reason for my trip.


Today I had a meetup in Milan, hosted in the Microsoft office and organized by my friend Felice Pescatore, who leads the AgileIoT project; of course my session was about messaging and IoT … so no news on that. The title? “Open sourcing the IoT: running EnMasse on Kubernetes”.

Other friends were there with their sessions: Felice himself, Valter Minute speaking about how to move from an IoT prototype to a product, and Clemente Giorio and Matteo Valoriani with very interesting sessions about real HoloLens scenarios.

I started with an introduction about messaging and how it relates to the IoT, then moved on to the EnMasse project, an open source “messaging as a service” platform that is well suited for being the messaging infrastructure of an IoT solution (for example, it’s applicable inside the Eclipse Hono project).

I showed the main EnMasse features, the new ones coming in the next weeks, and how EnMasse provides a messaging and IoT solution from an “on-premise” deployment to the “cloud” in a Kubernetes or OpenShift cluster. For this reason I said “open sourcing the IoT”: all the components in such a solution are open source!


To show that, I had a demo with a Kubernetes cluster running on Azure Container Service, with EnMasse and Apache Spark deployed on it. The demo was made of an AMQP publisher sending simulated temperature values to a “temperature” address deployed in EnMasse (as a queue), and a Spark Streaming job reading such values in order to process them in real time, getting the max value over the latest 5 seconds and writing the result to the “max” address (another queue); finally, an AMQP receiver was running in order to read and show such values from “max”.

If you want to know more about that you can find the following resources :

“Hostpath” based volumes dynamically provisioned on OpenShift

Storage is one of the critical pieces in a Kubernetes/OpenShift deployment for those applications which need to store persistent data; a good example is represented by “stateful” applications that are deployed using StatefulSets (previously known as PetSets).

In order to do that, one or more persistent volumes are manually provisioned by the cluster admin and the applications can use persistent volume claims for having access to them (read/write). Starting from the 1.2 release (as alpha), Kubernetes offers the dynamic provisioning feature, avoiding the pre-provisioning by the cluster admin and allowing auto-provisioning of persistent volumes when they are requested by users. In the current 1.6 release, this feature is now considered stable (you can read more about that at the following link).

As described in the above link, there is a provisioner which is able to provision persistent volumes as requested by users through a specified storage class. In general, each cloud provider (Amazon Web Services, Microsoft Azure, Google Cloud Platform, …) allows using some default provisioners, but for a local deployment on a single-node cluster (i.e. for development purposes) there is no default provisioner for using a “hostpath” (providing a persistent volume through a local directory on the host).

There is a project (in the “Kubernetes incubator”) which provides a library for developing a custom external provisioner, and one of its examples is exactly a provisioner using a local directory on the host for persistent volumes: the hostpath-provisioner.

In this article, I’ll explain the steps needed to have the “hostpath provisioner” working on an OpenShift cluster and what I have learned during this journey. My intention is to provide a unique guide gathering information taken from various sources like the official repository.

Installing Golang

First of all, I didn’t have the Go language on my Fedora 24, and the first thing to know is that version 1.7 (or above) is needed, because the “context” package (added in the 1.7 release) is required. I started by installing the default Go version provided by the Fedora 24 repositories (1.6.5) but received the following error when trying to build the provisioner:

vendor/k8s.io/client-go/rest/request.go:21:2: cannot find package "context" in any of:
 /home/ppatiern/go/src/hostpath-provisioner/vendor/context (vendor tree)
 /usr/lib/golang/src/context (from $GOROOT)
 /home/ppatiern/go/src/context (from $GOPATH)

In order to install Go 1.7 manually, after downloading the tar file from the web site, you can extract it in the following way :

tar -zxvf go1.7.5.linux-amd64.tar.gz -C /usr/local

After that, two main environment variables need to be set to have the Go compiler and runtime working fine.

  • GOROOT : the directory where Go is just installed (i.e. /usr/local/go)
  • GOPATH : the directory of the Go workspace (where we need to create two other directories, src and bin)

By modifying the .bashrc (or .bash_profile) file, we can export such environment variables.

export GOPATH=$HOME/go
PATH=$PATH:$GOPATH/bin
export GOROOT=/usr/local/go
PATH=$PATH:$GOROOT/bin

Having $GOPATH/bin in the PATH is needed, as we’ll see in the next step.
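
After reloading the shell configuration, a quick check confirms that the right Go version is picked up (the output should report go1.7.5 in this case).

source ~/.bashrc
go version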

Installing Glide

The provisioner project we want to build has some Go dependencies, and Glide is used as the dependency manager.

It can be installed in the following way :

curl https://glide.sh/get | sh

This command downloads the needed files and builds the Glide binary, copying it into the $GOPATH/bin directory (which is why we need that directory in the PATH, so we can use glide on the command line).

Building the “hostpath-provisioner”

First of all, we need to clone the GitHub repository from here and then launch the make command from the docs/demo/hostpath-provisioner directory.
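
In practice, the sequence looks something like the following sketch (the repository URL is the one linked above; it’s shown here just as a placeholder).

git clone <repository-url>
cd <repository-directory>/docs/demo/hostpath-provisioner
make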

The Makefile has the following steps :

  • using Glide in order to download all the needed dependencies.
  • compiling the hostpath-provisioner application.
  • building a Docker image which contains the above application.

It means that this provisioner needs to be deployed in the cluster in order to provide the dynamic provisioning feature to the other pods/containers which need persistent volumes created dynamically.

Deploying the “hostpath-provisioner”

This provisioner is going to use a directory on the host for persistent volumes. The name of the root folder is hardcoded in the implementation and it is /tmp/hostpath-provisioner. Every time an application claims a persistent volume, a new child directory is created under this one.

This root folder needs to be created with full read/write access:

mkdir -p /tmp/hostpath-provisioner
chmod 777 /tmp/hostpath-provisioner

In order to run the “hostpath-provisioner” in a cluster with RBAC (Role Based Access Control) enabled or on OpenShift you must authorize the provisioner.

First of all, create a ServiceAccount resource described in the following way :

apiVersion: v1
kind: ServiceAccount
metadata:
 name: hostpath-provisioner

then a ClusterRole :

kind: ClusterRole
apiVersion: v1
metadata:
  name: hostpath-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get"]

It’s needed because the controller requires authorization to perform the above API calls (i.e. listing, watching, creating and deleting persistent volumes and so on).

Let’s create a sample project for that, save the above resources in two different files (i.e. serviceaccount.yaml and openshift-clusterrole.yaml) and finally create these resources.

oc new-project test-provisioner
oc create -f serviceaccount.yaml
oc create -f openshift-clusterrole.yaml

Finally we need to provide such authorization in the following way :

oc adm policy add-scc-to-user hostmount-anyuid system:serviceaccount:test-provisioner:hostpath-provisioner
oc adm policy add-cluster-role-to-user hostpath-provisioner-runner system:serviceaccount:test-provisioner:hostpath-provisioner

The “hostpath-provisioner” example provides a pod.yaml file which describes the Pod to deploy for having the provisioner running in the cluster. Before creating the Pod, we need to modify this file, setting the spec.serviceAccount property to the name of the ServiceAccount, which in this case is just “hostpath-provisioner” (as described in the serviceaccount.yaml file).

kind: Pod
apiVersion: v1
metadata:
  name: hostpath-provisioner
spec:
  containers:
    - name: hostpath-provisioner
      image: hostpath-provisioner:latest
      imagePullPolicy: "IfNotPresent"
      env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
      volumeMounts:
        - name: pv-volume
          mountPath: /tmp/hostpath-provisioner
  serviceAccount: hostpath-provisioner
  volumes:
    - name: pv-volume
      hostPath:
        path: /tmp/hostpath-provisioner

Last steps … just create the Pod and then the StorageClass and the PersistentVolumeClaim, using the provided class.yaml and claim.yaml files.

oc create -f pod.yaml
oc create -f class.yaml
oc create -f claim.yaml

Finally we have a “hostpath-provisioner” deployed in the cluster that is ready to provision persistent volumes as requested by the other applications running in the same cluster.
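
Before testing it, it’s worth checking that everything is in place, for example listing the provisioner pod, the storage class and the claim created from the files above.

oc get pods
oc get storageclass
oc get pvc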


See the provisioner working

To check that the provisioner is really working, there is a test-pod.yaml file in the project which starts a pod that claims a persistent volume in order to create a SUCCESS file inside it.

After starting the pod :

oc create -f test-pod.yaml

we should see a SUCCESS file inside a child directory (with a very long generated name) under the root /tmp/hostpath-provisioner.

ls /tmp/hostpath-provisioner/pvc-1c565a55-1935-11e7-b98c-54ee758f9350/
SUCCESS

It means that the provisioner has handled the claim request in the correct way, providing a volume to the test-pod in order to write the file.
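
The dynamically provisioned persistent volume, bound to the claim created before, is also visible from the OpenShift side.

oc get pv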