Month: July 2015

M2Mqtt on MQTT encyclopedia and me on DotNetPodcast with “IoT end to end”

Today, I’d like to highlight some stuff …

First, my guest article on the M2Mqtt project in the “MQTT Client Library Encyclopedia” blog post series hosted on HiveMQ web site (thanks to the team for asking me to write this entry). It’s great for this library to be part of this “encyclopedia” that will host all main MQTT client implementations in all different languages and technologies in the next weeks. M2Mqtt represents the Microsoft based one developed in C# and available for all .Net platforms (full, compact and micro) and WinRT (Windows 8.1, Windows Phone 8.1 and Windows 10).

Second, as always I want to thanks the DotNetPodcast team for registering a new podcast, titled “IoT end to end”, on building an Internet of Things solution from devices to the cloud using all Microsoft Azure related technolgies. I described all the main steps that builds an IoT scenario speaking about devices, gateway (in the field and in the cloud), protocols, data ingestion with Azure Event Hubs, analysis in real time with Azure Stream Analytics, data storage and presentation to the user. Of course I spoke a little about the future with IoT suite and the new IoT Hub service. I’m sorry for all my non italian friends … but the podcast is in italian. If you like you can find all my podcasts (only two for now) on the dedicated page of my blog.

AMQP message accepted : encoding on the wire

We saw the AMQP builtin type system, we sent a message to the broker …. now, how is encoded the receive confirmation from the broker itself ?

In this case, the so called performative is the disposition (format code 0x15, page 71 on official specification) that has some fields all filled in a lot of cases.

Of course it has the descriptor 0x00 0x53 and 0x15 as format code and it’s a list 0xC0 with 0x09 bytes distributed on 0x05 fields.

The first field is role that defines the role of the peer that could be sender or receiver. In this case the value is 0x41 that is the “true” value for the boolean type (page 26 of official specification). Why ? What’s the relationship between role and boolean ? The role type is defined as a restricted type that has boolean has source. You can see it like an inheritance because role derives from the existing boolean type restricting and/or changing the mean of the related values using choice (page 73 of official specification). In this case we have that sender is “false” and receiver is “true”. In our case we have 0x41 (true) so the role is the receiver and it’s true because we send a message to the broker (to a queue) and it means that on the other side of the link, the peer is the receiver.

After role, we have first & last fields. Each message sent by the sender has a “delivery-id”. When the receiver sends the disposition to the sender to confirm received message, it needs to set the “delivery-id” of the message itself. It’s possible to define a “delivery-ids” interval so that the receiver can confirm more received messages with a single disposition. The two limits are defined by first and last fields that are of type delivery-number (page 74 of official specification) that is a restricted type with sequence-no as base (it’s a serial number as defined in RFC-1982). In our example the receiver has to confirm only one message so first is 0 and last is null. As you can see, the 0 value is expressed with the byte 0x43 that is a specific and well defined byte used to describe a 0 value (uint0, page 26 of official specification).

Next field is settled that defines the delivery status of the message on the endpoint/peer. This value is also related to the QoS (Quality of Service) and it acts as an aknowledge on messages exchanged. For example, the settled field is in the transfer disposition too; a sender can set it to true and it means to apply a “fire and forget” QoS (the sender sends the message and it isn’t interested to have an acknowledge by the receiver). If the sender set it to false, it expects to receive a disposition with settled to true from receiver (it’s a “at least once” QoS because if the disposition message is lost, the sender re-send the message). Of course, AMQP supports “exactly once” QoS (it isn’t supported by Azure).

In this case its value is 0x41 (true) because it’s defined as a boolean type;

The last field is state that defines the delivery outcome with following possible values :

accepted : the message was processed by the receiver;
rejected : indicates an invalid and unprocessable message;
released : the message was not (and will not be) processed;
modified : message was modified, but not processed;

Due to the above outcomes, a message can go through following state transitions :

In our example the message was accepted and the field has the descriptor 0x00 0x53 0x24 where the 0x24 format code defines the accepted outcome (page 89 of official specification) that is a list. In this case the list is empty and it’s defined by the byte 0x45 used to describe an “empty list”.

AMQP on the wire : messages content framing

After the previous post on the AMQP builtin type system, I’d like to show you a different sample frame strictly related to the sending messages. For starting, I used the SASL mechanisms frame as example to understand the encoding/decoding system just because it’s one of the first exchanged frames in the protocol but every day we use AMQP to send messages so understand their encoding is much more important.

An AMQP message on the wire

Consider the following source code that uses AMQP .Net Lite library to send a simple message :

Connection connection = new Connection(address);
Session session = new Session(connection);
SenderLink sender = new SenderLink(session, "sender-" + testName, "q1");

Message message = new Message("Hello");
message.Properties = new Properties();
message.Properties.MessageId = Guid.NewGuid().ToString();
message.ApplicationProperties = new ApplicationProperties();
message.ApplicationProperties["prop1"] = 1; 
message.ApplicationProperties["prop2"] = "value";

sender.Send(message, 60000);

In the above code, we are interested in the encoding and framing used in the AMQP protocol to send the Message object on the wire. As reported on the official specification (page 38), each frame has a format as showed in the following picture (we already saw this format for the SASL frames with some “little” changes).

As we can see, the real body starts with the so called performative; you can think about it as an “operation” executed against the AMQP broker (opening the connection or the session, attaching the link, transfering a message and so on).

In this case, we have the transfer performative for sending messages and the encoded frame is much more complex than the first one (SASL mechanisms).

The objective of this article is to understand how the builtin type system is used to encode the three main parts of an AMQP message.

From the above picture we’ll consider :

Message properties : system properties well defined by the AMQP specification and used by the broker for manipulation, routing and so on;
Application properties : user defined properties to carry data without using the following payload;
Application data : user defined payload of the message;

The content of all these parts is encoded using the builtin type system. Starting from the “Message-Properties” section we have the descriptor (0x00 0x53 0x73) where the 0x73 code defines the “AMQP properties list” (page 84 of official specification) that is a list (format code 0xC0) of 39 bytes (0x27) with only one element (0x01). This element is an UTF8 encoded string (0xA1) with size of 36 characters expressed with only one byte (0x24). This string is the “Message-Id” we set in our source code (it’s the autogenerated GUID value). You can see that there is no information to describe that this field is the “Message-Id” field but we know that because it’s the first field of the composite type “AMQP properties list”. If the “Message-Id” field weren’t explicitly set by code, it would be null but encoded inside the properties list itself. This means that all missing fields between two defined fields are encoded as null but inserted inside the list (to guarantee the fields sequence); if all subsequent fields (until the end of the list) aren’t set, they aren’t encoded inside the frame (to avoid a waste of space).

The “Application-Properties” section starts with its related descriptor (0x00 0x53 0x74) where the 0x74 code defines the “Application properties map” (page 86 of official specification) that is a map (format code 0xC1) of 24 bytes (0x18) with four elements (0x04). A map is a compound type in the AMQP builtin type system we didn’t see yet; it’s a collection with key-value pairs each encoded as defined in the specification.

In our example, we have two pairs (“prop1” = 1 and “prop2” = “value”) so four elements in the map. The name of the first key “prop1” is encoded as a UTF8 string (code 0xA1) with 0x05 a size (five characters for “prop1”) and the related value that is the string “value” is encoded in the same way. Then there is the name of the second key “prop2” and the related value 0x01 that is an integral type expressed with only one byte (format code 0x54, page 27 on official specification).

The last part of the message is the data section (the “Hello” string) that has the descriptor (0x00 0x53 0x77) where the code 0x77 defines an “AMQP value” (page 86 of official specification) that in this case is an UTF8 string (0xA1) with 0x05 characters.

Changing the data as raw binary

The interesting thing is that the data part of an AMQP message could be encoded as an “AMQP value” (as previous), as an “AMQP sequence” (of values) or as pure “Binary” data.

Consider the previous example but with following changes related to the creation of Message instance.

Message message = new Message()
{
    BodySection = new Data() { Binary = Encoding.UTF8.GetBytes("Hello") }
};

Using the BodySection as Data object and the related Binary field the encoding of “Hello” string on the wire is defined as binary (0xA0) with 0x05 following bytes.

Encoding the data in the following way is an advantage for the consumer of the message that needs to understand pure binary data and not “AMQP value” encoding.

Conclusion

The AMQP message format seems to be more complex than any other protocol but deeping into it …. it’s seems to be simpler.

Of course, there is another part in the message we didn’t consider named the “Message Annotations” that are used a lot for customization inside brokers like for Event Hubs inside Microsoft Azure Service Bus.

However, we can’t see them on the wire because we know that Service Bus traffic is encrypted (using SSL/TLS protocol) and the local AMQP broker for testing doesn’t support event hubs for clear reasons 🙂

AMQP protocol : the builtin type system by examples

One of the most interesting features of AMQP protocol is the built in type system that provides a way to represent the AMQP values inside a frame starting from primitive types to custom types (based on our application domain) and composite types (containing fields).

During this article I’ll try to explain in a simple way the main features of the type system but for better understanding you can refer the official AMQP specification available at following link.

AMQP type system : a brief introduction

First of all, any AMQP data is encoded in the following way :

As you can see, it has a constructor followed by untyped bytes. The constructor could be a code to represent a primitive type (named primitive format-code) or a code to represent a described type that is useful to describe custom/complex types in our application domain; a described type consists of a descriptor followed by a primitive format-code.

For example, the encoding for the “Hello world” string is the following :

0xA1 0x0B “Hello World”

where :

0xA1 : it’s the simple constructor with a single byte that represents the primitive format-code for UTF8 encoded strings. This format-code has the higher nibble “A” (named subcategory) that represents a data with variable width defined by a single byte so with size between 0-255 (page 19 on official specification). The lower nibble “1” is named subtype and the entire byte represents the type UTF8 encoded string with a max length of 255 characters (page 29 of official specification);
0x0B : it’s the length of the string (11 bytes). As you can see, the length is represented with a single byte as described by the previous constructor 0xA1;
“Hello world” : the UTF8 encoded string represented as AMQP value;

Representing a primitive type like a string is very simple because the constructor is simple itself but describing a custom type is more complex. Using the same example from the official specification, imagine you want to define a custom type “URL” (of course, it’s a string but not all strings are URLs) and to encode an URL as AMQP value. In this case, the constructor doesn’t consist of a single byte (a simple primitive format-code) but it has a descriptor followed by a primitive format-code. The very interesting thing is that the descriptor itself is a valid AMQP value so it’s encoded in the same way ! It means that we need a constructor followed untyped bytes to define a descriptor !

Referring the URL example we have :

0x00 0xA1 0x03 “URL” 0xA1 0x1E “http://example.org/hello-world”

where :

0x00 : it’s the format-code that defines a descriptor (page 18 of official specification);
0xA1 : it’s the primitive format-code used to define the custom type within the descriptor. Our custom type is defined as the UTF8 encoded string “URL” so a string with max length above 255 characters;
0x03 : it’s the length of the string (3 bytes);
“URL” : the UTF8 encoded string used to define the type with the whole descriptor;
0xA1 : it’s the primitive format-code that is part of the constructor for a described type and follows the above descriptor. In this case it represents an UTF8 encoded string with max length of 255 characters;
0x1E : it’s the length of the string (30 bytes);
“http://example.org/hello-world ” : the UTF8 encoded string represented as AMQP value;

In this way we don’t represent a simple string but a more specific string because we have defined the custom type “URL” followed by the URL value.

Of course, the AMQP type system is more complex and provides a lot of simple and complex type from scalar (integral numbers, booleans, strings and so on) to collections (arrays, lists and maps). To deep into them you can refer to the official AMQP specification.

Inside a sample frame : SASL mechanisms list

After a quick introduction to the AMQP type system, we can try to analyze a real frame sniffing the AMQP traffic with Wireshark who provides a specific AMQP protocol analyzer. Of course, we can’t collect this traffic if our client connects to Azure Service Bus (queues, topics/subscriptions or event hubs) because this traffic is encrypted with an SSL/TLS connection. For this reason, to have plain (not encrypted) traffic, I used the AMQP broker built in the AMQP .Net Lite library and my Azure SB Lite as client.

After TCP handshaking between client and server and after the protocol header exchange, there is the SASL traffic that is used to define the security and authentication layer that will be used for data exchange. The first is the sasl-mechanisms list frame sent from the server to the client to announce the supported authentication mechanisms (for more information, starting from page 113 of official specification).

Following a screenshot of such a frame :

The first 8 bytes are part of the AMQP frame header (page 37 of official specification) as it’s defined by the specification so we have :

first 4 bytes (0x00 0x00 0x00 0x1B) represent the entire frame length (which is 27 bytes) as decoded by Wireshark plugin;
the 5th byte is the DOFF byte (data offset) which represents where the frame data starts (the starting offeset is 4 * DOFF);
the 6th byte is the frame type so what’s the content inside the payload. In this case it’s 0x01 that represents a SASL frame;
the last 2 bytes of the header are ignored in the case of a SASL frame. In a more generic AMQP frame they represents the channel and it’s the reason why Wireshark plugin decodes them as channel 0 even if it’s a non sense.

After the AMQP frame header we have the frame body that is encoded using the “awesome” builtin type system.

The first byte is 0x00 (descriptor format-code), so it means that we have a not simple constructor that has a descriptor inside it for defining a custom type. What’s this custom type we are describing ? It’s the SASL mechanisms list that contains all the authentication mechanisms supported by the server. This descriptor has 0x53 as primitive format-code that indicates a fixed-one value so an integral value (ulong type on page 26 of official specification) with a size expressed by a single byte (0 – 255). The following value 0x40 represents the custom SASL mechanisms list type as defined in the official specification on page 115. As we can see on this page, it’s a composite type which is a list as we can expected and it’s represented by the following 0xC0 byte (page 29 of official specification).

In the AMQP type system, a list is a polymorphic collection (as compound type) that could contain values with different types (ex. an integral, a boolean, an array …) so it’s encoded as follow :

There is the total size of the list in byte (SIZE) and the number of items (COUNT). Each item consists of a constructor and the related data (this is the way for the list to be polymorphic, because each item doesn’t have only the value but the constructor to specify the type too). Returning to our frame we have 0x0E that is the size of the list (14 bytes) followed by 0x01 (the number of items, only one).

But … what’s the single item inside this list ? Of course it starts with the constructor that is represented with 0xE0 and it meas the format-code for an array. In the AMQP type system, an array is a monomorphic collection so that all the items are of the same type. For this reason, it’s encoded in the following way :

The encoding is like the list type (SIZE and COUNT) but it has only one constructor to define the single type for all the items inside the collection followed by all the item values themself.

In our frame, we have 0x0B (total size of array in bytes, 11 bytes) and 0x01 (only one element inside the array). After that we have the item constructor represented by 0xB3 format-code that is a “symbol” type (page 29 of the official specification). Symbols are values from a constrained domain; you can think them as enums with a limited number of possible values. The 0xB3 code represents a symbol with a size expressed with 4 bytes, so the following 4 bytes in the frame (0x00 0x00 0x00 0x05) represent a size of 5 bytes for the symbol value.

The symbol value is “PLAIN” and it means that the broker supports only PLAIN authentication method. We can notice that the frame value isn’t encoded as a simple string (primitive format-code 0xA1 we saw previously); a symbol is used because it wants to define an “enum” due to the limited possible values for SASL authentication mechanisms (PLAIN, EXTERNAL, ANONYMOUS and so on).

Conclusion

Finally we have decoded the frame and it means we have a list with only one element that is an array of only one element that is the PLAIN symbol !

We could ask why there is a list with an array inside it and not a simple array without external list ! From the official specification (page 22) we can see that a composite type is sent on the wire as a list and it must be always a list. The SASL mechanisms frame is described as a composite type (page 115) so it must be a list.

Inside it we have an array because the broker may support more than one authentication methods and not only one as in this example.

I hope you found this article useful to understand the AMQP type system but you have to consider this blog post as a starting point to deep into the official specification.

DEVEXPERIENCE

Paolo Patierno's Blog

Month: July 2015