AMQP protocol : the builtin type system by examples

One of the most interesting features of AMQP protocol is the built in type system that provides a way to represent the AMQP values inside a frame starting from primitive types to custom types (based on our application domain) and composite types (containing fields).

During this article I’ll try to explain in a simple way the main features of the type system but for better understanding you can refer the official AMQP specification available at following link.

AMQP type system : a brief introduction

First of all, any AMQP data is encoded in the following way :

amqp_type

As you can see, it has a constructor followed by untyped bytes. The constructor could be a code to represent a primitive type (named primitive format-code) or a code to represent a described type that is useful to describe custom/complex types in our application domain; a described type consists of a descriptor followed by a primitive format-code.

For example, the encoding for the “Hello world” string is the following :

0xA1 0x0B “Hello World”

where :

  • 0xA1 : it’s the simple constructor with a single byte that represents the primitive format-code for UTF8 encoded strings. This format-code has the higher nibble “A” (named subcategory) that represents a data with variable width defined by a single byte so with size between 0-255 (page 19 on official specification). The lower nibble “1” is named subtype and the entire byte represents the type UTF8 encoded string with a max length of 255 characters (page 29 of official specification);
  • 0x0B : it’s the length of the string (11 bytes). As you can see, the length is represented with a single byte as described by the previous constructor 0xA1;
  • “Hello world” : the UTF8 encoded string represented as AMQP value;

Representing a primitive type like a string is very simple because the constructor is simple itself but describing a custom type is more complex. Using the same example from the official specification, imagine you want to define a custom type “URL” (of course, it’s a string but not all strings are URLs) and to encode an URL as AMQP value. In this case, the constructor doesn’t consist of a single byte (a simple primitive format-code) but it has a descriptor followed by a primitive format-code. The very interesting thing is that the descriptor itself is a valid AMQP value so it’s encoded in the same way ! It means that we need a constructor followed untyped bytes to define a descriptor !

Referring the URL example we have :

0x00 0xA1 0x03 “URL” 0xA1 0x1E “http://example.org/hello-world” 

where :

  • 0x00 : it’s the format-code that defines a descriptor (page 18 of official specification);
  • 0xA1 : it’s the primitive format-code used to define the custom type within the descriptor. Our custom type is defined as the UTF8 encoded string “URL” so a string with max length above 255 characters;
  • 0x03 : it’s the length of the string (3 bytes);
  • “URL” : the UTF8 encoded string used to define the type with the whole descriptor;
  • 0xA1 : it’s the primitive format-code that is part of the constructor for a described type and follows the above descriptor. In this case it represents an UTF8 encoded string with max length of 255 characters;
  • 0x1E : it’s the length of the string (30 bytes);
  • http://example.org/hello-world ” : the UTF8 encoded string represented as AMQP value;

In this way we don’t represent a simple string but a more specific string because we have defined the custom type “URL” followed by the URL value.

Of course, the AMQP type system is more complex and provides a lot of simple and complex type from scalar (integral numbers, booleans, strings and so on) to collections (arrays, lists and maps). To deep into them you can refer to the official AMQP specification.

Inside a sample frame : SASL mechanisms list

After a quick introduction to the AMQP type system, we can try to analyze a real frame sniffing the AMQP traffic with Wireshark who provides a specific AMQP protocol analyzer. Of course, we can’t collect this traffic if our client connects to Azure Service Bus (queues, topics/subscriptions or event hubs) because this traffic is encrypted with an SSL/TLS connection. For this reason, to have plain (not encrypted) traffic, I used the AMQP broker built in the AMQP .Net Lite library and my Azure SB Lite as client.

01_wireshark_amqp

After TCP handshaking between client and server and after the protocol header exchange, there is the SASL traffic that is used to define the security and authentication layer that will be used for data exchange. The first is the sasl-mechanisms list frame sent from the server to the client to announce the supported authentication mechanisms (for more information, starting from page 113 of official specification).

Following a screenshot of such a frame :

02_sasl_mechanisms_frame

The first 8 bytes are part of the AMQP frame header (page 37 of official specification) as it’s defined by the specification so we have :

  • first 4 bytes (0x00 0x00 0x00 0x1B) represent the entire frame length (which is 27 bytes) as decoded by Wireshark plugin;
  • the 5th byte is the DOFF byte (data offset) which represents where the frame data starts (the starting offeset is 4 * DOFF);
  • the 6th byte is the frame type so what’s the content inside the payload. In this case it’s 0x01 that represents a SASL frame;
  • the last 2 bytes of the header are ignored in the case of a SASL frame. In a more generic AMQP frame they represents the channel and it’s the reason why Wireshark plugin decodes them as channel 0 even if it’s a non sense.

After the AMQP frame header we have the frame body that is encoded using the “awesome” builtin type system.

sasl_frame

The first byte is 0x00 (descriptor format-code), so it means that we have a not simple constructor that has a descriptor inside it for defining a custom type. What’s this custom type we are describing ? It’s the SASL mechanisms list that contains all the authentication mechanisms supported by the server. This descriptor has 0x53 as primitive format-code that indicates a fixed-one value so an integral value (ulong type on page 26 of official specification) with a size expressed by a single byte (0 – 255). The following value 0x40 represents the custom SASL mechanisms list type as defined in the official specification on page 115. As we can see on this page, it’s a composite type which is a list as we can expected and it’s represented by the following 0xC0 byte (page 29 of official specification).

In the AMQP type system, a list is a polymorphic collection (as compound type) that could contain values with different types (ex. an integral, a boolean, an array …) so it’s encoded as follow :

amqp_compound_type

There is the total size of the list in byte (SIZE) and the number of items (COUNT). Each item consists of a constructor and the related data (this is the way for the list to be polymorphic, because each item doesn’t have only the value but the constructor to specify the type too). Returning to our frame we have 0x0E that is the size of the list (14 bytes) followed by 0x01 (the number of items, only one).

But … what’s the single item inside this list ? Of course it starts with the constructor that is represented with 0xE0 and it meas the format-code for an array. In the AMQP type system, an array is a monomorphic collection so that all the items are of the same type. For this reason, it’s encoded in the following way :

amqp_array_type

The encoding is like the list type (SIZE and COUNT) but it has only one constructor to define the single type for all the items inside the collection followed by all the item values themself.

In our frame, we have 0x0B (total size of array in bytes, 11 bytes) and 0x01 (only one element inside the array). After that we have the item constructor represented by 0xB3 format-code that is a “symbol” type (page 29 of the official specification). Symbols are values from a constrained domain; you can think them as enums with a limited number of possible values. The 0xB3 code represents a symbol with a size expressed with 4 bytes, so the following 4 bytes in the frame (0x00 0x00 0x00 0x05) represent a size of 5 bytes for the symbol value.

The symbol value is “PLAIN” and it means that the broker supports only PLAIN authentication method. We can notice that the frame value isn’t encoded as a simple string (primitive format-code 0xA1 we saw previously); a symbol is used because it wants to define an “enum” due to the limited possible values for SASL authentication mechanisms (PLAIN, EXTERNAL, ANONYMOUS and so on).

amqp_sasl_mechanisms_notes

Conclusion

Finally we have decoded the frame and it means we have a list with only one element that is an array of only one element that is the PLAIN symbol !

We could ask why there is a list with an array inside it and not a simple array without external list ! From the official specification (page 22) we can see that a composite type is sent on the wire as a list and it must be always a list. The SASL mechanisms frame is described as a composite type (page 115) so it must be a list.

Inside it we have an array because the broker may support more than one authentication methods and not only one as in this example.

I hope you found this article useful to understand the AMQP type system but you have to consider this blog post as a starting point to deep into the official specification.

4 comments

  1. Nice article! One point on the sasl-mechanisms frame is that the sasl-server-mechanisms field is specified to be a symbol supporting multiple values. So an array with a single element is a valid value, but so would be a simple symbol. As to why the sasl-mechanisms type is defined as a list, I believe the use of lists for composite types in general is to allow for future extensibility. So depending on how a library is written, if a future version of the spec was add some field, the library may not have to actually be altered in order to be able to read the frame (and any subsequent ones). Clearly it or the application would have to be updated to take advantage of any information supplied in that field of course.

    1. Thanks Gordon !

      Good point about extensibility. I think that the main reason for the list usage is the possibility to add new values in a “polymorphic” way. If it was a simple array, we could have values of same type (monomorphic). Using a list they added extensibility for new values (in the future) of different types inside it.

Leave a reply to ppatierno Cancel reply