YANG Catalog: Making models work together

With YANG establishing itself as the standard modeling language of choice, the industry hit a speed bump. Although the entire industry develops YANG models, we lack a way to ensure they will work together in order for operators to automate coherent services.

The problem was, as the IETF blog elaborates, tackled at Internet Engineering Task Force’s IETF 98 Hackathon by a group of ten-ish enthusiasts, including Pantheon Technologies’ Miroslav Kováč, via integrating tools around a YANG catalog.

The idea behind the catalog is to become a reference for all available YANG modules, serving both the YANG developers as well as operators. The catalog should also provide metadata on YANG models, offering information on module implementation, availability of open-source code and possibly much more.

Compared to a Github repository, added value of the YANG catalog resides in the toolchain and the additional metadata, more about which you’ll find in the IETF article.

Martin Firák

OpenDaylight RPCs or What Could Possibly Go Wrong With Adding This One Cool Feature

OpenDaylight uses YANG as its Interface Definition Language. This is an architecture decision we have made way back in 2013 and it works reasonably well for the most part.

One of YANG concepts used rather heavily is the concept of an RPC. For YANG and its intended use in NETCONF’s client/server model it works perfectly fine, but trouble starts brewing when you borrow concepts and try to make them fit your use case.

OpenDaylight uses YANG RPCs to not only define its northbound model, but also model interactions between its individual plugins. It does this in an environment, which is a single process, but rather a cluster of nodes, each having a mesh of plugins, some activated some not.

From architecture’s view, which looks at things from an elevation of 10,000 feet, the problem of making RPCs work in this sort of environment is quite simple: all you need are registries and request routers. From implementation perspective, though, things can easily go wrong… implementations have bugs, quirks and limitations which are not immediately apparent. They just surface when you try and push the system closer to its architectural limits.

The Trouble with Names

RFC 6020 defines only the basic RPC concept and assumes there is a single implementation servicing any request for that RPC. This is okay as long as you are targeting singleton actions — like ‘ping IP’, ‘clear system log’ and similar. In a complex system, though, requests are typically associated with a particular resource — like ‘create a flow on this switch’. Since YANG did not give us this tool, we have decided to create an OpenDaylight extension to allow an RPC to be bound to a context. This gave rise to two unfortunate names: ‘Global RPCs‘ and ‘Routed RPCs‘, the first being normal RPCs and the second being bound to a context. Plus, a third name, ‘RPCs‘, to refer to either one of those concepts. Are you confused yet?

The initial implementation of these concepts was done back in 2013, when there was no clustering in sight, by a team who have spent days upon days discussing the difference. When clustering came into the implementation picture, in 2014, the implementation team attached their own meaning to the word ‘Routed’ and we ended up with an implementation, where Routed RPCs are routed between cluster nodes, but the default ones are not. That is the subject matter behind BUG-3128. It did not matter much as long as all cluster-enabled applications used Routed RPCs, but that changed with emergence of Cluster Singleton Service and its wide-spread adoption among plugins.

These days we have YANG 1.1, defined in RFC 7950, which has the same underlying concept with much less confusing names. ‘Global RPCs’ are ‘RPCs‘. ‘Routed RPCs’ are ‘actions‘. Since those terms make the conversation about semantics a reasonable affair, this is the last you hear about Global and Routed RPCs from me.

Fun with Concepts, Contexts and Contracts

In order to support both RPCs and actions, OpenDaylight’s MD-SAL infrastructure has to define a concept to identify them both. Since the two are utterly similar in what they do, DOMRpcIdentifier was born. It is used to identify either an action or an RPC. To do that is is an abstract class with two concrete, private final implementations: DOMRpcIdentifier$Global and DOMRpcIdentifier$Local. Why those names? I do not remember the details, but I could wager a guess about what I was thinking back then. At any rate, the two implementations differ only in their implementation of DOMRpcIdentifier.getContextReference(). DOMRpcIdentifier$Global’s is always empty and DOMRpcIdentifier$Local’s is always non-empty.

This is consistent with how RPCs (without a context reference) and actions (with a context reference) are invoked and it makes the API involved in the context of RPC/action invocation clean and simple. API contract. In the context of registering an RPC or action implementation, things are slightly less straightforward. It is a separate interface, with a rather terse Javadoc. In both cases there is a hint of ‘a conceptual dynamic router’, but not much in terms of details.

Unless you were very curious as to the details of the API contracts involved, after reading the documentation available, with some OpenDaylight tutorials under your belt, you would feel this is a dead-simple matter and just use the interfaces provided. Run a few test cases and everything works just fine. No trouble in sight.

About That Router Thing…

The Simultaneous Release name of OpenDaylight for the release currently in development is Carbon, meaning we have shipped 5 major releases, so this ‘dynamic router’ thing vaguely referenced actually exists somewhere and it does something to fulfill the API contracts imposed on it, otherwise the applications would not be able to work at all. The entry point into the implementation is DOMRpcRouter. Glancing over that, it contains some ugliness, but it gets the general outline of the two sides of the contract done.

Digging a bit deeper into the invocation path, you get into the fork at AbstractDOMRpcRoutingTableEntry.invokeRpc(). The RPC invocation path is rather straightforward, but the invocation path for actions is far from simple. Out of two code paths (actions and RPCs) we suddenly have 4, as an action can be invoked without a context reference as if it were an RPC and there is a brief mention of remote rpc connector registering action implementations with an empty context reference … wait … WHAT???!!!

Okay, we seem to have two implementations integrated based on implementation details, without that being supported by a single line in the API contract. The connector referenced is actually sal-remoterpc-connector and is something that is meaningful in clusters. To make some sense of this, we have to go back to 2013 again.

A Tale of Three Routers

From the get go, the MD-SAL architecture was split into two distinct worlds: Binding-Independent (BI, DOM) and Binding-Aware (BA, Binding). This split comes from two competing requirements: type-safety provided by Java for application developers who interact with specific data models and infrastructure services which are independent of data models. The former is supported by interfaces and classes generated from YANG models and generally feels like any code where you deal with DTOs. The latter is supported by an object model similar to XML DOM, where you deal with hierarchical ‘document’ trees and all you have to go by are QNames. For obvious reasons most developers interacting with OpenDaylight have never touched the BI world, even though it underpins pretty much every single feature available in the platform.

A very dated picture of how the system is organized can be found here. It is obvious that the two worlds need to seamlessly interoperate — for example RPCs invoked by one world must be able to be serviced by the other and the caller should be none the wiser. Since RPCs are the equivalent of a method call, this process needs to be as fast as possible, too. That lead to a design, where each world has its own Broker and the two brokers are connected. Invocations within the world would be handled by that world’s broker, foregoing any translation. A very old picture of how an inter-world call would look like can be seen in this diagram.

For RPCs this meant that there were two independent routing tables with re-exports being done from each of them. The idea of an RPC router was generalized in the (now long-forgotten) RpcRouter interface. Within a single node, the Binding and DOM routers would be interconnected. For clustered scenarios, a connector would be used to connect the DOM routers across all nodes. So an inter-node BA RPC request from node A to node B would go through: BA-A -> BI-A -> Connector-A -> Connector-B -> BI-B -> BA-B (and back again). Both the BI and connector speak the same language, hence can communicate without data translation.

The design was simple and effective, but has not quite survived the test of time, most notably the transition to dynamic loading of models in the Karaf container. Model loading impacts data translation services needed to cross the BA/BI barrier, leading to situations where an RPC implementation was available in BA world, but could not yet be exported to the BI world — leading to RPC routing loops, and in case of data store services missing data and deadlocks.

To solve these issues, we have decided to remove the BA/BI split from the implementation and turn the Binding-Aware world into an overlay on top of the Binding-Independent world. This means that all infrastructure services always go through BI, and the Binding RPC Broker was gradually taken behind the barn, there was a muffled sound in 2015, and these days we only have two routers, one hiding behind a connector name.

Blueprint for a New Feature

Probably the most significant pain point identified by new people coming to OpenDaylight is that the technology stack is a snowflake, providing few familiar components, with implementation and documentation being borderline hostile to newcomers. One of such pieces is the Configuration Subsystem (CSS). Driven by invalid YANG and magic XMLs, it is a model-driven service activation, dependency injection and configuration framework built on top of JMX. While it offers the ability to re-wire a running instance in a way which does not break anything half-way through reconfiguration, it is a major pain to get right.

It pre-dates MD-SAL (which offers nicer configuration change interactions) and is utterly slow (because the JMX implementation is horrible). It was also designed to safeguard against operator errors and this is quite contrary to what Karaf’s feature service provides — if you hit feature:uninstall, those services are going down without any safeties whatsoever.

To fix this particular sore spot, one of the decisions from the Beryllium design summit was to extend Blueprint with a few capabilities and start the long journey to OpenDaylight without CSS, where internal wiring would be done in Blueprint and user-visible configuration would be stored in MD-SAL configuration data store. The crash-course page is a very easy read.

You will note that there is support for injecting and publishing RPC implementations — which is a nice feature for developers. Rather than having to deal with registries, I can declare a dependency on an RPC service and have Blueprint activate me when it becomes available like this:

<odl:rpc-service id="fooRpcService" interface="org.opendaylight.app.FooRpcService"/>

I can also publish my bean as an implementation, just with a single declaration, like this:

<bean id="fooRpcService" class="org.opendaylight.app.FooRpcServiceImpl">
  <!-- constructor args -->
</bean>
<odl:rpc-implementation ref="fooRpcService"/>

This is beyond neat, this is awesome.

FooRpcService vs. DOMRpcIdentifier

We have already covered how Binding Aware layer sits on top of the Binding Independent one, but it is not a one-to-one mapping. This comes from the fact that Binding Independent layer is centered around what makes sense in YANG, whereas the Binding Aware layer is centered around what makes sense in Java, including various trade-offs and restrictions coming from them. One such difference is that RPCs do not have individual mappings, i.e. we do not generate an interface class for each RPC, but rather we generate a single interface for all RPC definitions in a particular YANG module. Hence for a model like

module foo {
    rpc first { input { ... } output { ... } }
    rpc second { input { ... } output { ... } }
}

we generate a single FooService interface

public interface FooService {
    Future<FirstOutput> first(FirstInput input);
    Future<FirstOutput> second(SecondInput input);
}

The reasoning behind this is that a particular module’s RPCs (in the broad sense, including actions) will always be implemented by a single OpenDaylight plugin and hence it makes sense to bundle them together.

An unfortunate side-effect of this is that in the Binding Aware layer, both RPCs and actions are packaged in the same interface and it is up to the intermediate layers to sort out the ambiguities. This problem is being addressed in Binding V2, where each action has its own interface, but we have to have a solution which works even in this weird setup.

Fix Some, Break Some

Considering these complexities and gaps in the API contract documentation department, it is not quite surprising that the fix for BUG-3128, while making RPCs work correctly across the cluster had the unfortunate side-effect of breaking blueprint wiring in a downstream project (OpenFlow Plugin). In order to understand why that happened, we need to explore the interactions between DOMRpcRouter, blueprint and sal-remoterpc-connector.

When blueprint sees an <odl:rpc-service/> declaration, it will wire a dependency on the specified RPC (Binding Aware) interface being available in DOMRpcService (which is a facet of DOMRpcRouter). As soon as it sees a registration, it considers the dependency satisfied and proceeds to with the wiring of the component. This is true for LLDP Speaker, too. Note how it declares a dependency on an implementation of PacketProcessingService. Try as you may, you will not find a place where the corresponding <odl:rpc-implementation/> lives. The reason for this is quite simple: this service contains a single action and an implementation is registered when an OpenFlow switch connects to the OpenDaylight instance. So how is it possible this works?

Well, it does not. At least not the way it is intended to work.

What happens is that Blueprint starts listening for an implementation of PacketProcessingService becoming available with an empty context, just as with any old RPC. Except this is an action, so somebody has to register as a global provider for the action, i.e. as being capable to dynamically invoke it based on its content and not being tied to a particular context. That someone is sal-remoterpc-connector, in its current shape an form, which does precisely what is mentioned in that terse comment. It registers itself as a dynamic router for all actions and when a request comes in, it will try to find a remote node which has registered an implementation for the specified in the invocation. That means that unbeknownst to the Blueprint extension, all actions appear to have an implementation — even if there is no component actually providing it — and therefore LLDP Speaker will always activate, just as if that dependency declaration was not there.

The fix to address BUG-3128 performed a simple thing: rather than using blanket registrations, it only propagates registrations observed on other nodes — becoming really a connector rather than a dynamic router. Since no component provides the registration at startup time, blueprint will not see the LLDP Speaker dependency as satisfied, leading to a failure to activate. Unless an OpenFlow switch happens to connect while we are waiting — in that case, activation will go through.

So we are at a fork: we either have blueprint ‘working’, or we have RPC routing in cluster working. Getting both to work at the same time, and actually fixing LLDP Speaker to activate when appropriate, we will obviously have to perform some amount surgery on multiple components.

I will detail what changes are needed to close this little can of worms in my next post, so stay tuned 🙂

 

Róbert Varga

CTO Pantheon Technologies

Sysrepo at IETF 96 Hackathon in Berlin

Sysrepo, an open source project developed by several partners including Pantheon Technologies, participated at the IETF 96 Hackathon in Berlin, held from July 16th to July 17th, 2016.

The IETF Hackathon is all about promoting the collaborative spirit of open source development and integrating it into IETF standards. The Sysrepo project provides a framework that can be used to bring NETCONF & YANG management to any existing or new Unix/Linux application, which should help spreading these IETF standards into the wider open source community.

The hackathon was our first opportunity to introduce the Sysrepo project to the audience experienced with NETCONF & YANG standards. In front of our poster (see below), we led many constructive discussions with other participants and have gained lots of feedback.

ietf96-hackathon-posterApart from presenting the project to other participants of the IETF meeting, we spent the weekend by hacking on three sub-projects based on Sysrepo:

NETCONF/YANG management of Raspberry Pi

To demonstrate that NETCONF & YANG are also applicable in the IoT (Internet of Things) domain, as well as to demonstrate that Sysrepo can work also on systems with limited resources, we prepared a simple Sysrepo plugin that can control GPIO pins of the Raspberry Pi. We’ve demonstrated this on a relay switch and a thermal sensor connected to the GPIO of the Pi running on Raspbian Linux with Sysrepo and Netopeer2 – we were able to turn the relay on or off via NETCONF, or retrieve the current temperature gained from the sensor via NETCONF.

sysrepo-raspberry

Sysrepo plugin for the ietf-system YANG module

Another part of the team formed out of the hackathon participants focused on development of a Sysrepo plugin that implements the ietf-system YANG module on a generic Linux host. During the hackathon, they managed to write the code that allows NETCONF management of the host name, clock & timezone settings, and is capable of restarting and shutting down device via NETCONF RPCs.

NETCONF/YANG management of DHCPv6 in ISC Kea

The developers of the ISC Kea DHCP server joined our team with a clear goal: enable NETCONF/YANG management of their DHCP daemon using Sysrepo and Netopeer2. During the hackathon they wrote a Sysrepo plug-in for ISC Kea that is able to manage some part of Kea’s configuration via NETCONF. Their work hasn’t stopped after the hackathon ended – they expressed an interest to continue in this direction in the future too.

After the hacking ended, each team prepared a short presentation of their achievements. These were streamed online and are available on YouTube:

Although the biggest achievement for us was the high interest in the Sysrepo project among the IETF meeting participants, and all the feedback we gained from them, we were also selected as a winner in the „Most Importance to IETF“ category. You can read more about that in this blog post.

Rastislav Szabo

                                                                                 Software Engineer in Pantheon Technologies

More information on Sysrepo:

Project page: http://www.sysrepo.org/

GitHub: https://github.com/sysrepo/sysrepo

Mailing lists: http://lists.sysrepo.org/listinfo/

https://www.isc.org/blogs/ietf-hackathon-in-berlin-kea-and-yangnetconf/