What Could Possibly Go Wrong With Adding This One Cool Feature
OpenDaylight uses YANG as its Interface Definition Language. This is an architecture decision we made way back in 2013, and it works reasonably well for the most part.
One of YANG's concepts, used rather heavily, is the RPC. For YANG and its intended use in NETCONF's client/server model it works perfectly fine, but trouble starts brewing when you borrow the concept and try to make it fit your use case.
OpenDaylight uses YANG RPCs not only to define its northbound model, but also to model interactions between its individual plugins. It does this in an environment which is not a single process, but rather a cluster of nodes, each having a mesh of plugins, some activated, some not.
From the architecture's view, which looks at things from an elevation of 10,000 feet, the problem of making RPCs work in this sort of environment is quite simple: all you need are registries and request routers. From the implementation perspective, though, things can easily go wrong… implementations have bugs, quirks and limitations which are not immediately apparent. They only surface when you try to push the system closer to its architectural limits.
The Trouble with Names
RFC 6020 defines only the basic RPC concept and assumes there is a single implementation servicing any request for that RPC. This is okay as long as you are targeting singleton actions — like ‘ping IP’, ‘clear system log’ and similar. In a complex system, though, requests are typically associated with a particular resource — like ‘create a flow on this switch’. Since YANG did not give us this tool, we have decided to create an OpenDaylight extension to allow an RPC to be bound to a context. This gave rise to two unfortunate names: ‘Global RPCs’ and ‘Routed RPCs’, the first being normal RPCs and the second being bound to a context. Plus, a third name, ‘RPCs’, to refer to either one of those concepts. Are you confused yet?
The initial implementation of these concepts was done back in 2013, when there was no clustering in sight, by a team who spent days upon days discussing the difference. When clustering came into the implementation picture, in 2014, the implementation team attached their own meaning to the word ‘Routed’ and we ended up with an implementation where Routed RPCs are routed between cluster nodes, but the default ones are not. That is the subject matter behind BUG-3128. It did not matter much as long as all cluster-enabled applications used Routed RPCs, but that changed with the emergence of the Cluster Singleton Service and its wide-spread adoption among plugins.
These days we have YANG 1.1, defined in RFC 7950, which has the same underlying concept with much less confusing names. ‘Global RPCs’ are ‘RPCs’. ‘Routed RPCs’ are ‘actions’. Since those terms make the conversation about semantics a reasonable affair, this is the last you hear about Global and Routed RPCs from me.
Fun with Concepts, Contexts and Contracts
In order to support both RPCs and actions, OpenDaylight’s MD-SAL infrastructure has to define a concept to identify them both. Since the two are utterly similar in what they do, DOMRpcIdentifier was born. It is used to identify either an action or an RPC. To do that, it is an abstract class with two concrete, private final implementations: DOMRpcIdentifier$Global and DOMRpcIdentifier$Local. Why those names? I do not remember the details, but I could wager a guess about what I was thinking back then. At any rate, the two implementations differ only in their implementation of DOMRpcIdentifier.getContextReference(). DOMRpcIdentifier$Global’s is always empty and DOMRpcIdentifier$Local’s is always non-empty.
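To make the distinction concrete, here is a minimal sketch of obtaining both flavours. It assumes the MD-SAL API shapes of roughly this era (check the exact package, org.opendaylight.controller versus org.opendaylight.mdsal, against your release) and uses invented QNames:

import org.opendaylight.controller.md.sal.dom.api.DOMRpcIdentifier;
import org.opendaylight.yangtools.yang.common.QName;
import org.opendaylight.yangtools.yang.data.api.YangInstanceIdentifier;
import org.opendaylight.yangtools.yang.model.api.SchemaPath;

public final class RpcIdentifierSketch {
    // Invented names, for illustration only
    private static final QName EXAMPLE_ACTION =
        QName.create("urn:example:demo", "2017-01-01", "example-action");
    private static final QName NODES =
        QName.create("urn:example:demo", "2017-01-01", "nodes");

    public static void main(String[] args) {
        // RPC: no context reference, yielding the 'Global' flavour
        DOMRpcIdentifier rpc = DOMRpcIdentifier.create(
            SchemaPath.create(true, EXAMPLE_ACTION));

        // Action: bound to a data tree node, yielding the 'Local' flavour
        YangInstanceIdentifier contextRef = YangInstanceIdentifier.of(NODES);
        DOMRpcIdentifier action = DOMRpcIdentifier.create(
            SchemaPath.create(true, EXAMPLE_ACTION), contextRef);

        // The only observable difference is the context reference
        System.out.println(rpc.getContextReference().isEmpty());    // true
        System.out.println(action.getContextReference().isEmpty()); // false
    }
}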
This is consistent with how RPCs (without a context reference) and actions (with a context reference) are invoked, and it makes the API contract involved in RPC/action invocation clean and simple. In the context of registering an RPC or action implementation, things are slightly less straightforward: registration goes through a separate interface, with a rather terse Javadoc. In both cases there is a hint of ‘a conceptual dynamic router’, but not much in terms of details.
Unless you were very curious as to the details of the API contracts involved, after reading the documentation available, with some OpenDaylight tutorials under your belt, you would feel this is a dead-simple matter and just use the interfaces provided. Run a few test cases and everything works just fine. No trouble in sight.
About That Router Thing…
The Simultaneous Release name of OpenDaylight for the release currently in development is Carbon, meaning we have shipped 5 major releases. This vaguely-referenced ‘dynamic router’ thing therefore actually exists somewhere and does something to fulfill the API contracts imposed on it, otherwise the applications would not be able to work at all. The entry point into the implementation is DOMRpcRouter. Glancing over it, it contains some ugliness, but it gets the general outline of the two sides of the contract done.
Digging a bit deeper into the invocation path, you get into the fork at AbstractDOMRpcRoutingTableEntry.invokeRpc(). The RPC invocation path is rather straightforward, but the invocation path for actions is far from simple. Out of two code paths (actions and RPCs) we suddenly have 4, as an action can be invoked without a context reference as if it were an RPC and there is a brief mention of remote rpc connector registering action implementations with an empty context reference … wait … WHAT???!!!
Okay, we seem to have two implementations integrated based on implementation details, without that being supported by a single line in the API contract. The connector referenced is actually sal-remoterpc-connector and is something that is meaningful in clusters. To make some sense of this, we have to go back to 2013 again.
A Tale of Three Routers
From the get go, the MD-SAL architecture was split into two distinct worlds: Binding-Independent (BI, DOM) and Binding-Aware (BA, Binding). This split comes from two competing requirements: type-safety provided by Java for application developers who interact with specific data models and infrastructure services which are independent of data models. The former is supported by interfaces and classes generated from YANG models and generally feels like any code where you deal with DTOs. The latter is supported by an object model similar to XML DOM, where you deal with hierarchical ‘document’ trees and all you have to go by are QNames. For obvious reasons most developers interacting with OpenDaylight have never touched the BI world, even though it underpins pretty much every single feature available in the platform.
A very dated picture of how the system is organized can be found here. It is obvious that the two worlds need to seamlessly interoperate — for example RPCs invoked by one world must be able to be serviced by the other and the caller should be none the wiser. Since RPCs are the equivalent of a method call, this process needs to be as fast as possible, too. That led to a design where each world has its own Broker and the two brokers are connected. Invocations within the world would be handled by that world’s broker, foregoing any translation. A very old picture of what an inter-world call would look like can be seen in this diagram.
For RPCs this meant that there were two independent routing tables with re-exports being done from each of them. The idea of an RPC router was generalized in the (now long-forgotten) RpcRouter interface. Within a single node, the Binding and DOM routers would be interconnected. For clustered scenarios, a connector would be used to connect the DOM routers across all nodes. So an inter-node BA RPC request from node A to node B would go through: BA-A -> BI-A -> Connector-A -> Connector-B -> BI-B -> BA-B (and back again). Both the BI and connector speak the same language, hence can communicate without data translation.
The design was simple and effective, but has not quite survived the test of time, most notably the transition to dynamic loading of models in the Karaf container. Model loading impacts data translation services needed to cross the BA/BI barrier, leading to situations where an RPC implementation was available in the BA world, but could not yet be exported to the BI world — resulting in RPC routing loops and, in the case of data store services, missing data and deadlocks.
To solve these issues, we have decided to remove the BA/BI split from the implementation and turn the Binding-Aware world into an overlay on top of the Binding-Independent world. This means that all infrastructure services always go through BI, and the Binding RPC Broker was gradually taken behind the barn, there was a muffled sound in 2015, and these days we only have two routers, one hiding behind a connector name.
Blueprint for a New Feature
Probably the most significant pain point identified by new people coming to OpenDaylight is that the technology stack is a snowflake, providing few familiar components, with implementation and documentation being borderline hostile to newcomers. One such piece is the Configuration Subsystem (CSS). Driven by invalid YANG and magic XMLs, it is a model-driven service activation, dependency injection and configuration framework built on top of JMX. While it offers the ability to re-wire a running instance in a way which does not break anything half-way through reconfiguration, it is a major pain to get right.
It pre-dates MD-SAL (which offers nicer configuration change interactions) and is utterly slow (because the JMX implementation is horrible). It was also designed to safeguard against operator errors and this is quite contrary to what Karaf’s feature service provides — if you hit feature:uninstall, those services are going down without any safeties whatsoever.
To fix this particular sore spot, one of the decisions from the Beryllium design summit was to extend Blueprint with a few capabilities and start the long journey to OpenDaylight without CSS, where internal wiring would be done in Blueprint and user-visible configuration would be stored in MD-SAL configuration data store. The crash-course page is a very easy read.
You will note that there is support for injecting and publishing RPC implementations — which is a nice feature for developers. Rather than having to deal with registries, I can declare a dependency on an RPC service and have Blueprint activate me when it becomes available like this:
<odl:rpc-service id="fooRpcService" interface="org.opendaylight.app.FooRpcService"/>
I can also publish my bean as an implementation, just with a single declaration, like this:
<bean id="fooRpcService" class="org.opendaylight.app.FooRpcServiceImpl">
    <!-- constructor args -->
</bean>
<odl:rpc-implementation ref="fooRpcService"/>
This is beyond neat, this is awesome.
FooRpcService vs. DOMRpcIdentifier
We have already covered how Binding Aware layer sits on top of the Binding Independent one, but it is not a one-to-one mapping. This comes from the fact that Binding Independent layer is centered around what makes sense in YANG, whereas the Binding Aware layer is centered around what makes sense in Java, including various trade-offs and restrictions coming from them. One such difference is that RPCs do not have individual mappings, i.e. we do not generate an interface class for each RPC, but rather we generate a single interface for all RPC definitions in a particular YANG module. Hence for a model like
module foo {
    rpc first { input { ... } output { ... } }
    rpc second { input { ... } output { ... } }
}
we generate a single FooService interface
public interface FooService {
    Future<FirstOutput> first(FirstInput input);
    Future<SecondOutput> second(SecondInput input);
}
The reasoning behind this is that a particular module’s RPCs (in the broad sense, including actions) will always be implemented by a single OpenDaylight plugin and hence it makes sense to bundle them together.
An unfortunate side-effect of this is that in the Binding Aware layer, both RPCs and actions are packaged in the same interface and it is up to the intermediate layers to sort out the ambiguities. This problem is being addressed in Binding V2, where each action has its own interface, but we have to have a solution which works even in this weird setup.
Fix Some, Break Some
Considering these complexities and gaps in the API contract documentation department, it is not quite surprising that the fix for BUG-3128, while making RPCs work correctly across the cluster, had the unfortunate side-effect of breaking blueprint wiring in a downstream project (OpenFlow Plugin). In order to understand why that happened, we need to explore the interactions between DOMRpcRouter, blueprint and sal-remoterpc-connector.
When blueprint sees an <odl:rpc-service/> declaration, it will wire a dependency on the specified RPC (Binding Aware) interface being available in DOMRpcService (which is a facet of DOMRpcRouter). As soon as it sees a registration, it considers the dependency satisfied and proceeds with the wiring of the component. This is true for LLDP Speaker, too. Note how it declares a dependency on an implementation of PacketProcessingService. Try as you may, you will not find a place where the corresponding <odl:rpc-implementation/> lives. The reason for this is quite simple: this service contains a single action and an implementation is registered when an OpenFlow switch connects to the OpenDaylight instance. So how is it possible this works?
Well, it does not. At least not the way it is intended to work.
What happens is that Blueprint starts listening for an implementation of PacketProcessingService becoming available with an empty context, just as with any old RPC. Except this is an action, so somebody has to register as a global provider for the action, i.e. as being capable of dynamically invoking it based on its content and not being tied to a particular context. That someone is sal-remoterpc-connector, in its current shape and form, which does precisely what is mentioned in that terse comment. It registers itself as a dynamic router for all actions and when a request comes in, it will try to find a remote node which has registered an implementation for the context specified in the invocation. That means that unbeknownst to the Blueprint extension, all actions appear to have an implementation — even if there is no component actually providing it — and therefore LLDP Speaker will always activate, just as if that dependency declaration was not there.
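For contrast, here is a minimal sketch of what a context-bound registration looks like at the DOM level; this is the kind of registration a plugin performs only once a switch actually connects, which is why nothing is there to satisfy the Blueprint dependency at startup. The QName and class names are invented and the API shape should be checked against your MD-SAL version:

import org.opendaylight.controller.md.sal.dom.api.DOMRpcIdentifier;
import org.opendaylight.controller.md.sal.dom.api.DOMRpcImplementation;
import org.opendaylight.controller.md.sal.dom.api.DOMRpcImplementationRegistration;
import org.opendaylight.controller.md.sal.dom.api.DOMRpcProviderService;
import org.opendaylight.yangtools.yang.common.QName;
import org.opendaylight.yangtools.yang.data.api.YangInstanceIdentifier;
import org.opendaylight.yangtools.yang.model.api.SchemaPath;

final class SwitchConnectedHandler {
    // Illustrative QName, not the real openflowplugin model
    private static final QName TRANSMIT_PACKET =
        QName.create("urn:example:packet", "2017-01-01", "transmit-packet");

    private final DOMRpcProviderService rpcProviderService;

    SwitchConnectedHandler(DOMRpcProviderService rpcProviderService) {
        this.rpcProviderService = rpcProviderService;
    }

    DOMRpcImplementationRegistration<? extends DOMRpcImplementation> onSwitchConnected(
            YangInstanceIdentifier switchPath, DOMRpcImplementation impl) {
        // The registration is bound to the connected switch's context reference,
        // so it only shows up once that switch is actually present.
        DOMRpcIdentifier id = DOMRpcIdentifier.create(
            SchemaPath.create(true, TRANSMIT_PACKET), switchPath);
        return rpcProviderService.registerRpcImplementation(impl, id);
    }
}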
The fix to address BUG-3128 performed a simple thing: rather than using blanket registrations, it only propagates registrations observed on other nodes — becoming really a connector rather than a dynamic router. Since no component provides the registration at startup time, blueprint will not see the LLDP Speaker dependency as satisfied, leading to a failure to activate. Unless an OpenFlow switch happens to connect while we are waiting — in that case, activation will go through.
So we are at a fork: we either have blueprint ‘working’, or we have RPC routing in the cluster working. To get both to work at the same time, and to actually fix LLDP Speaker to activate when appropriate, we will obviously have to perform some amount of surgery on multiple components.
I will detail what changes are needed to close this little can of worms in my next post. Stay tuned!
Róbert Varga
CTO Pantheon Technologies
PANTHEON.tech @ ONUG 2017 in New York
PANTHEON.tech was part of the Open Networking User Group (ONUG) 2017 in New York. The conference was held from October 17th until the 18th.
ONUG Highlights & Insights
ONUG belongs to the group of conferences that are rather small in size, but surely not in importance. This year it took place in New York. The Big Apple is a truly interesting place and so was the conference. This event was a combination of a trade show and a panel discussion.
Pantheon Technologies did not actively participate in the trade show part this time, as our focus was more on potential business hunting.
ONUG is a 2-day event fully packed with big names on stage, as part of panel discussions, and a good selection of vendors, community leaders, service and solution providers.
The conference includes keynotes from IT business enterprise leaders as they address their open software-defined cloud-based infrastructure journeys, updates from the Working Group Initiative members, hands-on tutorials and interactive labs, real world use cases, proof of concept demonstrations and a vendor technology showcase.
The goal of all ONUG events and initiatives is to bring together the full IT community.
We are looking forward to ONUG 2018
For Pantheon Technologies, this was a good opportunity to understand the current networking needs of service providers, enterprises and vendors. This helps us promote Pantheon even better in the field of our expertise, customized software development. ONUG clearly showed that service providers are heading more and more towards SD-WAN solutions.
We have discussed our expertise in SDN and NFV with almost all of the ONUG participants. We have also found several potential partners to explore this exciting business with. Software Defined Networking is not only a buzzword anymore, it’s been well established and the market is very competitive, especially in the US territory.
That is why we at Pantheon Technologies need to be on top of it.
Peter Takáč
PANTHEON.tech @ SDN/NFV World Congress 2017 in The Hague
This year, our colleagues from PANTHEON.tech visited quite a few tech events around the globe. Among them, the SDN NFV World Congress, taking place in The Hague, was one we definitely couldn’t have missed.
As one of the largest conferences focused on network transformation, it attracted more than 1700 visitors from companies all over the world. And it wasn’t only large companies, many of which are among our long-term clients; a fairly large number of start-ups joined in order to present their solutions.
Intent-based Networking: Still not in sight
It’s thrilling to follow the gradual transformation of proprietary solutions into those based on open-source. The reason is simple: at Pantheon Technologies, we contribute into several open-source projects, as we firmly believe that it’s the only way to ensure interoperability and standardization of individual building blocks of SDN and NFV solutions.
Software-defined networking is still under development. Until the present, most use-cases have only been dealing with automation. The bottom line is that it’s still an HDN – a human-defined network. It’s still people who express the desired state of the network; it’s not done by software.
Therefore, after solving the issues with automation and interoperability of the building blocks, a new adventure from the intent-based networking world might await. The current SDN solutions offered by the market will only provide the infrastructure to be used to fulfill the network users’ intentions.
During the week which we spent at the conference, we’ve had plenty of interesting discussions, both sales-oriented and technical. Now, we’re very much looking forward to further meetings and discussions.
Miroslav Miklus, Martin Firak
PANTHEON.tech @ TechXLR8 in Singapore
Looking for customers and partners in new markets is an essential part of a diversification strategy. New markets bring new opportunities, new insights, needs and challenges. Hence, at the beginning of this October, my colleagues Denis and Robert and I traveled to Singapore in search of all of the above-mentioned.
We’ve anticipated finding it all at the huge TechXLR8 event, sponsored by PANTHEON.tech, which comprised smaller happenings: 5G Asia, IoT World Asia, NV & SDN, the AI Summit and Project Kairos Asia. Being the Silver Sponsor at such a vast event was a brand new experience for us.
We’ve spent two days discussing SDN and networking, introducing Pantheon Technologies and our products to the representatives of the Asian market. We also had an opportunity to take part in a panel discussion on NFV MANO interoperability and how it fits into the open source world along with related standardization being done by ETSI.
This discussion, more than anything else, showed our presence to other attendees. So, we talked, smiled and explained. People were interested in the Visibility Package, which we demonstrated. They asked a lot about the company and our contribution to OpenDaylight, as well as other open source projects we are part of, or have experience with.
SDN, OpenDaylight and the others
Pantheon Technologies was not the only company promoting OpenDaylight-related solutions. Official OpenDaylight members were present, as well as other companies and groups offering their ODL-based solutions. We received several offers for cooperation from company representatives advertising their ODL and SDN-related skills. This clearly indicates the importance of the OpenDaylight project.
IoT is the word
Despite TechXLR8 being crowded with companies presenting different IoT solutions, and despite having our booth placed in the NFV/SDN area, we received a great number of IoT-related questions. We talked about IotDM as a oneM2M-compliant data broker for ODL. For some people, oneM2M was just another buzzword. They were frequently asking about specific use cases related to the IoT field. Our question, “what do you need?”, still hangs there waiting to be answered. Asia seems to be searching for its answer on what IoT stands for. There are open opportunities for us to help find an answer to this question.
Man in the middle
Along with all the companies presenting their products, skills or ecosystems, there was one special group of people present. They usually introduced themselves as “the company that represents telco in Asia.” Who were these people?
Asian markets are quite different from what we have experienced so far – in the way companies search for partners and how partnerships are built. There are many companies acting as matchmakers. It seems that a significant number of telco companies don’t actively search for partners, but rely on matchmakers. Matchmakers actively seek solutions or vendors who might match their telco customers. What do matchmakers have to say about their customers’ expectations?
All of them had pretty much the same answer: we need to approach companies with our solutions and make them think it is what they need. As if the only thing the market is looking for was an advantage over competitors, whatever solution makes that happen.
Even though we can’t honestly say that a market-driving vision is missing, it sure feels that way. The presence of buzzwords without focus on specific use cases indicates that the Asian telco and IT market has evolved differently from the markets we are used to operating in.
Hic abundant leones
The best way to describe our first encounter with the Asian market is mapping terra incognita, the unknown land, a place where lions are. We’ve made the first step towards the unknown and have found some potential partners on the way. Now we have to figure out how to turn the first contact into a working partnership and collaboration.
We need to find a set of use cases to show to potential customers in Asia, but we aren’t quite sure what to show and whom to show it to. Finding that out is our next goal: find a use case to build a showcase around and find an audience for it. For that, we need to flood the matchmakers we already know and also keep looking for new ones.
Lesson learned
Are our solutions tailored to fulfill specific needs? Indeed they are. Do our solutions bring variety and scalability? Definitely. Can we deliver? Yes we can. Next time, we have to show that more explicitly. We need to prepare showcases that would amaze people.
We need to find equilibrium between our skill and the market’s desire for buzzwords. It does not need to be product quality, does not even need to be a product by itself. It just needs to show – hey, we are the right ones.
Our journey to Singapore was a success. The journey to Asian markets has just begun. It is our job to make the most out of it.
Martin Bobák
Technical leader
Integrate VPP & Honeycomb & Extension of VPP Services
Updated 2/2022 – FD.io has moved from Nexus to hosting the documentation elsewhere.
In this short article, I would like to share our experience in the field of integrating VPP and Honeycomb, and of extending VPP services. Among our colleagues are many developers who contribute to both projects, as well as people who work on integrating these two projects. These developers also work on integrating them with the rest of the networking world.
Let’s define the basic terms.
What is VPP?
According to its wiki page, it is “an extensible framework that provides out-of-the-box production quality switch/router functionality”. There is definitely more to say about VPP, but what’s most important is that it is platform independent.
“Platform independent” means that it is up to you to decide where you will run it (a virtualized environment, bare metal or others). VPP is a piece of software which is by default distributed in the form of packages. Final VPP packages are available from the official documentation page. Let’s say we decide to use stable VPP version 17.04 on stable Ubuntu 16.04. You can download all available packages from the corresponding Nexus site. If there is no such platform available at Nexus, you can still download VPP and build it on the platform you need.
VPP will process packets, which flow in your network similarly to a physical router, but with one big advantage: you do not need to buy a router. You can use whatever physical device you have and just install the VPP modules.
What is Honeycomb?
It is a specific VPP interface. Honeycomb provides NETCONF and RESTCONF interfaces on its northbound side and stores the required configuration (in the form of XML or JSON) in a local data store. There is also the hc2vpp project, which calls the corresponding VPP API as a reaction to new configuration stored in the data store.
VPP has a special CLI that is used to communicate with it. It is in text form (similar to an OS shell). To make VPP easier to use, we also have Honeycomb. It provides an interface which is somewhere between a GUI and a CLI. You can request VPP state or statistics via XML, and you will get the response in XML form as well. Honeycomb can be installed in the same way as VPP, through packages which can be accessed from the Nexus site.
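As a toy illustration of that northbound interface, the sketch below issues a plain RESTCONF GET for interface state. The port (8183), the admin/admin credentials and the ietf-interfaces path are assumptions based on common Honeycomb defaults, so adjust them to whatever your Honeycomb configuration actually exposes:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class HoneycombRestconfGet {
    public static void main(String[] args) throws Exception {
        // Assumed defaults; check the restconf settings of your Honeycomb install
        URL url = new URL("http://localhost:8183/restconf/operational/ietf-interfaces:interfaces-state");
        String auth = Base64.getEncoder()
            .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", "Basic " + auth);
        // Ask for XML; application/json works as well
        conn.setRequestProperty("Accept", "application/xml");

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            conn.disconnect();
        }
    }
}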
Where can the combination of VPP and Honeycomb be used?
We’ve already showcased several use cases on our PANTHEON.tech YouTube channel:
Another alternative is to use the two as a vCPE (Virtual Customer Premises Equipment) as specified in this draft. One of the projects which wants to implement it is ONAP. VPP is used as the vCPE endpoint for the internet connection from a provider. According to this use case, the vCPE should provide several services. In standalone VPP, such services aren’t supported, but they can still be added to the machine where VPP is running. For demonstration, we have chosen DHCP and DNS.
DHCP
In this case, we have two VMs. VM0 simulates the client side (DHCP client), which wants an IP address to be assigned to interface enp0s9. VM1 contains VPP and a DHCP server. The DHCP request is broadcast via enp0s9 at VM0 to VPP1 via port 192.168.40.2. VPP1 is set up as a proxy DHCP server and the DHCP request message is forwarded to 192.168.60.2, where the DHCP server responds with a DHCP offer. Finally, after all DHCP configuration steps are done, interface enp0s9 at VM0 is configured with IP address 192.168.40.10.
DNS
In this case, we also have two VMs. VM0 simulates the client side (DNS client), which needs to resolve a domain name to an IP address. The request is routed via a local port to VPP1, where it is routed to the DNS server in VM1. If the name is being resolved for the first time, the request will be sent to an external DNS server. Otherwise, the local DNS server will serve the request.
Jozef Glončák
Software Engineer
PANTHEON.tech Supports Apache Cassandra Data Store
by Claudio Gasparini
As a company with highly skilled people and experience in networking and ODL, PANTHEON.tech provides solutions to any problem or requirement our clients bring up. In this case, we are going to illustrate what we can do by showcasing the workflow of a project.
Identifying a need
The first step was to identify a need; one of the main issues of working with data-store is that we lose data when the Controller goes down.
Proposing a solution
Once we’ve identified the need, we start looking for possible solutions, analyzing each one’s pros and cons, looking for the best answer available. In this case, the best available solution was to replace the in-memory ODL datastore with a persistent database: the Apache Cassandra Data Store.
What is Cassandra?
If you need scalability and high availability without compromising performance, the Apache Cassandra database is the right choice for you. It is the perfect platform for mission-critical data thanks to linear scalability and proven fault-tolerance on cloud infrastructure or commodity hardware.
Cassandra is capable of replicating across multiple datacenters and is best in class at it. With it, your users are provided with lower latency – and you with peace of mind, once you realize how simple surviving a regional outage is.
Defining the solution requirements
We need to define the requirements for the proposed solution: what it will do, how it will do it, and what is required from the user. For this project, we’ve decided that the user would need to register the service at a specific prefix, pointing at a specific path on a shard which the user is interested in storing.
The service will be listening for any changes under this path and, whenever the information is updated, it will take care of transforming the information into JSON format and storing it in Cassandra.
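A rough sketch of that idea follows. This is not the actual project code: the table layout (a simple path-to-JSON mapping in a keyspace named odl) and the callback shape are assumptions, and the DataStax Java driver is used directly:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public final class CassandraJsonStore implements AutoCloseable {
    private final Cluster cluster;
    private final Session session;
    private final PreparedStatement insert;

    public CassandraJsonStore(String contactPoint) {
        cluster = Cluster.builder().addContactPoint(contactPoint).build();
        session = cluster.connect();
        // Assumed schema: one row per data tree path, holding the serialized JSON
        session.execute("CREATE KEYSPACE IF NOT EXISTS odl "
            + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
        session.execute("CREATE TABLE IF NOT EXISTS odl.snapshots "
            + "(path text PRIMARY KEY, document text)");
        insert = session.prepare("INSERT INTO odl.snapshots (path, document) VALUES (?, ?)");
    }

    // Called from the data-change listener once the changed subtree
    // has been rendered into a JSON document.
    public void store(String path, String json) {
        session.execute(insert.bind(path, json));
    }

    @Override
    public void close() {
        cluster.close();
    }
}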
Implementing the solution & testing
We’ve defined the requirements and have selected the solution. We’ve identified the steps required/wanted to achieve the results expected. Based upon them, we’ve created the tasks required and have implemented them. Finally, we shall test the result. We can see some of the anticipated results in the table below.
* Benchmark, Karaf and Cassandra were running on the same virtual machine, with 8 GB RAM and 4 dedicated processors
Use-cases
We’ve identified one use case for this project – which is to have a persistent data-store. But the list of possible benefits does not end there.
If we were storing the OpenFlow statistics, for example, we could benefit from that information by using Spark to apply real-time data analytics & visualization on it. This would allow us to react and improve our network by, for example, banning or redirecting heavy traffic. Once we have the information, all we need to do is pick up the fruit.
For more information, please feel free to contact us.
Sponsoring the SDN NFV World Congress Confirmed
In mid-October, the SDN NFV World Congress will dominate Europe’s IT landscape. Taking place in The Hague, Netherlands, the event is Europe’s largest dedicated forum addressing the growing markets of software-defined networking (SDN) and network functions virtualization (NFV).
Naturally, this is the type of event we at Pantheon Technologies gravitate towards sponsoring. We’re officially one of the partners of this event. There were already a couple of interesting names on board (Open Networking Foundation, Intel, Telefonica, BT, Konia, Orange…) so how could we be the one to miss out?
If you’d like to hear about technologies such as OpenDaylight, FD.io, OPNFV and many more – and learn about the magic we can work with them, we’ll be looking forward to talking to you live! Also, if you just want to know us, or only have a chat, feel free to drop by!
Martin Firak
PANTHEON.tech partners up with TechXLR8 Asia
We’ve already started establishing a tradition of Pantheon Technologies partnering with the best tech events around the globe. To keep up with it, we’ll be sponsoring the Network Virtualization & SDN Asia conference, which will be taking place this fall in Singapore as a part of TechXLR8 Asia.
On board with partners such as Juniper Networks, Fujitsu and VMware, we’ll be joining as a silver sponsor.
What does this mean in practice? Our colleagues will be able to showcase the PANTHEON.tech skills and know-how both as speakers and in the exhibition area.
As was recently proven at TechXLR8 London, our portfolio is quite unique. The topics revolving around ODL, SysRepo, FD.io, Honeycomb and Vector Packet Processing struck a chord. Not only did we meet lots of interesting people from telco, SDN and content delivery companies, but our business card supply wasn’t able to cover the demand!
Is there anything specific you’d like to hear us talk about?
See you in Singapore on October 3-4!
Martin Firak
PANTHEON.tech @ OpenDaylight Forum 2017
On a regular basis, OpenDaylight (ODL) developers meet in order to discuss their ideas as well as plans for upcoming releases. PANTHEON.tech’s Robert Varga and Vratko Polak joined this year’s gathering. Vratko’s account of the event follows.
Brief introduction to ODL
OpenDaylight is an open source project aimed at supporting Software Defined Networking, mainly through a Java application (also called ODL). It’s capable of communicating with network elements via various protocols (southbound) while accepting requests from humans and other programs (northbound), again, via various protocols (although RESTCONF is currently the main one).
OpenDaylight as a project is hosted by the Linux Foundation (LF), but has its own governance. ODL itself consists of (sub)projects, each of which has its own Git repository, committers and Project Lead. The Technical Steering Committee (TSC) allows creation of new projects, archival of old projects, and provides guidance for inter-project matters. Most projects focus on providing code for the Java application, so most of their code is in Java, together with Maven definitions used to build artifacts. Those projects depend on each other; ODLParent is the most “upstream” of such projects. Leaf projects are those which do not have other ODL Java projects depending on them, not counting Integration/Distribution, which is a project aggregating all artifacts of a particular release into a file archive containing an ODL installation.
Integration/Test then runs system tests (CSIT stands for Continuous System Integration Testing) against this archive. Both building and testing are done in Jenkins; Releng/Builder is the project responsible for configuring those Jenkins jobs (and other minutiae of infrastructure). In between releases, ODL projects build snapshot artifacts that are stored in a Nexus server, so an artifact version does not identify unique code, and there are possible race conditions when one job uploads new artifacts while another job downloads them.
To avoid these downsides, Releng/Autorelease is a project which downloads all the code, bumps it to a non-snapshot version, builds that, and uploads to a staging repository, thus creating a release candidate. Integration/ and Releng/ projects are examples of Support projects.
Release name & forum
ODL releases are named after periodic table elements. This Forum took place just after the Carbon release, and its goal was to bring developers together in order to speed up discussion and planning of the Nitrogen release. One of the few things every project has to agree on is the choice of the Java container. From Beryllium up to Carbon, the container of choice was Karaf, versions from the 3.0.x series. Karaf is a Java container based on OSGi. The main concept in Karaf is a Feature, which can contain OSGi bundles, config files and other Karaf features. ODL seems to be using Karaf features in a slightly different way from what Karaf developers have intended, therefore the Carbon initiative to upgrade to Karaf 4 has failed. Previous ODL releases tended to come in roughly 8-month cycles. But ODL is now part of a larger ecosystem of networking-focused projects, so the TSC decided to change to a 6-month cycle. And to fit into a correct slot, Nitrogen is scheduled to be released only 4 months after Carbon, with upgrading to Karaf 4 as its main goal.
The Developer Design Forum (DDF) for Nitrogen took place in the Hotel Marriott, Santa Clara, California. The official program was two days long, opening on May 31 and concluding on June 1, 2017. DDF gatherings usually consist of scheduled “conference” sessions, accompanied by parallel “unconference” sessions, created on the spot. Compared to previous DDFs, there were fewer participants than usual (roughly 50 compared to 150 in the past), leading to only one meeting room being used for conferences and leaving the other available for unconferences.
A list of sessions that I attended follows, together with short descriptions. Please note that the descriptions (and session names) are very loose paraphrases of what was actually discussed, based rather on my personal impressions than the official program.
Karaf 4 planning conference session
After reiterating facts about Nitrogen being a “short” release focused on Karaf 4 transition, a rough timeline was presented. It was stressed that active participation of all projects is required. Projects too slow to respond will be dropped from the release mercilessly.
Not many technical details were discussed at this point, aside from notifying projects that there will be a time period where usual build and test jobs will not be running (at least not for every project) as incompatible changes will require time for rebuilds, to be performed in order throughout the project dependency graph.
Emergency leaf project removal plan
Around half of the current projects are in a dormant state, not being developed anymore, usually with only one person performing critical maintenance in their spare time. It is expected that multiple projects in this state will be unable to perform their Karaf 4 migration duties in time. Therefore, many Carbon projects are not going to make it into the official Nitrogen release. Yet, there is a backup plan in place, at least for leaf projects: they could release their artifacts in a standalone release.
That means their artifacts will not be built within the usual Autorelease job. Releng/Builder can create a job template for that kind of release, so that a project won’t need much work to perform such a release. Integration/Test would need more changes to allow CSIT for such projects, but we do not envision many projects asking for that.
ODLParent standalone release
It is a long-standing plan to “decentralize” the ODL release process, so that it depends less on Releng/Autorelease forcing everyone to release at the same time. ODLParent will be the first project to do separate releases (and still end up in Integration/Distribution builds). This needs a new job template, basically the same one as for the removed leafs. Version bumping in downstream will be somewhat painful at first, but the Autorelease project already has all the scripts and rights needed, and an automated job can be created later.
Karaf 4 specific changes
In Carbon it was discovered that two main ways to install features (the featuresBoot configuration line and feature:install runtime command) use different code paths in Karaf 4, and therefore supporting both of them might not be possible. If Linux Foundation pays a Karaf developer, it might become possible, but we cannot count on that within the Nitrogen cycle. The first Karaf 4 ready ODLParent release will drop support for Karaf 3, Integration/Distribution will stop building Karaf 3 distribution, and all CSIT testing will be switched to Karaf 4. That means we do not need to support a transition period of both versions being built and tested at the same time. If we decide to only support feature:install, changes to Releng/Builder scripts (for CSIT) will be needed.
Releng/Builder needed changes
This was a technical session, hashing out details of how items from the two previous sessions will be implemented. Few general enhancements were also discussed briefly, however, with no plans of implementing them in the Nitrogen cycle.
Jira instead of Bugzilla
There is a long-standing plan of migrating from Bugzilla to Jira. We’ve discussed several technical reasons why we really need that, as well as a few risks involved. The general consensus is that we want Jira, but it takes some work and we need a person to take the responsibility and make it happen. Not likely within Nitrogen.
ODLParent planning
Technical explanation of what went wrong with Karaf 4 in Carbon. We have a general plan to finally fix that, consisting of 4 approaches we intend to try. Explicit steps of how ODLParent standalone releases and Karaf 4 support will be done, with milestones and deadlines for ODLParent, Java projects, Integration/Distribution and Integration/Test. There will be at least one period where the usual Jenkins jobs will not work, perhaps more if multiple ODLParent releases are needed. Karaf 3 support will be dropped as soon as possible, so that projects are motivated to help their upstream with migration.
Integration/Test planning
A few ideas were mentioned, but they were generally postponed, as Karaf 4 migration will consume most of the time. The old plan of migrating ODL installation logic from Releng/Builder bash scripts to Robot Framework suites is still good, but demanding. General Robot code maintenance will remain a slow gradual process. Having a small set of reliable “sanity” tests is still desired. We have a stub already running; all we need is to add more suites which are stable and quick enough. Test result availability and comprehensibility is still a major issue. The current plan is to export the test results to a database, and have a dashboard to render results in a user-friendly way. We have new interns to work on both steps.
MD-SAL usage
A highly technical session where our colleague from Pantheon Technologies, Robert Varga, was talking about the ways MD-SAL (Model Driven Service Abstraction Layer) can be interacted with. Each has its pros and cons. Single listener subscribed to a set of subtrees seems to be the approach avoiding the most of pitfalls, but the cluster implementation is not ready yet.
Infrastructure and CSIT, retrospective and improvements
A review of the changes to Integration/Test and Releng/Builder done in Carbon, the current gaps and how we plan to bridge them, rehashing some ideas from the unconference earlier.
Upgrade-ability conference session
Initially, we will be satisfied with reliable offline upgrades. We know that there are significant API changes between releases, and MD-SAL lacks a service which would tell the user that ODL has finished booting up. ODL has built-in persistence, but some of it is cleared on startup and, perhaps, also corrupted on shutdown. Nevertheless, companies that create ODL-based solutions usually have a way to transfer data from an earlier to a later version of ODL, so it should be possible to create a basic mechanism in ODL itself. The Daexim project provides a basic set of tools, but it is not equipped to handle data structure changes caused by API changes in each project. The ODL core can help by sticking to the current schema.
Service recovery mechanisms
As the ‘uninstall’ feature does not really work correctly in ODL, current recovery options are limited to restarting the Java Virtual Machine. However, some services present in ODL support a softer restart on demand. A simple model was presented to abstract services and some actions on them, which would allow a client application to query service state and cause a restart without knowing details of a particular service implementation.
Unit testing async code
One of the criteria for ODL code quality is test coverage. Instead of testing each class as a unit, higher-level “component” tests are the more common option. They still rely upon JUnit executed during a Maven build, but they test a construct consisting of several classes wired together. This is quite positive, as a “real” unit test would frequently have more complicated assertions, and it would still not be clear whether a composite would behave correctly (while such unit tests would take significantly longer to develop). During Carbon development, significant progress was achieved in the wiring part of component tests, yet there still is one area that needs improvement: most of ODL code is asynchronous, which means the component consists of several Java threads running concurrently.
One issue is that JUnit requires the assertion to be executed in the main thread to take effect. Another issue is that many asynchronous components lack visible intermediate state changes, which the main thread could check. Most current tests just sleep for a fixed time before launching the final assert. However, everybody knows that a test which relies on sleep is a bad test. The ideal solution would be for each class within a component to support dependency injection of asynchronous building blocks, such as executors and listeners. That way the component test can inject specialized building blocks with all the hooks the test needs. Failing that, the cheapest solution is to use Awaitility, which, basically, spins an assert (not changing the state) until it passes, or a predefined time runs out. That is better than sleep in that it can pass more quickly.
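As a hedged illustration of the difference, the snippet below polls until the condition holds instead of sleeping for a fixed time; the counter standing in for the asynchronous component is invented for the example:

import static org.awaitility.Awaitility.await;
import static org.junit.Assert.assertEquals;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import org.junit.Test;

public class AsyncComponentTest {
    // Stand-in for an asynchronous component under test
    private final AtomicInteger notificationsSeen = new AtomicInteger();

    @Test
    public void notificationEventuallyArrives() {
        // ... trigger the asynchronous operation here ...
        notificationsSeen.incrementAndGet(); // simulated asynchronous side effect

        // Poll for up to 5 seconds instead of Thread.sleep(5000):
        // the test proceeds as soon as the condition holds.
        await().atMost(5, TimeUnit.SECONDS)
               .until(() -> notificationsSeen.get() > 0);

        assertEquals(1, notificationsSeen.get());
    }
}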
Closing remarks
The closing session mostly consisted of discussing why far fewer attendees joined than usual. What can be done? One possibility is to merge the Developer Design Forum with some other LF event; however, people argued that this would take away focus from ODL planning. Another option is to ask member organizations to provide the venue, so that a smaller event like this could be hosted without hotel-high venue costs.
Vratko Polák
PANTHEON.tech @ TechXLR8 in London
In mid-June, the TechXLR8 multi-genre tech festival took place in London. Although part of London Tech Week 2017, it comprised eight further ‘smaller’ events: 5G World, IoT World Europe, Cloud & DevOps World, Apps World Evolution, VR & AR World, AI & Machine Learning World, Connected Cars & Autonomous Vehicles Europe and Project Kairos.
Since it was, from a global perspective, one of the key industry meetings, PANTHEON.tech could not have missed it. We participated in the TechXLR8 Cloud & DevOps World section, where we showcased our SDN, ODL and networking skills and know-how: we saw a lot of great things and managed to acquire interesting contacts with international companies active in the telco, content delivery and SDN segments. Products from our portfolio such as SysRepo, ODL, HoneyComb, VPP and FD.io turned out to be really great topics for discussion.
Which keywords did the participants respond to best? Linux Foundation, OpenStack, Docker, Kubernetes, BigData. The demand for Pantheon’s business cards was so high that it caught us by surprise. We even had to ration them on the last day, such was the appetite for PANTHEON.tech!
Juraj Veverka
PANTHEON.tech @ GeeCon 2017 in Krakow
Every year, Krakow welcomes some of the biggest industry names to talk about Java and everything related. This time, PANTHEON.tech couldn’t miss it.
First day in Krakow
The proverbial long and winding road does exist. It sits between Žilina in northern Slovakia and Polish Krakow. After a couple of hours of tiresome driving, we safely arrived in the city. It was a lonesome journey with only radio Pogoda keeping us company by talking gibberish and playing some traditional Polish songs (also in gibberish). The city of Wypadki is surely a magical place. A place where trucks have voting rights and bikers outnumber pigeons 3 to 1. Unfortunately, there was no time to explore further. We checked in and prepared a schedule of talks to visit.
Second day survival
GeeCon took place in a well-equipped multiplex near the city centre. As it turned out, the venue was not built for this type of event. The corridors’ bottleneck filled with attendees blocking the passage to the talk rooms, and you could spend whole breaks standing in line in front of a bathroom.
However, the 2017 GeeCon brought out the big guns right at the beginning. David Moore from Sabre showed us the true meaning of “experience,” although his talk had a rather bland title, “Platform and Product Evolution at Sabre”. He touched on a broad spectrum of topics – from organizational structures and their need to reflect the software architecture to his hatred towards “layered-cake” architecture designs.
Next on the schedule were some sub-par talks about Java 9 in general, mixed with some never-ending Docker hype, CUDA computing, and introductory profiling. And then we got the juicy stuff. Milen Dyankov from Liferay was not afraid to speak openly about the state and purpose of Jigsaw, the need for the OSGi, and where it all fits together. Great talk for an audience of all levels of familiarity with modular concepts in Java. And of all genders, of course.
We were really pumped up for Monica Beckwith’s talk boldly called “Java Performance Engineers Survival guide”. The abstract was attractive and her CV was quite impressive: JavaOne rock star, previously working in AMD as performance engineer, then Sun, later at Oracle working on GC. Suffice to say, the expectations were really high. However, this was probably the biggest disappointment of the entire event.
We ended the day with a dry sauna back at the hotel and went to sleep.
Third day with Java and Avast
After such an exhausting first day, we started with a well-prepared soft-skills talk promising to improve our client presentations, only to continue with the trend of microservices and reactive programming. Right before lunch, Jarosław Pałka showed us the magic of bytecode. It stood up to the high anticipations and made us want to -javaagent something.
Avast people demonstrated how to utilize Docker in production and Marcin Grzejszczak explained the idea behind consumer-driven contracts of APIs. This certainly got our attention and we will consider it for future projects.
After Steve Poole’s light talk about Java vulnerabilities, we headed back to the hotel to get ready for the biggest IT party of the year. A large club located inside an old fort hosted geeks the entire night and they seriously did show their mad dancing skills, as you can see in the photo.
Fourth day, after the party
The morning after the party, waking up was a bit more painful. We ate the breakfast quickly and checked out.
Even though the party was hard, the audience listened carefully to the first presentation, about interrupted exceptions. We decided to fork ourselves and take part in different presentations: one going to the roots of the JVM – the Java native runtime – and another to the latest hype – Akka (a full auditorium with no spare room left). Later on, we continued with some general JavaScript and JPA lectures.
We joined together at the presentation called “Distributed systems explained (with NodeJS)”, given by Bruno Bossola. Our long-standing question on how to do testing properly was answered by Anton Arhipov – TestContainers.
At the very end of the conference, there was a great presentation about code generation and the reasons why we should generate configurations instead of code. Here we felt as if the future was already here: Rod Johnson presented Atomist – a bot for Slack.
Thank you to PANTHEON.tech and to the organizers of GeeCon for this amazing experience.
Martin Dindoffer
Milan Frátrik
Sponsoring the Automotive Linux Summit in Tokyo
PANTHEON.tech is proud to announce that we’ve become a Silver Sponsor of the Automotive Linux Summit, which will be taking place at Tokyo Conference Center Ariake from May 31 till June 2, 2017. In practice, this means more visibility for our brand plus a lot of networking potential, which equals great potential for meeting new customers.
The Automotive Linux Summit is a one-of-a-kind event where automotive innovators meet with Linux ninjas, research & development managers and business executives. The result? Connecting developers with their peers and vendors, driving innovation towards the automotive future.
With PANTHEON.tech’s background, skills and global plans, this is a place where we naturally belong.
And we’re not going to miss the chance.
Martin Firák
PANTHEON.tech @ PyCon 2017 in Bratislava
The PyConSK 2017 conference took place at Slovak University of Technology’s Faculty of Informatics and Information Technologies in Bratislava over the weekend from March 10th to 12th.
First day & its challenges
On the first day, the Slovak Day took place. The presentations in the large auditorium focused not only on Python, but also on Robot Framework, artificial intelligence, Open Data, e-government and many other topics. Presentations in the small auditorium discussed education, elaborating on best practices in teaching Python at high schools.
The presentation “Alternative Methods of Running Tests and Evaluation of their Results” gave a good insight into using test suites and their interpretation. Another interesting presentation was “Custom Python Libraries for Robot Framework,” which was an inviting introduction to Robot framework for beginners. Another two presentations caught our attention: the first one, by Exponea’s Jožo Kováč, was rather more serious.
In “How can Artificial Intelligence in Python Help a Company Grow?” he gave examples of AI’s benefits for e-commerce. The second one, and also the last presentation of the first day, was the funniest of the whole conference. Speaker Michal Kaukič stood out from the others with his excellent sense of humor on an otherwise boring topic – “Graphics in Jupyter Notebooks.” He did not give us many opportunities to fall asleep, as twenty per cent of the time the whole auditorium was laughing.
Second day & Django Girls
Saturday’s presentations were mostly in English. The first one was given by Pavel Serbajlo: “What Makes Silicon Valley Software Developers Special?” Since he spent over 4 years in the USA, he knew what he was talking about. It was interesting for all of us to hear what the values in Silicon Valley are, how people work, communicate, commute and how they live.
Two other presentations, “Making Monitoring Boring” and “Building Data Pipelines with Python”, were ones I definitely wanted to hear due to a personal interest in data mining and Linux administration.
The lunch break was a perfect time to establish new contacts, or simply talk to each other while enjoying a great meal. There were several sponsor booths representing RedHat, Fedora, Mozilla, Exponea, Kiwi, Eset, Kistler and others.
After the lunch break, I attended an interesting presentation, “From Code to Community,” which was focused on community and its ability to organize not only smaller meet-ups, but also bigger conferences like PyCon.DE. The last speaker that day, Adrian Holovaty, co-BDFL of the Django web framework, gave a human-focused speech about the community aspects of open source.
On the event’s third day, Sunday, I briefly visited the Django Girls workshop and shortly after that I went to the Code Analysis with Coala sprint. I did not know much about Coala, so I really appreciate that I could learn something new about this great open source project helping developers improve their code quality.
Two presentations in the large auditorium were interesting for me as well. In “Object Calisthenics,” Pawel Lewtak discussed nine steps leading to better code. He showed us how to use nine rules called Object Calisthenics in order to write code that is shorter, more precise, easier to read and easier to test. At “Automating Network Equipment with Python,” by Elisa Jasinska, we could learn about automated access to devices by Cisco, Juniper, Arista and others.
I can recommend such a great conference, as PyCon surely is, to all Python enthusiasts. Videos from the event can be found on YouTube.
Ján Hradil
Software Developer
PANTHEON.tech @ OPNFV Fast Data Stack – FOSDEM 2017
On February 5th, we presented OPNFV Fast Data Stack at the FOSDEM conference, which is hosted every year at Brussels’ Université libre de Bruxelles. It was a great gathering of software developers who presented their work in the form of 30-minute presentations.
People came not just from Europe, but also from overseas and other parts of the world. Lectures took place in more than 30 rooms, and more than 600 speakers presented their projects.
There were a number of interesting lectures, not only in the field of networking, but also robotics, neural networks, microprocessors, algorithms and data modeling. Some presenters were members of large teams, some were presenting their own projects. The scope was very wide, including almost every programming language one has ever heard of. Visitors could see everything from startups to trending projects such as Kubernetes, OpenDaylight or OpenStack. Every lecture was recorded and the videos can be found on the FOSDEM website. Our presentation was scheduled in the NFV (Network Function Virtualization) section.
About virtualization and networking
Virtualization has become very popular over the last few years. Virtual machines curb the need for physical resources and make data centers more flexible and accessible. Today’s servers are really powerful and therefore capable of hosting many VMs. This cast networking in a new light and, in response, it got virtualized too, in the form of virtual forwarders – processes capable of forwarding traffic within a host machine. OVS and VPP are the popular technologies these days, and both support DPDK, a very powerful set of data plane libraries and network interface controller drivers for fast packet processing. You may think of VPP and OVS as virtual forwarders between the physical NICs and the virtual machines.
What is OPNFV Fast Data Stack?
OPNFV FDS makes it easier to maintain complicated data center environments. It’s a complex multilayer suite that includes software components designed for creating virtual machines and forwarding traffic. All the components are deployed with the Apex installer on a given set of host machines, which need to meet demanding performance requirements and have basic connectivity as well. As a result, a complex stack is created, providing a rich user interface to network operators. The stack exposes an abstract set of tools for managing the life cycle of networks, virtual machines and policies across the given nodes.
Under the hood
Let’s have a look at the key components of the OPNFV FDS suite. As mentioned above, multiple components operate at different layers of the stack. Each component participates in transforming the defined abstraction into an actual configuration for the underlying infrastructure. On top of the stack resides OpenStack, software known for its scalability, loads of plugins and vast community. FDS uses OpenStack for managing VMs and for defining the forwarding topology and policy rules. Forwarding inputs are characterized by elements such as networks, subnets, routers or ports; policy inputs by security groups and security group rules. One layer below is the OpenDaylight controller, also popular for its community and plugins.
In the OPNFV FDS setup, it is used as a controller unit that consumes OpenStack’s abstractions and applies them to the underlying infrastructure using OpenDaylight’s Group Based Policy plugin. When the plugin detects that a policy can be resolved for at least two endpoints, configuration is generated and flushed to the forwarders. The OPNFV FDS setup presented at FOSDEM uses VPP on the hypervisor to forward packets between the physical NICs and the VMs.
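To give a feel for the input entering at the top of the stack, the tenant-facing side boils down to ordinary OpenStack calls along these lines (a rough sketch; the names and addresses are made up):

    # Forwarding inputs: a network, a subnet and a VM attached to them
    openstack network create demo-net
    openstack subnet create --network demo-net --subnet-range 10.0.10.0/24 demo-subnet
    openstack server create --image cirros --flavor m1.tiny --network demo-net demo-vm

    # Policy inputs: security groups and their rules
    openstack security group create demo-sg
    openstack security group rule create --protocol icmp demo-sg

Everything below this level, down to the actual forwarder configuration, is derived from requests of this kind.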
VPP, Vector Packet Processing, is a virtual switching/routing technology operating at a very impressive rate. It is so fast thanks to the DPDK library and CPU cache optimization techniques. The beauty of Vector Packet Processing is that instead of handling packets one by one, VPP performs one micro-operation after another on a whole group of packets, which performs better under heavy load and results in increased throughput. VPP exposes a C API and a CLI for configuration. However, it is not yet possible to use the C API remotely, because VPP does not run any management agent. Therefore, Honeycomb is used in the setup to provide a NETCONF interface for the VPP forwarder; OpenDaylight uses NETCONF to talk to the Honeycomb agent.
Supported scenarios
The FDS demo presented at FOSDEM showed the L2 scenario, meaning that L2 traffic is passed via VXLAN tunnels between the nodes. Traffic is routed on a centralized node, and routing is not performed by VPP itself, but by the OpenStack Qrouter service, which is connected into every L2 domain in VPP via tap ports. NAT and routing towards external networks are also done by Qrouter.
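On each compute node, this roughly translates into VPP configuration of the following shape (vppctl syntax; the interface names, addresses and bridge-domain IDs are made up):

    # VXLAN tunnel carrying the L2 segment towards the other node
    vppctl create vxlan tunnel src 192.168.10.1 dst 192.168.10.2 vni 100

    # The tunnel and the VM-facing vhost-user interface share one bridge domain
    vppctl set interface l2 bridge vxlan_tunnel0 10
    vppctl set interface l2 bridge VirtualEthernet0/0/0 10

    # Inspect the resulting bridge domain
    vppctl show bridge-domain 10 detail

In the FDS setup this configuration is of course not typed in by hand; it is pushed through Honeycomb by OpenDaylight.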
Moving forward, the FDS project is also looking at L3 scenarios, where routing could be either distributed or centralized and would be done by the VPP process itself, together with NAT. All these efforts need attention at every layer of the stack, including the Apex installer.
Conclusion
We were pleased to present the FDS project at the FOSDEM conference. We believe that OPNFV FDS is a key component in network virtualization, with a very bright future. For more information about the setup and the project itself, please visit this page.
Tomáš Čechvala, Michal Čmarada
Software Engineers
OpenDaylight RPC & implementation
/in Blog /by PANTHEON.techWhat Could Possibly Go Wrong With Adding This One Cool Feature
OpenDaylight uses YANG as its Interface Definition Language. This is an architecture decision we have made way back in 2013 and it works reasonably well for the most part.
One of YANG concepts, used rather heavily, is the use of RPC. For YANG and its intended use in NETCONF’s client/server model it works perfectly fine, but trouble starts brewing when you borrow concepts and try to make them fit your use case.
OpenDaylight uses YANG RPCs to not only define its northbound model, but also model interactions between its individual plugins. It does this in an environment, which is a single process, but rather a cluster of nodes, each having a mesh of plugins, some activated some not.
From architecture’s view, which looks at things from an elevation of 10,000 feet, the problem of making RPCs work in this sort of environment is quite simple: all you need are registries and request routers. From implementation perspective, though, things can easily go wrong… implementations have bugs, quirks and limitations which are not immediately apparent. They just surface when you try and push the system closer to its architectural limits.
The Trouble with Names
RFC 6020 defines only the basic RPC concept and assumes there is a single implementation servicing any request for that RPC. This is okay as long as you are targeting singleton actions — like ‘ping IP’, ‘clear system log’ and similar. In a complex system, though, requests are typically associated with a particular resource — like ‘create a flow on this switch’. Since YANG did not give us this tool, we have decided to create an OpenDaylight extension to allow an RPC to be bound to a context. This gave rise to two unfortunate names: ‘Global RPCs‘ and ‘Routed RPCs‘, the first being normal RPCs and the second being bound to a context. Plus, a third name, ‘RPCs‘, to refer to either one of those concepts. Are you confused yet?
The initial implementation of these concepts was done back in 2013, when there was no clustering in sight, by a team who have spent days upon days discussing the difference. When clustering came into the implementation picture, in 2014, the implementation team attached their own meaning to the word ‘Routed’ and we ended up with an implementation, where Routed RPCs are routed between cluster nodes, but the default ones are not. That is the subject matter behind BUG-3128. It did not matter much as long as all cluster-enabled applications used Routed RPCs, but that changed with emergence of Cluster Singleton Service and its wide-spread adoption among plugins.
These days we have YANG 1.1, defined in RFC 7950, which has the same underlying concept with much less confusing names. ‘Global RPCs’ are ‘RPCs‘. ‘Routed RPCs’ are ‘actions‘. Since those terms make the conversation about semantics a reasonable affair, this is the last you hear about Global and Routed RPCs from me.
Fun with Concepts, Contexts and Contracts
In order to support both RPCs and actions, OpenDaylight’s MD-SAL infrastructure has to define a concept to identify them both. Since the two are utterly similar in what they do, DOMRpcIdentifier was born. It is used to identify either an action or an RPC. To do that is is an abstract class with two concrete, private final implementations: DOMRpcIdentifier$Global and DOMRpcIdentifier$Local. Why those names? I do not remember the details, but I could wager a guess about what I was thinking back then. At any rate, the two implementations differ only in their implementation of DOMRpcIdentifier.getContextReference(). DOMRpcIdentifier$Global’s is always empty and DOMRpcIdentifier$Local’s is always non-empty.
This is consistent with how RPCs (without a context reference) and actions (with a context reference) are invoked, and it makes the API contract involved in RPC/action invocation clean and simple. In the context of registering an RPC or action implementation, things are slightly less straightforward: registration is a separate interface, with a rather terse Javadoc. In both cases there is a hint of ‘a conceptual dynamic router’, but not much in the way of details.
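To make the distinction concrete, here is a small sketch of how the two flavors are constructed (controller MD-SAL APIs of that era, quoted from memory, so treat the exact signatures as approximate; the model names are hypothetical):

    import org.opendaylight.controller.md.sal.dom.api.DOMRpcIdentifier;
    import org.opendaylight.yangtools.yang.common.QName;
    import org.opendaylight.yangtools.yang.data.api.YangInstanceIdentifier;
    import org.opendaylight.yangtools.yang.model.api.SchemaPath;

    public final class RpcIdentifierExample {
        private static final QName DO_FOO =
            QName.create("urn:example:foo", "2017-03-01", "do-foo");
        private static final QName FOO_NODE =
            QName.create("urn:example:foo", "2017-03-01", "foo-node");

        public static void main(final String[] args) {
            final SchemaPath operation = SchemaPath.create(true, DO_FOO);

            // No context reference: a plain RPC, backed by DOMRpcIdentifier$Global
            final DOMRpcIdentifier rpc = DOMRpcIdentifier.create(operation);

            // Non-empty context reference: an action, backed by DOMRpcIdentifier$Local
            final DOMRpcIdentifier action =
                DOMRpcIdentifier.create(operation, YangInstanceIdentifier.of(FOO_NODE));

            System.out.println(rpc + " vs " + action);
        }
    }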
Unless you were very curious as to the details of the API contracts involved, after reading the documentation available, with some OpenDaylight tutorials under your belt, you would feel this is a dead-simple matter and just use the interfaces provided. Run a few test cases and everything works just fine. No trouble in sight.
About That Router Thing…
The Simultaneous Release name of the OpenDaylight release currently in development is Carbon, meaning we have already shipped five major releases. So this vaguely referenced ‘dynamic router’ actually exists somewhere and does something to fulfill the API contracts imposed on it; otherwise the applications would not be able to work at all. The entry point into the implementation is DOMRpcRouter. Glancing over it, it contains some ugliness, but it gets the general outline of the two sides of the contract done.
Digging a bit deeper into the invocation path, you get into the fork at AbstractDOMRpcRoutingTableEntry.invokeRpc(). The RPC invocation path is rather straightforward, but the invocation path for actions is far from simple. Out of two code paths (actions and RPCs) we suddenly have 4, as an action can be invoked without a context reference as if it were an RPC and there is a brief mention of remote rpc connector registering action implementations with an empty context reference … wait … WHAT???!!!
Okay, we seem to have two implementations integrated based on implementation details, without that being supported by a single line in the API contract. The connector referenced is actually sal-remoterpc-connector and is something that is meaningful in clusters. To make some sense of this, we have to go back to 2013 again.
A Tale of Three Routers
From the get-go, the MD-SAL architecture was split into two distinct worlds: Binding-Independent (BI, DOM) and Binding-Aware (BA, Binding). This split comes from two competing requirements: type safety, provided by Java for application developers who interact with specific data models, and infrastructure services, which are independent of data models. The former is supported by interfaces and classes generated from YANG models and generally feels like any code where you deal with DTOs. The latter is supported by an object model similar to XML DOM, where you deal with hierarchical ‘document’ trees and all you have to go by are QNames. For obvious reasons, most developers interacting with OpenDaylight have never touched the BI world, even though it underpins pretty much every single feature available in the platform.
A very dated picture of how the system is organized can be found here. It is obvious that the two worlds need to interoperate seamlessly — for example, RPCs invoked by one world must be able to be serviced by the other, and the caller should be none the wiser. Since RPCs are the equivalent of a method call, this process needs to be as fast as possible, too. That led to a design where each world has its own Broker and the two brokers are connected. Invocations within one world would be handled by that world’s broker, foregoing any translation. A very old picture of what an inter-world call would look like can be seen in this diagram.
For RPCs this meant that there were two independent routing tables with re-exports being done from each of them. The idea of an RPC router was generalized in the (now long-forgotten) RpcRouter interface. Within a single node, the Binding and DOM routers would be interconnected. For clustered scenarios, a connector would be used to connect the DOM routers across all nodes. So an inter-node BA RPC request from node A to node B would go through: BA-A -> BI-A -> Connector-A -> Connector-B -> BI-B -> BA-B (and back again). Both the BI and connector speak the same language, hence can communicate without data translation.
The design was simple and effective, but it has not quite survived the test of time, most notably the transition to dynamic loading of models in the Karaf container. Model loading impacts the data translation services needed to cross the BA/BI barrier, leading to situations where an RPC implementation was available in the BA world but could not yet be exported to the BI world — leading to RPC routing loops, and in the case of data store services, missing data and deadlocks.
To solve these issues, we have decided to remove the BA/BI split from the implementation and turn the Binding-Aware world into an overlay on top of the Binding-Independent world. This means that all infrastructure services always go through BI, and the Binding RPC Broker was gradually taken behind the barn, there was a muffled sound in 2015, and these days we only have two routers, one hiding behind a connector name.
Blueprint for a New Feature
Probably the most significant pain point identified by new people coming to OpenDaylight is that the technology stack is a snowflake, providing few familiar components, with implementation and documentation being borderline hostile to newcomers. One such piece is the Configuration Subsystem (CSS). Driven by invalid YANG and magic XMLs, it is a model-driven service activation, dependency injection and configuration framework built on top of JMX. While it offers the ability to re-wire a running instance in a way which does not break anything half-way through reconfiguration, it is a major pain to get right.
It pre-dates MD-SAL (which offers nicer configuration change interactions) and is utterly slow (because the JMX implementation is horrible). It was also designed to safeguard against operator errors, which is quite contrary to what Karaf’s feature service provides — if you hit feature:uninstall, those services are going down without any safeties whatsoever.
To fix this particular sore spot, one of the decisions from the Beryllium design summit was to extend Blueprint with a few capabilities and start the long journey towards an OpenDaylight without CSS, where internal wiring would be done in Blueprint and user-visible configuration would be stored in the MD-SAL configuration data store. The crash-course page is a very easy read.
You will note that there is support for injecting and publishing RPC implementations — which is a nice feature for developers. Rather than having to deal with registries, I can declare a dependency on an RPC service and have Blueprint activate my component when that service becomes available, like this:
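(A minimal sketch of such a declaration; the interface and bean names are hypothetical, and the odl namespace is the MD-SAL Blueprint extension.)

    <blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
               xmlns:odl="http://opendaylight.org/blueprint">

      <!-- Wait for an implementation of FooService and inject a proxy to it -->
      <odl:rpc-service id="fooService"
                       interface="org.opendaylight.yang.gen.v1.urn.example.foo.rev170301.FooService"/>

      <!-- The consumer bean is only activated once the dependency above is satisfied -->
      <bean id="fooConsumer" class="org.example.FooConsumer">
        <argument ref="fooService"/>
      </bean>
    </blueprint>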
This is beyond neat, this is awesome.
FooRpcService vs. DOMRpcIdentifier
We have already covered how the Binding-Aware layer sits on top of the Binding-Independent one, but it is not a one-to-one mapping. This comes from the fact that the Binding-Independent layer is centered around what makes sense in YANG, whereas the Binding-Aware layer is centered around what makes sense in Java, including the various trade-offs and restrictions coming from each. One such difference is that RPCs do not have individual mappings, i.e. we do not generate an interface class for each RPC, but rather we generate a single interface for all RPC definitions in a particular YANG module. Hence, for a model like the following
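(a minimal hypothetical module, just to illustrate the shape)

    module foo {
        namespace "urn:example:foo";
        prefix foo;

        revision 2017-03-01 {
            description "Initial revision.";
        }

        rpc first-operation {
            input {
                leaf argument { type string; }
            }
        }

        rpc second-operation {
            input {
                leaf argument { type string; }
            }
        }
    }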
we generate a single FooService interface
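A sketch of that interface; the signatures follow the usual Binding codegen pattern for RPCs without an output statement (the generated input DTOs live in the same package):

    package org.opendaylight.yang.gen.v1.urn.example.foo.rev170301;

    import java.util.concurrent.Future;
    import org.opendaylight.yangtools.yang.binding.RpcService;
    import org.opendaylight.yangtools.yang.common.RpcResult;

    public interface FooService extends RpcService {

        Future<RpcResult<Void>> firstOperation(FirstOperationInput input);

        Future<RpcResult<Void>> secondOperation(SecondOperationInput input);
    }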
The reasoning behind this is that a particular module’s RPCs (in the broad sense, including actions) will always be implemented by a single OpenDaylight plugin and hence it makes sense to bundle them together.
An unfortunate side-effect of this is that in the Binding Aware layer, both RPCs and actions are packaged in the same interface and it is up to the intermediate layers to sort out the ambiguities. This problem is being addressed in Binding V2, where each action has its own interface, but we have to have a solution which works even in this weird setup.
Fix Some, Break Some
Considering these complexities and the gaps in the API contract documentation department, it is not quite surprising that the fix for BUG-3128, while making RPCs work correctly across the cluster, had the unfortunate side-effect of breaking blueprint wiring in a downstream project (OpenFlow Plugin). In order to understand why that happened, we need to explore the interactions between DOMRpcRouter, blueprint and sal-remoterpc-connector.
When blueprint sees an <odl:rpc-service/> declaration, it will wire a dependency on the specified RPC (Binding-Aware) interface being available in DOMRpcService (which is a facet of DOMRpcRouter). As soon as it sees a registration, it considers the dependency satisfied and proceeds with the wiring of the component. This is true for LLDP Speaker, too. Note how it declares a dependency on an implementation of PacketProcessingService. Try as you may, you will not find a place where the corresponding <odl:rpc-implementation/> lives. The reason for this is quite simple: this service contains a single action, and an implementation is registered only when an OpenFlow switch connects to the OpenDaylight instance. So how is it possible this works?
Well, it does not. At least not the way it is intended to work.
What happens is that Blueprint starts listening for an implementation of PacketProcessingService becoming available with an empty context, just as with any old RPC. Except this is an action, so somebody has to register as a global provider for the action, i.e. as being capable of dynamically invoking it based on its content, without being tied to a particular context. That someone is sal-remoterpc-connector, in its current shape and form, which does precisely what is mentioned in that terse comment. It registers itself as a dynamic router for all actions, and when a request comes in, it will try to find a remote node which has registered an implementation for the context reference specified in the invocation. That means that, unbeknownst to the Blueprint extension, all actions appear to have an implementation — even if there is no component actually providing it — and therefore LLDP Speaker will always activate, just as if that dependency declaration was not there.
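For contrast, the real per-switch registration, the one the OpenFlow plugin performs once a switch actually connects, follows the old Binding-level routed registration pattern; a rough sketch (the generated-bindings package names are quoted from memory, so treat them as approximate):

    import org.opendaylight.controller.sal.binding.api.BindingAwareBroker.RoutedRpcRegistration;
    import org.opendaylight.controller.sal.binding.api.RpcProviderRegistry;
    import org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.NodeContext;
    import org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.NodeId;
    import org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes;
    import org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node;
    import org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.NodeKey;
    import org.opendaylight.yang.gen.v1.urn.opendaylight.packet.service.rev130709.PacketProcessingService;
    import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;

    public final class SwitchConnectHandler {
        private final RpcProviderRegistry rpcRegistry;

        public SwitchConnectHandler(final RpcProviderRegistry rpcRegistry) {
            this.rpcRegistry = rpcRegistry;
        }

        // Called by the plugin once a switch (e.g. "openflow:1") connects
        public RoutedRpcRegistration<PacketProcessingService> onSwitchConnected(
                final String nodeId, final PacketProcessingService impl) {
            final RoutedRpcRegistration<PacketProcessingService> reg =
                rpcRegistry.addRoutedRpcImplementation(PacketProcessingService.class, impl);

            // Bind the implementation to this particular switch's context path
            final InstanceIdentifier<Node> path = InstanceIdentifier.create(Nodes.class)
                .child(Node.class, new NodeKey(new NodeId(nodeId)));
            reg.registerPath(NodeContext.class, path);
            return reg;
        }
    }

Until such a registration appears, there is no real implementation behind the action, only the blanket one provided by the connector.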
The fix to address BUG-3128 performed a simple thing: rather than using blanket registrations, it only propagates registrations observed on other nodes — becoming really a connector rather than a dynamic router. Since no component provides the registration at startup time, blueprint will not see the LLDP Speaker dependency as satisfied, leading to a failure to activate. Unless an OpenFlow switch happens to connect while we are waiting — in that case, activation will go through.
So we are at a fork: we either have blueprint ‘working’, or we have RPC routing in a cluster working. To get both to work at the same time, and to actually fix LLDP Speaker so it activates when appropriate, we will obviously have to perform some amount of surgery on multiple components.
I will detail what changes are needed to close this little can of worms in my next post. Stay tuned!
Róbert Varga
CTO Pantheon Technologies
ngPoland and beyond
In late November 2016, we visited one of the world’s biggest Angular conferences – ngPoland. Just two months before, Angular 2 had been released, so all the sessions were more or less discussing it.
The first session was focused on Angular CLI. Tracy Lee showed us how to make a simple application and put it into Firebase in 30 minutes, all with the help of Angular CLI – a command line tool which helps you build applications faster, since it prepares your dev environment so you can start coding right away.
We’ve already tried Angular CLI in our project and it’s great. Do you want watch functionality with live reload, unit testing with Karma, and end-to-end testing with Protractor? It’s all there, plus much more.
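For illustration, the basic workflow with the CLI of that period looked roughly like this (the package was still named angular-cli back then):

    npm install -g angular-cli   # later renamed to @angular/cli
    ng new my-app                # scaffolds a project with build and test config included
    cd my-app
    ng serve                     # dev server with file watching and live reload
    ng test                      # unit tests via Karma
    ng e2e                       # end-to-end tests via Protractor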
Shai Reznik told us the Legend Of ngModules, a pretty funny story with a lot of interesting info on how to write, yes, modules. It seems like a skilled developer should know how to structure applications, but it’s nice to be reminded of those best practices every now and then, especially when it’s your first try with Angular 2 and TypeScript.
There were a few moments when we found ourselves saying “It’s (insert year here), so use (insert library/pattern/language here).” Like “It’s 2005, use asynchronous calls,” or “It’s 2015, use promises, callbacks are baaaad.” Now we have another one: “It’s 2016, use observables!” On this topic, Ben Lesh gave a good talk about the RxJS library, which implements the Observer pattern for composing asynchronous and event-based programs.
We’ve tried RxJS, and it works pretty well. We replaced promises in our AJAX calls and events in our components. It takes some time to get used to, but then it gets pretty straightforward.
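A tiny sketch of what such a replacement looked like in an Angular 2 / RxJS 5 code base of that era (the service and endpoint names are made up):

    import { Injectable } from '@angular/core';
    import { Http } from '@angular/http';
    import { Observable } from 'rxjs/Observable';
    import 'rxjs/add/operator/map';

    @Injectable()
    export class UserService {
      constructor(private http: Http) {}

      // Returns an Observable instead of a Promise; the caller subscribes to it.
      getUsers(): Observable<string[]> {
        return this.http.get('/api/users')
          .map(response => response.json() as string[]);
      }
    }

    // In a component:
    //   this.userService.getUsers().subscribe(users => this.users = users);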
There were more good talks at the ngPoland conference, so it’s awesome that we can all watch the recordings on YouTube.
I would like to conclude this article with some advice: if you are about to start a new project and are deciding between Angular 1, which you have used before and for which you have the knowledge, skills and code snippets, and Angular 2, use the second one. Angular 2 is simply better.
PS: If you choose to accept my advice, be prepared for a lack of documentation. But it’s getting better every day, trust me.
Daniel Malachovský
Technical Leader in Pantheon Technologies
Sysrepo at IETF 96 Hackathon in Berlin
Sysrepo, an open-source project developed by several partners including PANTHEON.tech, participated in the IETF 96 Hackathon in Berlin, held from July 16th to July 17th, 2016.
The IETF Hackathon is all about promoting the collaborative spirit of open source development and integrating it into IETF standards. The Sysrepo project provides a framework that can be used to bring NETCONF & YANG management to any existing or new Unix/Linux application, which should help to spread these IETF standards into the wider open source community.
The hackathon was our first opportunity to introduce the Sysrepo project to an audience experienced with the NETCONF & YANG standards. In front of our poster, we had many constructive discussions with other participants and gained lots of feedback.
NETCONF/YANG management of Raspberry Pi
To demonstrate that NETCONF & YANG are also applicable in the IoT (Internet of Things) domain, as well as to demonstrate that Sysrepo can also work on systems with limited resources, we prepared a simple Sysrepo plugin that can control the GPIO pins of a Raspberry Pi. We demonstrated this on a relay switch and a thermal sensor connected to the GPIO of a Pi running Raspbian Linux with Sysrepo and Netopeer2 – we were able to turn the relay on or off via NETCONF, and retrieve the current temperature from the sensor via NETCONF.
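The NETCONF side of the demo boiled down to ordinary edit-config requests; a sketch of what turning the relay on might have looked like (the gpio module and its namespace here are hypothetical):

    <rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <edit-config>
        <target>
          <running/>
        </target>
        <config>
          <!-- hypothetical model of the Pi's GPIO pins -->
          <gpio xmlns="urn:example:rpi-gpio">
            <pin>
              <number>17</number>
              <state>on</state>
            </pin>
          </gpio>
        </config>
      </edit-config>
    </rpc>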
Sysrepo plugin for the ietf-system YANG module
Another part of the team, formed out of the hackathon participants, focused on the development of a Sysrepo plugin that implements the ietf-system YANG module on a generic Linux host. During the hackathon, they managed to write code that allows NETCONF management of the host name, clock & timezone settings, and is capable of restarting and shutting down the device via NETCONF RPCs.
NETCONF/YANG management of DHCPv6 in ISC Kea
The developers of the ISC Kea DHCP server joined our team with a clear goal: enable NETCONF/YANG management of their DHCP daemon using Sysrepo and Netopeer2. During the hackathon they wrote a Sysrepo plugin for ISC Kea that is able to manage part of Kea’s configuration via NETCONF. Their work didn’t stop when the hackathon ended – they expressed an interest in continuing in this direction in the future, too.
After the hacking ended, each team prepared a short presentation of their achievements. These were streamed online and are available on YouTube.
Although the biggest achievement for us was the high interest in the Sysrepo project among the IETF meeting participants, and all the feedback we gained from them, we were also selected as a winner in the “Most Importance to IETF” category. You can read more about that in this blog post.
Rastislav Szabo
Software Engineer in Pantheon Technologies
More information on Sysrepo:
Project page: http://www.sysrepo.org/
GitHub: https://github.com/sysrepo/sysrepo
Mailing lists: http://lists.sysrepo.org/listinfo/
ISC’s blog post about the hackathon: https://www.isc.org/blogs/ietf-hackathon-in-berlin-kea-and-yangnetconf/
OpenDaylight @ OpenSource Weekend 2016
Open source has a long history in Slovakia, reaching back to the late nineties, when the community was organized around the Slovak Linux Users Group (SK-LUG). They were quite successful with their Linux Weekends, gathering a following of young enthusiasts, mostly college students. As this generation grew up and became engaged in everyday life, these gatherings fizzled out, with no apparent successor.
The Society for Open Information Technologies (SOIT) has stepped up to fill this gap. It has started organizing the Open Source Weekend, the latest iteration of which took place during the weekend of April 9-10, 2016.
It marked the first time the SK-LUG and SOIT communities came together in a cooperative fashion, broadening the topics covered.
I had the pleasure of giving a presentation on OpenDaylight: in broad strokes, its role in shaping the networking industry even though it is an open source project, and also how much of a difference Slovak people are making to it. You can find the presentation below.
For the first time, a panel discussion was introduced at an OSS Weekend. The idea came to us after an exchange on a social network, which showed that there is a great disconnect between what established corporations expect and what FOSS companies deliver — at least in Slovakia. I think the format of having multiple views, and the added interaction with the audience, resulted in better idea sharing and definitely in more fun. You can find the recording below (sorry, no subtitles in this one).
All in all, I found it refreshing to engage with the community. We have already made plans for more and better content, so stay tuned!
Róbert Varga
CTO in Pantheon Technologies