PANTHEON.tech @ OpenDaylight Forum 2017
On a regular basis, OpenDaylight (ODL) developers meet in order to discuss their ideas as well as plans for upcoming releases. PANTHEON.tech’s Robert Varga and Vratko Polak have joined this year’s gathering. Vratko’s account of the event follows.
Brief introduction to ODL
OpenDaylight is an open source project aimed at supporting Software Defined Networking, mainly through a Java application (also called ODL). It’s capable of communicating with network elements via various protocols (southbound) while accepting requests from humans and other programs (northbound), again, via various protocols (although RESCONF is currently the main one).
OpenDaylight as a project is hosted by the Linux Foundation (LF), but has its own governance. ODL itself consists of (sub)projects, each has its own Git repository, committers and Project Lead. The Technical Steering Committee (TSC) allows creation of new projects, archival of old projects, and provides guidance for inter-project matters. Most projects focus on providing code for the Java application, so most of their code is in Java, together with Maven definitions used to build artifacts. Those projects depend on each other, ODLParent is the most “upstream” of such projects. Leaf projects are those which do not have other ODL Java project depending on them, not counting Integration/Distribution, which is a project aggregating all artifacts of a particular release into a file archive containing ODL installation.
Integration/Test then runs system tests (CSIT stands for Continuous System Integration Testing) against this archive. Both building and testing is done in Jenkins, Releng/Builder is the project responsible for configuring those Jenkis jobs (and other minutae of infrastructure). In between releases, ODL projects build Snapshot artifacts that are stored in a Nexus server, so artifact version does not define a unique code, and there are possible race conditions when one job uploads new artifacts while other job downloads them.
To avoid these downsides, Releng/Autorelease is a project which downloads all the code, bumps it to a non-snapshot version, builds that, and uploads to a staging repository, thus creating a release candidate. Integration/ and Releng/ projects are examples of Support projects.
Release name & forum
ODL releases are named after periodic table elements. This Forum has taken place just after the Carbon release, and its goal was to bring developers together in order to speed up discussion and planning of the Nitrogen release. One of the few things every project has to agree with, is the choice of the Java container. From Beryllium up to Carbon, the container of choice was Karaf, versions from the 3.0.x series. Karaf is a Java container based on OSGi. The main concept in Karaf is a Feature, which can contain OSGi bundles, config files and other Karaf features. ODL seems to be using Karaf features in a slightly different way from what Karaf developers have intended, therefore the Carbon initiative to upgrade to Karaf 4 has failed. Previous ODL releases tended to come in roughly 8-month cycles. But ODL is now part of larger ecosystem of networking-focused projects, so TSC decided to change to a 6-month cycle. And to fit into a correct slot, Nitrogen is scheduled to be released only 4 months after Carbon, with upgrading to Karaf 4 as its main goal.
The Developer Design Forum (DDF) for Nitrogen has taken place in Hotel Marriott, Santa Clara, California. The official program was two days long, opening on May 31 and concluding on June 1, 2017. DDF gatherings usually consist of scheduled “conference” sessions, accompanied by parallel “unconference” sessions, created on the spot. Compared to previous DDFs, there were less participants than usual (roughly 50 compared to 150 in the past), leading to only one meeting room being used for conferences and leaving the other available for unconferences.
A list of sessions that I attended follows, together with short descriptions. Please note that the descriptions (and session names) are very loose paraphrases of what was actually discussed, based rather on my personal impressions than the official program.
Karaf 4 planning conference session
After reiterating facts about Nitrogen being a “short” release focused on Karaf 4 transition, a rough timeline was presented. It was stressed that active participation of all projects is required. Projects too slow to respond will be dropped from the release mercilessly.
Not many technical details were discussed at this point, aside from notifying projects that there will be a time period where usual build and test jobs will not be running (at least not for every project) as incompatible changes will require time for rebuilds, to be performed in order throughout the project dependency graph.
Emergency leaf project removal plan
Around half of current projects are in dormant state, not being developed anymore, usually with only one person performing critical maintenance in their spare time. It is expected that multiple projects in this state will be unable to perform their Karaf 4 migration duties in time. Therefore, many Carbon projects are not going to make it into Nitrogen official release. Yet, there is a backup plan in place, at least for leaf projects: they could release their artifacts in a standalone release.
That means, their artifacts will not be built within the usual Autorelease job. Releng/Builder can create a job template for that kind of release, so that project won’t need much work to perform such release. Integration/Test would need more changes to allow CSIT for such projects, but we do not envision many projects asking for that.
ODLParent standalone release
It is a long-standing plan to “decentralize” the ODL release process, so that it depends less on Releng/Autorelease forcing everyone to release at the same time. ODLParent will be the first project to do separate releases (and still end up in Integration/Distribution builds). This needs a new job template, basically the same one as for the removed leafs. Version bumping in downstream will be somewhat painful at first, but the Autorelease project already has all the scripts and rights needed, and an automated job can be created later.
Karaf 4 specific changes
In Carbon it was discovered that two main ways to install features (the featuresBoot configuration line and feature:install runtime command) use different code paths in Karaf 4, and therefore supporting both of them might not be possible. If Linux Foundation pays a Karaf developer, it might become possible, but we cannot count on that within the Nitrogen cycle. The first Karaf 4 ready ODLParent release will drop support for Karaf 3, Integration/Distribution will stop building Karaf 3 distribution, and all CSIT testing will be switched to Karaf 4. That means we do not need to support a transition period of both versions being built and tested at the same time. If we decide to only support feature:install, changes to Releng/Builder scripts (for CSIT) will be needed.
Releng/Builder needed changes
This was a technical session, hashing out details of how items from the two previous sessions will be implemented. Few general enhancements were also discussed briefly, however, with no plans of implementing them in the Nitrogen cycle.
Jira instead of Bugzilla
There is a long-standing plan of migrating from Bugzilla to Jira. We’ve discussed several technical reasons why we really need that, as well as a few risks involved. The general consensus is that we want Jira, but it takes some work and we need a person to take the responsibility and make it happen. Not likely within Nitrogen.
ODLParent planning
Technical explanation of what went wrong with Karaf 4 in Carbon. We have a general plan to finally fix that, consisting of 4 approaches we intend to try. Explicit steps of how ODLParent standalone releases and Karaf 4 support will be done, with milestones and deadlines for ODLParent, Java projects, Integration/Distribution and Integration/Test. There will be at least one period where the usual Jenkins jobs will not work, perhaps more if multiple ODLParent releases are needed. Karaf 3 support will be propped as soon as possible, so that projects are motivated to help their upstream with migration.
Integration/Test planning
Few ideas were mentioned, but they were postponed in general, as Karaf 4 migration will consume most of the time. The old plan of migrating ODL installation logic from Releng/Builder bash scripts to Robot Framework suites is still good, but demanding. General Robot code maintenance will remain a slow gradual process. Having a small set of reliable “sanity” tests is still desired. We have a stub already running; all we need is to add more suites which are stable and quick enough. Test result availability and comprehensibility is still a major issue. The current plan is to export the test results to a database, and have a dashboard to render results in a user-friendly way. We have new interns to work on both steps.
MD-SAL usage
A highly technical session where our colleague from Pantheon Technologies, Robert Varga, was talking about the ways MD-SAL (Model Driven Service Abstraction Layer) can be interacted with. Each has its pros and cons. Single listener subscribed to a set of subtrees seems to be the approach avoiding the most of pitfalls, but the cluster implementation is not ready yet.
Infrastructure and CSIT, retrospective and improvements
The changes to Integration/Test and Releng/Builder done in Carbon. Current gaps and how we plan to bridge them, rehashing some ideas from the unconference earlier.
Upgrade-ability conference session
initially, we will be satisfied with reliable offline upgrades. We know that there are significant API changes between releases, and MD-SAL lacks a service which would tell the user that ODL has finished booting up. ODL has a built-in persistence, but some of it is cleared on startup and, perhaps, also corrupted on shutdown. Nevertheless, companies that create ODL-based solutions usually have a way to transfer data from earlier to later version of ODL, so it should be possible to create a basic mechanism in ODL itself. The Daexim project provides a basic set of tools, but it is not equipped to handle data structure changes caused by API changes in each project. The ODL core can help by sticking to the current schema.
Service recovery mechanisms
As the ‘uninstall’ feature does not really work correctly in ODL, current recovery options are limited to restarting the Java Virtual Machine. However, some services present in ODL support a softer restart on demand. A simple model was presented to abstract services and some actions on them, which would allow a client application to query service state and cause a restart without knowing details of a particular service implementation.
Unit testing async code
One of the criteria for ODL code quality is test coverage. Instead of testing each class as a unit, a higher-level “component” tests are the more common option. They still rely upon JUnit executed during a Maven build, but they test a construct consisting of several classes wired together. This is quite positive, as a “real” unit test would frequently have more complicated assertions, and it would still not be clear whether a composite would behave correctly (while such unit tests would take significantly longer to develop). During Carbon development, a significant progress has been achieved in the wiring part of component tests, yet there still is one area that needs improvement: most of ODL code is asynchronous, which means the component consists of several Java threads running concurrently.
One issue is that JUnit requires the assertion to be executed in the main thread to take effect. Another issue is that many asynchronous components lack visible intermediate state changes, which the main thread could check. Most current tests just use sleep for a fixed time before launching the final assert. However, everybody knows, that a test which relies on sleep is a bad test. The ideal solution would be for each class within a component to support dependency injection of asynchronous building blocks, such as executors and listeners. That way the component test can inject specialized building blocks with all hooks the test needs. Failing that, the cheapest solution is to use Awaitility, which, basically, spins an assert (not changing the state) until it passes, or a predefined time runs out. That is better that sleep in that it can pass more quickly.
Closing remarks
The closing session mostly consisted of discussing, why we were joined by way less attendees than is usual. What can be done? One possibility is to merge the Developer Design Forum with some other LF event, however, people argued that this would take away focus from ODL planning. Another option is to ask member organizations to provide the venue, so that a smaller event like this could be hosted without hotel-high venue cost.
Vratko Polák