VPP 105: Memory Management & DPDK APIs

VPP 105: Memory Management & DPDK APIs

Welcome to the 5th part of our VPP Guide series! Today, we will be asking ourselves practical questions regarding the various technologies & libraries managed in VPP – their usage, advantages, and management. Let’s jump right into it and ask ourselves:

Why does DPDK use Hugepages?

Hugepages

Hugepages is one of the techniques used in virtual memory management. In a standard environment, CPU allocates memory (virtual) for each process. Those blocks of memory are called „pages“ and for efficiency in Linux kernel the size of allocated memory is 4kB. When a process wants to access its memory, CPU has to find where this virtual memory is – this is the task of Memory Management Unit and page table lookup. Using the page table structure CPU could map virtual to physical memory.

For example, when the process needs 1GB of memory, this leads to more than 200k of pages in the page table which the CPU has to lookup for. Of course, this leads to performance slowdown. Fortunately, nowadays CPUs support bigger pages – so-called Hugepages. They can reduce the number of pages to be lookup for and usage of huge pages increases performance.

Memory Management Unit uses one additional hardware cache – Translation Lookaside Buffers (TLB). When there is address translation from virtual memory to physical memory, translation is calculated in MMU, and this mapping is stored in the TLB. So next time accessing the same page will be first handled by TLB (which is fast) and then by MMU.

As TLB is a hardware cache, it has a limited number of entries, so a large number of pages will slow down the application. So a combination of TLB with Hugepages reduces the time it takes to translate a virtual page address to a physical page address and to lookup for and so again it will increase performance.

This is the reason why DPDK, and VPP as well, uses Hugepages for large memory pool allocation, used for packet buffers. By using Hugepages allocations, performance is increased since fewer pages and fewer lookups are needed and the management is more effective.

Cache prefetching

Cache prefetching is another technique used by VPP to boost execution performance. Prefetching data from their original storage in slower memory to a faster local memory before it is actually needed significantly increase performance. CPUs have fast and local cache memory in which prefetched data is held until it is required. Examples of CPU caches with a specific function are the D-cache (data cache), I-cache (instruction cache) and the TLB (translation lookaside buffer) for the MMU. Separated D-cache and I-cache makes it possible to fetch instructions and data in parallel. Moreover, instructions and data have different access patterns.

Cache prefetching is used mainly in nodes when processing packets. In VPP, each node has a registered function responsible for incoming traffic handling. An example of registration (abf and flowprobe nodes):

/*
 * Copyright (c) 2017 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

VLIB_REGISTER_NODE (abf_ip4_node) = {
  .function = abf_input_ip4,
  .name = "abf-input-ip4",

VLIB_REGISTER_NODE (flowprobe_ip4_node) = {
  .function = flowprobe_ip4_node_fn,
  .name = "flowprobe-ip4",

In abf processing function, we can see single loop handling – it loops over packets and handles them one by one.

/*
 * Copyright (c) 2017 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

abf_input_inline (vlib_main_t * vm,
      vlib_node_runtime_t * node,
      vlib_frame_t * frame, fib_protocol_t fproto)
{
...
      while (n_left_from > 0 && n_left_to_next > 0)
  {
...	
    abf_next_t next0 = ABF_NEXT_DROP;
    vlib_buffer_t *b0;
    u32 bi0, sw_if_index0;
...
    bi0 = from[0];
    to_next[0] = bi0;
    from += 1;
    to_next += 1;
    n_left_from -= 1;
    n_left_to_next -= 1;

    b0 = vlib_get_buffer (vm, bi0);
    sw_if_index0 = vnet_buffer (b0)->sw_if_index[VLIB_RX];

    ASSERT (vec_len (abf_per_itf[fproto]) > sw_if_index0);
    attachments0 = abf_per_itf[fproto][sw_if_index0];
...
    /* verify speculative enqueue, maybe switch current next frame */
    vlib_validate_buffer_enqueue_x1 (vm, node, next_index,
             to_next, n_left_to_next, bi0,
             next0);
  }
      vlib_put_next_frame (vm, node, next_index, n_left_to_next);
    }

In flowprobe node, we can see quad/single loop using prefetching which can significantly increase performance. In the first loop:

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

( while (n_left_from >= 4 ... ) )

it processes buffers b0 and b1 (and moreover, the next two buffers are prefetched), and in the next loop

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

( while (n_left_from > 0 ... ) )

remaining packets are processed.

/*
 * Copyright (c) 2018 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

flowprobe_node_fn (vlib_main_t * vm,
       vlib_node_runtime_t * node, vlib_frame_t * frame,
       flowprobe_variant_t which)
{
...
      /*
      * While we have at least 4 vector elements (pkts) to process..
      */

      while (n_left_from >= 4 && n_left_to_next >= 2)
  {
...
    /* Prefetch next iteration. */
    {
      vlib_buffer_t *p2, *p3;

      p2 = vlib_get_buffer (vm, from[2]);
      p3 = vlib_get_buffer (vm, from[3]);

      vlib_prefetch_buffer_header (p2, LOAD);
      vlib_prefetch_buffer_header (p3, LOAD);

      CLIB_PREFETCH (p2->data, CLIB_CACHE_LINE_BYTES, STORE);
      CLIB_PREFETCH (p3->data, CLIB_CACHE_LINE_BYTES, STORE);
    }
...
    /* speculatively enqueue b0 and b1 to the current next frame */
    b0 = vlib_get_buffer (vm, bi0);
    b1 = vlib_get_buffer (vm, bi1);


    /* verify speculative enqueues, maybe switch current next frame */
    vlib_validate_buffer_enqueue_x2 (vm, node, next_index,
             to_next, n_left_to_next,
             bi0, bi1, next0, next1);
  }
      /*
      * Clean up 0...3 remaining packets at the end of the frame
      */
      while (n_left_from > 0 && n_left_to_next > 0)
  {
    u32 bi0;
    vlib_buffer_t *b0;
    u32 next0 = FLOWPROBE_NEXT_DROP;
    u16 len0;

    /* speculatively enqueue b0 to the current next frame */
    bi0 = from[0];
    to_next[0] = bi0;
    from += 1;
    to_next += 1;
    n_left_from -= 1;
    n_left_to_next -= 1;

    b0 = vlib_get_buffer (vm, bi0);

    vnet_feature_next (&next0, b0);

    len0 = vlib_buffer_length_in_chain (vm, b0);
    ethernet_header_t *eh0 = vlib_buffer_get_current (b0);
    u16 ethertype0 = clib_net_to_host_u16 (eh0->type);

    if (PREDICT_TRUE ((b0->flags & VNET_BUFFER_F_FLOW_REPORT) == 0))
      {
        flowprobe_trace_t *t = 0;
        if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE)
         && (b0->flags & VLIB_BUFFER_IS_TRACED)))
    t = vlib_add_trace (vm, node, b0, sizeof (*t));

        add_to_flow_record_state (vm, node, fm, b0, timestamp, len0,
          flowprobe_get_variant
          (which, fm->context[which].flags,
           ethertype0), t);
      }

    /* verify speculative enqueue, maybe switch current next frame */
    vlib_validate_buffer_enqueue_x1 (vm, node, next_index,
             to_next, n_left_to_next,
             bi0, next0);
  }

VPP I/O Request Handling

Why is polling faster than IRQs? How do the hardware/software IRQs work?

I/O device (NIC) event handling is a significant part of VPP. The CPU doesn’t know when an I/O event can occur, but it has to respond. There are two different approaches – IRQ and Polling, which are different from each other in many aspects.

From a CPU point of view, IRQ seems to be better, as the device disturbs the CPU only when it needs servicing, instead of constantly checking device status in case of polling. But from an efficiency point of view, interruptions are inefficient when the devices keep on interrupting the CPU repeatedly and polling is inefficient when the CPU device is rarely ready for servicing.

As in the case of packet processing in VPP, it is expected that traffic will be permanent. In such a case, the number of interruptions would rapidly increase. On the other hand, the device will be ready for service all the time. So polling seems to be more efficient for packet processing and it is the reason why VPP uses polling when processing the incoming packet.

VPP & DPDK

What API does DPDK offer? How does VPP use this library?

DPDK networking drivers are classified in two categories:

  • physical for real devices
  • virtual for emulated devices

The DPDK ethdev layer exposes APIs, in order to use the networking functions of these devices. For a full list of the supported features and APIs, click here.

In VPP, DPDK support has been moved from core to plugin to simplify enabling/disabling and handling DPDK interfaces. To simplify and store all DPDK relevant info, a DPDK device implementation (src/plugin/dpdk/device/dpdk.h) has a structure with DPDK data:

/* SPDX-License-Identifier: BSD-3-Clause
 * Copyright(c) 2010-2014 Intel Corporation
 */

typedef struct
{
...
  struct rte_eth_conf port_conf;
  struct rte_eth_txconf tx_conf;
...
  struct rte_flow_error last_flow_error;
...
  struct rte_eth_link link;
...
  struct rte_eth_stats stats;
  struct rte_eth_stats last_stats;
  struct rte_eth_xstat *xstats;
...
} dpdk_device_t;

containing all relevant DPDK structs used in VPP, to store DPDK relevant info.

DPDK APIs are used in the DPDK plugin only. Here is a list of DPDK features and their API’s used in VPP, with a few examples of usage.

Speed Capabilities / Runtime Rx / Tx Queue Setup

Supports getting the speed capabilities that the current device is capable of. Supports Rx queue setup after the device started.

API: rte_eth_dev_info_get()

/*
 * Copyright (c) 2017 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

dpdk_device_setup (dpdk_device_t * xd)
{
  dpdk_main_t *dm = &dpdk_main;
...
  struct rte_eth_dev_info dev_info;
...
  if (xd->flags & DPDK_DEVICE_FLAG_ADMIN_UP)
    {
      vnet_hw_interface_set_flags (dm->vnet_main, xd->hw_if_index, 0);
      dpdk_device_stop (xd);
    }

  /* Enable flow director when flows exist */
  if (xd->pmd == VNET_DPDK_PMD_I40E)
    {
      if ((xd->flags & DPDK_DEVICE_FLAG_RX_FLOW_OFFLOAD) != 0)
  xd->port_conf.fdir_conf.mode = RTE_FDIR_MODE_PERFECT;
      else
  xd->port_conf.fdir_conf.mode = RTE_FDIR_MODE_NONE;
    }

  rte_eth_dev_info_get (xd->port_id, &dev_info);

Link Status

Supports getting the link speed, duplex mode and link-state (up/down).

API: rte_eth_link_get_nowait()

/*
 *Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

dpdk_update_link_state (dpdk_device_t * xd, f64 now)
{
  vnet_main_t *vnm = vnet_get_main ();
  struct rte_eth_link prev_link = xd->link;
...
  /* only update link state for PMD interfaces */
  if ((xd->flags & DPDK_DEVICE_FLAG_PMD) == 0)
    return;

  xd->time_last_link_update = now ? now : xd->time_last_link_update;
  clib_memset (&xd->link, 0, sizeof (xd->link));
  rte_eth_link_get_nowait (xd->port_id, &xd->link);

Lock-Free Tx Queue

If a PMD advertises DEV_TX_OFFLOAD_MT_LOCKFREE capable, multiple threads can invoke rte_eth_tx_burst() concurrently on the same Tx queue without SW lock.

API: rte_eth_tx_burst()

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

static clib_error_t *
dpdk_lib_init (dpdk_main_t * dm)
{
...
  dpdk_device_t *xd;
...
      if (xd->pmd == VNET_DPDK_PMD_FAILSAFE)
  {
    /* failsafe device numerables are reported with active device only,
     * need to query the mtu for current device setup to overwrite
     * reported value.
     */
    uint16_t dev_mtu;
    if (!rte_eth_dev_get_mtu (i, &dev_mtu))
      {
        mtu = dev_mtu;
        max_rx_frame = mtu + sizeof (ethernet_header_t);

        if (dpdk_port_crc_strip_enabled (xd))
    {
      max_rx_frame += 4;
    }
      }
  }

Promiscuous Mode

Supports enabling/disabling promiscuous mode for a port.

API: rte_eth_promiscuous_enable(), rte_eth_promiscuous_disable(), rte_eth_promiscuous_get()

Allmulticast Mode

Supports enabling/disabling receiving multicast frames.

API: rte_eth_allmulticast_enable(), rte_eth_allmulticast_disable(), rte_eth_allmulticast_get()

/*
 * Copyright (c) 2017 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

dpdk_device_stop (dpdk_device_t * xd)
{
  if (xd->flags & DPDK_DEVICE_FLAG_PMD_INIT_FAIL)
    return;

  rte_eth_allmulticast_disable (xd->port_id);
  rte_eth_dev_stop (xd->port_id);
...

Unicast MAC Filter

Supports adding MAC addresses to enable white-list filtering to accept packets.

APIrte_eth_dev_default_mac_addr_set(), rte_eth_dev_mac_addr_add(), rte_eth_dev_mac_addr_remove(), rte_eth_macaddr_get()

VLAN Filter

Supports filtering of a VLAN Tag identifier.

API: rte_eth_dev_vlan_filter()

VLAN Offload

Supports VLAN offload to hardware.

API: rte_eth_dev_set_vlan_offload(), rte_eth_dev_get_vlan_offload()

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

dpdk_subif_add_del_function (vnet_main_t * vnm,
           u32 hw_if_index,
           struct vnet_sw_interface_t *st, int is_add)
{
...
  dpdk_device_t *xd = vec_elt_at_index (xm->devices, hw->dev_instance);
  int r, vlan_offload;
...
  vlan_offload = rte_eth_dev_get_vlan_offload (xd->port_id);
  vlan_offload |= ETH_VLAN_FILTER_OFFLOAD;

  if ((r = rte_eth_dev_set_vlan_offload (xd->port_id, vlan_offload)))
    {
      xd->num_subifs = prev_subifs;
      err = clib_error_return (0, "rte_eth_dev_set_vlan_offload[%d]: err %d",
             xd->port_id, r);
      goto done;
    }

  if ((r =
       rte_eth_dev_vlan_filter (xd->port_id,
        t->sub.eth.outer_vlan_id, is_add)))
    {
      xd->num_subifs = prev_subifs;
      err = clib_error_return (0, "rte_eth_dev_vlan_filter[%d]: err %d",
             xd->port_id, r);
      goto done;
    }

Basic Stats

Support basic statistics such as: ipackets, opackets, ibytes, obytes, imissed, ierrors, oerrors, rx_nombuf. And per queue stats: q_ipackets, q_opackets, q_ibytes, q_obytes, q_errors.

API: rte_eth_stats_get, rte_eth_stats_reset()

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

dpdk_update_counters (dpdk_device_t * xd, f64 now)
{
  vlib_simple_counter_main_t *cm;
  vnet_main_t *vnm = vnet_get_main ();
  u32 thread_index = vlib_get_thread_index ();
  u64 rxerrors, last_rxerrors;

  /* only update counters for PMD interfaces */
  if ((xd->flags & DPDK_DEVICE_FLAG_PMD) == 0)
    return;

  xd->time_last_stats_update = now ? now : xd->time_last_stats_update;
  clib_memcpy_fast (&xd->last_stats, &xd->stats, sizeof (xd->last_stats));
  rte_eth_stats_get (xd->port_id, &xd->stats);

Extended Stats

Supports Extended Statistics, changes from driver to driver.

APIrte_eth_xstats_get(), rte_eth_xstats_reset(), rte_eth_xstats_get_names, rte_eth_xstats_get_by_id(), rte_eth_xstats_get_names_by_id(), rte_eth_xstats_get_id_by_name()

Module EEPROM Dump

Supports getting information and data of plugin module eeprom.

API: rte_eth_dev_get_module_info(), rte_eth_dev_get_module_eeprom()


VPP Library (vlib)

What funcionality does vlib offer?

Vlib is a vector processing library. It also handles various application management functions:

  • buffer, memory, and graph node management and scheduling
  • reliable multicast support
  • ultra-lightweight cooperative multi-tasking threads
  • physical memory, and Linux epoll support
  • maintaining and exporting counters
  • thread management
  • packet tracing.

Vlib also implements the debug CLI.

In VPP (vlib), a vector is an instance of the vlib_frame_t type:

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

typedef struct vlib_frame_t
{
  /* Frame flags. */
  u16 flags;

  /* Number of scalar bytes in arguments. */
  u8 scalar_size;

  /* Number of bytes per vector argument. */
  u8 vector_size;

  /* Number of vector elements currently in frame. */
  u16 n_vectors;

  /* Scalar and vector arguments to next node. */
  u8 arguments[0];
} vlib_frame_t;

As shown, vectors are dynamically resized arrays with user-defined “headers”. Many data structures in VPP (buffers, hash, heap, pool) are vectors with different headers.

The memory layout looks like this:

© Copyright 2018, Linux Foundation


User header (optional, uword aligned)
                  Alignment padding (if needed)
                  Vector length in elements
User's pointer -> Vector element 0
                  Vector element 1
                  ...
                  Vector element N-1

Vectors are not only used in vppinfra data structures (hash, heap, pool, …) but also in vlib – in nodes, buffers, processes and more.

Buffers

Vlib buffers are used to reach high performance in packet processing. To do so, one allocates/frees N-buffers at once, rather than one at a time – except for directly processing specific buffer (its packets in given node), one deals with buffer indices instead of buffer pointers. Vlib buffers have a structure of a vector:

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

/** VLIB buffer representation. */
typedef union
{
  struct
  {
    CLIB_CACHE_LINE_ALIGN_MARK (cacheline0);

    /** signed offset in data[], pre_data[] that we are currently
      * processing. If negative current header points into predata area.  */
    i16 current_data;

    /** Nbytes between current data and the end of this buffer.  */
    u16 current_length;
...
    /** Opaque data used by sub-graphs for their own purposes. */
    u32 opaque[10];
...
    /**< More opaque data, see ../vnet/vnet/buffer.h */
    u32 opaque2[14];

    /** start of third cache line */
      CLIB_CACHE_LINE_ALIGN_MARK (cacheline2);

    /** Space for inserting data before buffer start.  Packet rewrite string
      * will be rewritten backwards and may extend back before
      * buffer->data[0].  Must come directly before packet data.  */
    u8 pre_data[VLIB_BUFFER_PRE_DATA_SIZE];

    /** Packet data */
    u8 data[0];
  };
#ifdef CLIB_HAVE_VEC128
  u8x16 as_u8x16[4];
#endif
#ifdef CLIB_HAVE_VEC256
  u8x32 as_u8x32[2];
#endif
#ifdef CLIB_HAVE_VEC512
  u8x64 as_u8x64[1];
#endif
} vlib_buffer_t;

Each vlib_buffer_t (packet buffer) carries the buffer metadata, which describes the current packet-processing state.

  • u8 data[0]: Ordinarily, hardware devices use data as the DMA target but there are exceptions. Do not access data directly, use vlib_buffer_get_current.
  • u32 opaque[10]: primary vnet-layer opaque data
  • u32 opaque2[14]: secondary vnet-layer opaque data

There are several functions to get data from vector (vlib/node_funcs.h):

To get a pointer to frame vector data

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

always_inline void *
vlib_frame_vector_args (vlib_frame_t * f)
{
  return (void *) f + vlib_frame_vector_byte_offset (f->scalar_size);
}
-	to get pointer to scalar data
always_inline void *
vlib_frame_scalar_args (vlib_frame_t * f)
{
  return vlib_frame_vector_args (f) - f->scalar_size;
}

Get pointer to scalar data

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

always_inline void *
vlib_frame_scalar_args (vlib_frame_t * f)
{
  return vlib_frame_vector_args (f) - f->scalar_size;
}

Translate the buffer index into buffer pointer

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

always_inline vlib_buffer_t *
vlib_get_buffer (vlib_main_t * vm, u32 buffer_index)
{
  vlib_buffer_main_t *bm = vm->buffer_main;
  vlib_buffer_t *b;

  b = vlib_buffer_ptr_from_index (bm->buffer_mem_start, buffer_index, 0);
  vlib_buffer_validate (vm, b);
  return b;
}

Get the pointer to current (packet) data from a buffer to process

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

always_inline void *
vlib_buffer_get_current (vlib_buffer_t * b)
{
  /* Check bounds. */
  ASSERT ((signed) b->current_data >= (signed) -VLIB_BUFFER_PRE_DATA_SIZE);
  return b->data + b->current_data;

Get vnet primary buffer metadata in the reserved opaque field

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0<br >
 */

#define vnet_buffer(b) ((vnet_buffer_opaque_t *) (b)->opaque)

An example to retrieve vnet buffer data:

/*
 * Copyright (c) 2017 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

add_to_flow_record_state (vlib_main_t * vm, vlib_node_runtime_t * node,
        flowprobe_main_t * fm, vlib_buffer_t * b,
        timestamp_nsec_t timestamp, u16 length,
        flowprobe_variant_t which, flowprobe_trace_t * t)
{
...
  u32 rx_sw_if_index = vnet_buffer (b)->sw_if_index[VLIB_RX];

Get vnet primary buffer metadata in reserved opaque2 field

/*
 * Copyright (c) 2017 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

#define vnet_buffer2(b) ((vnet_buffer_opaque2_t *) (b)->opaque2)

Let’s take a look at flowprobe node processing function. Vlib functions always start with a vlib_ prefix.

/*
 * Copyright (c) 2017 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

flowprobe_node_fn (vlib_main_t * vm,
       vlib_node_runtime_t * node, vlib_frame_t * frame,
       flowprobe_variant_t which)
{
  u32 n_left_from, *from, *to_next;
  flowprobe_next_t next_index;
  flowprobe_main_t *fm = &flowprobe_main;
  timestamp_nsec_t timestamp;

  unix_time_now_nsec_fraction (&timestamp.sec, &timestamp.nsec);
  
// access frame vector data
  from = vlib_frame_vector_args (frame);
  n_left_from = frame->n_vectors;
  next_index = node->cached_next_index;

  while (n_left_from > 0)
    {
      u32 n_left_to_next;

      // get pointer to next vector data
      vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next);

// dual loop – we are processing two buffers and prefetching next two buffers
      while (n_left_from >= 4 && n_left_to_next >= 2)
  {
    u32 next0 = FLOWPROBE_NEXT_DROP;
    u32 next1 = FLOWPROBE_NEXT_DROP;
    u16 len0, len1;
    u32 bi0, bi1;
    vlib_buffer_t *b0, *b1;

    /* Prefetch next iteration. */
             // prefetching packets p3 and p4 while p1 and p2 are processed
    {
      vlib_buffer_t *p2, *p3;

      p2 = vlib_get_buffer (vm, from[2]);
      p3 = vlib_get_buffer (vm, from[3]);

      vlib_prefetch_buffer_header (p2, LOAD);
      vlib_prefetch_buffer_header (p3, LOAD);

      CLIB_PREFETCH (p2->data, CLIB_CACHE_LINE_BYTES, STORE);
      CLIB_PREFETCH (p3->data, CLIB_CACHE_LINE_BYTES, STORE);
    }
/* speculatively enqueue b0 and b1 to the current next frame */
// frame contains buffer indecies (bi0, bi1) instead of pointers
    to_next[0] = bi0 = from[0];
    to_next[1] = bi1 = from[1];
    from += 2;
    to_next += 2;
    n_left_from -= 2;
    n_left_to_next -= 2;

// translate buffer index to buffer pointer
    b0 = vlib_get_buffer (vm, bi0);
    b1 = vlib_get_buffer (vm, bi1);
// select next node based on feature arc
    vnet_feature_next (&next0, b0);
    vnet_feature_next (&next1, b1);

    len0 = vlib_buffer_length_in_chain (vm, b0);
// get current data (header) from packet to process
// currently we are on L2 so get etehernet header, but if we
// are on L3 for example we can retrieve L3 header, i.e.
// ip4_header_t *ip0 = (ip4_header_t *) ((u8 *) vlib_buffer_get_current (b0) 
    ethernet_header_t *eh0 = vlib_buffer_get_current (b0);
    u16 ethertype0 = clib_net_to_host_u16 (eh0->type);

    if (PREDICT_TRUE ((b0->flags & VNET_BUFFER_F_FLOW_REPORT) == 0))
      add_to_flow_record_state (vm, node, fm, b0, timestamp, len0,
              flowprobe_get_variant
              (which, fm->context[which].flags,
               ethertype0), 0);
...
/* verify speculative enqueue, maybe switch current next frame */
    vlib_validate_buffer_enqueue_x1 (vm, node, next_index,
             to_next, n_left_to_next,
             bi0, next0);
  }

      vlib_put_next_frame (vm, node, next_index, n_left_to_next);
    }
  return frame->n_vectors;
}

Nodes

As we said – vlib is also designed for graph node management. When creating a new feature, one has to initialize it, using the VLIB_INIT_FUNCTION macro. This constructs a vlib_node_registration_t, most often via the VLIB_REGISTER_NODE macro. At runtime, the framework processes the set of such registrations into a directed graph.

/*
 * Copyright (c) 2016 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0<br >
 */

static clib_error_t *
flowprobe_init (vlib_main_t * vm)
{
  /* ... initialize things ... */
 
  return 0;
}

VLIB_INIT_FUNCTION (flowprobe_init);

...

VLIB_REGISTER_NODE (flowprobe_l2_node) = {
  .function = flowprobe_l2_node_fn,
  .name = "flowprobe-l2",
  .vector_size = sizeof (u32),
  .format_trace = format_flowprobe_trace,
  .type = VLIB_NODE_TYPE_INTERNAL,
  .n_errors = ARRAY_LEN(flowprobe_error_strings),
  .error_strings = flowprobe_error_strings,
  .n_next_nodes = FLOWPROBE_N_NEXT,
  .next_nodes = FLOWPROBE_NEXT_NODES,
};

VLIB_REGISTER_NODE (flowprobe_walker_node) = {
  .function = flowprobe_walker_process,
  .name = "flowprobe-walker",
  .type = VLIB_NODE_TYPE_INPUT,
  .state = VLIB_NODE_STATE_INTERRUPT,
};

Type member in node registration specifies the purpose of the node:

  • VLIB_NODE_TYPE_PRE_INPUT – run before all other node types
  • VLIB_NODE_TYPE_INPUT – run as often as possible, after pre_input nodes
  • VLIB_NODE_TYPE_INTERNAL – only when explicitly made runnable by adding pending frames for processing
  • VLIB_NODE_TYPE_PROCESS – only when explicitly made runnable.

The initialization of feature is executed at some point in the application’s startup. However, constraints must be used to specify an order (when one feature has to be initialized after/before another one). To hook feature into specific feature arc VNET_FEATURE_INT macro can be used.

/*
 * Copyright (c) 2016 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

VNET_FEATURE_INIT (ip4_nat44_ed_hairpin_src, static) = {
  .arc_name = "ip4-output",
  .node_name = "nat44-ed-hairpin-src",
  .runs_after = VNET_FEATURES ("acl-plugin-out-ip4-fa"),
};

VNET_FEATURE_INIT (ip4_nat_hairpinning, static) =
{
  .arc_name = "ip4-local",
  .node_name = "nat44-hairpinning",
  .runs_before = VNET_FEATURES("ip4-local-end-of-arc"),
};

Since VLIB_NODE_TYPE_INPUT nodes are the starting point of a feature arc, they are responsible for generating packets from some source, like a NIC or PCAP file and injecting them into the rest of the graph.

When registering a node, one can provide a .next_node parameter with an indexed list of the upcoming nodes in the graph. For example, a flowprobe node below:

/*
 * Copyright (c) 2017 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0<br >
 */

...
next_nodes = FLOWPROBE_NEXT_NODES,
...

#define FLOWPROBE_NEXT_NODES {				\
    [FLOWPROBE_NEXT_DROP] = "error-drop",		\
    [FLOWPROBE_NEXT_IP4_LOOKUP] = "ip4-lookup",		\
}

vnet_feature_next is commonly used to select the next node. This selection is based on the feature mechanism, as in the flowprobe example above:

/*
 * Copyright (c) 2017 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

flowprobe_node_fn (vlib_main_t * vm,
       vlib_node_runtime_t * node, vlib_frame_t * frame,
       flowprobe_variant_t which)
{
...
    b0 = vlib_get_buffer (vm, bi0);
    b1 = vlib_get_buffer (vm, bi1);
        // select next node based on feature arc
    vnet_feature_next (&next0, b0);
    vnet_feature_next (&next1, b1);

The graph node dispatcher pushes the work-vector through the directed graph, subdividing it as needed until the original work-vector has been completely processed.

Graph node dispatch functions call vlib_get_next_frame to set (u32 *)to_next to the right place in the vlib_frame_t, corresponding to the ith arc (known as next0) from the current node, to the indicated next node.

Before a dispatch function returns, it’s required to call vlib_put_next_frame for all of the graph arcs it actually used. This action adds a vlib_pending_frame_t to the graph dispatcher’s pending frame vector.

/*
 * Copyright (c) 2015 Cisco and/or its affiliates. Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
 */

      vlib_put_next_frame (vm, node, next_index, n_left_to_next);
    }
  return frame->n_vectors;
}

Pavel Kotúček

Thank you for reading through the 5th part of our PANTHEON.tech VPP Guide! As always, feel free to contact us if you are interested in customized solutions!


You can contact us at https://pantheon.tech/

Explore our Pantheon GitHub.

Watch our YouTube Channel.

VPP 105: Memory Management & DPDK APIs

Vector Packet Processing 104: gRPC & REST

Welcome back to our Vector Packet Processing implementation guide, Part 4.

Today, we will go through the essentials of gRPC and REST and introduce their core concepts, while introducing one missing functionality into our VPP build. This part will also introduce the open-source GoVPP, Go language bindings for binary API – VPP management.

The Issue

Natively, VPP does not include a gRPC / RESTful interface. PANTHEON.tech has developed an internal solution, which utilizes a gRPC-gateway to VPP, using GoVPP. It is called VPP-RPC, through which you can now connect to VPP using a REST client. 

In case you are interested in this solution, please contact us via our website.


Introduction

First and foremost, here are the terms that you will come across in this guide:

  • gRPC (Remote Procedure Calls) is a remote procedure call (RPC) system initially developed by Google. It uses HTTP/2 for transport, protocol buffers as the interface description language, and provides features such as authentication, bidirectional streaming and flow control, blocking or non-blocking bindings, and cancellation and timeouts. It generates cross-platform client and server bindings for many programming-languages.
  • gRPC-gateway (GRPC-GW) is a plugin of protobuf. It reads the gRPC service definition and generates a reverse-proxy server which translates a RESTful JSON API into gRPC. This server is generated according to custom options in your gRPC definition.
  • VPP-Agent is aset of VPP-specific plugins that interact with Ligato, in order to access services within the same cloud. VPP Agent provides VPP functionality to client apps through a VPP model-driven API
  • VPP-RPC is our new RESTful VPP service. It utilizes gRPC & gRPC-gateway as 2 separate processes, in order to facilitate communication with VPP through GoVPP.

JSON message sequence

gRPC gateway exposes the REST service, for which there is no built-in support in VPP. By using gRPC-gateway and gRPC server services, the VPP API is available through gRPC and REST at the same time. Both services use generated models from VPP binary APIs, therefore exposing both APIs is easy enough to include both. It is up to the user to choose, which mechanism they will use.

When exchanging data between a browser and a server, the data can only be text. Any JavaScript object can be converted into a JSON and sent to the server. This allows us to process data as JavaScript objects, while avoiding complicated translations and parsing issues.

The sequence diagram below describes the path the original JSON message takes in our solution:

  1. The Client sends a HTTP request to GRPC-GW
  2. GRPC-GW transforms the JSON into protobuf message & sends it to the gRPC server
  3. Each RPC has a go method handling its logic. For unary RPCs, this simply means copying the protobuf message into the corresponding VPP message structure and passing it to GoVPP binary API
  4. GoVPP eventually passes the message to the underlying VPP binary API, where the desired functionality is executed

VPP API build process

The figure below describes the build process for a single VPP API. Lets see what needs to happen for a tap API:

Build process for a VPP API. @PANTHEON.tech

VPP APIs are defined in /usr/share/VPP/api/ which is accessible after installing VPP. The Go package tap is a generated via the VPP Agent binary API of the ‘tap’ VPP module. It is generated from this file: tap.api.json.

This file will drive the creation of all 3 building blocks:

  • GoVPP’s binapi-generator generates the GoVPP file tap.ba.go
  • vpp-rpc-protogen generates tap.proto, which contains the gRPC messages and services, including the URL for each RPC
  • protoc’s go plugin will compile the proto file and create a gRPC stub tap.pb.go, containing client & server interfaces that define RPC methods and protobuf message structs.
  • Vpp-rpc-implementor will generate the code that implements the TapServer interface – the actual RPC methods calling GoVPP APIs – in Tap.server.go file.
  • protoc’s gRPC-gateway plugin will compile the proto file and create the reverse proxy tap.pb.gw.go. We don’t have to touch this file further.

Running the examples

To run all the above, we need to make sure all these processes are running:

  • VPP service
  • gRPC server (needs root privileges)

    sudo vpp-rpc-server
  • gRPC gateway

    vpp-rpc-gateway

Running a simple request using Curl

If we want to invoke the API we have created, we can use Curl. Vpe will require an URL (vpe/showversion/request), which will map the above mentioned API to VPP’s binary API. We will now use Curl to POST a request to a default address of localhost:8080:

curl -H "Content-Type: application/json" -X POST 127.0.0.1:8080/Vpe/ShowVersion/request --silent

{"Retval":0,"Program":"dnBlAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","Version":"MTguMDctcmMwfjEyOC1nNmYxYzQ4ZAAAAAAAAAAAAAA=",
"BuildDate":"xaB0IG3DoWogIDMgMTQ6MTA6NTQgQ0VTVCAyMDE4AAA=",
"BuildDirectory":"L2hvbWUvcGFsby92cHA"}

The decoded reply says:

{
    "Retval": 0,
    "Program": "vpe",
    "Version": "18.07-rc0~128-g6f1c48d",
    "BuildDate": "Thu May  3 14:10:54 CEST 2018",
    "BuildDirectory": "/home/user/vpp"
}

Postman collection

We provide a Postman collection within our service, which serves as a starting point for users with our solution. The collection created in the vpp-rpc repository, tests/vpp-rpc.postman_collection.json path, contains various requests and subsequent tests to the Tap module.

Performance analysis in Curl

Curl can give us detailed timing analysis of a request’s performance. If we run the previous request 100 times, the summary times (in milliseconds) we get usually are:

mean=4.82364000000000000
min=1.047
max=23.070
average rr per_sec=207.31232015656226418223
average rr per_min=12438.73920939373585093414
average rr per_hour=746324.35256362415105604895

Judging from the graph below, we see that most of the requests take well below the average mark. Profiling our solution, we’ve found the reason for the anomalies (above 10ms) to be caused by GoVPP itself when waiting on the reply from VPP. This behavior is well documented on GoVPP wiki. We can conclude our solution closely mirrors the performance  of the synchronous GoVPP APIs.

Here is the unary RPC total time in ms:

 

In conclusion, we have introduced ourselves to the concepts of gRPC and have run our VPP API + GoVPP build, with a REST service feature. Furthermore, we have shown you our in-house solution VPP-RPC, which facilitates the connection between the API and GoVPP.

If you would like to inquire about this solution, please contact us for more information.

In the last part of this series, we will take a closer look at the gNMI service and how we can benefit from it.


You can contact us at https://pantheon.tech/

Explore our Pantheon GitHub.

Watch our YouTube Channel.

Vector Packet Processing 103: Ligato & VPP Agent

Vector Packet Processing 103: Ligato & VPP Agent

Welcome back to our guide on Vector Packet Processing. In today’s post number three from our VPP series, we will take a look at Ligato and the its VPP Agent.

Ligato is one of multiple commercially supported technologies supported by PANTHEON.tech.

What is a VNF?

A Virtual Network Function is a software implementation of a function. It runs on single or multiple Virtual Machines or Containers, on top of a hardware networking infrastructure. Individual functions of this network may be implemented or combined together, in order to create a complete networking communication service. A VNF can be used as a standalone entity or part of an SDN architecture.

Its life-cycle is controlled by orchestration systems, such as the increasingly popular Kubernets. Cloud-native VNFs and their control/management plane can expose REST or gRPC APIs to external clients, communicate over a message bus or provide a cloud-friendly environment for deployment and usage. It can also support high-performance data planes, such as VPP.

What is Ligato?

It is the open source cloud platform for building and wiring VNFs. Ligato provides infrastructure and libraries, code samples and CI/CD process to accelerate and improve the overall developer experience. It paves the way towards faster code reuse, reducing costs and increasing application agility & maintainability. Being native to the cloud, Ligato has a minimal footprint, plus can be easily integrated, customized and extended, deployed using Kubernetes. The three main components of Ligato are:

  • CN Infra – a Golang platform for developing cloud-native microservices. It can be used to develop any microservice, even though it was primarily designed for Virtual Network Function management/control plane agents.
  • SFC Controller – an orchestration module for data-plane connectivity within cloud-native containers. These containers may be VPP-Agent enabled or communicate via veth interfaces.
  • BGP Agent – a Border Gateway Protocol information provider. You can also view a Ligato demonstration done by PANTHEON.tech here.

The platform is modular-based – new plugins provide new functionality. These plugins can be setup in layers, where each layer can form a new platform with different services at a higher layer plane. This approach mainly aims to create a management/control plane for VPP, with the addition of the VPP Agent.

What is the VPP Agent?

The VPP Agent is a set of VPP-specific plugins that interact with Ligato, in order to access services within the same cloud. VPP Agent provides VPP functionality to client apps through a VPP model-driven API. External and internal clients can access this API, if they are running on the same CN-Infra platform, within the same Linux process. 

Quickstarting the VPP Agent

For this example, we will work with the pre-built Docker image.

Install & Run

  1. Run the downloaded Docker image:

    docker pull ligato/vpp-agent
    docker run -it --name vpp --rm ligato/vpp-agent
  2. Using agentctl, configure the VPP Agent:

    docker exec -it vpp agentctl -
  3. Check the configuration, using agentctl or the VPP console:

    docker exec -it vpp agentctl -e 172.17.0.1:2379 show
    docker exec -it vpp vppctl -s localhost:500

For a detailed rundown of the Quickstart, please refer to the Quickstart section of VPP Agents Github.

We have shown you how to integrate and quickstart the  VPP Agent, on top of Ligato.

Our next post will highlight gRPC/REST – until then, enjoy playing around with VPP Agent.


You can contact us at https://pantheon.tech/

Explore our Pantheon GitHub.

Watch our YouTube Channel.

Vector Packet Processing 102: Honeycomb & hc2vpp

Vector Packet Processing 102: Honeycomb & hc2vpp

Welcome to the second part of our VPP Introduction series, where we will talk about details of the Honeycomb project. Please visit our previous post on VPP Plugins & Binary API, which is used in Honeycomb to manage the VPP agent.

What is Honeycomb?

Honeycomb is a generic, data plane management agent and provides a framework for building specialized agents. It exposes NETCONF, RESTCONF and BGP as northbound interfaces.

Honeycomb runs several, highly functional sets of APIs, based in ODL, which are used to program the VPP platform. It leverages ODL’s existing tools and integrates several of its existing components (YANG Tools, MD-SAL, NETCONF/RESTCONF…).  In other words – it is a light on resources, bare bone version of OpenDaylight.

Its translation layer and data processing pipelines are classified as generic, which makes it extensible and usable not only as a VPP specific agent.

Honeycomb’s functionality can be split into two main layers:

  • Data Processing layer – pipeline processing for data from Northbound interfaces, towards the Translation layer
  • Translation layer – handles mainly configuration updates from data processing layer + reads and writes configuration-data
  • Plugins – extend Honeycombs usability

Honeycomb mainly acts as a bridge between VPP and the actual OpenDaylight SDN Controller.

Examples of VPP x Honeycomb integrations

We’ve already showcased several use cases on our Pantheon Technologies’ YouTube channel:

For the purpose of integrating VPP with Honeycomb, we will further refer to the project hc2vpp, which was directly developed for VPP usage.

What is hc2vpp?

This VPP specific build is called hc2vpp, which provides an interface (somewhere between a GUI and a CLI) for VPP. It runs on the same host as the VPP instance and allows to manage it off-the-box. This project is lead by Pantheons own Michal Čmarada.

Honeycomb was created due to a need for configuring VPP via NETCONF/RESTCONF. During the time it was created, NETCONF/RESTCONF was provided by ODL. Therefore, Honeycomb is based on certain ODL tools (data-store, YANG Tools, others). ODL as such uses an enormous variety of tools. Honeycomb was created as a separate project, in order to create a smaller footprint. After its implementation, it exists as a separate server and starts these implementations from ODL.

Later on, it was decided that Honeycomb should be split into a core instance, and hc2vpp would handle VPP related parts. The split also occurred, in order to provide the possibility of creating a proprietary device control agent. hc2vpp (Honeycomb to VPP) is a configuration agent, so that configurations can be sent via NETCONF/RESTCONF. It translates the configuration to low level APIs (called Binary APIs).

Honeycomb and hc2vpp can be installed in the same way as VPP, by downloading the repositories from GitHub. You can either:

Install Honeycomb

Install hc2vpp

For more information, please refer to the hc2vpp official project site.

In the upcoming post, we will introduce you to the Ligato VPP Agent.


You can contact us at https://pantheon.tech/

Explore our Pantheon GitHub.

Watch our YouTube Channel.

Vector Packet Processing 101: VPP Plugins & Binary API

Vector Packet Processing 101: VPP Plugins & Binary API

In the first part of our new series, we will be building our first VPP platform plug-in, using basic examples. We will start with a first-dive into plugin creation and finish with introducing VAPI into this configuration.

If you do not know what VPP is, please visit our introductory post regarding VPP and why you should consider using it.

Table of contents:

  • How to write a new VPP Plugin
    • 1. Preparing your new VPP plugin
    • 2. Building & running your new plugin
  • How to create new API messages
  • How to call the binary API
    • Additional C/C++ Examples

How to write a new VPP Plugin

The principle of VPP is, that you can plug in a new graph node, adapt it to your network purposes and run it right off the bat. Including a new plugin does not mean, you need to change your core-code with each new addition. Plugins can be either included in the processing graph, or they can be built outside the source tree and become an individual component in your build.

Furthermore, this separation of plugins makes crashes a matter of a simple process restart, which does not require your whole build to be restarted because of one plugin failure.

1. Preparing your new VPP plugin

The easiest way how to create a new plugin that integrates with VPP is to reuse the sample code at “src/examples/sample-plugin”. The sample code implements a trivial “macswap” algorithm that demonstrates the plugins run-time integration with the VPP graph hierarchy, API and CLI.

  • To create a new plugin based on the sample plugin, copy and rename the sample plugin directory

cp -r src/examples/sample-plugin/sample src/plugins/newplugin

#replace 'sample' with 'newplugin'. as always, take extra care with sed!
cd src/plugins/newplugin
fgrep -il "SAMPLE" * | xargs sed -i.bak 's/SAMPLE/NEWPLUGIN/g'
fgrep -il "sample" * | xargs sed -i.bak 's/sample/newplugin/g'
rm *.bak*
rename 's/sample/newplugin/g' *

There are the are following files:

    • node.c – implements functionality of this graph node (swap source and destination address) -update according to your requirements.
    • newplugin.api – defines plugin’s API, see below
    • newplugin.c, newplugin_test.c – implements plugin functionality, API handlers, etc.
  • Update CMakeLists.txt in newplugin directory to reflect your requirements:
add_vpp_plugin(newplugin
  SOURCES
  node.c
  newplugin.c

  MULTIARCH_SOURCES
  node.c

  API_FILES
  newplugin.api

  API_TEST_SOURCES
  newplugin_test.c

  COMPONENT vpp-plugin-newplugin
)
  • Update sample.c to hook your plugin into the VPP graph properly:
VNET_FEATURE_INIT (newplugin, static) = 
{
 .arc_name = "device-input",
 .node_name = "newplugin",
 .runs_before = VNET_FEATURES ("ethernet-input"),
};
  • Update newplugin.api to define your API requests/replies. For more details see “API message creation” below.
  • Update node.c to do required actions on input frames, such as handling incoming packets and more

2. Building & running your new plugin

  • Build vpp and your plugin. New plugins will be built and integrated automatically, based on the CMakeLists.txt
make rebuild
  • (Optional) Build & install vpp packages for your platform
make pkg-deb
cd build-root
sudo dpkg -i *.deb
  • The binary-api header files you can include later are located in build-root/build-vpp_debug-native/vpp/vpp-api/vapi
    •  If vpp is installed, they are located in /usr/include/vapi
  • Run vpp and check whether your plugin is loaded (newplugin has to be loaded and listed using the show plugin CLI command)
make run
...
load_one_plugin:189: Loaded plugin: nat_plugin.so (Network Address Translation)
load_one_plugin:189: Loaded plugin: newplugin_plugin.so (Sample VPP Plugin)
load_one_plugin:189: Loaded plugin: nsim_plugin.so (network delay simulator plugin)
...
DBGvpp# show plugins
...
 Plugin Version Description
 1. ioam_plugin.so 19.01-rc0~144-g0c2319f Inbound OAM
 ...
 x. newplugin_plugin.so 1.0 Sample VPP Plugin
 ...

How to create new API messages

API messages are defined in *.api files – see src/vnet/devices/af_packet.api, src/vnet/ip/ip.api, etc. These API files are used to generate corresponding message handlers. There are two types of API messages – non-blocking and blocking. These messages are used to communicate with the VPP Engine to configure and modify data path processing.

Non-blocking messages use one request and one reply message. Message replies can be auto-generated, or defined manually. Each request contains two mandatory fields – “client-index” and “context“, and each reply message contains mandatory fields – “context” and “retval“.

  • API message with auto-generated reply

autoreply define ip_table_add_del
{
 u32 client_index;
 u32 context;
 u32 table_id;
...
};
  • API message with manually defined reply
define ip_neighbor_add_del
{
 u32 client_index;
 u32 context;
 u32 sw_if_index;
...
};
define ip_neighbor_add_del_reply
{
 u32 context;
 i32 retval;
 u32 stats_index;
...
};

Blocking messages use one request and series of replies defined in *.api file. Each request contains two mandatory fields – “client-index” and “context“, and each reply message contains mandatory field – “context“.

  • Blocking message is defined using two structs – *-dump and *_details

define ip_fib_dump
{
 u32 client_index;
 u32 context;
...
};
define ip_fib_details
{
 u32 context;
...
};

Once you define a message in an API file, you have to define and implement the corresponding handlers for given request/reply message. These handlers are defined in one of component/plugin file and they use predefined naming – vl_api_…_t_handler – for each API message.

Here is an example for existing API messages (you can check it in src/vnet/ip component):

#define foreach_ip_api_msg \
_(IP_FIB_DUMP, ip_fib_dump) \
_(IP_NEIGHBOR_ADD_DEL, ip_neighbor_add_del) \
...
static void vl_api_ip_neighbor_add_del_t_handler (vl_api_ip_neighbor_add_del_t * mp, vlib_main_t * vm)
{
...
 REPLY_MACRO2 (VL_API_IP_NEIGHBOR_ADD_DEL_REPLY,
...
static void vl_api_ip_fib_dump_t_handler (vl_api_ip_fib_dump_t * mp)
{
...
 send_ip_fib_details (am, reg, fib_table, pfx, api_rpaths, mp->context);
...

Request and reply handlers are usually defined in api_format.c (or in plugin). Request uses a predefined naming – api_… for each API message and you have to also define help for each API message :

static int api_ip_neighbor_add_del (vat_main_t * vam)
{
...
  /* Construct the API message */
  M (IP_NEIGHBOR_ADD_DEL, mp);
  /* send it... */
  S (mp);
  /* Wait for a reply, return good/bad news */
  W (ret);
  return ret;
}
static int api_ip_fib_dump (vat_main_t * vam)
{
...
  M (IP_FIB_DUMP, mp);
  S (mp);
  /* Use a control ping for synchronization */
  MPING (CONTROL_PING, mp_ping);
  S (mp_ping);
  W (ret);
  return ret;
}
#define foreach_vpe_api_msg \
...
_(ip_neighbor_add_del, \
 "(<intfc> | sw_if_index <id>) dst <ip46-address> " \
 "[mac <mac-addr>] [vrf <vrf-id>] [is_static] [del]") \
...
_(ip_fib_dump, "") \
...

Replies can be auto-generated or manually defined.

  • auto-generated reply using define foreach_standard_reply_retval_handler, with predefined naming
  • manually defined reply with details

How to call the binary API

In order to call the binary API, we will introduce VAPI to our configuration.

VAPI is the high-level C/C++ binary API. Please refer to src/vpp-api/vapi/vapi_doc.md for details.

VAPI’s multitude of advantages include:

  • All headers in a single place – /usr/include/vapi => simplifies code generation
  • Hidden internals – one no longer has to care about message IDs, byte-order conversion
  • Easier binapi calls – passing user provided context between callbacks

We can use the following C++ code to call our new plugins’s binary API.

#include <cstdlib>
#include <iostream>
#include <cassert>

//necessary includes & macros
#include <vapi/vapi.hpp>
#include <vapi/vpe.api.vapi.hpp>
DEFINE_VAPI_MSG_IDS_VPE_API_JSON

//include the desired modules / plugins
#include <vapi/newplugin.api.vapi.hpp>
DEFINE_VAPI_MSG_IDS_NEWPLUGIN_API_JSON

using namespace vapi;
using namespace std;

//parameters for connecting
static const char *app_name = "test_client";
static const char *api_prefix = nullptr;
static const int max_outstanding_requests = 32;
static const int response_queue_size = 32;

#define WAIT_FOR_RESPONSE(param, ret)      \
  do                                       \
    {                                      \
      ret = con.wait_for_response (param); \
    }                                      \
  while (ret == VAPI_EAGAIN)

//global connection object
Connection con;

void die(int exit_code)
{
    //disconnect & cleanup
    vapi_error_e rv = con.disconnect();
    if (VAPI_OK != rv) {
        fprintf(stderr, "error: (rc:%d)", rv);
    }

    exit(exit_code);
}

int main()
{
    //connect to VPP
    vapi_error_e rv = con.connect(app_name, api_prefix, max_outstanding_requests, response_queue_size);

    if (VAPI_OK != rv) {
        cerr << "error: connecting to vlib";
        return rv;
    }

    try
    {
        Newplugin_macswap_enable_disable cl(con);

        auto &mp = cl.get_request().get_payload();

        mp.enable_disable = true;
        mp.sw_if_index = 5;

        auto rv = cl.execute ();
        if (VAPI_OK != rv) {
            throw exception{};
        }

        WAIT_FOR_RESPONSE (cl, rv);
        if (VAPI_OK != rv) {
            throw exception{};
        }

        //verify the reply
        auto &rp = cl.get_response ().get_payload ();
        if (rp.retval != 0) {
            throw exception{};
        }
    }
    catch (...)
    {
        cerr << "Newplugin_macswap_enable_disable ERROR" << endl;
        die(1);
    }

    die(0);
}

Additional C/C++ Examples

Furthermore, you are encouraged to try the minimal VAPI example provided in vapi_minimal.zip. This example creates a loopback interface, assigns it an IPv4 address and then prints the address.
Follow these steps:

  • Install VPP
  • Extract the archive, build & run examples
unzip vapi_minimal.zip
mkdir build; cd build
cmake ..
make

#c example
sudo ./vapi_minimal
#c++ example
sudo ./vapi_minimal_cpp

In conclusion, we have:

  • successfully built and ran our first VPP plugin
  • created and called an API message in VPP

Our next post will introduce and highlight the key reasons, why you should consider Honeycomb/hc2vpp in your VPP build.


You can contact us at https://pantheon.tech/

Explore our Pantheon GitHub. Follow us on Twitter.

Watch our YouTube Channel.