SONiC Generic Configuration Update and Rollback

High Level Design Document

Rev 0.1

Table of Contents
List of Tables
Revision
About this Manual
Scope
Definition/Abbreviation
- Table 1: Abbreviations
1 Feature Overview
- 1.1 Requirements
- 1.2 Design Overview
  - 1.2.1 Basic Approach
  - 1.2.2 Container
2 Functionality
- 2.1 Target Deployment Use Cases
- 2.2 Functional Description
3 Design
- 3.1 Overview
4 Flow Diagrams
5 Error Handling
6 Serviceability and Debug
7 Warm Boot Support
8 Scalability
9 Unit Tests
10 E2E Tests

List of Tables

Revision

Rev	Date	Author	Change Description
0.1	03/01/2021	Mohamed Ghoneim	Initial version

About this Manual

This document provides a detailed description of the strategy to implement the SONiC configuration generic update and rollback feature.

Scope

This document describes the high level design of a SONiC configuration generic update and rollback feature. This document provides minor implementation details about the proposed solutions.

Definition/Abbreviation

Table 1: Abbreviations

Term	Meaning
ConfigDB	Configuration Database
JSON	JavaScript Object Notation

1 Feature Overview

Updating SONiC partial configurations systematically has been a challenge for a long time, as each part of the config has different requirements in terms of which files to push to the device, what commands to use, and whether there are services that need manual restarting. For example, updating ACLs is very different from updating DHCP configurations.

ACLs: Updating ACLs requires the following steps:

Pushing the acl.json file that contains the new ACL rules to the device
Pushing minigraph.xml to the device that contains the new ACL interfaces
Executing sudo acl-loader update full /etc/sonic/acl.json --table_name example_acl

DHCP: Updating DHCP config requires the following steps:

Pushing minigraph.xml to the device that contains the new ACL interfaces
Generating dhcp_servers JSON configs from the minigraph.xml, and saving it as a temporary file
Executing sudo sonic-cfggen -j /tmp/dhcp.json --write-to-db
Restarting the dhcp_relay service

We have explored SONiC CLI commands to make configuration changes. Each CLI command results in a corresponding update to the ConfigDB. For example, config vlan add 10 will create a new row in the VLAN table of the ConfigDB. But relying on the CLI commands to do partial updates is not feasible either, as there is no standard way of showing the config after the update. Setting up a different update mechanism for each part of the config is very time consuming and inefficient.

The other challenge to updating a switch is recoverability via rollback. Rollback needs to cause minimum disruption, e.g. reverting ACL updates should not affect DHCP. Currently SONiC has a couple of operations that can be candidates for rollback: config load and config reload.

config reload <config_db.json> : This command clears all the contents of the ConfigDB and loads the contents of config_db.json into the ConfigDB. After that all the Docker containers and Linux services are restarted to establish the user specified configuration state in the config_db.json file.

Pros
- Assured way of affecting a configuration state change
Cons
- Brings the links down and resets the forwarding state. This operation is disruptive in nature
- Time consuming as it may take 2-3 minutes for all the services to come back online. The time taken may vary based on the switch CPU power.
Verdict
- Cannot be used as a rollback mechanism

config load <config_db.json>: This command loads the contents of config_db.json into the ConfigDB. The updates made to the ConfigDB are additive in nature and thus the new configuration state is a combination of the current running state and the partial configuration state specified by the user in the config_db.json file

Pros
- Quick way to add new configuration changes
- It does not disrupt existing service whose configuration is not being modified. So it is non-disruptive in nature
Cons
- Can't remove existing configuration and can only be used to add/modify the existing configuration
Verdict
- Cannot be used as a rollback mechanism

Since both config load and config reload are not suitable for a minimum-disruption rollback, we have to look for other approaches.

In this design document, we will be exploring how to standardize the way to do partial updates, how to take checkpoints and finally how to rollback the configurations.

In summary, this is the flow of an update:

And the steps would be:

admin@sonic:~$ config checkpoint mycheckpoint
admin@sonic:~$ echo "config changes to apply to ConfigDb" > some-config-changes.json-patch
admin@sonic:~$ config apply-patch ./some-config-changes.json-patch
admin@sonic:~$ config rollback mycheckpoint # in case of failures

1.1 Requirements

1.1.1 Functional Requirements

A single, simple command to partially update SONiC configuration according to a patch of updates
A single, simple command to take a checkpoint of the full current SONiC config
A single, simple command to fully rollback current SONiC configs to a checkpoint
[low-priority] A single simple command to fully replace current SONiC configs with a full config provided by an external user.
Other commands to list checkpoints, delete checkpoints
The patch of updates should follow a standard notation. The JSON Patch (RFC6902) notation should be used
Config rollback should cause minimum disruption to the device, e.g. reverting ACL updates should not affect DHCP (minimum-disruption rollback)
User should be able to preview the configuration difference before going ahead and committing the configuration changes
In case of errors, the system should just report an error and the user should take care of it
Only one session globally can update device config at a time i.e. no concurrent updates to configurations

1.1.2 Configuration and Management Requirements

All command arguments are to be generated using Python-click to provide help menus and other out-of-the-box features
Only root user must be allowed to execute the commands
Non-root users can execute commands with dry-run option
Each command must provide the following sub options:
- "dry-run" Perform a dry-run of the command showing exactly what will be executed on the device, without executing it
- "verbose" Provide additional information on the steps executed

1.1.3 Scalability Requirements

N/A

1.1.4 Warm Boot Requirements

N/A

1.2 Design Overview

1.2.1 Basic Approach

SONiC ConfigDB contents can be retrieved in JSON format. Modifying the JSON should follow some rules in order to make it straightforward for users. Fortunately there is already a formal way of defining JSON config updates: JsonPatch, formally defined in RFC 6902 JSON Patch.

On top of ConfigDBConnector we are going to implement RFC 6902 JSON Patch. We will call this API apply-patch. On top of that API, we will implement the rollback functionality. It will simply start by getting the diff (patch) between the checkpoint and the current running config, then it will call the apply-patch API to apply that patch.

JsonPatch python is an open source library that already implements RFC 6902 JSON Patch. We can leverage this library to verify the patch, to generate a diff between the checkpoint and the current running config, and to verify that apply-patch and rollback work as expected by simulating the final output of the update and comparing it with the observed output.

Example: Assume running-config to be:

{
  "DEVICE_NEIGHBOR": {
    "Ethernet8": {
      "name": "Servers1",
      "port": "eth0"
    },
    "Ethernet96": {
      "name": "Servers23",
      "port": "eth0"
    }
  },
  "DHCP_SERVERS": {
    "1.1.1.1": {},
    "2.2.2.2": {},
    "3.3.3.3": {}
  }
}

and the ask is to:

replace port under Ethernet8 with eth1
add 4.4.4.4 to DHCP_SERVERS
remove 2.2.2.2 from DHCP_SERVERS

The steps would be:

Take the content of config DB here and store it as a checkpoint

admin@sonic:~$ config checkpoint mycheckpoint

Create a file on the device named dhcp-changes.json-patch, with the following content

[
  {
    "op": "replace",
    "path": "/DEVICE_NEIGHBOR/Ethernet8/port",
    "value": "eth1"
  },
  {
    "op": "add",
    "path": "/DHCP_SERVERS/4.4.4.4",
    "value": {}
  },
  {
    "op": "remove",
    "path": "/DHCP_SERVERS/2.2.2.2"
  }
]

Apply patch using apply-patch command

admin@sonic:~$ config apply-patch ./dhcp-changes.json-patch

In case of failure, rollback the config using rollback command

admin@sonic:~$ config rollback mycheckpoint --verbose # verbose to see generated patch

This will internally do a diff, and generate patch of the needed changes and apply using apply-patch. The patch will be:

[
  {
    "op": "replace",
    "path": "/DEVICE_NEIGHBOR/Ethernet8/port",
    "value": "eth0"
  },
  {
    "op": "remove",
    "path": "/DHCP_SERVERS/4.4.4.4"
  },
  {
    "op": "add",
    "path": "/DHCP_SERVERS/2.2.2.2",
    "value": {}
  }
]
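The diff step can be sketched in a few lines of Python. This is a simplified, standard-library-only stand-in for what a library such as JsonPatch python would provide, not the proposed implementation. The function name make_rollback_patch is hypothetical, and the sketch only diffs nested dictionaries (no lists, no JSON Pointer escaping).

```python
def make_rollback_patch(current, checkpoint, path=""):
    """Return JSON Patch operations that turn `current` into `checkpoint`.

    Simplified sketch: only nested dicts are diffed; any other differing
    value is replaced as a whole.
    """
    ops = []
    # Keys present now but absent from the checkpoint must be removed.
    for key in sorted(current.keys() - checkpoint.keys()):
        ops.append({"op": "remove", "path": f"{path}/{key}"})
    # Keys present in the checkpoint but absent now must be added back.
    for key in sorted(checkpoint.keys() - current.keys()):
        ops.append({"op": "add", "path": f"{path}/{key}", "value": checkpoint[key]})
    # Keys present on both sides are compared recursively.
    for key in sorted(current.keys() & checkpoint.keys()):
        old, new = current[key], checkpoint[key]
        if isinstance(old, dict) and isinstance(new, dict):
            ops.extend(make_rollback_patch(old, new, f"{path}/{key}"))
        elif old != new:
            ops.append({"op": "replace", "path": f"{path}/{key}", "value": new})
    return ops


# Checkpoint and post-update configs, abbreviated from the example above.
checkpoint = {
    "DEVICE_NEIGHBOR": {"Ethernet8": {"name": "Servers1", "port": "eth0"}},
    "DHCP_SERVERS": {"1.1.1.1": {}, "2.2.2.2": {}, "3.3.3.3": {}},
}
current = {
    "DEVICE_NEIGHBOR": {"Ethernet8": {"name": "Servers1", "port": "eth1"}},
    "DHCP_SERVERS": {"1.1.1.1": {}, "3.3.3.3": {}, "4.4.4.4": {}},
}
rollback_patch = make_rollback_patch(current, checkpoint)
```

For this input the sketch yields one replace, one remove and one add operation. A different diff algorithm could produce an equivalent patch with the operations in another order.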

1.2.2 Container

All the introduced commands will be part of the python-sonic-utilities package installed in Debian host O/S.

2 Functionality

2.1 Target Deployment Use Cases

The apply-patch method should help with automating partial config updates, as external systems can generate the update patch, and apply.

The checkpoint and rollback commands should help improve recoverability, and can also be used by external systems to help revert failures during an apply-patch operation.

Human operators can also leverage the checkpoint and rollback functionalities while doing updates through the CLI using SONiC CLI commands.

2.2 Functional Description

2.2.1 Apply-Patch

The SONiC apply-patch command can be broadly classified into the following steps

Stage-1 JSON Patch Verification

The patch is verified using the YANG SONiC models. However, these models are built to verify a full config rather than the JSON patch format, so we will verify the patch indirectly using the following steps:

Get current running config from ConfigDB as JSON
Simulate the patch application on the current config JSON object
Verify the simulated output using YANG SONiC models
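The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: simulate_patch and verify_patch are hypothetical names, only the add, remove and replace operations on dictionary paths are handled, and the YANG check is passed in as a caller-supplied validate callable because the YANG validation API is not defined in this document.

```python
import copy


def simulate_patch(config, patch):
    """Apply add/remove/replace operations to a deep copy of `config`."""
    result = copy.deepcopy(config)
    for op in patch:
        # "/TABLE/key/field" -> ["TABLE", "key", "field"]; escaping is ignored.
        *parents, leaf = op["path"].split("/")[1:]
        node = result
        for token in parents:
            node = node[token]  # KeyError means a parent does not exist
        if op["op"] == "remove":
            del node[leaf]
        elif op["op"] == "replace":
            if leaf not in node:
                raise KeyError(f"cannot replace missing path {op['path']}")
            node[leaf] = op["value"]
        elif op["op"] == "add":
            node[leaf] = op["value"]
        else:
            raise ValueError(f"unsupported operation {op['op']}")
    return result


def verify_patch(running_config, patch, validate):
    """Return the simulated config if `validate` accepts it, else raise."""
    simulated = simulate_patch(running_config, patch)
    if not validate(simulated):
        raise ValueError("simulated config failed validation")
    return simulated


running = {
    "DEVICE_NEIGHBOR": {"Ethernet8": {"name": "Servers1", "port": "eth0"}},
    "DHCP_SERVERS": {"1.1.1.1": {}, "2.2.2.2": {}, "3.3.3.3": {}},
}
patch = [
    {"op": "replace", "path": "/DEVICE_NEIGHBOR/Ethernet8/port", "value": "eth1"},
    {"op": "add", "path": "/DHCP_SERVERS/4.4.4.4", "value": {}},
    {"op": "remove", "path": "/DHCP_SERVERS/2.2.2.2"},
]
# The lambda stands in for a real YANG model check.
simulated = verify_patch(running, patch, validate=lambda cfg: True)
```

Because the patch is applied to a copy, the running config object is left untouched if validation fails.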

Stage-2 JSON Patch Ordering

There are many ideas for ordering the patch. We will pick a simple and straightforward idea to better understand the problem. Let's assume the main granular element is the table. Each table gets assigned an order index based on its semantic dependencies. Semantic dependencies mean tables referencing other tables; think of the index as the result of a topological sort of the table dependencies. For example, many tables depend on PORT, but PORT does not depend on other tables, so it gets a low index. VLAN_MEMBER depends on PORT so it gets a higher index, and so on. This helps make sure low-order tables absorb the changes first, before dependent tables are updated.

Let's assume the following order indices:

PORT = 1
VLAN_MEMBER = 2
ACL_TABLE = 3

And the patch to update is the following:

[
  { "op": "add", "path": "/ACL_TABLE/NO-NSW-PACL-V4/ports/0", "value": "Ethernet2" },
  { "op": "add", "path": "/VLAN_MEMBER/Vlan100|Ethernet2", "value": { "tagging_mode": "untagged" } },
  { "op": "add", "path": "/PORT/Ethernet2", "value": { "lanes": "65", "speed": "10000"} }
]

Each operation belongs to a single table, the table name will be the first token in the path after the first /. For example the table of the first operation is ACL_TABLE. Let's add the table name to each operation.

[
  { "table": "ACL_TABLE", "op": "add", "path": "/ACL_TABLE/NO-NSW-PACL-V4/ports/0", "value": "Ethernet2" },
  { "table": "VLAN_MEMBER", "op": "add", "path": "/VLAN_MEMBER/Vlan100|Ethernet2", "value": { "tagging_mode": "untagged" } },
  { "table": "PORT", "op": "add", "path": "/PORT/Ethernet2", "value": { "lanes": "65", "speed": "10000"} }
]

Using the indices table above, let's assign an order index to each operation:

[
  { "order": 3, "table": "ACL_TABLE", "op": "add", "path": "/ACL_TABLE/NO-NSW-PACL-V4/ports/0", "value": "Ethernet2" },
  { "order": 2, "table": "VLAN_MEMBER", "op": "add", "path": "/VLAN_MEMBER/Vlan100|Ethernet2", "value": { "tagging_mode": "untagged" } },
  { "order": 1, "table": "PORT", "op": "add", "path": "/PORT/Ethernet2", "value": { "lanes": "65", "speed": "10000"} }
]

Now let's order the operations by "order":

[
  { "order": 1, "table": "PORT", "op": "add", "path": "/PORT/Ethernet2", "value": { "lanes": "65", "speed": "10000"} },
  { "order": 2, "table": "VLAN_MEMBER", "op": "add", "path": "/VLAN_MEMBER/Vlan100|Ethernet2", "value": { "tagging_mode": "untagged" } },
  { "order": 3, "table": "ACL_TABLE", "op": "add", "path": "/ACL_TABLE/NO-NSW-PACL-V4/ports/0", "value": "Ethernet2" }
]

This will be the final order of applying the changes.
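The table-index idea can be sketched as a stable sort keyed on the table name. The names TABLE_ORDER, table_of and order_by_table are hypothetical; the index values are the ones assumed in this example, with the ACL entry keyed by the ACL_TABLE name that appears in the patch paths.

```python
# Order indices assumed in this example; lower values are applied first.
TABLE_ORDER = {"PORT": 1, "VLAN_MEMBER": 2, "ACL_TABLE": 3}


def table_of(operation):
    """Return the table name: the first token of the path after the leading '/'."""
    return operation["path"].split("/")[1]


def order_by_table(patch):
    """Sort operations by table index; sorted() is stable, so operations on
    the same table keep their relative order."""
    return sorted(patch, key=lambda op: TABLE_ORDER[table_of(op)])


patch = [
    {"op": "add", "path": "/ACL_TABLE/NO-NSW-PACL-V4/ports/0", "value": "Ethernet2"},
    {"op": "add", "path": "/VLAN_MEMBER/Vlan100|Ethernet2", "value": {"tagging_mode": "untagged"}},
    {"op": "add", "path": "/PORT/Ethernet2", "value": {"lanes": "65", "speed": "10000"}},
]
ordered = order_by_table(patch)
ordered_tables = [table_of(op) for op in ordered]
```

This sketch ignores the operation type entirely and always applies low-index tables first.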

Unfortunately this solution does not always work, especially in the case of deletion from PORT, VLAN_MEMBER and ACL_TABLE. Following the same logic we would have:

[
  { "order": 1, "table": "PORT", "op": "remove", "path": "/PORT/Ethernet2" },
  { "order": 2, "table": "VLAN_MEMBER", "op": "remove", "path": "/VLAN_MEMBER/Vlan100|Ethernet2" },
  { "order": 3, "table": "ACL_TABLE", "op": "remove", "path": "/ACL_TABLE/NO-NSW-PACL-V4/ports/0" }
]

This will not work, as the PORT table will complain that the port Ethernet2 is still in use by a VLAN_MEMBER.

Since this problem can be solved in many ways, and any solution needs to be verified thoroughly, let's abstract it to a contract which will be implemented later.

The contract would be:

list<JsonChange> order-patch(JsonPatch jsonPatch)

Here is a summary explaining the JsonChange contract. Check 3.1.1.4.1 JsonChange for detailed description.

aspect	description
definition	JsonChange is a JsonPatch in terms of the final outcome of the update. Ordering of JsonChange updates will not follow the operations order within a JsonPatch, but will update the JSON file in any arbitrary order.

Here is a summary explaining the order-patch contract, Check 3.1.1.4 Patch Orderer for detailed description.

aspect	item	description
inputs	JsonPatch	It represents the changes that need to be applied to the device running config, described in JSON Patch (RFC6902).
outputs	list<JsonChange>	The list will contain the steps to be followed to apply the input JsonPatch correctly. Each item in the list is assumed to be executed after the previous item, in the order given in the list.
errors	malformedPatchError	Will be raised if the input JsonPatch is not valid according to JSON Patch (RFC6902).
	other errors	Check 3.1.1.4.2 Order-Patch for exact list of errors to expect.
side-effects	None
assumptions	running-config locked	The implementor of this contract might interact with ConfigDB to get the running-config, it is assumed the running-config is locked for changes for the lifespan of the operation.

Stage-3 Applying list of JsonChanges in order

There are a few SONiC applications which store their configuration in the ConfigDB but do not subscribe to the ConfigDB change events. Any changes made to their table entries in the ConfigDB as part of the patch apply process are therefore not processed by the application immediately. In order to apply the configuration changes, the corresponding service needs to be restarted. Listed below are some example tables from SONiC config, and the corresponding services that need to be manually restarted.

NOTE: In the below table we have "Key": ["list-item1", ...]. Key in below example is the table name, corresponding list is the list of services that needs restarting post table update.

{
  "SYSLOG_SERVER": ["rsyslog"],
  "DHCP_SERVER": ["dhcp_relay"],
  "NTP_SERVER": ["ntp-config.service", "ntp.service"],
  "BGP_MONITORS": ["bgp"],
  "BUFFER_PROFILE": ["swss"],
  "RESTAPI": ["restapi"]
}

Although some other services do not need manual restarting and absorb the configs silently, there is no good way to make sure no errors were encountered while the services absorbed the changes. Some of these services might report an error to syslog, others might crash, etc.

Currently there is no standard approach of updating ConfigDB that takes care of manual service restart and verifying services have absorbed change correctly.

So we are going to introduce a new contract overlaying ConfigDB changes, that will take care of making correct changes to ConfigDB as well as verifying that corresponding services have absorbed the changes correctly.

The contract would be:

void apply-change(JsonChange jsonChange)

Check 3.1.1.4.1 JsonChange for detailed description of JsonChange.

Here is a summary explaining the apply-change contract. Check 3.1.1.5 Change Applier for a detailed description.

aspect	item	description
inputs	JsonChange	It represents the changes that need to be applied to the device running config, described in 3.1.1.4.1 JsonChange.
outputs	None
errors	malformedChangeError	Will be raised if the input JsonChange is not valid according to 3.1.1.4.1 JsonChange.
	other errors	Check 3.1.1.5.1 Apply-Change for exact list of errors to expect.
side-effects	updating running-config	This operation will cause changes to the running-config according to the input JsonChange.
assumptions	running-config locked	The implementor of this contract will interact with ConfigDB to update the running-config; it is assumed the running-config is locked for changes for the lifespan of the operation.

Stage-4 Post-update validation

The expectation after applying the JsonPatch is that the resulting ConfigDB content matches the outcome RFC 6902 defines for that patch.

The verification steps

Get the state of ConfigDB JSON before the update as a JSON object
Simulate the JsonPatch application over this JSON object
Compare that JSON object with current ConfigDB JSON
In case of mismatch, just report failure

Fail-safe Action

If an error is encountered during the apply-patch operation, an error is reported and the system DOES NOT take any automatic action. The user can take a checkpoint before running apply-patch and, if the operation fails, the user can rollback. Another idea is to introduce a config-session, where a user enters a config-session mode, makes all the modifications and, once they are happy with them, commits the changes to ConfigDB. A config-session can be built using the checkpoint and rollback functionality, but this idea is beyond the scope of this document.

Logging

All the configuration update operations executed and the output displayed by the apply-patch command are stored in the systemd journal. They are also forwarded to the syslog. By storing the commands in the systemd-journal, the user will be able to search and display them easily at a later point in time. The show apply-patch log command reads the systemd-journal to display information about the apply-patch command that was previously executed or currently in progress.

2.2.2 Checkpoint

The SONiC checkpoint command can be broadly classified into the following steps

Stage-1 Get current ConfigDB JSON config

The ConfigDBConnector class is used to obtain the running configuration in JSON format

Stage-2 Validating current ConfigDB JSON config using YANG models

ConfigDB might be invalid to begin with, in which case a checkpoint taken from it will not work when later used for rollback.

Stage-3 Save JSON config

Save the checkpoint to a dedicated location on the SONiC box
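Saving and loading a checkpoint could look like the sketch below. The function names, the one-file-per-checkpoint layout and the .json extension are assumptions made for illustration; the checkpoint directory is passed in by the caller because its location is not decided in this document.

```python
import json
import os


def save_checkpoint(config, name, checkpoints_dir):
    """Write `config` to <checkpoints_dir>/<name>.json and return the path."""
    os.makedirs(checkpoints_dir, exist_ok=True)
    path = os.path.join(checkpoints_dir, f"{name}.json")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(config, f, indent=2, sort_keys=True)
    # Rename into place so a reader never sees a partially written file.
    os.replace(tmp_path, path)
    return path


def load_checkpoint(name, checkpoints_dir):
    """Read back a checkpoint previously written by save_checkpoint."""
    with open(os.path.join(checkpoints_dir, f"{name}.json")) as f:
        return json.load(f)
```

Writing to a temporary file and renaming it keeps an existing checkpoint intact if the save is interrupted.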

2.2.3 Rollback

The SONiC rollback command can be broadly classified into the following steps

Stage-1 Get current ConfigDB JSON config

The ConfigDBConnector class is used to obtain the running configuration in JSON format

Stage-2 Get checkpoint JSON config

Load the checkpoint from the SONiC box

Stage-3 Generate the diff as JsonPatch between current config and checkpoint

The current ConfigDB JSON config is compared with the JSON config from the checkpoint. The comparison result should be in JsonPatch format.

Stage-4 Apply-Patch

Pass the generated JsonPatch to the apply-patch API

Stage-5 Verify config rollback

Compare the ConfigDB JSON after the update with the checkpoint JSON, there should be no differences.
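The rollback stages can be sketched as a small orchestration function. All collaborators are injected by the caller and are hypothetical stand-ins: get_running_config for reading ConfigDB, generate_patch for the diff step, and apply_patch for the apply-patch API. The checkpoint is assumed to have been loaded already (Stage-2).

```python
def rollback(get_running_config, checkpoint, generate_patch, apply_patch):
    """Run the rollback stages and return the patch that was applied."""
    current = get_running_config()                # Stage-1: current config
    patch = generate_patch(current, checkpoint)   # Stage-3: diff as a patch
    apply_patch(patch)                            # Stage-4: apply the patch
    if get_running_config() != checkpoint:        # Stage-5: verify the result
        raise RuntimeError("running config differs from checkpoint after rollback")
    return patch


# In-memory stand-ins, for illustration only.
state = {"config": {"DHCP_SERVERS": {"1.1.1.1": {}, "4.4.4.4": {}}}}
checkpoint = {"DHCP_SERVERS": {"1.1.1.1": {}}}


def fake_generate_patch(current, target):
    # Whole-document replace; a real diff would be far more granular.
    return [{"op": "replace", "path": "", "value": target}]


def fake_apply_patch(patch):
    state["config"] = patch[0]["value"]


applied = rollback(lambda: state["config"], checkpoint,
                   fake_generate_patch, fake_apply_patch)
```

The sketch only raises on a mismatch and takes no corrective action, in line with the fail-safe behaviour described for this command.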

Fail-safe Action

If an error is encountered during the rollback operation, an error is reported and the system DOES NOT take any automatic action. The rollback operation is itself an automated fail-safe action; if it fails, the caller should decide how to handle the failure, e.g. generate an alert of high severity, or do config reload.

Logging

All the configuration update operations executed and the output displayed by the rollback command are stored in the systemd journal. They are also forwarded to the syslog. By storing the commands in the systemd-journal, the user will be able to search and display them easily at a later point in time. The show rollback log command reads the systemd-journal to display information about the rollback command that was previously executed or currently in progress.

2.2.4 Replace

The SONiC replace command can be broadly classified into the following steps

Stage-1 Get target config from the external user

The external user provides the full target ConfigDB config in JSON format.

Stage-2 Validating the target config using YANG models

The target config is unknown and needs to be validated using YANG models.

Stage-3 Get current ConfigDB JSON config

The ConfigDBConnector class is used to obtain the running configuration in JSON format

Stage-4 Generate the diff as JsonPatch between current config and target config

The current ConfigDB JSON config is compared with the target JSON config. The comparison result should be in JsonPatch format.

Stage-5 Apply-Patch

Pass the generated JsonPatch to the apply-patch API

Stage-6 Verify config replace

Compare the ConfigDB JSON after the update with the target JSON, there should be no differences.

Fail-safe Action

If an error is encountered during the replace operation, an error is reported and the system DOES NOT take any automatic action. The user can take a checkpoint before running replace and if the operation failed, the user can rollback.

Logging

All the configuration update operations executed and the output displayed by the replace command are stored in the systemd journal. They are also forwarded to the syslog. By storing the commands in the systemd-journal, the user will be able to search and display them easily at a later point in time. The show replace log command reads the systemd-journal to display information about the replace command that was previously executed or currently in progress.

3 Design

3.1 Overview

3.1.1 ApplyPatch

3.1.1.1 User

The user of the system can simply be a human operator or a service that can talk to SONiC CLI. The user can only apply-patch if they have admin permission. Users without admin permission can only execute a dry-run of the operations, where they will be able to see the exact changes that would affect the device, without executing these changes.

3.1.1.2 SONiC CLI

This is the CLI of the SONiC switch, which makes it easy for users to interact with the system. The CLI commands we are interested in are config ... and show ...; check the SONiC Command Line Interface Guide to learn more about SONiC CLI.

For further details on the CLI setup, check 3.2.2 CLI.

3.1.1.3 YANG models

YANG is a data modeling language used to model configuration data, state data, Remote Procedure Calls, and notifications for network management protocols. For further details check The YANG 1.1 Data Modeling Language

SONiC is currently getting on-boarded to YANG data models to help verify and generate the configurations. We will leverage these YANG models to help verify the result of simulating the JsonPatch on ConfigDB, to make sure the final outcome adheres to all the constraints defined in the YANG models. For further details check YANG SONiC models.

3.1.1.4 Patch Orderer

This component is going to solve the problems discussed in Stage-2 JSON Patch Ordering. It provides an order of execution for the JsonPatch such that, when the config is updated in this order, no errors are generated on the device. The exact implementation details of this component will not be included in this design document, but we are going to explain in detail the contract for any implementation.

The contract would be:

list<JsonChange> order-patch(JsonPatch jsonPatch)

3.1.1.4.1 JsonChange

aspect	description
definition	JsonChange is a JsonPatch in terms of the final outcome of the update. Ordering of JsonChange updates will not follow the operations order within a JsonPatch, but will update the JSON file in any arbitrary order.
validation	JsonChange is considered valid if its corresponding JsonPatch is valid according to JSON Patch (RFC6902)

Assume we have the following JsonPatch:

[
  { "op": "add", "path": "/ACL_TABLE/NO-NSW-PACL-V4/ports/0", "value": "Ethernet2" },
  { "op": "add", "path": "/VLAN_MEMBER/Vlan100|Ethernet2", "value": { "tagging_mode": "untagged" } },
  { "op": "add", "path": "/PORT/Ethernet2", "value": { "lanes": "65", "speed": "10000"} }
]

Operations specified in the JsonPatch will be executed in the order specified, but a JsonChange can be applied in any arbitrary order the implementation chooses, for example:

[
  { "op": "add", "path": "/PORT/Ethernet2", "value": { "lanes": "65", "speed": "10000"} },
  { "op": "add", "path": "/VLAN_MEMBER/Vlan100|Ethernet2", "value": { "tagging_mode": "untagged" } },
  { "op": "add", "path": "/ACL_TABLE/NO-NSW-PACL-V4/ports/0", "value": "Ethernet2" }
]

The only condition of JsonChange is that the final outcome after applying the whole JsonPatch is the same.

3.1.1.4.2 Order-Patch

aspect	item	description
inputs	JsonPatch	It represents the changes that need to be applied to the device running config, described in JSON Patch (RFC6902).
outputs	list<JsonChange>	The list will contain the steps to be followed to apply the input JsonPatch correctly. Each item in the list is assumed to be executed after the previous item, in the order given in the list.
errors	malformedPatchError	Will be raised if the input JsonPatch is not valid according to JSON Patch (RFC6902).
	unprocessableRequestError	Will be raised if the implementation of the `order-patch` is not able to provide a valid ordering according to its own ordering validations.
	resourceNotFoundError	Will be raised if the running config cannot be read, or if any other external resource is not found or not available.
	conflictingStateError	Will be raised if the patch cannot be applied to the current state of the running config e.g. trying to add an item to a non-existing json dictionary.
	internalError	Will be raised if any other error is encountered that's different than the ones listed above.
side-effects	None
assumptions	running-config locked	The implementor of this contract might interact with ConfigDB to get the running-config, it is assumed the ConfigDB is locked for changes for the lifespan of the operation.
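The shape of this contract can be sketched in Python as an abstract base class. Every name here (PatchOrderer, order_patch, MalformedPatchError, OneOperationPerChange) is a hypothetical rendering of the contract, not an existing SONiC API. The concrete class is a deliberately trivial placeholder: it only checks the operation code and the presence of a path, which is far less than full RFC 6902 validation, and it does no dependency analysis.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Operation = Dict[str, Any]
JsonPatch = List[Operation]
JsonChange = List[Operation]  # operations that may be applied in any order


class MalformedPatchError(Exception):
    """The input is not a valid JSON Patch document."""


class PatchOrderer(ABC):
    @abstractmethod
    def order_patch(self, json_patch: JsonPatch) -> List[JsonChange]:
        """Return the JsonChanges to apply, in the order to apply them."""


class OneOperationPerChange(PatchOrderer):
    """Placeholder orderer: keeps the input order, one operation per change."""

    VALID_OPS = {"add", "remove", "replace", "move", "copy", "test"}

    def order_patch(self, json_patch):
        changes = []
        for op in json_patch:
            # Partial structural check only.
            if op.get("op") not in self.VALID_OPS or "path" not in op:
                raise MalformedPatchError(f"invalid operation: {op!r}")
            changes.append([op])
        return changes


patch = [
    {"op": "add", "path": "/PORT/Ethernet2", "value": {"lanes": "65", "speed": "10000"}},
    {"op": "add", "path": "/VLAN_MEMBER/Vlan100|Ethernet2", "value": {"tagging_mode": "untagged"}},
]
changes = OneOperationPerChange().order_patch(patch)
```

The remaining errors in the table would map to further exception classes in the same style.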

If order-patch has to force the update to follow very specific steps, it would have to provide multiple JsonChange objects in the return list of order-patch.

order-patch is returning a list of JsonChanges instead of a simple JsonPatch with multiple operations because a JsonChange can group together multiple JsonPatch operations that share no dependency and can be executed together. This can help the implementor of apply-change to optimize the mechanism for applying JsonChange e.g. group changes under same parent together or reduce number of service restarts.

For example: Assume JsonPatch contains:

[
  { "op": "add", "path": "/DHCP_SERVERS/4.4.4.4", "value": {} },
  { "op": "replace", "path": "/DEVICE_NEIGHBOR/Ethernet8/port", "value": "eth1" },
  { "op": "add", "path": "/DHCP_SERVERS/2.2.2.2", "value": {} }
]

We have 2 operations updating DHCP servers, and another operation for DEVICE_NEIGHBOR. We can assume the DHCP_SERVERS and DEVICE_NEIGHBOR tables to be independent, meaning they can be updated at the same time. order-patch would return:

[
  [
    { "op": "add", "path": "/DHCP_SERVERS/4.4.4.4", "value": {} },
    { "op": "replace", "path": "/DEVICE_NEIGHBOR/Ethernet8/port", "value": "eth1" },
    { "op": "add", "path": "/DHCP_SERVERS/2.2.2.2", "value": {} }
  ]
]

Updating DHCP_SERVERS requires restarting the dhcp_relay service, so if the above patch is executed in order, we will restart the dhcp_relay service twice. But since the operations are OK to update together, the implementor of apply-change can order them in any way they see fit. They can decide to group the DHCP updates together, update the DHCP_SERVERS table twice, but restart the dhcp_relay service only once.
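One way an apply-change implementation could avoid repeated restarts is to collect the affected services for the whole JsonChange before restarting anything. The sketch below is illustrative only: services_to_restart is a hypothetical helper, and the mapping is keyed by DHCP_SERVERS so that it matches the patch paths used in this example.

```python
# Table-to-services mapping; the entries are assumptions for this example.
SERVICES_BY_TABLE = {
    "DHCP_SERVERS": ["dhcp_relay"],
    "NTP_SERVER": ["ntp-config.service", "ntp.service"],
}


def services_to_restart(json_change, services_by_table):
    """Return each affected service once, in first-seen order."""
    services = []
    for op in json_change:
        table = op["path"].split("/")[1]
        # Tables without an entry need no manual restart.
        for service in services_by_table.get(table, []):
            if service not in services:
                services.append(service)
    return services


json_change = [
    {"op": "add", "path": "/DHCP_SERVERS/4.4.4.4", "value": {}},
    {"op": "replace", "path": "/DEVICE_NEIGHBOR/Ethernet8/port", "value": "eth1"},
    {"op": "add", "path": "/DHCP_SERVERS/2.2.2.2", "value": {}},
]
restarts = services_to_restart(json_change, SERVICES_BY_TABLE)
```

Here the two DHCP_SERVERS operations resolve to a single dhcp_relay entry, so the service would be restarted once after all operations in the JsonChange have been written.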

Let's take a visual example. Assume we have a JsonPatch with 8 operations, and here is the topological order of the operations. An arrow from op-x to op-y means op-y depends on op-x.

If we sort the operations we would have:

But if we organize the operations into groups of JsonChange, we will have:

This will allow the implementor of apply-change to have the freedom to optimize the operations in any order they see fit.

NOTE: Check the Patch Orderer implementation design details in the Json_Patch_Ordering_using_YANG_Models_Design document.

3.1.1.5 Change Applier

This component is going to solve the problems discussed in Stage-3 Applying list of JsonChanges in order. It provides a mechanism for applying a JsonChange that takes into account manual service restarts and update verification. The exact implementation details of this component will not be included in this design document, but we are going to explain in detail the contract for any implementation.

The contract would be:

void apply-change(JsonChange jsonChange)

3.1.1.5.1 Apply-Change

aspect	item	description
inputs	JsonChange	It represents the changes that need to be applied to the device running config, described in 3.1.1.4.1 JsonChange.
outputs	None
errors	malformedChangeError	Will be raised if the input JsonChange is not valid according to 3.1.1.4.1 JsonChange.
	resourceNotFoundError	Will be raised if the running config cannot be read, or if any other external resource is not found or not available.
	unprocessableRequestError	Will be raised if the change is valid, all the resources are found but when applying the change it causes an error in the system.
	internalError	Will be raised if any other error is encountered that's different than the ones listed above.
side-effects	updating running-config	This operation will cause changes to the running-config according to the input JsonChange.
assumptions	running-config locked	The implementor of this contract will interact with ConfigDB to update the running-config; it is assumed the ConfigDB is locked for changes for the lifespan of the operation.

Since the order of executing the operations within a JsonChange does not matter, the implementor of this component can work on optimizing the time to run the operation. For details check 3.1.1.4.2 Order-Patch.

NOTE: Check the Change Applier implementation design details in the Json_Change_Application_Design document.

3.1.1.6 ConfigDB

SONiC manages configuration in a single source of truth: a redisDB instance that we refer to as ConfigDB. Applications subscribe to ConfigDB and generate their running configuration correspondingly.

For further details on the ConfigDB, check SONiC Configuration Database Manual.

3.1.2 Checkpoint

3.1.2.1 User

The user of the system can simply be a human operator or a service that can talk to SONiC CLI. The user can only apply-patch if they have admin permission.

3.1.2.5 File system

This will be the file system where SONiC OS is set up. Some suggestions prefer the path /var/sonic/checkpoints, but that is not decided yet. We will leave it to the implementor of this design document to decide.

3.1.3 Rollback

3.1.3.1 User

Same as 3.1.2.1 User

3.1.3.2 SONiC CLI

Same as 3.1.1.2 SONiC CLI

3.1.3.4 File system

Same as 3.1.2.5 File system

3.1.3.5 ConfigDB

Same as 3.1.1.6 ConfigDB

3.1.4 Replace

3.1.4.1 User

Same as 3.1.2.1 User

3.1.4.2 SONiC CLI

Same as 3.1.1.2 SONiC CLI

3.1.4.3 YANG models

Same as 3.1.1.3 YANG models

3.1.4.4 File system

Same as 3.1.2.5 File system

3.1.4.5 ConfigDB

Same as 3.1.1.6 ConfigDB

3.2 User Interface

3.2.1 Data Models

3.2.1.1 JsonPatch

The JsonPatch consists of a list of operations, and each operation follows this format:

  { "op": "<Operation-Code>", "path": "<Path>", "value": "<Value>", "from": "<From-Path>" }

For detailed information about the JSON Patch operations, refer to Section 4 (Operations) of RFC 6902.

Operation Code

replace - Set the ConfigDB entry described in Path to be equal to the data specified in Value
add - Create a new ConfigDB entry described in Path and set it to Value
remove - Delete the ConfigDB entry described in Path
copy - Copy the ConfigDB entry specified in FromPath to create the ConfigDB entry specified in Path
move - Copy the ConfigDB entry specified in FromPath to create the ConfigDB entry specified in Path, and then delete the ConfigDB entry in FromPath

Path

Describes the location of the ConfigDB entry that is being processed. The path string can be broken down into the elements below.

Table Name - The ConfigDB table corresponding to the entry
Key - The Key string to identify the row data within the ConfigDB table
Field - The column_key to identify the ConfigDB entry within the identified row data

e.g. "/VLAN_MEMBER/Vlan10|Ethernet0/tagging_mode"

Table: VLAN_MEMBER
Key: Vlan10|Ethernet0
Field: tagging_mode
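The dissection above can be sketched in Python. This is a minimal illustration, not part of the design: `split_config_path` is a hypothetical helper name, and the sketch only handles the three levels described here (table, key, field), so deeper paths such as array items are rejected. Since RFC 6902 paths are JSON Pointers (RFC 6901), the sketch also decodes the `~1` and `~0` escapes, which matters for keys that contain a `/`.

```python
def split_config_path(path):
    """Split a JsonPatch path into (table, key, field).

    Elements that are not present in the path are returned as None.
    """
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    # RFC 6901 escaping: "~1" stands for "/" and "~0" stands for "~".
    # "~1" must be decoded before "~0".
    tokens = [
        token.replace("~1", "/").replace("~0", "~")
        for token in path[1:].split("/")
    ]
    if not tokens[0] or len(tokens) > 3:
        raise ValueError("expected /<table>[/<key>[/<field>]]")
    # Pad with None so the result always has three elements.
    tokens += [None] * (3 - len(tokens))
    return tuple(tokens)


table, key, field = split_config_path("/VLAN_MEMBER/Vlan10|Ethernet0/tagging_mode")
print(table, key, field)
```

For a key containing a slash, for example a port named Ethernet1/1, the path would be written as "/PORT/Ethernet1~11" and decoded back to the key Ethernet1/1.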

FromPath

Same format as Path. Used in copy and move operations.

Value

Data that is used to patch the ConfigDB entry specified in path
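To make the operation semantics concrete, the following sketch applies the five operation codes listed above to a ConfigDB-like nested dictionary. It is an illustration under assumptions, not the proposed implementation: `apply_operations` is a hypothetical name, only dictionary levels are handled (no array indices), the RFC 6902 `test` operation is omitted, and no YANG validation or ordering logic is performed.

```python
import copy


def _tokens(path):
    # Decode RFC 6901 escapes: "~1" is "/", "~0" is "~".
    return [t.replace("~1", "/").replace("~0", "~") for t in path.split("/")[1:]]


def _parent_and_leaf(config, path):
    tokens = _tokens(path)
    node = config
    for token in tokens[:-1]:
        node = node[token]  # KeyError if an intermediate level is missing
    return node, tokens[-1]


def apply_operations(config, operations):
    """Return a patched copy of config; the input is left untouched."""
    result = copy.deepcopy(config)
    for op in operations:
        code = op["op"]
        if code in ("copy", "move"):
            src_parent, src_leaf = _parent_and_leaf(result, op["from"])
            value = copy.deepcopy(src_parent[src_leaf])
            if code == "move":
                del src_parent[src_leaf]
        elif code in ("add", "replace"):
            value = copy.deepcopy(op["value"])
        elif code != "remove":
            raise ValueError("unsupported op: " + code)

        parent, leaf = _parent_and_leaf(result, op["path"])
        if code == "remove":
            del parent[leaf]
        else:
            if code == "replace" and leaf not in parent:
                raise KeyError(op["path"])  # replace requires an existing entry
            parent[leaf] = value
    return result


running = {"VLAN_MEMBER": {"Vlan10|Ethernet0": {"tagging_mode": "untagged"}}}
patch = [
    {"op": "replace", "path": "/VLAN_MEMBER/Vlan10|Ethernet0/tagging_mode", "value": "tagged"},
    {"op": "copy", "from": "/VLAN_MEMBER/Vlan10|Ethernet0", "path": "/VLAN_MEMBER/Vlan10|Ethernet4"},
]
print(apply_operations(running, patch))
```

Working on a deep copy keeps the input unchanged if an operation fails partway through, which is convenient for dry runs.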

3.2.2 CLI

3.2.2.1 Configuration Commands

apply-patch Command Format

config apply-patch <patch-filename> [--dry-run] [--verbose]

Command Option	Purpose
<patch-filename>	The path of the JsonPatch file to apply, which follows the JSON Patch (RFC 6902) specification.
dry-run	Displays the generated commands that would be executed, without running them.
verbose	Provides additional details about each step executed as part of the operation.

Command Usage

Command	Purpose
config apply-patch filename	Applies the given JsonPatch file operations following the JSON Patch (RFC 6902) specification.
config apply-patch filename --dry-run	Displays the generated commands that would be executed, without running them.
config apply-patch filename --verbose	Applies the given JsonPatch file operations following the JSON Patch (RFC 6902) specification. The CLI output will include additional details about each step executed as part of the operation.
config apply-patch filename --dry-run --verbose	Displays the generated commands that would be executed, without running them. The CLI output will include additional details about each step executed as part of the operation.
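As a small illustration of how a service might drive this command, the sketch below writes a JsonPatch to a file and builds the corresponding command line. The helper names `write_patch_file` and `build_apply_patch_command` and the file name patch.json are assumptions for this example; only the command name and the --dry-run and --verbose options come from the table above. The command is built but not executed.

```python
import json
import os
import tempfile


def write_patch_file(operations, directory):
    """Serialize a list of JsonPatch operations to <directory>/patch.json."""
    path = os.path.join(directory, "patch.json")
    with open(path, "w") as handle:
        json.dump(operations, handle, indent=2)
    return path


def build_apply_patch_command(patch_filename, dry_run=False, verbose=False):
    """Build the argv list for 'config apply-patch' without running it."""
    argv = ["config", "apply-patch", patch_filename]
    if dry_run:
        argv.append("--dry-run")
    if verbose:
        argv.append("--verbose")
    return argv


operations = [
    {"op": "replace", "path": "/VLAN_MEMBER/Vlan10|Ethernet0/tagging_mode", "value": "tagged"}
]
with tempfile.TemporaryDirectory() as tmp:
    patch_path = write_patch_file(operations, tmp)
    print(" ".join(build_apply_patch_command(patch_path, dry_run=True)))
```

Running with --dry-run first gives the caller a chance to review the generated commands before any change reaches ConfigDB.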

checkpoint

Command Format

config checkpoint <checkpoint-name> [--dry-run] [--verbose]

Command Option	Purpose
<checkpoint-name>	The name of the checkpoint under which the ConfigDB JSON config will be saved.
dry-run	Displays the generated commands that would be executed, without running them.
verbose	Provides additional details about each step executed as part of the operation.

Command Usage

Command	Purpose
config checkpoint checkpoint-name	Saves the ConfigDB JSON config as a checkpoint with the name checkpoint-name.
config checkpoint checkpoint-name --dry-run	Displays the generated commands that would be executed, without running them.
config checkpoint checkpoint-name --verbose	Saves the ConfigDB JSON config as a checkpoint with the name checkpoint-name. The CLI output will include additional details about each step executed as part of the operation.
config checkpoint checkpoint-name --dry-run --verbose	Displays the generated commands that would be executed, without running them. The CLI output will include additional details about each step executed as part of the operation.

rollback

Command Format

config rollback <checkpoint-name> [--dry-run] [--verbose]

Command Option	Purpose
<checkpoint-name>	The name of the checkpoint under which the ConfigDB JSON config was saved.
dry-run	Displays the generated commands that would be executed, without running them.
verbose	Provides additional details about each step executed as part of the operation.

Command Usage

Command	Purpose
config rollback checkpoint-name	Rolls back the ConfigDB JSON config to the config saved under the checkpoint-name checkpoint.
config rollback checkpoint-name --dry-run	Displays the generated commands that would be executed, without running them.
config rollback checkpoint-name --verbose	Rolls back the ConfigDB JSON config to the config saved under the checkpoint-name checkpoint. The CLI output will include additional details about each step executed as part of the operation.
config rollback checkpoint-name --dry-run --verbose	Displays the generated commands that would be executed, without running them. The CLI output will include additional details about each step executed as part of the operation.

3.2.2.2 Show Commands

apply-patch Command Format

show apply-patch log [exec | verify | status]

Command	Purpose
show apply-patch log exec	Displays a log of all the ConfigDB operations executed, including those that failed. In case of a failed operation, it displays an error message against the failed operation.
show apply-patch log verify	Displays a log of all the ConfigDB operations that failed, along with an error message. It does not display the operations that were successful.
show apply-patch log status	Displays the status of the last successful patch application operation since switch reboot.

rollback Command Format

show rollback log [exec | verify | status]

Command	Purpose
show rollback log exec	Displays a log of all the ConfigDB operations executed, including those that failed. In case of a failed operation, it displays an error message against the failed operation.
show rollback log verify	Displays a log of all the ConfigDB operations that failed, along with an error message. It does not display the operations that were successful.
show rollback log status	Displays the status of the last successful config rollback operation since switch reboot.

3.2.2.3 Debug Commands

Use the verbose option to view additional details while executing the different commands.

3.2.3 Multi ASICs Support

The initial design of the SONiC Generic Configuration Update and Rollback feature primarily focuses on single-ASIC platforms. To cater to the needs of Multi-ASIC platforms, this section introduces enhancements to support configuration updates and rollbacks across multiple ASICs and the host namespace.

3.2.3.1 Overview

In Multi-ASIC SONiC platforms, configurations can be applied either globally, affecting all ASICs, or individually, targeting a specific ASIC or the host. The configuration management tools must, therefore, be capable of identifying and applying configurations based on their intended scope.

3.2.3.2 Namespace-aware Configuration Management

The SONiC utilities for configuration management (apply-patch, checkpoint, and rollback) will be enhanced to become namespace-aware. These utilities will determine the target namespace for each operation from the configuration patch itself or from user input for operations that involve checkpoints and rollbacks.

3.2.3.3 JSON Patch Format Extension

The JSON Patch format will be extended to include the namespace identifier in each operation's path. Paths that target the host namespace will be marked with a "localhost" identifier, while those intended for a specific ASIC will include an "asicN" identifier, where N denotes the ASIC number.

[
    {
        "op": "add",
        "path": "/asic0/PORTCHANNEL/PortChannel102/admin_status",
        "value": "down"
    },
    {
        "op": "replace",
        "path": "/localhost/BGP_DEVICE_GLOBAL/STATE/tsa_enabled",
        "value": "true"
    },
    {
        "op": "replace",
        "path": "/asic0/BGP_DEVICE_GLOBAL/STATE/tsa_enabled",
        "value": "true"
    },
    {
        "op": "replace",
        "path": "/asic1/BGP_DEVICE_GLOBAL/STATE/tsa_enabled",
        "value": "true"
    }
]

3.2.3.4 Applying Configuration Changes

When applying a configuration patch, the system will:

Parse the extended JSON Patch to identify the target namespaces.
Apply the operations intended for the host namespace directly to the host's configuration database.
Apply ASIC-specific operations to the respective ASIC's configuration database.
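The first step above, identifying the target namespaces, can be sketched as follows. This is a minimal illustration assuming the "localhost" and "asicN" prefixes described in 3.2.3.3; the function names are hypothetical, and rejecting copy/move operations that span two namespaces is a simplification chosen for this sketch rather than a stated requirement.

```python
import re

_ASIC_NAME = re.compile(r"asic\d+")


def _split_namespace(path):
    """Split '/asic0/PORT/x' into ('asic0', '/PORT/x')."""
    parts = path.split("/", 2)
    if len(parts) < 2 or parts[0] != "":
        raise ValueError("path must start with '/<namespace>': " + path)
    namespace = parts[1]
    if namespace != "localhost" and not _ASIC_NAME.fullmatch(namespace):
        raise ValueError("unknown namespace in path: " + path)
    # A path that names only the namespace maps to the root pointer "".
    local_path = "/" + parts[2] if len(parts) == 3 else ""
    return namespace, local_path


def split_patch_by_namespace(operations):
    """Group operations per namespace and strip the namespace prefix."""
    per_namespace = {}
    for op in operations:
        namespace, local_path = _split_namespace(op["path"])
        local_op = dict(op, path=local_path)
        if "from" in op:
            from_namespace, local_from = _split_namespace(op["from"])
            if from_namespace != namespace:
                # Simplification for this sketch only.
                raise ValueError("copy/move across namespaces is not handled")
            local_op["from"] = local_from
        per_namespace.setdefault(namespace, []).append(local_op)
    return per_namespace


patch = [
    {"op": "add", "path": "/asic0/PORTCHANNEL/PortChannel102/admin_status", "value": "down"},
    {"op": "replace", "path": "/localhost/BGP_DEVICE_GLOBAL/STATE/tsa_enabled", "value": "true"},
    {"op": "replace", "path": "/asic1/BGP_DEVICE_GLOBAL/STATE/tsa_enabled", "value": "true"},
]
for namespace, ops in split_patch_by_namespace(patch).items():
    print(namespace, [op["path"] for op in ops])
```

Each resulting list is an ordinary single-namespace JsonPatch, so it can be handed to the same apply logic used on single-ASIC platforms.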

3.2.3.5 Checkpoints and Rollbacks

Checkpoint and rollback operations will be enhanced to support Multi-ASIC platforms by:

Capturing the configuration state of all ASICs and the host namespace when creating a checkpoint.
Allowing rollbacks to restore the configuration state of all ASICs and the host namespace to a specific checkpoint.
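One possible shape for a Multi-ASIC checkpoint is a single JSON document keyed by namespace. The sketch below assumes that layout, a ".json" file extension, and hypothetical helper names; none of these are fixed by this design, and the checkpoint directory is passed in because its location is left to the implementor.

```python
import json
import os
import tempfile


def save_checkpoint(configs_by_namespace, checkpoint_dir, checkpoint_name):
    """Write the config of the host and every ASIC into one checkpoint file."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, checkpoint_name + ".json")
    with open(path, "w") as handle:
        json.dump(configs_by_namespace, handle, indent=2, sort_keys=True)
    return path


def load_checkpoint(checkpoint_dir, checkpoint_name):
    """Read back a checkpoint as {namespace: config}."""
    path = os.path.join(checkpoint_dir, checkpoint_name + ".json")
    with open(path) as handle:
        return json.load(handle)


def namespaces_needing_rollback(current, checkpoint):
    """List the namespaces whose current config differs from the checkpoint."""
    names = set(current) | set(checkpoint)
    return sorted(ns for ns in names if current.get(ns) != checkpoint.get(ns))


with tempfile.TemporaryDirectory() as tmp:
    saved = {
        "localhost": {"BGP_DEVICE_GLOBAL": {"STATE": {"tsa_enabled": "false"}}},
        "asic0": {"BGP_DEVICE_GLOBAL": {"STATE": {"tsa_enabled": "false"}}},
    }
    save_checkpoint(saved, tmp, "before-change")
    current = {
        "localhost": {"BGP_DEVICE_GLOBAL": {"STATE": {"tsa_enabled": "false"}}},
        "asic0": {"BGP_DEVICE_GLOBAL": {"STATE": {"tsa_enabled": "true"}}},
    }
    print(namespaces_needing_rollback(current, load_checkpoint(tmp, "before-change")))
```

Comparing per namespace lets a rollback skip the namespaces that are already at the checkpointed state.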

3.2.3.6 Implementation Details

The extension of the SONiC Generic Configuration Update and Rollback feature to support Multi-ASIC platforms enhances the flexibility and manageability of SONiC deployments in complex environments. By introducing namespace-aware configuration management, SONiC can efficiently handle the intricacies of Multi-ASIC platforms, ensuring smooth and reliable operation.

Namespace-aware Utilities: Update the SONiC configuration utilities to handle namespace identifiers in the configuration patches and command-line options for specifying target namespaces for checkpoints and rollbacks.

Validation and Verification: Extend the configuration validation and verification mechanisms to cover Multi-ASIC scenarios, ensuring that configurations are valid and consistent across all ASICs and the host.

CLI Enhancements: Introduce new CLI options to specify target namespaces for configuration operations, and to manage checkpoints and rollbacks in a Multi-ASIC environment.

Testing: Develop comprehensive test cases to cover Multi-ASIC configuration updates, including scenarios that involve simultaneous updates to multiple ASICs and the host.

Pull Request	Description
Add Multi ASIC support for apply-patch	1. Categorize configuration as JSON patch format per ASIC. 2. Apply patch per ASIC, including localhost.
Add Multi ASIC support for Checkpoint and Rollback	To be implemented

3.2.3.7 Enhancement

Given that applying patches or performing other actions on multiple ASICs can be time-consuming, we are introducing the -p option to expedite the process. This option operates under the assumption that each ASIC functions independently.

Pull Request	Description
Add Multi ASIC support for parallel option	To be implemented
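The intent of the parallel option can be sketched with a thread pool. This is only an illustration under the stated assumption that each ASIC functions independently: `apply_to_all_namespaces` and the `apply_one` callable are hypothetical names, and the real implementation may use a different concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_to_all_namespaces(changes_by_namespace, apply_one, parallel=False):
    """Run apply_one(namespace, change) for every namespace.

    Returns {namespace: None} on success or {namespace: exception} on failure,
    so that one failing ASIC does not hide the outcome of the others.
    """

    def run(item):
        namespace, change = item
        try:
            apply_one(namespace, change)
            return namespace, None
        except Exception as exc:
            return namespace, exc

    items = list(changes_by_namespace.items())
    if parallel and items:
        # One worker per namespace; valid only if namespaces are independent.
        with ThreadPoolExecutor(max_workers=len(items)) as pool:
            results = list(pool.map(run, items))
    else:
        results = [run(item) for item in items]
    return dict(results)


def fake_apply(namespace, change):
    # Hypothetical stand-in for the per-namespace apply logic.
    if namespace == "asic1":
        raise RuntimeError("simulated failure on asic1")


outcome = apply_to_all_namespaces(
    {"localhost": [], "asic0": [], "asic1": []}, fake_apply, parallel=True
)
for namespace, error in sorted(outcome.items()):
    print(namespace, "ok" if error is None else "failed: " + str(error))
```

Collecting a per-namespace result, rather than stopping at the first exception, keeps the sequential and parallel modes comparable when reporting errors to the user.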

apply-patch

@config.command('apply-patch')
...
@click.option('-p', '--parallel', is_flag=True, default=False, help='Apply the change to all ASICs in parallel')
...
def apply_patch(ctx, patch_file_path, format, dry_run, parallel, ignore_non_yang_tables, ignore_path, verbose):
    ...

checkpoint

@config.command()
...
@click.option('-p', '--parallel', is_flag=True, default=False, help='Take the checkpoints on all ASICs in parallel')
...
def checkpoint(ctx, checkpoint_name, parallel, verbose):
    ...

replace

@config.command()
...
@click.option('-p', '--parallel', is_flag=True, default=False, help='Replace the config on all ASICs in parallel')
...
def replace(ctx, target_file_path, format, dry_run, parallel, ignore_non_yang_tables, ignore_path, verbose):
    ...

rollback

@config.command()
...
@click.option('-p', '--parallel', is_flag=True, default=False, help='Roll back the change on all ASICs in parallel')
...
def rollback(ctx, checkpoint_name, dry_run, parallel, ignore_non_yang_tables, ignore_path, verbose):
    ...

list_checkpoints

@config.command()
...
@click.option('-p', '--parallel', is_flag=True, default=False, help='List the checkpoints on all ASICs in parallel')
...
def list_checkpoints(ctx, checkpoint_name, dry_run, parallel, ignore_non_yang_tables, ignore_path, verbose):
    ...

delete_checkpoint

@config.command()
...
@click.option('-p', '--parallel', is_flag=True, default=False, help='Delete the checkpoints on all ASICs in parallel')
...
def delete_checkpoint(ctx, checkpoint_name, dry_run, parallel, ignore_non_yang_tables, ignore_path, verbose):
    ...

4 Flow Diagrams

5 Error Handling

If an error is encountered while executing any of the commands, the error is reported to the user. The system does not take any recovery action and leaves the decision to the user.
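Because recovery is left to the user, a calling service may choose to wrap the commands itself. The sketch below shows one such user-side pattern: take a checkpoint, apply the patch, and roll back if the apply step fails. The function name `apply_with_rollback` and the injectable `run` parameter are assumptions for this example; the sketch also assumes the caller has sufficient privileges and that a failed command exits with a non-zero status.

```python
import subprocess


def apply_with_rollback(patch_file, checkpoint_name, run=subprocess.run):
    """Checkpoint, apply a patch, and roll back if the apply step fails.

    Returns True if the patch was applied, False if it was rolled back.
    Errors from the checkpoint or rollback commands propagate to the caller.
    """
    run(["config", "checkpoint", checkpoint_name], check=True)
    try:
        run(["config", "apply-patch", patch_file], check=True)
    except subprocess.CalledProcessError:
        # User-side recovery: the system itself takes no recovery action.
        run(["config", "rollback", checkpoint_name], check=True)
        return False
    return True
```

Passing `run` as a parameter makes the wrapper testable without touching a device: a test can supply a fake that records the commands and simulates a failing apply-patch.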

6 Serviceability and Debug

All command logs are stored in systemd-journal and syslog.

7 Warm Boot Support

N/A

8 Scalability

N/A

9 Unit Tests

9.1 Unit Tests for Apply-Patch

Test Case	Description
1	Add a new table.
2	Remove an existing table.
3	Modify values of an existing table entry.
4	Modify the value of an existing item in an array value.
5	Add a new item to an array value.
6	Remove an item from an array value.
7	Add a new key to an existing table.
8	Remove a key from an existing table.
9	Remove 2 items that depend on each other but are in different tables, e.g. /PORT/Ethernet2 and /VLAN_MEMBER/Vlan101|Ethernet2.
10	Add 2 items that depend on each other but are in different tables, e.g. /PORT/Ethernet2 and /VLAN_MEMBER/Vlan101|Ethernet2.
11	Remove 2 items that depend on each other in the same table, e.g. /INTERFACE/INTERFACE_LIST and /INTERFACE/INTERFACE_PREFIX_LIST.
12	Add 2 items that depend on each other in the same table, e.g. /INTERFACE/INTERFACE_LIST and /INTERFACE/INTERFACE_PREFIX_LIST.
13	Replace a mandatory item e.g. type under ACL_TABLE.
14	Dynamic port breakout as described here.
15	Remove an item that has a default value.
16	Modify items that depend on each other based on a `must` condition rather than a direct connection such as `leafref`, e.g. /CRM/acl_counter_high_threshold (check here).
17	Add a new ASIC config subtree.
18	Add a new ASIC with empty config.
19	Independent Patch Application: Apply configuration patches independently to each ASIC without any coordination between them. Verify that each ASIC updates according to its patch and that there are no discrepancies in configurations that might affect system operations.
20	Simultaneous Patch Application: Apply configuration patches to all ASICs simultaneously to ensure that simultaneous updates do not cause conflicts or failures. This test checks the system’s ability to handle concurrent configuration changes across multiple independent units.
21	Sequential Patch Application: Apply configuration patches to ASICs in a controlled sequence, one after the other. This test aims to check if the order of patch application affects the final system configuration, especially when configurations might not directly interact but could have cumulative effects.
22	Patch Rollback Capability: After applying patches, initiate a rollback to previous configurations for each ASIC independently. Verify that each ASIC can revert to its previous state accurately and that the rollback process does not introduce new issues.
23	Conditional Patch Application: Apply patches based on conditional checks within each ASIC’s configuration (e.g., only apply a patch if the current firmware version is below a certain level). This test ensures that conditions within patches are evaluated correctly and that the patch is applied only when the conditions are met.
24	Cross-ASIC Dependency Verification: While each ASIC operates independently, this test involves applying patches that could potentially have indirect impacts on other ASICs through shared resources or network topology changes. Validate that changes in one ASIC do not adversely affect others.
25	Patch Compatibility and Conflict Resolution: Apply patches that introduce changes conflicting with existing configurations across ASICs. This test examines how the system identifies and resolves conflicts, ensuring that the most critical settings are preserved and that any issues are clearly reported.
26	Performance Impact Assessment: Measure system performance before and after patch application to determine the impact of configuration changes. This includes monitoring processing speed, memory usage, and network latency to ensure that performance remains within acceptable parameters.
27	Add and remove forced mgmt routes config with IPV4 and IPV6 address.

9.2 Unit Tests for Checkpoint

Test Case	Description
1	Invalid Configs according to YANG models should fail to be saved.
2	Saving ConfigDB successfully to a file.

9.3 Unit Tests for Rollback

Test Case	Description
1 ..*	Rollback all unit-tests specified in 9.1 Unit Tests for Apply-Patch.

9.4 Unit Tests for Replace

Test Case	Description
1 ..*	Use replace instead of apply-patch for unit-tests specified in 9.1 Unit Tests for Apply-Patch.

10 E2E Tests

Test Case	Description
1	Updating Syslog configs.
2	Updating AAA configs.
3	Updating DHCP configs.
4	Updating IPv6 configs.
5	Updating monitor configs (EverflowAlwaysOn).
6	Updating BGP speaker configs.
7	Updating BGP listener configs.
8	Updating Bounce back routing configs.
9	Updating control-plane ACLs (NTP, SNMP, SSH) configs.
10	Updating Ethernet interfaces configs.
11	Updating VLAN interfaces configs.
12	Updating port-channel interfaces configs.
13	Updating loopback interfaces configs.
14	Updating Kubernetes configs.
15	Updating BGP prefix hijack configs.
16	Updating QoS headroom pool and buffer pool size.
17	Updating ECN tuning.
18	Updating dynamic threshold tuning.
19	Disable/enable PFC_WD.
20	Updating PFC_WD poll interval.
21	Updating PG headroom configs.
22	Add/Remove Rack.
23	Updating NTP configs.
24	Updating forced_mgmt_routes configs.
