Arista security policy: PoC reveals potential, but not yet ready for production

Aleron recently ran a proof of concept (PoC) on the Arista switching platform to see whether key features could be deployed in production to dynamically offload traffic from the firewall’s processing path to the Arista’s high-speed switching fabric.

If so, it would be possible to build a highly secure, high-speed, low-latency architecture in which a lower-capacity firewall securely manages an environment capable of generating 40-100 gigabit-per-second data flows.

The key features that we investigated were the Arista Direct Flow Assist (DFA) package and the Arista Macro-Segmentation Service (MSS).

The DFA lets the Arista receive a syslog message from the firewall and trigger the Direct Flow process to dynamically implement a flow entry, which lets the firewall offload certain flows directly into the Arista’s flow table, bypassing the firewall data path.
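
The syslog-to-flow-entry step can be sketched roughly as follows. This is a minimal illustration, not the Arista DFA script: the message layout (space-separated key=value pairs), the field names, and the FlowEntry type are all assumptions made for this example, and real firewall syslog formats differ by vendor and version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowEntry:
    """Hypothetical 5-tuple match that a switch flow table could hold."""
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int
    action: str  # "bypass" = forward in the switch fabric, skipping the firewall


def parse_syslog(message: str) -> Optional[FlowEntry]:
    """Turn an assumed 'key=value ...' syslog line into a flow entry.

    Returns None when a required field is missing or the firewall did
    not allow the session, so nothing is offloaded in those cases.
    """
    fields = dict(
        token.split("=", 1) for token in message.split() if "=" in token
    )
    required = ("src", "dst", "proto", "sport", "dport", "action")
    if any(key not in fields for key in required):
        return None
    if fields["action"] != "allow":
        return None
    return FlowEntry(
        src_ip=fields["src"],
        dst_ip=fields["dst"],
        protocol=fields["proto"],
        src_port=int(fields["sport"]),
        dst_port=int(fields["dport"]),
        action="bypass",
    )


def reverse(entry: FlowEntry) -> FlowEntry:
    """Build the return-direction match so the flow is offloaded both ways."""
    return FlowEntry(
        src_ip=entry.dst_ip,
        dst_ip=entry.src_ip,
        protocol=entry.protocol,
        src_port=entry.dst_port,
        dst_port=entry.src_port,
        action=entry.action,
    )
```

In this sketch, only sessions the firewall has already allowed produce an entry, and the reverse entry reflects the bidirectional nature of an offloaded flow.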

The MSS is a new capability within Arista’s orchestration service CloudVision, allowing security services to be inserted into the data path regardless of whether the end devices are physical or virtual.

The PoC used the following hardware:

  • Arista 7050SX-128 switch
  • Palo Alto Networks 5050 (7.1.1) gateway
  • Checkpoint 23800 (R77.30) gateways
  • Checkpoint Management server (R77.30)

 

DFA results

The basic operation of DFA was proven in the lab environment: the dynamic configuration changes and interface monitoring showed that the security policy could be successfully pushed down from the firewall to the ternary content addressable memory (TCAM) flow tables.

 

Issues:

  • At the time of the lab testing only the Palo Alto firewall had been successfully tested. The Checkpoint firewall had some integration issues involving the format of the syslog message that was being sent to the Arista switch. This is currently being worked on by Arista and Checkpoint development engineers and it is expected to be resolved fairly quickly.
  • The DFA technology seems to be quite layer 2-centric and, although layer 3 support is enabled through matching VLAN tags within the DFA Python script, supporting a 15-VLAN environment means adding about 2,000 lines of code to allow for any-to-any VLAN switching.
  • The number of entries that can be written to the TCAM is limited by the current memory allocation of the Broadcom Trident+/2 chipset used by Arista. As a result, only 750 bidirectional entries can be written to the TCAM.
  • Once a flow has been written to the TCAM and offloaded to the Arista switching fabric, there is no feedback loop to tell the firewall whether the flow is still in progress or has completed. The minimum TCAM flow entry timeout is one minute, which means that the firewall needs to keep the initial flow alive so that, if the entry times out in the TCAM, the traffic can be forwarded back to the firewall to generate a new syslog entry and a corresponding TCAM entry. Longer-lasting flows will therefore bounce between the switch and the firewall unless the timeout is specifically tuned to a longer value.
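
Two of the issues above can be put in rough numbers. The sketch below is a back-of-the-envelope illustration under stated assumptions: that any-to-any VLAN switching needs one block of rules per ordered VLAN pair, and that the TCAM timeout behaves as a hard timeout, so a flow returns to the firewall once per expiry. Neither assumption was confirmed in the PoC, and the function names are hypothetical.

```python
import math


def ordered_vlan_pairs(vlan_count: int) -> int:
    """Number of (source VLAN, destination VLAN) pairs with source != destination."""
    return vlan_count * (vlan_count - 1)


def firewall_revisits(flow_seconds: float, timeout_seconds: float = 60.0) -> int:
    """How many times a flow falls back to the firewall after the first offload.

    Assumes a hard timeout: the TCAM entry expires every timeout_seconds
    regardless of activity, and each expiry sends the flow back once.
    """
    if flow_seconds <= 0:
        return 0
    return math.ceil(flow_seconds / timeout_seconds) - 1


pairs = ordered_vlan_pairs(15)
print(pairs)                         # 210 ordered pairs for 15 VLANs
print(round(2000 / pairs, 1))        # about 9.5 lines of script per pair
print(firewall_revisits(600))        # 9 returns to the firewall for a 10-minute flow
print(firewall_revisits(600, 3600))  # 0 with a one-hour timeout
```

Under these assumptions, the roughly 2,000 lines of script work out to fewer than ten lines per VLAN pair, and a ten-minute flow at the one-minute minimum timeout would revisit the firewall nine times.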

 

MSS results

The review of MSS was paper-based only, as the early field trial code was not yet available for testing.

The technology looks promising, but it is too early to determine how suitable it will be in a fast-paced, highly available production environment. Some documentation suggests that the Direct Flow process will be used to write the policy to the TCAM flow tables from information received from the CloudVision orchestration tool, which has, in turn, received the policy from a security device attached to the switching fabric. The detailed documentation for this service is still to be released.

 

 

There are four basic scenarios to be tested when the code becomes available:

  1. Determine which part of the switch’s memory is being used to write the policy decision and, if the TCAM is being used, then determine whether the same 750-rule limitation still exists.
  2. Determine whether the time taken for the security policy to be initiated by the firewall, sent to CloudVision and then pushed down to the switch is longer than the flow itself.
  3. Determine whether the firewall is being used to dynamically process traffic after deep packet inspection and user awareness profiling, or whether it’s only being used to manage the security policy at an L3/L4 level, which has been pre-pushed down into the switching fabric by CloudVision.
  4. Determine the level of audit logging available for traffic flows using MSS.
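
The timing question in scenario 2 can be framed with a simple ratio. The sketch below assumes that traffic stays on the firewall path until the policy has been installed in the switch and that the flow runs at a constant rate. The durations used are hypothetical values for illustration, not measurements from the PoC.

```python
def offloaded_fraction(flow_seconds: float, setup_seconds: float) -> float:
    """Share of a flow's lifetime that runs in the switch fabric.

    setup_seconds is the assumed end-to-end delay from the firewall's
    policy decision, through the orchestration layer, to the switch.
    """
    if flow_seconds <= 0:
        return 0.0
    return max(0.0, 1.0 - setup_seconds / flow_seconds)


print(offloaded_fraction(10.0, 2.0))  # 0.8
print(offloaded_fraction(1.0, 2.0))   # 0.0, the flow ends before the policy lands
```

If the setup delay is comparable to the typical flow duration, little or none of the traffic would benefit from the offload, which is why this timing needs to be measured once the code is available.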

 

Bottom line

The PoC was successful but there is still some way to go with the development cycle to make these features robust and dynamic enough to withstand the demands of a production trading environment.