VXLAN Encapsulation

Chapter Overview

VXLAN encapsulation is the core mechanism that enables Layer 2 networks to span over Layer 3 infrastructure. Understanding the encapsulation process is fundamental to VXLAN operation.

Encapsulation Overview

VXLAN encapsulation wraps the original Ethernet frame in successive headers so it can be transported over an IP network, listed here innermost to outermost:

Original Frame
+ VXLAN Header (8 bytes)
+ UDP Header (8 bytes)
+ Outer IP Header (20 or 40 bytes)
+ Outer Ethernet Header (14 bytes)

VXLAN Header Structure

The VXLAN header is 8 bytes long and contains the following fields:

Field    | Length (bits) | Description                             | Value
Flags    | 8             | Control flags; I-bit set when VNI valid | 0x08 (VNI present)
Reserved | 24            | Reserved for future use                 | 0x000000
VNI      | 24            | VXLAN Network Identifier                | 0x000001 - 0xFFFFFF
Reserved | 8             | Reserved for future use                 | 0x00

VXLAN Header Example

A typical VXLAN header for VNI 10100:

Flags:     0x08 (VNI Present)
Reserved:  0x000000
VNI:       0x002774 (10100)
Reserved:  0x00

Hex bytes: 08 00 00 00 00 27 74 00
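The header layout can be checked with a few lines of Python; the helper names below are illustrative, not part of any VXLAN library:

```python
import struct

VXLAN_FLAG_VNI_PRESENT = 0x08  # I-bit: VNI field is valid

def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 < vni <= 0xFFFFFF:
        raise ValueError("VNI must fit in 24 bits")
    # Two big-endian 32-bit words: flags in the top byte of word 1,
    # VNI in the top 24 bits of word 2.
    return struct.pack("!II", VXLAN_FLAG_VNI_PRESENT << 24, vni << 8)

def parse_vni(header: bytes) -> int:
    """Recover the 24-bit VNI from a packed VXLAN header."""
    _, word2 = struct.unpack("!II", header)
    return word2 >> 8

hdr = build_vxlan_header(10100)
print(hdr.hex(" "))    # 08 00 00 00 00 27 74 00
print(parse_vni(hdr))  # 10100
```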

Complete Packet Structure

A complete VXLAN packet consists of multiple nested headers:

Outer Ethernet Header (14 bytes)

Contains outer source/destination MAC addresses for underlay transport

Outer IP Header (20 bytes IPv4 / 40 bytes IPv6)

Contains VTEP IP addresses as source and destination

UDP Header (8 bytes)

Source port: ephemeral, Destination port: 4789 (VXLAN)

VXLAN Header (8 bytes)

Contains VNI and control flags

Inner Ethernet Header (14 bytes)

Original Layer 2 frame header

Inner Payload (Variable)

Original frame payload (IP packet, ARP, etc.)

Encapsulation Process

The VTEP performs the following steps during encapsulation:

  1. Frame Reception: Receive original Ethernet frame from local segment
  2. VNI Lookup: Determine VNI based on source VLAN or port
  3. Destination Lookup: Find remote VTEP IP for destination MAC
  4. Header Addition: Add VXLAN, UDP, and IP headers
  5. Transmission: Send encapsulated packet to remote VTEP

Encapsulation Example

Original frame encapsulated in VXLAN:

# Original Frame
Dst MAC: 00:11:22:33:44:55
Src MAC: 00:AA:BB:CC:DD:EE
Type:    0x0800 (IP)
Payload: IP packet

# After VXLAN Encapsulation
Outer Dst MAC: 00:11:22:33:44:66 (Next hop)
Outer Src MAC: 00:AA:BB:CC:DD:FF (Local VTEP)
Outer IP Dst:  192.168.1.2 (Remote VTEP)
Outer IP Src:  192.168.1.1 (Local VTEP)
UDP Dst Port:  4789
UDP Src Port:  49152 (ephemeral)
VXLAN VNI:     10100
Inner Frame:   Original Ethernet frame
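The same example can be sketched end to end with Python's struct module. This is a simplified model, not a wire-accurate implementation: the IP header checksum is left at zero (a real VTEP must compute it before transmission), and the UDP checksum is zero, which RFC 7348 permits for VXLAN over IPv4.

```python
import struct

def vxlan_encapsulate(inner_frame: bytes, vni: int,
                      outer_src_mac: bytes, outer_dst_mac: bytes,
                      src_vtep_ip: bytes, dst_vtep_ip: bytes,
                      src_port: int = 49152) -> bytes:
    """Wrap an inner Ethernet frame in outer Ethernet + IPv4 + UDP + VXLAN headers."""
    vxlan = struct.pack("!II", 0x08 << 24, vni << 8)         # 8-byte VXLAN header
    udp_len = 8 + len(vxlan) + len(inner_frame)
    udp = struct.pack("!HHHH", src_port, 4789, udp_len, 0)   # UDP checksum 0 (allowed for IPv4)
    ip = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 20 + udp_len,                  # version/IHL, DSCP, total length
                     0, 0x4000,                              # identification, DF flag
                     64, 17, 0,                              # TTL, protocol=UDP, checksum (unset)
                     src_vtep_ip, dst_vtep_ip)
    eth = outer_dst_mac + outer_src_mac + struct.pack("!H", 0x0800)  # EtherType IPv4
    return eth + ip + udp + vxlan + inner_frame

inner = bytes(64)  # placeholder inner Ethernet frame
pkt = vxlan_encapsulate(inner, 10100,
                        bytes.fromhex("00aabbccddff"), bytes.fromhex("001122334466"),
                        bytes([192, 168, 1, 1]), bytes([192, 168, 1, 2]))
print(len(pkt) - len(inner))  # 50-byte IPv4 VXLAN overhead
```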

UDP Port Usage

VXLAN uses UDP for transport with specific port assignments:

Port Type        | Port Number | Description
Destination Port | 4789        | IANA-assigned VXLAN port
Source Port      | 49152-65535 | Ephemeral port, used for ECMP hashing
Legacy Port      | 8472        | Early implementations (deprecated)

Source Port Importance

The source port is crucial for ECMP (Equal Cost Multi-Path) load balancing. It should be calculated based on inner frame contents to ensure consistent hashing:

  • Inner MAC addresses
  • Inner IP addresses
  • Inner protocol and ports
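One possible source-port computation is sketched below; the hash function and the exact fields covered are illustrative, since real VTEPs use vendor-specific hashes:

```python
import zlib

UDP_SRC_MIN, UDP_SRC_MAX = 49152, 65535

def vxlan_source_port(inner_frame: bytes) -> int:
    """Derive a stable UDP source port from the inner headers so that all
    packets of one inner flow take the same ECMP path through the underlay."""
    # The first 34 bytes of an inner IPv4 frame cover both MAC addresses,
    # the EtherType, and the inner IP addresses.
    digest = zlib.crc32(inner_frame[:34])
    return UDP_SRC_MIN + digest % (UDP_SRC_MAX - UDP_SRC_MIN + 1)

frame = bytes.fromhex("001122334455" "00aabbccddee" "0800") + bytes(40)
print(vxlan_source_port(frame) == vxlan_source_port(frame))  # True: deterministic per flow
```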

MTU Considerations

VXLAN adds significant overhead to the original frame:

VXLAN Overhead Calculation

  • Outer Ethernet Header: 14 bytes
  • Outer IP Header: 20 bytes (IPv4) or 40 bytes (IPv6)
  • UDP Header: 8 bytes
  • VXLAN Header: 8 bytes
  • Total IPv4 Overhead: 50 bytes
  • Total IPv6 Overhead: 70 bytes

MTU Impact

VXLAN overhead reduces effective MTU:

  • Standard Ethernet MTU: 1500 bytes
  • VXLAN overhead: 50 bytes
  • Effective inner MTU: 1450 bytes

Consider enabling jumbo frames (9000 bytes) on underlay network.
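The overhead arithmetic above as a small helper (names are illustrative):

```python
# Per-header overhead from the calculation above.
OVERHEAD_IPV4 = 14 + 20 + 8 + 8  # outer Ethernet + IPv4 + UDP + VXLAN = 50
OVERHEAD_IPV6 = 14 + 40 + 8 + 8  # outer Ethernet + IPv6 + UDP + VXLAN = 70

def effective_inner_mtu(underlay_mtu: int, ipv6_underlay: bool = False) -> int:
    """MTU left for the inner frame once VXLAN overhead is subtracted."""
    return underlay_mtu - (OVERHEAD_IPV6 if ipv6_underlay else OVERHEAD_IPV4)

print(effective_inner_mtu(1500))        # 1450
print(effective_inner_mtu(9000))        # 8950 with jumbo frames
print(effective_inner_mtu(1500, True))  # 1430 on an IPv6 underlay
```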

VNI Concepts

Chapter Overview

The VXLAN Network Identifier (VNI) is the key component that enables network segmentation and isolation in VXLAN networks. Understanding VNI concepts is crucial for proper VXLAN design.

VNI Overview

A VNI is a 24-bit identifier that uniquely identifies a VXLAN segment. It serves a similar purpose to a VLAN ID but provides much greater scale and flexibility.

VNI Characteristics
  • 24-bit identifier: 0 to 16,777,215
  • Globally unique: Within VXLAN domain
  • Tenant isolation: Traffic separation
  • Scalability: 16 million segments
VNI Functions
  • Segmentation: Logical network isolation
  • Forwarding: Destination VTEP lookup
  • Filtering: Traffic scope limitation
  • Mapping: VLAN to VNI translation

VNI Allocation Strategies

Different approaches can be used for VNI allocation depending on the deployment scenario:

Strategy      | Range              | Use Case            | Example
Sequential    | 10001-20000        | Simple deployments  | VNI 10001, 10002, 10003...
VLAN-based    | VLAN ID + offset   | VLAN migration      | VLAN 100 → VNI 10100
Tenant-based  | Per-tenant blocks  | Multi-tenant clouds | Tenant 1: 100000-199999
Service-based | Service categories | Application tiers   | Web: 1000x, DB: 2000x

VNI Allocation Example

Tenant-based allocation scheme:

# Tenant A (Customer 1)
VNI Range: 100000 - 199999
- Web Tier:  100100
- App Tier:  100200  
- DB Tier:   100300
- DMZ:       100400

# Tenant B (Customer 2)
VNI Range: 200000 - 299999
- Web Tier:  200100
- App Tier:  200200
- DB Tier:   200300
- DMZ:       200400
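The VLAN-based and tenant-based schemes shown above translate directly into small allocation helpers. This is a sketch; the offset and block size are simply the values used in the examples:

```python
def vlan_to_vni(vlan_id: int, offset: int = 10000) -> int:
    """VLAN-based allocation: VLAN 100 -> VNI 10100."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("invalid VLAN ID")
    return offset + vlan_id

def tenant_vni(tenant: int, network: int, block: int = 100000) -> int:
    """Tenant-based allocation: tenant N owns N*block .. N*block + block - 1."""
    if not (tenant >= 1 and 0 <= network < block):
        raise ValueError("network index outside tenant block")
    return tenant * block + network

print(vlan_to_vni(100))    # 10100
print(tenant_vni(1, 100))  # 100100 (Tenant A, web tier)
print(tenant_vni(2, 300))  # 200300 (Tenant B, DB tier)
```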

VNI to VLAN Mapping

VNIs are typically mapped to VLANs on access switches to provide backward compatibility:

Host/VM → VLAN 100 → VTEP → VNI 10100

One-to-One Mapping

Each VLAN maps to exactly one VNI. Most common and straightforward approach.

  • VLAN 100 → VNI 10100
  • VLAN 200 → VNI 10200
  • Simple configuration
Many-to-One Mapping

Multiple VLANs map to single VNI. Less common but useful for aggregation.

  • VLAN 100, 101 → VNI 10100
  • Reduces VNI consumption
  • Complex troubleshooting

VNI Scoping

VNIs can have different scopes depending on the deployment model:

Scope        | Description                      | Advantages                      | Disadvantages
Global       | VNI unique across entire network | Simple management, no conflicts | Requires coordination
Local        | VNI unique within local domain   | Autonomous allocation           | Complex interconnect
Hierarchical | VNI encodes location/tenant info | Self-documenting                | Reduced flexibility

VNI Membership

VTEPs must be configured with VNI membership to participate in VXLAN segments:

VNI Membership Configuration

Cisco NX-OS example:

# Configure VNI membership on NVE interface
interface nve1
  no shutdown
  source-interface loopback0
  member vni 10100
    ingress-replication protocol bgp
  member vni 10200
    mcast-group 239.1.1.1
  member vni 10300
    ingress-replication protocol bgp

# Map VLANs to VNIs
vlan 100
  vn-segment 10100
vlan 200
  vn-segment 10200
vlan 300
  vn-segment 10300

VNI Forwarding Behavior

VNI determines forwarding behavior within VXLAN networks:

MAC Learning

MAC addresses learned per VNI

  • Separate MAC tables
  • VNI-specific aging
  • Tenant isolation
BUM Flooding

Broadcast domain per VNI

  • VNI-specific flooding
  • Multicast groups
  • Ingress replication
Unicast Forwarding

Known unicast per VNI

  • VNI-specific lookup
  • Remote VTEP mapping
  • Optimized forwarding

VNI Troubleshooting

Common VNI-related issues and verification commands:

VNI Verification Commands

# Show VNI configuration
show nve vni

# Show VNI membership
show nve interface nve1 detail

# Show VNI MAC addresses
show l2route evpn mac all

# Show VNI flooding information
show nve multicast

# Show VLAN to VNI mapping
show vlan vn-segment

Common VNI Issues

  • VNI Mismatch: Inconsistent VNI configuration between VTEPs
  • Missing Membership: VTEP not configured for required VNI
  • VLAN Mapping: Incorrect VLAN to VNI mapping
  • Control Plane: VNI not advertised in BGP EVPN

VNI Best Practices

Design Guidelines

  • Use consistent VNI allocation scheme
  • Document VNI assignments
  • Reserve ranges for different purposes
  • Avoid VNI 0 (reserved)

Operational Guidelines

  • Monitor VNI utilization
  • Automate VNI provisioning
  • Implement VNI lifecycle management
  • Regular VNI audits

VTEP Architecture

Chapter Overview

VXLAN Tunnel Endpoints (VTEPs) are the cornerstone of VXLAN networks. They handle encapsulation, decapsulation, and forwarding decisions that make VXLAN overlay networks possible.

VTEP Overview

A VTEP is a device that originates and terminates VXLAN tunnels. It can be implemented in hardware (physical switches) or software (hypervisor virtual switches).

Local Segment → VTEP → IP Network → Remote VTEP

VTEP Types

VTEPs can be categorized based on their implementation:

Hardware VTEP

Physical Network Switches

  • ASIC-based processing
  • Line-rate performance
  • Low latency
  • High port density
  • Examples: Cisco Nexus, Arista switches
Software VTEP

Hypervisor Virtual Switches

  • CPU-based processing
  • Flexible implementation
  • Integration with hypervisor
  • VM-level granularity
  • Examples: OVS, vSphere DvS
SmartNIC VTEP

NIC-based Processing

  • Offload from CPU
  • Hardware acceleration
  • Programmable pipeline
  • Cloud-native integration
  • Examples: Mellanox, Intel

VTEP Functions

VTEPs perform several critical functions in VXLAN networks:

Function      | Description                                | Process
Encapsulation | Wrap original frames in VXLAN headers      | Add VXLAN + UDP + IP headers
Decapsulation | Remove VXLAN headers from received packets | Strip outer headers, forward inner frame
MAC Learning  | Learn MAC addresses and their locations    | Build forwarding tables per VNI
Flooding      | Handle BUM traffic distribution            | Multicast or ingress replication
Forwarding    | Make forwarding decisions                  | Look up destination VTEP

VTEP Identification

Each VTEP must have a unique IP address for tunnel endpoint identification:

VTEP IP Configuration

Common approaches for VTEP IP assignment:

# Method 1: Dedicated Loopback Interface
interface loopback0
  description "VTEP IP"
  ip address 192.168.100.1/32

interface nve1
  source-interface loopback0

# Method 2: Physical Interface
interface ethernet1/1
  description "Underlay connectivity"
  ip address 10.1.1.1/30

interface nve1
  source-interface ethernet1/1

VTEP IP Best Practices

  • Use loopback interfaces for stability
  • Assign from dedicated IP range
  • Ensure reachability via IGP
  • Consider anycast for redundancy

VTEP Forwarding Process

The VTEP forwarding process depends on the destination MAC address:

Known Unicast
  1. Lookup destination MAC in VNI table
  2. Find associated remote VTEP IP
  3. Encapsulate frame with VTEP IP
  4. Forward to remote VTEP
BUM Traffic
  1. Identify broadcast/multicast frame
  2. Determine VNI flooding scope
  3. Replicate to all VTEPs in VNI
  4. Use multicast or ingress replication

VTEP Learning Process

VTEPs learn MAC addresses from both local and remote sources:

Local Learning + Remote Learning → MAC Table

MAC Learning Example

# Local MAC Learning
Host A (MAC: 00:11:22:33:44:55) sends frame
→ VTEP1 learns: MAC 00:11:22:33:44:55 on local port

# Remote MAC Learning  
VTEP1 receives VXLAN packet from VTEP2
→ VTEP1 learns: MAC 00:AA:BB:CC:DD:EE via VTEP2 (IP: 192.168.100.2)

# MAC Table Entry
VNI 10100: MAC 00:11:22:33:44:55 → Local Port Eth1/1
VNI 10100: MAC 00:AA:BB:CC:DD:EE → Remote VTEP 192.168.100.2
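The MAC table entries above can be modeled as one dictionary per VNI. This is a sketch; the entry fields are illustrative:

```python
from dataclasses import dataclass

@dataclass
class MacEntry:
    location: str   # local port name, or remote VTEP IP
    is_local: bool

mac_tables = {}  # one MAC table per VNI: {vni: {mac: MacEntry}}

def learn_local(vni: int, mac: str, port: str) -> None:
    """Data-plane learning from a frame received on a local port."""
    mac_tables.setdefault(vni, {})[mac] = MacEntry(port, True)

def learn_remote(vni: int, mac: str, vtep_ip: str) -> None:
    """Learning from a decapsulated VXLAN packet (or a BGP EVPN route)."""
    mac_tables.setdefault(vni, {})[mac] = MacEntry(vtep_ip, False)

learn_local(10100, "00:11:22:33:44:55", "Eth1/1")
learn_remote(10100, "00:aa:bb:cc:dd:ee", "192.168.100.2")
print(mac_tables[10100]["00:aa:bb:cc:dd:ee"])
```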

VTEP Redundancy

Several methods provide VTEP redundancy and high availability:

Method         | Description                  | Pros                     | Cons
Anycast VTEP   | Multiple VTEPs share same IP | Seamless failover        | Complex configuration
vPC/MLAG       | Pair of VTEPs acts as one    | Active-active forwarding | Vendor-specific
BGP EVPN       | Control-plane convergence    | Fast convergence         | Protocol complexity
Multiple VTEPs | Separate VTEPs per rack      | Simple design            | Slower convergence

VTEP Scaling Considerations

Several factors affect VTEP scalability:

Scale Factors
  • VNI Count: Number of supported VNIs
  • MAC Entries: MAC table size per VNI
  • Tunnel Count: Number of VXLAN tunnels
  • Bandwidth: Aggregate throughput
Limitations
  • Hardware limits: ASIC capabilities
  • Memory constraints: Table sizes
  • CPU utilization: Control plane processing
  • Network overhead: Encapsulation impact

VTEP Monitoring

Key metrics for VTEP health monitoring:

VTEP Monitoring Commands

# Show VTEP status
show nve interface

# Show VTEP statistics
show nve interface nve1 detail

# Show VTEP peers
show nve peers

# Show VTEP MAC learning
show l2route evpn mac all

# Show VTEP tunnels
show interface tunnel brief

VTEP Best Practices

  • Use dedicated loopback interfaces for VTEP IPs
  • Implement proper redundancy mechanisms
  • Monitor VTEP health and performance
  • Plan for scalability requirements
  • Regular software updates and maintenance

Forwarding Behavior

Chapter Overview

Understanding VXLAN forwarding behavior is crucial for network design and troubleshooting. This section covers how VTEPs make forwarding decisions and handle different traffic types.

Forwarding Overview

VXLAN forwarding behavior depends on the destination MAC address and the VTEP's forwarding table state:

Frame Ingress → VNI Lookup → MAC Lookup → Forwarding Decision

Forwarding Table Structure

VTEPs maintain forwarding tables that map MAC addresses to locations:

VNI   | MAC Address       | Location      | Type   | Age
10100 | 00:11:22:33:44:55 | Eth1/1        | Local  | 120s
10100 | 00:AA:BB:CC:DD:EE | 192.168.100.2 | Remote | 300s
10200 | 00:11:22:33:44:66 | Eth1/2        | Local  | 45s

Unicast Forwarding

Unicast forwarding behavior depends on whether the destination MAC is known:

Known Unicast

Destination MAC in forwarding table

  1. Lookup destination MAC in VNI table
  2. Determine forwarding location
  3. Forward directly to destination
  4. No flooding required
Unknown Unicast

Destination MAC not in forwarding table

  1. MAC lookup fails
  2. Treat as BUM traffic
  3. Flood to all VTEPs in VNI
  4. Learn from response

Unicast Forwarding Example

# Known Unicast Flow
Host A (00:11:22:33:44:55) → Host B (00:AA:BB:CC:DD:EE)

1. Frame arrives at VTEP1
2. VNI 10100 lookup for MAC 00:AA:BB:CC:DD:EE
3. Found: Remote VTEP 192.168.100.2
4. Encapsulate frame with:
   - Outer IP: 192.168.100.1 → 192.168.100.2
   - VXLAN VNI: 10100
5. Forward to VTEP2
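The known/unknown unicast decision reduces to a single table lookup. The structures below are an illustrative sketch, not a real VTEP data structure:

```python
def forward_targets(vni: int, dst_mac: str, mac_table: dict, vni_peers: dict) -> list:
    """Return the destination(s) a frame should be sent to.
    Known unicast -> the single learned location; otherwise flood to all
    peers in the VNI (unknown unicast is treated as BUM traffic)."""
    entry = mac_table.get((vni, dst_mac))
    if entry is not None:
        return [entry]
    return list(vni_peers.get(vni, []))

table = {(10100, "00:aa:bb:cc:dd:ee"): "192.168.100.2"}
peers = {10100: ["192.168.100.2", "192.168.100.3"]}
print(forward_targets(10100, "00:aa:bb:cc:dd:ee", table, peers))  # ['192.168.100.2']
print(forward_targets(10100, "ff:ff:ff:ff:ff:ff", table, peers))  # both peers (flood)
```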

BUM Traffic Handling

Broadcast, Unknown unicast, and Multicast traffic requires special handling:

BUM Frame → VNI Membership Lookup → Replication → All VTEPs in VNI

BUM Traffic Types

Traffic Type    | Description                       | Examples
Broadcast       | Destination MAC FF:FF:FF:FF:FF:FF | ARP requests, DHCP discovery
Unknown Unicast | Destination MAC not in table      | First packet to a new destination
Multicast       | Destination MAC 01:00:5E:xx:xx:xx | OSPF Hello, PIM Join

Flooding Mechanisms

VTEPs use two main methods for BUM traffic flooding:

Multicast Mode

Underlay Multicast

  • Each VNI maps to multicast group
  • Efficient bandwidth utilization
  • Requires multicast routing
  • PIM configuration needed
Ingress Replication

Unicast Replication

  • Ingress VTEP replicates frames
  • Unicast to each remote VTEP
  • No multicast routing required
  • Higher bandwidth usage
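The bandwidth trade-off between the two modes is easy to see in a sketch: multicast mode emits one copy addressed to the group, while ingress replication emits one unicast copy per remote VTEP (the peer lists here are illustrative):

```python
def flood_multicast(frame: bytes, mcast_group: str) -> list:
    """Multicast mode: a single copy, replicated by the underlay."""
    return [(mcast_group, frame)]

def flood_ingress_replication(frame: bytes, peers: list) -> list:
    """Ingress replication: one unicast copy per remote VTEP in the VNI."""
    return [(peer, frame) for peer in peers]

peers = ["192.168.100.2", "192.168.100.3", "192.168.100.4"]
print(len(flood_multicast(b"bum", "239.1.1.1")))      # 1
print(len(flood_ingress_replication(b"bum", peers)))  # 3
```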

Flooding Configuration Examples

# Multicast Mode
interface nve1
  member vni 10100
    mcast-group 239.1.1.1

# Ingress Replication Mode
interface nve1
  member vni 10100
    ingress-replication protocol bgp

# Static Ingress Replication
interface nve1
  member vni 10100
    peer-ip 192.168.100.2
    peer-ip 192.168.100.3

MAC Learning Process

VTEPs learn MAC addresses through data plane and control plane mechanisms:

Data Plane Learning

Traditional flood-and-learn mechanism

  • Learn from source MAC addresses
  • Store in local MAC table
  • Age out inactive entries
  • Similar to traditional switching
Control Plane Learning

BGP EVPN MAC advertisement

  • Advertise MAC/IP bindings
  • Distribute via BGP EVPN
  • Suppress flooding
  • Optimal forwarding

Local vs Remote Forwarding

VTEPs handle local and remote destinations differently:

Destination   | Location          | Forwarding Action      | Encapsulation
Local Host    | Same VTEP         | Switch locally         | None
Remote Host   | Different VTEP    | Encapsulate and tunnel | VXLAN
External Host | Non-VXLAN network | Route via gateway      | None

Forwarding Optimization

Several techniques optimize VXLAN forwarding performance:

Hardware Acceleration
  • ASIC-based forwarding
  • Line-rate performance
  • Dedicated lookup engines
  • Parallel processing
Table Optimization
  • Efficient data structures
  • Hash-based lookups
  • Aging mechanisms
  • Memory optimization
Flood Suppression
  • ARP suppression
  • ND suppression
  • Unknown unicast suppression
  • BGP EVPN integration

Troubleshooting Forwarding Issues

Common forwarding problems and diagnostic approaches:

Forwarding Verification Commands

# Show MAC address table
show mac address-table

# Show VXLAN MAC entries
show l2route evpn mac all

# Show forwarding table
show l2route topology

# Show VTEP peers
show nve peers

# Debug forwarding
debug l2fwd all

Common Issues

  • MAC Not Learning: Check VNI configuration and connectivity
  • Flooding Loops: Verify BUM traffic handling configuration
  • Asymmetric Forwarding: Inconsistent MAC learning between VTEPs
  • Slow Convergence: MAC aging timers too long

Best Practices

  • Use BGP EVPN for optimal MAC learning
  • Implement ARP/ND suppression
  • Monitor MAC table utilization
  • Tune aging timers appropriately
  • Design for symmetric forwarding

Scalability Benefits

Chapter Overview

VXLAN's scalability advantages over traditional VLANs make it ideal for modern data centers and cloud environments. This section explores the key scalability benefits and considerations.

Scalability Overview

VXLAN addresses the fundamental scalability limitations of traditional VLANs:

Traditional VLAN Limits
  • 4,094 VLAN IDs maximum
  • Spanning Tree Protocol constraints
  • Single broadcast domain per VLAN
  • Limited to Layer 2 boundaries
  • Manual configuration overhead
VXLAN Advantages
  • 16 million VNIs (24-bit space)
  • IP routing foundation
  • Overlay network abstraction
  • Spans multiple data centers
  • Automated provisioning

VNI Scale

The 24-bit VNI space provides unprecedented network segmentation scale:

Technology | Identifier Bits | Maximum Segments | Practical Scale
VLAN       | 12              | 4,094            | Hundreds
VXLAN      | 24              | 16,777,214       | Millions
NVGRE      | 24              | 16,777,214       | Millions
STT        | 64              | 18.4 × 10^18     | Unlimited

VNI Allocation Example

Enterprise with 1000 tenants, 100 VNIs per tenant:

# Traditional VLANs
Maximum tenants: 4,094 VLANs ÷ 100 VLANs/tenant = 40 tenants

# VXLAN VNIs  
Maximum tenants: 16,777,214 VNIs ÷ 100 VNIs/tenant = 167,772 tenants

# Scale improvement: 4,194x increase
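The tenant arithmetic above, checked in Python:

```python
VLAN_IDS = 4094            # 12-bit space minus reserved IDs 0 and 4095
USABLE_VNIS = 16_777_214   # 24-bit space minus two reserved values
PER_TENANT = 100

vlan_tenants = VLAN_IDS // PER_TENANT
vxlan_tenants = USABLE_VNIS // PER_TENANT
print(vlan_tenants)                   # 40
print(vxlan_tenants)                  # 167772
print(vxlan_tenants // vlan_tenants)  # 4194
```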

Multi-Tenancy Scale

VXLAN enables massive multi-tenancy in cloud environments:

Tenant 1, Tenant 2, ..., Tenant N → Shared Infrastructure
Small Cloud
  • 100 tenants
  • 10 VNIs per tenant
  • 1,000 total VNIs
  • 0.006% of VXLAN space
Large Cloud
  • 10,000 tenants
  • 50 VNIs per tenant
  • 500,000 total VNIs
  • 3% of VXLAN space
Hyperscale
  • 100,000 tenants
  • 100 VNIs per tenant
  • 10,000,000 total VNIs
  • 60% of VXLAN space

Geographic Scale

VXLAN overlays span multiple data centers and geographic regions:

Scope                 | Traditional VLAN | VXLAN       | Benefit
Single Rack           | ✓ Supported      | ✓ Supported | No difference
Single Data Center    | ✓ Supported      | ✓ Supported | Higher scale
Multiple Data Centers | ✗ Complex        | ✓ Native    | Seamless extension
Hybrid Cloud          | ✗ Not feasible   | ✓ Supported | Cloud connectivity

Performance Scale

VXLAN provides better performance scaling through IP routing:

Traditional Limitations
  • Spanning Tree: Blocks redundant paths
  • Broadcast Storms: Single failure domain
  • MAC Learning: Flooding required
  • Convergence: Slow STP reconvergence
VXLAN Advantages
  • ECMP: All paths active
  • Isolation: Fault containment
  • Control Plane: Efficient learning
  • Convergence: Fast IGP convergence

Operational Scale

VXLAN simplifies network operations at scale:

Configuration Comparison

# Traditional VLAN (per switch)
vlan 100
  name tenant-1-web
vlan 101  
  name tenant-1-app
vlan 102
  name tenant-1-db

# VXLAN (automated)
POST /api/v1/tenants
{
  "tenant_id": "tenant-1",
  "networks": [
    {"name": "web", "vni": 100100},
    {"name": "app", "vni": 100200}, 
    {"name": "db", "vni": 100300}
  ]
}

Hardware Scale Limits

Understanding hardware limitations is crucial for scale planning:

Component    | Typical Limit | Impact               | Mitigation
VNI Count    | 4,000-32,000  | Tenant density       | Distributed VTEPs
MAC Entries  | 128K-1M       | Host density         | MAC learning optimization
Tunnel Count | 1K-10K        | VTEP connectivity    | Hierarchical design
Bandwidth    | 100G-400G     | Aggregate throughput | Port aggregation

Scale Design Patterns

Common design patterns for different scale requirements:

Spine-Leaf Scale
  • 32 spine switches
  • 512 leaf switches
  • 16,384 server ports
  • Linear scale expansion
Multi-Pod Scale
  • Multiple spine-leaf pods
  • Super-spine interconnect
  • 100,000+ server ports
  • Hierarchical scalability

Monitoring Scale

Key metrics for monitoring VXLAN scale:

Scale Monitoring

# VNI utilization
show nve vni summary

# MAC table utilization  
show mac address-table count

# VTEP peer count
show nve peers summary

# Tunnel utilization
show interface tunnel brief | count

# Memory utilization
show system resources

Scale Considerations

  • Control Plane: BGP EVPN route scale
  • Convergence: Larger scale = slower convergence
  • Troubleshooting: Complexity increases with scale
  • Monitoring: More data to collect and analyze

Scaling Best Practices

  • Plan for 3-5 year growth
  • Implement hierarchical designs
  • Use automation for provisioning
  • Monitor scale metrics continuously
  • Design for failure scenarios