Data Center

Optimizing and Scaling on a Leaf-Spine Architecture

Mike Peterson

The Internet of Things (IoT) and the proliferation of virtualization have caused traffic between devices in the data center to grow. This traffic, which flows back and forth between servers within the data center, is referred to as “east-west traffic.”

When you run heavy east-west traffic through a topology designed for north-south traffic (traffic that enters and exits the data center), devices connected to the same switch port may contend for bandwidth, and end users experience poor response times.

 

If hosts on one access switch need to communicate quickly with hosts on another access switch, the uplinks between the access layer and the aggregation layer can become a point of congestion. A common three-tier network design can worsen the issue by constraining where devices such as virtual servers can be placed.

 

Moving to a Leaf-Spine Architecture

That’s where leaf-spine architecture comes in, scaling horizontally through the addition of spine switches. In this two-layer topology, every leaf connects to every spine, so any two devices are always the same number of hops apart.

 

Because each leaf switch connects to every spine switch, the number of spine switches is limited by the number of uplink ports on the leaf. Many common leaf switches come with only four 40G QSFP+ uplink ports, capping the network at four spine switches and constraining scalability.

 

One way to achieve more scale is to break each 40G SR4 channel into four 10G duplex channels, turning the four 40G uplink ports into 16 available uplinks. This increases the number of spine switches that can be part of the mesh network to 16, providing four times the scalability.
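
To put rough numbers on this, here is a minimal sketch of the uplink math. The max_spines helper is purely illustrative, and the four-uplink leaf is simply the example used throughout this article; actual port counts vary by switch model.

```python
# Back-of-the-envelope sketch: how many spine switches one leaf can reach,
# given its uplink ports and an optional breakout factor.
# Illustrative only; real port counts depend on the switch model.

def max_spines(uplink_ports: int, breakout: int = 1) -> int:
    """Each uplink port (or breakout lane) connects to one spine switch."""
    return uplink_ports * breakout

print(max_spines(4))              # native 40G uplinks   -> 4 spines
print(max_spines(4, breakout=4))  # 40G split into 4x10G -> 16 spines
```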

 

Scaling Networks: 10G vs. 40G

Let’s use an example to compare scaling in leaf-spine architecture between 10G and 40G networks.

 

With 40G uplinks, the number of spine switches is fixed at four, because each leaf has four uplinks. Typically, each spine has a total of four line cards, with 36 40G ports per line card. That gives each spine 144 ports available to connect to leaf switches (4 line cards x 36 ports), so the fabric supports a maximum of 144 leaf switches. With 48 ports per leaf for connecting network devices, a maximum of 6,912 computers can connect to the 40G mesh network.

 

When you scale out on a 10G network, capacity increases by a factor of four. Each 40G uplink is broken into four 10G channels, allowing for 16 spine switches. With four line cards per spine and 36 40G ports per line card split into 10G legs, the fabric supports a maximum of 576 leaf switches (144 ports x 4). With 48 ports per leaf, you can connect 27,648 computers, four times the scale of the 40G mesh network.
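
The same arithmetic can be captured in a short sketch. The fabric_size function below is hypothetical; its inputs (four line cards per spine, 36 40G ports per card, 48 host-facing ports per leaf) simply mirror the example above.

```python
# Reproduce the fabric-size arithmetic from the 40G vs. 10G comparison.
# These values follow the example in the article, not hardware limits.

def fabric_size(line_cards_per_spine: int, ports_per_card: int,
                leaf_host_ports: int, breakout: int = 1) -> tuple[int, int]:
    """Return (max leaf switches, max attached hosts) for the fabric."""
    # Each spine-facing 40G port (or 10G breakout lane) serves one leaf uplink.
    max_leaves = line_cards_per_spine * ports_per_card * breakout
    return max_leaves, max_leaves * leaf_host_ports

print(fabric_size(4, 36, 48))              # 40G: (144 leaves, 6912 hosts)
print(fabric_size(4, 36, 48, breakout=4))  # 10G: (576 leaves, 27648 hosts)
```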

 

10G Channels: Potential Obstacles

Moving to four 10G channels in leaf-spine architecture introduces a new concern: latency (the amount of time it takes for a packet of information to travel from point A to point B) increases because the pipes are split into smaller lanes. Although aggregate throughput remains the same, each frame now has to be serialized onto a single 10G lane instead of a full 40G link, so latency increases.
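
A quick way to see the effect is serialization delay, the time it takes to clock a frame onto the wire. The sketch below assumes a standard 1,500-byte Ethernet frame and ignores encoding overhead, propagation delay and switch processing time; the helper name is ours, not a standard formula from any library.

```python
# Rough serialization-delay comparison for a single 1,500-byte frame.
# Ignores encoding overhead, propagation delay and switch processing time.

def serialization_delay_us(frame_bytes: int, link_gbps: float) -> float:
    """Microseconds needed to clock a frame onto the link."""
    return frame_bytes * 8 / (link_gbps * 1_000)  # bits / (Mbit/s) = microseconds

print(f"10G lane: {serialization_delay_us(1500, 10):.2f} us")  # ~1.20 us
print(f"40G link: {serialization_delay_us(1500, 40):.2f} us")  # ~0.30 us
```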

 

One of the biggest challenges in implementing a mesh network is cabling. Mesh networks require LC patch cords to create a cross-connect, ensuring that all leaf and spine switches are properly connected. The cross-connect is built in the main distribution area (MDA), which introduces several cabling issues: insertion loss, maintaining polarity, increased cable counts, etc. Rack challenges include density, required U space and power availability.

 

Creating the 10G channels requires a complex cross-connect. Each eight-fiber MPO port on the switch is broken out into LC duplex connections; 144 MPOs become 576 LC duplex connections per switch, for a total of 18,432 LC duplex ports (both sides of the cross-connect). Connecting the 10G channels to each leaf and spine requires a total of 9,216 LC duplex patch cords. As a result, additional challenges arise with MACs (moves, adds and changes), cable routing and space constraints.
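
As a sanity check, the port and patch-cord counts above can be reproduced with a few lines of arithmetic. The constants simply restate the example (16 spines, 144 MPO ports per spine, four LC duplex channels per eight-fiber MPO):

```python
# Cross-connect math for the 16-spine, 10G-breakout example above.
SPINES = 16
MPO_PORTS_PER_SPINE = 144   # 4 line cards x 36 ports
LC_DUPLEX_PER_MPO = 4       # an 8-fiber MPO carries four duplex (2-fiber) channels

lc_per_spine = MPO_PORTS_PER_SPINE * LC_DUPLEX_PER_MPO  # 576 per switch
patch_cords = SPINES * lc_per_spine                     # 9,216 LC duplex cords
cross_connect_ports = patch_cords * 2                   # 18,432 (both sides)

print(lc_per_spine, patch_cords, cross_connect_ports)
```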

 

This approach essentially breaks each MPO into four lanes terminated on LC connections. Each lane is then combined with lanes from other spines and converted back into an eight-fiber MPO (Base-8) carrying four channels from four different spine switches. Cable management, space utilization, documentation and labeling become extremely difficult to troubleshoot and maintain.

 

Shuffle Cassettes Save Space and Reduce Complexity

There’s a new leaf-spine architecture solution available that drastically reduces the amount of space needed, as well as the number of cables in the MDA: Belden shuffle cassettes.

These cassettes handle lane reassignments internally, eliminating the need for a cross-connect that separates the 40G channels into 10G channels and recombines them to connect to each leaf. Each shuffle cassette has four MPOs in and out; each leaf requires four shuffle cassettes.

 

Comparing Total Modules

Traditional MPO-LC-MPO     Belden Shuffle Cassette     Savings
704 modules                416 modules                 288 modules
176U space                 104U space                  72U (roughly 1.6 racks)
9,216 patch cords          2,304 patch cords           6,912 patch cords

By utilizing the same connector, reducing connections and standardizing on components across the channel, Belden’s shuffle cassettes allow for scaling in leaf-spine architecture, reduce the opportunity for human error, speed up deployment time and reduce time spent on MACs. Because the shuffle cassettes fit into any Belden housing, you also reclaim valuable floor space.

 

Learn more about Belden solutions that allow you to standardize and improve space utilization in your data center.

 

What did you think about this article? Be sure to share in the comments section below!