Friday, May 2, 2008

Analysis of TCP Throughput Collapse in Ordinary Ethernet-based Clustered Storage Systems

Situation
Client access to data on a storage cluster or iSCSI-based storage system over ordinary Ethernet can be severely impaired, leaving the client with much lower read bandwidth than the configured network links should provide.

The following example depicts a client-initiated synchronized read operation across a simple clustered storage system.


Incast Problem Definition
Incast is a catastrophic TCP throughput collapse that occurs when the number of storage servers sending data to a client grows beyond the ability of an Ethernet switch to buffer a sufficient number of packets.

Anatomy of the Incast Problem
The Incast problem arises from a subtle interaction between depleted Ethernet switch buffers, cluster-centric communication patterns, and inadequate TCP loss-recovery mechanisms. A synchronized read operation of striped data from storage servers floods the switch buffers, leading to packet loss and TCP timeouts. Because striping also couples the behavior of multiple storage servers, a timeout on any one server stalls the whole read. Overall system latency can therefore rise to hundreds of milliseconds, if not more, which is at least an order of magnitude greater than typical data fetch times.
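The mechanism can be illustrated with a deliberately crude sketch. It is not the CMU simulation: the buffer size, per-server burst size, packet size, and the 200 ms retransmission timeout are all assumed values chosen for illustration, and the model ignores queue draining during the burst and TCP window dynamics. It only shows why goodput falls off a cliff once the combined burst no longer fits in the switch buffer.

```python
# Toy model of a synchronized read (all constants are illustrative assumptions).
LINK_BPS = 1_000_000_000   # assumed 1GE client link
PKT_BYTES = 1500           # assumed packet size
RTO_S = 0.2                # assumed minimum TCP retransmission timeout (200 ms)


def read_block(num_servers, pkts_per_server, buffer_pkts):
    """Return (dropped_packets, goodput_mbps) for one synchronized read.

    Every server bursts pkts_per_server packets at the same instant into a
    single switch output queue that holds buffer_pkts packets. If anything
    is dropped, the striped read cannot complete until one timeout expires.
    """
    offered = num_servers * pkts_per_server
    dropped = max(0, offered - buffer_pkts)
    bits = offered * PKT_BYTES * 8
    transfer_s = bits / LINK_BPS
    total_s = transfer_s + (RTO_S if dropped else 0.0)
    goodput_mbps = bits / total_s / 1e6
    return dropped, goodput_mbps


if __name__ == "__main__":
    for n in (4, 8, 9, 16):
        dropped, goodput = read_block(n, pkts_per_server=8, buffer_pkts=64)
        print(f"{n:2d} servers: dropped={dropped:3d} goodput={goodput:8.1f} Mbps")
```

In this sketch the read runs at line rate while the burst fits in the buffer. One server more and the fixed timeout dominates the transfer time, which is the qualitative shape of the collapse described above.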

The following graph illustrates TCP throughput collapse during synchronized read for a simple clustered storage system.


For details on the Incast problem, simulation, and real-world test results, please refer to the USENIX Association paper titled "Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems" by Amar Phanishayee et al. from Carnegie Mellon University (CMU). A copy of this paper can be found here.

Best-of-Breed Ordinary Ethernet Switch Behavior
Does the Incast problem occur in real-world storage clusters with best-of-breed Ethernet switches?

The CMU team analyzed the issue for the following three best-of-breed 1GE and 10GE switches:
1) HP ProCurve 2848
- 44x 1GE ports with 4x 1GE SFP ports, List Price: $3,299
2) Force10 S50
- 48x 1GE ports and up to 4x 10GE ports, List Price: $16,500
3) Force10 E1200
- Up to 1260x 1GE ports or 224x 10GE ports, List Price: >$500,000

The following graph pictorially depicts TCP throughput collapse in each scenario.


Observations
(1) Incast is a generic problem with ordinary Ethernet switches
(2) QoS implementation and memory allocation policies for buffer management are vendor specific
(3) QoS is typically implemented by partitioning output queues for each class of service. Disabling QoS increases the effective size of the output queues and can affect the onset of Incast
(4) Switch buffer sizes play an important role in mitigating Incast
(5) HP ProCurve 2848 uses small buffers. Incast-induced throughput collapse occurs around seven servers
(6) Force10 S50 allocates a relatively large amount of buffer space and switch resources to support QoS. With QoS disabled, Incast-induced throughput collapse occurs around 35 servers
(7) On Force10 E1200, Incast-induced throughput collapse occurs around 87 servers
(8) Status-quo Ethernet mechanisms are inadequate for handling mission-critical storage traffic in data centers. A fundamentally different approach is necessary.
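
The observation about QoS partitioning comes down to simple arithmetic, sketched below. The function name and all numbers are hypothetical and are not taken from any vendor's specification or from the CMU measurements. The sketch assumes a port's buffer is split evenly across its classes of service and that each server sends a fixed-size burst.

```python
def senders_before_overflow(port_buffer_pkts, qos_classes, burst_pkts):
    """Rough count of synchronized senders whose bursts fit in one output queue.

    Assumes the port buffer is divided evenly among qos_classes queues and
    that all storage traffic lands in a single class.
    """
    per_queue_pkts = port_buffer_pkts // qos_classes
    return per_queue_pkts // burst_pkts


if __name__ == "__main__":
    # Hypothetical port: 256-packet buffer, 8-packet burst per server.
    print("QoS enabled (4 classes):", senders_before_overflow(256, 4, 8))
    print("QoS disabled (1 class): ", senders_before_overflow(256, 1, 8))
```

Under these assumptions, disabling QoS gives the single remaining queue the whole buffer, so several times as many servers can send before the queue overflows. A larger buffer delays the onset of Incast but does not remove it.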

Thursday, May 1, 2008

AZ-10GE – Not Just Another Acronym – A Tectonic Shift!

Teak Technologies has pioneered a new category of scalable and standards-compliant switching solutions that deliver breakthrough price-performance and transform data center networks into an Applications Acceleration Zone (AAZ), a discontinuous innovation that forever alters the Ethernet switching landscape at a fundamental level.

Applications Acceleration Zone?
An Applications Acceleration Zone is an isolated data center network environment with tightly controlled levels of prevailing artifacts that reduce performance of distributed mission-critical applications.

An Applications Acceleration Zone is to a data center as a "clean room" is to a semiconductor facility - or, for the un-initiated, an "operating room" to a hospital.

Think of "germ-free." Think of "isolation." Think of "environmental pollutants." Think of "vital signs." Think of "life saving."

Clean/operating rooms are isolated environments with tightly controlled levels of contamination from pollutants. Isolation is just as critical while processing semiconductor wafers as it is for vital life saving purposes.

Collating the Concepts
AAZ enables applications to maintain their vital performance signs in distributed and virtualized data center environments.

LANs based on AZ-10GE allow performance-impacting traffic to cut through all networking artifacts including congestion. Innovative IT managers substitute AZ-10GE for ordinary 10GE in all mission critical applications with stringent requirements for performance, reliability, and predictability. AZ-10GE LANs have 4x fewer links - optimally utilized to their capacity, consume up to 4x less power, are simpler to manage, and reduce time-to-profitability.

Applications Acceleration Zone Attributes
- 10Gbps overlay network (AZ-10GE)
- Isolated environment delivers predictable application performance
- It is Ethernet - just simply a whole lot better
- All applications run unmodified
- Complementary approach. Requires no forklift upgrades
- Works with legacy 1GE and 10GE equipment
- Leverages portals or gateway entry points for legacy 1GE and ordinary 10GE applications

How Large is Large?
AZ-10GE switching solutions can be deployed everywhere in the data center - from within a blade server chassis, to aggregating rack server and storage traffic at a rack-level, and then to scaling linearly across the entire access layer. Innovative portal and gateway appliances also enable applications with ordinary 1GE and 10GE connectivity to participate in the Applications Acceleration Zone without requiring any fork-lift upgrades.

Acceleration Zone Scales Linearly Across the Data Center

Monday, April 28, 2008

Heralding the Dawn of Individualized Genomics

The 2.8 billion contiguous bits of genetic code - the human genome - hold an extraordinary trove of information about human development, physiology, medicine, and evolution.

Already the widely held notion that we have exactly the same genes in the human population is being challenged. The variations revealed in the new genome, dubbed "HuRef," go far beyond previously identified single nucleotide polymorphisms (SNPs), once thought to be the key to differences in human traits and disease susceptibility. New data shows that, in an individual genome, upwards of 44 percent of genes are variable in sequence.

Time's a Wasting!

How does HuRef data influence drug discovery? Will genetic variations allow for tailoring drug efficacy?

Drug discovery and formal FDA processes alone can sometimes take upwards of ten years.

What if your IT infrastructure could be designed to reduce this process time by three years? How many more lives could we save? What is the opportunity cost to a pharmaceutical company’s top-line revenue?

What role does AZ-10GE have to play in reducing the drug discovery timeline by thirty percent? Just ask our customers, or the computer geeks and molecular biologists amongst us.




Sunday, April 20, 2008

Boldness in Simplicity

The following results speak for themselves:


AZ-10GE Performance Metrics Across Realistic Traffic Profiles:

1) Predictable and Bounded Application Performance
2) Maximal and Stable Goodput (approaches 10Gbps link capacity)
3) Low, Stable, and Bounded Latency
4) Self-Tuned Performance: Adapts to changes in network topology, add/move, and traffic mix/volume

Goodput = Load-invariant lossless throughput across all traffic profiles



Ordinary 10GE Performance Metrics Across Realistic Traffic Profiles:

1) Unpredictable Application Performance
2) Unpredictable Goodput
3) Unpredictable Latency



Saturday, April 5, 2008

Impacting Application Performance Where it Matters: The Top Line

1) What is the opportunity cost of an average downtime trading minute?

2) What is the opportunity cost of a lost trade to the competition?

3) What is at stake? What is the worst case failure scenario?

While engaging a CIO in meaningful conversation on cutting-edge technology at any one of the major Wall Street investment banks, a vendor must necessarily be prepared to propose concise solutions to address these vexing questions.

Do your solutions measure up to the challenge?

1) The cost of an average downtime trading minute could run in the multi-millions of dollars.

2) Investment banks view their IT infrastructure as a competitive weapon. Every millisecond matters when it comes to execution, delivering performance, and maximizing revenue streams.

3) A network-induced application meltdown is any CIO's worst case failure scenario.

As data center applications are distributed and virtualized across clustered compute and storage resource pools, I/O and networking are fast becoming the performance choke point.

Mission-critical applications running on ordinary 10GE switching solutions are getting the short end of the performance stick. Application performance on such networks is at best unpredictable. At its worst, poor performance can lead to a network-induced application meltdown.

What are the core elements of designing a future-proof, next-generation data center network?