Clivia Systems
 

 

"THE FUTURE OF PROCESSING"

an unpublished article

By: Jeff Lawrence

A number of parallel forces are converging to create a new set of challenges in the network infrastructure. The nature of the content stored within the network, and the way in which it is presented to end users, is changing. Content used to be text and graphics. In the future, it will also be voice, audio and video. Entirely new classes of devices with varying display and input capabilities will proliferate in the home and enterprise. Getting the content to these devices quickly, reliably and with low delay, and then displaying it in a useful form, will be a challenge for the storage, transport and processing technologies of the network infrastructure.

The advance of storage, transport and processing technologies is described by a number of “laws.” These laws attempt to provide some predictability to an industry that is constantly changing. Some well-known laws include Metcalfe’s Law, Gilder’s Law and, of course, Moore’s Law. Moore’s Law states that the number of transistors in a given area (transistors per square centimeter) doubles every 18 months. Moore’s Law is alive and well, though a power wall is looming as a consequence of increasing transistor density and clock frequency. The power density of microprocessors is increasing from 1 watt per square centimeter to hundreds of watts per square centimeter. The current trend line leads to a future in which the power density of microprocessors will approach that of a nuclear reactor and even a rocket engine nozzle. Clearly, something along this path needs to change.

Gilder’s Law states that bandwidth (bits per second) doubles about every 9 to 12 months. People argue over exactly what it measures and how accurate its prediction is, but neither point is critically important here. What matters is that it is a graph of bits per second that goes up and to the right. Gilder’s Law and Moore’s Law are often put on the same graph to make the point that bandwidth is growing faster than processor performance. The problem with this comparison is that the two lines are described by different and unrelated units of measurement and, more importantly, that a third line is missing from the graph.

Network, application and algorithmic complexity are increasing. Metcalfe’s Law, as modified by Hundt and Goldberg, states that the value of the network increases as the square of the number of users, and in proportion to the content on the network and its accessibility. Another way to look at this is that as the content, accessibility and users increase, so does the complexity of the network. In addition to the network itself, application complexity is increasing as the network moves from text-based to signals-based (voice, audio, video) processing. This complexity is increasing not only because of the changing nature of the information, but also because of what is being done to it by compression, encoding, encryption and other algorithms. There are hundreds, if not thousands, of algorithms used in communications and they are evolving from the raw processing of bits to more complex and abstract processing. For example, early versions of MPEG perform bit compression on video, and future versions of MPEG will use fractal and wavelet modeling to represent video. Voice recognition today is focused on phoneme parsing and recognition. In the future, natural language processing will not only involve phoneme parsing and recognition but it will also involve semantic and contextual processing.

A bit is no longer just a bit. The service provided to an end user is the sum total of many individual services, applications and algorithms running across the network. A mobile wireless data service, for example, is a combination of many applications running on network elements such as storage devices, servers, trunking gateways, media gateways, location registers, radio interfaces and even the wireless client. A bit starts with a value at one end of the network and is processed many times as it is encoded, compressed, encrypted and acted upon in other ways towards its destination. The information content of a bit varies widely depending on where it is observed and measured within the network (figure 1).

The third line missing from the Moore’s Law and Gilder’s Law graph is “operations per bit” (figure 2). As described earlier, the number of operations performed per bit is increasing as network, application and algorithmic complexity increase. The slope of this third line is not yet known, but it is significant. When the operations per bit line is multiplied by the bits per second line, the resulting “operations per second” line increases faster than the Moore’s Law line.
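The multiplication described above can be sketched numerically. The following is a minimal illustration, not a measurement: the 18-month and 12-month doubling times come from the laws as quoted in this article, while the 36-month doubling time for operations per bit is purely an assumed placeholder, since the real slope is not known. The function and constant names are hypothetical.

```python
import math

def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` for a quantity that doubles
    every `doubling_months` months."""
    return 2.0 ** (years * 12.0 / doubling_months)

def combined_doubling_months(*doubling_months: float) -> float:
    """Doubling time of a product of exponentially growing quantities.

    When exponential curves are multiplied, their growth rates
    (1 / doubling time) add.
    """
    return 1.0 / sum(1.0 / d for d in doubling_months)

MOORE_MONTHS = 18.0         # Moore's Law, as quoted in the article
BANDWIDTH_MONTHS = 12.0     # slow end of the 9 to 12 month range quoted for Gilder's Law
OPS_PER_BIT_MONTHS = 36.0   # ASSUMPTION: illustrative only, the real slope is unknown

# "Operations per second" = (operations per bit) x (bits per second).
demand_months = combined_doubling_months(BANDWIDTH_MONTHS, OPS_PER_BIT_MONTHS)

if __name__ == "__main__":
    years = 3.0
    print(f"demand doubles every {demand_months:.1f} months")
    print(f"demand growth over {years:.0f} years: {growth_factor(years, demand_months):.1f}x")
    print(f"Moore's Law growth over {years:.0f} years: {growth_factor(years, MOORE_MONTHS):.1f}x")
```

Under these assumed numbers the demand line doubles roughly every 9 months, twice as fast as the Moore’s Law line. Any positive slope for operations per bit widens the gap further than bandwidth growth alone.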

The future is about transporting, cracking and stuffing packets at increasing wire and fiber speeds (figure 3). Ethernet clients will soon make the transition from 100 Mbps to 1 Gbps, and related transitions to 10 Gbps and then 40 Gbps will occur in the server, storage, Metropolitan Area Network (MAN) and Wide Area Network (WAN). The future unit of measurement in the network will be 10 Gbps. At 10 Gbps, the time between packet arrivals can be as short as 35 ns. This doesn’t give traditional processors operating at 1 GHz much time to do things, and it only gets worse at 40 Gbps, where the time between arrivals drops to 8 ns. A 1 GHz microprocessor can execute about 35 cycles in 35 ns, which simply isn’t enough to do everything needed by many applications. Sustainable packet processing at 10 Gbps can be offered only if: 1) processing for every packet is completed in less than 35 ns, 2) packets are aggregated (folded) and processed together in less than 35 ns, or 3) processing is broken up into multiple sequential stages (pipelined), with each stage completing in less than 35 ns. This is a significant performance challenge for traditional processing models.
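The timing budget above can be reproduced with a few lines of arithmetic. This is a rough sketch under a stated assumption: the article does not give a packet size, so a 40-byte packet is assumed here because it reproduces the 8 ns figure at 40 Gbps and lands close to the 35 ns figure at 10 Gbps. The function names are hypothetical, and the pipeline estimate idealizes the work as splitting evenly across stages.

```python
import math

def interarrival_ns(packet_bytes: int, line_rate_gbps: float) -> float:
    """Time between back-to-back packets in nanoseconds.

    bits / (Gbit/s) gives nanoseconds directly.
    """
    return packet_bytes * 8 / line_rate_gbps

def cycle_budget(packet_bytes: int, line_rate_gbps: float, clock_ghz: float) -> float:
    """Processor cycles available per packet (1 GHz = 1 cycle per ns)."""
    return interarrival_ns(packet_bytes, line_rate_gbps) * clock_ghz

def min_pipeline_stages(cycles_needed: int, packet_bytes: int,
                        line_rate_gbps: float, clock_ghz: float) -> int:
    """Smallest number of equal pipeline stages such that each stage
    fits inside one packet time (idealized: work divides evenly)."""
    budget = cycle_budget(packet_bytes, line_rate_gbps, clock_ghz)
    return math.ceil(cycles_needed / budget)

PACKET_BYTES = 40  # ASSUMPTION: packet size is not stated in the article

if __name__ == "__main__":
    for rate in (10.0, 40.0):
        print(f"{rate:.0f} Gbps: {interarrival_ns(PACKET_BYTES, rate):.0f} ns per packet, "
              f"{cycle_budget(PACKET_BYTES, rate, 1.0):.0f} cycles at 1 GHz")
    # A hypothetical application needing 200 cycles per packet at 10 Gbps:
    print("stages needed:", min_pipeline_stages(200, PACKET_BYTES, 10.0, 1.0))
```

With these assumptions a 1 GHz processor has about 32 cycles per packet at 10 Gbps, so a task needing 200 cycles would have to be spread over at least seven pipeline stages.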

New processing models are needed for the future. A packet-centric processor and logic-based architecture can provide a continuum of optimized processing (i.e., performance, power and cost) for the different phases of an application or service running at wire or fiber speed (figure 4). There is a wide range of processors and configurable/programmable logic elements available today, consisting of application processors (e.g., Intel® Pentium®, Intel® Itanium® and SPARC processors), control processors (e.g., Intel® XScale™, PowerPC and MIPS processors), packet processors (e.g., microengines), signal processors (e.g., DSPs), reprogrammable logic (e.g., FPGAs), and application-specific logic (e.g., ASICs). At first glance, these processor and logic elements may appear to be distinct and separate, but their different strengths can be positioned in a conceptual framework that provides a powerful and scalable processing model.

New processing models will be measured on their ability to perform along three different axes: statefulness, decision complexity and algorithmic complexity. The three axes define the space in which the particular capabilities of processor or logic elements may be mapped. Statefulness is that aspect of the application that must remember something about the past in order to make a decision in the present. Decision complexity represents the number of possible branches that may be taken on each bit of the packet. A typical application that may require high decision complexity would be intrusion detection or virus filtering. Algorithmic complexity, or algorithmic intensity as some call it, represents the number of arithmetic or logical operations that must be performed on a bit. A typical application that may require high algorithmic complexity would be media transcoding between MPEG-2 and MPEG-4 video streams. Applications and their “lifetime” can be plotted in the same space as the processor and logic elements to identify which elements are needed to efficiently support the different phases of an application (figure 5).
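One way to picture this plotting exercise is as a nearest-neighbor lookup in a three-axis space. The sketch below is only an illustration of the idea: every coordinate in it is a made-up placeholder on a 0 to 1 scale, not a measured property of any processor or application, and the names `Profile`, `ELEMENTS`, `PHASES` and `closest_element` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    """A point in the three-axis space described in the article."""
    statefulness: float
    decision_complexity: float
    algorithmic_complexity: float

def distance(a: Profile, b: Profile) -> float:
    """Euclidean distance between two points in the three-axis space."""
    return math.dist(
        (a.statefulness, a.decision_complexity, a.algorithmic_complexity),
        (b.statefulness, b.decision_complexity, b.algorithmic_complexity),
    )

def closest_element(phase: Profile, elements: dict) -> str:
    """Name of the element whose profile lies nearest to the phase."""
    return min(elements, key=lambda name: distance(phase, elements[name]))

# HYPOTHETICAL coordinates, chosen only to make the example concrete.
ELEMENTS = {
    "application processor": Profile(0.9, 0.8, 0.3),
    "packet processor": Profile(0.3, 0.4, 0.2),
    "signal processor": Profile(0.2, 0.2, 0.9),
    "reprogrammable logic": Profile(0.3, 0.9, 0.6),
}

# HYPOTHETICAL application phases placed in the same space.
PHASES = {
    "media transcoding": Profile(0.3, 0.2, 0.9),    # high algorithmic complexity
    "intrusion detection": Profile(0.4, 0.9, 0.5),  # high decision complexity
}

if __name__ == "__main__":
    for name, profile in PHASES.items():
        print(f"{name} -> {closest_element(profile, ELEMENTS)}")
```

A real mapping would need measured or at least agreed-upon coordinates for each axis. The point of the sketch is only that elements and application phases share one space, so matching them becomes a geometric question.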

The processor and logic elements can be tied together by a common interconnect technology and by software that spans them. We are about to enter a period of technological disruption as long-established and proven interconnect technologies start to be replaced by new serial, low-pin-count, low-voltage differential signaling and high-bandwidth solutions designed for the packet-based infrastructure. The leading candidates to drive this disruption appear to be PCI Express (formerly known as 3GIO), HyperTransport and RapidIO (RIO) for chip-to-chip and board-to-board interconnect. 1 Gbps and 10 Gbps Ethernet and InfiniBand will also be used for board-to-board and chassis-to-chassis interconnect.

In addition to the interconnect technology, compile-time and runtime software are needed to complete the picture. Applications will call a standard set of library functions and use programming interfaces such as sockets. The compile-time software will be able to partition applications into functions that can take advantage of the particular strengths of whatever processor and logic elements are available in the system (codecs will use any available signal processors, pattern matching will use any available reprogrammable logic, etc.). The goal is to create a software environment in which the application can transparently scale its performance and remain unaware of the underlying details of the processor and logic elements.
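The placement step described above can be sketched as a simple preference lookup with a fallback. This is a toy illustration, not a description of any existing compiler or runtime: the two preferences (codecs on signal processors, pattern matching on reprogrammable logic) come from the paragraph above, while the fallback to an application processor, the function names and the `place` helper are assumptions made for the example.

```python
# Preferred element types per kind of function, taken from the article's examples.
PREFERRED = {
    "codec": ["signal processor"],
    "pattern_match": ["reprogrammable logic"],
}

# ASSUMPTION: anything without an available preferred element runs here.
FALLBACK = "application processor"

def place(functions: dict, available: set) -> dict:
    """Assign each function to a processing element.

    `functions` maps a function name to its kind; `available` is the set of
    element types present in the system. The application never sees this
    mapping, which is what keeps scaling transparent.
    """
    placement = {}
    for name, kind in functions.items():
        candidates = PREFERRED.get(kind, [])
        placement[name] = next((e for e in candidates if e in available), FALLBACK)
    return placement

# Hypothetical application, partitioned into functions by kind.
APP_FUNCTIONS = {
    "decode_video": "codec",
    "scan_payload": "pattern_match",
    "update_session": "control",
}

if __name__ == "__main__":
    rich_system = {"application processor", "signal processor", "reprogrammable logic"}
    bare_system = {"application processor"}
    print(place(APP_FUNCTIONS, rich_system))
    print(place(APP_FUNCTIONS, bare_system))
```

The same application description produces different placements on the two systems, and the application code itself does not change.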

The diversity of technologies (e.g., analog, digital and MEMS) and operational domains (e.g., electrical, radio, optical and mechanical) in communications makes it very difficult to draw a boundary around a solution and compare it against another solution. In many cases, processing solutions are optimized for performance, power and other considerations by using different algorithms, architectures, data representations, numbering systems and other techniques. Architectural approaches focus on different processor, programmable logic and configurable logic combinations designed to operate in sequence or in parallel on a data flow. Different numbering systems and the manner in which data is represented can have an impact on the amount of transistor logic needed to process data (e.g., sections of logic or memory can be turned off to reduce power consumption). The unit of “operations per bit” offers a way to compare various solutions and also to tie Gilder’s Law “demand” to Moore’s Law “supply.”

The unit of “operations per bit” raises the question: what is an operation? An operation is a set of functions that are executed across some combination of hardware and software. The exact distribution of an operation depends on a number of tradeoffs and optimizations. Operations can be measured in units of execution, structure, energy or maybe even complexity. The challenge is to find a unit that can account for the distributed nature of where an operation is performed, and provide a useful method for comparing different processing solutions and the distribution of functions being performed. Some approaches might try to minimize power consumption and others might try to maximize flexibility. In the research community, a number of disparate and non-converging units of measurement are in use. The traditional measure of an operation could be an abstract unit against which IPS, MACS, FLOPS and OPS can all be normalized. A structural unit might be a 4-bit adder against which other logic, such as a multiplier or memory cell, could be normalized. The most interesting unit of measure for an operation may be the unit of energy called the Joule. A change in information content requires a change in energy. This implies that one way to measure the increase in operations per bit would be to measure the increase in Joules expended per bit. Software running on an application processor might expend more energy to perform a given operation than an ASIC because of differences in the number of transistors needed to execute a software program rather than application-specific logic. Joules per bit is interesting because, when multiplied by bandwidth, it yields a graph of “demand” in units of Joules per second (watts), which can be compared against Moore’s Law of transistors and the power they consume.
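The unit conversion in the last sentence is simple enough to check directly: Joules per bit times bits per second gives Joules per second, which is watts. The sketch below does only that arithmetic. Both energy-per-bit figures are invented placeholders used to show how a software and an ASIC implementation might be compared, not measurements of any real device, and the names are hypothetical.

```python
import math

def demand_watts(joules_per_bit: float, bandwidth_bps: float) -> float:
    """Power demand in watts: (J/bit) x (bit/s) = J/s."""
    return joules_per_bit * bandwidth_bps

LINE_RATE_BPS = 10e9           # 10 Gbps, the article's future unit of measurement
SOFTWARE_J_PER_BIT = 5e-9      # HYPOTHETICAL: 5 nJ per bit on an application processor
ASIC_J_PER_BIT = 0.5e-9        # HYPOTHETICAL: 0.5 nJ per bit in application-specific logic

if __name__ == "__main__":
    for label, energy in (("software", SOFTWARE_J_PER_BIT), ("ASIC", ASIC_J_PER_BIT)):
        print(f"{label}: {demand_watts(energy, LINE_RATE_BPS):.1f} W at 10 Gbps")
```

Expressed this way, two very different implementations of the same function end up on one axis, and the result can be set directly against a power budget.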

Existing processing models will prove to be insufficient for the next-generation network (NGN). The move from a text-based world to a signals-based world will drive the demand for new processing models. These new models will provide a continuum of processors and logic that are easily interconnected and designed to support transparent, scalable processing of text and non-text signals at wire and fiber speed. The shift to these new models will have a profound impact on the capabilities of the NGN and the types of processing offered by the network infrastructure.

The number of operations performed per bit is increasing in the NGN as network, application and algorithmic complexity increase. New approaches will be needed to provide the proper balance between performance, power and other factors, but technology and domain differences make comparisons difficult with existing units of measurement such as processor frequency. Measuring performance by frequency is becoming less relevant, while Joules per operation may be the most appropriate new unit: it provides a means to evaluate different approaches for distributing operations between hardware and software and, when used in conjunction with bandwidth, a means to evaluate overall system performance. The communications industry is at a crossroads, and it will need to develop and agree upon new measurements of performance that reflect the diverse nature of the technologies it has to offer and the solutions they enable.

The preceding article was based, in part, on some of my previous columns from Communications Solutions.

 

(c) 2002, Clivia Systems. All rights Reserved.
Last updated: Friday, September 13, 2002