TCP offload engine

A TCP offload engine (TOE) is a technology used in some network interface cards (NICs) to offload processing of the entire TCP/IP stack to the network controller. It is primarily used with high-speed network interfaces, such as gigabit Ethernet and 10 Gigabit Ethernet, where the processing overhead of the network stack becomes significant. TOEs are often used as a way to reduce the overhead associated with Internet Protocol (IP) storage protocols such as iSCSI and the Network File System (NFS).

Purpose

Originally TCP was designed for unreliable, low-speed networks (such as early dial-up modems), but with the growth of the Internet in terms of backbone transmission speeds (Optical Carrier, Gigabit Ethernet and 10 Gigabit Ethernet links) and faster, more reliable access mechanisms (such as DSL and cable modems), TCP is frequently used in data centers and desktop PC environments at speeds of over 1 gigabit per second. At these speeds the TCP software implementations on host systems require significant computing power: in the early 2000s, full-duplex gigabit TCP communication could consume more than 80% of a 2.4 GHz Pentium 4 processor, leaving little or no processing capacity for the applications running on the system.

TCP is a connection-oriented protocol, which adds complexity and processing overhead. These aspects include:

- Connection establishment using the "three-way handshake" (SYNchronize; SYNchronize-ACKnowledge; ACKnowledge).
- Acknowledgment of packets as they are received by the far end, adding to the message flow between the endpoints and thus the protocol load.
- Checksum and sequence number calculations – again a burden on a general-purpose CPU to perform.
- Sliding window calculations for packet acknowledgement and congestion control.
- Connection termination.

Moving some or all of these functions to dedicated hardware, a TCP offload engine, frees the system's main CPU for other tasks. To give a concrete sense of the per-segment work involved, the sketch below computes the Internet checksum that the host CPU must otherwise evaluate for every TCP segment.
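
A minimal sketch of the Internet checksum (RFC 1071) that TCP applies to every segment – one of the per-packet computations a TOE takes off the host CPU. This is illustrative only: a real stack also folds a pseudo-header into the sum and uses heavily optimized (or hardware-assisted) implementations.

    def internet_checksum(data: bytes) -> int:
        """Return the 16-bit ones' complement checksum of `data` (RFC 1071)."""
        if len(data) % 2:                    # pad odd-length input with a zero byte
            data += b"\x00"
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]      # next 16-bit big-endian word
            total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
        return ~total & 0xFFFF

    print(hex(internet_checksum(b"example TCP segment payload")))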

Freed-up CPU cycles

A generally accepted rule of thumb is that 1 hertz of CPU processing is required to send or receive 1 bit/s of TCP/IP. For example, 5 Gbit/s (625 MB/s) of network traffic requires 5 GHz of CPU processing – two entire cores of a 2.5 GHz multi-core processor just to handle the TCP/IP processing associated with 5 Gbit/s of traffic. Since Ethernet (10GE in this example) is bidirectional, it is possible to send and receive 10 Gbit/s, for an aggregate throughput of 20 Gbit/s; by the 1 Hz/(bit/s) rule this equates to eight 2.5 GHz cores, as the short calculation below confirms.

Many of the CPU cycles used for TCP/IP processing are freed up by TCP/IP offload and may be used by the CPU (usually a server CPU) to perform other tasks, such as file system processing in a file server or indexing in a backup media server. In other words, a server with TCP/IP offload NICs can do more server work than one without.
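
The rule of thumb translates directly into a back-of-the-envelope calculation. A minimal sketch using the example figures from the text; the 1 Hz/(bit/s) rule is a rough heuristic, not a model of any particular CPU:

    import math

    def cores_needed(link_gbps: float, core_ghz: float, bidirectional: bool = False) -> int:
        """CPU cores consumed by TCP/IP processing under the 1 Hz/(bit/s) rule."""
        traffic_gbps = link_gbps * (2 if bidirectional else 1)
        return math.ceil(traffic_gbps / core_ghz)    # 1 GHz of CPU per 1 Gbit/s

    print(cores_needed(5, 2.5))                       # 2 cores for 5 Gbit/s, one direction
    print(cores_needed(10, 2.5, bidirectional=True))  # 8 cores for full-duplex 10GE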

Reduction of PCI traffic

In addition to the protocol overhead that TOE can address, it can also address some architectural issues that affect a large percentage of host-based (server and PC) endpoints. Many older endpoint hosts are PCI bus based; PCI provides a standard interface for adding peripherals such as network interfaces to servers and PCs. PCI is inefficient for transferring small bursts of data from main memory across the bus to the network interface ICs, but its efficiency improves as the data burst size increases. Within the TCP protocol, a large number of small packets are created (e.g. acknowledgements), and as these are typically generated on the host CPU and transmitted across the PCI bus and out the network physical interface, this impacts the host computer's I/O throughput.

A TOE solution, located on the network interface on the other side of the PCI bus from the host CPU, can address this I/O efficiency issue: the data to be sent across the TCP connection can be handed to the TOE across the PCI bus in large bursts, with none of the smaller TCP packets having to traverse the bus. The toy model below illustrates why larger bursts use the bus more efficiently.
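
A toy model of bus efficiency versus burst size. The per-transaction overhead and bytes-per-cycle figures are made-up illustrative parameters, not real PCI timings; the point is only that a fixed setup cost is amortized better by larger transfers:

    def bus_efficiency(payload_bytes: int, overhead_cycles: int = 8,
                       bytes_per_cycle: int = 4) -> float:
        """Fraction of bus cycles spent moving payload rather than per-transfer overhead."""
        data_cycles = payload_bytes / bytes_per_cycle
        return data_cycles / (data_cycles + overhead_cycles)

    for burst in (64, 512, 4096):    # a lone TCP ACK vs. progressively larger bursts
        print(f"{burst:5d}-byte burst: {bus_efficiency(burst):.0%} efficient")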

History

One of the first patents in this technology, for UDP offload, was issued to Auspex Systems in early 1990. Auspex founder Larry Boucher and a number of Auspex engineers went on to found Alacritech in 1997 with the idea of extending the concept of network stack offload to TCP and implementing it in custom silicon. They introduced the first parallel-stack full offload network card in early 1999; the company's SLIC (Session Layer Interface Card) was the predecessor to its later TOE offerings. Alacritech holds a number of patents in the area of TCP/IP offload.

By 2002, as the emergence of TCP-based storage such as iSCSI spurred interest, it was said that "At least a dozen newcomers, most founded toward the end of the dot-com bubble, are chasing the opportunity for merchant semiconductor accelerators for storage protocols and applications, vying with half a dozen entrenched vendors and in-house ASIC designs."

In 2005 Microsoft licensed Alacritech's patent base and, together with Alacritech, created the partial TCP offload architecture that has become known as TCP chimney offload, centered on the Alacritech "Communication Block Passing Patent". At the same time, Broadcom also obtained a license to build TCP chimney offload chips.

Types

Instead of replacing the TCP stack with a TOE entirely, there are alternative techniques that offload some operations in co-operation with the operating system's TCP stack. TCP checksum offload and large segment offload are supported by the majority of today's Ethernet NICs. Newer techniques like large receive offload and TCP acknowledgment offload are already implemented in some high-end Ethernet hardware, but are effective even when implemented purely in software.

Parallel-stack full offload

Parallel-stack full offload gets its name from the concept of two parallel TCP/IP stacks. The first is the main host stack, included with the host OS. The second, or "parallel stack", is connected between the application layer and the transport layer (TCP) using a "vampire tap". The vampire tap intercepts TCP connection requests made by applications and is responsible for TCP connection management as well as TCP data transfer. Many of the criticisms listed under "Support in Linux" below relate to this type of TCP offload.

HBA full offload

HBA (host bus adapter) full offload is found in iSCSI host adapters, which present themselves as disk controllers to the host system while connecting (via TCP/IP) to an iSCSI storage device. This type of TCP offload not only offloads TCP/IP processing but also offloads the iSCSI initiator function. Because the HBA appears to the host as a disk controller, it can only be used with iSCSI devices and is not appropriate for general TCP/IP offload.

TCP chimney partial offload

TCP chimney offload addresses the major security criticism of parallel-stack full offload: in partial offload, the main system stack controls all connections to the host. After a connection has been established between the local host (usually a server) and a foreign host (usually a client), the connection and its state are passed to the TCP offload engine. The heavy lifting of data transmit and receive is handled by the offload device; almost all TCP offload engines use some form of TCP/IP hardware implementation to perform the data transfer without host CPU intervention. When the connection is closed, the connection state is returned from the offload engine to the main system stack. Maintaining control of TCP connections allows the main system stack to implement and control connection security. The sketch below gives a rough picture of the per-connection state that changes hands.
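
A rough sketch of the per-connection state a host stack might hand to the offload engine at setup and receive back at teardown. The field names are illustrative assumptions, not the actual chimney interface defined by the Alacritech/Microsoft architecture:

    from dataclasses import dataclass

    @dataclass
    class TcpConnectionState:
        """Connection record passed between host stack and offload engine."""
        local_addr: str      # e.g. "192.0.2.10"
        local_port: int
        remote_addr: str
        remote_port: int
        snd_nxt: int         # next sequence number to send
        rcv_nxt: int         # next sequence number expected from the peer
        snd_wnd: int         # peer's advertised receive window
        mss: int             # negotiated maximum segment size

    # The host stack establishes and closes connections; the TOE only moves
    # data in between, so connection security stays under host control.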

Large receive offload

Large receive offload (LRO) is a technique for increasing the inbound throughput of high-bandwidth network connections by reducing central processing unit (CPU) overhead. It works by aggregating multiple incoming packets from a single stream into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed. Linux implementations generally use LRO in conjunction with the New API (NAPI) to also reduce the number of interrupts. The toy coalescer below illustrates the idea.
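
A toy illustration of receive coalescing, assuming in-order TCP payloads tagged with a flow identifier and sequence number. Real LRO/GRO runs in the driver or kernel and also checks TCP flags, checksums and timestamps before merging; none of that is modeled here:

    def coalesce(packets):
        """Merge contiguous (flow_id, seq, payload) tuples into larger buffers."""
        buf, flow, next_seq = bytearray(), None, None
        for flow_id, seq, payload in packets:
            if flow_id == flow and seq == next_seq:
                buf += payload                     # contiguous: grow the buffer
            else:
                if buf:
                    yield flow, bytes(buf)         # flush the previous run
                buf, flow = bytearray(payload), flow_id
                next_seq = seq
            next_seq += len(payload)
        if buf:
            yield flow, bytes(buf)

    pkts = [("A", 0, b"x" * 1460), ("A", 1460, b"y" * 1460), ("B", 0, b"z" * 100)]
    for flow, data in coalesce(pkts):
        print(flow, len(data))    # three packets reach the stack as two buffers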

According to benchmarks, even implementing this technique entirely in software can increase network performance significantly. As of April 2007 the Linux kernel supported LRO for TCP in software only; FreeBSD 8 supports LRO in hardware on adapters that support it.

LRO should not operate on machines acting as routers, as it breaks the end-to-end principle and can significantly impact performance.

Generic receive offload

Generic receive offload (GRO) implements a generalised LRO in software that is not restricted to TCP/IPv4 and avoids the issues created by LRO.

Large send offload

In computer networking, large send offload (LSO) is a technique for increasing the egress throughput of high-bandwidth network connections by reducing CPU overhead. It works by passing a multipacket buffer to the network interface card (NIC); the NIC then splits this buffer into separate packets. The technique is also called TCP segmentation offload (TSO) or generic segmentation offload (GSO) when applied to TCP. LSO and LRO are independent, and use of one does not require use of the other.

When a system needs to send large chunks of data out over a computer network, the chunks first need breaking down into smaller segments that can pass through all the network elements – such as routers and switches – between the source and destination computers. This process is referred to as segmentation. Often the TCP protocol in the host computer performs this segmentation; offloading this work to the NIC is called TCP segmentation offload (TSO).

For example, a unit of 64 KiB (65,536 bytes) of data is usually segmented into 45 segments of 1,460 bytes each before it is sent through the NIC and over the network. With some intelligence in the NIC, the host CPU can hand the 64 KiB of data to the NIC in a single transmit request; the NIC can break that data down into 1,460-byte segments, add the TCP, IP, and data link layer protocol headers – according to a template provided by the host's TCP/IP stack – to each segment, and send the resulting frames over the network. This significantly reduces the work done by the CPU. As of 2014, many new NICs on the market supported TSO. The segment arithmetic is spelled out below.
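
The 64 KiB example worked through in code. The 1,460-byte figure is the usual TCP payload per segment on a 1,500-byte Ethernet MTU (1,500 minus 20 bytes of IP header and 20 bytes of TCP header):

    import math

    MTU, IP_HDR, TCP_HDR = 1500, 20, 20
    mss = MTU - IP_HDR - TCP_HDR       # 1460-byte maximum segment size

    data = 64 * 1024                   # 65,536 bytes handed over in one request
    segments = math.ceil(data / mss)

    print(segments)                    # 45 segments (44 full, one partial)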

Some network cards implement TSO generically enough that it can be used to offload fragmentation of other transport layer protocols, or to perform IP fragmentation for protocols that do not support fragmentation themselves, such as UDP.

Support in Linux

Unlike other operating systems such as FreeBSD, the Linux kernel does not include support for TOE (not to be confused with other types of network offload). While there are patches from hardware manufacturers such as Chelsio or QLogic that add TOE support, the Linux kernel developers are opposed to the technology for several reasons:

- Security – because TOE is implemented in hardware, patches must be applied to the TOE firmware, instead of just software, to address any security vulnerabilities found in a particular TOE implementation. This is further compounded by the newness and vendor-specificity of the hardware, as compared to the well-tested TCP/IP stack of an operating system that does not use TOE.
- Limitations of hardware – because connections are buffered and processed on the TOE chip, resource starvation can occur more easily than with the generous CPU and memory available to the operating system.
- Complexity – TOE breaks the assumption that kernels make about having access to all resources at all times; details such as memory used by open connections are not available with TOE. TOE also requires very large changes to the networking stack to be supported properly, and even when that is done, features like quality of service and packet filtering might not work.
- Proprietary – TOE is implemented differently by each hardware vendor. This means more code must be rewritten to deal with the various TOE implementations, at a cost of the aforementioned complexity and, possibly, security. Furthermore, TOE firmware cannot easily be modified, since it is closed-source.
- Obsolescence – each TOE NIC has a limited lifetime of usefulness, because system hardware rapidly catches up to, and eventually exceeds, TOE performance levels.

Suppliers

Much of the current work on TOE technology is by manufacturers of 10 Gigabit Ethernet interface cards, such as Broadcom, Chelsio Communications, Emulex, Mellanox Technologies and QLogic.

See also

- Scalable Networking Pack
- I/O Acceleration Technology (I/OAT)
- Energy Efficient Ethernet (EEE)
- Autonomous peripheral operation
