Chelsio N210 10Gb Ethernet Network Controller

Driver Release Notes for Linux

This document describes the Linux driver for Chelsio 10Gb Ethernet Network
Controller. This driver supports the Chelsio N210 NIC and is backward
compatible with the Chelsio N110 model 10Gb NICs.

Adaptive Interrupts (adaptive-rx)
---------------------------------

This feature provides an adaptive algorithm that adjusts the interrupt
coalescing parameters, allowing the driver to dynamically adapt the latency
settings to achieve the highest performance during various types of network
load.

The interface used to control this feature is ethtool. Please see the
ethtool manpage for additional usage information.

By default, adaptive-rx is disabled.
To enable adaptive-rx:

      ethtool -C <interface> adaptive-rx on

To disable adaptive-rx:

      ethtool -C <interface> adaptive-rx off

After disabling adaptive-rx, the timer latency value will be set to 50us.
You may set the timer latency after disabling adaptive-rx:

      ethtool -C <interface> rx-usecs <microseconds>

An example to set the timer latency value to 100us on eth0:

      ethtool -C eth0 rx-usecs 100

You may also provide a timer latency value while disabling adaptive-rx:

      ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>

If adaptive-rx is disabled and a timer latency value is specified, the timer
will be set to the specified value until changed by the user or until
adaptive-rx is enabled.

To view the status of the adaptive-rx and timer latency values:

      ethtool -c <interface>

TCP Segmentation Offloading (TSO) Support
-----------------------------------------

This feature, also known as "large send", enables a system's protocol stack
to offload portions of outbound TCP processing to a network interface card,
thereby reducing system CPU utilization and enhancing performance.

The interface used to control this feature is ethtool version 1.8 or higher.
Please see the ethtool manpage for additional usage information.

By default, TSO is enabled.
To disable TSO:

      ethtool -K <interface> tso off

To enable TSO:

      ethtool -K <interface> tso on

To view the status of TSO:

      ethtool -k <interface>

The following information is provided as an example of how to change system
parameters for "performance tuning" and what values to use. You may or may not
want to change these system parameters, depending on your server/workstation
application. Doing so is not warranted in any way by Chelsio Communications,
and is done at "YOUR OWN RISK". Chelsio will not be held responsible for loss
of data or damage to equipment.

Your distribution may have a different way of doing things, or you may prefer
a different method. These commands are shown only to provide an example of
what to do and are by no means definitive.

Any of the following system changes will last only until you reboot your
system. You may want to write a script that runs at boot-up and applies
the optimal settings for your system.
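
As one possible starting point, the sketch below wraps a few of the sysctl
settings from this section in a shell function that a boot-time script could
call. The names run, apply_tuning, and DRY_RUN are illustrative assumptions
and are not part of the driver package. How the script is hooked into boot-up
depends on your distribution and is not shown.

```shell
#!/bin/sh
# Sketch of a boot-time tuning helper. The function names and the
# DRY_RUN variable are hypothetical; the sysctl values are taken from
# the examples in this document and may not suit every system.

# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Apply a subset of the example settings (requires root when not a dry run).
apply_tuning() {
    run sysctl -w net.ipv4.tcp_timestamps=0
    run sysctl -w net.ipv4.tcp_sack=0
    run sysctl -w net.core.rmem_max=1024000
    run sysctl -w net.core.wmem_max=1024000
}
```

Calling "DRY_RUN=1 apply_tuning" prints the commands without changing
anything, which may be useful for checking the script before running it
as root.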

Setting PCI Latency Timer:
      setpci -d 1425:* 0x0c.l=0x0000F800

Disabling TCP timestamp:
      sysctl -w net.ipv4.tcp_timestamps=0

Disabling SACK:
      sysctl -w net.ipv4.tcp_sack=0

Setting large number of incoming connection requests:
      sysctl -w net.ipv4.tcp_max_syn_backlog=3000

Setting maximum receive socket buffer size:
      sysctl -w net.core.rmem_max=1024000

Setting maximum send socket buffer size:
      sysctl -w net.core.wmem_max=1024000

Set smp_affinity (on a multiprocessor system) to a single CPU:
      echo 1 > /proc/irq/<interrupt_number>/smp_affinity

Setting default receive socket buffer size:
      sysctl -w net.core.rmem_default=524287

Setting default send socket buffer size:
      sysctl -w net.core.wmem_default=524287

Setting maximum option memory buffers:
      sysctl -w net.core.optmem_max=524287

Setting maximum backlog (# of unprocessed packets before kernel drops):
      sysctl -w net.core.netdev_max_backlog=300000

Setting TCP read buffers (min/default/max):
      sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"

Setting TCP write buffers (min/default/max):
      sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"

Setting TCP buffer space (min/pressure/max):
      sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"

TCP window size for single connections:
   The receive buffer (RX_WINDOW) size must be at least as large as the
   Bandwidth-Delay Product of the communication link between the sender and
   receiver. Due to the variations of RTT, you may want to increase the buffer
   size up to 2 times the Bandwidth-Delay Product. Reference page 289 of
   "TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.
   At 10Gb speeds, use the following formula:
       RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
       Example for an RTT of 100us (0.1ms):
       RX_WINDOW = (1,250,000 * 0.1) = 125,000 bytes
   RX_WINDOW sizes of 256KB - 512KB should be sufficient.
   Setting the min, default, and max receive buffer (RX_WINDOW) size:
       sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"
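
The formula above can be worked through with integer shell arithmetic. The
sketch below is only an illustration: the function name rx_window_bytes and
its optional headroom argument are assumptions made for this example. It
takes the RTT in microseconds, since 1.25MBytes per millisecond is the same
as 1250 bytes per microsecond.

```shell
#!/bin/sh
# Sketch: minimum RX_WINDOW in bytes for a 10Gb link.
# rx_window_bytes is a hypothetical helper name.
#   $1 = RTT in microseconds
#   $2 = optional headroom multiplier (default 1; the text suggests up to 2)
rx_window_bytes() {
    rtt_us=$1
    factor=${2:-1}
    # 10 Gb/s = 1.25 GB/s = 1250 bytes per microsecond
    echo $((1250 * rtt_us * factor))
}

# RTT of 100us, as in the example above
rx_window_bytes 100
```

With an RTT of 100us this yields 125000 bytes, matching the worked example;
passing 2 as the second argument doubles the result to allow for RTT
variation.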

TCP window size for multiple connections:
   The receive buffer (RX_WINDOW) size may be calculated in the same way as
   for a single connection, but should be divided by the number of
   connections. The smaller window prevents congestion and facilitates better
   pacing, especially if/when MAC level flow control does not work well or
   when it is not supported on the machine. Experimentation may be necessary
   to attain the correct value. This method is provided as a starting point
   for the correct receive buffer size.
   Setting the min, default, and max receive buffer (RX_WINDOW) size is
   performed in the same manner as for a single connection.
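
As a rough illustration of the per-connection calculation, the sketch below
divides the 10Gb Bandwidth-Delay Product by the number of connections. The
function name per_connection_window is an assumption made for this example,
and the result is only a starting point for experimentation, as noted above.

```shell
#!/bin/sh
# Sketch: per-connection RX_WINDOW in bytes for a 10Gb link.
# per_connection_window is a hypothetical helper name.
#   $1 = RTT in microseconds
#   $2 = number of concurrent connections (must be at least 1)
per_connection_window() {
    rtt_us=$1
    conns=$2
    if [ "$conns" -lt 1 ]; then
        echo "connection count must be at least 1" >&2
        return 1
    fi
    # 1250 bytes per microsecond at 10 Gb/s, shared across all connections
    echo $((1250 * rtt_us / conns))
}

# RTT of 100us shared by 4 connections
per_connection_window 100 4
```

For an RTT of 100us and 4 connections this gives 31250 bytes per connection.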

The following messages are the most common messages logged by syslog. These
may be found in /var/log/messages.

      Chelsio Network Driver - version 2.1.1

      eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit

      eth#: link is up at 10 Gbps, full duplex

The following issues were identified during testing, and a workaround is
provided for each. In some cases, the problem is inherent to Linux or to a
particular Linux distribution and/or hardware platform.

1. Large number of TCP retransmits on a multiprocessor (SMP) system.

   On a system with multiple CPUs, the interrupt (IRQ) for the network
   controller may be bound to more than one CPU. This will cause TCP
   retransmits if the packet data were to be split across different CPUs
   and re-assembled in a different order than expected.

   To eliminate the TCP retransmits, set smp_affinity on the particular
   interrupt to a single CPU. You can locate the interrupt (IRQ) used on
   the N110/N210 by using ifconfig:
      ifconfig <dev_name> | grep Interrupt
   Set the smp_affinity to a single CPU:
      echo 1 > /proc/irq/<interrupt_number>/smp_affinity

   It is highly suggested that you do not run the irqbalance daemon on your
   system, as this will change any smp_affinity setting you have applied.
   The irqbalance daemon runs on a 10 second interval and binds interrupts
   to the least loaded CPU determined by the daemon. To disable this daemon:
      chkconfig --level 2345 irqbalance off

   By default, some Linux distributions enable a kernel feature, irqbalance,
   which performs the same function as the daemon. To disable this feature,
   add the following option to the kernel line in your bootloader
   configuration:
      noirqbalance

      Example using the Grub bootloader:
         title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
         kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
         initrd /initrd-2.4.21-27.ELsmp.img

2. After running insmod, the driver is loaded and the incorrect network
   interface is brought up without running ifup.

   When using 2.4.x kernels, including RHEL kernels, the Linux kernel
   invokes a script named "hotplug". This script is primarily used to
   automatically bring up USB devices when they are plugged in; however,
   the script also attempts to automatically bring up a network interface
   after loading the kernel module. The hotplug script does this by scanning
   the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
   for HWADDR=<mac_address>.

   If the hotplug script does not find the HWADDR within any of the
   ifcfg-eth# files, it will bring up the device with the next available
   interface name. If this interface is already configured for a different
   network card, your new interface will have an incorrect IP address and
   network settings.

   To solve this issue, you can add the HWADDR=<mac_address> key to the
   interface config file of your network controller.

   To disable this "hotplug" feature, you may add the driver (module name)
   to the "blacklist" file located in /etc/hotplug. It has been noted that
   this does not work for network devices because the net.agent script
   does not use the blacklist file. Simply remove, or rename, the net.agent
   script located in /etc/hotplug to disable this feature.

3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
   on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset.

   If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
   chipset, you may experience the "133-MHz Mode Split Completion Data
   Corruption" bug identified by AMD while using a 133MHz PCI-X card on the
   PCI-X bus.

   AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
   can provide stale data via split completion cycles to a PCI-X card that
   is operating at 133 MHz", causing data corruption.

   AMD provides three workarounds for this problem; however, Chelsio
   recommends the first option for the best performance with this bug:

      For 133MHz secondary bus operation, limit the transaction length and
      the number of outstanding transactions, via BIOS configuration
      programming of the PCI-X card, to the following:

         Data Length (bytes): 1k
         Total allowed outstanding transactions: 2

   Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
   section 56, "133-MHz Mode Split Completion Data Corruption" for more
   details on this bug and the workarounds suggested by AMD.

   It may be possible to work outside AMD's recommended PCI-X settings; try
   increasing the Data Length to 2k bytes for increased performance. If you
   have issues with these settings, please revert to the "safe" settings
   and duplicate the problem before submitting a bug or asking for support.

   NOTE: The default setting on most systems is 8 outstanding transactions
         and 2k bytes data length.

4. On multiprocessor systems, it has been noted that an application which
   is handling 10Gb networking can switch between CPUs, causing degraded
   and/or unstable performance.

   If running on an SMP system and taking performance measurements, it
   is suggested you either run the latest netperf-2.4.0+ or use a binding
   tool such as Tim Hockin's procstate utilities (runon)
   <http://www.hockin.org/~thockin/procstate/>.

   Binding netserver and netperf (or other applications) to particular
   CPUs can make a significant difference in performance measurements.
   You may need to experiment to find which CPU to bind the application to
   in order to achieve the best performance for your system.

   If you are developing an application designed for 10Gb networking,
   please keep in mind you may want to look at the sched_setaffinity and
   sched_getaffinity system calls to bind your application.

   If you are just running user-space applications such as ftp, telnet,
   etc., you may want to try the runon tool provided by Tim Hockin's
   procstate utility. You could also try binding the interface to a
   particular CPU: runon 0 ifup eth0
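
The smp_affinity file used in issue 1 takes a hexadecimal CPU bitmask, so
the "echo 1" shown there selects CPU 0. If experimentation suggests binding
the interrupt to a different CPU, the mask can be computed as in the sketch
below. The function name cpu_affinity_mask is an assumption made for this
example.

```shell
#!/bin/sh
# Sketch: hexadecimal smp_affinity mask selecting a single CPU.
# cpu_affinity_mask is a hypothetical helper name.
#   $1 = zero-based CPU number
cpu_affinity_mask() {
    # Bit N of the mask corresponds to CPU N.
    printf '%x\n' $((1 << $1))
}

# Mask for CPU 2
cpu_affinity_mask 2

# Applying it requires root and the real interrupt number, for example:
#   echo "$(cpu_affinity_mask 2)" > /proc/irq/<interrupt_number>/smp_affinity
```

CPU 0 maps to mask 1, CPU 2 to mask 4, and CPU 4 to mask 10 (hexadecimal).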

If you have problems with the software or hardware, please contact our
customer support team via email at support@chelsio.com or check our website
at http://www.chelsio.com

===============================================================================

Chelsio Communications
http://www.chelsio.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Copyright (c) 2003-2005 Chelsio Communications. All rights reserved.

===============================================================================