<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<title>Linux SG_IO ioctl in the 2.6 series</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="keywords" content="Linux, SCSI, SG_IO, ioctl">
<body alink="#ff0000" background="paper.jpg" bgcolor="#ffffff"
link="#0000ff" text="#000000" vlink="#000080">
<h1><a class="mozTocH1" name="mozTocId288504"></a> The Linux
SG_IO ioctl in the 2.6 series<br>
<a href="#Conclusion"></a>
<!--mozToc h1 1 h2 2 h3 3 h4 4 h5 5 h6 6--><li><a href="#mozTocId288504">The Linux
SG_IO ioctl in the 2.6 series </a>
<li><a href="#mozTocId857690">Introduction</a></li>
<li><a href="#mozTocId844428">SCSI and related
command sets </a></li>
<li><a href="#mozTocId568229">SG_IO ioctl overview</a></li>
<li><a href="#mozTocId575826">SG_IO ioctl in the sg
driver</a></li>
<li><a href="#mozTocId104192">SG_IO ioctl</a></li>
<li><a href="#mozTocId830340">open() considerations</a></li>
<li><a href="#mozTocId645134">SCSI command
permissions</a></li>
<li><a href="#mozTocId154063">Maximum transfer size per command</a></li>
<li><a href="#mozTocId267334">Conclusion</a></li>
<h2><a class="mozTocH2" name="mozTocId857690"></a>Introduction</h2>
The <span style="font-weight: bold;">SG_IO</span> ioctl permits user
applications to send SCSI commands to a device. In the Linux 2.4 series
this ioctl was only available via the SCSI generic (sg) driver. In the
Linux 2.6 series the SG_IO ioctl is additionally available for block
and SCSI tape (st) devices. The kernel therefore contains multiple
implementations of this ioctl with slightly different characteristics;
describing those differences is the purpose of this document.<br>
The information in this page is valid for Linux kernel 2.6.16.<br>
<h2><a class="mozTocH2" name="mozTocId844428"></a>SCSI and related
command sets</h2>
All SCSI devices should respond to an INQUIRY command, and part of their
response is the so-called peripheral device type. The
Linux kernel uses this to decide which upper level driver controls the device.
Devices on other transports (i.e. those not considered SCSI) may also
use SCSI command sets; the primary examples are
(S-)ATAPI CD and DVD drives. Not all peripheral device types map to
upper level drivers; devices of such types are usually accessed via
the SCSI generic (sg) driver.<br>
SCSI (draft) standards are found at <a href="http://www.t10.org/">www.t10.org</a>.
SCSI commands common to all
SCSI devices are found in SPC-4, those specific to block devices
in SBC-2, those for CD/DVD drives in MMC-5 and
those for SCSI tape drives in SSC-3.<br>
The major non-SCSI command set in the storage area is for ATA
<span style="font-style: italic;">non-packet</span> devices, which are
typically disks. ATA <span style="font-style: italic;">packet</span>
devices use ATAPI, which in
the vast majority of cases carries a SCSI command set. The most recent
draft ATA command set standard is ATA8-ACS and can be found at <a
href="http://www.t13.org/">www.t13.org</a>. To complicate matters,
(non-packet) ATA devices may have their native command set translated
into SCSI. This can happen in the kernel (e.g. libata in Linux) or in
an intermediate device (e.g. in a USB external disk enclosure). Yet
another possibility is disks
whose firmware can be changed to allow them to use either the SCSI or
ATA command set; this may happen in the SAS/SATA area since the
physical (cabling) and phy (electrical signalling) levels are so
<h2><a class="mozTocH2" name="mozTocId568229"></a>SG_IO ioctl overview</h2>
The third argument given to the SG_IO ioctl is a pointer to an instance
of the sg_io_hdr structure, which is defined in the &lt;scsi/sg.h&gt;
header file. The execution of the SG_IO ioctl can be viewed as going
through three phases:<br>
<ol>
<li>do sanity checks on the metadata in the sg_io_hdr instance; read
the input fields and the data pointed to by some of those fields; build
a SCSI command and issue it to the device</li>
<li>wait for either a response from the device, the command to
time out, or
the user to terminate the process (or thread) that invoked the SG_IO
ioctl</li>
<li>write the output fields and in some cases write data to locations
pointed to by some fields, then return</li>
</ol>
Only phase 1 returns an ioctl error (i.e. a return value of -1 with
errno set). In phase 2, command timeouts should be used sparingly as
the device (and some others on the same interconnect) may end up being
reset. If the user terminates the process or thread that invoked the
SG_IO ioctl then phase 3 never occurs, but the command
execution runs to completion (or timeout) and the kernel "throws away"
the results. If the command yields a SCSI status of CHECK
CONDITION (in field "status") then sense data is written out in
Now assume that the SCSI command involves user data being
transferred to or from the device. The SCSI subsystem does not support
true bidirectional data transfers to a device. All DMA data transfers
(assuming the hardware supports DMA) occur in phase 2. However, if
indirect IO is being used (i.e. neither direct IO nor mmap-ed
transfers) then either:<br>
<ul>
<li>data is read from user space in phase 1 into kernel buffers
and DMA-ed to the device in phase 2, or</li>
<li>data is read from the device into kernel buffers in phase 2 and
written into user space in phase 3</li>
</ul>
When direct IO or mmap-ed transfers are being used, all user data
is moved in phase 2. If a process is terminated during such a data
transfer, the kernel handles this gracefully (by pinning the
associated memory pages until the transfer is complete).<br>
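As a hedged illustration of a command that moves data from the device to user memory, the sketch below prepares a standard INQUIRY. The helper names prep_inquiry() and bytes_received(), and the two macros, are hypothetical and not part of any library; the 30,000 ms timeout is only an assumed default.<br>

```c
#include <string.h>
#include <scsi/sg.h>    /* struct sg_io_hdr, SG_DXFER_FROM_DEV */

#define INQUIRY_OPCODE    0x12   /* INQUIRY operation code (SPC) */
#define SKETCH_TIMEOUT_MS 30000  /* assumed default: 30 seconds in milliseconds */

/* Hypothetical helper: set up hdr and cdb for a standard INQUIRY that
 * reads up to resp_len bytes from the device into resp. */
void prep_inquiry(struct sg_io_hdr *hdr, unsigned char cdb[6],
                  unsigned char *resp, unsigned char resp_len,
                  unsigned char *sense, unsigned char sense_len)
{
    memset(cdb, 0, 6);
    cdb[0] = INQUIRY_OPCODE;
    cdb[4] = resp_len;                /* allocation length (low byte) */

    memset(hdr, 0, sizeof(*hdr));     /* unused input fields get safe values */
    hdr->interface_id = 'S';
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;   /* device to user memory */
    hdr->cmd_len = 6;
    hdr->mx_sb_len = sense_len;
    hdr->dxfer_len = resp_len;
    hdr->dxferp = resp;
    hdr->cmdp = cdb;
    hdr->sbp = sense;
    hdr->timeout = SKETCH_TIMEOUT_MS;
}

/* Hypothetical helper: bytes actually placed in the user buffer, derived
 * from the residual count that the ioctl writes in phase 3. */
int bytes_received(const struct sg_io_hdr *hdr)
{
    return (int)hdr->dxfer_len - hdr->resid;
}
```

After a successful ioctl(fd, SG_IO, &amp;hdr) on an open device node, bytes_received() reports how much of the buffer holds valid data. With indirect IO that data reaches the user buffer in phase 3.<br>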
The sg_io_hdr structure has 22 fields (members) but typically only a
small number of them need to be set. The following code fragment shows
the setup for a simple TEST UNIT READY SCSI command which has no
associated data transfers:<br>
<br>
<span style="font-family: monospace;">/* TUR_CMD (the TEST UNIT READY opcode) and DEF_TIMEOUT (in milliseconds)<br>
&nbsp;&nbsp; are assumed to be defined elsewhere */<br>
unsigned char sense_b[32];<br>
unsigned char turCmbBlk[] = {TUR_CMD, 0, 0, 0, 0, 0};<br>
struct sg_io_hdr io_hdr;<br>
<br>
memset(&amp;io_hdr, 0, sizeof(struct sg_io_hdr)); /* clear unused input fields */<br>
io_hdr.interface_id = 'S';<br>
io_hdr.cmd_len = sizeof(turCmbBlk);<br>
io_hdr.mx_sb_len = sizeof(sense_b);<br>
io_hdr.dxfer_direction = SG_DXFER_NONE; /* no data transfer */<br>
io_hdr.cmdp = turCmbBlk;<br>
io_hdr.sbp = sense_b;<br>
io_hdr.timeout = DEF_TIMEOUT;<br>
<br>
if (ioctl(fd, SG_IO, &amp;io_hdr) &lt; 0) {<br>
&nbsp;&nbsp;&nbsp; /* the ioctl itself failed: examine errno */<br>
}</span><br>
The memset() call is important: it sets unused input fields to
safe values. Setting the timeout field to zero is not a good idea;
30,000 milliseconds (30 seconds) is a reasonable default for most SCSI commands.
As always, good error
processing consumes a lot more code. This is especially the case with
SCSI commands that yield "sense data" when something goes wrong. For
example, if there is a medium error during a disk read, the sense data
will contain the logical block address (lba) of the failure. Another
error processing example is a SCSI command that the device rejects as an
"illegal request": the sense data may show the byte and bit position of the
field in the command block (usually referred to as a "cdb") that it
objects to. For examples of error processing please refer to the
sg3_utils package, its "examples" directory and its library components:
sg_lib.c (SCSI error processing and tables) and sg_cmds.c (common SCSI
Below is a grouping of important sg_io_hdr structure fields with brief
descriptions.<br>
Command block (historically referred to as the "cdb"):<br>
<ul>
<li>cmdp - pointer to cdb (the SCSI command block)</li>
<li>cmd_len - length (in bytes) of cdb</li>
</ul>
Data transfer:<br>
<ul>
<li>dxferp - pointer to user data to start reading from or start
writing to</li>
<li>dxfer_len - number of bytes to transfer</li>
<li>dxfer_direction - whether to read from device (into user memory)
or write to device (from user memory) or transfer no data:
SG_DXFER_FROM_DEV, SG_DXFER_TO_DEV or SG_DXFER_NONE respectively</li>
<li>resid - requested number of bytes to transfer (i.e. dxfer_len)
less the actual number transferred</li>
</ul>
Error indication:<br>
<ul>
<li>status - SCSI status returned from the device</li>
<li>host_status - error from Host Bus Adapter including initiator</li>
<li>driver_status - driver (mid level or low level driver) error and
suggestion</li>
</ul>
Sense data (only used when 'status' is CHECK CONDITION or
(driver_status &amp; DRIVER_SENSE) is true):<br>
<ul>
<li>sbp - pointer to start writing sense data to</li>
<li>mx_sb_len - maximum number of bytes to write to sbp</li>
<li>sb_len_wr - actual number of bytes written to sbp</li>
</ul>
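The error indication and sense data fields listed above might be examined along the following lines once the ioctl has returned 0. This is only a sketch: classify(), sense_key(), the enum and both macro names are hypothetical, and the value 0x08 is assumed to match DRIVER_SENSE in the kernel's scsi.h, which user space headers do not necessarily expose.<br>

```c
#include <scsi/sg.h>    /* struct sg_io_hdr */

#define SAM_CHECK_CONDITION 0x02  /* SCSI status byte for CHECK CONDITION */
#define DRV_SENSE_BIT       0x08  /* assumed value of the kernel's DRIVER_SENSE */

enum sg_outcome { SG_OUT_OK, SG_OUT_SENSE, SG_OUT_TRANSPORT, SG_OUT_STATUS };

/* Hypothetical helper: coarse classification of a completed SG_IO call. */
enum sg_outcome classify(const struct sg_io_hdr *hdr)
{
    /* Sense data is only meaningful in these two cases. */
    if (hdr->status == SAM_CHECK_CONDITION ||
        (hdr->driver_status & DRV_SENSE_BIT))
        return SG_OUT_SENSE;
    /* Host adapter or driver level problem, no sense data to decode. */
    if (hdr->host_status != 0 || hdr->driver_status != 0)
        return SG_OUT_TRANSPORT;
    /* Some other non-zero SCSI status (e.g. BUSY). */
    if (hdr->status != 0)
        return SG_OUT_STATUS;
    return SG_OUT_OK;
}

/* Hypothetical helper: extract the sense key from the first len bytes of
 * sense data (pass sb_len_wr). Returns -1 if the data is too short or
 * the response code is not recognised. */
int sense_key(const unsigned char *sb, int len)
{
    if (len < 3)
        return -1;
    switch (sb[0] & 0x7f) {
    case 0x70: case 0x71:            /* fixed format sense data */
        return sb[2] & 0x0f;
    case 0x72: case 0x73:            /* descriptor format sense data */
        return sb[1] & 0x0f;
    default:
        return -1;
    }
}
```

A caller would typically invoke classify() first and only call sense_key(io_hdr.sbp, io_hdr.sb_len_wr) when the outcome is SG_OUT_SENSE. Complete error processing, as found in sg3_utils, covers many more cases.<br>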
The fields in the sg_io_hdr structure are defined in more detail in the
<a href="http://www.tldp.org/HOWTO/SCSI-Generic-HOWTO/index.html">SCSI-Generic-HOWTO</a>.<br>
<h2><a class="mozTocH2" name="mozTocId575826"></a>SG_IO ioctl in the sg
driver</h2>
Linux kernel 2.4.0 was the first production kernel in which the SG_IO
ioctl appeared in the SCSI generic (sg) driver. The sg driver itself
has been in Linux since around 1993.
An instance of the sg_io_hdr structure in the sg driver can be either:<br>
<ul>
<li>pointed to by the third argument of the SG_IO ioctl</li>
<li>pointed to by the second argument of UNIX write() or read()
calls which have a file descriptor of an sg device node as their first
argument</li>
</ul>
The <a href="http://www.tldp.org/HOWTO/SCSI-Generic-HOWTO/index.html">SCSI-Generic-HOWTO</a>
document describes the sg driver in the Linux kernel (lk) 2.4 series, including its use
of the SG_IO ioctl. Prior to the lk 2.4 series the sg driver only had
the sg_header structure. It was used as an asynchronous
command interface in which the command, metadata and optionally user data
were sent via a Unix write() system call. The corresponding response,
with error information (e.g. sense data) and optionally user data, was
received via a Unix read() system call. Two major additions were made
to the sg
driver at the beginning of the lk 2.4 series:<br>
<ul>
<li>a new metadata structure (sg_io_hdr) as an alternative to the
original mixed metadata and data structure (sg_header)</li>
<li>the SG_IO ioctl, which uses the new metadata structure and is
synchronous: it sends a SCSI command and waits for its reply</li>
</ul>
The sg_io_hdr structure contains only metadata, in the sense that it holds
pointers to the locations that data will come from (command or data in)
or go to (sense data or data out). These pointers have caused problems
in mixed 32/64 bit environments, especially when the user application
(e.g. cdrecord) is built for 32 bits and the kernel is 64 bits. The lk
2.6 series has a compatibility layer to cope with this via code
specialized for the SG_IO ioctl. Unfortunately this problem was not
foreseen when the sg_io_hdr structure was designed.<br>
A significant feature of the SG_IO ioctl in the sg driver is that it
is user interruptible. This means that between issuing a command (e.g. a
long duration command like a disk format) and its response arriving, a
user could hit control-C on the associated application. The kernel
would remain stable and resources would be cleaned up at the
appropriate time. The sg driver does not attempt to abort such a
command that is "in flight"; it simply throws away the response and
cleans up. Naturally the user has no direct way of finding out whether
an interrupted command succeeded or not, but there may be indirect ways.<br>
A warning may also be in order here: a long duration command such as
format would typically be given a long timeout value. If the user
interrupted the application that sent the format command then the
device would remain
busy doing the format (especially if the IMMED bit is not set). So if
the user then sent a short duration command such as TEST UNIT READY or
REQUEST SENSE to see what the device was doing, these commands may
time out. This would invoke the SCSI subsystem error handler which would
send a device reset, thus aborting the format, to get the device's
attention. This is probably not what the user had in mind!<br>
<h2><a class="mozTocH2" name="mozTocId104192"></a>SG_IO ioctl</h2>
In the following table,
sg_io_hdr structure fields are listed in the order they appear in that
structure. Basically the "in" fields appear at the top of the structure
and are read in phase 1. The later fields are termed "out" and are
written by the SG_IO implementation in phase 3.<br>
<table style="width: 100%; text-align: left;" border="1" cellpadding="2">
<caption><span style="font-weight: bold;">Table 1. sg_io_hdr
structure summary and implementation differences</span><br>
</caption>
<tr>
<td style="vertical-align: top;"><span style="font-weight: bold;">sg_io_hdr</span></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">in</span></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">type</span><br></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">different</span><br></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">brief
description including differences between implementations</span><br></td>
</tr>
<tr>
<td style="vertical-align: top;">interface_id<br></td>
<td style="vertical-align: top;">in<br></td>
<td style="vertical-align: top;">int<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">guard field. Current
implementations only accept "(int)'S'". If not set, the sg driver
sets errno to ENOSYS
while the block layer sets it to EINVAL<br></td>
</tr>
<tr>
<td style="vertical-align: top;">dxfer_direction<br></td>
<td style="vertical-align: top;">in<br></td>
<td style="vertical-align: top;">(-ve) int<br></td>
<td style="vertical-align: top;">minor<br></td>
<td style="vertical-align: top;">direction of data transfer.
SG_DXFER_NONE and friends are defined as negative integers so the sg
driver can discriminate between sg_io_hdr instances and those of
sg_header. This nuance is irrelevant to non-sg driver usage of SG_IO.</td>
</tr>
<tr>
<td style="vertical-align: top;">cmd_len<br></td>
<td style="vertical-align: top;">in<br></td>
<td style="vertical-align: top;">unsigned char<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">limits command length to 255
bytes. No SCSI commands (even variable length ones in OSD) are this
long</td>
</tr>
<tr>
<td style="vertical-align: top;">mx_sb_len<br></td>
<td style="vertical-align: top;">in<br></td>
<td style="vertical-align: top;">unsigned char<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">maximum number of bytes of sense
data that the driver can output via the sbp pointer<br></td>
</tr>
<tr>
<td style="vertical-align: top;">iovec_count<br></td>
<td style="vertical-align: top;">in<br></td>
<td style="vertical-align: top;">unsigned short<br></td>
<td style="vertical-align: top;">yes<br></td>
<td style="vertical-align: top;">if not sg driver and greater
than zero then the SG_IO ioctl fails with errno set to EOPNOTSUPP; the sg
driver treats dxferp as a pointer to an array of struct sg_iovec when this
field is greater than zero<br></td>
</tr>
<tr>
<td style="vertical-align: top;">dxfer_len<br></td>
<td style="vertical-align: top;">in<br></td>
<td style="vertical-align: top;">unsigned int<br></td>
<td style="vertical-align: top;">minor<br></td>
<td style="vertical-align: top;">number of bytes of data to
transfer to or from the device. The upper limit for block devices is related
to <span style="font-family: monospace;">/sys/block/&lt;device&gt;/queue/max_sectors_kb</span><br></td>
</tr>
<tr>
<td style="vertical-align: top;">dxferp </td>
<td style="vertical-align: top;">in [*in or *out]<br></td>
<td style="vertical-align: top;">void *<br></td>
<td style="vertical-align: top;">minor<br></td>
<td style="vertical-align: top;">pointer to (user space) data to
transfer to (if reading from device) or transfer from (if writing to
device). Further level of indirection in the sg driver when iovec_count
is greater than 0.<br></td>
</tr>
<tr>
<td style="vertical-align: top;">cmdp </td>
<td style="vertical-align: top;">in [*in]<br></td>
<td style="vertical-align: top;">unsigned char *<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">pointer to SCSI command. The
SG_IO ioctl in the sg driver fails with errno set to EMSGSIZE if
cmdp is NULL and EFAULT if it is invalid; the block layer sets errno to
EFAULT in both cases.<br></td>
</tr>
<tr>
<td style="vertical-align: top;">sbp<br></td>
<td style="vertical-align: top;">in [*out]<br></td>
<td style="vertical-align: top;">unsigned char *<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">pointer to user data area where
no more than mx_sb_len bytes of sense data from the device will be
written if the SCSI status is CHECK CONDITION. <br></td>
</tr>
<tr>
<td style="vertical-align: top;">timeout<br></td>
<td style="vertical-align: top;">in<br></td>
<td style="vertical-align: top;">unsigned int<br></td>
<td style="vertical-align: top;">yes <br></td>
<td style="vertical-align: top;">time in milliseconds that the
SCSI mid-level will wait for a response. If that timer expires
before the command finishes, then the command may be aborted and the
device (and maybe others on the same interconnect) may be reset,
depending on error
handler settings. Dangerous stuff: the SG_IO ioctl has no control
(through this interface) of exactly what happens. In the sg driver a
timeout value of 0 means 0 milliseconds; in the block layer (currently)
it means 60 seconds.<br></td>
</tr>
<tr>
<td style="vertical-align: top;">flags<br></td>
<td style="vertical-align: top;">in<br></td>
<td style="vertical-align: top;">unsigned int<br></td>
<td style="vertical-align: top;">yes<br></td>
<td style="vertical-align: top;">Block layer SG_IO ioctl ignores
this field; the sg driver uses it to request special services like
direct IO or mmap-ed transfers. It is a bit mask.<br></td>
</tr>
<tr>
<td style="vertical-align: top;">pack_id<br></td>
<td style="vertical-align: top;">in -&gt; out<br></td>
<td style="vertical-align: top;">int<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">unused (for user space program</td>
</tr>
<tr>
<td style="vertical-align: top;">usr_ptr<br></td>
<td style="vertical-align: top;">in -&gt; out<br></td>
<td style="vertical-align: top;">void *<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">unused (for user space pointer</td>
</tr>
<tr>
<td style="vertical-align: top;">status<br></td>
<td style="vertical-align: top;">out<br></td>
<td style="vertical-align: top;">unsigned char<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">SCSI command status, zero</td>
</tr>
<tr>
<td style="vertical-align: top;">masked_status<br></td>
<td style="vertical-align: top;">out<br></td>
<td style="vertical-align: top;">unsigned char<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">Logically: masked_status ==
((status &amp; 0x3e) &gt;&gt; 1). Old Linux SCSI subsystem usage,</td>
</tr>
<tr>
<td style="vertical-align: top;">msg_status<br></td>
<td style="vertical-align: top;">out<br></td>
<td style="vertical-align: top;">unsigned char<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">SCSI parallel interface (SPI)</td>
</tr>
<tr>
<td style="vertical-align: top;">sb_len_wr<br></td>
<td style="vertical-align: top;">out<br></td>
<td style="vertical-align: top;">unsigned char<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">actual length of sense data (in
bytes) output via sbp pointer.<br></td>
</tr>
<tr>
<td style="vertical-align: top;">host_status<br></td>
<td style="vertical-align: top;">out<br></td>
<td style="vertical-align: top;">unsigned short<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">error reported by the initiator
(port). These are the "DID_*" error codes in scsi.h<br></td>
</tr>
<tr>
<td style="vertical-align: top;">driver_status<br></td>
<td style="vertical-align: top;">out<br></td>
<td style="vertical-align: top;">unsigned short<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">bit mask: error and suggestion
reported by the low level driver (LLD). These are the "DRIVER_*" error</td>
</tr>
<tr>
<td style="vertical-align: top;">resid<br></td>
<td style="vertical-align: top;">out<br></td>
<td style="vertical-align: top;">int<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">(dxfer_len -
number_of_bytes_actually_transferred). Typically only set when there is
a shortened DMA transfer from the device. Not necessarily an
error. Older LLDs always yield zero.</td>
</tr>
<tr>
<td style="vertical-align: top;">duration<br></td>
<td style="vertical-align: top;">out<br></td>
<td style="vertical-align: top;">unsigned int<br></td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">number of milliseconds that
elapsed between when the command was injected into the SCSI mid level
and when the corresponding "done" callback was invoked. Roughly the duration
of the SCSI command in milliseconds.<br></td>
</tr>
<tr>
<td style="vertical-align: top;">info<br></td>
<td style="vertical-align: top;">out<br></td>
<td style="vertical-align: top;">unsigned int<br></td>
<td style="vertical-align: top;">minor<br></td>
<td style="vertical-align: top;">bit mask indicating what was
done (or not) and whether any error was detected. Block layer SG_IO
ioctl only sets SG_INFO_CHECK if an error was detected<br></td>
</tr>
</table>
The DID_* and DRIVER_* error and suggestion codes (associated with
host_status and driver_status) are discussed in more detail in the
<a href="http://www.tldp.org/HOWTO/SCSI-Generic-HOWTO/index.html">SCSI-Generic-HOWTO</a>.<br>
<h2><a class="mozTocH2" name="mozTocId830340"></a>open() considerations</h2>
Different drivers have different characteristics when a device node is
opened. One problem with the ioctl system call is that a user only
needs read permissions to execute it but may, with ioctls like
SG_IO, write to a device (e.g. format it). Command (operation
code) sniffing logic is used to overcome this security problem. Also,
users of the SG_IO ioctl need to be aware that when they "share" a device
with the sd, st or a cdrom driver, state machines within those drivers
may be tricked. This may be unavoidable but users of the SG_IO
ioctl should take appropriate care.<br>
Opening a file in Linux with flags of zero implies the O_RDONLY flag
and hence read only access. All open() system calls can yield ENOENT
(no such file or directory), ENODEV (no such device) if the file exists
but there is no attached device, and EACCES (permission denied) if the
user doesn't have appropriate permissions.<br>
A user with CAP_SYS_RAWIO capability (normally associated with the
"root" user) bypasses all command sniffing and other access controls
that would otherwise lead to EACCES or EPERM errors. With the sg driver
such a user may still need to open() a device node with O_RDWR (rather
than O_RDONLY) to use all SCSI commands.<br>
<table style="width: 100%; text-align: left;" border="1" cellpadding="2">
<caption><span style="font-weight: bold;">Table 2. open() flags for
SG_IO ioctl usage</span><br>
</caption>
<tr>
<td style="vertical-align: top;"><span style="font-weight: bold;">open()</span></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">sg</span><br></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">sd</span><br></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">st</span><br></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">cdrom</span><br></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">Comments</span><br></td>
</tr>
<tr>
<td style="vertical-align: top;">&lt;none&gt; or<br></td>
<td style="vertical-align: top;">1, 2<br></td>
<td style="vertical-align: top;">3,4<br></td>
<td style="vertical-align: top;">3,5<br></td>
<td style="vertical-align: top;">3,6<br></td>
<td style="vertical-align: top;">best to add O_NONBLOCK. For a
device with removable media (e.g. tape drive) that depends on whether
the drive or its media is being accessed.<br></td>
</tr>
<tr>
<td style="vertical-align: top;">O_RDONLY | O_NONBLOCK<br></td>
<td style="vertical-align: top;">1,7<br></td>
<td style="vertical-align: top;">3<br></td>
<td style="vertical-align: top;">3,13<br></td>
<td style="vertical-align: top;">3<br></td>
<td style="vertical-align: top;">recommended when SCSI commands
are recognized as reading information from the device<br></td>
</tr>
<tr>
<td style="vertical-align: top;">O_RDWR<br></td>
<td style="vertical-align: top;">2<br></td>
<td style="vertical-align: top;">4,8,9<br></td>
<td style="vertical-align: top;">5,8,9<br></td>
<td style="vertical-align: top;">6,8,9<br></td>
<td style="vertical-align: top;">again, could be better to add</td>
</tr>
<tr>
<td style="vertical-align: top;">O_RDWR | O_NONBLOCK<br></td>
<td style="vertical-align: top;">7<br></td>
<td style="vertical-align: top;">8,9<br></td>
<td style="vertical-align: top;">8,9,13<br></td>
<td style="vertical-align: top;">8,9<br></td>
<td style="vertical-align: top;">recommended when arbitrary
(including vendor specific) SCSI commands are to be sent<br></td>
</tr>
<tr>
<td style="vertical-align: top;">&lt;&lt; interaction with</td>
<td style="vertical-align: top;">10<br></td>
<td style="vertical-align: top;">11<br></td>
<td style="vertical-align: top;">12<br></td>
<td style="vertical-align: top;">11<br></td>
<td style="vertical-align: top;">only use when sure that no other
application may want to access the device (or partition). A surprising
number of applications do "poke around" devices.<br></td>
</tr>
<tr>
<td style="vertical-align: top;">&lt;&lt; interaction with</td>
<td style="vertical-align: top;">-<br></td>
<td style="vertical-align: top;">--&gt;<br></td>
<td style="vertical-align: top;">-<br></td>
<td style="vertical-align: top;">--&gt;<br></td>
<td style="vertical-align: top;">requires sector alignment on
data transfers (ignored by sg and st)<br></td>
</tr>
</table>
<span style="font-weight: bold;">Notes</span>:<br>
<ol>
<li>on subsequent SG_IO ioctl calls, the sg driver will only allow
SCSI commands in its allow_ops array; others result in EPERM (operation
not permitted) in errno. See <a href="#SCSI_command_permissions">below</a>.</li>
<li>if a previous open() of this sg device node still holds O_EXCL then
this open() waits until it clears.</li>
<li>on subsequent SG_IO ioctl calls, the block layer will only allow
SCSI commands listed as "safe_for_read" in the verify_command()
function in the drivers/block/scsi_ioctl.c file; others result in EPERM
(operation not permitted) in errno. See <a
href="#SCSI_command_permissions">below</a>.</li>
<li>if the device has removable media and none is present then yields ENOMEDIUM
(no medium found)</li>
<li>if a tape is not present in the drive then yields EIO (input/output
error); if the tape is "in use" then yields EBUSY (resource busy). Only one
open file descriptor is allowed per st device node at a time (although
dup() can be used).</li>
<li>if the tray is closed and media is not present then yields ENOMEDIUM (no
medium found); if the tray is open then tries to close it and if no media
is present then yields ENOMEDIUM</li>
<li>if a previous open() of this sg device node still holds O_EXCL then
yields EBUSY (resource busy).</li>
<li>on subsequent SG_IO ioctl calls, the block layer will allow SCSI
commands listed as either "safe_for_read" or "safe_for_write". For
other SCSI commands the user requires the CAP_SYS_RAWIO capability
(usually associated with the "root" user); if not, yields EPERM
(operation
not permitted). The first instance of such other SCSI commands since boot
sends an annoying "scsi: unknown opcode" message to the log.</li>
<li>if the media or drive is marked as not writable then yields EROFS
(read-only file system).</li>
<li>if the sg device node already has an exclusive lock then a subsequent
attempt to open(O_EXCL) will wait unless O_NONBLOCK is given, in which
case it yields EBUSY (resource busy)</li>
<li>implemented at block device level (which knows about partitions
within devices). If a previous open(O_EXCL) is active then a subsequent
open(O_EXCL) yields EBUSY (resource busy). Mounted file systems
typically open a device/partition with O_EXCL; as long as an
application using the SG_IO ioctl does not also try to use the O_EXCL
flag, it will be allowed access to the device.</li>
<li>the st driver does not support (i.e. ignores) the O_EXCL flag.
However, the fact that it only permits one active open() per tape device
provides similar functionality.</li>
<li>if the tape is "in use" then yields EBUSY (resource busy). Only one
open file descriptor is allowed per st device node at a time.</li>
</ol>
The O_EXCL flag has a different effect in the sg driver and the block
layer. In the sg driver, once O_EXCL is held on a device, all
subsequent open() attempts will either wait or yield EBUSY
(irrespective of whether they attempt to use the O_EXCL flag). Once a
partition/device is opened successfully in the block layer (with the sd
or cdrom driver), only subsequent open() attempts that also use the
O_EXCL flag are rejected (with EBUSY). An O_EXCL lock held on a device
in the block layer has no effect on accessing the same device via the
sg driver (and vice versa).<br>
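The flag recommendations of Table 2 and the errno values mentioned in its notes could be wrapped roughly as follows. This is a sketch only: sg_io_open_flags(), open_for_sg_io() and open_error_hint() are hypothetical names, and the hint strings merely paraphrase the notes above.<br>

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: O_RDONLY | O_NONBLOCK when only commands that read
 * information from the device are planned, O_RDWR | O_NONBLOCK when
 * arbitrary (e.g. vendor specific) commands may be sent. */
int sg_io_open_flags(int arbitrary_cmds)
{
    return (arbitrary_cmds ? O_RDWR : O_RDONLY) | O_NONBLOCK;
}

/* Hypothetical helper: returns a file descriptor, or -1 with errno set. */
int open_for_sg_io(const char *path, int arbitrary_cmds)
{
    return open(path, sg_io_open_flags(arbitrary_cmds));
}

/* Hypothetical helper: short hint for errno values discussed in this section. */
const char *open_error_hint(int err)
{
    switch (err) {
    case ENOENT:    return "no such file or directory";
    case ENODEV:    return "node exists but no device is attached";
    case EACCES:    return "insufficient permissions";
    case EBUSY:     return "held with O_EXCL, or tape in use";
    case ENOMEDIUM: return "removable medium not present";
    case EROFS:     return "media or drive marked as not writable";
    case EIO:       return "tape not present in drive";
    default:        return "other error";
    }
}
```

O_EXCL is deliberately left out of the sketch, since Table 2 advises using it only when no other application may want the device.<br>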
The first successful open on an sd or a cdrom device node that has
removable media will send a PREVENT ALLOW MEDIUM REMOVAL (prevent) SCSI
command to the device. If successful, this will inhibit a subsequent
START STOP UNIT (eject) SCSI command and de-activate the eject button
on the drive. In emergencies, the SG_IO ioctl can be used to defeat
this action; an example of this is the <a href="sdparm.html">sdparm</a>
utility, specifically "sdparm --command=unlock".<br>
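For illustration, a PREVENT ALLOW MEDIUM REMOVAL command with its PREVENT field cleared (i.e. allow removal) might be sent through SG_IO as sketched below. This is not taken from sdparm; the function names and the 30,000 ms timeout are assumptions, and only the command layout (operation code 0x1e, PREVENT field in the low bits of byte 4) comes from the SCSI command set.<br>

```c
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>    /* struct sg_io_hdr, SG_IO, SG_DXFER_NONE */

#define PREVENT_ALLOW_OPCODE 0x1e   /* PREVENT ALLOW MEDIUM REMOVAL */

/* Hypothetical helper: build a 6 byte cdb that allows medium removal. */
void build_allow_removal_cdb(unsigned char cdb[6])
{
    memset(cdb, 0, 6);
    cdb[0] = PREVENT_ALLOW_OPCODE;
    cdb[4] = 0x00;                  /* PREVENT field 0: removal allowed */
}

/* Hypothetical helper: returns 0 if the command completed cleanly, 1 if
 * the device or transport reported a problem, -1 if the ioctl itself
 * failed (errno is then set). */
int allow_medium_removal(int fd)
{
    unsigned char cdb[6];
    unsigned char sense[32];
    struct sg_io_hdr hdr;

    build_allow_removal_cdb(cdb);
    memset(&hdr, 0, sizeof(hdr));
    hdr.interface_id = 'S';
    hdr.dxfer_direction = SG_DXFER_NONE;    /* no data phase */
    hdr.cmd_len = sizeof(cdb);
    hdr.mx_sb_len = sizeof(sense);
    hdr.cmdp = cdb;
    hdr.sbp = sense;
    hdr.timeout = 30000;                    /* milliseconds (assumed default) */

    if (ioctl(fd, SG_IO, &hdr) < 0)
        return -1;
    return (hdr.status == 0 && hdr.host_status == 0 &&
            hdr.driver_status == 0) ? 0 : 1;
}
```

The file descriptor is expected to refer to an already opened sd, cdrom or sg device node. Whether this command is permitted for a given open() mode depends on the command sniffing rules described in this document.<br>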
The open() flag O_NDELAY has the same value and meaning as O_NONBLOCK.
Other flags such as O_DIRECT, O_TRUNC and O_APPEND have no effect on
<h2><a class="mozTocH2" name="mozTocId645134"></a>SCSI command
permissions</h2>
In Linux a user only needs read permissions on a file descriptor to
execute an ioctl() system call. In the case of the SG_IO ioctl, a
SCSI command could be sent that obviously changes the state of a device
(e.g. WRITE to a disk). So both implementations of the SG_IO ioctl
require more than read permissions for some commands, especially those
that are known to change the state of a device or those that have some
unknown action (e.g. vendor specific commands).<br>
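An application can check how its own file descriptor was opened before attempting a state-changing command, so that it can report a clear message rather than wait for EPERM. The sketch below uses the standard fcntl(F_GETFL) call; the function name fd_opened_for_write() is hypothetical, and the check does not replace the kernel's own command sniffing.<br>

```c
#include <fcntl.h>

/* Hypothetical helper: returns 1 if fd was opened O_RDWR or O_WRONLY,
 * 0 if it was opened O_RDONLY, and -1 if fcntl() fails (errno is set). */
int fd_opened_for_write(int fd)
{
    int fl = fcntl(fd, F_GETFL);

    if (fl < 0)
        return -1;
    return (fl & O_ACCMODE) != O_RDONLY;
}
```

A result of 0 suggests that only commands treated as safe for reading are likely to be accepted on that descriptor, unless the user holds CAP_SYS_RAWIO.<br>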
Here is a table of SCSI commands that don't need the user to have write
permissions (or in some cases CAP_SYS_RAWIO capability which usually
equates to "root" user):<br>
<table style="width: 100%; text-align: left;" border="1" cellpadding="2">
<caption><span style="font-weight: bold;">Table 3. SCSI command
minimum permission requirements</span><br>
</caption>
<tbody>
<tr>
<td style="vertical-align: top;"><span style="font-weight: bold;">SCSI command</span><br></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">(draft) standard</span><br></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">sg driver requires</span><br></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">block layer requires (except st)</span><br></td>
<td style="vertical-align: top;"><span style="font-weight: bold;">Comments</span><br></td>
</tr>
<tr>
<td style="vertical-align: top;">BLANK</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">CLOSE TRACK/SESSION</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">ERASE</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">FLUSH CACHE</td>
<td style="vertical-align: top;">SBC-3, MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">Really SYNCHRONIZE CACHE command</td>
</tr>
<tr>
<td style="vertical-align: top;">FORMAT UNIT</td>
<td style="vertical-align: top;">SBC-3, MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">default command timeout may not</td>
</tr>
<tr>
<td style="vertical-align: top;">GET CONFIGURATION</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">reads CD/DVD metadata</td>
</tr>
<tr>
<td style="vertical-align: top;">GET EVENT STATUS NOTIFICATION</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">GET PERFORMANCE</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">INQUIRY</td>
<td style="vertical-align: top;">SPC-4</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">All SCSI devices should respond
to this command</td>
</tr>
<tr>
<td style="vertical-align: top;">LOAD UNLOAD MEDIUM</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">MEDIUM may be replaced by CD,</td>
</tr>
<tr>
<td style="vertical-align: top;">LOG SELECT</td>
<td style="vertical-align: top;">SPC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">used to change logging or clear</td>
</tr>
<tr>
<td style="vertical-align: top;">LOG SENSE</td>
<td style="vertical-align: top;">SPC-4</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">used to fetch logged data</td>
</tr>
<tr>
<td style="vertical-align: top;">MAINTENANCE COMMAND IN</td>
<td style="vertical-align: top;">SPC-4</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">CAP_SYS_RAWIO</td>
<td style="vertical-align: top;">various "REPORT ..." commands
such as REPORT SUPPORTED OPERATION CODES in here</td>
</tr>
<tr>
<td style="vertical-align: top;">MODE SELECT (6+10)</td>
<td style="vertical-align: top;">SPC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">Used to change SCSI device</td>
</tr>
<tr>
<td style="vertical-align: top;">MODE SENSE (6+10)</td>
<td style="vertical-align: top;">SPC-4</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">Used to read SCSI device metadata</td>
</tr>
<tr>
<td style="vertical-align: top;">PAUSE RESUME</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">PLAY AUDIO (10)</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">PLAY AUDIO MSF</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">PLAY AUDIO TI</td>
<td style="vertical-align: top;">??</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">opcode 0x48, unassigned to
any spec in SPC-4</td>
</tr>
<tr>
<td style="vertical-align: top;">PLAY CD</td>
<td style="vertical-align: top;">MMC-2</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">old, now SPARE IN in SPC-4</td>
</tr>
<tr>
<td style="vertical-align: top;">PREVENT ALLOW MEDIUM REMOVAL</td>
<td style="vertical-align: top;">SPC-4, MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">sd, st and cdrom drivers use</td>
</tr>
<tr>
<td style="vertical-align: top;">READ (6+10+12+16)</td>
<td style="vertical-align: top;">SBC-3</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ BUFFER</td>
<td style="vertical-align: top;">SPC-4</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ BUFFER CAPACITY</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ CAPACITY(10)</td>
<td style="vertical-align: top;">SBC-3, MMC-4</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ CAPACITY(16)</td>
<td style="vertical-align: top;">SBC-3,</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">CAP_SYS_RAWIO</td>
<td style="vertical-align: top;">within SERVICE ACTION IN
command. Needed for RAIDs larger than 2 TB</td>
</tr>
<tr>
<td style="vertical-align: top;">READ CD</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ CD MSF</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ CDVD CAPACITY</td>
<td style="vertical-align: top;">SBC-3, MMC-4</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">Strange (old ?) name from
cdrom.h . Actually is READ CAPACITY.</td>
</tr>
<tr>
<td style="vertical-align: top;">READ DEFECT (10)</td>
<td style="vertical-align: top;">SBC-3</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ DISC INFO</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ DVD STRUCTURE</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ FORMAT CAPACITIES</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ HEADER</td>
<td style="vertical-align: top;">MMC-2</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ LONG (10)</td>
<td style="vertical-align: top;">SBC-3</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">but not READ LONG (16)</td>
</tr>
<tr>
<td style="vertical-align: top;">READ SUB-CHANNEL</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ TOC/PMA/ATIP</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">READ TRACK (RZONE) INFO</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">In MMC-4 called READ TRACK INFO</td>
</tr>
<tr>
<td style="vertical-align: top;">RECEIVE DIAGNOSTIC</td>
<td style="vertical-align: top;">SPC-4</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">CAP_SYS_RAWIO</td>
<td style="vertical-align: top;">the SES command set uses this
command a lot. An SES device is only accessible via an sg device node</td>
</tr>
<tr>
<td style="vertical-align: top;">REPAIR (RZONE) TRACK</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">REPORT KEY</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">REPORT LUNS</td>
<td style="vertical-align: top;">SPC-4</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">CAP_SYS_RAWIO</td>
<td style="vertical-align: top;">mandatory since SPC-3</td>
</tr>
<tr>
<td style="vertical-align: top;">REQUEST SENSE</td>
<td style="vertical-align: top;">SPC-4</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">has uses other than those
displaced by autosense</td>
</tr>
<tr>
<td style="vertical-align: top;">RESERVE (RZONE) TRACK</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">SCAN</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">SEEK</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">SEND CUE SHEET</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">SEND DVD STRUCTURE</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">[SEND EVENT]</td>
<td style="vertical-align: top;">MMC-2</td>
<td style="vertical-align: top;"><br></td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">cdrom.h associates opcode 0xa2
but MMC-2 uses opcode 0x5d ??</td>
</tr>
<tr>
<td style="vertical-align: top;">SEND KEY</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">SEND OPC INFORMATION</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">SERVICE ACTION IN</td>
<td style="vertical-align: top;">SPC-4, SBC-3</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">CAP_SYS_RAWIO</td>
<td style="vertical-align: top;">READ CAPACITY (16) service</td>
</tr>
<tr>
<td style="vertical-align: top;">SET CD SPEED</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">cdrom.h calls this SET SPEED</td>
</tr>
<tr>
<td style="vertical-align: top;">SET STREAMING</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">START STOP UNIT</td>
<td style="vertical-align: top;">SBC-3, MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">hmm</td>
</tr>
<tr>
<td style="vertical-align: top;">STOP PLAY/SCAN</td>
<td style="vertical-align: top;">MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">SYNCHRONIZE CACHE</td>
<td style="vertical-align: top;">SBC-3, MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">cdrom.h calls this FLUSH CACHE</td>
</tr>
<tr>
<td style="vertical-align: top;">TEST UNIT READY</td>
<td style="vertical-align: top;">SPC-4</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;">All SCSI devices should respond
to this command</td>
</tr>
<tr>
<td style="vertical-align: top;">VERIFY (10+16)</td>
<td style="vertical-align: top;">SBC-3, MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDONLY</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">WRITE (6+10+12+16)</td>
<td style="vertical-align: top;">SBC-3</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">WRITE LONG (10+16)</td>
<td style="vertical-align: top;">SBC-3</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;"><br></td>
</tr>
<tr>
<td style="vertical-align: top;">WRITE VERIFY (10+16)</td>
<td style="vertical-align: top;">SBC-3, MMC-4</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">O_RDWR</td>
<td style="vertical-align: top;">only WRITE VERIFY(10) is in MMC-4</td>
</tr>
</tbody>
</table>
Any other SCSI command (opcode) not mentioned for the sg driver needs
O_RDWR. Any other SCSI command (opcode) not mentioned for the block
layer SG_IO ioctl needs a user with the CAP_SYS_RAWIO capability. All
"block" SG_IO ioctl calls on st device nodes need a user with the
CAP_SYS_RAWIO capability. If a
user does not have sufficient permissions to execute a SCSI command via
the SG_IO ioctl then the system call fails (i.e. no SCSI command is
sent) and errno is set to EPERM (operation not permitted).<br>
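The EPERM case can be told apart from other failures by checking errno straight after the ioctl. The sketch below sends a command with no data phase; the helper name send_cdb_no_data(), the 20 second timeout and the message text are assumptions made for illustration only.<br>

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Hypothetical helper: send a SCSI command that moves no data.
 * Returns 0 if the ioctl was accepted, or -errno if it failed.
 * -EPERM means the kernel refused it and no SCSI command was sent.
 * The SCSI status fields are not inspected in this sketch. */
int send_cdb_no_data(int fd, unsigned char *cdb, unsigned char cdb_len)
{
    unsigned char sense[32];
    sg_io_hdr_t hdr;
    int err;

    memset(&hdr, 0, sizeof(hdr));
    hdr.interface_id = 'S';
    hdr.dxfer_direction = SG_DXFER_NONE;
    hdr.cmd_len = cdb_len;
    hdr.cmdp = cdb;
    hdr.mx_sb_len = sizeof(sense);
    hdr.sbp = sense;
    hdr.timeout = 20000;   /* milliseconds, arbitrary */

    if (ioctl(fd, SG_IO, &hdr) < 0) {
        err = errno;       /* save before calling anything else */
        if (err == EPERM)
            fprintf(stderr, "SG_IO: opcode 0x%02x refused; O_RDWR or "
                    "CAP_SYS_RAWIO may be needed\n", cdb[0]);
        return -err;
    }
    return 0;
}
```

An application that gets -EPERM back could retry after reopening the device node with O_RDWR, or report that more privilege is required.<br>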
Both the sg driver and the block layer SG_IO code use internal tables
to enforce the permissions shown in the above table (allow_ops and
cmd_type [safe_for_read and safe_for_write] respectively). This
technique doesn't scale well, since more advanced command sets (e.g.
OSD) distinguish their commands by service action under a single opcode
(0x7f in the case of OSD), so a table indexed by opcode alone cannot
tell those commands apart.
There may also be overlap in opcode usage between command sets, for
example between SBC, MMC and SSC.<br>
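To make the table lookup idea concrete, here is a simplified sketch of an opcode-indexed permission check. It is not the kernel's allow_ops or cmd_type code: the type and function names are hypothetical, and only a handful of rows from the block layer column of Table 3 are included.<br>

```c
#include <stddef.h>

/* Minimum permission classes, mirroring the columns of Table 3. */
enum min_perm { PERM_RDONLY, PERM_RDWR, PERM_RAWIO };

struct op_entry {
    unsigned char opcode;
    enum min_perm perm;
};

/* A few rows from Table 3 (block layer column); illustrative only. */
static const struct op_entry block_ops[] = {
    { 0x00, PERM_RDONLY },   /* TEST UNIT READY */
    { 0x12, PERM_RDONLY },   /* INQUIRY */
    { 0x28, PERM_RDONLY },   /* READ (10) */
    { 0x15, PERM_RDWR },     /* MODE SELECT (6) */
    { 0x2a, PERM_RDWR },     /* WRITE (10) */
};

/* Look up an opcode; anything not listed falls through to the
 * strictest class, as the text describes for the block layer. */
enum min_perm block_min_perm(unsigned char opcode)
{
    size_t i;

    for (i = 0; i < sizeof(block_ops) / sizeof(block_ops[0]); i++) {
        if (block_ops[i].opcode == opcode)
            return block_ops[i].perm;
    }
    return PERM_RAWIO;
}
```

Note that the lookup key is the opcode only, so every command carried under opcode 0x7f gets the same answer whatever its service action. That is the scaling problem mentioned above.<br>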
<h2><a class="mozTocH2" name="mozTocId154063"></a>Maximum transfer size
per command</h2>
The largest amount of data that can be transferred by a single SCSI
command is often a concern. Various SCSI command sets (e.g. SBC-3 for
disk READs and WRITEs, SSC-3 for tape READs and WRITEs, and SPC-4 for
READ+WRITE BUFFER) allow very large
data transfer sizes but Linux is not so accommodating. The Host Bus
Adapter (HBA) could have transfer size limits, as could the transport
and finally the SCSI device
itself. In the latter case SBC-3 defines a "Block Limits" Vital Product
Data (VPD) page while SSC has the READ BLOCK LIMITS SCSI command. SBC-3's
optional Block Limits VPD page contains both maximum and optimal
counts. In the author's opinion that latter distinction is very
important: the block subsystem should try to use optimal sizes while
pass through users should only be constrained by maximum sizes. Also, if
a pass through user exceeds a maximum transfer size imposed by a SCSI
device, then the device can report an error. There is an
underlying assumption that the applications using a pass through
interface know what they are doing, or at least know more than the
various kernel subsystems. On the other hand, the kernel has the
responsibility to allocate critical shared resources such as memory.<br>
In the past, Linux used a single, "big-enough" block of memory for the
source or destination of large data transfers. Then scatter-gather
lists were added to break transfers up into smaller chunks (often "page"
size, which is 4 KB on the i386 architecture), which made memory
management easier
for the kernel. Now, in the lk 2.6 series, the single block of memory
option is being phased out.<br>
The Linux SCSI subsystem imposes a 128
element limit on scatter gather lists via its SCSI_MAX_PHYS_SEGMENTS
define. Given the way various memory pools are allocated by the Linux SCSI
subsystem, SCSI_MAX_PHYS_SEGMENTS could be increased to 256. Associated
with each type of HBA there is normally a low level driver (LLD). Each
LLD can further limit the maximum number of elements with
the scsi_host_template::sg_tablesize field. Prior to lk 2.6.16 the sg
and
st drivers used the .sg_tablesize field only; since lk 2.6.16 those
drivers are also constrained by SCSI_MAX_PHYS_SEGMENTS. This leads to a
potential halving of the maximum transfer size (for example, from 255
elements down to 128). Many LLDs set the
.sg_tablesize field to SG_ALL (which is 255) but they may as well set
that field to 256 unless the HBA hardware has a constraint.<br>
User space
memory may be allocated as the source and/or destination for DMA
transfers by
the HBA (i.e. direct IO). Even if the user space allocated a large
buffer
with a single malloc(), the HBA DMA element typically has a different
view of that
memory. This view may well contain many "page" size discontinuous
pieces. This has the
effect of using up, or perhaps exhausting, scatter-gather elements.<br>
The sg driver attempts to build scatter gather lists with each element
up to SG_SCATTER_SZ bytes large. This define is found in
and has been set to 32 KB for some years. That is 8 times the page size
(of 4 KB) on the i386 architecture. Some users who need really
large transfers increase this define (and it is best to keep it a power
of 2). However since lk 2.6.16 another limit comes into play: the
MAX_SEGMENT_SIZE define which is set to 64 KB. MAX_SEGMENT_SIZE is a
default and can be overridden by the LLD calling
blk_queue_max_segment_size().<br>
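The element count and element size limits multiply together to give an upper bound on the data carried by one command. The following is a back-of-envelope sketch only: the function name and parameters are hypothetical, and it assumes every scatter gather element is filled to its maximum size, which the sg driver merely attempts.<br>

```c
/* Upper bound, in bytes, from scatter gather limits alone.
 * lld_sg_tablesize   : the LLD's .sg_tablesize value
 * max_phys_segments  : SCSI_MAX_PHYS_SEGMENTS (pass lld_sg_tablesize
 *                      to model kernels before lk 2.6.16)
 * elem_bytes         : SG_SCATTER_SZ
 * max_segment_bytes  : MAX_SEGMENT_SIZE or the LLD's override */
unsigned long sg_list_max_bytes(unsigned int lld_sg_tablesize,
                                unsigned int max_phys_segments,
                                unsigned long elem_bytes,
                                unsigned long max_segment_bytes)
{
    unsigned int elems = lld_sg_tablesize < max_phys_segments ?
                         lld_sg_tablesize : max_phys_segments;
    unsigned long per_elem = elem_bytes < max_segment_bytes ?
                             elem_bytes : max_segment_bytes;

    return (unsigned long)elems * per_elem;
}
```

With the values quoted in the text (SG_ALL of 255, a 128 element limit, 32 KB elements and a 64 KB segment size) this gives 128 * 32 KB = 4 MB, against 255 * 32 KB when only .sg_tablesize applied.<br>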
In lk 2.6.16 two further LLD parameters
come into play even when the sg (and st) driver is used. These are
scsi_host_template::max_sectors and scsi_host_template::use_clustering .<br>
The .max_sectors setting in the LLD is the maximum number of 512 byte
sectors allowed in a single SCSI command's scatter gather list (for
data transfers). Yes, that is a strange limit when trying to send a
WRITE BUFFER command to upload firmware. Sysfs makes the LLD's
.max_sectors setting visible (converted to kilobytes) in
/sys/block/sd&lt;x&gt;/queue/max_hw_sectors_kb . The maximum allowable
value in an LLD's .max_sectors seems to be 65535 (0xffff in hexadecimal).
This limits the maximum transfer size to (32*1024*1024 - 512) bytes,
assuming other limitations have been overcome. [The 65535 sector limit
is because Scsi_Host::max_sectors has type "unsigned short". Hopefully
this type is expanded to "int" in the future (or removed).]<br>
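The .max_sectors arithmetic, and a way of reading the value that sysfs exposes, can be sketched as follows. Both function names are hypothetical; the caller supplies the full sysfs path since the disk name varies.<br>

```c
#include <stdio.h>

/* Bytes covered by a .max_sectors value, using 512 byte sectors. */
unsigned long max_sectors_to_bytes(unsigned short max_sectors)
{
    return (unsigned long)max_sectors * 512UL;
}

/* Read a max_hw_sectors_kb style sysfs attribute.
 * Returns the value in kilobytes, or -1 if the file cannot be
 * opened or does not start with a number. */
long read_sysfs_kb(const char *path)
{
    FILE *fp = fopen(path, "r");
    long kb;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &kb) != 1)
        kb = -1;
    fclose(fp);
    return kb;
}
```

For the largest "unsigned short" value, max_sectors_to_bytes(65535) matches the (32*1024*1024 - 512) byte figure given above.<br>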
The .use_clustering field should be set to ENABLE_CLUSTERING . If not,
the block subsystem rebuilds the scatter gather list it gets from the
sg driver with page size (e.g. 4 KB) elements. [Actually it does that
regardless and,
when ENABLE_CLUSTERING is set, it coalesces them again!]<br>
<h2><a class="mozTocH2" name="mozTocId267334"></a>Conclusion</h2>
In some situations, sending commands via the SG_IO ioctl may interfere
with a higher level driver's use of a device. Users of the SG_IO ioctl
should be aware that they are using a powerful, but low level facility,
and write code accordingly. An example of this would be a utility to
perform self tests on a disk: "background" self tests should be
preferred over "foreground" self tests if there is a chance the
computer may be using a file system on that disk at the time. Even a
short foreground self test may take up to two minutes which is a long
time to lock out a file system.<br>
<p>Return to <a href="index.html">main</a> page. </p>
<p>Last updated: 2nd April 2006<br>
</p>