'\" te
.\" Copyright (c) 2005 Sun Microsystems, Inc. All Rights Reserved.
.\" The contents of this file are subject to the terms of the Common Development and Distribution License (the "License"). You may not use this file except in compliance with the License.
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. See the License for the specific language governing permissions and limitations under the License.
.\" When distributing Covered Code, include this CDDL HEADER in each file and include the License file at usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your own identifying information: Portions Copyright [yyyy] [name of copyright owner]
.TH MHD 7I "Mar 18, 2011"
.SH NAME
mhd \- multihost disk control operations
.SH SYNOPSIS
.LP
.nf
\fB#include\fR \fB<sys/mhd.h>\fR
.fi
.SH DESCRIPTION
.sp
.LP
The \fBmhd\fR \fBioctl\fR(2) requests control access rights to a multihost disk,
using disk reservations on the disk device.
.sp
.LP
The stability level of this interface (see \fBattributes\fR(5)) is evolving. As
a result, the interface is subject to change and you should limit your use of
it.
.sp
.LP
The mhd ioctls fall into two major categories: (1) ioctls for non-shared
multihost disks and (2) ioctls for shared multihost disks.
.sp
.LP
One ioctl, \fBMHIOCENFAILFAST\fR, is applicable to both non-shared and shared
multihost disks. It is described after the first two categories.
.sp
.LP
All the ioctls require root privilege.
.sp
.LP
For all of the ioctls, the caller should obtain the file descriptor for the
device by calling \fBopen\fR(2) with the \fBO_NDELAY\fR flag; without the
\fBO_NDELAY\fR flag, the open may fail due to another host already having a
conflicting reservation on the device. Some of the ioctls below permit the
caller to forcibly clear a conflicting reservation held by another host;
however, to call such an ioctl, the caller must first obtain the open file
descriptor.
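.sp
.LP
The open step described above can be sketched as follows. This is a minimal
illustration, not part of the driver interface: the helper name
\fBmhd_open\fR is hypothetical, the \fBO_RDWR\fR access mode is an assumption
(the text only requires \fBO_NDELAY\fR), and the fallback definition of
\fBO_NDELAY\fR is there only so the sketch builds on platforms that hide it.
.sp
.in +2
.nf
```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

#ifndef O_NDELAY
#define O_NDELAY O_NONBLOCK /* assumption: fallback where O_NDELAY is hidden */
#endif

/*
 * Hypothetical helper: open a multihost disk device for the mhd ioctls.
 * O_NDELAY lets the open succeed even if another host holds a
 * conflicting reservation. O_RDWR is an assumption of this sketch.
 * Returns the file descriptor, or -1 with errno set by open(2).
 */
int
mhd_open(const char *path)
{
    assert(path != NULL);
    return (open(path, O_RDWR | O_NDELAY));
}
```
.fi
.in -2
.sp
.LP
The returned descriptor is the \fBfd\fR argument passed to the ioctls
described below.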
.SS "Non-shared multihost disks"
.sp
.LP
The non-shared multihost disk ioctls are \fBMHIOCTKOWN\fR,
\fBMHIOCRELEASE\fR, \fBMHIOCSTATUS\fR, and \fBMHIOCQRESERVE\fR. These ioctl
requests control the access rights of non-shared multihost disks. A non-shared
multihost disk is one that supports serialized, mutually exclusive I/O mastery
by the connected hosts. This is in contrast to the shared-disk model, in which
concurrent access is allowed from more than one host (see below).
.sp
.LP
A non-shared multihost disk can be in one of two states:
.RS +4
.TP
.ie t \(bu
.el o
Exclusive access state, where only one connected host has I/O access
.RE
.RS +4
.TP
.ie t \(bu
.el o
Non-exclusive access state, where all connected hosts have I/O access. An
external hardware reset can cause the disk to enter the non-exclusive access
state.
.RE
.sp
.LP
Each multihost disk driver views the machine on which it's running as the
"local host"; each views all other machines as "remote hosts". For each I/O or
ioctl request, the requesting host is the local host.
.sp
.LP
Note that the non-shared ioctls are designed to work with SCSI-2 disks. The
SCSI-2 RESERVE/RELEASE command set is the underlying hardware facility in the
device that supports the non-shared ioctls.
.sp
.LP
The function prototypes for the non-shared ioctls are:
.sp
.in +2
.nf
ioctl(fd, MHIOCTKOWN);
ioctl(fd, MHIOCRELEASE);
ioctl(fd, MHIOCSTATUS);
ioctl(fd, MHIOCQRESERVE);
.fi
.in -2
.sp
.ne 2
.na
\fB\fBMHIOCTKOWN\fR \fR
.ad
.RS 18n
Forcefully acquires exclusive access rights to the multihost disk for the local
host. Revokes all access rights to the multihost disk from remote hosts.
Causes the disk to enter the exclusive access state.
.sp
Implementation Note: Reservations (exclusive access rights) broken via random
resets should be reinstated by the driver upon their detection, for example, in
the automatic probe function described below.
.RE
.sp
.ne 2
.na
\fB\fBMHIOCRELEASE\fR \fR
.ad
.RS 18n
Relinquishes exclusive access rights to the multihost disk for the local host.
On success, causes the disk to enter the non-exclusive access state.
.RE
.sp
.ne 2
.na
\fB\fBMHIOCSTATUS\fR \fR
.ad
.RS 18n
Probes a multihost disk to determine whether the local host has access rights
to the disk. Returns \fB0\fR if the local host has access to the disk,
\fB1\fR if it doesn't, and \fB-1\fR with \fBerrno\fR set to \fBEIO\fR if the probe
failed for some other reason.
.RE
.sp
.ne 2
.na
\fB\fBMHIOCQRESERVE\fR \fR
.ad
.RS 18n
Issues only a SCSI-2 Reserve command. If the attempt to reserve
fails due to the SCSI error Reservation Conflict (which implies that some other
host has the device reserved), then the ioctl returns \fB-1\fR with \fBerrno\fR
set to \fBEACCES\fR. The \fBMHIOCQRESERVE\fR ioctl does NOT issue a bus device
reset or bus reset before attempting the SCSI-2 Reserve command. It also
does not reinstate reservations that disappear due to bus
resets or bus device resets; if that behavior is desired, the caller can
call \fBMHIOCTKOWN\fR after \fBMHIOCQRESERVE\fR has returned success. If
the device does not support the SCSI-2 Reserve command, then the ioctl returns
\fB-1\fR with \fBerrno\fR set to \fBENOTSUP\fR. The \fBMHIOCQRESERVE\fR ioctl
is intended to be used by high-availability or clustering software for a
"quorum" disk; hence the "Q" in the name of the ioctl.
.RE
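.sp
.LP
The return conventions of \fBMHIOCSTATUS\fR and \fBMHIOCQRESERVE\fR described
above can be sketched as two small helpers. The enum and function names are
hypothetical and exist only for illustration; the caller is assumed to pass in
the value returned by \fBioctl\fR(2) and the \fBerrno\fR observed immediately
afterwards.
.sp
.in +2
.nf
```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for the three documented MHIOCSTATUS outcomes. */
enum mhd_status { MHD_ACCESS, MHD_NO_ACCESS, MHD_PROBE_FAILED };

/* Interpret the value returned by ioctl(fd, MHIOCSTATUS). */
enum mhd_status
mhd_status_from_rc(int rc)
{
    /* 0, 1 and -1 are the only values the text documents. */
    assert(rc == 0 || rc == 1 || rc == -1);
    if (rc == 0)
        return (MHD_ACCESS);
    if (rc == 1)
        return (MHD_NO_ACCESS);
    return (MHD_PROBE_FAILED);
}

/*
 * Nonzero when a failed MHIOCQRESERVE indicates a Reservation Conflict,
 * that is, some other host has the device reserved.
 */
int
mhd_qreserve_conflict(int rc, int err)
{
    return (rc == -1 && err == EACCES);
}
```
.fi
.in -2
.sp
.LP
Note that \fBMHIOCSTATUS\fR reports "no access" with a return value of
\fB1\fR rather than through \fBerrno\fR, so a check of the form
\fBrc != 0\fR alone cannot tell the two failure cases apart.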
.SS "Shared Multihost Disks"
.sp
.LP
The shared multihost disk ioctls control access to shared multihost disks. The
ioctls are merely a veneer on the SCSI-3 Persistent Reservation facility.
Therefore, the underlying semantic model is not described in detail here; see
the SCSI-3 standard instead. SCSI-3 Persistent Reservations support the
concept of a group of hosts all sharing access to a disk.
.sp
.LP
The function prototypes and descriptions for the shared multihost ioctls are as
follows:
.sp
.ne 2
.na
\fB\fBioctl\fR(\fBfd\fR, \fBMHIOCGRP_INKEYS\fR, (\fBmhioc_inkeys_t\fR)
\fI*k\fR\fB);\fR\fR
.ad
.sp .6
.RS 4n
Issues the SCSI-3 command Persistent Reserve In Read Keys to the device. On
input, the field \fBk->li\fR should be initialized by the caller with
\fBk->li.listsize\fR reflecting the size of the array the caller has allocated
for the \fBk->li.list\fR field and with \fBk->li.listlen\fR \fB==\fR \fB0\fR.
On return, the field \fBk->li.listlen\fR is updated to indicate the number of
reservation keys the device currently has. If this value is larger than
\fBk->li.listsize\fR, the caller should have passed a
larger \fBk->li.list\fR array with a larger \fBk->li.listsize\fR. The number of
array elements actually written by the callee into \fBk->li.list\fR is the
minimum of \fBk->li.listlen\fR and \fBk->li.listsize\fR. The field
\fBk->generation\fR is updated with the generation information returned by the
SCSI-3 Read Keys query. If the device does not support SCSI-3 Persistent
Reservations,
then this ioctl returns \fB-1\fR with \fBerrno\fR set to \fBENOTSUP\fR.
.RE
.sp
.ne 2
.na
\fB\fBioctl\fR(\fBfd\fR, \fBMHIOCGRP_INRESV\fR, (\fBmhioc_inresvs_t\fR)
\fI*r\fR\fB);\fR\fR
.ad
.sp .6
.RS 4n
Issues the SCSI-3 command Persistent Reserve In Read Reservations to the
device. The remarks about array handling under \fBMHIOCGRP_INKEYS\fR apply
here as well. If the device does not support SCSI-3 Persistent Reservations,
then this ioctl returns \fB-1\fR with \fBerrno\fR set to \fBENOTSUP\fR.
.RE
.sp
.ne 2
.na
\fB\fBioctl\fR(\fBfd\fR, \fBMHIOCGRP_REGISTER\fR, (\fBmhioc_register_t\fR)
\fI*r\fR\fB);\fR\fR
.ad
.sp .6
.RS 4n
Issues the SCSI-3 command Persistent Reserve Out Register. The fields of
structure \fIr\fR are all inputs; none of the fields are modified by the ioctl.
The field \fBr->aptpl\fR should be set to true to specify that registrations
and reservations should persist across device power failures, or to false to
specify that registrations and reservations should be cleared upon device power
failure; true is the recommended setting. The field \fBr->oldkey\fR is the key
that the caller believes the device may already have for this host initiator;
if the caller believes that this host initiator is not already registered
with this device, it should pass the special key of all zeros. To achieve the
effect of unregistering with the device, the caller should pass its current key
for the \fBr->oldkey\fR field and an \fBr->newkey\fR field containing the
special key of all zeros. If the device returns the SCSI error code
Reservation Conflict, this ioctl returns \fB-1\fR with \fBerrno\fR set to
\fBEACCES\fR.
.RE
.sp
.ne 2
.na
\fB\fBioctl\fR(\fBfd\fR, \fBMHIOCGRP_RESERVE\fR, (\fBmhioc_resv_desc_t\fR)
\fI*r\fR\fB);\fR\fR
.ad
.sp .6
.RS 4n
Issues the SCSI-3 command Persistent Reserve Out Reserve. The fields of
structure \fIr\fR are all inputs; none of the fields are modified by the ioctl.
If the device returns the SCSI error code Reservation Conflict, this ioctl
returns \fB-1\fR with \fBerrno\fR set to \fBEACCES.\fR
.RE
.sp
.ne 2
.na
\fB\fBioctl\fR(\fBfd\fR, \fBMHIOCGRP_PREEMPTANDABORT\fR,
(\fBmhioc_preemptandabort_t\fR) \fI*r\fR\fB);\fR\fR
.ad
.sp .6
.RS 4n
Issues the SCSI-3 command Persistent Reserve Out Preempt-And-Abort. The fields
of structure \fIr\fR are all inputs; none of the fields are modified by the
ioctl. The key of the victim host is specified by the field
\fBr->victim_key\fR. The field \fBr->resvdesc\fR supplies the preempter's key
and the reservation that it is requesting as part of the SCSI-3
Preempt-And-Abort command. If the device returns the SCSI error code
Reservation Conflict, this ioctl returns \fB-1\fR with \fBerrno\fR set to
\fBEACCES.\fR
.RE
.sp
.ne 2
.na
\fB\fBioctl\fR(\fBfd\fR, \fBMHIOCGRP_PREEMPT\fR,
(\fBmhioc_preemptandabort_t\fR) \fI*r\fR\fB);\fR\fR
.ad
.sp .6
.RS 4n
Similar to \fBMHIOCGRP_PREEMPTANDABORT\fR, but instead issues the SCSI-3
command Persistent Reserve Out Preempt. (Note: This command is not
implemented).
.RE
.sp
.ne 2
.na
\fB\fBioctl\fR(\fBfd\fR, \fBMHIOCGRP_CLEAR\fR, (\fBmhioc_resv_key_t\fR)
\fI*r\fR\fB);\fR\fR
.ad
.sp .6
.RS 4n
Issues the SCSI-3 command Persistent Reserve Out Clear. The input parameter
\fIr\fR is the reservation key of the caller, which should have been already
registered with the device, by an earlier call to \fBMHIOCGRP_REGISTER\fR.
.RE
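.sp
.LP
The key conventions described for \fBMHIOCGRP_REGISTER\fR (an all-zero
\fBoldkey\fR for a first registration, an all-zero \fBnewkey\fR to unregister)
can be sketched as follows. The structure below is a hypothetical stand-in
that only mirrors the field names used in the text; the real
\fBmhioc_register_t\fR is declared in \fB<sys/mhd.h>\fR and may differ. The
8-byte key size is an assumption based on the SCSI-3 reservation key length.
.sp
.in +2
.nf
```c
#include <assert.h>
#include <string.h>

#define KEY_SIZE 8 /* assumption: SCSI-3 reservation keys are 8 bytes */

/* Hypothetical stand-in mirroring the fields named in the text. */
struct reg_request {
    int aptpl;
    unsigned char oldkey[KEY_SIZE];
    unsigned char newkey[KEY_SIZE];
};

/* First registration: oldkey is all zeros, newkey is the host key. */
void
reg_fill_register(struct reg_request *r, const unsigned char *key)
{
    assert(r != NULL && key != NULL);
    r->aptpl = 1; /* true is the recommended setting */
    memset(r->oldkey, 0, KEY_SIZE);
    memcpy(r->newkey, key, KEY_SIZE);
}

/* Unregistration: oldkey is the current key, newkey is all zeros. */
void
reg_fill_unregister(struct reg_request *r, const unsigned char *key)
{
    assert(r != NULL && key != NULL);
    r->aptpl = 1;
    memcpy(r->oldkey, key, KEY_SIZE);
    memset(r->newkey, 0, KEY_SIZE);
}
```
.fi
.in -2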
.sp
.LP
For each device, the non-shared ioctls should not be mixed with the Persistent
Reserve Out shared ioctls, or vice versa. Otherwise, the underlying device is
likely to return errors, because SCSI does not permit SCSI-2 reservations to be
mixed with SCSI-3 reservations on a single device. It is, however, legitimate
to call the Persistent Reserve In ioctls, because these are query only.
Issuing the \fBMHIOCGRP_INKEYS\fR ioctl is the recommended way for a caller to
determine if the device supports SCSI-3 Persistent Reservations (the ioctl
returns \fB-1\fR with \fBerrno\fR set to \fBENOTSUP\fR if the device does
not).
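.sp
.LP
The array-size contract described for \fBMHIOCGRP_INKEYS\fR can be sketched
with two small helpers. The structure is a hypothetical stand-in that carries
only the \fBlistsize\fR and \fBlistlen\fR values named in the text, not the
real key-list type from \fB<sys/mhd.h>\fR; the function names are likewise
illustrative.
.sp
.in +2
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two size fields named in the text. */
struct key_list_sizes {
    uint32_t listsize; /* elements the caller allocated */
    uint32_t listlen;  /* keys the device reports, set on return */
};

/* Elements actually written: the minimum of listlen and listsize. */
uint32_t
keys_written(const struct key_list_sizes *li)
{
    assert(li != NULL);
    return (li->listlen < li->listsize ? li->listlen : li->listsize);
}

/*
 * Nonzero when the device holds more keys than the array could take,
 * so the caller should retry with listsize of at least listlen.
 */
int
keys_truncated(const struct key_list_sizes *li)
{
    assert(li != NULL);
    return (li->listlen > li->listsize);
}
```
.fi
.in -2
.sp
.LP
A caller that sees a truncated result would typically allocate a larger array
and issue the ioctl again, since the key count can change between calls.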
.SS "MHIOCENFAILFAST Ioctl"
.sp
.LP
The \fBMHIOCENFAILFAST\fR ioctl is applicable to both non-shared and shared
disks, and may be used with either the non-shared or shared ioctls.
.sp
.ne 2
.na
\fB\fBioctl\fR(\fBfd\fR, \fBMHIOCENFAILFAST\fR, (unsigned int \fI*\fR)
\fImillisecs\fR\fB);\fR\fR
.ad
.sp .6
.RS 4n
Enables or disables the failfast option in the multihost disk driver and
enables or disables automatic probing of a multihost disk, described below.
The argument is an unsigned integer specifying the number of milliseconds to
wait between executions of the automatic probe function. An argument of zero
disables the failfast option and disables automatic probing. If the
\fBMHIOCENFAILFAST\fR ioctl is never called, the effect is defined to be that
both the failfast option and automatic probing are disabled.
.RE
.SS "Automatic Probing"
.sp
.LP
The \fBMHIOCENFAILFAST\fR ioctl sets up a timeout in the driver to periodically
schedule automatic probes of the disk. The automatic probe function works in
this manner: The driver is scheduled to probe the multihost disk every n
milliseconds, rounded up to the next integral multiple of the system clock's
resolution. If
.RS +4
.TP
1.
the local host no longer has access rights to the multihost disk, and
.RE
.RS +4
.TP
2.
access rights were expected to be held by the local host,
.RE
.sp
.LP
the driver immediately panics the machine to comply with the failfast model.
.sp
.LP
If the driver makes this discovery outside the timeout function, especially
during a read or write operation, it is imperative that it panic the system
then as well.
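.sp
.LP
The interval rounding and the failfast condition described above can be
sketched as follows. This is an illustration of the stated rules, not the
driver's code: the function names are hypothetical, the clock resolution is
passed in as a parameter, and an interval that is already an exact multiple of
the resolution is assumed to be left unchanged.
.sp
.in +2
.nf
```c
#include <assert.h>

/*
 * Round the requested probe interval up to an integral multiple of the
 * clock resolution. Assumption: exact multiples are left unchanged.
 */
unsigned int
probe_interval_ms(unsigned int requested_ms, unsigned int tick_ms)
{
    unsigned int ticks;

    assert(tick_ms > 0);
    ticks = requested_ms / tick_ms + (requested_ms % tick_ms != 0);
    return (ticks * tick_ms);
}

/*
 * Failfast condition: nonzero only when the local host no longer has
 * access rights and access rights were expected to be held.
 */
int
failfast_should_panic(int has_access, int expected_access)
{
    return (!has_access && expected_access);
}
```
.fi
.in -2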
.SH RETURN VALUES
.sp
.LP
Each request returns \fB-1\fR on failure and sets \fBerrno\fR to indicate the
error.
.sp
.ne 2
.na
\fB\fBEPERM\fR \fR
.ad
.RS 14n
Caller is not root.
.RE
.sp
.ne 2
.na
\fB\fBEACCES\fR \fR
.ad
.RS 14n
Access rights were denied.
.RE
.sp
.ne 2
.na
\fB\fBEIO\fR\fR
.ad
.RS 14n
The multihost disk or controller was unable to successfully complete the
requested operation.
.RE
.sp
.ne 2
.na
\fB\fBENOTSUP\fR \fR
.ad
.RS 14n
The multihost disk does not support the operation. For example, it does not
support the SCSI-2 Reserve/Release command set, or the SCSI-3 Persistent
Reservation command set.
.RE
.SH ATTRIBUTES
.sp
.LP
See \fBattributes\fR(5) for a description of the following attributes:
.sp
.sp
.TS
box tab(:);
c | c
l | l .
ATTRIBUTE TYPE:ATTRIBUTE VALUE
_
Stability:Evolving
.TE
.SH SEE ALSO
.sp
.LP
\fBioctl\fR(2), \fBopen\fR(2), \fBattributes\fR(5)