1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
|
.\" Copyright (c) 2005 Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2017, Joyent, Inc.
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.Dd March 13, 2022
.Dt MHD 4I
.Os
.Sh NAME
.Nm mhd
.Nd multihost disk control operations
.Sh SYNOPSIS
.In sys/mhd.h
.Sh DESCRIPTION
The
.Nm
.Xr ioctl 2
control access rights of a multihost disk, using
disk reservations on the disk device.
.Pp
The stability level of this interface (see
.Xr attributes 7 )
is evolving.
As a result, the interface is subject to change and you should limit your use of
it.
.Pp
The mhd ioctls fall into two major categories: (1) ioctls for non-shared
multihost disks and (2) ioctls for shared multihost disks.
.Pp
One ioctl,
.Dv MHIOCENFAILFAST ,
is applicable to both non-shared and shared multihost disks.
It is described after the first two categories.
.Pp
All the ioctls require root privilege.
.Pp
For all of the ioctls, the caller should obtain the file descriptor for the
device by calling
.Xr open 2
with the
.Dv O_NDELAY
flag; without the
.Dv O_NDELAY
flag, the open may fail due to another host already having a
conflicting reservation on the device.
Some of the ioctls below permit the caller to forcibly clear a conflicting
reservation held by another host, however, in order to call the ioctl, the
caller must first obtain the open file descriptor.
.Ss "Non-shared multihost disks"
Non-shared multihost disks ioctls consist of
.Dv MHIOCTKOWN ,
.Dv MHIOCRELEASE ,
.Dv MHIOCSTATUS ,
and
.Dv MHIOCQRESERVE .
These ioctl requests control the access rights of non-shared multihost disks.
A non-shared multihost disk is one that supports serialized, mutually exclusive
I/O mastery by the connected hosts.
This is in contrast to the shared-disk model, in which
concurrent access is allowed from more than one host (see below).
.Pp
A non-shared multihost disk can be in one of two states:
.Bl -bullet -width indent
.It
Exclusive access state, where only one connected host has I/O access
.It
Non-exclusive access state, where all connected hosts have I/O access.
An external hardware reset can cause the disk to enter the non-exclusive access
state.
.El
.Pp
Each multihost disk driver views the machine on which it's running as the
.Dq local host ;
each views all other machines as
.Dq remote hosts .
For each I/O or ioctl request, the requesting host is the local host.
.Pp
Note that the non-shared ioctls are designed to work with SCSI-2 disks.
The
SCSI-2 RESERVE/RELEASE command set is the underlying hardware facility in the
device that supports the non-shared ioctls.
.Pp
The function prototypes for the non-shared ioctls are:
.Bd -literal -offset 2n
.Fn ioctl fd MHIOCTKOWN ;
.Fn ioctl fd MHIOCRELEASE ;
.Fn ioctl fd MHIOCSTATUS ;
.Fn ioctl fd MHIOCQRESERVE ;
.Ed
.Bl -tag -width MHIOCQRESERVE
.It Dv MHIOCTKOWN
Forcefully acquires exclusive access rights to the multihost disk for the local
host.
Revokes all access rights to the multihost disk from remote hosts.
Causes the disk to enter the exclusive access state.
.Pp
Implementation Note: Reservations (exclusive access rights) broken via random
resets should be reinstated by the driver upon their detection, for example, in
the automatic probe function described below.
.It Dv MHIOCRELEASE
Relinquishes exclusive access rights to the multihost disk for the local host.
On success, causes the disk to enter the non- exclusive access state.
.It Dv MHIOCSTATUS
Probes a multihost disk to determine whether the local host has access rights
to the disk.
Returns
.Sy 0
if the local host has access to the disk,
.Sy 1
if it doesn't, and
.Sy -1
with
.Va errno
set to
.Er EIO
if the probe failed for some other reason.
.It Dv MHIOCQRESERVE
Issues, simply and only, a SCSI-2 Reserve command.
If the attempt to reserve
fails due to the SCSI error Reservation Conflict (which implies that some other
host has the device reserved), then the ioctl will return
.Sy -1
with
.Va errno
set to
.Er EACCES .
The
.Dv MHIOCQRESERVE
ioctl does NOT issue a bus device
reset or bus reset prior to attempting the SCSI-2 reserve command.
It also
does not take care of re-instating reservations that disappear due to bus
resets or bus device resets; if that behavior is desired, then the caller can
call
.Dv MHIOCTKOWN
after the
.Dv MHIOCQRESERVE
has returned success.
If
the device does not support the SCSI-2 Reserve command, then the ioctl returns
.Er -1
with
.Va errno
set to
.Er ENOTSUP .
The
.Dv MHIOCQRESERVE
ioctl is intended to be used by high-availability or clustering software for a
.Dq quorum
disk, hence, the
.Dq Q
in the name of the ioctl.
.El
.Ss "Shared Multihost Disks"
Shared multihost disks ioctls control access to shared multihost disks.
The ioctls are merely a veneer on the SCSI-3 Persistent Reservation facility.
Therefore, the underlying semantic model is not described in detail here, see
instead the SCSI-3 standard.
The SCSI-3 Persistent Reservations support the
concept of a group of hosts all sharing access to a disk.
.Pp
The function prototypes and descriptions for the shared multihost ioctls are as
follows:
.Bl -tag -width 1n
.It Fn ioctl fd MHIOCGRP_INKEYS "(mhioc_inkeys_t *)k"
.Pp
Issues the SCSI-3 command Persistent Reserve In Read Keys to the device.
On input, the field
.Fa k->li
should be initialized by the caller with
.Fa k->li.listsize
reflecting how big of an array the caller has allocated for the
.Fa k->lilist
field and with
.Ql k->li.listlen\& ==\& 0 .
On return, the field
.Fa k->li.listlen
is updated to indicate the number of
reservation keys the device currently has: if this value is larger than
.Fa k->li.listsize
then that indicates that the caller should have passed a bigger
.Fa k->li.list
array with a bigger
.Fa k->li.listsize .
The number of array elements actually written by the callee into
.Fa k->li.list
is the minimum of
.Fa k->li.listlen
and
.Fa k->li.listsize .
The field
.Fa k->generation
is updated with the generation information returned by the SCSI-3
Read Keys query.
If the device does not support SCSI-3 Persistent Reservations,
then this ioctl returns
.Sy -1
with
.Va errno
set to
.Er ENOTSUP .
.It Fn ioctl fd MHIOCGRP_INRESV "(mhioc_inresvs_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve In Read Reservations to the
device.
Remarks similar to
.Dv MHIOCGRP_INKEYS
apply to the array manipulation.
If the device does not support SCSI-3 Persistent Reservations,
then this ioctl returns
.Sy -1
with
.Va errno
set to
.Er ENOTSUP .
.It Fn ioctl fd MHIOCGRP_REGISTER "(mhioc_register_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Register.
The fields of structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
The field
.Fa r->aptpl
should be set to true to specify that registrations
and reservations should persist across device power failures, or to false to
specify that registrations and reservations should be cleared upon device power
failure; true is the recommended setting.
The field
.Fa r->oldkey
is the key that the caller believes the device may already have for this host
initiator; if the caller believes that that this host initiator is not already
registered with this device, it should pass the special key of all zeros.
To achieve the effect of unregistering with the device, the caller should pass
its current key for the
.Fa r->oldkey
field and an
.Fa r->newkey
field containing the special key of all zeros.
If the device returns the SCSI error code
Reservation Conflict, this ioctl returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_RESERVE "(mhioc_resv_desc_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Reserve.
The fields of
structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
If the device returns the SCSI error code Reservation Conflict, this ioctl
returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_PREEMPTANDABORT "(mhioc_preemptandabort_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Preempt-And-Abort.
The fields
of structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
The key of the victim host is specified by the field
.Fa r->victim_key .
The field
.Fa r->resvdesc
supplies the preempter's key and the reservation that it is requesting as part
of the SCSI-3 Preempt-And-Abort command.
If the device returns the SCSI error code
Reservation Conflict, this ioctl returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_PREEMPT "(mhioc_preemptandabort_t *)r"
.Pp
Similar to
.Dv MHIOCGRP_PREEMPTANDABORT ,
but instead issues the SCSI-3 command Persistent Reserve Out Preempt.
(Note: This command is not implemented).
.It Fn ioctl fd MHIOCGRP_CLEAR "(mhioc_resv_key_t *)r"
Issues the SCSI-3 command Persistent Reserve Out Clear.
The input parameter
.Va r
is the reservation key of the caller, which should have been already
registered with the device, by an earlier call to
.Dv MHIOCGRP_REGISTER .
.El
.Pp
For each device, the non-shared ioctls should not be mixed with the Persistent
Reserve Out shared ioctls, and vice-versa, otherwise, the underlying device is
likely to return errors, because SCSI does not permit SCSI-2 reservations to be
mixed with SCSI-3 reservations on a single device.
It is, however, legitimate
to call the Persistent Reserve In ioctls, because these are query only.
Issuing the
.Dv MHIOCGRP_INKEYS
ioctl is the recommended way for a caller to
determine if the device supports SCSI-3 Persistent Reservations (the ioctl
will return
.Sy -1
with
.Va errno
set to
.Er ENOTSUP
if the device does not).
.Ss "MHIOCENFAILFAST Ioctl"
The
.Dv MHIOCENFAILFAST
ioctl is applicable for both non-shared and shared
disks, and may be used with either the non-shared or shared ioctls.
.Bl -tag -width 1n
.It Fn ioctl fd MHIOENFAILFAST "(unsigned int *)millisecs"
.Pp
Enables or disables the failfast option in the multihost disk driver and
enables or disables automatic probing of a multihost disk, described below.
The argument is an unsigned integer specifying the number of milliseconds to
wait between executions of the automatic probe function.
An argument of zero disables the failfast option and disables automatic probing.
If the
.Dv MHIOCENFAILFAST
ioctl is never called, the effect is defined to be that
both the failfast option and automatic probing are disabled.
.El
.Ss "Automatic Probing"
The
.Dv MHIOCENFAILFAST
ioctl sets up a timeout in the driver to periodically
schedule automatic probes of the disk.
The automatic probe function works in this manner: The driver is scheduled to
probe the multihost disk every n milliseconds, rounded up to the next integral
multiple of the system clock's resolution.
If
.Bl -enum -offset indent
.It
the local host no longer has access rights to the multihost disk, and
.It
access rights were expected to be held by the local host,
.El
.Pp
the driver immediately panics the machine to comply with the failfast model.
.Pp
If the driver makes this discovery outside the timeout function, especially
during a read or write operation, it is imperative that it panic the system
then as well.
.Sh RETURN VALUES
Each request returns
.Sy -1
on failure and sets
.Va errno
to indicate the error.
.Bl -tag -width Er
.It Er EPERM
Caller is not root.
.It Er EACCES
Access rights were denied.
.It Er EIO
The multihost disk or controller was unable to successfully complete the
requested operation.
.It Er EOPNOTSUP
The multihost disk does not support the operation.
For example, it does not support the SCSI-2 Reserve/Release command set, or the
SCSI-3 Persistent Reservation command set.
.El
.Sh STABILITY
Uncommitted
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr open 2 ,
.Xr attributes 7
|