1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
|
.\"
.\" Copyright 2022 Oxide Computer Company
.\" Copyright (c) 2005, Sun Microsystems, Inc. All Right Reserved.
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.Dd March 13, 2022
.Dt MADVISE 3C
.Os
.Sh NAME
.Nm madvise
.Nd provide advice to VM system
.Sh SYNOPSIS
.In sys/types.h
.In sys/mman.h
.Ft int
.Fo madviase
.Fa "void *addr"
.Fa "size_t len"
.Fa "int advice"
.Fc
.Sh DESCRIPTION
The
.Fn madvise
function advises the kernel that a region of user mapped memory in the range
.Pf [ Fa addr ,
.Fa addr
+
.Fa len Ns
) will be accessed following a type of pattern.
The kernel uses this information to optimize the procedure for manipulating and
maintaining the resources associated with the specified mapping range.
In general
.Pq and true to the name of the function ,
the advice is merely advisory, and the only user-visible ramifications are in
terms of performance, not semantics.
Note that
.Dv MADV_PURGE
is an exception to this; see below for details.
.Pp
Values for
.Fa advice
are defined in
.In sys/mman.h
as:
.Bd -literal -offset indent
#define MADV_NORMAL 0x0 /* No further special treatment */
#define MADV_RANDOM 0x1 /* Expect random page references */
#define MADV_SEQUENTIAL 0x2 /* Expect sequential page references */
#define MADV_WILLNEED 0x3 /* Will need these pages */
#define MADV_DONTNEED 0x4 /* Don't need these pages */
#define MADV_FREE 0x5 /* Contents can be freed */
#define MADV_ACCESS_DEFAULT 0x6 /* default access */
#define MADV_ACCESS_LWP 0x7 /* next LWP to access heavily */
#define MADV_ACCESS_MANY 0x8 /* many processes to access heavily */
#define MADV_PURGE 0x9 /* contents will be purged */
.Ed
.Bl -tag -width Ds
.It Dv MADV_NORMAL
This is the default system characteristic where accessing memory within the
address range causes the system to read data from the mapped file.
The kernel reads all data from files into pages which are retained for a period
of time as a
.Dq cache .
System pages can be a scarce resource, so the kernel steals pages from other
mappings when needed.
This is a likely occurrence, but adversely affects system performance only if a
large amount of memory is accessed.
.It Dv MADV_RANDOM
Tell the kernel to read in a minimum amount of data from a mapped file on any
single particular access.
If
.Dv MADV_NORMAL
is in effect when an address of a mapped file is accessed, the system tries to
read in as much data from the file as reasonable, in anticipation of other
accesses within a certain locality.
.It Dv MADV_SEQUENTIAL
Tell the system that addresses in this range are likely to be accessed only
once, so the system will free the resources mapping the address range as
quickly as possible.
.It Dv MADV_WILLNEED
Tell the system that a certain address range is definitely needed so the kernel
will start reading the specified range into memory.
This can benefit programs wanting to minimize the time needed to access memory
the first time, as the kernel would need to read in from the file.
.It Dv MADV_DONTNEED
Tell the kernel that the specified address range is no longer needed, so the
system starts to free the resources associated with the address range.
While the semantics of
.Dv MADV_DONTNEED
are similar to other systems, they differ significantly from the semantics on
Linux, where
.Dv MADV_DONTNEED
will actually synchronously purge the address range, and subsequent faults will
load from either backing store or be zero-filled on demand.
If the peculiar Linux semantics are desired,
.Dv MADV_PURGE
should be used in lieu of
.Dv MADV_DONTNEED .
.It Dv MADV_FREE
Tell the kernel that contents in the specified address range are no longer
important and the range will be overwritten.
When there is demand for memory, the system will free pages associated with the
specified address range.
In this instance, the next time a page in the address range is referenced, it
will contain all zeroes.
Otherwise, it will contain the data that was there prior to the
.Dv MADV_FREE
call.
References made to the address range will not make the system read from backing
store
.Pq swap space
until the page is modified again.
.Pp
This value cannot be used on mappings that have underlying file objects.
.It Dv MADV_PURGE
Tell the kernel to purge the specified address range.
The mapping will be retained, but the pages themselves will be destroyed;
subsequent faults on the range will result in the page being read from backing
store
.Pq if file-backed
or being zero-filled on demand
.Pq if anonymous .
Note that these semantics are generally inferior to
.Dv MADV_FREE ,
which gives the system more flexibility and results in better performance when
pages are, in fact, reused by the caller.
Indeed,
.Dv MADV_PURGE
only exists to provide an equivalent to the unfortunate
.Dv MADV_DONTNEED
semantics found in Linux, upon which some programs have
.Pq regrettably
come to depend.
In de novo applications,
.Dv MADV_PURGE
should be avoided;
.Dv MADV_FREE
should always be preferred.
.It Dv MADV_ACCESS_LWP
Tell the kernel that the next LWP to touch the specified address range will
access it most heavily, so the kernel should try to allocate the memory and
other resources for this range and the LWP accordingly.
.It Dv MADV_ACCESS_MANY
Tell the kernel that many processes and/or LWPs will access the specified
address range randomly across the machine, so the kernel should try to allocate
the memory and other resources for this range accordingly.
.It Dv MADV_ACCESS_DEFAULT
Reset the kernel's expectation for how the specified range will be accessed to
the default.
.El
.Pp
The
.Fn madvise
function should be used by applications with specific knowledge of their access
patterns over a memory object, such as a mapped file, to increase system
performance.
.Sh RETURN VALUES
Upon successful completion,
.Fn madvise
returns
.Sy 0 ;
otherwise, it returns
.Sy -1
and sets
.Va errno
to indicate the error.
.Sh ERRORS
.Bl -tag -width Er
.It Er EAGAIN
Some or all mappings in the address range
.Pf [ Fa addr ,
.Fa addr
+
.Fa len Ns
) are locked for I/O.
.It Er EBUSY
.ad
Some or all of the addresses in the range
.Pf [ Fa addr ,
.Fa addr
+
.Fa len Ns
) are locked and
.Dv MS_SYNC
with the
.Dv MS_INVALIDATE
option is specified.
.It Er EFAULT
Some or all of the addresses in the specified range could not be read into
memory from the underlying object when performing
.Dv MADV_WILLNEED .
The
.Fn madvise
function could return prior to this condition being detected, in which case
.Va errno
will not be set to
.Er EFAULT .
.It Er EINVAL
The
.Fa addr
argument is not a multiple of the page size as returned by
.Xr sysconf 3C ,
the length of the specified address range is equal to 0, or the
.Fa advice
argument was invalid.
.It Er EIO
An I/O error occurred while reading from or writing to the file system.
.It Er ENOMEM
Addresses in the range
.Pf [ Fa addr ,
.Fa addr
+
.Fa len Ns
) are outside the valid range for the address space of a process, or specify one
or more pages that are not mapped.
.It Er ESTALE
Stale NFS file handle.
.El
.Sh INTERFACE STABILITY
.Sy Committed
.Sh MT-LEVEL
.Sy MT-Safe
.Sh SEE ALSO
.Xr meminfo 2 ,
.Xr mmap 2 ,
.Xr sysconf 3C ,
.Xr attributes 7
|