compiling:

  Both glibc and Sun's libc require 64-bit atomic operations, which on 32-bit
  x86 first appeared in the Pentium (i586). The suggested method of compiling
  on 32-bit x86 is to set CC='gcc -march=i586'.

headers:

  In order to avoid duplicating OpenSolaris-specific headers, most extensions
  define any constants/structs in accompanying private/"P" headers. For
  example, zone_* is implemented in zone.c and constants/structs are defined in
  zoneP.h.

auxiliary vector (auxv_t):

  Proper OpenSolaris does not support statically-linked executables (i.e. via
  gcc -static). glibc does, but with certain restrictions. The kernel only
  builds the auxv_t if the ELF file is not of type ET_EXEC or if the PT_INTERP
  program header exists. Dynamically-linked executables and libraries
  therefore get an auxv_t, while statically-linked executables do not. As a
  result, statically-linked executables never see PT_TLS, which is needed for
  __thread support. We can test for the SHARED macro in libc library code,
  but in general, __thread will not work for statically-linked executables.

  Fixing this should only be a matter of changing the kernel to supply the
  auxv_t unconditionally.
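
  The kernel's decision described above can be sketched as a small predicate.
  This is only an illustration: the function name and the SK_* constants are
  hypothetical (the numeric values are the standard ELF ones for ET_EXEC and
  PT_INTERP), and the real kernel code works on the full ELF and program
  headers rather than on a bare array of p_type values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard ELF values, redefined locally so the sketch needs no <elf.h>. */
enum { SK_ET_EXEC = 2, SK_PT_INTERP = 3 };

/*
 * Returns true if, per the rule in these notes, the kernel would build an
 * auxv_t: the file is not ET_EXEC, or it has a PT_INTERP program header.
 */
bool sk_kernel_builds_auxv(uint16_t e_type, const uint32_t *p_types,
                           size_t phnum)
{
    if (e_type != SK_ET_EXEC)
        return true;                /* shared objects always get one */

    for (size_t i = 0; i < phnum; i++)
        if (p_types[i] == SK_PT_INTERP)
            return true;            /* dynamically-linked executable */

    return false;                   /* statically-linked ET_EXEC */
}
```

  A static executable with only PT_LOAD and PT_TLS headers yields false here,
  which is the case where __thread breaks.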

scheduling:

  The OpenSolaris kernel allows for loadable scheduling classes. A scheduling
  class has an id (pc_cid), an associated name (pc_clname), and class-specific
  scheduling information. As new schedulers are loaded, they are assigned new
  ids incrementally.

  Since ids are assigned dynamically, there is no way to statically associate
  a class id with a POSIX scheduler (i.e. SCHED_*). The only exception is
  SCHED_SYS, which is guaranteed to have cid == 0.

  The following schedulers are defined:

  SCHED_*       | Name  | Min Prio  | Max Prio  |
  -----------------------------------------------
  SCHED_OTHER   | TS    | -60       | 60        |
  SCHED_FIFO    | RT    | 0         | 59        |
  SCHED_RR      | RT    | 0         | 59        |
  SCHED_SYS     | SYS   | 60        | 99        |
  SCHED_IA      | IA    | -60       | 60        |
  SCHED_FSS     | FSS   | -60       | 60        |
  SCHED_FX      | FX    | 0         | 60        |

  Internally the maximum and minimum are stored in a short int. Further, for
  mutexes, the ceiling is stored in a uint8_t. This means that it can be
  assumed that priorities must be between -128 and 127.
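
  Because class ids are only known at run time, a lookup keyed on the class
  or policy name is the natural shape for the table above. The following is a
  minimal sketch, not the code used in the port: the struct and function
  names are hypothetical, and the ranges are copied from the table.

```c
#include <stddef.h>
#include <string.h>

struct sk_sched_range {
    const char *policy;   /* SCHED_* name from the table */
    const char *clname;   /* scheduling class name (pc_clname) */
    short min_prio;       /* stored as short, as in the notes */
    short max_prio;
};

static const struct sk_sched_range sk_ranges[] = {
    { "SCHED_OTHER", "TS",  -60, 60 },
    { "SCHED_FIFO",  "RT",    0, 59 },
    { "SCHED_RR",    "RT",    0, 59 },
    { "SCHED_SYS",   "SYS",  60, 99 },
    { "SCHED_IA",    "IA",  -60, 60 },
    { "SCHED_FSS",   "FSS", -60, 60 },
    { "SCHED_FX",    "FX",    0, 60 },
};

/* Find the table row for a policy name, or NULL if it is unknown. */
const struct sk_sched_range *sk_range_lookup(const char *policy)
{
    for (size_t i = 0; i < sizeof(sk_ranges) / sizeof(sk_ranges[0]); i++)
        if (strcmp(sk_ranges[i].policy, policy) == 0)
            return &sk_ranges[i];
    return NULL;
}

/* Returns 1 if prio lies within the policy's range, 0 otherwise. */
int sk_prio_valid(const char *policy, int prio)
{
    const struct sk_sched_range *r = sk_range_lookup(policy);

    return r != NULL && prio >= r->min_prio && prio <= r->max_prio;
}
```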

privileges:

  Each process has a set of privileges, represented by a prpriv_t. This struct
  contains a header, followed by a number of priv_chunk_t blocks (the priv
  sets), and finally followed by a number of priv_info_t blocks (per process
  additional info).
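
  The layout described above (header, then the privilege-set chunks, then the
  per-process info) can be walked with simple offset arithmetic. The sketch
  below uses a hypothetical header and hypothetical field names; it assumes
  the header records the number of sets, the number of chunks per set, and
  the size in bytes of the trailing info, and that a chunk is 32 bits wide.
  Check the real prpriv_t definition before relying on any of this.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sk_chunk_t;      /* assumed chunk width */

struct sk_prpriv_hdr {            /* hypothetical stand-in for the header */
    uint32_t nsets;               /* number of privilege sets */
    uint32_t setsize;             /* chunks per privilege set */
    uint32_t infosize;            /* bytes of trailing per-process info */
};

/* Byte offset of the first privilege-set chunk. */
size_t sk_sets_offset(void)
{
    return sizeof(struct sk_prpriv_hdr);
}

/* Byte offset of the per-process info that follows all sets. */
size_t sk_info_offset(const struct sk_prpriv_hdr *h)
{
    return sk_sets_offset() +
        (size_t)h->nsets * h->setsize * sizeof(sk_chunk_t);
}

/* Total size in bytes of header + sets + info. */
size_t sk_total_size(const struct sk_prpriv_hdr *h)
{
    return sk_info_offset(h) + h->infosize;
}
```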

threads:

  The Sun libpthread/libthread implementation assumes a 1:1 mapping between
  pthread_t/thread_t and lwpid_t, while NPTL maps thread descriptors to
  pthread_t. The Sun behaviour was added to NPTL and may be enabled by
  defining PTHREAD_T_IS_TID.

mutex:

  Recursive locks are represented by an 8-bit counter defined by the
  mutex_rcount macro. The maximum number of recursive acquisitions is
  UCHAR_MAX (255).

  Several fields are packed into a single 64-bit field. 32 of the bits hold
  the owner pid; 8 bits each hold the lock byte, the number of waiters, and
  the number of spinners. Solaris defines some macros for accessing these
  (architecture dependent, of course):

    mutex_lockword (32-bits): This is used if only the lock bit needs to be
      touched.

    mutex_lockword64 (64-bits): This is used if you need to atomically swap
      both the lock bytes and the owner pid. Note that where the pid portion
      is located is dependent on byte ordering.

    mutex_lockbyte (8-bits): This is the actual lock byte. It is set to 1 when
      the lock is locked, and 0 when unlocked.

    mutex_waiters (8-bits): This is set to 1 when there is another thread
      waiting and 0 when there are no other waiters.

    mutex_spinners (8-bits): This byte is apparently unused.

    mutex_ownerpid (32-bits): Set to the mutex owner's process pid when the
      mutex is shared.
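
  To make the role of the 64-bit view concrete, here is a sketch of taking a
  shared mutex by swapping the lock byte and the owner pid in one atomic
  operation. It is not the Solaris or glibc code: the sk_* names are
  hypothetical, and the bit positions (lock byte in bits 0-7, waiters in
  bits 8-15, spinners in bits 16-23, owner pid in the upper 32 bits) are an
  assumption chosen to avoid byte-order issues. The real macros address
  bytes in memory, so their layout depends on endianness.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack the fields into one 64-bit word using the assumed bit positions. */
static inline uint64_t sk_pack(uint32_t ownerpid, uint8_t lockbyte,
                               uint8_t waiters, uint8_t spinners)
{
    return ((uint64_t)ownerpid << 32) | ((uint64_t)spinners << 16) |
           ((uint64_t)waiters << 8) | (uint64_t)lockbyte;
}

static inline uint8_t  sk_lockbyte(uint64_t w) { return (uint8_t)w; }
static inline uint8_t  sk_waiters(uint64_t w)  { return (uint8_t)(w >> 8); }
static inline uint8_t  sk_spinners(uint64_t w) { return (uint8_t)(w >> 16); }
static inline uint32_t sk_ownerpid(uint64_t w) { return (uint32_t)(w >> 32); }

/*
 * Try to take the lock: succeeds only while the lock byte is 0, and then
 * sets the lock byte and the owner pid together with one 64-bit
 * compare-and-swap. The waiters and spinners bytes are carried over.
 */
bool sk_trylock_shared(_Atomic uint64_t *word, uint32_t pid)
{
    uint64_t old = atomic_load(word);

    while (sk_lockbyte(old) == 0) {
        uint64_t desired = sk_pack(pid, 1, sk_waiters(old), sk_spinners(old));

        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(word, &old, desired))
            return true;
    }
    return false;                  /* already locked by someone else */
}
```

  On 32-bit x86 a 64-bit compare-and-swap like this one is what requires
  the 64-bit atomic operations mentioned under "compiling".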

  The data field (aka mutex_owner) is used by Sun libc to store a pointer to
  the thread-descriptor of the owning thread. We split this 64-bit field into
  two fields:

    mutex_owner (32-bits): The lwpid of the owning thread.

    mutex_cond_waiters (32-bits): An in-use counter that is incremented when
      waiting on a condition and decremented when we return (or are cancelled).

  The kernel only touches the data field when it is cleared during cleanup for
  certain mutex types.

  The kernel does not handle recursive or error-checking mutexes.

  The kernel does not set mutex_lockbyte for mutexes with the
  LOCK_PRIO_INHERIT bit set.

  The kernel does not use data.flag2. We use this to track the current priority
  ceiling (mutex_real_ceiling) for LOCK_PRIO_PROTECT mutexes.

semaphore:

condition variable:

  The cond_waiters_kernel byte is set to 1 if there are waiters on the
  condition variable and 0 otherwise. The cond_waiters_user byte is not
  used by the kernel.

  The only clock types supported by Sun libc are CLOCK_REALTIME and
  CLOCK_HIGHRES.

  The data field is not used by the kernel.

reader-writer lock:

  The kernel only supports shared/process reader-writer locks; the private
  rwlock implementation must be completely implemented in libc. For the shared
  case, readercv and writercv are used to track the owner (thread and process).
  The Sun docs also state that the Sun implementation favours writers over
  readers[0].

  There is no apparent advantage in using the rwlock syscalls since any
  private implementation that used the embedded mutex and cv's would also work
  correctly in the shared case.

  Our implementation adds three additional fields for tracking the owner (thread
  and process) of a reader-writer lock.

[0] http://docs.sun.com/app/docs/doc/819-2243/rwlock-init-3c?a=view

nsswitch:

  nss_search

    This is used to search a database given a key. Examples that use nss_search
    include gethostbyname_r and _getauthattr.

  nss_getent
  nss_setent
  nss_endent
  nss_delete

    These are used for iterating over a database. nss_getent, nss_setent,
    and nss_endent are used in gethostent, sethostent, and endhostent,
    respectively. nss_delete is used to free resources used by the iteration;
    it usually directly follows a call to nss_endent.
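
    From the caller's side, the iteration pattern looks like the sketch
    below. It uses only the public sethostent/gethostent/endhostent
    functions; the helper name count_host_entries is hypothetical, and how
    the nss_* calls are made underneath is not shown.

```c
#include <netdb.h>
#include <stddef.h>

/* Count the entries in the hosts database using the iteration API. */
int count_host_entries(void)
{
    int n = 0;

    sethostent(0);                  /* rewind; 0 = do not keep it open */
    while (gethostent() != NULL)    /* one entry per call, NULL at the end */
        n++;
    endhostent();                   /* finish the iteration */

    return n;
}
```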

  _nss_XbyY_fgets

    This function is used to parse a file directly, rather than going through
    nsswitch.conf and its databases.

syscalls:

  Dealing with 64-bit returns in 32-bit code is tricky. For 32-bit x86, %eax
  and %edx are not saved across function calls. Since syscalls always return
  a 32-bit integer we always have to push/pop %eax across functions. However,
  since there are very few 64-bit returning syscalls, we don't save %edx unless
  we have a 64-bit returning syscall. The following is a list of 64-bit
  returning syscalls:

    getgid, getuid, getpid, forkx, pipe, lseek64

  Currently, the only time we actually call functions is in the case of
  cancellation points (we call pthread_async_enable/disable). lseek64 is the
  only syscall listed above that is a cancellation point. To deal with this,
  we define SYSCALL_64BIT_RETURN in lseek64.S, which triggers inclusion of
  %edx saving.

  Additionally, 64-bit returning syscalls set both %eax and %edx to -1 on
  error. Similarly, this behaviour is enabled by SYSCALL_64BIT_RETURN. Note
  that getegid, geteuid, and getppid are special in that their libc
  equivalents actually return 32-bit integers, so we don't need to worry about
  %edx on error. With forkx and pipe, it suffices to check only the lower
  32 bits (one of the pids/fds returned) for -1. For lseek64 we do have to
  check the full 64-bit return for -1.
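
  The two error checks can be written out in C to show why they differ. This
  is an illustrative sketch with hypothetical helper names; the real logic
  lives in the assembly syscall stubs. The eax and edx parameters stand for
  the register values left by the syscall.

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine %edx:%eax into the 64-bit value a syscall such as lseek64 returns. */
static inline int64_t sk_join_ret(uint32_t eax, uint32_t edx)
{
    return (int64_t)(((uint64_t)edx << 32) | (uint64_t)eax);
}

/* forkx, pipe: the low half is a pid or fd, so -1 there means failure. */
static inline bool sk_failed_low32(uint32_t eax)
{
    return eax == UINT32_MAX;
}

/* lseek64: only the full 64-bit value -1 means failure. */
static inline bool sk_failed_full64(uint32_t eax, uint32_t edx)
{
    return sk_join_ret(eax, edx) == -1;
}
```

  A file offset of 0xffffffff has %eax == -1 but %edx == 0, so checking only
  the lower half would misreport a valid lseek64 result as an error.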

sysconf:

  Many of the _SC_ sysconf values are obtained via the systemconf syscall. The
  following is a table of mappings from _SC_ to _CONFIG_ values. The third
  column lists the value returned by sysdeps/posix/sysconf.c.

    _SC_CHILD_MAX           _CONFIG_CHILD_MAX           _get_child_max
    _SC_CLK_TCK             _CONFIG_CLK_TCK             _getclktck
    _SC_NGROUPS_MAX         _CONFIG_NGROUPS             NGROUPS_MAX
    _SC_OPEN_MAX            _CONFIG_OPEN_FILES          __getdtablesize
    _SC_PAGESIZE            _CONFIG_PAGESIZE            __getpagesize
    _SC_XOPEN_VERSION       _CONFIG_XOPEN_VER           _XOPEN_VERSION
    _SC_STREAM_MAX          _CONFIG_OPEN_FILES          STREAM_MAX
    _SC_NPROCESSORS_CONF    _CONFIG_NPROC_CONF          __get_nprocs_conf
    _SC_NPROCESSORS_ONLN    _CONFIG_NPROC_ONLN          __get_nprocs
    _SC_NPROCESSORS_MAX     _CONFIG_NPROC_MAX
    _SC_STACK_PROT          _CONFIG_STACK_PROT
    _SC_AIO_LISTIO_MAX      _CONFIG_AIO_LISTIO_MAX      AIO_LISTIO_MAX
    _SC_AIO_MAX             _CONFIG_AIO_MAX             AIO_MAX
    _SC_AIO_PRIO_DELTA_MAX  _CONFIG_AIO_PRIO_DELTA_MAX  AIO_PRIO_DELTA_MAX
    _SC_DELAYTIMER_MAX      _CONFIG_DELAYTIMER_MAX      DELAYTIMER_MAX
    _SC_MQ_OPEN_MAX         _CONFIG_MQ_OPEN_MAX         MQ_OPEN_MAX
    _SC_MQ_PRIO_MAX         _CONFIG_MQ_PRIO_MAX         MQ_PRIO_MAX
    _SC_RTSIG_MAX           _CONFIG_RTSIG_MAX           RTSIG_MAX
    _SC_SEM_NSEMS_MAX       _CONFIG_SEM_NSEMS_MAX       SEM_NSEMS_MAX
    _SC_SEM_VALUE_MAX       _CONFIG_SEM_VALUE_MAX       SEM_VALUE_MAX
    _SC_SIGQUEUE_MAX        _CONFIG_SIGQUEUE_MAX        SIGQUEUE_MAX
    _SC_SIGRT_MAX           _CONFIG_SIGRT_MAX
    _SC_SIGRT_MIN           _CONFIG_SIGRT_MIN
    _SC_TIMER_MAX           _CONFIG_TIMER_MAX           TIMER_MAX
    _SC_PHYS_PAGES          _CONFIG_PHYS_PAGES          __get_phys_pages
    _SC_AVPHYS_PAGES        _CONFIG_AVPHYS_PAGES        __get_avphys_pages
    _SC_COHER_BLKSZ         _CONFIG_COHERENCY
    _SC_SPLIT_CACHE         _CONFIG_SPLIT_CACHE
    _SC_ICACHE_SZ           _CONFIG_ICACHESZ
    _SC_DCACHE_SZ           _CONFIG_DCACHESZ
    _SC_ICACHE_LINESZ       _CONFIG_ICACHELINESZ
    _SC_DCACHE_LINESZ       _CONFIG_DCACHELINESZ
    _SC_ICACHE_BLKSZ        _CONFIG_ICACHEBLKSZ
    _SC_DCACHE_BLKSZ        _CONFIG_DCACHEBLKSZ
    _SC_DCACHE_TBLKSZ       _CONFIG_DCACHETBLKSZ
    _SC_ICACHE_ASSOC        _CONFIG_ICACHE_ASSOC
    _SC_DCACHE_ASSOC        _CONFIG_DCACHE_ASSOC
    _SC_MAXPID              _CONFIG_MAXPID
    _SC_CPUID_MAX           _CONFIG_CPUID_MAX
    _SC_EPHID_MAX           _CONFIG_EPHID_MAX
    _SC_SYMLOOP_MAX         _CONFIG_SYMLOOP_MAX         SYMLOOP_MAX
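
  From the caller's side, every row of the table is reached through the same
  sysconf() call. The sketch below is a thin wrapper with a hypothetical
  name (sk_sysconf). It relies on the usual convention that sysconf()
  returns -1 with errno untouched when a limit is indeterminate, and -1 with
  errno set when the name is invalid.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Query one _SC_* value.
 * Returns 0 and stores the value in *out on success, 1 if the limit is
 * indeterminate, and -1 if sysconf() reported an error (errno is set).
 */
int sk_sysconf(int name, long *out)
{
    long v;

    errno = 0;
    v = sysconf(name);
    if (v == -1)
        return errno == 0 ? 1 : -1;

    *out = v;
    return 0;
}
```

  For example, sk_sysconf(_SC_PAGESIZE, &v) corresponds to the
  _CONFIG_PAGESIZE row.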

fgetattr, fsetattr, getattrat, setattrat:

  The *attr calls are new to OpenSolaris and are similar to Linux's extended
  attributes functions. They are implemented as openat(fd, attr_name, O_XATTR)
  and then read/written via the respective syscall.
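
  The first step, opening the attribute relative to the file's descriptor,
  can be sketched as follows. The helper name sk_open_attr is hypothetical
  and the read-only mode is an assumption for illustration; the subsequent
  read/write step is not shown. On systems without O_XATTR the sketch simply
  fails with ENOTSUP.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Open the attribute attr_name of the file referred to by fd.
 * Returns a descriptor for the attribute, or -1 with errno set.
 */
int sk_open_attr(int fd, const char *attr_name)
{
#ifdef O_XATTR
    /* OpenSolaris: attributes are opened relative to the file itself. */
    return openat(fd, attr_name, O_XATTR | O_RDONLY);
#else
    /* No O_XATTR on this platform; report it as unsupported. */
    (void)fd;
    (void)attr_name;
    errno = ENOTSUP;
    return -1;
#endif
}
```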