#LyX 2.0 created this file. For more info see http://www.lyx.org/
\lyxformat 413
\begin_document
\begin_header
\textclass article
\begin_preamble
\usepackage{a4wide}
\usepackage{times}
\end_preamble
\use_default_options false
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman default
\font_sans default
\font_typewriter default
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\spacing single
\use_hyperref false
\papersize default
\use_geometry false
\use_amsmath 1
\use_esint 0
\use_mhchem 1
\use_mathdots 1
\cite_engine basic
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\use_refstyle 0
\index Index
\shortcut idx
\color #008000
\end_index
\leftmargin 1.5cm
\topmargin 1cm
\rightmargin 1.5cm
\bottommargin 1cm
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header
\begin_body
\begin_layout Title
The new Unix RTL.
\end_layout
\begin_layout Author
Marco van de Voort (marco@freepascal.org)
\end_layout
\begin_layout Section*
Versions
\end_layout
\begin_layout Standard
Current version: 1.4a, just after the 2.6.2 release
\end_layout
\begin_layout Description
1.0 The version of 2005.
It carries no version number, only the date in the PDF.
\end_layout
\begin_layout Description
1.1 Unversioned PDF with
\begin_inset Quotes eld
\end_inset
June 10th, 2008
\begin_inset Quotes erd
\end_inset
as date in it.
Mostly adds the
\begin_inset Quotes eld
\end_inset
prefix
\begin_inset Quotes erd
\end_inset
section.
\end_layout
\begin_layout Description
1.2 First numbered version.
\end_layout
\begin_layout Description
1.3 Minor changes, unixtype, libc wiki link
\end_layout
\begin_layout Description
1.4 more minor changes and updates.
\end_layout
\begin_layout Description
1.4a minor fixes to layout, unixutil paragraph
\end_layout
\begin_layout Description
1.4b mostly spelling fixes done while committing 1.4a changes.
\end_layout
\begin_layout Section
Introduction
\end_layout
\begin_layout Standard
This is a document where I wrote down some of the reasons for the restructuring
of the Unix rtl in the 1.1.x/1.9.x/2.0.x branch
\begin_inset Foot
status collapsed
\begin_layout Plain Layout
These versions are all the same series.
It was called 1.1.x when pre-beta, 1.9.x when in beta stage, and will be 2.0.x
when released.
\end_layout
\end_inset
.
This document was mostly written in retrospect while this branch was maturing,
and end-users needed to be prepared for the 1.0->2.0 changes, so it is not
really a design document written before the deed.
\end_layout
\begin_layout Standard
The restructure was never truly finished, and even now (while preparing
for 2.8.0) there are still things to change.
The main restructuring has been done, but because some details are still
open, I think this document remains relevant.
The document also tries to explain some of the design considerations behind
these changes.
Recently, a wiki article
\begin_inset CommandInset href
LatexCommand href
name "http://wiki.freepascal.org/libc_unit"
target "http://wiki.freepascal.org/libc_unit"
\end_inset
was written that covers some of the same issues as this document (e.g.
Kylix libc unit issues), and is kept more up to date.
\end_layout
\begin_layout Section
History
\end_layout
\begin_layout Standard
The Unix rtl started life as the Linux rtl.
I don't have the exact date, but the design of the Linux unit of the 0.9(9).x
and 1.0.x series dates back to 1995-1996, and was made by Michael van Canneyt
based on the kernels of that era (1.1.x, pre-glibc).
This rtl was maintained and slightly expanded during the 1996-2000 period,
but no fundamental rearrangements were made.
\end_layout
\begin_layout Standard
Just before the 1.0 release Marco van de Voort started tinkering with FreeBSD,
and the unit linux was a major problem, at least for his skills then :-)
However FPC was already too deep into the codefreeze that was needed to
stabilize the upcoming 1.0 release to allow a junior member to fully redesign
the Unix rtl.
FreeBSD had started working when the first betas of 1.0.x were out, but it
was mostly a patched Linux RTL, so it couldn't be committed.
\end_layout
\begin_layout Standard
That's why 1.0 was released without formal FreeBSD support, and after cleanup
and integration, a 1.0 FreeBSD beta release was delivered a few weeks after
the formal release.
The minimal modifications for FreeBSD were merged into the CVS system before
the 1.0.2 release, and the FreeBSD platform was reasonably established and
regarded as stable with the release of FPC 1.0.4.
\end_layout
\begin_layout Standard
In hindsight however not forcing more fundamental changes at the 1.0.2 release
point was a pity.
At least a few exotic linux functions should have been banned (sysinfo,
clone), and unix typing should have been introduced, and preferably the
unit should have been renamed to Unix.
Also the syscall interface should have been changed.
However besides the conservatism that resulted from the code freeze, I
had some doubts about the feasibility of the BSD ports then, and didn't
push it hard enough.
\end_layout
\begin_layout Standard
During the 1.0.x lifetime, I regretted this deeply, especially after 1.0.6 when
other Unix ports appeared, and the FreeBSD port turned out to be of good
quality.
The things that could be solved with a simple IFDEF when the FreeBSD port
was done, turned out to be annoying and complicated with multiple ports,
and likely to introduce bugs.
Small fixes done to the Linux RTL by others constantly broke the BSD ports.
The 1.1 branch was made already a year before 1.0.
Bug fixes and restructures were only partially ported to the 1.1.x branch,
and the OpenBSD/NetBSD/Solaris/QNX/BeOS ports were never ported to 1.1.x.
\end_layout
\begin_layout Standard
Because of these facts, there was a lot of maintenance work to do for 1.1.x,
and I decided to combine the needed maintenance and updating work with
the postponed restructure described above, and along the way tackle as
many other problems as possible.
\end_layout
\begin_layout Section
What's wrong with the situation in 1.0?
\end_layout
\begin_layout Standard
Well, there are a lot of reasons actually.
Some important ones:
\end_layout
\begin_layout Enumerate
The Linux unit was originally targeted at Linux only, 1.x kernels even.
Some details
\begin_inset Quotes eld
\end_inset
bled
\begin_inset Quotes erd
\end_inset
through into the unit's interface.
\end_layout
\begin_layout Enumerate
Quite a lot of different groups of functions with different portability
aspects are stuffed together in one unit.
This makes porting the complete unit nearly impossible, and also poses
some challenges to keep the unit long term compatible on Linux.
\end_layout
\begin_layout Enumerate
The Linux unit doesn't have any form of (Unix) typing.
Parameter types are translated to Turbo Pascal's integer or longint, using
the size in bytes they had in Linux 1.0.x.
This creates portability problems for other Unices and for Linux on other
architectures, and makes it harder to fix the unit for newer Linux versions.
\end_layout
\begin_layout Enumerate
The error handling of the Linux unit is a home-grown invention that is not
compatible with libc and has no major benefits of its own.
This complicates the situation when the base library is based on libc.
\end_layout
\begin_layout Enumerate
The name is wrong, at least for the current FPC.
It doesn't make sense to import a Linux unit under FreeBSD to access Unix
functions.
Also, in which unit do Linux specific functions end up?
\begin_inset Quotes eld
\end_inset
Linux
\begin_inset Quotes erd
\end_inset
would be logical, but is already taken.
\end_layout
\begin_layout Enumerate
Syscalls that are used in both unit Linux and system were duplicated.
Some other units also include these.
This adds only a small overhead under Linux (typically a few hundred bytes
up to several KB), but it can increase dramatically if wrappers become complex.
See also next point.
\end_layout
\begin_layout Enumerate
The then-current readdir situation on Linux was bad.
Each readdir to get an entry is a syscall, which can be slow.
This can be sped up by moving parts of the readdir call to userland, and only
calling the kernel once for a whole block of entries (via getdents or
getdirentries).
Linux implements this too, since the old libc->glibc change, but FPC hadn't
caught up yet.
The *BSD ports did this from the start, but it is hand coded, and not a
translated libc version, which might cause problems with unusual filesystem
drivers.
(due to AMD64/Linux rtl work, I believe this has been remedied by Peter)
\end_layout
\begin_layout Enumerate
(see also 6) The structure of the include files was quite Linux centric,
and not very flexible.
System and Linux/UNIX unit are too rigidly entangled.
\end_layout
\begin_layout Enumerate
Functions weren't named consistently.
Some have fd- prefix, some none, some have a slightly different name from
libc etc etc.
(hmm, this was partially correct in hindsight.
fd* functions use the C file type, while the normal (without prefix) are
syscalls and use a kernel handle as first argument)
\end_layout
\begin_layout Enumerate
(minor) The parameter passing of the syscall interface was system dependent.
(Linux: record, BSD: pseudo procedural), this is bad because the syscall
interface was exported too.
\end_layout
\begin_layout Standard
These reasons are made worse because 2.0 was supposed to support several
architectures, and probably more OSes.
During the 1.0 lifetime, the Linux unit was ported to *BSD and BeOS, and
that already stretched the design to its limits.
2.0 was expected to grow beyond 20 OS-architecture combinations, so that
made it much worse.
Portability aspects became more important if we wanted to avoid having
2
\begin_inset Quotes eld
\end_inset
good
\begin_inset Quotes erd
\end_inset
platforms, and the rest outdated builds that are only partially implemented.
\end_layout
\begin_layout Standard
These reasons except number two could be fixed by some major, but doable
refactoring of unit Linux, and renaming it to UNIX (as was done originally).
However reason two couldn't be fixed this way.
\end_layout
\begin_layout Standard
Since full compatibility would be broken by all those other changes anyway,
it was decided to do a full redesign, and start from the bottom up, and
take care of all these issues, with special attention to ease of maintenance,
portability (read: separating portable from unportable code).
The design must also scale well enough to last for a while.
2.0 shipped in 2005, so the 1.0.x series had a lifespan of roughly 5 years.
A fundamental RTL design must be at least as durable.
\end_layout
\begin_layout Subsection
Why is it necessary to split up the unit?
\end_layout
\begin_layout Standard
The main reasons are related to portability and maintenance.
It's easier to do a new port (only the necessary units will be implemented),
units will less often be
\begin_inset Quotes eld
\end_inset
incomplete
\begin_inset Quotes erd
\end_inset
for some targets.
\end_layout
\begin_layout Standard
An important side effect is that future source will show more clearly in
the USES clause what UNIX functionality is actually used.
Using Termio or Syscall is clearer than
\begin_inset Quotes eld
\end_inset
Linux
\begin_inset Quotes erd
\end_inset
.
People often think that a single unit is easier than many, but this is no
longer the case if it is stuffed from top to bottom with IFDEFs, and a
short description of which function is implemented on which platform is longer
than the source code itself.
\end_layout
\begin_layout Section
What are the basic ideas behind the new 1.1.x/1.9/2.0.x RTL?
\end_layout
\begin_layout Enumerate
Introduce Unix typing, so dev_t, off_t etc.
\end_layout
\begin_layout Enumerate
Fix the error handling to be compatible with normal Unix (POSIX) errno.
\series bold
\emph on
(
\emph default
Thread safe)
\end_layout
\begin_layout Enumerate
At least keep a possible implementation on top of libc in mind while designing
the new RTL.
The libraries must be recompilable with a define to keep them syscall free.
\end_layout
\begin_layout Enumerate
No more duplication of code.
Currently code is duplicated between system and the UNIX/Linux unit.
\end_layout
\begin_layout Enumerate
Split up and rename the unit into parts.
\end_layout
\begin_deeper
\begin_layout Enumerate
Baseunix which contains the reasonably portable calls (selection loosely
based on POSIX)
\end_layout
\begin_layout Enumerate
Termio which contains the
\begin_inset Quotes eld
\end_inset
termio
\begin_inset Quotes erd
\end_inset
calls.
\end_layout
\begin_layout Enumerate
The syscalls moved to the syscall unit.
\end_layout
\begin_layout Enumerate
The inport, outport calls move to the x86 unit.
\end_layout
\begin_layout Enumerate
Some very Linux specific calls move to unit Linux.
This includes calls like Clone and SysInfo
\end_layout
\begin_layout Enumerate
Unixutil which contains a few calls that are not Unix specific (usually
more general C interfacing).
A good place still has to be found for these
\end_layout
\begin_layout Enumerate
Unix pretty much contains a cleaned up version of the rest.
\end_layout
\begin_layout Enumerate
If the number of function-categories expands, add additional units instead
of adding it to an existing one.
E.g.
users, sockets, netdb, cwstring etc.
\end_layout
\end_deeper
\begin_layout Enumerate
Functions that have an equivalent in libc are renamed to fp<libcname>.
All non fp functions that were added to ease the transition were deprecated
in 2.2
\end_layout
\begin_layout Enumerate
Introducing a modern readdir will be done too, but as one of the last things
to do, since it can be done
\begin_inset Quotes eld
\end_inset
under the hood
\begin_inset Quotes erd
\end_inset
.
I believe it was Peter that ultimately did it.
\end_layout
\begin_layout Enumerate
Restructuring and detangling the includefiles, and redividing the contents
into platform-specific and platform-independent parts.
\end_layout
\begin_layout Enumerate
The Linux syscalls were changed to the BSD way.
Instead of something that can only be expressed in assembler, the BSDs
internally have a pseudo-procedural syntax for syscalls (which is quite
generic, probably NetBSD's influence).
This spells the end for the syscallreg record that was both Linux and x86 centric.
\end_layout
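\begin_layout Standard
As a minimal sketch of what points 1 and 6 look like in user code (the program
name is made up for illustration, and the exact set of types that baseunix
re-exports is an assumption, which is why unixtype is listed explicitly):
\end_layout
\begin_layout LyX-Code
program typingdemo; // hypothetical example program
\end_layout
\begin_layout LyX-Code
uses baseunix, unixtype;
\end_layout
\begin_layout LyX-Code
var pid : pid_t; // a Unix type instead of a bare longint
\end_layout
\begin_layout LyX-Code
begin
\end_layout
\begin_layout LyX-Code
pid:=FpGetpid; // fp prefix + the libc name getpid
\end_layout
\begin_layout LyX-Code
writeln('pid=',pid,' sizeof(off_t)=',sizeof(off_t));
\end_layout
\begin_layout LyX-Code
end.
\end_layout
\begin_layout Standard
The size printed for off_t may differ per platform, which is exactly why the
type is no longer hardcoded in the interface.
\end_layout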
\begin_layout Subsection
Phasing of the changes.
\end_layout
\begin_layout Standard
The restructuring of the code was done in several phases, because the 1.1
branch should remain compilable, so that the compiler developers could
keep on working on it.
Usually after each phase there was some pause for stabilising and clean-ups.
Roughly these phases were followed:
\end_layout
\begin_layout Enumerate
Renaming the linux unit to unix was the first step.
This sounds trivial,
but in practice it turned out to be adding {$ifdef ver1_0} uses linux{$else}
uses unix{$endif} for two days.
(called
\emph on
Renamefest
\emph default
in cvs logs)
\end_layout
\begin_layout Enumerate
Restructuring of the syscall interface.
This affected both unit Unix and System.
All was changed to use the BSD structure as much as possible.
\end_layout
\begin_layout Enumerate
At roughly the same time, the unix typing was introduced.
\end_layout
\begin_layout Enumerate
These all needed a lot of cleanup.
The BSD ports turned out to be so similar that I roughly redivided the
BSD rtl between a generic Unix, generic BSD and OS specific part.
The BSD rtls share a lot more code now.
\end_layout
\begin_layout Enumerate
The system unit was cleaned of linuxisms (mainly sysunix.inc), and parts
were made more OS specific
\end_layout
\begin_layout Enumerate
A rough first implementation of the baseunix unit was made, based on syscalls
that unit system exports via
\emph on
external alias
\emph default
declarations.
All the rearranging of the includefiles was quite a lot of work.
First for FreeBSD, then for Linux.
\end_layout
\begin_layout Enumerate
The complete CVS was checked, and changed to use functions from baseunix
instead of unit unix.
Again (for compiler, fcl, packages, ide) under $IFDEF VER1_0 for bootstrapping
reasons.
\emph on
(Renamefest II
\emph default
in CVS logs)
\end_layout
\begin_layout Enumerate
Functions both in baseunix and unix were removed from unit unix.
Unit baseunix was also extended a bit in this phase.
\end_layout
\begin_layout Enumerate
Unit unix was cleaned up and split up into multiple units (still in progress)
\end_layout
\begin_layout Enumerate
A possibility to recompile unix rtl using libc
\end_layout
\begin_layout Enumerate
Cleanup, redividing unix unit over platform (in)dependent includefiles.
(mostly done)
\end_layout
\begin_layout Enumerate
Darwin port, beos port, more non x86 ports.
\end_layout
\begin_layout Subsection
Unix errorhandling
\end_layout
\begin_layout Standard
The rules of Unix errorhandling are quite easy:
\end_layout
\begin_layout Itemize
Each function call indicates somehow if an error occurs.
Usually by returning -1.
For other functions check the manpages.
(typically these functions return a different type than a (C) integer).
\end_layout
\begin_layout Itemize
You are only allowed to read the error variable (errno, cerrno, see below)
if the function indicates an error.
\end_layout
\begin_layout Standard
Besides compatibility there is another nice thing about this scheme: if an
error occurs, one can in some situations simply bail out of the function
with -1, like in the next example:
\end_layout
\begin_layout LyX-Code
Function somefunc:cint; // a
\begin_inset Quotes eld
\end_inset
unix
\begin_inset Quotes erd
\end_inset
function.
\end_layout
\begin_layout LyX-Code
Var st : Stat;
\end_layout
\begin_layout LyX-Code
Begin
\end_layout
\begin_layout LyX-Code
If FpStat('/',st)=-1 Then
\end_layout
\begin_layout LyX-Code
exit(-1); // exit, errno is already set by fpstat.
\end_layout
\begin_layout LyX-Code
...
more code...
\end_layout
\begin_layout LyX-Code
If FpRmdir('/')=-1 Then
\end_layout
\begin_layout LyX-Code
exit(-1); // exit, errno is already set by fprmdir.
\end_layout
\begin_layout LyX-Code
...
more code
\end_layout
\begin_layout LyX-Code
somefunc:=0;
\end_layout
\begin_layout LyX-Code
end;
\end_layout
\begin_layout LyX-Code
etc etc.
\end_layout
\begin_layout Standard
This sounds like mere shorthand, but there is more to it.
If fpstat sets different error values on different platforms (or versions),
you simply pass them on to your caller without having to know them.
\end_layout
\begin_layout Subsubsection
The FPC errorhandling situation, errno and cerrno
\end_layout
\begin_layout Standard
FPC normally does its own system calls, and doesn't always link to libc,
which is why the FPC rtl needs its own, independent error variable.
However when linking to libc or other libraries that use libc it needs
access to the libc error variable too.
In theory, we could let FPC's syscalls write to libc's errno when libc is
(also) used, but since that could introduce subtle but hard to trace
compatibility problems, it was decided to keep both error variables separate
at all times, except when FPC doesn't do syscalls internally at all.
\end_layout
\begin_layout Standard
FPC's own error number is called errno and is accessible via unit baseunix;
libc's errno is called cerrno and is accessible via unit initc.
If FPC uses libc for OS interfacing, then both errnos will point to the
libc errno.
\end_layout
\begin_layout Standard
What does this mean in practice? You need to know whether the function you are
calling is from a unit that is always based on libc calls or from one that
\emph on
can
\emph default
also be based on (FPC internal) syscalls.
Then select the error code (errno, cerrno) accordingly.
So if you use unix, linux or similar units, you should read errno
(baseunix.fpgeterrno/fpseterrno); if you use e.g.
unit inet (a typical libc-using unit), you need cerrno
(initc.fpgetCerrno/fpsetCerrno).
\end_layout
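\begin_layout Standard
A minimal sketch of the syscall-side case, assuming a program that only uses
baseunix (the program name and the path are made up for illustration).
It follows the rule from the previous section: the error variable is read
only after the call has signalled failure.
\end_layout
\begin_layout LyX-Code
program errnodemo; // hypothetical example program
\end_layout
\begin_layout LyX-Code
uses baseunix;
\end_layout
\begin_layout LyX-Code
var st : Stat;
\end_layout
\begin_layout LyX-Code
begin
\end_layout
\begin_layout LyX-Code
if FpStat('/hypothetical/missing/path',st)=-1 then
\end_layout
\begin_layout LyX-Code
writeln('FpStat failed, errno=',fpgeterrno) // FPC's own errno, not cerrno
\end_layout
\begin_layout LyX-Code
else
\end_layout
\begin_layout LyX-Code
writeln('FpStat succeeded');
\end_layout
\begin_layout LyX-Code
end.
\end_layout
\begin_layout Standard
For a function imported directly from libc the same pattern would apply, but
with initc.fpgetCerrno in place of fpgeterrno.
\end_layout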
\begin_layout Standard
Don't worry about a syscall-based unit using libc when it is compiled with
FPC_USE_LIBC; that case is taken care of properly.
(With FPC_USE_LIBC, get/seterrno also update libc's errno.)
\end_layout
\begin_layout Subsection
Libc or syscall?
\end_layout
\begin_layout Standard
From time to time, people ask why FPC doesn't use libc, and resorts
to syscalls instead.
\end_layout
\begin_layout Standard
There are several reasons for this, but the most important ones were the
constant small incompatibilities in (Linux) glibc, and the large number of
glibc versions in circulation (again, mainly for Linux).
This includes distributions that compile libc with special options (often
legacy free), and then work around this in headers for C users.
We have even seen distributions package versions that were officially
(according to the glibc site) beta versions.
\end_layout
\begin_layout Standard
Moving to use libc by default would mean more than one binary distribution
per platform (mainly for Linux, but maybe also for other *nix OSes), without
much gain.
Contrary to what you would expect, the binaries would even become slightly
larger (about 10-40 kB) when dynamically linked with libc.
This is probably due to larger libc stubs and relocation data; if PIC is
used when linking to libc, the difference might be even larger.
Binaries statically linked to libc are huge.
This is mostly because the glibc team probably doesn't prepare for this
eventuality.
\end_layout
\begin_layout Standard
Not being libc based also avoids some minor binary loader incompatibilities
that crop up, even if libc is statically linked.
\end_layout
\begin_layout Standard
Another reason is that FPC programs have structures for use with certain
functions (like struct STAT) defined in the Pascal rtl, while one calls
the C function directly in libc.
A C program that calls the same libc function, always uses the right stat
because it uses headers supplied with the OS, at least as long as field
renaming is consistent
\begin_inset Foot
status collapsed
\begin_layout Plain Layout
And unfortunately automatic unattended conversion of C headers is not really
doable.
\end_layout
\end_inset
.
And you'll get a warning if it isn't.
But FPC always uses the same one in the RTL.
This can be problematic if there are multiple libc's in circulation that
use different versions of the structure.
(like a 64-bit filesystem version of STAT, and an ordinary one).
The kernel doesn't have this, since incompatible versions of the call always
get a different syscall number.
Sometimes this is possible for libc too (e.g.
by always using stat32 or so, or ELF symbol versioning), but the libc situation
is generally a bit more difficult.
In the stat example, some distributions didn't support stat32 (to force
quicker migration to 64-bit fs).
And then there are the other unixes to consider.
\end_layout
\begin_layout Standard
However all this doesn't mean that a compile option for a libc based rtl
isn't nice, since linking to libc can be useful for
\begin_layout Itemize
porting purposes (to get the compiler working on a platform for the first
time), and for platforms that are poorly maintained (QNX, BeOS).
\end_layout
\begin_layout Itemize
Darwin (Mac OS X), an OS where the syscalls are said to be a bit more in
a state of flux.
\end_layout
\begin_layout Itemize
debugging purposes, switch to libc and see if the problem disappears.
This works both ways (switch to syscall to detect slight libc incompatibilities).
\end_layout
\begin_layout Itemize
saving space, e.g.
programs like Lazarus will link to libc no matter what.
Having the RTL link to libc might save a few tens of kB per application.
The exact savings in such a case still have to be tested.
\end_layout
\begin_layout Itemize
Some functions can be
\begin_inset Quotes eld
\end_inset
enhanced
\begin_inset Quotes erd
\end_inset
in libc.
Especially for security and name resolving related functionality.
\end_layout
\begin_layout Standard
Moreover when done during a large restructure and considered during the
design (errno handling), introducing libc support isn't really a lot of
work.
(initial implementation, generic parts+FreeBSD, about 6-7 hours)
\end_layout
\begin_layout Standard
A solution would be a GUI installer (e.g.
in Lazarus) that bootstraps FPC, and allows configuring by simply toggling
switches.
However such an app is a lot of work, and always retains a bit of a DIY feel.
The FreeBSD ports system is also a natural fit, if somebody with enough
knowledge of it would step up.
But all these require an order of magnitude more maintenance.
\end_layout
\begin_layout Subsubsection
Basic libc Implementation
\end_layout
\begin_layout Standard
The units that are primarily affected by libc are system, baseunix and unix.
This is because these contain a lot of functions that are also in libc, or
access these via assembler aliases.
Secondary units are nearly all units that are based on syscall, like sockets,
ipc etc.
\end_layout
\begin_layout Standard
A global define FPC_USE_LIBC is introduced that signals
\begin_inset Quotes eld
\end_inset
use base functions from libc
\begin_inset Quotes erd
\end_inset
.
(-Ur might be necessary to avoid recompilation).
The syscall primitives remain available via unit syscall (units other
than baseunix and unix should use unit syscall and not the aliases).
\end_layout
\begin_layout Standard
The 1.0.x compatibility unit
\begin_inset Quotes eld
\end_inset
oldlinux
\begin_inset Quotes erd
\end_inset
isn't touched, and always uses syscalls.
Since 2.6.0, after 9 years of compatibility-only existence, it is no longer available
precompiled in the default distribution.
\end_layout
\begin_layout Subsubsection
pipe functions, popen/pclose, a problem?
\end_layout
\begin_layout Standard
At first it looked as if the pipe functions popen/pclose would become a problem.
The FILE type used by these functions is the libc internal file structure.
Internally these are backed by plain file handles in libc, and the implementation
of these functions in FPC is trivial (using FPC's own internal file record).
\end_layout
\begin_layout Standard
A solution proposed by another core member could be to try keeping the pointer
type opaque and retrieve the kernel filehandle with fileno() to be able
to overload the popen functions with proper pascal filetypes.
This works at least on the platforms where fileno() is a function (and not only a
macro).
For the closing operation, the FILE pointer should be stored somewhere
in the pascal filerecord too.
\end_layout
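\begin_layout Standard
The fileno() idea above can be sketched in C roughly as follows.
This is only an illustration under assumptions: the record pipe_rec and the
helpers pipe_open_read, pipe_read and pipe_close are hypothetical names, not
FPC or libc API, and the sketch assumes a platform where fileno() can be
called on a popen() stream.
\end_layout
\begin_layout LyX-Code
```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a Pascal file record: it keeps the kernel
   handle for I/O and the opaque FILE pointer that pclose() needs. */
struct pipe_rec {
    int   handle;   /* kernel descriptor obtained through fileno() */
    FILE *stream;   /* opaque libc pointer, only handed back to pclose() */
};

/* Open a pipe to cmd for reading; returns 0 on success, -1 on failure. */
static int pipe_open_read(struct pipe_rec *p, const char *cmd)
{
    p->stream = popen(cmd, "r");
    if (p->stream == NULL)
        return -1;
    p->handle = fileno(p->stream);
    return 0;
}

/* Read straight from the descriptor, bypassing libc's stream buffering. */
static ssize_t pipe_read(struct pipe_rec *p, void *buf, size_t len)
{
    return read(p->handle, buf, len);
}

/* Close through libc so the child is reaped; returns its wait status. */
static int pipe_close(struct pipe_rec *p)
{
    return pclose(p->stream);
}
```
\end_layout
\begin_layout Standard
The design point is that the FILE pointer is never dereferenced: it is
stored at open time and only passed back to pclose(), while all reads go
through the descriptor.
\end_layout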
\begin_layout Subsection
__errno, __error, __errno_location, h_errno etc.
\end_layout
\begin_layout Standard
C is an ancient language which is pretty much frozen due to the enormous
amounts of Unix code, and doesn't have an in-language threadvar system.
However (c)errno is an important global variable that must be threadsafe.
This is solved in libc by using some form of macro that usually transforms
an errno access to a function call that returns a pointer to the actual
errno (of the right thread instance).
Macros don't exist after preprocessing, let alone compilation.
So when linking to libc we have to poke in the internals, and somehow use
the function that returns the pointer to errno directly.
This situation is far from ideal, but the problem is made worse by Unix
API designers who simply aren't aware of the existence of other languages
(or even C compilers other than the default installed one).
\end_layout
\begin_layout Standard
The problem is that the name isn't uniform over platforms, even the free
ones.
FreeBSD calls it __error, NetBSD __errno and Linux __errno_location.
The initc.setcerrno/initc.getcerrno routines wrap this difference.
\end_layout
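\begin_layout Standard
As a rough C illustration of what such a wrapper has to do, the sketch below
calls the per-platform accessor function directly instead of using the errno
macro.
The names cerrno_ptr, get_cerrno and set_cerrno are hypothetical and only
loosely modelled on the initc routines; this is not the FPC implementation.
It assumes that <errno.h> declares the accessor on the platforms named above,
and falls back to the macro elsewhere.
\end_layout
\begin_layout LyX-Code
```c
#include <errno.h>

/* Return the address of the calling thread's errno by calling the
   platform's accessor function directly, as a non-C runtime must. */
static int *cerrno_ptr(void)
{
#if defined(__FreeBSD__)
    return __error();
#elif defined(__NetBSD__)
    return __errno();
#elif defined(__linux__) && !defined(__ANDROID__)
    return __errno_location();
#else
    return &errno;   /* fallback: let the C macro do the work */
#endif
}

/* Hypothetical getter/setter pair hiding the platform difference. */
static int get_cerrno(void)
{
    return *cerrno_ptr();
}

static void set_cerrno(int value)
{
    *cerrno_ptr() = value;
}
```
\end_layout
\begin_layout Standard
Because the pointer is fetched on every call, each thread sees its own value,
which is the property the macro provides to C code.
\end_layout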
\begin_layout Standard
h_errno is the symbol in the libc library for the non-threadsafe variant,
and was used in 1.0.x.
Besides not being threadsafe, newer glibc libraries seem to omit it.
By default, 1.1.x will use the threadsafe variants, but support for h_errno
is still under {$IFDEF } in the initc unit in case you need to work with
older libc's.
\end_layout
\begin_layout Standard
In general, avoid updating C-style error variables directly; always
use either set/getcerrno or get/seterrno.
(The symbols errno and cerrno are OK; these call get/set(c)errno internally.)
\end_layout
\begin_layout Subsection
Exec() functions
\end_layout
\begin_layout Standard
The exec() functions have been replaced by the fpexec functions.
Moreover, platform-independent alternatives like TProcess and ExecuteProcess()
are more mature now.
The old 1.0.x linux.exec() functions remained in 2.0.x as legacy functions,
but were removed starting with 2.2.0.
\end_layout
\begin_layout Standard
The main idea behind all new functions is the use of
\begin_inset Quotes eld
\end_inset
array of ansistring
\begin_inset Quotes erd
\end_inset
for the arguments of the execl functions.
This means a programmer can specify each argument separately, and they are then
passed to the OS (with zero copy).
The new way decreases the number of string operations, and avoids the problems
with arguments and filenames containing spaces that the old functions
had.
The old functions have been fixed for the most basic quote problems though.
\end_layout
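\begin_layout Standard
The benefit of passing arguments as an array rather than as one command line
can be illustrated at the C level, where execv() takes the same kind of
argument vector.
The sketch below is an assumption-laden illustration, not the FPC code: the
helper name run_argv is hypothetical, and error handling is kept minimal.
\end_layout
\begin_layout LyX-Code
```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given NULL-terminated argument vector and return
   its exit code, or -1 if the child could not be started or was killed. */
static int run_argv(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);   /* each element arrives as one argument */
        _exit(127);             /* only reached when execv() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```
\end_layout
\begin_layout Standard
Called with the vector /bin/sh, -c, and the single element 
\begin_inset Quotes eld
\end_inset
exit 7
\begin_inset Quotes erd
\end_inset
, the third element reaches the shell as one argument despite the space,
because no command line is ever assembled or split again.
\end_layout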
\begin_layout Subsubsection
SysUtils.Executeprocess
\end_layout
\begin_layout Standard
The new execute functions are used in SysUtils.Executeprocess, which is the
new platform-independent way of running a program
(comparable to dos.exec, but without the 255 char limit).
\end_layout
\begin_layout Standard
The interface of unit Dos is slowly getting more and more uncomfortable because
of shortstrings and dosisms.
In general, it is currently recommended to use sysutils as much as possible.
\end_layout
\begin_layout Subsection
The
\begin_inset Quotes eld
\end_inset
FP
\begin_inset Quotes erd
\end_inset
prefix
\end_layout
\begin_layout Standard
During the past years, I've been pestered about both the need for, and choice
of, a prefix again and again.
The new Unix rtl was written pretty much from scratch, but due to similarity
in design requirements it resembled Carl's POSIX unit to a high degree.
The improved Unix typing was the biggest difference, as well as the splitting
up into .inc files that resulted from supporting multiple platforms (posix
only supported BeOS afaik).
\end_layout
\begin_layout Standard
The prefix was IIRC already in the predecessor, Carl's POSIX unit, but there
it was
\begin_inset Quotes eld
\end_inset
POSIX_
\begin_inset Quotes erd
\end_inset
.
The reason for the prefix was pretty much to have one uniform rule to transform
the
\begin_inset Quotes eld
\end_inset
C
\begin_inset Quotes erd
\end_inset
name to the FPC one.
No prefix was dangerous because of clashes with the (then still supported)
Linux unit, and long term also for other functions like
\begin_inset Quotes eld
\end_inset
exit
\begin_inset Quotes erd
\end_inset
,
\begin_inset Quotes eld
\end_inset
write
\begin_inset Quotes erd
\end_inset
and
\begin_inset Quotes eld
\end_inset
read
\begin_inset Quotes erd
\end_inset
and with OS specific units.
\end_layout
\begin_layout Standard
I didn't want anything with
\begin_inset Quotes eld
\end_inset
POSIX
\begin_inset Quotes erd
\end_inset
in the Unix rtl, because I didn't want to commit outright to POSIX compatibility
due to the possible issues with macroed functionality, and definition on
the libc level (vs kernel level).
IOW, follow POSIX at arm's length, with no guarantee that it is always an exact 1:1
mapping.
\end_layout
\begin_layout Standard
In early versions, the prefix was
\begin_inset Quotes eld
\end_inset
unx_
\begin_inset Quotes erd
\end_inset
, but this was considered ugly (and not pascal due to the underscore).
\end_layout
\begin_layout Standard
Who actually decided on
\begin_inset Quotes eld
\end_inset
fp
\begin_inset Quotes erd
\end_inset
, and when, I don't know.
Probably during the BBQ at Rosa and Joerg's place in France, where Carl,
Michael and I had a fairly heated discussion about the Unix RTL changes,
or in the correspondence to work out the details afterwards.
It could even have been somebody else on IRC who suggested it
(Florian, Peter or Jonas being the main candidates).
\end_layout
\begin_layout Standard
However, even in retrospect I still stand by the need to add a prefix, and
a short prefix is fine.
People often bang on about the
\begin_inset Quotes eld
\end_inset
confusion
\begin_inset Quotes erd
\end_inset
it will cause, but it would have been much worse IMHO if one had to
explain the non-prefixed case: the exceptions, and the name clashes with the
libc, linux and other platform-specific units.
Yes, the transition
\begin_inset Foot
status collapsed
\begin_layout Plain Layout
The pains mostly were the
\begin_inset Quotes eld
\end_inset
renamefests
\begin_inset Quotes erd
\end_inset
and a good year of dual maintenance to keep code working with both
the 1.0.x and the 1.9.x branch till 2.0 came out.
Not only for the FPC project, but also for Lazarus.
However with over 5 years between the 1.0 and 2.0 releases, there was no way
to do it more gradually without compromising 1.0.x internal compatibility
(though IMHO that wouldn't have been a bad thing).
\end_layout
\end_inset
hurt, but IMHO we are in a much more supportable place now.
\end_layout
\begin_layout Standard
In my opinion the only way without a prefix would be a Modula-2 like extension
that forces all relevant identifiers from the baseunix, unix and socket
units to be qualified with the unit name (EXPORT QUALIFIED).
\end_layout
\begin_layout Section
RTL layout
\end_layout
\begin_layout Standard
The following picture tries to explain some of the unit dependencies in
the Unix rtl.
Of course, like all documentation, it is probably already outdated :-)
\end_layout
\begin_layout Standard
\begin_inset Graphics
filename deeperrtl.png
width 15cm
\end_inset
\end_layout
\begin_layout Subsection
Includefiles
\end_layout
\begin_layout Subsubsection
Why so many includefiles and ifdefs?
\end_layout
\begin_layout Standard
There are many reasons why the FPC rtl is organised as it is.
Some of the reasons are:
\end_layout
\begin_layout Enumerate
The includefiles allow sharing of code used in multiple places, which
eases maintenance.
\end_layout
\begin_layout Enumerate
A higher granularity of the source helps working with CVS/SVN somewhat.
There is less chance that two people work on the same file and have to
merge their changes, a problem with units that are thousands of lines long.
\end_layout
\begin_layout Enumerate
The implementation of a system unit uses a lot of OS-dependent types, records
and functions that are also used in other units.
Includefiles and some tricks allow reusing the declarations, usually without
increasing the size of the binaries.
In particular, the Unix system unit exports syscalls via an external alias
mechanism.
See the separate paragraph about this subject.
The main reason for this is to precisely control how many and which symbols
the system unit exports, since these are always visible.
\end_layout
\begin_layout Enumerate
Exactly which includefiles are OS-dependent, and which are not, is subject
to change in the long run.
Moving an inc file is easier than totally revising the ifdef system of
a huge unit.
\end_layout
\begin_layout Enumerate
The system must allow for exceptions.
This is why key units (like System, Baseunix, and in the future unix) are
system-dependent, but include generic parts.
The idea is that a porter can say
\begin_inset Quotes eld
\end_inset
I want to implement these parts in a generic way by including the generic
includefiles
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
I want to override this functionality with my own code
\begin_inset Quotes erd
\end_inset
\end_layout
\begin_layout Enumerate
It allows following the Pascal/Delphi tradition of stuffing all related
headers in one unit, while still having a file per C header file, which eases
header maintenance.
\end_layout
\begin_layout Subsubsection
\begin_inset Quotes eld
\end_inset
\noun on
improper
\noun default
\begin_inset Quotes erd
\end_inset
exporting from the Unix system unit.
\end_layout
\begin_layout Standard
As said in one of the previous paragraphs, the unix system unit exports
some OS-dependent functions via the [public, alias: 'xxx']; construct.
This construct is used to declare names without mangling.
This means that some OS-specific functions in system get a name in a namespace
outside the pascal realm, so that other units can import them, like you
would from a DLL or from external code.
The functions are not exported by normal pascal declarations, thus keeping
the interface of the system unit OS-independent.
The
\begin_inset Quotes eld
\end_inset
client
\begin_inset Quotes erd
\end_inset
unit is mainly BaseUnix, but unit Unix also reuses a few functions from
system, and exports them.
\end_layout
\begin_layout Standard
This is all done to avoid duplication of functions between system on one
side and baseunix/unix on the other side.
This saves a few tens of kB.
The types (in ptypes.inc/ctypes.inc) are still imported twice, once in the
implementation of system, once in the unixtype unit.
\end_layout
\begin_layout Subsection
Unixtype
\end_layout
\begin_layout Standard
The unit unixtype was introduced pretty late in the rearchitecting process.
Initially baseunix imported ptypes.inc and ctypes, but some platforms needed
base unix types below this level (e.g.
in header units that were used to implement baseunix).
At first these units simply also included ptypes.inc, but this led to type
incompatibility problems once more platforms were implemented
\begin_inset Foot
status collapsed
\begin_layout Plain Layout
The restructure was mostly carried out on FreeBSD, which pretty much only
has 32-bit types in the kernel interface, contrary to linux/x86, which also
has 16-bit types.
\end_layout
\end_inset
.
The only solution was to move all types to a separate unit and declare it
the lowest unit in the RTL (the root of the dependency graph).
Since we still wanted to export all unix symbols from baseunix, after some
heated discussion, all types in ptypes.inc were aliased
(see aliasptp.inc that aliases ptypes.inc and aliasctp.inc that aliases ctypes.inc).
\end_layout
\begin_layout Section
Remaining problems
\end_layout
\begin_layout Standard
Besides already named problems (e.g.
popen), there are some todo's left.
Most of these surfaced while porting Kylix apps, where there were some situations
in which an FPC substitute wasn't easily found:
\end_layout
\begin_layout Enumerate
64-bit file access.
The best way to do this is to simply have only a 64-bit interface, and
translate this internally on the few platforms that don't do 64-bit access.
Michael deprecated the 32-bit TStream seek() primitive in 2.6.0.
\end_layout
\begin_layout Enumerate
Access to security data (/etc/passwd, /etc/group files etc).
This should be extracted and abstracted into a separate unit, _with_ a FPC_USE_LIBC
option, so that via that avenue users can make sure their apps access it via
libc, and tie in with all kinds of authentication systems.
(There is a header now in the users package since 2.4.2 or so.)
\end_layout
\begin_layout Enumerate
Improve DNS resolving and accessing.
Netdb is not perfect yet, and needs a FPC_USE_LIBC option.
(cnetdb added in 2.4.4)
\end_layout
\begin_layout Enumerate
Kylixcomp unit for
\begin_inset Quotes eld
\end_inset
easy
\begin_inset Quotes erd
\end_inset
substitutes for certain constants that ease libc->baseunix porting that
we don't want to expose in the proper RTL.
(these are hardly used in practice)
\end_layout
\begin_layout Enumerate
The Unicode primitives are also among the most used functions in unit libc.
The widestring manager has some support for these.
\end_layout
\begin_layout Enumerate
A lot of the transitional functionality still has to be phased out.
See separate paragraph.
\end_layout
\begin_layout Subsection
Solved problems
\end_layout
\begin_layout Enumerate
unit libc is no longer needed for dynamic loading of libraries (dynlibs)
\end_layout
\begin_layout Enumerate
unit libc is no longer needed for basic user/group querying (v2.2.2
\begin_inset Quotes eld
\end_inset
users
\begin_inset Quotes erd
\end_inset
package)
\end_layout
\begin_layout Enumerate
unit libc is no longer needed for Iconv calls (v2.2.4 iconvenc)
\end_layout
\begin_layout Enumerate
the resolver functions of libc are available in unit cnetdb as of 2.4.4
\end_layout
\begin_layout Enumerate
A deprecated warning has been added to unit libc in v2.6.2 to avoid design-in
in new programs.
\end_layout
\begin_layout Subsection
Deprecating transitional functionality
\end_layout
\begin_layout Standard
A start has been made to remove the helpers and transitional functionality:
some calls are marked and documented as deprecated in 2.2 and 2.2.2 (mantis
#0011119), and will be removed in 2.3/2.4.
This is a bit hampered by the fact that not all symbols can be marked as
deprecated yet.
\end_layout
\begin_layout Standard
Some of the deprecated functionality is listed below:
\end_layout
\begin_layout Enumerate
1.0.x fields of the Linux stat record that require an ifdef.
\end_layout
\begin_layout Enumerate
non fp socket functions.
These were buggy in some cases (formal parameter bug?)
\end_layout
\begin_layout Enumerate
Some non fp functions in the unix rtl
\end_layout
\begin_layout Enumerate
Unixutil and the functions in it have now been in limbo for 3 major versions.
See the next paragraph.
\end_layout
\begin_layout Subsubsection
unixutil
\end_layout
\begin_layout Standard
The remaining routines in this unit are reusable, and thus should move to something
portable.
On the other hand this is not possible because the non-portable Unix unit depends
on them (see the RTL dependency graph).
A good solution still has to be found for this issue.
\end_layout
\end_body
\end_document