Diffstat (limited to 'libdwarf/mips_extensions.mm')
-rw-r--r-- | libdwarf/mips_extensions.mm | 1266 |
1 files changed, 1266 insertions, 0 deletions
diff --git a/libdwarf/mips_extensions.mm b/libdwarf/mips_extensions.mm new file mode 100644 index 0000000..7a312f0 --- /dev/null +++ b/libdwarf/mips_extensions.mm @@ -0,0 +1,1266 @@ +\." +\." the following line may be removed if the ff ligature works on your machine +.lg 0 +\." set up heading formats +.ds HF 3 3 3 3 3 2 2 +.ds HP +2 +2 +1 +0 +0 +.nr Hs 5 +.nr Hb 5 +\." ============================================== +\." Put current date in the following at each rev +.ds vE rev 1.18, 31 March 2005 +\." ============================================== +\." ============================================== +.ds | | +.ds ~ ~ +.ds ' ' +.if t .ds Cw \&\f(CW +.if n .ds Cw \fB +.de Cf \" Place every other arg in Cw font, beginning with first +.if \\n(.$=1 \&\*(Cw\\$1\fP +.if \\n(.$=2 \&\*(Cw\\$1\fP\\$2 +.if \\n(.$=3 \&\*(Cw\\$1\fP\\$2\*(Cw\\$3\fP +.if \\n(.$=4 \&\*(Cw\\$1\fP\\$2\*(Cw\\$3\fP\\$4 +.if \\n(.$=5 \&\*(Cw\\$1\fP\\$2\*(Cw\\$3\fP\\$4\*(Cw\\$5\fP +.if \\n(.$=6 \&\*(Cw\\$1\fP\\$2\*(Cw\\$3\fP\\$4\*(Cw\\$5\fP\\$6 +.if \\n(.$=7 \&\*(Cw\\$1\fP\\$2\*(Cw\\$3\fP\\$4\*(Cw\\$5\fP\\$6\*(Cw\\$7\fP +.if \\n(.$=8 \&\*(Cw\\$1\fP\\$2\*(Cw\\$3\fP\\$4\*(Cw\\$5\fP\\$6\*(Cw\\$7\fP\\$8 +.if \\n(.$=9 \&\*(Cw\\$1\fP\\$2\*(Cw\\$3\fP\\$4\*(Cw\\$5\fP\\$6\*(Cw\\$7\fP\\$8\ +*(Cw +.. +.nr Cl 4 +.SA 1 +.TL +MIPS Extensions to DWARF Version 2.0 +.AF "" +.AU "Silicon Graphics Computer Systems" +.PF "'\*(vE'- \\\\nP -''" +.AS 1 +This document describes the MIPS/Silicon Graphics extensions +to the "DWARF Information Format" (version 2.0.0 dated July 27, 1993). +DWARF3 draft 8 (or draft 9) is out as of 2005, and +is mentioned below where applicable. +MIPS/IRIX compilers emit DWARF2 (with extensions). +.P +Rather than alter the base documents to describe the extensions +we provide this separate document. +.P +The extensions documented here are subject to change. +.P +It also describes known bugs resulting in incorrect dwarf usage. 
+.P +\*(vE + +.AE +.MT 4 + +.H 1 "INTRODUCTION" +.P +This +document describes MIPS extensions +to the DWARF debugging information format. +The extensions documented here are subject to change at +any time. +.H 1 "64 BIT DWARF" +.P +The DWARF2 spec has no provision for 64 bit offsets. +SGI-IRIX/MIPS Elf64 objects contain DWARF 2 with all offsets +(and addresses) as 64bit values. +This non-standard extension was adopted in 1992. +Nothing in the dwarf itself identifies the dwarf as 64bit. +This non-standard 64bit-offset dwarf cannot be mixed with 32bit-offset dwarf +in a single object or executable, and SGI-IRIX/MIPS compilers +and tools do not mix the sizes. +.P +In 2001 DWARF3 adopted a very different 64bit-offset +format which can be mixed usefully with 32bit-offset DWARF2 or DWARF3. +It is not very likely SGI-IRIX/MIPS compilers will switch to the +now-standard +DWARF3 64bit-offset scheme, but such a switch is theoretically +possible and would be a good idea. +.P +SGI-IRIX/MIPS Elf32 objects +contain DWARF2 with all offsets (and addresses) 32 bits. +.H 1 "How much symbol information is emitted" +The following standard DWARF V2 sections may be emitted: +.AL +.LI +Section .debug_abbrev +contains +abbreviations supporting the .debug_info section. +.LI +Section .debug_info +contains +Debug Information Entries (DIEs). +.LI +Section .debug_frame +contains +stack frame descriptions. +.LI +Section .debug_line +contains +line number information. +.LI +Section .debug_aranges +contains +address range descriptions. +.LI +Section .debug_pubnames +contains +names of global functions and data. +.P +The following +are MIPS extensions. +These were created to allow debuggers to +know names without having to look at +the .debug_info section. +.LI +Section .debug_weaknames +is a MIPS extension +containing .debug_pubnames-like entries describing weak +symbols. 
+.LI +Section .debug_funcnames +is a MIPS extension +containing .debug_pubnames-like entries describing file-static +functions (C static functions). +The gcc extension of nested subprograms (like Pascal) +adds non-global non-static functions. These should be treated like +static functions and gcc should add such to this section +so that IRIX libexc(3C) will work correctly. +Similarly, Ada functions which are non-global should be here too +so that libexc(3C) can work. +Putting it another way, every function (other than inline code) +belongs either in .debug_pubnames or in .debug_funcnames +or else libexc(3C) cannot find the function name. +.LI +Section .debug_varnames +is a MIPS extension +containing .debug_pubnames-like entries describing file-static +data symbols (C static variables). +.LI +Section .debug_typenames +is a MIPS extension +containing .debug_pubnames-like entries describing file-level +types. +.P +The following are not currently emitted. +.LI +Section .debug_macinfo +Macro information is not currently emitted. +.LI +Section .debug_loc +Location lists are not currently emitted. +.LI +Section .debug_str +The string section is not currently emitted. +.LE +.H 2 "Overview of information emitted" +We emit debug information in 3 flavors. +We mention C here. +The situation is essentially identical for f77, f90, and C++. +.AL +.LI +"default C" +We emit line information and DIEs for each subprogram. +But no local symbols and no type information. +Frame information is output. +The DW_AT_producer string has the optimization level: for example +"-O2". +We put so much in the DW_AT_producer that the string +is a significant user of space in .debug_info -- +this is perhaps a poor use of space. +When optimizing the IRIX CC/cc option -DEBUG:optimize_space +eliminates such wasted space. 
+Debuggers only currently use the lack of -g +of DW_AT_producer +as a hint as to how a 'step' command should be interpreted, and +the rest of the string is not used for anything (unless +a human looks at it for some reason), so if space-on-disk +is an issue, it is quite appropriate to use -DEBUG:optimize_space +and save disk space. +Every function definition (not inline instances though) is mentioned +in either .debug_pubnames or .debug_funcnames. +This is crucial to allow libexc(3C) stack-traceback to work and +show function names (for all languages). +.LI +"C with full symbols" +All possible info is emitted. +DW_AT_producer string has all options that might be of interest, +which includes -D's, -U's, and the -g option. +These options look like they came from the command line. +We put so much in the DW_AT_producer that the string +is a significant user of space in .debug_info. +this is perhaps a poor use of space. +Debuggers only currently use the -g +of DW_AT_producer +as a hint as to how a 'step' command should be interpreted, and +the rest of the string is not used for anything (unless +a human looks at it for some reason). +Every function definition (not inline instances though) is mentioned +in either .debug_pubnames or .debug_funcnames. +This is crucial to allow libexc(3C) stack-traceback to work and +show function names (for all languages). +.LI +"Assembler (-g, non -g are the same)" +Frame information is output. +No type information is emitted, but DIEs are prepared +for globals. +.LE +.H 2 "Detecting 'full symbols' (-g)" +The debugger depends on the existence of +the DW_AT_producer string to determine if the +compilation unit has full symbols or not. +It looks for -g or -g[123] and accepts these as +full symbols but an absent -g or a present -g0 +is taken to mean that only basic symbols are defined and there +are no local symbols and no type information. 
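The -g test described above can be sketched in C as follows. This is only an illustration of the rule as stated in the text, not the debugger's actual code: the function name producer_has_full_symbols is hypothetical, and the choice to let the last -g style option win when several are present is an assumption, since the text does not say how conflicting options are resolved.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libdwarf or dbx function).
 * Returns 1 if the DW_AT_producer string contains -g, -g1, -g2 or -g3
 * as a whitespace-separated option, and 0 if -g is absent or -g0 is given.
 * Assumption: when several such options appear, the last one wins. */
int producer_has_full_symbols(const char *producer)
{
    int full = 0;
    const char *p = producer;

    assert(producer != NULL);
    while (*p != '\0') {
        size_t len = 0;

        /* Skip whitespace between options. */
        while (*p != '\0' && isspace((unsigned char)*p)) {
            p++;
        }
        /* Measure the next whitespace-delimited token. */
        while (p[len] != '\0' && !isspace((unsigned char)p[len])) {
            len++;
        }
        if (len == 2 && strncmp(p, "-g", 2) == 0) {
            full = 1;                       /* plain -g */
        } else if (len == 3 && strncmp(p, "-g", 2) == 0) {
            if (p[2] >= '1' && p[2] <= '3') {
                full = 1;                   /* -g1, -g2, -g3 */
            } else if (p[2] == '0') {
                full = 0;                   /* -g0: basic symbols only */
            }
        }
        p += len;
    }
    return full;
}
```

Matching whole tokens rather than substrings keeps options such as -DEBUG:optimize_space from being mistaken for -g.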
+.P +In various contexts the debugger will think the program is +stripped or 'was not compiled with -g' unless the -g +is in the DW_AT_producer string. +.H 2 "DWARF and strip(1)" +The DWARF section ".debug_frame" is marked SHF_MIPS_NOSTRIP +and is not stripped by the strip(1) program. +This is because the section is needed for doing +stack back traces (essential for C++ +and Ada exception handling). +.P +All .debug_* sections are marked with elf type +SHT_MIPS_DWARF. +Applications needing to access the various DWARF sections +must use the section name to discriminate between them. + +.H 2 "Evaluating location expressions" +When the debugger evaluates location expressions, it does so +in 2 stages. In stage one it simply looks for the trivial +location expressions and treats those as special cases. +.P +If the location expression is not trivial, it enters stage two. +In this case it uses a stack to evaluate the expression. +.P +If the application is a 32-bit application, it does the operations +on 32-bit values (address size values). Even though registers +can be 64 bits in a 32-bit program all evaluations are done in +32-bit quantities, so an attempt to calculate a 32-bit quantity +by taking the difference of 2 64-bit register values will not +work. The notion is that the stack machine is, by the dwarf +definition, working in address size units. +.P +These values are then expanded to 64-bit values (addresses or +offsets). This extension does not involve sign-extension. +.P +If the application is a 64-bit application, then the stack +values are all 64 bits and all operations are done on 64 bits. +.H 3 "The fbreg location op" +Compilers shipped with IRIX 6.0 and 6.1 +do not emit the fbreg location expression +and never emit the DW_AT_frame_base attribute that it +depends on. +However, this changes +with release 6.2 and these are now emitted routinely. 
+ +.H 1 "Frame Information" +.H 2 "Initial Instructions" +The DWARF V2 spec +provides for "initial instructions" in each CIE (page 61, +section 6.4.1). +However, it does not say whether there are default +values for each column (register). +.P +Rather than force every CIE to have a long list +of bytes to initialize all 32 integer registers, +we define that the default values of all registers +(as returned by libdwarf in the frame interface) +are 'same value'. +This is a good choice for many non-register-windows +implementations. +.H 2 "Augmentation string in debug_frame" +The augmentation string we use in shipped compilers (up thru +irix6.2) is the empty string. +IRIX6.2 and later has an augmentation string that is +the empty string ("") +or "z" or "mti v1" +where the "v1" is a version number (version 1). +.P +We do not believe that "mti v1" was emitted as the +augmentation string in any shipped compiler. +.P +.H 3 "CIE processing based on augmentation string:" +If the augmentation string begins with 'z', then it is followed +immediately by an unsigned_leb_128 number giving the code alignment factor. +Next is a signed_leb_128 number giving the data alignment factor. +Next is an unsigned byte giving the number of the return address register. +Next is an unsigned_leb_128 number giving the length of the 'augmentation' +fields (the length of augmentation bytes, not +including the unsigned_leb_128 length itself). +As of release 6.2, the length of the CIE augmentation fields is 0. +What this means is that it is possible to add new +augmentations, z1, z2, etc and yet allow an old consumer to +understand the entire CIE, as it can bypass the +augmentation it does not understand because the +length of the augmentation fields is present. +Presuming of course that all augmentation fields are +simply additional information, +not some 'changing of the meaning of +an existing field'. +Currently there is no CIE data in the augmentation for things +beginning with 'z'. 
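The field order described above for a 'z' augmentation can be sketched as a small C reader. This is a minimal sketch and not libdwarf's implementation: the struct z_cie_fields and the functions cie_read_uleb, cie_read_sleb and parse_z_cie_fields are hypothetical names, and the input is assumed to start at the first byte after the NUL that terminates the augmentation string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields that follow a 'z' augmentation. */
struct z_cie_fields {
    uint64_t code_align;   /* unsigned LEB128 */
    int64_t  data_align;   /* signed LEB128 */
    unsigned return_reg;   /* one unsigned byte */
    uint64_t aug_len;      /* unsigned LEB128, excludes its own bytes */
};

/* Decode an unsigned LEB128; returns the next byte or NULL on overrun. */
static const unsigned char *cie_read_uleb(const unsigned char *p,
    const unsigned char *end, uint64_t *v)
{
    uint64_t result = 0;
    unsigned shift = 0;

    while (p < end) {
        unsigned char b = *p++;
        if (shift < 64) {
            result |= (uint64_t)(b & 0x7f) << shift;
        }
        shift += 7;
        if ((b & 0x80) == 0) {
            *v = result;
            return p;
        }
    }
    return NULL;
}

/* Decode a signed LEB128; returns the next byte or NULL on overrun. */
static const unsigned char *cie_read_sleb(const unsigned char *p,
    const unsigned char *end, int64_t *v)
{
    uint64_t result = 0;
    unsigned shift = 0;

    while (p < end) {
        unsigned char b = *p++;
        if (shift < 64) {
            result |= (uint64_t)(b & 0x7f) << shift;
        }
        shift += 7;
        if ((b & 0x80) == 0) {
            if (shift < 64 && (b & 0x40)) {
                result |= ~(uint64_t)0 << shift;   /* sign-extend */
            }
            *v = (int64_t)result;
            return p;
        }
    }
    return NULL;
}

/* Reads the fields and skips aug_len bytes of augmentation data, so a
 * consumer that does not understand the data can still find what follows.
 * Returns a pointer just past the augmentation fields, or NULL on error. */
const unsigned char *parse_z_cie_fields(const unsigned char *p,
    const unsigned char *end, struct z_cie_fields *out)
{
    assert(p != NULL && end != NULL && out != NULL);
    p = cie_read_uleb(p, end, &out->code_align);
    if (p == NULL) return NULL;
    p = cie_read_sleb(p, end, &out->data_align);
    if (p == NULL || p >= end) return NULL;
    out->return_reg = *p++;
    p = cie_read_uleb(p, end, &out->aug_len);
    if (p == NULL || out->aug_len > (uint64_t)(end - p)) return NULL;
    return p + out->aug_len;
}
```

Because the augmentation length is always read and skipped, the sketch behaves the same whether the length is 0 (as in release 6.2) or some later, larger value.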
+.P +If the augmentation string is "mti v1" or "" then it is followed +immediately by a unsigned_leb_128 number giving the code alignment factor. +Next is a signed_leb_128 number giving the data alignment factor. +Next is a unsigned byte giving the number of the return address register. +.P +If the augmentation string is something else, then the +code alignment factor is assumed to be 4 and the data alignment +factor is assumed to be -1 and the return +address register is assumed to be 31. Arbitrarily. +The library (libdwarf) assumes it does not understand the rest of the CIE. +.P +.H 3 "FDE processing based on augmentation" +If the CIE augmentation string +for an fde begins with 'z' +then the next FDE field after the address_range field +is an +unsigned_leb_128 number giving the length of the 'augmentation' +fields, and those fields follow immediately. + +.H 4 "FDE augmentation fields" +.P +If the CIE augmentation string is "mti v1" or "" +then the FDE is exactly as described in the Dwarf Document section 6.4.1. +.P +Else, if the CIE augmentation string begins with "z" +then the next field after the FDE augmentation length field +is a Dwarf_Sword size offset into +exception tables. +If the CIE augmentation string does not begin with "z" +(and is neither "mti v1" nor "") +the FDE augmentation fields are skipped (not understood). +Note that libdwarf actually (as of MIPSpro7.3 and earlier) +only tests that the initial character of the augmentation +string is 'z', and ignores the rest of the string, if any. +So in reality the test is for a _prefix_ of 'z'. +.P +If the CIE augmentation string neither starts with 'z' nor is "" +nor is "mti v1" then libdwarf (incorrectly) assumes that the +table defining instructions start next. +Processing (in libdwarf) will be incorrect. +.H 2 "Stack Pointer recovery from debug_frame" +There is no identifiable means in +DWARF2 to say that the stack register is +recovered by any particular operation. 
+A 'register rule' works if the caller's +stack pointer was copied to another +register. +An 'offset(N)' rule works if the caller's +stack pointer was stored on the stack. +However if the stack pointer is +some register value plus/minus some offset, +there is no means to say this in an FDE. +For MIPS/IRIX, the recovered stack pointer +of the next frame up the stack (towards main()) +is simply the CFA value of the current +frame, and the CFA value is +precisely a register (value of a register) +or a register plus offset (value of a register +plus offset). This is a software convention. +.H 1 "egcs dwarf extensions (egcs-1.1.2 extensions)" +This and following egcs sections describe +the extensions currently shown in egcs dwarf2. +Note that egcs has chosen to adopt tag and +attribute naming as if their choices were +standard dwarf, not as if they were extensions. +However, they are properly numbered as extensions. + +.H 2 "DW_TAG_format_label 0x4101" +For FORTRAN 77, Fortran 90. +Details of use not defined in egcs source, so +unclear if used. +.H 2 "DW_TAG_function_template 0x4102" +For C++. +Details of use not defined in egcs source, so +unclear if used. +.H 2 "DW_TAG_class_template 0x4103" +For C++. +Details of use not defined in egcs source, so +unclear if used. +.H 2 "DW_AT_sf_names 0x2101" +Apparently only output in DWARF1, not DWARF2. +.H 2 "DW_AT_src_info 0x2102" +Apparently only output in DWARF1, not DWARF2. +.H 2 "DW_AT_mac_info 0x2103" +Apparently only output in DWARF1, not DWARF2. +.H 2 "DW_AT_src_coords 0x2104" +Apparently only output in DWARF1, not DWARF2. +.H 2 "DW_AT_body_begin 0x2105" +Apparently only output in DWARF1, not DWARF2. +.H 2 "DW_AT_body_end 0x2106" +Apparently only output in DWARF1, not DWARF2. + +.H 1 "egcs .eh_frame (non-sgi) (egcs-1.1.2 extensions)" +egcs-1.1.2 (and earlier egcs) +emits by default a section named .eh_frame +for ia32 (and possibly other platforms) which +is nearly identical to .debug_frame in format and content. 
+This section is used for helping handle C++ exceptions. +.P +Because after linking there are sometimes zero-ed out bytes +at the end of the eh_frame section, the reader code in +dwarf_frame.c considers a zero cie/fde length as an indication +that it is the end of the section. +.P +.H 2 "CIE_id 0" +The section is an ALLOCATED section in an executable, and +is therefore mapped into memory at run time. +The CIE_pointer (aka CIE_id, section 6.4.1 +of the DWARF2 document) is the field that +distinguishes a CIE from an FDE. +The designers of the egcs .eh_frame section +decided to make the CIE_id +be 0 as the CIE_pointer definition is +.in +2 +the number of bytes from the CIE-pointer in the FDE back to the +applicable CIE. +.in -2 +In a dwarf .debug_frame section, the CIE_pointer is the +offset in .debug_frame of the CIE for this fde, and +since an offset can be zero of some CIE, the CIE_id +cannot be 0, but must be all 1 bits . +Note that the dwarf2.0 spec does specify the value of CIE_id +as 0xffffffff +(see section 7.23 of v2.0.0), +though earlier versions of this extensions document +incorrectly said it was not specified in the dwarf +document. +.H 2 "augmentation eh" +The augmentation string in each CIE is "eh" +which, with its following NUL character, aligns +the following word to a 32bit boundary. +Following the augmentation string is a 32bit +word with the address of the __EXCEPTION_TABLE__, +part of the exception handling data for egcs. +.H 2 "DW_CFA_GNU_window_save 0x2d" +This is effectively a flag for architectures with +register windows, and tells the unwinder code that +it must look to a previous frame for the +correct register window set. +As of this writing, egcs gcc/frame.c +indicates this is for SPARC register windows. +.H 2 "DW_CFA_GNU_args_size 0x2e" +DW_CFA_GNU_args_size has a single uleb128 argument +which is the size, in bytes, of the function's stack +at that point in the function. 
+.H 2 "__EXCEPTION_TABLE__" +A series of 3 32bit word entries by default: +0 word: low pc address +1 word: high pc address +2 word: pointer to exception handler code +The end of the table is +signaled by 2 words of -1 (not 3 words!). +.H 1 "Interpretations of the DWARF V2 spec" +.H 2 "template TAG spellings" +The DWARF V2 spec spells two attributes in two ways. +DW_TAG_template_type_param +(listed in Figure 1, page 7) +is spelled DW_TAG_template_type_parameter +in the body of the document (section 3.3.7, page 28). +We have adopted the spelling +DW_TAG_template_type_param. +.P +DW_TAG_template_value_param +(listed in Figure 1, page 7) +is spelled DW_TAG_template_value_parameter +in the body of the document (section 3.3.7, page 28). +We have adopted the spelling +DW_TAG_template_value_parameter. +.P +We recognize that the choices adopted are neither consistently +the longer nor the shorter name. +This inconsistency was an accident. +.H 2 DW_FORM_ref_addr confusing +Section 7.5.4, Attribute Encodings, describes +DW_FORM_ref_addr. +The description says the reference is the size of an address +on the target architecture. +This is surely a mistake, because on a 16bit-pointer-architecture +it would mean that the reference could not exceed +16 bits, which makes only +a limited amount of sense as the reference is from one +part of the dwarf to another, and could (in theory) +be *on the disk* and not limited to what fits in memory. +Since MIPS is 32 bit pointers (at the smallest) +the restriction is not a problem for MIPS/SGI. +The 32bit pointer ABIs are limited to 32 bit section sizes +anyway (as a result of implementation details). +And the 64bit pointer ABIs currently have the same limit +as a result of how the compilers and tools are built +(this has not proven to be a limit in practice, so far). +.P +This has been clarified in the DWARF3 spec and the IRIX use +of DW_FORM_ref_addr being an offset is correct. 
+.H 2 "Section .debug_macinfo in a debugger" +It seems quite difficult, in general, to +tie specific text(code) addresses to points in the +stream of macro information for a particular compilation unit. +So it's been difficult to see how to design a consumer +interface to libdwarf for macro information. +.P +The best (simple to implement, easy for a debugger user to +understand) candidate seems to be that +the debugger asks for macros of a given name in a compilation +unit, and the debugger responds with *all* the macros of that name. +.H 3 "only a single choice exists" +If there is exactly one, that is usable in expressions, if the +debugger is able to evaluate such. +.H 3 "multiple macros with same name". +If there are multiple macros with the same name +in a compilation unit, +the debugger (and the debugger user and the application +programmer) have +a problem: confusion is quite possible. +If the macros are simple the +debugger user can simply substitute by hand in an expression. +If the macros are complicated hand substitution will be +impractical, and the debugger will have to identify the +choices and let the debugger user choose an interpretation. +.H 2 "Section 6.1.2 Lookup by address problem" +Each entry is a beginning-address followed by a length. +And the distinguished entry 0,0 is used to denote +the end of a range of entries. +.P +This means that one must be careful not to emit a zero length, +as in a .o (object file) the beginning address of +a normal entry might be 0 (it is a section offset after all), +and the resulting 0,0 would be taken as end-of-range, not +as a valid entry. +A dwarf dumper would have trouble with such data +in an object file. +.P +In an a.out or shared object (dynamic shared object, DSO) +no text will be at address zero so in such this problem does +not arise. 
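The 0,0 terminator problem described above can be shown with a short C sketch. It assumes the address/length tuples have already been read into an array; struct arange_pair and count_arange_pairs are hypothetical names and are not part of libdwarf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one lookup-by-address tuple. */
struct arange_pair {
    uint64_t start;    /* beginning address (a section offset in a .o) */
    uint64_t length;   /* length in bytes */
};

/* Counts the tuples seen before the distinguished 0,0 end marker,
 * looking at no more than max_pairs tuples. */
size_t count_arange_pairs(const struct arange_pair *pairs, size_t max_pairs)
{
    size_t n = 0;

    assert(pairs != NULL);
    while (n < max_pairs) {
        if (pairs[n].start == 0 && pairs[n].length == 0) {
            break;     /* indistinguishable from a real zero-length entry at 0 */
        }
        n++;
    }
    return n;
}
```

An entry that starts at address 0 with a non-zero length is read normally, but a zero-length entry at address 0 stops the scan and hides every tuple after it, which is the trouble a dwarf dumper would have with such an object file.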
+.H 2 "Section 5.10 Subrange Type Entries problem" +It is specified that DW_AT_upper_bound (and lower bound) +must be signed entries if there is no object type +info to specify the bound type (Sec 5.10, end of section). +One cannot tell (with some +dwarf constant types) what the signedness is from the +form itself (like DW_FORM_data1), so it is necessary +to determine the object and type according to the rules +in 5.10 and then if all that fails, the type is signed. +It's a bit complicated and earlier versions of mips_extensions +incorrectly said signedness was not defined. +.H 2 "Section 5.5.6 Class Template Instantiations problem" +There is lots of room for an implementor to canonicalize +template declarations. I.e., various implementors won't agree. +This is not serious since a given compiler +will be consistent with itself and debuggers +will have to cope! +.H 2 "Section 2.4.3.4 # 11. operator spelling" +DW_OP_add should be DW_OP_plus (page 14) +(this mistake occurs in just one place on the page). +.H 2 "No clear specification of C++ static funcs" +There is no clear way to tell if a C++ member function +is a static member or a non-static member function. +(dwarf2read.c in gdb 4.18, for example, has this observation) +.H 2 "Misspelling of DW_AT_const_value" +Twice in appendix 1, DW_AT_const_value is misspelled +as DW_AT_constant_value. +.H 2 "Mistake in Attribute Encodings" +Section 7.5.4, "Attribute Encodings" +has a brief discussion of "constant" +which says there are 6 forms of constants. +It is incorrect in that it fails to mention (or count) +the block forms, which are clearly allowed by +section 4.1 "Data Object Entries" (see entry number 9 in +the numbered list, on constants). +.H 2 "DW_OP_bregx" +The description of DW_OP_bregx in 2.4.3.2 (Register Based +Addressing) is slightly misleading, in that it +lists the offset first. +As section 7.7.1 (Location Expression) +makes clear, in the encoding the register number +comes first. 
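The operand order for DW_OP_bregx noted above (register number first, then offset) can be sketched in C. This is an illustration only: decode_bregx and the two LEB128 helpers are hypothetical names, while the opcode value 0x92 is the standard DWARF encoding of DW_OP_bregx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DW_OP_bregx 0x92

/* Decode an unsigned LEB128; returns the next byte or NULL on overrun. */
static const unsigned char *bregx_uleb(const unsigned char *p,
    const unsigned char *end, uint64_t *v)
{
    uint64_t result = 0;
    unsigned shift = 0;

    while (p < end) {
        unsigned char b = *p++;
        if (shift < 64) {
            result |= (uint64_t)(b & 0x7f) << shift;
        }
        shift += 7;
        if ((b & 0x80) == 0) {
            *v = result;
            return p;
        }
    }
    return NULL;
}

/* Decode a signed LEB128; returns the next byte or NULL on overrun. */
static const unsigned char *bregx_sleb(const unsigned char *p,
    const unsigned char *end, int64_t *v)
{
    uint64_t result = 0;
    unsigned shift = 0;

    while (p < end) {
        unsigned char b = *p++;
        if (shift < 64) {
            result |= (uint64_t)(b & 0x7f) << shift;
        }
        shift += 7;
        if ((b & 0x80) == 0) {
            if (shift < 64 && (b & 0x40)) {
                result |= ~(uint64_t)0 << shift;   /* sign-extend */
            }
            *v = (int64_t)result;
            return p;
        }
    }
    return NULL;
}

/* Decodes one DW_OP_bregx operation: opcode, then the register number
 * (unsigned LEB128), then the offset (signed LEB128).
 * Returns a pointer past the operation, or NULL on error. */
const unsigned char *decode_bregx(const unsigned char *p,
    const unsigned char *end, uint64_t *reg, int64_t *offset)
{
    assert(p != NULL && end != NULL && reg != NULL && offset != NULL);
    if (p >= end || *p != DW_OP_bregx) return NULL;
    p = bregx_uleb(p + 1, end, reg);      /* register number comes first */
    if (p == NULL) return NULL;
    return bregx_sleb(p, end, offset);    /* offset comes second */
}
```

For example, the bytes 0x92 0x1d 0x70 decode to register 29 with offset -16; reading the operands in the order the prose of 2.4.3.2 suggests would swap the two values.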
+.H 1 "MIPS attributes" +.H 2 "DW_AT_MIPS_fde" +This extension to Dwarf appears only on subprogram TAGs and has as +its value the offset, in the .debug_frame section, of the fde which +describes the frame of this function. It is an optimization of +sorts to have this present. + +.H 2 "DW_CFA_MIPS_advance_loc8 0x1d" +This obvious extension to dwarf line tables enables encoding of 8 byte +advance_loc values (for cases when such must be relocatable, +and thus must be full length). Applicable only to 64-bit objects. + +.H 2 "DW_TAG_MIPS_loop 0x4081" +For future use. Not currently emitted. +Places to be emitted and attributes that this might own +not finalized. + +.H 2 "DW_AT_MIPS_loop_begin 0x2002" +For future use. Not currently emitted. +Attribute form and content not finalized. + +.H 2 "DW_AT_MIPS_tail_loop_begin 0x2003" +For future use. Not currently emitted. +Attribute form and content not finalized. + +.H 2 "DW_AT_MIPS_epilog_begin 0x2004" +For future use. Not currently emitted. +Attribute form and content not finalized. + +.H 2 "DW_AT_MIPS_loop_unroll_factor 0x2005" +For future use. Not currently emitted. +Attribute form and content not finalized. + +.H 2 "DW_AT_MIPS_software_pipeline_depth 0x2006" +For future use. Not currently emitted. +Attribute form and content not finalized. +.H 2 "DW_AT_MIPS_linkage_name 0x2007" +The rules for mangling C++ names are not part of the +C++ standard and are different for different versions +of C++. With this extension, the compiler emits +both the DW_AT_name for things with mangled names +(recall that DW_AT_name is NOT the mangled form) +and also emits DW_AT_MIPS_linkage_name whose value +is the mangled name. +.P +This makes looking for the mangled name in other linker +information straightforward. +It also is passed (by the debugger) to the +libmangle routines to generate names to present to the +debugger user. +.H 2 "DW_AT_MIPS_stride 0x2008" +F90 allows assumed shape arguments and pointers to describe +non-contiguous memory. 
A (runtime) descriptor contains address, +bounds and stride information - rank and element size are known +during compilation. The extent in each dimension is given by the +bounds in a DW_TAG_subrange_type, but the stride cannot be +represented in conventional dwarf. DW_AT_MIPS_stride was added as +an attribute of a DW_TAG_subrange_type to describe the +location of the stride. +Used in the MIPSpro 7.2 (7.2.1 etc) compilers. +.P +If the stride is constant (ie: can be inferred from the type in the +usual manner) DW_AT_MIPS_stride is absent. +.P +If DW_AT_MIPS_stride is present, the attribute contains a reference +to a DIE which describes the location holding the stride, and the +DW_AT_stride_size field of DW_TAG_array_type is ignored if +present. The value of the stride is the number of +4 byte words between +elements along that axis. +.P +This applies to +.nf +a) Intrinsic types whose size is greater + or equal to 4 bytes ie: real*4,integer*8 + complex etc, but not character types. + +b) Derived types (ie: structs) of any size, + unless all components are of type character. +.fi + +.H 2 "DW_AT_MIPS_abstract_name 0x2009" +This attribute only appears in a DW_TAG_inlined_subroutine DIE. +The value of this attribute is a string. +When IPA inlines a routine and the abstract origin is +in another compilation unit, there is a problem with putting +in a reference, since the ordering and timing of the +creation of references is unpredictable with reference to +the DIE and compilation unit the reference refers to. +.P +Since there may be NO ordering of the compilation units that +allows a correct reference to be done without some kind of patching, +and since even getting the information from one place to another +is a problem, the compiler simply passes the problem on to the debugger. +.P +The debugger must match the DW_AT_MIPS_abstract_name +in the concrete +inlined instance DIE +with the DW_AT_MIPS_abstract_name +in the abstract inlined subroutine DIE. 
+.P +A dwarf-consumer-centric view of this and other inline +issues could be expressed as follows: +.nf +If DW_TAG_subprogram + If has DW_AT_inline is abstract instance root + If has DW_AT_abstract_origin, is out-of-line instance + of function (need abstract origin for some data) + (abstract root in same CU (conceptually anywhere + a ref can reach, but reaching outside of CU is + a problem for ipa: see DW_AT_MIPS_abstract_name)) + If has DW_AT_MIPS_abstract_name is abstract instance + root( must have DW_AT_inline) and this name is used to + match with the abstract root + +If DW_TAG_inlined_subroutine + Is concrete inlined subprogram instance. + If has DW_AT_abstract_origin, it is a CU-local inline. + If it has DW_AT_MIPS_abstract_name it is an + inline whose abstract root is in another file (CU). +.fi + +.H 2 "DW_AT_MIPS_clone_origin 0x200a" +This attribute appears only in a cloned subroutine. +The procedure is cloned from the same compilation unit. +The value of this attribute is a reference to +the original routine in this compilation unit. +.P +The 'original' routine means the routine which has all the +original code. The cloned routines will always have +been 'specialized' by IPA. +A routine with DW_AT_MIPS_clone_origin +will also have the DW_CC_nocall value of the DW_AT_calling_convention +attribute. + +.H 2 "DW_AT_MIPS_has_inlines 0x200b" +This attribute may appear in a DW_TAG_subprogram DIE. +If present and it has the value True, then the subprogram +has inlined functions somewhere in the body. +.P +By default, at startup, the debugger may not look for +inlined functions in scopes inside the outer function. +.P +This is a hint to the debugger to look for the inlined functions +so the debugger can set breakpoints on these in case the user +requests 'stop in foo' and foo is inlined. +.H 2 "DW_AT_MIPS_stride_byte 0x200c" +Created for f90 pointer and assumed shape +arrays. +Used in the MIPSpro 7.2 (7.2.1 etc) compilers. +A variant of DW_AT_MIPS_stride. 
+This stride is interpreted as a byte count. +Used for integer*1 and character arrays +and arrays of derived type +whose components are all character. +.H 2 "DW_AT_MIPS_stride_elem 0x200d" +Created for f90 pointer and assumed shape +arrays. +Used in the MIPSpro 7.2 (7.2.1 etc) compilers. +A variant of DW_AT_MIPS_stride. +This stride is interpreted as a byte-pair (2 byte) count. +Used for integer*2 arrays. +.H 2 "DW_AT_MIPS_ptr_dopetype 0x200e" +See following. +.H 2 "DW_AT_MIPS_allocatable_dopetype 0x200f" +See following. +.H 2 "DW_AT_MIPS_assumed_shape_dopetype 0x2010" +DW_AT_MIPS_assumed_shape_dopetype, DW_AT_MIPS_allocatable_dopetype, +and DW_AT_MIPS_ptr_dopetype have an attribute value +which is a reference to a Fortran 90 Dope Vector. +These attributes are introduced in MIPSpro7.3. +They only apply to f90 arrays (where they are +needed to describe arrays never properly described +before in debug information). +C, C++, f77, and most f90 arrays continue to be described +in standard dwarf. +.P +The distinction between these three attributes is the f90 syntax +distinction: keywords 'pointer' and 'allocatable' with the absence +of these keywords on an assumed shape array being the third case. +.P +A "Dope Vector" is a struct (C struct) which describes +a dynamically-allocatable array. +In objects with full debugging the C struct will be +in the dwarf information (of the f90 object, represented like C). +A debugger will use the link to find the main struct DopeVector +and will use that information to decode the dope vector. +At the outer allocatable/assumed-shape/pointer +the DW_AT_location points at the dope vector (so debugger +calculations use that as a base). +.H 2 "Overview of debugger use of dope vectors" +Fundamentally, we build two distinct +representations of the arrays and pointers. +One, in dwarf, represents the statically-representable +information (the types and +variable/type-names, without type size information). 
+The other, using dope vectors in memory, represents +the run-time data of sizes. +A debugger must process the two representations +in parallel (and merge them) to deal with user expressions in +a debugger. +.H 2 "Example f90 code for use in explanation" +[Note +We want dwarf output with *exactly* +this little (arbitrary) example. +Not yet available. +end Note] +Consider the following code. +.nf + type array_ptr + real :: myvar + real, dimension (:), pointer :: ap + end type array_ptr + + type (array_ptr), allocatable, dimension (:) :: arrays + + allocate (arrays(20)) + do i = 1,20 + allocate (arrays(i)%ap(i)) + end do +.fi +arrays is an allocatable array (1 dimension) whose size is +not known at compile time (it has +a Dope Vector). At run time, the +allocate statement creates 20 array_ptr dope vectors +and marks the base arrays dopevector as allocated. +The myvar variable is just there to add complexity to +the example :-) +.nf +In the loop, arrays(1)%ap(1) + is allocated as a single element array of reals. +In the loop, arrays(2)%ap(2) + is allocated as an array of two reals. +... +In the loop, arrays(20)%ap(20) + is allocated as an array of twenty reals. +.fi +.H 2 "the problem with standard dwarf and this example" +.sp +In dwarf, there is no way to find the array bounds of arrays(3)%ap, +for example, (which are 1:3 in f90 syntax) +since any location expression in an ap array lower bound +attribute cannot involve the 3 (the 3 is known at debug time and +does not appear in the running binary, so there is no way for the +location expression to get to it). +And of course the 3 must actually index across the array of +dope vectors in 'arrays' in our implementation, but that is less of +a problem than the problem with the '3'. +.sp +Plus dwarf has no way to find the 'allocated' flag in the +dope vector (so the debugger can know when the allocate is done +for a particular arrays(j)%ap). 
+.sp +Consequently, the calculation of array bounds and indices +for these dynamically created f90 arrays +is now pushed off into the debugger, which must know the +field names and usages of the dope vector C structure and +use the field offsets etc to find data arrays. +C, C++, f77, and most f90 arrays continue to be described +in standard dwarf. +At the outer allocatable/assumed-shape/pointer +the DW_AT_location points at the dope vector (so debugger +calculations use that as a base). +.P +It would have been nice to design a dwarf extension +to handle the above problems, but +the methods considered to date were not +any more consistent with standard dwarf than +this dope vector centric approach: essentially just +as much work in the debugger appeared necessary either way. +A better (more dwarf-ish) +design would be welcome information. + +.H 2 "A simplified sketch of the dwarf information" +[Note: +Needs to be written. +end Note] + +.H 2 "A simplified sketch of the dope vector information" +[Note: +This one is simplified. +Details left out that should be here. Amplify. +end Note] +This is an overly simplified version of a dope vector, +presented as an initial hint. +Full details are presented later. +.nf +struct simplified{ + void *base; // pointer to the data this describes + long el_len; + int assoc:1; + int ptr_alloc:1; + int num_dims:3; + struct dims_s { + long lb; + long ext; + long str_m; + } dims[7]; +}; +.fi +Only 'num_dims' elements of dims[] are actually used. + +.H 2 "The dwarf information" + +Here is dwarf information from the compiler for +the example above, as printed by dwarfdump(1). +.nf +[Note: +The following may not be the test. +Having field names with '.' in the name is +not such a good idea, as it conflicts with the +use of '.' in dbx extended naming. 
+Something else, like _$, would be much easier +to work with in dbx (customers won't care about this, +for the most part, +but folks working on dbx will, and in those +rare circumstances when a customer cares, +the '.' will be a real problem in dbx.). +Note that to print something about .base., in dbx one +would have to do + whatis `.base.` +where that is the grave accent, or back-quote I am using. +With extended naming one would do + whatis `.dope.`.`.base.` +which is hard to type and hard to read. +end Note] + +<2>< 388> DW_TAG_array_type + DW_AT_name .base. + DW_AT_type <815> + DW_AT_declaration yes(1) +<3>< 401> DW_TAG_subrange_type + DW_AT_lower_bound 0 + DW_AT_upper_bound 0 +<2>< 405> DW_TAG_pointer_type + DW_AT_type <388> + DW_AT_byte_size 4 + DW_AT_address_class 0 +<2>< 412> DW_TAG_structure_type + DW_AT_name .flds. + DW_AT_byte_size 28 +<3>< 421> DW_TAG_member + DW_AT_name el_len + DW_AT_type <815> + DW_AT_data_member_location DW_OP_consts 0 +<3>< 436> DW_TAG_member + DW_AT_name assoc + DW_AT_type <841> + DW_AT_byte_size 0 + DW_AT_bit_offset 0 + DW_AT_bit_size 1 + DW_AT_data_member_location DW_OP_consts 4 +<3>< 453> DW_TAG_member + DW_AT_name ptr_alloc + DW_AT_type <841> + DW_AT_byte_size 0 + DW_AT_bit_offset 1 + DW_AT_bit_size 1 + DW_AT_data_member_location DW_OP_consts 4 +<3>< 474> DW_TAG_member + DW_AT_name p_or_a + DW_AT_type <841> + DW_AT_byte_size 0 + DW_AT_bit_offset 2 + DW_AT_bit_size 2 + DW_AT_data_member_location DW_OP_consts 4 +<3>< 492> DW_TAG_member + DW_AT_name a_contig + DW_AT_type <841> + DW_AT_byte_size 0 + DW_AT_bit_offset 4 + DW_AT_bit_size 1 + DW_AT_data_member_location DW_OP_consts 4 +<3>< 532> DW_TAG_member + DW_AT_name num_dims + DW_AT_type <841> + DW_AT_byte_size 0 + DW_AT_bit_offset 29 + DW_AT_bit_size 3 + DW_AT_data_member_location DW_OP_consts 8 +<3>< 572> DW_TAG_member + DW_AT_name type_code + DW_AT_type <841> + DW_AT_byte_size 0 + DW_AT_bit_offset 0 + DW_AT_bit_size 32 + DW_AT_data_member_location DW_OP_consts 16 +<3>< 593> 
DW_TAG_member + DW_AT_name orig_base + DW_AT_type <841> + DW_AT_data_member_location DW_OP_consts 20 +<3>< 611> DW_TAG_member + DW_AT_name orig_size + DW_AT_type <815> + DW_AT_data_member_location DW_OP_consts 24 +<2>< 630> DW_TAG_structure_type + DW_AT_name .dope_bnd. + DW_AT_byte_size 12 +<3>< 643> DW_TAG_member + DW_AT_name lb + DW_AT_type <815> + DW_AT_data_member_location DW_OP_consts 0 +<3>< 654> DW_TAG_member + DW_AT_name ext + DW_AT_type <815> + DW_AT_data_member_location DW_OP_consts 4 +<3>< 666> DW_TAG_member + DW_AT_name str_m + DW_AT_type <815> + DW_AT_data_member_location DW_OP_consts 8 +<2>< 681> DW_TAG_array_type + DW_AT_name .dims. + DW_AT_type <630> + DW_AT_declaration yes(1) +<3>< 694> DW_TAG_subrange_type + DW_AT_lower_bound 0 + DW_AT_upper_bound 0 +<2>< 698> DW_TAG_structure_type + DW_AT_name .dope. + DW_AT_byte_size 44 +<3>< 707> DW_TAG_member + DW_AT_name base + DW_AT_type <405> + DW_AT_data_member_location DW_OP_consts 0 +<3>< 720> DW_TAG_member + DW_AT_name .flds + DW_AT_type <412> + DW_AT_data_member_location DW_OP_consts 4 +<3>< 734> DW_TAG_member + DW_AT_name .dims. 
+ DW_AT_type <681> + DW_AT_data_member_location DW_OP_consts 32 +<2>< 750> DW_TAG_variable + DW_AT_type <815> + DW_AT_location DW_OP_fbreg -32 + DW_AT_artificial yes(1) +<2>< 759> DW_TAG_variable + DW_AT_type <815> + DW_AT_location DW_OP_fbreg -28 + DW_AT_artificial yes(1) +<2>< 768> DW_TAG_variable + DW_AT_type <815> + DW_AT_location DW_OP_fbreg -24 + DW_AT_artificial yes(1) +<2>< 777> DW_TAG_array_type + DW_AT_type <815> + DW_AT_declaration yes(1) +<3>< 783> DW_TAG_subrange_type + DW_AT_lower_bound <750> + DW_AT_count <759> + DW_AT_MIPS_stride <768> +<2>< 797> DW_TAG_variable + DW_AT_decl_file 1 + DW_AT_decl_line 1 + DW_AT_name ARRAY + DW_AT_type <698> + DW_AT_location DW_OP_fbreg -64 DW_OP_deref +<1>< 815> DW_TAG_base_type + DW_AT_name INTEGER_4 + DW_AT_encoding DW_ATE_signed + DW_AT_byte_size 4 +<1>< 828> DW_TAG_base_type + DW_AT_name INTEGER_8 + DW_AT_encoding DW_ATE_signed + DW_AT_byte_size 8 +<1>< 841> DW_TAG_base_type + DW_AT_name INTEGER*4 + DW_AT_encoding DW_ATE_unsigned + DW_AT_byte_size 4 +<1>< 854> DW_TAG_base_type + DW_AT_name INTEGER*8 + DW_AT_encoding DW_ATE_unsigned + DW_AT_byte_size 8 + +.fi +.H 2 "The dope vector structure details" +A dope vector is the following C struct, "dopevec.h". +Not all the fields are of use to a debugger. +It may be that not all fields will show up +in the f90 dwarf (since not all are of interest to debuggers). +.nf +[Note: +Need details on the use of each field. +And need to know which are really 32 bits and which +are 32 or 64. +end Note] +The following +struct +is a representation of all the dope vector fields. +It suppresses irrelevant detail and may not +exactly match the layout in memory (a debugger must +examine the dwarf to find the fields, not +compile this structure into the debugger!). +.nf +struct .dope. { + void *base; // pointer to data + struct .flds. { + long el_len; // length of element in bytes? + unsigned int assoc:1; //means? + unsigned int ptr_alloc:1; //means? + unsigned int p_or_a:2; //means? 
+ unsigned int a_contig:1; // means? + unsigned int num_dims: 3; // 0 thru 7 + unsigned int type_code:32; //values? + unsigned int orig_base; //void *? means? + long orig_size; // means? + } .flds; + + struct .dope_bnd. { + long lb ; // lower bound + long ext ; // means? + long str_m; // means? + } .dims[7]; +} +.fi + +.H 2 "DW_AT_MIPS_assumed_size 0x2011" +This flag was invented to deal with f90 arrays. +For example: + +.nf + pointer (rptr, axx(1)) + pointer (iptr, ita(*)) + rptr = malloc (100*8) + iptr = malloc (100*4) +.fi + +This flag attribute has the value 'yes' (true, on) if and only if +the size is unbounded, as iptr is. +Both may show an explicit upper bound of 1 in the dwarf, +but this flag notifies the debugger that there is explicitly +no user-provided size. + +So if a user asks for a printout of the rptr allocated +array, the default will be of a single entry (as +there is a user slice bound in the source). +In contrast, there is no explicit upper bound on the iptr +(ita) array so the default slice will use the current bound +(a value calculated from the malloc size, see the dope vector). + +Given explicit requests, more of rptr(axx) can be shown +than the default. + +.H 1 "Line information and Source Position" +DWARF does not define the meaning of the term 'source statement'. +Nor does it define any way to find the first user-written +executable code in a function. +.P +It does define that a source statement has a file name, +a line number, and a column position (see Sec 6.2, Line Number +Information of the Dwarf Version 2 document). +We will call those 3 source coordinates a 'source position' +in this document. We'll try not to accidentally call the +source position a 'line number' since that is ambiguous +as to what it means. + +.H 2 "Definition of Statement" +.P +A function prolog is a statement. +.P +A C, C++, Pascal, or Fortran statement is a statement. 
+.P +Each initialized local variable in C, C++ is a statement +in that its initialization generates a source position. +This means that + x = 3, y = 4; +is two statements. +.P +For C, C++: +The 3 parts a,b,c in for(a;b;c) {d;} are individual statements. +The condition portion of a while() and do {} while() is +a statement. (Of course d; can be any number of statements.) +.P +For Fortran, the controlling expression of a DO loop is a statement. +Is a 'continue' statement in Fortran a DWARF statement? +.P +Each function return, whether user coded or generated by the +compiler, is a statement. This is so one can step over (in +a debugger) the final user-coded statement +(exclusive of the return statement if any) in a function +while not leaving the function scope. +.P + +.H 2 "Finding The First User Code in a Function" + +.nf +Consider: +int func(int a) +{ /* source position 1 */ + float b = a; /* source position 2 */ + int x; + x = b + 2; /* source position 3 */ +} /* source position 4 */ +.fi +.P +The DIE for a function gives the address range of the function, +including function prolog(s) and epilog(s). +.P +Since there is no scope block for the outer user scope of a +function (and thus no beginning address range for the outer +user scope: the DWARF committee explicitly rejected the idea +of having a user scope block) +it is necessary to use the source position information to find +the first user-executable statement. +.P +This means that the user code for a function must be presumed +to begin at the code location of the second source position in +the function address range. +.P +If a function has exactly one source position, the function +presumably consists solely of a return. +.P +If a function has exactly two source positions, the function +may consist of a function prolog and a return or a single user +statement and a return (there may be no prolog code needed in a +leaf function). 
In this case, there is no way to be sure which +is the first source position of user code, so the rule is to +presume that the first address is user code. +.P +If a function consists of 3 or more source positions, one +should assume that the first source position is function prolog and +the second is the first user executable code. + +.H 2 "Using debug_frame Information to find first user statement" +In addition to the line information, the debug_frame information +can be +useful in determining the first user source line. +.P +Given that a function has more than 1 source position, +find the code location of the second source position, then +examine the debug_frame information to determine if the Canonical +Frame Address (cfa) is updated before the second source position +code location. +If the cfa is updated, then one can be pretty sure that the +code for the first source position is function prolog code. +.P +Similarly, if the cfa is restored in the code for +a source position, the source position is likely to +represent a function exit block. + +.H 2 "Debugger Use Of Source Position" +Command line debuggers, such as dbx and gdb, will ordinarily +want to consider multiple statements on one line to be a single +statement: doing otherwise is distressing to users since it +causes a 'step' command to appear to have no effect. +.P +An exception for command line debuggers is in determining the +first user statement: as detailed above, there one wants to +consider the full source position and will want to consider +the function return a separate statement. It is difficult, however, to +make 'step' treat the function return as a separate statement reliably +if a function is coded all on one line or if the last +line of user code before the return is on the same line as the +return. +.P +A graphical debugger has none of these problems if it simply +highlights the portion of the line being executed. In that +case, stepping will appear natural even when stepping within a +line. 
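The rules given earlier for finding the first user code from the count of source positions can be written down as a small C sketch. This is a hedged illustration, not the logic of any particular debugger: struct src_pos and first_user_pos() are hypothetical names, and the sketch assumes the caller has already collected the source positions that fall inside one function's address range, sorted by code address. It does not consult the debug_frame information, which, as described above, can be used to confirm whether the first position really is prolog code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one source position inside a function:
   the code address where it begins plus its line and column. */
struct src_pos {
    unsigned long addr;
    unsigned line;
    unsigned column;
};

/* Return the index within pos[0..n-1] of the source position that
   is presumed to be the first user code. pos[] must be sorted by
   address and n must be at least 1. */
size_t first_user_pos(const struct src_pos *pos, size_t n)
{
    assert(pos != NULL && n > 0);
    if (n <= 2) {
        /* One position: presumably just a return.
           Two positions: ambiguous (prolog + return, or user
           statement + return), so presume the first is user code. */
        return 0;
    }
    /* Three or more: the first position is taken to be the function
       prolog and the second to be the first user code. */
    return 1;
}
```

For the func() example above, which has four source positions, this sketch selects the second one (the initialization of b) as the first user code.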
+.H 1 "Known Bugs" +Up through at least MIPSpro7.2.1 +the compiler has been emitting forms DW_FORM_data1, DW_FORM_data2, or DW_FORM_data4 +for DW_AT_const_value in DW_TAG_enumerator. +And dwarfdump and debuggers have read this with dwarf_formudata() +or dwarf_formsdata() and gotten some values incorrect. +For example, a value of 128 was printed by debuggers as a negative value. +Since dwarfdump and the compilers were not written to use the +value the same way, their output differed. +For negative enumerator values the compiler has been emitting 32bit values +in a DW_FORM_data4. +The compiler should probably be emitting a DW_FORM_sdata for +enumerator values. +And consumers of enumerator values should then call dwarf_formsdata(). +However, right now, debuggers should call dwarf_formudata() and only if +it fails, call dwarf_formsdata(). +Anything else will break backward compatibility with +the objects produced earlier. +.SK +.S +.TC 1 1 4 +.CS |