Steve Nygard

Class-dump 3.3 Is Now Available

A new version of class-dump is now available. You can download it from the links on the class-dump project page. It is built for Mac OS X 10.5 or later.

Version 3.3 adds support for Snow Leopard, improves property handling, improves structure/union handling, fixes a bunch of bugs (including two crashers), and no doubt adds some new bugs.

Snow Leopard

There is a new load command (LC_DYLD_INFO) that contains a byte stream of information that dyld needs to load an image. This replaces the relocation info from the old dynamic symbol table; otool -rv outputs nothing for files built for 10.6. class-dump needs to decode this stream to correctly look up references to external class names. Version 3.2 showed a blank name for many superclass and category class names; this is fixed in version 3.3.
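To make this concrete, here is a minimal Python sketch of walking the bind opcode stream (the bytes at bind_off/bind_size in the dyld_info_command) and recording which symbol is bound at each (segment index, offset) location. The opcode values are the BIND_OPCODE_* constants from <mach-o/loader.h>; the function names, the assumption of 8-byte (64-bit) pointers, and recovering class names from `_OBJC_CLASS_$_` symbols are illustrative choices, not class-dump's actual code. Dylib ordinals, bind types, and addends are parsed but ignored.

```python
# Values from <mach-o/loader.h>: high nibble = opcode, low nibble = immediate.
BIND_OPCODE_MASK = 0xF0
BIND_IMMEDIATE_MASK = 0x0F
DONE = 0x00
SET_DYLIB_ORDINAL_IMM = 0x10
SET_DYLIB_ORDINAL_ULEB = 0x20
SET_DYLIB_SPECIAL_IMM = 0x30
SET_SYMBOL_TRAILING_FLAGS_IMM = 0x40
SET_TYPE_IMM = 0x50
SET_ADDEND_SLEB = 0x60
SET_SEGMENT_AND_OFFSET_ULEB = 0x70
ADD_ADDR_ULEB = 0x80
DO_BIND = 0x90
DO_BIND_ADD_ADDR_ULEB = 0xA0
DO_BIND_ADD_ADDR_IMM_SCALED = 0xB0
DO_BIND_ULEB_TIMES_SKIPPING_ULEB = 0xC0

ADDR_MASK = (1 << 64) - 1  # offsets wrap like unsigned 64-bit arithmetic


def read_uleb(data, i):
    """Decode an unsigned LEB128 value at data[i]; return (value, next_index)."""
    result = shift = 0
    while True:
        byte = data[i]
        i += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            return result, i


def read_sleb(data, i):
    """Decode a signed LEB128 value at data[i]; return (value, next_index)."""
    result = shift = 0
    while True:
        byte = data[i]
        i += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            if byte & 0x40:  # sign bit of the final byte is set
                result -= 1 << shift
            return result, i


def decode_binds(data, pointer_size=8):
    """Map (segment_index, segment_offset) -> symbol name for a bind stream."""
    binds = {}
    segment = offset = 0
    symbol = None
    i = 0
    while i < len(data):
        opcode = data[i] & BIND_OPCODE_MASK
        imm = data[i] & BIND_IMMEDIATE_MASK
        i += 1
        if opcode == DONE:
            break
        elif opcode in (SET_DYLIB_ORDINAL_IMM, SET_DYLIB_SPECIAL_IMM, SET_TYPE_IMM):
            pass  # immediate-only state that this sketch does not need
        elif opcode == SET_DYLIB_ORDINAL_ULEB:
            _, i = read_uleb(data, i)
        elif opcode == SET_ADDEND_SLEB:
            _, i = read_sleb(data, i)
        elif opcode == SET_SYMBOL_TRAILING_FLAGS_IMM:
            end = data.index(0, i)  # symbol name is a NUL-terminated C string
            symbol = data[i:end].decode("ascii")
            i = end + 1
        elif opcode == SET_SEGMENT_AND_OFFSET_ULEB:
            segment = imm
            offset, i = read_uleb(data, i)
        elif opcode == ADD_ADDR_ULEB:
            delta, i = read_uleb(data, i)
            offset = (offset + delta) & ADDR_MASK
        elif opcode == DO_BIND:
            binds[(segment, offset)] = symbol
            offset += pointer_size
        elif opcode == DO_BIND_ADD_ADDR_ULEB:
            binds[(segment, offset)] = symbol
            delta, i = read_uleb(data, i)
            offset = (offset + pointer_size + delta) & ADDR_MASK
        elif opcode == DO_BIND_ADD_ADDR_IMM_SCALED:
            binds[(segment, offset)] = symbol
            offset += pointer_size + imm * pointer_size
        elif opcode == DO_BIND_ULEB_TIMES_SKIPPING_ULEB:
            count, i = read_uleb(data, i)
            skip, i = read_uleb(data, i)
            for _ in range(count):
                binds[(segment, offset)] = symbol
                offset += pointer_size + skip
        else:
            raise ValueError(f"unknown bind opcode 0x{opcode:02x}")
    return binds


def external_class_name(symbol):
    """'_OBJC_CLASS_$_NSObject' -> 'NSObject'; None for other symbols."""
    prefix = "_OBJC_CLASS_$_"
    return symbol[len(prefix):] if symbol and symbol.startswith(prefix) else None
```

With a table like this, a superclass or category pointer whose location appears in the bind map can be resolved to its external class name instead of coming out blank.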

Snow Leopard also adds a new version of protected segments. These are recognized so that class-dump doesn’t spew out a bunch of garbage. Encrypted iPhone apps (indicated by the LC_ENCRYPTION_INFO command) are recognized for the same reason.
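As a minimal sketch of the LC_ENCRYPTION_INFO check, the Python below walks the load commands of a 32-bit little-endian Mach-O image (the ARM iPhone binaries of the day) and reports whether an encryption_info_command with a nonzero cryptid is present. The constant values and struct layouts come from <mach-o/loader.h>; the function name is hypothetical, and fat files, 64-bit headers, and the protected-segment variants are deliberately not handled here.

```python
import struct

MH_MAGIC = 0xFEEDFACE       # 32-bit Mach-O magic, read here as little-endian
LC_ENCRYPTION_INFO = 0x21
MACH_HEADER_SIZE = 28       # sizeof(struct mach_header)


def is_encrypted(data: bytes) -> bool:
    """True if an LC_ENCRYPTION_INFO command with cryptid != 0 is present."""
    magic, _cputype, _subtype, _filetype, ncmds, _sizeofcmds, _flags = \
        struct.unpack_from("<7I", data, 0)
    if magic != MH_MAGIC:
        raise ValueError("not a little-endian 32-bit Mach-O image")
    offset = MACH_HEADER_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", data, offset)
        if cmd == LC_ENCRYPTION_INFO:
            # struct encryption_info_command: cmd, cmdsize, cryptoff, cryptsize, cryptid
            _cryptoff, _cryptsize, cryptid = struct.unpack_from("<3I", data, offset + 8)
            return cryptid != 0
        offset += cmdsize  # every load command records its own total size
    return False
```

A tool can use a check like this to print a short notice and skip the encrypted range rather than trying to parse ciphertext as Objective-C metadata.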

Bug Fixes

There are a lot of bug fixes. First of all, since Mac OS X prefers 64-bit executables over 32-bit executables, there was no need to ship a separate 32-bit executable. Now there is just one 32/64-bit universal executable. I’ve improved the parsing of C++ types, and class-dump can now make it through Safari and WebKit without complaint. I also fixed a crash when trying to dump the 10.5 Finder.

Properties

More property attributes are now supported: dynamic, weak, strong, and nonatomic. Thanks to Joe Ranieri for sending a patch to handle these. Objective-C 2.0 categories and protocols now show their properties. Each property is now emitted where its first accessor method is encountered, and the accessor methods themselves are no longer shown. Any remaining properties (usually dynamic ones, but there seem to be other cases where this can happen) are shown at the end of the method list. Properties declared in the optional section of a protocol appear in that section of the output. The result is shorter, more readable output.
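The ordering rule can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not class-dump's implementation: the Property class and interleave function are hypothetical, each property is reduced to its name and accessor selectors (default getter `name`, default setter `setName:`), and output lines are plain strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Property:
    name: str
    readonly: bool = False
    getter: Optional[str] = None  # custom getter name (G attribute), if any
    setter: Optional[str] = None  # custom setter name (S attribute), if any

    def accessors(self) -> List[str]:
        getter = self.getter or self.name
        if self.readonly:
            return [getter]
        setter = self.setter or f"set{self.name[:1].upper()}{self.name[1:]}:"
        return [getter, setter]


def interleave(methods: List[str], properties: List[Property]) -> List[str]:
    """Emit each property at its first accessor, hide accessors, append leftovers."""
    owner = {sel: p for p in properties for sel in p.accessors()}
    emitted = set()
    lines = []
    for selector in methods:
        prop = owner.get(selector)
        if prop is None:
            lines.append(f"- {selector}")            # ordinary method
        elif prop.name not in emitted:
            lines.append(f"@property {prop.name}")   # first accessor seen
            emitted.add(prop.name)
        # later accessors of an already-emitted property are suppressed
    for prop in properties:
        if prop.name not in emitted:                 # e.g. @dynamic properties
            lines.append(f"@property {prop.name}")
    return lines
```

For example, methods `count`, `setCount:`, `reset`, `title` with properties `count`, read-only `title`, and `delegate` (no accessors in the method list) come out as `@property count`, `- reset`, `@property title`, `@property delegate`.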

Property type strings can contain commas (for example, T^{map<int, int>=...},V_task), so we can’t just split the property attribute string on commas. Now class-dump parses the type, and then splits the rest.
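A simplified version of that idea: rather than running a full type parser, the sketch below scans the `T<type>` entry while tracking bracket depth and quoted names, and treats only a comma at depth zero as the end of the type. It relies on the documented format in which the type entry comes first; the function name is hypothetical, and the bracket scan is a stand-in for class-dump's real type parser.

```python
from typing import List, Tuple


def split_property_attributes(attrs: str) -> Tuple[str, List[str]]:
    """Split e.g. 'T@"NSString",&,N,V_name' into ('@"NSString"', ['&', 'N', 'V_name'])."""
    if not attrs.startswith("T"):
        raise ValueError("attribute string should start with a T<type> entry")
    depth = 0
    in_quotes = False
    i = 1
    while i < len(attrs):
        ch = attrs[i]
        if ch == '"':
            in_quotes = not in_quotes    # skip over "memberName" and @"ClassName"
        elif not in_quotes:
            if ch in "{([":
                depth += 1
            elif ch in "})]":
                depth -= 1
            elif ch == "," and depth == 0:
                break                    # end of the type encoding
        i += 1
    type_encoding = attrs[1:i]
    rest = attrs[i + 1:]
    # Past the type, the remaining attributes contain no nested commas.
    return type_encoding, rest.split(",") if rest else []
```

Once the type has been peeled off, the simple comma split is safe for the remaining single-letter attributes and the `V` ivar name.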

Structure Handling

This is the process of trying to make sense out of all the structure and union definitions found in the files, and then to present it in a meaningful way.

Merging member names and types: Not all occurrences of a structure have member names, so class-dump needs to merge member names when possible. For example, there might be both {_NSPoint="x"f"y"f} and {_NSPoint=ff}. Of course, there are exceptions where structures with the same name have either the same underlying type with different member names, or a different underlying type altogether. An example of this is ^{?=@ii{_flags=b1b1b1b1b12}} from @category NSApplication (NSWindowCache) and ^{__NSRowHeightBucket=fi{_flags="equalRowSizes"b1"reserved"b31}} from the _NSTableRowHeightStorage class. Anonymous structures can collide in the same way. It is common to have a structure of bitfield flags, and these can easily have the same layout but different member names. Types in methods never have member names; worse still, they reduce object types to plain ids. That is, @"NSObject" appears as just @. Sometimes the compiler generates different names (such as $_2531) for multiple instances of the same union.
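A minimal sketch of the merge step, assuming each structure's members have already been parsed into (name, type) pairs, with None where a name is missing. Two member lists merge only if every member type agrees (a bare `@` may match a specific `@"ClassName"`, and the more specific type wins) and no two names conflict; otherwise the structures are kept apart. The function names are hypothetical, and nested structure types are compared as plain strings, a simplification a full implementation would not make.

```python
from typing import List, Optional, Tuple

Member = Tuple[Optional[str], str]  # (member name or None, type encoding)


def types_compatible(a: str, b: str) -> bool:
    """Equal types match; a bare id '@' also matches any '@"ClassName"'."""
    if a == b:
        return True
    return (a == "@" and b.startswith('@"')) or (b == "@" and a.startswith('@"'))


def merge_members(a: List[Member], b: List[Member]) -> Optional[List[Member]]:
    """Merge two member lists, or return None if they describe different structures."""
    if len(a) != len(b):
        return None
    merged = []
    for (name_a, type_a), (name_b, type_b) in zip(a, b):
        if not types_compatible(type_a, type_b):
            return None                      # different underlying type
        if name_a and name_b and name_a != name_b:
            return None                      # same layout, conflicting names
        name = name_a or name_b              # keep whichever name we have
        type_ = type_a if type_a != "@" else type_b  # prefer the specific object type
        merged.append((name, type_))
    return merged
```

Applied to the {_NSPoint="x"f"y"f} and {_NSPoint=ff} pair, this yields the named members; applied to two flag structures with different member names, it refuses to merge.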

Determining how named and anonymous structures are declared: Named structures are generally declared at the top, unless they are only referenced once in an ivar or another structure, or if they are a special case (such as _flags), in which case they get expanded at the point where they occur. Any top level anonymous structures that are referenced by a method, or referenced from more than one place, need to be typedef’d and referred to by that name. And there are a few more special cases that should be handled.
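The placement rules in that paragraph can be written down as a small decision function. This is a sketch, not class-dump's code: the reference counts, the SPECIAL_NAMES set, and the Placement names are hypothetical inputs for illustration, the structure is assumed to be top level, and the remaining special cases are not modeled.

```python
from enum import Enum
from typing import Optional


class Placement(Enum):
    DECLARE_AT_TOP = "declare at top"  # named structure declared once up front
    EXPAND_INLINE = "expand inline"    # written out where it is used
    TYPEDEF = "typedef"                # anonymous structure given a typedef name


SPECIAL_NAMES = {"_flags"}  # hypothetical: names that are always expanded in place


def placement(name: Optional[str], ivar_refs: int, struct_refs: int,
              method_refs: int) -> Placement:
    """Decide how a top-level structure is declared; name is None if anonymous."""
    if name is not None:
        if name in SPECIAL_NAMES:
            return Placement.EXPAND_INLINE
        # Referenced exactly once, from an ivar or another structure.
        if method_refs == 0 and ivar_refs + struct_refs == 1:
            return Placement.EXPAND_INLINE
        return Placement.DECLARE_AT_TOP
    # Anonymous: used by a method, or from more than one place, needs a typedef.
    if method_refs > 0 or ivar_refs + struct_refs + method_refs > 1:
        return Placement.TYPEDEF
    return Placement.EXPAND_INLINE
```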

Hmm, when I put it that way it sounds easy enough… But my specification was more like “Take a bunch of types, mix ’em all together, and generate something pretty”. So there are a lot of details and special cases to get right, and I ran into them one at a time. When I wrote that code, years ago, it took a lot of time to get it working on all of my test cases. I was tired of tweaking it, happy that it mostly worked, and ready to move on to something else. So I didn’t document it, and now I fear that code. I’ve made a couple of attempts to figure out what it was doing and document it, but they didn’t get far.

I was working on the code after 3.2 was released. I decided to cache the parsed types, and pass the parsed types to the type formatter instead of passing a type string. Easy enough, but it introduced a bug, and that led me to the code that I feared. I ended up removing most of the old code and writing it from scratch. By the time it was mostly working again I was tired of tweaking it and ready to move on to something else. But this time I’ve documented it, so that I can remember how it works next year. Or next week.

What’s the result of that effort? Some structures get an actual object type, instead of just an id. Some structures pick up the correct member names, which were missing previously. Some structures have member names generated where they weren’t before. Some of the special cases (like _flags) are handled better.

I have also changed how typedef names are generated. Previously I just added an index to the name. This worked, but was extremely fragile. Unrelated changes to the code could alter the order in which structures were processed, and therefore change their names. Updates to frameworks could lead to large diffs in the output. The author of class-dump-z, kennytm, rightly criticized this (and other things!). Now I generate a hash based on the type string (after it’s been filled out with member names, and before the generic _field names have been added), and use that to create a unique typedef name. The results are great. There were only 6 diffs between the 10.5.5 and 10.5.8 AppKits with the new version, compared to 78 with the old.
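Here is a sketch of why hash-based names are stable: the name is a pure function of the type string, so neither processing order nor unrelated structures can change it. CRC-32 (via zlib.crc32), the `CDStruct_` prefix, and the old-style `CDAnonymousStruct` names are illustrative assumptions; the post does not say which hash class-dump uses.

```python
import zlib


def typedef_name(type_string: str) -> str:
    """Derive a typedef name from a structure's type string.

    Assumed input: the type string after member names have been merged in,
    but before generic _field names are added.
    """
    return "CDStruct_%08x" % zlib.crc32(type_string.encode("utf-8"))


def index_names(type_strings):
    """The old, fragile scheme for comparison: names depend on processing order."""
    return {t: f"CDAnonymousStruct{i}" for i, t in enumerate(type_strings, start=1)}
```

Processing the same two structures in opposite orders gives identical hash-based names, while the index-based scheme swaps them, which is exactly the kind of churn that showed up as diffs between framework versions.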

The named structures are sorted by name and printed first, followed by the special cases. The anonymous structures are next, sorted by nested structure depth and then by the type string, followed by the special cases. #pragma marks are added before each section, and a #pragma mark - before each Mach-O file, to make it easy to search for the next section.
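The ordering can be sketched as two sorts: named structures by name, anonymous ones by (nesting depth, type string). The depth function below simply counts brace and parenthesis nesting and ignores the possibility of braces inside quoted member names; the special cases are left out, and the function names and #pragma mark titles are made up for the example.

```python
from typing import Dict, List


def nesting_depth(type_string: str) -> int:
    """Maximum nesting of {structs} and (unions) in an @encode-style type string."""
    depth = deepest = 0
    for ch in type_string:
        if ch in "{(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in "})":
            depth -= 1
    return deepest


def declaration_order(named: Dict[str, str], anonymous: List[str]) -> List[str]:
    """named maps structure name -> declaration text; anonymous holds type strings."""
    lines = ["#pragma mark Named Structures"]
    lines += [named[name] for name in sorted(named)]
    lines.append("#pragma mark Typedef'd Structures")
    lines += sorted(anonymous, key=lambda t: (nesting_depth(t), t))
    return lines
```

Sorting on depth first means simpler structures come before the ones that contain them, and the type string breaks ties deterministically so the output order does not depend on the order in which structures were found.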

As always, you can send bug reports to me at nygard at gmail.com. There were a lot of changes, and I haven’t tested it as thoroughly as I normally do.