Fun (Part deux)

We're five months down the road. The number of spiky balls floating around was slowly getting down to manageable levels, but thanks to the delta variant they are making an unwelcome comeback. However, I've had both of my shots on the road to more freedom, and I'm still having fun…

So let's continue from where we left off a while ago, and talk more about old Turbo Pascal programs.

Hal Sampson and TLC

As I wrote, every program compiled with the early (1/2/3) versions of Turbo Pascal starts with about 10 KB of run-time library (RTL) code to perform the actions you've specified in Pascal, even a simple one-line program that prints a number like, I cannot resist the temptation, "42".

However, there turned out to be a very clever guy, Hal Sampson, who decided that it would be useful to remove the unused code from the RTL. To that effect he wrote a program, TLC, "Turbo Library Compactor", which was sold by another legendary company in the heyday of that era, "Turbo Power Software", founded by Kim Kokkonen. I did acquire some of their software many years later, but TLC seemed to have vanished into the great digital never-never land. So, having succeeded in getting help from Eugene Roshal with RAR, and Ilfak Guilfanov with IDA Pro (more about that later), I sent off a "connect" request to Kim Kokkonen on LinkedIn. Kim replied, and the sad news was that he no longer had any of the Turbo Power software. The good news was that he still had a list of names and email addresses of people involved with Turbo Pascal in those days, so I BCC'ed them all the same email.

I got two bounces, and one of the guys, L. David Baldwin, turned out to have died earlier this year. His original website with his "Dave Baldwin's Free Programmer's Utilities" has also disappeared (in the last few months!), but fortunately there's a copy available via the Wayback Machine, which is where the provided link will take you.

Only the aforementioned Hal Sampson came back with good news: he found some 5 1/4" floppies that looked from the labels like they might contain what I was looking for. It was quite likely down to Turbo Power's honest licensing, and the relatively low prices they charged for quite extraordinary software, that none of their users ever felt inclined to spread it around on BBSes. None of my searches over the years had ever found a trace of it, that is, until now.

There was, however, a tiny problem: Hal did have a working 5 1/4" drive, but it wasn't connected to any PC. He solved it, and that shows he's not only a world-class programmer, but also an amazing hardware guy, by connecting the drive to a Raspberry Pi to read out the signals. The result? Perfect disk images, one containing not only the sources of TLC, but also those of a similar program, TOPT, "Turbo Object Optimizer", written by a John Cameron, that seems to be able to do more optimisations, and a "Turbo Object Librarian", written by a Hub Vandervoort, Jr. The documentation for all three? The comments in the programs, ouch!

Both TLC and TOPT work as advertised, but the .COM file out of TOPT has an as yet unexplored strange format, it works, but tracing it in a debugger is a nightmare.

The output of TLC is a "normal" .COM file, made up of a compacted RTL followed by the slightly optimized user code. Savings for my "lift" programs are around 4 to 5 KB, or 10%, irrelevant on today's multi-TB disks, but useful on the floppy-disk-based systems of the 1980s!

TLC has one minor issue, it doesn't like code that cannot be generated by the Turbo Pascal compilers, and that means that if you, like me, have used 386(+) instructions in "inline()" statements or in external routines, you're up the creek without a paddle. I had to go back 10 generations to find a version of lift that didn't use "$66" bytes to turn 16-bit register moves into 32-bit ones.
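For anyone facing the same hunt, here's a rough sketch, in Python rather than Pascal, of how one might flag "inline()" statements that contain a "$66" element. It is purely illustrative: the function name and the regular expression are assumptions, it takes the "/"-separated element syntax of TP3's "inline()" at face value, it does not cope with comments or parentheses inside the statement, and it cannot tell a real operand-size prefix from a data byte that just happens to be $66.

```python
import re

# Hypothetical helper: matches "inline( ... )" up to the first closing parenthesis.
INLINE_RE = re.compile(r"inline\s*\((.*?)\)", re.IGNORECASE | re.DOTALL)


def inline_statements_with_66(source):
    """Return every inline() statement that has $66 as one of its elements."""
    hits = []
    for match in INLINE_RE.finditer(source):
        # TP3 separates the elements of an inline() statement with slashes.
        elements = [element.strip().upper() for element in match.group(1).split("/")]
        if "$66" in elements:
            hits.append(match.group(0))
    return hits
```

Run over each saved generation of the source, an empty result would mark a candidate that TLC might accept, although external routines would still need checking by hand.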

The other image recovered by Hal contains a Turbo Pascal V3 "full-screen" debugger, written by the earlier mentioned L. David Baldwin. I've, as yet, not tried that one.

Hex Rays and IDA Pro

When I asked Ilfak Guilfanov for some help with an IDA Pro issue, he duly came back, and what's more, he actually provided me with a "Special License"d copy of the software, whatever that may mean - I'm reluctant/afraid to ask. So far, I've reciprocated by reporting some typos and other inconsistencies in the help file.

I had been pretty proficient with IDA in the early 2000s, when I bought and upgraded previous versions, but it took me some time to get back up to speed. This time I've actually gone all-in, not just by using the simple (in the right sense, it's very intuitive) full-screen interface, but also by using IDC, its built-in scripting language, even though the internet tells me that I really should use Python! However, the thought of having to use, AD 2021, a language that isn't free-format, gives me the shivers, if not to say: PMABIWTP!

The first thing I had to do to make the marriage of TLC and IDA Pro work was to create an IDC script to actually disassemble TP3.01a generated programs. This turned out to be pretty easy, with the help of yet another gem from a bygone age, David Lindauer's GRDB, a greatly enhanced version of DOS' DEBUG. Using IDA Pro's ability to generate a MAP file and David Lindauer's MKSYM, and with IDA Pro on the side, I could single-step through the code, and see exactly how TP compiled programs set up their segments.

Adding that knowledge to the IDC script I created earlier, the one to disassemble the whole compiler, and zapping the part that deals with everything beyond the TP3 RTL led to an initial working script to disassemble TP 3.01a generated programs, but, but, but…

Sure, the IDC script produced a neatly commented listing for the RTL, and the code that resulted from compiling the user's source also showed the names of all RTL routines called, but at times the lines were missing, substantial sections of code were never disassembled, and at other times lines were garbled, and in one location I could see that my script did what it had to do, only for IDA to screw up the results all over again milliseconds later.

So I asked Hex-Rays for an explanation, and one of their developers, Igor Skochinsky, came back quickly with the solution: when you tell IDA to turn a series of instructions into an assembly language procedure, which should not be confused with a procedure in a high-level language, you can set a flag that indicates that it never returns.

Of course this can (and in the TP3 case, will) lead to more unassembled code, turning a drizzle into a downpour, but the eventual solution was simple! I decided to let IDA finish its work, emptying all its queues, and then called another IDC procedure, changing the flags as per Igor's suggestion. I followed that by searching for the three non-returning procedures, and converting the data following them into either Pascal strings, or a Borland 6-byte real, and any bytes after that back into disassembled code.
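To give an idea of what such inline data looks like, here is a minimal sketch in Python (not the IDC actually used) that decodes the two kinds of data mentioned. It assumes the usual length-prefixed Pascal string and the commonly documented layout of the Borland 6-byte real: an exponent byte biased by 129, followed by a 39-bit mantissa with an implied leading 1, with the sign in the top bit of the last byte. The function names are made up for illustration, and which RTL routine is followed by which kind of data is not something this sketch tries to answer.

```python
def decode_pascal_string(data, offset):
    """Decode a length-prefixed string; return (text, offset of the next byte)."""
    length = data[offset]
    start = offset + 1
    text = data[start:start + length].decode("cp437")  # DOS code page, an assumption
    return text, start + length


def decode_real48(data, offset):
    """Decode a Borland 6-byte real; return (value, offset of the next byte)."""
    raw = data[offset:offset + 6]
    if len(raw) < 6:
        raise ValueError("need six bytes for a 6-byte real")
    exponent = raw[0]
    if exponent == 0:
        # An exponent byte of zero means the value is 0.0.
        return 0.0, offset + 6
    mantissa = int.from_bytes(raw[1:6], "little")
    sign = -1.0 if mantissa & (1 << 39) else 1.0
    fraction = (mantissa & ((1 << 39) - 1)) / (1 << 39)
    return sign * (1.0 + fraction) * 2.0 ** (exponent - 129), offset + 6
```

Both functions return the offset of the first byte after the data, which is where disassembly would have to resume. As a check, the six bytes 86 00 00 00 00 28 decode to, what else, 42.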

IDA Pro and 60 old versions of lift

As I wrote earlier, I used a rather primitive way of keeping track of the old versions of lift, and that had led to both explainable and unexplainable differences in the sizes of the newly compiled "lift.com" files, but here the old DOS version of SuperC came to the rescue. Comparing the newly compiled versions with the old saved ones gave me the offsets where the code contained differences, and loading both the old and new versions of "LIFT.COM" into IDA Pro allowed me to zoom in on them.
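The comparison itself is nothing magical. A minimal sketch in Python (this is not how SuperC works internally, and the function names are invented for illustration) that reports the ranges of file offsets where two binaries differ could look like this:

```python
def differing_ranges(old, new):
    """Return a list of (first, last) file offsets where the two images differ."""
    ranges = []
    start = None
    common = min(len(old), len(new))
    for i in range(common):
        if old[i] != new[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i - 1))
            start = None
    if start is not None:
        ranges.append((start, common - 1))
    if len(old) != len(new):
        # Whatever the longer file has beyond the shorter one counts as a difference.
        ranges.append((common, max(len(old), len(new)) - 1))
    return ranges


def com_address(file_offset):
    """A .COM image is loaded at offset 100h, so shift file offsets accordingly."""
    return file_offset + 0x100
```

Adding 100h to each reported offset gives the address to jump to in a disassembly of the .COM file.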

In a few cases it turned out that I had compiled an earlier or later version of the source, and in a few others it turned out that I had reversed "{$I filename}" statements, something the SuperC listing had already suggested. In the end, and not least helped by the fact that I had created an IDA Pro script to disassemble the TP 3.01a RTL, which allowed me to match generated code with Pascal source, I managed to fully recover all the sources for those versions where the old and new "lift.com" files didn't match. A recompile of those sources resulted in (almost) byte-for-byte equivalent files, the only remaining differences being those bytes that contained the initialisation for the delay loop: the old files had been compiled on an original 4.77 MHz IBM PC, the new versions using DOSBox-X emulating a 100 MHz Pentium!

IDA Pro as a helper for TLC

Uh?

Yes!

When I write code, be it in PL/I, Pascal, or REXX, I tend to code in a logical way, and to give an example, when comparing two fields against two other fields, using the second field as a tie-breaker, something like this would make perfect sense:

if (a1 > b1) | ((a1 = b1) & (a2 < b2)) then

except in Turbo Pascal V1/2/3 when you're comparing strings or, and here they rear their heads again, 6-byte reals, which are not native x86 data types.

Why? Because Turbo Pascal is a simple single-pass compiler! When the compiler "sees" that "a1" is a 6-byte real (or, for that matter, a string) followed by a '>' sign, it knows that it needs to emit a call to the routine that compares it with another variable of the same type, in casu "b1", for a "greater-than" result, so it emits a call to the "realg" RTL routine. It does the same for the "a2" vs "b2" compare, this time emitting a call to the "reall" RTL routine, which differs in just one instruction from the "realg" one. And yes, there are four more of them, for "<=", ">=", "=", and "<>", and a similar set of six for string compares.

So where does IDA come into this? It has an option to show cross-references, just put the cursor on a variable or function, press, again ever so simple, but ever so intuitive, X, and instantaneously it shows a list where that variable or function is used. Do it on "reall" and "realg", or "realle" and "realge", and if there are only a few of one and a lot of the other, reverse the compares in the source, which eliminates the calls to them, allowing TLC to zap 26 bytes per eliminated "realx" compare and 12 bytes per eliminated string compare.
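The trick rests on nothing more than the fact that "a2 < b2" and "b2 > a2" mean the same thing. As a hedged illustration in Python (the names and the idea of feeding in cross-reference counts by hand are assumptions, not part of TLC or IDA), this sketch mirrors a compare and suggests which operators are worth flipping:

```python
# Each relational operator and the one that gives the same result
# once the two operands have been swapped.
MIRROR = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "=": "=", "<>": "<>"}


def mirror_compare(left, op, right):
    """Rewrite 'left op right' as an equivalent compare with the operands swapped."""
    return right, MIRROR[op], left


def operators_to_flip(counts):
    """Given usage counts per operator, list those used less often than their mirror."""
    flips = []
    for op in ("<", ">", "<=", ">="):
        uses = counts.get(op, 0)
        mirror_uses = counts.get(MIRROR[op], 0)
        if 0 < uses < mirror_uses:
            flips.append(op)
    return flips
```

With, say, 2 uses of "<" and 17 of ">", the sketch suggests flipping the two "<" compares, after which the routine behind "<" is no longer called at all. Swapping operands also swaps the order in which they are evaluated, which doesn't matter for plain variables but might for function calls with side effects.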

Some final notes (for now)

I've still not done much work on Hal's TLC. I would really like to convert it into a format that compiles with Virtual Pascal: it would eliminate some of the hassle of working with (only) signed 16-bit integers on potentially 64K of read-in data, and the VP debugger would allow much easier visualisation of the internal structures that hold information on the program that's being analysed!

As for IDA Pro, it's an absolutely amazing tool, I'm a hell of a lucky sod that I can use the current Pro version, and Hex-Rays' support is "ausgezeichnet", which might very well be the reason that when you see disassembled code in a CVE, it's nearly always in the format that IDA puts out. It's also a paid-for product, which means that Ilfak and his team will be there for their customers, whereas the ones using the new free kid on the block, Ghidra, just have to hope that any bug they file is interesting enough to be picked up by its developers!

Last updated on 30 September 2021 (Initial version)

