Having fun - hitchhiking around IT

Given that those nasty spiky tiny balls are still floating around in way too large quantities, that I'm not doing much z/OS and that there are only so many doors you can paint, I decided that it would be fun (and yes, my notion of "fun" may not match yours) to "hitchhike" down memory lane and to do something I had been thinking about for years, but had always managed to put off for one bogus reason or another.

So I unZIPped the 60(!) archives dating back to the 28 months between April 1994 and August 1996, containing the first, written in Borland's Turbo Pascal V3(.01a) [TP3], versions of my hitchhike statistics extraction program(s).

Obviously, given that IT moves a lot faster than the average hitchhiker, I hit a few snags on the way, but I also met two great "drivers", so let's start…

The environment

To run 16-bit TP3 programs on today's (mostly) 64-bit systems, which can probably still be forced to boot in 16-bit real mode if you're really into SM, you most likely need one of two things: something like VirtualBox [VB], which uses the hardware virtualisation extensions of the current generation of CPUs to create a "real" DOS machine (which actually isn't officially supported by VB), or an emulator, preferably one that's still under active development, like DOSBox-X.

VirtualBox and DOSBox-X both have their pros and cons. I personally prefer DOSBox-X, as it has direct access to the file system of the host, in my case W7-64 Pro, and reboots nearly instantaneously.

Getting started

The first snag I hit after unZIPping the 60 archives was that I had used a primitive way of keeping version control, i.e. if "types.pas" had not changed going from version(-1) to the current, version(0), I had deleted it from the archive of version(-1), to save a few bytes.
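For what it's worth, undoing this kind of "delete what didn't change" scheme can be sketched in a few lines. The sketch below is in Python rather than the REXX or Pascal used on this page, and everything in it is an assumption made for illustration: the function name `reconstruct`, and the idea that each archive has already been read into a dictionary of file names and contents. It walks from the newest archive back to the oldest, and fills every gap with the nearest later copy of the missing file.

```python
def reconstruct(archives):
    """Rebuild the full file set of every version.

    archives: list of {filename: content} dicts, oldest version first.
    A file missing from an archive is assumed to be unchanged up to the
    next (newer) archive that does contain it.
    """
    full = [None] * len(archives)
    carried = {}  # nearest later copy of every file seen so far
    for i in range(len(archives) - 1, -1, -1):
        # Files present in this archive override the carried-over copies.
        carried = {**carried, **archives[i]}
        full[i] = dict(carried)
    return full


if __name__ == "__main__":
    versions = [
        {"lift.pas": "v1"},                    # types.pas deleted: unchanged
        {"lift.pas": "v2", "types.pas": "t"},
    ]
    for files in reconstruct(versions):
        print(sorted(files))
```

One limitation of the sketch: a file that was only introduced in a later version also ends up in all earlier ones, so the results still need checking by hand.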

Obviously, yes, big sigh, I had made mistakes along the way, and the fact that on one particularly productive day, 22 May 1994, I had created three versions also didn't help, so in the end there were both explainable and unexplainable differences in the sizes of the newly compiled "lift.com" files. (Yes, I had kept the originals!)

The one change of size I could explain was caused by the fact that I decided to rename one output file from 'day.h-h' to 'days.h-h', the change I had made in the first version compiled with Turbo Pascal V6.0 [TP6].

The second snag was the format of the input file for the program. Starting at version 21 I had changed the format into something I'm still, be it with just tiny tweaks, using today, but versions 1..20 used a completely different format. However, as both are CSV files, it wasn't too hard to convert the new format back into the old: I loaded the AD 2021 CSV into LibreOffice Calc (the spreadsheet component), created a second sheet with formulae that just picked out the required data, saved the result as another CSV file, and finally removed some lines that were surplus to requirements.

Snag three followed the moment I ran the first version of "lift.com" with the newly recreated "lift.dat". The program crashed, not entirely unexpectedly, due to the amount of data thrown at it, "lift.dat" contained data for 252 trips, whereas the program, based on its timestamp should not have expected more than 25 trips, oops…

In the end I found that it could handle 77 trips, but given that later versions started crashing due to limits imposed by TP3, and valid data that I would only encounter and cater for much later, I settled for 40 trips, the last 7 of them made after the 60th version of "lift.com".

I also tried compiling the old programs with the compiler I use nowadays, Virtual Pascal [VP], and, much to my surprise, I only needed to make two fairly trivial changes: use 32-bit integers, and comment out a call to DOS interrupt 21h, which I used to get the time. I didn't bother to add the equivalent VP code. The conversion to VP worked up to the 45th version of the program; in version 46 I had started using inline statements to directly include assembler, and that was incompatible with VP.

The use of VP led to the first foray into Regina REXX, which actually turned out to be usable this time, due to the implementation of some ANSI features that are very similar to the "EXECIO" function on IBM's z/OS, the lack of which was the reason I originally stopped using it.

So why the need for REXX?

VP uses .VPO files to store compiler options, directories for input and output, and the configuration of the IDE, and while all of the contents of these files is just plain text, some of it is stored in hex, including, sigh, PMABIWTP, a second required copy of the name of the input file.

So, rather than manually going into each of the 60 directories, start VP, load 'LIFT.PAS', select the layout, and save this set-up as a VPO file, I only did so for the oldest backup, and then wrote a mere 22 lines of REXX to copy an updated version of this file to the directories of the other 59 versions. In the process I hit another snag that I eventually worked around, but never managed to solve directly.

The really geeky stuff

To make the changes I used GNU's "SED" which of course requires quotes around change strings, and in Regina a line with just a quoted string is simply passed to the calling environment for execution. I simply could not figure out which quotes to double, and in the end just created the required command-line, assigned it to my all-time most favourite temporary variable and used that, i.e.

? = 'sed "-i" "s£bu-'pid'£bu-'id'£" lift'id'.vpo'
?
old = c2x(substr(pid, 1, 1))','c2x(substr(pid, 2, 1))','c2x(substr(pid, 3, 1))',5C'
new = c2x(substr(id, 1, 1))','c2x(substr(id, 2, 1))','c2x(substr(id, 3, 1))',5C'
? = 'sed "-i" "s£'old'£'new'£" lift'id'.vpo'
?

which worked. If you know how to do it directly, please share your knowledge, it would remove some of the bloat, more about bloat later, from this exec (reducing it from 22 lines of code to just 20)!

And yes, '?' (and '!') can be used in, or like here, as the names of REXX variables.
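To show what the two "sed" calls are doing, here is a minimal sketch of the same two substitutions in Python. It sidesteps the REXX quoting question rather than answering it. The function names `hex_form` and `update_vpo`, and the three-digit ids in the example, are assumptions made for illustration, and unlike "sed" without the "g" flag it replaces every occurrence, not just the first one on a line.

```python
def hex_form(version_id):
    """Mimic the c2x() calls: every character as two uppercase hex digits,
    separated by commas, followed by 5C (a backslash)."""
    return ",".join(f"{ord(ch):02X}" for ch in version_id) + ",5C"


def update_vpo(text, pid, vid):
    """Replace the previous id (pid) by the current one (vid), both in the
    plain-text 'bu-xxx' form and in the hex-encoded form."""
    text = text.replace(f"bu-{pid}", f"bu-{vid}")
    return text.replace(hex_form(pid), hex_form(vid))


if __name__ == "__main__":
    # Hypothetical .VPO fragment, only the two patterns matter.
    sample = "dir=bu-001\nhex=30,30,31,5C\n"
    print(update_vpo(sample, "001", "002"))
```

Doing the replacement inside the program means no command line has to be built, so there are no quotes to double.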


Running the old, new, and VP versions of the programs, the first two in DOSBox-X, the latter in plain Windoze and comparing the results showed that everything matched, and comparing the output from subsequent versions gave me a decent overview of the changes I had introduced over the 28 months that these programs were in use, which was what I needed for

The second phase of the project

Seeing the changes in the various output files was interesting, but what I really wanted to do was to document the changes. Nowadays all of my files contain a flower box with comments detailing the latest changes, but there wasn't anything in these sources, and to immediately counter the "Why bother, nobody is ever going to look at them again" shouts of wasting time, I will reply: "For the same reasons that people hitchhike to the Northcape!", they've got an itch to scratch.

And deep in the back of my mind I'm thinking about putting the whole lot into Git so that others, and no, I'm not going into "Which others?", might learn something from it, and yes, I hear you laughing.

Just think "Petje Pitamientje"…

Adding flower boxes to the (-59) versions was of course easy, I just created a file containing

{************** Copyright (C) Robert AH Prins 1994-1994 ****************
*                                                                      *
* This program is free software; you can redistribute it and/or modify *
* it under the terms of the GNU General Public License as published by *
* the Free Software Foundation; either version 3, or (at your option)  *
* any later version.                                                   *
*                                                                      *
* This program is distributed in the hope that it will be useful,      *
* but WITHOUT ANY WARRANTY; without even the implied warranty of       *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the         *
* GNU General Public License for more details.                         *
*                                                                      *
* You should have received a copy of the GNU General Public License    *
* along with this program; if not, write to the Free Software          *
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110, USA   *
************************************************************************
+------------+---------------------------------------------------------+
| Date       | Major changes                                           |
+------------+---------------------------------------------------------+
|            |                                                         |
+------------+---------------------------------------------------------+
| 1994-03-06 | First saved TP 3.01a version                            |
+------------+---------------------------------------------------------+
***********************************************************************}

and copied that into the 27 files that made up that version, with a simple batch file "for" loop.

Adding the changes for the next 59 versions was not much harder, but it took a fair bit of time. I used WinMerge to compare whole directories, copied the flower boxes from left to right, pasted in new sections, and "pencilled" in the major changes, methodically working my way up to lift(0). WinMerge may not be as smart in detecting some changes, in particular moved lines, as SuperC on z/OS, the brainchild of the late Donald Ludlow, one of IBM's legendary developers, but with a bit of Cut & Paste help, it does the job. (For what it's worth, Donald's son still sells various versions of SuperC, and as they are shareware you can try them out first, with minor limitations.)

Once finished, I copied everything to DVD-RAM, which is supposed to last "forever", and used WinRAR to keep compressed versions on my notebook, after all I'm unlikely to use them a lot. Now I've been a registered user of WinRAR since 2004, my keyfile is dated 2004-03-17T14:23:34, but I still use version 4.2, which allows me to insert "Authenticity Verification" data (which is marginally more useful than totally useless, but it's cute) into the files.

However, as these are DOS programs, I thought it would be neater to use the original RAR for DOS, for which I never forked out any money, as I had been a ZIPper, and that meant the archives would be "naked". So, "I've got 'no', and things can't get any worse anyway", I sent an email to the first of the two "drivers" mentioned at the top of this page, Eugene Roshal, the guy who developed RAR, asking him for a virtual ride:

Eugene,

I've been a registered user of WinRAR since forever, my rarreg.key is dated "2004-03-17 14:23", but I never registered for the DOS version, and lately I find myself doing lots of work with both pure DOS and DOS running in a VM, and I find myself reusing RAR for DOS frequently, and it would be "nice" if I could add AV to those archives, so would it be possible to still get an AV key for V2.50? If yes, you'd make me a happy bunny, if not, "C'est la vie…"

My registration text for WinRAR is "Robert AH Prins".

And within 24 hours(!) Eugene replied with a

Please try the following text and code:

Robert AH Prins
16 HEX DIGITS

giving me a registered version of RAR for DOS. Thank you Eugene, very, very classy!

What came next…

Having completed the above steps, and with still too much free time on my hands, I have thought about what I could do next.

One very obvious thing would be to look at the generated code, TP3 wasn't exactly an optimising compiler, and it would be, there I go again, "fun", to see if there are ways of improving the generated code, without resorting to unmaintainable inline statements, or external, hardly less unmaintainable, binary files.

So I reinstalled my old registered version of IDA Pro, and started by disassembling "turbo.com", the tiny, its size is just 39,671 bytes, file that contains an editor and a compiler and a R(un)T(ime)L(ibrary).

How does this requirement compare with those of today's offerings?

M$ Visual Studio 2019 requires from 800MB to a staggering 210GB (Yes, TWO HUNDRED-AND-TEN GIGABYTES) of disk space. Check out "The Bloatware Debate", M$ doesn't seem to have learned a lot in 22 years…

Why look at the compiler? Simple, every program created by the compiler starts with a little over 10k copied straight from "turbo.com", the RTL, and although there's not much you can do about that, it is useful to have the code to check out what routines are called from your own code.

Disassembling "turbo.com" threw up yet another snag, DOS .COM files are supposed to consist of just one segment. Of course, "turbo.com" written by a pretty brilliant Danish guy, Anders Hejlsberg, isn't your "run-of-the-mill.COM" and IDA just ended up disassembling data as code, which isn't very useful. Having never encountered this issue in all the years I had used IDA, I went into the help, found how to create segments, and failed miserably trying to do so.

My IDA Pro license is a perpetual one, but access to the Hex Rays forum requires a current one, so I posted a question on Stack Exchange, and in news:comp.lang.asm.x86, and in another case of "I've got 'no', and things can't get any worse anyway", sent off an email to IDA's brilliant developer, Ilfak Guilfanov, also asking him for a "virtual" ride towards a solution.

It took him a few days to reply, explaining to me how to set up the multiple segments for these "frankencom" files, but by the time I received his reply I had already managed to find a workaround: I threw a multi-segment executable into IDA, let it analyse that, and saved the disassembly in an IDC script file. Hacking that script gave me what I needed.

However, I would still like to give Ilfak a virtual feather on his hat, Hex Rays cares about its customers, just like win.rar GmbH, and probably other small companies, where a customer is first and foremost a person, and not just another account.

For what it's worth, Hex Rays regularly produces a freeware version of IDA. It's limited to the x86 architecture, you (obviously) don't get access to their forums or the decompiler, but for the rest it seems to be the real deal, and what's more, I discovered that it's actually still capable of processing, after minor changes, the IDC files produced by my version (4.7) of IDA, and here's a WinRAR'ed version of tc301.idc that will create a commented disassembler listing of TURBO.COM V3.01a, with a filesize of 39,671 bytes, and a timestamp 1985-03-01 03:33.

Looking at the TP3 code

I don't want to discuss what's going on after the first 10kb of "turbo.com", as I am a total noob when it comes to parsing source, building parse trees, and generating code, but the first 10kb makes up the RTL, and its code is called when you read in files, write out variables in human readable form, look for characters in a string, etc, etc, etc, and because every program relies on it, the quality of the code should be as high as possible. Does this RTL pass that test? Let's say the question is rhetorical…

Although? Some of the code in it made it virtually unchanged into the RTL of Delphi 1, in casu the code that deals with those exotic 6-byte "real". Sadly, the "although" should probably be qualified. Anders H wrote (or so I assume) this code, it worked, and there's the old rule, "If it ain't broke, don't fix it!", because in the early 1990's a German student, Norbert Juffa, now living in San Jose, having worked for IIT, AMD and Nvidia, and considered one of the foremost experts on everything FPU, decided to have a look at the RTL of TP6.

The result was code that was not only faster but, far more significantly, made 6-byte real arithmetic as IEEE 754 compliant as possible, within the limitations of the non-IEEE format! It might be, there I go again, fun, to transplant some, or even all of Norbert's code into the TP3 RTL!

…and at my code

Just as Anders H's RTL code survived until the very end, so does a lot of the code I wrote in 1994, but what did the machine code look like all those years ago? To find out I loaded the last TP3 coded version of "lift.com" into IDA, and realised that it would be useless going through it, without knowing what came from where in the source, something modern compilers keep track of, and spit out in their MAP files.

So what should I do, having come this far? Look at the creation of stack-frames at the start of each procedure, count through the list of procedures extracted in the correct order from the sources? Sure, it would have been an option, but I decided to use the coding style I had taken from PL/I to Pascal right from the beginning to my advantage.

Explanation?

From the moment I started coding in PL/I, way back in 1985, we, the group of trainees, were encouraged to end every PL/I procedure starting with 'myproc: proc;' with 'end myproc;' and in the current release of IBM's Enterprise PL/I there's an option to potentially force it.

Pascal doesn't allow it, but of course nobody can stop a programmer from ending procedures with 'end; {myproc}' rather than just 'end;', and that's what I had been doing since the moment my father had bought his copy of TP3.

"Yes, and how does that help, comments don't appear in the executable?" Of course they don't, but they make it, there's REXX again, easy to detect where procedures start and end in the source, a trifle complicated by the fact that Pascal allows procedures to be nested, which means you potentially have to stack names.
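The stacking of names can be sketched as follows. This is Python, not the REXX I actually used, and the names `find_procedures`, `HEADER` and `TRAILER` are made up for this illustration. It assumes that every procedure or function header starts its own line, and that every procedure ends with an 'end; {name}' comment. It knows nothing about 'forward' declarations.

```python
import re

# A header is a line starting with 'procedure name' or 'function name'.
HEADER = re.compile(r"^\s*(?:procedure|function)\s+(\w+)", re.IGNORECASE)
# A trailer is 'end; {name}', the commenting style described above.
TRAILER = re.compile(r"\bend\s*;\s*\{\s*(\w+)\s*\}", re.IGNORECASE)


def find_procedures(source):
    """Return (name, first_line, last_line, nesting_depth) tuples."""
    stack, spans = [], []
    for number, line in enumerate(source.splitlines(), start=1):
        match = HEADER.match(line)
        if match:
            stack.append((match.group(1), number))
        match = TRAILER.search(line)
        # Only pop when the comment names the innermost open procedure.
        if match and stack and stack[-1][0].lower() == match.group(1).lower():
            name, start = stack.pop()
            spans.append((name, start, number, len(stack)))
    return spans


if __name__ == "__main__":
    demo = "\n".join([
        "procedure outer;",
        "  procedure inner;",
        "  begin",
        "  end; {inner}",
        "begin",
        "  inner;",
        "end; {outer}",
    ])
    for span in find_procedures(demo):
        print(span)
```

Nested procedures end before the procedure that contains them, so they come out of the list first, and the depth tells you how many names were still on the stack.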

So knowing where procedures start and end allows you to read the source and insert eye-catchers into the code, and given that TP3 encodes string literals right into the generated code, my first thought was to add a global "my_eye: string[255]" variable to the program, and insert a "my_eye:= 'name‑of‑proc';" statement right after the initial 'begin' statement. Sadly such assignments in TP3 are costly, and I found a better way.

You're a geek? Cleek!

If a procedure is executed thousands of times you don't really want the code to accidentally be left in your final executable!

In the PL/I version of "lift" dating back to that period, there is some code, when run with certain compiler options of the old OS compiler, that actually breaks the RTL because some statements are executed nearly 2**32-1 times and adding their execution counts overflows an internal 31-bit counter, resulting in, unless you cater for the hardware exception that this causes, a program that ABENDs after it has finished with your code, giving you an error location that makes you scratch your head, I've been there and got the T-shirt.

So how do you add an eye-catcher that lets you find the procedure, without it actually resulting in code being executed? Sadly, correct me if I'm wrong, in a high-level language like Pascal this is probably not possible, but you can minimise the effects, and the easiest solution is just to code a jump over the eye-catcher. So my 69-line REXX exec creates, for a procedure with a name that's n characters long, a jump over n+1 bytes, and after the jump it creates a Pascal-type string (i.e. a byte containing 'n', followed by the n decimal ASCII equivalents of the uppercased procedure name), allowing me to convert it into readable text with IDA's 'A' command.
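A minimal sketch of generating such an eye-catcher, again in Python rather than REXX. The function name `eye_catcher` is made up, and the exact text my exec emits may well differ. It only assumes that the bytes end up in a TP3 'inline' statement, where values are separated by slashes, and that the jump is the x86 short jump, opcode $EB followed by a one-byte displacement.

```python
def eye_catcher(proc_name):
    """Build an inline statement that jumps over a Pascal-type string
    holding the uppercased procedure name."""
    name = proc_name.upper()
    n = len(name)
    # A short jump has a signed 8-bit displacement, so n + 1 <= 127.
    if not 1 <= n <= 126:
        raise ValueError("name must be 1..126 characters long")
    values = ["$EB", str(n + 1), str(n)]       # jmp short, skip n+1 bytes
    values += [str(ord(ch)) for ch in name]    # the name, in decimal ASCII
    return "inline(" + "/".join(values) + ");"


if __name__ == "__main__":
    print(eye_catcher("Init"))   # inline($EB/5/4/73/78/73/84);
```

The only instruction that is actually executed is the two-byte jump, the length byte and the name are just skipped.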


After having recompiled the code, and loading the file back into IDA it was child's play to look at the code generated for my sources, and I can't say I was overwhelmed with a warm feeling.

Who should be blamed for this?

AD 2021 probably nobody cares anymore. Anders H did, with probably very few exceptions, the best he could on a CPU with a far from ideal instruction set.

And what might still be coming?

It's of course a bit silly, or should I say "Utter madness?", to go back to these programs and try to see, as I wrote before, if it's possible to generate more optimal code, however,

the past doesn't need to be an indication of the future but,

More than two decades ago, while working at Willis Corroon, my colleagues came to me when something needed to be optimised, as I had an uncanny ability to find hotspots in the assembler listings generated by IBM's OS PL/I compiler, without actually being able to write assembler. On one occasion a change I made, merging two fields in a structure, resulted in the CPU time of a program going down from well over three hours, to just 20 minutes!

Something similar happened during the year (1996) I worked for KLM. I found out that some code, using a self-referential structure was using CPU like there was no tomorrow, and suggested its processing should be moved to a subroutine, using a calling convention that's unique (is it?) to PL/I. The result? CPU usage turned from a mountain into a molehill, and when I wrote Peter Elderon, IBM's lead PL/I developer, he almost literally copied the text of my email into the Programming Guide, where, even now, some 25 years later, you can still find it on page 320 (of the Enterprise PL/I for z/OS 5.3 Programming Guide), where it reads:

When your code refers to a member of a BASED structure with REFER, the compiler often has to generate one or more calls to a library routine to map the structure at run time. These calls can be expensive, and so when the compiler makes these calls, it will issue a message so that you can locate these potential hot-spots in your code.

If you do have code that uses BASED structures with REFER, which the compiler flags with this message, you might get better performance by passing the structure to a subroutine that declares a corresponding structure with * extents. This will cause the structure to be mapped once at the CALL statement, but there will no further remappings when it is accessed in the called subroutine. [RP: Emphasis added to my contribution]


I can actually understand a fair bit of x86 assembler, and having IDA generated compiler output right next to source code, would allow me to try various "what…if" scenarios, and there might even be some (limited) scope to backport some of the current VP code to TP3. There's definitely one small routine, to strip commas out of the input, that I already backported to Pascal from the assembler code suggested by, there's another huge Sequoia in the forest of x86 assembler programmers, Terje Mathisen.

Of course getting a disassembled version of "turbo.com" that would actually reassemble to a byte-4-byte equivalent version of itself would also be an interesting project.

And an ultimate "time waster" would be to load a few old versions of "lift.com" into IDA, and compare the generated code with the AD 2021 recompiled versions, just to see why, in quite a few cases, there is a 12-byte difference in file size.

Final words?

On this page I wrote a lot about DOS, and when I use DOS, there's one program that I cannot live without, and it's also a program that never really made it to Windoze. I'm talking about yet another tiny miracle, the last regular version counts just 27,809 bytes, but its developer gave me the "enhanced" version as a reward for finding a Y2K bug that caused the version of a "helper" program, "FV.COM" to be bumped up from 1.45 to 2.00.

I'm talking about "LIST" from the late, he died more than a decade ago, on 30 December 2009, Vernon D. Buerg. I learned about this about a year later, when no new versions were being released, and tried to contact his surviving relatives, to find out if any of them knew anything about the source code of "LIST". Sadly, I failed. For your convenience, you can find copies of "list96y1[.rar]", the last version, and "list77a[.rar]", the last version that's supposed to support network drives, on my site.

Last updated on 11 June 2021 (Remove some typos)

