Keeping statistics - one way of doing it

So you want to keep track of your rides and, once you've managed to get safely back home, do something "useful" with the data you've collected? If yes, read on...

This work-in-progress page is based on the notes Prino has used since he started recording his rides on 16 June 1980 at 7:47. The notes are pretty simple, and you can obviously adapt them to your own requirements.

A possible form to record data

[Image: Prino's liftnote]
The forms Prino uses to record his data look like this. Properly spaced, you can put three columns with four rows on a sheet of A4 (210x297mm) paper. Fitting three columns on a US "Letter" size sheet (215.9x279.4mm) of paper might be possible if the column widths are reduced. Prino saves his notes in 17-ring binders.

Need a translation?

The key to the detail of the produced statistics is the Opmerkingen section. It's simply called Opmerkingen as you are unlikely to add specific comments to all of your rides...

The fields

What follows is a description of the various fields on the above form. There is one field that might seem to be missing, "Waiting Time", but it should be obvious that it's automagically included as the difference between the arrival time of one ride and the departure time of the next ride, and in those cases where something happens in between, the Opmerkingen section comes to the rescue.
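As a minimal sketch of that implicit calculation, assume each ride is held as a (departure, arrival) pair of datetime values. That structure is chosen for this example only and is not Prino's format, and the sample times are made up.

```python
from datetime import datetime

def waiting_times(rides):
    """Return the waits, in whole minutes, between consecutive rides.

    `rides` is a list of (departure, arrival) datetime pairs in ride order.
    """
    waits = []
    for (_, prev_arrival), (next_departure, _) in zip(rides, rides[1:]):
        delta = next_departure - prev_arrival
        waits.append(int(delta.total_seconds() // 60))
    return waits

# Made-up sample: two rides on the same day.
rides = [
    (datetime(2000, 1, 1, 8, 0), datetime(2000, 1, 1, 8, 30)),
    (datetime(2000, 1, 1, 8, 55), datetime(2000, 1, 1, 10, 5)),
]
print(waiting_times(rides))  # [25]
```

Anything that happened during the wait (a meal, a walk to a better spot) is not visible in this difference, which is why such cases go into the Opmerkingen section.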

Datum (Date)

Pretty obvious.

If you use the ISO 8601 format (YYYY-MM-DD), it's easy to extend this to rides spanning multiple days by modifying the format into YYYY-MM-DD/DD.
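A short sketch of how such an extended date could be read back. The function name is hypothetical, and the rule that a smaller end day means "the following month" is an assumption of this example, not something stated above.

```python
from datetime import date

def parse_ride_date(text):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD/DD' into a (start, end) pair of dates."""
    first, _, last_day = text.partition("/")
    start = date.fromisoformat(first)
    if not last_day:
        return start, start          # single-day ride
    day = int(last_day)
    year, month = start.year, start.month
    if day < start.day:              # assumption: the ride ended in the next month
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return start, date(year, month, day)

print(parse_ride_date("1980-06-16/17"))
```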

Vertrek (Departure) / Aankomst (Arrival)

Again pretty obvious.

The three columns contain the Tijd (Time), KM-stand (Odometer) and Plaats (Location) of departure and arrival. In cases where no odometer is available, or where it doesn't work, you can use Google Maps to determine distances; Prino has found that it is usually accurate to the nearest kilometre.

The unnamed row below "Vertrek" contains the total time and distance of the ride.

Snelheid (Speed)

The contents of this field depend on individual preferences. Prino puts the real speed in it, i.e. the distance divided by the actual driving time, which is the arrival time minus the departure time minus any time recorded for stops.
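The "real speed" described here can be sketched as follows. The function name and parameters are hypothetical, and the sample figures are made up for illustration.

```python
from datetime import datetime, timedelta

def real_speed(distance_km, departure, arrival, stop_minutes=0):
    """Distance divided by driving time (arrival - departure - stops), in km/h."""
    driving = arrival - departure - timedelta(minutes=stop_minutes)
    hours = driving.total_seconds() / 3600
    if hours <= 0:
        raise ValueError("driving time must be positive")
    return distance_km / hours

# 120 km between 08:00 and 10:00 with a 30 minute stop: 1.5 hours of driving.
speed = real_speed(120.0, datetime(2000, 1, 1, 8, 0),
                   datetime(2000, 1, 1, 10, 0), stop_minutes=30)
print(round(speed, 1))  # 80.0
```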

Opmerkingen (Miscellaneous notes)

This is a free-format field that you can use for any purpose you like. Why free format? Easy: it's unlikely that you would need a specific format for all of your rides. Why include specific headings for rides that cross borders or span more than one day, when the number of such rides is likely to be pretty small compared to 'normal' rides?

Here's a non-exhaustive list of things Prino has been using this field for:

There are probably lots of other things you might want to put into it.

If the Opmerkingen section is too small (and Prino has had rides with a dozen stops and going across five borders in two days), he usually continues on the back of the form.

Storing the data on a PC or other device

This is likely to be the most important decision you will have to make. There are (at least) three options:

  1. a (structured) text file, to be processed by a user-written program
  2. a spreadsheet
  3. a database

Each of these options has its pros and cons; here are some details:

Text file

Prino uses the text file option with a few programs he has written himself. The advantage of using this format is the fact that it allows him total flexibility, but it has a pretty big disadvantage in that you have to think very carefully about the format you plan to use: it should be able to cater for future changes without you having to completely rewrite your programs. The format used by Prino, described later, was developed over about 25 years. Even though he moved to a new format after a few years (the result of not giving the format enough thought initially), it has become rather cryptic due to further additions since he adopted it!

A spreadsheet

If you're well versed in spreadsheets (or if not, try LibreOffice, it's free) you might want to consider using one to process your data. It will have the big advantage that you can insert or delete columns in your source data and the program will automagically update the references in all other cells and/or sheets. Combined with the many conditional functions, you should (probably) be able to produce any statistics you like, although some of the more esoteric ones Prino's program creates will be pretty hard (or even impossible) to replicate.

A database

What was written about spreadsheets also holds true for databases. Not having used any PC database programs, Prino cannot recommend any, but there are plenty of free ones, LibreOffice and MariaDB, to name just two of the better-known ones. Creating your statistics will mean writing queries (most likely in the fairly easy to learn language SQL), but given the non-procedural nature of this language, some results that can be created with a self-written program or a spreadsheet may be hard or even impossible to recreate.

Prino's original program

As mentioned above, and being a programmer by profession, Prino selected the first method of storing the data, a text file. The first 60(!) versions of his program were written in Turbo Pascal (V3.01a) and until about version 20 they used "Version I" of a simple CSV file with the data. It could handle rides passing through multiple countries and spanning more than one day, but did not know anything about ferry crossings, stops, or time-zones, to name but a few of the things that arrived later...

Given that the old format became obsolete a long time ago, it's not interesting to go into the nitty-gritty of it, but its output mimicked his manually created five tables per trip, containing:

Simple, uncomplicated and one might assume that most hitch-hikers would leave it at this...

Prino's current program

The current program is written in Virtual Pascal V2.1.279 (or PL/I, should you want to run it on IBM's z/OS). It is licensed under the provisions of the GPL V3. The authenticity verified WinRAR archive containing the sources and executable files can be found on Prino's Google Drive.

Data format used by Prino's current program

The 'simple' format was used until the end of 1994. Because Prino wanted to add some additional statistics to the output files, it was changed into something a bit more logical, although some people might find otherwise. (And they are right, it's a right-royal mess due to more additional requirements, and Prino would like to simplify some of the more esoteric uses of punctuation, but that's unlikely to happen anytime soon as he has a list of additional tables he would like to add first.)

The current format, split into two parts to avoid scrolling, looks like this:

....v....1....v....2....v....3....v....4....v....5....v....6....v....7....v....8....v....9..
999, 9999,  AAA, 99999.9, HHH.MM,  999.9, NAT, TYPE, CTY, HH.MM, S, HH.MM, HH.MM, YYYY-MM-DD
|    |      |    |        |      |        |    |     |    |      |  |      |      |
a    b      c    d        e      f        g    h     i    j      k  l      m      n

..v....0....v....1....v....2....v....3....v....4....v....5....v....6....v....7....v....8....v....9....v....0.
, 999999.9, DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD , AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  |         |                                                 |
  o         p                                                 q

and the fields used in it are:

Col   1 -   3 ("Trip")
Col   6 -   9 ("Ride")
Col  13 -  15 ("Day")
Col  18 -  24 ("Distance")
Col  27 -  32 ("Time")
Col  34 -  40 ("Velocity")
Col  43 -  45 ("Nationality")
Col  48 -  51 ("Type")
Col  54 -  56 ("Country")
Col  59 -  63 ("Wait")
Col        66 ("Split")
Col  69 -  73 ("Departure time")
Col  76 -  80 ("Arrival time")
Col  83 -  92 ("Date")
Col  95 - 102 ("Odometer")
Col 105 - 151 ("Place of departure")
Col 155 - 201 ("Place of arrival")
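To make the column list concrete, here is a hedged sketch of a reader for one data line. It is not Prino's Pascal or PL/I code. The names FIELDS, ARRIVAL_START and parse_ride are hypothetical, all values are returned as plain strings, and slicing on UTF-8 bytes rather than characters is an assumption of this example. The last field is simply read to the end of the line. The sample values are invented placeholders.

```python
# (name, first column, last column), 1-based and inclusive, from the list above.
FIELDS = [
    ("trip", 1, 3), ("ride", 6, 9), ("day", 13, 15), ("distance", 18, 24),
    ("time", 27, 32), ("velocity", 34, 40), ("nationality", 43, 45),
    ("type", 48, 51), ("country", 54, 56), ("wait", 59, 63), ("split", 66, 66),
    ("departure_time", 69, 73), ("arrival_time", 76, 80), ("date", 83, 92),
    ("odometer", 95, 102), ("place_of_departure", 105, 151),
]
ARRIVAL_START = 155  # the place of arrival runs from here to the end of the line

def parse_ride(line):
    """Cut one fixed-position data line into a dict of stripped strings."""
    data = line.rstrip("\r\n").encode("utf-8")  # assumption: positions count bytes
    record = {}
    for name, first, last in FIELDS:
        # A separating comma may sit inside a text column, so strip it as well.
        record[name] = data[first - 1:last].decode("utf-8").strip(" ,")
    record["place_of_arrival"] = data[ARRIVAL_START - 1:].decode("utf-8").strip()
    return record

# Invented sample line, built with format widths that match the column list.
sample = (
    f"{'1':>3}, {'1':>4},  {'1':>3}, {'12.5':>7}, {'0.15':>6},{'50.0':>7}, "
    f"{'NL':<3}, {'CAR':<4}, {'NL':<3}, {'':>5}, {'':1}, {'9.00':>5}, {'9.15':>5}, "
    f"2000-01-01, {'12.5':>8}, {'Utrecht':<47} , Arnhem"
)
ride = parse_ride(sample)
print(ride["date"], ride["place_of_departure"], ride["place_of_arrival"])
```

Keeping the positions in one table means a later format change only touches FIELDS, which is the kind of future-proofing the text file option demands.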

Notes:

 [1] Prino uses DISTINGUISHING SIGNS USED ON VEHICLES IN INTERNATIONAL TRAFFIC and the List of international vehicle registration codes on Wikipedia. Due to the fact that two of these have changed after Prino encountered them for the first time, he still uses "SF" for Finland and "ROU" for Uruguay, rather than the current "FIN" and "UY".

 [2] Given that Prino has never encountered a triple crossing with a backward moving time-zone (e.g. F-GB at midnight), he doesn't expect the program to handle them.

 [3] Prino never records the waiting time before the first ride of the day, unless the first ride of the day happens to be one that continues directly after the last ride of the previous day, without any intervening (sleep?) stop.

 [4] The normal "." separator in the waiting time must be replaced by a ":" (colon) for those waits caused by the departure from a ferry terminal where you haven't been able to get a ride on the ferry (and may have had to wait until the next ferry...)

 [5] Multiple split lines may be present, but they must be grouped, and the groups must be in "#", "*", "!" order!

 [6] The normal "." separator in the departure time must be replaced by

 [7] Non-stopping (i.e. drive-through) border crossings must not record an arrival time, unless the time-zone changes.

 [8] The normal "." separator in the arrival time of a time-zone changing border crossing must be replaced by

 [9] The places of departure and arrival must be UTF-8 encoded and their length is (currently) limited to 47 bytes!

[10] Lines should not contain trailing blanks, but completely blank lines, consisting of just a CR/LF, are allowed.

[11] Although the data is in CSV format, all positions are fixed!

[12] Lines starting with "{" in column 1 are treated as comment or meta-data. A description of the meta-data format can be found below.

[13] Columns containing numerical data or times must be right aligned, columns containing text must be left aligned, and if the textual data is shorter than the column width, the separating comma may follow it without intervening blanks.

[14] The "Odometer" data is (currently) completely ignored by the main "lift" program.
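Notes [9], [10] and [12] lend themselves to a small pre-check before any parsing. The following is a sketch under assumptions: the function names are hypothetical, and treating trailing blanks as a hard error is this example's choice, since the note only says they "should not" occur.

```python
MAX_PLACE_BYTES = 47  # limit from note [9]

def classify_line(line):
    """Return 'blank', 'meta' or 'ride' for one input line."""
    body = line.rstrip("\r\n")
    if body == "":
        return "blank"                      # note [10]: just a CR/LF is allowed
    if body != body.rstrip(" "):
        raise ValueError("trailing blanks are not allowed")  # note [10]
    if body.startswith("{"):
        return "meta"                       # note [12]: comment or meta-data
    return "ride"

def place_fits(place):
    """Check the UTF-8 encoded length of a place name against note [9]."""
    return len(place.encode("utf-8")) <= MAX_PLACE_BYTES

print(classify_line("{ a comment\r\n"))  # meta
print(place_fits("Αθήνα"))               # True (5 characters, 10 bytes)
```

Counting bytes rather than characters matters here: a name of 24 Greek letters is only 24 characters long but already needs 48 bytes.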

Metadata

As Prino didn't want to make the format of the data even more complex than it already is, he decided to allow for comments and meta-data to be embedded into the input file. Both comments and meta-data start with a "{" and can be up to 255 characters long.

The (currently) defined, but only partly used types of meta-data are:

Col         1
Col         2..
Time-zone information
The meta-data contained in lines starting with "{Z-" or "{Z+" provide information about the time-zones for the countries passed in a trip. The difference between the two variants is that time-zone information following a "{Z-" completely replaces the current set of countries/time-zone pairs, whereas "{Z+" will add additional countries/time-zone data pairs, or replace existing ones. The option can be used in multi-zonal countries, or in trips that span the change from winter- into summer-time, or the reverse.

Up to 31 country/time-zone data pairs immediately follow the "{Zx" and consist of eight characters,
Note that if a trip spans more than 31 countries, time-zone data for additional countries can be added via "{Z+" meta-data.
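The difference between the two variants can be sketched as follows. The layout of the eight-character pairs is not reproduced here, so this example starts from pairs that have already been split into a country code and an opaque zone string. The function name and the zone strings are hypothetical placeholders.

```python
def apply_zone_metadata(current, variant, pairs):
    """Return the new country -> zone mapping after a "{Z-" or "{Z+" line.

    current: mapping in force before the line
    variant: "-" (replace everything) or "+" (add or overwrite)
    pairs:   mapping taken from the meta-data line, already parsed
    """
    if variant == "-":
        return dict(pairs)           # the old set is discarded completely
    if variant == "+":
        merged = dict(current)
        merged.update(pairs)         # new countries are added, known ones replaced
        return merged
    raise ValueError("variant must be '-' or '+'")

zones = apply_zone_metadata({}, "-", {"NL": "zone-a", "GB": "zone-b"})
zones = apply_zone_metadata(zones, "+", {"GB": "zone-a", "IRL": "zone-b"})
print(zones)  # {'NL': 'zone-a', 'GB': 'zone-a', 'IRL': 'zone-b'}
```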
Partner data
The meta-data contained in lines starting with "{Z<" and "{Z>" provide a means of embedding data from (a) hitchhiking partner(s) in your own data, provided the partner data is an exact subset of your own data.

The "<" character must be followed by ␣tttt␣rrrr␣dddd␣-␣name-of-partner where

␣ indicates a single blank character
tttt is the value that will be subtracted from the "Trip" to give the actual trip number for the partner
rrrr is the value that will be subtracted from the "Ride" to give the actual ride number for the partner
dddd is the value that will be subtracted from the "Day" to give the actual day number for the partner
name-of-partner will be used by the get-aud program when data for a specific partner needs to be extracted

The ">" character must be followed by ␣-␣name-of-partner to indicate the end of the embedded data.
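A sketch of how the start-of-partner text could be read and applied. It only handles the part that follows the "<" character, it assumes the three offsets are plain decimal numbers, and the function names and the partner name "Alice" are hypothetical. It says nothing about how get-aud actually works.

```python
def parse_partner_start(rest):
    """Parse ' tttt rrrr dddd - name-of-partner' (the text after the '<')."""
    offsets, _, name = rest.partition(" - ")
    trip_off, ride_off, day_off = (int(part) for part in offsets.split())
    return {"trip": trip_off, "ride": ride_off, "day": day_off,
            "name": name.strip()}

def partner_numbers(trip, ride, day, partner):
    """Subtract the offsets to get the partner's own trip, ride and day numbers."""
    return (trip - partner["trip"],
            ride - partner["ride"],
            day - partner["day"])

partner = parse_partner_start(" 0012 0345 0067 - Alice")
print(partner["name"], partner_numbers(15, 400, 70, partner))  # Alice (3, 55, 3)
```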
Interruption information
The meta-data contained in lines starting with "{I" provide a means of embedding information for use in h-h2html, such as non-hitching breaks, or even additional HTML!

The use of the "I" character is further explained in the notes following the description of that program.
In-trip split-year indicator
The "*" character is currently not used by any programs; it may be used in the future by a to-be-written program splitting the input file into separate per-year files.
Google short URL
The "H" character is currently not used by any programs. It is followed by the trip, ride, Google Maps short URL, departure and arrival locations and the distance between them returned by Google.
UTF-8 encoded departure or arrival location
The "U" character is currently not used by any programs. It will (at some time in the future) be used to allow the use of UTF-8 encoded locations that exceed the current 47-byte limit.

The format of "{U" metadata is:

{U␣x␣Αθήνα, where "x" can be either "d"(eparture) or "a"(rrival).
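Since no program uses this meta-data yet, the following is purely an illustrative sketch of reading such a line. The function name is hypothetical, and a single ordinary space is assumed for each ␣.

```python
def parse_u_metadata(line):
    """Return ('departure' | 'arrival', place) for a '{U x place' line, else None."""
    body = line.rstrip("\r\n")
    if not body.startswith("{U "):
        return None
    kind, _, place = body[3:].partition(" ")
    if kind not in ("d", "a"):
        raise ValueError("expected 'd' or 'a' after '{U '")
    return ("departure" if kind == "d" else "arrival"), place

print(parse_u_metadata("{U d Αθήνα"))  # ('departure', 'Αθήνα')
```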
z/OS using EBCDIC
The lowercase "ascii" string metadata is only required if the input file is transferred to z/OS which uses EBCDIC rather than ASCII/UTF-8, and the resulting output needs to be transferred back, via the same procedure, to the original ASCII/UTF-8 based system.

The explanation for the above is rather technical, but it boils down to the fact that there are at least two ways (FTP and IND$FILE, an IBM proprietary protocol) of transferring data from the ASCII/UTF-8 realm to z/OS. Both allow lossless transfers from and to z/OS, but both use (in many cases) different translate tables for ASCII control characters (0x00..0x1F) and characters between 0x80..0xFF (many of which are used to encode UTF-8 characters). Using the ASCII metadata allows programs written in PL/I (and COBOL) to detect UTF-8 characters no matter which transfer procedure was used. Feel free to email me if you want a full explanation.

The results of Prino's current programs

The current programs produce rather a lot more output than the five tables per trip! In fact, the main program now produces seven files, and an optional post-processor program that translates the output into .RTF format creates two additional files with two tables sorted in various other orders. The programs and the files they produce, assuming their default names, are:

The main program, lift

lift produces seven files, described in the following sections.

"summ.h-h" - A general summary

This file contains no fewer than 100 tables (some of them broken into several parts because they would otherwise require A3 or A2 size paper). Here's the full list:

  1. two tables of totals for every trip
  2. a table of totals for all distances
  3. a table of totals for all types
  4. a table of totals for all countries
  5. a table of totals for all nationalities
  6. a table of totals for all speeds
  7. three tables of totals for all waits
  8. two tables of ferry related waits (*)
  9. three tables of pick-ups
  10. a table with the distribution of departure times per weekday

  11. a table with the first and last ride for all distances
  12. a table with the first and last ride for all types
  13. a table with the first and last ride for all countries
  14. a table with the first and last ride for all nationalities
  15. a table with the first and last ride for all speeds
  16. two tables of waits per trip, split in short and long waits
  17. a table of waits per country, split in short and long waits
  18. a table of waits before departure times (also s & l split)
  19. a table of waits per weekday (also s & l split)
  20. a table of waits per month (also s & l split)
  21. a table of waits per year (also s & l split)
  22. a max/min/average summary for all rides
  23. a max/min/average summary for all days
  24. a max/min/average summary for all types
  25. a max/min/average summary for all nationalities
  26. a max/min/average summary for all countries
  27. a table of rides per country, split in internal and border crossing rides
  28. a table of rides per country, split in native and "foreign" drivers
  29. four tables for the max/min speed & max/min rides for a given number of distances
  30. four tables for the max/min speed & max/min distance for a given number of rides
  31. four tables for the maximum number of rides exceeding a number of selected velocities, maximized for the number of rides and the distance,
  32. four tables for the maximum number of rides exceeding a number of selected lengths, maximized for the number of rides and the distance,
  33. a max/min/average summary for all rides per year
  34. a max/min/average summary for all days per year
  35. a table of distribution of distances per trip
  36. a table of distribution of velocities per trip
  37. a table of distribution of distances per day
  38. a table of distribution of speeds per day
  39. a table with the first and last day for all distances
  40. a table with the first and last day for all speeds
  41. four tables for the max/min speed & max/min days for a given number of distances
  42. four tables for the max/min speed & max/min distance for a given number of days
  43. four tables for the maximum number of days exceeding a number of selected velocities, maximized for the number of days and the distance,
  44. four tables for the maximum number of days exceeding a number of selected lengths, maximized for the number of days and the distance,
  45. a table with totals per weekday
  46. a table with totals per month
  47. a table of general totals per year
  48. a table with first/last ride/trip per year
  49. a table with usage of days per year
  50. a table of totals for consecutive days
  51. two tables of totals for 24 hour periods
  52. a table of totals for 365 day periods
  53. a table of minimum number of rides needed for selected numbers of nationalities
  54. two tables (trip/year) with the number of calendar days, types, countries and nationalities of drivers met during the trip/year, split in total and 'new'
  55. four tables (two per type, two per nationality) with
  56. a table of pickup times per 4-hour interval per country
  57. two (ride + day) sets of three tables with 10% information about rides/days, distances, and times
  58. a table with indicators which countries have been hitched-in or hitched through and with locals or foreigners
  59. a table of nationalities with full country names
  60. three tables showing the progressive maxima for distance, time, and velocity

"lift.h-h" - Detailed summaries per trip, type, country, nationality, and year

This file contains detailed summaries per trip, type (of driver/vehicle), country, nationality (of driver), and year. It contains a section for each of these entities, split by a 'Totals per t/t/c/n/y' separator page.

The "per trip" section

The "per trip" section contains four pages per trip:

  1. page 1
    1. a table with totals per day
    2. a table of totals for all distances
    3. a table of totals for all types
    4. a table of totals for all countries
    5. a table of totals for all nationalities
    6. a table of totals for all speeds
    7. a max/min/average summary for all rides and days
  2. page 2
    1. a table of totals for all waits
    2. a table of the statistical waiting time distribution
    3. a table of all in-ride waits per category (*)
    4. a table of waits per country, split in short and long waits
    5. a table of successively longer distance per 24-hours
  3. page 3
    1. three tables of pick-ups
      • per nationality per country
      • per country per type
      • per nationality per type
  4. page 4
    1. a max/min/average summary for all types
    2. a max/min/average summary for all nationalities
    3. a max/min/average summary for all countries
    4. two tables detailing distances per country
      • a table listing the (partial) country distances in the order they were passed
      • a table that just summarizes the distance per country

The "per type" section

The "per type" section (currently) contains one page per type (of driver/vehicle), containing the following five tables:

  1. page 1
    1. a table of totals for all distances
    2. a table of totals for all countries
    3. a table of totals for all nationalities
    4. a table of totals for all speeds
    5. a max/min/average summary for the type

Note that the individual per-type pages in the "per type" section do not contain a totals-per-type table, as it would contain just a single line with the totals for that particular type. Instead, the type is merged into the heading of the totals-for-all-distances table.

The "per country" section

The "per country" section (currently) contains one page per country travelled in, containing the following four tables:

  1. page 1
    1. a table of totals for all waits
    2. a table of the statistical waiting time distribution
    3. a table with the distribution of departure times
    4. a max/min/average summary for the country, containing two rows, one for the non border-crossing rides, and one for the border-crossing rides

Note that this section does not include a totals-per-country table. As in the "per type" section, the country is merged into the heading of the first table on the page.

The "per nationality" section

The "per nationality" section (currently) contains one page per nationality of driver, containing the following five tables:

  1. page 1
    1. a table of totals for all distances
    2. a table of totals for all types
    3. a table of totals for all countries
    4. a table of totals for all speeds
    5. a max/min/average summary for the nationality

Note that the individual per-nationality pages in the "per nationality" section do not contain a totals-per-nationality table. This section follows the format of the two previously described sections and merges the nationality into the heading of the first table on the page.

The "per year" section

The "per year" section (currently) contains five pages per year, containing the following tables:

  1. page 1
    1. a table of totals for all distances
    2. a table of totals for all types
    3. a table of totals for all countries
    4. a table of totals for all nationalities
    5. a table of totals for all speeds
    6. a max/min/average summary for all rides and days
  2. page 2
    1. a table of totals for all waits
    2. a table of the statistical waiting time distribution
    3. a table of all in-ride waits per category (*)
    4. a table of waits per country, split in short and long waits
  3. page 3
    1. three tables of pick-ups
      • per nationality per country
      • per country per type
      • per nationality per type
  4. page 4
    1. a max/min/average summary for all types
    2. a max/min/average summary for all nationalities
    3. a max/min/average summary for all countries
    4. a table that summarizes the distance per country, split in non border-crossing and border-crossing rides
  5. page 5
    1. a table of totals for all distances per day
    2. a table of totals for all speeds per day
    3. a table with totals per weekday
    4. a table with totals per month
    5. a table with progressive totals for 24 hour periods
    6. a table with the total period in days hitched during the year

As in all previous sections, the year is merged into the heading of the first table on page 1 of each "per-year" set of pages.

Additional notes regarding the "lift.h-h" file

The set of programs contains an optional program, newlift, which can be used to remove all data that does not relate to the current trip from "lift.h-h", leaving only

which is kinder to trees, if you insist on also keeping the results on paper.

"days.h-h" - Per day summary

This file contains a single table with a line for every calendar day of every trip, detailing

A follow-up program, dayform, will process this file, putting the original single column data in four columns of 70 rows. It also sorts the file into three additional orders, Distance, Time and Velocity. If the data is required to be in .RTF format, this program is required.

"trip.h-h" - Formatted input data

This file contains the input data formatted into a neat table, omitting the odometer, place of departure, and place of arrival columns. lift will paginate trips that do not fit on a single sheet of A4 paper.

"week.h-h" - Weekday per year summary

This file contains a table with the number of times and distance hitched on every weekday (Mon to Sun) for every year. It does not contain times or velocities.

"mnth.h-h" - Month per year summaries

This file contains three tables with monthly data:

  1. a table containing days hitched and distance per month per year,
  2. a table cumulating the results of the previous table, giving a running total per year, and
  3. a table detailing the first use (or not) of all calendar days.

Like the table in "week.h-h", the first two tables in this file do not contain time or velocity data.

"ntop.h-h" - Top-N tables

This file contains sets of three Top-N tables:

  1. three top-50 tables for all rides (for distance, time, and velocity)
  2. three top-10 tables for each trip, type, country, nationality, and year (ditto)

In those cases where there are fewer than 10 rides in a trip, for a type, in a country, for a nationality, or in a year, the Top-10 may actually reduce to, in some cases, a Top-1, which is then repeated three times for distance, time, and velocity...
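The shrinking Top-N behaviour falls out naturally from a selection like the one sketched below. This is not how lift is implemented. The ride dictionaries, the key names and the reading of "top" as "largest value" are assumptions of this example, and the figures are made up.

```python
import heapq

def top_n(rides, key, n=10):
    """Return up to n rides with the largest value for `key`, best first."""
    return heapq.nlargest(n, rides, key=lambda ride: ride[key])

rides = [
    {"ride": 1, "distance": 12.5, "time": 15, "velocity": 50.0},
    {"ride": 2, "distance": 80.0, "time": 60, "velocity": 80.0},
    {"ride": 3, "distance": 30.0, "time": 45, "velocity": 40.0},
]
for key in ("distance", "time", "velocity"):
    print(key, [ride["ride"] for ride in top_n(rides, key)])
```

With only three rides available, each "Top-10" simply contains three entries. With a single ride, the same ride would head all three tables.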

Auxiliary programs, newlift and dayform

The stripper, newlift

As mentioned before, newlift is a post-processor for lift, which strips all data not belonging to the last trip(s) from "lift.h-h" and "trip.h-h".

The reformatter, dayform

dayform is another post-processor for lift. It takes in "days.h-h", and spits it out in a multi-column (4x70) format. It also sorts the input file into three additional (distance, time, and velocity) orders and outputs those in the same multi-column format.
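The multi-column reflow can be sketched as below. This is not dayform's actual code. The function name is hypothetical, and filling each page column by column (top to bottom, then left to right) is an assumption of this example.

```python
def to_columns(lines, columns=4, rows=70):
    """Reflow single-column lines into pages of `columns` x `rows` cells.

    Returns a list of pages. Each page is a list of rows, and each row is a
    list holding one entry per column that still has data.
    """
    pages = []
    per_page = columns * rows
    for start in range(0, len(lines), per_page):
        chunk = lines[start:start + per_page]
        page = []
        for r in range(rows):
            # Entry r of column c sits at index c * rows + r within the chunk.
            row = [chunk[c * rows + r]
                   for c in range(columns)
                   if c * rows + r < len(chunk)]
            if row:
                page.append(row)
        pages.append(page)
    return pages

# Small demonstration with 2 columns of 3 rows instead of 4 x 70.
for page in to_columns(list("abcdefgh"), columns=2, rows=3):
    print(page)
```

Sorting the day lines by distance, time or velocity first and then passing them through the same function would give the three additional orders.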

Note that newlift actually modifies the "lift.h-h" and "trip.h-h" files, whereas dayform will spit out the result of processing "days.h-h" into a new file called "days.h-c".