Keeping statistics - one way of doing it

So you want to keep track of your rides (see also [[Statistics]]) and, once you've managed to get safely back home, do something with the data you've collected? If yes, read on...

This work-in-progress page is based on the notes I, Prino, have used since I started recording my rides on 16 June 1980 @ 7:47. The notes I use are pretty simple; you can obviously adapt them to your own requirements.

A possible form to record data

Figure: Prino's liftnote (../images/liftnote.png)
The forms I use to record my data look like this. Properly spaced, three columns of four rows fit on a sheet of A4 (210x297mm) paper. Fitting the same layout on a US "Letter" size sheet (215.9x279.4mm), which is slightly wider but shorter, might be possible if the row heights are reduced. I save my notes in 17-ring binders.

Need a translation?

The key to the detail of the produced statistics is the Opmerkingen section (Opmerkingen is Dutch for "remarks"). I've simply called it Opmerkingen as you are unlikely to add specific comments to all of your rides...

The fields

What follows is a description of the various fields in the above format. There is one field that might seem to be missing, 'Waiting Time', but it should be obvious that it's automagically included as the difference between the arrival time of one ride and the departure time of the next ride, and in those cases where something happens in between, the Opmerkingen section comes to the rescue.
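
The implied waiting time can be sketched in a few lines of Python (not the Pascal of the program described further down). This is only an illustration: the function name `waiting_time` is made up, times are assumed to be "HH.MM" strings, and both rides are assumed to fall on the same day in the same time-zone with nothing happening in between.

```python
from datetime import datetime, timedelta


def waiting_time(prev_arrival: str, next_departure: str) -> timedelta:
    """Wait between two rides: departure of the next minus arrival of the previous.

    Assumes 'HH.MM' strings on the same day and in the same time-zone.
    """
    fmt = "%H.%M"
    arrived = datetime.strptime(prev_arrival, fmt)
    departed = datetime.strptime(next_departure, fmt)
    if departed < arrived:
        raise ValueError("next departure is earlier than previous arrival")
    return departed - arrived


print(waiting_time("10.35", "11.05"))  # 0:30:00
```

Anything that breaks these assumptions (an overnight stop, a border with a time-zone change) is exactly what the Opmerkingen section is for.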

Date

Pretty obvious.

If you use the ISO 8601 format (YYYY-MM-DD), it's easy to extend this to rides spanning multiple days by modifying the format into YYYY-MM-DD/DD.
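
Should you want to read such dates back into a program, the following Python sketch shows one possible way. The function name `parse_ride_dates` is hypothetical, and the rule that an end day smaller than the start day means "the next month" is my assumption, not something the format itself defines.

```python
from datetime import date, timedelta


def parse_ride_dates(text: str) -> tuple[date, date]:
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD/DD' into (first day, last day)."""
    start_text, _, end_day = text.partition("/")
    start = date.fromisoformat(start_text)
    if not end_day:
        return start, start  # single-day ride
    day = int(end_day)
    if day >= start.day:
        return start, start.replace(day=day)
    # Assumption: a smaller end day means the ride ran into the next month.
    first_of_next = (start.replace(day=1) + timedelta(days=32)).replace(day=1)
    return start, first_of_next.replace(day=day)


print(parse_ride_dates("1980-06-16/17"))
```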

Departure/Arrival

Again pretty obvious.

The three columns contain the time, odometer and place of departure and arrival. In cases where no odometer is available, or where it doesn't work, you can use Google Maps to determine distances. I've found that it is usually accurate to the nearest kilometre.

The unnamed row below 'Departure' contains the total time and distance of the ride.

Speed

The contents of this field depend on individual preferences. I put the real speed in it, i.e. the distance divided by the actual driving time, which is the arrival time minus the departure time minus any time recorded for stops.
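
As a worked example of that calculation, here is a minimal Python sketch. The names `to_minutes` and `real_speed` are invented for the illustration, times are assumed to be "HH.MM" strings, and the ride is assumed to start and end on the same day in the same time-zone.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'HH.MM' string into minutes since midnight."""
    hours, minutes = hhmm.split(".")
    return int(hours) * 60 + int(minutes)


def real_speed(distance_km: float, departure: str, arrival: str,
               stop_minutes: int = 0) -> float:
    """Distance divided by driving time (arrival - departure - stops), in km/h."""
    driving = to_minutes(arrival) - to_minutes(departure) - stop_minutes
    if driving <= 0:
        raise ValueError("driving time must be positive")
    return distance_km / (driving / 60)


# 120 km between 09.00 and 11.00 with a 30 minute stop: 90 minutes of driving.
print(real_speed(120, "09.00", "11.00", stop_minutes=30))  # 80.0
```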

Opmerkingen

This is a free-format field that you can use for any purpose you like. Why free format? Easy: it's unlikely that you would need a specific format for all of your rides. Why include specific headings for rides that cross borders or span more than one day, when the number of such rides is likely to be pretty small compared to 'normal' rides?

Here's a non-exhaustive list of things I've been using this field for:

There are probably lots of other things you might want to put into it.

If the Opmerkingen section is too small (and I've had rides with a dozen stops and going across five borders in two days), I usually continue on the back of the form - see the two samples somewhere on this page.

Storing the data on a PC

This is likely to be the most important decision you will have to make. There are (at least) three options:

  1. a (structured) text file, to be processed by a user-written program
  2. a spreadsheet
  3. a database

Each of these options has its pros and cons; here are some details:

Text file

I use the text file option with a few programs I've written myself. The advantage of this format is that it gives me total flexibility, but it has a pretty big disadvantage: you have to think very carefully about the format you plan to use, as it should be able to cater for future changes without you having to completely rewrite your programs. My format, described later, was developed over about 25 years. Despite the fact that I moved to a new format after a few years (the result of not giving the format enough thought initially), it is rather cryptic due to further additions since adopting it!

A spreadsheet

If you're well versed in spreadsheets (or if not, try LibreOffice, it's free), you might want to consider using one to process your data. It has the big advantage that you can insert or delete columns in your source data and the program will automagically update the references in all other cells and/or sheets. Combined with the many conditional functions, you will probably be able to produce any statistics you like, although some of the more esoteric ones my program creates will be pretty hard (or even impossible) to replicate.

A database

What was written about spreadsheets also holds true for databases. Not having used any PC database programs, I cannot recommend any, but there are plenty of free ones, LibreOffice Base and MariaDB, to name just two of the better-known ones. Creating your statistics will mean writing queries (most likely in the fairly easy to learn language SQL), but given the non-procedural nature of this language, some results that can be created with a self-written program or a spreadsheet may be hard or even impossible to recreate.
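
To give an impression of what such a query might look like, here is a small sketch using Python's built-in `sqlite3` module. The table name `rides`, its columns and the three sample rows are all invented for the illustration; they are not taken from my own data or programs.

```python
import sqlite3

# In-memory database with a hypothetical, heavily simplified 'rides' table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rides (trip INTEGER, ride INTEGER, country TEXT, "
    "distance_km REAL, minutes INTEGER)"
)
conn.executemany(
    "INSERT INTO rides VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "NL", 12.5, 15), (1, 2, "NL", 80.0, 60), (1, 3, "B", 45.0, 45)],
)


def totals_per_country(connection):
    """Rides, total distance and average speed (km/h) for every country."""
    query = """
        SELECT country,
               COUNT(*),
               SUM(distance_km),
               SUM(distance_km) * 60.0 / SUM(minutes)
        FROM rides
        GROUP BY country
        ORDER BY country
    """
    return connection.execute(query).fetchall()


for row in totals_per_country(conn):
    print(row)
```

Simple totals like these are a natural fit for SQL; tables that depend on the order of rides are where it tends to get harder.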

Prino's original program

As mentioned above, and being a programmer by profession, I selected the first method of storing the data, a text file. The first 60(!) versions of the program were written in Turbo Pascal (V3.01a) and until about version 20 they used 'Version I' of a simple CSV file with the data. It could handle rides passing through multiple countries and spanning more than one day, but did not know anything about ferry crossings, stops or time-zones, to name but a few of the things that arrived later...

Given that the old format became obsolete a long time ago, I've not included any details about it, but its output mimicked my manually created five tables per trip, containing:

Simple, uncomplicated and one might assume that most hitch-hikers would leave it at this...

Prino's current program

The current program is written in Virtual Pascal V2.1.279 (or PL/I, should you want to run it on IBM's z/OS). It is licensed under the provisions of the GPL V3. The authenticity-verified WinRAR archive containing the sources and executable files can be found on Prino's Google Drive.

Data format used by Prino's current program

The 'simple' format was used until the end of 1994. Due to the fact that I wanted to add some additional statistics to the output files, it was changed into something a bit more logical, although some people might find otherwise. (And they are right, it's a right-royal mess due to more additional requirements, and I would like to simplify some of the more esoteric uses of punctuation, but that won't happen until I get back onto the 'big iron' with its superb debugging facilities!)

The current format, split into two parts to avoid scrolling, looks like this:

....v....1....v....2....v....3....v....4....v....5....v....6....v....7....v....8....v....9..
999, 9999,  AAA, 99999.9, HHH.MM,  999.9, NAT, TYPE, CTY, HH.MM, S, HH.MM, HH.MM, YYYY-MM-DD
|    |      |    |        |        |      |    |     |    |      |  |      |      |
a    b      c    d        e        f      g    h     i    j      k  l      m      n

..v....0....v....1....v....2....v....3....v....4....v....5....v....6....v....7....v....8....v....v...
, 999999.9, DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD , AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  |         |                                             |
  o         p                                             q

and the fields used in it are:

Col   1 -   3 ("Trip")
Col   6 -   9 ("Ride")
Col  13 -  15 ("Day")
Col  18 -  24 ("Distance")
Col  27 -  32 ("Time")
Col  34 -  40 ("Velocity")
Col  43 -  45 ("Nationality")
Col  48 -  51 ("Type")
Col  54 -  56 ("Country")
Col  59 -  63 ("Wait")
Col        66 ("Split")
Col  69 -  73 ("Departure time")
Col  76 -  80 ("Arrival time")
Col  83 -  92 ("Date")
Col  95 - 102 ("Odometer")
Col 105 - 151 ("Place of departure")
Col 155 - 202 ("Place of arrival")

Notes:

 [1] Prino uses DISTINGUISHING SIGNS USED ON VEHICLES IN INTERNATIONAL TRAFFIC and the List of international vehicle registration codes on Wikipedia.

 [2] Given that Prino has never encountered a triple crossing with a backward moving time-zone (e.g. F-GB at midnight), he doesn't expect the program to handle them.

 [3] Prino never records the waiting time before the first ride of the day, unless the first ride of the day happens to be one that continues directly after the last ride of the previous day, without any intervening (sleep?) stop.

 [4] The normal "." separator in the waiting time must be replaced by a ":" (colon) for those waits caused by the departure from a ferry terminal where you haven't been able to get a ride on the ferry (and may have had to wait until the next ferry...)

 [5] Multiple split lines may be present, but they must be grouped, and the groups must be in "#", "*", "!" order!

 [6] The normal "." separator in the departure time must be replaced by

 [7] Non-stopping (i.e. drive-through) border crossings must not record an arrival time, unless the time-zone changes.

 [8] The normal "." separator in the arrival time of a time-zone changing border crossing must be replaced by

 [9] The places of departure and arrival must be UTF-8 encoded and their length is (currently) limited to 47 bytes!

[10] Lines should not contain trailing blanks, but completely blank lines, consisting of just a CR/LF, are allowed.

[11] Although the data is in CSV format, all positions are fixed!

[12] Lines starting with "{" in column 1 are treated as comments or meta-data. A description of the meta-data format can be found below.

[13] Columns containing numerical data or times must be right aligned, columns containing text must be left aligned, and if the textual data is shorter than the column width, the separating comma may follow it without intervening blanks.

[14] The "Odometer", "Place of departure" and "Place of arrival" columns are (currently) completely ignored by the main "lift" program.
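
Because all positions are fixed, a record can be cut up by column number rather than by splitting on commas. The Python sketch below does just that; it is not how the Pascal program reads its input. The spans are copied as-is from the column list above, the field names and `parse_record` are made up, and treating the positions as UTF-8 byte offsets is my assumption, based on note [9].

```python
# Column spans (1-based, inclusive), copied from the list above.
FIELDS = {
    "trip": (1, 3),
    "ride": (6, 9),
    "day": (13, 15),
    "distance": (18, 24),
    "time": (27, 32),
    "velocity": (34, 40),
    "nationality": (43, 45),
    "type": (48, 51),
    "country": (54, 56),
    "wait": (59, 63),
    "split": (66, 66),
    "departure_time": (69, 73),
    "arrival_time": (76, 80),
    "date": (83, 92),
    "odometer": (95, 102),
    "place_of_departure": (105, 151),
    "place_of_arrival": (155, 202),
}


def parse_record(line: str):
    """Return a dict of raw field strings, or None for blank and '{' lines."""
    text = line.rstrip("\r\n")
    if not text.strip() or text.startswith("{"):
        return None  # blank line (note 10) or comment/meta-data (note 12)
    raw = text.encode("utf-8")  # assumption: positions count bytes
    record = {}
    for name, (start, end) in FIELDS.items():
        chunk = raw[start - 1:end].decode("utf-8", errors="replace")
        # Note 13: a comma may directly follow short text, so strip it too.
        record[name] = chunk.strip(" ,")
    return record
```

All values are kept as strings on purpose: the notes above describe cases where the "." in a time is replaced by another character, so converting to numbers is better left to a later step.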

Meta data
As I didn't want to make the format of the data even more complex than it already is, I decided to allow for comments and meta-data to be embedded into the input file. Both comments and meta-data start with a "{" and can be up to 255 characters long.

The (currently) defined, but only partly used types of meta-data are:

Time-zone information
The meta-data contained in lines starting with "{Z-" or "{Z+" provide information about the time-zones for the countries passed in a trip. The difference between the two variants is that

All lines containing comments and meta-data share their first character, a "{", which also happens to be the character that starts a comment in Pascal, which happens to be the language my programs are written in...

{Z?CTY sNN CTY sNN CTY sNN ...
....v....1....v....2....v....3

These lines contain time-zone information. The format of these comments is:

Pos     Description
1..2    Time-zone identifier, '{Z'
3       • - : the remaining data on the line will completely replace the current time-zone information.
        • + : the remaining data on the line will be added to the current time-zone information, possibly overwriting existing information. This option can be used in multi-zonal countries to update the time-zone for the country.
4..6    three-letter abbreviation for the country
7       blank
8..10   zone difference from a default (your?) country
11      blank

Pos 4..11 can be repeated up to 31 times. Should a trip pass through more than 31 countries or should you wish to include all countries in one single place, additional '{Z+...' lines must be used.

The program can handle up to 256 countries, which is more than the current number of countries on Earth, but it still requires a change to handle fractional time-zones, for countries like Iran (UTC +3.30) and India (UTC +5.30).
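
The replace/add behaviour of the "{Z-" and "{Z+" lines can be sketched as follows in Python. This is an illustration only, not the Pascal code: `apply_zone_line` is a made-up name, each entry is assumed to occupy eight positions (three for the country, a blank, the signed difference, a blank), and the difference is assumed to be in whole hours.

```python
def apply_zone_line(zones: dict[str, int], line: str) -> dict[str, int]:
    """Return the time-zone table after applying one '{Z-' or '{Z+' line."""
    if not (line.startswith("{Z-") or line.startswith("{Z+")):
        return zones  # not a time-zone line, nothing changes
    # '-' starts from scratch, '+' extends or overwrites the current table.
    updated = {} if line[2] == "-" else dict(zones)
    body = line[3:]  # entries start at position 4
    for start in range(0, len(body), 8):
        entry = body[start:start + 8]
        country = entry[0:3].strip()
        difference = entry[4:7].strip()
        if country and difference:
            updated[country] = int(difference)
    return updated


zones = apply_zone_line({}, "{Z-NL  +00 GB  -01 GR  +01")
zones = apply_zone_line(zones, "{Z+P   -01")
print(zones)  # {'NL': 0, 'GB': -1, 'GR': 1, 'P': -1}
```

Storing whole hours in an `int` mirrors the limitation mentioned above: fractional time-zones would need a different representation.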

{< aaaa bbbb cccc - lorum ipsum
....v....1....v....2....v....3...
{> - lorum ipsum

These lines allow the rides of a second person, provided they are an exact subset of the rides of the first person, to be extracted into their own file. The essential parts of the lines are:

Pos      Description
1..2     Identifier, "{<" - Start of second-person data / "{>" - End of second-person data
04..07   value to subtract from main file trip number to create the trip number for the second person
09..12   value to subtract from main file ride number to create the ride number for the second person
14..17   value to subtract from main file day number to create the day number for the second person
20..EOL  Name of second person, used to extract specific records (on "{<" record)
06..EOL  Name of second person, used to extract specific records (on "{>" record)
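
The renumbering described here boils down to three subtractions. A minimal Python sketch, assuming the positions listed above; the function names and the sample name "Jane Doe" are hypothetical:

```python
def parse_second_person_start(line: str) -> dict:
    """Extract the three offsets and the name from a '{<' line."""
    if not line.startswith("{<"):
        raise ValueError("not a '{<' line")
    return {
        "trip_offset": int(line[3:7]),    # pos 04..07
        "ride_offset": int(line[8:12]),   # pos 09..12
        "day_offset": int(line[13:17]),   # pos 14..17
        "name": line[19:].strip(),        # pos 20..EOL
    }


def renumber(trip: int, ride: int, day: int, offsets: dict) -> tuple[int, int, int]:
    """Turn main-file numbers into the second person's own numbers."""
    return (
        trip - offsets["trip_offset"],
        ride - offsets["ride_offset"],
        day - offsets["day_offset"],
    )


offsets = parse_second_person_start("{<   12  340   25 - Jane Doe")
print(renumber(13, 341, 26, offsets))  # (1, 1, 1)
```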

{W
These lines allow for alternate descriptions of the departure and arrival locations. I (Prino) use them to add consistent descriptions and English translations to my original history data. The data is used by the DAT2CSV and H-H2WIKI programs, and should be in the format of the normal data, but only the departure and arrival locations should be used, the rest of the line should be blank.

Note: Any line starting with '{' that does not fit into any of the above categories is ignored completely, i.e. treated as a comment!

The results of Prino's current program

The current program produces rather a lot more output than the five tables per trip! In fact it now produces four files, and an optional post-processor program, which translates the output into .RTF format, creates two additional files with two tables sorted in various other orders.

The summary output file: 'summ.h-h'

This file contains no fewer than 86 tables (some of them broken into several parts because they would otherwise require A3 or A2 size paper). Here's the full list; the examples given are based on the first two trips of my hitch-hiking career:

  1. two tables of general totals for every trip
  2. a table of totals for all distances
  3. a table of totals for all types
  4. a table of totals for all countries
  5. a table of totals for all nationalities
  6. a table of totals for all speeds
  7. three tables of totals for all waits
  8. two tables of ferry related waits
  9. three tables of pick-ups
  10. a table with the distribution of departure times per weekday
  11. a table with the first and last ride for all distances
  12. a table with the first and last ride for all types
  13. a table with the first and last ride for all countries
  14. a table with the first and last ride for all nationalities
  15. a table with the first and last ride for all speeds
  16. two tables of waits per trip, split in short and long waits
  17. a table of waits per country, split in short and long waits
  18. a table of waits per weekday, split in short and long waits
  19. a table of waits per month, split in short and long waits
  20. a table of waits per year, split in short and long waits
  21. a max/min/average summary for all rides
  22. a max/min/average summary for all days
  23. a max/min/average summary for all types
  24. a max/min/average summary for all nationalities
  25. a max/min/average summary for all countries
  26. a table of rides per country, split in internal and border crossing rides
  27. four tables for the max/min speed & max/min rides for a given number of distances
  28. four tables for the max/min speed & max/min distance for a given number of rides
  29. four tables for the maximum number of rides exceeding a number of selected velocities, maximized for the number of rides and the distance
  30. four tables for the maximum number of rides exceeding a number of selected lengths, maximized for the number of rides and the distance
  31. a max/min/average summary for all rides per year
  32. a max/min/average summary for all days per year
  33. a table of totals for all distances per trip
  34. a table of totals for all speeds per trip
  35. a table of totals for all distances per day
  36. a table of totals for all speeds per day
  37. a table with the first and last day for all distances
  38. a table with the first and last day for all speeds
  39. four tables for the max/min speed & max/min days for a given number of distances
  40. four tables for the max/min speed & max/min distance for a given number of days
  41. four tables for the maximum number of days exceeding a number of selected velocities, maximized for the number of days and the distance
  42. four tables for the maximum number of days exceeding a number of selected lengths, maximized for the number of days and the distance
  43. a table with totals per weekday
  44. a table with totals per month
  45. a table of general totals per year
  46. a table with first/last ride/trip per year
  47. a table with usage of days per year
  48. a table of totals for consecutive days
  49. a table of totals for 24 hour periods
  50. a table of totals for 365 day periods
  51. a table of minimum number of rides needed for selected numbers of nationalities
  52. two tables (one per trip, one per year) with the number of types, countries and nationalities encountered during the trip/year, split in a total and a "new" column
  53. four tables (two per type, two per nationality) with
  54. a table of pickup times per 4-hour interval per country

The trip/type/country/nationality/year output file: 'lift.h-h'

This file contains

However, some logical pages may overflow physical pages, which is most likely to happen with the page that contains your most seen type, especially if you've visited a fair number of countries.

The set of programs contains an optional program to remove all data that does not relate to the current trip from this file, leaving only

which is kinder to trees, if you insist on also keeping the results on paper.

The daily summary output file: 'days.h-h'

This file contains one table with a line for every calendar day of every trip, detailing

A follow-up program will process this file, putting the original single column data in four columns of 70 rows. It also sorts the file into three additional orders, Distance, Time and Velocity. If the data is required to be in .RTF format, this program is required.
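
The re-arrangement into four columns of 70 rows can be sketched in Python as shown below. This is not the follow-up program itself: `to_columns` is a made-up name, and filling each column from top to bottom before moving to the next one is my assumption about the layout.

```python
def to_columns(lines, rows=70, cols=4):
    """Split a single-column list into pages of `cols` columns of `rows` rows.

    Each page is a list of rows; each row holds the cells that share a
    printed line. Columns are filled top to bottom (an assumption).
    """
    per_page = rows * cols
    pages = []
    for start in range(0, len(lines), per_page):
        chunk = lines[start:start + per_page]
        page = []
        for r in range(rows):
            cells = [chunk[c * rows + r] for c in range(cols)
                     if c * rows + r < len(chunk)]
            if cells:
                page.append(cells)
        pages.append(page)
    return pages


# Small demonstration with 2 columns of 3 rows instead of 4 columns of 70.
print(to_columns(list(range(7)), rows=3, cols=2))
# [[[0, 3], [1, 4], [2, 5]], [[6]]]
```

The three additional orders would then be a matter of sorting the single-column list by distance, time or velocity before calling such a function.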

The formatted input data output file: 'trip.h-h'

This file merely puts the input data into a neat table (zapping the odometer and place of departure & arrival columns). The program will paginate trips that do not fit on A4 paper.