Saturday, December 31, 2005

Post Validation

Free Census was originally intended to have three stages: transcription, checking and validation. After validation, the file would be cleared for upload, subject to a final check by the uploader, who was the man running the whole project. It was very much a one-man band in those days.

However, when the first validated files appeared after a couple of years, it was obvious that Plan A would not work. So a Cornish volunteer, Rick Parsons, produced FCTools, a diagnostic programme. It has a variety of uses, one of which is to turn the final zip back into a spreadsheet and check it for errors. This is the sort of thing that you get:

Warning: row 63, page number = 4 not sequential
Warning: row 95, possibly too many lines on page 4
Warning: row 128, consecutive schedule numbers (30) are the same
Warning: row 663, schedule number = 19 not sequential
Warning: row 883, consecutive schedule numbers (78) are the same
Warning: row 1004, consecutive schedule numbers (112) are the same
Warning: row 1012, schedule number = 192 not sequential
Warning: row 1013, schedule number = 116 not sequential
Warning: row 1144, consecutive schedule numbers (146) are the same
Warning: row 1311, birth place = Willesden N 3 contains unusual characters
Warning: row 1312, birth place = Willesden N 3 contains unusual characters
Warning: row 1630, schedule number = 23 not sequential
Warning: row 2012, schedule number = 1 not sequential
Error: row 2987, field too long , truncated
Error: row 2991, field too long , truncated
Error: row 2996, field too long , truncated
Error: row 3004, field too long , truncated
Error: row 3009, field too long , truncated
Warning: row 3009, consecutive schedule numbers (91) are the same
Error: row 3010, field too long , truncated
Error: row 3013, field too long , truncated
Warning: row 3779, birth place = Walworth S1 contains unusual characters
Warning: row 3844, Head of household is not the first entry in the schedule
Warning: row 4410, possibly too many lines on page 13
Warning: row 4443, possibly too many lines on page 14
Warning: row 4619, first page of ED = 20
Warning: row 4619, first schedule of ED = 119
Warning: row 4956, schedule number = 8 not sequential
Warning: row 4957, schedule number = 85 not sequential
Warning: row 4958, schedule number = 140 not sequential
Warning: row 5122, consecutive schedule numbers (20) are the same
Warning: row 5431, consecutive schedule numbers (71) are the same
Warning: row 5617, consecutive schedule numbers (113) are the same
Warning: row 5649, possibly too many lines on page 20
Warning: row 5898, consecutive schedule numbers (169) are the same
Warning: row 5961, possibly too many lines on page 30

Post-validation involves correcting these errors, if in fact they are errors. It also involves eye-balling the data, because there are some things that, although incorrect, are not picked up by the software. In the case of the COCP returns, FCTools is also the means of producing the html. This gives us the opportunity for a final check, when the html is eye-balled for errors. Some elements have to be introduced by hand at this stage, including the credits for the transcribers and checkers.
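With a list as long as the one above, it can help to count the messages by type before starting on the corrections. The following is only a minimal sketch in Python, not part of FCTools: the function names and the keyword list are my own assumptions, based purely on the wording of the messages shown above.

```python
import re
from collections import Counter

# Matches lines such as "Warning: row 63, page number = 4 not sequential".
LINE_RE = re.compile(r"^(Warning|Error): row (\d+), (.*)$")

# Phrases taken from the sample output above; this list is an assumption,
# not a complete catalogue of what FCTools can report.
KINDS = [
    "not sequential",
    "are the same",
    "too many lines",
    "unusual characters",
    "field too long",
    "Head of household",
    "first page of ED",
    "first schedule of ED",
]

def kind_of(message):
    """Return the first known phrase found in the message, or 'other'."""
    for kind in KINDS:
        if kind in message:
            return kind
    return "other"

def tally(lines):
    """Count messages by (severity, kind); lines in any other format are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        severity, _row, message = match.groups()
        counts[(severity, kind_of(message))] += 1
    return counts

sample = [
    "Warning: row 63, page number = 4 not sequential",
    "Warning: row 663, schedule number = 19 not sequential",
    "Error: row 2987, field too long , truncated",
    "Warning: row 1311, birth place = Willesden N 3 contains unusual characters",
]

for (severity, kind), count in sorted(tally(sample).items()):
    print(severity, kind, count)
```

A tally like this shows at a glance whether a file has one repeated problem, such as a run of over-long fields, or a scatter of unrelated ones.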

The html file goes off to the COCP web pages; the validation file is uploaded to Free Census. We have now uploaded over a million records to the OLDB and about 1.2 million to our own web pages.

Amazing! And it has only taken five years.....

Friday, December 30, 2005


In the original plan, Validation was the third and final stage. However, it is in fact the penultimate stage and is followed by post-validation. This note covers both.

When a corrected zip arrives back, I load it into Valdrev and run it against the images. Unlike a checker, I do not have to view every line, only those on which Valdrev stops.

Valdrev stops for:

1. Alerts, either inserted by the checker, or inserted by the transcriber and not resolved by the checker.

2. Records that have notes left by the transcriber or the checker, but not those contained in the transcriber’s Mynotes file. I do not see those, although Valdrev does stop.

3. County or place of birth names that do not exist as far as the geographical database GENIE is concerned.

From this you can see that if the transcriber leaves lots of notes, I get lots of stops. During validation I edit the notes left by transcribers. Usually, I delete them, but sometimes I retain them, edit them or add to them or insert new ones – as the fancy takes me! If Chapman codes for the Irish or Scottish counties have not been used, I get stops on all those. In the original plan, it was thought that the validation process would be pretty quick, with stops every hundred or so records. Like many things, this didn’t work out and stops are only too frequent.
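The three reasons for a stop can be pictured as a simple rule applied to each record. This is a rough sketch in Python and not Valdrev's own code: the Record fields, the function name and the tiny set of known places standing in for GENIE are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Record:
    # Hypothetical fields; the real file layout is not described here.
    birth_county: str
    birth_place: str
    alert: bool = False   # flagged by the transcriber or the checker
    note: str = ""        # note left on the record itself

# Stand-in for the geographical database: a few (county code, place) pairs.
KNOWN_PLACES = {("MDX", "Willesden"), ("SRY", "Walworth")}

def stop_reasons(record, known_places=KNOWN_PLACES):
    """Return the reasons, if any, why validation would pause on this record."""
    reasons = []
    if record.alert:
        reasons.append("alert")
    if record.note:
        reasons.append("note")
    if (record.birth_county, record.birth_place) not in known_places:
        reasons.append("unknown place")
    return reasons

records = [
    Record("MDX", "Willesden"),
    Record("MDX", "Willesden", note="surname unclear"),
    Record("Ireland", "Cork", alert=True),
]

for record in records:
    print(record.birth_place, stop_reasons(record))
```

The third record shows why uncoded Irish or Scottish counties are so costly: a county written out in full does not match anything in the lookup, so every such record produces a stop.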

The main problem is that the geographical database GENIE is limited in size and it doesn’t hold many perfectly good place names. In general, I pass all place names that are “as is”. I do not avail myself of the validator’s option to put in the modern or corrected names.

At the end of this process, I pack for uploading; in theory, this output file could be uploaded. But in practice, we know that there are a lot of errors still in the file, invisible during validation. The file is, therefore, loaded into FCTools. This is a diagnostic tool that identifies errors and gives warnings of possible problems.
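To give a feel for the kind of test a diagnostic tool of this sort can apply, here is a minimal sketch in Python of a schedule number check. It is not FCTools' actual logic. I am assuming a simple rule, that each household's schedule number should be one more than the previous one, and the function name and input layout are hypothetical.

```python
def check_schedule_numbers(schedules):
    """Check schedule numbers given as (row, number) pairs, one pair per
    household, in file order. Returns a list of warning strings."""
    warnings = []
    previous = None
    for row, number in schedules:
        if previous is not None:
            if number == previous:
                warnings.append(
                    f"row {row}, consecutive schedule numbers ({number}) are the same"
                )
            elif number != previous + 1:
                warnings.append(
                    f"row {row}, schedule number = {number} not sequential"
                )
        previous = number
    return warnings

# Row numbers and schedule numbers below are made up for the example.
households = [(1, 29), (5, 30), (9, 30), (12, 31), (20, 19)]

for warning in check_schedule_numbers(households):
    print("Warning:", warning)
```

A warning from a rule like this is only a prompt to look at the image. The numbering may legitimately restart, for example at the beginning of a new enumeration district, which is why each one has to be judged by eye.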

FCTools produces a list of errors and a spreadsheet. As well as making the corrections indicated by FCTools, the opportunity is taken to “eyeball” the spreadsheet. It is surprising how many minor errors jump out and hit you in the eye! Once it is as good as it can be, two files are produced. The validation file is uploaded to the Online DataBase (OLDB) and to the Mormons’ Great Granite Cave in Utah. The html file is sent to our web site.

Wednesday, December 21, 2005

What does a checker do?

As you know, I constantly witter on about this being a system and we being a team. So this is all about those strange creatures (to transcribers) who are checkers.

In an ideal world, checkers would be people who had done some transcribing. As it is not an ideal world, many checkers have not done any. Some of the most successful have just done one transcription and have probably forgotten what they learnt.

Starting up a checker is a little more complicated than getting a transcriber underway. However, if the instructions are followed, it can be done. The software is downloaded from the Free Census web site and is known as WINCC. The checker gets a zipped data file by email and loads it into the software. The task is then to go through the data, line by line, and compare it to the returns on the fiche. I cannot make the checkers look at each line and it is possible to just tab and save your way through the whole thing. I have had one or two people who appear to have done just that.

The checker should attempt to identify the transcriber’s mistakes AND correct them. This latter might seem pretty obvious – but isn’t to some people. The software enables the checker to do lots of interesting things, including inserting people, complete households or even whole pages. They can split up households and join people to households. They can alter the header data and leave lots of interesting little notes for the validator.

The checker should attempt to resolve any records flagged up by the transcriber. They can leave them for the validator and they can also flag items themselves. Of course, it is just as possible for a checker to be wrong as it is for the transcriber. However, I do not second-guess checkers’ decisions – well, not often. They can see the evidence and what the transcriber thought – I want them to make a decision. Flagging up the query or letting a query through is OK with me – but I would rather they sorted it out before it gets to me.

A well transcribed piece is easy to check; but as we are all human, most transcribed pieces are full of errors. The system will take care of them if we all operate it properly. Sometimes the checkers get nightmare pieces with virtually every record requiring a correction. Usually, these are repetitious mistakes and easy to correct. But each correction involves a number of key strokes or mouse clicks and it can become very boring to do ten mouse clicks for each of 5000 records!

At the end of the process the software produces another zipped file. The name has changed from censXXXX to ZZZZyyyy. This zipped file is emailed to the validator, who stuffs it into a third piece of software. You’ll have to wait for the next installment to find out about that.

Monday, December 19, 2005

What do Transcribers do?

This is a note for those people about to start, or for people who might volunteer and for checkers who have not done any transcribing.

Nowadays, we are all working from discs. Many of our volunteers have bought their own, but we can now supply free discs, courtesy of the LDS. A new transcriber is allocated a piece or a group of parishes and a lot of information. There are help files with the software, and FAQ and other things on the main Free Census web site. Everyone gets a lengthy piece written by me and there are also “hints” pages on the web site. In spite of this, questions still arise and transcribers and checkers still come across new problems.

I am not complaining about this. For many volunteers this is a completely new field; they haven’t transcribed census returns before and they haven’t got much experience with computers or the internet. It constantly amazes me how much we are achieving given our wide geographical spread and our inexperience. It is also a fact that the enumerators of the 19th century had many and varied ideas on how they should carry out their task. As the census taking was organised by the government, the instructions to the enumerators were confused and confusing and sometimes downright contradictory.

The original project software is WINCENS. It started off as a DOS programme and was then changed into a Windows programme. It is still DOS of course, under the Windows interface, and transcribers will occasionally see the black DOS screens. However, we are using an alternative to WINCENS - SSCENS.

Because many people did not like the WINCENS programme, especially its inability to allow people to retrace their steps and edit the data, we introduced SSCENS. This is just a spreadsheet with knobs on. But it does allow editing, and people can look at a page in its entirety or at a whole document. If the rules of SSCENS are followed, then this spreadsheet can be converted into a format that will work with the checking software, WINCC. Any spreadsheet will do, although most people seem to be using Excel. It does not matter what platform you are using or which version of Windows. Anybody using an Apple should contact me as I am an Apple user.

A transcriber should aim to combine speed with accuracy. This is a system: transcription, checking and validation. A transcriber should not spend hours on trying to decipher one surname. Give it your best shot and move on. If you cannot resolve a problem, flag it up as a query; leave a note if you think it will help.

On completion, a transcriber should email us their completed spreadsheet. You should try and reformat the file as .csv, but if you can’t, then send it as it is. The SSCENS spreadsheet is reformatted for input into the next stage – checking.

Saturday, December 17, 2005


This project is built round the use of emails. There is, however, a supplement to email: Instant Messaging. This gives you the chance to have instantaneous one-to-one comms with me and to use the COCP chat room to talk to each other and to us. Here are the instructions.

Go to

Left hand side, under software, click on clients

Next page, under Platforms, click on Exodus (if you are an Apple user, contact me first).

Next page; click on Get Exodus – download “Stable releases”

Next page; click on exodus

Click on download now.

Install Exodus. Open an account with jabber. When you have done this, email me and I will send you my jabber account name.

You can then add me as a new contact. Jabber will send me a message asking for permission to display my online status to you. I say yes. You will then be able to see when I am online. Just click on my icon and a panel opens up. The panel is split into two; the bottom one is where you type. You type "hello" and away we go.

We can then have one-to-one contact any time we are both online. The software will tell you when this happy state is available. In addition, you can use the COCP chat room. There is usually someone there, including about 4 regular participants, all of whom have all the discs you can imagine. So you can get a more or less instant second opinion on anything.

You do not have to do this unless you want to - but I recommend you give it a try. You will get instant response, most of the time!