Command Line Genealogy: Filtering City Directories from Archive.org

October 4, 2014
The 1848 Philadelphia City Directory, filtered to show all entries that include the word ‘Montgomery’.

Fear not! It’s easier than you think!

☞ Background: I want a concise list of everyone who lived on Montgomery Avenue in Philadelphia in 1848.

I have a cemetery return for a baby girl named Mary Pickersgill, who died in 1848 and was buried in the now-defunct Mutual of Kensington Cemetery, in the then-newly consolidated city of Philadelphia. I suspect she is a lost sister of my great-great-great-grandfather, who was born William Harrison Pickersgill in February of 1846. The parents of these children, especially their father, are mysterious.

Mary Pickersgill's Cemetery Return - 6-Feb-1848

A Mr. George Pickersgill apparently married the children’s supposed mother, Sarah, in 1844. By 1849, Mr. Pickersgill was out of the picture. Sarah had remarried one Hugh Young that same year. Mr. and Mrs. Young appeared together in the 1850 census, along with Hugh’s family from a prior marriage, and young William, who retained his birth name.

The year 1848 was a transitional one for the Pickersgill family. I would like to know with whom Mrs. Sarah Pickersgill was living at that time, but the doctor, Mr. John Uhler, was not as helpful as he could have been. He wrote that the young Mary resided on Montgomery Street, but did not trouble himself to include a house number, despite the prompt on the above form.

In the absence of a house number, I wished to see a list of everyone who lived on Montgomery Street in 1848—in case any familiar names should appear. Well … wishes can come true! I’ll show you how to use your computer’s command line to make such a list from the digitized city directories available at Archive.org.

☞ Acquire the city directory file from Archive.org

Archive.org has quietly become a great place to acquire royalty-free public domain images for your genealogy collection. In addition to its growing collection of city directories, it also has unindexed scans of all the federal censuses through 1930. For today’s demonstration, I will use the 1848 Philadelphia City Directory. Here is a screenshot of the Archive.org home page for that publication.

ArchiveOrg1

To find the city directory that you are looking for, just Google it, or type the year, city and the words ‘city directory’ in the search box. After you’ve found the directory of your choice, mind the two links that I’ve circled in red. The first of these is the ‘Read Online’ link. This is quite useful for browsing the text, and also useful for single name searches, but it is not the most convenient for the type of filtering I want to do. Try clicking the ‘Read Online’ link, perform a search for ‘Montgomery’ in the upper right corner of the screen, and observe the results:

ArchiveOrg2

The results appear at the bottom of the screen, with a pin representing each time that the term appears in the book. Hovering over a pin reveals a preview of that instance as it appears in the book. Again, it may be useful for single name searches, but is not the best format for what I want to do. In that case, I’ll click the ‘Back’ button on my browser to return to the previous screen. Once there, I’ll click on the ‘Full Text’ link that I circled above. Here is what you’ll see (Note: Not all publications have a ‘Full Text’ link. If that link is not there, then you are out of luck for the rest of this demonstration. Sorry!):

ArchiveOrg3

At this time, you can type ‘Ctrl-F’ to find all instances of the search term. Your browser should highlight each instance of the search term, and pressing enter repeatedly should cycle through each instance. This is a better way to search, in my opinion, but it still is not the concise list that I am looking for. Let’s filter this list even further.

Click the ‘Back’ button to return to the publication’s home page, and then right-click the ‘Full Text’ link to save the text file to your computer. I recommend saving the file to your computer’s Desktop, so it will be easier to see what we’re about to do with it. The text file I downloaded for the 1848 Philadelphia city directory is called ‘mcelroysphiladel1848amce_djvu.txt’. Here it is on my desktop:

Screenshot from 2014-10-01

We’re ready for the next step. If you are using a Windows computer, skip on down to the following section. The principles are the same, but the commands are different. Linux users: stick with me a while.

☞ Filter the city directory file using the Linux command line.

To start, you might want to open the file you just downloaded to familiarize yourself with what’s in it. Here is my text file, opened using the popular text editor called GEdit:

Screenshot from 2014-10-02

Screenshot from 2014-10-02 12:29:43

Once you are familiar with the contents of the text file, go ahead and close your text editor using the ‘X’ at the upper right.

My goal is to filter this text file so as to view only those lines containing the street name ‘Montgomery’. To do this I will open a terminal window and issue a few commands that interact with the city directory file. The process for opening a terminal window varies depending on which distribution you are using, but the most popular distributions for beginners, such as Linux Mint and Ubuntu, use the keyboard shortcut ‘Ctrl-Alt-T’. The terminal is also fairly easy to find under the Main Menu. Look for it under either ‘Accessories’ or ‘Utilities’.

Once you open your terminal window, use the cd command to change the working directory to the location in which you saved your file. I advise saving the file to your Desktop, so you can see the results of your commands appear on the screen as you work. In this case, the full command is cd Desktop. To confirm that you have changed to the proper location, use the command ls. This command lists all of the files in the current location (which should be your computer’s Desktop). Look for your city directory text file in the output. Scroll up the terminal window if you have to. If you see the city directory file in the terminal’s output, then you are ready to begin filtering. If you do not see your city directory file, then use the command cd ~ to return to your computer’s home directory, and then try a different location. You may need to try downloading the file again.

Here is the result of my commands. (Note: I’ve made the text in my terminal larger to improve readability.)

Screenshot from 2014-10-02 12:59:58
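
If you would like to rehearse these two commands first, here is a minimal sketch. It assumes a throwaway folder made with mktemp standing in for your Desktop, and an empty file with the same name as my download. Both are stand-ins of mine, so nothing on your real Desktop is touched.

```shell
# Assumption: a scratch folder plays the part of the Desktop, and an empty
# file plays the part of the downloaded city directory.
workdir=$(mktemp -d)
touch "$workdir/mcelroysphiladel1848amce_djvu.txt"

cd "$workdir"      # in the tutorial this step is: cd Desktop
pwd                # prints the folder you are now working in
listing=$(ls)      # ls lists every file in that folder
echo "$listing"

# Giving ls a file name reports just that file, or an error if it is missing
ls mcelroysphiladel1848amce_djvu.txt
```

If the last command complains that there is no such file, you are in the wrong folder or the download went somewhere else.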

Once you are sure you are in the correct directory, you can issue the commands to filter the file. The cat command will reproduce the contents of a text file in the terminal window. If you issue that command on your city directory file, you will see the whole book whiz across the screen in a flash. This is not very useful for us by itself, but when combined with a filtering command, the computer gives us just what we need.

The filtering command is grep. When you follow the grep command with a filtering word, such as ‘Montgomery’, the computer will reproduce only those lines of text that include the word ‘Montgomery’.

To filter our city directory, we will issue the cat command and the grep command together on one line. Join the two commands together by using the ‘pipe’ character. That is the vertical line above the ‘Enter’ key on most computers (like this: | ). The full command that you should enter into your computer is here:

cat mcelroysphiladel1848amce_djvu.txt | grep Montgomery

Tip: You may use the Tab key to auto-complete long file names. Try it! Don’t worry if the command spills over onto a second line. That is very common. Just press the Enter key after the command and observe the results. The result should be that every line of text that includes the word ‘Montgomery’ is reproduced in the terminal window. You may use your terminal’s scroll bar to review all of the results. Here is my output, after issuing the command:

Screenshot from 2014-10-02 13:29:49

I’ve circled the command I issued in red. Notice below the command that the computer has filtered the contents of the city directory, and has reproduced only those lines containing the word ‘Montgomery’. Pretty cool, huh?
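
A small aside for the curious: grep can also open the file by itself, without help from cat. The sketch below assumes a tiny practice file of invented directory-style lines (they are not real 1848 entries) so that it runs anywhere. It compares the piped form with the direct form, and uses grep's -c option to count the matching lines.

```shell
# Assumption: a small practice file with invented entries, not real 1848 data
workdir=$(mktemp -d)
sample="$workdir/sample_directory.txt"
printf '%s\n' \
  'Adams John, weaver, 12 Montgomery' \
  'Baker Ann, dressmaker, 40 Shackamaxon' \
  'Clark Wm., carter, Montgomery n Front' \
  'Davis Mary, widow, 9 Queen' > "$sample"

# The form used in this tutorial: cat sends the file through the pipe to grep
piped=$(cat "$sample" | grep Montgomery)

# Equivalent form: grep reads the file named after the search word
direct=$(grep Montgomery "$sample")
echo "$direct"

# -c prints the number of matching lines instead of the lines themselves
count=$(grep -c Montgomery "$sample")
echo "$count matching lines"
```

With the real download, the direct form would be grep Montgomery mcelroysphiladel1848amce_djvu.txt. Either form gives the same list, so use whichever you find easier to remember.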

Suppose we want to save this list for future reference in a new text file. Our command requires only a slight modification. We use the greater-than symbol ‘>’ to specify output to a new file. Like so:

cat mcelroysphiladel1848amce_djvu.txt | grep Montgomery > Montgomery1848.txt

Under this command, your computer will not reproduce the results to your screen. Your computer will instead create a new text file on your Desktop and reproduce the results there. Here is a picture of my result. Notice that no new output follows the command I issued, but a new text file has appeared on my Desktop:

Screenshot from 2014-10-02 13:43:37

Lastly, try opening the new file using your favorite text editor. You should see that the city directory has been filtered, and now contains only the lines with the word ‘Montgomery’. Here is my example:

Screenshot from 2014-10-02 13:56:59
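
One caution worth knowing about the ‘>’ symbol: it replaces the target file if one by that name already exists. The sketch below, using a small invented practice file and a hypothetical output name (Montgomery_sample.txt), shows ‘>’ alongside its sibling ‘>>’, which adds to the end of the file instead, and then counts the saved lines with wc.

```shell
# Assumption: a scratch folder and invented entries, so no real file is touched
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' \
  'Adams John, weaver, 12 Montgomery' \
  'Baker Ann, dressmaker, 40 Shackamaxon' \
  'Clark Wm., carter, Montgomery n Front' \
  'Davis Mary, widow, 9 Queen' > sample_directory.txt

# > creates Montgomery_sample.txt, or overwrites it if it already exists
grep Montgomery sample_directory.txt > Montgomery_sample.txt

# >> appends to the end of the file instead of replacing it
grep Shackamaxon sample_directory.txt >> Montgomery_sample.txt

# wc -l counts the saved lines; tr strips the padding some systems add
saved=$(wc -l < Montgomery_sample.txt | tr -d ' ')
echo "$saved lines saved"
cat Montgomery_sample.txt
```

If you plan to run several searches, giving each output file its own name avoids overwriting an earlier list by accident.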

That’s all! It wasn’t so bad, was it? If you’ve followed me this far on your Linux computer, you may skip the next section, which details the same process for Windows users. The commands are a little different. You might want to have a look anyway.

☞ Filter the city directory file using the Windows command line

Those of you who remember the days of DOS already know a little bit about the command line. You may have thought the old DOS prompt to be a thing of the past, but that is not so! All versions of Windows, including the flashy Windows 8, have the ability to issue commands from a DOS-like command line, just like in the good old days of the 1990s.

First, let’s open the text file in a text editor to familiarize ourselves with its contents. ‘Notepad’ is usually the preferred Windows program to open text files, but in our case, I find that the WordPad program handles the formatting better. To view the contents of the text file, right-click on its Desktop icon and choose ‘Open With … WordPad’. You should see something like this:

00o

000o

0000o

My goal is to filter this text file so as to view only those lines containing the street name ‘Montgomery’. To do this I will open the Windows command prompt and issue a few commands that interact with the city directory file. To access the command prompt in Windows, run the cmd command from the Start Menu, like so:

01o

If you are running Windows 8, simply type ‘cmd’ at the Start screen to achieve a similar result. For more information on how to access the command prompt in Windows 8, see this video.

Once you access the command prompt, use the cd command to change the working directory to the location in which you saved your file. I advise saving the file to your Desktop, so you can see the results of your commands appear on the screen as you work. In our case, the full command is cd Desktop. Just type that in and press ‘Enter’. To confirm that you have changed to the proper location, use the dir command. This command lists all of the files in the current location (which should be your computer’s Desktop). Look for your city directory text file in the output that follows. Scroll up the terminal window if you have to. If you see the city directory file in the terminal’s output, then you are ready to begin filtering. If you do not see your city directory file, then close and re-start the command prompt, and try a new location.

Here is the result of my commands:

02o

Once you are sure you are in the correct directory, you can issue the commands to filter the file. In Windows, the filtering command is findstr. To issue this command, just type it after the prompt, followed by the word you want to search for, followed by the name of the file you want to filter, followed by the Enter key, like so:

findstr "Montgomery" mcelroysphiladel1848amce_djvu.txt

Tip: You may use the Tab key to auto-complete long file names. Try it! Don’t worry if the command spills over onto a second line. That is very common. Just press the Enter key after the command and observe the results. The result should be that every line of text that includes the word ‘Montgomery’ is reproduced under your command. You may use your terminal’s scroll bar to review all of the results. Here is my output, after issuing the command:

03o

I’ve circled the command I issued in red. Notice below the command that the computer has filtered the contents of the city directory, and has reproduced only those lines containing the word ‘Montgomery’. Pretty cool, huh?

Suppose we want to save this list for future reference in a new text file. Our command requires only a slight modification. We use the greater-than symbol ‘>’ to specify output to a new file. Like so:

findstr "Montgomery" mcelroysphiladel1848amce_djvu.txt > Montgomery1848.txt

Under this command, your computer will not reproduce the results to your screen. Your computer will instead create a new text file on your Desktop and reproduce the results there. Here is a picture of my result. Notice that no new output follows the command I issued, but a new text file has appeared on my Desktop:

04o

Lastly, try opening the new file using WordPad. You should see that the city directory has been filtered, and now contains only the lines with the word ‘Montgomery’. Here is my example:

05o

That’s all! It wasn’t so bad, was it?

☞ Improve your results

Archive.org creates text files from scanned images using Optical Character Recognition (OCR) software. While the software greatly reduces the amount of work needed to create a useful text file, it is imperfect as a transcriber. The text files at Archive.org retain many errors.

The software reads the image exactly as it is on the page. Therefore, if a word is hyphenated in the original book, then the hyphenation remains in the text file. This is especially common where the original text columns are narrow, and the word you are searching for is long. When searching a city directory for a word as long as ‘Montgomery’, you may wish to shorten your search term to its first few letters. Notice how my results improve by searching for ‘Montg’, rather than ‘Montgomery’:

Screenshot from 2014-10-04 09:40:10

Also, try searching for the last syllable, for those cases in which a scanner error appears at the beginning of the word. (Tip: This also holds true when performing searches on the Archive.org web site: Less is more!)
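
To see why the shorter search terms help, here is a minimal sketch. It assumes a practice file of invented lines into which I have deliberately planted the kinds of damage described above: a word broken by a hyphen at the end of a line, and two misread letters. These are illustrations of mine, not lines copied from the real directory.

```shell
# Assumption: invented entries with planted OCR-style damage
workdir=$(mktemp -d)
sample="$workdir/sample_ocr.txt"
printf '%s\n' \
  'Adams John, weaver, 12 Montgomery' \
  'Clark Wm., carter, Montg-' \
  'omery n Front' \
  'Ford Jas., smith, 20 Montgomcry' \
  'Hall Saml., mason, 7 Mcntgomery' \
  'Davis Mary, widow, 9 Queen' > "$sample"

# The full word finds only the one undamaged entry
full=$(grep -c Montgomery "$sample")

# The first few letters also catch the hyphenated and misspelled entries
short=$(grep -c Montg "$sample")

# The end of the word catches an error at the beginning of the word
tail_end=$(grep -c gomery "$sample")

# Each -e adds another search term; a line matching any of them is kept
either=$(grep -c -e Montg -e gomery "$sample")

echo "full word: $full, start: $short, end: $tail_end, start or end: $either"
```

A shorter term will also pull in a few unrelated lines on a real directory, so expect to skim the results by eye.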

Lastly, notice that some special characters will appear differently so as not to conflict with the computer code used to make the Archive.org web site. For example, the ampersand character, ‘&’, appears in the text file as ‘&amp;’. My advice would be to avoid using special characters in your search. If you must include them, browse the text file first to see how they are represented.
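
If you do need to search for an ampersand on Linux, here is a minimal sketch. It assumes the character is stored in the text file as the HTML entity ‘&amp;’, and it uses two invented practice lines. The single quotes matter, because the terminal otherwise treats ‘&’ and ‘;’ as instructions to itself; the -F option tells grep to take the search term literally.

```shell
# Assumption: invented entries; the ampersand is stored as the entity &amp;
workdir=$(mktemp -d)
sample="$workdir/sample_amp.txt"
printf '%s\n' \
  'Smith &amp; Jones, grocers, 14 Montgomery' \
  'Taylor Robt., clerk, 8 Montgomery' > "$sample"

# Single quotes keep the shell from interpreting & and ;
# -F makes grep treat the search term as plain text
entity=$(grep -c -F '&amp;' "$sample")

# Searching for the name as printed in the book finds nothing here;
# grep signals "no match" with a failure status, hence the || true
plain=$(grep -c -F '& Jones' "$sample" || true)

echo "as stored: $entity, as printed: $plain"
```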

☞ (Don’t) Use Multiple Word Searches

If you are searching for a unique name or address, such as ‘19 Montgomery’, or ‘Ludlow Martha’, I believe that the search tools available on the Archive.org web site should suffice. The results in these cases are likely to be either few in number, or concentrated on one page for easy viewing. The method I described here is best used to filter a somewhat lengthy list of persons of interest who appear scattered throughout the directory. Searching for a street name, such as ‘Montgomery’, or a woman’s first name, such as ‘Martha’ (perhaps to generate a list of potential married names), is appropriate, but using a multiple word search with these commands to find essentially unique entries seems like overkill to me.

If you really must filter the city directory using a multiple word search, then the commands above will require a little modification.

In Linux: To filter the city directory so it shows all entries containing the exact phrase ‘29 Montgomery’, put quotes around the exact phrase, e.g.:

cat mcelroysphiladel1848amce_djvu.txt | grep "29 Montgomery" > Montgomery1848.txt

To filter the city directory so it shows all entries containing either ‘29’ or ‘Montgomery’, use this form:

cat mcelroysphiladel1848amce_djvu.txt | grep ' 29 \| Montgomery' > Montgomery1848.txt
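
Here is a minimal sketch of both kinds of search on Linux, assuming a practice file of invented entries. The backslash-and-pipe form of ‘or’ is, as far as I know, an extension found in GNU grep, so the sketch uses two spellings that should behave the same on other systems too: repeating -e once per term, and -E with a plain pipe between the terms. It also shows how to keep only the lines that contain both terms by running grep twice.

```shell
# Assumption: invented entries, not real 1848 data
workdir=$(mktemp -d)
sample="$workdir/sample_multi.txt"
printf '%s\n' \
  'Adams John, weaver, 29 Montgomery' \
  'Baker Ann, dressmaker, 29 Shackamaxon' \
  'Clark Wm., carter, 12 Montgomery' \
  'Davis Mary, widow, 9 Queen' > "$sample"

# Exact phrase: the quotes keep the two words together as one search term
phrase=$(grep -c "29 Montgomery" "$sample")

# Either term: one -e per term (the spaces keep 29 from matching 129 or 290)
either_e=$(grep -c -e ' 29 ' -e ' Montgomery' "$sample")

# Either term again: -E lets a plain | mean "or" inside the quotes
either_E=$(grep -c -E ' 29 | Montgomery' "$sample")

# Both terms, in any order: filter once, then filter the result again
both=$(grep ' 29 ' "$sample" | grep -c ' Montgomery')

echo "phrase: $phrase, either: $either_e, both: $both"
```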

In Windows: To filter the city directory so it shows all entries containing the exact phrase ‘29 Montgomery’, use this form:

findstr /c:"29 Montgomery" mcelroysphiladel1848amce_djvu.txt > Montgomery1848.txt

To filter the city directory so it shows all entries containing either ‘29’ or ‘Montgomery’, use this form:

findstr "29 Montgomery" mcelroysphiladel1848amce_djvu.txt > Montgomery1848.txt

Again, I don’t think I’d bother, but if you must, that’s how to do it.

☞ Conclusion

Alas, no obvious relatives of young William Pickersgill jumped out at me after this search and filter, but I now have handy a concise list of “persons of interest” who lived on Montgomery Avenue.

I hope you enjoyed this little tutorial, and that you learned a little bit about the command line in the process. If you come across any trouble filtering your city directories, remember that Google is your friend! If you still have trouble filtering, post a comment below or send me an e-mail. I’d like to hear from you!
