Wednesday, January 25, 2012

Authority Control Matters

Authority control: The procedures by which consistency of form is maintained in the headings (names, uniform titles, series titles, and subjects) used in a library catalog or file of bibliographic records through the application of an authoritative list (called an authority file) to new items as they are added to the collection. Authority control is available from commercial service providers.  (Online Dictionary for Library and Information Science http://www.abc-clio.com/ODLIS/odlis_A.aspx)
Authority Control is a necessary part of a library catalog.  And it sure helps with any large database.  If I ever doubted the value of authority control (I never have), working on this database would be enough to convince me of its importance.  I'll explain.

Since my last post regarding the multitude of problems I was encountering, I discussed the issues with a friend who works in computers and databases, and she made a number of wonderful suggestions.  She is also going to help me with an aspect of moving the database once I get it cleaned up.

Speaking of cleaning it up ... wow, what a mess!!

After discussing this with my friend, I am no longer trying to move everything from multiple lines into one line as I was doing before.  That was taking way too long!  After two weeks on this project, I had managed to combine about 60 lines into 10 records.  I have a whole new approach now.  Currently I am going down the "Name" column only.  The goal is to separate out the individual names (remain in the column) from the ensemble names (put in a new column) from other phrases that are most likely titles (another new column).
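For anyone facing a similar cleanup, here is a minimal Python sketch of that three-way split. It is only a rough first pass under loud assumptions: the keyword list, the regular expression, and the function names `classify` and `split_name_column` are all made up for illustration, and the sample ensemble and title in the usage note are invented, not taken from the actual database.

```python
import re

# Hypothetical keyword list; a real one would come from scanning the data.
ENSEMBLE_WORDS = {"ensemble", "choir", "orchestra", "quartet", "band", "trio"}

# Rough shape of a personal name: 2 to 4 capitalized words or initials,
# optionally separated by a comma (e.g. "Last, R. Matthew").
PERSON_RE = re.compile(r"^[A-Z][\w'.-]*(?:,? [A-Z][\w'.-]*){1,3}$")


def classify(cell):
    """Guess whether a Name cell holds a person, an ensemble, or a title."""
    text = cell.strip()
    words = {w.strip(",.").lower() for w in text.split()}
    if words & ENSEMBLE_WORDS:
        return "ensemble"
    # Drop a trailing ", piano"-style lowercase suffix before testing the shape.
    name_part = re.sub(r",\s*[a-z][a-z -]*$", "", text)
    if PERSON_RE.match(name_part):
        return "person"
    # Anything else is assumed to be a title and needs a human look.
    return "title"


def split_name_column(cells):
    """Return one dict per cell with the text placed in its guessed column."""
    rows = []
    for cell in cells:
        row = {"person": "", "ensemble": "", "title": ""}
        row[classify(cell)] = cell.strip()
        rows.append(row)
    return rows
```

Calling `split_name_column(["Last, R. Matthew", "University Jazz Ensemble", "Sonata in A major"])` would put the first value in the person column, the second in the ensemble column, and the third in the title column. A short capitalized title such as "Moonlight Sonata" would be mistaken for a person, so the output is a starting point for manual review, not a replacement for it.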

Sounds easy, right?  Mostly it is.  I have managed to get down to around line 7000 of the 10,100+ line spreadsheet.  That's big progress!

It is enlightening to see how incredibly inconsistently names were entered into this database.  Notice in the definition I quoted at the beginning of this post that authority control requires "consistency of form."  Obviously that wasn't a concern with this database ... ever.

Just today I found a recital that consisted of about 10 lines of data (i.e., 10 pieces performed).  The same name appeared in all 10 rows of the column, but in about three different forms.  Just as an example:

  • Last, Matthew R.
  • R. Matthew Last, piano
  • Last, R. Matthew
Hmmm.  So my first question: why is the person's instrument listed in some places and not others?  Second, is the initial a first initial or a middle initial?  And finally, could they not decide if the name should be listed last name-comma-first or first-last?

The order of the names is constantly changing as I go down the list.  The addition of instrument or voice part is also inconsistent.  It seems to me that there is a tendency to prefer last name-comma-first name unless there is an instrument name added on, in which case it becomes first name-last name-comma-instrument/voice.  But not always.  Another issue is nicknames: sometimes they are used (Jim or Ben) and sometimes not (James or Benjamin), and it is obvious that it is the same person.
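The patterns above can be sketched in a few lines of Python. This is a hedged illustration, not a finished tool: `parse_name`, `match_key`, and the tiny `NICKNAMES` table are hypothetical names introduced here, and the sketch assumes an instrument or voice part is always a lowercase phrase after the final comma. It cannot answer whether an initial is a first or middle initial, so the match key simply ignores word order in the given names and leaves that decision to a person.

```python
import re

# Hypothetical nickname table; a real one would be built by hand from the data.
NICKNAMES = {"jim": "James", "ben": "Benjamin"}


def parse_name(raw):
    """Split one name string into last name, given names, and instrument."""
    text = raw.strip()
    instrument = ""
    # Assumption: a trailing ", piano"-style suffix starts with a lowercase letter.
    m = re.match(r"^(.*),\s*([a-z][a-z -]*)$", text)
    if m:
        text, instrument = m.group(1).strip(), m.group(2).strip()
    if "," in text:
        # "Last, R. Matthew" form
        last, given = [part.strip() for part in text.split(",", 1)]
    else:
        # "R. Matthew Last" form: the final word is taken as the last name
        given, _, last = text.rpartition(" ")
    given_words = [NICKNAMES.get(w.lower(), w) for w in given.split()]
    return {"last": last, "given": " ".join(given_words), "instrument": instrument}


def match_key(raw):
    """Key for grouping likely variants of one person, ignoring given-name order."""
    parsed = parse_name(raw)
    given = tuple(sorted(w.strip(".").lower() for w in parsed["given"].split()))
    return (parsed["last"].lower(), given)
```

With this sketch, "Last, Matthew R.", "R. Matthew Last, piano", and "Last, R. Matthew" all produce the same `match_key`, so they can be pulled together for review. Two different people who share a last name and the same given names would also collide, which is one reason a real authority file is maintained by people and not by a script alone.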

Oh, authority control, how I miss you!

There are similar issues with ensemble names.  Some are more complete than others.  Sometimes ensembles are combined into one makeshift name, sometimes they are just listed together.

The one thing I haven't bothered with yet is lines that have two or more names or two or more ensembles.  I'm going to have to add some new columns.  I'm also not bothering with changing the names all into the same format or dealing with inconsistent punctuation.  For now, I just want to separate everything out and then I can go back through and do all those little details.  

I'm considering going on to the notes column and then coming back to the names once that is done.  Mostly because I'm expecting some duplicate info in the notes area.  That will be something to evaluate once I get done with this current column.  

Only about 3000+ lines to go!  
