Wednesday, January 25, 2012

Authority Control Matters

Authority control: The procedures by which consistency of form is maintained in the headings (names, uniform titles, series titles, and subjects) used in a library catalog or file of bibliographic records through the application of an authoritative list (called an authority file) to new items as they are added to the collection. Authority control is available from commercial service providers.  (Online Dictionary for Library and Information Science http://www.abc-clio.com/ODLIS/odlis_A.aspx)

Authority Control is a necessary part of a library catalog.  And it sure helps with any large database.  If I ever doubted the value of authority control (I never have), working on this database would be enough to convince me of its importance.  I'll explain.

Since my last post regarding the multitude of problems I was encountering I discussed the issues with a friend who works in computers and databases and she made a number of wonderful suggestions.  She also is going to help me with an aspect of moving the database once I get it cleaned up.

Speaking of cleaning it up ... wow, what a mess!!

After discussing this with my friend, I am no longer trying to move everything from multiple lines into one line as I was doing before.  That was taking way too long!  After two weeks on this project, I had managed to combine about 60 lines into 10 records.  I have a whole new approach now.  Currently I am going down the "Name" column only.  The goal is to separate out the individual names (remain in the column) from the ensemble names (put in a new column) from other phrases that are most likely titles (another new column).

Sounds easy, right?  Mostly it is.  I have managed to get down to around line 7000 of the 10,100+ line spreadsheet.  That's big progress!

It is enlightening to see how incredibly inconsistently names were entered into this database.  Notice in the definition I quoted at the beginning of this post that authority control requires "consistency of form."  Obviously that wasn't a concern with this database ... ever.

Just today I found a recital that consisted of about 10 lines of data (i.e. 10 pieces performed).  The same name appeared in all 10 rows of the column, but in about three different forms.  Just as an example:

  • Last, Matthew R.
  • R. Matthew Last, piano
  • Last, R. Matthew

Hmmm.  So my first question: why is the person's instrument listed in some places and not others?  Second, is the initial a first initial or a middle initial?  And finally, could they not decide if the name should be listed last name-comma-first or first-last?

The order of the names is constantly changing as I go down the list.  The addition of instrument or voice part is also inconsistent.  It seems to me that there is a tendency to prefer last name-comma-first name unless there is an instrument name added on, in which case it becomes first name-last name-comma-instrument/voice.  But not always.  Another issue is nicknames: sometimes they are used (Jim or Ben) and sometimes not (James or Benjamin), and it is obvious that it is the same person.
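For anyone curious how that tendency could be handled by a script rather than by hand, here is a minimal Python sketch.  It is not what I am actually doing (I'm working directly in the spreadsheet), and everything in it is an assumption for illustration: the function name split_name, the tiny INSTRUMENTS list, and the rule that the last word of a first-last name is the surname.

```python
# Hypothetical list of instrument/voice terms; a real one would be much longer.
INSTRUMENTS = {"piano", "clarinet", "violin", "guitar", "soprano", "tenor"}


def split_name(raw):
    """Return (name in 'Last, First' order, instrument or None)."""
    text = raw.strip()
    instrument = None

    # A trailing ", <instrument>" marks the first-last-comma-instrument pattern.
    if "," in text:
        head, tail = text.rsplit(",", 1)
        if tail.strip().lower() in INSTRUMENTS:
            text = head.strip()
            instrument = tail.strip().lower()

    if "," in text:
        # Already in last name-comma-first name order.
        last, first = [part.strip() for part in text.split(",", 1)]
    else:
        # First-last order: assume the final word is the surname.
        first, _, last = text.rpartition(" ")

    return f"{last}, {first}".strip(", "), instrument


print(split_name("R. Matthew Last, piano"))
print(split_name("Last, R. Matthew"))
print(split_name("Last, Matthew R."))
```

Note what this does not solve: "Last, R. Matthew" and "Last, Matthew R." still come out as two different strings, and Jim versus James is untouched.  Deciding that those are the same person is exactly the authority work, and that still takes a human and an authority file.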

Oh, authority control, how I miss you!

There are similar issues with ensemble names.  Some are more complete than others.  Sometimes ensembles are combined into one makeshift name, sometimes they are just listed together.

The one thing I haven't bothered with yet is lines that have two or more names or two or more ensembles.  I'm going to have to add some new columns.  I'm also not bothering with changing the names all into the same format or dealing with inconsistent punctuation.  For now, I just want to separate everything out and then I can go back through and do all those little details.  

I'm considering going on to the notes column and then coming back to the names once that is done.  Mostly because I'm expecting some duplicate info in the notes area.  That will be something to evaluate once I get done with this current column.  

Only about 3000+ lines to go!  

Thursday, January 12, 2012

Problems, problems, and more problems

There are lots of problems in the spreadsheet I am working off of.  I knew there would be, but I'm realizing that it is even worse than I could have imagined.

I've had various people over the years work on this spreadsheet, but at the time I didn't know that each recording was represented by multiple rows in the spreadsheet.  So they would sort the sheet by composer name and then be able to edit all the names to the authorized heading and only have to look up the composer once since they were listed in alphabetical order.  Sounds like a great plan, right?

So now the spreadsheet is organized by date (since that is the only way to know which rows belong together) and the composers are organized alphabetically within the concert or recital date.  Thus, I don't know the actual order in which the works were performed.

Why does this matter?  Because when we catalog sound recordings we list the works in a contents note in the order in which they were performed.  And if there are multiple performers who played on some works but not on others, we list which work they performed on by number, e.g. Kerri Baunach, clarinet (3rd work), in a performers note.  I have no way of knowing that info at this time, so I am just listing the various performers or groups of performers in no particular order.
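To show what that performers note looks like when the performance order IS known, here is a small Python sketch.  The only part taken from real practice is the "name, instrument (3rd work)" pattern above; the function names, the input layout, and the "; " separator between performers are assumptions made up for this example, not a statement of any cataloging rule.

```python
def ordinal(n):
    """Turn 1, 2, 3, 11 into '1st', '2nd', '3rd', '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def performer_note(performers):
    """Build a note from (name, instrument, work_numbers) tuples.

    An empty work_numbers list means the performer played on every work,
    so no parenthetical is added.
    """
    parts = []
    for name, instrument, works in performers:
        entry = f"{name}, {instrument}"
        if works:
            label = "work" if len(works) == 1 else "works"
            numbers = ", ".join(ordinal(n) for n in works)
            entry += f" ({numbers} {label})"
        parts.append(entry)
    return "; ".join(parts)


print(performer_note([("Kerri Baunach", "clarinet", [3])]))
```

With the spreadsheet sorted alphabetically by composer, the work numbers are precisely the piece I can't fill in, which is why my notes currently leave the parenthetical out.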

As a result, my performer notes aren't that helpful at the moment.  And my contents notes appear as if the works were all performed in alphabetical order according to the composer's name.  If you're a bit OCD, that might seem kind of cool.  For this OCD cataloger, it's not.

Other problems:

  • Misspellings in names of performers or groups
  • Inaccurate title info (how about Sarabande for guitar ensemble by J.S. Bach.  Or Fugue for guitar ensemble by Handel.  Seriously, no arrangers listed, no further title info to let me know WHICH sarabande or WHICH fugue.)
  • Incomplete title info (kind of goes along with the point above)
  • I have no idea which concerts or recitals have programs and which do not (I may need to contact the School of Music and spend some time with a scanner, which means going to the office and finding a babysitter.)
  • The authority work

As for dealing with the spreadsheet itself:
I scrolled down to the bottom of the sheet and found that I have about 10160+ rows of information.  I have so far converted the first 57 (well 56, row 1 is column headings) into 10 rows on a new spreadsheet that would create 10 MARC records.

That doesn't sound like much when I look at those numbers, but really ... that did take me a long time.  Refer back to the problems I'm dealing with.  Those problems are on each performance.  Every. Single. One.

On the spreadsheet, I've created four new tabs.  One for the student recitals and one for the ensemble concerts.  Since I'm creating columns for each MARC field, it seemed easier to do it this way so there was only one 1XX field in each spreadsheet: 100 on the student recitals and 110 on the ensembles.  Then I created a tab as a "transfer" space.  This is just for me to copy the info for the recording I'm currently working on over to this space so I can see it better.  It was getting hard on the master list to see just the parts I needed and I kept losing my place, thus wasting time.  It's working well so far.  Finally, the fourth tab I added today as a place to list the date of recordings where there is insufficient info and what that insufficient information is.  This will help when I have to go back and fix things; it should be easier to locate the problem items.

As for the original spreadsheet, I'm not changing anything on it.  I'm keeping it as a master in case I mess something up somewhere and need to refer back to something.  I've made the date column bold and as I complete a recording I un-bold those dates.  That way I can keep track of where I am visually, especially since I am working in short stints and sometimes have to walk away in the middle of something.  Seven-month-old babies don't like to be kept waiting.

I also now have MarcEdit on my work computer and someone sent me a link to a tutorial on YouTube.  So, a new part of the sabbatical project: learn how to use MarcEdit and transfer all these records I'm working on into the actual MARC format.

Lots going on.  Problems galore, lots of cutting and pasting between spreadsheets, heavy use of the authority file, and eventually learning a new program.  On top of that, I really have to figure out a better schedule!

Wednesday, January 4, 2012

The Sabbatical Starts

The Sabbatical Project has officially started!  It's nice to work at a more leisurely pace and be able to focus on one thing rather than juggling a gazillion responsibilities (seemingly).

The biggest challenge is just figuring out a schedule.  I'm doing this sabbatical project with two babies at home with me, currently 7 months old.  Today was tough: we got a little off their schedule, so they were each sleeping and eating at different times from each other.  Not good!  On the positive side, that rarely happens, so I have hope that we'll be back on track tomorrow.

I have discovered that I can hold a baby in one arm, hold a bottle in that same hand, and have the other hand free to check email, type, and search other institutions' OPACs (ooooh, a new term for my Definitions page!).  That worked for one baby, not so much with the second baby.

One thing I did today was search for recital cataloging at other institutions.  This helped give me a better idea of what I really need to do with the data we have.  Biggest discovery: the data we have totally sucks.

The second thing today was looking at the data in our spreadsheet (used to be in a database format that is no longer supported, thus the spreadsheet).  Literally "looking."  I had a baby in my arms that was fascinated by the laptop keyboard.  I was a bit concerned looking at the first several lines of the spreadsheet.  I had to email the manager of our Fine Arts Media Center to see if programs were available for a couple of mid-80s recitals or concerts so I could make sense of what I was looking at.  He provided the program for one and described the two cassette tapes for the others, which helped answer my questions.

(Should I add "Cassette tape" to my list of definitions?  It was recently brought to my attention that there is now a generation of people who don't know what a cassette tape is.  Wow, I'm getting old!)

Lastly, just a note about the spreadsheet I'm working on.  Each performed piece of music is listed on a separate row of the spreadsheet.  Each entry contains the recital date, the student's name, and then the composer and title of the piece and any notes (all in one cell, by the way).  That's it.

So imagine a student who gives a senior recital and performs 5 pieces.  Then they stay to get a master's degree and they give another recital performing another 5 pieces.  That's ten lines on the spreadsheet that will contain their name.  The date is the ONLY way I am able to tell which recital pieces go together; it is THE most important piece of information I have.

And then you have lines with the same date but different performers.  Oy!  Two different recitals on the same day?  One recital with different performers?
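Since the date is the only thing tying rows together, the grouping itself is simple to sketch in Python.  This is an illustration under assumptions, not my actual workflow: the rows below are invented placeholders (Student A, Composer X), the three-column layout is simplified, and the function names are made up.

```python
from collections import defaultdict

# Made-up placeholder rows: (date, performer, composer/title/notes cell).
rows = [
    ("1985-03-02", "Student A", "Composer X. Piece 1"),
    ("1985-03-02", "Student A", "Composer Y. Piece 2"),
    ("1985-03-09", "Student B", "Composer X. Piece 3"),
    ("1985-03-09", "Student C", "Composer Z. Piece 4"),
]


def group_by_date(rows):
    """Collect every row that shares a date into one list."""
    groups = defaultdict(list)
    for date, performer, piece in rows:
        groups[date].append((performer, piece))
    return dict(groups)


def dates_needing_review(groups):
    """Dates with more than one distinct performer string.

    These could be two recitals on one day or one recital with several
    performers; the data alone cannot say which.
    """
    return [
        date
        for date, items in groups.items()
        if len({performer for performer, _ in items}) > 1
    ]


groups = group_by_date(rows)
print(dates_needing_review(groups))
```

A check like this can only flag the ambiguous dates.  Resolving them still means finding the printed program or asking someone who can look at the tape, and inconsistent name forms would cause false flags until the names are cleaned up.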

I may be taking more trips into the office than I originally thought I would.  It'll be good for the babies to get out.

Tomorrow's goal: Set up a second tab on the spreadsheet for editing purposes.  I'm also considering moving all the info for each recital or concert into one row, rather than multiple rows.  And then maybe also separate out large ensemble concerts from the student recitals (more tabs).  I feel like this week is mainly about realistic organizing (as opposed to the previous planning I did); I'm getting a feel for how this is really going to work and what really needs to be done.  It's already looking a little different than I had thought it would.