Friday, August 17, 2012

And then one day ...

... someone comes to your desk and tells you they just discovered a few boxes containing probably a couple hundred DAT tapes of School of Music recitals and concerts!!!

Oh yes, this was today. My first question was:

Do we support DAT?

Why yes we do. We have two players. Now to figure out if these are duplicates of other tapes or brand new.

Glad I hadn't gotten into the '90s yet with my revising of the spreadsheet. Ah, the joy of working in a library. You just never know what might turn up!

Friday, June 29, 2012

Cataloging Templates

It is the last official day of my sabbatical and I managed to put together the beginnings of two templates for cataloging recital and concert recordings. I decided to go with two because of the differences in some of the necessary fields between individual student recitals and concert recordings of larger ensembles. Even recitals featuring more than one performer were distinctly different from ensemble performances, and concert performances by a chamber ensemble were different again from student, faculty, or guest artist recitals.

The main differences were in the fields for access points, the 1xx and 7xx fields. Also as I worked through the spreadsheet I came up with a format for titles of the recordings and these were different between the ensembles and the recitals. Overall, it just made sense to go with two templates.

I had questioned creating templates in our local catalog system (Voyager) versus creating them as constant data files in OCLC. Since OCLC is more likely to be around long term and Voyager is not, I opted for OCLC. The only complication in this is that I have not discussed with a few key people at my institution whether we want to put these records in OCLC or not. My opinion on the matter used to be to catalog locally except for maybe ensemble concerts and/or graduate recitals, but a late great music cataloger once argued for the importance of adding this kind of material to OCLC, and that convinced me it was the better way to go. So for now, my templates are living in OCLC as constant data files.

This is what I have done so far:
Template for recital recordings
I am defining recital recordings as any performance featuring one or more individual performers of works mostly for solo instrument. Recitals may include chamber pieces. Some exceptions: chamber recitals in which there is still one primary performer who performs on all works while the other members of the ensembles change. Also lecture recitals which may include a number of performers but the main thrust of the recital is still focused on one individual.

In the template above I have the main descriptive and access fields. I have not yet dealt with the fixed fields or the 007 field. There are other 0xx fields I may add as well. In the rest of this record anywhere that I have text bracketed and in all caps is where I expect information to go. You'll notice in the 245 that I have left the statement of responsibility area blank. I'm still trying to decide how I am going to set that up in conjunction with the 100 and the title portion of the 245. The 260 includes only the year since we are dealing with unpublished material. But that leads me to another question I need to answer as well: I did code the Country code in the fixed fields for "xx" since the 260 does not include a place of publication. However, you'll notice the codes I added to the 033 field which do indicate location. I'm thinking I can use those because the location of performance is cited in the note field and the 033 is linked more so to that than the publication info.  Am I correct on that?

Moving along, I have a standard 300 description field followed by my notes. Compact disc note first and then a note for "Program available." That is a placeholder. I hope to have a more detailed note about the program once we make a decision on how to house the programs. I have also set up notes in which to add performers, date, time, and place of performance, and a contents note. Finally I've stuck in a couple places for additional performers to be listed.

I skipped subject headings (6xx fields) for now, but I do plan on having them there. Those will differ for each recording, so I do not expect that a student worker or circulation staff member will be adding those fields. So having them in the template was not important to me.

The next step will be putting together instructions for how to properly enter info into these templates.
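
Since every bracketed, all-caps stretch in a template marks a spot that still needs information, one possible aid for whoever fills in a record is a quick check for placeholders that were missed. The following is only a sketch under stated assumptions: the template lines, the placeholder wording, and the function names are invented for illustration and are not the actual OCLC constant data records.

```python
import re

# Hypothetical template lines. The tags are fields mentioned in the post,
# but the placeholder wording here is invented for illustration.
TEMPLATE = [
    "100 1  [LAST NAME, FIRST NAME]",
    "245 10 [TITLE]",
    "260    [YEAR]",
    "518    Recorded [DATE] at [PLACE].",
]

# A placeholder is bracketed text written entirely in capitals.
PLACEHOLDER = re.compile(r"\[[A-Z][A-Z ,/]*\]")


def unfilled(lines):
    """Return (tag, placeholder) pairs for every placeholder still present."""
    found = []
    for line in lines:
        tag = line[:3]
        for match in PLACEHOLDER.findall(line):
            found.append((tag, match))
    return found


def fill(lines, values):
    """Replace each placeholder that has an entry in values; leave the rest."""
    filled = []
    for line in lines:
        for placeholder, text in values.items():
            line = line.replace(placeholder, text)
        filled.append(line)
    return filled
```

Calling `unfilled(fill(TEMPLATE, {...}))` after data entry would list the fields that still hold a placeholder, which could serve as a checklist before the record is passed on for authority work.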

For comparison purposes, here is the template for ensemble concert recordings:
Template for ensemble concert recordings
I am defining ensemble concert recordings as any recording of a performance by a named ensemble, whether a chamber ensemble of 3-5 players or a large group such as a concert band, symphonic orchestra, or opera production. The individual performers may or may not be named. Generally, conductors and soloists are listed alongside the group names.

Finally, I need to put together instructions. These will include things like how to put a person's name in (last, first), how to structure the 505/Contents field and what information it includes, how to add additional notes, how to create the title, and who to put where when there are many names present. Also little things like how to add a field and other technical stuff.

As I worked through the spreadsheet and then created these templates I can envision now how these records will really get created. I can see a student worker or a circulation staff member getting these recordings, entering the descriptive information and the names for the access points. I can even see training a full-time staff member to do basic searches in the LCNAF on OCLC to get the proper headings for personal and corporate names that may be there. The records would then be saved in an online file in OCLC and the recording and program passed on to either myself or my cataloging staff member for us to do the rest of the authority work and add subject headings as well as checking the overall work in the rest of the record. One of us could also add it to OCLC and our local system or maybe we pass it back to the circulation department for them to handle. Not sure yet of that last step.

And that is all for now! Since there are still many open parts of this project I will continue to keep up this blog as work on this project continues. So stay tuned.

Image source morgueFile

Thursday, June 28, 2012

Floundering

I feel a little like I am floundering. I have done a lot on this project, but every time I look at my list of what I wanted to accomplish I feel like I barely scratched the surface.

I have one day left after today, so I know I'm not going to suddenly accomplish everything in a day and a half. But I feel like I should still be doing something. I just can't figure out what.

I could work on the revisions to my survey based on the feedback I received. But it feels overwhelming every time I look at it.

I could work on creating templates, and did a little bit. But I hate trying to create something where someone else can just plug in information even though the information may not go neatly into a template without some educated decisions.

Which led me to thinking that I could write instructions on how to use the template and how to make decisions on what information to use and how and what not to use. But I again get overwhelmed thinking about whether or not I can anticipate every scenario.

What I did do today, besides starting a brief OCLC Constant Data file for recital recordings:

  • Did some more [very basic] internet searches on other institutions digitizing their recital and concert recordings
  • Emailed the MLA-L list to ask if anyone worked at an institution that had comprehensive guidelines for what information students supply for their recital programs
  • Studied my five point list of what my project was going to accomplish (leading me to write this post)
And that was it.  I still have a good chunk of the day left, but I'm not feeling motivated right now.

Most likely tomorrow I'll no longer be floundering, I'll start panicking instead and second guessing everything about the last six months.

Image source morgueFile

Tuesday, June 26, 2012

Coming to a Close ...

Not my desk, but the idea is close
... but not really.

Yes, the sabbatical period is coming to a close very soon, but this project will go on.  It has to, it is so far from being complete.  "Complete" being a relative term in librarian-ese.

For now, in my last week working on this project full-time, I wanted to give a quick update on where the spreadsheet stands.

Let's recap: the spreadsheet was a total of 10,123 rows of data.  Each row represented one specific musical work performed on a recital or concert recording.  Thus one particular recital or concert could be made up of anywhere from 1 row to 20 rows (or more in a few select cases).

In an effort to get a small portion of the work done and be able to do something with it, I ended up stopping my work at the end of 1989.  That gave me roughly 8 years of data (1982-1989).  The data is sporadic in the first few years, but 1986-1989 are pretty complete.

The years 1982-1989 represented 1,789 rows on the spreadsheet, 17.7% of the total.

I copied the years 1982-1989 over to a new spreadsheet so I could play with the data without messing up the entire spreadsheet.  In the new spreadsheet file I separated this chunk of years by "Student, faculty, & guest recitals" and "Ensemble concerts."  This allowed me to delete columns in one or the other that no longer applied to that chunk of data; for example, the 100 field from the Ensemble list and the 110 field from the Recitals list.

Recitals came out to 886 rows of data or 49.5% of the total of 1982-1989 data.  Ensemble concerts came out to 903 rows of data or 50.5% of the total.  Pretty much an even 50/50 split.
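
For anyone who wants to check the arithmetic, the percentages quoted above can be reproduced from the row counts alone. This is just a small sketch; the constant and function names are made up for the example.

```python
TOTAL_ROWS = 10_123      # rows in the full spreadsheet
ROWS_1982_1989 = 1_789   # rows copied into the new spreadsheet
RECITAL_ROWS = 886
ENSEMBLE_ROWS = 903


def share(part, whole):
    """Percentage of whole represented by part, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The two categories should account for every row in the 1982-1989 chunk.
assert RECITAL_ROWS + ENSEMBLE_ROWS == ROWS_1982_1989

print(share(ROWS_1982_1989, TOTAL_ROWS))     # 17.7
print(share(RECITAL_ROWS, ROWS_1982_1989))   # 49.5
print(share(ENSEMBLE_ROWS, ROWS_1982_1989))  # 50.5
```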

I then went through the list one last time to put in more uniformity.  One of the problems with working on a large set of data over a long period of time is that as you go along and get more experience you make decisions that you didn't think of earlier on in your work.  By the time I got to the end I had a rhythm established and knew how I wanted things to look.  So going back to the top and revising one last time was important for me to have the uniformity across the data.  Especially as I consider making templates for future data.

The templates were constantly on my mind.  Things have to be consistent if you want to be able to create a template for someone else to use who doesn't have the experience and can't make cataloging decisions.

Thankfully, the process of going back over those 8 years of data went pretty quickly.  Having the data separated into recitals in one place and ensembles in another made a big difference; I didn't have to constantly switch gears.  And uniformity came much easier then.

Titles (245) were a big issue in the Ensemble list.  For some reason I had left this out all the way through.  But I came up with an easy solution and it didn't take long to stick titles in for each recording.

Programs are going to be an issue down the road.  Available programs were pretty much non-existent until around 1986, but remain sporadic until around 1988.  I did not request copies of every program, that would have been too time consuming, but did request one when the info in the spreadsheet was confusing or needed to be better sorted out.  Programs will have to be addressed at a later time.

Image source: MorgueFile

Tuesday, June 12, 2012

In the Home Stretch

I can't believe this sabbatical time is almost to a close.  In a little less than three weeks I will be back at work and any further work on this project will become just a small part of my regular job again.  There is SO MUCH work that still needs to be done!!  I've done a lot but a lot more could be done.

Here's a quick summary of what I have tackled and hope to accomplish in the next few weeks.

The largest part of the project has been the database itself.  This is the main instigator of doing this project in the first place.  I have handled all the composer names and cleaned up a lot of other names (performers, arrangers, etc.) throughout most of the database: composers are done all the way through; other names are semi-done.

Currently I'm working through the notes, which has been way more of a headache than I anticipated.  But in the last few weeks I've gotten into a rhythm with those.  Basically I am taking all the notes and getting rid of duplicate information and condensing all the note info onto one line for each recording.  This is in contrast to the current set up where the notes appear on every line.  I had originally thought that they were all duplicates and that all I had to do was delete all lines but one and then arrange them appropriately into the columns I had established (general note, performers note, location note, etc.).  What a silly assumption!  Many of those lines of notes were specific to the piece that line represented.  So this is what I have done:

  • Find something in the notes section that can be or seems to represent a title for the whole recording and cut and paste it into the title column.  Delete all other instances of that title.
  • Move all performers into one line regardless of which pieces they do or do not perform on.  IF a program is available, designate which pieces individuals perform on within that note.  List each name or group once and delete all other instances of the name.
  • Move any other general notes into a general note field.  Delete all other instances of that note.
  • If a program is available create a time and place note (518) for the performance and recording.
  • If there is a note on a student recital regarding the recital being "in partial fulfillment of" a certain degree type, put it in a general note.  Delete all other instances of the note.  [May reconsider moving these notes to a 502 note later, for now keeping them general (500) notes.]
  • Delete notes made that are typically not used in MARC records for sound recordings.  In other words, if it doesn't fit, delete it.  [This was a really hard concept for me at first.  Delete information??  But something had to give and there were many instances of extraneous info.]
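
The "keep one copy, delete all other instances" part of the steps above can be sketched in a few lines. This is not how the work was actually done (that was by hand in the spreadsheet); it is a minimal illustration that assumes each row is a dictionary with a hypothetical "recording" key identifying the recital or concert and a hypothetical "note" key holding one note.

```python
def collapse_notes(rows):
    """Map each recording to its notes, with repeats removed in first-seen order."""
    collapsed = {}
    for row in rows:
        notes = collapsed.setdefault(row["recording"], [])
        note = row["note"].strip()
        # Keep a note only the first time it appears for this recording.
        if note and note not in notes:
            notes.append(note)
    return collapsed
```

A note that differs from the others, such as one specific to a single piece, survives because it is not an exact repeat. Deciding which column it belongs in (general, performers, location) would still be a human judgment.
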
Right now my goal is to finish working on these notes through 1989 (I'm almost there) which will give me 8 years worth of recordings to play with in MARC format.  I am then going to copy and paste those 8 years into a new spreadsheet.  Once I have a new spreadsheet for them I will create two tabs and separate the recordings by recitals in one tab and large ensemble concerts in another tab.  For the most part every recording fits one of those two categories, exceptions are few and I can put those where they best fit.   Once I have them separated in this way, I can customize a few things that are specific to those types.  I should be able to complete all of this by the end of this week.  Next steps will be to finally move the info to MARC records.

The second part of my project was to create templates for future recordings.  I don't think this will be difficult to accomplish.  Especially with all the work I have done so far, I can see much more clearly what information we even have for these recordings and how I have been dealing with that info.  I think now that I have this experience, creating the templates won't be too hard.  That is unless I decide to create templates in RDA format (which makes the most sense, unfortunately).  Even then though, I don't think there would be too many differences.

Third part of the project was the digital side.  I have a proposal in my head for digitizing the programs and linking them up with the finished MARC records in our local catalog.  I haven't yet explored the possibilities of digitizing the sound.  Sometime in the next couple weeks I want to get my hands on some of the articles written about the Variations project at Indiana and maybe talk to someone who was heavily involved in that project.  I need to find out who else is doing something similar.  What I don't know is if they are digitizing recitals or not.  I'm not interested (at this time) in digitizing our regular sound collection.

Researching requirements of recital programs (the fourth aspect of my project) has also not happened.  The first part of this project really was very time consuming!  This is something I REALLY want to do though.  I know the approach I want to take with this, but haven't been able to devote the time to it yet.

Finally, surveying other institutions on their cataloging practices regarding recital and concert performances at their institutions.  I have put together a survey and I sent it out to a few colleagues just for feedback.  I have received that feedback and made a few changes, but I need to make more based on that feedback.  But then I stopped short of actually setting it up as a survey (I originally thought I'd get a survey out before the end of May).  The reason is that I put together a short survey for a personal project and realized in that process that how I wanted to set up this survey wasn't going to work the way I wanted it to.  Plus it occurred to me that the university may have guidelines that need to be followed in doing surveys for research and since I want to eventually publish an article with the information I gather, I need to investigate further.  Plus, maybe the university has a survey tool that I could take advantage of that would be better than the free tools I have been using online for other projects.  So this aspect of my project has started to look bigger than I anticipated.

So that's where I stand right now.  In the last month of this project I am looking back at the project overall, what I have accomplished, what I know I can get done in the remaining three weeks, and what still needs to be done and I wonder if this project was too big from the beginning.  I knew it was going to be a lot (and one aspect was added by my superiors {part 3--researching digitization possibilities} so that wasn't even in my original plan) and it has proved itself so.

It probably didn't help that we sold a house, bought a house, and moved during my sabbatical.  That probably cost me about a month of time.  But other than that I have worked pretty regularly most every day.  I do feel like I have accomplished a lot and I know this database far better than I ever did.  If I can make MARC records of the first 8 years of the database and get templates established for creating MARC records from 2012 forward, that will be a big boost to what was there before.  And I will know how to deal with the years 1990-2004 and everything added since 2004 in a much more efficient way.

Thursday, May 3, 2012

Fun with Notes

A while back, when I was still figuring out just how to deal with this spreadsheet, a friend of mine volunteered to look it over with me.  This friend is also a database manager, so of course I said yes!

One thing that was irritating me was that all the notes on all the recordings were squeezed into one field.  It was not pretty.  She asked me a few questions about how we catalogers make notes and then she worked some magic in Excel to split out the field.  She used the period to split it up.  This was the best solution we could come up with for creating multiple fields and then all I had to do was move the info as I saw fit.  Sounds great, right?

Well ... yes and no.

There are periods in many other places besides the end of a sentence.  Initials in someone's name and abbreviations are two of the big ones I have run into.  Dealing with them is tedious, but overall I still do think that the cell split we did has its benefits.  For one, it is easier to see all or most of the information.
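
The split in Excel used every period as a break. As a rough illustration of how the two troublemakers named above could be skipped, here is a sketch that breaks only at periods that do not follow a single capital letter (a likely initial) or a word from a short abbreviation list. The list, the function name, and the sample text are all assumptions made for the example, and the approach has an obvious blind spot: a sentence that really does end in a single capital letter, such as a key name, would not be split.

```python
import re

# Hypothetical list; a real one would grow as new abbreviations turn up.
ABBREVIATIONS = {"op", "no", "arr", "maj", "min"}


def split_notes(text):
    """Split a notes field at periods, skipping initials and known abbreviations."""
    pieces = []
    start = 0
    # Candidate breaks: a period followed by whitespace or the end of the text.
    for match in re.finditer(r"\.(\s+|$)", text):
        before = text[start:match.start()]
        words = before.split()
        last_word = words[-1] if words else ""
        is_initial = len(last_word) == 1 and last_word.isupper()
        is_abbreviation = last_word.lower() in ABBREVIATIONS
        if is_initial or is_abbreviation:
            continue  # this period does not end a sentence
        pieces.append(before.strip() + ".")
        start = match.end()
    tail = text[start:].strip()
    if tail:
        pieces.append(tail)
    return pieces


sample = "J. S. Bach, Sonata op. 2 no. 3. Recorded in the recital hall."
print(split_notes(sample))
# ['J. S. Bach, Sonata op. 2 no. 3.', 'Recorded in the recital hall.']
```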

Right now I am working on the notes.  For the most part they consist of other performers on a recording: conductors, names of all members of a chamber group, soloists, other instrumentalists, the performers in major roles of an opera or concert of opera arias, etc.  Occasionally I also have place of performance info or something else, but not often.  Sometimes there is something in the notes area that I think is actually the title of the concert or recital.  Those are nice to find, especially since I do not have titles (MARC field 245) on the vast majority of these concerts. 

I've shared in the past how every work on a recital has the notes associated with it.  That's a lot of repeated notes.  But when a note only applies to one particular work, it is only listed next to that work.  My method is to collapse all those notes into one line next to the first work listed on the spreadsheet.  Easier said than done.  I don't have a good method for dealing with those specific notes since I don't have the order the works were performed in.  So instead of saying: harpsichord (1st, 3rd, and 4th works) and fortepiano (2nd and 5th works), I'm currently putting part of the work's title in the notes (enough for me to identify it).  The plan will be to change that once I can get hold of either the program or the recording itself.
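
The interim approach described above, naming the works by a piece of their title instead of by position, can be sketched like this. The function name and the idea of passing (title fragment, instrument) pairs are assumptions for illustration; the real work is being done by hand in the spreadsheet.

```python
def instrument_note(works):
    """Build one note listing each instrument once, followed by the works played on it."""
    by_instrument = {}
    for title_fragment, instrument in works:
        by_instrument.setdefault(instrument, []).append(title_fragment)
    parts = [
        f"{instrument} ({', '.join(titles)})"
        for instrument, titles in by_instrument.items()
    ]
    return "; ".join(parts)
```

Given pairs such as ("Toccata", "harpsichord"), ("Fantasia", "fortepiano"), and ("Suite", "harpsichord"), this would produce "harpsichord (Toccata, Suite); fortepiano (Fantasia)". Once the program order is known, the title fragments could be swapped for "1st", "2nd", and so on.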

It's hard working off just a spreadsheet.

Here's a bit of an illustration of some of what I see:


In the above illustration there are five works performed on the recital on 10/19/1985 (all in bold).  You will notice that the performer played these works on two different instruments.  I should collapse the five lines that say Harpsichord and Fortepiano into one coherent note.  Unfortunately the order of works here is most likely not the order of works as they were actually performed. I'll still get it all into one note, but it won't be pretty.

Same five works, further down the spreadsheet:


I apologize if this one is small and hard to read.  This is info about the various parts of the individual pieces: keys and catalog numbers mostly.  This is where the period caused a big split due to abbreviations.  But also, how do I collapse all this into one note?  (I'm thinking maybe a Contents 505 note, but then I have to think about how I will create 505 Contents notes when we automatically create MARC records and will this get replaced by the titles of the works ... it sometimes makes my head hurt.)  You can see some of the difficulty I am dealing with.

This is not an isolated incident.  This is actually very common as I move along this spreadsheet.  I've worked on the notes section before and given up.  But now I'm back to it and really can't leave it again. 

It must be done, so no time like the present!!

P.S. In other news, I have a survey done and it is being looked over by some colleagues for feedback.  Hope to get that finalized VERY soon!

Monday, April 23, 2012

Spreadsheet Nastiness: Your guess is as good as mine

In my last post, Half Way Plus, I happened to call the spreadsheet I am working on "nasty."  And I wasn't trying to be funny.  It really is rather nasty.  I'm sure people have had to deal with much worse, but this is by far the worst spreadsheet I have had the misfortune to work on.

The spreadsheet I am working from was pulled from an old, no-longer-supported database (FileMaker Pro).  The information had been entered by student workers (I am assuming) just by using whatever they were given from the School of Music: the sound cassette tape and/or the program.  Yep, sometimes just the cassette tape, no program.  The info that was to be entered was pretty rudimentary: date, performers, composer, title, and notes.  From what I can tell, each of these categories was just one box and recordings were entered by work performed.  So for example, if a recital had four pieces on it, there are four entries in the database.  The notes would be repeated each time unless there was a note specific to one piece that wasn't applicable to the other pieces.

But sometimes, apparently, the information available wasn't very ... um ... comprehensive.  To put it nicely.  Check out this one recital:



"Who knows?"????  That's what someone entered under Performer?  And why is everything else in this same field as well.  Very weird, right?  But wait ... it gets better.


The composer field contains three composers as well as the weird note: "Your guess is as good as mine on this one."  This is the only entry for this recital, so instead of entering information for each piece on the recital, there is just one entry with all three composers listed together.  Obviously there was little to no info so I guess this was the only way to do it.  But really, you have to wonder what was going on in the person's head who was entering this info.

Finally, the last two fields:


Apparently no title was entered (how could there be??), the notes ended up there instead, and the notes field is just funny.  All this info was in the very first available field and then pieced out throughout the rest of the fields.  I don't understand why.  Was it a flaw in the database program being used?  Was it user error?  A combination of both?  Obviously there was some issue with the person doing the data entry, that goes without saying.

I have no explanation on this one.  I will have to actually go pull the recording to see if there is any way to decipher what this could possibly be.

Friday, April 20, 2012

Half Way Plus

First of all, I have to say that I had really hoped to make more use of this blog.  Here we are at the halfway point (well, a little past the halfway point) and this is only my ninth post.  Um, yeah ... things have been slow going.

So being that we are beyond the half way point, I really should evaluate where I am.

Number one on my Project List: Clean up the recital database.  Is this done?  No.  Is it much improved?  Yes!

  • I have separated out all personal names from all corporate names in the "performers" column of the database.
  • In the composer column I have cleaned up the vast majority of the names, fixed all the problem names that were in red (except for about 10) and blue, but decided to leave the problems that were in yellow and green (more on that later)
  • In many places there was more than one name in the composer column.  I separated out all the extra people and have them all in their own columns.
  • I started working on the Notes columns
What is still left to do:
  • Create a spreadsheet of all the names that are not authorized (yellow) or uncertain (green) so they can be looked into at a future time.  This was bogging me down too much and I finally decided that I could spend the next three months fixing all these or I could skip it and get more work done in other areas.  Skipping it made the most sense.
  • Authorize all the "other" names.  These are arrangers, transcribers, writers of lyrics, etc.
  • Notes, notes, notes, and more notes.  I found this extremely time consuming when I started it and difficult to do.  I feel like I need to see the programs for each one or at least the recordings in order to  better organize the notes.  They really are a terrible mess!!
Number two on the list: Creating templates.  Not been done yet.  This is something that I feel confident will be easy to do once I get to a point where I'm ready to move the spreadsheet into MARC records.  Plus, I know someone at another institution who told me she has a template already.  Cooperative cataloging at its finest!!

Number five on the list (yes, I know I skipped some): Survey.  I actually forgot that it was five on the list, but really my list isn't exactly priority order.  Well, maybe a little bit, but I have my reasons for skipping three and four.  I did finally take all my notes from when I was at MOUG and MLA in Dallas and transcribe the questions and suggestions I got from my awesome, supremely more intelligent colleagues and I'm ready to start reorganizing that list and editing the questions into a survey.  I have a reference question out to help with the "demographics" part and I have a list of colleagues I respect (pared down from 300+ to about 5) who I would like to ask to look the survey questions over and give me suggestions before I send it out.  So I feel like I've made progress here.

Number three: investigate digital possibilities.  Haven't done this yet.  This feels like just an intellectual exercise, which is the kind of thing I like.  The kind of thing that led my Master's thesis adviser to tell me that I'm a great researcher but not much of a wordsmith (thankyouverymuch).  So this is on the back burner for now.

Number four: Requirements for recital programs.  Also an item that has not been done yet.  But I have started thinking about it.  That's a start, right?

A little beyond the halfway point and this is where we stand.  I feel like I still have so much to do with this nasty spreadsheet!!  There is just still so much wrong with it.

Friday, March 2, 2012

Sabbatical Challenges


As I work on this sabbatical project, there are many challenges I have as well.

The biggest cutest challenge is caring for two babies who are just now nine months old while I work at home.  I love having this opportunity to be home with them, but it is a challenge.  My work schedule consists of about an hour and a half in the morning during nap time, another hour in the afternoon during their second nap time, a couple hours in the evening a few nights a week after the boys are in bed, and occasionally some time on the weekends.  Doesn't really add up to a lot of time, so I try to make use of every minute I can once the boys are sleeping.  It's a challenge but it's a wonderful challenge to have.

An additional challenge has been trying to find a daycare for the boys for when I go back to work in July.  It seems I could find a daycare for later in the fall, but not necessarily in July.  I'm still on the hunt, but going out to interview and tour daycare facilities takes time away from my project.  

On another front, just when this sabbatical started we got an offer on our house.  We put it up for sale early last fall and this was great news!  But it also meant we had to find a new place. We did and we're going to be moving soon.  So the project will probably take a backseat temporarily while we pack up and move from one house to another.  I assume we'll also be without internet for a short period while we're between houses.

I am also working on two other projects at the same time as the sabbatical project.  One is an ongoing thing with a deadline in June.  I'm working on getting that project wrapped up and then not worrying about that project again until I'm back at work full time.  I'll have about 11 months before the next deadline.  The second project is one that I thought was complete but was asked to do a few more things to.  I never found the time at the end of the year to squeeze it in along with the many other things I was doing in trying to prepare to be gone from the office for six months and since I was also not given a deadline I've put it aside for the time being. But it weighs on my mind and I feel like I need to just go in and get it done.  I don't think it'll take long, but I'm afraid once I start that I'll discover otherwise and get bogged down in that.  I really need to find out what the deadline is; that would give me the sense of urgency (or not) that I probably need.

It's a bit of a long list and it feels overwhelming to me to list it all out like this.  But I wanted to do this post because this is part of the reality of this sabbatical and this blog is a journal of this six month project.  In the middle of all this, I am pleased with the amount of work I have actually been able to accomplish.  And when I go back to work in July I think I'll look back on these six months as the busiest ever.  This might be a break from work, but it's not a break by any means.  I feel even busier than I do when I'm working in the office full-time.  That's the reality.

Thursday, March 1, 2012

Two Months Gone, Four To Go

One third of the way through this sabbatical deserves an update.  Especially since I haven't posted about anything here in a while.

Since my last update I am still working through the Composer column to research the names that were problems during the first pass.  I'm currently a little more than 30% done.  As expected it is taking a while to work on each name.  Some are easier than others.  The problem names are color coded four different ways.  Those that are highlighted in yellow are names that are not in the authority file.  I've decided to leave those alone for now.  I am focusing instead on the other three problem types and for the most part I have been able to either figure out who in the authority file the person is or change the color coding to yellow because the name is not in the authority file.

Two weeks ago I attended the Music OCLC Users Group Meeting and the Music Library Association Annual Conference.  While at these meetings I chatted with several colleagues about my project and specifically about what they were each interested in knowing about how others manage their recital recordings.  I had some very interesting conversations with people from small institutions and large institutions.  It helped give me more perspective on how I would like to put together a survey and the kinds of questions I should ask.  I also received a lot of good ideas on questions to ask on the survey.  I am hoping to start putting together my questions soon based on my notes of my ideas and the notes I took while at the conference.  I even had someone volunteer to look over a draft of the survey questions.

After two months I do feel like I have accomplished something, although right now it doesn't feel like it as I work through the problem names.  But I have a much more organized spreadsheet and a plan for the parts that are still to be dealt with.  Plus the survey is coming together even if it is currently in my notes and in my head.  There is still a lot to do in the next four months!

Thursday, February 9, 2012

Progress Update

In the last update I gave I had completed about 70% of the "Name" column on the spreadsheet.  That column consisted of student performer names, ensemble names, faculty ensemble names, faculty names, guest performers, and a few other random names.  Mostly I was separating out the individual names from the ensemble names.  That is now done!

What is now left in that column and the columns that got created from it is to separate out multiple names.  Some cells have two or more performers' names or two or more ensemble names.  I also haven't gone through those columns for authorized names.  So that is all still to be done.

I have also just completed my first pass through the column labeled "Composer."  This column had been worked on previously.  So I sorted the spreadsheet so it was in order according to this column and started going through it again.  I created four new columns to go along with this one:

  • Composer: name only (existing column)
  • Composer#$b: any sort of number associated with a name
  • ComposerOther$c: any sort of title (for example, Sir) associated with a name
  • ComposerFuller: for the fuller version of a name (when a name is commonly used with initials and we know what the initials stand for, we often put the fuller form of the name in parentheses in another subfield in the name field).
  • ComposerDates: Birth and/or death dates associated with a name
Since this field was already checked against the authority file, I was just moving the info around.  It went pretty quickly.  It's not 100% done, but the good majority of it is done.  The remaining cells have some sort of issue associated with them: multiple names, just a last name, two or more possible authorized headings, problems with diacritics in Excel, unauthorized names, and a few other minor problems.  Those are all highlighted in different colors to let me know what the problem is.
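
For anyone who would rather script this kind of column split than move the pieces by hand, here is a minimal sketch in Python.  It is not how the work above was done (that was done in Excel), and it assumes each cell holds one heading in roughly "Last, First, title, dates" form.  The function name `split_heading` and the sample strings are made up for illustration and have not been checked against the authority file; the dictionary keys simply reuse the column names listed above.

```python
import re

# Assumption: a fuller form of name appears in parentheses.
FULLER = re.compile(r"\s*\(([^)]*)\)")
# Assumption: a numeration is a roman numeral ending the first part.
NUMERATION = re.compile(r"^(.*?)\s+([IVXLC]+)$")


def split_heading(heading):
    """Split one heading string into the five composer columns."""
    columns = {
        "Composer": "",
        "Composer#$b": "",
        "ComposerOther$c": "",
        "ComposerFuller": "",
        "ComposerDates": "",
    }
    # Pull out the parenthetical fuller form first, if there is one.
    match = FULLER.search(heading)
    if match:
        columns["ComposerFuller"] = match.group(1).strip()
        heading = heading[:match.start()] + heading[match.end():]
    parts = [p.strip() for p in heading.split(",") if p.strip()]
    # Treat a final part containing any digit as the dates.
    if parts and any(ch.isdigit() for ch in parts[-1]):
        columns["ComposerDates"] = parts.pop()
    if not parts:
        return columns
    numbered = NUMERATION.match(parts[0])
    if numbered:
        # Forename plus numeral: whatever follows is treated as a title.
        columns["Composer"] = numbered.group(1)
        columns["Composer#$b"] = numbered.group(2)
        columns["ComposerOther$c"] = ", ".join(parts[1:])
    else:
        # Surname, forename: anything after that is treated as a title.
        columns["Composer"] = ", ".join(parts[:2])
        columns["ComposerOther$c"] = ", ".join(parts[2:])
    return columns


print(split_heading("Smith, J. R. (John Robert), 1900-1980"))
```

A sketch like this only handles the easy cells.  The highlighted problem cells (multiple names, just a last name, diacritics) would still need a person and the authority file.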

My next step is to go through the Composer column again and work on those highlighted cells.  I expect this to take a little longer since I'll have to go into the authority file for each one and try to determine the correct form of name.  I'll also have to move the additional names to a different column.  I currently have three additional columns for "Other name."  

I'll get started on this this week, but then I'm leaving early next week to attend the Music OCLC Users Group Meeting and the Music Library Association's annual conference in Dallas, TX.  While there I'll probably take the opportunity to try and talk to other librarians about the cataloging and management of the recital recordings at their institutions.  I'm looking forward to the trip!

Wednesday, January 25, 2012

Authority Control Matters

Authority control: The procedures by which consistency of form is maintained in the headings (names, uniform titles, series titles, and subjects) used in a library catalog or file of bibliographic records through the application of an authoritative list (called an authority file) to new items as they are added to the collection. Authority control is available from commercial service providers.  (Online Dictionary for Library and Information Science http://www.abc-clio.com/ODLIS/odlis_A.aspx)
Authority Control is a necessary part of a library catalog.  And it sure helps with any large database.  If I ever doubted the value of authority control (I never have), working on this database would be enough to convince me of its importance.  I'll explain.

Since my last post regarding the multitude of problems I was encountering I discussed the issues with a friend who works in computers and databases and she made a number of wonderful suggestions.  She also is going to help me with an aspect of moving the database once I get it cleaned up.

Speaking of cleaning it up ... wow, what a mess!!

After discussing this with my friend, I am no longer trying to move everything from multiple lines into one line as I was doing before.  That was taking way too long!  After two weeks on this project, I had managed to combine about 60 lines into 10 records.  I have a whole new approach now.  Currently I am going down the "Name" column only.  The goal is to separate out the individual names (remain in the column) from the ensemble names (put in a new column) from other phrases that are most likely titles (another new column).
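
A first pass at this separation could in principle be scripted.  The sketch below is only an illustration in Python, not what was actually done here: the keyword list in `ENSEMBLE_WORDS` is a guess, the function names are hypothetical, and it makes no attempt to spot the third category (phrases that are really titles).

```python
import re

# Assumption: words that usually signal an ensemble rather than a person.
ENSEMBLE_WORDS = {
    "ensemble", "orchestra", "choir", "chorale", "band",
    "quartet", "quintet", "trio", "symphony", "singers",
}


def looks_like_ensemble(cell):
    """True if any word in the cell is on the ensemble keyword list."""
    words = set(re.findall(r"[a-z]+", cell.lower()))
    return bool(words & ENSEMBLE_WORDS)


def split_name_cell(cell):
    """Return (individual, ensemble); exactly one of the two is filled."""
    cell = cell.strip()
    if looks_like_ensemble(cell):
        return "", cell
    return cell, ""


print(split_name_cell("Faculty Brass Quintet"))
print(split_name_cell("Last, R. Matthew"))
```

A keyword check like this will misfile some cells (a performer whose surname is Band, for example), so its output would still need a read-through.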

Sounds easy, right?  Mostly it is.  I have managed to get down around line 7000 of the 10,100+ line spreadsheet.  That's big progress!

It is enlightening to see how incredibly inconsistently names were entered into this database.  Notice in the definition I quoted at the beginning of this post that authority control requires "consistency of form."  Obviously that wasn't a concern with this database ... ever.

Just today I found a recital that consisted of about 10 lines of data (i.e. 10 pieces performed).  The same name appeared in all 10 rows of the column, but in about three different forms.  Just as an example:

  • Last, Matthew R.
  • R. Matthew Last, piano
  • Last, R. Matthew
Hmmm.  So my first question: why is the person's instrument listed in some places and not others?  Second, is the initial a first initial or a middle initial?  And finally, could they not decide if the name should be listed last name-comma-first or first-last?

The order of the names is constantly changing as I go down the list.  The addition of instrument or voice part is also inconsistent.  It seems to me that there is a tendency to prefer last name-comma-first name unless there is an instrument name added on in which case it becomes first name-last name-comma- instrument/voice.  But not always.  Another issue is nicknames: sometimes they are used (Jim or Ben) and sometimes not (James or Benjamin); and it is obvious that it is the same person.  
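
The pattern described above can be sketched as a small Python function.  This is an illustration under stated assumptions, not a tested cleanup routine: it assumes that whatever follows the comma is an instrument or voice part when it starts with a lowercase letter, and a forename otherwise.  The name `normalize_name` is hypothetical.

```python
def normalize_name(cell):
    """Return (name in 'Last, First' order, instrument or '')."""
    cell = " ".join(cell.split())          # collapse stray spaces
    head, sep, tail = cell.partition(",")
    head, tail = head.strip(), tail.strip()
    if sep and tail:
        if tail[0].islower():
            # "First Last, instrument": flip the name, keep the instrument.
            first, _, last = head.rpartition(" ")
            name = f"{last}, {first}" if first else last
            return name, tail
        # Already "Last, First".
        return f"{head}, {tail}", ""
    # No usable comma: assume "First Last".
    first, _, last = head.rpartition(" ")
    return (f"{last}, {first}" if first else last), ""


for form in ("Last, Matthew R.", "R. Matthew Last, piano", "Last, R. Matthew"):
    print(normalize_name(form))
```

Note that this brings the second and third forms together but leaves "Last, Matthew R." as it is.  Whether the initial comes first or in the middle, and whether Jim is James, are questions a script can't answer.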

Oh, authority control, how I miss you!

There are similar issues with ensemble names.  Some are more complete than others.  Sometimes ensembles are combined into one makeshift name, sometimes they are just listed together.

The one thing I haven't bothered with yet is lines that have two or more names or two or more ensembles.  I'm going to have to add some new columns.  I'm also not bothering with changing the names all into the same format or dealing with inconsistent punctuation.  For now, I just want to separate everything out and then I can go back through and do all those little details.  

I'm considering going on to the notes column and then coming back to the names once that is done.  Mostly because I'm expecting some duplicate info in the notes area.  That will be something to evaluate once I get done with this current column.  

Only about 3000+ lines to go!  

Thursday, January 12, 2012

Problems, problems, and more problems

There are lots of problems in the spreadsheet I am working off of.  I knew there would be, but I'm realizing that it is even worse than I could have imagined.

I've had various people over the years work on this spreadsheet, but at the time I didn't know that each recording was represented by multiple rows in the spreadsheet.  So they would sort the sheet by composer name and then be able to edit all the names to the authorized heading and only have to look up the composer once since they were listed in alphabetical order.  Sounds like a great plan, right?

So now the spreadsheet is organized by date (since that is the only way to know which rows belong together) and the composers are organized alphabetically within the concert or recital date.  Thus, I don't know the actual order the works were performed in.

Why does this matter?  Because when we catalog sound recordings we list the works in a contents note in the order in which they were performed.  And if there are multiple performers who played on some works but not on others, we list which work they performed on by the number of the work (e.g. Kerri Baunach, clarinet (3rd work)) in a performers note.  I have no way of knowing that info at this time, so I am just listing the various performers or groups of performers in no particular order.
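
If the performance order does turn up later (from a program, say), a performers note in the style of the example above could be assembled with something like the following Python sketch.  The function names, the " ; " separator between performers, and the wording used for more than one work are all assumptions made for illustration, not a statement of cataloging rules.

```python
def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 3 -> '3rd', 11 -> '11th', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def performer_note(performers, total_works):
    """performers: list of (name with instrument, [work numbers])."""
    entries = []
    for name, works in performers:
        if len(works) == total_works:
            # Played on everything: no work numbers needed.
            entries.append(name)
        else:
            label = "work" if len(works) == 1 else "works"
            numbers = ", ".join(ordinal(n) for n in sorted(works))
            entries.append(f"{name} ({numbers} {label})")
    return " ; ".join(entries)


print(performer_note(
    [("R. Matthew Last, piano", [1, 2, 3]),
     ("Kerri Baunach, clarinet", [3])],
    total_works=3,
))
```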

As a result, my performer notes aren't that helpful at the moment.  And my contents notes appear as if the works were all performed in alphabetical order according to the composer's name.  If you're a bit OCD, that might seem kind of cool.  For this OCD cataloger, it's not.

Other problems:

  • Misspellings in names of performers or groups
  • Inaccurate title info (how about Sarabande for guitar ensemble by J.S. Bach.  Or Fugue for guitar ensemble by Handel.  Seriously, no arrangers listed, no further title info to let me know WHICH sarabande or WHICH fugue.)
  • Incomplete title info (kind of goes along with the point above)
  • I have no idea which concerts or recitals have programs and which do not (I may need to contact the School of Music and spend some time with a scanner, which means going to the office and finding a babysitter.)
  • The authority work
As for dealing with the spreadsheet itself:
I scrolled down to the bottom of the sheet and found that I have about 10160+ rows of information.  I have so far converted the first 57 (well 56, row 1 is column headings) into 10 rows on a new spreadsheet that would create 10 MARC records.

That doesn't sound like much when I look at those numbers, but really ... that did take me a long time.  Refer back to the problems I'm dealing with.  Those problems are on each performance.  Every. Single. One.

On the spreadsheet, I've created four new tabs.  One for the student recitals and one for the ensemble concerts.  Since I'm creating columns for each MARC field, it seemed easier to do it this way so there was only one 1XX field in each spreadsheet: 100 on the student recitals and 110 on the ensembles.  Then I created a tab as a "transfer" space.  This is just for me to copy the info for the recording I'm currently working on over to this space so I can see it better.  It was getting hard on the master list to see just the parts I needed and I kept losing my place, thus wasting time.  It's working well so far.  Finally, the fourth tab I added today as a place to list the date of recordings where there is insufficient info and what that insufficient information is.  This will help when I have to go back and fix things; it should be easier to locate the problem items.

As for the original spreadsheet, I'm not changing anything on it.  I'm keeping it as a master in case I mess something up somewhere and need to refer back to something.  I've made the date column bold and as I complete a recording I un-bold those dates.  That way I can keep track of where I am visually, especially since I am working in short stints and sometimes have to walk away in the middle of something.  Seven month old babies don't like to be kept waiting.

I also now have MarcEdit on my work computer and someone sent me a link to a tutorial on YouTube.  So new part of the sabbatical project: learn how to use MarcEdit and transfer all these records I'm working on into the actual MARC format.

Lots going on.  Problems galore, lots of cutting and pasting between spreadsheets, heavy use of the authority file, and eventually learning a new program.  On top of that, I really have to figure out a better schedule!

Wednesday, January 4, 2012

The Sabbatical Starts

The Sabbatical Project has officially started!  It's nice to work on a more leisurely pace and be able to focus on one thing rather than juggling a gazillion responsibilities (seemingly).

The biggest challenge is just figuring out a schedule.  I'm doing this sabbatical project with two babies at home with me, currently 7 months old.  Today was tough, we got a little off their schedule so they were each sleeping and eating at different times from each other.  Not good!  On the positive side, that rarely happens, so I have hope that we'll be back on track tomorrow.

I have discovered that I can hold a baby in one arm, hold a bottle in that same hand, and have the other hand free to check email, type, and search other institutions' OPACs (ooooh, a new term for my Definitions page!).  That worked for one baby, not so much with the second baby.

One thing I did today was search for recital cataloging at other institutions.  This helped give me a better idea of what I really need to do with the data we have.  Biggest discovery: the data we have totally sucks.

The second thing today was looking at the data in our spreadsheet (used to be in a database format that is no longer supported, thus the spreadsheet).  Literally "looking."  I had a baby in my arms that was fascinated by the laptop keyboard.  I was a bit concerned looking at the first several lines of the spreadsheet.  I had to email the manager of our Fine Arts Media Center to see if programs were available for a couple mid-80s recitals or concerts so I could make sense of what I was looking at.  He provided the program for one and described the two cassette tapes for the others, which helped answer my questions.

(Should I add "Cassette tape" to my list of definitions?  It was recently brought to my attention that there is now a generation of people who don't know what a cassette tape is.  Wow, I'm getting old!)

Lastly, just a note about the spreadsheet I'm working on.  Each performed piece of music is listed on a separate row of the spreadsheet.  Each entry contains the recital date, the student's name, and then the composer and title of the piece and any notes (all in one cell, by the way).  That's it.

So imagine a student who gives a senior recital and performs 5 pieces.  Then they stay to get a master's degree and they give another recital performing another 5 pieces.  That's ten lines on the spreadsheet that will contain their name.  The date is the ONLY way I am able to tell which recital pieces go together; it is THE most important piece of information I have.

And then you have lines with the same date but different performers.  Oy!  Two different recitals on the same day?  One recital with different performers?
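
To make the date problem concrete, here is a minimal Python sketch of grouping rows by date and flagging the dates that carry more than one name.  It assumes the sheet has been exported to rows with `date`, `name`, and `piece` keys; those key names, the function names, and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_by_date(rows):
    """Collect rows into one list per date (the only link between them)."""
    recitals = defaultdict(list)
    for row in rows:
        recitals[row["date"]].append(row)
    return dict(recitals)


def dates_needing_review(recitals):
    """Dates with more than one distinct name: two recitals, or one shared?"""
    return sorted(
        date for date, rows in recitals.items()
        if len({row["name"] for row in rows}) > 1
    )


# Made-up sample rows, for illustration only.
rows = [
    {"date": "1985-04-12", "name": "Last, R. Matthew", "piece": "Sonata"},
    {"date": "1985-04-12", "name": "Last, R. Matthew", "piece": "Prelude"},
    {"date": "1985-04-20", "name": "Last, R. Matthew", "piece": "Etude"},
    {"date": "1985-04-20", "name": "Baunach, Kerri", "piece": "Fantasy"},
]
recitals = group_by_date(rows)
print(dates_needing_review(recitals))
```

Inconsistent name forms would make the same person look like two performers here, so the flagged dates are only a list of things to check, not an answer.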

I may be taking more trips into the office than I originally thought I would.  It'll be good for the babies to get out.

Tomorrow's goal: Set up a second tab on the spreadsheet for editing purposes.  I'm also considering moving all the info for each recital or concert into one row, rather than multiple rows.  And then maybe also separate out large ensemble concerts from the student recitals (more tabs).  I feel like this week is mainly about realistic organizing (as opposed to the previous planning I did); I'm getting a feel for how this is really going to work and what really needs to be done.  It's already looking a little different than I had thought it would.