What is metadata? A Christmas themed exploration

This content originally appeared on my Scientific American blog, Information Culture, in 2012. While the page remains, images are no longer visible. At the request of several readers, I’m reposting the content here. 

When I talk to most scientists and mention the word “metadata” they look at me as if I’ve grown a second head. Despite the fact that these folks regularly use and create metadata (not to be confused with megadata or “big data” which is a whole other subject), many have not heard of the term.

Broadly speaking, metadata is simply a structured description of something else. The most popular example of metadata comes from the library catalog. Each book has a title, author, call number, publisher, ISBN etc. listed in the online catalog. These elements comprise the book’s metadata, and there are rules to make sure that things are standardized.

Without metadata, discovery and reuse of digital information would be much harder. This is why discussion about metadata has increased greatly since the second half of the twentieth century.

The best way to understand metadata is to look at a few examples of metadata at work.

Here is part of a digital data table:

Screenshot of Santa's Google spreadsheet

If you stumbled across this list on the web you might be able to guess what it was, but you couldn’t be sure. It would also be difficult to find this list again if you were looking for it. The list creator might find this pretty useful, but if he or she shared it with others, we would want some added information to help the new user understand what he or she was looking at: this is metadata.

Metadata for this data file:

  • Who created the data: Santa Claus, North Pole. An email address would be nice. This way we have some contact information in case we need clarification.
  • Title: “My List” isn’t a title that is conducive to finding the file again. While it might be tempting to just call this “Santa’s list” that won’t help other folks who see this file. The title should be descriptive of what the data file contains, and “Santa’s List” could be many things: Santa’s list of reindeer? Santa’s list of toys that need to be made? A more descriptive title might be “Santa’s list of naughty and nice children.”
  • Date created: We don’t want to confuse this year’s list (2012) with last year’s list (2011). This could lead to all sorts of unfortunate events where nice kids get coal, naughty kids get presents, or infants (who weren’t around in 2011) get nothing at all.
  • Who created the data file: Perhaps Santa created the data, but then used an elf to input the data into a computer file. Many computer programs automatically record this information, although you may not realize this.
  • How the list was created: Behavioral scans? Parental surveys? Elf on the Shelf reports? All of the above? In order to reuse this data in future research projects, we need to know how it was collected, including collection instruments and methodologies.
  • Definitions of terms used: What is “naughty”? What is “nice”? How did Santa place a child into one category or another?
  • File type: What kind of file is it? The data here are pretty simple, but Santa has lots of different file formats to choose from: Excel, .csv, XML, etc. Knowing the file type helps end users determine if they can use the data.
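
For readers who like to see things written down, the elements above could be recorded in a small machine-readable file that travels with the data. This is only a minimal sketch in Python: the field names, the list of required fields, the collection methods and the `missing_fields` helper are my own illustrative assumptions, not a formal metadata standard.

```python
import json

# Hypothetical metadata record for Santa's spreadsheet.
# Field names and values are illustrative; they follow no formal standard.
metadata = {
    "title": "Santa's list of naughty and nice children",
    "creator": "Santa Claus, North Pole",
    "file_creator": "An elf (data entry)",
    "date_created": "2012",
    "methods": ["behavioral scans", "parental surveys", "Elf on the Shelf reports"],
    "definitions": {
        "naughty": "definition to be supplied by Santa",
        "nice": "definition to be supplied by Santa",
    },
    "file_type": ".csv",
}

# Assumed minimum set of fields a reuser would want to see filled in.
REQUIRED_FIELDS = ["title", "creator", "date_created", "methods", "file_type"]


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Serialize the record so it can be saved next to the data file.
metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
print("Missing fields:", missing_fields(metadata))
```

A record that only says “My List” would come back with nearly every required field flagged as missing, which is a quick way to spot a file that will be hard to find or reuse later.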

Naturally, a different kind of item might have a completely different set of metadata.

This is my mom’s favorite Christmas picture of me:

Me sitting on Santa's lap

My mom remembers the details of where, when and how this picture was taken, but if she isn’t around to tell the story, metadata can help:

Metadata for this photo:

  • Date the photo was taken: December, 1981. The digital version was created on 12/13/2012.
  • Who took the photo: A mall employee. This can have implications for who owns the rights to use and distribute the image. The photographer? The folks who paid to have the photo taken?
  • Camera used to take the photo: I have no idea what camera was used for this picture. Luckily, modern digital cameras often automatically record this information as a part of the .jpg file. Digital cameras can also record all the detailed camera settings (for those who understand these things).
  • Location where the photo was taken: Arnot Mall, Horseheads, NY. Some digital cameras can automatically capture this information too, using built in GPS.
  • Picture format: .jpg
  • Picture size: Original size of the photo is 3.5 x 5.5 (I think). The original scanned image is 852 x 1116 pixels.
  • Description of the photo: Currently, the primary way of searching for an image is for a computer to search for the associated text. Good file names and good descriptions can be key to finding the image again. Bonnie J M Swoger, age 3, sitting on Santa’s lap. Her grandpa brought her to the mall to visit Santa. While not enthusiastic about it, she loved her grandpa and obliged him by sitting on Santa’s lap.
  • Copyright information: I don’t think the mall Santa folks were thinking about copyright in 1981 because there wasn’t an easy way to copy the photo. These days, it is important to state explicitly what rights other folks have to use the picture. Creative Commons licenses are great for being explicit about what users can do with your content.
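
To see why the description and file name matter for finding a picture again, here is a small Python sketch of a text search over photo records. The records, the file names and the `find_photos` helper are all hypothetical, and real photo software is far more sophisticated than this.

```python
# Hypothetical photo records. The second one has only a camera-style
# default file name and no descriptive metadata at all.
photos = [
    {
        "file_name": "swoger_santa_arnot_mall_1981.jpg",
        "description": "Bonnie J M Swoger, age 3, sitting on Santa's lap",
        "location": "Arnot Mall, Horseheads, NY",
        "date_taken": "1981-12",
    },
    {
        "file_name": "IMG_0042.jpg",
        "description": "",
        "location": "",
        "date_taken": "",
    },
]


def find_photos(records, query):
    """Return file names of records whose text fields contain every query word."""
    words = query.lower().split()
    matches = []
    for record in records:
        # Combine all metadata values into one lowercase string to search.
        text = " ".join(str(value) for value in record.values()).lower()
        if all(word in text for word in words):
            matches.append(record["file_name"])
    return matches


print(find_photos(photos, "santa 1981"))
```

The photo with a description and location turns up in the search; the one with nothing but a default file name can only be found by someone who already knows that name.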

Depending on the type of data, there may be many more metadata elements. Geospatial data, chemical data, astronomical data, etc. each have specific descriptive elements that are used. Many organizations have developed standards describing what kinds of metadata should be included and how the metadata should be formatted. This helps data creators add metadata that can be read by computers and reused by other interested folks.

Once you have well-established metadata formats, you can start analyzing the metadata. Common metrics used to evaluate scholarly publication (impact factor, altmetrics, etc.) all rely on high quality metadata.

I think we can agree that Santa would use sound data management practices, including the creation and use of proper metadata, to keep track of his gift giving and logistical data. He would want the rest of us to use good metadata so we can always locate that 30 year old picture of him, too.

Be like Santa and make sure your data is findable and re-useable: use good metadata!

For a more robust (yet clear and understandable) definition of metadata, see NISO’s Understanding Metadata (PDF).


Originally published by Scientific American in 2012.


Stop saying “print” when that’s not what you really mean

At the reference desk, I occasionally see students who are looking for only “print” resources. Their professors have asked them to find journal articles or books, but are requiring them to use “print” resources. The challenge here is that their professors don’t really mean “print.” Most often, they want their students to find formally published, peer-reviewed, or scholarly sources, not blogs, wikis, or random websites. They use the term “print” because in previous decades, these types of sources would have been found as physical copies. I understand what they are trying to tell students, but students don’t understand this no-longer-relevant distinction.

I would guess that about 97% of all the journal articles that the students at my institution have access to (not including ILL) are only available to them online. Of the print journals available at my institution, the majority of these are older volumes that we haven’t (yet) replaced with online back-files. If students stuck to the terminology used in the assignment, it would mean a vast body of research was unavailable to them.

Of the online items, some journals still publish a print version, but many do not. Some high quality journals were born digital and many have stopped publishing print versions due to decreased demand.

A requirement of “print”-only resources would also exclude eBooks. My institution has access to 35,000 eBooks, and we will soon be getting a collection of 25,000 more. Are these excluded because of the mode of access?

An undergraduate student, particularly a first or second year student, is still trying to figure out the difference between scholarly and not-scholarly sources. It takes some practice for them to understand the different types of sources that are available online. Asking them to figure out if a particular source is the kind of thing that may have once been published in print is not practical.

So instead of asking students to use “print” resources, be more specific.

“For this project, you may use peer-reviewed scholarly sources and published books.”


“For this project, you may use scholarly journals, magazines, newspapers, or books.”

[Rant over.]

The results are nice, but what I really need to know is HOW you did it

For the past few days I’ve been at the 2014 Library Assessment Conference in Seattle, WA. It has been a great conference and I’ve got tons to think about and lots to do when I get back to work.

As I listen to all of the presentations one of the biggest things that strikes me is that I wish folks would spend a bit more time on HOW they did the assessment.  I like hearing about results and changes that were made as a result, but what I really need now are methods.

It is common sense to take a method and try it. If it fails, admit it frankly and try another. But above all, try something. - Franklin D. Roosevelt

And so I began compiling a list of the methods that librarians use when we assess our services (including student learning). There are lots of variations of each method, but I’m trying to think of the fundamental methods that assessment folks should learn about:

Data collection:

  • Surveys and tests
    • How to write good questions? When is a survey appropriate (or inappropriate)? What can we learn (and what can’t we learn) from surveys?
  • Focus groups
    • How can we recruit participants? How to structure the conversation? How to use the results?
  • Observations
    • What to look for? How to record your observations?
  • Structured interviews
    • How to write good interview questions? What are good interview techniques?
  • Automatic capture (ILS, ILL, COUNTER, etc.)
    • What is your library currently collecting? How is it accessed? What are folks currently doing with it?
  • Event capture (not sure what to call this, but I’m thinking of reference stats)
    • What are you currently capturing? How are you using that data?
  • Collecting authentic work (student papers, faculty publications)
    • What to collect? How to encourage faculty and student participation?

Data analysis:

  • Statistical analysis of existing data (ILS, COUNTER, ref stats, etc.)
    • I think this is the area that scares librarians the most
    • What types of statistical analysis do we need to know about? How can this help us?
  • Data visualization
    • When is this most appropriate? What kinds of visualizations are the most helpful? What tools should we use?
  • Rubrics
    • How can we develop a good rubric? When should we use such a time intensive method?
  • Content analysis
    • How do we develop coding categories? What software do we use? How to interpret and share the results?
  • Citation analysis
    • What can this tell us? Do we look at student papers or faculty publications? What metrics might be the most helpful?

Obviously, each of these categories contains many variations on the theme, and expecting one person (or all library staff) to know about all of these methods is unrealistic. But we should probably be aware of what methods exist, so that when we need one, we can get some help in applying it.

What other methodologies are libraries using?

John Oliver as an Instruction Librarian

I’ve been a big John Oliver fan from his time at the Daily Show and his weekly podcast with Andy Zaltzman, The Bugle, but I never knew just how effective he could be as an instruction librarian. Just watch this video below as he carefully examines the emerging media form of native advertisements and the problems associated with this unholy alliance between the business and editorial sides of media organizations.

Update: Just yesterday, Advertising Age published an article about the New York Times shrinking the labeling of its native advertisements. It is difficult enough for people to recognize the difference between real ads and news stories; this makes things much harder.

Librarians need bigger egos

Obviously, not all of them.  Some of us have big enough egos and need to tone things down a notch.  I’m not talking about the big egos and so-called “rock star librarians,” but the egos of regular working librarians.

In a 2008 article in Library Journal, Casey and Stephens argue that egos are bad for libraries:

The ego, we concluded, can be a very damaging thing. Inflated. Overbearing. Egos create rules for rules’ sake. Egos complicate procedures and keep good people down. Egos squash good ideas and can take the best of an organization and turn it on itself.

But they really refer to over-inflated egos.  I argue that a healthy, reasonable ego is a good thing. For all of us. Perhaps this is semantics: since the word ego has some pretty negative connotations, maybe I really mean to suggest that librarians need more professional self-confidence or self-esteem.

Because librarians are smart. Damned smart. They are talented, knowledgeable, hardworking and willing to go out of their way to help others out. If you want to find something out or get something done you should definitely ask a librarian.

But I’ve seen colleagues acquiesce without any discussion to poorly thought out faculty demands regarding library instruction. I’ve seen librarians sit quietly through meetings with bosses or administrators and then provide intelligent, thoughtful criticism after the meeting when the boss isn’t listening.  I’ve heard colleagues at conferences complain about faculty not including them in learning management systems and I find out that they never asked.

What contributes to this quietness, this passivity, this inability to assert ourselves even in the areas of our expertise?

Is it gender? Over 80% of librarians are female, and workplace gender dynamics might come into play.  I’m certainly no expert on this topic, but books like Nice Girls Don’t Get the Corner Office and Lean In seem to suggest that women need to be more assertive at work and stop confusing “being nice” with asking questions and stating opinions. NPR has an interesting new series called the Changing Lives of Women. As a part of that series, they have created a Tumblr project called She Works: Notes to Self encouraging women to share their slogans, affirmations and advice. Many of the submitted slogans encourage women to speak up: “Sit at the table and speak up,” and “Don’t be shy. Promote your accomplishments.”  But there are also slogans encouraging women to be nice or be quiet, such as “Smile on the outside, tell them off on the inside” or “Work hard and be nice to people,” advice that I’d bet wouldn’t be posted on a similar site geared to men.

Sit at the table and speak up

Is it education?  Although librarians often have faculty status, we most often do not have PhDs like most of the rest of the faculty. I routinely call professors by their first name since we are colleagues and that’s what colleagues do these days.  But other librarians routinely call professors “Professor Smith” even when the professor uses the librarian’s first name.  Are librarians intimidated by the title or the degree? Are some folks less likely to state opposing opinions or ask challenging questions?

Is it the library’s place within the institution?  Although we are often faculty, we are different than classroom faculty. No matter how robust our library instruction programs, we sit outside of the classroom and teacher model that serves as the core of most higher education institutions. And in a digital world, some faculty start to question the ongoing relevance of the brick-and-mortar library.  Are we stymied by our kind-of-outsider status?

I don’t know what the answer is.  But I’ve met and spoken with lots of librarians, and I know what they are capable of.  They are amazing, articulate professionals with a deep understanding of how folks search for information and the knowledge of what kinds of information are out there. We know about scholarly publishing, instructional design, data resources, pedagogy and a gazillion other things.

Let’s dust off those egos. Let’s make sure other folks know our strengths. Let’s stand up for our accomplishments.

Getting your friends and colleagues to share what they know

One of the things I love about working in an academic library is the steady opportunity to learn about new things.  I learn things when I help students, work with faculty and talk with my colleagues. Over the last couple of years I have worked to organize an informal series of workshops to help librarians and faculty share the things they know with each other.

It started a couple of summers ago, when the newly formed Instructional Design team at my library organized a series of technology workshops.  We each took turns sharing new websites, apps and other tech tools that we liked and used in our work.  I thought this was great, and I loved hearing about all the things my colleagues knew about.

Classroom chairs
CC-BY image courtesy of Flickr user James Sarmiento

Last summer, I wondered if they were going to do the same thing.  At the same time, the Scholarly Communication team in my library was hoping to do some workshops for library staff about things like open access and the Elsevier boycott.

At this point I took over getting things organized, and twisted the arms of my colleagues to put together workshops about things they were knowledgeable about.

It was a rather selfish move on my part – I wanted to learn about the things my colleagues knew.

With help from colleagues, we brainstormed things that we wanted to learn about and recruited folks to present on those topics. I wanted the workshops to have an informal feel: I sought out hands-on workshops and discussions more than formal presentations.  I also asked the simple question: why don’t we invite folks across campus to these workshops?  There was no good reason not to, so we sent campus-wide emails advertising the workshops that would be of interest to folks beyond the library.

The first summer I organized the workshops, all of the speakers were library staff members.  This year, I asked for workshop topics from our CIT office on campus and a couple of faculty who are doing some interesting things. I’m excited about the workshops they will be presenting.

Now, we are a small institution with a small number of faculty.  I needed to be realistic in terms of my expectations of attendance: we weren’t going to be filling lecture halls.  Attendance at the 2012 workshops varied widely, from a low of 4 to a high of 16 folks from the library and across campus. For our small campus, I was quite happy with these numbers.

At the end of the summer, I sent an evaluation survey to campus faculty and staff and got some great feedback regarding workshops to hold again, ways to improve communication about the workshops and suggestions for future workshops.  One of the less tangible benefits of the summer workshops was the way in which the existence of the workshops (and the emails announcing them) added to the library’s reputation as a group of folks to talk to about scholarly communication issues or some instructional technology issues.

Here are some of the workshops we have lined up for this summer, relevant to staff across campus:

  • What’s In a Name? The Many Facets of the Word ‘Editor’
  • Mendeley
  • Zotero
  • Time Management for Busy Geeks
  • Gmail community roundtable: labels, searches, filters, labs and more
  • Copyright and Creative Commons
  • Trends in peer review: third party peer review services
  • In praise of paper: an open discussion about our favorite paper based tools
  • Introduction to R: Free and open source program for statistics and data analysis
  • Reading your Copyright Transfer Agreement
  • Video hosting and sharing with Ensemble
  • Managing your online professional identity
  • Open Educational Resources
  • Instant Response System

I’ll be leading a couple (Mendeley, Trends in Peer Review) and attending almost all of them.  I’m excited to learn about some interesting things from my smart and talented colleagues.

Does your library have a professional development program?  How do you facilitate the exchange of knowledge between library staff?

Setting up research consultation appointments using Doodle’s MeetMe page

This afternoon I saw positive results of a little experiment I set up using the find-a-good-meeting-time site Doodle and its MeetMe feature.

You see, I don’t have office hours. Sure, there are lots of hours when I am in my office, but unlike teaching faculty, they are never the same from week to week.  This week I am available on Friday at 2pm, next week I’m not. This can make it complicated to set up meetings.  I can rely on viewing my colleagues’ calendars in Google Calendar, but making appointments with students can be more complicated.

Most of them don’t use Google Calendar, and it can take a lot of emailing back and forth to find a mutually agreeable meeting time.  I make my schedule available online, but most of them don’t think to look it up.

My profile from our Subject Guides pages

A couple of months ago, I put a link to the MeetMe service on the profile of me that exists on all of my subject guides – “Make an appointment.”  Students click on the link and see times when I am available (because MeetMe connects with my Google Calendar) and can request a time that works for us both.  I get an email from Doodle, click on a link to confirm (or reject) the appointment, and the meeting is automatically added to my calendar.

I liked how this worked.  It was easy and convenient. But it isn’t perfect.

For years, students at my institution have been able to fill out an online form to request an appointment with a librarian. Students give us a bit of information about the project, their topic and their availability. The form is sent to all of the librarians, and the most appropriate (or most available) librarian “claims” the request and responds to the student. This is great for record keeping purposes (info is automatically entered into a database) and wonderful for students who don’t know who they want to meet with, but it can waste time if the student already knows who they need to talk to.

So I’ll keep the “Make an appointment” link on my profile and see if there are other ways to use this service.

Library publishing services: share what you’re up to

I should have posted this before, since the deadline for submissions is tomorrow, but my library is working with the Monroe County Library System to put together a Library Publishing Toolkit.

A working printing press in one of the labs at the FHTW in Oberschöneweide. CC image courtesy of flickr user tölvakonu

Proposals are due tomorrow (February 15, 2013), but all we are asking for right now is a 300-500 word abstract about your library’s services and strategies. 24 hours is plenty of time to write 300 words!

If you are doing some interesting or innovative things related to publishing, think about sending in a proposal. Some questions we are hoping to answer include:

  • What programs and services are offered by libraries to writers?
  • Does your library help users develop curated content to publish either in print or digital form?
  • What strategies are being used to select items for digitization?
  • Has your library identified unique print materials to be digitized and potentially sold?
  • Has your library developed partnerships with other agencies to support digital publishing?

If your abstract is selected (and you’ll know pretty quick), the final brief papers (just 2-5 pages) will be due on April 22nd.

See the complete call for papers on the Library Publishing Toolkit website. Submissions are accepted via email to browna@geneseo.edu.

Explaining science using simple words: Up Goer Five

While thoroughly enjoying the recent #overlyhonestmethods meme on Twitter, I came across the #upgoerfive meme.

This latest meme was inspired by an xkcd comic that attempted to explain the Saturn V rocket using only the 1000 most common English words.  Saturn V becomes Up Goer Five.

So Theo Sanderson created a text editor that only allows you to use the 1000 most common English words and challenged scientists to explain what they do using simple language. It isn’t easy.  Anne Jefferson and Chris Rowan (of the excellent blog Highly Allochthonous) then created a Tumblr blog collecting examples, like this description from volcanologist Lockwood Dewitt:

I like rocks. Also high places that sometimes act like they’re on fire. But what I really like is sharing what I know about rocks and high places, and how those things came to be. What made them? What moved them? Why are they the way they are? I think the answers to these questions are important, and I think people should know more about them. So I use words and pictures to show everyone how beautiful and amazing rocks and high places are, why they’re important to us, and why it’s important to know about them. Sometimes I even get to take people to see rocks in real life, which is the best part of what I do.

I tried my hand at explaining my job in simple language:

I help people learn about the different types of stuff they can find on the computer. I help them find books and computer stuff they need to learn about the world around them.  And I help them learn about how people tell other people about what they learned.

One of the first things I thought of was how this forces you to really think about the topic you are writing about, because you can’t rely on the jargon you normally use.

My next thought was that this could be a useful exercise for students, and could help them understand the concept of “putting something into their own words,” a concept that I talk about often in plagiarism workshops.

The first part of putting something into your own words is to really understand what you are trying to say, a step that students sometimes skip when putting together their term paper at 1am the morning before it is due.

So this might be an interesting challenge for students: ask them to use the Up Goer Five text editor to explain their research or term paper topic.
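
If you are curious how a restricted-vocabulary checker like this might work, the basic idea can be sketched in a few lines of Python. This is not Theo Sanderson’s code: the `words_not_allowed` function is my own invention, and the tiny `ALLOWED_WORDS` set is a hypothetical stand-in for a real list of common words, included only so the example runs.

```python
import re

# A tiny, hypothetical stand-in word list. A real checker would load a
# much longer list of common words from a file.
ALLOWED_WORDS = {
    "i", "help", "people", "learn", "about", "the", "they", "can",
    "find", "on", "computer", "books", "and", "stuff",
}


def words_not_allowed(text, allowed=ALLOWED_WORDS):
    """Return the words in text that are not on the allowed list, in order."""
    words = re.findall(r"[a-z']+", text.lower())
    flagged = []
    for word in words:
        # Report each offending word only once.
        if word not in allowed and word not in flagged:
            flagged.append(word)
    return flagged


print(words_not_allowed("I help people find books and databases."))
# -> ['databases']
```

Anything the function returns is a word the writer has to rethink, which is exactly where the jargon tends to hide.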


Student learning at the reference desk: bringing home notes

For as long as there have been reference desks in libraries, there has been a debate and a discussion about the nature of the reference transaction.  In some cases, the reference transaction is simply a question and answer exchange. The patron asks the question, the librarian finds the answer and passes it along:

Q: Who was the 32nd president of the United States?

A: Franklin D. Roosevelt

But in many cases, the reference transaction is about much more than simply providing answers, it’s about teaching the patron how to find the answer themselves.

Q: Who was the 32nd president of the United States?

A: Well, a quick Google search leads us to the Wikipedia list of presidents.  Here it lists Franklin D. Roosevelt as the 32nd president, and provides some citations for that information.  If you are just curious, the Wikipedia list will be perfect, but if you need to cite this in a paper, you might want to refer to the White House website, which would be more authoritative and provides some good biographical information.  Do you need to find more information about Roosevelt?  If so, we may have some biographies of him in our collection….

Just like any other learning opportunity, a big part of the whole experience is retention: do the students/patrons remember what you taught them an hour from now, a week from now or a month from now?

Taking notes is a useful tool in the learning process. CC image courtesy of Flickr user geekcalendar.

With that in mind, librarians at my library will be working on some new practices for the Spring semester.  Building on the tendency of librarians to jot down search terms or possible databases while working with a patron, we will be making a more concerted effort to write down notes as we answer a question and give those notes to the student when we are done.  The idea here is that students will better be able to retain the knowledge they gained if they can refer back to the notes that were taken.

We’d like to try and capture information on this student learning, so we are going to try two things.  First, we will be using standardized carbonless duplicate note-taking forms.  This way the student gets a copy of the notes, and we can retain a copy for future study.  Second, we hope to combine this with an assessment of student learning at the reference desk (or at least an assessment of what students think they learned at the reference desk) by asking students to fill out a brief survey asking them what they learned.

Hopefully we can be more deliberate in making sure students walk away with a record of the transaction and this will increase the learning that goes on during a reference transaction.

I’ll let you know how it works out.