Citations, Data and ScienceOnline2011

Despite the vast array of challenges and problems with creating and tracking citations to journal articles, the scholarly publishing realm has developed (over the past 350 years) standards to deal with these things.  New concepts such as DOIs, an increase in the number of providers who track citations (Web of Knowledge, Scopus, Google Scholar), and tools to easily format citations have made all of this a bit easier.

Scholars are now facing new challenges in creating and tracking citations.  The types of material being cited are probably more varied than ever.  Scholars are citing archived data sets, websites that may not exist in a few months (or years), multimedia, and perhaps even blog posts and tweets in addition to the traditional journal articles, books and technical reports.

At the Science Online 2011 conference, several speakers led discussions that focused on the challenges and possible solutions to some of these new issues.

Jason Hoyt, Chief Scientist at Mendeley, discussed some of their new initiatives to track citations based on user libraries.  Since I don’t want to spread misinformation about the nature of these initiatives and I’m not entirely clear about them, you’ll just have to stay tuned for more information.

Martin Fenner discussed his work with project ORCID, which will be a publisher-independent tool to help with author disambiguation.

Overall, there was an interesting discussion about the nature of citation itself.  The way the metrics count it, a citation is a citation.  You get ‘credit’ for a citation even if the folks who cite you say that you are completely wrong.  Is there a way to use the semantic web to indicate how a citation is being used?  For example, Scopus indicates that Andrew Wakefield’s retracted paper about autism and vaccines has been cited 714 times since its publication, including almost 65 citations since the paper was retracted at the beginning of 2010.  Could there be a way to easily say how many of these citations say that Wakefield was wrong?
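To make the question concrete: if every citation carried a label for its intent (the Citation Typing Ontology, CiTO, defines terms such as `agreesWith` and `disagreesWith`), counting the dissenting citations would be simple.  Here is a minimal sketch, assuming such labels existed.  The `Citation` class and `tally_by_intent` function are hypothetical names of my own, and the records are invented placeholders, not real citation data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    citing_work: str  # placeholder identifier for the citing paper
    cited_work: str   # placeholder identifier for the cited paper
    intent: str       # assumed CiTO-style label, e.g. "disagreesWith"


def tally_by_intent(citations, cited_work):
    """Count citations to one work, grouped by their stated intent."""
    return Counter(c.intent for c in citations if c.cited_work == cited_work)


# Invented example records, for illustration only.
citations = [
    Citation("paper-A", "retracted-paper", "disagreesWith"),
    Citation("paper-B", "retracted-paper", "disagreesWith"),
    Citation("paper-C", "retracted-paper", "agreesWith"),
    Citation("paper-D", "other-paper", "citesAsEvidence"),
]

tally = tally_by_intent(citations, "retracted-paper")
print(tally["disagreesWith"], "of", sum(tally.values()), "citations disagree")
```

The counting is the easy part.  The hard part, which this sketch simply assumes away, is getting authors or publishers to record the intent of each citation in the first place.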

With all of these interesting advances, there are a lot of challenges.  Can the same set of metadata used to describe genetic data be used to describe high energy physics data?  Are we moving toward a future where scholarly metadata is far fuzzier than it is now?  Will standard procedures develop, and is there any incentive for them to develop?  Who will develop them?

I don’t know enough to even hazard a guess at the answers to these questions.  For at least a little while, before scientists, publishers and librarians work out the details, undergraduate students are going to be even more frustrated at citing material for their projects, especially due to varying faculty expectations.  The “How do you cite this?” questions at the reference desk will get much more complicated before they get any easier.

