
Free the facts

21Jan09

Dave Gray has a nice Flickr slideshow up about research and open access.

Using reCAPTCHA to help digitize books

10Jan09

We’ve reactivated reCAPTCHA on our new domain. CAPTCHAs are the distorted images found on registration forms that help determine whether a user is a human or a computer program, such as a spam bot. Carnegie Mellon’s reCAPTCHA takes this function to a new level by using human-generated inputs to help digitize old books.

As recaptcha.net explains, Optical Character Recognition (OCR) cannot successfully digitize all words from book images. 

reCAPTCHA takes these unreadable words and uses them to generate CAPTCHA images. When human users solve the CAPTCHA, their responses help decipher the unreadable words. (In case you’re wondering how it works, users are shown two words: one that was successfully OCRed and one that was not. When a user gets the OCRed word right, the system assumes he or she is also correct about the other word. Responses from many users are then aggregated to increase confidence in each digitized word.)
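For the curious, here is a minimal sketch of the two-word scheme described above. It is not reCAPTCHA's actual code. The vote thresholds (`MIN_VOTES`, `MIN_SHARE`) and the function names are hypothetical, chosen only to show how a known "control" word gates each vote and how votes add up to a confident reading.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds; reCAPTCHA's real parameters aren't described in this post.
MIN_VOTES = 3     # agreeing answers needed before a reading is accepted
MIN_SHARE = 0.75  # fraction of all votes the leading reading must hold

# votes[word_id] tallies the transcriptions submitted for one unreadable word
votes = defaultdict(Counter)

def submit(control_answer, control_truth, unknown_id, unknown_answer):
    """Record one solved CAPTCHA; return True if the user passed."""
    if control_answer.strip().lower() != control_truth.lower():
        return False  # failed the known word, so the guess at the unknown word is discarded
    votes[unknown_id][unknown_answer.strip().lower()] += 1
    return True

def consensus(unknown_id):
    """Return the accepted reading for a word, or None while confidence is too low."""
    tally = votes.get(unknown_id)
    if not tally:
        return None
    reading, count = tally.most_common(1)[0]
    if count >= MIN_VOTES and count / sum(tally.values()) >= MIN_SHARE:
        return reading
    return None

# Three users pass the control word and agree; one fails it and is ignored.
submit("morning", "morning", "w42", "philosophy")
submit("upon", "upon", "w42", "Philosophy")
submit("abc", "their", "w42", "gibberish")
submit("their", "their", "w42", "philosophy")
print(consensus("w42"))
```

Requiring both a minimum count and a minimum share means a single careless user can't settle a word, and a word with split readings stays open until more people weigh in.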

reCAPTCHA currently helps digitize books from the Internet Archive and old editions of the New York Times.

We use reCAPTCHA in two areas of the site: user registration and Phylo Forum (where visitors can post messages without registering).

New .info domain

08Jan09

On Monday, we acquired phylo.info and moved the site over to our new domain. We still own phylosophy.net, and all requests to our old domain will be redirected to our new one (seamlessly, as far as I can tell).
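The post doesn't say how the redirect is configured, so the following is only a sketch of the general idea, assuming a permanent (HTTP 301) redirect that keeps the path and query string intact. `OLD_HOSTS`, `redirect_target`, and `Redirector` are hypothetical names; in practice this is usually a line or two of web-server configuration rather than Python.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlsplit, urlunsplit

OLD_HOSTS = {"phylosophy.net", "www.phylosophy.net"}  # assumption: both forms of the old name
NEW_HOST = "phylo.info"

def redirect_target(url):
    """Map a URL on the old domain to the same path on the new one (None if not old)."""
    parts = urlsplit(url)
    if parts.hostname not in OLD_HOSTS:  # .hostname is lowercased and drops any port
        return None
    return urlunsplit((parts.scheme or "http", NEW_HOST, parts.path, parts.query, parts.fragment))

class Redirector(BaseHTTPRequestHandler):
    """Answers every GET on the old domain with a permanent redirect."""

    def do_GET(self):
        host = self.headers.get("Host", "phylosophy.net")
        target = redirect_target("http://" + host + self.path)
        if target is None:
            self.send_error(404)
            return
        # 301 "Moved Permanently" tells browsers and search engines the move is for good
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()

# To serve it (blocks forever):
#   from http.server import HTTPServer
#   HTTPServer(("", 8080), Redirector).serve_forever()
```

Preserving the path is what makes the switch seamless: an old bookmark to a profile lands on the same profile under the new name.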

.info is one of the more popular generic top-level domains released in 2000. It is intended for “informative websites” and had roughly 5.2 million name registrations by April 2008.

One benefit of our new domain is how simple it is to browse to pages like these:

phylo.info/john_rawls

phylo.info/willard_van_orman_quine

phylo.info/donald_davidson
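Addresses like these follow a visible pattern: the name in lowercase, with underscores between words. The sketch below is one hypothetical way to produce such paths; the post doesn't describe Phylo's actual rules, and `slug` and `profile_url` are illustrative names. It doesn't handle two philosophers who share a name, which a real site would need to disambiguate.

```python
import re
import unicodedata

def slug(name):
    """Turn a display name into a lowercase, underscore-separated URL path."""
    # Decompose accented letters and drop the accents (e.g. "Sören" -> "Soren")
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of spaces, periods, hyphens, etc. into a single underscore
    return re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")

def profile_url(name, host="phylo.info"):
    return f"{host}/{slug(name)}"

print(profile_url("Willard Van Orman Quine"))
```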

Benjamin Rand’s Bibliography of Philosophy

05Jan09

I just got my hands on Benjamin Rand’s 1905 Bibliography of Philosophy, Psychology, and Cognate Subjects. Rand lists roughly 60,000 books, articles, and reviews that were available in his time, and provides nearly exhaustive coverage of the nineteenth-century literature.

Rand’s Bibliography will be crucial for us as we expand our dataset backwards. How we’ll parse the vagaries of the citations is anyone’s guess at the moment. There will be more to come on that once we finish up twentieth-century dissertations and appointments.

For the moment, I wanted to share a few paragraphs from his Preface that express the same spirit as Phylo over a century earlier:

Information concerning philosophical literature has heretofore been scattered among such a great variety of sources that much expenditure of time and effort has been required before it became available. A comprehensive bibliography of philosophy has therefore long seemed a necessity. To form a single serviceable bibliography[,] the literature in various philosophical publications of recent years and the vast array of dispersed data of earlier periods needed to be brought together. To accomplish this task has been the aim of the present work.

. . . Notwithstanding the many years thus devoted to the work, more time might doubtless have been spent on it in point of completeness and thoroughness; but there is a limit to what can be fairly expected of single-handed and self-supported endeavour. The constant desire, however, has been to afford judicious and ready access to philosophical literature alike to student, librarian, and teacher. Whether this end has been satisfactorily accomplished can best be determined by the measure in which the work shall prove helpful in revealing the valuable sources of information in the realm of Philosophy, and by the extent to which it shall serve as a vantage ground from which to carry forward independent philosophical research.

. . .

If it happens in the coming years that students of philosophy in different lands shall first turn here for the sources of information, and, without retracing the steps already laboriously trod, shall proceed more readily with their own original researches, then this work will indeed have served a useful end. That it shall give readiness of access to the works of the ‘great ones’ in philosophy, and shall render available to all, the literature on those systematic subjects with which philosophical writers have dealt; that it shall furnish the means through which libraries of philosophy may more readily be founded or enlarged; shall prepare the way whereby new philosophers may more freely advance and new systems be created; that it shall testify to the intellectual brotherhood of man by true service toward all—are hopes which have stimulated to constant effort and lightened toilsome hours in the preparation of this work. If in spite of the work’s limitations any of these purposes shall hereafter be fulfilled, the end sought by the author will have been reached and his true reward have been attained.

It took Rand over a decade to finish his Bibliography. Let’s hope Phylo goes a little quicker!

New design launched

21Dec08

We’ve implemented a new site design that’s lighter, cleaner, and helps segment information more effectively. I’m posting wireframes of the old and new designs below for comparison.

Site design (09.2007)

Site design (12.2008)

One thing we haven’t fully implemented yet is a history function near the top right of the screen. A breadcrumb will be added for each piece of content a user visits and each search a user performs. This will provide a visual way of tracing back your path as you navigate through the site.
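As a rough illustration of the history function, here is a sketch of one possible data structure behind it: a bounded trail of recent pages and searches. Everything here (`Crumb`, `History`, the eight-crumb limit, and the rule that jumping back discards later crumbs) is a hypothetical design choice, not a description of what Phylo will ship.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Crumb:
    kind: str   # "page" for a visited record, "search" for a query
    label: str  # text shown in the breadcrumb trail
    url: str    # where clicking the crumb takes the user

class History:
    """Bounded trail of recent visits and searches, oldest first."""

    def __init__(self, max_crumbs=8):
        self._crumbs = deque(maxlen=max_crumbs)  # oldest crumbs fall off automatically

    def add(self, crumb):
        # Skip reloads of the same page so the trail isn't padded with repeats
        if self._crumbs and self._crumbs[-1] == crumb:
            return
        self._crumbs.append(crumb)

    def back_to(self, index):
        """Jump to an earlier crumb, discarding everything after it."""
        while len(self._crumbs) > index + 1:
            self._crumbs.pop()
        return self._crumbs[index]

    def trail(self):
        return " > ".join(c.label for c in self._crumbs)

h = History(max_crumbs=3)
h.add(Crumb("page", "John Rawls", "/john_rawls"))
h.add(Crumb("page", "John Rawls", "/john_rawls"))        # reload: ignored
h.add(Crumb("search", "justice", "/search?q=justice"))
h.add(Crumb("page", "CUNY", "/cuny"))
h.add(Crumb("page", "Donald Davidson", "/donald_davidson"))  # pushes the oldest crumb out
print(h.trail())
```

Capping the length keeps the trail readable in a small corner of the screen; the trade-off is that older steps eventually disappear.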

Pre-1975 North American dissertations added

09Dec08

Last month, I finished cross-referencing Thomas Bechtle and Mary Riley’s Dissertations in Philosophy Accepted at American Universities, 1861-1975 (New York: Garland, 1978) against other sources of philosophy dissertations, including Dissertation Abstracts International, The Pragmatism Cybrary, and various library catalogs.

As described in the preface, Bechtle and Riley compiled a preliminary list from standard sources, including Dissertation Abstracts International and Comprehensive Dissertation Index, and attempted to contact university libraries to verify each dissertation. The resulting list contains 7,503 dissertations. It is by far the most methodical index I’ve come across, but I think it’s still important to verify entries against other sources when we’re not consulting a hard copy of the dissertation.

Our final list turned up 7,429 dissertations, including several hundred we consulted at libraries in summer 2007. These have all been added to the database, bringing our total number of dissertations into the 9,000 range.

In addition, 1,087 dissertations were listed in at least one source and still need to be verified, probably by checking library catalogs. Once this is done, Phylo will contain every North American dissertation written in philosophy before 1975.
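The cross-checking above amounts to matching entries across several lists and counting how many sources back each one. The sketch below assumes a crude rule that the post does not state: an entry counts as corroborated once at least two sources list it, and is matched on surname plus year. The names `key` and `cross_reference` are hypothetical, and a real pass would also need to compare titles, since two authors with the same surname can finish in the same year.

```python
import re
from collections import defaultdict

def key(author, year):
    """Hypothetical match key: lowercase surname letters plus year."""
    surname = author.split(",")[0]  # assumes "Surname, Given" order
    return (re.sub(r"[^a-z]", "", surname.lower()), int(year))

def cross_reference(sources, min_sources=2):
    """Split dissertations into corroborated ones and ones still needing checks.

    sources maps a source name to a list of (author, year, title) tuples.
    """
    seen = defaultdict(set)  # match key -> names of the sources listing it
    titles = {}
    for source, entries in sources.items():
        for author, year, title in entries:
            k = key(author, year)
            seen[k].add(source)
            titles.setdefault(k, title)  # keep the first title seen for display
    verified = {k: titles[k] for k, s in seen.items() if len(s) >= min_sources}
    unverified = {k: titles[k] for k, s in seen.items() if len(s) < min_sources}
    return verified, unverified

# Invented sample entries, for illustration only
sources = {
    "Bechtle & Riley": [("Smith, Ann", 1921, "On Truth"), ("Doe, Jane", 1950, "Mind")],
    "DAI": [("SMITH, Ann", "1921", "On truth.")],
}
verified, unverified = cross_reference(sources)
print(len(verified), len(unverified))
```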

As a preview, I’ll start doing the same cross-checking for post-1975 dissertations in January. By far, the most helpful source here will be Review of Metaphysics, which has published an annual list of dissertations since 1957.

“Naturalized Metaphilosophy”

30Nov08

David and I received word earlier this month that our article “Naturalized Metaphilosophy” has been accepted for a special issue of Synthese on Representing Philosophy. (Thom Brooks’ blog has the last easily accessible copy of the CFP.)

ABSTRACT.  Traditional representations of philosophy have tended to prize the role of reason in the discipline. These accounts focus exclusively on ideas and arguments as animating forces in the field. But anecdotal evidence and more rigorous sociological studies suggest there is more going on in philosophy. In this article, we present two hypotheses about social factors in the field: that social factors influence the development of philosophy, and that position of status and reputation—and thus social influence—will tend to be awarded to philosophers who offer rationally compelling arguments for their views. In order to test these hypotheses, we need a more comprehensive grasp on the field than traditional representations afford. In particular, we need more substantial data about various social connections between philosophers. This investigation belongs to a naturalized metaphilosophy, an empirical study of the discipline itself, and it offers prospects for a fuller and more reliable understanding of philosophy. 

Download “Naturalized Metaphilosophy” (PDF)

Beta site launch

05Oct08

We’ve just opened access to a test drive of the site at www.phylosophy.net. At present, you can search for individuals and institutions within the database, and explore connections between them using links. The most recent degrees and appointments from our core set of schools are included, as well as advisors. For a good, complete sample, check out our home institution, CUNY, as well as some of our recent PhDs, such as James Snyder, Fritz McDonald, and Christine Vitrano.

At this point, the data is still tabular, but we’re making steady progress on our first visualization, which should be an institutional timeline. Charts, graphs, and network maps should follow in the coming months. We’ve also disabled account creation and data editing/uploading for the moment, until the rest of our initial, verified dataset has been entered.

After you’ve had a chance to play around a bit, drop us a line in the Feedback section of Phylo Forum or via email (phylo@phylosophy.net) with your initial thoughts on site design and usability.

The hundred years problem

15Apr08

Increasingly, I think we’re saddled with what I’m calling the “hundred years” problem. By that, I mean that from at least 2000 forward, it’s fairly easy to compile degree, appointment, and publication information, since (nearly) all of it is published on the web (and sometimes even available in RSS, XML, or flat data formats). Some of this harvesting is complicated by nonstandard metadata, but web-wide standards like Dublin Core are emerging to address these worries.
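To make the point about standards concrete, here is a sketch of harvesting one Dublin Core record with Python's standard library. The namespace URI is the standard Dublin Core element namespace; the sample record and the `harvest` function are invented for illustration. Because every site that uses Dublin Core tags creators, dates, and publishers the same way, the same few lines work across all of them, which is exactly what nonstandard metadata prevents.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical record; the values are invented for illustration.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Dissertation</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:date>2003</dc:date>
  <dc:publisher>Example University</dc:publisher>
</record>"""

def harvest(xml_text, fields=("title", "creator", "date", "publisher")):
    """Pull the named Dublin Core elements out of one record into a dict of lists."""
    root = ET.fromstring(xml_text)
    # Any dc:* element may repeat (e.g. several creators), so each field maps to a list
    return {
        f: [el.text.strip() for el in root.iter(f"{{{DC}}}{f}") if el.text]
        for f in fields
    }

print(harvest(RECORD))
```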

So much for the future. Let’s consider the more distant past, namely information from before 1900. Much of it isn’t available at all for minor figures (who probably make up the greatest share of the field), and information on major figures is the province of specialized historians and archival efforts. Google Books and the Universal Digital Library are making some headway in archiving older materials, but the process is slow going and limited to books at the moment (we are, after all, interested in other records as well). Incidentally, UDL estimates that no more than 10 million of the 100 million books written since the beginning of recorded history date from before 1900. Those 10 million will be a huge task, but by the numbers alone, 1900-2000 is the bigger one.

And that’s where we’ve come in. By focusing on North American philosophy since the first dissertations in the 1880s, we’ve started Phylo right in the middle of these hundred years of densest material. The problem, of course, is that this material is recent enough to obtain, yet time-consuming and costly enough to gather that it presents a real deterrent. We will have plenty of this information from the start, given the longevity of the programs we’ve chosen to research. But complete saturation looks almost as difficult here as it does for pre-1900 data, where we often don’t know how much exists (and thus how complete our current records are).

Recognizing this problem has led us to think more about our longer short-term goals. Without a great chance of success in filling in 1900-2000 data, it might make sense to start expanding back further, to pre-1900 information that historians already have available. We’ve always known this will require some conceptual changes (e.g., ‘degree’ and ‘institution’ need to be understood more metaphorically as periods of study and places where philosophy happens). In light of the hundred years problem, though, it might be useful to make these changes sooner and start collecting more varied data from earlier periods in philosophy.
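One way to picture those conceptual changes is as a data model in which a modern PhD is just one special case of a broader record. The sketch below is hypothetical (the class and field names are ours for illustration, not Phylo's schema): a `StudyPeriod` generalizes a degree, a `Place` generalizes an institution, and `teachers` generalizes advisors. Dates are optional because older records often lack them, and negative years stand for BCE.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Place:
    """Anywhere philosophy happens: a university, an ancient school, a salon."""
    name: str
    kind: str = "university"

@dataclass
class StudyPeriod:
    """Generalizes a 'degree': a span of study at a place, with or without a credential."""
    student: str
    place: Place
    start: Optional[int] = None       # year; None when unknown, negative for BCE
    end: Optional[int] = None
    teachers: List[str] = field(default_factory=list)  # generalizes 'advisor'
    credential: Optional[str] = None  # e.g. "PhD"; None for informal study

    def is_formal_degree(self):
        return self.credential is not None

# A modern record and an ancient one fit the same shape (the first is invented)
modern = StudyPeriod("Jane Doe", Place("CUNY Graduate Center"), 2000, 2006, ["A. Advisor"], "PhD")
ancient = StudyPeriod("Aristotle", Place("Academy (Athens)", "school"), -367, -347, ["Plato"])
print(modern.is_formal_degree(), ancient.is_formal_degree())
```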

ISI Web of Science

02Apr08

David and I both attended presentations on ISI Web of Science today. WoS is taking an interesting and, in many ways, different approach as a search tool. Here are a few of the things that stood out:

  • Keywords are de-emphasized. There is no taxonomy associated with WoS (since it is so interdisciplinary in scope), so users are encouraged to search by authors (including their home institutions) and particular publications. WoS does assign keywords to articles using an algorithm that looks at titles and summaries, so users can search by topic, but it’s certainly not the preferred method.
  • Influence is understood in terms of citations. Each record is tagged with as many citation links as possible (only journal articles are included). As searchers, we were shown how to find the handful of mega-articles that hundreds of other articles on a topic all cite in common. If this really is a good measure of influence, it seems possible that one could jump into any topic knowing virtually nothing about its major players and sift them out from pure citation counts.
  • H-scores. Certain Doubts has had several posts about h-scores in the past few months, so I’ll simply refer you to discussions on 29 Nov, 13 Dec, 15 Dec, 17 Dec, 19 Dec, and 28 Dec.
  • Search queries seem pretty user-intensive. There are no fuzzy search capabilities (“Did you mean X?”), so there was a lot of emphasis on wildcard and truncated search strings. (See below.)
  • Some attempt at visualizations. I noticed two kinds of citation reports available for viewing. One shows the number of publications returned for any search; the other shows the number of citations within that publication set. These charts are static images generated on request, and they seem similar to Scopus’ visual capabilities (although I can’t say for sure, because the Scopus server always times out before my image is generated). Here are the two charts I generated for “rawls AND justice”.
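Two of the citation-based ideas in the list above are simple enough to sketch. `most_cited` finds the references that the most articles in a result set cite in common (the "mega-articles"), counting each reference once per citing article; this is our own illustration, not how WoS computes its results. `h_index` uses the standard definition: the largest h such that h papers each have at least h citations. Both function names and the sample data are hypothetical.

```python
from collections import Counter

def most_cited(articles, k=3):
    """Return the k references cited by the most articles in a result set.

    articles maps an article id to the list of ids it cites.
    """
    counts = Counter()
    for refs in articles.values():
        counts.update(set(refs))  # count each reference once per citing article
    return counts.most_common(k)

def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Invented sample data
articles = {"a1": ["r1", "r2"], "a2": ["r1", "r3"], "a3": ["r1", "r2", "r2"]}
print(most_cited(articles, k=2))
print(h_index([10, 8, 5, 4, 3]))
```

Note how little the first function needs to know: no keywords or topic labels, just who cites whom. That is what makes it plausible to sift out a field's major players from citation counts alone, if citations really do track influence.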

WoS has data for arts and humanities going back to 1975, and it will be interesting to see how much it catches on in the humanities and in philosophy. One general limitation, which I raise in An Introduction to Phylo, is that this tool makes the user do the work rather than the other way around. I was struck by how much the session’s presenter was essentially training us to work with the tool, by favoring publication data over keywords and filtering searches in certain ways, rather than giving us an intuitive tool that worked however we found most natural. In general, I think this underscores the need for more participatory design in building search tools.

Beyond just asking users what they think of the tools we’ve built, we need to learn more beforehand about how they process information and in what forms they find that information most cognitively salient. I think we’ll learn some of this once we launch and revise our displays, and I hope we can come up with some model of participatory design that facilitates the process.