At COOP2006, Michael Buckland gave a very insightful keynote talk about the notion of "documents" and indexicality, and what they imply for retrieval. The discussion revolved around one question: all documents are artifacts, but are all artifacts documents? He described how documents pervade society: in various contexts (educators, scientists, publicists, religion... lawyers and courts), people use documents as more than just inert artifacts. For instance, scientists use documents (articles, offprints) as the archive of achievement and for personal status, and educators use documents (textbooks, instructional materials) to teach, to empower and to diminish teachers. I also like this example: governments use documents to exercise social control: "to travel the passport is more powerful than I am; I could have sent my passport here but then I wouldn't be able to come over".
Then he highlighted the phenomenological, semiotic perspective on "documents" by referring to Suzanne Briet (1951): "[a document is] any concrete or symbolic indexical sign preserved or recorded towards the ends of representing, of reconstituting or of proving a physical or intellectual phenomenon". For example, an antelope becomes a document when somebody captures it and brings it to a museum; an article written or a documentary shot about it is then a secondary document. He additionally took the example of "a dead bird library": it is meant to be used by students and researchers, so the dead birds are documents. A dead bird is more convenient and more characteristic than a picture or a living bird. It's a document because it is a meaningful sign. You can never say that something could never be a document.
There is hence a document as perceived and a document as expressed (through a code or language, a mode of expression such as language, image or sound, and a technology). The problem comes when we're looking for documents: the indexing and searching problem. Each specialist expresses things differently, so individuals from different communities need different help. In this context, search engines are "machines à sélectionner" (selecting machines) rather than "search engines", so there should be different mappings:
- between the searcher's words and the indexing system's terms
- between the author's words and the indexing terminology
- between the search query and the document metadata
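To make the three mappings a bit more concrete, here is a minimal Python sketch. It is my own illustration, not something shown in the talk: the medical vocabulary, the table names (`SEARCHER_TO_INDEX`, `AUTHOR_TO_INDEX`) and the functions are all hypothetical, and a real system would need far richer vocabularies.

```python
# All tables and names below are illustrative assumptions, not from the talk.

# Mapping 1: the searcher's words -> the indexing system's terms.
SEARCHER_TO_INDEX = {
    "heart attack": "myocardial-infarction",
    "high blood pressure": "hypertension",
}

# Mapping 2: the author's words -> the indexing terminology.
AUTHOR_TO_INDEX = {
    "myocardial infarction": "myocardial-infarction",
    "mi": "myocardial-infarction",
    "hypertension": "hypertension",
}


def index_document(author_keywords):
    """Turn the author's keywords into indexing terms (the document metadata)."""
    return {
        AUTHOR_TO_INDEX[keyword.lower()]
        for keyword in author_keywords
        if keyword.lower() in AUTHOR_TO_INDEX
    }


def translate_query(query):
    """Turn the searcher's phrase into an indexing term, or None if unknown."""
    return SEARCHER_TO_INDEX.get(query.strip().lower())


def select(query, collection):
    """Mapping 3: match the translated query against the document metadata."""
    term = translate_query(query)
    if term is None:
        return []
    return sorted(doc_id for doc_id, terms in collection.items() if term in terms)


COLLECTION = {
    "paper-a": index_document(["Myocardial infarction", "MI"]),
    "paper-b": index_document(["Hypertension"]),
}

print(select("Heart attack", COLLECTION))
```

In this sketch, the lay phrase "heart attack" selects `paper-a` even though its author never wrote those words, because both sides are mapped onto the same indexing term. A searcher who types the author's own wording gets nothing back unless the searcher-side table also covers it, which is exactly why each community needs its own mapping.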
To be efficiently selected, collections of documents need indexing, and indexing has some interesting characteristics:
- indexing is forward looking: it is done for a future purpose, so you have to imagine the purpose of the group for which you want to be useful
- indexing is backward looking: "about X" refers to the past discussion / dialog / description of what is now named X
- indexing is inscribed in a point in time: time moves on, so all indexing is necessarily obsolescent
- mention (using this word) is not meaning (having this sense)
- and it's worse because language evolves differently not only over time but also across social groups: cow/sheep becomes beef/mutton in English when you move from the peasant world to the bourgeois world (from English-derived to French-derived words)
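The "mention is not meaning" point and the cow/beef example can be sketched as a tiny data structure in Python. This is only an illustration under my own assumptions: the `IndexEntry` fields, the sense labels and the dates are hypothetical, chosen to show that an index entry ties a word to a sense, a community and a moment in time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IndexEntry:
    word: str         # the mention: the word that was actually used
    sense: str        # the meaning: what the word was taken to denote
    community: str    # the group the indexer had in mind
    indexed_on: date  # indexing is inscribed in a point in time


# Hypothetical entries; the dates are placeholders, not real data.
ENTRIES = [
    IndexEntry("cow", "bovine", "peasant", date(2006, 1, 1)),
    IndexEntry("beef", "bovine", "bourgeois", date(2006, 1, 1)),
    IndexEntry("sheep", "ovine", "peasant", date(2006, 1, 1)),
    IndexEntry("mutton", "ovine", "bourgeois", date(2006, 1, 1)),
]


def words_for(sense, community):
    """Words a given community uses for a given sense."""
    return sorted(
        e.word for e in ENTRIES if e.sense == sense and e.community == community
    )


def shares_meaning(word_a, word_b):
    """True when two different mentions point to at least one common sense."""
    senses_a = {e.sense for e in ENTRIES if e.word == word_a}
    senses_b = {e.sense for e in ENTRIES if e.word == word_b}
    return bool(senses_a & senses_b)


def age_in_days(entry, today):
    """How old an index entry is; the older it is, the more likely it is stale."""
    return (today - entry.indexed_on).days


print(words_for("bovine", "bourgeois"))
print(shares_meaning("cow", "beef"))
```

A plain string comparison would treat "cow" and "beef" as unrelated, while the sense field shows they meet in the same meaning for two different communities. The date field is there as a reminder that any such table starts ageing the day it is written.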
This connects to Ludwig Wittgenstein, who showed the value of dialects and contexts:
- language games: meaning is constituted through activity / language usage (different contexts)
- language regions: language games differ in different language zones (different dialects)

This is related to the fact that meaning is dynamic: language is disambiguated within contexts and specialized dialects.
Why do I blog this? Even though this might seem very abstract and high-level at first glance, this kind of account is very important when working on collaborative applications, because it shows how context and communities play an important role in the creation of a common body of knowledge (regarding information retrieval, of course) and therefore in performing collaborative activities (like maintaining a proper document collection in a community of practice or within a company, for example). This of course connects to our Mutual Modeling project at the lab.