Introduction: Two dictionaries, whether in the same language or in different languages, can be highly complementary: some dictionaries contain a lot of syntactic information, while others say more about the meaning of words. Domain-specific dictionaries contain specialist terminology. Linking a specialist dictionary to a general dictionary, for example, can provide a fuller picture of the vocabulary and so improve the systems that rely on it.
If researchers build systems for a company, they will use an in-house dictionary containing all of the company's terminology and link it to a general dictionary. Linking the two allows them to build an effective system. This becomes very important in a cross-lingual setting, such as a multinational company with terminology in several languages.
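As a rough illustration of what linking can mean in practice, the sketch below pairs each entry in a small specialist glossary with entries in a general dictionary that share the same headword and part of speech. Everything in it is an assumption made for illustration: the `Entry` structure, the `link_dictionaries` function and the sample entries are hypothetical, and real linking work normally compares definitions and other evidence rather than headwords alone.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    lemma: str       # headword
    pos: str         # part of speech
    definition: str


def link_dictionaries(specialist, general):
    """Map each specialist entry to general entries with the same lemma and part of speech."""
    # Index the general dictionary by (lowercased lemma, part of speech).
    index = {}
    for entry in general:
        index.setdefault((entry.lemma.lower(), entry.pos), []).append(entry)
    # Unmatched specialist terms map to an empty list.
    return {
        (e.lemma, e.pos): index.get((e.lemma.lower(), e.pos), [])
        for e in specialist
    }


# Invented sample data, not taken from any real dictionary.
general = [
    Entry("lead", "noun", "a heavy, soft, grey metal"),
    Entry("lead", "verb", "to guide or go in front of others"),
]
specialist = [
    Entry("lead", "noun", "metal used for radiation shielding"),
    Entry("collimator", "noun", "device that narrows a beam of radiation"),
]

links = link_dictionaries(specialist, general)
for key, matches in links.items():
    print(key, [m.definition for m in matches])
```

In this toy case the specialist noun "lead" is linked to the general noun sense but not to the verb, while "collimator" has no counterpart in the general dictionary, which is the kind of gap a specialist resource fills.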
Linking dictionaries is important in text analysis. It matters most when working out the meaning of text, because a single word can have many different meanings and uses. Take "lead" the metal and "lead" the verb: how do you tell them apart in text when they are written the same way? To disambiguate text effectively, you need to be able to use all of the words. Taking large groups of words out of the mix, particularly fairly common ones, limits the efficacy of the technology.
Researchers work with lexicographers and existing dictionary resources. Copyright becomes a big issue in this work. Dictionaries are copyright works, and the dictionary market is a commercial interest. Researchers are often limited in how much dictionary data they can reuse.
Key question: Are copyright issues impeding research in this field?
Key question: Whose rights should win out in this scenario?
Key question: Should a special case be made for researchers conducting not-for-profit research?
Brand names in dictionaries
Trademarks in dictionaries raise a separate issue. Take Hoover and Google, for example: both are brand names that are also used as generic terms.
Researchers at Princeton maintain a very large dictionary called WordNet. They have received angry letters from large corporations because it included brand names, names of pharmaceuticals, different drugs and so forth. These entries were clearly labelled as proper nouns, correctly defined and attributed to the company concerned. The objection was that the mere presence of a name in an English dictionary made it an English word rather than a brand name. Trademark law dictates that you can’t use a generic word as a brand name: you can’t make a tractor and give it the brand name Tractor, for example.
The lead researcher on WordNet took the decision to remove all of these terms. This is regrettable from a research point of view, but the legal risk was judged too great, even though the researchers weren’t breaking the law. Liability is also unclear and probably depends on the research contract itself.
Key question: Does excluding trademarks from this sort of research compromise it?
Key question: Whose rights should win out in this situation?
Key question: Should a special case be made for not-for-profit research?
Clarity of licensing
Key question: Combining two resources can be problematic because their licences, even if they’re open, may not be compatible. Can you, for example, put a non-commercial resource together with a share-alike resource?