Multilingual Linked Data: the discussion continues

If you are familiar with the idea of the Semantic Web, you are probably also aware of the difficulties of realising it in practice. A relevant one is the presence of language barriers between semantic information expressed in different languages. In fact, multilingualism and the Semantic Web is one of the research topics we are currently working on at OEG. Of course, linguistic barriers are not exclusive to the Semantic Web; they affect the Web in general. To address this topic (multilingualism on the Web), the Multilingual Web initiative has been running a series of workshops for a few years.

A few weeks ago I attended the W3C Multilingual Web Workshop in Rome, at FAO. It was the first time I went to one of these MW workshops, and I have to say it was a very nice experience. There I had the opportunity to co-chair a session on Best Practices for Multilingual Linked Open Data together with Dominic Jones (Trinity College Dublin) and José Labra (University of Oviedo). I will try to summarise the experience in a few lines.

Previous events, such as the MLODE workshop in Leipzig (Sep 2012), made clear the community’s interest in multilingualism in Linked Data generation. However, as the discussions that followed José Labra’s talk in Leipzig showed, the lack of consensus was evident in many aspects, such as the use of URIs vs IRIs, opaque URIs vs descriptive URIs, the scope of language tags, the role of content negotiation, etc. Still feeling that the discussion had been left unfinished after Leipzig, I proposed to the organisers of the Multilingual Web Workshop that we hold a kind of panel to continue it in Rome, and this took the form of the breakout session that Dom, José and I coordinated.

We started the session in Rome with a set of really interesting lightning talks (Ivan Herman, Gordon Dunsire, Daniel Vila, Dave Lewis, Charles McCathie Nevile, Roberto Navigli, Haofen Wang). They told us about their particular experiences, ranging from bibliographic standards to Chinese LOD generation, and pointed out common issues when dealing with multilingualism in Linked Data.

The discussion session followed, with a lot of interaction between the speakers and the audience. It focused mostly on three topics: naming (URIs), labelling, and linking of multilingual content in LD. There was a general feeling that IRIs (Internationalized Resource Identifiers) are cool but that their use is hampered by the lack of support in current tools. Regarding language tags, it was agreed that they should always be used, although, sadly, this best practice is rarely followed by semantic data providers. The participants also commented on the need for, and the difficulties of, linking vocabularies in different languages, and on the fact that links other than owl:sameAs have to be explored further. Finally, it was pointed out that suitable use cases for multilingual Linked Data need to be defined in order to guide our future discussions on best practices.
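To make the naming and labelling points a bit more concrete, here is a minimal Python sketch of two of the ideas discussed: turning an IRI into a plain URI for tools that lack IRI support, and choosing a label by language tag with a fallback. It is only an illustration under some assumptions: the example.org IRI, the label data, and the helper names iri_to_uri and pick_label are all made up for this post, and the conversion is simplified (it percent-encodes non-ASCII characters as UTF-8 but does not handle internationalised host names).

```python
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters of an IRI as UTF-8 bytes.

    Simplified sketch: URI delimiters and existing %-escapes are left
    untouched, and internationalised host names are NOT converted.
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


# Hypothetical labels for a single resource, as (text, language tag) pairs.
LABELS = [("Rome", "en"), ("Roma", "it"), ("Roma", "es"), ("罗马", "zh")]


def pick_label(labels, preferred, fallback="en"):
    """Return the first label matching the preferred language tags,
    then the fallback language, or None if nothing matches."""
    by_lang = {lang: text for text, lang in labels}
    for lang in (*preferred, fallback):
        if lang in by_lang:
            return by_lang[lang]
    return None


if __name__ == "__main__":
    print(iri_to_uri("http://example.org/resource/España"))
    print(pick_label(LABELS, ["fr", "es"]))
```

Note that pick_label can only work because every label carries a language tag; an untagged label would give a consumer nothing to select on, which is one practical reason for the "always use language tags" recommendation.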

Although our initial intention was to write a kind of white paper with the conclusions of the session, this turned out to be too optimistic an idea: after the session, too many issues remained open and too few agreements had been reached. Nevertheless, following Felix Sasaki’s suggestion, we agreed that our discussions would continue in the context of a new W3C Community Group.

And here we are, launching a new W3C group, “Best Practices for Multilingual Linked Open Data”, and hoping that many interested people will contribute to it! Let me paste the group description here:

The target for this group is to crowd-source ideas from the community regarding best practises for producing multilingual linked open data. The topics for discussion are mainly focused on naming, labelling, interlinking, and quality of multilingual linked data, among others. Use cases will be identified to motivate discussions. Participation both from academia and industry is expected. The main outcome of the group will be the documentation of patterns and best practices for the creation, linking, and use of multilingual linked data.

So, if you have research or practical interests in the matter, feel free to join and enjoy it!