Saturday, March 29, 2014

RDA 3rd Plenary Meeting - Day 2

Day 2 of the RDA 3rd Plenary Meeting started with new discussions and introductions between new partners.

Plenary Session
Among the interesting presentations/speeches, I specifically enjoyed the presentation from Dr. Tony Hey, Vice President of Microsoft Research Connections. He discussed Open Data and Open Science, the use of MS tools for modeling with Big Data, Azure Cloud Services etc., providing a practical perspective on his ideas. You can view the recording here.

Another really interesting and useful presentation came later from Beth Plale on the RDA structure in IGs/WGs and the work done on restructuring the RDA, filling the gaps and addressing overlaps. I believe that there was a message in the right direction regarding the relation between the existing IGs/WGs and the re-organization that might need to be suggested by RDA. You can view the recording here.

Domain Repository IG
After the lunch break, I decided to participate in the Domain Repository IG meeting, aiming to share my experience of working with agricultural repositories during the last 5 years. The main point of the discussion was that domain-specific repositories may have specific needs related to curation, quality, management and other aspects that need to be taken into consideration. On the other hand, they share a lot of attributes, so it would be interesting to share experiences and best practices from various sectors.
A publication titled "Sustaining Domain Repositories for Digital Data: A White paper (2013)" (available as a PDF file here) could be used as a basis for further discussions and exchange of ideas.
The issue of connecting/linking different domain repositories was raised, especially in cases where no controlled vocabularies are used for classification. It was agreed that using langstrings instead of URIs is a significant barrier to linking different data sources.
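To illustrate the langstring vs. URI point, here is a minimal sketch in Python. The records, field names and the example.org URIs are all hypothetical and only meant to show the idea: free-text labels differ by language, spelling and whitespace, so an exact match on them finds nothing, while a shared concept URI links the same records without any guessing.

```python
# Hypothetical records from two domain repositories (illustrative data only).
# "subject_label" is a (text, language) pair, i.e. a langstring;
# "subject_uri" is a made-up concept URI from a shared controlled vocabulary.
repo_a = [
    {"id": "a1", "subject_label": ("wheat", "en"),
     "subject_uri": "http://example.org/vocab/wheat"},
    {"id": "a2", "subject_label": ("maize", "en"),
     "subject_uri": "http://example.org/vocab/maize"},
]
repo_b = [
    {"id": "b1", "subject_label": ("blé", "fr"),
     "subject_uri": "http://example.org/vocab/wheat"},
    {"id": "b2", "subject_label": ("Wheat ", "en"),
     "subject_uri": "http://example.org/vocab/wheat"},
]


def link_by(key, left, right):
    """Return sorted (left_id, right_id) pairs whose values under `key` are equal."""
    return sorted(
        (l["id"], r["id"]) for l in left for r in right if l[key] == r[key]
    )


# Matching on langstrings: language, capitalisation and stray spaces all get in the way.
label_links = link_by("subject_label", repo_a, repo_b)

# Matching on URIs: the same concept has the same identifier everywhere.
uri_links = link_by("subject_uri", repo_a, repo_b)

print(label_links)  # []
print(uri_links)    # [('a1', 'b1'), ('a1', 'b2')]
```

In practice one could normalise the labels before comparing them, but that only helps within one language; the URI comparison needs no such clean-up.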

Poster session
Thanks to a last-minute online application that I submitted as soon as I reached Dublin, the help of Hilary Hanahoe and the kind support of the local organizers, I managed to arrange a last-minute placement for the agINFRA project posters (in fact, our application was the last one to be accepted, as I was told). The spot was not the best possible one but, taking the limitations into consideration, it was much better than nothing! In addition, we got to share the same space with the iMarine poster, which was really relevant.

Proudly posing next to the agINFRA posters

Wheat Data Interoperability Working Group
The meeting of the group was chaired by Esther Dzale (INRA) and Richard Fulss (CIMMYT). The meeting started with a short introduction of the participants and a presentation of the WG, including:
  • the objectives of the group;
  • the data types to be taken into consideration;
  • a list of the deliverables to be developed in the context of this WG;
  • adopters of the aforementioned deliverables etc.
The discussion focused on the upcoming deliverables, like the online survey to be used for collecting the user requirements and the structure of the cookbook to be developed. In addition, practical issues, such as the funding options for the work done in the context of this group, were discussed, along with ideas about the next face-to-face meeting. The discussions were followed by a demonstration of the CropScape platform by Dr. Liping Di from George Mason University, USA. The platform uses satellite data and provides users with information about crop cultivation in specific areas of the US (county, state), the size of each field, the allocation of cultivations in a specific area and other crop stats. This was followed by another demonstration, of the Global Agricultural Drought Monitoring and Forecasting System (GADMFS), again by Dr. Di.

Further discussions
During the 2nd day of the meeting, I had the opportunity to be involved in discussions with the following:

  • David King (OU) about next tasks of the OU in agINFRA WP5 (related to data integration mainly from Mendeley and BHL);
  • Eamonn O Tuama (GBIF) about germplasm linked data, the excellent work that he has already done in the context of GBIF and the related stuff that he is currently working on;
  • Yde de Jong (University of Eastern Finland, ex-ViBRANT), about the Biodiversity Data Integration IG outcomes and their relation to our work in the Agricultural Data Interoperability IG and Wheat Data Interoperability WG;
  • Dimitris Koureas (NHM London), mostly about biodiversity data interoperability (but not limited to it) and Scratchpads.
I am pretty sure that I have left some of the interesting conversations outside this list...

The day ended with a cash bar (time for a nice, cold Guinness!) and the social dinner, which was really classy and included some amazing pieces of traditional Irish music and dancing!

The special RDA menu!

Friday, March 28, 2014

RDA 3rd Plenary Meeting - Day 1

The 3rd Plenary Meeting of the Research Data Alliance (RDA) took place on 26-28/3/2014 in Dublin, Ireland. It was my 2nd consecutive participation in an RDA plenary meeting, and I managed to organize my trip (even at the last moment) and be there. It was not an easy trip, as I had to travel to Dublin from Athens on a Greek national holiday/celebration, but still I did it. The trip itself was nice, reaching Dublin through Copenhagen (with SAS) and then returning to Athens through Frankfurt with Lufthansa; it was my first time in Dublin but, due to the packed schedule of the meeting and the moody weather, I did not get to see anything beyond my 20 min walk from the hotel to the meeting place (at lovely Croke Park) and back.

Statistics about the 3rd RDA plenary
477 participants registered and attended the meeting. The overall RDA membership is 1,585 from 71 countries (64% from academia; 49% EU, 37% US; 2% policy makers).

About the venue
The meeting was hosted at Croke Park, which is a nice combination of a football stadium and an exhibition center - a fully equipped one, I have to admit; the plenary sessions took place in a large hall which fitted the almost 500 RDA meeting participants nicely, while coffee and lunch breaks took place right outside the hall. I wouldn't want to know more about the logistics (catering for such a big number of participants must be a hard task) but the organizers did a great job. There were some issues with the wifi connection (the technicians were working on them) as well as a lack of power outlets in the big hall (which left a high percentage of the participants' laptops, used for tweeting etc., switched off after the first couple of hours!), but in the end everything was fine.

Agro-Know was there, with folders including interesting brochures and company profile 1-pagers. In addition, the agINFRA project was disseminated through discussions and the brochures; I also managed to get the 3 small agINFRA project posters placed in the poster session, just a couple of hours before it started. When there's a will there's a way, as they say ;-)

Day 1 Plenary Session
The plenary session started with keynote speeches and greetings; I personally most enjoyed the Keynote Address by Prof. Ian Chubb AC, Australia's Chief Scientist. Some of his key points, as extracted from various tweets, were the following:

  • We need national, international and inter-disciplinary collaboration in research and innovation;
  • We need research and data to be able to feed 9 billion people, producing carbohydrates and fiber while climate moves;

You can also watch his presentation recording here.

Additional presentations/speeches took place afterwards, mostly highlighting the fact that data exist and it is up to the users to find a meaningful way to use them and that infrastructure is already here, waiting for useful applications. The highlight was a cartoon presented by Dr Ross Wilkinson, Executive Director, Australian National Data Service, showing a donkey, a cart and a carrot; an image really familiar to many of us. This led to nice and funny discussions as well as an explosion of related tweets!

Day 1 WG/IG sessions

I opted to attend the "BoF Education and skills development on Data Intensive Science" organized by Yuri Demchenko and Wouter Los, which aimed to identify opportunities for the new field of data scientists. My colleague Miguel-Angel Sicilia from the University of Alcala was also there, proposing his approach to the subject and discussing the possibility of an interest group, a proposal for which has already been submitted. Discussions were interesting and focused on the existing curricula all over the world.

Next was the meeting of the Agricultural Data Interoperability IG, which was chaired by agINFRA colleagues Johannes Keizer (FAO) and Devika Madalli (Indian Statistical Institute, Bangalore). There I made a presentation titled "Global RDF Descriptors for Germplasm Data", describing the work done in the context of the agINFRA project and the RDA WG towards the exposure and publication of germplasm data as linked data (always based on the work already done by other experts in this field).

It was followed by another presentation by Esther Dzale from INRA, about the Wheat Data Interoperability Working Group and then there was a discussion on various topics affecting the group.

The reception, sponsored by the Irish Research Council, gave us some time to meet new people and have interesting discussions; I managed to meet:

  • Nikos Houssos from the Hellenic National Documentation Center, which works on the aggregation of metadata from various Greek repositories, among others;
  • Nuno Freire, Chief Data Officer at The European Library, with whom I discussed the role of a CRM tool in the metadata aggregation workflow;
  • Stephane Goldstein from the Research Information Network (UK), with whom I had an interesting discussion about the proposed IG on Education for Data Scientists;
  • Odile Hologne, Head of Scientific Information Dept. of INRA, who has been in long communication and collaboration with AK, although we had never had the opportunity to meet in the past;
  • Phil Archer from W3C (we last met at the 2nd RDA Plenary in Washington D.C. last September), who was disappointed by people not following the existing standards for linking their data!

In addition, I got to see old-time friends again, like Johannes Keizer from FAO, David King from the Open University (last time we met was back in May 2012 at the 2nd BioVeL workshop in Gothenburg, Sweden!), Esther Dzale from INRA and others. It's always nice to be among friends at such big events!

More info about the 3rd RDA Plenary:

Friday, March 7, 2014

Some thoughts on linking data sources / Bringing down the data silos

Agriculture and silos are two terms which go well together when referring to agricultural products; silos provide a nice means of storing large volumes of harvested crops and provide a controlled environment for their post-harvest management. However, when referring to agricultural data, one may safely claim that the data silos are dead. In fact, they exist but it is only a matter of time before they are either linked with existing backbones or they eventually disappear. Nikos Manouselis has already presented this "data silos" issue very nicely in a really interesting presentation - don't you agree?

Let me share my personal experience here: my first contact with EU funded educational and research projects was the Organic.Edunet eContentPlus project, which managed to create a network of content providers on organic agriculture, agroecology and other green topics. These content providers followed a common methodology for creating metadata records for their educational resources (=harmonization) and these metadata became available through a single point of access, which is the Organic.Edunet Web portal. This was a case of harmonization, networking and public exposure.

Then came other projects (ICT-PSP, FP7) in which I was also involved, like VOA3R, Organic.Lingua and agINFRA. What do these projects have in common? All of them were based on, or at least included, large volumes of work on metadata harmonization, linking between different data sources, and making data and metadata public. They managed to interconnect various digital data sources like institutional repositories, digital libraries, databases and educational repositories, applying a harmonization layer (e.g. the application of a common metadata standard/schema, the use of common vocabularies and other KOSs etc.), providing a linked data layer for linking heterogeneous data sources and aggregating data and metadata from the homogeneous ones. In fact, this linked agricultural data layer is in my opinion one of the most interesting and important outcomes of the agINFRA project. Using KOSs (Knowledge Organization Systems) as the backbone, various heterogeneous data sources can be linked as long as they are published online. Another related case was the mapping between the Organic.Edunet ontology and the AGROVOC thesaurus, which took place in the context of the Organic.Lingua project and was another step in the direction of linked data. I also feel really glad to be (even partially) involved in work that is taking place towards the publication of germplasm and other biodiversity data as linked data, something that will allow the linking of these resources to other types of data like bibliographic and educational resources.
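For the more technically minded, the harmonization layer described above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not the actual pipeline of any of these projects: the provider names, the field mappings and the example.org concept URIs (standing in for a real KOS) are all hypothetical.

```python
# Hypothetical mappings from each provider's local field names to a common schema.
FIELD_MAPS = {
    "library": {"titel": "title", "schlagwort": "keyword"},
    "repository": {"name": "title", "tag": "keyword"},
}

# Hypothetical keyword-to-concept table standing in for a KOS (e.g. a thesaurus).
KOS = {
    "organic farming": "http://example.org/kos/organic-farming",
    "ökolandbau": "http://example.org/kos/organic-farming",
}


def harmonize(source, record):
    """Rename local fields to the common schema and attach a KOS concept URI."""
    field_map = FIELD_MAPS[source]
    mapped = {field_map[k]: v for k, v in record.items() if k in field_map}
    # Look the keyword up in the KOS; unknown keywords simply get no concept.
    mapped["concept"] = KOS.get(mapped.get("keyword", "").strip().lower())
    mapped["source"] = source
    return mapped


records = [
    harmonize("library", {"titel": "Bodenfruchtbarkeit", "schlagwort": "Ökolandbau"}),
    harmonize("repository", {"name": "Soil fertility basics", "tag": "Organic farming"}),
]

# Once harmonized, records from different sources can be grouped by concept.
by_concept = {}
for rec in records:
    by_concept.setdefault(rec["concept"], []).append(rec["title"])

print(by_concept)
```

The two records come from different sources, use different field names and different languages, yet they end up under the same concept, which is exactly what the KOS backbone is there for.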

There are also cases of linking on a higher, global level compared to the project-based one. One is the Research Data Alliance (RDA), which aims to enhance the accessibility of research data and enable all stakeholders to get access to them. RDA provides a means for projects like the ones mentioned earlier and other initiatives (like FAO, CIARD, IFPRI and INRA, just to mention a few) to join forces, share effort and resources and make a leap forward. Another case is the Global Food Safety Partnership (GFSP), which aims to provide a centralized means of access to food safety capacity building, by engaging stakeholders from both the public and the private sector. Global Open Data for Agriculture and Nutrition (GODAN) is another global initiative which aims to support global efforts to make agricultural and nutritionally relevant data available, accessible, and usable for unrestricted use worldwide through the participation of public and private sector bodies. The G8 International Conference on Open Data for Agriculture, which took place in April 2013, boosted the development and progress of such initiatives by highlighting the need for opening access to data related to agriculture, setting the landscape and defining possible next steps in this direction. It managed to identify the needs and engage key stakeholders, among others.

Taking all these into consideration, it is hard for anyone to believe that in this era of linking and interlinking there is still space for data silos. While there are also cases where data cannot be publicly exposed and shared (e.g. patents, privately funded research work, personal data to name a few), the approach of linking and openly publishing/exposing data seems to be the only way towards ensuring the sustainability of these data and the involvement of all stakeholders. In the end, it is up to each data manager individually to decide if he/she will jump on the train and be a part of the future or just remain a part of the history. ;-)

Sunday, March 2, 2014

Short trip to Alcalá de Henares

It has been quite a long time since I last visited Alcala de Henares; it must have been last February, in the context of the Organic.Lingua 5th project meeting. This time I had the opportunity to make a short trip to Alcala (26-28/2) and I admit that each opportunity to visit Alcala is more than welcome! Direct flights from Athens to Madrid are few and rather expensive, so the trip to and from Madrid takes a little bit longer than it should: it usually takes about 8 hours, through Munich, Rome, Istanbul or Zurich, depending on the flight selected; even Sofia and Bucharest can serve as intermediate stops for these flights. Despite this fact, the destination usually makes up for the long trip.
The entrance of the University of Alcala (UAH)

Reaching Alcala de Henares used to be a little bit hard, as the most common (and cheap) way was to take the metro from the airport for part of the way and then change to the suburban railway to Alcala. However, a new bus line, 824, was recently introduced which takes you directly from the Madrid airport to Alcala. At 3,60 euros, it is also a budget-friendly and hassle-free way to get directly to Alcala, as it picks you up just outside Terminal 2 and drops you off at the same spot.

This was my 3rd time in Alcala, a town which I really like; it is so alive (mostly due to its traditional Spanish character and lots of students) but at the same time it has kept its medieval character; I can almost feel such vibes each time I visit the town. Apart from that, it is always nice to get to meet the UAH team in their own territory - they are always good company but I feel that they are more comfortable in Alcala. Not to mention that they are excellent hosts and guides of their own town.

Is he Don Quixote or what?

During my first visit to Alcala for the Herbal.Mednet Kick Off meeting I had the opportunity to see the main building, where the Rectorate is. The meeting took place in a magnificent room, with large paintings on the walls. The second time I got to see another building of the University, within walking distance from the first one. It was modern and well-equipped, probably closer to a meeting place. This time I got the chance to visit the actual base of the IERU team at the Polytechnic building, which is located outside the town of Alcala. Access is easy; one just has to take Bus No 2 towards the Hospital/Polytechnic building of UAH, which takes about 20 mins (depending on traffic) and costs only 1,30 euros.

Timetable of Bus No2 

During my stay I had the opportunity to meet my UAH colleagues, including Salvador, Miguel-Angel and Elena Garcia Barriocanal (for the first time after all these years), David Martin Moncunill, Enayat, Paulo and Alberto Nogales, with whom I had discussions about the Organic.Lingua 3rd and Final Review Meeting, the ODS metadata aggregation tasks and the agINFRA user trials, respectively. I also had the pleasure to meet Meritxell, with whom I had been exchanging emails at length during the last months without ever getting the chance to meet, Rutilo from Mexico and Eydel from Cuba; we are talking about a truly multicultural group there, if one also adds Paulo from Colombia and Enayat from Iran. I personally find this amazing! In fact I managed to spend some time actually working in the lab with the guys and it seems that there is more than enough space for everyone there.

I also had the opportunity to vote for the UAH rector (as I am an employee of the University) as well as to visit the Birthplace of Miguel de Cervantes, which was a really interesting experience; unfortunately no photos were allowed of the well-renovated interior of the house. Last but not least, I had a great discussion with Paulo during my last evening in Alcala, over a number of beers and tapas at the lovely Indalo cerveceria.

Lunch with the IERU team

P.S. My trip ended with a nice surprise: I was upgraded by Aegean to Business class on my flight back to Athens; this means that I enjoyed a full 3-course meal and special care from the flight attendants! I don't know how this happened but it was well-appreciated and I would love to see this happening again! :-)