SSSW07 — Day Three

This day’s invited speaker was Peter Mika, from Yahoo! Research Barcelona. In his talk, The Future of Web Search, he covered the state of web search, the difficulties of semantic web deployment, the shift from documents to databases (the web of data), and current trends in annotating and structuring data: folksonomies, µformats, Wikipedia infoboxes and RDFa. He then turned to how IR should be reconsidered in this context: folksonomy mining, GRDDL and hGRDDL, and the question of whether we should have “forgiving” parsers for µformats.

In this context, the ideal world would look like this:

  • plenty of precise metadata to harvest
  • user intent capturable directly as a SPARQL query
  • a single ontology used by both the query and the knowledge base (KB)
  • a query executed on a single KB that gives the single, correct answer

In the real world we face technical and social challenges: query interface usability, data quality (from syntactic and semantic errors to spam), ontology mapping, entity resolution, ranking across types, results display (information overload and partial understanding issues), user motivation to annotate, and trust.

Next, Fabio Ciravegna presented the state of the art in using semantic web technologies for knowledge management (KM) in large distributed organisations, where the material ranges from sheer volumes of raw data (e.g. a Rolls-Royce jet engine produces 1 GB of vibration data per hour) to unstructured reports on the lifecycle (diagnosis, repairs, etc.) of such engines, distributed over multiple repositories.

The Rolls-Royce case study of cross-media KA was impressive. The main issues (apart from data volume) were that evidence is distributed over different media, from more or less structured text (Word, Excel, PowerPoint and PDF) to 3D images, as well as data integration and hybrid search.

Other specific information extraction (IE) issues were event modelling, table data extraction, and distance-metric approaches (as opposed to the linguistic and statistical ones).

Later, in the practical session, we explored machine learning (ML) from both (human) text annotations and image annotations. This also showed how easily humans disagree on annotations, and how annotations reflect the world model of the annotator (and not of the user).

The last tutorial was given by John Domingue, on semantic web services (SWS): the problems with web services today, the SWS vision, the IRS3 SWS broker, the web service modelling ontology (WSMO), and orchestration and choreography in SWS. The Essex County Council Emergency Planning case study was then presented and demoed, and the talk ended with OWL-S and semantic annotations for WSDL (SAWSDL).

In the afternoon’s practical session, Barry Norton led us through re-creating the European travel demo with IRS3 and WSMO Studio.