Semantic web

Drupal, the semantic web and search

All major search engines, including Google and Yahoo!, are moving aggressively to capture structured data. This isn't exactly a surprise, because structured data provides a tremendous opportunity. Let's take the example of product search. Imagine the web as a huge database of millions of products, and search engines like Google and Yahoo! giving you a rich set of controls to filter by price, availability, color, shipping cost, user ratings, and more. Wouldn't it be great to be able to search all the world's products from a single page with a single interface? I think so too.

It is waiting to happen; we just have to connect the dots. That is, we have to make Drupal emit structured information.

Hundreds of thousands of Drupal sites contain vast amounts of structured data, covering an enormous range of topics, including product information. Unfortunately, that structure is hidden deep in Drupal's database and doesn't surface in the HTML code generated by Drupal. As a result, search engines can't recognize a product page as a product, and they fail to include it in their world-wide product database.

I first talked about the semantic web and Drupal in my DrupalCon keynote last year in Boston. In my presentation, I laid down the challenge that we need to put fields in core and make them first-class citizens. Once fields are thus empowered, they can be associated with rich, semantic metadata that Drupal could output in its XHTML as RDFa. For example, say we have an HTML text field that captures a number, and that we assign it an RDF property of 'price'. Semantic search engines can then recognize it as a 'price' field. Add fields for 'shipping cost', 'weight', 'color' (and/or any number of others) and the possibilities become very exciting. I envision a Drupal core CCK with the power to do just that.
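
To make the 'price' example concrete, here is a minimal sketch in Python of what emitting a field as RDFa could look like. It is not Drupal code: the function name render_field_rdfa and the 'ex:price' property are assumptions made up for illustration, and the 'ex' prefix would still have to be declared on an enclosing element for the markup to be valid RDFa.

```python
from html import escape


def render_field_rdfa(label: str, value: str, rdf_property: str) -> str:
    """Render one field value inside a span that carries an RDFa `property` attribute.

    `rdf_property` is a hypothetical prefixed name such as 'ex:price'.
    """
    return (
        f'<div class="field">{escape(label)}: '
        f'<span property="{escape(rdf_property)}">{escape(value)}</span></div>'
    )


# A number captured in a text field, tagged with the (assumed) 'ex:price' property.
print(render_field_rdfa("Price", "19.99", "ex:price"))
```

The visible text stays the same for human readers; only the extra attribute tells a crawler which value is the price.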

Here is another example. Imagine a standard Drupal node-type called 'job'. The fields in the job node-type would have RDF properties associated with them mapping to salary, duration, industry, location, and so on. Creating a new job posting on a Drupal site would generate RDFa that semantic search engines like Yahoo!'s SearchMonkey would pick up and the job would be included in their world-wide job database.
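
A rough sketch of the 'job' idea, again in Python rather than Drupal's own code: a node type carries a mapping from field names to RDF properties, and rendering a node walks that mapping. The mapping, the 'ex' vocabulary URL, the 'ex:Job' type and the function name render_job_rdfa are all hypothetical.

```python
from html import escape

# Hypothetical mapping from 'job' field names to RDF properties.
JOB_RDF_MAPPING = {
    "salary": "ex:salary",
    "duration": "ex:duration",
    "industry": "ex:industry",
    "location": "ex:location",
}


def render_job_rdfa(job: dict) -> str:
    """Render the mapped fields of a job posting as RDFa-annotated markup."""
    # The vocabulary URL and the ex:Job type are placeholders, not a real vocabulary.
    lines = ['<div xmlns:ex="http://example.org/jobs#" typeof="ex:Job">']
    for field, rdf_property in JOB_RDF_MAPPING.items():
        if field in job:  # skip fields the posting did not fill in
            value = escape(str(job[field]))
            lines.append(f'  <span property="{rdf_property}">{value}</span>')
    lines.append("</div>")
    return "\n".join(lines)


print(render_job_rdfa({"salary": "50000", "location": "Boston"}))
```

The point of keeping the mapping on the node type is that a site builder would define it once, and every new job posting would then be annotated automatically.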

Technologies like this disintermediate so many existing websites and organizations that it makes my head spin. It is too great an opportunity for us to pass up. By adding semantic technology to Drupal core, I think we can make a notable contribution to the future of the web.

This kind of technology is not limited to global search. On a social networking site built with Drupal, it makes all sorts of deep social searches possible - searching by types and levels of relationships while simultaneously filtering by other criteria. I was talking with David Peterson about this the other day: if Drupal core supported FOAF and SIOC out of the box, you could search within your network of friends or colleagues. This would be a fundamentally new way to take advantage of your network, and it could significantly increase the relevance of certain searches.
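
As a toy illustration of searching within a network, the sketch below assumes the 'knows' relationships (in the spirit of FOAF) have already been loaded into a plain Python dictionary. It covers only levels of relationship, not types, and the function names and data layout are assumptions for the example, not anything Drupal provides.

```python
from collections import deque


def network_within(knows: dict, start: str, max_depth: int) -> set:
    """Return everyone reachable from `start` in at most `max_depth` 'knows' hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look beyond the requested level
        for friend in knows.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    seen.discard(start)
    return seen


def search_in_network(posts, knows, start, max_depth, keyword):
    """Keep posts written by someone in the network whose title contains `keyword`."""
    network = network_within(knows, start, max_depth)
    return [
        post for post in posts
        if post["author"] in network and keyword.lower() in post["title"].lower()
    ]


knows = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"]}
posts = [
    {"author": "carol", "title": "Drupal job"},
    {"author": "dave", "title": "Drupal job"},
]
# Friends and friends-of-friends of alice: carol qualifies, dave is one hop too far.
print(search_in_network(posts, knows, "alice", 2, "drupal"))
```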

I can has semweb in Drupal core?

State of Drupal presentation (August 2008)

Last week at DrupalCon Szeged I gave my traditional state of Drupal presentation. The video of the presentation is provided below, and you can download a copy of my slides (PDF, 11MB) as well.

The presentation discusses the results of the survey that I recently conducted. The survey ran for 30+ days and collected more than 1,300 responses, so it should provide a good idea of the community's current thinking. I'll provide more color and details about the survey results in a number of follow-up posts.

State of Drupal presentation (March 2008)

Last week at DrupalCon Boston I gave my traditional state of Drupal presentation in front of 850 Drupalistas. The video of the presentation is provided below, and you can download a copy of my slides (PDF, 15 MB) as well. The video is available in alternative encoding formats from archive.org.

Source: archive.org.

Topics I talked about: the Drupal 6 release, the state of our union, the need for a drupal.org redesign, the Drupal 7 killer release, the Drupal 7 development cycle, usability, test-driven development, the future of Drupal and the semantic web, etc. There is a lot of material in this presentation, and over the course of the next few weeks I plan to break it down into a number of extended blog posts. Stay tuned!

© 1999-2007 Dries Buytaert Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Drupal is a Registered Trademark of Dries Buytaert.