Facebook made some waves with the announcements at their recent f8 Developer Conference. Among them was the introduction of the Open Graph Protocol. This protocol does two things. First, it allows Facebook to extend its reach into more of the web, making the always-on Facebook experience even more ubiquitous. Second, it provides an early version of a standard for expanding the semantic web.
Now, the semantic web is generally a pipe-dream. The problem with it is that it’s a top-down notion of semantics, where data can have meta-data attached defining what it’s about and what it means. Semantics, however, is harder to pin down; meaning shifts between contexts. The relevance of a particular bit of data to other data tends to be decided most fully by the direction from which you approach it.
Implicitly, we get this: the best information aggregators tend to use data already out there to try to determine relevance. Google uses links back to a page to help arrive at relevance (along with a variety of other things, not limited to meta-data tags). Digg uses user votes. Last.fm uses library preferences among users. Pandora attempts to use “musical genes” within the music, rather than attempting to explicitly categorize songs. And Facebook uses the connections people make to things on Facebook…and now, out on the web (where the web supports it).
Understand that if Facebook manages to really push out this Open Graph and convince sites to support it (and there’s no reason to doubt they’ll be at least somewhat successful, if only in a primitive manner), it’s a simple step for Facebook to use this as a sort of “Social Search Engine”. If that takes off, expect the Open Graph to expand further.
However, the Open Graph isn’t just for Facebook. I mean, they’ve made it for themselves, and as far as the meta-data tagging is concerned, it’s really built almost entirely with Facebook in mind, but it is a standard nonetheless. It suffers from but a single – though important – flaw: it remains top-down.
The page itself still has to describe what it is. For Facebook, this makes sense: they focus on connecting concrete nouns, and they don’t want their users to have to pick those out themselves. For the semantic web, though, this is harder to make sense of. A page can be seen as multiple things, depending on your perspective. A simple example is Penny Arcade. How exactly do they tag their site? As being about people? A group? A business? A webcomic? Games? Or is it just…a website? Or a blog? How do you tag it?
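To make the dilemma concrete: opting into the Open Graph means a page declares a single type for itself with meta tags in its head. The `og:` property names below are from the protocol itself; the sample values (and the idea that Penny Arcade would pick “website”) are my own invention for illustration. A minimal sketch of what the tagging looks like, and of reading it back:

```python
from html.parser import HTMLParser

# What a page's <head> looks like once it opts into the Open Graph.
# Property names (og:title, og:type, og:url) come from the protocol;
# the values here are invented sample data.
SAMPLE_HEAD = """
<head>
  <meta property="og:title" content="Penny Arcade" />
  <meta property="og:type" content="website" />
  <meta property="og:url" content="http://www.penny-arcade.com/" />
</head>
"""

class OGParser(HTMLParser):
    """Collect every og:* property/content pair from a page."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = dict(attrs)
        prop = d.get("property") or ""
        if prop.startswith("og:"):
            self.og[prop] = d.get("content", "")

parser = OGParser()
parser.feed(SAMPLE_HEAD)
# The page gets exactly one self-declared type, whatever it picked:
page_type = parser.og["og:type"]
```

The point is in that last line: whatever Penny Arcade actually is, the protocol forces a single answer into `og:type`.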
Generally, websites aren’t necessarily about objects, making the base conception of the Open Graph rather…primitive. It’s still an advance for the Semantic Web, and it may end up being a giant step in moving that project forward, but it’s still a primitive step.
Now, we do want a patchwork of overlapping networks. Not just social networks, but meaning networks are needed to really drive any semantic web, and the more, the better. What I mean by that is that people should be able to connect things together…that’s how you drive the semantic web. And you want them to connect things via a context, because that drives the meaning. The Facebook context is other people: both the connections among people and their connections to the things Facebook allows them to connect to. For instance, if I want to find out about actresses/actors a certain subgroup of people like, well, I just have to find members of that subgroup on Facebook, check their connections, and peel out the actresses and actors (which will be tagged as such, via the Open Graph Protocol). I can use connections of this sort to produce all sorts of people groupings, and then use the people groupings to narrowly contextualize other “things” on Facebook. That helps provide a social meaning for the things Facebook recognizes.
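The “peel out the actresses and actors” step above can be sketched in a few lines. Everything here is a hand-made stand-in for data the real network would hold; each person maps to their “Like” connections, and each liked thing carries the Open Graph type it was tagged with:

```python
# Invented snapshot of Like connections. Each liked thing is a pair of
# (Open Graph type, name); all people and things here are fictional.
likes = {
    "alice": {("actor", "Actor A"), ("movie", "Movie M")},
    "bob":   {("actor", "Actor A"), ("actor", "Actor B")},
    "carol": {("band", "Band Z")},
}

def liked_by_group(group, og_type):
    """Peel out everything of one Open Graph type that a subgroup has Liked."""
    found = set()
    for person in group:
        for typ, name in likes.get(person, ()):
            if typ == og_type:
                found.add(name)
    return found

subgroup_actors = liked_by_group({"alice", "bob"}, "actor")
```

The same filter, pointed at different subgroups and types, is exactly the “people groupings contextualize things” move described above.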
There are other ways to contextualize things, though. For instance, Wikipedia is built around citations. Well, that implies there’s a contextual network built up out of the web, linking Wikipedia entries out into web sites (and books, periodicals, etc.). A less strenuously regulated network of such connections could be built, maybe Wikipedia Examples, where people could take an entry or group of entries and a web site, and connect them. It’s akin to saying the “Kurt Vonnegut” entry “Likes” “Breakfast of Champions”. It could also “Like” a high-school book review of Breakfast of Champions that an enterprising student posted to the web. I wouldn’t really advocate attempting to build this into the core citation/bibliography system of Wikipedia, but it would greatly expand the context of websites and Wikipedia entries.
That sort of contextualization, though, is true metadata. Metadata exists outside of the data it describes, whereas the Open Graph attempts to embed at least some of the metadata into the data itself. In the Wikipedia example, we don’t want pages to describe their type; that’s implicit in the fact that they’ve been connected to Wikipedia. Instead, you want Wikipedia to have a standard way to publish the metadata it has about its connections.
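What “the network publishes its own metadata” might look like, sketched under loose assumptions: the Wikipedia Examples network (which doesn’t exist; it’s the hypothetical from above) holds entry-to-URL connections, and answers queries about an external page with a JSON document. The JSON shape is entirely invented; the point is that the page itself declares nothing.

```python
import json

# Invented connection records for the hypothetical Wikipedia Examples
# network: each links a Wikipedia entry to an external URL, with a relation.
connections = [
    {"entry": "Kurt Vonnegut", "relation": "Likes",
     "target": "http://example.com/student-review"},
    {"entry": "Breakfast of Champions", "relation": "Likes",
     "target": "http://example.com/student-review"},
]

def metadata_for(url):
    """Everything this network knows about an external page, published as JSON."""
    about = [c for c in connections if c["target"] == url]
    return json.dumps({"url": url, "connections": about})

doc = json.loads(metadata_for("http://example.com/student-review"))
```

Note the direction of the metadata: the student’s review page never tagged itself; both of its “types of relevance” live on the network side.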
A network is often best represented by a graph: a structure of nodes and connections between them. A meta-network, like those above, has primary nodes: nodes contained strictly within the network itself. In Facebook, these would be People, Pages, Groups, Events, etc. The meta-network also has secondary, or exterior, nodes, which are data elements not strictly in the network, but reference-able from it.
Another way to describe this is that primary nodes are things the network understands, creates, and manages, while secondary nodes are things the network doesn’t really understand, but can point at. To use a (probably very poor) programming analogy, you might have a C++ pointer to a specific object of a known type, or you can have a void*. The program can actually do stuff with the object behind the pointer of known type, but a void*? Unless you’ve got a really thorough understanding of your entire program, it’s a good bet you’re playing with fire if you try to do anything with whatever that points to.
The primary nodes of a network provide known semantic context. On Facebook, these will generally be social in nature, whereas in the Wikipedia Examples network they would be Wikipedia articles. The secondary nodes then gain additional context and meaning from the connections primary nodes make to them.
Consider such a thing for a game network. Games become primary nodes, probably alongside game devices and consoles, players, developers, composers, development houses, publishers, reviewers, etc. Network users might link news stories to games, reviews to reviewers and games, development houses, etc. Those secondary nodes enrich the information contained within the network itself…but the primary nodes contextualize the meaning of secondary nodes.
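The primary/secondary split for a game network can be sketched directly as a graph. Primary nodes are typed things the network understands; secondary nodes are bare URLs it can only point at, and their meaning is computed from who points at them. All names and URLs here are invented:

```python
from collections import defaultdict

# Primary nodes: typed, understood by the network. (All invented.)
primary = {
    "game1":   ("game", "Some Game"),
    "studio1": ("developer", "Some Studio"),
}

# Connections from primary nodes out to secondary nodes (external URLs).
# Each edge is (primary id, connection type, URL).
edges = [
    ("game1",   "review", "http://example.com/some-game-review"),
    ("studio1", "news",   "http://example.com/some-game-review"),
]

def context_of(url):
    """A secondary node's meaning: which typed primary nodes point at it, and how."""
    ctx = defaultdict(list)
    for src, how, target in edges:
        if target == url:
            ctx[how].append(primary[src])
    return dict(ctx)

ctx = context_of("http://example.com/some-game-review")
```

The URL itself carries no tags; the network’s answer (“this is a review of a game, and news about a developer”) is exactly the contextualization the paragraph above describes.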
In a way Google already attempts to do this, but they’re proceeding in an extremely inorganic way, as they have no concept of the various subnetworks which live across the web. Instead, search engines attempt to view all pages, generally, as pages. At best, they seem to view pages as part of a “site”.
But what about a search engine that crawled the web not via web addresses, but through the eyes of various networks? A given web page would be given semantic meaning based on what connected to it, across all the various networks. Combined with a more traditional web search, this information could more effectively drill down to find sites relevant to the context the searcher is coming from.
Heck, that sort of search engine could move to being a meta-meta-crawler, drawing connections between primary nodes in disparate networks in order to establish a higher-order context.
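A toy version of that crawler’s core step: for one URL, tally what kinds of primary nodes point at it, network by network. The network names and the connection data are invented; a real crawler would pull these from each network’s API.

```python
from collections import Counter

# Invented view of two networks: each maps a secondary URL to the list of
# primary-node types that have connected to it within that network.
networks = {
    "social-net": {"http://example.com/post": ["person", "person", "page"]},
    "game-net":   {"http://example.com/post": ["game"]},
}

def semantic_profile(url):
    """Tally (network, node type) connections to a URL across every network."""
    profile = Counter()
    for net, links in networks.items():
        for node_type in links.get(url, ()):
            profile[(net, node_type)] += 1
    return profile

profile = semantic_profile("http://example.com/post")
```

A meta-meta-crawler would go one step further and compare profiles like this one across URLs, looking for primary nodes in different networks that keep pointing at the same places.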
Such a network would contain all the metadata in its primary nodes and in the connection descriptions; secondary nodes would have no need to know what the network thinks of them. In order to integrate this into the semantic web, the network itself needs to expose an API for getting at information relating to both primary nodes and secondary nodes. Facebook has the primary node queries down pretty well: their new Graph API lets anyone, assuming they have some starting point, traverse nearly any publicly exposed portion of the primary network and ask for permission to access even more. The difference here is Facebook doesn’t really have a concept of secondary nodes. Instead, it is asking other websites to make themselves primary nodes of the Facebook network, via the Open Graph Protocol meta tags.
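Those Graph API traversals come back as plain JSON. The sample below imitates the general shape of a connection query’s response (a list under a “data” key), but the contents are invented and the exact fields should be taken from Facebook’s own documentation, not from here:

```python
import json

# Invented stand-in for a Graph API connection response (e.g. someone's
# likes). The "data" list shape mimics the API; every value is made up.
sample_response = """
{"data": [
  {"name": "Some Page", "category": "Website", "id": "1111"},
  {"name": "Some Band", "category": "Musician/band", "id": "2222"}
]}
"""

liked = json.loads(sample_response)["data"]
# Each returned object is a primary node with a type Facebook understands:
categories = [obj["category"] for obj in liked]
```

Every object in that list is a primary node, typed and traversable; nothing in the response points out at an arbitrary URL the way a secondary node would.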
A secondary node is contextualized by two things: the set of connections from primary nodes to it, and the type of connections those are, which is defined by the network itself. For instance, in Facebook you can connect to something directly. You can have friends, pages you’re connected to, groups you’re a part of. Generally, People have the largest range of possible outgoing connection types. In addition, you can “Like” things, which describes another sort of relationship. Currently, those are limited strictly to primary nodes in the network (strictly things Facebook can actually understand), so you can’t really submit a website to Facebook and see…what everyone thinks about it already.
For instance, if I had this blog on my own server and could make my own widgets, it might be fun to take the link to this post and submit it to Facebook to look up people who’ve connected to it, whether via “Like” or whatever. I could then use Facebook’s authentication system to authenticate people and show them any of their friends who’ve connected here. Beyond that, I could use associated meta-data to check for potentially related links, to text search your friends public connections for things related to the tags on the post, etc.
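The heart of that widget is a simple intersection: the set of people the network says have connected to this post, against the set of my friends. In reality both sets would come from Facebook’s auth and Graph API; here they’re hand-made stand-ins with invented names:

```python
# Invented stand-ins for two queries the widget would really make:
# who has connected to this post, and who my friends are.
connected_to_post = {"alice", "dave", "erin"}
my_friends = {"alice", "bob", "erin"}

# Friends of mine who've connected here, ready to display.
friends_here = sorted(connected_to_post & my_friends)
```

The related-links and tag-search ideas are the same pattern: intersect what the network knows about a secondary node with what it knows about you.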
Sadly, I don’t seem able to do that on Facebook as it is now, because Facebook would want me to embed my blog into its network…and associate it with Bilsybub, the person, I think, so that it could figure out how to categorize the site. So we’re not quite where I think we’ll end up. We’re closer, though.