Semantic Web 3.0
You've probably heard some of the buzzwords floating around the internet regarding the future of the web: The Semantic Web, Web 3.0, microformats, data portability, and 'open' followed by just about every word in the dictionary. In this article, we're going to examine and discuss current web trends and where they may lead.
Our goal isn't to try to create our own buzzword to describe what the internet might look like in the future. The current fad is calling today's internet Web 2.0 and many fancy that the next iteration of the web will be called Web 3.0. While it's inaccurate and perhaps silly to label the internet with a version number, the simplicity of doing so means that we aren't going to try to buck the trend in this article. However, because the term "Semantic Web" is also gaining traction, we're going to combine the terms and just call it the Semantic Web 3.0, or SW3 for short.
The Semantic Web is full of promise, but will it be realized?
The internet is in a constant state of change, and yet, in many ways, we are still dealing with the same technologies that were present in the 1990s. The emergence of AJAX and Web 2.0 didn't introduce anything new; they merely gave a name and a face to what had gone before and, as standards solidified, made better use of the pre-existing technologies. In a similar manner, we don't see Semantic Web 3.0 as a departure from what exists today. Rather, it will be smarter, faster, and far more integrated and portable.
It is beyond the scope of this article to examine every emerging format and specification in detail, but we will provide resources to do so. We will provide a brief overview for each technology and then discuss how the technologies might be used together across the web. We will also look at some potential problems and solutions that may arise in the process.
The Current State of Affairs
There is some speculation that the Semantic Web 3.0 will include artificial intelligence and a 3D world similar to Second Life. A future intelligent and 3D web may be coming, but it would require massive and costly changes and, at least in the case of artificial intelligence, giant leaps forward in software development. While perhaps possible, a thoughtful look at the work being done today reveals that SW3 will actually be a largely data-centric revolution. Data accessibility, portability, and integration are increasing incrementally but steadily and they will be the concepts that power the future of the web.
Data Accessibility
The idea of data accessibility is nothing new. Google has made billions because of its ability to find, index, and regurgitate data in a meaningful way. However, in order to find information on Google or any other search engine, users must adjust the way they seek information, framing their question in a way that the search engine understands rather than asking it in a natural way. Search engines have made strides toward answering natural language queries, but they are far from being able to do so on a wide scale.
As an example, typing "What is 1 + 1?" into Google will result in a page that says "1 + 1 = 2", but asking "When do the Yankees play Boston next?" doesn't immediately return a useful answer or even a helpful result. Search engines simply can't understand most natural language queries and give a meaningful answer. Many of the Semantic Web technologies are aimed at bridging that gap.
Because machines currently lack the ability to process data in the same manner as humans, it becomes necessary to format our current data in a relational way that machines can index, sort, and filter. In this sense, data accessibility is meant for machines, not humans. A working model of this process has been labeled as the Semantic Web.
Tim Berners-Lee laid out a framework for the Semantic Web in 1998, but the transition has been slow in coming for a few reasons which we will discuss later. The W3C has since laid out more detailed specifications. A group called the Dublin Core Metadata Initiative is also laying foundations for standard practices.
RDF: Resource Description Framework
RDF is an acronym for Resource Description Framework, a model for assigning metadata (data about data) on a relational basis, most commonly written in an XML-based syntax. It is built on a concept called triples, which is the separation of information into subject/predicate/object statements. There are several variations of RDF including RDFa and eRDF, two formats that embed RDF into the attributes of XHTML and HTML markup. The Web Ontology Language (OWL) also extends the functionality and vocabulary of RDF. GRDDL is a method for relating existing XML documents to their RDF equivalents.
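To make the triple idea concrete, here is a minimal Python sketch. It is not a real RDF library: the Triple type, the example.org URIs, and the to_ntriples helper are assumptions made purely for illustration, although the two FOAF property URIs are ones the FOAF vocabulary defines. It shows statements split into subject, predicate, and object and written out in an N-Triples-style line format.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # the resource being described (a URI)
    predicate: str  # the property or relationship (a URI)
    obj: str        # the value: another URI or a plain literal

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data: the example.org URIs are placeholders.
triples = [
    Triple("http://example.org/people/alice", FOAF + "name", "Alice"),
    Triple("http://example.org/people/alice", FOAF + "knows",
           "http://example.org/people/bob"),
]

def to_ntriples(t: Triple) -> str:
    """Render one triple as an N-Triples-style line."""
    # Simplification: anything starting with "http" is treated as a
    # resource, everything else as a plain string literal.
    obj = f"<{t.obj}>" if t.obj.startswith("http") else f'"{t.obj}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."

for t in triples:
    print(to_ntriples(t))
```

Each printed line is one complete statement, which is what lets a machine index and filter the data without understanding the surrounding prose.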
By itself, RDF would be fairly useless without a way to retrieve that information. SPARQL provides both a query language and a protocol for retrieving RDF data. Yahoo has recently announced that they will begin supporting several versions of RDF as well as a number of microformats. There are also several other Semantic Web search engines appearing across the web including: Hakia, Swoogle, ZoomInfo, and Swotti. It would be no great surprise if Google and Microsoft soon announced their support as well.
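The kind of lookup a query language performs over triples can be sketched in a few lines of Python. This is only an analogy for a single SPARQL triple pattern, not SPARQL itself; the match function, the ex: and foaf: shorthand names, and the sample data are all hypothetical. Terms that start with ? act as variables, as they do in SPARQL.

```python
def match(triples, pattern):
    """Return one dict of variable bindings per triple that fits the pattern."""
    results = []
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable: it must agree with any earlier binding.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # A constant that does not match this triple.
        else:
            # No break occurred, so every term matched.
            results.append(bindings)
    return results

# Hypothetical data as (subject, predicate, object) tuples.
data = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:knows", "ex:carol"),
]

# Roughly: SELECT ?who WHERE { ex:alice foaf:knows ?who }
print(match(data, ("ex:alice", "foaf:knows", "?who")))
```

A real SPARQL engine joins many such patterns together and works across very large data sets, but the principle of filling in variables from matching triples is the same.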
There are also several utilities that help create minimal RDF markup. Calais will extract recognizable people, places, and companies and organize them into an RDF format. DC-dot is an online tool that will convert your existing meta-data into RDF format and MKDoc is a downloadable utility that does just a little more. Zemanta is a blogging utility that suggests resources like articles, photos, and links for inclusion in a blog post.
Microformats
RDF can be viewed as a general descriptor of metadata, but more specific formats exist, many of them microformats. Following is a list of these formats with a brief description for each:
- hCard is a microformat for describing people, places, and companies.
- hCalendar is a microformat for describing events.
- hReview is a microformat for reviews of products, events, businesses and more.
- hAtom is a microformat for marking up blog-style content in HTML, based on a subset of the Atom feed specification.
- XFN maps human relationships using links.
- FOAF (Friend of a Friend) is an RDF vocabulary for describing people and their relationships, similar to a hybrid of hCard and XFN.
- GeoRSS encodes locations into RSS and Atom feeds.
- MediaRSS extends the RSS 2.0 format.
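As a rough illustration of why microformats are easy for machines to read, the following Python sketch pulls a few hCard properties out of ordinary HTML using only the standard library. The class names vcard, fn, org, and tel come from hCard; the HCardParser class, the sample markup, and the decision to look at only three properties are assumptions for the example, and the parser assumes simple markup with no void elements such as br or img.

```python
from html.parser import HTMLParser

class HCardParser(HTMLParser):
    """Collect text from elements whose class names are hCard properties."""
    PROPERTIES = {"fn", "org", "tel"}  # a small subset, for illustration

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in self.PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Credit text to the innermost open element if it is a property.
        if self._stack and self._stack[-1] and data.strip():
            self.card[self._stack[-1]] = data.strip()

# Hypothetical contact details marked up as an hCard.
html = """<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span>,
  <span class="tel">555-0100</span>
</div>"""

parser = HCardParser()
parser.feed(html)
print(parser.card)
```

The page still reads normally to a human visitor; the class names simply give software a reliable handle on which piece of text is the name, the organization, or the phone number.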
Universal Data Access
Data accessibility is perhaps more often thought of as providing access to data for those with disabilities than as helping machines process data. It would be negligent to avoid mentioning this issue as it will certainly also be part of the future of the web. The Web Accessibility Initiative seeks to educate developers on how to provide alternative data presentation so that people can use the internet through assistive technologies like screen readers. Developers and content providers are also encouraged to provide written transcripts for audio and video content, keyboard access keys, and alternative methods for people to access information in JavaScript-laden Rich Internet Applications.
Portability and Integration
OpenID is gaining traction with larger sites.
The OpenID Foundation, which is part of the Data Portability project, has generated a lot of press lately as many well-known companies have announced their support for the OpenID initiative. Simplistically described, OpenID is a way for users to sign in to multiple sites with a single identification. Such a system would eliminate the need for remembering multiple usernames and passwords and would allow users to quickly register for sites that are OpenID-enabled.
OpenID and data portability are frequently mentioned in connection with Semantic Web 3.0, but what is often missed in the discussion is the proliferation of portability in general. Applications are also becoming portable in multiple ways through widgets, web services, and APIs. OpenSocial and the Facebook Developer Platform are some of the higher profile examples of data portability (mainly because they're battling each other), but the rabbit hole goes much deeper than that. We're seeing a lot of examples of major websites becoming web services:
- Amazon will let you host your own store with their products.
- Revver goes beyond the usual practice of allowing you to embed their video: they'll actually let you create a video sharing network and use their library.
- Blog Rush is probably overhyped, but if you take a step back and look at what they're actually doing, it's kind of interesting.
- eBay is also making use of their web services.
- Yahoo, Google, and AOL all offer integration with their web services and APIs for your site.
- AddThis integrates many social bookmarking sites into a simple widget. They also have a widget for adding feeds to your reader.
- Plaxo is doing a lot of really cool things and so are Twitter and FriendFeed.
You see widgets everywhere that connect sites in ways that go far beyond a simple link. ESPN has widgets that allow you to embed sports scores and MSNBC has the same for news. There are even widget building services like Sprout Builder and Konfabulator that allow you to make and brand your own widgets. What we have provided is far from a comprehensive list, but it is enough to illustrate a trend.
Yet another aspect of portability that has not received its full due, perhaps because of its relative newness and a current shortage of popular applications, is the realm of online applications that can also work offline. As more developers begin using Adobe AIR and Google Gears, expect to see and hear a lot more about hybrid online/offline applications that move from the web to your desktop.
The concept of portability goes still further as mobile browsers become more ubiquitous and are able to access feature-rich web pages. Google's Android and Apple's iPhone Development Platform could play important roles in sparking a flurry of application development for mobile platforms. However, growth beyond simple WAP and WML mobile websites will probably hinge heavily on wireless carriers' willingness to loosen mobile bandwidth restrictions.
The Future of the Web
Having provided a brief overview of emerging formats, technologies, and trends, let's discuss what it all means.
Dissecting the Semantic Web
The end goal of formats like RDF and its variants and the array of microformats is to make data more accessible to machines, which, in turn, can make data more relevant and accessible to humans. However, in some cases, this is much easier said than done.
RDF isn't the easiest format to learn and the time and cost of adding or embedding RDF documents into existing web pages or documents are staggering. Even those who have mastered the RDF format may find it strenuous to produce anything more than a simple RDF document. It seems difficult to imagine a scenario in which everyone embraces RDF to the point of manually implementing it on their own sites.
There are other issues concerning RDF that will also need to be addressed. In our opinion, embedding RDF information into an XHTML or HTML document is a poor method of implementation. While microformats like XFN are minimally intrusive and limited to links, eRDF and RDFa call for the wholesale markup of content, which makes content editing difficult at best. Just as CSS separates style from page markup, so we think RDF should separate metadata from page markup. Our recommendation is that the metadata be included through an attached metadata file rather than embedded in the page content.
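By analogy with how a stylesheet is attached, one way an external metadata file could be referenced and discovered is sketched below in Python. This is an assumption about how such a scheme might look, not an established recommendation from this article: the link element with type application/rdf+xml, the MetadataLinkFinder class, and the metadata.rdf file name are all illustrative choices.

```python
from html.parser import HTMLParser

class MetadataLinkFinder(HTMLParser):
    """Find <link> elements that point at external RDF metadata files."""

    def __init__(self):
        super().__init__()
        self.metadata_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("type") == "application/rdf+xml":
            href = attrs.get("href")
            if href:
                self.metadata_urls.append(href)

# Hypothetical page: the content stays free of metadata markup and the
# head simply references a separate file (metadata.rdf is a placeholder).
page = """<html><head>
<link rel="stylesheet" type="text/css" href="style.css">
<link rel="alternate" type="application/rdf+xml" href="metadata.rdf">
</head><body><p>Ordinary, easily edited content.</p></body></html>"""

finder = MetadataLinkFinder()
finder.feed(page)
print(finder.metadata_urls)
```

With this arrangement an editor can change the body text freely, and a crawler that cares about metadata fetches the separate file, much as a browser fetches the stylesheet.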
Utilities like Calais, DC-dot, and MKDoc will help in some adoption of RDF formats, but in their current states they are only able to produce simplified RDF documents which are of limited use. Such documents won't offer much more (if any) functionality over today's current search engines and will therefore produce little incentive for RDF adoption on a wide scale. There are only two scenarios in which we can envision RDF working in the general market: 1) A search engine is developed that processes the content on existing pages and stores the information in RDF format; or 2) An automated and low-cost or free utility appears that generates a highly detailed RDF document for an existing web page.
The problems surrounding widespread adoption of RDF aren't insurmountable, but the solutions will most probably be incremental. Calais and other utilities will improve over time, but they will have to improve to the point of offering unparalleled data processing and retrieval or RDF may be left by the wayside or used in highly specific environments.
If and when RDF becomes commonly used, search engines will face several obstacles that are similar to many of the problems they face today. For example, search engines will need to provide comparison checks to ensure that RDF data matches the actual page content. They will also have to be wary of people who try to manipulate their RDF data to artificially emphasize whatever keyword term they want to rank for, similar to keyword stuffing. An entire industry will almost surely spring up around optimizing RDF documents for search engine placement. These obstacles shouldn't be too great for the search engines, but it will help if they learn from the past and are aware of the issues before they become a problem.
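A comparison check of the kind described above could start from something as simple as the following Python sketch. It is a deliberately naive assumption of ours, not how any search engine is known to work: the metadata_support function and the sample data are hypothetical, keywords are assumed to be single words, and a real check would need stemming, synonyms, and far more context.

```python
import re

def words(text):
    """Return the set of lowercase words in the text, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def metadata_support(claimed_keywords, page_text):
    """Fraction of claimed keywords that actually occur in the page text."""
    claimed = [k.lower() for k in claimed_keywords]
    if not claimed:
        return 1.0  # nothing claimed, so nothing to contradict
    page_words = words(page_text)
    found = sum(1 for k in claimed if k in page_words)
    return found / len(claimed)

# Hypothetical page text and two sets of metadata keywords.
page = "Our bakery sells fresh bread and pastries every morning."
honest = ["bakery", "bread", "pastries"]
stuffed = ["bakery", "casino", "loans", "pharmacy"]

print(metadata_support(honest, page))   # every keyword is on the page
print(metadata_support(stuffed, page))  # only one of four is on the page
```

A low score would not prove manipulation, but it would flag metadata that deserves less trust than the page content itself.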
So is the Semantic Web still possible? We believe it is, but it will take some clever technology to bring it into maturity. We also believe that another aspect of the Semantic Web that shouldn't be overlooked is user tagging. Sites like Flickr, del.icio.us, StumbleUpon, and many others already make extensive use of user tagging to organize information. Wikipedia is perhaps the most famous example of user-generated content, but there are millions more similar sites. It would be a mistake to underestimate the power of users to create semantically relevant data and we think that it will have a very large part to play in the future of Semantic Web 3.0.
In comparison to RDF, many of the other microformats are simple and cheap to implement, often requiring little editing or minimally invasive markup changes. There are many popular sites that already use XFN and FOAF and there are many that use the other microformats as well.
My Data or Data Mine?
As easy as implementation of XFN or FOAF may be, there are also potential issues regarding their use. Because relationships are encoded into page markup, anyone can view them. This has multiple negative implications, the two most important being lack of privacy and spam. On a lighter note, it won't be long before the Internet is abuzz because someone informed their boyfriend that he was being dumped by changing their XFN link relationship from sweetheart to acquaintance.
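To show how little effort it takes for anyone to read these relationships, here is a short Python sketch that harvests XFN values from the rel attributes of links. The relationship values listed are a subset of those XFN defines; the XFNParser class and the example.org links are hypothetical.

```python
from html.parser import HTMLParser

# A few of the relationship values defined by XFN.
XFN_VALUES = {"friend", "acquaintance", "contact", "met", "co-worker",
              "colleague", "sweetheart", "spouse", "kin", "me"}

class XFNParser(HTMLParser):
    """Record which links carry XFN relationship values."""

    def __init__(self):
        super().__init__()
        self.relationships = []  # (href, sorted XFN values) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # Keep only rel values that are XFN relationships.
        rels = set((attrs.get("rel") or "").split()) & XFN_VALUES
        if rels and attrs.get("href"):
            self.relationships.append((attrs["href"], sorted(rels)))

# Hypothetical blogroll markup.
html = """
<a href="http://example.org/bob" rel="friend met">Bob</a>
<a href="http://example.org/carol" rel="sweetheart">Carol</a>
<a href="http://example.org/about" rel="nofollow">About</a>
"""

parser = XFNParser()
parser.feed(html)
print(parser.relationships)
```

The same few lines that make a friendly social tool possible also work for a spider building a marketing list, which is the heart of the concern.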
The more serious issues of privacy and spam aren't as entertaining, however. Companies will view a network of people with many of the same interests as too potentially valuable to restrain themselves. Spiders will be able to follow friend networks and attempt to market to all of a user's friends. There is also the problem of friend feeds that share your online activities with your network of friends. Programs like Facebook's Beacon have rightfully come under a lot of fire for violating privacy rights by starting off as an opt-out system rather than an opt-in model. Facebook's missteps aren't the only example and they probably won't be the last mistakes we see when it comes to the issue of social networks and data portability.
The seeds of some partial solutions are already planted. If each user is granted control over their own data, it's a step in the right direction where data privacy is concerned. In some ways, Facebook has solved several spam issues by not allowing others to see your profile (and thus your contact information) unless you allow them. Where something like that might improve, perhaps, is in combining that principle with user-defined filters, much like those Gmail uses to combat spam. By having access filters, each user can control what information is shared with whom. Making any programs like Beacon opt-in by default will also go a long way in easing people's privacy concerns. However, companies should take extra steps in the beginning to anticipate problems like these and go above and beyond in their efforts to protect user privacy and to combat spam.
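The idea of user-defined access filters can be sketched in Python as follows. This is our own minimal illustration, not a description of how Facebook or Gmail works; the profile fields, the group names, and the visible_fields function are all hypothetical.

```python
# Hypothetical profile and per-field sharing rules chosen by the user.
profile = {"name": "Alice", "email": "alice@example.org", "phone": "555-0100"}
rules = {
    "name": {"public"},              # anyone may see the name
    "email": {"friends", "family"},
    "phone": {"family"},
}

def visible_fields(profile, rules, viewer_groups):
    """Return only the fields the viewer's groups are allowed to see."""
    viewer = set(viewer_groups) | {"public"}  # everyone counts as public
    return {
        field: value
        for field, value in profile.items()
        # Fields with no rule are hidden by default (opt-in sharing).
        if rules.get(field, set()) & viewer
    }

print(visible_fields(profile, rules, []))           # a stranger
print(visible_fields(profile, rules, ["friends"]))  # a friend
```

Note the default: a field the user has never made a decision about is shared with no one, which mirrors the opt-in model argued for above.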
Security will also be another concern as the web becomes more integrated and as data is sent from one platform to another. Security protocols like OAuth will be instrumental in helping keep data secure but portable.
A Little App Spam on the Side
As widgets and applications become more prevalent and social networks continue to gain power, application spam will become a greater problem. Applications that can send messages to all of your friends can be wearying, especially when they are commercial messages. Users on Facebook are already complaining about application fatigue, and most people will shudder when they think about all the application-littered MySpace profiles that will result from MySpace implementing OpenSocial. There is little reason to think that Bebo, hi5, Orkut, and the other social networks won't face the same problems.
Beyond just hosting spamming applications, some of the major social networks risk turning into spam engines themselves. MySpace has so many profiles dedicated to work-from-home, multi-level marketing, and would-be porn stars that it can be difficult to go a day without receiving a friend request from someone running some scam. There are multiple applications that will send automatic friend requests by spidering the social networking sites. Even social bookmarking sites like Digg, Reddit, and Technorati are under a constant barrage of non-relevant content submitted by people who are merely trying to gain an advantage in their search engine placement.
What makes things even more interesting is the fact that the current business models for social websites haven't yet proven to be sure investments. Many of the social sites are trying to replicate Google's AdSense success without realizing that their users aren't searching, they are socializing, and so their online mindset is going to be far different than it would be if they were on a search engine. Users are also starting to use browser plugins like Adblock Plus to block the revenue-making ads, putting further dents in the social business model.
We believe that because of app spam and app fatigue, there will be a migration toward niche-specific social sites like Sphinn and Staralicious. The major sites won't completely die, but most people will spend the majority of their time in their communities of interest and only just maintain their profiles on the big sites. The major sites that survive will do so by exercising greater regulatory control and through technical adjustments like nofollowing their outbound links while carefully balancing openness and user participation and feedback. We also believe that the revenue model that will emerge as a clear winner will be one in which the users share in the profit. Sites like Revver and Squidoo are already doing this and more will follow. After all, why should the users create the content but not share in the rewards?
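As one example of the technical adjustments mentioned above, a site could mark user-submitted links to other hosts with rel="nofollow" so that they pass no search engine credit. The Python sketch below is only an illustration under that assumption; the render_link function and the host names are hypothetical.

```python
from html import escape
from urllib.parse import urlparse

def render_link(href, text, site_host):
    """Build an anchor tag, adding rel="nofollow" for off-site targets."""
    host = urlparse(href).netloc
    # Relative links have no host and are treated as internal.
    external = bool(host) and host != site_host
    rel = ' rel="nofollow"' if external else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'

# Hypothetical submissions on a site hosted at social.example.com.
print(render_link("http://spam.example.net/win", "Win!", "social.example.com"))
print(render_link("/profile/alice", "Alice", "social.example.com"))
```

Removing the search ranking payoff does not stop spam outright, but it takes away much of the incentive for submitting it.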
Web Services and Web Slaves
We are fast approaching a web where data, people, and services from any site can be embedded, transferred, or utilized by any other site: platform- and location-agnostic information and services. As we've learned from past experience, however, such freedom also carries its own unique issues.
Many websites are already reliant on Google for their advertising, ad publishing, and analytics, and a few also use Google's APIs. Microsoft is constantly seeking ways to introduce proprietary software into the market. As more of the "corporate" sites grow and share their web services, you may find that many of the smaller sites are little more than mashups, remixes, or syndication channels for the larger sites. People complain about Wal-Mart because it crowds out the mom-and-pop stores and they also complain about how the majority of the media is controlled by a small number of companies. Vast as the web is, it may be foolish to think that it will be much different.
Innovation will always have its place, to be sure, but a way must be found in which both the large corporations and the small companies and content providers win through web services. Open source communities like Linux, Apache, and countless others are paving the way by making a case for the open-source business model, but it's premature to say that they will completely replace a proprietary model. It will take multiple forward-thinking companies like IBM and Sun to help usher in a new hybrid of open-source and traditionally proprietary companies working together for everyone's mutual benefit. From reading books like Wikinomics and Mavericks At Work, however, it does seem like there is hope in this regard, and not just in the online world.
Beyond open-source, there are still massive hurdles to clear in the areas of intellectual property and digital rights management. The music industry and Hollywood have fought (and won) many legal battles against peer-to-peer sharing networks, but they are suffering plenty of backlash by doing so. It may take a while, but we think that the major media companies will eventually begin testing new business models that are more open and customer-centric.
Summing It Up: The Promise of a Flying Car
Many of the past's predictions about today now seem silly in retrospect: flying cars, frequent trips to the moon, computers creating a four-hour work week. We aren't going to go crazy and predict a major breakthrough in artificial intelligence or the imminent advent of the 3D internet. Those things very well might happen at some point in the future, but we think they're still a little way off. That said, we do have some ideas about where the internet will go in the immediate future.
The transformation from Web 2.0 into Semantic Web 3.0 will be an incremental evolution. Many of the hot trends that we see today like AJAX, social networking, and tagging will develop best practices which will become standardized. We will see gradual implementation of some, but not all, of the various microformats. Some will also be combined or rolled into others. The adoption of RDF will mirror the rate at which automatic RDF generation improves. Natural language processing and querying will gradually improve, but the process may be so slow that no one will notice.
OpenID will continue to gain traction and popularity, but it will have to solve an array of problems. Major companies like Google, Microsoft, and Yahoo will also have to accept OpenIDs in addition to providing them, or the initiative will never achieve the true portability it promises.
We will see a flood of Rich Internet Applications (RIAs) that are able to transition easily between online and offline modes. Though this is the first time we've mentioned it in this article, we also believe that the majority of data storage will eventually shift from the personal computer to online storage because it will be accessible from anywhere and is probably safer from loss online than at home. The line between the desktop and the internet cloud will become blurry and perhaps indistinguishable at times.
Open source software will continue to gain popularity as more and more companies will switch to open source and also embrace or at least engage parts of its business model. Traditionally proprietary companies will begin to reach out to and work with open source communities and their business model will transform from creating proprietary software to providing services and support related to the open source software they helped create.
Web services will continue to expand and many websites will come to rely on a variety of services from multiple large companies. There will be an enormous outcry for original content that hasn't been mashed up, spliced or rehashed from the same old sources. Online social activities and user-generated content will prove to be more than just fads and the social sites that share the revenue with their users will be the ones that win out. There will also be an explosion of niche social sites as the larger social sites become bloated with spam and marketers trying to game the system somehow. Microblogging will also go mainstream.
There are also a few things we'd like to see, but we aren't confident enough to predict that they will actually happen. We want to see Hollywood and the music industry adopt more user-friendly practices with regard to DRM. We'd also love to see mobile bandwidth go unlimited. If one telecom company is willing to offer unlimited mobile internet bandwidth, the others would have no choice but to follow suit. (We also think the company that does that first will gain a lot of new customers.) The promise of truly portable data would then become much closer to reality. Until then, however, expect the mobile revolution to be a largely quiet one.
An obvious question that may be asked is, "When do you expect all of this to take place?" While some of the things we've talked about are already being implemented and some will be incremental, we expect most of what we've discussed to take place within the next five years.
Over the course of this (long) article, we've examined the current state of Semantic Web technologies, microformats, data portability, social networking, web services and application interoperability. We've also looked at some of the potential issues that will arise from the widespread adoption of these technologies and we've attempted to offer solutions that we think might help solve those problems. Finally, we've looked at all of the data and made our best-guess predictions about where the Web will be five years from now.
We've had our say. Now what do you think?