So have you checked out the next big thing in the development of the semantic web and the ultimate salvation of humankind yet?
Even though the pre-launch hype suggested that all search results would be accompanied by a free basket of puppies made of ice cream and children’s laughter, I went in with fairly middling expectations. I’ve picked up on the fact that Mr. Wolfram’s ego is overblown, and everybody should know that natural language is a bundle of way-fucking-super-hard problems. And yet, somehow, I was still a little disappointed.
Part of my disappointment may have had more to do with the meta-hype than with the hype coming directly from Wolfram. I didn’t do a careful job of separating official pre-launch claims from the claims being made by interested observers. For all I know, Wolfram itself never made any natural-language-search claims at all, and all of that came from the meta-hype. I’m going to pretend that’s the case and ignore the absolute failure of natural-language searching in Wolfram|Alpha.[1]
Except…no, I can’t really do that. One of the great promises of W|A was the idea that you could use it for quick and dirty data mining, for making various associations and visualizations and such. You can sort of do that, but only in a limited way and using only pre-tallied values. You can divide the population of Paris by the population of London because the pops of those two cities have already been added up and entered into the database as “pop London” and “pop Paris”.
You can’t do “pop largest city in England” because (it looks like) W|A can’t handle any sort of relative queries. Of course, every possible list of everything can’t be computed, compiled, and indexed ahead of time, so a query like that can’t be answered by direct lookup. What’s more interesting is that W|A didn’t think it worthwhile to create, or wasn’t able to create, a list-generating function (“ALL where twenty_questions=‘city’ and location=‘england’”).
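To make that wish a little more concrete, here’s a rough sketch in Python of what I mean by a list-generating function. This is not how W|A works internally (I have no idea how it works internally); the records, the field names, and every number below are invented, and `select_all` and `largest` are hypothetical helpers I made up for the example.

```python
# Toy records: every name, field, and number here is invented for illustration.
PLACES = [
    {"name": "City A", "kind": "city", "location": "england", "pop": 300},
    {"name": "City B", "kind": "city", "location": "england", "pop": 900},
    {"name": "City C", "kind": "city", "location": "france", "pop": 1200},
    {"name": "County D", "kind": "county", "location": "england", "pop": 5000},
]


def select_all(records, **criteria):
    """Generate the list on demand: keep records matching every criterion."""
    return [
        r for r in records
        if all(r.get(field) == wanted for field, wanted in criteria.items())
    ]


def largest(records, field="pop"):
    """Resolve the relative part of the query ("largest") over a generated list."""
    return max(records, key=lambda r: r[field])


# "pop largest city in England", answered without a pre-tallied entry for it.
english_cities = select_all(PLACES, kind="city", location="england")
biggest = largest(english_cities)
print(biggest["name"], biggest["pop"])
```

The point is just that the list gets built at query time from the attributes already attached to each record, so nobody has to compile and index “largest city in England” in advance.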
This isn’t meant as a slam against W|A; it’s more of a sigh.
This bit is a slam: One of the features that looks most useful is the ability to pull up various physical properties of all kinds of materials. Try “young’s modulus titanium” (sans quotes) to see what I mean. I don’t know what they do with it, but our students and researchers often come to us for this kind of stuff after having exhausted Google; having it right there in W|A would seem to be a boon to them.
Unfortunately, W|A is cagy about where they get these values. If you click on the “Source Information” link underneath the results, you just get a list of every place from which Wolfram pulled data; there are no clues as to the provenance of this particular datum. Worse, you’ll see that a lot of the sources are “Wolfram|Alpha curated data”, which is total black-box stuff and gives us no idea of the reliability of the information.
This “curated data” business is sketchy in general, but it’s especially disturbing when it comes to these physical/thermal properties results: Not only can the accepted values change (slightly) over time as the testing equipment becomes more advanced, but the values can also be derived by two entirely different methods. There’s the experimental method, where you take a hunk of titanium and turn up the heat until it melts and decide that’s the melting point. Those are the best values, the most reliable ones. But there are also calculated values, where (I’m not totally clear on this) wizardly scientists take the known values of various other substances and determine theoretically what the value for a material should be. These can be questionable, and are (I’m told) usually avoided when possible.
Using W|A, searchers have no way of knowing whether the value given was determined experimentally (good) or through calculation (not so good), and so can’t know whether they can plug that value into their own work. They also have no way of knowing whether the value given is outdated or otherwise untrustworthy.
(That “young’s modulus titanium” result, for example, doesn’t match up with the value given in one of our standard reference works. I’m really curious as to where they got their number.)
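For what it’s worth, the provenance information I’m asking for isn’t exotic. Here’s a minimal Python sketch of a property value that carries its own paper trail, plus a check that reports what’s missing. Everything in it is an assumption for illustration: the `PropertyValue` fields, the `provenance_gaps` helper, the “example alloy”, the handbook name, and the placeholder value of 1.0 are all made up, and none of it reflects W|A’s actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyValue:
    """One datum plus the provenance a researcher would want (hypothetical schema)."""
    material: str
    prop: str
    value: float                   # placeholder numbers only in this sketch
    unit: str
    method: Optional[str] = None   # "experimental" or "calculated"
    source: Optional[str] = None   # where this particular datum came from
    year: Optional[int] = None     # when the value was determined


def provenance_gaps(v):
    """List the provenance fields that are missing for this value."""
    gaps = []
    if v.method not in ("experimental", "calculated"):
        gaps.append("method")
    if not v.source:
        gaps.append("source")
    if v.year is None:
        gaps.append("year")
    return gaps


# A black-box result: a number and a unit, nothing else.
black_box = PropertyValue("example alloy", "young's modulus", 1.0, "GPa")

# The same placeholder value with its paper trail attached.
documented = PropertyValue(
    "example alloy", "young's modulus", 1.0, "GPa",
    method="experimental", source="hypothetical handbook, table 3", year=2000,
)

print(provenance_gaps(black_box))
print(provenance_gaps(documented))
```

A result that comes back with all three gaps is the situation searchers are in now: there’s a number, but no way to tell whether it’s safe to plug into real work.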
This is already super-long, so I’ll be quick with the last disappointment: The datasets fed into the system on launch are pretty paltry. You can find populations for most geographic areas, but when you move into wage information you start hitting blanks really quickly. You can forget about doing any more interesting socioeconomic searches.
So yeah. It’s a nifty toy, and I guess it has potential, but the product available at launch doesn’t come anywhere close to the hype.