Thursday, December 17, 2009

I love it when people try to work out what I want

GigaOM (which I saw in a tweet from Matt) claims that Google is willing to do our searches without directly taking our money because it "takes its payments in a currency just as precious — user data — that helps advertisers target ads at readers". This is obviously true, but misses some of the point. The user data can certainly be used to target ads, by working out which advertisers provide the service that the reader is looking for, but that is very far from the only use. It can also be used to (sometimes) work out what people really want. There may not yet be an advertiser willing or able to offer that thing, but maybe Google (or Bing, or someone) can invent a way to provide it. This gives you data-driven innovation before you've had the innovative idea. Would it be too much of a stretch to say that Google and the others are picking up on our inventive ideas and commercializing them before we even know that we have had them? It might be fun to get outraged about this, but I'm OK with people trying to work out what I really want. Just let me know, OK?

Friday, October 9, 2009

Safety precautions for data mining expeditions


Prof. S.N.L. Ingqvist, Independent Data Prevention Consultant


The following text was found on the still twitching body of an explorer found near the opening to the lair of the Lithuanian gerund. It was a clear case of data poisoning. For your own protection, you may wish to consider the advice that he evidently ignored. If he’d taken it, he might still be alive today.

Data mining is an inherently risky activity, and should not be undertaken unless you are aware of the risks. But with care it can be safe and enjoyable, even for the novice. If this is your first time, you may wish to engage the services of an experienced guide; but before taking charge of an expedition, or going it alone, there are a few simple safety precautions of which you should be aware:

  • Make sure you are properly equipped. You need the stout boots of self-assured theoretical insight and the thick jacket of intellectual pride. Remember that equipment that would be able to resist data for an afternoon may not hold up so well on an overnight trip. Lighter materials may quickly become saturated with data and afford little protection.

  • Make sure that someone knows where you are going and when you plan to return. Darkness can descend fast in the data mine, and it is easy to lose track of time. If you do not return to the comforting world of established theory at the expected time, your colleagues will need to alert the Data Mining Rescue, who will send out an appropriately equipped search party. See below for a list of the intellectual resources that you can expect the rescue party to bring.

  • If you are trapped in an enclosed space, no matter how desperate you feel, you should not inhale until you have taken steps to be ‘above the data’. Remember that in such an environment there will still be pockets of rarefied intellectual discourse. Stand as tall as you can, and, holding your nose in the air, proceed to your safe zone.

  • You do have a safe zone, don’t you? For some of us this is a methodology. For others it is a readily available source of non-toxic pseudo-data. If you are able to reach your safe zone before you are overcome by the data, the chances are good that you will find congenial company there.

  • The following section, initially unreadable, was deciphered on realizing that the cuneiform script in which it was written was nothing more than an alternative writing system for an ancient dialect of AMS-LaTeX math mode: Be well versed in communications theory. It is only by clanging on the data pipe with your bit bucket that you will be able to send messages to the surface. Furthermore, if you can learn to reduce the conditional entropy of your surroundings, less data will be stirred up and you have a higher chance of surviving data poisoning.

  • Once you are in the safe zone, you should wait for the arrival of the rescue party. Do not engage in debate, controversy or heated discussion, as these will rapidly exhaust your limited supply of intellectual oxygen. If a colleague panics or becomes over-excited, the kindest thing to do is to gently force them under the data, holding them down until they cease to struggle. This will increase everyone else’s chances of intellectual survival. You may be concerned about the long-term consequences of this decision for your moral compass: this is quite natural. However, brain-imaging of the hypergallic gyrus (which Descartes correctly believed to be the seat of the conscience) reveals that 37% of survivors suffer few long-term effects. While you may experience some degree of low-level remorse, especially in the first few days, post-trauma counseling services are available at all defense contracting firms, many government laboratories and a small but increasing number of first-rank research universities.

  • The rescue party will have access to powerful data clearing tools. These may include the use of default reasoning, two-faced logic, reductio ad libitum (with or without repeats in the development section), argument that the data is a consequence of a general principle (and therefore, having no explanatory impact, must be ignored), argument from non-existent languages, argument from notational elegance, argument for argument’s sake, and (in extreme cases) argument from personal preference (de gustibus non est disputandum, some say, but for us that is just a matter of taste). If all goes well, the rescue party should be able to lead you back out into the light before permanent damage is done.

  • After your experience, you may be tempted to dive back into the data immediately. Do not do this; even the most intellectually flexible require a recovery period. The medically recommended interval is two weeks for junior staff, two months for tenured faculty and two decades for distinguished university professors with a reputation to protect.


Finally, remember that there is usually nothing to fear from data. If you encounter data unexpectedly, the best thing is to ignore it.

(previously published in SpecGram)

Finding the lost symbol

Famous schlock author Dan Brown pounded through the keyboard in search of his lost symbol. After much reflection he had realized that on 28 February 2002 his keyboard had lost one of the most important symbols that it had ever had. Those royalty statements from Lisbon were looking very different. What had happened? All his royalties were down by a ratio of 200$018 to €1. Surely this must be a conspiracy which had been established at the highest level of the Catholic Church! And this wasn't just a matter of global concern and an opportunity for bestsellers; it had happened to Mr. Brown's very own predictive word association football keyboard. The indignity! The shame! Handsome chisel-jawed Dr. Brown, 5 foot 3, not Jewish, not wearing a yarmulke, not wearing anything painful around his left thigh, honestly, couldn't think what to do. This was bathos! This was information that the reader could never need! This was a poorly formulated parallel structure! How could he be expected to turn out well-formed paragraphs? Why was he worrying about this? When had anybody ever expected this from him? What was the lost symbol, and how much was the royalty statement?



The answer from Wikipedia is that the lost symbol is called the cifrão and that it represents a sum of currency under the old Portuguese system of escudos and thousands of the escudo. The royalty statement hadn't changed, but Dan Brown's expectations had and so had the way in which the monetary sum was represented. But at least the lost symbol had been found!!