Friday, October 9, 2009

Safety precautions for data mining expeditions



Prof. S.N.L. Ingqvist, Independent Data Prevention Consultant


The following text was found on the still-twitching body of an explorer discovered near the entrance to the lair of the Lithuanian gerund. It was a clear case of data poisoning. For your own protection, you may wish to consider the advice that he evidently ignored. If he’d taken it, he might still be alive today.

Data mining is an inherently risky activity, and should not be undertaken unless you are aware of the risks. But with care it can be safe and enjoyable, even for the novice. If this is your first time, you may wish to engage the services of an experienced guide; but before taking charge of an expedition, or going it alone, there are a few simple safety precautions of which you should be aware:

  • Make sure you are properly equipped. You need the stout boots of self-assured theoretical insight and the thick jacket of intellectual pride. Remember that equipment able to resist data for an afternoon may not hold up so well on an overnight trip. Lighter materials may quickly become saturated with data and afford little protection.

  • Make sure that someone knows where you are going and when you plan to return. Darkness can descend fast in the data mine, and it is easy to lose track of time. If you do not return to the comforting world of established theory at the expected time, your colleagues will need to alert the Data Mining Rescue, who will send out an appropriately equipped search party. See below for a list of the intellectual resources that you can expect the rescue party to bring.

  • If you are trapped in an enclosed space, no matter how desperate you feel, you should not inhale until you have taken steps to be ‘above the data’. Remember that in such an environment there will still be pockets of rarefied intellectual discourse. Stand as tall as you can, and, holding your nose in the air, proceed to your safe zone.

  • You do have a safe zone, don’t you? For some of us this is a methodology. For others it is a readily available source of non-toxic pseudo-data. If you are able to reach your safe zone before you are overcome by the data, the chances are good that you will find congenial company there.

  • The following section, initially unreadable, was deciphered once it was realized that the cuneiform script in which it was written was nothing more than an alternative writing system for an ancient dialect of AMS-LaTeX math mode: Be well versed in communications theory. It is only by clanging on the data pipe with your bit bucket that you will be able to send messages to the surface. Furthermore, if you can learn to reduce the conditional entropy of your surroundings, less data will be stirred up and you will have a higher chance of surviving data poisoning.

  • Once you are in the safe zone, you should wait for the arrival of the rescue party. Do not engage in debate, controversy or heated discussion, as these will rapidly exhaust your limited supply of intellectual oxygen. If a colleague panics or becomes over-excited, the kindest thing to do is to gently force them under the data, holding them down until they cease to struggle. This will increase everyone else’s chances of intellectual survival. You may be concerned about the long-term consequences of this decision for your moral compass: this is quite natural. However, brain imaging of the hypergallic gyrus (which Descartes correctly believed to be the seat of the conscience) reveals that 37% of survivors suffer few long-term effects. While you may experience some degree of low-level remorse, especially in the first few days, post-trauma counseling services are available at all defense contracting firms, many government laboratories and a small but increasing number of first-rank research universities.

  • The rescue party will have access to powerful data clearing tools. These may include the use of default reasoning, two-faced logic, reductio ad libitum (with or without repeats in the development section), argument that the data is a consequence of a general principle (and therefore, having no explanatory impact, must be ignored), argument from non-existent languages, argument from notational elegance, argument for argument’s sake, and (in extreme cases) argument from personal preference (de gustibus non est disputandum, some say, but for us that is just a matter of taste). If all goes well, the rescue party should be able to lead you back out into the light before permanent damage is done.

  • After your experience, you may be tempted to dive back into the data immediately. Do not do this; even the most intellectually flexible require a recovery period. The medically recommended interval is two weeks for junior staff, two months for tenured faculty and two decades for distinguished university professors with a reputation to protect.


Finally, remember that there is usually nothing to fear from data. If you encounter data unexpectedly, the best thing is to ignore it.

(previously published in SpecGram)
