NLP: unscheduled, real, and live.: 2010

Thursday, May 6, 2010

What probability is about

Just as formal logic is a tool for capturing intuitions about effective ways of reasoning, probability is a tool for capturing effective ways of reasoning about uncertainty. In logic, we are taught to proceed from premises to conclusions. We conclude that "If every CSE graduate takes the exit exam before graduating and Emily is a CSE graduate, then Emily must have taken the exam" because this matches the modus ponens pattern $P \to Q, P \vdash Q$. The good thing about this is that if we trust modus ponens and we believe that we have correctly mapped the English description of the argument onto the terms of the formula, we know that the argument is valid. Notice that this claim of validity is not affected by the truth or falsity of the premises. Even if Emily was never a CSE graduate, or even if some CSE students are in fact able to weasel out of taking the exit exam, the argument is still OK. It is not, however, especially useful as a way of working out stuff about Emily: if we aren't sure about the premises, we can't be sure about the conclusion either.

Aristotle hit on the idea of laying out a taxonomy of plausible ways of reasoning, then thinking, in the abstract, about which of these patterns of reasoning actually deliver valid arguments. This is a wonderful idea, because you can rely on the insights about the patterns of reasoning without having to be concerned about the truth or falsity of particular facts.
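To make the point about validity concrete, here is a small Python sketch, not from the original post, that checks an argument form by brute force over a truth table: a form is valid if no assignment of true/false values makes all the premises true and the conclusion false. The helper names (`implies`, `is_valid`) are made up for illustration, and affirming the consequent is included only as a familiar invalid pattern to contrast with modus ponens.

```python
from itertools import product


def implies(p, q):
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


def is_valid(premises, conclusion, n_vars):
    """Return True if every assignment that makes all premises true
    also makes the conclusion true (i.e. there is no counterexample)."""
    for values in product([True, False], repeat=n_vars):
        if all(prem(*values) for prem in premises) and not conclusion(*values):
            return False  # found a row with true premises and a false conclusion
    return True


# Modus ponens: P -> Q, P |- Q
modus_ponens = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: p],
    conclusion=lambda p, q: q,
    n_vars=2,
)

# Affirming the consequent: P -> Q, Q |- P (tempting, but invalid)
affirming_consequent = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: q],
    conclusion=lambda p, q: p,
    n_vars=2,
)

print(modus_ponens)          # True
print(affirming_consequent)  # False
```

Note that the check never asks whether P or Q is actually true of Emily; it only inspects the pattern, which is exactly why validity survives false premises.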

In the same way that logic is the study of patterns of true/false reasoning, probability is the study of patterns of reasoning in the face of uncertainty.
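One standard way to carry the Emily example over into probability, sketched here as an illustration rather than anything from the post, uses Boole's inequality: for a valid argument, the probability that the conclusion is false is at most the sum of the probabilities that each premise is false. The function name and the degrees of belief below are hypothetical, chosen only to show the arithmetic.

```python
def conclusion_lower_bound(premise_probs):
    """Lower bound on P(conclusion) for a valid argument.

    By Boole's inequality, P(some premise is false) <= sum of P(premise_i is false),
    so P(all premises true) >= 1 - sum(1 - p_i). Validity means the conclusion
    is true whenever all premises are, so P(conclusion) is at least that much.
    """
    total_uncertainty = sum(1.0 - p for p in premise_probs)
    # Probabilities cannot be negative, so clamp the bound at 0.
    return max(0.0, 1.0 - total_uncertainty)


# Hypothetical degrees of belief, for illustration only.
p_rule = 0.95   # "If Emily is a CSE graduate, she took the exit exam"
p_emily = 0.90  # "Emily is a CSE graduate"

print(round(conclusion_lower_bound([p_rule, p_emily]), 2))  # 0.85
```

With these made-up numbers we can be at least 85% confident that Emily took the exam; once the premises' combined uncertainty reaches 1, the bound tells us nothing.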

Tuesday, April 27, 2010

Why statisticians shouldn't write movie titles

Never Give a Sucker an Asymptotically Even Flip
The Variational Enigma
The Fisher King
Independence Day
Return to Monte Casino
The Man Who Measured the Bank at Monte Carlo
Between 99 and 103 Dalmatians
8.5 +/- 0.2
The Metropolis Method (void where prohibited by law)
Improper Priors Go Wild in Cancun

Tuesday, April 6, 2010

War reporting

Yesterday Wikileaks posted a shocking video taken from an Apache helicopter in Iraq. It

Julian Assange, who founded Wikileaks, was interviewed about this. He feels that the soldiers in the helicopter obviously committed a war crime, and that

Plain speaker's guide to "any more" and "anymore"

First, here's a rephrasing of what Huddleston and Pullum's epic Cambridge Grammar of the English Language says about "any more" and similar adverbs. The main discussion is on pp. 710ff., with other bits on pp. 823 and 831.

They are polarity sensitive: this means that there is a difference in acceptability between "She isn't here any more" and "She is here any more". For many speakers, the first is OK, the second not.

The difference between "any more" and "anymore" is a British/American spelling difference.

You can line up "anymore" with "still" and "no longer". They differ in how they work with negation.

My own impressions follow. Most speakers can say:

"She is still here" (i.e. she is here and has been for a while)
"She is still not here" (i.e. we are waiting, and she still hasn't arrived)
"She is not here anymore", "She is no longer here" (in both cases, she was here, but now isn't)

Many speakers find "She is not still here" and "She is here anymore" awkward. For the first one, the intended meaning is the same as the one expressed by "She is no longer here". Some speakers, including me, blow a fuse when confronted with the second one, and don't even understand what it means. For others, "anymore" can be used anywhere that "nowadays" is, with much the same meaning, so "She is here anymore" could be used (if you are, say, in a bar) when the person in question used to avoid the bar but now hangs out there on a regular basis. Similarly, "Ice cream is cheap anymore" works for many people, but in my natural dialects I would have to either turn it round and say "Ice cream isn't expensive anymore" or punt and say "Ice cream is cheap nowadays".

Unfortunately, linguists have taken to confusing themselves and others by talking about "positive anymore". If they had called it "nowadays anymore" there would have been no trouble. These adverbs are neither positive nor negative, just a little fussy about what kind of sentences they like to be wrapped up in. The "nowadays" translation helped me, and is from John Lawler. As he says:

Apparently, for users of positive "anymore", "nowadays" doesn't cut it anymore. Anymore, they use "anymore" instead. Or perhaps only in certain speech contexts; the definitive sociolinguistic study remains to be done.

I guess I can forgive him for using the term "positive", because he puts it in quotes and gives an amusing example.

By the way, in Columbus, Ohio, ice cream really is cheap and good at Graeter's and Jeni's. No ice creams were consumed in the creation of this post, but several area shops are on high alert.

Facebook's de facto terms of use

If you are thinking of collecting and distributing data from social media sites, you should read Pete Warden's account of how Facebook responded to his activities. Facebook appears to be keen to exert more control than one would think it is entitled to, and certainly more than is convenient for academics. Nobody knows how this would play out in court. Twitter is looking better than ever as a data source.

Thursday, April 1, 2010

Genuinely funny April Fool article

This one actually made me laugh:

A would-be saboteur arrested today at the Large Hadron Collider in Switzerland made the bizarre claim that he was from the future. Eloi Cole, a strangely dressed young man, said that he had travelled back in time to prevent the LHC from destroying the world.

The LHC successfully collided particles at record force earlier this week, a milestone Mr Cole was attempting to disrupt by stopping supplies of Mountain Dew to the experiment's vending machines.

http://crave.cnet.co.uk/gadgets/0,39029552,49305387,00.htm?s_cid=33

Environmentally concerned spouse

"It felt so good to throw away that Martha Stewart dishwasher liquid"

Context: in our machine, Martha Stewart's green dishwasher liquid may be green, but isn't effective for washing.

Monday, March 22, 2010

SPLPAC

The Society for the Promotion of Long Prepositions, Adverbs and Conjunctions wishes, henceforward, to exist, notwithstanding its lack of positive ontological status heretofore. Moreover, it regrets and plans to remedy its previous delinquencies in this area, but nevertheless accepts that its existence may not continue for long. Contrariwise, it sees itself as a lexical mayfly skittering over the surface of the language, and is OK with that. Anyone know where the nectar is?

Thursday, March 18, 2010

Are you a linguist interested in taking CS classes at OSU?

If so, I have two options for you:

Option 1

If you want to prove that you know some computer science and to demonstrate the ability to do your own programming (all scientists should want these skills), then take the Graduate Interdisciplinary Specialization/Minor in Applied Software Engineering, which requires you to take:

1) CSE 502: Object-Oriented Programming for Engineers and Scientists (can be waived if you have some programming experience)

and

CSE 688: Applied Component-Based Programming for Engineers and Scientists

2) One of:

a) CSE 767 Applied Use-Case-Driven OOAD for Engineers and Scientists

b) CSE/ECE 794R: Applied Enterprise Distributed Computing for Engineers and Scientists

3) One or more other elective courses to bring the total up to 15 credit hours.

I strongly recommend that linguists taking this path choose their electives as CSE 630: Survey of Artificial Intelligence I and CSE 680: Introduction to Analysis of Algorithms and Data Structures.

Everything about this strikes me as recommendable. If you have the choice, maybe you should do 2b in preference to 2a.

Option 2

If you want to compete for postdocs and other jobs that need demonstrable CS training, the Master's degree on the Research track is your only sensible option.

http://www.cse.ohio-state.edu/grad/ms.shtml

This is a much bigger commitment of time and effort, but gets you an extra and very solid academic qualification.

It is exceptionally hard to do this unless you have the energy, time and funding to devote an extra year or more to your studies. The 7xx classes in the Master's program are difficult. These are classes that substantial numbers of well-prepared computer scientists are willing to pay for themselves, because they judge that the classes will improve their earning potential. They are correct, but both you and they should expect to work very hard to realize that potential.

Similar options will be available once we move to semesters.

Wednesday, March 17, 2010

The Well-Designed Child

John McCarthy (who, in 1960, wrote the first paper on Lisp) has written a provocative piece called "The Well-Designed Child". It was published in Artificial Intelligence recently (see below), but seems to have been mostly complete by 1997. The point of the piece is to advocate a kind of common-sense nativism in understanding human and animal capabilities. McCarthy points out that the world has a number of properties that the well-designed child would not need to learn ab initio. These include object persistence (the tendency for objects to continue to exist even when unseen) and the consequences of gravity, which make it reasonable, for many purposes, to conceptualize the world as mostly two-dimensional. In the same way, he says that the human bias to perceive objects in terms of natural kinds is such a powerful structuring device that evolution ought to build it in as an assumption rather than requiring each child to learn its effectiveness anew.

For linguists, this article could be seen as a contribution to the debate about linguistic nativism: whether the human child has an inbuilt language faculty. But McCarthy touches on that only in passing, being much more concerned to demolish the tabula rasa hypothesis as a basis for the design of child-like robots, and to rehearse some claims about what the language of thought might be like.

The thing I really like about this article is the way it emphasizes the continuity between human capabilities and those of other animals. According to McCarthy, many of the things that make us human are similar to or even identical with the things that make dogs canine, cats feline, mice murine and octopuses octopoidal (although we would expect sea creatures, who live in a genuine 3D environment, to be out of line with us on the stuff about gravity). And those things are largely about the styles of conceptualization which allow organisms to be effective in the particular world in which we find ourselves.

References

John McCarthy, The well-designed child, Artificial Intelligence, Volume 172, Issue 18, Special Review Issue, December 2008, Pages 2003-2014, ISSN 0004-3702, DOI: 10.1016/j.artint.2008.10.001.
(http://www.sciencedirect.com/science/article/B6TYF-4TMJ41M-2/2/6076a1f21080a46b5bb52900faf763c7)

Saturday, March 13, 2010

Alice in Wonderland

Well, I liked it. I'm glad we didn't get discouraged by the reviews and go to see "Up in the Air" instead. My favorite character was the creepy (though "good") White Queen.

Tim Burton was not at all faithful to the originals, but made a suitably weird world for Mia Wasikowska's Alice to inhabit. Some of the characters are like Lewis Carroll's. Stephen Fry's Cheshire Cat was just one of many good efforts in small parts.

This Alice is a modern teenager who resists the conventional roles that get thrust on her. She's not a Victorian maiden, and she's a pretty reluctant Hollywood action hero. Her dress in the potion clip deserves to make somebody a few million in sales, and all the costumes and visuals are up to Tim Burton's usual standard. I especially loved the army of playing cards.

For me, the IMAX 3D in this movie was just OK, not a big thrill like the 3D in Avatar.

Helena Bonham Carter's Red Queen is BIGHEADED, IMPATIENT, AND UNREASONABLE. Apparently she based the performance on her imperious toddler daughter.

Somebody (not Lewis Carroll) made up a lovely bloodhound for Timothy Spall to play.

The bloodhound produces the best line in the movie, spoken by a deeply cynical talking horse. I won't spoil it …

Heard Keith Devlin on NPR (http://www.npr.org/templates/story/story.php?storyId=12463231) saying that the original Alice was partly a sly mathematical satire. Let's just say that Tim Burton's Alice has less math in it than Mean Girls. Actually, Wasikowska's Alice is a bit like Lindsay Lohan's character in that film, and there's one incidental plot touch in Alice that Tina Fey would have been proud of.

Bottom line: fun movie, great visuals, plot maybe a little conventional, enjoyable performances by a swarm of British stars. No math, chess or puzzles, but that's OK. I'd be happy to see it again.

Thursday, March 11, 2010

Chart Parser initial release

I just put up on Google Code a version of the chart parser code I use for teaching computational linguistics. Feedback welcome. The slightly cruel example sentences about pigeons in cages being … um … reinforced are a throwback to my grad school days in experimental psychology. No pigeons were harmed in the making of this parser.

http://code.google.com/p/chartparse/

Friday, March 5, 2010

Teaching in Tübingen

I just spent a very happy couple of weeks teaching a compact (= insanely intensive) course from Croft, Metzler and Strohman's Search Engines: Information Retrieval in Practice. Really good textbook. Teaching four hours a day for two weeks is actually feasible! There were some nice projects: two groups added named entity tagging, and one made a Russian version of Galago.

Saturday, February 13, 2010

The Black Chamber Society

For the past few years I have been running a Codes and Code Breaking course at Ohio State. We got the idea from Chris Kennedy, who taught a similar course at Northwestern and, before that, at UCSD, where the course has now been pushed in a slightly different direction by Andy Kehler. It's a very enjoyable course to teach. Our version involves a lot of problem solving and group work, and just a smidge of modern material such as public key cryptography. We feed the students who become passionate about the math and CS aspects into courses like Steve Lai's graduate CSE class. I am grateful to the National Science Foundation for helping support the course in its early years. Most of the students come from Linguistics or from the Security and Intelligence major. They're delightful (usually) and smart (on average). Both Linguistics and International Studies seem to get many students who enjoy learning and are good at it.

We are beginning to open this course up to be taught by graduate students as well as faculty. The first graduate instructor is the wonderful DJ Hovermale, who uses new-fangled avian social media to give clues to the assignments. He also sometimes uses disguise, confederates and a thick Russian accent to make the first lecture into a coup de théâtre. When I teach the class again, I'll need to up my game.

The students last quarter were so fired up by all this that they decided to keep the thing going as a student society. So we have (drumroll) The Black Chamber Society. This is immensely gratifying. Thanks to all concerned.

Wednesday, February 10, 2010

DSM V

DSM V is the new version of the Diagnostic and Statistical Manual of Mental Disorders. There are some major changes that are very well covered in press stories (bipolar disorder in children, Asperger's). It's obvious that a label such as Asperger's can affect the expectations and actions of doctors, patients, family members and anyone else who gets involved. The new DSM gives us a chance to think about some of the following:

What happens when a doctor applies a label to a patient? Do they get access to medication, support services, behavioural treatments? Are any of these forced on the patient?

From the reports, it seems to be an open-and-shut case that some children diagnosed with bipolar disorder have been receiving heavy psychotropic drugs when they shouldn't have been. This is a case where the changes in the DSM are clearly motivated by a desire to reduce the amount of inappropriate treatment.

What about the people who currently identify as having Asperger's but do not necessarily fit the diagnostic criteria for the sub-class of Autism Spectrum Disorder that replaces it? Such people might easily have Aspergian traits, but have learned, for example, to handle social interaction with a degree of smoothness and grace. Does their identity depend on fitting into a particular medical category, and if so, is that OK?

It's a commonplace among social scientists that labeling a person has consequences. Many mental illnesses have an associated stigma. To a degree, Asperger's has an anti-stigma, because of the wonderful things that Aspergians have often done. So it turns out that removing the label is an act that has consequences.

In the New York Times, Dr William Carpenter is quoted as saying "Concerns about stigma and excessive treatment must be there. But keep in mind that these are individuals seeking help, who have distress, and the question is, What's wrong with them?" This is interesting, but the DSM is not exclusively for doctors and other health professionals; it is also a resource for people who don't necessarily think of themselves as having anything wrong with them. I was queasy about using the term "patient" earlier on. This is why.

If you care to comment, take a look at DSM5.org. The process looks very top-down. Here's a representative piece of verbiage:

Anyone can submit their suggestions and ideas to the members of the work groups through the DSM-5 Web site, by clicking the "Participate" button on the upper right hand side of this screen and registering. The proposed draft revisions to DSM-5 are posted on the Web site, and anyone can provide feedback to the work groups on these during periods of public comment.

Finally, members of the DSM-5 Task Force have given numerous interviews to members of the trade and consumer media to help explain the process of development to mental health professionals, consumers and family members, and members of the public, and will continue to do so through the development process.

[my emphasis]. Notwithstanding the public comment process, I don't see a strong intent to involve anyone who is not a psychiatric professional. I think this needs to change, or be changed.

Monday, January 25, 2010

No problems with Vampyres here, move along...

Concerned spouse: Oh, OK. It's blood, I thought it might be jam.

Spousal concern was about potential breakfast-related grooming error. Loss of blood due to shaving injury not a concern. Could she be a Jampyre?

Monday, January 11, 2010

XML and Corpora

I just got asked to advise on tutorial materials about XML for a computer scientist starting a corpus encoding project. To get started with the very basics I like Greg Wilson's Software Carpentry lecture. For more advice on how to go about building up a corpus, see Developing Linguistic Corpora: a Guide to Good Practice, by Martin Wynne and a bunch of Humanities Computing luminaries.

The project is going to work with children's books. That led me to find the Comic Book Markup Language, which provides a tool for adding analytical markup to (wait for it) comic books. It uses TEI, which is great, but heavy-duty.

