roseembolism wrote 2007-10-10 10:43 am

Text Analysis and Detecting Sockpuppets: any suggestions?

Based on some recent activity in LJ land, and out of some nasty-minded curiosity, I'm in the mood to find a freeware text analysis tool: one of those programs that analyze writing styles to see if two texts were written by the same person, similar to the sort that teachers use to detect plagiarism.

Can anyone recommend one? I'm just curious to see whether there's a whole bunch of enthusiastic Gor fans who happen to use a similar writing style to rant about censorship, or just one fan.

racerxmachina.livejournal.com 2007-10-10 08:19 pm (UTC)
Ask yer wife. ;)

A lot of research went on at UCSB about pattern recognition. Some of the research got translated into cheat-detector software, while still more of it went into biotech gene-mapping applications.

Check out PAIRWISE for some of the info: http://www.pairwise.cits.ucsb.edu/

britgeekgrrl.livejournal.com 2007-10-10 08:23 pm (UTC)
For similar reasons, I'd be v. keen on finding the same sort of thing. I'll prod around the 'net and ask some teachers I know...

racerxmachina.livejournal.com 2007-10-10 08:40 pm (UTC)
Try the demo version of Pairwise and see what you think. If it doesn't work, I can taunt the appropriate grad student and get some better results.

An absolutely useless note

kazuhiko04.livejournal.com 2007-10-11 02:04 am (UTC)
A wonderful way of comparing texts like this would be to use a Markov chain to analyse the probabilities of certain words following other words, and then see how these probabilities compare between the various posts.
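For the curious, here is a rough Python sketch of that idea, offered as a toy under stated assumptions rather than a real authorship detector. It builds a first-order (word-to-next-word) Markov model from one text, then scores how well a second text fits it using the average log-probability of each word transition. The function names, the simple tokenizer, and the add-one smoothing are all illustrative choices that nobody in this thread proposed, and real posts would need far more text than the example strings to say anything meaningful.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a deliberately crude tokenizer).
    return re.findall(r"[a-z']+", text.lower())


def train_bigram_model(text):
    # counts[prev][nxt] = how often the word `nxt` directly follows `prev`.
    counts = defaultdict(Counter)
    words = tokenize(text)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts, set(words)


def avg_log_likelihood(model, text):
    # Average log-probability of each word transition in `text` under `model`.
    # Higher (closer to zero) means the text fits the model's word habits better.
    counts, vocab = model
    words = tokenize(text)
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return float("-inf")
    slots = len(vocab) + 1  # one extra slot stands in for any unseen word
    total = 0.0
    for prev, nxt in pairs:
        following = counts.get(prev, Counter())
        # Add-one smoothing so unseen transitions get a small nonzero probability.
        p = (following[nxt] + 1) / (sum(following.values()) + slots)
        total += math.log(p)
    return total / len(pairs)


model = train_bigram_model("the cat sat on the mat and the cat sat on the rug")
print(avg_log_likelihood(model, "the cat sat on the mat"))
print(avg_log_likelihood(model, "rug the on sat cat the"))
```

The first score should come out higher than the second, because the first sentence reuses word transitions the model has already seen. To compare posters, one would train a model per author and see whose model gives a disputed post the best score.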

Unfortunately, a quick Google search on this shows a few theoretical results but no practical "run this app" type results.

Hence, I post to tell you I am of no help whatsoever.

Re: An absolutely useless note

racerxmachina.livejournal.com 2007-10-11 04:03 am (UTC)
Have a look at Dr. Ming Li's webpage. http://www.cs.uwaterloo.ca/~mli/

He was briefly a professor of bioinformatics at UCSB, and he studied the progressive changes in chain letters. The algorithm they used to track language changes can also be used to study genome changes in evolution.
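The thread doesn't say which algorithm that was, so the following is only an illustration of the general flavour: a distance measure that treats text (or DNA) as a plain sequence. It is a minimal Python sketch of normalized compression distance using the standard-library `zlib` module. Picking this particular measure is an assumption, as are the helper names `compressed_size` and `ncd`.

```python
import zlib


def compressed_size(data: bytes) -> int:
    # Length of the zlib-compressed data at the highest compression level.
    return len(zlib.compress(data, 9))


def ncd(text_a: str, text_b: str) -> float:
    # Normalized compression distance: small when the two texts share a lot of
    # structure (compressing them together costs little extra), larger when
    # they have little in common.
    a = text_a.encode("utf-8")
    b = text_b.encode("utf-8")
    size_a = compressed_size(a)
    size_b = compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


post_one = "the quick brown fox jumps over the lazy dog " * 20
post_two = "pack my box with five dozen liquor jugs " * 20
print(ncd(post_one, post_one))
print(ncd(post_one, post_two))
```

A text compared with itself should score noticeably lower than two unrelated texts. Very short posts compress poorly, so the numbers only start to mean something with a decent amount of text per author.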