Dating by the Numbers: Why “hacking” OkCupid is a waste of everyone’s time
Is there no problem out there that can’t be solved by SCIENCE? Apparently not. Indeed, it has recently come to my attention that one heroic nerdy dude actually used MATH to get a girlfriend. No really, an actual girlfriend. As in, a living human female that he’s seen naked. (We can only presume.)
Wired magazine found the story so astonishing that they devoted an entire 3,000-word feature to it.
The piece tells the tale of Chris McKinlay, then a grad student in mathematics at UCLA, who went searching for love on OkCupid, a dating site that uses daters’ answers to various questions, ranging from silly to profound, in order to calculate a “match score” that supposedly measures your compatibility with a potential date. But McKinlay wasn’t getting as many dates as he wanted.
So he decided to “reverse-engineer” OkCupid. As McKinlay — ever the romantic — explains on his own blog, he used his mathematical skillz to analyze the “high-dimensional user metadata in [the] putatively bipartite social graph structure [of] OkCupid,” and adjust his own profile accordingly.
Basically, he crunched a lot of numbers to figure out how the kinds of women he was most interested in — in particular one data “cluster dominated by women in their mid-twenties who looked like indie types, musicians and artists” — tended to answer questions. And then he fiddled with his own answers — and his choice of which questions to answer — so he would score higher match percentages with them. Ta da! Suddenly he had more matches.
He claims not to have answered any questions dishonestly, but as Wired notes, “he let his computer figure out how much importance to assign each question, using a machine-learning algorithm called adaptive boosting to derive the best weightings.”
It doesn’t take a math degree to figure out that fudging your answers so they’re more like those of the women you’re targeting will make it look like you’re more like them. You can pull this same trick in real life by pretending to agree with everything a person says.
But you don’t have to be a psychologist to see that doing this kind of defeats the purpose of OkCupid’s match algorithms in the first place. You’re creating the illusion of chemistry where there may be none. Essentially, you’re cheating, but in a really self-defeating way.
And by focusing so intently on statistically crunchable data, he also ignored a lot of the more intangible “data” that the profiles provide if you actually sit down to read them. The numbers don’t reveal anything about a person’s verbal charm, or their sense of humor. They don’t tell you about the interesting little details of the person’s life.
As Katie Heaney notes in a Buzzfeed piece on McKinlay’s strange quest:
[M]uch of the language used in the story reflects a weird mathematician-pickup artist-hybrid view of women as mere data points … often quite literally: McKinlay refers to identity markers like ethnicity and religious beliefs as “all that crap”; his “survey data” is organized into a “single, solid gob”; unforeseen traits like tattoos and dog ownership are called “latent variables.” By viewing himself as a developer, and the women on OkCupid as subjects to be organized and “mined,” McKinlay places himself in a perceived greater place of power. Women are accessories he’s entitled to. Pickup artists do this too, calling women “targets” and places where they live and hang out “marketplaces.” It’s a spectrum, to be sure, but McKinlay’s worldview and the PUA worldview are two stops along it. Both seem to regard women as abstract prizes for clever wordplay or, as it may be, skilled coding. Neither seems particularly aware of, or concerned with, what happens after simply getting a woman to say yes.
And that’s where McKinlay’s system seems to have fallen down entirely. Though Wired is eager to present his “hacking” as a great success, it took McKinlay more than 90 dates — 87 of them first dates with no followup — before he found his current girlfriend.
In other words, his wondrous system produced a metric shit-ton of “false matches” and wasted a lot of people’s time, including his own.
And in the end it wasn’t his data crunching that brought his girlfriend to his door; as Wired notes, she found him on OkCupid after doing a “search for 6-foot guys with blue eyes near UCLA.” Happily for him, McKinlay already matched her preferences in these areas. In addition to appreciating his height and eye color and location in physical space, she apparently was also charmed by his cynical approach to OkCupid dating, so maybe they are a match made in heaven, if not in his data crunching techniques.
While McKinlay was going on first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date after first date first date after first date after first date after first date after first date after first date after first date, people I know have found wonderfully compatible matches — and long-term relationships — through OkCupid without having to date dozens of duds along the way.
How? Partly because OkCupid’s match algorithms led them to some interesting candidates. But mainly because they read profiles carefully and looked for compatibility in the words, not the numbers.
Posted on February 7, 2014, in okcupid, PUA and tagged okcupid. 169 Comments.
I can imagine maybe the algorithm favors profiles that have been updated more recently, so you don’t turn up a bunch of people who haven’t been on for a year? Or maybe it’s so you’ll always look different so if you keep popping up in the same person’s results, it’ll be harder for her to go “Oh, it’s that creep again?”
Clearly, his problem was with thinking there are 7 types of women, when we all know of course there is only one type! (Gag. Also, looking at the Diverse cluster, it seems to be named that way because the women gave diverse answers to the questions. How is that a cluster again…) I’m really curious to hear from those 90 women because 0/90 is an impressive strikeout rate even under the circumstances. I’m sure they’re so glad they got used in a “social experiment” for a book too, and will be happy to find out they were sorted based on their same-sex experiences and willingness for one-night stands.
katz-
I guess that’s what Shadow Ninja is trying to make sound all sinister so he can get a false equivalence going.
Which, yeah, if you haven’t been on in a while and answered questions and so on, you’re not going to be in the top of anyone’s lists. It’s a “problem” that can be “solved” by just signing on regularly when you are actively looking for romantic partners and being active.
The whole thing is like some Goofus and Gallant thing with Goofus here designing a bad and time-consuming method of solving a problem more easily solved by just getting on his fucking computer and being a real person.
But I guess that wouldn’t have met the real need he wanted of thinking he was super l33t hacker d00d because he could program a script to make a really complex version of lying on your profile in order to waste everybody’s time.
What bothers me the most is how much time and devotion he spent on this little project of his. Like, your PhD dissertation got put on the back burner? Seriously?
@delphi, yes we probably do work in a very similar area although I’m applied rather than theoretical.
I’m not sure that he did any boosting because there was no mention of testing, training, and validating data sets in the breathless Wired piece.
I’m also curious as to how he picked the clusters he wanted to pursue, and I imagine it was the ones that made his penis happiest when he looked at the photos - because it certainly *wasn’t compatibility based on answer preferences*. Funny how the Wired piece stays clear away from that angle.
It’s not just that what he did was creepy and unethical, and *purposely contravened the policies of OKCupid*, but also that he is now the most salient example of a mathematical / machine learning / programming geek in people’s heads. Guys I know who work in those areas aren’t like him at all, but he’s just created a negative stereotype for people to laugh at for guys in that group. These are the guys I can be a total geek* around, and I’m accepted for who I am and what I can do, and they’re not fucking scared of/weirded out by me like other subsets of guys. Because female geeks still aren’t really that accepted.
So what he did *hurts other men*.
*I might mean nerd. But this was the correct use of geek when I was a teenager.
@Shadow Nirvana
I don’t know or care about those other examples, but how does this constitute a “backlash”? The guy wrote a book called “Optimal Cupid: Mastering the Hidden Logic of OkCupid” and the book is being marketed (I got 200+ women to go out with me and bagged me the fiancée of my dreams - AND SO CAN YOU!) and, gasp, criticized. Someone on Buzzfeed disagreed with the fawning Wired piece. Boo fucking hoo.
Katie Heaney, the author of the Buzzfeed article, wrote a book about her wacky dating adventures. Feel free to criticize that fairly, unfairly or however you like. Have at it. People can judge what you say when you publish an advice book. I think his outrage over Buzzfeed is childish, but I get that no one likes criticism and it is pretty amusing that they offered him a job.
I don’t find what he did particularly wrong or outrageous, even though I think his “hacking” job of OK Cupid is being overblown. It’s a commercial enterprise, not an unquestionable force for good. They charge people $1-2 to “promote” their profile for 15 minutes. I don’t see the problem of people wanting to learn how to make their profile more appealing or know how the website’s algorithms work. I also think it’s perfectly reasonable for people to disagree and voice their concerns.
The guy should be happy for the publicity because this “backlash” is pretty weak and will be forgotten momentarily by everyone but him and over-zealous defenders such as yourself.
Anyone else in love with the cat profile David found, with that amazing street shot of the cat? So debonair and self-assured. *Swoons*
Oh I missed this in the breathless wall of admiration that was the Wired item:
So his testing sample “clustered in a similar way” therefore his method “worked”? Actually, it sounds like the method didn’t entirely bloody work.
- did he still get 7 clusters?
- how did the distances between the clusters change?
- how did the question/answer weights change?
- what was the conclusion that the test sample was “similarly” clustered based on?
Because if I read this as an article that I was peer reviewing, those are the questions I would be asking. I don’t care that this isn’t in peer reviewed literature, those are basic questions that should have been asked. Journos who don’t know to ask those questions shouldn’t be writing pieces on technical stuff they don’t understand.
Reading on another site, I found out that the “Dog” cluster is women who have dogs (and not related to his personal opinion on attractiveness). If the most salient feature of one of the bloody clusters simply boils down to a woman owning a dog, I would not anticipate that many of these clusters would be useful in practice.
The most important result of all this math and hackery is that:
>>>Dating with his computer-endowed profiles was a completely different game. He could ignore messages consisting of bad one-liners. He responded to the ones that showed a sense of humor or displayed something interesting in their bios. Back when he was the pursuer, he’d swapped three to five messages to get a single date. Now he’d send just one reply. “You seem really cool. Want to meet?”
Basically, it changed his OKCupid usage pattern as if he was an awerage woman (like being able to get up to two dates a day). Sooo creepy.
Brooked:
Yes, he didn’t hack OKCupid, he botted it. Against their botting policy. And when his bots were detected, he created a new botting method. He data scraped OKCupid, which OKCupid actively seeks to prevent.
Imagine you run a dating website. You earn revenue this way, and you know you have to make your site attractive for people to use because profit! So you set up terms and conditions and privacy policies, etc, to make people feel comfortable about using your website. You strongly restrict access to your datasets (e.g. only make them available to your in-house mathematicians/statisticians who use the data to improve matching algorithms so the site continues to be useful to people who use it).
Then, someone comes along and scrapes information on thousands of your users to make their penis happy. While the users may have taken pains to ensure that their profiles mean they are relatively anonymous, the amount of data scraped for each user now makes at least some of them identifiable on the basis of the way they answered 300 questions. None of the users gave their permission to have their data used in this way, so there is no informed consent to have their OKCupid profiles used. The data on these profiles are stored on a university’s network, which is obviously outside of the OKCupid servers - who has access to this data now?
Can you see how this is wrong?
Oh, and if anyone’s curious, this isn’t what Amy Webb did at all! Let’s see if I can embed:
What she did was analyze other women’s good profiles because she was reaaally terrible at making a profile.
As for “that Canadian girl”: (She’s a woman, not a girl, and she has a name: Erin Wotherspoon.) She’s a foodie who runs a review blog about the food she has on the many dates she’s asked on. She admits flippantly that her aim is to have people pay for her to eat nice dinners. Naturally, she’s been sent death threats for it. A nice thing for her to do? Certainly not. Does the backlash against women doing a bad thing turn a hundred times more intense, violent, and sexual right off the bat? Yeah.
Nope, embedding fail. Here’s Amy Webb’s TED Talk.
Kiwi girl: thank you for the detailed explanation of what he did. Really sheds light on how skeezy what he did was.
Do you teach? I ask because how you explained things made it clear to me, and I understand very little about this stuff.
And of course, the requisite whinging that women have it so much easier in the dating field because they get so much attention:
Note that the ” awerage” woman can get up to two dates a day!
@hellkell: I’m not employed as a teacher, but I have been training people on various aspects of statistics and statistical interpretations for a number of years. My brain understands things from a practical “why do I want to know this?” angle, so that is how I do my training. I’ve also done some small amounts of remedial teaching to failing first year statistics students. I really believe that most people aren’t stupid, it’s just that they haven’t had [insert statistics method here] explained in a way that makes sense to them, or in a way that’s relevant to them so they can go, aha!
I’m a very visual learner, so I try to grab very descriptive examples to illustrate points. But I went through a number of years thinking I was very slow at particular things, when it was simply the teaching method that was the problem, so I empathise with people who think they’ll never get to grips with something.
tl;dr thanks for the compliment.
@shayla
I figured his version of Webb and the “Canadian girl” stories were BS.
When death threats, virulent denouncements and extended hysteria are involved then expressing concern over a backlash is very justified. As far as I can tell the worst thing that happened to this guy is people have called his OK Cupid “hack” creepy and unethical in a perfectly reasonable manner. That’s why I’m bugged by Shadow’s moral outrage over his treatment.
When people discuss and express criticism over his methods and motives, it’s not a backlash and he’s not a victim.
Here’s a nice graphic about whether a person should google *one* other person, let alone data scrape multiple people’s information, which goes some way to explaining creepiness:
http://www.thedatereport.com/dating/communication/heres-a-flowchart-to-figure-out-if-you-should-google-that-person/
@Kiwi Girl
Your analysis and explanations of his methods have been pretty fascinating actually, much better than the Wired puff piece. Thanks!
However, I stand by my original statements about the cat photo above and will go to the mattresses to defend it.
@Brooked LOL, so long as the mattress is ON THE BEACH (as kitteh pointed out) and your apartment has lawyers and doctors living in it, the cat won’t care.
I’m so under the paw in my household that one of my cats (the youngest, who still plays up to being “the baby”) has developed a distinctly “no, mummy” meow when I do something he doesn’t approve of at the time. Mr Kiwi finds it hilarious.
99 out of 99 data points can’t be outliers, by definition.
@vaity: point!
I just thought of an analogy for the method he used. Say I asked you to go to Toys R Us and group all the toys into at least three groups. But you cannot use “I would like this toy” and “I would not like this toy” in your decision making for your groups. You could use colour, intended age range for the toy, whether it makes a noise or not, whether it needs more than one person to be involved in order to be fun, whether it needs batteries, and so forth. You decide which attributes to use for the decisions, but they can’t be on the basis of “I like”.
After you sort all the toys, you then go through each group and look at them to see which group contains the toys you would most like. You use the shared attributes in this group to define your toy preference.
Do you think all the toys you would like would be in that group? Or would some toys you would like also be in other groups?
Do you think that the group that you choose would contain all the toys you would like?
If you know what you prefer in a toy, why would you group them using a method that actually ignored your preferences in toys?
Do you think I can get a funding grant so that we can try this out for reals?
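The toy-sorting analogy can be run as a tiny experiment. What follows is a minimal sketch with made-up data: the toys, their attributes, and the `liked` flags are all hypothetical, and grouping by exact attribute values stands in for a real clustering algorithm. The only point is that the grouping step never looks at the preference column.

```python
from collections import defaultdict

# Hypothetical data: (name, needs_batteries, makes_noise, liked)
TOYS = [
    ("robot", True, True, True),
    ("drum", False, True, False),
    ("puzzle", False, False, True),
    ("rc car", True, True, True),
    ("siren truck", True, True, False),
    ("blocks", False, False, True),
    ("kazoo", False, True, False),
    ("synth", True, True, True),
    ("flashcards", False, False, False),
]

def group_without_preferences(toys):
    """Group toys by attributes only; the liked flag is carried along but never used."""
    groups = defaultdict(list)
    for name, batteries, noise, liked in toys:
        groups[(batteries, noise)].append((name, liked))
    return groups

def liked_share(members):
    """Fraction of a group's toys that are liked."""
    return sum(liked for _, liked in members) / len(members)

groups = group_without_preferences(TOYS)

# Only now look at preferences: pick the group with the highest share of liked toys.
best_key = max(groups, key=lambda key: liked_share(groups[key]))
chosen = {name for name, _ in groups[best_key]}

all_liked = {name for name, _, _, liked in TOYS if liked}
missed = all_liked - chosen    # liked toys that ended up in other groups
unwanted = chosen - all_liked  # toys in the chosen group that are not liked

print("chosen group:", sorted(chosen))
print("liked but missed:", sorted(missed))
print("chosen but not liked:", sorted(unwanted))
```

With this made-up data the chosen group holds three of the five liked toys plus one unwanted one, and two liked toys sit in a group that was thrown away.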
I’m actually less creeped out by his changing around his answers (though that still seems a helluva lot of work for a trivial, creepy thing) than I am the botting.
Like, if I were on a dating website, the whole reason I’d talk to people is to get to know them. I’d find the whole fudging answers creepy, but I’d be WAY more upset over finding out dudes were sending me bots. DUDE. Do you give a shit or not? (Well, obviously this guy did, considering the man-hours he put in, but JESUS.) If I wanted to talk to robots, I’d go through another round with the disability folks.
I would not want to date someone who was just bot-spamming everyone. Spamming and botting is gross and obnoxious. What the hell.
Also… sleeping on his desk? Putting aside his dissertation? Seriously, all that alone is making me give dude the side-eye.
I am imagining the fallout if a woman had done the same thing (botting OKCupid), had gone on ninety first dates, et cetera, then had it written up in Wired.
my guess is she’d have to go into the Witness Protection Program due to the death threats.
It is as far as I’m concerned. If he didn’t care about a subject I cared about a lot, but said he did because he fancied a date, he’s lying. He’s also a fucking idiot, because odds are the subject will come up in conversation, and his little pretence will fall down.
::snicker:: He’ll end up in the Permanently Invisible Zone the way he’s going!
Shhhh, letting people think they’re just pets is part of their world domination strategy.
Generally, my reaction to Mr Maths Genius? Contempt. Any chance of OKCupid suing his ass off?
Kitteh — it doesn’t quite work like that, you can’t say how much your answer matters to you (which is really annoying). There’re three parts to each question: your answer, what answers you’d accept, and how important it is that they put one of those answers. So if he doesn’t actually care if, say, she’d date a smoker, but his desired cluster (ugh) is full of people who would, instead of saying that how she answers is irrelevant he’d say that it’s mandatory that she put yes, she’d date a smoker. That make sense?
To whomever it was who thought the match %s were different for them and you: if you read their explanation of how that’s calculated, they take those two numbers and multiply them to arrive at the %s you see. I’m not 100% sure that method means that you and they see the same number though.
Seeing how I can test that, I just sent a message doing so. I’ve been talking to someone on OKC who reads manboobz sometimes so I asked if the numbers are different, we shall see!
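The three-part question structure described above can be made concrete with a small sketch. This is an illustration under stated assumptions, not OkCupid’s actual code: the importance weights, the profile layout, and the names `satisfaction` and `match` are all hypothetical. The two one-way scores are combined by taking the square root of their product, which is my reading of the site’s published explanation and should be treated as an assumption; any margin-of-error adjustment is left out. Because a product is the same in either order, both people would see the same number under this scheme.

```python
from math import sqrt

# Hypothetical importance weights; the values a real site uses are an assumption here.
IMPORTANCE = {"irrelevant": 0, "a little": 1, "somewhat": 10, "very": 50, "mandatory": 250}

# A profile maps question -> (own answer, set of answers accepted from others, importance).

def satisfaction(judge, other):
    """Share of the judge's importance points that `other` earns on shared questions."""
    earned = possible = 0
    for question, (_, accepted, importance) in judge.items():
        if question not in other:
            continue
        weight = IMPORTANCE[importance]
        possible += weight
        if other[question][0] in accepted:
            earned += weight
    return earned / possible if possible else 0.0

def match(a, b):
    """Symmetric score: square root of the product of the two one-way scores."""
    return sqrt(satisfaction(a, b) * satisfaction(b, a))

alice = {
    "date_smoker": ("yes", {"yes", "no"}, "a little"),
    "messy": ("yes", {"yes", "no"}, "a little"),
    "likes_cats": ("yes", {"yes"}, "very"),
}
bob_honest = {
    "date_smoker": ("no", {"yes", "no"}, "irrelevant"),
    "messy": ("no", {"no"}, "somewhat"),
    "likes_cats": ("yes", {"yes"}, "somewhat"),
}
# Same answers, but the smoker question is now marked mandatory to mirror the target cluster.
bob_tuned = dict(bob_honest, date_smoker=("no", {"yes"}, "mandatory"))

print(round(match(alice, bob_honest) * 100))  # 71
print(round(match(alice, bob_tuned) * 100))   # 98
```

In this toy data Bob’s own answers never change. Only his accepted answers and importance on the smoker question do, and the displayed match rises from about 71% to about 98%.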
Ah, okay, I was thinking of different types of subjects, too, like politics or whatever, apart from not knowing how the questions are framed.
With the percentages being different thing, I thought I had done that with a friend and we had different %, which is why I said it. But my memory is not to be relied upon, so it’s good that you’re testing it properly.
@ Kiwi Girl
Not only was your flow chart link useful, it also led me to the show below, which will keep me entertained for a while and will be shared with many friends. So thanks!
(So NSFW - it’s a show in which random gay dudes try to make straight porn actors come by blowing them. While the actors are standing in a box.)
http://www.dailymotion.com/video/x12x3hw_poko-x-tate-orgasm-wars-av-actor-sawai-vs-takuya_fun?start=23