Monday, May 16, 2016 - 17:44

Researchers hijack 70,000 OkCupid profiles for publication on Open Science Framework

According to Vox, a student from Aarhus University in Denmark (the university denies any involvement in the actions) thought it was appropriate to leech 70,000 profiles from OkCupid, an online dating platform, and publish them on the Open Science Framework, a collaboration platform for sharing raw data.

The somewhat good news is that OSF immediately blocked access to the data. But quite obviously that is not going to prevent damage, as the lead author also promoted his gig on Twitter. The data is out, someone has it, and it's very likely to stay around for exploitation ... as it usually does in such cases. The internet doesn't easily forget.

His Twitter posts show a massive disrespect for pretty much everything that's relevant here. They're worth reading. And make sure you have a look at the screenshots of the Twitter conversation provided by Vox. They clearly show that this is not a guy who fragged up and realized it afterwards. He's boasting.

Stating that he didn't know, didn't ask - and obviously didn't care either - whether OkCupid would be ok with this raises some questions about ethics at Aarhus University. They might not be directly involved in this heist but they certainly are the ones who forgot to educate this guy. What the hell do they teach their students? One would assume that basic ethical behavior and the rudimentary legal aspects of obtaining or compiling data are an essential part of it. Kirkegaard has clearly heard of neither. He is quite literally the worst case scenario you can produce. I have no real clue about OkCupid but I would guesstimate that the scientific value of this data is highly questionable. It's a free service and as such likely hoards tons of fake profiles with fake data.

The best part is where Kirkegaard explains why usernames were not anonymized: of course, so that missing data like profile images and profile text can be filled in later. They did not scrape the pictures because of disk space. What a lucky day for OkCupid's users that he's not just short on ethics but also short on money.
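For contrast, here is a minimal sketch of what replacing usernames with keyed pseudonyms could look like before a dataset goes anywhere. It is an illustration only, not something the authors of the dataset did or proposed. The function name `pseudonymize` and the sample rows are made up for this example, and swapping out one column does not by itself amount to proper anonymization: free-text profile fields can still identify people.

```python
import hashlib
import hmac
import secrets


def pseudonymize(username: str, key: bytes) -> str:
    """Return a stable, keyed pseudonym for a username (hypothetical helper)."""
    digest = hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:16]


# The key has to stay secret and must never be published with the data.
# Without it, nobody can re-derive a pseudonym from a known username.
key = secrets.token_bytes(32)

# Made-up rows, not real profile data.
rows = [
    {"username": "example_user_1", "age": 29},
    {"username": "example_user_2", "age": 41},
]

# Same rows with the username column replaced by the pseudonym.
published = [{**row, "username": pseudonymize(row["username"], key)} for row in rows]

for row in published:
    print(row)
```

The keyed hash keeps rows that belong to the same user linkable inside the dataset while removing the handle people would use to look that user up on the site.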

This data was not only obtained without the consent of its owner and its subjects and published without proper anonymization. It's also a database owned by OkCupid, and even though it's sort of public, that doesn't mean you can use it for anything other than matchmaking on OkCupid ... or whatever users are doing there. It's their intellectual property.

It's pretty much like scraping semi-publicly accessible pictures from a website and publishing them under a CC license on some public image database. The fact that you can access something doesn't mean you own it. And if you do not have ownership or consent it's pretty obvious that you cannot republish.

Kirkegaard clearly doesn't have the faintest idea about the legal aspects, let alone the ethical aspects that should really be quite obvious in this case. And apparently he also has no clue about the legal situation he's currently in. Even in the US this would be off limits. With Europe's much stronger privacy laws this is a fuckup that's going to hurt. And apparently Aarhus University knows this perfectly well, given their swift response that essentially says: Pretty please stay the fuck away from us. OkCupid will go after him with all they have. They don't even have a choice here. If they get anywhere even near "we're ok with this" they can close their business.

I think it's pretty safe to say that smiley-face Kirkegaard will need a lawyer rather soon. And I don't think he'll be as happy about the gig when that lawyer explains the more or less inevitable outcome of legal action by OkCupid.

The only difference between this and any other case in my field is that he didn't technically break into OkCupid's systems. That is a difference. But everything else is just the same old story. I doubt he's getting away with a warning shot. Certainly not after publicly boasting on Twitter that he knew exactly what he was doing.

I'm going to close this farce with some general advice for users of social services. Actually, for users of any kind of service where you'd be uncomfortable if data leaked. You cannot really prevent cross-linking of profiles if enough data is accessible. But you can mitigate the possible damage quite a bit by making it a lot harder.

Unless they go full commando and your physical address is also spilled, the most risky data is actually your photo. Your name - unless it's really freaky - is just one among many. A search for my name yields 14,000,000 results on Google. Without anything else that's useful it's at least cumbersome to figure out which one is me. Same's true for usernames. Pick one that's a tad arbitrary and never use the same one twice. Emails are distinct. So you might want to use a dedicated address for that service which you don't use anywhere else. The big problem is anything that requires photos. Which I assume is somewhat necessary on a dating platform. If you use such services make sure you significantly reduce your footprint everywhere else. And certainly never use the same photo twice. If your profile photo on OkCupid is the same one you use on Xing or Facebook I will be able to find you there.
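The username and email part of that advice is easy to automate. Below is a minimal sketch using Python's standard `secrets` module. The helper names `random_handle` and `identity_for` are made up for this example, and `example.org` is a placeholder that assumes you have a mail domain or alias service where you can create one address per account.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def random_handle(length: int = 12) -> str:
    """Random username with no link to your name or your other accounts."""
    # Start with a letter, which some sites expect for usernames.
    first = secrets.choice(string.ascii_lowercase)
    rest = "".join(secrets.choice(ALPHABET) for _ in range(length - 1))
    return first + rest


def identity_for(service: str, mail_domain: str = "example.org") -> dict:
    """One fresh username and one dedicated mail alias per service."""
    return {
        "service": service,
        "username": random_handle(),
        # The alias is random too, so it does not reveal the username.
        "email": f"{random_handle(10)}@{mail_domain}",
    }


# Placeholder service names, one identity each.
identities = [identity_for(name) for name in ("dating-site", "business-network")]

for identity in identities:
    print(identity)
```

Keep the generated identities in a password manager. The point is only that nothing in one account's username or address shows up in another.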

There's a recent issue with a tool that links anonymous faces in pornographic productions with their profiles on VK. Now it's a bit naive to assume you can stay anonymous when doing internet porn. But I'm fairly certain they didn't think about this particular problem when they uploaded a profile photo to VK.

Google's image search is already pretty good and it's not dedicated facial recognition. It's a tad more generic. With the proper tool and dedicated targets (VK, Facebook, Twitter, etc.) the hit rates can go way up, even with photos that are merely similar. So be careful with profile photos. If you absolutely need one for service A you might want to go without one everywhere else.
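To give a feeling for why a reused photo is so easy to match even after small changes, here is a toy sketch of an "average hash", one simple perceptual hashing technique. This is not how Google's image search or any face recognition tool works internally. It is just the most basic member of that family, and the 4x4 pixel grids are made-up values standing in for photos that have been shrunk to a tiny grayscale thumbnail.

```python
def average_hash(pixels):
    """Average hash of a small grayscale image given as rows of 0-255 values."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    # One bit per pixel: brighter than the image's mean or not.
    return [1 if value > mean else 0 for value in flat]


def hamming(a, b):
    """Number of positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy 4x4 "photo" (made-up values).
original = [
    [200, 190, 60, 50],
    [210, 180, 40, 30],
    [90, 80, 220, 230],
    [70, 60, 240, 250],
]

# The same picture, slightly darkened, as after a filter or re-encoding.
darkened = [[max(0, value - 20) for value in row] for row in original]

# A different picture.
other = [
    [10, 240, 20, 230],
    [250, 30, 220, 40],
    [15, 245, 25, 235],
    [225, 35, 215, 45],
]

h_original = average_hash(original)
print("original vs darkened:", hamming(h_original, average_hash(darkened)))
print("original vs other:   ", hamming(h_original, average_hash(other)))
```

A small distance means "probably the same picture". Darkening the photo does not change which pixels are brighter than average, so the modified copy still matches, while the unrelated picture ends up far away.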