The man behind it has now been identified, and says that he did it “for fun” – though he is also selling the data …
Data scraping is a controversial topic. At its simplest, it means writing a piece of software to visit a webpage, read the data displayed, and then add it to a database.
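To make that description concrete, here is a minimal sketch of the read-then-store loop, using only Python's standard library. Everything in it is an assumption for illustration: the sample HTML, the `profile` class, the `data-name` and `data-title` attributes, and the table layout are all hypothetical and do not reflect any real site's markup. A real scraper would download the page over HTTP; this one reads an inline string so it runs without touching a network.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would fetch this over HTTP.
SAMPLE_PAGE = """
<ul>
  <li class="profile" data-name="Ada Example" data-title="Engineer"></li>
  <li class="profile" data-name="Bob Sample" data-title="Designer"></li>
</ul>
"""


class ProfileParser(HTMLParser):
    """Collects (name, title) pairs from the hypothetical profile markup."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li" and attributes.get("class") == "profile":
            self.profiles.append(
                (attributes.get("data-name"), attributes.get("data-title"))
            )


def scrape_to_database(page_html, connection):
    """Read the displayed data, then add it to a database table."""
    parser = ProfileParser()
    parser.feed(page_html)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS profiles (name TEXT, title TEXT)"
    )
    connection.executemany("INSERT INTO profiles VALUES (?, ?)", parser.profiles)
    connection.commit()
    return len(parser.profiles)


connection = sqlite3.connect(":memory:")
stored = scrape_to_database(SAMPLE_PAGE, connection)
print(stored)  # number of records stored
```

Run in a loop over many pages, the same few lines are what turn individually visible profiles into a single searchable database.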
More commonly, people will use APIs (application programming interfaces), which the web service provides for legitimate purposes, to grab large quantities of data.
It’s controversial because, on the one hand, those doing the scraping can argue that they are only accessing publicly available data – they are simply doing so in an efficient way. Others argue that they are abusing tools not intended for the purpose, and that there is more data available through APIs than is visible on websites, making it hard for users to know what data has been exposed.
There’s even controversy over terminology. Many security professionals argue that it isn’t a security breach if the data is available for public access. I would argue that if a service like LinkedIn doesn’t spot someone scraping literally hundreds of millions of records, that’s a massive security failing.
LinkedIn scraping for fun – and profit
BBC News spoke with the man who took the data, who goes by the name Tom Liner.
How would you feel if all your information was catalogued by a hacker and put into a monster spreadsheet with millions of entries, to be sold online to the highest paying cyber-criminal?
That’s what a hacker calling himself Tom Liner did last month “for fun” when he compiled a database of 700 million LinkedIn users from all over the world, which he is selling for around $5,000 (£3,600; €4,200) […]
In the case of Mr Liner, his latest exploit was announced at 08:57 BST in a post on a notorious hacking forum […] “Hi, I have 700 million 2021 LinkedIn records”, he wrote. Included in the post was a link to a sample of a million records and an invite for other hackers to contact him privately and make him offers for his database.
Tom told me he created the 700 million LinkedIn database using “almost the exact same technique” that he used to create the Facebook list.
He said: “It took me several months to do. It was very complex. I had to hack the API of LinkedIn. If you do too many requests for user data in one time then the system will permanently ban you.”
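The ban Liner describes is a form of rate limiting. As a rough sketch of the idea, and not a description of how LinkedIn actually does it, a service could count each client's requests inside a sliding time window and ban any client that exceeds a threshold. The class name, the threshold of three requests, and the 60-second window below are all hypothetical values chosen for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Bans a client that makes too many requests within a time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # client id -> recent request times
        self.banned = set()

    def allow(self, client_id, now):
        """Return True if the request may proceed; `now` is in seconds."""
        if client_id in self.banned:
            return False
        recent = self.history[client_id]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            # Too many requests "in one time": the ban is permanent.
            self.banned.add(client_id)
            return False
        recent.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
results = [limiter.allow("client-a", now=t) for t in (0, 1, 2, 3, 120)]
print(results)  # [True, True, True, False, False]
```

A limit like this only catches requests that arrive too fast, which fits Liner's account of the work taking several months.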
LinkedIn denies that Liner used its API, but cybersecurity company SOS Intelligence says we need more controls over how such APIs are used.
CEO Amir Hadžipašić says the details in this, and other mass-scraping events, are not what most people would expect to be available in the public domain. He thinks API programmes, which give more information about users than the general public can see, should be more tightly controlled.
“Large-scale leaks like this are concerning, given the intricate detail, in some cases, of this information – such as geographic locations or private mobile and email addresses.
“To most people it will come as a surprise that there’s so much information held by these API enrichment services.”
Security expert and haveibeenpwned.com owner Troy Hunt says he doesn’t consider API misuse to be a security breach, but mostly agrees on the need for more control.
“I don’t disagree with the stance of Facebook and others but I feel that the response of ‘this isn’t a problem’ is, whilst possibly technically accurate, missing the sentiment of how valuable this user data is and their perhaps downplaying their own roles in the creation of these databases.”