The data appears to have been collected by a practice known as web-scraping, where a company accesses the web interface of a service and then collates data automatically …
This is different from a hack, which involves breaking into a system to access data that is not supposed to be publicly accessible. Web-scraping accesses only public data.
For example, an automated system can access a series of YouTube channels, collecting the username, photo, and follower count of the channel owner. A whole database of these records becomes a privacy issue even though the data itself is public-facing.
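To make the mechanics concrete, here is a minimal sketch in Python of how such a collation step might look. It is an illustration only: the page markup, the class names (`username`, `avatar`, `followers`), and the helper names `scrape_profile` and `build_database` are all assumptions made for this example, not the markup of any real service or the method used to build the exposed database. The sketch parses an inline sample page rather than fetching anything over the network.

```python
from html.parser import HTMLParser

# Hypothetical public profile page; real services use different markup.
SAMPLE_PAGE = """
<div class="channel">
  <span class="username">example_channel</span>
  <img class="avatar" src="https://example.com/avatar.png">
  <span class="followers">12,345</span>
</div>
"""


class ProfileParser(HTMLParser):
    """Collects the username, photo URL, and follower count from one page."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None  # name of the field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if tag == "span" and css_class in ("username", "followers"):
            self._field = css_class
        elif tag == "img" and css_class == "avatar":
            self.record["photo"] = attrs.get("src")

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field and data.strip():
            self.record[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_profile(html):
    """Turn one public page into a structured record."""
    parser = ProfileParser()
    parser.feed(html)
    record = parser.record
    if "followers" in record:
        # "12,345" -> 12345
        record["followers"] = int(record["followers"].replace(",", ""))
    return record


def build_database(pages):
    """Repeat the extraction over many pages; this is the collation step."""
    return [scrape_profile(page) for page in pages]


if __name__ == "__main__":
    print(build_database([SAMPLE_PAGE]))
```

Each page on its own reveals nothing that a visitor could not already see. The privacy concern comes from `build_database`: run over millions of pages, it produces a single searchable table.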
Once that data has been collated into a database, you’d normally expect it to be protected. But TNW reports that a database of 235M records was found on the web with no password protection.
The scraped data had four major datasets with details of millions of users from the aforementioned platforms. It contained information such as profile name, full name, profile photo, age, gender, and follower stats […]
Bob Diachenko, the lead researcher for security firm Comparitech, found three identical copies of the database on August 1. According to Diachenko and the team, the data belonged to a now-defunct company called Deep Social.
When they reached out to the company, the request was forwarded to Hong-Kong-based firm Social Data, who acknowledged the breach and closed the access to the database. However, Social Data denied having any links with Deep Social.
Comparitech said that each record contained some or all of the following:
- Profile name
- Full real name
- Profile photo
- Account description
- Whether the profile belongs to a business or has advertisements
- Statistics about follower engagement, including:
  - Number of followers
  - Engagement rate
  - Follower growth rate
  - Audience gender
  - Audience age
  - Audience location
- Last post timestamp
Additionally, about 20% of the records sampled contained either a phone number or email address. As TNW notes, this type of data can be used for spam, but also for phishing attempts.
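Because each record held only "some or all" of these fields, a record is best pictured as a structure in which every field is optional. The Python sketch below is one hedged way to model that. The class name `ProfileRecord`, its field names, and the helper `contact_share` are assumptions chosen for illustration and do not reflect the actual schema of the exposed database. The sample records are invented, and only a subset of the listed fields is modelled to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProfileRecord:
    """One scraped profile; every field may be missing (hypothetical schema)."""
    profile_name: Optional[str] = None
    full_name: Optional[str] = None
    profile_photo: Optional[str] = None
    account_description: Optional[str] = None
    is_business: Optional[bool] = None
    followers: Optional[int] = None
    engagement_rate: Optional[float] = None
    follower_growth_rate: Optional[float] = None
    last_post_timestamp: Optional[str] = None
    phone: Optional[str] = None
    email: Optional[str] = None

    def has_contact_info(self) -> bool:
        # True when the record carries a phone number or an email address.
        return bool(self.phone or self.email)


def contact_share(records: List[ProfileRecord]) -> float:
    """Fraction of records that contain a phone number or email address."""
    if not records:
        return 0.0
    return sum(r.has_contact_info() for r in records) / len(records)


# Invented sample data, for illustration only.
sample = [
    ProfileRecord(profile_name="a", followers=100),
    ProfileRecord(profile_name="b", email="b@example.com"),
    ProfileRecord(profile_name="c"),
    ProfileRecord(profile_name="d", full_name="D Example"),
    ProfileRecord(profile_name="e", followers=5),
]

if __name__ == "__main__":
    print(contact_share(sample))
```

A check like `contact_share` is the kind of sampling a researcher could run to estimate how many records carry contact details, which is what makes them useful for spam and phishing.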
Web-scraping is usually prohibited by the terms and conditions of the services concerned, but a California court last year ruled that it’s not illegal. That can, in many cases, be a good thing.
For example, CityMapper is a hugely popular app that works out the quickest way to get from A to B in a city, pulling in live traffic and public transit data to do so. These days, most public transit companies make that data available via an API, but in the early days it was only available on the web. Web-scraping by forerunners to CityMapper offered a handy way to make the data more usable.
Web-scraping can still be useful today, when companies put useful data on the web but don’t make it available through an API. Price-comparison services, for example, often still rely on web-scraping.
But scraping personal data is another matter, and courts perhaps need to distinguish between the two types of use.