Hundreds of websites collect information that users type into web forms and send it to tracking companies before the users have actually submitted the form. This happens on about two percent of popular websites, researchers say.
Researchers from Radboud University in the Netherlands, KU Leuven in Belgium and the Université de Lausanne in Switzerland examined web forms on the hundred thousand most visited websites. To do so, they built a crawler based on DuckDuckGo’s Tracker Radar Collector that looks for email address and password fields in web forms. When it finds such fields, the crawler also intercepts the site's network traffic to see which information is sent, and when. The researchers concluded that 1,844 of the 100,000 websites send information to third parties before the user submits the form. That figure applies to visits from Europe. For visits from the United States the practice is even more common: it occurred on 2,950 websites.
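The first step the crawler performs, finding email and password fields in a page, can be illustrated with a minimal sketch. This is not the researchers' code or the Tracker Radar Collector API; the class name `FormFieldFinder`, the function `find_fields` and the matching heuristics are assumptions made for illustration, using only Python's standard library.

```python
from html.parser import HTMLParser


class FormFieldFinder(HTMLParser):
    """Collects <input> elements that look like email or password fields.

    Hypothetical helper for illustration; the heuristics are assumptions.
    """

    def __init__(self):
        super().__init__()
        self.fields = []  # list of (kind, name) tuples in document order

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        # Attribute values can be None (e.g. <input required>), so normalize.
        attr = {key: (value or "") for key, value in attrs}
        input_type = attr.get("type", "text").lower()
        name = attr.get("name", "")
        hint = " ".join(
            [name, attr.get("id", ""), attr.get("autocomplete", "")]
        ).lower()
        if input_type == "password":
            self.fields.append(("password", name))
        elif input_type == "email" or "email" in hint or "e-mail" in hint:
            self.fields.append(("email", name))


def find_fields(html):
    """Return the email and password fields found in an HTML string."""
    finder = FormFieldFinder()
    finder.feed(html)
    return finder.fields
```

A real crawler works on the rendered page in a browser rather than on static HTML, so this sketch only shows the kind of matching involved.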
The researchers found this practice in forms used to log in or register, and in forms for signing up to a newsletter. In 1,844 cases they found scripts that sent an MD5, SHA-1 or SHA-256 hash of the user’s email address to a third-party domain. In a third of those cases the script was loaded from the external domain itself, not from the domain of the website the user was visiting.
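To see how such a leak can be recognized in intercepted traffic, consider the following minimal sketch. It computes the three hashes mentioned above for a test email address and searches a captured request for them. The function names `email_hashes` and `find_leaks` are hypothetical, and the assumption that the address is trimmed and lowercased before hashing is ours, not something the article states.

```python
import hashlib


def email_hashes(email):
    """Return hex digests of the email address for MD5, SHA-1 and SHA-256."""
    # Assumption: the address is trimmed and lowercased before hashing.
    data = email.strip().lower().encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def find_leaks(email, request_data):
    """List the forms in which the email address appears in request data.

    request_data is the URL or body of one captured request, as a string.
    """
    haystack = request_data.lower()
    found = []
    if email.strip().lower() in haystack:
        found.append("plaintext")
    for algorithm, digest in email_hashes(email).items():
        if digest in haystack:
            found.append(algorithm)
    return found
```

For example, `find_leaks("user@example.com", captured_url)` returns `["sha256"]` if the captured URL contains only the SHA-256 digest of that address. Salted hashes or other encodings would not be caught by this simple search.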
According to the researchers, the trackers are most common on shopping sites, news sites and fashion sites. Most third-party trackers have built-in safeguards against collecting passwords, but not all of them do. The researchers found 52 websites where the password was also sent to third parties. Those websites fixed the problem after the researchers contacted them.
Top ten websites where email addresses are leaked to tracker domains
| Rank | Website | Tracker domain | Email address format |
|---|---|---|---|
| 567 | *newsweek.com | rlcdn.com | Hash (MD5, SHA-1, SHA-256) |
| | | | Hash (SHA-256 with salt) |
| 217 | healthline.com | rlcdn.com | Hash (MD5, SHA-1, SHA-256) |
| 234 | foxnews.com | rlcdn.com | Hash (MD5, SHA-1, SHA-256) |
| 278 | theverge.com | rlcdn.com | Hash (MD5, SHA-1, SHA-256) |
| 288 | webmd.com | rlcdn.com | Hash (MD5, SHA-1, SHA-256) |
*: Not reproducible anymore as of February 2022.
In most cases the requests went to known tracker domains that, according to the researchers, appear on many blocklists. In 41 cases, however, the tracker domain was not on those lists. The researchers also conclude that collecting the addresses is probably not allowed under the GDPR: the collection happens without consent, and there is no other legal basis, such as necessity, for gathering the data. The researchers see the difference between visits from the EU and from the US as a sign that advertisers are taking the GDPR into account.
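Checking whether a request goes to a domain on a blocklist can be sketched as follows. This is an illustration under assumptions: the function name `is_blocklisted` is hypothetical, the blocklist is a plain set of domain names, and a host counts as listed when it or one of its parent domains is in the set. Real blocklists use richer rule formats.

```python
from urllib.parse import urlsplit


def is_blocklisted(url, blocklist):
    """Return True if the URL's host or any parent domain is in blocklist."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # For "api.rlcdn.com" this checks "api.rlcdn.com", "rlcdn.com" and "com".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


# Hypothetical blocklist; rlcdn.com is taken from the table above.
blocklist = {"rlcdn.com"}
listed = is_blocklisted("https://api.rlcdn.com/collect", blocklist)
unlisted = is_blocklisted("https://unknown-tracker.example/collect", blocklist)
```

A tracker such as the 41 unlisted domains would come out as `False` here, even though it receives the same data, which is why list-based blocking misses it.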