The Real Daily

Real Estate Technology

Real estate data scraping back in a big way with new bots, surprising new tactics

Data scraping was a hot issue in the industry once upon a time, but the solutions fixed the problems. There are new bots with new tactics, and our industry has some vulnerabilities – here’s what we all need to do.

Property sellers expect that, when they give an agent their listing information, it will be used to market and sell their property. While that means agents need to give the listing information exposure on the Internet, it is the agent’s responsibility to take reasonable steps to ensure that the data is not ‘scraped’ off of their website and used for illicit purposes, such as direct-marketing to the seller, display in unapproved places on the Internet, and other undesirable uses.

Realtors, this means making sure that your website and software providers have taken steps to keep the data from being harvested by malicious software (“bots”). MLS rules mandate this for Virtual Office Websites (VOWs) but not yet for other displays. Now that IDX rules are changing to allow sold data, IDX should definitely be re-examined as well.

The scraping issue of yesterday vs. today

The ‘scraping’ issue was the center of attention in our industry a few years ago, when several MLSs went up against a nationwide data ‘scraper’ that foolishly re-posted the stolen data online where it could be found.

Still, it cost these MLSs over 10 million dollars and a lot of time spent in court to get this one scraper to stop. But most of the scrapers’ work product never sees the light of day, so we can’t easily find them and go after them. Proactively stopping the bots is the only way we can deal with this problem.

Where are the bots now a problem?

  • MLS systems – past the login, but also prospecting and client collaboration features, and the framed IDX solutions some vendors offer
  • MLS/Association consumer facing websites with listings (not to mention the member roster)
  • IDX sites
  • Virtual Office Websites (VOW)
  • Publishers / Portals

Stopping those bots is not easy for a developer or webmaster

Even just a few years ago, it was easier. A bot wouldn’t look (to the web server) like a real web browser. A bot would look at too many listings from one IP address, or look through them faster than any human ever could.
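As a rough illustration of those older checks, here is a minimal Python sketch. The class name, the thresholds, and the list of suspicious user-agent tokens are all assumptions made up for this example, not anything taken from a particular vendor or MLS rule.

```python
from collections import defaultdict, deque

# Hypothetical values, chosen only for illustration.
MAX_VIEWS_PER_WINDOW = 30
WINDOW_SECONDS = 60.0
SUSPICIOUS_AGENT_TOKENS = ("curl", "wget", "python-requests", "scrapy")


class NaiveBotDetector:
    """Flags clients using the older, per-IP heuristics described above."""

    def __init__(self, max_views=MAX_VIEWS_PER_WINDOW, window=WINDOW_SECONDS):
        self.max_views = max_views
        self.window = window
        # Maps each IP address to the timestamps of its recent listing views.
        self._views = defaultdict(deque)

    def looks_like_bot(self, ip, user_agent, now):
        """Record one listing view and return True if it looks automated."""
        agent = (user_agent or "").lower()
        # Heuristic 1: no user agent, or one that names a scripting tool.
        if not agent or any(token in agent for token in SUSPICIOUS_AGENT_TOKENS):
            return True
        # Heuristic 2: too many listing views from one IP inside the window.
        recent = self._views[ip]
        recent.append(now)
        while recent and now - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_views
```

A web application would call `looks_like_bot` once per listing request, passing the client IP, the `User-Agent` header, and the current time in seconds. This only catches the unsophisticated bots.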

You can still catch a few of the less sophisticated bots by watching for those kinds of things – but most of the scrapers have moved beyond that level of sophistication, and it’s all too easy to block the good bots you want crawling your site, like search engines.
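One common way to avoid blocking a legitimate search engine crawler is to confirm its identity with a reverse DNS lookup followed by a forward lookup, rather than trusting the user-agent string. The sketch below assumes the hostname suffixes Google documents for Googlebot; the suffix list should be checked against each search engine's current documentation, and the function name and injectable lookup parameters are hypothetical conveniences for this example.

```python
import socket

# Assumption: suffixes Google documents for its crawlers. Other search
# engines publish their own; verify against their current documentation.
TRUSTED_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None,
                        suffixes=TRUSTED_CRAWLER_SUFFIXES):
    """Return True if `ip` reverse-resolves to a trusted crawler hostname
    and that hostname resolves forward to the same IP."""
    # Default to real DNS lookups; callers may inject stand-ins for testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.lower().endswith(suffixes):
            return False
        # Forward-confirm, since anyone can set a misleading reverse DNS record.
        return ip in forward_lookup(host)
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return False
```

A request that claims to be a search engine but fails this check can then be treated like any other suspect traffic. The default forward lookup here handles IPv4 addresses only.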

These days, the bots may be written to automate the activities of real web browsers, making it harder to distinguish bot traffic from people traffic. The bots may be deployed on thousands of computers with IP addresses that may belong to, or be re-deployed to, actual legitimate users – so blocking an IP address is no longer effective.

And, instead of looking at thousands of listings from one computer, the bots can now look at just a few listings from many computers – so old-fashioned “rate limiting” and reviewing how many listings were viewed by one computer no longer help us differentiate between bots and real people.
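To see why a per-IP limit stops helping, consider a small Python sketch with made-up traffic. The IP addresses, the limit of 50 views, and the request counts are all invented for illustration; they are not measurements from any real site.

```python
from collections import Counter


def flagged_ips(requests, max_views_per_ip):
    """Return the set of IPs whose listing-view count exceeds the per-IP limit."""
    counts = Counter(ip for ip, _listing in requests)
    return {ip for ip, n in counts.items() if n > max_views_per_ip}


# Older style of bot: one computer views 5,000 listings.
single_ip_bot = [("198.51.100.7", listing) for listing in range(5000)]

# Newer style of bot: 1,000 computers view 5 listings each (also 5,000 total).
distributed_bot = [
    (f"10.0.{i // 250}.{i % 250}", i * 5 + j)
    for i in range(1000)
    for j in range(5)
]

print(flagged_ips(single_ip_bot, max_views_per_ip=50))    # the one busy IP is caught
print(flagged_ips(distributed_bot, max_views_per_ip=50))  # nothing is caught
```

Both bots take the same 5,000 listings, but the distributed one never puts more than five views on any single address, so it stays under any per-IP limit a real visitor could live with.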

How to protect your site from these bots

There are a variety of companies that specialize in stopping the scrapers’ bots. At one end, there’s Sentor, which is good and used by – but way too expensive for the smaller non-enterprise-level companies that make up our industry. Another is Distil Networks, which protects a number of large-scale platforms as well as MLSs, IDX sites, and brokerages, and is highly accurate and effective.

This morning I was just reviewing data on an IDX vendor protected by Distil Networks. I saw that, over the past two months, over 7 million page requests had been made by malicious bots (differentiated from the good search engine bots) – over a million requests made by one bot alone!

And that’s just one IDX vendor that had taken the step to implement an anti-scraping solution. What is going to happen when that vendor’s protections are ratcheted up for blocking bots? I will tell you – the bad guys will move on to an easier target. Is your site an easy target for web scraping? We’re all in this together, and everybody needs to do their part.


Written By

Matt Cohen has been with Clareity Consulting for over 17 years, consulting for many of the real estate industry’s top Associations, MLSs, franchises, large brokerages and technology companies. Many clients look to Matt for help with system selection and negotiation. Technology providers look to Matt for assistance with product planning, software design, quality assurance, usability, and information security assessments. Matt has spoken at many industry events, has been published as an author in Stefan Swanepoel’s “Trends” report and many other publications, and has been honored by Inman News, being listed as one of the 100 Most Influential Real Estate Leaders.


The Real Daily is honest, up to the minute real estate industry news crafted for industry practitioners - we cut through the pay-to-play news fluff to bring you what's happening behind closed doors, what's meaningful to your practice, and what to expect in the future. We're your competitive advantage. The American Genius, LLC Copyright © 2005-2023