KatsBits Community

General Category => Blog => Topic started by: kat on April 22, 2017, 10:36:26 PM

Title: MarkMonitor, AWS and site scanning abuse
Post by: kat on April 22, 2017, 10:36:26 PM

[image courtesy Amazon]

The last time MarkMonitor was mentioned here on KatsBits was back in 2011 (https://www.katsbits.com/smforum/index.php?topic=293.0) when their aggressive BOT was discovered to be consuming a disproportionate amount of bandwidth scouring the entire server KatsBits ran from. Scrapers, snoopers and other types of BOT that intentionally ignore robots.txt whilst mooching around a website aren't normally a problem because they are often indexing content for custom-built search engine products (the fact they do this is for another conversation). What's special about MarkMonitor's BOT, however, is its offensive (meaning "preemptive", "active") aggressiveness; it simply does not care how much bandwidth is consumed as it moves through a target website like a bull in a china shop, to the extent that bandwidth averages can be significantly different after their BOT has paid a visit. This is especially troubling for image-heavy websites.
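
The bandwidth effect described above can be checked in a server's own access log. The Python sketch below assumes the common Apache/nginx "combined" log format and simply totals response bytes per client IP so that outsized consumers stand out; the function names are illustrative, and logs in another format would need a different pattern.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumes Apache/nginx "combined" log lines such as:
# 203.0.113.5 - - [22/Apr/2017:10:00:00 +0000] "GET /a.jpg HTTP/1.1" 200 5000 "-" "UA"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def bytes_per_client(lines: Iterable[str]) -> Counter:
    """Total response bytes per client IP; a '-' size (no body) counts as 0."""
    totals: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines in some other format
        size = m.group("bytes")
        totals[m.group("ip")] += 0 if size == "-" else int(size)
    return totals


def top_consumers(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int, float]]:
    """Return (ip, bytes, share of all bytes) for the n heaviest clients."""
    totals = bytes_per_client(lines)
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero on empty logs
    return [(ip, b, b / grand_total) for ip, b in totals.most_common(n)]
```

Run against a day's log (for example `top_consumers(open("access.log"))`, with the path assumed), a single client dominating the share column is worth a closer look.
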
Long story short, MarkMonitor are a "global leader in brand protection". Big brands task them to paparazzi their way around the internet looking for brand infringement ("paparazzi" because, like that particular beast, they intentionally ignore common protocols to do what they do). They're not specifically looking for copyright violations so much as broader 'brand' abuse they can take action against.

Back then, MarkMonitor used to serve their brand tracking/investigation BOT from their own IP address, making it relatively straightforward to block its bandwidth abuse. Now, however, MarkMonitor uses Amazon Web Services as a third-party content distribution system to offset their own bandwidth use and, more importantly, to obfuscate their presence in the scanning and network abuse the bot is engaged in. The nefarious nature of this latter point cannot be stressed enough, regardless of how it might be argued (justified).

What this now means for webmasters, versus perhaps five or so years ago, is that abuse logs simply reference IP addresses associated with AWS server instances instead of MarkMonitor's own domain name/IP (e.g., markmonitor.com/209.200.xxx.xxx). In other words, at face value it's slightly more difficult to trace the abuse back to the abuser, a fact that, for them, reduces their liabilities.
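
One way to tell whether an address in the logs belongs to AWS at all is to compare it against the IP ranges Amazon publishes as JSON at https://ip-ranges.amazonaws.com/ip-ranges.json. The minimal Python sketch below assumes that file's `prefixes`/`ipv6_prefixes` layout and filters on the `EC2` service; the function names are hypothetical. A match only says the address is Amazon's, not who is renting it.

```python
import ipaddress
import json
import urllib.request
from typing import List, Optional, Union

AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"
Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_aws_ranges(url: str = AWS_RANGES_URL) -> dict:
    """Download and parse Amazon's published IP range list (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def aws_networks(ranges: dict, service: str = "EC2") -> List[Network]:
    """Collect the IPv4 and IPv6 prefixes tagged with the given service."""
    nets: List[Network] = []
    for p in ranges.get("prefixes", []):
        if p.get("service") == service:
            nets.append(ipaddress.ip_network(p["ip_prefix"]))
    for p in ranges.get("ipv6_prefixes", []):
        if p.get("service") == service:
            nets.append(ipaddress.ip_network(p["ipv6_prefix"]))
    return nets


def aws_match(ip: str, nets: List[Network]) -> Optional[Network]:
    """Return the first listed network containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for net in nets:
        if addr in net:  # always False when IPv4/IPv6 versions differ
            return net
    return None
```

In practice this would be `aws_match(ip, aws_networks(load_aws_ranges()))` for each address flagged in the log. Because AWS addresses can be reassigned, it is worth noting the date of any match.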

What's more, whilst these abuse instances can be reported to Amazon using their EC2/AWS abuse reporting system (https://aws.amazon.com/forms/report-abuse) (or by mailing ec2-abuse@amazon.com directly), there is little assistance for those caught in Sauron's, er, MarkMonitor's glare (their network abuse has been an ongoing problem for KatsBits for the better part of 10 years). Even then, if abuse is found to have occurred, Amazon simply reiterates privacy policy prohibitions that prevent the revelation of pertinent information about the abuser and what they were/are doing. Fortunately they don't need to, as there are plenty of other ways to find this out. But that's by-the-by.
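
Before filing a report it can help to see what an address resolves to. A minimal sketch, assuming "reverse lookup" means a reverse DNS (PTR) query: the function below uses Python's `socket.gethostbyaddr`, then forward-resolves the returned name and only trusts it if the original IP comes back (forward-confirmed reverse DNS), since a PTR record can be set to anything by whoever controls the address's reverse zone. The forward check via `socket.gethostbyname_ex` is IPv4-only, and the resolver parameters are there purely so the function can be tested offline. A WHOIS query on the address is another route, not shown here.

```python
import socket
from typing import Callable, List, Optional, Tuple

# Both socket resolvers return (hostname, aliases, addresses)
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def reverse_lookup(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> Optional[str]:
    """Return the PTR hostname for ip if it forward-confirms, else None."""
    try:
        # PTR query: IP -> hostname
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror: no PTR record or lookup failed
        return None
    try:
        # Forward query: hostname -> IPv4 addresses
        _name, _aliases, forward_addrs = forward(hostname)
    except OSError:
        return None
    # Only trust the name if it points back at the original address
    return hostname if ip in forward_addrs else None
```

Calling `reverse_lookup("203.0.113.9")` (a placeholder address) returns either a confirmed hostname or `None`; a `None` result is itself worth recording alongside the log entries sent to Amazon.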

To get an idea of the extent of the abuse perpetrated by MarkMonitor, below is a list of the most recent instances of AWS abuse traced back to MarkMonitor, a few from a list of hundreds reported to Amazon this month (caveat: the nature of AWS means that whilst the addresses listed below currently resolve to MarkMonitor, they may be dynamically reassigned to another entity at some point in the future; when in doubt, perform a "reverse lookup" to see what's at the end of the rainbow before reporting the suspicious activity to Amazon so a record exists):

Discovering all this is one thing. Knowing what to do with it is another. At the very least, some pointed and pertinent questions need to be asked of MarkMonitor:
- Why do they ignore robots.txt (beyond "bad people can block our bots")?
- Why are they so aggressive in pursuit of protecting managed brands?
- Why do they persist when no evidence of brand infringement is discovered?
- Why do they not have an ABUSE policy in place?
- Why do they obfuscate their scraper/scanner/bot?
- and more...
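
On the first question: honouring robots.txt costs a crawler very little. As a rough illustration (not a description of how MarkMonitor's crawler works), Python's standard-library `urllib.robotparser` can decide which URLs a given user agent may fetch and what crawl delay a site requests. The bot name `ExampleBrandBot`, the robots.txt contents and the helper name are all hypothetical.

```python
from typing import List, Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shuts out one named bot entirely and
# asks every other bot to wait 10 seconds and skip /private/
ROBOTS_TXT = """\
User-agent: ExampleBrandBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def polite_fetch_plan(
    robots_txt: str, user_agent: str, urls: List[str]
) -> Tuple[List[str], Optional[int]]:
    """Return the URLs this user agent may fetch and the requested crawl delay."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse text already in hand, no network needed
    allowed = [u for u in urls if rp.can_fetch(user_agent, u)]
    return allowed, rp.crawl_delay(user_agent)
```

With the file above, `ExampleBrandBot` gets an empty list back, while any other agent may fetch everything outside `/private/`, pausing 10 seconds between requests.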