What is screen scraping, and how does it relate to APIs?

Screen scraping is a common challenge for businesses with a significant online presence, such as financial services and e-commerce firms. It goes by many names, including web data extraction, web scraping, and web harvesting. While screen scraping was once thought of primarily as a front-end web application security challenge, the changing nature of business applications is extending the issue of scraping into the API security domain.

For example, business-to-consumer (B2C) architectures have evolved over time from monolithic web applications to new API-based front-end frameworks that can meet the needs of both web and mobile applications. Meanwhile, growing use of business-to-business (B2B) APIs by industry ecosystem partners is creating even more potential scenarios for scraping to occur.

B2B APIs have different API consumers than B2C APIs, which broadens the universe of potential data scraping scenarios. Some forms of scraping may be legitimate, but more often scraping is used to abuse APIs. Examples may include:

  • Aggregating information, such as product descriptions and product reviews, for use in non-sanctioned ways
  • Collecting pricing information from e-commerce sites to inform competitive pricing strategies and offers, particularly from sites with constantly changing pricing models, such as travel, hotel, and car rental
  • Accessing frequently changing information, such as interest rates from financial sites or betting odds from gambling sites, for competitive reasons

In addition to enabling undesirable forms of data leakage, API scraping can place a heavy resource burden on application infrastructure. And unfortunately, mitigating it is not as simple as implementing rate limits or quotas. Many sophisticated actors are adept at conducting scraping activities in a “low and slow” manner that stays below existing limit and quota levels. This makes scraping difficult to stop without disrupting legitimate API usage.

In addition, because API scraping typically operates within these existing rate limit and quota parameters, most organizations have no visibility into whether it is actually happening.
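To make the “low and slow” problem concrete, here is a rough back-of-the-envelope sketch in Python. All of the figures (catalog size, rate limit, and how much of the limit the scraper uses) are hypothetical and chosen only for illustration; the function name `days_to_scrape` is likewise made up for this example.

```python
def days_to_scrape(total_records: int,
                   limit_per_minute: int,
                   fraction_of_limit: float = 0.5,
                   records_per_request: int = 1) -> float:
    """Estimate the days needed to pull every record while staying under a rate limit.

    Assumes a steady request rate around the clock, which is a simplification.
    """
    requests_per_minute = limit_per_minute * fraction_of_limit
    records_per_day = requests_per_minute * records_per_request * 60 * 24
    return total_records / records_per_day


# Hypothetical scenario: 1,000,000 records, a limit of 60 requests per minute,
# and a scraper that deliberately uses only half of that allowance.
print(round(days_to_scrape(1_000_000, 60), 1))  # prints 23.1
```

Under these assumed numbers, a client that never comes close to the limit could still walk away with the full data set in a little over three weeks, and no rate-limit alert would ever fire.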

How do most organizations protect themselves against API scraping?

Most organizations rely on rate limits and quotas to constrain scraping. While these are not a silver bullet for the reasons described above, they are nonetheless an important first step. At a minimum, they put an upper limit on the volume of scraping that can occur.
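For readers who have not implemented one, a per-client rate limit can be sketched in a few lines. The example below is a minimal sliding-window limiter in Python, assuming an in-memory store and a caller-supplied timestamp; the class name `SlidingWindowLimiter` and the client ID are hypothetical, and a production API gateway would typically provide this capability out of the box.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id: str, now: float) -> bool:
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject without recording the request
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60.0)
decisions = [limiter.allow("partner-a", t) for t in (0, 1, 2, 3, 61)]
print(decisions)  # prints [True, True, True, False, True]
```

Note what the limiter does and does not do: it caps bursts from a single client, but a client that spaces its requests out is allowed through every time.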

Another crucial best practice is to ensure that the clients connecting to APIs are valid. If APIs are generally accessible by mobile devices, for instance, steps should be taken to verify that the mobile client accessing the API hasn’t been tampered with and that the integrity of the mobile device hasn’t been compromised through jailbreaking or similar means.

Some organizations may also use specialized bot mitigation tools to protect their web applications against automated scraping. These solutions provide value for B2C API traffic. But because they require specific browser or mobile application instrumentation, they are ineffective against B2B API scraping, which generally originates from a programmatic client of some kind rather than from a browser or mobile app. Similarly, compromised internet of things (IoT) or internet of everything (IoE) devices can be used to create “swarms” of traffic that does not originate from standard web or mobile application clients.

So, in summary, even if you have rate limits and quotas in place, you will still be left with two major points of exposure:

  1. You remain wide open to low and slow scraping on B2C APIs.
  2. Authenticated B2B API traffic is completely unmonitored.

And these risks are more than theoretical. Earlier this year, a threat actor was able to exploit a Twitter API vulnerability to scrape account details for an estimated 5.4 million users. 


How does Neosec’s approach close these critical protection gaps?

The most important advance that Neosec brings to API security is extending API monitoring and analysis to authenticated traffic. B2B APIs represent a much larger attack surface – and a potential pathway to higher-value corporate assets.

Behavioral analytics at the authenticated user level is the key to monitoring B2B APIs. It’s the only way to tell when a seemingly legitimate, authenticated API consumer that is not using any known attack patterns is scraping your APIs. This requires context that can only come from analyzing the API requests of the same user over a long period of time – even if that user has changed access tokens more than 100 times.
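The idea of following one user across many access tokens can be illustrated with a toy example. The Python sketch below is not Neosec’s implementation; it is a simplified illustration that assumes each API log event already carries a stable user ID alongside the token, and it uses a single hypothetical signal (the number of distinct resources a user has touched) with an arbitrary threshold. The names `UserProfile`, `build_profiles`, and `flag_possible_scrapers` are invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    tokens: set = field(default_factory=set)      # access tokens seen for this user
    resources: set = field(default_factory=set)   # distinct resources requested
    requests: int = 0


def build_profiles(events):
    """Aggregate (user_id, access_token, resource_id) events per user, not per token."""
    profiles = defaultdict(UserProfile)
    for user_id, token, resource_id in events:
        profile = profiles[user_id]
        profile.tokens.add(token)
        profile.resources.add(resource_id)
        profile.requests += 1
    return profiles


def flag_possible_scrapers(profiles, min_distinct_resources):
    """Return users whose long-run resource coverage meets a (hypothetical) threshold."""
    return sorted(user for user, p in profiles.items()
                  if len(p.resources) >= min_distinct_resources)


# partner-a rotates tokens every two requests while walking through ten items;
# partner-b makes the same number of requests but only ever reads two items.
events = [("partner-a", f"tok-{i // 2}", f"item-{i}") for i in range(10)]
events += [("partner-b", "tok-x", f"item-{i % 2}") for i in range(10)]

profiles = build_profiles(events)
print(flag_possible_scrapers(profiles, min_distinct_resources=10))  # prints ['partner-a']
```

Viewed token by token, partner-a never touches more than two items per token and looks unremarkable. Only the per-user view over the whole period shows the enumeration pattern, which is why the analysis has to span token changes.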

Below is a summary of how Neosec’s approach can extend your API protection capabilities beyond traditional bot mitigation techniques.

Comparison of Bot Mitigation and API Data Scraping

| OWASP API Top 10 | Bot Mitigation | Neosec |
|---|---|---|
| What | UI-based API (B2C only) | Any API (B2C, B2B) |
| Where | In the browser | Through the API |
| How | Detects browser or mobile app and human user signals - assumes any human is good | Behavioral profiling of users and IPs |
| Impact on the user experience | High | Low |
| Endurance | Easier to bypass | Robust |
| Response | Immediate | Slower |
| Strengths | Blocking high volume automated scraping on websites | Detects a wide range of abuse and misuse by malicious insiders and attackers masquerading as legitimate users |
| Common scraping use case | Scraping prices on website (for example: Airlines, Playstation 5) | Scraping any API resource by any authenticated user - from resellers, partners, suppliers to customers |

 

 
