zvelo BLOG

Articles, news, advice, research and open discussions from zveloLABS® – a team of software engineers, web analysts and other professionals dedicated to the development and enhancement of zvelo’s content and context categorization databases and technologies, and malicious website detection capabilities.

An experiment in buying Internet traffic – video

zvelo Chief Security Scientist, Mark Ryan Talabis, performed an experiment in buying Internet traffic. He wanted to see if he could buy traffic, and whether that traffic would be genuinely human-generated, bot-driven or otherwise low quality. He set up a honeypot site (essentially a traffic trap), bought inbound traffic from 10 different providers and examined what arrived. What he learned may be surprising – it is definitely depressing if you are a publisher buying traffic to supplement inventory, or if you are the advertiser paying for the traffic on a CPM basis. Watch the 5 minute video below:

‘Ware the Web-the Lesson from the Yahoo Malvertising Attack

The ad tech industry has been reeling for the past 12 months over ad fraud and the industry is starting to come to terms with it. But unfortunately, ad fraud is just the tip (albeit, a very costly tip) of the malicious web. The recent malvertising attack on the Yahoo Network is a painful example and reports suggest malvertising will continue to grow. According to Malwarebytes, the security company that discovered the malicious creative, the attack actually started on July 28th, but was not discovered and taken down until Monday, August 3rd – or a full six days and potentially billions of impressions later. Malvertising (malicious advertising) typically exploits a vulnerability in a software program and recent variants do not require any user interaction to infect the user’s machine. Infections can include browser redirects to malicious sites, ransomware, bot creation, Trojans and more. Malicious software affects users, advertisers and publishers.

For as long as the web has existed, there has been malware (malicious software), although the primary early vector was email. Today most computer worms and viruses are designed to cause harm by allowing hackers to monetize stolen sensitive information or perpetrate fraud. They change or direct user Internet behavior, steal sensitive data, or control critical infrastructure. They cause very real monetary damage.

Pre-emptive ad fraud detection = protection

On June 26th, Kathy Leake, CEO of Qualia, wrote a very thoughtful column entitled “We Need Clearer Fraud Definitions, More Standard Measurement.” In it, she called for fraud definitions to be agreed upon between buyers and sellers and for pre-emptive fraud protection. We couldn’t agree more!

On the fraud definitions, there are very clear cases of ad fraud – NHT (non-human traffic), bots (which fall into NHT), ad stacking, hidden ads, etc. – but others are less clear cut. If a publisher runs 10 ads on a page, is that fraud? How about 7, or 5? What about running an ad at the very bottom of the page or on a page with very little content? What about auto play video? How should we view ads (overlays) that are served smack in the middle of the content so the user has to close them in order to reach the content? As a user, I hate those ads – but are they fraud or just ill-advised? What if you can’t close the overlay without inadvertently clicking? Guidance and agreement on what constitutes ad fraud vs. sketchy publisher practices is warranted.

Fighting ad fraud before the ad is served would be far more efficient and effective. Currently, most ad fraud identification is done at the impression level, so by definition it is performed after the ad is served. However, if that intel is fed back into a system that also maintains a dynamically updated blacklist of known fraudulent and malicious sites, and that blacklist is then used in pre-bid decisioning, the first impression becomes the sacrificial lamb and subsequent impressions are protected against further fraud. This is how the anti-virus industry works – the first machine that is infected informs the AV engine and a signature is then written, protecting the rest of the machines against further infections. The beautiful part of such a system is that every customer of the AV company benefits from that single machine as the signature is rolled out to everyone. The wider a company’s footprint, the more protection it can provide because it has greater visibility. zvelo has visibility into 99% of the web.
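
The feedback loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not zvelo's implementation: the `PreBidFilter` class, its method names and the example URLs are all hypothetical, and a production system would weigh many more signals than a simple hostname blacklist.

```python
from urllib.parse import urlsplit


class PreBidFilter:
    """Hypothetical pre-bid filter backed by a dynamically updated blacklist."""

    def __init__(self):
        # Hostnames flagged by post-impression fraud detection.
        self.blacklist = set()

    @staticmethod
    def _host(url):
        # Normalize to a lowercase hostname so lookups are consistent.
        return (urlsplit(url).hostname or "").lower()

    def report_fraud(self, url):
        """Feed post-impression fraud intel back into the blacklist."""
        host = self._host(url)
        if host:
            self.blacklist.add(host)

    def should_bid(self, url):
        """Pre-bid decision: skip inventory on hosts already flagged."""
        return self._host(url) not in self.blacklist


bid_filter = PreBidFilter()
first = bid_filter.should_bid("http://bad-traffic.example/page")    # True: nothing known yet
bid_filter.report_fraud("http://bad-traffic.example/page")          # the "sacrificial" impression
later = bid_filter.should_bid("http://bad-traffic.example/other")   # False: host is now blocked
```

The first impression is still served and paid for, but every later bid request for the same host is declined, which mirrors the anti-virus signature model in the paragraph above.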

The Magnitude of the Breach

On Thursday, June 4, a large breach of the federal government’s Office of Personnel Management (OPM) was announced. About 4 million people’s records were lost in the attack. China has been fingered in the breach, and the Chinese government denied being responsible, as it does routinely. But think about the data that was stolen – PII (personally identifiable information). That PII included JOB HISTORY – of federal employees. Deep job history greatly increases a hacker’s ability to spear phish (send highly targeted email attacks to gain access to sensitive data) successfully. If they are successful with someone high up in the DOD, what could they gain access to? What about the White House? State Department? Department of Homeland Security? Nuclear Regulatory Commission? The target smacks of a state sponsored attack, as do the methods, servers and hacker habits.

Attacks targeting federal workers are not new – federal contractor KeyPoint had 40k federal employee records breached in December, 2014 and rival contractor United States Investigative Services lost its investigations business with the government following a cyberattack earlier in 2014. That breach tainted the files of at least 25,000 Homeland Security workers. China was fingered in those attacks as well.

Read more about the breach here.

Ad Age looks inside Google’s war on ad fraud

Ad Age had an opportunity to look under the hood of Google’s war on ad fraud and the article is an eye opener for those who have never really looked into the dark places on the web. Although long, the article is well worth reading to the end. Ad Age – Inside Google’s secret war on fraud

To summarize – Google bought a young company called Spider.io in 2014. The company had 7 employees upon acquisition, but has bulked up to 100 under Google, with massive computing power added to the technology they brought to the advertising giant. As the article points out, as the leading provider of ad impressions, Google has the most to lose from identifying ad fraud, but is also in the best position to lead the fight against it. Google has decided that it cannot lead by keeping quiet, so they are beginning to open their doors by sharing some methodologies and intel with Ad Age.

Adventures in Buying Internet Traffic

zvelo’s Chief Security Scientist, Mark Ryan Talabis, put together an experiment in buying low cost Internet traffic to replicate ad fraud. First he set up a honeypot site; then he shopped some blackhat traffic vendors and tested 10 providers in his experiment. He found that the lowest cost traffic was driven mostly by bots; the slightly higher priced traffic was driven by pop-unders, pop-ups and frames. He concluded that you could get away with buying internet traffic if your advertiser doesn’t audit the traffic and doesn’t know what to look for. Watch the presentation yourself – it is only 5 minutes long and was given to the Boulder Denver AdTech Meetup on April 22nd. Watch the video here!

Too Damn Rich

I recently read an article that predicts global ad tech revenue will rise from a forecasted humble $30 billion for 2015, to $100 billion in 2020. If ad tech were a country, it would have a larger economy than 120 countries. That is a great deal of money.

That is so much money, in fact, that I believe there is NO WAY the practitioners of ad fraud are going to walk away from the market.

Mobile Advertising

As a former media buyer, I was always frustrated with the lack of tracking on iOS in mobile inventory. However, it was possible to track performance on Android and as Android’s market share grew, so did the conversions attributed to mobile display. Mobile apps, however, were another story altogether – as a performance shop, I could not justify buying app inventory. A couple of years have passed since then, and mobile display has exploded. Interesting cross-device targeting & attribution companies (InMobi, AppsFlyer, MediaLets & more) have sprung up to close the loop on mobile at the same time that the issue of ad viewability has come to a head. At a recent member meeting of the IAB, the discussion revolved around mobile ad viewability and the huge challenges that the Media Rating Council (MRC) faces in trying to set standards, especially with app inventory. “This is a tech problem,” said the MRC, “and it will be solved with tech.” For the company or companies that provide that technology, the future is bright.

zvelo’s URL Checker and its Contextual Data Sets Explained

The URL checker found on the zvelo.com homepage, previously known as the “Test-a-site” tool, serves to demo various contextual data sets about URLs that can be derived by licensing zvelo’s contextual categorization and malicious website detection services. When queried, the URL checker yields a sample of data sets stored within zvelo’s URL database, via an intuitive GUI, that technology vendors within the ad tech, data analytics, network security, web filtering and other markets can integrate into their own offerings via a cloud API or a local SDK.

How Ad Networks are Being Used for Scamvertising

The Internet age has shown us a myriad of online scams, from get rich quick schemes to winning the lottery, typically originating via an email hook. This is a blind way of distributing scams, since scammers have no way of knowing if the scam is relevant to the person they are trying to lure. For example, a scammer might send a Lotto scam in a country where there is no Lotto. This has changed; scammers now have access to online advertising networks, which have proven to be very powerful targeting tools because of the vast amount of demographic and web usage behavioral information they possess about real people.

Crowdsourced Security for Web Threat Intelligence

If we have a thousand monkeys typing away on a thousand typewriters, surely they can produce great works of literature – or so goes the popular adaptation of the Infinite Monkey Theorem. But in the context of information security, a similar idea has been taking shape in the past few years. Crowdsourced security, which leverages input from a host of geographically dispersed systems, is slowly gaining ground as a means to provide actionable threat intelligence for both the public and private sectors.

Online Advertising Gone Bad – French Child Abuse Trial

Advertisements are everywhere, from print publications to road-side billboards, and of course TV and on the Web. The intent of advertising is no different regardless of the medium. Advertisers are constantly feuding to win over consumer sentiment. On the Internet, ad-serving technologies have become so advanced that ads can now be targeted based on one’s individual web browsing history and behaviors, likes, shares, location, device type and other factors. From time to time, however, ad placements land severely out-of-context, and here is one such example of online advertising gone bad.

zvelo Joins IAB

Ad fraud continues to plague the online advertising industry and advertiser trust in automated ad-serving technologies continues to dwindle. It’s not just traditional display advertising that’s susceptible. Digital video and mobile advertising are seeing their fair share of bot (non-human) generated impressions and clicks as well. zvelo has recently become an Associate Member of the Interactive Advertising Bureau (IAB) to help mold industry best practices to combat ad fraud.

Giving Up Our Privacy and Liking It

We are constantly reminded of the growing number of privacy concerns from the use of Information and Communications Technology (ICT). Some are quick to blame governments or commercial entities when our personal information is compromised. Very few stop to think whether or not the blame should be pointed at ourselves. To what extent are we as end-users responsible for facilitating our own personal privacy?

To WWW or Not to WWW?

If one performs the search “use www or not,” many of the most popular search engines return well over a billion results. The focus of each result may differ. For zvelo, the usage is irrelevant because its contextual categorization processes are designed to identify and handle each component of a URL. At a simple level, the basic components of a URL are the following:

Better Ad Targeting with Increased Category Granularity

zvelo once offered 53 categories that were used to classify content on websites about Businesses & Services, Politics & Law, Portal Sites and others. This was later raised to 141 categories to help cover even more topics. The latest version boasts nearly 500 categories, making it one of the most granular categorization sets in the industry. We’ve upgraded our categorization systems to better serve the needs of our existing and future technology partners, and the following is one example of why this matters.

Forget the Gates, the Huns Are Inside: Thoughts on Secure Programming, Education and BYOD

Recent events serve as the best example of how the context of security has shifted from the once server-centric model to that of a decentralized threat landscape. From the Heartbleed attacks to the widespread Internet Explorer vulnerabilities and finally the sensationalized OAuth issues, it appears that even organizations with a hardened perimeter infrastructure are just as vulnerable as an end-user at home. Although threats geared towards enterprise infrastructure are by no means going away, the prevalence of vulnerabilities affecting end-users is alarming to say the least.

Full Path URL Content Classification

Given the dynamic nature of the majority of today’s websites, categorization at the full path URL versus the base domain is superior and now required. Parts of a website include the top-level domain (.com, .org, etc.), the base domain (example.com), sub-domain (subdomain.example.com) or sub-path (example.com/page). When categorizing content, it is highly important to recognize exactly what is being classified within a website because content can differ dramatically across full path URLs.
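
The parts listed above can be pulled out of a URL with Python's standard library. The sketch below is illustrative only: the `url_parts` helper is hypothetical, and it naively assumes the base domain is the last two labels of the hostname, which is wrong for suffixes such as .co.uk (a real classifier would consult a public suffix list).

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the parts named above (naive, for illustration)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".") if host else []
    return {
        "top_level_domain": labels[-1] if labels else "",
        # Naive assumption: base domain = last two labels of the hostname.
        "base_domain": ".".join(labels[-2:]),
        # Only report a sub-domain when the host has more than two labels.
        "sub_domain": host if len(labels) > 2 else "",
        "path": parts.path,
    }


info = url_parts("http://subdomain.example.com/page")
# info["base_domain"] is "example.com"; info["path"] is "/page"
```

Two URLs that share a base domain can still differ in sub-domain and path, which is why classification at the full path can give a different answer than classification of the base domain alone.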

What is a URL Parameter?

What is a URL parameter? Quite simply, it is a string of characters, or a query string, that is appended to a URL and contains data. This data is passed to predefined web applications to find the appropriate content and return it to the user’s web browser, which then generates the entire web page. The query string can also be used for other purposes, such as identifying a user’s session or looking up information about your online bank account after you have logged in. URLs with parameters are used by various types of web sites; however, online shopping, auction, and banking sites are probably the most prevalent.
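
As a small illustration, Python's standard `urllib.parse` module can separate a query string into its parameters. The shop URL and parameter names below are made up for the example.

```python
from urllib.parse import urlsplit, parse_qs


def query_params(url):
    """Return the URL's query string as a dict of parameter name -> list of values."""
    return parse_qs(urlsplit(url).query)


params = query_params("https://shop.example.com/search?category=shoes&size=9&size=10")
# params == {'category': ['shoes'], 'size': ['9', '10']}
```

Each parameter maps to a list because the same name may appear more than once in a query string, as `size` does here.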

How to Use Keyword Density to Classify a Web Page

Manually classifying the content on a single web page takes but a few seconds to accomplish. Analyzing the keywords – words or phrases – used and the number of instances of each – keyword density – is one way to go about it. When needing to classify the content on billions of web pages at a time, however, the task becomes overwhelmingly daunting for any human eye to handle. In this scenario, only an automated content classification engine can succeed.
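
To make the idea concrete, here is a minimal sketch of a keyword density calculation in Python. It is not zvelo's classification engine: the `keyword_density` function is hypothetical, it counts single words only (no phrases), and it does not filter out common stop words, which a real system would likely need to do.

```python
import re
from collections import Counter


def keyword_density(text, top_n=5):
    """Return the top_n words with their share of all words on the page."""
    # Lowercase the text and keep runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    if total == 0:
        return []
    counts = Counter(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]


top = keyword_density("ad fraud hurts ad buyers", top_n=1)
# top == [('ad', 0.4)]  ("ad" is 2 of the 5 words)
```

A classifier could then compare the highest-density keywords against keyword lists for each category, although the choice of lists and thresholds is a design decision beyond this sketch.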

Water Hole Attacks – Drinking from the Poisoned Well

Instances of large-scale compromises of both private industry and public institutions in 2013 prompted a flurry of activity among security researchers to identify emerging and established threats. Commonly identified as Advanced Persistent Threats (APTs), this phenomenon is expected to continue well into the foreseeable future. Fundamental to the spread of these threats is one of their foremost methods of propagation – a water hole attack.

Content categorization QA with Web Analysts

How does zvelo provide the most accurate content categorization service and the best URL database available? The approach is two-fold and while a substantial chunk of the workload is handled by zvelo’s line-up of machine learning and artificial intelligence-based categorization processes and systems, the quality assurance and other daily efforts put forth by its human Web Analysts can never be discounted.

Internet Safety Resources, Guides and Tips for Everyone

This article will be updated periodically, in support of numerous global online safety awareness campaigns occurring every year – Safer Internet Day (promoted in February), Cyber Security Awareness Month (October), IWF Awareness Day (also in October) and others. During these times, web safety advocates, companies, organizations and professionals worldwide raise awareness about safer and more responsible use of online technologies and mobile devices. Following is a living repository of online resources, guides, tips and entities aimed at helping everyone enjoy worry-free Internet experiences. Additional web safety resources will be hand-picked and added as they are discovered. To possibly be included in this list, or if other online safety resources exist that deserve mention, please feel free to comment below. Including a link and a brief description with each comment helps.

Anatomy of a Dynamic Website

Static HTML websites are becoming increasingly rare, and nowadays sites pack quite the punch. We’ve grown accustomed to photo and video slideshows, widgets, feeds, social network integrations, and other dynamic elements. Websites come overloaded with media, are more interactive, and the content can vary dramatically from page to page and can differ even more between end-users or browsing sessions. Much of the content is pulled in dynamically from external sources, and most of us fuel the Internet’s growth by creating and uploading content of our own every day, at extremely high rates. Making sense of it all can be quite the challenge for technology vendors “needing to know,” and the following are insights into zvelo’s content categorization approach.

Ad Fraud and the Bots Gaming the Online Advertising Ecosystem

Reports are plentiful of non-human bots gaming the online advertising industry by delivering fraudulent impressions and click traffic, and the Interactive Advertising Bureau (IAB) took note. The IAB released the “Traffic Fraud: Best Practices for Reducing Risk to Exposure” on December 5, 2013, to help online media buyers, publishers and ad networks mitigate the dilemma.

Social Media Experiment Raises Online Privacy ‘brow

People don’t seem to worry much about privacy when “checking in” to a favorite local restaurant or coffee shop, or when making other social media posts that reveal their location. What if you were approached by a complete stranger who knew your name and other personally identifiable information within minutes of making an upload? A few socialites got quite the shock after a social media experiment revealed how much personal information can be extracted from publicly viewable status updates.

Challenges of Blocking Web Porn from the Brits

In mid-2013, British Prime Minister, David Cameron, began a push to block pornographic material on the Web in UK households. Under the new legislation, porn would be filtered by default and citizens would have to opt-in to view such adult content. Enforcement of such an ambitious initiative comes with many content categorization and technical challenges, not just in the UK, but within any internet service provider infrastructure.

Adult, Extremist, Drug Content Easily Accessible by Kids at U.S. Wi-Fi Hotspots

Wi-Fi hotspots commonly found in many American coffee shops, restaurants and other popular after-school hangouts are providing kids with what they demand – free Internet access. This may help keep them connected with family or friends, in addition to sparing parents from costly data plan overages, but an independent Adaptive Mobile study showed that the complimentary Web access comes with a twist. The adult, dating, extremist, drug, gambling and other similarly objectionable content typically blocked at home by some type of parental controls solution is easily accessible by kids at these Wi-Fi locations.


ROOTCON 2013 Conference and Security Gathering Highlights

Once again, zveloLABS participated in the 2013 ROOTCON annual hacker conference and security gathering in Cebu City, Philippines. It aims to share best practices and technologies through talks by qualified speakers and demos of exciting hacks, tools, tips, and more. The event was attended by groups and individuals who share similar interests in information security. Following is a summary of a few of the topics presented.

Ad Blocking Software Adoption Makes In-context Ad Placements More Vital

Ad blocking has gained wide consumer acceptance over the past couple of years and a PageFair report suggests it could be costing web-based businesses hundreds of thousands of dollars in lost advertising revenue. In some instances, ad blocking hurt websites so badly that they are no longer online. With the use of ad blocking software on the rise, the ad-tech market has a pressing need to make the most of the ad placements that do make the cut. In other words, it’s more important than ever for ad units to be in-context with content on web pages, no matter how deep within a website the placements land.

Black Hat USA 2013 Highlight – Source Code Analysis and Integer Overflow

I attended one of the Black Hat training sessions titled “Advanced C++ Source Code Analysis.” It was quite fascinating! Looking through source code for bugs seems to be a different mindset from writing software. While reading the buggy code I often found myself thinking, “Yes, that should work,” and then realized that what looked fine was actually horribly dangerous.

DEF CON 2013 Highlights – Home Invasion, RFID Hacking

The annual DEF CON® hacker conference came and went as swiftly as a light rain against the hot Las Vegas strip. Consumer tech was a big focus and speakers demonstrated how various network-connected gadgets, once hacked, could be controlled to affect the real, physical world. Here are some highlights from two particular lectures about the hacking of network-connected and radio-frequency identification (RFID) enabled devices that got much attention.