As more and more companies are pursuing “Data as a service” or “DaaS” business models, we wanted to share our experiences on the DaaS business model, with a specific focus on the challenges of Intellectual Property Rights (IPR) protections for DaaS data.
This has turned out to be one of the most significant business and technical challenges for the DaaS business model. We are openly sharing what we are doing to address the problem, in the hope that other DaaS-based businesses may have similar experiences and we can start a dialogue that helps move the industry towards a solution.
What is Data/Data as a Service?
First things first, what is “data”? And what is “Data as a Service”?
There are several definitions of data, most of which refer to information or knowledge that is represented or stored in a coded or digital format suitable for retrieval, for access by applications, or for visualization in images, graphs, or other tools. There are various types of data, each with its own intrinsic value.
Some data is publicly available data that is collected, compiled, cleaned and updated. Its value is derived from the time and effort involved in the collection and compilation activities, which a subscriber to this type of data is spared.
Other data is generated from algorithms, machine learning, data mining and other activities which apply a particular type of engineering or science to create datasets that are proprietary in nature. The value of this type of data comes from the investments made in perfecting the algorithms, data mining or other processes that a subscriber does not have the wherewithal to create.
In yet other instances, the data is a combination of the “collected/compiled” data and the “generated” data. In each scenario, the data has intrinsic value to subscribers due to the cost savings, time savings, uniqueness of the data or some other attribute which the subscriber does not possess or for which the subscriber did not elect to allocate the resources themselves.
“Data as a Service” refers to a business model whereby data is typically licensed on a subscription basis (although other pricing models exist, e.g. “Pay As You Go”), with access to the data provided through a cloud-based API for hosted DaaS or delivered via some type of file transfer for applications which require access to the data on a local basis. In some cases, the data is static, while in other cases, the data provided through a DaaS is continually changing, evolving and being updated.
DaaS Business Model
Organizations that pursue a DaaS business model do so because they believe they can invest in the engineering, data science, AI/ML, computing, creation and maintenance of training corpora and the thousands of other activities that go into generating data of value to others. Further, they believe that they can monetize this data to a level that exceeds the initial and ongoing costs of generating and delivering the data, thereby producing a profit. In other words, good old-fashioned capitalism.
The DaaS business model is subscription-based, where a customer pays a subscription for access to the data. When the subscription ends, the access to the data stops and any data previously accessed or downloaded needs to be deleted or removed.
In a world where everyone obeyed the law and honored the agreements they signed, the business model would be simple and straightforward. Unfortunately, we live in the real world, where not everyone obeys the law or lives up to the agreements they sign.
The DaaS IPR Challenge
Why is this a problem? Well, to make the data usable, it needs to be accessible, meaning it can’t be encrypted or obfuscated. Those types of preventative measures make the data unusable.
However, by making the data usable and accessible, in many cases that means providing the data “in the clear”, which makes it incredibly vulnerable to theft, piracy and copying. This is one of the inherent challenges of the Intellectual Property Rights (“IPR”) protection of DaaS data.
Whether the data is intentionally stolen or the user is somehow blissfully ignorant of the license status of the data they are using, the result is the DaaS business model’s economics are undermined, leaving a far smaller market from which to monetize the investment in generating the data and damaging the perceived value of said data.
Some DaaS businesses consciously decide to limit the amount of data they provide to subscribers, or restrict the types of customers or markets to which they will license their data. These restrictions may help protect the data IPR, but they also obviously limit the market opportunities. The DaaS business has to decide (in advance) how much data to provide, to whom, and what risk the prospective subscriber poses with regard to data theft and piracy.
Protecting DaaS IPR
To date, companies have used two approaches in their efforts to protect the DaaS IPR.
1. License Agreement Protection
Nearly all DaaS businesses use some form of a License Agreement for IPR protection that invariably includes:
- No copying, reverse engineering, and so forth of the data
- When the subscription ends, the access to the data ends
- When the subscription ends, all data shall be deleted or removed
At the end of the day, companies relying on License Agreements are depending on the honesty and trust of the customer to do the right thing.
Unfortunately, not all customers read, understand, or live up to the agreements they sign. The data sitting on a server, PC, or phone is already “there”, and it’s just so tempting and appealing to simply keep the data. Who would know? Who is it hurting? I already paid for it, why shouldn’t I just keep using it? There are many other rationalizations, none of which serve the interests of the DaaS provider.
We also recognize that massive license enforcement initiatives, like those Microsoft pursued in the early 2000s, are not practical or viable for many DaaS businesses.
2. Technical or Operational Protection
Many companies, including zvelo, also implement some form of technical or operational IPR protection. The objective of these protections is to create technical hurdles to the theft and copying of the DaaS data, or at a minimum, to provide a way to track whether data has been stolen and is being used illegally.
The drawback of technical protection is that the more data protection is implemented, the less data is available and accessible, and the smaller the revenue and market opportunity. Care must also be taken to ensure these protections do not hinder performance or deployment flexibility.
DNA Markers and other Methods of Data “Fingerprinting”
As one of our objectives is to help move the industry towards a solution to the IPR piracy issue, we want to share some of the technical approaches we have pursued. These are not ground-breaking new technologies like the Digital Rights Management (DRM) developed for the recording industry, an enormous endeavor that has arguably done little to prevent media piracy. Rather, we have taken more of a blue-collar approach: adding specific records and markers to our DaaS data to provide a “fingerprint” as to the origin or source of the data. Whether taken individually or en masse, these methods provide a unique and attributable signal or trace as to the DNA of the data.
Even with these measures, we aren’t able to prevent outright copying. After all, if we possessed the technology to prevent our proprietary raw (“in-the-clear”) data from being copied once it was on another computer, that would be a “killer app” which we would have sold, and we would be sitting on a beach sipping drinks with little umbrellas by now. Instead, we are primarily focused on identifying when our data has somehow ended up in a database, on a server, or in an application where there is no valid license.
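To make the idea of marker records concrete, here is a minimal sketch in Python. It is not zvelo's actual method, which the article does not describe in detail. It assumes a dataset that maps URLs to categories, a vendor-held secret key, and one set of markers per subscriber; the function names, the category list and the `example.com` hostnames are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical category labels, used only for this illustration.
CATEGORIES = ["news", "shopping", "sports", "travel"]


def make_markers(secret: bytes, subscriber_id: str, count: int = 3) -> list[tuple[str, str]]:
    """Derive reproducible (url, category) marker records for one subscriber.

    Because the markers come from an HMAC over the subscriber id, the vendor
    can regenerate them later without storing them, and nobody without the
    secret can predict them.
    """
    markers = []
    for i in range(count):
        message = f"{subscriber_id}:{i}".encode()
        digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
        url = f"{digest[:12]}.example.com"  # made-up hostname under a reserved domain
        category = CATEGORIES[int(digest[12:14], 16) % len(CATEGORIES)]
        markers.append((url, category))
    return markers


def fingerprint(dataset: dict[str, str], markers: list[tuple[str, str]]) -> dict[str, str]:
    """Return a copy of the dataset with the marker records mixed in."""
    marked = dict(dataset)  # leave the caller's dataset untouched
    marked.update(markers)
    return marked


if __name__ == "__main__":
    data = {"weather.example.org": "news", "store.example.org": "shopping"}
    markers = make_markers(b"vendor-secret", "subscriber-42")
    marked = fingerprint(data, markers)
    print(len(marked) - len(data), "marker records added")
```

Deriving the markers from a keyed hash is one design choice among many. A vendor could just as well keep a plain table of hand-picked marker records per customer.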
Data Fingerprinting Methods in Practice
As noted above, with an honest person or company, all you need is the License Agreement to have adequate IPR protection for the DaaS data. With a dishonest organization, these DNA Fingerprinting methods (unfortunately) come into play, allowing us to identify whether the zvelo DaaS data is indeed being used without a proper license. Again, this isn’t going to prevent data piracy; it’s simply a way to determine whether pirated data is present in an unlicensed system or company.
It should be noted that this is not always easy to ascertain. A dishonest organization that had no problem stealing the data in the first place will likely put up every defense to prevent the inspection of “their” data to verify the presence of said markers.
In these instances, it may require legal involvement, a forensic auditor, or even a court order to obtain permission to inspect the data and verify the Data “Fingerprints”. The steps a DaaS vendor takes, and how much they are willing to invest in any of these avenues, are obviously entirely their prerogative. Decision factors include cost, business relationship, venue, lost revenue potential, and many more. In short, is it worth it to you to pursue the matter?
A Case Study: zvelo’s Experience with Data Fingerprinting
Unfortunately, zvelo has had reason to actually test the practicality of Data Fingerprinting. The DNA markers used by zvelo are specific to the URL categorization and malicious detection services we offer. With access to the database, zvelo is able to quickly determine if the DNA markers and Data Fingerprints are present. This allows us to verify if the zvelo data has been removed when a relationship or subscription ends, or if the zvelo data is present in an organization which has never licensed the zvelo data in the first place. If DNA markers are detected, we can then determine what action to take.
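The verification step can be sketched as follows. This is a hedged illustration, not zvelo's tooling. It assumes the markers are known (URL, category) pairs and that the suspect database can be queried one URL at a time, for example through a public lookup page. The `lookup` callable, the function names and the 0.5 threshold are hypothetical.

```python
from typing import Callable, Iterable, Optional

Marker = tuple[str, str]  # (url, expected category)


def marker_hits(markers: Iterable[Marker],
                lookup: Callable[[str], Optional[str]]) -> list[Marker]:
    """Return the markers whose URL resolves to the expected category."""
    return [(url, category) for url, category in markers
            if lookup(url) == category]


def looks_like_our_data(markers: Iterable[Marker],
                        lookup: Callable[[str], Optional[str]],
                        threshold: float = 0.5) -> bool:
    """Flag a database when at least `threshold` of the markers are present.

    The threshold is an arbitrary illustrative value. A real decision would
    weigh how likely each marker is to appear independently.
    """
    markers = list(markers)
    if not markers:
        return False
    return len(marker_hits(markers, lookup)) / len(markers) >= threshold


if __name__ == "__main__":
    markers = [("m1.example.com", "news"),
               ("m2.example.com", "travel"),
               ("m3.example.com", "sports")]
    # Stand-in for a suspect database; dict.get plays the role of the lookup.
    suspect_db = {"m1.example.com": "news",
                  "m2.example.com": "travel",
                  "other.example.org": "shopping"}
    print(looks_like_our_data(markers, suspect_db.get))  # prints True (2 of 3 markers found)
```

A positive result here is only a signal that warrants a closer look. It does not by itself establish how the data got there.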
For purposes of this article, we will use two examples where it was simple to verify the presence of the DNA markers, as both companies illustrated below conveniently have public-facing websites which allow URLs in their databases to be looked up.
Cyren
Cyren is a network security company that had licensed the zvelo DaaS data for integration with their web filtering services. While the relationship ended in December 2015, the zvelo data has not been removed, as evidenced by the presence of the zvelo “Data Fingerprints” (in the form of the URL and categories) in the Cyren database after the license ended (see the screenshot below from the Cyren website).
Checkpoint
Checkpoint is a network security company with whom zvelo has had no previous business or license relationship. It’s not clear how they obtained the zvelo data, but they have it nonetheless.
Again, this URL and associated categories act as a DNA Marker in the zvelo DaaS data—the URLs and the categorizations are unique to the zvelo database and taxonomy.
Summary
zvelo’s objective with this article is to help start a discussion about best practices for protecting DaaS data IPR and preventing data theft, and to identify a potential solution, or steps toward progress, that can be generally adopted by the industry. The solution would ideally be a collaborative effort: perhaps a new technological approach that prevents copying or theft of data, perhaps public information sharing about those companies that are abusing DaaS data licenses, or some entirely novel approach to the problem. If you are interested in being part of the discussion, please contact us at [email protected].