Content Categorization Using Distributed AI & Machine Learning

Content Categorization Using Distributed Artificial Intelligence – Q&A with zvelo’s Chief AI Scientist

The technology company known for categorizing the web, zvelo, has found many applications for artificial intelligence and machine learning. In this brief question and answer session with its Artificial Intelligence Chief Scientist, Dr. Ignacio Giraldez, discover how zvelo uses artificial intelligence, distributed artificial intelligence, and machine learning for its content categorization.

Q: What is AI?

A: Artificial intelligence (AI) uses machine learning (ML) to learn from experience and knowledge-based system engineering (KBSE) to solve problems effectively. By using AI, we can replace human beings in tasks that require logical intelligence.

Logical intelligence is defined as the ability to learn from experience and to use that knowledge to effectively solve problems. AI is substituted for humans in tasks that it can perform quickly and with a high degree of accuracy. Certain use cases demand both high speed and high accuracy, such as content categorization with thousands of requests per second.

Q: What is DAI?

A: Distributed Artificial Intelligence (DAI) is a subfield of AI that studies distributed computational systems formed by multiple individual software agents which have the capability of learning and using knowledge. These agents work as a team or multi-agent system, performing tasks that involve machine learning and decision-making functions needed in the case of content categorization.

Q: Why do we need DAI?

A: For very complex problems, distributed artificial intelligence delivers even faster and better results than AI. Since DAI is not as restricted by a “limited rationality” as a standalone monolithic AI system, it can solve much more complex problems than intelligent monolithic systems.

Q: What tasks associated with content categorization are solved with DAI?

A: We are able to categorize exponentially increasing amounts of content (web, unstructured text, social media, apps, SMS…) using a highly granular set of hundreds of categories. By using DAI, we can offer improved contextual targeting for online advertisers, filtering of inappropriate content based on the audience, enrichment of customer profiles for propensity models, and much more.

Q: How does zvelo leverage DAI for content categorization?

A: After verifying that monolithic AI solutions could not satisfy zvelo’s content categorization quality standards, we decided to design our own specialized multiagent system for categorization (MASCAT). In our zvelo DAI solution, hundreds of intelligent software agents work as a team to produce highly accurate content categorizations.
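
The idea of many specialized agents contributing to one categorization can be illustrated with a toy sketch. This is not MASCAT, whose design is not described here. The `CategoryAgent` class, its keyword scoring rule, the `categorize` coordinator, and the 0.5 threshold are all hypothetical names and choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CategoryAgent:
    """Hypothetical agent responsible for a single category."""
    category: str
    keywords: List[str]

    def score(self, text: str) -> float:
        # Toy rule (an assumption): fraction of this agent's keywords found in the text.
        words = set(text.lower().split())
        hits = sum(1 for kw in self.keywords if kw in words)
        return hits / len(self.keywords)


def categorize(text: str, agents: List[CategoryAgent], threshold: float = 0.5) -> List[str]:
    """Ask every agent for a score and keep the categories at or above the threshold."""
    scores = {agent.category: agent.score(text) for agent in agents}
    return sorted(c for c, s in scores.items() if s >= threshold)


# Illustrative agents; a real system would use learned models, not keyword lists.
AGENTS = [
    CategoryAgent("sports", ["football", "score", "league"]),
    CategoryAgent("finance", ["stock", "market", "bank"]),
]

result = categorize("The league announced the football score", AGENTS)
```

In this sketch an empty list means that no agent was confident enough to propose its category, so the coordinator outputs no categorization.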

Q: Was machine learning used with content categorization challenges?

A: At first, zvelo worked with classical machine learning algorithms, but our attempts were unsuccessful. We learned a lot from these failures by analyzing why and how the ML algorithms had failed. As a result, we decided to develop our own proprietary machine learning algorithm, explicitly intended for intensely complex tasks. With this new ML algorithm, zvelo is able to solve content categorization challenges with sufficient quality to meet even the most critical standards.

Q: What other unique approaches did zvelo take?

A: Our data quality department periodically generates extensive training and test datasets from the active web and social media traffic. By processing these training and test datasets with our machine learning algorithms, we ensure that our distributed categorization “logic” is up to date and responsive to current trends. Miscategorizations reported by customers are verified by our human analysts, properly annotated, and are then fed back into the machine learning loop for immediate responsiveness.
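
The feedback step described above, where verified miscategorizations re-enter the training data, might look roughly like the following minimal sketch. The function name, the dictionary layout mapping a URL to a label, and the sample URLs are assumptions for illustration, not zvelo's actual pipeline.

```python
from typing import Dict


def apply_verified_corrections(
    training_labels: Dict[str, str],
    corrections: Dict[str, str],
) -> Dict[str, str]:
    """Return a new label set in which analyst-verified corrections override old labels."""
    updated = dict(training_labels)   # copy so the original dataset is left untouched
    updated.update(corrections)       # corrected and newly annotated items win
    return updated


# Hypothetical data: URL -> category label.
TRAINING = {"example.com/a": "news", "example.com/b": "sports"}
VERIFIED = {"example.com/b": "finance", "example.com/c": "travel"}

merged = apply_verified_corrections(TRAINING, VERIFIED)
```

The merged dataset would then be used for the next retraining run.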

Q: What are your primary experimental KPIs for categorization?

A: We evaluate categorization quality individually for each of hundreds of categories (IAB Tier 2) using the F1 score, which reflects in a single figure both how often the categorization is incorrect and how often the right category was missed. In addition, we evaluate our MASCAT globally by using in-house quality metrics such as content acceptability rating (CAR) and efficiency (EFF). CAR measures the weighted precision and acceptability of categorization. For example, a category returned as “nudity” instead of “R-rated” might be acceptable for most, but perhaps not for all. Efficiency measures how often a categorization is output, because sometimes input data is so scarce that the categorization becomes uncertain and no categorization is output.
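
As a rough illustration of two of these metrics, the sketch below computes a per-category F1 score from the standard precision and recall definitions. Efficiency is interpreted here as the fraction of inputs that received any categorization, which is an assumption about the in-house definition. CAR is not shown because its weighting is not described. The function names and sample data are hypothetical.

```python
from typing import List, Optional, Tuple

# Each pair is (true_label, predicted_label); a predicted label of None means no output.
Pair = Tuple[str, Optional[str]]


def f1_for_category(pairs: List[Pair], category: str) -> float:
    """Standard F1 for one category: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    if tp == 0:
        return 0.0  # also avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def efficiency(pairs: List[Pair]) -> float:
    """Assumed definition: share of inputs for which a categorization was output."""
    answered = sum(1 for _, p in pairs if p is not None)
    return answered / len(pairs)


SAMPLE: List[Pair] = [
    ("sports", "sports"),
    ("sports", "finance"),   # wrong category returned
    ("finance", "finance"),
    ("finance", None),       # too little input data, no categorization output
]

f1_sports = f1_for_category(SAMPLE, "sports")    # precision 1.0, recall 0.5
f1_finance = f1_for_category(SAMPLE, "finance")  # precision 0.5, recall 0.5
eff = efficiency(SAMPLE)                         # 3 of 4 inputs answered
```

Note that a missing output counts against recall for the true category and also lowers efficiency, so the two figures capture related but distinct failure modes.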

Q: What technical challenges did zvelo face when placing MASCAT into production?

A: Our zvelo multiagent system for categorization (MASCAT) has hundreds of intelligent software agents working in coordination with each other. Its highly distributed logic demands a highly distributed and flexible production platform. MASCAT maintains fast and reliable inter-agent interactions and complies with the latency and throughput thresholds required for high availability of the categorization service. The zvelo engineering team made sound design decisions, such as adopting a service-oriented architecture (SOA).

Q: Can zvelo customers take advantage of DAI?

A: Our DAI system has been completed and released into production. The data generated is provided through zveloCAT, our real-time web content classification engine, and delivered either via zveloAPI or zveloDB. Customers can learn more about the content, other datasets, or our data platform by visiting our web site at www.zvelo.com.

About the Author

Ignacio Giraldez (B.S. in Physics and Ph.D. in Artificial Intelligence) is an expert in machine learning, artificial intelligence, and business innovation. He attended Universidad Carlos III, Harvard University, and Universidad Europea. Ignacio has written for many research publications and has served as the Principal Investigator on numerous research projects.
