Raising AutoCat: Web Analysts’ Role in zvelo’s Website and URL Classification Accuracy
Okay, I lied. We don’t exactly have a superhuman being, but we do have the next best thing – an artificial intelligence engine that we call AutoCategorization, or AutoCat for short. AutoCat is our automated URL and website classification system, equipped with a robust utility belt of tools used to analyze and classify web content on the fly. It can determine the category (or combination of categories) of a site, from sports and lifestyle to technology, food or entertainment. AutoCat can also determine whether a web page harbors malicious content or content inappropriate for children. The list of website categories runs to more than 140 items representing all facets of online information, which can be mapped with utmost flexibility. AutoCat’s high accuracy in discerning between categories can be attributed to the updated information and continuous training it receives from, of course, humans.
Web Analysts – The Trainers
The Internet is a universe of human information that is constantly growing, evolving and updating with time. To complicate matters further, online information is presented in many languages, so classifying web content quickly and accurately requires a cutting-edge approach. This website classification process demands a high level of resources to keep pace. As such, AutoCat is backed by a team of Web Analysts and Linguists who help keep its knowledge base complete and up to date.
Witnessing zvelo Web Analysts and Linguists in action is like watching a team of meticulous rocket scientists monitor every aspect of a space shuttle launch. Alerts go off the moment under-performing metrics are spotted. Improvements are likewise visible. These measurements, coupled with continuous observation and research, make for a very effective feedback loop for AutoCat. This careful training and guidance starts at release and follows through to maturity – as if raising a superhuman child.
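To make the alerting idea concrete, here is a minimal sketch of a threshold check on per-category metrics. It is an illustration only, not AutoCat’s actual monitoring code: the CategoryMetric type, the find_alerts function, the 0.95 threshold and the sample numbers are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class CategoryMetric:
    """Hypothetical record: how often analysts confirmed a category's classifications."""
    category: str
    precision: float  # fraction of sampled classifications confirmed as correct


def find_alerts(metrics, threshold=0.95):
    """Return the names of categories whose precision falls below the threshold."""
    return [m.category for m in metrics if m.precision < threshold]


# Made-up sample data, for illustration only.
sample = [
    CategoryMetric("sports", 0.99),
    CategoryMetric("technology", 0.91),
    CategoryMetric("food", 0.97),
]

for name in find_alerts(sample):
    print(f"ALERT: '{name}' is under-performing")
```

In a setup like this, the threshold would presumably be tuned per category, since some categories are harder to tell apart than others.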
Babysitting – The Early Stage
In its early stage, AutoCat behaves much like a curious toddler asking a torrent of questions about things seen for the first time. Web Analysts actively watch over AutoCat whenever new analyzers or languages are added, or after existing ones are enhanced. As new web content or information comes online, Web Analysts reinforce AutoCat’s knowledge base so that it will know what to do after the initial discovery. During this early stage, new web content or information can account for billions of new URLs awaiting classification. AutoCat must identify new URLs and classify them with utmost speed. This URL classification process spans several iterations and requires continuous patience and dedication from Web Analysts. Complete coverage is half the responsibility. The other half is making sure, to avoid confusion, that new information does not clobber what AutoCat already knows.
Resolving Confusions – The Teenage Stage
The smarter AutoCat grows, the more confident it becomes in recognizing new web content or information within fractions of a second. Teaching AutoCat proper website classification logic is always a balancing act, however. An unbalanced learning experience may result in a false bias, where URLs are classified as one category rather than another, or as both. This does happen.
Let’s shift perspective for a bit and consider that the 140+ categories correspond to different skills that need to be mastered. Say we taught our superhuman child with far more emphasis on weaponry than on art, and then we present her with a stunning war photograph. If we play the scene out, she will likely say, “Awesome! That soldier looks like he’s holding a Spanish M43 Mauser Rifle.”
Since she is missing half the point, you interrupt and say, “Yes, but this is also a very famous piece by combat photojournalist Robert Capa. Didn’t I tell you about him?”
With a blank face she responds, “Did he take pictures of M1 Garand sniper rifles designed by John Garand, too? I know of John Garand’s work.”
Web Analysts are always on the lookout for the first signs of confusion. They observe and study AutoCat’s URL classification trends to know what adjustments should be made, where, and by how much. Most of these conflicts are spotted early on, and preemptive tweaks are made in a timely manner. However, in some cases, even after adjustments are made, AutoCat may stubbornly refuse to give in, resulting in errors in production. It’s as if AutoCat becomes so confident that it rejects whatever new instructions it receives from humans. This is a red alert situation for Web Analysts, and damage control is executed like clockwork. Reported misclassification cases are pulled for review and corrected within minutes via an automatic vetting process by multiple analysts – a process inherent to the Web Analyst site review tools. Meanwhile, AutoCat undergoes corrective iterations in which the intensity of the applied adjustments increases after each cycle until the problem is resolved.
The conflict resolution cycle continues through a feedback loop until errors wind down. Attaining resolution is an excellent indication that AutoCat is well on its way towards maturity.
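The shape of that escalating correction cycle can be sketched as a simple loop. This is a toy model under stated assumptions, not how AutoCat is actually retrained: the corrective_iterations function, the doubling of the adjustment strength, the tolerance and the idea that each adjustment removes a share of the remaining errors are all hypothetical choices made for illustration.

```python
def corrective_iterations(error_rate, tolerance=0.01, base_strength=0.1, max_cycles=20):
    """Apply stronger adjustments each cycle until the error rate is acceptable.

    Returns a list of (cycle, strength, error_rate_after) tuples.
    All parameters are hypothetical and exist only for this illustration.
    """
    history = []
    strength = base_strength
    for cycle in range(1, max_cycles + 1):
        if error_rate <= tolerance:
            break  # problem resolved, stop escalating
        # Toy assumption: an adjustment removes a share of the remaining
        # errors equal to its strength.
        error_rate *= 1.0 - strength
        history.append((cycle, strength, error_rate))
        # Escalate: the next cycle applies a stronger adjustment, capped at 1.0.
        strength = min(strength * 2, 1.0)
    return history


for cycle, strength, remaining in corrective_iterations(0.2):
    print(f"cycle {cycle}: strength={strength:.2f}, error rate={remaining:.4f}")
```

The cap on the number of cycles reflects the fact that a loop like this needs a stopping rule even when the errors refuse to wind down, at which point a human would step in.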
Follow Through – The Maturity Stage
If our eager and now teenage superhuman rides a bicycle on unknown terrain, even after being taught how to ride, she will still adjust the gears, shift her weight or change speed in response to obstacles that hinder a smooth ride. The same applies to AutoCat.
Monitoring by Web Analysts never stops, even when AutoCat is steadily performing well. Given the unpredictable nature of the Internet, there’s no telling when new sets of websites or web content will challenge AutoCat’s website and URL classification capabilities. This is one of two reasons why Web Analysts consistently watch for changes in behavior like hawks, so they can make adjustments when necessary. The other reason is that Web Analysts can also learn from AutoCat.
When AutoCat performs exceedingly well during its maturity stage, it surprises Web Analysts and Engineers alike with how it is able to discover and accurately classify billions of URLs on its own. zvelo Engineers work seamlessly with Web Analysts, and it is through this unique human-to-machine collaboration that we are able to build the top-notch website content classification engine we call AutoCat.