Position: Site Reliability Engineer
Department: Infrastructure and Cloud Services
Location: Denver, Colorado (Denver Tech Center)
Reports to: VP of Infrastructure and Cloud Services
The Site Reliability Engineer works hand-in-hand with Engineering, Cybersecurity, AI and other teams to deploy, manage, monitor, and support RESTful API-based solutions using cloud infrastructure technologies such as Amazon Web Services (AWS), Terraform, Kubernetes, and Helm.
The Infrastructure and Cloud Services team is responsible for managing and monitoring the health of all systems, data, databases, networks and cloud infrastructure. The Site Reliability Engineer is also responsible for data integrity, data validation, and the configuration and monitoring of statistical key performance indicator systems, to ensure zvelo’s platform is performing at expected levels. When it is not, the engineer diagnoses and troubleshoots the issue, bringing in other engineers as needed.
This is a key position that requires proactiveness, clear communication and responsiveness. It suits someone who enjoys working in an environment that utilizes new technologies (AI, ML, Malicious Detection, Cloud Infrastructure). The person who fills this role will have the skills to collaborate proactively and take initiative from cradle to grave in a fast-paced, quick-response environment. In this role you will wear multiple hats and be part of the on-call rotation to ensure maximum uptime and availability of all production systems and services.
- Enjoys working in a fast-paced, highly collaborative and innovative environment.
- Experience in managing and administering cloud providers and cloud services, specifically AWS services such as EC2, S3, IAM, DynamoDB, Redshift, RDS, ElastiCache, etc.
- Work with the team to ensure 24-hour monitoring of systems and mitigate any critical issues that arise. Research customer-generated reports and issues. Analyze results of designated metrics and statistics and interact with other teams, such as development and support engineers, to determine the root cause and provide resolution.
- Ensure database integrity by monitoring services, data, database logs and overall system health.
- Experienced with cloud security such as managing users, roles and privileges through AWS IAM or other auth services.
- Expertise with Kubernetes and other tools that interact with it such as Helm, Docker, etc.
- Deep hands-on experience with Linux administration and scripting languages, preferably Perl, Bash, or Python, to automate alerting and to aggregate data from various sources (e.g., logs, databases) into a central location or display accurate statistical data via reports, dashboards and scorecards.
- Experienced with production deployments and CI/CD tools.
- Experienced with implementing or managing metric monitoring solutions such as Prometheus and visual analytics solutions such as Grafana.
- Experience with various SQL, NoSQL, and in-memory databases such as PostgreSQL, DynamoDB, Redis, and Elasticsearch, as well as with automating data validation and data integrity checks.
- Excellent verbal and written communication skills. Strong analytical and critical thinking skills.
- Familiarity with message queuing systems (e.g., NSQ) and Git.
- We have a great office in the Denver area but support a remote culture and are open to remote candidates as well.
Ideal Candidate Will Have:
- Experience with autoscaling (schedulers/deschedulers) and configuration management of clusters.
- Experience with Docker and operating microservices at scale.
- Knowledge of networking technologies including TCP/IP, routing, gateways, firewalls, and network security. Bonus points for hands-on experience with BGP and operating a public cloud service.
- Familiarity with related services and applications, specifically in the domains of Web Security, Internet Security, SIEMs, or Network Security.
- Interaction with custom RESTful APIs (internal/3rd party) and Go binaries.
- Proficiency with at least two programming languages and the ability to easily pick up new languages as technology trends develop.
- Experience with large data tool sets, such as NiFi, Spark, Hadoop/Hive or Snowflake.
What We Offer:
- Innovative, client-focused, transparent, collaborative and results-oriented culture
- Competitive salary
- Stock options
- A generous benefits program (premiums paid 100% for you and your dependents)
- FlexPTO – Unlimited PTO
- Company sponsored lunches
- Great location and office environment (in Denver Tech Center, near Light Rail and world-class outdoor lifestyle and activities)
- Ability to effect change throughout the company
About zvelo, Inc.
zvelo is a leading provider of web security detection and classification services for content, traffic, and devices. zvelo combines advanced artificial intelligence-based web security and contextual categorizations with sophisticated malicious and fraud detection capabilities that our customers integrate into network and endpoint security, URL and DNS filtering, brand safety, and other applications where data quality, accuracy, and detection rates are critical.
Typical office environment. Frequent virtual communication with international offices.
We highly encourage interested and qualified professionals to email a cover letter detailing career highlights, along with a résumé, to [email protected]. Please ensure that the job title is in the subject line.