Posted February 03, 2022
CertiK

Data Engineer - Web Scraper

New York / Remote, Full Time

As a Web Scraping focused Data Engineer, you will be responsible for extracting and ingesting data from websites using web crawling tools, and building and maintaining infrastructure to support those tools.

In this role you will own the creation of these tools, services, and workflows to improve crawl/scrape analysis, reporting, and data management. We will rely on you to test both the scrapers and the data they produce to ensure accuracy and quality. You will own the process of identifying and fixing scraper breakages, as well as scaling scrapers as needed.


Responsibilities

  • Building and managing targeted web scrapers, including but not limited to ad-hoc scraping tasks and production-level, regularly recurring scraping jobs.
  • Managing the pipelines and storage for the data those scrapers collect.
  • Working closely with Data Scientists and the Product team to build future data pipelines as defined by the business.

Requirements

  • Experience running large scale web scrapers; ideally some familiarity with a big data stack (e.g. Airflow, Spark, Hadoop, MapReduce, Hive, Impala, Kafka, Storm, and equivalent cloud-native services)
  • Solid Python knowledge; familiarity with Java or C is ideal
  • Hands-on experience with cloud services (AWS, Google Cloud, Azure) and with the technologies used to design cloud architectures and websites, such as Linux/UNIX, regex, GraphQL, HTTP, HTML, JavaScript, TypeScript, and networking protocols
  • Familiarity with techniques and tools for crawling, extracting, and processing data (e.g. Scrapy, pandas, MapReduce, SQL, BeautifulSoup, etc.)
  • Familiarity with search APIs
  • Strong database creation and administration knowledge, covering both SQL (MySQL, PostgreSQL) and NoSQL (Elasticsearch, graph databases)
  • Bachelor's Degree in Computer Science or a related field or the equivalent demonstrated experience

Bonus Points

  • Experience developing data and analytics platforms or applications in the cloud, and experience with cloud-native container and CI technologies such as Kubernetes, OpenShift, Docker, and Jenkins (experience with Terraform, Ansible, Chef, or Puppet is preferred)
  • Experience with streaming data sources and RESTful interfaces, including familiarity with extracting data from publicly available API endpoints
  • Knowledge of CI/CD best practices, such as designing and building automated solutions for building, testing, monitoring, and deploying applications in a continuous delivery environment
  • Familiarity with social media APIs and scraping
  • Experience with system monitoring/administration tools
  • Experience with version control, open source practices, and code review
  • Experience with applications designed to display archived web content
  • Knowledge of entity resolution best practices and ontology creation
CertiK is proud to offer medical, vision, and dental insurance, a 401(k) plan with company matching, life and accidental death and dismemberment insurance, an HSA (with a high-deductible plan), an FSA, and other benefits to all full-time employees, along with flexible paid time off and holidays.

In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete the required employment eligibility verification form upon hire.

CertiK is proud to be an equal opportunity employer. We will not discriminate against any applicant or employee on the basis of age, race, color, creed, religion, sex, sexual orientation, gender, gender identity or expression, medical condition, national origin, ancestry, citizenship, marital status or civil partnership/union status, physical or mental disability, pregnancy, childbirth, genetic information, military and veteran status, or any other basis prohibited by applicable federal, state or local law.

CertiK will consider for employment qualified applicants with criminal histories in a manner consistent with local and federal requirements.

All CertiK employees are expected to actively support diversity on their teams and in the company.

This listing expired on Mar 20. Applications are no longer accepted.