We are seeking a Data Engineer to join our data engineering team, focusing on data processing and analytics solutions built with modern big data technologies.
Responsibilities
Develop and maintain data processing pipelines using Python, Spark, and the Hadoop ecosystem.
Build ETL workflows on AWS EMR clusters for processing large datasets.
Assist in optimizing Spark applications for performance and reliability.
Support data visualization and exploration using Hue and other analytics platforms.
Work with data scientists and analysts on machine learning pipeline implementation.
Monitor distributed computing environments and assist with troubleshooting.
Qualifications
3+ years of software development experience.
Proficiency in programming languages such as Python and SQL.
Knowledge of database modeling and familiarity with ETL tools.
Experience with big data technologies and cloud platforms (AWS).
Working knowledge of Apache Spark (PySpark, Spark SQL) and Hadoop ecosystem.
Experience with AWS EMR cluster operations.
Familiarity with data exploration and workflow management tools.
Understanding of distributed systems concepts.
Exposure to real-time data streaming (Kafka, Kinesis).
Basic understanding of data governance and security.
Familiarity with machine learning concepts.
AWS certification, or a willingness to obtain one.
Experience with AWS services (S3, Glue, Redshift, Athena).
Basic knowledge of containerization (Docker, Kubernetes).