Job Title: Applic/Solutions Architect, Sr. (Data Engineer)
Location: GPC Corporate HQ, Atlanta
Department Name: GPC4020 - PD Data Analytics & Innovation
This position is responsible for translating data into readily consumable forms and delivering integrated data to consumers by building, operationalizing, and maintaining data pipelines for Data & Analytics use cases across heterogeneous environments. The Data Engineer also works with various data integration tools that support a combination of data delivery styles, such as virtualization, data replication, messaging, and streaming, in hybrid and multi-cloud integration scenarios.
The position is also responsible for detailed data analysis, data modeling and requirements mapping, and ETL design and development. The Data Engineer works closely with Data Architects within the organization to ensure that new analytics solutions align with the enterprise data management and Power Delivery data strategy. This includes identifying the true source of data, reducing data redundancy, and increasing the integration of data across business units.
Responsibilities
- Designing and developing methods to process structured, semi-structured, and unstructured data using batch and real-time data processing techniques.
- Delivering fast, reliable, and scalable data by incrementally and efficiently processing it as it arrives from files, streaming sources such as Kafka, DBMS, and NoSQL stores.
- Developing release pipelines to automate recurring manual tasks such as creating a build package, checking that package into a version control repository, and deploying it to a DV/UA environment.
- Building and maintaining templates such as code libraries, pipeline patterns, and semantic models to promote reuse and agility.
- Establishing gatekeeping processes that monitor and control the promotion of successful data processes into production, based on an understanding of their business criticality.
- Collaborating with cross-functional teams with a combination of data, business, and technical personas, as well as a product owner/manager as necessary.
- Advocating data reusability by breaking down monolithic data delivery processes into modular data product delivery.
- Ensuring data reliability by defining data quality and integrity controls within the pipeline with defined data expectations and addressing data quality errors with predefined policies.
- Actively working with less experienced data engineers, providing technical guidance and oversight.
- Understanding the usage of performance optimization clusters that parallelize jobs and minimize data movement in batch and stream data processing.
- Recommending improvements to the processes, technology, and interfaces that reduce the development time and effort and enhance the effectiveness of the team.
- Actively participating in enterprise social networking sites, staying up to date on new data technologies and best practices, and sharing insights with others in the organization.
Candidates with the following preferred qualifications are encouraged to apply:
Education
- Bachelor’s degree required; degree in a technical field such as computer science, engineering, mathematics, or another relevant academic discipline. Advanced degree preferred.
Knowledge, Skills, and Abilities
- Working experience with batch and real-time data processing frameworks.
- Working experience with data modelling, data access, schemas, and data storage techniques.
- Working experience with data quality tools.
- Experience in creating functional and technical designs for data engineering and analytics solutions.
- Experience implementing data models of different schemas and working with diverse data source types.
- Hands-on experience developing solutions with big data technologies such as Hadoop, Hive, and Spark.
- Hands-on experience developing and supporting statistical models and R- and/or Python-based AI/ML solutions.
- 5+ years of hands-on experience designing, developing, testing, deploying, and supporting data engineering and analytics solutions using on-premises tools such as MSBI (SSIS/SSAS), Informatica, Oracle GoldenGate, SQL, Oracle, and SQL Server.
- 3+ years of hands-on experience designing, developing, testing, deploying, and supporting data engineering and analytics solutions using Microsoft cloud-based tools such as Azure Data Lake, Azure Data Factory, Azure Databricks, Python, Azure Synapse, Azure Key Vault, and Power BI.
- Experience with containerization tools such as Docker and OpenShift.
- Experience with Agile, DevOps, and CI/CD methodologies.
- Hands-on experience designing and developing solutions involving data sourcing, enrichment and delivery using APIs & Web Services.