Job Description
Language: Fluency in French/Dutch and English
The Azure Data Factory and Azure Databricks developer/application engineer will work as part of the project team and will be primarily responsible for architecting, developing, deploying, and operating the Azure Databricks and Data Factory/Data Flow product. They will also build and extend the Databricks cloud offering, which is based on a microservice architecture and includes a cluster management platform, Spark job scheduling, notebook services, and deployment automation.
- Expert in Azure Data Factory, Azure Data Lake Analytics, U-SQL, Java, Python, Hive SQL, Spark SQL, and Azure Databricks
- Strong T-SQL skills with experience in Azure SQL DW
- Experience handling structured and unstructured datasets
- Experience in data modeling and advanced SQL techniques
- Experience implementing Azure Data Factory pipelines using the latest technologies and techniques
- Use the interactive Databricks notebook environment with Apache Spark SQL; examine external data sets; query existing data sets using Spark SQL
- Visualize query results and data using the built-in Databricks visualization features; perform exploratory data analysis using Spark SQL
- Perform ETL processing and data extraction using Azure Databricks: write a basic ETL pipeline using the Spark design pattern; ingest data using DBFS mounts on Azure Blob Storage and S3; ingest data using serial and parallel JDBC reads
- Define and apply a user-defined schema to semi-structured JSON data; productionize an ETL pipeline
- Perform ETL transformations and loads using Azure Databricks: apply built-in functions to manipulate data; write UDFs that take a single DataFrame column as input; apply UDFs that take multiple DataFrame columns as input and return complex types
- Perform ETL jobs on streaming data sources; parameterize a code base and manage task dependencies; submit and monitor jobs using the REST API or command-line interface
- Manage Delta Lake using Databricks: use the interactive Databricks notebook environment; create, append, and upsert data into a data lake
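The tasks above (applying a user-defined schema to JSON, UDF-style transforms, and Delta-style upserts) all follow one extract–transform–load pattern. As a minimal, Spark-free sketch of that pattern in plain Python — the function names, field names, and sample data here are illustrative assumptions, not part of any Databricks API:

```python
import json

def apply_schema(record: dict) -> dict:
    """Coerce semi-structured JSON into a fixed schema (illustrative)."""
    return {
        "id": int(record["id"]),
        "name": str(record.get("name", "")),
        "amount": float(record.get("amount", 0.0)),
    }

def enrich(row: dict) -> dict:
    """Stand-in for a single-column UDF: derive a new column from 'amount'."""
    row["amount_band"] = "high" if row["amount"] >= 100 else "low"
    return row

def upsert(target: dict, rows: list) -> dict:
    """Delta-style MERGE semantics: update matching keys, insert new ones."""
    for row in rows:
        target[row["id"]] = row  # keyed upsert
    return target

# Extract: semi-structured JSON lines, as might arrive via a DBFS mount.
raw = ['{"id": "1", "name": "a", "amount": "120"}',
       '{"id": "2", "amount": "15"}']

# Transform: schema application followed by the UDF-style enrichment.
rows = [enrich(apply_schema(json.loads(line))) for line in raw]

# Load: upsert into an existing keyed store standing in for a Delta table.
lake = {1: {"id": 1, "name": "old", "amount": 5.0, "amount_band": "low"}}
lake = upsert(lake, rows)
```

In Databricks itself the same steps map onto `spark.read.json` with an explicit `StructType` schema, a registered UDF applied with `withColumn`, and `MERGE INTO` on a Delta table.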
Mandatory:
- Overall 8-11 years of IT experience
- Strong interest in business process analysis
- Experience in functional analysis
- Experience working on Agile projects
- Understanding of development life cycles
- Act as a communication channel between the business teams, the client, and the development team
- Communicate with the development team to understand any blocking issues and ensure on-time delivery
- Report to the business and clients on the status of the project
- Raise early flags if you see any risk to the delivery quality or timeline
- Fluency in French/Dutch and English
- Understanding of data warehouse concepts: ETL development, batch scheduling, and abend management
Skills and Experience
- Overall 8-11 years of software development experience
- 3-5 years of development experience on cloud platforms, including Azure Databricks, Azure SQL, Apache Spark (Scala/Python), etc.
- 5-7 years of development experience with SSIS and SQL.
- Excellent problem analysis and solving skills; strong communication skills.
- Strong technical skills in architecting, designing, developing, debugging, documenting, and implementing applications, and in operating large-scale distributed systems.
Primary Responsibilities
- Develop and design ETL processing and data extraction using Azure Data Factory/Data Flow/Azure Databricks.
- Design and develop Apache Spark SQL using Scala/Python to examine and query datasets.
- Develop DataFrames for ETL transformations and loads.
- Write scripts for automated testing of data in the target facts and dimensions.
- Capture audit information during all phases of the ETL transformation process.
- Write and maintain documentation of the ETL processes via process flow diagrams.
- Collaborate with business users, support team members, and other developers throughout the organization to help everyone understand issues that affect the data warehouse.
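One responsibility above is scripting automated tests against the target facts and dimensions. A minimal sketch of two such checks in plain Python — the table layouts, key names, and sample rows are assumptions for illustration, not taken from any specific warehouse:

```python
def check_referential_integrity(fact_rows, dim_rows, fk, pk="id"):
    """Return fact rows whose foreign key has no matching dimension row."""
    dim_keys = {row[pk] for row in dim_rows}
    return [row for row in fact_rows if row[fk] not in dim_keys]

def check_no_duplicate_keys(dim_rows, pk="id"):
    """Return any dimension primary keys that appear more than once."""
    seen, dupes = set(), []
    for row in dim_rows:
        if row[pk] in seen:
            dupes.append(row[pk])
        seen.add(row[pk])
    return dupes

# Illustrative target tables after an ETL load.
dim_customer = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
fact_sales = [{"customer_id": 1, "amount": 10.0},
              {"customer_id": 3, "amount": 20.0}]  # key 3 has no dimension row

orphans = check_referential_integrity(fact_sales, dim_customer, fk="customer_id")
dupes = check_no_duplicate_keys(dim_customer)
```

In practice the same assertions would run as Spark SQL queries against the loaded fact and dimension tables, with failures reported through the project's audit logging.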