Extensive ingestion, cleaning, transformation, and aggregation of massive volumes of data from multiple internal and external sources (Salesforce, Google Analytics, Amazon S3 storage buckets, SAP, etc.) using Azure Databricks (PySpark)
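A minimal PySpark sketch of the kind of ingest-clean-aggregate job this describes; the bucket, storage account, and column names are hypothetical placeholders, not from the original:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest-clean-aggregate").getOrCreate()

# Hypothetical source: raw Salesforce export landed in an S3 bucket
raw = spark.read.json("s3a://example-bucket/salesforce/opportunities/")

cleaned = (
    raw.dropDuplicates(["opportunity_id"])                # remove duplicate records
       .filter(F.col("amount").isNotNull())               # drop rows missing the amount
       .withColumn("close_date", F.to_date("close_date")) # normalize date strings
)

# Aggregate to a daily revenue summary per region (assumed schema)
daily_revenue = (
    cleaned.groupBy("region", "close_date")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("*").alias("deal_count"))
)

daily_revenue.write.mode("overwrite").format("delta").save(
    "abfss://curated@examplelake.dfs.core.windows.net/sales/daily_revenue"
)
```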
Created a data pipeline that enables various systems within the ecosystem to stream high-volume data from multiple sources into a central repository for processing
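One hedged sketch of such a streaming ingest, using Databricks Auto Loader and Spark Structured Streaming; all paths are illustrative assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-to-central-repo").getOrCreate()

# Hypothetical landing zone where upstream systems drop JSON files
events = (
    spark.readStream.format("cloudFiles")  # Databricks Auto Loader
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "/mnt/schemas/events")
         .load("/mnt/landing/events/")
)

# Stream into a central Delta table; the checkpoint gives exactly-once writes
query = (
    events.writeStream.format("delta")
          .option("checkpointLocation", "/mnt/checkpoints/events")
          .outputMode("append")
          .start("/mnt/central/events")
)
```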
Experience building data pipelines using the Azure big data stack
Setup and configuration of Azure Databricks
Enabled reporting at scale with Databricks Delta for incremental processing
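A minimal illustration of Delta-based incremental processing via MERGE, so reports only reprocess changed rows; table paths and the join key are assumptions:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("incremental-merge").getOrCreate()

target = DeltaTable.forPath(spark, "/mnt/curated/sales_summary")
updates = spark.read.format("delta").load("/mnt/staging/sales_updates")

# Upsert: update matched rows, insert new ones, leaving untouched rows as-is
(
    target.alias("t")
          .merge(updates.alias("u"), "t.order_id = u.order_id")
          .whenMatchedUpdateAll()
          .whenNotMatchedInsertAll()
          .execute()
)
```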
Understanding of the Spark framework and tuning of Spark applications (see the sketch below)
Extensive experience with horizontally scalable and highly available system design and implementation, with a focus on performance and resiliency
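A few representative Spark tuning knobs; the specific values are illustrative, not taken from the original:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tuning-example").getOrCreate()

# Match shuffle parallelism to cluster size instead of the default 200
spark.conf.set("spark.sql.shuffle.partitions", "400")

# Let adaptive query execution re-optimize plans at runtime
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Broadcast dimension tables up to 64 MB to avoid shuffle joins
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(64 * 1024 * 1024))
```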
Built an end-to-end solution comprising Spark, Azure SQL, Azure Data Lake, Azure Data Factory, and Power BI (for visualization)
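A hedged sketch of the last hop in such a solution: publishing a curated Delta table to Azure SQL for Power BI to query; the server, database, and credentials are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("publish-to-sql").getOrCreate()

curated = spark.read.format("delta").load("/mnt/curated/daily_revenue")

# Write the curated table to Azure SQL over JDBC
# (in practice the password would come from a secret scope / Key Vault)
(
    curated.write.format("jdbc")
           .option("url", "jdbc:sqlserver://example-server.database.windows.net:1433;database=analytics")
           .option("dbtable", "dbo.daily_revenue")
           .option("user", "etl_user")
           .option("password", "<secret-from-key-vault>")
           .mode("overwrite")
           .save()
)
```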
Exposure to data & analytics and cloud technologies
Set up separate cloud environments for Dev, QA, and Prod
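One simple way to parameterize jobs across such environments is a per-environment config keyed by an environment variable; the variable name, paths, and worker counts here are hypothetical:

```python
import os

# Hypothetical DEPLOY_ENV variable set on each cluster: dev | qa | prod
ENV = os.environ.get("DEPLOY_ENV", "dev")

SETTINGS = {
    "dev":  {"storage": "abfss://dev@examplelake.dfs.core.windows.net",  "workers": 2},
    "qa":   {"storage": "abfss://qa@examplelake.dfs.core.windows.net",   "workers": 4},
    "prod": {"storage": "abfss://prod@examplelake.dfs.core.windows.net", "workers": 8},
}

config = SETTINGS[ENV]  # jobs read paths and sizing from here
```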