General Role :
Ensure the quality of data transformations connecting to the Group's Datalake cluster solution ("on premises") or Cloud architectures, in order to design and implement "end to end" solutions : reliable data processing, data ingestion, exposure of data through APIs, and data visualization, within a DevOps culture.
All applications are new developments / products for different LOBs (lines of business) within our Group.
General Skills : Experience in the design and implementation of "end to end" data streams in Big Data architectures (Hadoop clusters, NoSQL databases, Elasticsearch), as well as in massive, distributed data processing environments with frameworks such as Spark / Scala.
Mandatory :
Languages and frameworks : Spark, Scala, SQL; DevOps : Git, Jenkins, GitLab;
Datalake : NiFi in a distributed architecture; Hortonworks with Hive and Oozie (a minimal Spark / Hive sketch follows this section)
Databases : NoSQL and SQL
Cloud : GCP (Google) and AWS are nice to have
The candidate should not be primarily focused on data science / Machine Learning / Artificial Intelligence.
The project methodology is exclusively Agile / SCRUM, within a DevOps culture.
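To make the mandatory Spark / Scala / Hive stack concrete, below is a minimal, hypothetical ingestion sketch. The paths, table and column names are placeholders chosen for the example and do not describe the actual project.

// Hypothetical end-to-end sketch: ingest raw CSV events, normalise them with
// Spark, and publish the result as a Hive table for downstream consumers.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EventIngestionJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-ingestion")
      .enableHiveSupport()          // required to write managed Hive tables
      .getOrCreate()

    // Read raw files dropped by the upstream NiFi flow (placeholder path)
    val raw = spark.read
      .option("header", "true")
      .csv("/data/landing/events/")

    // Minimal preparation: typed timestamp, deduplication, technical column
    val prepared = raw
      .withColumn("event_ts", to_timestamp(col("event_ts")))
      .dropDuplicates("event_id")
      .withColumn("ingestion_ts", current_timestamp())

    // Publish to Hive so Hive / dataviz consumers can query the clean data
    prepared.write
      .mode("overwrite")
      .saveAsTable("analytics.events_clean")

    spark.stop()
  }
}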
Main Responsibilities in different stages :
During project definition :
Design of data ingestion chains
Design of data preparation chains
Basic ML algorithm design
Data product design
Design of NoSQL data models (see the document-model sketch after this list)
Design of data visualizations
Participation in the selection of services / solutions to be used according to the use cases
Participation in the development of a data toolbox
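As an illustration of what "Design of NoSQL data models" can look like, here is a hedged, hypothetical document-oriented model expressed as Scala case classes; the entities and fields are invented for the example and only show the idea of denormalising around the read pattern.

// Hypothetical document model for a data product served from a NoSQL store.
// The document is shaped around the read pattern ("one lookup per customer")
// rather than around relational normal forms.
import java.time.Instant

case class Address(street: String, city: String, country: String)

case class Order(orderId: String, amountEur: BigDecimal, placedAt: Instant)

// One document per customer, embedding the data the API will serve in a
// single read instead of joining several tables at query time.
case class CustomerDocument(
  customerId: String,
  fullName: String,
  addresses: Seq[Address],
  recentOrders: Seq[Order]   // bounded list, e.g. the most recent orders only
)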
During the iterative implementation phase :
Implementation of data ingestion chains
Implementation of data preparation chains (see the Spark sketch after this list)
Implementation of basic ML algorithms
Implementation of data visualizations
Use of ML frameworks
Implementation of data products
Exposure of data products
NoSQL database configuration / parametrization
Use of functional languages
Debugging of distributed processes and algorithms
Identification and cataloging of reusable entities
Contribution to the development standards in use
Contribution and solution proposals on data processing issues
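A minimal sketch of an iterative data preparation chain in a functional style, assuming the Spark / Scala stack required above; the step names, columns and rules are illustrative only.

// Hypothetical data-preparation chain written as a pipeline of small, pure
// DataFrame transformations, which keeps each step testable and debuggable
// in isolation.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object PreparationSteps {
  // Each step is a pure DataFrame => DataFrame function
  def normaliseCountry(df: DataFrame): DataFrame =
    df.withColumn("country", upper(trim(col("country"))))

  def dropTechnicalRejects(df: DataFrame): DataFrame =
    df.filter(col("amount").isNotNull && col("amount") >= 0)

  def addPartitionKey(df: DataFrame): DataFrame =
    df.withColumn("ingest_date", to_date(col("event_ts")))

  // transform() chains the steps without intermediate variables, so the
  // whole chain reads top to bottom and each step can be unit-tested.
  def prepare(raw: DataFrame): DataFrame =
    raw
      .transform(normaliseCountry)
      .transform(dropTechnicalRejects)
      .transform(addPartitionKey)
}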
During integration and deployment phase :
Participation in problem solving
Technical Requirements :
Expertise in the implementation of end-to-end data processing chains
Experience in distributed architecture
Basic knowledge and interest in the development of ML algorithms
Knowledge of different ingestion mechanisms / frameworks
Knowledge of Spark and its different modules
Proficiency in Scala and / or Python
Knowledge of the AWS or GCP environment
Knowledge of NoSQL database environments
Knowledge of building APIs for data products
Knowledge of Dataviz tools and libraries
Experience in Spark debugging and distributed systems
Experience in extending complex systems
Proficiency in the use of data notebooks
Experience in data testing strategies (a minimal test sketch closes this section)
Strong problem-solving skills, intelligence, initiative and the ability to work under pressure
Strong interpersonal and communication skills (including the ability to go into detail)
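As a hedged example of a data testing strategy, here is a minimal ScalaTest sketch for one preparation step, reusing the hypothetical PreparationSteps object from the sketch above; the expected values are invented and Spark runs locally.

// Hypothetical unit test for one preparation step, using ScalaTest and a
// local SparkSession.
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class PreparationStepsTest extends AnyFunSuite {

  private lazy val spark: SparkSession = SparkSession.builder()
    .master("local[2]")
    .appName("preparation-steps-test")
    .getOrCreate()

  test("normaliseCountry trims and upper-cases the country code") {
    import spark.implicits._

    val input     = Seq(" fr ", "De", "ES").toDF("country")
    val result    = PreparationSteps.normaliseCountry(input)
    val countries = result.select("country").as[String].collect().toSet

    // The step should produce clean, upper-cased ISO-style codes
    assert(countries == Set("FR", "DE", "ES"))
  }
}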