Process, cleanse, and verify the integrity of data used for analysis. Perform ad-hoc analysis and present results clearly.
Extend the data with third-party sources. Enhance data collection procedures to include information relevant to the analysis.
Select features, then build and optimize classifier models using machine learning techniques.
Qualifications and Experience
Proficient in SDLC (Planned Iterative, Agile)
Good spoken and written English language skills
Good understanding of machine learning techniques such as Decision Trees, Naïve Bayes, SVM, and Neural Networks
Good experience with Python (preferred), or alternatively R or C/C++
Experience with at least one data science toolkit, such as R, NumPy, scikit-learn, Weka, or MATLAB
Experience in using query languages such as SQL
Experience with at least one NoSQL database (such as MongoDB, Cassandra)
Experience in using visualization tools
Good applied statistics skills, including distributions, statistical testing, and regression