Projects

Data Science

  • Carrefour-X AI & Retail Challenge - Ranked 1st: 10-week project building a local Marketing Mix Model (MMM) to analyse and predict the impact of marketing media on sales (ROI)

    Keywords: MMM, Interpretable ML, CatBoost, SHAP, GCP, BigQuery, Dash

    From 3 February to 8 April 2020, I took part in the X-Carrefour AI & Retail Challenge, a data science hackathon that brought together 130 participants around three themes: Assortments / in-store offers, Marketing Mix Modeling and Delivery Optimisation.

    With the help of Carrefour’s Data and Marketing departments and over 30 datasets on Carrefour’s products, customers, marketing investments and sales, our team won 1st prize by building a Marketing Mix Model at local granularity, focused on organic products, to isolate and anticipate the impact of local marketing levers on organic sales.

    Our team trained a highly effective yet interpretable machine learning model using CatBoost and SHAP, built a dashboard that helps Carrefour’s marketing teams analyse the profitability of all marketing investments, and gave Carrefour concrete recommendations for becoming the world leader in the “food transition”.

  • LFIS-Dauphine Hackathon - Ranked 3rd: 24-hour data challenge predicting stock volatility among S&P 500 and STOXX 600 stocks on performance announcement dates

    Keywords: LightGBM, Hyperopt, resampling (SMOTE)

    In February 2020, I took part in a 24-hour data challenge organized by LFIS, Sesamm and Dauphine Université Paris. The goal was to predict stock volatility among S&P 500 and STOXX 600 stocks on performance announcement dates.

    Using advanced preprocessing and hyperparameter optimization techniques, our team won 3rd prize.

Data Engineering

  • GDELT Project: Built a resilient architecture for storing large amounts of data from the GDELT database while supporting fast queries

    Keywords: Spark, MongoDB, AWS, Zeppelin, ETL

    Environment:

    • ETL: Spark
    • Architecture: 3 MongoDB nodes on EC2
    • Visualization: Zeppelin + Python webapp

    Figure: Environment for the GDELT project

Data Visualization

  • French digital training courses - GitHub repo: Built a web app in Python providing an analysis of the French online digital training market.

    Keywords: web scraping, data visualization

    Built a web app providing a statistical analysis of the French digital training courses offered on the “Mon Compte Formation” website and mobile app, launched in November 2019 by the French Government.

IoT