Having graduated with an M.Sc. in Data Science, I am passionate about data, statistics, and computer science. With solid knowledge of and strong skills in data analysis and programming, I approach new challenges in study and work with confidence and motivation. When problems arise, I persist in pushing for the best results while remaining careful to recognize the limits of potential solutions. My Bachelor's degree in engineering laid the foundation for my multidisciplinary background, which enables me to apply transferable skills, think from different perspectives, and adapt quickly.


Leiden University

M.Sc. Statistical Science: Data Science
Relevant Courses:
Machine Learning | Deep Learning | Advances in Data Mining | Multivariate Analysis and Multidimensional Data Analysis (Business Data Analysis) | High-Dimensional Data Analysis | Linear and Generalized Linear Models | Mixed and Longitudinal Modeling | Statistical Consulting | Advanced Statistical Computing | Bayesian Statistics | Probability and Statistics | Mathematics for Statisticians | Programming with Python | Statistical Computing with R | Databases

GPA: 8/10

Sept. 2017 - Dec. 2019
The Netherlands

University of Twente

M.Sc. Civil Engineering and Management
Completed 30 credits before transferring to Leiden University.

GPA: 7.5/10

Sept. 2016 - Feb. 2017
The Netherlands

Jiangsu University

B.Eng. Civil Engineering

GPA: 83/100

Sept. 2012 - Jun. 2016
China


Data Analyst

Sept. 2020 - Present
The Netherlands

Intern Data Scientist

Dr. Reddy’s Research and Development B.V.
  • Performed end-to-end analyses (from feature engineering through model training to model diagnostics) for two projects in the pharmaceutical industry;
  • Applied a range of machine learning techniques to build predictive models that support decision-making;
  • Visualized the results and explained them to colleagues from multiple disciplines;
  • Learned and applied new methods that make models and predictions interpretable to non-technical staff.
  • Tools & Models used:

    R, Logistic Regression, Ridge, Lasso, Naive Bayes, Support Vector Machines, Random Forests, Neural Networks, Principal Component Analysis, etc.

    Feb. 2019 - Apr. 2019
    Leiden, The Netherlands


    Research on Interpretable Machine Learning Algorithms

    Master's thesis
  • Studied and applied recently published methods for making machine learning models interpretable, which benefits users who need to understand the predictions;
  • Evaluated the interpretability and predictive performance of several recently published interpretable models and compared the results with those of classic models.
  • Tools & Models used:

    R, Python, NumPy, Pandas, SciPy, Matplotlib, Seaborn, Scikit-Learn, Linux, Decision Trees, Random Forests, Rule-Based Models (e.g. Scalable Bayesian Rule Lists & Falling Rule Lists), etc.

    May 2019 - Dec. 2019

    Testing the Work Environment Hypothesis of Bullying on a Department Level

    A statistical consulting project
  • Carried out extensive data cleaning in collaboration with the client to ensure the analysis was valid and meaningful in its specific context;
  • Applied statistical modeling methods to help the client explore the relationships between bullying and various work environment factors.
  • Tools & Models used:

    R, SPSS, Mixed-Effects Models, Ordinal Logistic Regression Models, etc.

    Sept. 2018 - Dec. 2018

    A Movie Recommendation System; Finding Similar Users of Netflix

    Two data mining course projects
  • The first: built a recommendation system that provides users with recommendations based on other users who may have similar viewing histories or preferences;
  • The second: implemented an algorithm that finds users who may have similar viewing histories or preferences.
  • Tools & Models used:

    Python, NumPy, Pandas, SciPy, Matplotlib, Matrix Factorization, MinHash, Locality Sensitive Hashing, etc.

    Sept. 2018 - Oct. 2018

    Real-Time Facial Expression Recognition; Classifying Images of Handwritten Digits; Text Generation and Language Translation

    Three deep learning course projects
  • The first: created an application that captures real-time images with a webcam and performs facial expression recognition;
  • The second: implemented several algorithms for classifying images of handwritten digits;
  • The third: built two networks, one that generates new text based on source texts and one that performs sequence-to-sequence translation.
  • Tools & Models used:

    Python, NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, TensorFlow, Keras, OpenCV, CNN, RNN, LSTM, Bayes Classifiers, Multi-class Perceptrons, etc.

    Feb. 2018 - May 2018


      Chinese (Native), English (Fluent), Dutch (A2 level)

      Machine Learning, Deep Learning, Data Mining, Data Analytics, Data Visualization, Statistical Modeling, Statistical Consulting, Prediction, Recommendation Systems, Business Data Analysis

      R, Python, C

      SQL, MapReduce, Linux, NumPy, Pandas, SciPy, Matplotlib, Seaborn, Scikit-Learn, TensorFlow, Keras, SPSS, Microsoft Office, RStudio, Spyder, Jupyter Notebook, LaTeX, etc.

      Logistic Regression, Ridge & Lasso Regression, Decision Trees, Random Forests, Boosting, Naive Bayes, Perceptrons, SVM, k-NN, LDA, Neural Networks, CNN, RNN, LSTM, k-Means, PCA, Matrix Factorization, etc.


    Backpacking, photography, fitness, and cooking.