The project I completed for Module 5 of the Data Science bootcamp involved finding a dataset I was unfamiliar with and analyzing it. I browsed Kaggle and found a dataset describing strategy game apps from the Apple App Store. It can be found at this link: https://www.kaggle.com/tristan581/17k-apple-app-store-strategy-games/.
After completing four projects for the Flatiron Data Science course, I discovered that one aspect of data exploration (Exploratory Data Analysis or the E of OSEMN) that I enjoy is feature engineering.
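As a quick illustration of what feature engineering can look like in practice, here is a minimal sketch of deriving a few new columns from raw ones with pandas. The column names and values are invented for illustration and are not taken from any particular project dataset.

```python
import pandas as pd

# Hypothetical raw data; the columns below are made up for illustration.
df = pd.DataFrame({
    "price": [0.0, 2.99, 4.99],
    "in_app_purchases": ["0.99, 1.99", None, "4.99"],
    "original_release_date": ["2014-03-01", "2016-07-15", "2019-01-20"],
})

# Binary flag: is the app free?
df["is_free"] = (df["price"] == 0).astype(int)

# Count how many in-app purchase tiers are listed for each app.
df["n_iap_tiers"] = df["in_app_purchases"].fillna("").apply(
    lambda s: len([p for p in s.split(",") if p.strip()])
)

# App age in years, relative to a fixed reference date.
df["original_release_date"] = pd.to_datetime(df["original_release_date"])
df["age_years"] = (pd.Timestamp("2020-01-01") - df["original_release_date"]).dt.days / 365.25

print(df[["is_free", "n_iap_tiers", "age_years"]])
```

Each engineered column turns a raw value into something a model can use more directly, which is what makes this step of EDA so satisfying.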
One aspect of data science I already had in my daily life before starting this course was clustering.
The pandas package is used frequently in Flatiron's Data Science course. Its dataframe manipulation is very handy, although there are some surprising limitations that further differentiate it from working with plain Python or Java matrices. This post will address workarounds for a few of those limitations as well as the simple, useful methods already available when working with a pandas dataframe. Links appear throughout that go directly to the relevant page of the documentation.
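As a small taste of both sides, here is a sketch (with hypothetical column names, not the specific examples this post covers) of a couple of those handy one-liners alongside one of pandas' better-known surprises, chained-indexing assignment, and its `.loc` workaround.

```python
import pandas as pd

# Hypothetical dataframe for illustration.
df = pd.DataFrame({
    "genre": ["Strategy", "Puzzle", "Strategy", "Board"],
    "avg_rating": [4.5, 4.0, 3.5, 4.8],
})

# Handy one-liners: frequency counts and grouped summaries.
print(df["genre"].value_counts())
print(df.groupby("genre")["avg_rating"].mean())

# Surprising limitation: chained indexing may silently fail to write.
# df[df["genre"] == "Strategy"]["avg_rating"] = 5.0   # may not modify df at all
# Workaround: do the selection and assignment in a single .loc call.
df.loc[df["genre"] == "Strategy", "avg_rating"] = 5.0

# Taking an explicit copy of a filtered slice avoids SettingWithCopyWarning.
strategy_only = df[df["genre"] == "Strategy"].copy()
strategy_only["avg_rating"] = strategy_only["avg_rating"].round()
```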
One important concept of data science that comes up in this course is overfitting and underfitting. Overfitting occurs when a model fits the training data so closely that it fails to generalize to any other data. LASSO (Least Absolute Shrinkage and Selection Operator) and Ridge regression (L1 and L2 norm regularization, respectively) are two related approaches to combating overfitting, and both are used much like an ordinary linear regression model. Each uses a hyperparameter to penalize the size of the coefficients, shrinking some toward zero (or, in LASSO's case, exactly to zero) in order to filter out noise in the data and thereby reduce overfitting.
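As a rough sketch of how that looks in practice, here is a minimal example using scikit-learn's Ridge and Lasso on a synthetic dataset; the data, alpha values, and feature counts are invented for illustration. The `alpha` argument is the penalty hyperparameter described above.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data: 5 informative features plus 45 pure-noise features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 50))
true_coefs = np.zeros(50)
true_coefs[:5] = [3.0, -2.0, 1.5, 4.0, -1.0]
y = X @ true_coefs + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling matters: the penalty treats all coefficients on the same footing.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

for name, model in [
    ("OLS", LinearRegression()),
    ("Ridge (L2)", Ridge(alpha=1.0)),   # shrinks coefficients toward zero
    ("Lasso (L1)", Lasso(alpha=0.1)),   # can set some coefficients exactly to zero
]:
    model.fit(X_train, y_train)
    n_zero = np.sum(np.isclose(model.coef_, 0))
    print(f"{name}: test R^2 = {model.score(X_test, y_test):.3f}, "
          f"zero coefficients = {n_zero}")
```

Running a sketch like this shows the idea: Ridge keeps every feature but with smaller coefficients, while Lasso tends to zero out the noise features entirely, which is why it doubles as a feature-selection tool.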