And how you can too

(Turn up the volume)

Why

The first time I set foot inside my undergraduate business building, I noticed the flashing strip of green and red numbers in the lobby. I was not sure what those numbers were or what the percentages meant. All I knew was that important and powerful people cared about them, and I wanted to know more.

As the months went by and I kept studying in that building, I would find myself gazing up at the stock scroller for motivation. I wanted to do what the stocks do: move and grow. …


How long will it take me to win the lottery?


Intro

I grew up in New York, where the lottery is a big part of my family’s culture. A refrigerator looks incomplete to me unless there are a few scratch-offs or Mega Millions tickets stuck to it. Every holiday or birthday I would (and still do) receive scratch-off tickets along with half-joking, half-threatening reminders that “if you win big, I’m getting a cut!”

For better or for worse, I later moved to Alabama, where the lottery is illegal. This was a bit of a shock, but I eventually got used to it. Every time I visit family in NY, however…


What is Alteryx and my thoughts on using it

Aerial view of Paris, France — Source

Introduction:

A few weeks ago, my coworker peeked over my cubicle wall and said, “Hey, you’re going to learn Alteryx.” … Alteryx? I didn’t even know what word that was. After getting him to repeat it and spell it out for me, I did some Googling. I couldn’t find a consolidated breakdown of what the software is and what it’s best used for, so I had to tediously piece this information together from many searches. That’s why I’m writing this article: to share the conclusions I’ve pieced together about Alteryx and my own first-hand experience using it.

Table of Contents:

  • What is…


The path that led me to data analytics and lessons learned

Source: https://unsplash.com/photos/FSFfEQkd1sc

My background is in accounting and healthcare consulting. In those roles, I worked a lot with Excel to manipulate, organize, and calculate metrics from hospital payroll data and cost reports. My days were full of VLOOKUPs, pivot tables, macros, PowerPoints, and looking through the CMS website.

It was a good experience overall, and I worked with great people. I got to travel and live the wild lifestyle of a Big 4 consultant. However, I wasn’t completely satisfied with the technical trajectory of that career path. I looked at my management and saw where I would be…


Tomokazu Harimoto vs. Koki Niwa ITTF-ATTU Asian Cup 2019


I’m a table tennis fan. I’ve been in the scene for a few years and I keep up with all the major players and tournaments.

I thought that a fun way to practice making dashboards in Google Data Studio would be to visualize a match I watched recently, with my favorite player Koki Niwa.

The match can be found here for reference.

Recording the Match


Using Linear Optimization in Python’s PuLP


During my MBA, we learned all about predictive modeling techniques using Excel and the University of Waikato’s free software, WEKA. We learned the foundational concepts but never ventured into the hard skills required for advanced calculation. After some time studying Python, I thought it would be fun to rework one of my linear optimization projects, which I originally did in Excel’s Solver. The goal of this article is to recreate the project in Python’s PuLP, share what I learn along the way, and compare Python’s results to Excel’s.

The real-world benefits and applications of linear optimization are endless. I would highly recommend…
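To give a feel for what a linear optimization model looks like in PuLP, here is a minimal sketch. It is not the project from this article: the product-mix problem, the variable names, and all of the numbers are made up for illustration, and it assumes PuLP is installed along with its bundled CBC solver.

```python
from pulp import LpMaximize, LpProblem, LpStatus, LpVariable, PULP_CBC_CMD, value

# Hypothetical toy problem (not the author's project): decide how many
# units of products A and B to make in order to maximize profit.
model = LpProblem("toy_product_mix", LpMaximize)

# Decision variables: units produced, which cannot be negative.
units_a = LpVariable("units_a", lowBound=0)
units_b = LpVariable("units_b", lowBound=0)

# Objective: made-up profit of 3 per unit of A and 2 per unit of B.
model += 3 * units_a + 2 * units_b, "total_profit"

# Made-up resource constraints.
model += units_a + units_b <= 4, "labor_hours"
model += units_a + 3 * units_b <= 6, "material"
model += units_a <= 3, "demand_for_a"

# Solve quietly with the CBC solver that ships with PuLP.
model.solve(PULP_CBC_CMD(msg=False))

status = LpStatus[model.status]
best_profit = value(model.objective)
plan = {v.name: v.varValue for v in model.variables()}

print(status, best_profit, plan)
```

The structure mirrors what you would set up in Excel's Solver: changing cells become `LpVariable` objects, the target cell becomes the objective, and each constraint row becomes one `model += ...` line.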


Visualizing Text by Frequency of Words — Economist Style

Image from Wikipedia

This is a simple exercise visualizing my latest issue of The Economist in a word cloud. I wasn’t in the mood to actually read the issue, so I thought this would be a fun way to digest the information quickly.

Table of Contents

  • PDF to Text Conversion
  • Text Preprocessing
  • Word Cloud

PDF to Text Conversion

The first step in this process was to convert the original PDF document to text. I did this using pdfminer.six (for Python 3).

The printed text looked like this, and it takes a long time to scroll through the whole thing. Too long to read for a busy data scientist like myself. …
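A word cloud is really just a picture of word counts, so the core idea can be sketched with the standard library alone. The snippet below is an illustration under my own assumptions, not the code from this article: the function name `word_frequencies`, the tiny stopword set, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list, only for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Lowercase the text, keep alphabetic words, and count the non-stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if word not in stopwords)


# Made-up sample text standing in for the extracted PDF text.
sample = "The economy is growing. The markets and the economy moved together."
freqs = word_frequencies(sample)
top_words = freqs.most_common(2)

print(top_words)
```

The most frequent words are the ones a word cloud would draw largest.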


An experienced Excel user’s perspective on SQL and why it’s worth learning


Introduction

I have been involved in the data analytics realm for about 3 years. I’ve worked in the field for over 2 years as a healthcare analyst, and I recently finished my MBA with a focus in data science.

During my master’s I was particularly interested in predictive modeling techniques using Python (and I still am). However, for basic organization, analysis, and reporting, I am hands-down most comfortable using Excel. Spending 60–80% of the day staring into an Excel spreadsheet is nothing foreign to me. The tabs, ribbons, groupings, and tools nested within Excel’s GUI are the instruments…


Exploratory Analysis, Bag of Words, Logistic Regression


Introduction

This project outlines a text-mining classification model using bag-of-words and logistic regression. We will attempt to understand the relationship between Uber text reviews and ride ratings. This is a great place to start if you’re relatively new to unstructured data analysis but have some experience with statistics and/or other classification methods.

Data Source: Kaggle by user Purvank

Table of Contents

  1. Preliminary Analysis
  2. Formatting / Converting Text
  3. Logistic Regression
  4. Testing / Conclusions
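As a rough preview of steps 2 and 3 in the outline above, here is a minimal sketch of bag-of-words plus logistic regression using scikit-learn. It is an assumption-laden illustration, not the code from this project: the six reviews and their labels are invented (they are not from the Kaggle dataset), and the choice of `CountVectorizer` with default settings is mine.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy reviews (NOT the Kaggle data); 1 = good rating, 0 = bad rating.
reviews = [
    "great ride friendly driver",
    "excellent service clean car",
    "great service friendly",
    "terrible ride rude driver",
    "awful service dirty car",
    "rude driver terrible service",
]
labels = [1, 1, 1, 0, 0, 0]

# CountVectorizer turns each review into a vector of word counts (the bag of words);
# LogisticRegression then learns one weight per word.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(reviews, labels)

# Score two unseen snippets.
predictions = model.predict(["great friendly", "terrible rude"])
print(predictions)
```

Word order is thrown away entirely, which is the defining simplification of bag-of-words: the model only sees how often each word appears.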

Preliminary Analysis

Import Data

First let’s bring in the data and visualize the dataframe:

# Import modules
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Jupyter magic: render plots inline in the notebook
%matplotlib inline


Everything you need to know about impurity measurements and information gain


Introduction:

I recently published a project comparing the performance of logistic regression and random forest. While working on the random forest component, I wanted to expand on measures of impurity and information gain, particularly Gini Index and Entropy. The topic is not as well known as some others in machine learning, but I think it adds valuable perspective for those wanting to understand the foundational mechanics of how decision trees work.

Table of Contents

  • Background
  • Gini Intuition
  • Entropy Intuition
  • Visualization using python
  • Information Gain Comparison
  • Practical Takeaways

Background:

Decision trees recursively split the data on feature values so that each resulting group is as pure as possible with respect to the target variable. The algorithm is designed to find the optimal point…
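To make the two impurity measures concrete, here is a small sketch using only the standard library. The function names and the toy yes/no labels are my own assumptions for illustration. It uses the standard definitions: Gini is 1 minus the sum of squared class proportions, entropy is the negative sum of p * log2(p), and information gain is the parent's impurity minus the size-weighted impurity of the child nodes.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the share of each class among the labels."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini(labels):
    """Gini impurity: 1 - sum(p^2). Zero means the node is pure."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: -sum(p * log2(p)). Zero means the node is pure."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Made-up example: a 50/50 parent node split into two mostly pure children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print(gini(parent), entropy(parent))
print(information_gain(parent, [left, right], impurity=gini))
print(information_gain(parent, [left, right], impurity=entropy))
```

A split is good when the children are purer than the parent, so a tree picks the split point with the largest gain under whichever measure it uses.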
