Recent Posts

Entry SM02: Clean Data

20 minute read

Wrangling data into a usable form is a big part of any real-world machine learning problem. When tackling these types of problems, several things are general...

Entry SM01: Using S3 from AWS’s SageMaker

10 minute read

There are a lot of considerations in moving from a local model used to train and predict on batch data to a production model. This series of posts explores h...

Entry NLP4: Frequencies and Comparison

26 minute read

In the previous entries in this series, I loaded all the files in a directory, processed the data, and transformed it into ngrams. Now it’s time to do math a...

Entry NLP2: Load All Files in a Directory

5 minute read

In the previous entry, I figured out how to process individual files, removing many of the items on the “Remove lines/characters” list specified in the homew...