https://youtu.be/-6s2srZT4j0?t=0

https://youtu.be/-6s2srZT4j0?t=64

Applied Modeling

Discussion of Kaggle

A discussion of the project for January.

Regression vs. Classification:

Discrete, non-orderable values -> classification Discrete, orderable, low-cardinality -> regression or classification Discrete, orderable, hig-cardinality -> regression

Contiunous can be converted to discrete with compreison operators Discrete values can be grouped to produce binary classifications

Predicted probability: zone betweeen regression and classification

Conversion of cardinlity greater than 2 to binary

Drop rows that are missing values in a certain column Drop Rows

How is the target distributed Target distribution

Classification How many classes do I have?

• For classification problems:

`y.unique()` - returns list of unique values - [‘red’, ‘blue’, ‘purple’]

`y.nunique()` - gives count of unique values = 3

`y.value_counts()` - gives counts of values

`y.value_counts(normalize=True)` - gives counts as proportion of all

`y.value_counts(normalize=True).max()` - gives count as proportion of all of only the max

• Precision vs. Recall

Accuracy - percentage of correct predictions (of all predictions)

Precision - TP / (TP + FP)

Precision: ability of a classification model to return only relevant instances

Recall (also True Positive Rate) - TP / (TP + FN)

The ability of a classification model to identify all relevant instances

False Positive Rate - FP / (FP + TN)

F1 score: single metric that combines recall and precision using the harmonic mean

Regression vs. Classification

• Convert all strings to lower case

`df.['columnname'].str.lower()`

Goals:

Know how to make

Simple histogram Simple scatter Ctegorical vs numeric plot Cross tab

How to do template