https://youtu.be/-6s2srZT4j0?t=0
https://youtu.be/-6s2srZT4j0?t=64
https://www.youtube.com/watch?v=-6s2srZT4j0?t=1
Applied Modeling
Discussion of Kaggle
A discussion of the project for January.
Regression vs. Classification:
Discrete, non-orderable values -> classification
Discrete, orderable, low-cardinality -> regression or classification
Discrete, orderable, high-cardinality -> regression
Continuous values can be converted to discrete values with comparison operators
Discrete values can be grouped to produce binary classifications
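A minimal sketch of the comparison-operator conversion, assuming a hypothetical numeric column `price` and an arbitrary cutoff of 100 (neither comes from the lecture):

```python
import pandas as pd

# Hypothetical data; the column name and the cutoff are illustrative assumptions.
df = pd.DataFrame({"price": [40, 120, 99, 250]})

# A comparison operator turns a continuous column into a boolean (binary) target.
df["is_expensive"] = df["price"] > 100
```

The new column can then be used as a classification target instead of a regression target.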
Predicted probability: the zone between regression and classification
Conversion of a target with cardinality greater than 2 to binary
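One possible way to collapse a target with more than two values into a binary one is to pick the values that count as the positive group. The grouping below is a made-up example, not one given in the lecture:

```python
import pandas as pd

# Target with three distinct values (cardinality 3).
y = pd.Series(["red", "blue", "purple", "red", "blue"])

# Hypothetical grouping: treat "red" and "purple" as the positive class.
positive_values = ["red", "purple"]

# isin() returns True where the value belongs to the positive group.
y_binary = y.isin(positive_values)
```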
Drop rows: drop rows that are missing values in a certain column
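A short sketch of dropping rows by one column with `dropna(subset=...)`. The column names `target` and `feature` are placeholders chosen for illustration:

```python
import pandas as pd

# Hypothetical frame: one row is missing the target, another is missing a feature.
df = pd.DataFrame({
    "target": [1.0, None, 0.0],
    "feature": [None, 2.0, 3.0],
})

# Drop only the rows where "target" is missing; missing values
# in other columns are left alone.
df_clean = df.dropna(subset=["target"])
```

Here the row with a missing `feature` survives, because only `target` is checked.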
Target distribution: how is the target distributed?
Classification: how many classes do I have?
- For classification problems:
y.unique()
- returns an array of the unique values - ['red', 'blue', 'purple']
y.nunique()
- gives the count of unique values = 3
y.value_counts()
- gives the count of each value
y.value_counts(normalize=True)
- gives the counts as proportions of the total
y.value_counts(normalize=True).max()
- gives the proportion of the most frequent value only
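The calls above can be tried together on a small, made-up target. The values in `y` are invented for illustration:

```python
import pandas as pd

# Hypothetical classification target.
y = pd.Series(["red", "blue", "red", "purple", "red", "blue"])

classes = y.unique()                          # distinct values, in order of first appearance
n_classes = y.nunique()                       # number of distinct values
counts = y.value_counts()                     # count per value, most frequent first
proportions = y.value_counts(normalize=True)  # same counts as fractions of the total
majority_share = proportions.max()            # fraction taken by the most frequent class
```

The last number is a useful baseline: a model that always predicts the most frequent class would reach that accuracy on this data.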
- Precision vs. Recall
Accuracy - percentage of correct predictions (of all predictions)
Precision - TP / (TP + FP)
Precision: ability of a classification model to return only relevant instances
Recall (also True Positive Rate) - TP / (TP + FN)
Recall: ability of a classification model to identify all relevant instances
False Positive Rate - FP / (FP + TN)
F1 score: single metric that combines recall and precision using the harmonic mean - 2 * (Precision * Recall) / (Precision + Recall)
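The metric definitions above can be checked by hand on a confusion matrix. The four counts below are invented for the example:

```python
# Hypothetical confusion-matrix counts (not from the lecture).
tp, fp, fn, tn = 40, 10, 20, 30

accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct predictions / all predictions
precision = tp / (tp + fp)                   # of predicted positives, how many are real
recall = tp / (tp + fn)                      # of real positives, how many were found
false_positive_rate = fp / (fp + tn)         # of real negatives, how many were flagged

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
```

With these counts precision (0.8) is higher than recall (about 0.67), and F1 falls between the two, closer to the lower value.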
- Convert all strings to lower case
df['columnname'].str.lower()
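Note that `.str.lower()` returns a new Series rather than changing the column in place, so the result has to be assigned back. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical column with inconsistent capitalization.
df = pd.DataFrame({"columnname": ["Red", "BLUE", "purple"]})

# Assign the lower-cased Series back to the column to keep the change.
df["columnname"] = df["columnname"].str.lower()
```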
Goals:
Know how to make
Simple histogram
Simple scatter
Categorical vs. numeric plot
Cross tab
How to do template
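Of the goals above, the cross tab needs no plotting library. A minimal sketch with `pd.crosstab`, using two invented categorical columns `color` and `size`:

```python
import pandas as pd

# Hypothetical data with two categorical columns.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "blue", "red"],
    "size": ["S", "S", "L", "L", "S"],
})

# Rows are the values of "color", columns are the values of "size",
# and each cell counts the rows with that combination.
table = pd.crosstab(df["color"], df["size"])
```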