5 Exercises
5.1 Exercise on interactive debugging
Let’s try out interactive debugging with a simple machine learning problem: using k-means clustering on the famous iris dataset.
In this exercise there are four bugs hidden in the code. Try to use the debugging tools presented in the debugging chapter to spot them. The bugs should appear in order, so once you fix one, you can continue to the next.
- This example is available here: exercise-k-means.R.
- Hints for each of the problems are available here: exercise-k-means-hints.txt.
- Solution is available here: exercise-k-means-solution.R.
normalize_standard <- function(dataset) {
  # Normalizes each column in the dataset to standard normalization
  # https://en.wikipedia.org/wiki/Standard_score
  normalized_data <- data.frame()
  for (column in colnames(dataset)) {
    data_mean <- mean(dataset[[column]])
    data_sd <- sd(dataset[column])
    normalized_data[[column]] <- ((dataset[[column]] - data_mean))/data_sd
  }
  return(normalized_data)
}
kmeans_iris <- function() {
  # Load data
  data(iris)
  # Get a copy of the iris dataset
  iris_data <- iris
  # Normalize data
  iris_normalized <- normalize_standard(iris_data)
  # Take species out
  species <- iris_data$Species
  iris_data$Species <- NULL
  # Calculate kmeans clustering
  # https://en.wikipedia.org/wiki/K-means_clustering
  clustering <- kmeans(iris_normalized, 3)
  # Calculate confusion matrix
  # https://en.wikipedia.org/wiki/Confusion_matrix#Confusion_matrices_with_more_than_two_categories
  table(species, clustering$clusters)
}
print(kmeans_iris())
## Error in is.data.frame(x): (list) object cannot be coerced to type 'double'
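If you are unsure how to begin, the following is a minimal sketch of one possible interactive workflow, assuming the debugging chapter covered base R tools such as `debug()` and `browser()`. It uses a hypothetical toy function, `scale_column`, that is not part of the exercise, so it does not reveal any of the hidden bugs. The interactive part is wrapped in `if (interactive())` so the script also runs cleanly with `Rscript`.

```r
# Hypothetical toy function (not part of the exercise):
# centers a numeric vector and scales it by its standard deviation.
scale_column <- function(x) {
  centered <- x - mean(x)
  centered / sd(x)
}

# Normal call, works for numeric input
scaled <- scale_column(c(1, 2, 3))
print(scaled)

if (interactive()) {
  # Flag the function so that its next call opens the step-through debugger
  debug(scale_column)
  # Inside the debugger: `n` runs the next line, `c` continues, `Q` quits;
  # typing a variable name (e.g. `centered`) prints its current value
  scale_column(c(1, 2, 3))
  # Remove the flag once you are done
  undebug(scale_column)
}
```

The same pattern could be applied to `normalize_standard` or `kmeans_iris`: flag the function, rerun the failing call, and inspect the intermediate variables one line at a time. Setting `options(error = recover)` is an alternative that opens the debugger in the failing frame whenever an error is raised.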
5.2 Exercise on non-interactive debugging
Let’s try non-interactive debugging with another simple problem: calculating survivability among Titanic passengers based on different features.
In this exercise there are two bugs hidden in the code. One of the bugs is a traditional error-raising bug, but the other one is a functional one: the code runs but produces wrong results. Try to use the debugging tools from the non-interactive debugging section to find the bugs.
- This example is available here: exercise-titanic-survivability.R.
- Hints for each of the problems are available here: exercise-k-means-hints.txt.
- Solution is available here: exercise-titanic-survivability-solution.R.
# Load titanic survivability data
data(Titanic)
titanic <- as.data.frame(Titanic)
check_survivability <- function(feature) {
  # Function for checking survivability per feature
  # Get the different feature levels
  levels <- levels(as.factor(titanic[[feature]]))
  # Create output data frame
  titanic_survivability <- data.frame(row.names=levels)
  # Calculate survivability per level
  for (level in levels) {
    level_data <- titanic[Titanic[[feature]] == level, ]
    n_survived <- sum(level_data[level_data$Survived == 'Yes', 'Freq'])
    n_died <- sum(level_data[level_data$Survived == 'No', 'Freq'])
    titanic_survivability[levels, 'survivability'] <- n_survived / (n_survived + n_died)
  }
  return(titanic_survivability)
}
# Features to iterate over
different_features <- c('Class', 'Sex', 'Age')
# Analyze survivability per feature
for (feature in different_features) {
  cat(paste('Feature to check:', feature, '\n'))
  print(check_survivability(feature))
}
## Feature to check: Class
## Error in Titanic[[feature]]: subscript out of bounds
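As a starting point, here is a minimal sketch of two non-interactive techniques, assuming the non-interactive debugging section covered logging and error handling in base R. It uses a hypothetical helper, `safe_ratio`, that is not part of the exercise. The sketch logs inputs with `message()`, catches errors with `tryCatch()` so the script keeps running, and uses `stopifnot()` as a sanity check, which is the kind of assertion that can expose a functional bug that raises no error.

```r
# Hypothetical helper (not part of the exercise): share of `a` in `a + b`
safe_ratio <- function(a, b) {
  # Print-style debugging: log the inputs to stderr
  message("safe_ratio called with a = ", a, ", b = ", b)
  # Fail early with a clear error if the inputs are not numeric
  stopifnot(is.numeric(a), is.numeric(b))
  a / (a + b)
}

# One valid input and one deliberately invalid input
inputs <- list(ok = list(3, 1), bad = list("3", 1))

results <- lapply(inputs, function(args) {
  tryCatch(
    safe_ratio(args[[1]], args[[2]]),
    error = function(e) {
      # Log the error message and return NA instead of stopping the script
      message("Caught error: ", conditionMessage(e))
      NA_real_
    }
  )
})

# Sanity check on the result: a proportion must lie between 0 and 1
stopifnot(results$ok >= 0, results$ok <= 1)
print(results)
```

In the exercise, similar log lines and sanity checks inside `check_survivability` could show both where the error is raised and whether the values written to the output data frame are plausible.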