December 30, 2018

Popover breakfast!

This morning we had homemade popovers for breakfast - they were great!! So great that I forgot to take a picture until after I finished one and had ripped open the second.

I’m not making any promises, but if you’re at our house in the winter months and you ask nicely, you too could enjoy this treat.

December 21, 2018

"...if you want to learn something, I can't stop you. If you don't...I cannot teach you."


As I was catching up on some podcasts after finals week, I listened to the episode of Freakonomics called "Where Does Creativity Come From? (And Why Do Schools Kill It Off)?", which had the following line from legendary trumpeter Wynton Marsalis: "...if you want to learn something, I can't stop you. If you don't want to learn it, I cannot teach you."  Whoa!  That is so true. I can't count the number of times that I have had students in my class who are there because they have to fulfill a science credit (for various reasons) and have very little interest in the physics I am trying to discuss.  I have tried for years to foster a classroom environment where learning can happen, but I sometimes forget that students have to WANT to learn what I am offering to teach.

Following my continuing philosophy to not hide anything in terms of pedagogy, learning, or teaching from my students, I plan to hang some printouts of these images I made and have them in the classroom as a reminder that the choice to engage in learning is solely up to the learner.

After hearing this episode, I thought for sure that some other teacher had discovered this great podcast episode and the Marsalis line before I did.  I did a quick search and the only post I could find was this one on Medium from Shaun Mosley.  I like how he tied the process of developing creativity and learning to the differences between extrinsic and intrinsic motivations. It is something I have certainly thought a lot about as I have planned my classes and made the shift to Standards-Based Assessment and Reporting.

To all the teachers out there: if you have a chance to listen to the podcast episode, I'd love to know what you think about it and what you are doing in your class to engage learners in creativity. Let me know!


Photo source/credit: Eric Delmar public domain image from Wikimedia Commons.
Images on this page are licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.


August 28, 2018

Some observations of doing a bit of data analysis with DBSCAN and pandas in a Jupyter notebook

Sorting Classifications for making graphs-VolunteerClassifications
Here is a Jupyter notebook I was using today to parse the classifications from the Steelpan Vibrations project. I'm leaving some of the notes here as a reminder to myself for the future. (I learned how to put the Jupyter notebook into the blog from this page.)

I really want to share this because, in all my reading on using DBSCAN for cluster analysis, I had a hard time finding any page that described how to pair the points identified in a cluster with the matching rows of the larger (original) data set. When I found the solution (see the link in the comments between the cells below) it was really obvious, but it was painful not even knowing what to google for.

Function to do the cluster identification with DBSCAN:
In [31]:
def dbscan(crds):
    bad_xy = []  #might need to change this
    X = np.array(crds)
    db = DBSCAN(eps=18, min_samples=3).fit(X)
    core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
    core_samples_mask[db.core_sample_indices_] = True
    labels = db.labels_
    
    n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
    unique_labels = set(labels)
    
    colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))
    
    for k, col in zip(unique_labels, colors):
        if k == -1:
            # Black used for noise.
            col = 'k'

        class_member_mask = (labels == k)
        
        # These are the definitely "good" xy values.
        xy = X[class_member_mask & core_samples_mask]
        plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=col,
                 markeredgecolor='k', markersize=14)
        #print("\n Good? xy = ",xy)
        #print("X = ",X)
        # These are the "bad" xy values. Note that some maybe-bad and maybe-good are included here.
        xy = X[class_member_mask & ~core_samples_mask]
        plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=col,
                 markeredgecolor='k', markersize=6)
        #print("\n Bad? xy = ",xy)
        bad_xy.append(xy)

    plt.title('Estimated number of clusters: %d' % n_clusters_)
    plt.xlim(0, 512)
    plt.ylim(0, 384)
    
    clusters = [X[labels == i] for i in range(n_clusters_)]
    #print(clusters)
    #print(db.labels_)
    
    return clusters, labels
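As a quick sanity check on what DBSCAN hands back, here is a minimal sketch on made-up points (the coordinates are invented for illustration and are not from the Steelpan Vibrations data). It reuses the same eps=18 and min_samples=3 as the function above and skips the plotting, so the only thing to look at is the labels array: one label per input point, in the same order, with -1 meaning noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up points (hypothetical data): two tight groups plus one isolated point.
pts = np.array([
    [100, 200], [105, 205], [110, 198], [102, 210],   # group A
    [300, 100], [305, 104], [298, 95],                # group B
    [450, 350],                                       # isolated point
])

db = DBSCAN(eps=18, min_samples=3).fit(pts)
labels = db.labels_   # one entry per row of pts, in the same order

# Same bookkeeping as in the function above: -1 is noise, not a cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(labels, n_clusters, n_noise)
```

The important part for later is that labels has exactly as many entries as there are input points.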
Import the classifications into a pandas DataFrame. I'm using header=None because there were no headings in the csv file:
In [32]:
import pandas as pd
df=pd.read_csv('averages-strike1.csv', sep=',',header=None)
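An alternative I could have used is to name the columns right at import time. This is only a sketch: it assumes the seven columns come in the order shown in the renamed DataFrame further down, the two rows are copied from that output, and io.StringIO stands in for the real csv file.

```python
import io
import pandas as pd

# Two rows copied from the DataFrame output shown later; StringIO is a stand-in for the file.
csv_text = (
    "107.716469,213.009577,06240907_proc_00254.png,1.000000,85.034929,67.943204,-47.505782\n"
    "114.698967,213.766703,06240907_proc_00258.png,1.333333,67.924027,67.389913,-51.659952\n"
)

# Assumed column order, taken from the renamed DataFrame.
names = ["x", "y", "filename", "fringe", "rx", "ry", "angle"]

df_named = pd.read_csv(io.StringIO(csv_text), header=None, names=names)
print(df_named.columns.tolist())
```

With names in place, the loop could use row.x and row.fringe instead of positional indices like centers[0] and centers[3].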
This is the main part of the code that ends up calling the dbscan function at the end:
In [34]:
from matplotlib.patches import Ellipse
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as col
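# The colormap belongs to ScalarMappable; the third positional argument of Normalize is `clip`.
cmap_1 = cm.ScalarMappable(norm=col.Normalize(1, 11), cmap=cm.gist_rainbow)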
import numpy as np
from sklearn.cluster import DBSCAN

x_val = []
y_val = []
frng = []
crds = []
ell = []

for centers in df.values:
    x_val.append(centers[0])
    y_val.append(centers[1])
    frng.append(centers[3])
    crds.append([centers[0], centers[1]])
    ell.append(Ellipse(xy=[centers[0], centers[1]], width=centers[4], height=centers[5], angle=centers[6]))

# Build the dict once, after the loop has filled the lists.
centers_raw = {'XVal': x_val,
               'YVal': y_val,
               'Fringe': frng}

centers_df = pd.DataFrame(centers_raw, columns=['XVal', 'YVal', 'Fringe'])
plt.figure(0)
plt.scatter(centers_df.XVal, centers_df.YVal, s=20, c=cmap_1.to_rgba(centers_df.Fringe), alpha=.6)
plt.xlim(0, 512)
plt.ylim(0, 384)
#plt.title('Subject id = %s'%(coords_x[0][2]))
plt.show()
#print(crds)
plt.figure(1)
clusters, labels = dbscan(crds)
Check the DataFrame once, and then check it again after renaming the columns:
In [30]:
df[:15]
Out[30]:
             x           y                 filename     fringe          rx          ry       angle  cluster
 0  107.716469  213.009577  06240907_proc_00254.png   1.000000   85.034929   67.943204  -47.505782        0
 1  114.698967  213.766703  06240907_proc_00258.png   1.333333   67.924027   67.389913  -51.659952        0
 2  111.190662  218.375451  06240907_proc_00270.png   0.714286   67.455082   57.088226  -63.335567        0
 3  113.800339  223.653310  06240907_proc_00276.png   8.333333   86.160744   73.501320  -73.822837        0
 4   88.625250  218.599081  06240907_proc_00279.png   7.200000  119.292404  107.265178  -76.700412        0
 5   81.290269  220.570363  06240907_proc_00281.png   7.333333  115.024131  109.400213  -91.981419        0
 6   81.476925  215.762886  06240907_proc_00282.png   6.166667  115.916690  111.225947  -51.426068        0
 7   72.502562  219.822452  06240907_proc_00292.png   7.200000  115.302500  108.964856  -54.631973        0
 8   71.396729  213.876289  06240907_proc_00295.png   7.000000  132.873660  114.236231  -88.764995        0
 9   73.012500  206.005209  06240907_proc_00299.png  10.000000  116.456652  113.427691  -82.312357        0
10   62.431250  206.850000  06240907_proc_00301.png  10.000000  104.117715   88.929126   -2.347311        0
11  141.296875  252.166667  06240907_proc_00301.png   3.666667   55.919208   29.365025   62.916449       -1
12   71.331521  212.055188  06240907_proc_00306.png   8.166667  122.378310   99.126123  -52.857932        0
13   71.714899  208.812385  06240907_proc_00307.png   8.666667  107.007787   98.573020   11.509674        0
14  286.998737  170.834790  06240907_proc_00307.png   1.200000   34.312887   32.881617   -0.016536        1
In [7]:
labels
Out[7]:
array([0, 0, 0, ..., 0, 1, 3])
These next two lines are the magic that connects the clusters identified by DBSCAN with the original classifications so that we can plot the fringe measurements for each cluster over time.
I finally figured this out by reading the question posted here: https://datascience.stackexchange.com/questions/29587/python-clustering-and-labels
In [8]:
cluster=pd.Series(labels)
df["cluster"] = cluster
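The reason this works is that DBSCAN returns one label per input row, in the same order the rows went in, so the labels line up with the DataFrame row by row. Here is a minimal sketch of that idea on a made-up DataFrame (the values and the name toy are invented for illustration). One caveat I'm fairly sure of: wrapping the labels in pd.Series aligns on the index, so it relies on the DataFrame having the default 0, 1, 2, ... index; assigning the array directly, as below, is purely positional.

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# Made-up classifications (hypothetical values): two groups and one stray point.
toy = pd.DataFrame({
    "x": [100, 105, 110, 300, 305, 298, 450],
    "y": [200, 205, 198, 100, 104, 95, 350],
    "fringe": [1.0, 1.3, 0.7, 8.3, 7.2, 7.3, 3.7],
})

# Cluster on the coordinates only; labels come back in row order.
labels = DBSCAN(eps=18, min_samples=3).fit(toy[["x", "y"]].to_numpy()).labels_

# Positional assignment: row i gets labels[i].
toy["cluster"] = labels

# Every other column is now available per cluster (noise, labeled -1, left out).
mean_fringe = toy[toy["cluster"] != -1].groupby("cluster")["fringe"].mean()
print(toy)
print(mean_fringe)
```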
Rename the DataFrame columns:
In [10]:
df = df.rename(index=str, columns={0: "x", 1: "y", 2: "filename", 3: "fringe", 4: "rx", 5: "ry", 6: "angle"})
Assign each cluster its own variable:
In [27]:
cluster0 = df[df['cluster']==0]
cluster1 = df[df['cluster']==1]
cluster2 = df[df['cluster']==2]
cluster3 = df[df['cluster']==3]
cluster4 = df[df['cluster']==4]
cluster5 = df[df['cluster']==5]
cluster6 = df[df['cluster']==6]
cluster7 = df[df['cluster']==7]
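Since the number of clusters changes whenever eps or min_samples changes, hardcoding cluster0 through cluster7 is a bit fragile. A possible alternative, sketched here on a made-up DataFrame (toy and clusters_by_label are names I invented for this example), is to let groupby build a dict keyed by the cluster label.

```python
import pandas as pd

# Made-up data (hypothetical values) with a cluster column already attached.
toy = pd.DataFrame({
    "fringe": [1.0, 1.3, 8.3, 7.2, 3.7, 0.7],
    "cluster": [0, 0, 1, 1, -1, 0],
})

# One sub-DataFrame per cluster label, skipping the noise label -1.
clusters_by_label = {
    label: group
    for label, group in toy[toy["cluster"] != -1].groupby("cluster")
}

print(sorted(clusters_by_label))
print(clusters_by_label[0])
```

Then clusters_by_label[0] plays the role of cluster0, and nothing breaks if a run finds three clusters or thirty.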
Make plots!!!
In [29]:
plt.scatter(cluster0.index, cluster0.fringe)
plt.show()
In [36]:
plt.scatter(cluster1.index, cluster1.fringe)
plt.show()
In [37]:
plt.scatter(cluster2.index, cluster2.fringe)
plt.show()
In [38]:
plt.scatter(cluster3.index, cluster3.fringe)
plt.show()
In [39]:
plt.scatter(cluster4.index, cluster4.fringe)
plt.show()
In [43]:
plt.scatter(cluster5.index, cluster5.fringe)
plt.show()
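Repeating the same two lines for every cluster gets old fast, so here is a sketch of doing it in one loop with one subplot per cluster. The DataFrame is made up for illustration, and the Agg backend line is only there so the sketch runs outside a notebook; in Jupyter I would drop it and call plt.show() at the end.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data (hypothetical values) with a cluster column already attached.
toy = pd.DataFrame({
    "fringe": [1.0, 1.3, 8.3, 7.2, 3.7, 0.7, 7.9],
    "cluster": [0, 0, 1, 1, -1, 0, 1],
})

# Every cluster label except noise (-1), in order.
labels_to_plot = sorted(label for label in toy["cluster"].unique() if label != -1)

# One row of axes per cluster; squeeze=False keeps axes 2-D even for one cluster.
fig, axes = plt.subplots(len(labels_to_plot), 1, sharex=True, squeeze=False)
for ax, label in zip(axes[:, 0], labels_to_plot):
    members = toy[toy["cluster"] == label]
    ax.scatter(members.index, members["fringe"])
    ax.set_title("cluster %d" % label)
    ax.set_ylabel("fringe")
axes[-1, 0].set_xlabel("row index")
fig.tight_layout()
```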