This shows you the differences between two versions of the page.
cs401r_w2016:lab1 [2016/01/03 00:04] admin |
cs401r_w2016:lab1 [2021/06/30 23:42] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====Objective:==== | ||
- | |||
- | Get started with anaconda, python, ipython notebooks, and pandas. Begin producing simple visualizations of data and images. | ||
- | |||
- | ---- | ||
- | ====Deliverable:==== | ||
- | |||
- | For this lab, you will submit an ipython notebook. This notebook will have two parts: | ||
- | |||
- | **Part 1:** Your notebook should generate a random image. We will run this | ||
- | notebook 5 times; it should generate 5 different, moderately complex | ||
- | images. Each image should be 512 x 288. Have fun with it! | ||
- | |||
- | The resulting image could, for example, look like this: | ||
- | |||
- | {{:cs401r_w2016:lab1.png?nolink|}} | ||
- | |||
- | **Part 2:** Your notebook should use the pandas library to read in the Rossmann store sales data (a CSV dataset) and plot the sales of store #1. Your plot should look something like this: | ||
- | |||
- | {{:cs401r_w2016:lab1_storesales.png?direct&700|}} | ||
- | |||
- | Done correctly, this should only take a few lines of code. | ||
- | |||
- | ---- | ||
- | ====Description:==== | ||
- | |||
- | Throughout this class, we will be using a combination of ipython | ||
- | notebooks and the anaconda python distribution. For this lab, you | ||
- | must install anaconda and write a simple python program (using | ||
- | ipython notebooks). As described above, the notebook should do two things: | ||
- | 1) generate simple random images, and 2) plot some data using pandas. | ||
- | |||
- | For part 1, you can generate any sort of random image that you want -- consider | ||
- | random lines, random curves, random text, etc. Each time the program | ||
- | is run, it should generate a different random image. Your image | ||
- | should have at least 50 random elements (they can all be the same | ||
- | type, such as random lines, and can be created in a loop). We won't | ||
- | count the number of elements; this is just to encourage you to create | ||
- | random images with moderate complexity. | ||
- | |||
- | In preparation for future labs, we strongly encourage you to use the | ||
- | [[http://cairographics.org/|cairo]] package as part of your image generator. | ||
- | |||
- | For part 2, the data you should use is downloadable here: | ||
- | |||
- | [[http://hatch.cs.byu.edu/courses/stat_ml/store_train.csv|Rossmann store sales data]] | ||
- | |||
- | ---- | ||
- | ====Installing anaconda:==== | ||
- | |||
- | http://docs.continuum.io/anaconda/install | ||
- | |||
- | To generate images, check out PIL and cairo: | ||
- | |||
- | ''conda install cairo'' | ||
- | |||
- | To generate random numbers, check out the [[http://docs.scipy.org/doc/numpy-1.10.0/reference/routines.random.html|numpy.random]] module. | ||
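As a minimal sketch of how ''numpy.random'' could supply the random values for an image, the block below draws coordinates, colors, and line widths for 50 elements. The variable names and the choice of 50 are illustrative assumptions; no seed is set, so each run produces different values.

```python
import numpy

WIDTH = 512
HEIGHT = 288
COUNT = 50  # illustrative: one value per random element

# No seed is set on purpose: every run should give different numbers.
xs = numpy.random.uniform(0, WIDTH, size=COUNT)    # floats in [0, WIDTH)
ys = numpy.random.uniform(0, HEIGHT, size=COUNT)   # floats in [0, HEIGHT)
colors = numpy.random.rand(COUNT, 3)               # RGB triples in [0, 1)
widths = numpy.random.randint(1, 6, size=COUNT)    # integers 1..5 (upper bound excluded)
```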
- | |||
- | To create a new notebook, run: | ||
- | |||
- | ''jupyter-notebook'' | ||
- | |||
- | This should start an ipython kernel in the background, set up a | ||
- | webserver, and point your browser to the webserver. In the | ||
- | upper-right corner, you will see a "new" menu; under that menu you | ||
- | should see "Notebook" and "Python 2". Selecting "Python 2" will create a new | ||
- | notebook. | ||
- | |||
- | Here's some starter code to help you generate an image. The ''nbimage'' function will display the image inline in the notebook: | ||
- | |||
- | <code python> | ||
- | import cairo | ||
- | import numpy | ||
- | |||
- | # A simple function to display an image in an ipython notebook | ||
- | def nbimage( data ): | ||
-A |     from io import BytesIO  # available in both Python 2.6+ and Python 3 | ||
- |     from IPython.display import display, Image | ||
- |     from PIL.Image import fromarray | ||
- | |||
- |     # encode the array as a PNG in memory, then display the PNG bytes | ||
- |     s = BytesIO() | ||
- |     fromarray( data ).save( s, 'png' ) | ||
- |     display( Image( s.getvalue() ) ) | ||
- | |||
- | WIDTH = 512 | ||
- | HEIGHT = 288 | ||
- | |||
- | # this is a numpy buffer to hold the image data | ||
- | data = numpy.zeros( (HEIGHT,WIDTH,4), dtype=numpy.uint8 ) | ||
- | |||
- | # this creates a cairo context based on the numpy buffer | ||
- | ims = cairo.ImageSurface.create_for_data( data, cairo.FORMAT_ARGB32, WIDTH, HEIGHT ) | ||
- | cr = cairo.Context( ims ) | ||
- | |||
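- | # draw a line; cairo's (1, 0, 0) is red, but nbimage displays it as blue because | ||
- | # ARGB32 bytes are stored B,G,R,A on little-endian machines and PIL reads them as R,G,B,A | ||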
- | cr.set_source_rgba( 1.0, 0.0, 0.0, 1.0 ) | ||
- | cr.set_line_width( 2.0 ) | ||
- | cr.move_to( 0.0, 0.0 ) | ||
- | cr.line_to( 100.0, 100.0 ) | ||
- | cr.stroke() | ||
- | |||
- | # display the image | ||
- | nbimage( data ) | ||
- | </code> | ||
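One possible way to extend the starter code toward the "at least 50 random elements" goal is sketched below. The helper name ''draw_random_lines'' and its parameters are hypothetical; it only uses the cairo context methods already shown in the starter code and expects an existing context such as ''cr''.

```python
import numpy

def draw_random_lines(cr, width, height, count=50):
    """Stroke `count` random line segments on an existing cairo context `cr`.

    Hypothetical helper: `cr` is assumed to be a cairo.Context like the one
    created in the starter code.
    """
    for _ in range(count):
        # random color, fully opaque
        r, g, b = numpy.random.rand(3)
        cr.set_source_rgba(r, g, b, 1.0)
        # random line width between 1 and 5 pixels
        cr.set_line_width(numpy.random.uniform(1.0, 5.0))
        # random start and end points inside the image
        cr.move_to(numpy.random.uniform(0, width), numpy.random.uniform(0, height))
        cr.line_to(numpy.random.uniform(0, width), numpy.random.uniform(0, height))
        cr.stroke()
```

With the starter code's variables in scope, calling ''draw_random_lines( cr, WIDTH, HEIGHT )'' followed by ''nbimage( data )'' should show a different image on each run, since no random seed is fixed.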
- | |||
- | ---- | ||
- | ====Using Pandas:==== | ||
- | |||
- | For the second part of this lab, you will need to understand the ''pandas'' python package, just a little bit. For this lab, you only need to know how to select some data from a CSV file. | ||
- | |||
- | You should read through this tutorial and play with it. | ||
- | |||
- | [[http://synesthesiam.com/posts/an-introduction-to-pandas.html|Tutorial on using Pandas]] | ||
- | |||
- | For this lab, you need to select the data for store #1 and plot it. | ||
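A minimal sketch of the selection step follows. It assumes the CSV has columns named ''Store'', ''Date'', and ''Sales''; check the header of the downloaded file, since these names are an assumption here. The inline rows are made-up toy values used only so the example runs without the real file.

```python
import io
import pandas

# Toy stand-in for the real CSV; the column names and values are assumptions.
csv_text = u"""Store,Date,Sales
1,2015-01-01,100
2,2015-01-01,200
1,2015-01-02,110
2,2015-01-02,210
1,2015-01-03,120
"""

# For the lab, pass the path of the downloaded file to read_csv instead.
df = pandas.read_csv(io.StringIO(csv_text))

# Boolean indexing: keep only the rows whose Store column equals 1.
store1 = df[df["Store"] == 1]
```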
- | |||
- | If you want to get fancy, you should label the x-axis using dates (see the example image). This involves a bit of python trickery, but check out some helpful functions in the hints below. | ||
- | |||
- | ---- | ||
- | ====Hints:==== | ||
- | |||
- | The following python functions might be helpful: | ||
- | |||
- | <code python> | ||
- | |||
- | import matplotlib.pyplot as plt | ||
- | plt.plot_date | ||
- | |||
- | pandas.to_datetime | ||
- | |||
- | plt.legend | ||
- | plt.xlabel | ||
- | plt.ylabel | ||
- | |||
- | plt.tight_layout | ||
- | |||
- | </code> | ||
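The functions listed above might be combined roughly as in the sketch below. The data frame is a made-up toy example with assumed column names ''Date'' and ''Sales'', and the sketch uses ''plt.plot'', which also accepts datetime values, in place of ''plt.plot_date''.

```python
import matplotlib.pyplot as plt
import pandas

# Toy data standing in for the store #1 rows; names and values are assumptions.
store1 = pandas.DataFrame({
    "Date": ["2015-01-01", "2015-01-02", "2015-01-03"],
    "Sales": [100, 110, 120],
})

# Convert the date strings to datetime values so the x-axis is labeled with dates.
dates = pandas.to_datetime(store1["Date"])

plt.plot(dates, store1["Sales"], label="Store 1")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.legend()
plt.tight_layout()  # keep the axis labels from being clipped
```

In a notebook, the figure is typically shown inline once the cell finishes; outside a notebook, ''plt.show()'' or ''plt.savefig'' would be needed.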
- | |||