Lecture 1

The objectives of this lecture are to:

  1. Introduce the fundamental package for scientific computing with Python -- NumPy.
  2. Creating NumPy n-dimensional arrays.
  3. Binary and unary operations on arrays.

Much of the lecture is based on this tutorial: http://wiki.scipy.org/Tentative_NumPy_Tutorial

Scientific Computing With Python using NumPy

The NumPy library provides the necessary functionality to use Python as a powerful scientific computing language. From NumPy's website, it provides,

  1. a powerful N-dimensional array object
  2. sophisticated (broadcasting) functions
  3. tools for integrating C/C++ and Fortran code
  4. useful linear algebra, discrete Fourier transforms, and random number capabilities

In this lecture we will learn about items 1-2 and how to apply them to manipulation of numeric n-dimensional arrays. Note that these are not matrices, but instead a more general type of data that contains many scalar values which may be "addressed", as in a matrix.

We may form n-dimensional arrays, which have type ndarray, for any type of numeric data (int, float, complex). By default, NumPy will assume that the datatype is float unless otherwise specified.

Let's create a 2-dimensional array by importing NumPy and using its equivalent function to Python's built-in range() function, arange(),

In [1]:
# import all of NumPy's objects into the interpeter-level namespace
from numpy import *

# create an array of float values from 0->14
a = arange(15)
print(a)

# "reshape" the array to be 2-dimensional, 3 rows and 5 columns, the number of elements are conserved!
a = a.reshape(3, 5)
print(a)
print(a[0,2]) # "index" into the array to display a single item
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
2

Does the method in which NumPy displays the array make sense? It looks like a list of lists where the first index (row) indexes into the top-level list and the second indexes into the sub-list. It should be clear how this could be generalized into higher-dimensional arrays.

Before we go into more detail, let's explore the attributes of the ndarray object,

In [2]:
type(a)
Out[2]:
numpy.ndarray
In [3]:
# the array "shape" is a tuple which specifies the size of each dimension
a.shape
Out[3]:
(3, 5)
In [4]:
# the array "ndim" is an integer which specifies the dimensionality of the array
a.ndim
Out[4]:
2
In [5]:
# the array "dtype" is an data-type object which specifies type of data which the array stores
a.dtype
Out[5]:
dtype('int64')
In [6]:
# the array "size" is an integer which specifies the total number of items in the array
a.size
Out[6]:
15

These are the array attributes you will find most useful, but for an full listing see the docstring for ndarray.

Creation and manipulation of NumPy n-dimensional arrays

There are several approaches to creating an ndarray, the most basic of which is using the array() function,

In [7]:
# a 1x4 array 
a = array([1, 2, 3, 4])
print(a)
[1 2 3 4]

As you can see, we are using the same list syntax as we saw in the previous example. A single list of values will create a one-dimensional array of those values.

To create higher-dimensional arrays, we can nest lists,

In [8]:
# a 3x4 array
a = array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(a)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

The array() function takes as input nest sequences to create arrays of the same dimension of the degree of nesting. In fact, we do not need to use lists, we can use tuples as well

In [9]:
b = array( [ (1.5,2,3), (4,5,6) ] )
b
Out[9]:
array([[ 1.5,  2. ,  3. ],
       [ 4. ,  5. ,  6. ]])

Note what happens when I do not print() the array, the object is still displayed in an understandable format. In this case the interpreter converted all integers into floats,

The type of the array can also be explicitly specified at creation time:

In [10]:
c = array( [ [1,2], [3,4] ], dtype=complex )
c
Out[10]:
array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])

The function zeros() creates an array full of zeros, the function ones() creates an array full of ones, and the function empty() creates an array whose initial content is random and depends on the previous state of the memory that was allocated. Why might you want to use empty() instead of ones() or zeros()?

By default, the dtype of the created array is float64.

In [11]:
# note that zeros() and ones() input argument is a tuple which specifies the shape of the output
# array

# create a 3x4 array of zeros using the default datatype (float)
zeros( (3,4) )
Out[11]:
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
In [ ]:
# create a 2x3x4 array of ones using the int16 datatype (16-bit integer)
ones( (3,3,4), dtype=int16 )
In [12]:
# create a 2x3 array that is uninitialized
empty( (2,3) )
Out[12]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

To create arrays of sequences of numbers, NumPy provides a function analogous to range() that returns arrays instead of lists,

In [13]:
# create an array of the sequence of numbers starting with 10 and ending with 30 
# (but not including) incremented by 5.
arange( 10, 30, 5 )
Out[13]:
array([10, 15, 20, 25])
In [14]:
# what happens when one or more of the arguments is a float?
arange( 10, 30, 5.)
Out[14]:
array([ 10.,  15.,  20.,  25.])
In [ ]:
# thus a float may be used as any argument!
arange( 0, 2, 0.3 )

When arange() is used with floating point arguments, it is not always possible to predict the length of the resulting array due to floating point precision. For this reason, NumPy provides linspace() and logspace() that instead require the number of elements instead of the increment,

In [ ]:
# Unlike arange(), linspace() outputs arrays with datatype float by default which is more
# natural for scientific computing
linspace( 0, 2, 10)
In [ ]:
# linspace() is very useful for plotting functions, as we will see in the last lecture
x = linspace( 0, 2*pi, 10 )
sin(x)
In [ ]:
# logspace() outputs a logarithmic scale, by default the base is 10, but any value can be
# used, this functionality will be very useful later in your coursework!

# the first two arguments are the power to which the base is raised, not the actual value
# in the case, the output will be a logarithmic scale sequence starting at 10**0 and ending
# (but not including) 10**3 with 10 points.
logspace(0, 3, 10)

Printing Arrays

Printing small arrays is a relatively simple task, even if they are high dimensional. By default, NumPy decomposes arrays of dimension more than 2 into a two-dimensional representation.

In [ ]:
# a 2D array example, arange only outputs 1D arrays so we must reshape()
b = arange(12)
b = b.reshape(4, 3)
print(b)
In [ ]:
# a 3D array example using a more compact way to immediately reshape
# the 1D array returned by arange()
c = arange(24).reshape(2,3,4)
print(c)

Frequently we will be using arrays of large size. In these cases printing the array is infeasible and NumPy automatically skips the central part of the array. We may still inspect large sets of data, but to do this we will need plotting and visualization tools.

In [ ]:
# a large 1D array
print(arange(10000))
In [ ]:
# a large 2D array
print(arange(10000).reshape(100,100))

Operations - Binary Operations and "Broadcasting"

Now that we have a basic understanding of creating NumPy arrays, we can now start using arithmetic operators on them. NumPy applies arithmetic operators on arrays elementwise and (typically) creates a new array initialized with the resulting values.

In [ ]:
a = array( [20,30,40,50] )
b = arange( 4 )

print(a, b)

# this operation will be successful even though a and b have different shapes
c = a - b
c
In [ ]:
# raising an array to a power results in a new array with each element raised to a power
a**2
In [ ]:
# passing an array to the sin() function returns a new array of the values of the sin() applied
# to each element of a
sin(a)
In [ ]:
# here is an interesting case, now the comparison is applied to each element of `a` and a new
# array is created with the results of these comparisons!
a < 35

When you add, substract, multiply, or divide arrays of the same shape these operations are performed elementwise. Clearly we need some additional functionality if we want to perform matrix algebra using NumPy.

In [ ]:
A = array( [[1,1],
            [0,1]] )
B = array( [[2,-1],
            [3,4]] )
print(A + B)
print(A - B)
print(A * B)
print(A / B)

When arrays have different shapes, things get a bit more complicated. NumPy tries to "broadcast" the operation elementwise, where elements can be sub-arrays. This can cause quite a bit of confusion; let's explore this behaviour using examples.

In [ ]:
# A and B have different shapes, so NumPy tries to find sequences of sub-arrays of the smaller 
# array in the larger array.
A = array( [[1, 1, 1],
            [1, 0, 1]] )
B = array( [2, 0, 1] )

# `A`  is the larger array, thus NumPy determines if there are sub-arrays in `A` with the same
# shape as `B` and then applies the subtraction to these subarrays!

# The array resulting from these types of arithmetic operations has the shape of the *larger*
# array
print(A - B)
print(B - A)

Broadcasting is not always successful,

In [ ]:
A = array( [[1, 1, 1],
            [1, 0, 1]] )
B = array( [2, 0] )

A - B

Can you formulate the rule that NumPy uses to determine if arithmetic operations broadcast or not? Use the error from the previous evaluation to help your thought process. In the successful evaluation the shapes of the arrays were,

A.shape = (2,3)
B.shape = (3,)

and for the unsuccessful evaluation,

A.shape = (2,3)
B.shape = (2,)

What is the difference between the two sets of arrays? In the first case, NumPy evaluates the arithmetic operation elementwise where the element is a 1D array of size 3! Thus NumPy "sees" two elements of size B in array A.

In the second case, NumPy cannot find sub-arrays of A of the same shape as B. Thus you might have been able to deduct the general rule that NumPy uses for broadcasting arithmetic operations:

1) Compare the dimensions of A and B,

A.dim = a
B.dim = b

2) Assuming a > b, compare the last b values of the shapes of A and B, if they are equal the arithmetic operation broadcasts, else it does not.

Let's try a few examples,

In [ ]:
# An arithmetic operation should broadcast for arrays of shape (a,b,c) and (b,c)
A = arange(12).reshape(2,3,2)
B = arange(6).reshape(3,2)

print(A)
print("----------")
print(B)
In [ ]:
print(A + B)
print("-"*8) # a more compact way of printing a divider!
print(A * B)

Operations - Unary Operations

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class,

In [ ]:
A = arange(6).reshape((2,3))
A
In [ ]:
A.sum()
In [ ]:
A.min()
In [ ]:
A.max()

By default, these unary operations are evaluated regardless of the shape of the array, treating it like a collection of numbers. However, by specifying the axis parameter you may apply the operation over each a sub-array element resulting from treating the values over each element in the axis specified,

In [ ]:
B = arange(12).reshape(3,4)
B
In [ ]:
# sum over the first axis (3, 4)->(4,)
B.sum(axis=0)
In [ ]:
# sum over the second axis (3, 4)->(3,)
B.sum(axis=1)

In addition to unary functions that are methods of NumPy arrays, many other more general functions exist that evaluate frequently used trigonometric and other special functions. Once again, these are evaluated elementwise regardless of the shape of the array,

In [ ]:
B = arange(3)
print(B)
exp(B)
In [ ]:
sqrt(B)