The objectives of this lecture are to:

- Learn how to perform matrix operations with
`NumPy`

. - File input/output for numerical data.

*Much of the lecture is based on this tutorial: http://wiki.scipy.org/Tentative_NumPy_Tutorial*

As we learned in the previous lectures, arithmetic operations using arrays are element-wise. In order to perform matrix operations, NumPy implements a `matrix`

object that is derived from the `array`

object.

`matrix`

object¶In order to create a `matrix`

object there are two general methods:

*Copy*values from an existing`array`

object.- Create a
`matrix`

and initialize it as you would an array.

NumPy provides some special conveniences for initializing `matrix`

objects which you will find useful,

In [ ]:

```
from numpy import *
# Method 1, *copy* data from an existing array
A = array([[1.0, 2.0], [3.0, 4.0]])
A = matrix(A)
print(type(A))
print(A)
print(8*'-')
# Method 2a, provide matrix element values in an array()-like syntax
A = matrix([[1.0, 2.0], [3.0, 4.0]])
print(A)
print(8*'-')
# Method 2b, provide matrix element values in a more compact format similar to MATLAB/Octave/Scilab etc
A = matrix('1.0 2.0; 3.0 4.0')
print(A)
```

`matrix`

data is exactly the same as a 2D array. For the most part, using Method 1 is inefficient if your objective is to use matrix algebra from the beginning. There is one case where Method 1 is useful; NumPy provides several built-in functions to generate special arrays,

In [ ]:

```
# the output of all of NumPy's built-in functions are arrays, they are more general than matricies
B = ones((3,3))
print(type(B))
# instead of creating different built-in functions, you should instead initialize a matrix using the
# array output
B = matrix(B)
print(type(B))
print(B)
```

`identity()`

in that we frequently need to use the identity matrix in matrix algebra,

In [ ]:

```
I = identity(3)
print(type(I))
print(I)
I = matrix(I)
print(type(I))
print(I)
```

`array`

to make a `matrix`

,

In [ ]:

```
Aarr = array([[1.0, 2.0], [3.0, 4.0]])
Amat = matrix(Aarr)
# Aarr's data will *not* be changed
Amat[0,0] = 0.0
print(Aarr)
print(Amat)
```

`array`

methods? A *view*! NumPy provides two functions to create views of `array`

and `matrix`

objects for this exact purpose: `asmatrix()`

and `asarray`

,

In [ ]:

```
Aarr = array([[1.0, 2.0], [3.0, 4.0]])
Amat = asmatrix(Aarr)
# Aarr's data *will* be changed
Amat[0,0] = 0.0
print(Aarr)
print(Amat)
```

Arithmetic operations with `matrix`

objects results in matrix algebra operations that you are familiar with,

In [ ]:

```
A = matrix('1.0 2.0; 3.0 4.0')
I = asmatrix(identity(2))
print(A+I)
print(8*'-')
print(A-I)
print(8*'-')
print(A*I)
print(8*'-')
print(A**2)
```

Additionaly, there are many attributes of the `matrix`

class that are defined for convenience,

In [ ]:

```
# the `T` attribute of a `matrix` is a *view* of its transpose
print(A.T)
print(8*'-')
# the `I` attribute of a `matrix` is a its inverse, it is computed when needed (for obvious reasons)
print(A*A.I)
```

`linalg`

module of NumPy provides linear algebra functionality,

In [ ]:

```
# find the determinant of a matrix
linalg.det(A)
```

In [ ]:

```
# find the inverse of a matrix, equivalent to A.I
linalg.inv(A)
```

In [ ]:

```
# eigendecomposition of a square matrix
linalg.eig(A)
```

In [ ]:

```
# QR factorization, note that a more robust method is used than Gram-Schmit
linalg.qr(A)
```

In [ ]:

```
# solving a linear system A*x = b, as you learned in your Linear Algebra course
# this is more efficient than A.I*B
b = matrix("1.0 ; 2.0")
linalg.solve(A,b)
```

NumPy provides a relatively simple but powerful set of functions for reading and writing numerical data files:

`save(filename, array_like)`

`savez(filename, label1 = array_like1, label2 = array_like2, ...)`

`load(filename)`

`savetxt(filename, array_like, format)`

`loadtxt(filename)`

You will learn about the basic usage of these functions and one intermediate-level concept, memory-mapped file access, that is useful for dealing with very large data-sets.

The most simple approach to saving numerical data with NumPy is to save each array in its own file,

In [ ]:

```
# let's create a relatively large array
data = random.random((1000,1000))
# create a `string` object containing the desired file name
filename = "my_array"
# use the `save()` function
save(filename, data)
```

`savetxt()`

but this convenience is not without a cost...

In [ ]:

```
# save the array data to a human-readable text file
savetxt(filename+".txt", data)
```

If you look in your working directory, there is a new file "my_array.txt". The `savetxt()`

function does not add a filetype suffix so you may choose what is appropriate (.txt, .csv, etc). In this case I am using the default formatting to output the data, but there is quite a bit of flexibility in the formatting that you may configure. This option is especially useful for exporting data to another software package such as spreadsheet software.

The cost for this readability is the size of the data file; in your working directory type `ls -lh my_array.*`

,

```
$ ls -lh my_array.*
-rw-rw-r-- 1 nasser nasser 7.7M Jun 20 15:47 my_array.npy
-rw-rw-r-- 1 nasser nasser 24M Jun 20 16:08 my_array.txt
```

The size of the text file is almost three times that of the binary file. The text representation of each value in the array clearly requires more data than the direct binary representation and a delimiter (a space, comma, etc) is required in the text file for the reader to distinguish each value. Thus the vast majority of the time we will use the more efficient `save()`

function.

Loading saved files is a relatively simple task, if the file is in the binary ".npy" format use `load()`

,

In [ ]:

```
array1 = load(filename+".npy")
print(array1.shape)
```

Else if it is in a text format use `loadtxt()`

,

In [ ]:

```
array2 = loadtxt(filename+".txt")
print(array2.shape)
```

In the previous examples we have chosen to use the `array`

datatype, but any array-like data type is compatible with these functions such as `matrix`

.

As you have seen, single array input/output is very very simple, but frequently we wish to save many arrays and doing so in a single file would be convenient.

Saving and loading data files with multiple arrays is slightly more complicated and can be accomplished using `savez()`

to save and `load()`

(as before),

In [ ]:

```
# using array1 and array2 from the previous example
savez("my_arrays", large_array=array1, small_array=array1[::10, ::10])
# an equivalent function call, but now the keyword for array1 will default to arr_0
# and for array2 will default to arr_1
#savez("my_arrays", array1, array1[::10, ::10])
```

As before, the initial argument is the file name, but now additional arguments require assigning the arrays you want to store to keywords. As many arrays as you wish may be assigned and the keywords are optional. If they are not provided a default label `arr_0`

, `arr_1`

, ... are assigned. Check your working directory and you will find a data file "my_arrays.npz", where the ".npz" suffix denotes that this is a different format than ".npy".

This saves the array data in an uncompressed format, the variant `savez_compressed()`

only differs in that it saves the data in a compressed format. Typically with numerical data, compression does not perform well unless there are many repeated values.

To load the data, use the same syntax as before,

In [ ]:

```
arrays = load("my_arrays"+".npz")
type(arrays)
```

`load()`

function returns a single `NpzFile`

object which is derived from the `dict`

object. The `NpzObject`

behaves very similar to a dictionary where the keys are the keywords from the `savez()`

call and the data associated with the keys are the arrays that were stored.

In [ ]:

```
# the list of keys is useful
keys = arrays.keys()
print(keys)
array1 = arrays[keys[0]]
array2 = arrays[keys[1]]
print(array1.shape)
print(array2.shape)
```

The `load()`

function allows *memory-mapped* access to data files where only a small segment of a file on-disk is loaded into main memory. For large to very large files this results in significant speed-up of array manipulations. A simple example of this is an array containing time series data of the state of a system. We typically will need to manipulate one or a few snapshots at a time, so there is no need to load the whole array at once. The memory-mapping functionality of NumPy is transparent to the user; simply enable it through setting the keyword `mmap_mode`

of the `load()`

function to one of these four values,

`r` | Open existing file for reading only. |

`r+` | Open existing file for reading and writing. |

`w+` | Create or overwrite existing file for reading and writing. |

`c` | Copy-on-write: assignments affect data in memory, but changes are not saved to disk. The file on disk is read-only. |

The purpose of "r" and "r+" are obvious, but "w+" and "c" require some explanation. The "c" option us useful when you want to manipulate and explore array data without affecting the data on-disk. Using this option frequently is good practice so that you do not accidentilly change important data that you have stored. Consider removing filesystem write access for an extra measure of caution,

```
$ touch file
$ ls -l file
-rw-rw-r-- 1 nasser nasser 0 Jun 20 17:14 file
$ chmod -w file
$ rm file
rm: remove write-protected regular empty file ‘file’?
```

The operating system will not allow you to modify this file and removing it requires an extra confirmation! The `w+`

is similar to the `r+`

option, except it overwrites the existing file, so it is rarely used.

This functionality only works with single array ".npy" files, so let's open the "my_array.npy" file from the previous example,

In [ ]:

```
# load the array stored in "my_array.npy" for read-only access the using memory-mapping mode
arr_mmap = load("my_array.npy",mmap_mode='r')
type(arr_mmap)
```

`memmap`

object is array-like, we may manipulate it directly but it is more understandable to use `asarray()`

or `asmatrix()`

to create a view of it that has a familiar type,

In [ ]:

```
arr = asarray(arr_mmap)
print(type(arr))
mat = asmatrix(arr_mmap)
print(type(mat))
```