NumPy Basics

NumPy is the foundational library for data processing in Python. Its core object is ndarray, the multidimensional array.

I include this in the main track not because you need to do complex scientific computing from the start, but because when you later connect to Pandas, machine learning, or numerical computing, many concepts will come back to arrays, shapes, and vectorization.

Creating Arrays

import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])

Common creation methods:

np.zeros((2, 3))
np.ones((2, 3))
np.arange(0, 10, 2)
np.linspace(0, 1, 5)
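
To make the four creation calls above concrete, here is a small sketch that stores each result and prints a few of them. The variable names (zeros, ones, evens, points) are only illustrative.

```python
import numpy as np

zeros = np.zeros((2, 3))        # 2x3 array filled with 0.0 (float64 by default)
ones = np.ones((2, 3))          # 2x3 array filled with 1.0
evens = np.arange(0, 10, 2)     # start=0, stop=10 (excluded), step=2
points = np.linspace(0, 1, 5)   # 5 evenly spaced values, endpoint included

print(zeros.shape, zeros.dtype)  # (2, 3) float64
print(evens)                     # [0 2 4 6 8]
print(points)                    # [0.   0.25 0.5  0.75 1.  ]
```

Note the difference between the last two: np.arange() takes a step and excludes the stop value, while np.linspace() takes a count and includes the endpoint.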

Understand three attributes first

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape)  # (2, 3)
print(arr.ndim)   # 2
print(arr.dtype)  # int64 or int32, depends on platform

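shape: Size of the array along each dimension, as a tuple
ndim: Number of dimensions (the length of shape)
dtype: Element type, shared by every element in the array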
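
As a quick illustration of how the three attributes relate, the sketch below compares a 1-D and a 2-D array and sets a dtype explicitly. The names vector, matrix, and floats are assumptions made for this example.

```python
import numpy as np

vector = np.array([1, 2, 3])                    # 1-D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])       # 2-D array
floats = np.array([1, 2, 3], dtype=np.float64)  # dtype requested explicitly

print(vector.shape, vector.ndim)         # (3,) 1
print(matrix.shape, matrix.ndim)         # (2, 3) 2
print(len(matrix.shape) == matrix.ndim)  # True
print(floats.dtype)                      # float64
```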

Indexing and Slicing

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr[0, 1])   # 2
print(arr[:, 1])   # [2 5]
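print(arr[0:2, 1:])   # rows 0-1, columns 1 onward: [[2 3] [5 6]]

NumPy arrays support slicing like regular Python sequences, but they also accept one index or slice per dimension, separated by commas, inside a single pair of brackets.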
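
A few more indexing forms on the same array, as a sketch. The variable names are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

first_row = arr[0]          # a single index selects along the first axis
last_col = arr[:, -1]       # negative indices count from the end
every_other = arr[0, ::2]   # slices accept a step, as in Python lists
block = arr[0:2, 1:]        # rows 0-1, columns 1 onward

print(first_row)    # [1 2 3]
print(last_col)     # [3 6]
print(every_other)  # [1 3]
print(block.shape)  # (2, 2)
```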

Vectorization

One of NumPy's most valuable features is that many operations don't require writing for loops yourself.

arr = np.array([1, 2, 3, 4])

print(arr * 2)      # [2 4 6 8]
print(arr + 10)     # [11 12 13 14]
print(arr ** 2)     # [1 4 9 16]

Arrays can also perform element-wise operations directly with each other:

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x + y)   # [5 7 9]
print(x * y)   # [ 4 10 18]
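
To see what vectorization replaces, the sketch below computes the same element-wise product twice: once with an explicit Python loop and once with a single array expression. The names looped and vectorized are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Explicit loop: build the result one element at a time.
looped = np.array([xi * yi for xi, yi in zip(x, y)])

# Vectorized: the same element-wise product in one expression.
vectorized = x * y

print(vectorized)                          # [ 4 10 18]
print(np.array_equal(looped, vectorized))  # True
```

Both give the same values. The vectorized form is shorter and leaves the iteration to NumPy.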

Common Statistical Operations

arr = np.array([1, 2, 3, 4, 5])

print(arr.sum())   # 15
print(arr.mean())  # 3.0
print(arr.max())   # 5
print(arr.min())   # 1
print(arr.std())   # 1.4142135623730951 (population standard deviation)
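
The value returned by std() is easier to trust once you have computed it by hand. This is a minimal sketch of that check. It relies on std() using ddof=0 by default (the population formula), and the ddof=1 call at the end is the sample version.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

mean = arr.sum() / arr.size             # 15 / 5 = 3.0
variance = ((arr - mean) ** 2).mean()   # (4 + 1 + 0 + 1 + 4) / 5 = 2.0
manual_std = np.sqrt(variance)          # square root of the variance

print(mean)                               # 3.0
print(variance)                           # 2.0
print(np.isclose(manual_std, arr.std()))  # True

sample_std = arr.std(ddof=1)              # divides by n - 1 instead of n
print(sample_std > arr.std())             # True
```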

Reshape and Flatten

arr = np.arange(6)
matrix = arr.reshape(2, 3)
flat = matrix.flatten()

These two operations come up whenever data has to move between one-dimensional and two-dimensional form. reshape() requires the new shape to hold exactly the same number of elements, and flatten() always returns a copy.
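
A short sketch of the round trip, including what happens when the requested shape does not fit the data:

```python
import numpy as np

arr = np.arange(6)            # [0 1 2 3 4 5]
matrix = arr.reshape(2, 3)    # 2 rows x 3 columns = 6 elements
flat = matrix.flatten()       # back to 1-D, as a new copy

print(matrix.shape)  # (2, 3)
print(flat)          # [0 1 2 3 4 5]

flat[0] = 99
print(matrix[0, 0])  # 0, because flatten() returned a copy

# The new shape must hold exactly the same number of elements.
try:
    arr.reshape(4, 2)         # 8 slots for 6 elements
except ValueError as exc:
    print("reshape failed:", exc)
```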

Boolean Indexing

This is a capability I use very frequently:

arr = np.array([1, 2, 3, 4, 5, 6])
mask = arr % 2 == 0

print(arr[mask])   # [2 4 6]

It is also common to combine a condition with np.where():

arr = np.array([1, 2, 3, 4])
result = np.where(arr % 2 == 0, arr, 0)   # [0 2 0 4]
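
The sketch below repeats the np.where() call with its output and adds one combined condition. The second mask is an illustrative example that is not in the text above.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Three-argument form: take from arr where the condition holds, else 0.
result = np.where(arr % 2 == 0, arr, 0)
print(result)     # [0 2 0 4]

# Conditions combine with & (and) and | (or); each needs its own parentheses.
mask = (arr > 1) & (arr % 2 == 1)
print(arr[mask])  # [3]
```

Boolean indexing drops the elements that fail the condition, while np.where() keeps the original shape and substitutes a value.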

Two Common Pitfalls

1. Slices usually return views

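Basic slices (such as arr[1:4]) return views rather than copies, so modifying a slice also modifies the original array. Boolean indexing and indexing with an integer array return copies. Call .copy() when you need data that is independent of the original.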
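
A minimal sketch of this pitfall. It uses np.shares_memory() to confirm the view relationship; the names view, independent, and picked are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]                     # basic slice: shares memory with arr
view[0] = 99                        # writes through to arr[1]
print(arr)                          # [ 1 99  3  4  5]
print(np.shares_memory(arr, view))  # True

independent = arr[1:4].copy()       # explicit copy
independent[0] = -1
print(arr[1])                       # 99, the original is untouched

picked = arr[arr > 3]               # boolean indexing returns a copy
picked[0] = 0
print(arr)                          # [ 1 99  3  4  5]
```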

2. `dtype` affects results

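An array keeps a single dtype, and that dtype determines the result of an operation. An integer array drops the fractional part of any float assigned into it, and a fixed-width integer can overflow, whereas a float array stores fractional values but is subject to rounding error. When you encounter precision issues or unexpected values, check dtype first.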
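
A few concrete cases, as a sketch. The int8 array is chosen only because its small range makes the overflow easy to see.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.0, 3.0])

ints[0] = 1.9      # fractional part is dropped to fit the integer dtype
floats[0] = 1.9    # stored as a float
print(ints[0], floats[0])   # 1 1.9

print((ints / 2).dtype)     # float64: true division yields floats
print(ints // 2)            # [0 1 1]: floor division stays integer

small = np.array([100], dtype=np.int8)   # int8 holds -128..127
print(small + small)        # [-56]: 200 does not fit and wraps around
```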

Learning priorities I focus on

Be able to read shape
Be proficient with indexing and slicing
Understand what vectorization means
Use built-in array operations first, then consider writing loops
