NumPy Basics
NumPy is the foundational library for data processing in Python. Its core object is ndarray, the multidimensional array.
I include this in the main track not because you need complex scientific computing from the start, but because when you later move on to Pandas, machine learning, or numerical computing, many concepts come back to arrays, shapes, and vectorization.
Creating Arrays
import numpy as np
a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])
Common creation methods:
np.zeros((2, 3))
np.ones((2, 3))
np.arange(0, 10, 2)
np.linspace(0, 1, 5)
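To make the creation functions above concrete, here is a small sketch that stores each result and prints a few of them. The variable names (zeros, ones, evens, points) are only illustrative.

```python
import numpy as np

zeros = np.zeros((2, 3))        # 2x3 array filled with 0.0 (float64 by default)
ones = np.ones((2, 3))          # 2x3 array filled with 1.0
evens = np.arange(0, 10, 2)     # start, stop (exclusive), step
points = np.linspace(0, 1, 5)   # 5 evenly spaced values, endpoint included

print(zeros.shape, zeros.dtype)  # (2, 3) float64
print(evens)                     # [0 2 4 6 8]
print(points)                    # [0.   0.25 0.5  0.75 1.  ]
```

Note the difference between the last two: np.arange takes a step and excludes the stop value, while np.linspace takes a count and includes the endpoint.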
Understand three attributes first
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # (2, 3)
print(arr.ndim) # 2
print(arr.dtype) # int64 or int32, depends on platform
- shape: the size along each dimension, e.g. (2, 3) means 2 rows and 3 columns
- ndim: the number of dimensions
- dtype: the element type
Indexing and Slicing
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr[0, 1]) # 2
print(arr[:, 1]) # [2 5]
print(arr[0:2, 1:]) # rows 0-1, columns 1 onward: [[2 3], [5 6]]
Like regular Python sequences, NumPy supports slicing, but multidimensional array indexing is more expressive.
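As a rough illustration of what "more expressive" can mean in practice, the sketch below selects along both axes in one expression. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

last_row = arr[-1]             # negative index: the last row
first_col = arr[:, 0]          # every row, column 0
reversed_cols = arr[:, ::-1]   # every row, columns in reverse order
sub = arr[0:2, 1:]             # rows 0-1, columns 1 onward

print(last_row)                # [4 5 6]
print(first_col)               # [1 4]
print(reversed_cols.tolist())  # [[3, 2, 1], [6, 5, 4]]
print(sub.tolist())            # [[2, 3], [5, 6]]
```

With a nested Python list, picking out a column would need a loop or a comprehension; here a single index expression is enough.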
Vectorization
One of NumPy's most valuable features is that many operations don't require writing for loops yourself.
arr = np.array([1, 2, 3, 4])
print(arr * 2) # [2 4 6 8]
print(arr + 10) # [11 12 13 14]
print(arr ** 2) # [1 4 9 16]
Arrays can also perform element-wise operations directly with each other:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(x + y) # [5 7 9]
print(x * y) # [ 4 10 18]
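To see what vectorization replaces, the following sketch computes the same element-wise product twice: once with an explicit Python loop and once with the array expression. The name looped is just for this comparison.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Explicit loop: correct, but each step runs in the Python interpreter
looped = np.empty_like(x)
for i in range(len(x)):
    looped[i] = x[i] * y[i]

# Vectorized: one expression, the loop happens inside NumPy
vectorized = x * y

print(looped)                              # [ 4 10 18]
print(np.array_equal(looped, vectorized))  # True
```

Both versions give the same result; the vectorized form is shorter and, on large arrays, typically much faster.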
Common Statistical Operations
arr = np.array([1, 2, 3, 4, 5])
print(arr.sum()) # 15
print(arr.mean()) # 3.0
print(arr.max()) # 5
print(arr.min()) # 1
print(arr.std()) # 1.4142135623730951 (population standard deviation)
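The same methods also work on multidimensional arrays, where the axis argument controls the direction of the reduction. This is a small sketch on a 2x3 array; the name matrix is illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

total = matrix.sum()             # reduce over every element
col_sums = matrix.sum(axis=0)    # collapse rows: one value per column
row_sums = matrix.sum(axis=1)    # collapse columns: one value per row
row_means = matrix.mean(axis=1)

print(total)      # 21
print(col_sums)   # [5 7 9]
print(row_sums)   # [ 6 15]
print(row_means)  # [2. 5.]
```

One way to remember it: the axis you pass is the one that disappears from the shape.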
Reshape and Flatten
arr = np.arange(6)
matrix = arr.reshape(2, 3)
flat = matrix.flatten()
These two operations are very common when you need to change data from one-dimensional to two-dimensional or vice versa.
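Two details are worth knowing when converting between shapes, shown in the sketch below: passing -1 lets NumPy infer one dimension from the total size, and flatten() returns a copy, so editing the flattened result does not touch the matrix.

```python
import numpy as np

arr = np.arange(6)             # [0 1 2 3 4 5]
matrix = arr.reshape(2, -1)    # -1 is inferred as 3, because 6 / 2 = 3
flat = matrix.flatten()        # flatten() always returns a new copy

flat[0] = 99                   # change the copy only

print(matrix.shape)   # (2, 3)
print(flat.shape)     # (6,)
print(matrix[0, 0])   # 0
```

The total number of elements must stay the same, so arr.reshape(4, -1) would raise a ValueError here.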
Boolean Indexing
This is a capability I use very frequently:
arr = np.array([1, 2, 3, 4, 5, 6])
mask = arr % 2 == 0
print(arr[mask]) # [2 4 6]
It is also common to combine a condition with np.where(), which picks a value per element depending on whether the condition holds:
arr = np.array([1, 2, 3, 4])
result = np.where(arr % 2 == 0, arr, 0)
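The sketch below prints the result of the np.where() call above and adds a second, illustrative use where both branches are constants. The names labels, "big", and "small" are made up for the example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Keep even values, replace odd values with 0
result = np.where(arr % 2 == 0, arr, 0)
print(result)   # [0 2 0 4]

# Both branches can be scalars; the output has the same shape as arr
labels = np.where(arr > 2, "big", "small")
print(labels)   # ['small' 'small' 'big' 'big']
```

Unlike boolean indexing, which drops the elements that fail the condition, np.where() keeps the original shape and fills in an alternative value.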
Two Common Pitfalls
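1. Slices return views, not copies
Basic slicing (start:stop:step) returns a view that shares memory with the original array, so modifying the slice also modifies the original. Boolean indexing and indexing with an integer array return copies instead. Call .copy() on a slice when you need independent data.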
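A minimal demonstration of this pitfall: writing through a slice changes the source array, while writing through an explicit copy does not. np.shares_memory is used here only to confirm which is which.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]          # basic slice: shares memory with arr
view[0] = 99             # also changes arr[1]
print(arr)               # [ 1 99  3  4  5]

independent = arr[1:4].copy()   # explicit copy: separate memory
independent[0] = -1             # arr is not affected
print(arr)               # [ 1 99  3  4  5]

print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False
```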
2. dtype affects results
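The dtype of an array determines how operations on it behave. An integer array silently truncates any float assigned into it and keeps integer results for operations such as floor division, while a float array carries floating-point rounding error. When you encounter precision issues or unexpected values, check dtype first.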
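The sketch below shows a few ways dtype can change a result. These are illustrative cases chosen for this example, not an exhaustive list.

```python
import numpy as np

ints = np.array([1, 2, 3])

halves = ints / 2     # true division always produces floats
floors = ints // 2    # floor division keeps the integer dtype
print(halves)         # [0.5 1.  1.5]
print(floors)         # [0 1 1]

# Assigning a float into an integer array drops the fractional part
ints[0] = 2.9
print(ints)           # [2 2 3]

# Small integer types wrap around silently on overflow
a = np.array([127], dtype=np.int8)
b = np.array([1], dtype=np.int8)
wrapped = a + b
print(wrapped)        # [-128]
```

If fractional values matter, creating the array with dtype=float or converting with astype(float) avoids the truncation.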
Learning priorities I focus on
- Be able to read shape
- Be proficient with indexing and slicing
- Understand what vectorization means
- Use built-in array operations first, then consider writing loops