NumPy Basics

NumPy is the foundational library for data processing in Python. Its core object is ndarray, the multidimensional array.

I include this in the main track not because you need to do complex scientific computing from the start, but because when you later connect to Pandas, machine learning, or numerical computing, many concepts will come back to arrays, shapes, and vectorization.

Creating Arrays

import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])

Common creation methods:

np.zeros((2, 3))
np.ones((2, 3))
np.arange(0, 10, 2)
np.linspace(0, 1, 5)
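
As a rough sketch of what these four calls produce, the snippet below prints their shapes and values. The variable names (zeros, ones, evens, grid) are only illustrative. The main thing to notice is that np.arange excludes the stop value while np.linspace includes it.

```python
import numpy as np

zeros = np.zeros((2, 3))      # 2 rows, 3 columns, filled with 0.0
ones = np.ones((2, 3))        # same shape, filled with 1.0
evens = np.arange(0, 10, 2)   # start, stop (excluded), step
grid = np.linspace(0, 1, 5)   # start, stop (included), number of points

print(zeros.shape, zeros.dtype)  # (2, 3) float64
print(ones.sum())                # 6.0
print(evens.tolist())            # [0, 2, 4, 6, 8]
print(grid.tolist())             # [0.0, 0.25, 0.5, 0.75, 1.0]
```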

Understand three attributes first

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape) # (2, 3)
print(arr.ndim) # 2
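print(arr.dtype) # int64 or int32, depends on platform

  • shape: Length of the array along each dimension, as a tuple
  • ndim: Number of dimensions (the length of shape)
  • dtype: Element type, shared by every element in the array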
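
To make the three attributes concrete, here is a small sketch comparing a one-dimensional and a two-dimensional array. The names vector, matrix, and small are illustrative, and passing dtype=np.float32 is just one way to request an element type explicitly.

```python
import numpy as np

vector = np.array([1, 2, 3])
matrix = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(vector.shape, vector.ndim)  # (3,) 1
print(matrix.shape, matrix.ndim)  # (2, 3) 2
print(matrix.dtype)               # float64

# ndim is always the length of the shape tuple
print(len(matrix.shape) == matrix.ndim)  # True

# dtype can be requested explicitly when the array is created
small = np.array([1, 2, 3], dtype=np.float32)
print(small.dtype)                # float32
```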

Indexing and Slicing

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr[0, 1]) # 2
print(arr[:, 1]) # [2 5]
print(arr[0:2, 1:]) # rows 0-1, columns 1 onward
# [[2 3]
#  [5 6]]

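Like regular Python sequences, NumPy arrays support slicing. Multidimensional indexing goes further: a single pair of brackets takes one index or slice per axis, separated by commas, so arr[:, 1] selects a whole column in one expression.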
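
The comparison below is a minimal sketch of that difference: picking a column out of a nested Python list needs a comprehension, while the array does it with one index expression. The variable names are illustrative.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]
arr = np.array(nested)

# Nested list: visit every row to collect column 1
column_from_list = [row[1] for row in nested]

# Array: one index per axis, ":" means "everything along this axis"
column_from_array = arr[:, 1]

print(column_from_list)            # [2, 5]
print(column_from_array.tolist())  # [2, 5]

print(arr[-1, -1])                 # 6 (negative indices count from the end)
print(arr[1].tolist())             # [4, 5, 6] (a single index selects a row)
print(arr[:, ::2].tolist())        # [[1, 3], [4, 6]] (every second column)
```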

Vectorization

One of NumPy's most valuable features is vectorization: you write an operation once and it is applied to every element of the array, so you don't need to write for loops yourself.

arr = np.array([1, 2, 3, 4])

print(arr * 2) # [2 4 6 8]
print(arr + 10) # [11 12 13 14]
print(arr ** 2) # [1 4 9 16]

Arrays can also perform element-wise operations directly with each other:

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x + y) # [5 7 9]
print(x * y) # [ 4 10 18]
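
To show what the vectorized form replaces, here is a sketch that computes the same element-wise product twice, once with an explicit loop and once with a single array expression. The names looped and vectorized are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Explicit loop: pair up the elements and multiply one at a time
looped = []
for xi, yi in zip(x, y):
    looped.append(xi * yi)

# Vectorized: one expression, NumPy runs the loop internally
vectorized = x * y

print(vectorized.tolist())                 # [4, 10, 18]
print(np.array_equal(looped, vectorized))  # True
```

Both versions give the same result. The vectorized one is shorter and is generally much faster on large arrays.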

Common Statistical Operations

arr = np.array([1, 2, 3, 4, 5])

print(arr.sum()) # 15
print(arr.mean()) # 3.0
print(arr.max()) # 5
print(arr.min()) # 1
print(arr.std()) # about 1.414 (population standard deviation)
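
As a quick check on what std() computes, the sketch below rebuilds it from the definition, the square root of the mean squared deviation. By default NumPy divides by n (the population form). Passing ddof=1 switches to the sample form that divides by n - 1. The names mean and manual_std are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

mean = arr.sum() / arr.size
manual_std = np.sqrt(((arr - mean) ** 2).mean())

print(mean)                                       # 3.0
print(np.isclose(manual_std, arr.std()))          # True
print(np.isclose(arr.std(ddof=1), np.sqrt(2.5)))  # True (sample variance is 10 / 4)
```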

Reshape and Flatten

arr = np.arange(6)
matrix = arr.reshape(2, 3)
flat = matrix.flatten()

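reshape changes the shape without changing the data, so the new shape must hold the same number of elements (here 6 = 2 × 3). flatten returns a one-dimensional copy. These two operations are very common when you need to change data from one-dimensional to two-dimensional or vice versa.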
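
A short sketch of both operations follows. It also shows that one dimension can be given as -1 so NumPy infers it from the element count, and that the result of flatten() is independent of the array it came from. The name auto is illustrative.

```python
import numpy as np

arr = np.arange(6)            # [0 1 2 3 4 5]
matrix = arr.reshape(2, 3)    # 2 rows, 3 columns
auto = arr.reshape(-1, 2)     # -1 lets NumPy infer the row count: 6 / 2 = 3
flat = matrix.flatten()       # one-dimensional copy

print(matrix.shape)   # (2, 3)
print(auto.shape)     # (3, 2)

# Changing the flattened copy leaves the matrix untouched
flat[0] = 99
print(flat.tolist())  # [99, 1, 2, 3, 4, 5]
print(matrix[0, 0])   # 0
```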

Boolean Indexing

This is a capability I use very frequently:

arr = np.array([1, 2, 3, 4, 5, 6])
mask = arr % 2 == 0

print(arr[mask]) # [2 4 6]

Combining a condition with np.where() is also common:

arr = np.array([1, 2, 3, 4])
result = np.where(arr % 2 == 0, arr, 0) # keep even values, replace the rest with 0: [0 2 0 4]
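
The two approaches differ in the shape of what comes back, which the sketch below illustrates: boolean indexing drops the elements that fail the condition, while np.where() keeps the original shape and substitutes a value. The names filtered, replaced, and both are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
mask = arr % 2 == 0

filtered = arr[mask]               # only matching elements, result is shorter
replaced = np.where(mask, arr, 0)  # same length, non-matching elements become 0

print(filtered.tolist())  # [2, 4]
print(replaced.tolist())  # [0, 2, 0, 4]

# Conditions combine with & and |; each condition needs its own parentheses
both = arr[(arr > 1) & (arr < 4)]
print(both.tolist())      # [2, 3]
```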

Two Common Pitfalls

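1. Slices return views

Basic slicing (for example arr[1:4] or arr[:, 0]) returns a view rather than a copy, so modifying the slice also modifies the original array. Call .copy() on the slice when you need independent data. Boolean indexing, by contrast, returns a copy.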
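
Here is a minimal demonstration of this pitfall. The names original, view, and independent are illustrative. np.shares_memory is used only to confirm that the slice and the array point at the same data.

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])

view = original[1:4]      # a slice: shares memory with original
view[0] = 99              # writes through to original

print(original.tolist())                 # [1, 99, 3, 4, 5]
print(np.shares_memory(original, view))  # True

independent = original[1:4].copy()       # explicit copy: separate memory
independent[0] = -1

print(original.tolist())                 # [1, 99, 3, 4, 5] (unchanged by the copy)
```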

2. dtype affects results

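Integer arrays and float arrays behave differently. An integer array cannot hold fractional values, so converting floats to an integer dtype discards the fractional part, and fixed-width integers can overflow. Float arrays are subject to rounding error instead. When you encounter precision issues or unexpected values, check dtype first.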
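
The sketch below shows a few places where dtype changes the outcome. The array names are illustrative, and np.int64 is chosen here only as an example target type.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.0, 3.0])

print(ints.dtype.kind, floats.dtype.kind)  # i f (integer vs float)

# True division always produces floats; floor division keeps integers
print((ints / 2).tolist())    # [0.5, 1.0, 1.5]
print((ints // 2).tolist())   # [0, 1, 1]

# Converting to an integer dtype truncates toward zero, it does not round
truncated = np.array([2.9, -2.9]).astype(np.int64)
print(truncated.tolist())     # [2, -2]

# Float results can carry rounding error, so compare with a tolerance
total = np.array([0.1, 0.2]).sum()
print(total == 0.3)             # False
print(np.isclose(total, 0.3))   # True
```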

Learning priorities I focus on

  • Be able to read shape
  • Be proficient with indexing and slicing
  • Understand what vectorization means
  • Use built-in array operations first, then consider writing loops