Skip to main content

timsort

Tim Sort

Code

C language code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RUN 32

void insertion_sort(int arr[], int left, int right) {
for (int i = left + 1; i <= right; i++) {
int temp = arr[i];
int j = i - 1;
while (j >= left && arr[j] > temp) {
arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = temp;
}
}

void merge(int arr[], int l, int m, int r) {
int len1 = m - l + 1, len2 = r - m;
int left[len1], right[len2];
for (int i = 0; i < len1; i++)
left[i] = arr[l + i];
for (int i = 0; i < len2; i++)
right[i] = arr[m + 1 + i];

int i = 0;
int j = 0;
int k = l;

while (i < len1 && j < len2) {
if (left[i] <= right[j]) {
arr[k] = left[i];
i++;
} else {
arr[k] = right[j];
j++;
}
k++;
}

while (i < len1) {
arr[k] = left[i];
k++;
i++;
}

while (j < len2) {
arr[k] = right[j];
k++;
j++;
}
}

void tim_sort(int arr[], int n) {
for (int i = 0; i < n; i += RUN)
insertion_sort(arr, i, (i + RUN - 1) < (n - 1) ? (i + RUN - 1) : (n - 1));

for (int size = RUN; size < n; size = 2 * size) {
for (int left = 0; left < n; left += 2 * size) {
int mid = (left + size - 1) < (n - 1) ? (left + size - 1) : (n - 1);
int right = (left + 2 * size - 1) < (n - 1) ? (left + 2 * size - 1) : (n - 1);

if (mid < right)
merge(arr, left, mid, right);
}
}
}

int main() {
int arr[] = {12, 11, 13, 5, 6, 7};
int n = sizeof(arr) / sizeof(arr[0]);

printf("Given array is \n");
for (int i = 0; i < n; i++)
printf("%d ", arr[i]);
printf("\n");

tim_sort(arr, n);

printf("\nSorted array is \n");
for (int i = 0; i < n; i++)
printf("%d ", arr[i]);
printf("\n");
return 0;
}

Code Explanation

  1. Initialization: The array arr is the array to be sorted, and n is the length of the array.
  2. Insertion sort function:
    • The insertion_sort function is used to perform insertion sort on small blocks of the array.
    • Traverses the array from left to right, inserting each element into the correct position in the sorted portion.
  3. Merge function:
    • The merge function is used to merge two sorted sub-arrays.
    • Creates temporary arrays left and right to store elements of the two sub-arrays.
    • Uses a two-pointer approach to merge the two sub-arrays back into the original array.
  4. TimSort function:
    • tim_sort first divides the array into blocks of size RUN, using insertion sort to sort each block.
    • Then gradually merges these sorted blocks, starting with a merge size of RUN and doubling each time until the entire array is sorted.

Example Run

Assuming the input array is {12, 11, 13, 5, 6, 7}:

  1. Initial state: {12, 11, 13, 5, 6, 7}
  2. Insertion sort:
    • Sort each block of size RUN with insertion sort.
    • Since RUN is set to 32, the entire array is treated as one block and sorted with insertion sort.
    • Array after insertion sort: {5, 6, 7, 11, 12, 13}
  3. Merge:
    • Since the array is already sorted, no further merging is needed.

Time Complexity Analysis

The time complexity of TimSort mainly depends on the insertion sort and merge processes.

  • Best-case time complexity: O(n)O(n), when the input array is already partially sorted.
  • Worst-case time complexity: O(nlogn)O(n \log n), regardless of the initial order of the input array, the number of steps executed is essentially the same.
  • Average time complexity: O(nlogn)O(n \log n), for most input arrays, TimSort's performance is very stable.

Space Complexity Analysis

TimSort uses auxiliary arrays to merge sub-arrays, so the space complexity is O(n)O(n).

Advantages and Disadvantages

Advantages:

  • Stable time complexity; suitable for large-scale data sorting.
  • Combines the advantages of insertion sort and merge sort; excellent performance on partially sorted data.
  • Stable sorting algorithm; does not change the relative order of equal elements.

Disadvantages:

  • High implementation complexity; more complex to understand and code than simple sorting algorithms.
  • Auxiliary arrays lead to higher space complexity.

Applicable Scenarios

TimSort is suitable for the following scenarios:

  • Sorting large datasets.
  • When a highly efficient and stable sorting algorithm is needed.
  • Excellent performance when data is partially sorted.

Summary

TimSort is an efficient sorting algorithm that combines the advantages of insertion sort and merge sort. It works by performing insertion sort on small blocks of the array, then gradually merging these sorted blocks to complete the sorting. Although its implementation complexity is relatively high, it performs excellently when handling large-scale data and partially sorted data, and is a sorting algorithm with stable time complexity and higher space complexity.

Deeper Understanding of Tim Sort

TimSort's Philosophy and Implementation

TimSort is a hybrid sorting algorithm designed by Tim Peters in 2002 for the Python programming language. It combines the advantages of insertion sort and merge sort, and is particularly suited for handling real-world data, which tends to be partially sorted. TimSort's main idea is to divide the array into multiple small blocks (runs), sort each block with insertion sort, then use merge sort to merge these blocks.

TimSort's Basic Idea

  1. Divide into blocks: Split the array into multiple small blocks (called runs), each of size RUN (typically 32 or 64).
  2. Insertion sort: Sort each block using insertion sort.
  3. Merge: Use merge sort to gradually merge these sorted blocks until the entire array is sorted.

TimSort's Process

  1. Block division and insertion sort:

    • Split the array into blocks of size RUN.
    • Sort each block with insertion sort, because insertion sort is efficient for small-scale data.
  2. Merge blocks:

    • Use merge sort to gradually merge these sorted blocks.
    • Each merge starts with a block size of RUN and doubles each time, until the entire array is sorted.

TimSort's Detailed Steps

  1. Define RUN size:

    • Typically set RUN to 32 or 64, which is an empirical value that works well in most cases.
  2. Sort each block with insertion sort:

    • Traverse the array from left to right, processing one block of size RUN at a time.
    • Sort each block with insertion sort.
  3. Merge sorted blocks:

    • Start with a block size of RUN and gradually merge adjacent blocks.
    • After each merge, the block size doubles until the entire array is sorted.

TimSort's Pseudocode

def tim_sort(arr):
RUN = 32
n = len(arr)

# Sort each block with insertion sort
for i in range(0, n, RUN):
insertion_sort(arr, i, min((i + RUN - 1), (n - 1)))

# Merge sorted blocks
size = RUN
while size < n:
for left in range(0, n, 2 * size):
mid = min((left + size - 1), (n - 1))
right = min((left + 2 * size - 1), (n - 1))
if mid < right:
merge(arr, left, mid, right)
size = 2 * size

Insertion Sort and Merge Functions

Insertion Sort:

def insertion_sort(arr, left, right):
for i in range(left + 1, right + 1):
key = arr[i]
j = i - 1
while j >= left and arr[j] > key:
arr[j + 1] = arr[j]
j -= 1
arr[j + 1] = key

Merge Function:

def merge(arr, l, m, r):
len1, len2 = m - l + 1, r - m
left, right = [], []

for i in range(0, len1):
left.append(arr[l + i])
for i in range(0, len2):
right.append(arr[m + 1 + i])

i, j, k = 0, 0, l

while i < len1 and j < len2:
if left[i] <= right[j]:
arr[k] = left[i]
i += 1
else:
arr[k] = right[j]
j += 1
k += 1

while i < len1:
arr[k] = left[i]
k += 1
i += 1

while j < len2:
arr[k] = right[j]
k += 1
j += 1

TimSort's Implementation Details

  1. Block size: Choose an appropriate RUN size (typically 32 or 64) to ensure insertion sort efficiency within each block.
  2. Insertion sort: Use insertion sort within each block because it is very efficient for small-scale data.
  3. Merge sort: Gradually merge sorted blocks, doubling the block size after each merge, ultimately sorting the entire array.

TimSort's Time Complexity

  • Best-case time complexity: O(n)O(n), when the input array is already partially sorted.
  • Worst-case time complexity: O(nlogn)O(n \log n), regardless of the initial order of the input array, the number of steps executed is essentially the same.
  • Average time complexity: O(nlogn)O(n \log n), for most input arrays, TimSort's performance is very stable.

TimSort's Space Complexity

TimSort uses auxiliary arrays to merge sub-arrays, so the space complexity is O(n)O(n).

TimSort's Advantages and Disadvantages

Advantages:

  • Stable time complexity; suitable for large-scale data sorting.
  • Combines the advantages of insertion sort and merge sort; excellent performance on partially sorted data.
  • Stable sorting algorithm; does not change the relative order of equal elements.

Disadvantages:

  • High implementation complexity; more complex to understand and code than simple sorting algorithms.
  • Auxiliary arrays lead to higher space complexity.

TimSort's Applicable Scenarios

TimSort is suitable for the following scenarios:

  • Sorting large datasets.
  • When a highly efficient and stable sorting algorithm is needed.
  • Excellent performance when data is partially sorted.

Summary

TimSort is an efficient sorting algorithm that combines the advantages of insertion sort and merge sort. It works by performing insertion sort on small blocks of the array, then gradually merging these sorted blocks to complete the sorting. Although its implementation complexity is relatively high, it performs excellently when handling large-scale data and partially sorted data, and is a sorting algorithm with stable time complexity and higher space complexity.