timsort
Tim Sort
Code
C language code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define RUN 32
void insertion_sort(int arr[], int left, int right) {
for (int i = left + 1; i <= right; i++) {
int temp = arr[i];
int j = i - 1;
while (j >= left && arr[j] > temp) {
arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = temp;
}
}
void merge(int arr[], int l, int m, int r) {
int len1 = m - l + 1, len2 = r - m;
int left[len1], right[len2];
for (int i = 0; i < len1; i++)
left[i] = arr[l + i];
for (int i = 0; i < len2; i++)
right[i] = arr[m + 1 + i];
int i = 0;
int j = 0;
int k = l;
while (i < len1 && j < len2) {
if (left[i] <= right[j]) {
arr[k] = left[i];
i++;
} else {
arr[k] = right[j];
j++;
}
k++;
}
while (i < len1) {
arr[k] = left[i];
k++;
i++;
}
while (j < len2) {
arr[k] = right[j];
k++;
j++;
}
}
void tim_sort(int arr[], int n) {
for (int i = 0; i < n; i += RUN)
insertion_sort(arr, i, (i + RUN - 1) < (n - 1) ? (i + RUN - 1) : (n - 1));
for (int size = RUN; size < n; size = 2 * size) {
for (int left = 0; left < n; left += 2 * size) {
int mid = (left + size - 1) < (n - 1) ? (left + size - 1) : (n - 1);
int right = (left + 2 * size - 1) < (n - 1) ? (left + 2 * size - 1) : (n - 1);
if (mid < right)
merge(arr, left, mid, right);
}
}
}
int main() {
int arr[] = {12, 11, 13, 5, 6, 7};
int n = sizeof(arr) / sizeof(arr[0]);
printf("Given array is \n");
for (int i = 0; i < n; i++)
printf("%d ", arr[i]);
printf("\n");
tim_sort(arr, n);
printf("\nSorted array is \n");
for (int i = 0; i < n; i++)
printf("%d ", arr[i]);
printf("\n");
return 0;
}
Code Explanation
- Initialization: The array
arris the array to be sorted, andnis the length of the array. - Insertion sort function:
- The
insertion_sortfunction is used to perform insertion sort on small blocks of the array. - Traverses the array from left to right, inserting each element into the correct position in the sorted portion.
- The
- Merge function:
- The
mergefunction is used to merge two sorted sub-arrays. - Creates temporary arrays
leftandrightto store elements of the two sub-arrays. - Uses a two-pointer approach to merge the two sub-arrays back into the original array.
- The
- TimSort function:
tim_sortfirst divides the array into blocks of sizeRUN, using insertion sort to sort each block.- Then gradually merges these sorted blocks, starting with a merge size of
RUNand doubling each time until the entire array is sorted.
Example Run
Assuming the input array is {12, 11, 13, 5, 6, 7}:
- Initial state:
{12, 11, 13, 5, 6, 7} - Insertion sort:
- Sort each block of size
RUNwith insertion sort. - Since
RUNis set to 32, the entire array is treated as one block and sorted with insertion sort. - Array after insertion sort:
{5, 6, 7, 11, 12, 13}
- Sort each block of size
- Merge:
- Since the array is already sorted, no further merging is needed.
Time Complexity Analysis
The time complexity of TimSort mainly depends on the insertion sort and merge processes.
- Best-case time complexity: , when the input array is already partially sorted.
- Worst-case time complexity: , regardless of the initial order of the input array, the number of steps executed is essentially the same.
- Average time complexity: , for most input arrays, TimSort's performance is very stable.
Space Complexity Analysis
TimSort uses auxiliary arrays to merge sub-arrays, so the space complexity is .
Advantages and Disadvantages
Advantages:
- Stable time complexity; suitable for large-scale data sorting.
- Combines the advantages of insertion sort and merge sort; excellent performance on partially sorted data.
- Stable sorting algorithm; does not change the relative order of equal elements.
Disadvantages:
- High implementation complexity; more complex to understand and code than simple sorting algorithms.
- Auxiliary arrays lead to higher space complexity.
Applicable Scenarios
TimSort is suitable for the following scenarios:
- Sorting large datasets.
- When a highly efficient and stable sorting algorithm is needed.
- Excellent performance when data is partially sorted.
Summary
TimSort is an efficient sorting algorithm that combines the advantages of insertion sort and merge sort. It works by performing insertion sort on small blocks of the array, then gradually merging these sorted blocks to complete the sorting. Although its implementation complexity is relatively high, it performs excellently when handling large-scale data and partially sorted data, and is a sorting algorithm with stable time complexity and higher space complexity.
Deeper Understanding of Tim Sort
TimSort's Philosophy and Implementation
TimSort is a hybrid sorting algorithm designed by Tim Peters in 2002 for the Python programming language. It combines the advantages of insertion sort and merge sort, and is particularly suited for handling real-world data, which tends to be partially sorted. TimSort's main idea is to divide the array into multiple small blocks (runs), sort each block with insertion sort, then use merge sort to merge these blocks.
TimSort's Basic Idea
- Divide into blocks: Split the array into multiple small blocks (called runs), each of size
RUN(typically 32 or 64). - Insertion sort: Sort each block using insertion sort.
- Merge: Use merge sort to gradually merge these sorted blocks until the entire array is sorted.
TimSort's Process
-
Block division and insertion sort:
- Split the array into blocks of size
RUN. - Sort each block with insertion sort, because insertion sort is efficient for small-scale data.
- Split the array into blocks of size
-
Merge blocks:
- Use merge sort to gradually merge these sorted blocks.
- Each merge starts with a block size of
RUNand doubles each time, until the entire array is sorted.
TimSort's Detailed Steps
-
Define RUN size:
- Typically set
RUNto 32 or 64, which is an empirical value that works well in most cases.
- Typically set
-
Sort each block with insertion sort:
- Traverse the array from left to right, processing one block of size
RUNat a time. - Sort each block with insertion sort.
- Traverse the array from left to right, processing one block of size
-
Merge sorted blocks:
- Start with a block size of
RUNand gradually merge adjacent blocks. - After each merge, the block size doubles until the entire array is sorted.
- Start with a block size of
TimSort's Pseudocode
def tim_sort(arr):
RUN = 32
n = len(arr)
# Sort each block with insertion sort
for i in range(0, n, RUN):
insertion_sort(arr, i, min((i + RUN - 1), (n - 1)))
# Merge sorted blocks
size = RUN
while size < n:
for left in range(0, n, 2 * size):
mid = min((left + size - 1), (n - 1))
right = min((left + 2 * size - 1), (n - 1))
if mid < right:
merge(arr, left, mid, right)
size = 2 * size
Insertion Sort and Merge Functions
Insertion Sort:
def insertion_sort(arr, left, right):
for i in range(left + 1, right + 1):
key = arr[i]
j = i - 1
while j >= left and arr[j] > key:
arr[j + 1] = arr[j]
j -= 1
arr[j + 1] = key
Merge Function:
def merge(arr, l, m, r):
len1, len2 = m - l + 1, r - m
left, right = [], []
for i in range(0, len1):
left.append(arr[l + i])
for i in range(0, len2):
right.append(arr[m + 1 + i])
i, j, k = 0, 0, l
while i < len1 and j < len2:
if left[i] <= right[j]:
arr[k] = left[i]
i += 1
else:
arr[k] = right[j]
j += 1
k += 1
while i < len1:
arr[k] = left[i]
k += 1
i += 1
while j < len2:
arr[k] = right[j]
k += 1
j += 1
TimSort's Implementation Details
- Block size: Choose an appropriate
RUNsize (typically 32 or 64) to ensure insertion sort efficiency within each block. - Insertion sort: Use insertion sort within each block because it is very efficient for small-scale data.
- Merge sort: Gradually merge sorted blocks, doubling the block size after each merge, ultimately sorting the entire array.
TimSort's Time Complexity
- Best-case time complexity: , when the input array is already partially sorted.
- Worst-case time complexity: , regardless of the initial order of the input array, the number of steps executed is essentially the same.
- Average time complexity: , for most input arrays, TimSort's performance is very stable.
TimSort's Space Complexity
TimSort uses auxiliary arrays to merge sub-arrays, so the space complexity is .
TimSort's Advantages and Disadvantages
Advantages:
- Stable time complexity; suitable for large-scale data sorting.
- Combines the advantages of insertion sort and merge sort; excellent performance on partially sorted data.
- Stable sorting algorithm; does not change the relative order of equal elements.
Disadvantages:
- High implementation complexity; more complex to understand and code than simple sorting algorithms.
- Auxiliary arrays lead to higher space complexity.
TimSort's Applicable Scenarios
TimSort is suitable for the following scenarios:
- Sorting large datasets.
- When a highly efficient and stable sorting algorithm is needed.
- Excellent performance when data is partially sorted.
Summary
TimSort is an efficient sorting algorithm that combines the advantages of insertion sort and merge sort. It works by performing insertion sort on small blocks of the array, then gradually merging these sorted blocks to complete the sorting. Although its implementation complexity is relatively high, it performs excellently when handling large-scale data and partially sorted data, and is a sorting algorithm with stable time complexity and higher space complexity.