Sorting Algorithm Time Scaling
Sorting is one of those basic algorithms that new programmer and CS students learn.
Maybe you already see the big-O comparison in Wikipedia, or few sites that visualize how the element in sorting move around like this.
In this notebook, we will compare the actual time comparison of different sorting algorithms.
Generating sample data
First we need to get random list of integers with different size.
1import numpy as np
2
3sizes = [2**i for i in range(4, 18)]
4random_arrays = {}
5for size in sizes:
6 random_arrays[size] = np.random.randint(0, size, size=size)
7
8print("Generated arrays of sizes:", list(random_arrays.keys()))
Generated arrays of sizes: [16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072]
Sorting library
We will use sortingx library to quickly get multiple different sorting algorithms.
Here we will test 6 different implementations.
1import sortingx as sx
2algorithms = [
3 sx.bubble,
4 sx.shell,
5 sx.heap,
6 sx.insert,
7 sx.quick,
8 sx.merge,
9]
1import time
2import pandas as pd
3from tqdm import tqdm
4
5results = []
6for size, arr in tqdm(random_arrays.items(), desc=f'Array sizes {size:,}'):
7 for algo in algorithms:
8 test_arr = arr.copy()
9 start_time = time.time()
10 algo(test_arr)
11 end_time = time.time()
12 results.append({
13 'algorithm': algo.__name__,
14 'array_size': size,
15 'time_seconds': end_time - start_time
16 })
17df_results = pd.DataFrame(results)
Array sizes 131072: 100%|██████████| 14/14 [21:01<00:00, 90.09s/it]
1import matplotlib.pyplot as plt
2import seaborn as sns
3
4plt.figure(figsize=(10, 6))
5sns.lineplot(data=df_results, x='array_size', y='time_seconds', hue='algorithm', marker='o')
6
7plt.title('Sorting Algorithm Performance Comparison')
8plt.xlabel('Array Size')
9plt.ylabel('Time (seconds)')
10plt.yscale('log')
11plt.grid(True)
12plt.legend(title='Algorithm')
13
14plt.show()
🗒️ Download (notebook.ipynb)
And let’s do more data point but excluding the slowest algorithm.
Show code (31 lines)
1import numpy as np
2import sortingx as sx
3
4sizes = [2**i for i in range(4, 25)]
5random_arrays = {}
6for size in sizes:
7 random_arrays[size] = np.random.randint(0, size, size=size)
8algorithms = [
9 sx.shell,
10 sx.heap,
11 sx.quick,
12 sx.merge,
13]
14import time
15import pandas as pd
16from tqdm import tqdm
17
18results = []
19for size, arr in tqdm(random_arrays.items(), desc=f'Array sizes {size:,}'):
20 for algo in algorithms:
21 test_arr = arr.copy()
22 start_time = time.time()
23 algo(test_arr)
24 end_time = time.time()
25 results.append({
26 'algorithm': algo.__name__,
27 'array_size': size,
28 'time_seconds': end_time - start_time
29 })
30df_results = pd.DataFrame(results)
31print("Generated arrays of sizes:", list(random_arrays.keys()))
Generated arrays of sizes: [16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608, 16777216]
Show code (15 lines)
1import matplotlib.pyplot as plt
2import seaborn as sns
3
4plt.figure(figsize=(10, 6))
5sns.lineplot(data=df_results, x='array_size', y='time_seconds', hue='algorithm', marker='o')
6
7plt.title('Sorting Algorithm Performance Comparison')
8plt.xlabel('Array Size')
9plt.ylabel('Time (seconds)')
10plt.yscale('log')
11plt.grid(True)
12plt.legend(title='Algorithm')
13
14plt.show()
🗒️ Download (notebook-2.ipynb)
All the computation is done in my local machine, so result might varies.
Conclusion
Based on the performance comparison graph above, we can draw several key insights:
Algorithm Performance Tiers:
- Bubble and Insert sort show significantly worse performance, especially for larger arrays
- Quick, Merge, Heap and Shell sort perform notably better, clustering together at the bottom of the graph
Scaling Behavior:
- All algorithms show non-linear growth as array size increases
- Bubble sort scales the worst, with exponential growth
- The more efficient algorithms (Quick, Merge, Heap, Shell) scale much more gracefully
Practical Implications:
- For small arrays (< 20,000 elements), the difference between algorithms is less pronounced
- For large arrays (> 60,000 elements), choosing an efficient algorithm becomes critical
- Quick, Merge and Heap sort maintain consistent performance even at larger scales
Time Scale:
- The y-axis logarithmic scale shows differences spanning several orders of magnitude
- Bubble sort takes ~1000 seconds for large arrays
- Efficient algorithms stay under 1 second even for the largest arrays tested
This experiment clearly demonstrates why certain sorting algorithms are preferred in practice, particularly for large datasets where performance differences become extremely significant.