Adhika Setya Pramudita

Adhika Setya Pramudita

Collection of thoughts

15 Feb 2025

Sorting Algorithm Time Scaling

Sorting is one of those basic algorithms that new programmer and CS students learn.

Maybe you already see the big-O comparison in Wikipedia, or few sites that visualize how the element in sorting move around like this.

In this notebook, we will compare the actual time comparison of different sorting algorithms.

Generating sample data

First we need to get random list of integers with different size.

1import numpy as np
2
3sizes = [2**i for i in range(4, 18)]
4random_arrays = {}
5for size in sizes:
6    random_arrays[size] = np.random.randint(0, size, size=size)
7
8print("Generated arrays of sizes:", list(random_arrays.keys()))
Generated arrays of sizes: [16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072]

Sorting library

We will use sortingx library to quickly get multiple different sorting algorithms.

Here we will test 6 different implementations.

1import sortingx as sx
2algorithms = [
3    sx.bubble,
4    sx.shell,
5    sx.heap,
6    sx.insert,
7    sx.quick,
8    sx.merge,
9]
Then we run each algorithm on each array size and store the results.
 1import time
 2import pandas as pd
 3from tqdm import tqdm
 4
 5results = []
 6for size, arr in tqdm(random_arrays.items(), desc=f'Array sizes {size:,}'):
 7    for algo in algorithms:
 8        test_arr = arr.copy()
 9        start_time = time.time()
10        algo(test_arr)
11        end_time = time.time()
12        results.append({
13            'algorithm': algo.__name__,
14            'array_size': size,
15            'time_seconds': end_time - start_time
16        })
17df_results = pd.DataFrame(results)
Array sizes 131072: 100%|██████████| 14/14 [21:01<00:00, 90.09s/it] 
We got the result after 21 mins. Now to plot the result:
 1import matplotlib.pyplot as plt
 2import seaborn as sns
 3
 4plt.figure(figsize=(10, 6))
 5sns.lineplot(data=df_results, x='array_size', y='time_seconds', hue='algorithm', marker='o')
 6
 7plt.title('Sorting Algorithm Performance Comparison')
 8plt.xlabel('Array Size')
 9plt.ylabel('Time (seconds)')
10plt.yscale('log')
11plt.grid(True)
12plt.legend(title='Algorithm')
13
14plt.show()
Cell output

🗒️ Download (notebook.ipynb)

And let’s do more data point but excluding the slowest algorithm.

Show code (31 lines)
 1import numpy as np
 2import sortingx as sx
 3
 4sizes = [2**i for i in range(4, 25)]
 5random_arrays = {}
 6for size in sizes:
 7    random_arrays[size] = np.random.randint(0, size, size=size)
 8algorithms = [
 9    sx.shell,
10    sx.heap,
11    sx.quick,
12    sx.merge,
13]
14import time
15import pandas as pd
16from tqdm import tqdm
17
18results = []
19for size, arr in tqdm(random_arrays.items(), desc=f'Array sizes {size:,}'):
20    for algo in algorithms:
21        test_arr = arr.copy()
22        start_time = time.time()
23        algo(test_arr)
24        end_time = time.time()
25        results.append({
26            'algorithm': algo.__name__,
27            'array_size': size,
28            'time_seconds': end_time - start_time
29        })
30df_results = pd.DataFrame(results)
31print("Generated arrays of sizes:", list(random_arrays.keys()))
Generated arrays of sizes: [16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608, 16777216]
And plot it again:
Show code (15 lines)
 1import matplotlib.pyplot as plt
 2import seaborn as sns
 3
 4plt.figure(figsize=(10, 6))
 5sns.lineplot(data=df_results, x='array_size', y='time_seconds', hue='algorithm', marker='o')
 6
 7plt.title('Sorting Algorithm Performance Comparison')
 8plt.xlabel('Array Size')
 9plt.ylabel('Time (seconds)')
10plt.yscale('log')
11plt.grid(True)
12plt.legend(title='Algorithm')
13
14plt.show()
Cell output

🗒️ Download (notebook-2.ipynb)

All the computation is done in my local machine, so result might varies.

Conclusion

Based on the performance comparison graph above, we can draw several key insights:

  1. Algorithm Performance Tiers:

    • Bubble and Insert sort show significantly worse performance, especially for larger arrays
    • Quick, Merge, Heap and Shell sort perform notably better, clustering together at the bottom of the graph
  2. Scaling Behavior:

    • All algorithms show non-linear growth as array size increases
    • Bubble sort scales the worst, with exponential growth
    • The more efficient algorithms (Quick, Merge, Heap, Shell) scale much more gracefully
  3. Practical Implications:

    • For small arrays (< 20,000 elements), the difference between algorithms is less pronounced
    • For large arrays (> 60,000 elements), choosing an efficient algorithm becomes critical
    • Quick, Merge and Heap sort maintain consistent performance even at larger scales
  4. Time Scale:

    • The y-axis logarithmic scale shows differences spanning several orders of magnitude
    • Bubble sort takes ~1000 seconds for large arrays
    • Efficient algorithms stay under 1 second even for the largest arrays tested

This experiment clearly demonstrates why certain sorting algorithms are preferred in practice, particularly for large datasets where performance differences become extremely significant.