In Python, several data types can store collections of data, including list, tuple, dictionary, and the standard library's array. Among these, the list is the most flexible: it can hold any kind of object and is mutable, which makes it widely applicable. For scientific computing and purely numerical data, however, NumPy arrays are widely used and have largely replaced lists. So what are the differences between the two, how significant are they, and how should each be used in practice?
Of course, using practical examples is the best way to illustrate the differences.
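Before looking at speed, here is a minimal sketch of the structural difference described above: a list accepts mixed content, while a NumPy array stores a single element type. The variable names (`mixed`, `ints`, `floats`) are illustrative only and do not appear elsewhere in this article.

```python
import numpy as np

# A list can hold objects of different types side by side.
mixed = [1, "two", 3.0]
mixed.append([4, 5])             # lists are mutable and can grow in place

# A NumPy array stores one element type (dtype) for the whole array.
ints = np.array([1, 2, 3])
floats = np.array([1, 2, 3.5])   # the integers are upcast to float

print(type(mixed[1]).__name__)   # str
print(ints.dtype.kind)           # i (a signed integer type)
print(floats.dtype)              # float64

# The same operator means different things for the two types.
print([1, 2, 3] * 2)             # [1, 2, 3, 1, 2, 3] (list repetition)
print(ints * 2)                  # [2 4 6] (element-wise multiplication)
```

The single dtype is what lets NumPy run arithmetic in compiled loops over a contiguous block of memory instead of handling one Python object at a time.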
Comparison of Operation Speed
Let’s compare two simple arithmetic operations, summation and multiplication, over the integers from 1 to 10,000.
First, Summation
mylist = []
for i in range(1, 10001):
    mylist.append(i)
# list
from time import time
start = time()
total=sum(mylist)
print(total)
end = time()
print(f"total:{end-start}s")
## 50005000
## total:0.0003197193145751953s
# numpy np.sum
import numpy as np
myarray = np.array(mylist)
start = time()
total = np.sum(myarray)
print(total)
end = time()
print(f"total:{end-start}s")
## 50005000
## total:0.00041031837463378906s
# numpy sum
start = time()
total = sum(myarray)
print(total)
end = time()
print(f"total:{end-start}s")
## 50005000
## total:0.0012726783752441406s
As you can see, summing the native list takes about 0.0003 seconds, and NumPy’s np.sum takes about 0.0004 seconds. Calling Python’s built-in sum() function on a NumPy array is the slowest at about 0.0013 seconds, roughly three times as long as np.sum, because the built-in function iterates over the array one element at a time. None of these figures include the time it takes to convert the list to an array. For summation at this size, then, the built-in list has the advantage in this test. Many other articles compare using explicit loops, which would indeed be slower, but that doesn’t reflect the true speed of built-in functions.
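A single call to time() around one operation is quite noisy at this scale, so the summation comparison can also be sketched with the standard library's timeit module, which averages over many calls. This is a minimal sketch, not the method used for the figures above: `REPEATS` and the helper `per_call` are names invented for this example, and the actual numbers depend on the machine and the NumPy version.

```python
from timeit import timeit

import numpy as np

mylist = list(range(1, 10001))
myarray = np.array(mylist)

REPEATS = 200  # assumed repetition count; raise it for steadier numbers


def per_call(func):
    """Average wall-clock seconds for one call of func."""
    return timeit(func, number=REPEATS) / REPEATS


results = {
    "sum(list)": per_call(lambda: sum(mylist)),
    "np.sum(array)": per_call(lambda: np.sum(myarray)),
    "sum(array)": per_call(lambda: sum(myarray)),
    # Conversion cost that the comparisons above leave out.
    "np.array(list)": per_call(lambda: np.array(mylist)),
}

for label, seconds in results.items():
    print(f"{label:>15}: {seconds * 1e6:8.1f} microseconds per call")
```

Timing the list-to-array conversion separately shows how much of NumPy's cost comes from building the array rather than from the summation itself.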
Next, Product
Using the same mylist data as a base, let’s compare the speeds again.
# list
from time import time
start = time()
total = 1
for i in mylist:
    total *= i
end = time()
print(f"total:{end-start}s")
## No timing output was preserved for this run.
## The product of the numbers up to 10000 is an astronomically large integer.
# numpy np.prod
import numpy as np
myarray = np.array(mylist)
start = time()
total = np.prod(myarray)  # fixed-width integer dtype: the result overflows for 10000!
end = time()
print(f"total:{end-start}s")
## total:0.01838994026184082s (likely the output given in the original source)
## total:0.000213623046875s (likely the output from a faster run)
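For continuous multiplication, the list version above uses an explicit loop, which slows the process down. (Since Python 3.8 the standard library also provides math.prod, which avoids the hand-written loop.) NumPy’s np.prod function is much faster, but the comparison is not like-for-like: Python integers have arbitrary precision, so the loop computes the exact value of 10000!, whereas np.prod works on fixed-width integers and silently overflows, so its result here is not the true product.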
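The difference between an exact Python product and a fixed-width NumPy product can be checked directly. The sketch below assumes Python 3.8 or later for math.prod and requests a 64-bit integer dtype explicitly. The names `small`, `exact`, `wrapped`, and `exact_via_numpy` are illustrative only.

```python
import math

import numpy as np

# Small case: 20! still fits in a signed 64-bit integer, so both agree.
small = list(range(1, 21))
print(math.prod(small) == int(np.prod(np.array(small, dtype=np.int64))))  # True

# Large case: 10000! does not fit, so the fixed-width result wraps around.
mylist = list(range(1, 10001))
exact = math.prod(mylist)                            # arbitrary-precision Python int
wrapped = np.prod(np.array(mylist, dtype=np.int64))  # overflows without a warning

print(exact.bit_length())   # bits needed for the exact product, far more than 64
print(int(wrapped))         # 0, which is not the true product

# One way to keep exact integers inside NumPy is dtype=object, which
# falls back to Python ints and gives up most of the speed advantage.
exact_via_numpy = np.prod(np.array(mylist, dtype=object))
print(exact_via_numpy == exact)  # True
```

The wrapped result is exactly 0 because 10000! contains far more than 64 factors of two, so it is a multiple of 2**64. When only the magnitude matters, a floating-point dtype or a sum of logarithms is a common alternative, at the cost of exactness.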
Conclusion
This article compared the computational speed of Python’s built-in lists and NumPy arrays from a practical perspective. For summing 10,000 integers, NumPy showed no advantage, and converting the list to an array adds overhead. For continuous products, np.prod was much faster than a Python loop, with the caveat that NumPy’s fixed-width integers overflow on a product as large as 10000!, whereas Python integers stay exact. NumPy is therefore frequently preferred for operations like continuous multiplication when the values fit its numeric types.
In summary, lists have a wide range of applications and offer fast summation. For scientific computing, machine learning, and related fields, however, NumPy is the dominant choice: its arrays are very fast for operations like continuous multiplication, and it serves as the foundation for libraries such as Pandas (DataFrame, Series, etc.), which gives it a strong advantage in scientific computing.