Python Native Lists vs. NumPy Arrays

In Python, you can choose from several standard types for storing collections of data, including list, array, tuple, and dictionary. Among these, the list is the most flexible: it can hold any kind of object and is mutable, which makes it widely applicable. For scientific computing and purely numerical data, however, NumPy arrays are the usual choice and have largely taken the place of lists. So what are the differences between the two, how significant are they, and how should each be used in practice?

Of course, using practical examples is the best way to illustrate the differences.
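Before timing anything, a minimal sketch of the structural difference may help. It assumes only that NumPy is installed; the variable names are illustrative and do not come from the measurements in this article.

```python
import numpy as np

# A list can mix types freely and is mutable.
mixed = [1, "two", 3.0]
mixed.append([4, 5])

# A NumPy array stores every element with a single dtype.
ints = np.array([1, 2, 3])
floats = np.array([1, 2, 3.5])   # the integers are upcast to float

print(ints.dtype.kind)     # 'i' (a signed integer dtype)
print(floats.dtype.kind)   # 'f' (a floating-point dtype)

# Operators differ as well: '+' concatenates lists but adds arrays elementwise.
list_sum = [1, 2, 3] + [10, 20, 30]
array_sum = np.array([1, 2, 3]) + np.array([10, 20, 30])
print(list_sum)              # [1, 2, 3, 10, 20, 30]
print(array_sum.tolist())    # [11, 22, 33]
```

The single dtype is what lets NumPy run its operations in compiled code, and it is also why arrays are less flexible than lists.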

Comparison of Operation Speed

Let’s compare two simple arithmetic operations, summation and multiplication, on the integers from 1 to 10,000.

First, Summation

# Build the test data: the integers 1 through 10,000
mylist = list(range(1, 10001))

# Native list with the built-in sum()
from time import time
start = time()
total = sum(mylist)
print(total)
end = time()
print(f"total:{end-start}s")
## 50005000
## total:0.0003197193145751953s

# NumPy array with np.sum()
import numpy as np
myarray = np.array(mylist)
start = time()
total = np.sum(myarray)
print(total)
end = time()
print(f"total:{end-start}s")

## 50005000
## total:0.00041031837463378906s

# NumPy array with Python's built-in sum()
start = time()
total = sum(myarray)
print(total)
end = time()
print(f"total:{end-start}s")

## 50005000
## total:0.0012726783752441406s

As you can see, summing the native list takes about 0.0003 seconds, and np.sum on the array takes about 0.0004 seconds. Calling Python’s built-in sum() on a NumPy array is the slowest at about 0.0013 seconds, roughly three times as long as np.sum, because it walks the array one element at a time at the Python level. None of these figures includes the time needed to convert the list to an array. For summing 10,000 integers, then, the native list is at least as fast as NumPy in this measurement, although a single time() reading at this scale is noisy and each timed region also includes a print call. Many other articles compare against explicit Python loops, which are indeed slower, but that doesn’t reflect the true speed of the built-in functions.
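Because a single time() reading is noisy at this scale, the same comparison can be repeated with the standard library’s timeit module. The following is a sketch rather than the measurement used above: the helper best_time and its repeat counts are assumptions chosen for illustration, and the absolute numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np

mylist = list(range(1, 10001))
myarray = np.array(mylist, dtype=np.int64)

def best_time(func, repeat=5, number=100):
    """Return the best average time per call, in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

timings = {
    "sum(list)": best_time(lambda: sum(mylist)),
    "np.sum(array)": best_time(lambda: np.sum(myarray)),
    "sum(array)": best_time(lambda: sum(myarray)),
    # Cost of converting the list to an array, which the figures above leave out.
    "np.array(list)": best_time(lambda: np.array(mylist)),
}

for label, seconds in timings.items():
    print(f"{label:>15}: {seconds * 1e6:8.1f} microseconds per call")
```

Taking the minimum over several repeats reduces the influence of background activity, and keeping print outside the timed callables means only the summation itself is measured.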

Next, Product

Using the same mylist data as a base, let’s compare the speeds again.

# Native list: multiply in an explicit loop
from time import time
start = time()
total = 1
for i in mylist:
    total *= i
end = time()
print(f"total:{end-start}s")
## The result is 10000!, which Python stores exactly as an arbitrary-precision integer.

# NumPy array with np.prod()
import numpy as np
myarray = np.array(mylist)
start = time()
# Caution: the array has a fixed-width integer dtype, so this product
# overflows silently and the result is not the true value of 10000!.
total = np.prod(myarray)
end = time()
print(f"total:{end-start}s")
## Recorded timings (two runs):
## total:0.01838994026184082s
## total:0.000213623046875s

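For repeated multiplication, the list version above uses an explicit loop, which is slow in pure Python. (Since Python 3.8 the standard library also provides math.prod, which avoids the loop.) NumPy’s np.prod runs the multiplication in compiled code and is much faster. Note, however, that the two snippets are not doing the same work: the loop computes the exact value of 10000! with arbitrary-precision integers, while np.prod on an integer array uses fixed-width integers and silently overflows, so its result here is not the true product.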
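The difference between an exact product and a fixed-width one can be checked directly. This sketch assumes Python 3.8 or later for math.prod and an explicit 64-bit integer dtype; the variable names are illustrative.

```python
import math

import numpy as np

numbers = list(range(1, 10001))

# Exact product with the standard library (Python 3.8+).
exact = math.prod(numbers)
print(exact == math.factorial(10000))   # True

# np.prod on a 64-bit integer array wraps around on overflow without warning.
wrapped = np.prod(np.array(numbers, dtype=np.int64))

print(exact.bit_length() > 64)   # True: 10000! needs far more than 64 bits
print(int(wrapped) == exact)     # False: the NumPy result overflowed

# For small inputs the two agree, because 20! still fits in 64 bits.
small = list(range(1, 21))
small_np = int(np.prod(np.array(small, dtype=np.int64)))
print(math.prod(small) == small_np)     # True
```

When the exact value matters, math.prod on the list is the safer choice. np.prod is appropriate when the result is known to fit the array’s dtype.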

Conclusion

This article compared the speed of Python’s built-in lists and NumPy arrays on two practical computations. For summing 10,000 integers, NumPy showed no advantage, and converting a list to an array adds overhead. For repeated multiplication, np.prod was much faster than a Python loop, with the caveat that its fixed-width integer arithmetic overflows for a product this large.

In summary, lists are broadly applicable and sum quickly with the built-in sum(). For scientific computing, machine learning, and related fields, however, NumPy is the dominant choice: its array operations run in compiled code, and it is the foundation for libraries like Pandas (DataFrame, Series, etc.).

December 28, 2020
