First off, let’s start with line breaks. SQL ignores line breaks, so you can cram a whole query onto one line if you want to. In Python, a line break normally ends a statement, so it’s important to keep things organized and neat by giving each statement its own line. For example:
# This is an example of bad formatting
# Import the pandas library
import pandas as pd
# Read the csv file and store it in the variable 'data'
data = pd.read_csv('my_dataset.csv')
# Create a new column called 'new_column' and use the str.replace() function to remove any non-alphanumeric characters from the 'old_column'
data['new_column'] = data['old_column'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)
# Here's the same code with better formatting

# Import the pandas library
import pandas as pd

# Read the csv file and store it in the variable 'data'
data = pd.read_csv('my_dataset.csv')

# Create a new column called 'new_column' by using str.replace()
# to remove any non-alphanumeric characters from 'old_column'
data['new_column'] = data['old_column'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)
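Line breaks also help inside a single long statement. Here is a minimal sketch of that idea. It uses the standard library `re` module instead of pandas so it runs anywhere, and the list `raw_values` is made up to stand in for a column of data. While a bracket or parenthesis is open, Python lets the statement continue onto the next line:

```python
import re

# Hypothetical sample values standing in for the 'old_column' data
raw_values = ["abc-123", "hello world!", "x_y_z"]

# Same pattern as in the pandas example: anything that is not a letter or digit
non_alphanumeric = re.compile(r'[^a-zA-Z0-9]')

# The open bracket lets the expression span several lines,
# with one logical piece per line
cleaned_values = [
    non_alphanumeric.sub('', value)
    for value in raw_values
]

print(cleaned_values)
```

Splitting inside brackets like this is usually tidier than ending lines with a backslash, though either works.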
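As the two pandas snippets show, the second version is much easier to read and follow.

Now for namespaces. In Python, a namespace keeps the names defined in one module separate from the names defined in another, so it’s worth importing each module under its own name (or a short alias) instead of dumping everything into one pile. For example: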
# This is an example of bad formatting with no namespace
# The following function calculates the mean of a given dataset
def calculate_mean(data):
    # Initialize a variable to store the sum of all values in the dataset
    total = 0
    # Get the number of values in the dataset
    count = len(data)
    # Loop through each value in the dataset and add it to the total
    for value in data:
        total += value
    # Calculate the mean by dividing the total by the number of values
    mean = total / count
    # Return the mean
    return mean
# Here's the same code with better formatting and a namespace

# Import the numpy library and assign it an alias "np"
import numpy as np

# The following function calculates the mean of a given dataset
def calculate_mean(data):
    # Initialize a variable to store the sum of all values in the dataset
    total = 0

    # Get the number of values in the dataset
    count = len(data)

    # Loop through each value in the dataset and add it to the total
    for value in data:
        total += value

    # Calculate the mean by dividing the total by the number of values
    mean = total / count

    # Return the mean
    return mean
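To see what a namespace buys you, here is a small sketch. It uses the standard library's `statistics` module instead of numpy so that it runs without extra installs, and the alias `st` and the sample list `values` are my own choices for illustration. Because the library function lives behind the `st.` prefix, it can sit next to a hand-written function with the same name without either one overwriting the other:

```python
import statistics as st


def mean(data):
    # Hand-written version: add everything up and divide by the count
    total = 0
    for value in data:
        total += value
    return total / len(data)


# Illustrative sample data (not from a real dataset)
values = [2, 4, 6, 8]

own_result = mean(values)         # our function, defined in this module
library_result = st.mean(values)  # the library function, reached through "st."

print(own_result, library_result)
```

Had the import been written as `from statistics import mean`, the later `def mean` would have replaced the library function under that name, which is exactly the kind of clash the prefix avoids.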
As with the first example, the second version of calculate_mean is easier to read and follow. If you want to learn more about Python syntax essentials and best practices, I recommend checking out this article series by Tomi Mester: https://www.data36.com/python-syntax-essentials-and-best-practices/. But remember, these are just guidelines! Don’t be afraid to break the rules if doing so makes your code easier to understand and use. And if you have any questions or comments, feel free to reach out on Twitter @Python_Professor.