Python Scripting for Data Analysis

Instead of manually crunching numbers or running commands in a terminal window, you can write scripts that do all the heavy lifting for you.
For example, let’s say you have a dataset with thousands of rows and many columns. You want to keep only the rows that meet specific criteria (such as data from 2019) and then calculate statistics like the mean, median, and standard deviation. Instead of doing this by hand for each column, you can write a script that does it for all the columns at once.
Here’s an example script:

# Import the pandas library, used here to read and analyze the CSV file
import pandas as pd

# Read the CSV file into a DataFrame called 'df'
df = pd.read_csv('data.csv')

# Keep only the rows where year equals 2019
filtered_data = df[df['year'] == 2019]

# Calculate the mean of each numeric column in the filtered data.
# numeric_only=True skips text columns, which raise a TypeError in pandas 2.0 and later.
mean_values = filtered_data.mean(numeric_only=True)

# Calculate the median of each numeric column
median_values = filtered_data.median(numeric_only=True)

# Calculate the standard deviation of each numeric column
std_deviation_values = filtered_data.std(numeric_only=True)

# Print the mean values to the console
print(mean_values)

# Print the median values to the console
print(median_values)

# Print the standard deviation values to the console
print(std_deviation_values)

In this script, we first import the pandas library (a popular data analysis tool in Python). Then we read our CSV file into a DataFrame called ‘df’. Next, we keep only the rows where year equals 2019 and store them in a new variable called ‘filtered_data’. Finally, we calculate the mean, median, and standard deviation of each column in the filtered data using the DataFrame methods `mean()`, `median()`, and `std()`, and print the results. These methods work on numeric data: if the file contains text columns, pass `numeric_only=True` so that they are skipped rather than raising an error in recent pandas versions.
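The three statistics can also be collected into a single table with `agg()`. The following is a minimal sketch, not part of the original script: the sample DataFrame and its `sales` and `region` columns are made up for illustration and stand in for the contents of the CSV file.

```python
import pandas as pd

# Hypothetical sample data standing in for the contents of data.csv
df = pd.DataFrame({
    "year": [2018, 2019, 2019, 2019],
    "sales": [10.0, 20.0, 30.0, 40.0],
    "region": ["north", "south", "east", "west"],
})

# Keep only the rows where year equals 2019
filtered_data = df[df["year"] == 2019]

# Select the numeric columns and drop 'year', since its statistics are not meaningful
numeric_data = filtered_data.select_dtypes(include="number").drop(columns=["year"])

# One row per statistic, one column per numeric field
summary = numeric_data.agg(["mean", "median", "std"])
print(summary)
```

Note that `std` in pandas is the sample standard deviation (it divides by n - 1) by default.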
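Because these steps live in a script, you can rerun them on a different CSV file by changing the file name passed to `read_csv`, or schedule the script to run at fixed intervals (for example, daily with cron or Windows Task Scheduler) without any manual intervention.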
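One way to make the analysis reusable across files is to wrap it in a function that takes the file path as an argument. This is a sketch under assumptions: the function name `summarize_csv` and its parameters are hypothetical, and it assumes every input file has a numeric year column.

```python
import pandas as pd


def summarize_csv(path, year=2019, year_column="year"):
    """Return the mean, median, and std of the numeric columns for one year."""
    df = pd.read_csv(path)

    # Keep only the rows for the requested year
    filtered = df[df[year_column] == year]

    # Use numeric columns only, and leave the year column out of the statistics
    numeric = filtered.select_dtypes(include="number")
    numeric = numeric.drop(columns=[year_column], errors="ignore")

    # One row per statistic, one column per numeric field
    return numeric.agg(["mean", "median", "std"])
```

Calling `summarize_csv('data.csv')` reproduces the analysis above, and looping over a list of file paths applies the same steps to each file in turn.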
