Let me refine my previous tutorial on handling HTTP cookies with Python and the requests library.

1. Open your favorite web browser (or Postman) and navigate to the login page of the website you want to access.
2. Inspect the network traffic using the browser's developer tools or Postman's "Preview" feature, looking for any cookies that are set when you log in successfully. Click on the request that contains your credentials (username/password) and check the response headers for `Set-Cookie` entries. In this case, the website sets a cookie named "session" with a value that starts with "sess_".
3. Copy the name and value of any cookies set by the server during successful authentication. In our example, we'll reuse the "session" cookie. If you only need a one-off request, you can attach a copied cookie by hand, as shown in the sketch below.
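For a quick test without scripting the login at all, requests accepts a plain dict of cookies on any call. A minimal sketch, assuming the placeholder value is replaced with the real one you copied in step 3:

```python
import requests

# Placeholder: paste the actual "session" value copied from your browser here.
cookies = {"session": "sess_REPLACE_ME"}

# requests sends this dict as a Cookie header with the request.
response = requests.get("https://example.com/protected-content", cookies=cookies)
print(response.status_code)
```

Keep in mind that copied cookies expire, which is why the scripted login below is the more durable approach.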
4. Verify that your credentials work with a short script:

```python
# Install the requests library first if you haven't already:
#   pip install requests

# Import the requests library so we can make HTTP requests.
import requests

# The URL we want to send the login request to.
url = "https://www.example.com"

# The form data to send with the request.
payload = {'username': 'example_user', 'password': 'example_password'}

# Send a POST request with the login data.
response = requests.post(url, data=payload)

# Check the status code to see whether the request succeeded.
if response.status_code == 200:
    print("Authentication successful!")
    # The cookies the server set during authentication.
    cookies = response.cookies
    print("Cookies set by server: ", cookies)
else:
    print("Authentication failed.")
```
5. Create a new file called `cookies.py`. This will be our recipe for handling cookies in Python! Add the following code to it:
```python
# Import necessary libraries
import requests                    # Make HTTP requests
from bs4 import BeautifulSoup      # Parse HTML content
from urllib.parse import urljoin   # Build absolute URLs from relative links

# Define login credentials and target URLs
username = "your-username"  # Replace with your actual username
password = "your-password"  # Replace with your actual password
login_url = "https://example.com/login"             # Replace with the actual login URL
page_url = "https://example.com/protected-content"  # Replace with the page you want

# Set up a session object to store cookies for future use
session = requests.Session()

# Define a function to get cookies from the server after successful login
def get_cookies():
    # Load login credentials from a file (one per line), or hardcode them above
    with open("credentials.txt", "r") as f:
        username, password = [line.strip() for line in f.readlines()[:2]]
    # Send a POST request with the login credentials
    response = session.post(login_url, data={"username": username, "password": password})
    # Check if the login was successful (status code 200)
    if response.status_code == 200:
        # Parse the HTML content and extract the CSRF token from the form
        soup = BeautifulSoup(response.content, "html.parser")
        csrftoken = ""
        token_input = soup.find("input", attrs={"name": "csrfmiddlewaretoken"})
        if token_input is not None:
            csrftoken = token_input.get("value", "")
        # Read the session ID from the cookies the server set
        session_id = response.cookies.get("session", "")
        # Return a dictionary of the cookies needed for future requests
        return {
            "csrfmiddlewaretoken": csrftoken,
            "session": session_id,
        }
    # If the login failed (status code != 200), raise an exception
    else:
        print("Login failed!")
        raise Exception("Failed to log in.")

# Call get_cookies() to load credentials and get cookies from the server
try:
    cookies = get_cookies()
except Exception as e:
    # Handle any exceptions (e.g., invalid credentials or network errors)
    print(f"Error: {e}")
else:
    # Merge the returned cookies into the session for future requests
    session.cookies.update(cookies)

# Define a function to make authenticated requests using the session object
def get_page():
    response = session.get(page_url)
    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        return BeautifulSoup(response.content, "html.parser")
    # If the request failed (status code != 200), raise an exception
    else:
        print("Request failed!")
        raise Exception("Failed to get page.")

# Call get_page() to request protected content through the session
try:
    soup = get_page()
except Exception as e:
    # Handle any exceptions (e.g., network errors or unauthorized access)
    print(f"Error: {e}")
else:
    # Extract whatever you need from the HTML, e.g. download links
    for link in soup("a"):
        href = link.get("href", "")
        if "download" in href:
            download_url = urljoin(page_url, href)  # Build the absolute download URL
            print(f"Download URL: {download_url}")
```
In this recipe, we first define our login credentials and target URLs, then set up a session object that stores cookies across requests. The `get_cookies()` function sends the login request through the session and collects what later requests need: the session cookie from the response headers and the CSRF token from the HTML form in the response body. If the login is successful, it returns these as a dictionary.
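One caveat worth a hedged sketch: on Django-style sites, the CSRF token usually has to be echoed back as a form field (or an `X-CSRFToken` header) on state-changing requests, not just stored in the cookie jar. Assuming a hypothetical `/submit` endpoint:

```python
# /submit and the "comment" field are hypothetical; adapt them to your target site.
session.post(
    "https://example.com/submit",
    data={
        "csrfmiddlewaretoken": cookies["csrfmiddlewaretoken"],  # echo the token back
        "comment": "hello",
    },
    headers={"Referer": page_url},  # some sites also verify the Referer over HTTPS
)
```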
We then define a helper function called `get_page()`, which uses our session object to make an authenticated request for protected content. This function checks if the request was successful and raises an exception if not.
Finally, we call each function inside a try-except block to handle any exceptions that occur at runtime. If everything goes well, we extract the data we need from the HTML content using BeautifulSoup.
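One last refinement: the session's cookies disappear when the script exits, so every run repeats the login. Since requests cookie jars are picklable, a minimal sketch for persisting them between runs (the `cookies.pkl` filename is just an example):

```python
import pickle

# After a successful login, save the jar to disk.
with open("cookies.pkl", "wb") as f:
    pickle.dump(session.cookies, f)

# On a later run, restore it before making requests, so the
# login step can be skipped while the cookies are still valid.
with open("cookies.pkl", "rb") as f:
    session.cookies.update(pickle.load(f))
```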