Python Web Scraping Cookbook

How to do it - reading and processing messages

To process the messages, run the 03/process_messages.py program:

import boto3
import botocore
import requests
from bs4 import BeautifulSoup

print("Starting")

# declare our keys (normally, don't hard code this)
access_key = "AKIAIXFTCYO7FEL55TCQ"
access_secret_key = "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"

# create sqs client
sqs = boto3.client('sqs', "us-west-2",
                   aws_access_key_id=access_key,
                   aws_secret_access_key=access_secret_key)

print("Created client")

# create / open the SQS queue
queue = sqs.create_queue(QueueName="PlanetMoreInfo")
queue_url = queue["QueueUrl"]
print ("Opened queue: %s" % queue_url)

while True:
    print("Attempting to receive messages")
    response = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=1,
                                   WaitTimeSeconds=1)
    if 'Messages' not in response:
        print("No messages")
        continue

    message = response['Messages'][0]
    receipt_handle = message['ReceiptHandle']
    url = message['Body']

    # parse the page
    html = requests.get(url)
    bsobj = BeautifulSoup(html.text, "lxml")

    # now find the planet name and albedo info
    planet = bsobj.findAll("h1", {"id": "firstHeading"})[0].text
    albedo_node = bsobj.findAll("a", {"href": "/wiki/Geometric_albedo"})[0]
    root_albedo = albedo_node.parent
    albedo = root_albedo.text.strip()

    # delete the message from the queue
    sqs.delete_message(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle
    )

    # print the planet's name and albedo info
    print("%s: %s" % (planet, albedo))
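The loop in the listing follows a receive, process, delete pattern and never exits on its own. The following is a minimal sketch of that same pattern with a stopping condition, not part of the book's program. It assumes a hypothetical helper named drain_queue and a hypothetical in-memory FakeSQS class that stands in for the boto3 client, so the example runs without AWS credentials; the message bodies are placeholder strings.

```python
class FakeSQS:
    """Hypothetical in-memory stand-in for the two SQS client calls the listing uses."""

    def __init__(self, bodies):
        self._pending = [
            {"ReceiptHandle": "rh-%d" % i, "Body": body}
            for i, body in enumerate(bodies)
        ]
        self.deleted = []

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1, WaitTimeSeconds=0):
        if not self._pending:
            return {}  # no 'Messages' key when nothing is available
        return {"Messages": self._pending[:MaxNumberOfMessages]}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self._pending = [m for m in self._pending
                         if m["ReceiptHandle"] != ReceiptHandle]
        self.deleted.append(ReceiptHandle)


def drain_queue(client, queue_url, handle, max_empty_polls=3):
    """Process messages one at a time; stop after several consecutive empty polls."""
    processed = 0
    empty_polls = 0
    while empty_polls < max_empty_polls:
        response = client.receive_message(QueueUrl=queue_url,
                                          MaxNumberOfMessages=1,
                                          WaitTimeSeconds=1)
        if "Messages" not in response:
            empty_polls += 1
            continue
        empty_polls = 0
        message = response["Messages"][0]
        # If handle() raises, the message is never deleted and stays on the queue.
        handle(message["Body"])
        client.delete_message(QueueUrl=queue_url,
                              ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed


fake = FakeSQS(["placeholder-url-1", "placeholder-url-2"])
seen = []
count = drain_queue(fake, "fake-queue-url", seen.append)
print(count, seen)
```

Deleting only after the handler returns means a failed scrape leaves the message in place; with a real SQS queue, an undeleted message becomes visible to consumers again once its visibility timeout expires. Passing the client in as an argument is a design choice made here so that the real boto3 client and the fake are interchangeable.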

Run the script using python process_messages.py. The script keeps polling the queue until you interrupt it (for example, with Ctrl+C). You will see output similar to the following:

Starting
Created client
Opened queue: https://us-west-2.queue.amazonaws.com/414704166289/PlanetMoreInfo
Attempting to receive messages
Jupiter: 0.343 (Bond)
0.52 (geom.)[3]
Attempting to receive messages
Mercury (planet): 0.142 (geom.)[10]
Attempting to receive messages
Uranus: 0.300 (Bond)
0.51 (geom.)[5]
Attempting to receive messages
Neptune: 0.290 (bond)
0.41 (geom.)[4]
Attempting to receive messages
Pluto: 0.49 to 0.66 (geometric, varies by 35%)[1][7]
Attempting to receive messages
Venus: 0.689 (geometric)[2]
Attempting to receive messages
Earth: 0.367 geometric[3]
Attempting to receive messages
Mars: 0.170 (geometric)[8]
0.25 (Bond)[7]
Attempting to receive messages
Saturn: 0.499 (geometric)[4]
Attempting to receive messages
No messages
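
Some of the albedo values above span two lines and carry citation markers such as [3], because the script prints the text of the parent node unchanged. The following is a small sketch of one way to tidy that text before printing or storing it. The function name clean_albedo is hypothetical, and it assumes the line breaks seen in the output are newline characters in the scraped text.

```python
import re


def clean_albedo(raw):
    """Drop [n] citation markers and join the remaining lines with '; '."""
    without_refs = re.sub(r"\[\d+\]", "", raw)
    parts = [line.strip() for line in without_refs.splitlines() if line.strip()]
    return "; ".join(parts)


print(clean_albedo("0.343 (Bond)\n0.52 (geom.)[3]"))
# prints: 0.343 (Bond); 0.52 (geom.)
```

In the script, this could be applied to the albedo variable just before the final print call.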