I’d wanted to make a Reddit bot for a little while, really just as a toy project, and finding the excellent PRAW library for Python (as well as finally having some time off) gave me the kick I needed to get it done. It’s also April Fools’ Day, so I thought I’d make something a bit stupid.
I decided to make a bot that would simulate a conversation with the wonderful raconteur that is DJ Khaled. If you don’t know what I’m talking about, watch this video (the second half is the important bit):
Cool guy, huh? My idea was that the bot would monitor the top X threads on /r/all, and look through the comments to see if anyone mentioned our pal Khaled’s name. If so, the bot would engage them in the conversation from the video.
If our player got through the conversation, they’d be rewarded with a small amount of money, so they could “go buy your mom a house”. By small I mean small - I’m using ChangeTip to send 5 bits, which at the time of writing is equal to £0.00083. Someone’s mom might not get to move for a little bit longer.
I’m going to dive into how I made the thing, but if your time’s a bit limited, you can see the source code on my GitHub, and here’s a screenshot of what the finished conversation looks like.
Implementation
The bot is built in Python, using PRAW, which provides a very nice wrapper for the Reddit API. We start out by initializing the Reddit object and logging in, like so:
import praw
# The string passed in is sent as the User-Agent header
r = praw.Reddit('My cool Reddit bot 0.0.1')
r.login()
The string given to the Reddit constructor is the User-Agent field for the Reddit API. The API eschews API tokens and the like - instead, individual users are tracked by the User-Agent string. Reddit are quite strict about applications being honest with this parameter - have a look at their documentation if you’re interested.
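To make that a bit more concrete, here’s a small sketch of putting together a descriptive User-Agent string. The layout follows the pattern suggested in Reddit’s API rules, as far as I understand them; the helper name build_user_agent and every value passed to it are made up for illustration.

```python
def build_user_agent(platform, app_id, version, username):
    """Build a descriptive User-Agent string for a Reddit client.

    Assumed layout, based on the pattern in Reddit's API rules:
    <platform>:<app ID>:<version> (by /u/<username>)
    """
    return "{0}:{1}:v{2} (by /u/{3})".format(platform, app_id, version, username)


# All four values below are hypothetical placeholders.
user_agent = build_user_agent(
    "python", "com.example.khaledbot", "0.0.1", "example_user"
)
```

The resulting string would then be handed to the Reddit constructor in place of the short one above.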
The actual crawling of Reddit comments is quite simple using PRAW. Here’s a stripped-down example:
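from praw.objects import MoreComments

subr = r.get_subreddit('all')
for sub in subr.get_hot(limit=100):
    for c in sub.comments:
        # Skip placeholders for comments that have not been loaded
        if isinstance(c, MoreComments):
            continue
        text = c.body.lower()
        if "dj khaled" in text:
            c.reply('Say my name, baby.')
            # One reply per submission is enough
            break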
This code will fetch the subreddit /r/all. It then goes through the top 100 submissions in that subreddit, and for each one looks through the top comments. If it finds a case-insensitive reference to DJ Khaled in a comment, it replies to it and stops looking through that submission.
The number of comments actually returned is a maximum of 200, but not all of them may be usable: some of them may be MoreComments objects, which are placeholders used to fetch further comments. We ignore these. Reddit’s API is quite strict with rate limiting - they quote around 30 requests a minute, which is not very many - so it makes sense to only fetch what we need.
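To give a feel for what 30 requests a minute means in practice, here’s a rough sketch of a pacer that leaves at least two seconds between calls. It isn’t part of the bot, and as far as I’m aware PRAW already spaces out its own requests, so treat it purely as an illustration. RequestPacer and its parameters are names I’ve invented; the clock and sleep arguments only exist so it can be tried out without really waiting.

```python
import time


class RequestPacer(object):
    """Space calls out so that at most `per_minute` happen each minute."""

    def __init__(self, per_minute=30, clock=time.time, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Calling wait() before each request would keep a hand-rolled client under the limit, at the cost of a slower crawl.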
Our bot runs in an infinite loop, which looks like this:
while True:
    try:
        process_queue()
        process_new()
    except HTTPError:
        # Give up on this iteration, but still fall through to the
        # sleep rather than retrying straight away
        pass
    time.sleep(60)
The bot will run once a minute. An HTTPError exception will be caught if the bot fails to connect to Reddit - this could be because Reddit is down, or because their servers are busy (the 503 error we all know and love). If this happens, we simply give up on the current iteration and try again on the next one.
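One variation worth considering is making sure the pause happens even when an iteration fails, so that a struggling server isn’t hit again straight away. The sketch below shows that shape with no Reddit involved: run_loop, its parameters and the use of IOError as the handled exception are all assumptions of mine, and max_iterations is only there so the loop can be stopped.

```python
import time


def run_loop(step, handled=(IOError,), pause=60, sleep=time.sleep,
             max_iterations=None):
    """Call `step` repeatedly, pausing between calls.

    Exceptions listed in `handled` abandon the current iteration only;
    the pause still happens afterwards.
    Returns the number of iterations that failed.
    """
    failures = 0
    done = 0
    while max_iterations is None or done < max_iterations:
        try:
            step()
        except handled:
            failures += 1
        sleep(pause)
        done += 1
    return failures
```

For the bot, step would be a function that calls process_queue() and process_new(), and handled would be the HTTPError tuple.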
The process_new() method does pretty much what I outlined above, with a few additions:
def process_new():
    subr = r.get_subreddit('all')
    for sub in subr.get_hot(limit=100):
        # Only one conversation per submission: skip ones already seen
        if sub.permalink in seen:
            continue
        for c in sub.comments:
            if isinstance(c, MoreComments):
                continue
            text = c.body.lower()
            if "dj khaled" in text:
                try:
                    newc = c.reply('Say my name, baby.')
                except RateLimitExceeded:
                    break
                seen.append(sub.permalink)
                with open('./seen.txt', 'a') as s_f:
                    s_f.write(sub.permalink + '\n')
                queue[0].insert(0, newc.permalink)
                print('New comment ' + newc.permalink)
                break
As well as replying, we add our reply comment to a queue, which is processed by the process_queue() method. There are actually three queues - one for each stage of the conversation.
queue = [[], [], []]
seen = []
The process_new() method also keeps track of seen submissions. This ensures that we only start one conversation on each submission - we need to stop people abusing our bot for free money, after all. Even DJ Khaled isn’t infinitely rich. This list is also written to a file, so that it can persist if the bot crashes or needs to be restarted for some reason.
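Here’s a minimal sketch of that persistence idea on its own, using a set rather than a list. The function names load_seen and mark_seen are hypothetical, not taken from the bot. The one detail that’s easy to miss is stripping the trailing newline when reading the file back, otherwise the stored permalinks won’t match the ones that come from Reddit.

```python
import os


def load_seen(path):
    """Return the set of permalinks stored one per line in `path`."""
    if not os.path.exists(path):
        # First run: nothing has been seen yet
        return set()
    with open(path, 'r') as f:
        # strip() drops the trailing newline so membership tests match
        return set(line.strip() for line in f if line.strip())


def mark_seen(seen, path, permalink):
    """Record `permalink` in memory and on disk.

    Returns False if it was already known, True otherwise.
    """
    if permalink in seen:
        return False
    seen.add(permalink)
    with open(path, 'a') as f:
        f.write(permalink + '\n')
    return True
```

A set keeps the membership check cheap as the file grows, which matters a little more the longer the bot runs.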
The RateLimitExceeded exception is raised if we try to comment too often. How often a Reddit account is allowed to comment depends on the amount of karma it has - to start with, one can only comment once every five minutes. This makes our bot a bit slow, but for our purposes this doesn’t really matter, and hopefully it will speed up if people upvote the stuff it comes out with.
The process_queue() method looks like this:
def process_queue():
    for i, q in enumerate(queue):
        # Oldest items sit at the end; take up to five, oldest first.
        # The slice is a copy, so removing from the queue below is safe.
        for c in reversed(q[-5:]):
            com = r.get_submission(c).comments[0]
            for rep in com.replies:
                # Placeholders have no body, so skip them
                if isinstance(rep, MoreComments):
                    continue
                if "dj khaled" in rep.body.lower():
                    try:
                        newc = rep.reply(responses[i])
                    except RateLimitExceeded:
                        break
                    queue[i].remove(c)
                    if i < 2:
                        queue[i + 1].insert(0, newc.permalink)
                    print("Replied with comment " + newc.permalink)
                    break
This code will process five items from each queue - this is to keep our number of requests down. The code fetches the comment using PRAW, and checks it to see if any of the replies contain DJ Khaled’s name. If so, we reply to it with the appropriate response, and move the reply comment into the next queue if there is one, removing the old comment from its queue. In this way we gradually move conversations along.
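The way conversations move from one queue to the next can be tried out without touching Reddit at all. The sketch below is a simplified stand-in rather than the bot’s real code: advance, replies_for and the made-up "bot-reply-to-" ids are all hypothetical, and replies are plain (id, text) pairs instead of PRAW objects.

```python
def advance(queues, responses, replies_for, trigger="dj khaled", batch=5):
    """Move conversations one step along.

    queues      -- list of lists of comment ids, oldest at the end
    responses   -- one reply text per stage
    replies_for -- hypothetical callable: comment id -> list of (id, text)
    Returns the (reply id, response text) pairs that would be posted.
    """
    posted = []
    for stage, q in enumerate(queues):
        # q[-batch:] is a copy, so removing from q while looping is safe
        for cid in reversed(q[-batch:]):
            for rid, text in replies_for(cid):
                if trigger in text.lower():
                    posted.append((rid, responses[stage]))
                    q.remove(cid)
                    if stage + 1 < len(queues):
                        # Stand-in for the permalink of our new reply
                        queues[stage + 1].insert(0, "bot-reply-to-" + rid)
                    break
    return posted


# Example data: one conversation waiting at the first stage.
queues = [["a"], [], []]
replies = {"a": [("a1", "who?"), ("a2", "DJ Khaled!")]}
posted = advance(queues, ["r0", "r1", "r2"], lambda cid: replies.get(cid, []))
```

After that single call, the conversation has left the first queue and a placeholder for the bot’s reply is waiting in the second.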
The responses are stored in their own Python file:
responses = [
'Say my _name_, baby!',
'''You smart. You loyal. You grateful. I _appreciate_ that.
Go buy your mom a house.
Go buy your whole family houses.
Put this money in your savings account.
Go spend some money for no reason.
Come back and ask for more.
/u/changetip 5 bits
Baby, let the music take control.
Let we the best sound take control, baby - hold on, say my name?''',
'''S'right. That's right, baby. You remember that.
https://youtu.be/fxPBu_vX9Q0
---
^_[Confused?](http://www.joelotter.com)_'''
]
I did this rather than putting them in a plain text file so as to use Python’s multiline strings - I had some issues with getting newlines to behave properly simply using \n, which I suspect has to do with Reddit’s Markdown processing.
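One plausible explanation for the newline trouble is that Markdown treats a single newline inside a paragraph as a space, so separate lines need a blank line between them to show up as separate paragraphs. Here’s a small sketch of joining lines that way; to_reddit_markdown is a name I’ve made up, and this is a guess at the cause rather than something I’ve confirmed against Reddit’s renderer.

```python
def to_reddit_markdown(lines):
    """Join lines so that each one renders as its own paragraph.

    Paragraphs are separated by a blank line, i.e. two newline
    characters, and empty lines in the input are dropped.
    """
    return "\n\n".join(line.strip() for line in lines if line.strip())


message = to_reddit_markdown([
    "Go buy your mom a house.",
    "Go buy your whole family houses.",
])
```

With that, the responses could live in a plain text file after all, one line per paragraph.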
When we put it all together, the completed bot looks like this:
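import time

import praw
from praw.errors import RateLimitExceeded
from praw.objects import MoreComments
from requests.exceptions import HTTPError

from responses import responses

# One queue per stage of the conversation
queue = [[], [], []]
# Permalinks of submissions we have already replied in
seen = []
r = praw.Reddit('User-Agent string goes here')


def process_queue():
    for i, q in enumerate(queue):
        # Oldest items sit at the end; take up to five, oldest first.
        # The slice is a copy, so removing from the queue below is safe.
        for c in reversed(q[-5:]):
            com = r.get_submission(c).comments[0]
            for rep in com.replies:
                # Placeholders have no body, so skip them
                if isinstance(rep, MoreComments):
                    continue
                if "dj khaled" in rep.body.lower():
                    try:
                        newc = rep.reply(responses[i])
                    except RateLimitExceeded:
                        break
                    queue[i].remove(c)
                    if i < 2:
                        queue[i + 1].insert(0, newc.permalink)
                    print("Replied with comment " + newc.permalink)
                    break


def process_new():
    subr = r.get_subreddit('all')
    for sub in subr.get_hot(limit=100):
        # Only one conversation per submission: skip ones already seen
        if sub.permalink in seen:
            continue
        for c in sub.comments:
            if isinstance(c, MoreComments):
                continue
            text = c.body.lower()
            if "dj khaled" in text:
                try:
                    newc = c.reply('Say my name, baby.')
                except RateLimitExceeded:
                    break
                seen.append(sub.permalink)
                with open('./seen.txt', 'a') as s_f:
                    s_f.write(sub.permalink + '\n')
                queue[0].insert(0, newc.permalink)
                print('New comment ' + newc.permalink)
                break


# Initialise: reload the submissions seen on previous runs
try:
    with open('./seen.txt', 'r') as f:
        for line in f:
            # Strip the trailing newline so lookups match sub.permalink
            seen.append(line.strip())
except IOError:
    # First run: there is no seen.txt yet
    pass

r.login()
while True:
    try:
        process_queue()
        process_new()
    except HTTPError:
        # Give up on this iteration, but still fall through to the
        # sleep rather than retrying straight away
        pass
    time.sleep(60)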
And that’s pretty much it - it doesn’t take much! If you’re thinking of building a Reddit bot, here are some links that might be useful:
- PRAW
- Reddit API docs
- The source code for this bot
- DJ Khaled approving the most powerful servers
- The actual bot on Reddit
Any questions, don’t hesitate to drop a comment in the box below. Go buy your mom a house.