Compare commits


No commits in common. "main" and "v0.1" have entirely different histories.

4 changed files with 78 additions and 406 deletions

.gitignore

@@ -1,7 +0,0 @@
*.conf
*.txt
log/
*.bak
bin/
lib/
*.cfg


@@ -1,7 +1,5 @@
# DiscoRSS
![DiscoRSS Logo](https://frzn.dev/~amr/images/discorss.png)
## What is it?
DiscoRSS is a simple Python script to send RSS feeds to Discord webhooks. It was created because existing bots that did this set limits on the number of feeds, and self-hosting stuff is easier and better anyway. To get this working, you will require the following Python libraries:
@@ -11,34 +9,24 @@ requests >= 2.4.2
feedparser
```
The remaining imports should all be part of the standard Python install.
The remaining imports should all be part of the standard Python install. To configure the script, create /etc/discorss.conf with the following structure:
## Important Notes
The logger will try and put the logs in `/var/log/discorss`. Make sure to create this directory and give the user running the script write permissions there. If you want the logs to go somewhere else, just edit the log_dir variable near the top of discorss.py. Choose a directory that makes sense. Unfortunately, as far as I know, the XDG standards don't have an equivalent to the /var/log directory in the user directory, so I wasn't sure what the best default was. In the future, we may switch to logging using systemd and journald directly, though it is nice to have a separate file.
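Here is a minimal sketch of that setup step in Python. It assumes you want a helper that fails loudly when the log directory exists but isn't writable. `ensure_log_dir` is a hypothetical name and is not part of discorss.py. Creating a directory under `/var/log` normally requires root, so in practice an administrator does that part once.

```python
import os
import tempfile
from pathlib import Path


def ensure_log_dir(log_dir: str) -> Path:
    """Create log_dir if needed and check that the current user can write to it."""
    path = Path(log_dir)
    # Like `mkdir -p`: creates missing parents and is a no-op if the directory
    # already exists. It still raises FileExistsError if a regular *file*
    # sits at that path.
    path.mkdir(parents=True, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError(f"{path} exists but is not writable by this user")
    return path


if __name__ == "__main__":
    # Demo against a throwaway directory instead of /var/log/discorss.
    with tempfile.TemporaryDirectory() as tmp:
        print(ensure_log_dir(os.path.join(tmp, "discorss")).is_dir())  # True
```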
## How to setup
Note: see the Automation section below for info about using the `install.sh` script to help get all the files in the right places.
### Config file format
To configure the script, create `~/.config/discorss/discorss.conf` using JSON formatting like this:
```json
```
{
"feeds": [
{
"name": "Phoronix",
"siteurl": "https://www.phoronix.com/",
"name": "phoronix",
"url": "http://www.phoronix.com/rss.php",
"webhook": "webhook url",
"offset": -18000
"webhook": "webhook url"
},
{
"name": "Pagetable",
"siteurl": "https://pagetable.com",
"name": "pagetable",
"url": "https://www.pagetable.com/?feed=rss2",
"webhook": "webhook url"
},
{
"name": "righto",
"url": "https://www.righto.com/feeds/posts/default",
"webhook": "webhook url",
"offset": -18000
}
@@ -46,56 +34,6 @@ To configure the script, create `~/.config/discorss/discorss.conf` using JSON fo
}
```
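The class-based script looks for this file under `XDG_CONFIG_HOME` when that variable is set and falls back to `~/.config` otherwise. The sketch below shows the same lookup, plus an early check that every feed carries the fields the README lists as mandatory (`name`, `siteurl`, `url`, `webhook`, `offset`). `config_path`, `load_config`, and `REQUIRED_FEED_KEYS` are hypothetical names. Reporting a bad config with a `ValueError` is an assumption, not something the script does.

```python
import json
import os
from pathlib import Path

# Fields the README lists as mandatory for every feed entry.
REQUIRED_FEED_KEYS = ("name", "siteurl", "url", "webhook", "offset")


def config_path() -> Path:
    """Return the config location, honouring XDG_CONFIG_HOME when set."""
    # The XDG Base Directory spec treats an unset *or empty* variable the same.
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "discorss" / "discorss.conf"


def load_config(path: Path) -> dict:
    """Load the JSON config, failing early if a feed is missing a field."""
    with open(path, "r", encoding="utf-8") as config_file:
        config = json.load(config_file)
    for index, feed in enumerate(config.get("feeds", [])):
        missing = [key for key in REQUIRED_FEED_KEYS if key not in feed]
        if missing:
            raise ValueError(f"feed #{index} is missing: {', '.join(missing)}")
    return config
```

Calling `load_config(config_path())` at startup would catch a mistyped or missing key before any feed is fetched.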
Create a webhook for each feed (unless you want them all to show as the same webhook for whatever reason) and make sure to add it in to the config. I have it set up with a webhook for each site, each with the site's icon and name set for the webhook which makes the messages look really nice.
The offset should only be required if feeds aren't showing up. This is because feedparser, in its infinite wisdom, just ignores the timezone when converting publish dates from feeds. So most feeds end up with an epoch in UTC. The offset should be the number of seconds between your time zone and UTC. This will eventually be fixed in a future update, I just need to sit down and wrangle with feedparser and datetime some more.
The offset should only be required if feeds aren't showing up. This is because feedparser, in its infinite wisdom, just ignores the timezone when converting publish dates from feeds. So most feeds end up with an epoch in UTC. The offset should be the number of seconds between your time zone and UTC. This will eventually be fixed in a future update, I just need to sit down and wrangle with feedparser and datetime some more. All fields are mandatory, if you want to have no offset for example, set it to 0. The name and siteurl are used to create the "author" field in the Discord embed.
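feedparser documents its `*_parsed` dates (such as `published_parsed`) as `time.struct_time` values that are already normalized to UTC. The script converts them with `time.mktime()`, which reads a struct_time as *local* time. On a UTC-5 machine, every post therefore looks five hours newer than it really is, and an offset of -18000 cancels exactly that error. The script pins `TZ` to America/Toronto, which is UTC-4 during daylight saving time, so a fixed offset is only exact for part of the year.

The sketch below shows the difference using a hand-built timestamp instead of a live feed. `calendar.timegm()` is the standard-library inverse of `time.gmtime()`. Assuming feedparser's UTC normalization holds for your feeds, using it would make the offset unnecessary.

```python
import calendar
import time

# Stand-in for entry["published_parsed"]: 2025-01-15 12:00:00, which
# feedparser would hand over already expressed in UTC.
published_parsed = time.strptime("2025-01-15 12:00:00", "%Y-%m-%d %H:%M:%S")

# Correct: timegm() treats the tuple as UTC.
utc_epoch = calendar.timegm(published_parsed)

# What the script does: mktime() treats the tuple as local time, so the
# result drifts by the machine's UTC offset (zero only on a UTC machine).
local_epoch = time.mktime(published_parsed)

print(utc_epoch)                     # 1736942400
print(int(local_epoch - utc_epoch))  # e.g. 18000 when the local zone is UTC-5
```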
## Automation
**New**: There is now `install.sh` in the repo which will automatically help you set up both the config file and the systemd unit files for the service and timer, using essentially the exact text below. It will copy them to the user systemd unit folder, `~/.config/systemd/user` and optionally enable the timer. It's a good idea to edit the configuration file at `~/.config/discorss/discorss.conf` and paste in your webhook URLs and add any other feeds you want before starting the timer, unless you can do it really quickly before the next 5 minute spot on the clock :)
Of course, if it fires with an invalid config, the script will just crash, and you'll probably just have to manually start the timer once the config is fixed, so not a big deal.
_Remember to create `/var/log/discorss` and change it to be writeable by the user running the service!_
### Manual method
To automate feed posting, create a systemd service and timer to execute the script.
Use the command `systemctl --user edit --full --force discorss.service` and then paste in something like this:
```systemd
[Unit]
Description=Discord RSS feeder
Wants=discorss.timer
[Service]
Type=oneshot
TimeoutStartSec=120
ExecStart=/path/to/discorss.py
[Install]
WantedBy=default.target
```
The TimeoutStartSec will catch any issues with the script locking up due to, e.g., DNS failures or RSS feeds being slow/unavailable. 2 minutes should be more than enough time unless you are running hundreds of feeds. Also make sure to edit the ExecStart to point to the correct location. Then we need a systemd timer to automatically fire the script. Run `systemctl --user edit --full --force discorss.timer` and then paste in this:
```systemd
[Unit]
Description=Timer for DiscoRSS
Requires=discorss.service
[Timer]
Unit=discorss.service
OnCalendar=*-*-* *:00,15,30,45:30
AccuracySec=10s
[Install]
WantedBy=timers.target
```
To change how often this fires, edit the OnCalendar parameter. The config above has it firing every 15 minutes at half past the minute. Look at the systemd timer man pages for help if you want to tweak it.
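To make that schedule concrete, here is a small illustration of when `OnCalendar=*-*-* *:00,15,30,45:30` fires next. It is a hand-rolled calculation, not systemd's own calendar parser. `next_fire` and its constants are hypothetical, and the calculation uses naive local times, so daylight saving transitions are ignored.

With `AccuracySec=10s`, systemd may start the service at any point in the 10 seconds after the computed time. To check an expression before installing it, `systemd-analyze calendar '*-*-* *:00,15,30,45:30'` prints the real next elapse time.

```python
from datetime import datetime, timedelta

# Mirrors OnCalendar=*-*-* *:00,15,30,45:30 (every day, every hour).
FIRE_MINUTES = (0, 15, 30, 45)
FIRE_SECOND = 30


def next_fire(now: datetime) -> datetime:
    """Return the first scheduled time strictly after `now`."""
    hour_start = now.replace(minute=0, second=0, microsecond=0)
    # Every hour has four slots, so the answer is in this hour or the next.
    for extra_hours in (0, 1):
        base = hour_start + timedelta(hours=extra_hours)
        for minute in FIRE_MINUTES:
            candidate = base.replace(minute=minute, second=FIRE_SECOND)
            if candidate > now:
                return candidate
    raise RuntimeError("unreachable: a slot always exists within two hours")


print(next_fire(datetime(2025, 1, 1, 10, 7)))   # 2025-01-01 10:15:30
print(next_fire(datetime(2025, 1, 1, 23, 50)))  # 2025-01-02 00:00:30
```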
## Contributing
Want to fix something or make a suggestion? Feel free! If you want to send a pull request, you *must* run the Python `black` formatter on the source code before committing. I have this set up in my editor to automatically run every time I save the file, but you could have it run as part of a git hook or something. For non-format stuff, please just follow the code style as best you can. For Python code, I separate multi-word variable names with underscores. So it should be `feed_time`, not `feedTime` or `FeedTime` or `feed-time`. Don't ask me why, but I use camelCase for other languages... but in Python I've switched to underscores.
If you know how and are able to, *please* sign your commits with the `-S` option to `git commit`. This shows that you are the author, especially if others have signed your keys.
To automate feed posting, create a systemd service and timer to execute the script. I will include examples soon.


@@ -3,239 +3,86 @@
# SPDX-License-Identifier: MPL-2.0
# SPDX-FileCopyrightText: © 2025 A.M. Rowsell <https://frzn.dev/~amr>
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
# DiscoRSS: A simple RSS feed reader for Discord. Takes RSS feeds and then sends them to
# webhooks. Intended to run using systemd timers.
import requests
import feedparser
import hashlib
import logging
from pathlib import Path
import json
import datetime
import time
import os
import sys
import argparse
import re
config_file_path = r"/etc/discorss.conf"
# config_file_path = r"discorss.conf"
# log_file_path = r"/var/log/discorss"
log_file_path = r"./log"
log_file_name = r"/app.log"
class Discorss:
    def __init__(self):
        self.config_dir = os.environ.get("XDG_CONFIG_HOME")
        home_dir = Path.home()
        if self.config_dir is None:
            self.config_file_path = str(home_dir) + "/.config/discorss/discorss.conf"
            self.config_dir = str(home_dir) + "/.config/discorss"
        else:
            self.config_file_path = self.config_dir + r"/discorss/discorss.conf"
        self.log_dir = r"/var/log/discorss"
        self.log_file_path = r"/app.log"
        # Yes, I know you "can't parse HTML with regex", but
        # just watch me.
        self.html_filter = re.compile(r"\<\/?([A-Za-z0-9 \:\.\-\/\"\=])*\>")
        self.success_codes = [200, 201, 202, 203, 204, 205, 206]
        self.app_config = {}
    # This function gets and formats the brief excerpt that goes in the embed
    # Different feeds put summaries in different fields, so we pick the best
    # one and limit it to 250 characters.
    def get_description(self, feed, length=250, min_length=150, addons=None):
        try:
            temporary_string = str(feed["summary_detail"]["value"])
            temporary_string = self.html_filter.sub("", temporary_string)
            while length > min_length:
                if temporary_string[length - 1 : length] == " ":
                    break
                else:
                    length -= 1
        except KeyError:
            temporary_string = str(feed["description"])
            temporary_string = self.html_filter.sub("", temporary_string)
            while length > min_length:
                if temporary_string[length - 1 : length] == " ":
                    break
                else:
                    length -= 1
        desc = temporary_string[:length]
        if addons is not None:
            desc = desc + str(addons)
        return desc
    def setup(self):
        os.environ["TZ"] = "America/Toronto"
        time.tzset()
        self.now = time.mktime(time.localtime())
        # Check for log and config files/paths, create empty directories if needed
        # TODO: make this cleaner
        if not Path(self.log_dir).exists():
            print(
                "No log file path exists. Yark! We'll try and make {}...".format(
                    self.log_dir
                )
            )
            try:
                Path(self.log_dir).mkdir(parents=True, exist_ok=True)
            except FileExistsError:
                print(
                    "The path {} already exists and is not a directory!".format(
                        self.log_dir
                    )
                )
        if not Path(self.config_file_path).exists():
            print(
                "No config file at {}! Snarf. We'll try and make {}...".format(
                    self.config_file_path, self.config_dir
                )
            )
            try:
                Path(self.config_dir).mkdir(parents=True, exist_ok=True)
            except FileExistsError:
                print(
                    "The config dir {} already exists and is not a directory! Please fix manually. Quitting!".format(
                        self.config_dir
                    )
                )
                sys.exit(255)
            return
        # Loading the config file
        with open(self.config_file_path, "r") as config_file:
            self.app_config = json.load(config_file)
        # Set up logging
        self.logger = logging.getLogger(__name__)
        logging.basicConfig(
            filename=str(self.log_dir + self.log_file_path),
            encoding="utf-8",
            level=logging.ERROR,
            datefmt="%m/%d/%Y %H:%M:%S",
            format="%(asctime)s: %(levelname)s: %(message)s",
        )
        return
    def process(self):
        self.setup()  # Handle the config and log paths
        try:
            last_check = self.app_config["lastupdate"]
        except KeyError:
            last_check = (
                self.now - 21600
            )  # first run, no lastupdate, check up to 6 hours ago
        for i, hook in enumerate(self.app_config["feeds"]):  # Feed loop start
            self.logger.debug("Parsing feed %s...", hook["name"])
            self.feeds = feedparser.parse(hook["url"])
            self.latest_post = []
            prev_best = 0
            self.logger.debug(
                "About to sort through entries for feed %s ...", hook["name"]
            )
            for feed in self.feeds["entries"]:
                try:
                    bad_time = False
                    published_time = time.mktime(feed["published_parsed"])
                    published_time = published_time + hook["offset"]
                except KeyError:
                    published_time = time.mktime(feed["updated_parsed"])
                    bad_time = True
                if published_time > prev_best:
                    latest_post = feed
                    prev_best = published_time
                else:
                    continue
                if bad_time is True:
                    self.logger.debug(
                        "Feed %s doesn't supply a published time, using updated time instead",
                        hook["name"],
                    )
            # Hash the title and time of the latest post and use that to determine if it's been posted
            # Yes, SHA3-512 is totally unnecessary for this purpose, but I love SHA3
            self.logger.debug("About to hash %s ...", latest_post["title"])
            try:
                new_hash = hashlib.sha3_512(
                    bytes(latest_post["title"] + str(published_time), "utf-8")
                ).hexdigest()
            except TypeError:
                self.logger.error("Title of %s isn't hashing correctly", hook["name"])
                continue
            try:
                if hook["lasthash"] != new_hash:
                    self.app_config["feeds"][i]["lasthash"] = new_hash
                else:
                    continue
            except KeyError:
                self.app_config["feeds"][i]["lasthash"] = new_hash
                self.logger.info(
                    "Feed %s has no existing hash, likely a new feed!", hook["name"]
                )
            # Generate the webhook
            self.logger.info(
                "Publishing webhook for %s. Last check was %d, self.now is %d",
                hook["name"],
                last_check,
                self.now,
            )
            webhook = {
                "embeds": [
                    {
                        "title": str(latest_post["title"]),
                        "url": str(latest_post["link"]),
                        "color": 2123412,
                        "footer": {
                            "text": "DiscoRSS",
                            "icon_url": "https://frzn.dev/~amr/images/discorss.png",
                        },
                        "author": {
                            "name": str(hook["name"]),
                            "url": str(hook["siteurl"]),
                        },
                        "fields": [
                            {
                                "name": "Excerpt from post:",
                                "value": self.get_description(latest_post),
                            }
                        ],
                        # "timestamp": str(self.now),
                    }
                ],
                "attachments": [],
            }
            custom_header = {
                "user-agent": "DiscoRSS (https://git.frzn.dev/amr/discorss, 0.2)",
                "content-type": "application/json",
            }
            webhook_string = json.dumps(webhook)
            self.logger.debug("About to run POST for %s", hook["name"])
            r = requests.post(
                hook["webhook"], data=webhook_string, headers=custom_header
            )
            if r.status_code not in self.success_codes:
                self.logger.error(
                    "Error %d while trying to post %s", r.status_code, hook["name"]
                )
            else:
                self.logger.debug("Got %d when posting %s", r.status_code, hook["name"])
            # End of feed loop
        # Dump updated config back to json file
        self.logger.debug("Dumping config back to %s", str(self.config_file_path))
        self.app_config["lastupdate"] = self.now
        with open(self.config_file_path, "w") as config_file:
            json.dump(self.app_config, config_file, indent=4)
        return
# end of Discorss class
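The feed loop above keeps the newest entry for each feed. It stores a SHA3-512 hash of that entry's title and timestamp as `lasthash`, so the same post isn't sent twice. Two details deserve a second look:

- After the inner loop finishes, `published_time` holds the time of the *last entry iterated*, while the newest entry's time is in `prev_best`. The hash can therefore change whenever the feed's final item changes (for example, when an old post scrolls off), even though the newest post hasn't.
- `latest_post` is only assigned inside the inner loop. A feed with no entries would reuse the previous feed's post, or raise `UnboundLocalError` if it is the first feed.

The sketch below keeps each entry and its timestamp together. It assumes each entry is a dict with a `title` and a precomputed epoch under a hypothetical `published_epoch` key. `newest_entry`, `post_hash`, and `should_post` are illustrative names, not functions in discorss.py.

```python
from __future__ import annotations

import hashlib


def newest_entry(entries: list[dict]) -> tuple[dict, float] | None:
    """Return (entry, epoch) for the most recent entry, or None if empty."""
    best = None
    for entry in entries:
        published = entry["published_epoch"]  # hypothetical precomputed epoch
        if best is None or published > best[1]:
            best = (entry, published)
    return best


def post_hash(entry: dict, published: float) -> str:
    # Same recipe as the script (SHA3-512 over title + timestamp), but the
    # timestamp always belongs to the entry being hashed.
    data = (entry["title"] + str(published)).encode("utf-8")
    return hashlib.sha3_512(data).hexdigest()


def should_post(feed_config: dict, entries: list[dict]) -> bool:
    """Record the newest entry's hash in feed_config; True if it is new."""
    chosen = newest_entry(entries)
    if chosen is None:  # empty or unreachable feed: nothing to post
        return False
    new_hash = post_hash(*chosen)
    if feed_config.get("lasthash") == new_hash:
        return False
    feed_config["lasthash"] = new_hash
    return True


feed_config = {"name": "example"}
entries = [
    {"title": "Newest", "published_epoch": 200.0},
    {"title": "Older", "published_epoch": 100.0},
]
print(should_post(feed_config, entries))      # True: first sighting
print(should_post(feed_config, entries[:1]))  # False: older item leaving changes nothing
```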
def getDescription(feed):
    try:
        tempStr = str(feed.entries[0]["summary_detail"]["value"])
        desc = tempStr[:100] if len(tempStr) > 100 else tempStr
    except KeyError:
        tempStr = str(feed.entries[0]["description"])
        desc = tempStr[:100] if len(tempStr) > 100 else tempStr
    return desc
def main():
    app = Discorss()
    app.process()
    os.environ["TZ"] = "America/Toronto"
    time.tzset()
    try:
        Path(log_file_path).mkdir(parents=True, exist_ok=True)
    except FileExistsError:
        print("This path already exists and is not a directory!")
    # Load and read the config file
    if not Path(config_file_path).exists():
        print("No config file! Snarf. Directories were created for you.")
        return
    with open(config_file_path, "r") as config_file:
        app_config = json.load(config_file)
    now = time.mktime(time.localtime())
    last_check = app_config["lastupdate"]
    for hook in app_config["feeds"]:
        # Get the feed
        feed = feedparser.parse(hook["url"])
        published_time = time.mktime(feed.entries[0]["published_parsed"])
        published_time = published_time + hook["offset"]
        print(feed.entries[0]["published"], published_time, now)
        # Generate the webhook
        webhook = {
            "content": "RSS Feed Update from " + str(hook["name"]),
            "embeds": [
                {
                    "title": str(feed.entries[0]["title"]),
                    "url": str(feed.entries[0]["link"]),
                    "color": 5814783,
                    "fields": [
                        {
                            "name": str(feed.entries[0]["title"]),
                            "value": getDescription(feed),
                        }
                    ],
                }
            ],
            "attachments": [],
        }
        customHeader = {
            "user-agent": "DiscoRSS (https://git.frzn.dev/amr/discorss, 0.1)",
            "content-type": "application/json",
        }
        webhookStr = json.dumps(webhook)
        print(webhookStr)
        if published_time > last_check and published_time < now:
            r = requests.post(hook["webhook"], data=webhookStr, headers=customHeader)
    app_config["lastupdate"] = now
    with open(config_file_path, "w") as config_file:
        json.dump(app_config, config_file, indent=4)
    return
if __name__ == "__main__":

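Both versions build the Discord embed as a plain dict and POST it as JSON. The sketch below builds the newer version's payload and pairs it with an excerpt helper that follows the same idea as `get_description`: cut to at most 250 characters, preferably at a space no earlier than character 150.

Two changes are deliberate assumptions rather than the script's behaviour:

- Tags are stripped with the standard library's `html.parser` instead of the `html_filter` regex. This also decodes entities such as `&amp;`.
- Text that already fits is returned whole. The original loop trims back to the last space even then.

`excerpt`, `build_payload`, and `_TextOnly` are hypothetical names. The 256-character title cap reflects Discord's embed limit.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text content of an HTML fragment, dropping all tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default: "&amp;" -> "&"
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def excerpt(html_text: str, length: int = 250, min_length: int = 150) -> str:
    parser = _TextOnly()
    parser.feed(html_text)
    parser.close()
    text = " ".join("".join(parser.parts).split())  # collapse whitespace runs
    if len(text) <= length:
        return text
    # Cut at the last space in [min_length, length) if there is one,
    # otherwise hard-cut at `length`.
    cut = text.rfind(" ", min_length, length)
    return text[: cut if cut != -1 else length]


def build_payload(
    title: str, link: str, feed_name: str, site_url: str, summary: str
) -> str:
    embed = {
        "title": title[:256],  # Discord rejects embed titles over 256 chars
        "url": link,
        "color": 2123412,
        "author": {"name": feed_name, "url": site_url},
        "fields": [{"name": "Excerpt from post:", "value": excerpt(summary)}],
        "footer": {"text": "DiscoRSS"},
    }
    return json.dumps({"embeds": [embed], "attachments": []})


print(excerpt("<p>Fish &amp; <b>chips</b></p>"))  # Fish & chips
```

The returned string can be sent the same way the script does: as `data=` to `requests.post` with a `content-type: application/json` header.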

@@ -1,106 +0,0 @@
#!/bin/bash
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
# This script will set up a basic systemd service and timer for DiscoRSS
# You can optionally edit the entries here before running it, or you can
# use systemctl --user edit --full discorss.service or discorss.timer
# after installing them.
printf "\e[1;34mDisco\e[1;38;5;208mRSS\e[0m Install Helper Script\n\n"
workingDir=$(pwd)
# bail if we're on a non-systemd system, suggest cron
if [[ -d /run/systemd/system ]]; then
printf "systemd detected..."
else
printf "This script and DiscoRSS in general are optimized for systemd! You can use cron as a substitute but I haven't written any documentation for it, so you're on your own for now!"
exit 127 # command not found exit code
fi
printf "Would you like the systemd service and timer files created for you? [y/n]: "
read answer
if [[ "$answer" =~ ^([yY])$ ]]; then
cat << EOF > discorss.service
# Autogenerated by install.sh
[Unit]
Description=Discord RSS feeder
Wants=discorss.timer
[Service]
Type=oneshot
TimeoutStartSec=120
ExecStart=$workingDir/discorss.py
[Install]
WantedBy=default.target
EOF
cat << EOF > discorss.timer
# Autogenerated by install.sh
[Unit]
Description=Timer for DiscoRSS
Requires=discorss.service
[Timer]
Unit=discorss.service
OnCalendar=*:0/5:00
AccuracySec=1s
[Install]
WantedBy=timers.target
EOF
printf "Making ~/.config/systemd/user in case it doesn't exist ...\n"
mkdir -p -v ~/.config/systemd/user/
printf "Copying service and timer files there ... \n"
cp discorss.service ~/.config/systemd/user/
cp discorss.timer ~/.config/systemd/user/
rm -f discorss.service
rm -f discorss.timer
printf "Reloading systemd daemon ... \n\n"
systemctl --user daemon-reload
else
printf "This script is intended to be automatically run. It's designed with systemd in mind, but you are free to use any automation tools. You can look at this script for examples of how to structure systemd user services and timers.\nOf course, you could always run it by hand, if you really want to :)\n\n"
fi
printf "Would you like a basic example config created for you? [y/n]: "
read answer1
if [[ "$answer1" =~ ^([yY])$ ]]; then
mkdir -p -v ~/.config/discorss
cat << EOF > ~/.config/discorss/discorss.conf
{
"feeds": [
{
"name": "Phoronix",
"siteurl": "https://www.phoronix.com/",
"url": "http://www.phoronix.com/rss.php",
"webhook": "PASTE WEBHOOK URL HERE",
"offset": 0
}
]
}
EOF
printf "\nMake sure to edit \e[1;34m~/.config/discorss/discorss.conf\e[0m and add in your custom feeds and webhook URLS! The script will just error out if you don't do this."
else
printf "\nMake sure to create a config at \e[1;34m~/.config/discorss/discorss.conf\e[0m and follow the pattern shown in the README."
fi
printf "\nWould you like to have the timer enabled and started now? [y/n]: "
read answer
if [[ "$answer" =~ ^([yY])$ ]]; then
systemctl --user enable --now discorss.timer
printf "\ndiscorss.timer enabled and started. \e[1;31mDon't enable or start discorss.service\e[0m -- the timer does this automatically."
else
printf "\nDon't forget to run \e[1;32msystemctl --user enable --now discorss.timer\e[0m when you are ready! \e[1;31mDon't enable or start discorss.service\e[0m -- the timer does this automatically."
fi
printf "\n\nYou should be almost ready to go! Double-check your config files, and check \e[1;32msystemctl --user list-timers\e[0m once the discorss.timer is enabled to see when it will fire next. The default is every 5 minutes."
printf "\nRemember, if you need help or encounter any bugs, contact me via the issues tracker on the git repository where you got this from!\n"
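Be aware that answering `y` to the example-config prompt runs `cat << EOF > ~/.config/discorss/discorss.conf`. The `>` redirection truncates any existing config, including the `lasthash` and `lastupdate` values the Python script writes back. In the shell, `set -o noclobber` makes `>` refuse to overwrite an existing file.

The equivalent guard in Python is the exclusive-create mode `"x"`. Below is a minimal sketch: `write_example_config` and `EXAMPLE_CONFIG` are hypothetical names, and the contents mirror the script's example.

```python
import json
import tempfile
from pathlib import Path

# Same content as the heredoc in install.sh.
EXAMPLE_CONFIG = {
    "feeds": [
        {
            "name": "Phoronix",
            "siteurl": "https://www.phoronix.com/",
            "url": "http://www.phoronix.com/rss.php",
            "webhook": "PASTE WEBHOOK URL HERE",
            "offset": 0,
        }
    ]
}


def write_example_config(path: Path) -> bool:
    """Create the example config; return False instead of overwriting a file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" is exclusive creation: it raises FileExistsError if the
        # file is already there, so existing feeds and hashes survive.
        with open(path, "x", encoding="utf-8") as config_file:
            json.dump(EXAMPLE_CONFIG, config_file, indent=4)
    except FileExistsError:
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "discorss" / "discorss.conf"
        print(write_example_config(target))  # True: file created
        print(write_example_config(target))  # False: left untouched
```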