diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..fed12ea --- /dev/null +++ b/.gitignore @@ -0,0 +1,7 @@ +*.conf +*.txt +log/ +*.bak +bin/ +lib/ +*.cfg \ No newline at end of file diff --git a/README.md b/README.md index 17134ec..48148d4 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,7 @@ # DiscoRSS +![DiscoRSS Logo](https://frzn.dev/~amr/images/discorss.png) + ## What is it? DiscoRSS is a simple Python script to send RSS feeds to Discord webhooks. It was created because existing bots that did this set limits on the number of feeds, and self-hosting stuff is easier and better anyway. To get this working, you will require the following Python libraries: @@ -9,24 +11,34 @@ requests >= 2.4.2 feedparser ``` -The remaining imports should all be part of the standard Python install. To configure the script, create /etc/discorss.conf with the following structure: +The remaining imports should all be part of the standard Python install. -``` +## Important Notes + +The logger will try to put the logs in `/var/log/discorss`. Make sure to create this directory and give the user running the script write permissions there. If you want the logs to go somewhere else, just edit the `log_dir` variable near the top of `discorss.py`. Choose a directory that makes sense. Unfortunately, as far as I know, the XDG standards don't have an equivalent to the `/var/log` directory in the user directory, so I wasn't sure what the best default was. In the future, we may switch to logging using systemd and journald directly, though it is nice to have a separate file. + +## How to set up + +Note: see the Automation section below for info about using the `install.sh` script to help get all the files in the right places. 
+ +### Config file format + +To configure the script, create `~/.config/discorss/discorss.conf` using JSON formatting like this: + +```json { "feeds": [ { - "name": "phoronix", + "name": "Phoronix", + "siteurl": "https://www.phoronix.com/", "url": "http://www.phoronix.com/rss.php", - "webhook": "webhook url" + "webhook": "webhook url", + "offset": -18000 }, { - "name": "pagetable", + "name": "Pagetable", + "siteurl": "https://pagetable.com", "url": "https://www.pagetable.com/?feed=rss2", - "webhook": "webhook url" - }, - { - "name": "righto", - "url": "https://www.righto.com/feeds/posts/default", "webhook": "webhook url", "offset": -18000 } @@ -34,6 +46,56 @@ The remaining imports should all be part of the standard Python install. To conf } ``` -The offset should only be required if feeds aren't showing up. This is because feedparser, in its infinite wisdom, just ignores the timezone when converting publish dates from feeds. So most feeds end up with an epoch in UTC. The offset should be the number of seconds between your time zone and UTC. This will eventually be fixed in a future update, I just need to sit down and wrangle with feedparser and datetime some more. +Create a webhook for each feed (unless you want them all to show as the same webhook for whatever reason) and make sure to add it in to the config. I have it set up with a webhook for each site, each with the site's icon and name set for the webhook which makes the messages look really nice. -To automate feed posting, create a systemd service and timer to execute the script. I will include examples soon. +The offset should only be required if feeds aren't showing up. This is because feedparser, in its infinite wisdom, just ignores the timezone when converting publish dates from feeds. So most feeds end up with an epoch in UTC. The offset should be the number of seconds between your time zone and UTC. 
This will eventually be fixed in a future update; I just need to sit down and wrangle with feedparser and datetime some more. All fields are mandatory; if you want no offset, for example, set it to 0. The `name` and `siteurl` fields are used to create the "author" field in the Discord embed. + +## Automation + +**New**: There is now `install.sh` in the repo which will automatically help you set up both the config file and the systemd unit files for the service and timer, using essentially the exact text below. It will copy them to the user systemd unit folder, `~/.config/systemd/user`, and optionally enable the timer. It's a good idea to edit the configuration file at `~/.config/discorss/discorss.conf` and paste in your webhook URLs and add any other feeds you want before starting the timer, unless you can do it really quickly before the next 5-minute spot on the clock :) +Of course, if it fires with an invalid config, the script will just crash, and you'll probably just have to manually start the timer once the config is fixed, so not a big deal. + +_Remember to create `/var/log/discorss` and change it to be writeable by the user running the service!_ + +### Manual method + +To automate feed posting, create a systemd service and timer to execute the script. + +Use the command `systemctl --user edit --full --force discorss.service` and then paste in something like this: + +```systemd +[Unit] +Description=Discord RSS feeder +Wants=discorss.timer + +[Service] +Type=oneshot +TimeoutStartSec=120 +ExecStart=/path/to/discorss.py + +[Install] +WantedBy=default.target +``` + +The `TimeoutStartSec` setting will catch any issues with the script locking up due to, e.g., DNS failures or RSS feeds being slow/unavailable. Two minutes should be more than enough time unless you are running hundreds of feeds. Also make sure to edit `ExecStart` to point to the correct location. Then we need a systemd timer to automatically fire the script. 
Run `systemctl --user edit --full --force discorss.timer` and then paste in this: +```systemd +[Unit] +Description=Timer for DiscoRSS +Requires=discorss.service + +[Timer] +Unit=discorss.service +OnCalendar=*-*-* *:00,15,30,45:30 +AccuracySec=10s + +[Install] +WantedBy=timers.target +``` + +To change how often this fires, edit the `OnCalendar` parameter. The config above has it firing every 15 minutes, at 30 seconds past the minute (hh:00:30, hh:15:30, hh:30:30, and hh:45:30). See the `systemd.timer(5)` and `systemd.time(7)` man pages if you want to tweak it. + +## Contributing + +Want to fix something or make a suggestion? Feel free! If you want to send a pull request, you *must* run the Python `black` formatter on the source code before committing. I have this set up in my editor to automatically run every time I save the file, but you could have it run as part of a git hook or something. For non-format stuff, please just follow the code style as best you can. For Python code, I separate multi-word variable names with underscores. So it should be `feed_time`, not `feedTime` or `FeedTime` or `feed-time`. Don't ask me why, but I use camelCase for other languages... but in Python I've switched to underscores. + +If you know how and are able to, *please* sign your commits with the `-S` option to `git commit`. This shows that you are the author, especially if others have signed your keys. diff --git a/discorss.py b/discorss.py index 07b80a2..9d001ce 100755 --- a/discorss.py +++ b/discorss.py @@ -3,86 +3,239 @@ # SPDX-License-Identifier: MPL-2.0 # SPDX-FileCopyrightText: © 2025 A.M. Rowsell +# This Source Code Form is subject to the terms of the Mozilla Public +# License, v. 2.0. If a copy of the MPL was not distributed with this +# file, You can obtain one at http://mozilla.org/MPL/2.0/. + # DiscoRSS: A simple RSS feed reader for Discord. Takes RSS feeds and then sends them to # webhooks. Intended to run using systemd timers. 
import requests import feedparser +import hashlib +import logging from pathlib import Path import json -import datetime import time import os - -config_file_path = r"/etc/discorss.conf" -# config_file_path = r"discorss.conf" -# log_file_path = r"/var/log/discorss" -log_file_path = r"./log" -log_file_name = r"/app.log" +import sys +import argparse +import re -def getDescription(feed): - try: - tempStr = str(feed.entries[0]["summary_detail"]["value"]) - desc = tempStr[:100] if len(tempStr) > 100 else tempStr - except KeyError: - tempStr = str(feed.entries[0]["description"]) - desc = tempStr[:100] if len(tempStr) > 100 else tempStr - return desc +class Discorss: + def __init__(self): + self.config_dir = os.environ.get("XDG_CONFIG_HOME") + home_dir = Path.home() + if self.config_dir is None: + self.config_file_path = str(home_dir) + "/.config/discorss/discorss.conf" + self.config_dir = str(home_dir) + "/.config/discorss" + else: + self.config_file_path = self.config_dir + r"/discorss/discorss.conf" + self.log_dir = r"/var/log/discorss" + self.log_file_path = r"/app.log" + # Yes, I know you "can't parse HTML with regex", but + # just watch me. + self.html_filter = re.compile(r"\<\/?([A-Za-z0-9 \:\.\-\/\"\=])*\>") + self.success_codes = [200, 201, 202, 203, 204, 205, 206] + self.app_config = {} + + # This function gets and formats the brief excerpt that goes in the embed + # Different feeds put summaries in different fields, so we pick the best + # one and limit it to 250 characters. 
+ def get_description(self, feed, length=250, min_length=150, addons=None): + try: + temporary_string = str(feed["summary_detail"]["value"]) + temporary_string = self.html_filter.sub("", temporary_string) + while length > min_length: + if temporary_string[length - 1 : length] == " ": + break + else: + length -= 1 + except KeyError: + temporary_string = str(feed["description"]) + temporary_string = self.html_filter.sub("", temporary_string) + while length > min_length: + if temporary_string[length - 1 : length] == " ": + break + else: + length -= 1 + + desc = temporary_string[:length] + if addons is not None: + desc = desc + str(addons) + return desc + + def setup(self): + os.environ["TZ"] = "America/Toronto" + time.tzset() + self.now = time.mktime(time.localtime()) + # Check for log and config files/paths, create empty directories if needed + # TODO: make this cleaner + if not Path(self.log_dir).exists(): + print( + "No log file path exists. Yark! We'll try and make {}...".format( + self.log_dir + ) + ) + try: + Path(self.log_dir).mkdir(parents=True, exist_ok=True) + except FileExistsError: + print( + "The path {} already exists and is not a directory!".format( + self.log_dir + ) + ) + if not Path(self.config_file_path).exists(): + print( + "No config file at {}! Snarf. We'll try and make {}...".format( + self.config_file_path, self.config_dir + ) + ) + try: + Path(self.config_dir).mkdir(parents=True, exist_ok=True) + except FileExistsError: + print( + "The config dir {} already exists and is not a directory! Please fix manually. 
Quitting!".format( + self.config_dir + ) + ) + sys.exit(255) + return + # Loading the config file + with open(self.config_file_path, "r") as config_file: + self.app_config = json.load(config_file) + # Set up logging + self.logger = logging.getLogger(__name__) + logging.basicConfig( + filename=str(self.log_dir + self.log_file_path), + encoding="utf-8", + level=logging.ERROR, + datefmt="%m/%d/%Y %H:%M:%S", + format="%(asctime)s: %(levelname)s: %(message)s", + ) + return + + def process(self): + self.setup() # Handle the config and log paths + try: + last_check = self.app_config["lastupdate"] + except KeyError: + last_check = ( + self.now - 21600 + ) # first run, no lastupdate, check up to 6 hours ago + for i, hook in enumerate(self.app_config["feeds"]): # Feed loop start + self.logger.debug("Parsing feed %s...", hook["name"]) + self.feeds = feedparser.parse(hook["url"]) + self.latest_post = [] + prev_best = 0 + self.logger.debug( + "About to sort through entries for feed %s ...", hook["name"] + ) + for feed in self.feeds["entries"]: + try: + bad_time = False + published_time = time.mktime(feed["published_parsed"]) + published_time = published_time + hook["offset"] + except KeyError: + published_time = time.mktime(feed["updated_parsed"]) + bad_time = True + if published_time > prev_best: + latest_post = feed + prev_best = published_time + else: + continue + if bad_time is True: + self.logger.debug( + "Feed %s doesn't supply a published time, using updated time instead", + hook["name"], + ) + # Hash the title and time of the latest post and use that to determine if it's been posted + # Yes, SHA3-512 is totally unnecessary for this purpose, but I love SHA3 + self.logger.debug("About to hash %s ...", latest_post["title"]) + try: + new_hash = hashlib.sha3_512( + bytes(latest_post["title"] + str(published_time), "utf-8") + ).hexdigest() + except TypeError: + self.logger.error("Title of %s isn't hashing correctly", hook["name"]) + continue + try: + if hook["lasthash"] != 
new_hash: + self.app_config["feeds"][i]["lasthash"] = new_hash + else: + continue + except KeyError: + self.app_config["feeds"][i]["lasthash"] = new_hash + self.logger.info( + "Feed %s has no existing hash, likely a new feed!", hook["name"] + ) + # Generate the webhook + self.logger.info( + "Publishing webhook for %s. Last check was %d, self.now is %d", + hook["name"], + last_check, + self.now, + ) + webhook = { + "embeds": [ + { + "title": str(latest_post["title"]), + "url": str(latest_post["link"]), + "color": 2123412, + "footer": { + "text": "DiscoRSS", + "icon_url": "https://frzn.dev/~amr/images/discorss.png", + }, + "author": { + "name": str(hook["name"]), + "url": str(hook["siteurl"]), + }, + "fields": [ + { + "name": "Excerpt from post:", + "value": self.get_description(latest_post), + } + ], + # "timestamp": str(self.now), + } + ], + "attachments": [], + } + custom_header = { + "user-agent": "DiscoRSS (https://git.frzn.dev/amr/discorss, 0.2)", + "content-type": "application/json", + } + webhook_string = json.dumps(webhook) + + self.logger.debug("About to run POST for %s", hook["name"]) + r = requests.post( + hook["webhook"], data=webhook_string, headers=custom_header + ) + if r.status_code not in self.success_codes: + self.logger.error( + "Error %d while trying to post %s", r.status_code, hook["name"] + ) + else: + self.logger.debug("Got %d when posting %s", r.status_code, hook["name"]) + + # End of feed loop + + # Dump updated config back to json file + self.logger.debug("Dumping config back to %s", str(self.config_file_path)) + self.app_config["lastupdate"] = self.now + with open(self.config_file_path, "w") as config_file: + json.dump(self.app_config, config_file, indent=4) + + return + + +# end of Discorss class def main(): - os.environ["TZ"] = "America/Toronto" - time.tzset() - try: - Path(log_file_path).mkdir(parents=True, exist_ok=True) - except FileExistsError: - print("This path already exists and is not a directory!") - # Load and read the config 
file - if not Path(config_file_path).exists(): - print("No config file! Snarf. Directories were created for you.") - return - with open(config_file_path, "r") as config_file: - app_config = json.load(config_file) - now = time.mktime(time.localtime()) - last_check = app_config["lastupdate"] - for hook in app_config["feeds"]: - # Get the feed - feed = feedparser.parse(hook["url"]) - published_time = time.mktime(feed.entries[0]["published_parsed"]) - published_time = published_time + hook["offset"] - print(feed.entries[0]["published"], published_time, now) - # Generate the webhook - webhook = { - "content": "RSS Feed Update from " + str(hook["name"]), - "embeds": [ - { - "title": str(feed.entries[0]["title"]), - "url": str(feed.entries[0]["link"]), - "color": 5814783, - "fields": [ - { - "name": str(feed.entries[0]["title"]), - "value": getDescription(feed), - } - ], - } - ], - "attachments": [], - } - customHeader = { - "user-agent": "DiscoRSS (https://git.frzn.dev/amr/discorss, 0.1)", - "content-type": "application/json", - } - webhookStr = json.dumps(webhook) - print(webhookStr) - if published_time > last_check and published_time < now: - r = requests.post(hook["webhook"], data=webhookStr, headers=customHeader) - app_config["lastupdate"] = now - with open(config_file_path, "w") as config_file: - json.dump(app_config, config_file, indent=4) - - return + app = Discorss() + app.process() if __name__ == "__main__": diff --git a/install.sh b/install.sh new file mode 100755 index 0000000..82a5fab --- /dev/null +++ b/install.sh @@ -0,0 +1,106 @@ +#!/bin/bash + +# This Source Code Form is subject to the terms of the Mozilla Public +# License, v. 2.0. If a copy of the MPL was not distributed with this +# file, You can obtain one at http://mozilla.org/MPL/2.0/. 
+ +# This script will set up a basic systemd service and timer for DiscoRSS +# You can optionally edit the entries here before running it, or you can +# use systemctl --user edit --full discorss.service or discorss.timer +# after installing them. + +printf "\e[1;34mDisco\e[1;38;5;208mRSS\e[0m Install Helper Script\n\n" + +workingDir=$(pwd) + +# bail if we're on a non-systemd system, suggest cron +if [[ -d /run/systemd/system ]]; then + printf "systemd detected..." +else + printf "This script and DiscoRSS in general are optimized for systemd! You can use cron as a substitute but I haven't written any documentation for it, so you're on your own for now!" + exit 127 # command not found exit code +fi + +printf "Would you like the systemd service and timer files created for you? [y/n]: " +read answer +if [[ "$answer" =~ ^([yY])$ ]]; then + + cat << EOF > discorss.service +# Autogenerated by install.sh +[Unit] +Description=Discord RSS feeder +Wants=discorss.timer + +[Service] +Type=oneshot +TimeoutStartSec=120 +ExecStart=$workingDir/discorss.py + +[Install] +WantedBy=default.target + +EOF + + cat << EOF > discorss.timer +# Autogenerated by install.sh +[Unit] +Description=Timer for DiscoRSS +Requires=discorss.service + +[Timer] +Unit=discorss.service +OnCalendar=*:0/5:00 +AccuracySec=1s + +[Install] +WantedBy=timers.target + +EOF + + printf "Making ~/.config/systemd/user in case it doesn't exist ...\n" + mkdir -p -v ~/.config/systemd/user/ + printf "Copying service and timer files there ... \n" + cp discorss.service ~/.config/systemd/user/ + cp discorss.timer ~/.config/systemd/user/ + rm -f discorss.service + rm -f discorss.timer + printf "Reloading systemd daemon ... \n\n" + systemctl --user daemon-reload +else + printf "This script is intended to be automatically run. It's designed with systemd in mind, but you are free to use any automation tools. 
You can look at this script for examples of how to structure systemd user services and timers.\nOf course, you could always run it by hand, if you really want to :)\n\n" +fi + +printf "Would you like a basic example config created for you? [y/n]: " +read answer1 +if [[ "$answer1" =~ ^([yY])$ ]]; then + mkdir -p -v ~/.config/discorss + cat << EOF > ~/.config/discorss/discorss.conf +{ + "feeds": [ + { + "name": "Phoronix", + "siteurl": "https://www.phoronix.com/", + "url": "http://www.phoronix.com/rss.php", + "webhook": "PASTE WEBHOOK URL HERE", + "offset": 0 + } + ] +} +EOF + printf "\nMake sure to edit \e[1;34m~/.config/discorss/discorss.conf\e[0m and add in your custom feeds and webhook URLs! The script will just error out if you don't do this." +else + printf "\nMake sure to create a config at \e[1;34m~/.config/discorss/discorss.conf\e[0m and follow the pattern shown in the README." +fi + +printf "\nWould you like to have the timer enabled and started now? [y/n]: " +read answer +if [[ "$answer" =~ ^([yY])$ ]]; then + systemctl --user enable --now discorss.timer + printf "\ndiscorss.timer enabled and started. \e[1;31mDon't enable or start discorss.service\e[0m -- the timer does this automatically." +else + printf "\nDon't forget to run \e[1;32msystemctl --user enable --now discorss.timer\e[0m when you are ready! \e[1;31mDon't enable or start discorss.service\e[0m -- the timer does this automatically." +fi + +printf "\n\nYou should be almost ready to go! Double-check your config files, and check \e[1;32msystemctl --user list-timers\e[0m once the discorss.timer is enabled to see when it will fire next. The default is every 5 minutes." + +printf "\nRemember, if you need help or encounter any bugs, contact me via the issue tracker on the git repository where you got this from!\n"
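
A note on the `offset` workaround that the README changes in this patch describe. As far as I understand feedparser's documentation, the `*_parsed` fields are already normalized to UTC, and the shift comes from `time.mktime()`, which interprets a struct_time as local time. Below is a minimal sketch of one possible fix, assuming that is the cause. `entry_epoch` is a hypothetical helper name and does not exist in `discorss.py`, and `time.gmtime()` only stands in for a real `feed["published_parsed"]` value.

```python
import calendar
import time


def entry_epoch(parsed_time):
    """Convert a UTC struct_time to a Unix epoch.

    calendar.timegm() treats its argument as UTC, whereas time.mktime()
    treats it as local time. This assumes the feed timestamp is in UTC.
    """
    return calendar.timegm(parsed_time)


# Stand-in for feed["published_parsed"]: a struct_time expressed in UTC.
sample = time.gmtime(1700000000)

# The round trip through UTC does not depend on the TZ environment variable.
print(entry_epoch(sample))
```

If the assumption holds, `published_time = entry_epoch(feed["published_parsed"])` could replace the `time.mktime` call, and the per-feed `offset` could stay at 0. Whether that is right for every feed would need testing against feeds that publish in different time zones.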