Merge branch 'main' into nix

A.M. Rowsell 2025-04-24 01:54:17 -04:00
commit 63115339f8
Signed by untrusted user who does not match committer: amr
GPG key ID: 0B6E2D8375CF79A9
3 changed files with 321 additions and 186 deletions


@ -15,15 +15,17 @@ The remaining imports should all be part of the standard Python install.
## Important Notes
As currently written, the script hashes the post title together with its publish time to avoid sending duplicates. The publish-time check was added because some feeds are not in reverse chronological order (i.e., the latest post is not always at the top of the feed, entry index 0), so the script has to compare publish times to find the newest entry. This still needs testing, and things might be a bit broken because of it. If you see any issues, please let me know.
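A minimal sketch of the selection and dedup logic described above, assuming feedparser-style entries (dicts carrying `published_parsed` or `updated_parsed` as `time.struct_time`) and a per-feed `offset` in seconds. `entry_time`, `pick_latest` and `post_hash` are hypothetical helpers, not functions in discorss.py. Every entry is scanned because the newest post may not be at index 0, and the chosen entry is returned together with its own timestamp so the hash always describes the post that will actually be sent.

```python
import hashlib
import time
from typing import Optional, Tuple


def entry_time(entry: dict, offset: int = 0) -> float:
    """Epoch seconds for an entry, preferring its publish time.

    Falls back to `updated_parsed` when `published_parsed` is absent or None.
    As in discorss.py, the feed's offset is only added to publish times.
    """
    published = entry.get("published_parsed")
    if published:
        return time.mktime(published) + offset
    return time.mktime(entry["updated_parsed"])


def pick_latest(entries: list, offset: int = 0) -> Optional[Tuple[dict, float]]:
    """Scan every entry (feeds are not always newest-first) and keep the newest."""
    best, best_time = None, float("-inf")
    for entry in entries:
        t = entry_time(entry, offset)
        if t > best_time:
            best, best_time = entry, t
    if best is None:
        return None  # empty feed: nothing to post
    return best, best_time


def post_hash(title: str, published: float) -> str:
    """SHA3-512 over title + timestamp, the same inputs discorss.py hashes."""
    return hashlib.sha3_512(bytes(title + str(published), "utf-8")).hexdigest()
```

For example, with one entry dated 2025-04-20 and another dated 2025-04-23, `pick_latest` returns the 2025-04-23 entry regardless of its position in the list; comparing `post_hash` against the stored `lasthash` then decides whether to post.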
Logging was recently enabled. Make sure that the user running the script (especially when using systemd timers) has write access to the /var/log/discorss directory. The app will try to create the directory for you, but if your user doesn't have permission to create directories in /var/log, this will fail and will probably crash the script as it stands. I will try to remember to catch that exception and exit gracefully with an error message to stdout. If you want the logs to go somewhere else, just edit the log_dir variable near the top of discorss.py, and choose a directory that makes sense. The closest XDG Base Directory equivalent is `$XDG_STATE_HOME` (default `~/.local/state`), which the spec describes as holding state data such as logs; because it is per-user rather than system-wide, I wasn't sure what the best default was. In the future, we may switch to logging through systemd and journald directly, though it is nice to have a separate file.
The logger will try to put the logs in `/var/log/discorss`. Make sure to create this directory and give the user running the script write permission there. If you want the logs to go somewhere else, edit `self.log_dir` in `Discorss.__init__` near the top of discorss.py, and choose a directory that makes sense. The closest XDG Base Directory equivalent is `$XDG_STATE_HOME` (default `~/.local/state`), which the spec describes as holding state data such as logs; because it is per-user rather than system-wide, I wasn't sure what the best default was. In the future, we may switch to logging through systemd and journald directly, though it is nice to have a separate file.
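A rough sketch of the graceful exit mentioned above, using Python's `pathlib`. `ensure_log_dir` and `pick_log_dir` are hypothetical helpers, and falling back to `$XDG_STATE_HOME/discorss` is an illustrative assumption rather than what discorss.py does today. Note that `Path.mkdir(exist_ok=True)` still raises `FileExistsError` when the path exists as a regular file, and raises `PermissionError` when the parent (e.g. `/var/log`) is not writable.

```python
import os
import sys
from pathlib import Path


def ensure_log_dir(path: str) -> bool:
    """Create `path` if needed; return False (with a message) instead of crashing."""
    p = Path(path)
    try:
        p.mkdir(parents=True, exist_ok=True)
    except FileExistsError:
        # exist_ok=True does not cover a regular file sitting at `path`
        print("The path {} already exists and is not a directory!".format(path))
        return False
    except PermissionError:
        print("No permission to create {}.".format(path))
        return False
    if not os.access(p, os.W_OK):
        print("{} exists but is not writable by this user.".format(path))
        return False
    return True


def pick_log_dir(preferred: str = "/var/log/discorss") -> str:
    """Prefer the system log dir; otherwise try $XDG_STATE_HOME/discorss (assumed fallback)."""
    if ensure_log_dir(preferred):
        return preferred
    state_home = os.environ.get("XDG_STATE_HOME") or str(Path.home() / ".local" / "state")
    fallback = os.path.join(state_home, "discorss")
    if ensure_log_dir(fallback):
        return fallback
    print("Could not find a writable log directory. Quitting!")
    sys.exit(255)  # same exit code discorss.py uses for path problems
```

Calling `pick_log_dir()` before `logging.basicConfig` would let the script report the problem and exit cleanly instead of raising from inside the logging setup.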
## How to set up
To configure the script, create ~/.config/discorss/discorss.conf with the following structure:
Note: see the Automation section below for info about using the `install.sh` script to help get all the files in the right places.
```
### Config file format
To configure the script, create `~/.config/discorss/discorss.conf` using JSON formatting like this:
```json
{
"feeds": [
{
@ -50,25 +52,33 @@ The offset should only be required if feeds aren't showing up. This is because f
## Automation
**New**: `install.sh` in the repo helps you set up both the config file and the systemd unit files for the service and timer, using essentially the text below. It copies the unit files to the user systemd unit folder, `~/.config/systemd/user`, and can optionally enable the timer. It's a good idea to edit the configuration file at `~/.config/discorss/discorss.conf`, paste in your webhook URLs, and add any other feeds you want before starting the timer, unless you can do it really quickly before the next 5-minute mark on the clock :)
If the timer fires with an invalid config, the script will simply crash; you'll probably have to start the timer manually once the config is fixed, so it's not a big deal.
_Remember to create `/var/log/discorss` and make it writable by the user running the service!_
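Because an invalid config just crashes the script, it can help to check the file before enabling the timer. The sketch below is hypothetical (`validate_config` and `default_config_path` are not part of discorss.py); it assumes the key names used by the example config that install.sh writes (`name`, `siteurl`, `url`, `webhook`, `offset`) and flags the `PASTE WEBHOOK URL HERE` placeholder.

```python
import os
from pathlib import Path

# Keys discorss.py reads directly from each feed entry
REQUIRED_KEYS = ("name", "siteurl", "url", "webhook")
PLACEHOLDER = "PASTE WEBHOOK URL HERE"  # the value install.sh writes


def validate_config(cfg: dict) -> list:
    """Return human-readable problems; an empty list means the config looks usable."""
    feeds = cfg.get("feeds")
    if not isinstance(feeds, list) or not feeds:
        return ["config has no non-empty 'feeds' list"]
    problems = []
    for i, feed in enumerate(feeds):
        label = feed.get("name", "feed #{}".format(i))
        for key in REQUIRED_KEYS:
            if key not in feed:
                problems.append("{}: missing '{}'".format(label, key))
        if feed.get("webhook") == PLACEHOLDER:
            problems.append("{}: webhook URL is still the placeholder".format(label))
        if "offset" not in feed:
            # The README suggests offset is only needed when feeds aren't showing
            # up, but without it discorss.py's KeyError handler switches to the
            # entry's updated time instead of its publish time.
            problems.append("{}: no 'offset'; updated times will be used instead".format(label))
        elif not isinstance(feed["offset"], (int, float)):
            problems.append("{}: 'offset' should be a number of seconds".format(label))
    return problems


def default_config_path() -> Path:
    """Roughly mirrors discorss.py: $XDG_CONFIG_HOME if set, else ~/.config."""
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "discorss" / "discorss.conf"
```

Running something like `validate_config(json.load(open(default_config_path())))` before `systemctl --user enable --now discorss.timer` catches the most common first-run mistakes.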
### Manual method
To automate feed posting, create a systemd service and timer to execute the script.
Use the command `systemctl --user edit --full --force discorss.service` and then paste in something like this:
```
```systemd
[Unit]
Description=Discord RSS feeder
Wants=discorss.timer
[Service]
Type=oneshot
TimeoutStartSec=120
ExecStart=/path/to/discorss.py
[Install]
WantedBy=default.target
```
Make sure to edit the ExecStart to point to the correct location. Then we need a systemd timer to automatically fire the script. Run `systemctl --user edit --full --force discorss.timer` and then paste in this:
```
`TimeoutStartSec` makes systemd kill the script if it locks up due to, e.g., DNS failures or slow or unavailable RSS feeds; for `Type=oneshot` services the start timeout is otherwise disabled by default. Two minutes should be more than enough unless you are running hundreds of feeds. Also make sure to edit `ExecStart` to point to the correct location. Then we need a systemd timer to fire the script automatically. Run `systemctl --user edit --full --force discorss.timer` and then paste in this:
```systemd
[Unit]
Description=Timer for DiscoRSS
Requires=discorss.service


@ -19,196 +19,223 @@ import json
import time
import os
import sys
import argparse
import re
config_dir = os.environ.get("XDG_CONFIG_HOME")
home_dir = Path.home()
if config_dir is None:
config_file_path = str(home_dir) + "/.config/discorss/discorss.conf"
config_dir = str(home_dir) + "/.config/discorss"
else:
config_file_path = config_dir + r"/discorss/discorss.conf"
log_dir = r"/var/log/discorss"
log_file_path = r"/app.log"
# Yes, I know you "can't parse HTML with regex", but
# just watch me.
html_filter = re.compile(r"\<\/?([A-Za-z0-9 \:\.\-\/\"\=])*\>")
success_codes = [200, 201, 202, 203, 204, 205, 206]
app_config = {}
# IDEA: Consider making this into a class-based program
# This would solve a couple issues around global variables and generally
# make things a bit neater
class Discorss:
def __init__(self):
self.config_dir = os.environ.get("XDG_CONFIG_HOME")
home_dir = Path.home()
if self.config_dir is None:
self.config_file_path = str(home_dir) + "/.config/discorss/discorss.conf"
self.config_dir = str(home_dir) + "/.config/discorss"
else:
self.config_file_path = self.config_dir + r"/discorss/discorss.conf"
self.log_dir = r"/var/log/discorss"
self.log_file_path = r"/app.log"
# Yes, I know you "can't parse HTML with regex", but
# just watch me.
self.html_filter = re.compile(r"\<\/?([A-Za-z0-9 \:\.\-\/\"\=])*\>")
self.success_codes = [200, 201, 202, 203, 204, 205, 206]
self.app_config = {}
# This function gets and formats the brief excerpt that goes in the embed
# Different feeds put summaries in different fields, so we pick the best
# one and limit it to 250 characters.
def get_description(feed, length=250, min_length=150, addons=None):
try:
temporary_string = str(feed["summary_detail"]["value"])
temporary_string = html_filter.sub("", temporary_string)
while length > min_length:
if temporary_string[length - 1 : length] == " ":
break
else:
length -= 1
except KeyError:
temporary_string = str(feed["description"])
temporary_string = html_filter.sub("", temporary_string)
while length > min_length:
if temporary_string[length - 1 : length] == " ":
break
else:
length -= 1
desc = temporary_string[:length]
if addons is not None:
desc = desc + str(addons)
return desc
def setupPaths():
global app_config
global logger
# Check for log and config files/paths, create empty directories if needed
# TODO: make this cleaner
if not Path(log_dir).exists():
print("No log file path exists. Yark! We'll try and make {}...".format(log_dir))
# This function gets and formats the brief excerpt that goes in the embed
# Different feeds put summaries in different fields, so we pick the best
# one and limit it to 250 characters.
def get_description(self, feed, length=250, min_length=150, addons=None):
try:
Path(log_dir).mkdir(parents=True, exist_ok=True)
except FileExistsError:
print("The path {} already exists and is not a directory!".format(log_dir))
if not Path(config_file_path).exists():
print(
"No config file at {}! Snarf. We'll try and make {}...".format(
config_file_path, config_dir
)
)
try:
Path(config_dir).mkdir(parents=True, exist_ok=True)
except FileExistsError:
temporary_string = str(feed["summary_detail"]["value"])
temporary_string = self.html_filter.sub("", temporary_string)
while length > min_length:
if temporary_string[length - 1 : length] == " ":
break
else:
length -= 1
except KeyError:
temporary_string = str(feed["description"])
temporary_string = self.html_filter.sub("", temporary_string)
while length > min_length:
if temporary_string[length - 1 : length] == " ":
break
else:
length -= 1
desc = temporary_string[:length]
if addons is not None:
desc = desc + str(addons)
return desc
def setup(self):
os.environ["TZ"] = "America/Toronto"
time.tzset()
self.now = time.mktime(time.localtime())
# Check for log and config files/paths, create empty directories if needed
# TODO: make this cleaner
if not Path(self.log_dir).exists():
print(
"The config dir {} already exists and is not a directory! Please fix manually.".format(
config_dir
"No log file path exists. Yark! We'll try and make {}...".format(
self.log_dir
)
)
sys.exit(255)
try:
Path(self.log_dir).mkdir(parents=True, exist_ok=True)
except FileExistsError:
print(
"The path {} already exists and is not a directory!".format(
self.log_dir
)
)
if not Path(self.config_file_path).exists():
print(
"No config file at {}! Snarf. We'll try and make {}...".format(
self.config_file_path, self.config_dir
)
)
try:
Path(self.config_dir).mkdir(parents=True, exist_ok=True)
except FileExistsError:
print(
"The config dir {} already exists and is not a directory! Please fix manually. Quitting!".format(
self.config_dir
)
)
sys.exit(255)
return
# Loading the config file
with open(self.config_file_path, "r") as config_file:
self.app_config = json.load(config_file)
# Set up logging
self.logger = logging.getLogger(__name__)
logging.basicConfig(
filename=str(self.log_dir + self.log_file_path),
encoding="utf-8",
level=logging.ERROR,
datefmt="%m/%d/%Y %H:%M:%S",
format="%(asctime)s: %(levelname)s: %(message)s",
)
return
# Loading the config file
with open(config_file_path, "r") as config_file:
app_config = json.load(config_file)
# Set up logging
logger = logging.getLogger(__name__)
logging.basicConfig(
filename=str(log_dir + log_file_path),
encoding="utf-8",
level=logging.INFO,
datefmt="%m/%d/%Y %H:%M:%S",
format="%(asctime)s: %(levelname)s: %(message)s",
)
return
def process(self):
self.setup() # Handle the config and log paths
try:
last_check = self.app_config["lastupdate"]
except KeyError:
last_check = (
self.now - 21600
) # first run, no lastupdate, check up to 6 hours ago
for i, hook in enumerate(self.app_config["feeds"]): # Feed loop start
self.logger.debug("Parsing feed %s...", hook["name"])
self.feeds = feedparser.parse(hook["url"])
latest_post = None  # reset for every feed so an earlier feed's post is never reused
prev_best = 0
latest_bad_time = False
self.logger.debug(
"About to sort through entries for feed %s ...", hook["name"]
)
for feed in self.feeds["entries"]:
try:
bad_time = False
published_time = time.mktime(feed["published_parsed"])
published_time = published_time + hook["offset"]
except KeyError:
published_time = time.mktime(feed["updated_parsed"])
bad_time = True
if published_time > prev_best:
latest_post = feed
prev_best = published_time
latest_bad_time = bad_time  # remember the flag for the chosen entry, not the last one
else:
continue
if latest_post is None:
self.logger.warning("Feed %s has no entries, skipping", hook["name"])
continue
if latest_bad_time is True:
self.logger.debug(
"Feed %s doesn't supply a published time, using updated time instead",
hook["name"],
)
# Hash the title and time of the latest post and use that to determine if it's been posted
# Yes, SHA3-512 is totally unnecessary for this purpose, but I love SHA3
self.logger.debug("About to hash %s ...", latest_post["title"])
try:
new_hash = hashlib.sha3_512(
bytes(latest_post["title"] + str(prev_best), "utf-8")  # prev_best is the chosen post's time
).hexdigest()
except TypeError:
self.logger.error("Title of %s isn't hashing correctly", hook["name"])
continue
try:
if hook["lasthash"] != new_hash:
self.app_config["feeds"][i]["lasthash"] = new_hash
else:
continue
except KeyError:
self.app_config["feeds"][i]["lasthash"] = new_hash
self.logger.info(
"Feed %s has no existing hash, likely a new feed!", hook["name"]
)
# Generate the webhook
self.logger.info(
"Publishing webhook for %s. Last check was %d, self.now is %d",
hook["name"],
last_check,
self.now,
)
webhook = {
"embeds": [
{
"title": str(latest_post["title"]),
"url": str(latest_post["link"]),
"color": 2123412,
"footer": {
"text": "DiscoRSS",
"icon_url": "https://frzn.dev/~amr/images/discorss.png",
},
"author": {
"name": str(hook["name"]),
"url": str(hook["siteurl"]),
},
"fields": [
{
"name": "Excerpt from post:",
"value": self.get_description(latest_post),
}
],
# "timestamp": str(self.now),
}
],
"attachments": [],
}
custom_header = {
"user-agent": "DiscoRSS (https://git.frzn.dev/amr/discorss, 0.2)",
"content-type": "application/json",
}
webhook_string = json.dumps(webhook)
self.logger.debug("About to run POST for %s", hook["name"])
r = requests.post(
hook["webhook"], data=webhook_string, headers=custom_header
)
if r.status_code not in self.success_codes:
self.logger.error(
"Error %d while trying to post %s", r.status_code, hook["name"]
)
else:
self.logger.debug("Got %d when posting %s", r.status_code, hook["name"])
# End of feed loop
# Dump updated config back to json file
self.logger.debug("Dumping config back to %s", str(self.config_file_path))
self.app_config["lastupdate"] = self.now
with open(self.config_file_path, "w") as config_file:
json.dump(self.app_config, config_file, indent=4)
return
# end of Discorss class
def main():
os.environ["TZ"] = "America/Toronto"
time.tzset()
now = time.mktime(time.localtime())
setupPaths() # Handle the config and log paths
try:
last_check = app_config["lastupdate"]
except KeyError:
last_check = now - 21600 # first run, no lastupdate, check up to 6 hours ago
for i, hook in enumerate(app_config["feeds"]): # Feed loop start
logger.debug("Parsing feed %s...", hook["name"])
feeds = feedparser.parse(hook["url"])
latest_post = []
prev_best = 0
for feed in feeds["entries"]:
try:
bad_time = False
published_time = time.mktime(feed["published_parsed"])
published_time = published_time + hook["offset"]
except KeyError:
published_time = time.mktime(feed["updated_parsed"])
bad_time = True
if published_time > prev_best:
latest_post = feed
prev_best = published_time
else:
continue
if bad_time is True:
logger.warning(
"Feed %s doesn't supply a published time, using updated time instead",
hook["name"],
)
# Hash the title and time of the latest post and use that to determine if it's been posted
new_hash = hashlib.sha3_512(
bytes(latest_post["title"] + str(published_time), "utf-8")
).hexdigest()
try:
if hook["lasthash"] != new_hash:
app_config["feeds"][i]["lasthash"] = new_hash
else:
continue
except KeyError:
app_config["feeds"][i]["lasthash"] = new_hash
logger.info(
"Feed %s has no existing hash, likely a new feed!", hook["name"]
)
# Generate the webhook
logger.info(
"Publishing webhook for %s. Last check was %d, now is %d",
hook["name"],
last_check,
now,
)
webhook = {
"embeds": [
{
"title": str(latest_post["title"]),
"url": str(latest_post["link"]),
"color": 216128,
"footer": {
"name": "DiscoRSS",
# "url": "https://git.frzn.dev/amr/discorss",
},
"author": {
"name": str(hook["name"]),
"url": str(hook["siteurl"]),
},
"fields": [
{
"name": "Excerpt from post:",
"value": get_description(latest_post),
}
],
}
],
"attachments": [],
}
custom_header = {
"user-agent": "DiscoRSS (https://git.frzn.dev/amr/discorss, 0.2rc3)",
"content-type": "application/json",
}
webhook_string = json.dumps(webhook)
r = requests.post(hook["webhook"], data=webhook_string, headers=custom_header)
if r.status_code not in success_codes:
logger.error(
"Error %d while trying to post %s", r.status_code, hook["name"]
)
else:
logger.debug("Got %d when posting %s", r.status_code, hook["name"])
# End of feed loop
# Dump updated config back to json file
app_config["lastupdate"] = now
with open(config_file_path, "w") as config_file:
json.dump(app_config, config_file, indent=4)
return
app = Discorss()
app.process()
if __name__ == "__main__":
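`get_description` (in both the removed module-level version and the new method) strips tags with a regex, then walks back from `length` toward `min_length` looking for a space so the excerpt ends on a word boundary. The sketch below, with the hypothetical name `excerpt`, reproduces that idea for a plain dict entry with two deliberate changes, both assumptions rather than the script's behavior: it swaps in a looser `<[^>]*>` pattern plus `html.unescape`, since the original character class misses tags whose attributes contain characters such as `?`, `&` or `_`; and it returns text that already fits unchanged, whereas the original loop can also cut a summary shorter than `length` (but longer than `min_length`) back to its last space.

```python
import html
import re

# Looser than the pattern in discorss.py: any run of non-'>' characters between
# angle brackets counts as a tag. Still a heuristic, not a real HTML parser.
TAG_RE = re.compile(r"<[^>]*>")


def excerpt(entry: dict, length: int = 250, min_length: int = 150) -> str:
    """Return a tag-free excerpt that tries to end on a word boundary."""
    # Prefer summary_detail.value, fall back to description (as discorss.py does)
    try:
        raw = str(entry["summary_detail"]["value"])
    except KeyError:
        raw = str(entry["description"])
    text = html.unescape(TAG_RE.sub("", raw)).strip()
    if len(text) <= length:
        return text  # already short enough; no need to trim at a space
    cut = length
    # Walk back from `length` looking for a space, but never below min_length
    while cut > min_length and text[cut - 1] != " ":
        cut -= 1
    return text[:cut].rstrip()
```

If no space exists between `min_length` and `length`, the excerpt is hard-cut at `min_length`, matching the original fallback.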

install.sh Executable file

@ -0,0 +1,98 @@
#!/bin/bash
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
# This script will set up a basic systemd service and timer for DiscoRSS
# You can optionally edit the entries here before running it, or you can
# use systemctl --user edit --full discorss.service or discorss.timer
# after installing them.
printf "\e[1;34mDisco\e[1;38;5;208mRSS\e[0m Install Helper Script\n\n"
workingDir=$(pwd)
printf "Would you like the systemd service and timer files created for you? [y/n]: "
read answer
if [[ "$answer" =~ ^([yY])$ ]]; then
cat << EOF > discorss.service
# Autogenerated by install.sh
[Unit]
Description=Discord RSS feeder
Wants=discorss.timer
[Service]
Type=oneshot
TimeoutStartSec=120
ExecStart=$workingDir/discorss.py
[Install]
WantedBy=default.target
EOF
cat << EOF > discorss.timer
# Autogenerated by install.sh
[Unit]
Description=Timer for DiscoRSS
Requires=discorss.service
[Timer]
Unit=discorss.service
OnCalendar=*:0/5:00
AccuracySec=1s
[Install]
WantedBy=timers.target
EOF
printf "Making ~/.config/systemd/user in case it doesn't exist ...\n"
mkdir -p -v ~/.config/systemd/user/
printf "Copying service and timer files there ... \n"
cp discorss.service ~/.config/systemd/user/
cp discorss.timer ~/.config/systemd/user/
rm -f discorss.service
rm -f discorss.timer
printf "Reloading systemd daemon ... \n\n"
systemctl --user daemon-reload
else
printf "DiscoRSS is intended to be run automatically. It's designed with systemd in mind, but you are free to use any automation tool. You can look at this script for examples of how to structure systemd user services and timers.\nOf course, you could always run it by hand, if you really want to :)\n\n"
fi
printf "Would you like a basic example config created for you? [y/n]: "
read answer1
if [[ "$answer1" =~ ^([yY])$ ]]; then
mkdir -p -v ~/.config/discorss
cat << EOF > ~/.config/discorss/discorss.conf
{
"feeds": [
{
"name": "Phoronix",
"siteurl": "https://www.phoronix.com/",
"url": "http://www.phoronix.com/rss.php",
"webhook": "PASTE WEBHOOK URL HERE",
"offset": 0
}
]
}
EOF
printf "\nMake sure to edit \e[1;34m~/.config/discorss/discorss.conf\e[0m and add your own feeds and webhook URLs! The script will just error out if you don't do this."
else
printf "\nMake sure to create a config at \e[1;34m~/.config/discorss/discorss.conf\e[0m and follow the pattern shown in the README."
fi
printf "\nWould you like to have the timer enabled and started now? [y/n]: "
read answer
if [[ "$answer" =~ ^([yY])$ ]]; then
systemctl --user enable --now discorss.timer
printf "\ndiscorss.timer enabled and started. \e[1;31mDon't enable or start discorss.service\e[0m -- the timer does this automatically."
else
printf "\nDon't forget to run \e[1;32msystemctl --user enable --now discorss.timer\e[0m when you are ready! \e[1;31mDon't enable or start discorss.service\e[0m -- the timer does this automatically."
fi
printf "\n\nYou should be almost ready to go! Double-check your config files, and check \e[1;32msystemctl --user list-timers\e[0m once discorss.timer is enabled to see when it will fire next. The default is every 5 minutes."
printf "\nRemember, if you need help or encounter any bugs, contact me via the issues tracker on the git repository where you got this from!\n"
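The generated timer uses `OnCalendar=*:0/5:00`: every hour, at minutes 0, 5, 10 and so on, on second zero. As an illustration of the next trigger that `systemctl --user list-timers` should show, the hypothetical `next_slot` helper below computes it in plain Python. For checking a real expression, `systemd-analyze calendar '*:0/5:00'` is the authoritative tool, and with `AccuracySec=1s` the actual trigger can land up to a second after the computed slot.

```python
from datetime import datetime, timedelta


def next_slot(now: datetime, step_minutes: int = 5) -> datetime:
    """First time strictly after `now` whose minute is a multiple of
    `step_minutes` and whose second is 0 (OnCalendar=*:0/5:00 for step 5).

    Works on naive local times and ignores DST transitions.
    """
    if 60 % step_minutes != 0:
        # e.g. *:0/7 restarts at :00 each hour, which this arithmetic does not model
        raise ValueError("step_minutes must divide 60")
    base = now.replace(second=0, microsecond=0)
    return base + timedelta(minutes=step_minutes - base.minute % step_minutes)
```

Calling `next_slot(datetime.now())` gives a quick estimate of how long you have to finish editing the config before the first run.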