discuss infrastructure

From IndieWeb

discuss infrastructure documents the internal plumbing behind the IndieWeb discussion chat rooms, chat logs, and bots, along with how spam is handled and experiments related to that plumbing.

For information about the chat and how to connect to it, refer to discuss.

Bots on IRC

The IRC channel uses these bots:

  • Loqi - Contact: aaronpk, wiki page
    • logs all edits to this wiki
    • logs the IRC channel itself to archives on the web
    • logs Twitter mentions of #indieweb or #indiewebcamp to the channel
    • logs webmentions of wiki pages
  • Kaja - Contact: Sven Knebel
    • microformat parsing commands
    • IndieWeb YouTube channel announcements

Want to write your own bot? See #bot development.

Logs

If the logs stop updating or have any other problems, notify Aaron Parecki in the IRC channel.

IRC Log Atom Feed

(Currently unavailable)

An experimental Atom feed was available for the IRC logs. It delivered the last full day's log to your feed reader.

Please report any feed issues to https://github.com/bcomnes/iwc-log-feed/issues

Spam

Occasionally, Libera gets hit with a wave of IRC spam. Typically, an account joins many channels across Libera and sends repeated, similar messages to as many channels as it can, often mentioning other people's nicknames to get their attention.

  • Anyone registered on chat-names can have Loqi kick a user with the command !kick spammer
  • Any channel op in our channels can also use /kick spammer to remove the user from that channel immediately
  • Libera's own admins also often catch the bots and block them from the server

Red Alert Mode

In very large spam waves, such as the one starting on 2018-07-31, we can employ additional tricks to prevent the spam from hitting the channels at all.

  • Set everyone currently in the channel to +v (/voice nickname)
  • Set the channel modes to +zm (/mode #indieweb-dev +zm)
    • +m (moderated) means only users with voice (+v) or channel operator status can speak in the channel
    • +z lets channel ops still see messages from unvoiced users, so Loqi can see messages from newcomers and spammers
    • If ChanServ enforces a mode lock (MLOCK) on the channel, use /msg chanserv set #indieweb-dev mlock +zm instead
  • When someone joins a channel, Loqi sets a timer for 30 seconds. If they are still in the channel after 30 seconds, they are automatically given +v.
  • When an unvoiced user says something that matches a spam keyword phrase, Loqi kicks them immediately.
    • If it does not match a spam phrase, then Loqi says "Welcome, {nickname}. Since you weren't yet recognized as a real person, your message wasn't sent to the channel. I'm repeating it below for you.", repeats their message to the channel, and sets +v
    • If the spam does not consistently match a pattern, this repeat-and-voice rule can be dropped, so that unmatched messages from unvoiced users are not relayed to the channel
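The red alert rules above can be sketched as a small state machine. This is a minimal Python illustration of the decision logic only, not Loqi's actual code: the class name RedAlert, the action tuples it returns, and the SPAM_PHRASES list are all hypothetical, and a real bot would still need to connect to IRC and carry out the returned actions.

```python
from dataclasses import dataclass, field

# Hypothetical keyword list; the phrases Loqi actually matches are not documented here.
SPAM_PHRASES = ["example spam phrase"]
VOICE_DELAY_SECONDS = 30

WELCOME = ("Welcome, {nick}. Since you weren't yet recognized as a real person, "
           "your message wasn't sent to the channel. I'm repeating it below for you.")


@dataclass
class RedAlert:
    """Decision logic only: each handler returns a list of actions for the bot to perform."""
    voiced: set = field(default_factory=set)       # nicks that already have +v
    joined_at: dict = field(default_factory=dict)  # nick -> join time in seconds
    repeat_unmatched: bool = True  # False drops the repeat-and-voice rule

    def on_join(self, nick, now):
        # Start the 30-second timer for this nick.
        self.joined_at[nick] = now
        return []

    def on_part(self, nick):
        # Leaving the channel cancels any pending timer.
        self.joined_at.pop(nick, None)
        return []

    def on_tick(self, now):
        # Voice everyone who has stayed in the channel for 30 seconds or more.
        actions = []
        for nick, joined in list(self.joined_at.items()):
            if now - joined >= VOICE_DELAY_SECONDS:
                del self.joined_at[nick]
                self.voiced.add(nick)
                actions.append(("voice", nick))
        return actions

    def on_message(self, nick, text):
        if nick in self.voiced:
            return []  # voiced users are heard by the channel directly
        lowered = text.lower()
        if any(phrase.lower() in lowered for phrase in SPAM_PHRASES):
            self.joined_at.pop(nick, None)
            return [("kick", nick)]
        if not self.repeat_unmatched:
            return []  # stay silent; under +z only ops see the message
        # Not spam: welcome them, relay the message, and voice them now.
        self.joined_at.pop(nick, None)
        self.voiced.add(nick)
        return [("say", WELCOME.format(nick=nick)), ("repeat", nick, text), ("voice", nick)]
```

Keeping the decisions separate from the IRC connection makes the rules easy to test on their own. Start it with everyone voiced at the beginning of the alert, e.g. RedAlert(voiced={"aaronpk"}), and pass repeat_unmatched=False to drop the last rule when the spam doesn't follow a consistent pattern.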

To stand down from red alert, remove the +zm channel modes and disable Loqi's auto-voicing hooks:

  • /mode #indieweb-dev -zm
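For reference, the slash commands above correspond to raw IRC protocol lines: clients usually turn /voice nickname into MODE #channel +v nickname, and /mode into a MODE line. The sketch below builds those lines in Python, batching several voices per MODE line. The function names are hypothetical, the limit of 4 modes per line is an assumption (servers advertise their real limit in the MODES token of RPL_ISUPPORT, numeric 005), and the ChanServ MLOCK case is left out.

```python
# Assumption: at most 4 mode changes per MODE line. Check the server's
# MODES= token in RPL_ISUPPORT (numeric 005) for the real limit.
MODES_PER_LINE = 4


def voice_lines(channel, nicks, per_line=MODES_PER_LINE):
    """Raw IRC lines that give +v to every nick, a few per MODE line."""
    lines = []
    for start in range(0, len(nicks), per_line):
        batch = nicks[start:start + per_line]
        # One "v" per nick, e.g. "MODE #chan +vvv a b c".
        lines.append(f"MODE {channel} +{'v' * len(batch)} {' '.join(batch)}")
    return lines


def red_alert_lines(channel, present_nicks):
    """Voice everyone already present, then set +zm."""
    return voice_lines(channel, present_nicks) + [f"MODE {channel} +zm"]


def stand_down_lines(channel):
    """Remove +zm again; Loqi's auto-voicing hooks are disabled separately."""
    return [f"MODE {channel} -zm"]


if __name__ == "__main__":
    for line in red_alert_lines("#indieweb-dev", ["alice", "bob", "carol", "dave", "erin"]):
        print(line)
```

Run directly, this prints MODE #indieweb-dev +vvvv alice bob carol dave, then MODE #indieweb-dev +v erin, then MODE #indieweb-dev +zm. Each line must be terminated with \r\n when actually sent to the server.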

Spam in Chat Logs

The IndieWeb chat logs are stored as files in a GitHub repo, which the web interface pulls from. Individual lines in a file can be marked as deleted so they are not shown in the web interface. This should only be used to hide actual spam, not to remove legitimate messages from users, even ones you disagree with.

To help remove spam from the logs, you can send a pull request to the chat archive repo that marks specific lines as deleted. Find the file containing the messages, then find the specific chat line to mark as deleted. Each line in the file corresponds to a single IRC message and begins with a timestamp, followed by a JSON object with the log info. To mark a line as deleted, add "deleted":true as the first property of its JSON object.

For example, change:

2018-01-01 01:05:10.931300 {"type":"message", ... 

to:

2018-01-01 01:05:10.931300 {"deleted":true,"type":"message", ... 
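The edit above can be sketched in Python for a single log line. This is a minimal illustration assuming the format described here (a timestamp, a space, then one JSON object per line); the function name mark_deleted is hypothetical. It splices the text in rather than re-serialising the JSON, so the rest of the line stays byte-for-byte identical and the pull request diff stays small.

```python
import json


def mark_deleted(line: str) -> str:
    """Add "deleted":true as the first property of one "<timestamp> {json}" log line."""
    # The JSON object starts at the first "{"; everything before it is the timestamp.
    brace = line.index("{")
    prefix, payload = line[:brace], line[brace:]
    record = json.loads(payload)  # raises ValueError if the line isn't valid JSON
    if record.get("deleted") is True:
        return line  # already marked, leave it alone
    separator = "," if record else ""  # an empty object {} needs no comma
    # Splice text in instead of re-serialising, so the rest of the line is unchanged.
    return prefix + '{"deleted":true' + separator + payload[1:]
```

Parsing the JSON first is only a safety check: it refuses to touch a line that isn't in the expected format, and it skips lines that are already marked.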

Send a pull request containing all your edits together, like this one. Anyone with write access to the repo can merge the change, and Aaron Parecki will then deploy it to the server. This will prevent the spam from being shown in the web logs.

How to remove spam

This is how Martijn van der Ven has been removing spam from the chat.indieweb.org logs:

  1. Find the log file with spam in it on GitHub.
    1. Go to the GitHub archive.
    2. Choose the folder of the matching channel.
    3. Follow the folder structure down to the specific day’s log file. Note that days are in UTC.
  2. Open GitHub’s editor by clicking the pencil icon.
  3. Copy the entire file into a plain text editor that supports regular expression find and replace (e.g. BBEdit).
  4. Run a find and replace that marks all lines from a specified user as deleted.
    • Match this pattern, replacing spammername with the actual name of the spammer: (^[0-9:. -]+ {)(.+?,"author":{"uid":"spammername")
    • Replace each match with: \1"deleted":true,\2 (some editors write the backreferences as $1 and $2 instead)
  5. Copy it all back into GitHub’s editor, making sure the file ends with exactly one empty line.
  6. Give it a clear title in the “Propose file change” section identifying what spam you have removed.
  7. Press the (green) “Propose file change” button.
  8. Check the diff to make sure you haven’t made any destructive changes.
  9. Press the (green) “Create pull request” button.
  10. The pull request should be auto-filled with the title from step 6. If the title is clear enough, you do not need to write anything more; otherwise, explain in a comment what you changed in the logs and why.
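If you prefer working on a local clone of the archive instead of a desktop editor, step 4 can be reproduced with Python's re module. This sketch uses the same pattern, with the braces escaped, the spammer's name passed through re.escape, and re.MULTILINE so that ^ matches at the start of every line. It also adds a negative lookahead, (?!"deleted"), which is not part of the original pattern, so that running it twice does not mark a line deleted twice. Both function names are hypothetical, and ensure_single_trailing_newline assumes that "exactly one empty line" means the file ends with a single newline.

```python
import re


def mark_author_deleted(log_text: str, uid: str) -> str:
    """Mark every line written by `uid` as deleted, like the find and replace in step 4."""
    pattern = re.compile(
        r'(^[0-9:. -]+ \{)'   # group 1: timestamp plus the opening brace
        r'(?!"deleted")'      # addition: skip lines that are already marked
        r'(.+?,"author":\{"uid":"' + re.escape(uid) + r'")',  # group 2
        re.MULTILINE,         # ^ matches at the start of every line
    )
    return pattern.sub(r'\1"deleted":true,\2', log_text)


def ensure_single_trailing_newline(text: str) -> str:
    """Assumes "exactly one empty line" means the file ends with one newline."""
    return text.rstrip("\n") + "\n"
```

Because "." does not match a newline, each match stays within one log line, and the closing quote after the uid keeps a name like spammer from also matching spammer2.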

Please also send an email to have the spam removed from the freenode.logbot.info logs, per https://freenode.logbot.info/#how-to-scrub:

Send me an email or message me on IRC ("glob" on irc.mozilla.org) if there's data that you feel should be scrubbed (eg. spam, accidental leakage of passwords, etc).

See Also