PBWorks


PBWorks is a wiki content-hosting silo, originally called "PBWiki", that was created and launched at an SHDH hack event and adopted by many independent individuals, movements, and events in the mid-2000s.

How to

How to leave

There are some steps you can take to start leaving your PBWorks wiki.

The most important step is to start owning your wiki page URLs.

You can do this in two steps:

  1. Figure out where you want to put wiki pages on your site, e.g. in a /wiki/ directory, or even "just" /w/, or perhaps /page/ to indicate a named wiki page in contrast to date-stamped posts whose URLs may start with a YYYY year (like /2022/).
  2. Set up a redirect from your site at that path to your PBWiki.

This way you can immediately start sharing (and linking to) wiki page URLs on your own domain instead of PBWiki's, so that when you eventually switch to serving your wiki pages from your own domain, you won't have to update those links.

This also has the advantage of sharing https: URLs (using your own domain), instead of having to share http: (insecure) URLs to your PBWiki.
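
As a rough sketch of step 2, assuming your own site runs nginx, the following writes a small config snippet that redirects everything under /wiki/ to a PBWiki. The hostname example.pbworks.com, the /wiki/ prefix, the file name pbwiki-redirect.conf, and the assumption that the PBWiki answers at /PageName are all placeholders to adjust for your setup.

```shell
# Hypothetical snippet file; include it from the server block for your own domain.
conf="pbwiki-redirect.conf"

cat > "$conf" <<'EOF'
# Send /wiki/<PageName> on your domain to the same page name on your PBWiki.
# "redirect" issues a temporary (302) redirect, so browsers will not cache it
# permanently and you can start serving the pages yourself later.
location ^~ /wiki/ {
  rewrite ^/wiki/(.*)$ http://example.pbworks.com/$1 redirect;
}
EOF
```

A temporary redirect is used here on purpose: a permanent (301) redirect may be cached by browsers, which would work against switching these URLs over to your own pages later.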

How to stop creating

If you’re able to set up a redirect like the one above, the next thing you can do is stop making your future task of archiving/exporting/importing harder, by not creating new pages on your PBWiki.

Instead of creating new pages on your PBWiki, consider uploading a "simple" static HTML page to your own website, at a static path, perhaps even at the path you designed above. Then modify your redirect to only be a 404 handler for that directory, that is, only when a page is not found, should it redirect to your old PBWiki.
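
One possible way to express "redirect only when a page is not found" in nginx is a try_files fallback to a named location. This is a sketch under assumptions, not a tested production config: the /wiki/ prefix, example.pbworks.com, the @pbwiki location name, and the file name pbwiki-fallback.conf are placeholders.

```shell
# Hypothetical snippet file; include it from the server block for your own domain.
conf="pbwiki-fallback.conf"

cat > "$conf" <<'EOF'
location ^~ /wiki/ {
  # Serve a static file if one exists (with or without .html), else fall back.
  try_files $uri $uri.html @pbwiki;
}

# Only reached when no local file matched: redirect to the old PBWiki.
location @pbwiki {
  rewrite ^/wiki/(.*)$ http://example.pbworks.com/$1 redirect;
}
EOF
```

With this in place, a page you upload to /wiki/NewPage.html is served from your site, while any page name that does not exist locally still lands on the PBWiki.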

How to archive

PBworks does not provide any tools for exporting a usable archive of a wiki. It is possible to archive an entire PBworks wiki, including the history of each page, as a set of static HTML files using wget. Below are the steps involved to make it work, as well as nginx config for serving the archived site.

This method will result in an archive of the PBworks site with all original URLs intact, including page revision URLs.

Create links to all pages

Since PBworks does not have a page that lists all pages, you'll need to ensure all wiki pages are linked from the home page. One way to do this is to create a page called "AllPages" and link to all of your wiki pages from there. Note: This is not necessary if you are sure that every page on your wiki is linked to from some other page.

Create a mirror with wget

Create a full mirror of all the HTML pages using wget. The following wget command will download all pages linked from the home page, including linked CSS and JS files.

PBworks hosts a number of static assets on a different domain, vs1.pbworks.com. By default, wget doesn't cross to other domains to download assets there, so it has to be explicitly enabled.

wget --mirror --page-requisites --convert-links -e robots=off -P . --domains=wiki.oauth.net,vs1.pbworks.com --span-hosts http://wiki.oauth.net/

This will create two folders in the current directory, "wiki.oauth.net" and "vs1.pbworks.com". If you have hot-linked any images from other domains, such as Flickr, you'll want to include those domains in the list when you run the command.

After downloading the initial archive, you can test for which Flickr hosts are referenced by running this command on the resulting files:

cd wiki.oauth.net
grep -hRoE 'farm[[:digit:]]+\.static\.flickr\.com' . | sort | uniq

The result will be a list such as:

farm1.static.flickr.com
farm2.static.flickr.com
farm3.static.flickr.com
farm4.static.flickr.com

Then you can re-run the wget command and include those subdomains in the list:

wget --mirror --page-requisites --convert-links -e robots=off -P . --domains=wiki.oauth.net,vs1.pbworks.com,farm1.static.flickr.com,farm2.static.flickr.com,farm3.static.flickr.com,farm4.static.flickr.com --span-hosts http://wiki.oauth.net/
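
Rather than typing the Flickr hosts by hand, the list can be assembled from the grep output. The helper below is a minimal sketch; the function name flickr_hosts is made up for illustration.

```shell
# flickr_hosts DIR: print the distinct Flickr farm hosts referenced under DIR
# as one comma-separated line, ready to append to wget's --domains list.
flickr_hosts() {
  grep -hRoE 'farm[[:digit:]]+\.static\.flickr\.com' "$1" | sort -u | paste -sd, -
}

# Example (run from the directory that holds the first mirror):
#   wget --mirror --page-requisites --convert-links -e robots=off -P . \
#     --domains="wiki.oauth.net,vs1.pbworks.com,$(flickr_hosts wiki.oauth.net)" \
#     --span-hosts http://wiki.oauth.net/
```

If the first mirror references no Flickr hosts, the helper prints nothing and the example would leave a trailing comma in the list, so in that case just keep the original command.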

After wget finishes downloading all the files, it rewrites the HTML in each file so that img and link URLs point to the relative location of the other domains on disk. A page that previously hot-linked a Flickr image will then load the local copy, because its src attribute now references the local file.

To prepare to serve the archive from nginx, the last step is to rearrange your folder structure slightly. You'll want to move the other domains into your primary domain's folder. In this example, you would execute the following commands:

mv vs1.pbworks.com wiki.oauth.net/
mv farm1.static.flickr.com wiki.oauth.net/
mv farm2.static.flickr.com wiki.oauth.net/
mv farm3.static.flickr.com wiki.oauth.net/
mv farm4.static.flickr.com wiki.oauth.net/

This works because the src attribute of an image that previously pointed to http://farm1.static.flickr.com/img.png is rewritten as ../farm1.static.flickr.com/img.png. When a page such as /index.html is viewed in a browser, the browser cannot go up a folder from the site root, so it ignores the ../ and resolves the path to /farm1.static.flickr.com/img.png instead.
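
If many hosts were mirrored, the same moves can be written as a loop. This is only a convenience sketch; the function name merge_hosts is hypothetical, and the host patterns are the ones from this example.

```shell
# merge_hosts PRIMARY: move the other mirrored host folders found in the
# current directory into the PRIMARY domain's folder.
merge_hosts() {
  for dir in vs1.pbworks.com farm*.static.flickr.com; do
    # Skip names or patterns that matched nothing (e.g. no Flickr hosts).
    if [ -d "$dir" ]; then
      mv "$dir" "$1/"
    fi
  done
}

# Example: merge_hosts wiki.oauth.net
```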

Configure nginx to serve the archive

You'll want to add a new server block to your nginx config to serve this archive folder. There are a few tricks required in order for this to work properly. Below is the full config I used in this case.

server {
  listen 80;
  server_name wiki.oauth.net;

  root /web/sites/wiki.oauth.net;
  index index.html;

  rewrite ^/$ /w/page/FrontPage permanent;
    
  try_files $uri$is_args$args =404;
  default_type text/html;
  
  location ^~ /theme_image {
    try_files $uri$is_args$args =404;
    default_type image/png;
  }
}

The tricks required for this are explained below.

  • rewrite ^/$ /w/page/FrontPage permanent; PBworks does not actually serve a page from the root, so you need to redirect your root domain to the "FrontPage" page.
  • try_files $uri$is_args$args =404; Because the history pages are saved on disk with "?" and "&" characters in the filename, you need to force nginx to actually look for a file on disk that matches the full request URI including "?" instead of parsing the query string.
  • default_type text/html; Tells nginx to default to html content-type for unknown files, since most of your files will be named things like "FrontPage" with no extension on disk.
  • location ^~ /theme_image This config block tells nginx to send the image/png content-type for paths that match /theme_image. This is for the images that make up the borders of the page, since PBworks serves all of them from a PHP script named /theme_image.php with query string parameters to select the proper image.
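
To see which archived files depend on the try_files trick, you can list the files whose names contain a literal "?". This is a small sketch, assuming the archive was made on a filesystem where wget keeps "?" in file names; the function name list_query_files is made up for illustration.

```shell
# list_query_files DIR: print every file under DIR whose name contains a
# literal "?", i.e. pages that were saved together with their query string.
list_query_files() {
  find "$1" -type f -name '*[?]*' | sort
}

# Example: list_query_files /web/sites/wiki.oauth.net
```

Any path this prints should be reachable through nginx at the same URL, query string included; if it returns a 404, the try_files line is the first thing to check.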

IndieWeb Examples

Tantek

Tantek has a PBwiki at http://tantek.pbworks.com/ and has a long-term project/goal of transferring the content and roughly equivalent functionality to his own site.

For now he has set up a URL redirect from his own site, so he can share wiki page links that he controls, in the hope that the pages will eventually be served at those URLs, or somewhere else on his site.

Brainstorming

How to export

We need a much simpler set of steps, or maybe a tool or a service that can produce a simple high-fidelity export that contains all the information from your PBWiki (pages, content, links) intact.

How to set up on your own site

Separately we need options for how to set up a replacement for PBWiki on your own site.

  • The wiki-project page has some ideas/brainstorms for how to add wiki-like pages/features to your site

Criticism

Cannot edit without JS

If you have JS turned off or disabled like with the NoScript extension, the "Edit" button doesn't work (it doesn't do anything).

No page that lists all pages

There is no list of all wiki pages by default except in the sidebar, which is loaded in via JavaScript. This means an archive of the site made with a tool like wget will not find all the pages. To compensate for this, you can create a wiki page called AllPages or similar, and add a link to every page from that page. wget will then follow those links, since they appear in the HTML, and download every page.

Auto-reclaim site-deaths

After 11 months of inactivity (lack of edits?) PBWorks will send an email (presumably to admins?) with a warning, and then in 30 days, delete the entire wiki (name.pbworks.com) and all content therein. E.g. for workspace "ifbt": (actual text of email received)

Hello,

We noticed that you haven't used your workspace named: ifbt for over 11 months.

As you may have heard, we reclaim workspaces that have fallen into disuse (PBworks Spring Cleaning).

Reclaiming these idle workspaces frees up thousands of potentially useful URLs for people who will actually put them to use. We're planning to reclaim your workspace in 30 days.

If you want to keep your workspace, click here. If you're not currently logged into your PBworks account, you'll be asked to log in. You'll know that your workspace has been removed from the deletion list once the warning message disappears.

If you're truly no longer using your workspace, simply do nothing, and in 30 days, we'll delete the unused workspace and reclaim its URL.

Thanks,
The PBworks Team

Who knows how many site-deaths have occurred due to this auto-reclamation.

No https on individual wikis

Individual wikis, subdomains on PBWiki, do not have https support and can only be accessed via http:.

The login UI at least supports https: https://my.pbworks.com/

See Also