Snippy

Created by ceej. Last edited by ceej Wed, 14 Mar 2007 12:24:36 PDT.
Snippy is the software that runs the site you're reading right now.

Snippy is a wikilog; that is, a wiki system with blogging features. Some people call this a "bliki", but I find that word repulsive. It began as a clone of SnipSnap but evolved. Features include the ability to attach a blog to any wiki page, the ability to mix Markdown-ish text markup with HTML, and mind-bogglingly flexible Atom feed generation.

More about Snippy:

Snippy features: the big list
Subversion repository
Snippy work journal
Snippy Test Page
SnipSnap to Snippy migration
test installation (might not always be running)

Why replace SnipSnap?

It's written in Java. It's buggy. Development has become glacially slow in the last year. Also, it's buggy. It lacks features that I want. Because it's written in Java, adding those features would be annoying, and would require me to learn trivia about the Java platform that I do not wish to learn. And did I mention the bugs?

Python is a more natural choice for a text-processing web application. Development will be more rapid.

Progress

Read the comments to get the progress reports.

First view of page rendering in action (8/31/2003).

Snippy Mark II first render:

Some sample page renders from Snippy Mark III (or Mark I v2):


ceej : Wed, 14 Mar 2007 12:21:57 PDT

Time for more work?

Snippy has been serving my blog happily for over a year now. I have skated by without the administrative features I thought I needed. I confess that some of them (changing password etc) would indeed be nice. Time to work on them, maybe?
ceej : Sun, 03 Aug 2003 12:11:01 PDT permalink
I got the macro plugin architecture working this morning. That was pretty much like falling off a log in Python. Wrote a couple of fast takes on some macros like "link" and "anchor", and am now sure I can just bang them out.
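
A minimal sketch of what a macro plugin registry along these lines could look like in Python. The `{name:argument}` call syntax, the `macro` decorator, and the HTML that `link` and `anchor` emit are all assumptions made for illustration; none of this is Snippy's actual code.

```python
import html
import re

# Registry mapping macro names to the functions that expand them.
MACROS = {}

def macro(name):
    """Decorator that registers a function as the expander for `name`."""
    def register(func):
        MACROS[name] = func
        return func
    return register

@macro("link")
def link_macro(arg):
    # Hypothetical output: a plain hyperlink to the given URL.
    url = html.escape(arg, quote=True)
    return '<a href="%s">%s</a>' % (url, url)

@macro("anchor")
def anchor_macro(arg):
    # Hypothetical output: a named anchor.
    return '<a name="%s"></a>' % html.escape(arg, quote=True)

# Assumed call syntax: {name:argument}
MACRO_RE = re.compile(r"\{(\w+):([^}]*)\}")

def expand_macros(text):
    """Replace every known macro call in `text`; leave unknown ones alone."""
    def replace(match):
        func = MACROS.get(match.group(1))
        return func(match.group(2)) if func else match.group(0)
    return MACRO_RE.sub(replace, text)
```

With a registry like this, adding a macro is one decorated function, which is roughly why banging them out would be quick.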

Am continuing to dodge the html page generation problem while I investigate some existing template packages in Python. I might just cheese out and go for an easy dictionary substitution approach. I don't think I need loops and so on in a template language. Probably I just need a way to specify "variable sub here" and "insert other template piece here".

ceej : Sun, 03 Aug 2003 12:36:23 PDT permalink
A medusa http_server runs under asyncore.loop(). It has a bunch of handlers for various items: one for static content like images, and one for wiki pages. The medusa handler just sticks the request data into a page server object, which it then spins off into its own thread. The page server is responsible for application logic: parsing the request, authenticating the user login, then building an appropriate page. A page template object manages page construction (this is still fuzzy), passing snips through the macro expander and the general text processor. Finally, a medusa producer is handed back to the initial medusa request object and the page is spewed out.

I'm storing snip and user data in a BerkeleyDB4 database using the "shelve" stuff in the bsddb3 module. This does the usual trick of giving me something that looks and acts like a dictionary, but has a few more methods implemented on it. The very first thing I did was implement the store classes and then import all my SnipSnap data into it. This gives me a ton of real data to work with right from the start.
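
For readers who have not seen the dictionary trick, here is a small sketch using the standard library's `shelve` module. That is a stand-in for the bsddb3 shelve layer mentioned above, not the same thing, and the key and field names are made up for the example.

```python
import os
import shelve
import tempfile

def open_store(path):
    """Open (or create) a persistent, dictionary-like snip store."""
    return shelve.open(path)

def demo():
    """Write one snip, reopen the store, and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snips")
        with open_store(path) as store:
            # Hypothetical record layout; any picklable object works.
            store["start"] = {"author": "ceej", "text": "Hello, wiki."}
        with open_store(path) as store:
            return store["start"]["author"], sorted(store.keys())
```

The store behaves like a dictionary keyed by snip name, and values survive closing and reopening the file.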

ceej : Sun, 03 Aug 2003 14:56:34 PDT permalink
Very minor python rant: sometimes you just want a goddamned switch statement and to hell with your purity, you know? A little syntactic sugar won't kill you...
dws : Sun, 03 Aug 2003 22:13:13 PDT permalink
Mark Pilgrim (http://diveintomark.org) has ported Dean Allen's (http://www.textism.com/) "textile" formatting package from PHP to Python. See http://diveintomark.org/archives/2003/03/19/pytextile.

While that might not be of direct interest, his method of unit testing it is rather elegant.

ceej : Mon, 04 Aug 2003 06:53:36 PDT permalink
Interesting. A solution to a problem I have to solve, but am solving in a radically different way. I'd guess it's the hardest problem of the whole project.
ceej : Sun, 31 Aug 2003 14:06:48 PDT permalink
Resumed development of Snippy today by banging out a couple more macros and continuing work on the text markup processor. I'm going to abandon medusa eventually and switch to a threaded httpd server, but it's modular enough that I simply don't need to worry about it right now. If I get bored with the text processing, I'll start in on the templates.
ceej : Sun, 31 Aug 2003 19:59:20 PDT permalink
Snippy is passing the tests on the Snippy Test Page now. I am now ready to work on the templating mechanism.
ceej : Sun, 31 Aug 2003 23:33:29 PDT permalink
And of course it looks even better now than in that screenshot. Bedtime, though. Finishing off the image macro must wait until tomorrow. Also to-do immediately: generalize the macro argument parser I wrote for the image macro. Figure out what to do about the fact that my directory layout is much more sensible than SnipSnap's, which means all the imported image references are wrong. (Probably rewrite the image macros in the import phase.) Don't get too distracted by making nice page layouts, now that I can.

Template language has three features: TEMPLATE NOT FOUND: site/templates/default/template:None
to include another template file. ${variable:None} to substitute a variable from a dictionary. Also, all macros are parsed. Page dictionary building needs to be less ad-hoc.

I'll probably write the calendar macro next, though, since the total non-functionality of SnipSnap's is what pissed me off enough to start writing this damn thing. After that I should write some unit tests for the page markup parser.

ceej : Mon, 01 Sep 2003 14:18:04 PDT permalink
The SnipSnap authors are fucking insane. I'm going to have to fix up the image macros on import. There's no way I'm doing some kind of global context so that a macro knows what page it's being expanded for. That's just crazy.

Oh, right. My calendar macro is working nicely. It took all of, oh, I don't know, 20 minutes to write something that actually works.
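
As an illustration of why this kind of macro is quick in Python, here is a sketch built on the standard library's `calendar.HTMLCalendar` (which postdates this comment, so it is not what Snippy used). The `BlogCalendar` class, the `/blog/YYYY-MM-DD` URL scheme, and the idea of linking days that have entries are assumptions for the example.

```python
import calendar

class BlogCalendar(calendar.HTMLCalendar):
    """Month calendar that links the days on which blog entries exist."""

    def __init__(self, entry_days, url_pattern):
        super().__init__(firstweekday=calendar.SUNDAY)
        self.entry_days = set(entry_days)
        self.url_pattern = url_pattern  # hypothetical URL scheme

    def formatday(self, day, weekday):
        # Days with entries become links; everything else renders as usual.
        if day in self.entry_days:
            url = self.url_pattern % day
            return '<td class="%s"><a href="%s">%d</a></td>' % (
                self.cssclasses[weekday], url, day)
        return super().formatday(day, weekday)

def render_month(year, month, entry_days):
    """Return an HTML table for the month with entry days linked."""
    cal = BlogCalendar(entry_days, "/blog/%d-%02d-%%02d" % (year, month))
    return cal.formatmonth(year, month)
```

Calling `render_month(2003, 9, [1, 20])` would give a September 2003 table with the 1st and 20th linked.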

Edit: Here's a screenshot of Snippy rendering the Blog archives page: foo

ceej : Wed, 14 Mar 2007 12:19:16 PDT permalink
Macros are no longer expanded in template files; proved impractical and all I really wanted was to be able to include a named snip. So now the template language is:

TEMPLATE NOT FOUND: site/templates/default/include.tmpl
${variable} [[!snipname]]
The rest is assumed to be html and is spewed unchanged.
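
A rough sketch of how the `${variable}` and `[[!snipname]]` parts of such a template language could be expanded with two regular expressions. The include directive is left out because its syntax is not shown on this page, and the function name, the empty-string fallback for unknown names, and the `snips` dictionary are assumptions.

```python
import re

VARIABLE_RE = re.compile(r"\$\{(\w+)\}")
SNIP_RE = re.compile(r"\[\[!([^\]]+)\]\]")

def render_template(template, variables, snips):
    """Expand ${variable} and [[!snipname]]; pass everything else through."""
    def variable(match):
        # Unknown variables expand to an empty string (an assumption).
        return str(variables.get(match.group(1), ""))

    def snip(match):
        # Unknown snips also expand to an empty string (an assumption).
        return snips.get(match.group(1), "")

    # Variables are substituted first, then snips are inserted, so snip
    # bodies are never re-scanned for ${...}.
    return SNIP_RE.sub(snip, VARIABLE_RE.sub(variable, template))
```

There are no loops or conditionals here on purpose; anything that is not a recognized token is emitted unchanged.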
swetland : Mon, 01 Sep 2003 15:35:07 PDT permalink
Go Ceej!

I've started hacking on mod_python stuff again this weekend, in hopes of switching my photo gallery stuff away from medusa and cleaning it up a bit. I'm holding off on doing anything with sakana or starting a rewrite until I see where this project or John's sakana work goes...

Zink : Mon, 01 Sep 2003 16:39:12 PDT permalink
Hmm. Some pretty funny language in the Medusa default HTTP server sources as they screw up time/date handling and therefore screw up If-Modified-Since.

Of course, it was the Standard C library that is really screwed up. There's a built in way to parse dates, it is just insane and nearly useless. For instance it ignores the Time-Zone modifier, like GMT in "Mon, 01 Sep 2003 23:29:34 GMT". And if you parse the GMT out yourself there is no way to tell the standard C library that for this one date GMT is the time-zone.

So what the code does is have the standard C library parse the date, which it does in the current time-zone, then add the time-zone's offset from GMT by hand. But wait a second, is the time during Daylight Savings time? Then it's one hour less.

But wait another second: if the time leaps back an hour at 1am on October 30th, so that two minutes after 12:59am is 12:01am, how will the parser interpret "Sun, 30 Oct 2004 00:01:00 GMT"?

Naturally all these problems are resolved, but the specification of the library functions is so complex that apparently some implementations are wrong by an hour, and the end result is that the medusa python sources are corrected incorrectly and off by an hour.
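
For comparison, a sketch of how the same If-Modified-Since check can be done in current Python without ever passing through local time, using `email.utils.parsedate_to_datetime`. This is not the Medusa code; the function names are made up, and `last_modified` is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_http_date(value):
    """Parse an HTTP date such as 'Mon, 01 Sep 2003 23:29:34 GMT' as UTC."""
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # No usable zone in the header: treat it as UTC, as HTTP dates are.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)

def not_modified(if_modified_since, last_modified):
    """True when the resource has not changed since the client's copy."""
    try:
        since = parse_http_date(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: send the full response
    # HTTP dates have one-second resolution, so drop microseconds.
    return last_modified.replace(microsecond=0) <= since
```

Because both sides of the comparison are aware UTC datetimes, daylight saving transitions in the server's local zone never enter into it.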

dws : Mon, 01 Sep 2003 20:35:48 PDT permalink
Medusa/Python isn't alone in that problem. Tomcat on Win32 gets the time off by one hour in its logfiles. Local time is a wicked hard problem.
ceej : Sat, 20 Sep 2003 13:23:56 PDT permalink
I like unit tests. That is all.
ceej : Sat, 20 Sep 2003 16:28:08 PDT permalink
Much progress: the rendering engine is good enough now for me to move on and work on something else for the moment. Also, I'm fixing up image tags on import. SnipSnap requires you to reference images with an odd naming convention: a reference to foo.jpg will actually be rewritten to image-snipname-foo.jpg. This seems silly to me. My image tags will assume that all images are served from a particular directory, but otherwise I do not mangle the image name. Anyway, I have import mangling the SnipSnap references in advance, so I can treat them straightforwardly with minimal work at page serve time:

Snippy today

Zink : Sun, 21 Sep 2003 10:42:05 PDT permalink
I assume the schnip-schnap naming convention allows you to upload some repeated image (Screenshot.jpg, webcam.jpg) to successive blog entries without overwriting. "image--" in this context is a poor man's directory structure. How are you handling this issue?
ceej : Sun, 21 Sep 2003 11:00:18 PDT permalink
Not solving it yet-- I'm assuming you put the file in place by hand at the moment, and can be trusted to name it something sane. If you maintain a directory structure by hand, great. Snippy gives you way more flexibility than SnipSnap, which forces a naming convention on you. When I get around to implementing file upload (a low-priority feature for me), I will have to figure this out.

I recall using SnipSnap's image uploading once or twice, but I can't find the interface now. Maybe it's disabled?

ceej : Sun, 21 Sep 2003 13:26:20 PDT permalink
Okay, that's enough work on Snippy for the moment. I've made a bunch of progress on my page index macro and user listing macro. The index page is now useful.

Starting to think about search, which will be a hard problem to solve.

ceej : Sun, 21 Sep 2003 19:54:53 PDT permalink
An article on the indexer Python module. Reading now; recording the link here just in case.
ceej : Sun, 21 Sep 2003 20:31:09 PDT permalink
Wow, I upgraded to the latest Camino (had been using one from April quite happily). Page rendering is just totally screwed up here. Will have to move back...

[Edit] Or better yet, fix the damn CSS.

ceej : Mon, 22 Sep 2003 11:18:45 PDT permalink
Ransacker looks interesting. Simple, brutal, to the point. I fixed a couple of little bugs in it (love them unit tests) and now have to wire it up and see what the index size is for my current data set. DB file for this set of pages is nearly 5MB. In SnipSnap it takes more than 11.5MB, so I have a ways to grow before I'm being piggier.

Ransacker seems to be dustyware. It's GPLed. I'll rewrite it, add some features, and release a new version.

Snippy will be BSD license, or something like that. Haven't made up my mind. Nobody cares about the exact details of the license of Yet Another Wheel Reinvention.

ceej : Wed, 08 Oct 2003 14:31:25 PDT permalink
Full text searching in action:

Do this search, for "SWG", in SnipSnap on my blog and notice the differences. SnipSnap totally fails to find page name matches! I not only find those, but I call them out, since they're likely more important than content matches. (Actually, SnipSnap fails utterly on that search. Why, I dunno. But I think you could call this a total win for Snippy.)

Formatting probably isn't quite right yet. I'm not displaying the query string in the results page. I'm going to have to refactor, because I understood part-way through how I should handle getting form input data to macros. But the heavy lifting is all done.

Also note bug in blog entry name parsing. A bad regexp in my import phase, probably.
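
The idea of calling out page-name matches separately from content matches can be sketched with a tiny in-memory inverted index. This is an illustration only: Snippy's real index lives in a Berkeley DB, and the `SearchIndex` class, its tokenizer, and the all-words-must-match rule are assumptions.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"\w+")

def tokenize(text):
    """Split text into lowercase words."""
    return [word.lower() for word in WORD_RE.findall(text)]

class SearchIndex:
    """Tiny inverted index that tracks title hits and content hits separately."""

    def __init__(self):
        self.title_words = defaultdict(set)    # word -> page names
        self.content_words = defaultdict(set)  # word -> page names

    def add(self, name, text):
        for word in tokenize(name):
            self.title_words[word].add(name)
        for word in tokenize(text):
            self.content_words[word].add(name)

    def search(self, query):
        """Return (title_matches, content_matches) for pages matching every word."""
        words = tokenize(query)
        if not words:
            return [], []
        title = set.intersection(*(self.title_words.get(w, set()) for w in words))
        content = set.intersection(*(self.content_words.get(w, set()) for w in words))
        # Pages already called out as title matches are not repeated.
        return sorted(title), sorted(content - title)
```

A results page can then list the first group under its own heading, ahead of the content matches.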

Zink : Wed, 08 Oct 2003 14:43:57 PDT permalink
It's the year in review!
ceej : Fri, 10 Oct 2003 11:23:04 PDT permalink
Wow. Just wow. Notice the order that SnipSnap is displaying these comments in. There must be a pattern here, but I'm not seeing it. And damn, that pattern sure isn't using creation time.
ceej : Wed, 26 Nov 2003 17:03:04 PST permalink
I made some progress on Snippy recently. In addition to improving the search a little bit, I wrote a snips-by-user macro with some reasonable formatting:

usersnips

I think I made some improvements over SnipSnap's rendering of the same data: Zink.

Zink : Wed, 26 Nov 2003 18:29:37 PST permalink
I think the comment balloons are redundant. Also, there are about ninety-three comments for 2003-02-10. These should be done something like "2003-02-10 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, etc.)"
ceej : Wed, 26 Nov 2003 18:49:32 PST permalink
That's a good suggestion. I think I'll also call out blog entries as distinct from other pages.
ceej : Thu, 27 Nov 2003 11:36:49 PST permalink
Implemented that this morning, along with the blog entry callout.
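
A small sketch of the grouping Zink suggested: one line per date, with that day's comments numbered. The input shape (pairs of date string and comment id) and the function names are assumptions; in a real page each number would presumably link to its comment.

```python
from collections import OrderedDict

def group_by_date(comments):
    """Group (date, comment_id) pairs by date, preserving first-seen order."""
    groups = OrderedDict()
    for date, comment_id in comments:
        groups.setdefault(date, []).append(comment_id)
    return groups

def format_listing(comments):
    """Render one line per date, numbering that day's comments from 1."""
    lines = []
    for date, ids in group_by_date(comments).items():
        numbers = ", ".join(str(n) for n in range(1, len(ids) + 1))
        lines.append("%s (%s)" % (date, numbers))
    return "\n".join(lines)
```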

The current lazy path I'm taking for html generation means that this thing is completely unlocalizable at the moment. Not sure I want to worry about that right now, but I should start thinking about it. Swetland-san has a python template package he likes. Should investigate.

ceej : Fri, 06 May 2005 07:29:28 PDT permalink
Last night's work spasm has taken me past another couple of page rendering milestones. I also have virtual pages working, and even with a reasonable architecture behind the scenes. It's all messier than I'd like, but I have decided to just push through. I'll clean it up later, when I really understand what it is I needed to do. The important thing is to get it all implemented in some fashion.

Had a bad moment when I realized that the nicest way to implement something was to add conditionals to my Ultra-Simple Template Syntax, which is the first step down the path to re-implementing an entire programming language. Which is what usually happens with things like templates and configuration files. So I did not take that step and found another way to do it. Must hold the line there.

The next tasks, in approximate order:

  • Start generalizing blog stuff, so that it's possible to have blog entries attached to anything.
  • Atom feed generation.
  • Form input.
  • Resurrect full-text search.
ceej : Sun, 08 May 2005 09:15:41 PDT permalink
Bah python import cycles. Bah. I need a forward reference!
Zink : Sun, 08 May 2005 10:23:05 PDT permalink
Heh. That's what you get for using a language that doesn't separate interface from implementation.
ceej : Sun, 08 May 2005 11:22:20 PDT permalink
What I actually did yesterday:
  • form input
  • page editing (sans authentication)
  • started generalizing blog stuff
ceej : Thu, 12 May 2005 23:04:05 PDT permalink
I was feeling a little crappy about Snippy's startup time. It takes 8 seconds to start up with 8203 snips on my G5, and 18 seconds on my Powerbook. Then I had to restart SnipSnap with the exact same data set... 2 minutes.
ceej : Fri, 13 May 2005 08:55:15 PDT permalink
A brief pointless rant: What the fuck is up with these people who use two spaces to indent? What planet do they come from? Use a single tab the way god intended, so you freaks can display it as two spaces and I can display it as four and that weirdo David can display it as eight and we're all happy.

And what is up with people who conserve white space as if it were a non-renewable resource? Go ahead. Live a little. Put a blank line between your function definitions. Nothing bad will happen to you, and maybe some stranger bumbling along to read your code will have an easier time. And you know what else? You can use white space to group related lines of code within functions. No, really, you can! No animals are killed when you do it, either! It's perfectly safe.

ceej : Fri, 13 May 2005 20:59:31 PDT permalink
With this checkin, I take a first step toward all the Atom feed flexibility I want.
ceej : Wed, 24 Aug 2005 22:15:46 PDT permalink
It's time to drive Snippy to dogfood status. And that means implementing authentication.
ceej : Fri, 26 Aug 2005 18:58:01 PDT permalink
There is something driving me mad about the Python implementation of Snippy. I haven't figured out what yet.

It's possibly the burden of implementing my own web application framework, starting with the HTTP server. Though I did have fun implementing if-modified-since, I'm not sure I want to do all the work involved in implementing if-none-match. Those are moderately interesting problems to me, and I learn something when I tackle them, but solving them is a barrier to getting SnipSnap the hell out of here and getting this project finished.
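
For scale, here is roughly what a bare-bones If-None-Match check involves. This is a sketch under assumptions, not Snippy's code: the ETag is a SHA-1 of the rendered body, the function names are invented, and the header is split naively on commas, which is fine for hash-style tags but not for arbitrary ones.

```python
import hashlib

def make_etag(body):
    """Build a quoted ETag from the response body (bytes)."""
    return '"%s"' % hashlib.sha1(body).hexdigest()

def strip_weak(tag):
    """Drop a leading W/ so weak and strong tags compare equal."""
    return tag[2:] if tag.startswith("W/") else tag

def etag_matches(if_none_match, etag):
    """True when the client's If-None-Match header covers `etag`."""
    if if_none_match.strip() == "*":
        return True
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    # If-None-Match uses weak comparison, so W/ prefixes are ignored.
    return strip_weak(etag) in [strip_weak(tag) for tag in candidates]

def response_status(headers, body):
    """Return 304 if the client's cached copy is current, else 200."""
    header = headers.get("If-None-Match")
    if header is not None and etag_matches(header, make_etag(body)):
        return 304
    return 200
```

The catch for a wiki is that the body has to be rendered before it can be hashed, so this saves bandwidth but not page assembly time unless the tag is cached alongside the page.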

There are aspects of Snippy that are really messy and heinous, mostly about how pages get assembled from templates. I think I'm at the point of finding them intolerable.

ceej : Thu, 15 Sep 2005 19:12:10 PDT permalink
Now examining my full-text search options. One is Pyndex, which describes itself as best with sets of a few thousand documents. My dataset has nearly 9000 documents at the moment. Or there's Lupy.

Pyndex has bitrotted and no longer works with the latest versions of python. Probably can be fixed fairly easily.

ceej : Fri, 16 Sep 2005 08:42:13 PDT permalink
Working. And I am thinking that 9000 small documents is indeed too many for Pyndex. It also doesn't work with unicode strings.

My original Berkeley db solution seems to have been the right idea.

ceej : Thu, 29 Dec 2005 15:21:58 PST permalink
Am now investigating bzr as a possible replacement for RCS. RCS is pretty slow. The python library I'm using is a set of functions that call the rcs commands and capture the output, which is kinda clunky. Perhaps I can make this completely modular... Hmm, bzr is GPLed, which is a bummer. Forces me to make it modular, though.
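
The call-the-command-and-capture-the-output approach looks something like the sketch below. It is not the library I'm actually using; the function names are hypothetical, it assumes the RCS `co` command is on the PATH, and it uses `subprocess.run` from current Python.

```python
import subprocess

def co_command(path, revision=None):
    """Build the `co` command line that prints a revision to stdout."""
    # -q suppresses diagnostics; -p prints the revision instead of
    # checking it out. A revision number, if any, is attached to -p.
    print_flag = "-p" + (revision or "")
    return ["co", "-q", print_flag, path]

def read_revision(path, revision=None):
    """Return the text of `revision` (default: latest) of an RCS file."""
    result = subprocess.run(
        co_command(path, revision),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if co fails
    )
    return result.stdout
```

Every read forks a process, which goes some way toward explaining the slowness. Keeping the command construction behind a couple of functions like these is also what would make the version-control backend swappable later.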

Ugh. One page of API documentation would be a help. Or one reasonable sample of this stuff being used from within Python. Aw, screw it. I can live with RCS for now.
