Back in high school (HTX 2005-2008) I operated a custom MediaWiki application for collaborative note taking, tracking homework and occasional sharing of homework 🙈. In case you don't know, MediaWiki is the software behind wikipedia.org. At the time I was using a Danish hosting provider and couldn't get LaTeX integration working properly, so I ended up hacking MediaWiki to use mimetex. Similarly, I added a few extensions for calendar integration, raw HTML, etc. These hacks and extensions made upgrading MediaWiki challenging, hence I never upgraded past MediaWiki 1.5.5, released in 2006.
It should not surprise anyone that the wiki was full of spam a few years later, even though write access was only granted to trusted users. Some bot must have been scanning the internet for MediaWiki installations with known vulnerabilities and automatically exploiting those vulnerabilities to post spam. Naturally, I ended up taking the wiki offline, being too busy to fix it.
Then earlier this year I decided that it was time to revive my old wiki. But how do you revive an ancient php4 / mysql4 application? It's probably possible to tweak it such that it works on newer versions of PHP and MySQL. But my database dumps from mysql4 didn't import on mysql5 without hacks, and some of my extensions didn't work with php5. So I decided to go looking for a way to install and run php4 and mysql4.
Initially, I went looking for a docker image or virtual machine with a php4 and mysql4 LAMP stack pre-configured. I had no such luck: there were a few php4 docker images, but they were running mysql5. Then I found the End-Of-Life Debian images on hub.docker.com/r/debian/eol/. Using debian/eol:sarge it is easy to install php4 and mysql4 from the package manager, as illustrated in the Dockerfile below.
FROM debian/eol:sarge
ENV DEBIAN_FRONTEND noninteractive
# Install php4, mysql4, apache2, imagemagick, build-essential and phpmyadmin (for good measure)
RUN apt-get update -y \
&& apt-get install -y \
mysql-server \
mysql-client \
php4 \
apache2 \
libapache2-mod-php4 \
php4-mysql \
imagemagick \
build-essential \
phpmyadmin
# Enable mod_rewrite
RUN a2enmod rewrite \
&& sed -i 's/AllowOverride None/AllowOverride all/' /etc/apache2/sites-available/default
# Launch apache2 and mysql when starting the container
ENTRYPOINT /bin/bash -c 'apache2 > /dev/null && mysqld > /dev/null & exec bash --login'
The Dockerfile above will create an image with a php4/mysql4 LAMP stack serving from /var/www/. For simple LAMP applications all that remains is to configure mysql users, restore the database from an SQL dump, and copy php files and resources into /var/www/. This can be done with a few commands as illustrated below.
# Setup mysql with empty root password
mysqladmin -u root password ''
# Create $DATABASE_USERNAME with $DATABASE_PASSWORD
echo "GRANT ALL PRIVILEGES ON *.* TO '$DATABASE_USERNAME'@'localhost' IDENTIFIED BY '$DATABASE_PASSWORD';" | mysql;
# Create $DATABASE_NAME and load contents from SQL dump.
echo "CREATE DATABASE $DATABASE_NAME; USE $DATABASE_NAME;" | cat - /src/database-dump.sql | mysql
# Copy php files and resources to /var/www
cp -r /src /var/www
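The commands above are meant to run inside the container. To build the image and get a shell in a container in the first place, something like the following should work (the image and container names are just examples, and I'm assuming the old php files and SQL dump are mounted at /src):
# Build the image from the Dockerfile above and start an interactive container
docker build -t sarge-lamp .
docker run -ti --name old-wiki -v "$PWD/src:/src" sarge-lamp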
From php4 your application can connect to mysql on localhost (over the local socket) with the $DATABASE_USERNAME and $DATABASE_PASSWORD created above. Obviously, one should never expose this Docker image to the internet (i.e. deploy it to a server); Debian Sarge hasn't received security updates for years. But we can crawl the site and convert it to static files using wget. Simply run the LAMP app as a Docker container locally, find the IP of the Docker container, and run wget as follows:
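The exact wget invocation is a matter of taste, but something along these lines does the trick (old-wiki being the example container name from above):
# Find the container IP and mirror the site into static files
IP=$(docker inspect -f '{{.NetworkSettings.IPAddress}}' old-wiki)
wget --mirror --convert-links --adjust-extension --page-requisites \
     --restrict-file-names=windows "http://$IP/"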
If your old LAMP application contains absolute links, you can temporarily tweak /etc/hosts to make the domain point to the docker container. Also check out the wget manual for more options; depending on your URL patterns --restrict-file-names=nocontrol might look better. To successfully render my old MediaWiki setup into static files I tweaked the theme to remove unnecessary links, but wget also has options to exclude certain directories or patterns. In my case the final result is visible at jopsen.dk/wiki; these are my high school notes (in Danish).
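For the record, the /etc/hosts tweak could look like this (the IP and domain are made up), after which you point wget at the domain instead of the raw IP, and optionally skip directories with --exclude-directories:
# Make the old domain resolve to the container
echo "172.17.0.2  wiki.example.org" | sudo tee -a /etc/hosts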
I had actually dreaded this project a bit, fearing that I would have to follow a lengthy install guide to set up a server on a slow virtual machine. But thanks to the amazing Debian EOL images for docker, reviving an old php4 / mysql4 LAMP app was almost a breeze. Who knew restarting apache could make you feel all nostalgic?
As part of my goals this quarter I've been experimenting with running Talos in the cloud (Linux only). There are many valid reasons why we're not already doing this. Conventional wisdom dictates that virtualized resources running on hardware shared between multiple users are unlikely to have a consistent performance profile, hence regression detection becomes unreliable.
Another reason for not running performance tests in the cloud is that a cloud server is very different from a consumer laptop, and changes in performance characteristics may not reflect the end-user experience.
But when all the reasons for not running performance testing in the cloud have been listed (and I'm sure my list above wasn't exhaustive), there are certainly some benefits to using the cloud: on-demand scalability and cost immediately spring to mind. So investigating the possibility of running Talos in the cloud is interesting; if nothing more, it could be used for fast smoke tests.
Comparing Consistency of Instance Types
The first thing to evaluate is the consistency of results depending on instance type, cloud provider and configuration. For the purpose of these experiments I have chosen the following instance types and cloud providers:
Digital Ocean (1g-1cpu, 2g-2cpu, 4g-2cpu, 8g-4cpu)
For AWS I tested instances in both us-east-1 and us-west-1 to see if there was any difference in results. In each case I have been using two revisions: c448634fb6c9, which doesn't have any regressions, and fe5c25b8b675, which has clear regressions in the cart and tart test suites. In each case I also ran the tests with both xvfb and xorg configured with dummy video and input drivers.
To ease deployment and ensure that I was using the exact same binaries across all instances I packaged Talos as a docker image. This also ensured that I could reset the test environment after each Talos invocation. Talos was invoked to run as many of the test suites as I could get working, but for the purpose of this evaluation I’m only considering results from the following suites:
tp5o,
tart,
cart,
tsvgr_opacity,
tsvgx,
tscrollx,
tp5o_scroll, and
tresize
After running all these test suites for all the configurations of instance type, region and display server enumerated above, we have a lot of data points of the form results(cfg, rev, case) = (r1, r2, ..., rn), where ri is the measurement from the i'th iteration of the Talos test case case.
To compare all this data with the aim of ranking configurations by the consistency of their results, we compute rank(cfg, rev, case) as the number of configurations cfg' where std(results(cfg', rev, case)) < std(results(cfg, rev, case)). Informally, we sort configurations by lowest standard deviation for a given case and rev, and the index of a configuration in that sorted list is the rank rank(cfg, rev, case) of the configuration for the given case and rev.
We then finally list configurations by score(cfg), which we compute as the mean of all ranks for the given configuration. Formally we write:
score(cfg) = mean({rank(cfg, rev, case) | for all rev, case})
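As a small worked example: if, for some rev and case, configurations A, B and C have standard deviations 5, 3 and 9, then rank(B, rev, case) = 0, rank(A, rev, case) = 1 and rank(C, rev, case) = 2, and score(A) is just the mean of A's ranks over all revisions and test cases.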
Credit for this methodology goes to Roberto Vitillo, who also suggested using a trimmed mean, but as it turns out the ordering is pretty much the same.
When listing configurations by score as computed above we get the following ordered lists of configurations. Notice that the score is strictly relative and doesn’t really say much. The interesting aspect is the ordering.
Warning: the score and ordering have nothing to do with performance. This strictly considers consistency of performance from a Talos perspective. This is not a comparison of cloud performance!
You may notice that the list above also contains the configuration mozilla-inbound-non-pgo, which has results from our existing infrastructure. It is interesting to see that instances with more CPU exhibit lower standard deviation. This could be because their average run-time is lower, so the standard deviation is also lower. It could also be because they consist of more high-end hardware, SSD disks, etc. Higher-CPU instances could also be producing better results because they always have CPU time available.
However, it's interesting that both Azure and Digital Ocean instances appear to produce much less consistent results, even their high-performance instances. Surprisingly, the data from mozilla-inbound (our existing infrastructure) doesn't appear to be very consistent either. Granted, that could just be a bad run; we would need to try more revisions to say anything conclusive about that.
Unsurprisingly, it doesn’t really seem to matter what AWS region we use, which is nice because it just makes our lives that much simpler. Nor does the choice between xorg or xvfb seem to have any effect.
Comparing Consistency Between Instances
Having identified the Amazon c4 and c3 instance types as the most consistent classes, we now proceed to investigate whether results are consistent when they are computed using different instances of the same type. It's well known that EC2 has bad apples (individual machines that perform badly), but this is a natural thing in any large setting. What we are interested in here is what happens when we compare results across different instances.
To do this we take the two revisions c448634fb6c9, which doesn't have any regressions, and fe5c25b8b675, which does have a regression in cart and tart. We run Talos tests for both revisions on 30 instances of the same type. For this test I've limited the instance types under consideration to c4.large and c3.large.
After running the tests we now have results of the form results(cfg, inst, rev, suite, case) = (r1, r2, ..., rn), where ri is the result from the i'th iteration of the given test case under the given test suite, revision, configuration and instance. In the previous section we didn't care which suite a test case belonged to. We care about the suite relationship here because we compute the geometric mean of the medians of all test cases per suite. Formally we write:
score(cfg, inst, rev, suite) = geometricMean({median(results(cfg, inst, rev, suite, case)) | for all case})
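For example, if a suite has three test cases whose medians on a given instance are 100, 200 and 400, the suite score for that instance is (100 · 200 · 400)^(1/3) = 200.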
Credit to Joel Maher for helping figure out how the current infrastructure derives a per-suite performance score for a given revision.
We then plot the scores for all instances as two bar-chart series, one for each revision, and get the following plots. I've only included 3 here for brevity. Each pair of bars shows the results from one instance on the two revisions; the ordering of instances is not relevant.
From these two plots it's easy to see that there is a tart regression. Clearly we can also see that performance characteristics do vary between instances. Even in the case of tart this is evident, but it's still easy to see the regression.
Now when we consider the chart for tresize it's very clear that performance differs between machines, and if a regression here were small, it would be hard to see. Most of the other charts are somewhat similar; I've posted a link to all of them below, along with references to the very sketchy scripts and hacks I've employed to run these tests.
While it's hard to conclude anything definitive without more data, it seems that the c4 and c3 instance types offer fairly consistent results. I think the next step is to set up a subset of Talos tests running silently alongside existing tests, while comparing results to regressions observed elsewhere.
Hopefully it should be possible to use a small subset of Talos tests to detect some regressions early, rather than having all Talos regressions detected 12 pushes later. Setting this up is not going to be a Q2 goal for me, but I should be able to set it up on TaskCluster in no time. At this point I think it's mostly a configuration issue, since I already have Talos running under docker.
The hard part is analyzing the resulting data and detecting regressions based on it. I tried comparing results with approaches like Student's t-test, and while preliminary findings were promising, there are still noisy tests that have to be filtered out. I suspect it might be easiest to employ some naive form of machine learning and hope that magically solves everything, but we might not have enough training data.
Whilst gedit isn't the fastest, smartest or most fancy editor out there, I often find myself using it. With toolbars hidden and the menu bar gone (Unity), gedit is a neat little thing. It always works, and whilst it lacks many features compared to vim, the features gedit does offer never pop up because I accidentally pressed some key.
The one thing I do feel is missing from gedit, however, is the ability to modify shortcuts. For example, switching tabs with "Ctrl + Tab" and "Ctrl + Shift + Tab", and closing tabs with "Ctrl + F4", feels as natural as browsing the web in Firefox.
Luckily, it's possible to install plugins for gedit. I recently found a plugin called Control Your Tabs that allows you to switch tabs with "Ctrl + Tab". However, it switches tabs in most-recently-used order, instead of switching tabs by their order in the tab view.
I had a quick look at the source for Control Your Tabs, which turned out to be slightly complicated. While I could hack the source to fit my needs, it turned out to be faster and simpler to just write my own gedit plugin.
So here it is: TabControl, 50 lines of Python, now hosted on GitHub.
The plugin switches tabs with "Ctrl (+ Shift) + Tab" based on their order in the tab view, and on top of this it allows you to close the current tab using "Ctrl + F4". You can find the files and installation instructions in the GitHub repository.
The plugin is really short and simple, so if you want a shuffle feature for switching tabs (or whatever), this could be a good place to start. Otherwise I'd definitely recommend taking a look at the gedit plugin documentation.
I've spent most of my summer working on my GSoC project, which was to create a visual formula editor for OpenOffice Math. Currently, formulas are entered in OpenOffice Math using a plaintext command language; this can be efficient and easy for power users, but it's an absolute showstopper for most casual users. So I've spent my summer writing a visual formula editor for OpenOffice Math; you can see a demonstration here:
I participated in GSoC for Go OpenOffice, which is a project that maintains a set of patches on top of OpenOffice. Go OpenOffice is the OpenOffice version distributed with OpenSuSE, Ubuntu and other distros; it is allegedly a lot better than the official OpenOffice release, and also available for Windows.
Hacking OpenOffice has been a very exciting experience; I haven't worked on a project so large and complex before. It easily takes 2 hours to build OpenOffice, and the sources and binaries fill about 13 GiB. Luckily I didn't have to rebuild every time I had to test something.
The visual formula editor (see the video above) is not production ready yet. That is, it needs extensive testing and a few extra features… However, I plan to keep working on it. You can read more about its features and current status here.
I don’t think I’ll keep updating that wiki page, but rather post some updates here once in a while. If you are eager to help test this feature when it comes that far, feel free to leave a comment with your email…
This summer I've been working on Pwytter as a part of Google Summer of Code. My project was to separate the backend from the frontend and make a new user interface with PyQt. In my original proposal I also wanted to do a GTK frontend; however, this was dropped in exchange for a more polished Qt frontend (I do have the basics for a GTK frontend lying around, if anybody is interested, but it's far from usable).
While writing a backend for Pwytter I also created some abstractions for micro-blogging services, so that Pwytter supports multiple accounts and multiple services (currently Twitter and Identi.ca). With this new backend all messages are also cached in an sqlite database, enabling Pwytter to work while offline.
I also added theming support to the Qt frontend I wrote for Pwytter; above is a screenshot of Pwytter running the "Twitter-like" theme (as you can see, it has also been translated, so far only to Danish). Pwytter uses WebKit to display tweets, users and other types of content, so themes can customize the GUI using HTML templates and Qt stylesheets. Documentation for writing such themes can be found in the project wiki, and I plan to write an article on the subject when this Pwytter branch is released. So far this Pwytter branch is still under development, and interested developers can find install instructions in the project wiki.
Tomorrow I'm finally turning in my P1 project (that's the first-semester project) at Aalborg University. The project is about RSA and the usability of encrypting email clients, and as a part of the project we've implemented an encrypting email client for GMail, in Python, with PyGTK and SQLite as the backend, so you can e.g. access mail while not online.
Anyway, I thought I'd publish the report here for anyone to stumble upon. Honestly it's not that great; it's written in English and is subject to some serious grammar issues, as we've been pretty busy actually getting it ready… For those who do not know what a P1 project at AAU is, it is a project conducted by a group of 4-7 students. Most of the education at AAU happens through these kinds of projects, which is kind of nice and gives a lot of freedom. But just for the record, I have not written the entire report myself, so do not blame me for all the horror that may be found within it…
Enough about the report. During the project we also wrote an encrypting GMail client called RaptorMail (don't ask why). A GMail client is actually quite interesting; if I manage to find the time, it would be really nice to nail the last few bugs and integrate it with GPG… An application to access GMail through a non-web interface while still maintaining the same feature set would be nice to have, and caching all mail for offline usage is an absolute killer feature.
But I'm afraid I have a lot of other small projects on my mind too, so actually getting it out there is probably not going to happen. But if anybody is looking for a way to synchronize and interface GMail with a local database from Python, the "gmail_cache" module I've written for this project is fairly comprehensive and well documented.
Two weeks ago I did a school project on ECDSA (Elliptic Curve Digital Signature Algorithm). At HTX we have to do a project that goes beyond the curriculum: we must combine two subjects and do an individual project about something we find interesting. I decided to combine mathematics and programming in a project about ECDSA. Personally I think it was great fun, but perhaps I have a twisted sense of humor.
Anyway, I've published my report here; it's in Danish though. But I did also write an implementation of ECDSA in C. The implementation is called SimpleECDSA, though I must admit it's not very simple anymore. It uses GMP as its integer library, and uses the standardized curves for its cryptographic operations.
The comments in my source are in English, though I did translate them to Danish before delivering my paper. Anyway, I still have the source with English comments, so I thought I'd post it here.
As I had a little spare time this holiday, I've configured the source with the GNU build system. It's my first tarball created with Autotools, and it's mostly hacked together from snippets in the automake and autoconf manuals. But it works, the package compiles, and "./configure" complains if GMP isn't present. I also managed to get "make check" to run my internal tests, so I think it's pretty good, considering that it is my first package built with Autotools.
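For reference, building it should be the usual Autotools routine; the version number in the tarball name below is only illustrative:
tar -xzf simpleecdsa-0.1.tar.gz
cd simpleecdsa-0.1
./configure    # complains if GMP is missing
make
make check     # runs the internal tests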
If you have any comments, bug reports or questions about my packaging or SimpleECDSA, feel free to leave a comment. SimpleECDSA is of course released under GNU GPLv3, and can be downloaded here:
Some users of TheLastRipper have requested integrated volume normalization (Issue 61). While I admit that I've noticed the volume changes between tracks, I've never really bothered to find a solution. But since others had similar issues, I decided to take a look at it. I ended up looking at ReplayGain, a project that aims to add a tag containing volume information to all songs; the tag is then read at playback to determine the volume, though the value of the tag must be calculated first.
As this seems like a big feature, and as argued by Andreas in the issue thread, it's probably not a feature for TheLastRipper. Nevertheless, that doesn't mean the problem shouldn't be fixed, just that it should be fixed elsewhere. This is also good, since your entire music collection doesn't necessarily originate from TheLastRipper. The solution is to implement this feature at the playback level, meaning in your audio player.
For those of us running Ubuntu and using Amarok, this can be done easily. First open Amarok, choose "script-management" and click "download new scripts". This will open a dialog showing the newest, most popular and most downloaded scripts for Amarok. Just install the script called ReplayGain. Once this is installed you'll have to install some dependencies with Synaptic. I'll try to list those I think are needed: python-kde3, mp3gain, vorbisgain, flac, python-xml.
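On Ubuntu installing those should amount to something like:
sudo apt-get install python-kde3 mp3gain vorbisgain flac python-xml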
If you enter the script management in Amarok again, you can enable the ReplayGain script. Enable it, select it and click "Settings"; you can tweak the ReplayGain script a little here. Once you're done with that, leave the settings, and ReplayGain will show a small popup telling you which optional dependencies you are missing. I've probably forgotten to list some of them; you may find them in Synaptic if you think you need them, though I haven't found "aacgain" or "replaygain" in the Ubuntu repositories.
Well, you don't need "replaygain" or "AACGain" unless you, like me, have AAC music. Actually, I've just bought my first AAC music from iTunes. I hadn't used iTunes before, but I thought I'd give it a try since they started releasing DRM-free content. So I installed iTunes in my virtual machine, and copied the AAC files back to my Ubuntu system after they were downloaded. First I must say the quality of iTunes Plus files is very good, and the files have ID3v2 tags (other mp3's I've bought online did!). It's sad that iTunes doesn't run on Linux; Apple could at least offer a web interface for iTunes Plus.
Well, if you have AAC-encoded music you'll need AACGain, and it's not in the Ubuntu or Debian repositories. Actually, I couldn't find any .deb packages for it anywhere, so I decided to make my own. You may download my AACGain package here. Feel free to contact me if you want the source package.
Once you're done installing dependencies and have enabled the ReplayGain Amarok script, you can right-click in your playlist and choose "Apply ReplayGain tags". I selected "To entire collection using album tags"; it took a while, but the volume was automatically determined by Amarok afterwards.
I know Feisty Fawn has been out for a while, but I didn't want to upgrade during my exams, and suddenly I had summer vacation and a summer job… So it was only this weekend that I got around to installing Feisty Fawn. I chose a clean installation over an upgrade, as I had played around with my Edgy installation a bit more than is healthy.
The installation was quick and easy; everything went very smoothly. This time network-manager was already configured right after installation. A funny little detail:
Shortly after installing Ubuntu I read on Newz.dk that Toshiba had recalled a number of batteries. Later that evening Ubuntu had detected that my laptop was a Toshiba, and it gave me a nice warning that my battery might have been recalled. The warning looked like this:
As you can see it was in Danish, and it came relatively shortly after the news about the batteries. I think that's a great service from Ubuntu. I can tell you that my Windows partition doesn't manage to warn me, even though it is full of bloatware from Toshiba (I only use it for legacy applications and a bit of Windows development). I should mention that my battery wasn't actually recalled, but it was close.
The only problem I had after installing Ubuntu Feisty Fawn was the sleep function, or what is called "afbryd" (suspend) when you log out of GNOME: when I came back from a suspend-to-RAM, the wireless network didn't work. I couldn't get on a wireless network with my IPW2200. I looked around a bit online, and it turns out that if you add "ipw2200" to the modules in /etc/default/acpi-support, the driver is reloaded after a suspend. So: ALT+F2, type "gksu gedit /etc/default/acpi-support", press ENTER, find the line MODULES="" and change it to MODULES="ipw2200", then save.
Another little thing you might want to do after installing Ubuntu is to install pam-keyring. Gnome-Keyring normally asks for your keyring's master password every single time a program needs a stored password, for example network-manager, which uses gnome-keyring to store passwords. With pam-keyring installed, the default keyring is unlocked at login. It is important that your keyring has the same password as the one you use to log in. Then install pam-keyring with Synaptic, or just "sudo apt-get install pam-keyring", and add the following lines to the file /etc/pam.d/gdm:
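From memory the lines are roughly the two below, but double-check against the pam-keyring documentation before relying on them:
auth    optional    pam_keyring.so try_first_pass
session optional    pam_keyring.so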
If you care a bit about how the system looks, you can install a thumbnailer for OpenOffice files. I found a good one in a thread on ubuntuforums; it uses a thumbnail of the OpenOffice document and overlays a small transparent icon on the image, so you can see whether it is a text document, a presentation or a spreadsheet. I've summarized the thread from ubuntuforums into a small archive with installation instructions. The result looks like this:
Lately I've been working on a small project called TheLastRipper, hosted on Code.google.com. A few days ago I started wondering about how to document a Mono/.Net application, not because there's much to document in TheLastRipper, or worth documenting for that matter. Anyway, I ended up looking at some pages in the Mono wiki, and it seems the best way of generating documentation isn't to use documentation comments handled by the C# compiler's /doc argument. That is the method most (former) Windows developers are familiar with: once documentation comments have been exported to XML by the C# compiler, they can be turned into a CHM file using programs like NDoc. Instead the Mono project generates documentation from binaries, which gives them XML files containing all methods, classes, etc. Later on you'll then be able to fill out the empty comments. This way documentation and code development have been completely separated.
You can read the discussion about the two different documentation formats in the Mono wiki. I've chosen a middle path, by exporting my current documentation comments to monodoc. I won't discuss the process of documenting an application using monodoc; the process is already well documented in the Mono wiki article I've linked to a few times. What I will discuss is how to convert your monodoc XML to WikiMarkup that can be hosted on GoogleCode. Mono already comes with an application to convert monodoc to plain HTML, called monodocs2html. I've made a modification of that application, resulting in monodocs2wiki. If you have your documentation as monodoc XML you can convert it to WikiMarkup by doing the following:
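The exact invocation depends on how you built monodocs2wiki, so treat this as a sketch rather than gospel; it is something along the lines of:
mono monodocs2wiki.exe --source ./docs/ --dest ./wiki/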
Here ./docs/ is the base path of your monodoc XML files (containing an index.xml file), and ./wiki/ is the /wiki/ directory you checked out of svn.
Commit your /wiki/ directory to svn, and view your documentation in your GoogleCode wiki.
The /wiki/ directory of your GoogleCode svn repository contains all the pages in your GoogleCode wiki. You may wish to change a few things in the template, perhaps using a different label than just Documentation. Take a look in the README file if you want to know more about customization. The current template in the monodocs2wiki application uses the markup used in the wikis at Google Code. You may modify it; I think it would be easy to port it to another wiki markup. Any questions? Feel free to leave a comment or mail me…