Jeff Mitchell

Amarok, KDE, and all that good stuff

Git Repo (Resuming) Tarballs

December 2, 2010 - 13:16

At various points the sysadmins have gotten a request to have repository tarballs made available. The idea is that some repositories can be very large; I believe test runs of the KOffice conversion produced a repository that is 350MB in size. Once you get this down to your system, updating with new refs is relatively fast, but what about that first part? What if you’re on a slow dial-up link somewhere (as some of our contributors are) and/or only have access to the Internet sporadically?

A solution to this is to provide repository tarballs to users which contain a snapshot of the repository. Once downloaded and expanded, you have a viable git clone; run a pull and you’re done (it’s pre-configured to fetch new refs from anongit.kde.org). Actually, run the init script and then a pull; to keep the tarball size as small as possible, it doesn’t include a checked-out working tree, only the .git directory and a tiny script that will a) delete itself and b) check out the working tree.

Now, the idea is to be able to start downloading at one time and to finish it later. The key here is that HTTP file transfers are resumable, so if you start downloading the tarball you can pick up where you left off. However, there was a question — how to be able to show a consistent link on Projects without a priori knowledge of tarball characteristics (since we can’t know any from within Redmine). In Redmine, all we could really do, without massive hackery, is display a statically-named tarball — in this case, it always ends in “-latest.tar.gz”. You can see this at https://projects.kde.org/projects/playground/network/aki/repository or any other project’s repository tab — note the “Tarball” checkout method.

This means, however, that the client needs some way of knowing whether it’s the same tarball or whether it has been regenerated since. An easy way to do this is to use the If-Unmodified-Since header; however, the web server currently in use doesn’t support this (I checked with its author, who didn’t believe the header was widely used; I pointed him to cURL/libcurl). The other problem with this approach is that it’s not very visible to the user.
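To make the header idea concrete, here is a minimal sketch of the request headers a resuming client could send. The function name and its arguments are made up for illustration; this is not code from any KDE tool, and it only builds the headers rather than performing the download.

```python
from email.utils import formatdate
from typing import Dict, Optional


def resume_headers(bytes_on_disk: int,
                   last_modified: Optional[float] = None) -> Dict[str, str]:
    """Build HTTP request headers for resuming a partial download.

    bytes_on_disk: size of the partial file already saved locally.
    last_modified: Unix timestamp of the tarball when the download began
                   (hypothetical: the client is assumed to have stored it).
    """
    headers: Dict[str, str] = {}
    if bytes_on_disk > 0:
        # Ask only for the bytes that are still missing.
        headers["Range"] = f"bytes={bytes_on_disk}-"
        if last_modified is not None:
            # A server that honours this header answers 412 if the file
            # changed after the given time, instead of sending mixed data.
            headers["If-Unmodified-Since"] = formatdate(last_modified, usegmt=True)
    return headers


print(resume_headers(1048576, 1291266159))
```

A client would attach these headers to its GET request and start over from byte zero if the server reports that the precondition failed.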

So, I first changed the script such that the tarballs being generated weren’t actually named *-latest.tar.gz, but instead that they had more descriptive names and had a -latest.tar.gz version symlinked to the actual tarball. I then wrote a simple web service that the web server proxies to whenever it finds a “-latest.tar.gz” file being requested. This service resolves the symlink and redirects the client, so the user sees immediately what the name of the actual file is. For instance, right now a tarball of the aki repository via http://anongit1.kde.org/aki/aki-latest.tar.gz forwards to http://anongit1.kde.org/aki/aki_20101202050239_sha1-5f866b3a42872f8fd54adcf30bcb8a3a79d02542.tar.gz
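The symlink-resolving step could look roughly like the sketch below. It is not the actual service: the function name, the idea of a "docroot" directory, and the assumption that the symlink points at a file in the same directory are all illustrative.

```python
import os
import tempfile


def redirect_target(docroot: str, request_path: str) -> str:
    """Return the URL path that a '-latest.tar.gz' request should redirect to.

    If the requested file is not a symlink, the path is returned unchanged.
    """
    full_path = os.path.join(docroot, request_path.lstrip("/"))
    if not os.path.islink(full_path):
        return request_path
    # Assumption: the link target is a file in the same directory, so only
    # its file name is needed to build the redirect location.
    real_name = os.path.basename(os.readlink(full_path))
    return os.path.dirname(request_path) + "/" + real_name


# Demo with a throwaway directory standing in for the web root.
with tempfile.TemporaryDirectory() as docroot:
    os.mkdir(os.path.join(docroot, "aki"))
    real = "aki_20101202050239_sha1-5f866b3a42872f8fd54adcf30bcb8a3a79d02542.tar.gz"
    open(os.path.join(docroot, "aki", real), "wb").close()
    os.symlink(real, os.path.join(docroot, "aki", "aki-latest.tar.gz"))
    print(redirect_target(docroot, "/aki/aki-latest.tar.gz"))
```

The web service would then send the returned path back to the client as the Location of an HTTP redirect.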

Note the components of that filename: the name, the date/time stamp (to the second), “sha1” indicating the algorithm to check the hash against, and the hash of the file for easy verification of the completed download. This should make it easy for both the client and the user to verify that it’s the same file, and easy for the user to verify the integrity of the full file, since they can check the sha1sum against the filename itself.
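As an illustration of how a client might use that naming scheme, the sketch below splits a tarball name into its parts and checks a downloaded file against the embedded hash. The regular expression is inferred from the single example filename above, and the function names are hypothetical.

```python
import hashlib
import os
import re
from typing import NamedTuple

# Pattern inferred from the example: <name>_<YYYYMMDDhhmmss>_<algo>-<hexdigest>.tar.gz
TARBALL_RE = re.compile(
    r"^(?P<name>.+)_(?P<stamp>\d{14})_(?P<algo>[a-z0-9]+)-(?P<digest>[0-9a-f]+)\.tar\.gz$"
)


class TarballName(NamedTuple):
    name: str
    stamp: str
    algo: str
    digest: str


def parse_tarball_name(filename: str) -> TarballName:
    """Split a tarball file name into name, timestamp, algorithm and digest."""
    match = TARBALL_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised tarball name: {filename}")
    return TarballName(**match.groupdict())


def verify_tarball(path: str) -> bool:
    """Hash the file at `path` and compare with the digest in its own name."""
    info = parse_tarball_name(os.path.basename(path))
    hasher = hashlib.new(info.algo)
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large tarballs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == info.digest


print(parse_tarball_name(
    "aki_20101202050239_sha1-5f866b3a42872f8fd54adcf30bcb8a3a79d02542.tar.gz"
))
```

Comparing the parsed name of a partial download with the name the redirect currently points to also tells the client whether resuming is safe.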

I think this provides a pretty nice solution to the problem. :-)

Category: Planet Amarok

KDE Git: Three servers, better commit URLs

November 30, 2010 - 02:40

Two quick updates on Git stuff:

First, we now have three anongit servers in rotation. We could probably make do with 0.25 of a single anongit server at the current load, but that will change quickly once kdelibs and co switch over from SVN, so it’s good to be ready.

Second, the generated commit URLs will always be given with the full commit hash, but the commit resolver now supports using partial hashes in case you want to shorten it yourself. For instance, you could manually change the commit URL http://commits.kde.org/amarok/cf17de39f9021064a713db965487be6e3d75a186 to http://commits.kde.org/amarok/cf17de39 to give out a shorter URL.

That’s all; if you were in the States I hope you had a good Thanksgiving. :-)

Category: Planet Amarok

Multifaceted Git Update Post

November 6, 2010 - 19:23

I haven’t blogged about updates to the Git infrastructure for a while and it’s *long* overdue, so here goes. There’s a lot and I don’t have much time so I’ll keep things brief. Credit for these items goes to the sysadmin team as a whole plus Sitaram (the Gitolite author)…

anongit servers

We now have two anongit servers (the second one will be coming online on Monday or so — it’s 95% done). It is very likely that git:// and http:// protocols will be restricted to the anongit servers for better load balancing, so if you have been using git://git.kde.org/… as your clone URLs, please switch those to git://anongit.kde.org/… URLs. Wait, did I just say http?

http protocol support

The anongit servers support pulling/cloning via http. This is using the newer Smart HTTP support in which the Git protocol is run over HTTP via a special server. All pushing must be done via SSH (port 22 or 443) on git.kde.org.

trash/destroy

There are now two systems of deleting your personal clones/scratch repositories.

  • unlock/destroy: You can destroy a personal repository, at which point it is gone forever; however, for safety you are now required to run the “unlock” command on the repository first.
  • trash: You can trash a personal repository. When you do this, it will be saved for 28 days in case you change your mind. You can use the “list-trash” command in order to see the repositories you have in the trash.

More details can be found in the Git(.kde.org) Manual.

who-pushed

There is now a “who-pushed” command that, given a SHA1 and a repository, can tell you what user pushed that SHA1 to the repository. This is essential for auditing, since commit authorship information can be faked. Now, if a problem arises, we at least can know who introduced that malevolent commit into the public repository.

Repository nicknames

Remember my last post about short Commit URLs? One thing I wanted — and was much requested — was a way to have a more friendly repository identifier. Now they exist. If a friendly identifier exists for a repository, it will automatically be used in the commit URL that is generated. If you want a friendly URL, let the sysadmins know. (We need to figure out a way to make nice names for repo names longer than 8 characters — or just remove that artificial restriction.)

Clones path change

When we first started out, clones would live somewhere like clones/amarok.git/mitchell/myamarok. Why the “.git” in the middle (we were asked a lot)? Because the path was meant to match the actual name of the repository as it exists on the server.

The reason for *that* is that Git uses a “.git” suffix on repositories that are bare, to make it clear that it’s a Git directory. However (as we found out from Sitaram), this is not supposed to be user-facing, which is why if you use a Git URL with “.git” at the end of it, it’s dropped in the folder that actually is created on the client. So either the path has to lie about the original path on the server, or the client/user has to be made aware of it when it doesn’t really need to be.

What we ended up doing is removing that “.git” from the middle of the repo name (so now it’d be clones/amarok/mitchell/myamarok) and also taking pains to ensure that when clone URLs are shown to the user, they are shown without a trailing “.git”.

At the same time, Sitaram made changes in Gitolite to better handle (in the various commands and in the built-in functionalities) cases where the user does or does not use the “.git” suffix, so that it works correctly in each case.
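The effect of the path change can be pictured with a small normalisation helper like the one below. This is only a sketch of the idea, with a made-up function name; it is not how Gitolite or the KDE hooks implement it.

```python
def normalize_repo_path(path: str) -> str:
    """Drop a '.git' suffix from every component of a repository path or URL."""
    parts = []
    for component in path.split("/"):
        # Only strip a real suffix; a component that is exactly ".git" is kept.
        if component.endswith(".git") and len(component) > 4:
            component = component[:-4]
        parts.append(component)
    return "/".join(parts)


print(normalize_repo_path("clones/amarok.git/mitchell/myamarok"))
print(normalize_repo_path("git://anongit.kde.org/amarok.git"))
```

With a helper like this, paths written with and without the suffix compare equal, which is the behaviour the user sees after the change.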

Updated permissions on master repositories

Master repositories now allow users to create new branches; however, deletion requires the chosen repository manager(s) to intervene.

Updated permission granularities on personal repositories

Personal (clones/scratch) repositories now have a greater level of permission control settable by the user. (The git.kde.org user manual is not yet updated with this information.)

These are now the permission levels that the user can set, from least to most:

  • All: everyone has read and basic write permissions on each repository
  • Writers: writers can write new branches to the repository
  • Managers: managers can also delete branches from the repository
  • Dangers: people designated as dangers (to the repository!) can also rewind/force push to the repository
  • Creator: the creator always has full rights
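One way to picture these levels is as an ordered scale in which each level includes the rights of the ones below it. The sketch below is an interpretation of the list above, not Gitolite configuration: the action names are invented, and "basic write" is assumed to mean pushing to existing branches.

```python
from enum import IntEnum


class Level(IntEnum):
    """Permission levels from least to most, as listed above."""
    ALL = 0
    WRITER = 1
    MANAGER = 2
    DANGER = 3
    CREATOR = 4


# Minimum level needed for each (hypothetically named) action.
REQUIRED = {
    "read": Level.ALL,
    "push": Level.ALL,            # assumption: this is the "basic write"
    "create_branch": Level.WRITER,
    "delete_branch": Level.MANAGER,
    "force_push": Level.DANGER,
}


def allowed(level: Level, action: str) -> bool:
    """Return True if a user at `level` may perform `action`."""
    return level >= REQUIRED[action]


for level in Level:
    granted = [action for action in REQUIRED if allowed(level, action)]
    print(level.name, granted)
```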

projects.kde.org tree view

If you check out the projects page on projects.kde.org you’ll see that it now supports a tree view matching the structure of the KDE repositories. The URLs also match this structure.

projects.kde.org last activity indicator

On the same page you’ll see that information about the last commit for that project is now shown next to each project.

…aaaaaaaaand we’re done.

Category: Planet Amarok

Git Commit Semi-Short-URLs

October 7, 2010 - 18:26

As you may or may not know, the KDE git infrastructure currently has two assets for viewing repository data — Redmine and gitweb.

Redmine powers Projects at http://projects.kde.org and is intended (pending theming and greater population of data) to be the real “home” of all of KDE’s projects, putting project information, news, repository browsing and more in a pleasant UI. Each project with a git repository will have a project on Redmine as well as on ReviewBoard.

However, Redmine doesn’t have *all* of the git repositories available. This is because in the short term at least there won’t be projects created for user clones of repositories. In addition, the “scratch” area, where KDE developers can maintain their own repositories for anything they wish (that’s KDE-related of course) will never be put in Redmine.

(The reason for this is that if a developer just wants to play around with some code, or perhaps wants a location to store their (versioned) emacs/vi config files or some such thing, there’s no reason that needs to be an “official project” with an entry in Redmine and ReviewBoard. Similarly, if a developer is writing new code but feels that the code is far too raw to actually have others having eyes on it, we believe it should be up to them when they decide to have it put into a project on Redmine and ReviewBoard.)

So, let’s say you’ve just pushed some code. Where would you go to see it on the Web — Redmine or gitweb? I took a first cut at solving this problem by, based on the location of the repository you pushed to, spitting out a Redmine- or gitweb-based URL in the output sent back to your git client. Unfortunately, these were quite long. For example:

http://projects.kde.org/projects/repo-management/repository/revisions/dcd43aacaa806a7a32779a0215b7ab8ed7b05dc8

This isn’t so nice. Especially if your terminal is only 80 characters wide.

So, I came up with a solution — commits.kde.org. It’s a simple Sinatra/Thin-based web application that parses a URL generated by the gitolite hook and forwards you to the correct place. If the repository is on Redmine, it forwards you there; otherwise it forwards you to gitweb.
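The forwarding decision can be sketched in a few lines. The real application is written in Ruby with Sinatra; the Python below only mirrors the logic described here. The lookup tables, the gitweb host name, the second repository ID, and the gitweb query format are assumptions made for the example.

```python
# Hypothetical lookup tables; the real webapp gets this data elsewhere.
REDMINE_PROJECTS = {"99c5fdd6": "repo-management"}
GITWEB_REPOS = {"0a1b2c3d": "scratch/someuser/dotfiles.git"}  # made-up entry

REDMINE_BASE = "http://projects.kde.org/projects"
GITWEB_BASE = "http://gitweb.example.org/"  # placeholder host, not the real one


def resolve(repoid: str, commitid: str) -> str:
    """Return the URL a commits.kde.org-style request would be forwarded to."""
    project = REDMINE_PROJECTS.get(repoid)
    if project is not None:
        # Repository has a Redmine project: forward to its revision page.
        return f"{REDMINE_BASE}/{project}/repository/revisions/{commitid}"
    repo_path = GITWEB_REPOS.get(repoid)
    if repo_path is not None:
        # Otherwise fall back to gitweb (query format assumed here).
        return f"{GITWEB_BASE}?p={repo_path};a=commit;h={commitid}"
    raise LookupError(f"unknown repository id: {repoid}")


print(resolve("99c5fdd6", "dcd43aacaa806a7a32779a0215b7ab8ed7b05dc8"))
```

Because the mapping is a plain lookup on the repository ID, the same short URL always resolves to the same place without any per-URL database entry.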

These aren’t tiny URLs in the style of bit.ly, but they’re deterministic and not based on a database backend. In fact, you can construct these on your own and they’ll still resolve. The form is

http://commits.kde.org/<repoid>/<commitid>

The repoid can be found by looking in the URL that is spit out when you push your code. It stays fixed for each repository (and if it does change, aliases can be added to keep old URLs alive).

In this format, the above URL goes down to

http://commits.kde.org/99c5fdd6/dcd43aacaa806a7a32779a0215b7ab8ed7b05dc8

By doing this, the URL size drops from 110 characters-ish (depending on the name of the project, whether it’s on gitweb or Redmine, and so on) to a fixed 72; enough to make it fit on a single line in most terminals (with the “remote: ” prepended to the git output it’s 80 characters total), and to make it relatively tweet-/dent-able if you don’t need to include much other information in the post.
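The arithmetic is easy to check. The helper name below is made up; the URLs are the two examples from this post.

```python
def short_commit_url(repoid: str, commitid: str) -> str:
    """Build a URL of the form http://commits.kde.org/<repoid>/<commitid>."""
    return f"http://commits.kde.org/{repoid}/{commitid}"


commit = "dcd43aacaa806a7a32779a0215b7ab8ed7b05dc8"
long_url = (
    "http://projects.kde.org/projects/repo-management/repository/revisions/" + commit
)
short_url = short_commit_url("99c5fdd6", commit)

# Lengths of the long URL, the short URL, and the short URL as git prints it.
print(len(long_url), len(short_url), len("remote: " + short_url))  # 110 72 80
```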

Update: I should mention that you can shorten those URLs further, but how much depends on when you start getting a collision. If gitweb encounters an ambiguous commit ID, it will 404, without giving information as to why it returned a 404. Redmine, however, will simply return one or the other of the commits — so you may get the wrong commit shown without even realizing it. Worse, this could also mean that URLs that were shorter (very short) but once worked may not work later if a collision comes up later. So you can use as many or few of the hash characters as you want, but I’d stick with at least 8 or so for safety. *Also*, right now the webapp only matches full hash values in Redmine’s database, so if you shorten it you will always get a gitweb URL.
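The collision problem is easier to see in code. The sketch below resolves an abbreviated commit ID against a list of known hashes and refuses to guess when the prefix is ambiguous. It does not reproduce what gitweb, Redmine, or the webapp actually do, and the third hash in the list is invented to force a collision.

```python
from typing import Iterable


class AmbiguousCommit(LookupError):
    """Raised when a prefix matches more than one commit."""


def resolve_prefix(prefix: str, known_commits: Iterable[str]) -> str:
    """Return the single full hash starting with `prefix`, or raise."""
    prefix = prefix.lower()
    matches = [commit for commit in known_commits if commit.startswith(prefix)]
    if not matches:
        raise LookupError(f"no commit matches {prefix!r}")
    if len(matches) > 1:
        raise AmbiguousCommit(f"{prefix!r} matches {len(matches)} commits")
    return matches[0]


commits = [
    "cf17de39f9021064a713db965487be6e3d75a186",
    "dcd43aacaa806a7a32779a0215b7ab8ed7b05dc8",
    "cf17aa" + "0" * 34,  # made-up hash that shares the prefix "cf17"
]

print(resolve_prefix("cf17de39", commits))
try:
    resolve_prefix("cf17", commits)
except AmbiguousCommit as error:
    print("ambiguous:", error)
```

Failing loudly on an ambiguous prefix avoids the silent wrong-commit case, at the cost that a very short URL which worked yesterday can stop resolving once a colliding commit appears.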

For anyone interested, the current webapp code is GPLv2+ and can be found right here.

Category: Planet Amarok