Saturday, October 11. 2008

MySQL in Amarok 2 - The Reality

There has been a lot of chatter lately regarding Amarok's switch to MySQL as its only SQL backend. A decent amount is FUD -- either from people simply pushing back against change, or from people who simply don't understand the decision. Some of it (particularly Adriaan's blog post) has been insightful and interesting, but misses the mark in terms of why this change was made. This post attempts to explain why this decision was made, what it really means for you the end-user, and why you should have a cup of tea and relax.

I want to point out first that I said that MySQL is going to be Amarok's only SQL backend. A2's collection system is very powerful. Just take a look at how varied music sources from Shoutcast, Jamendo, Magnatune, Ampache, MP3Tunes, as well as local sources like iPods and your local file system, are treated as equals in A2. A collection is a collection, and is limited only by what capabilities it advertises it can support (and of course, it can supply its own custom capabilities). It's not currently enabled, I don't think, but there's a Nepomuk-based collection option too. So take heart -- this change only affects Amarok's internal SQL collection, and not other sources (although those sources can store information in the SQL database if they wish to cache information).

Since I mentioned Nepomuk, it's time to discuss another common question/demand/complaint: KDE has this nice Strigi-Nepomuk thing going on...why aren't we using it for scanning music and storing information? There are a couple of main reasons.

The first is that Strigi and Nepomuk are optional, not required. (Update: Strigi is required, but Soprano isn't, so Nepomuk as a whole is still optional.) We can't rely on the user installing them, and even if they are installed, we can't rely on the user to configure them properly (remember that we're going cross-platform, making it even less likely).
The second reason is speed: Amarok's custom collection scanner is extremely fast and pulls out specific pieces of information with TagLib. Strigi is, by comparison, very slow (it calculates hashes of all files, which means it needs to read the entire file) and pulls out less information. (Update: According to the Strigi developer, and despite what is said on kde-apps.org, Wikipedia, and even the author's own home page, it does not calculate hashes by default. So it's possible that Strigi, if properly configured, could be as fast as Amarok's internal scanner, although whether it would pull out all necessary information, I don't know. If it's configured to calculate SHA1 hashes of all files, then it will indeed be far slower.) On a local hard drive, it may not be a big issue, but it sure is a huge issue when you throw networked storage into the picture, which is a very common scenario. I've also heard, though I don't remember specifics, that querying and such through Nepomuk is rather slow compared to a normal SQL database.

Regardless, though, remember that when the Nepomuk-based collection is finished, tracks sourced through a Nepomuk-based collection will have their metadata changes saved back to Nepomuk. So, it's not that the SQL collection is in place of Nepomuk -- they are entirely independent. (Update: I forgot to mention that a Nepomuk collection already exists. It was developed by a GSoCer over the summer. I'm not sure what its status is as far as making the 2.0 release, but we Amarokers both like Strigi/Nepomuk and are excited about the idea of opening up the app and having all your music available right then and there with no pre-configuration. But there is a place for the SQL collection too. As I said: they are complementary technologies.)

With those topics out of the way, on to the meat. First, you need to understand an important pair of facts. Number one: we are not database guys.
Sure, we can store data in them, and more or less come up with a working schema, but none of us are gurus/wizards/jedis/etc. This leads in to number two: maintaining three databases was driving us crazy. Every time a minor schema change was needed, it had to be coded up for all three types of databases. Modifying a schema could be trivial for one database type, and super difficult (or impossible) for another. People would report bugs that we couldn't reproduce, only to find out that it was because we didn't quite understand how one database or another behaved (or in some cases, none of the active devs were using that type). And so on.

So from the beginning of A2 development (and in our fantasies during A1 development) we knew we wanted just one database. (We did actually look at abstraction layers like QtSQL and others. I'm not going to comment on them much, as I didn't do the evaluation, but in general they were found to not be flexible enough to handle all of our needs without doing some custom SQL coding (especially in the cases of things like schema changes), which kind of defeats the point. If you want to know more/want to insist that they are, try asking eean, as I think he did the evaluations.)

Now we had to choose the type. At first, SQLite seemed like a good choice. Using transactions, it's decently fast. It's pretty stable (those that complain about odd MySQL bugs should talk to markey, as he, being the SQLite maintainer in 1.4, can attest that SQLite's had its fair share). However, there were a few problems that in the end knocked it out of the running.

The first problem is performance. Although for people with small collections it performs fairly well, people with large collections that switched to the MySQL or PostgreSQL backends in A1 would report enormous speed gains during operations involving complex or numerous queries, such as adding many entries to the playlist, scanning files, or filtering/searching in the collection.
Since we want to accommodate users with large collections just as well as those with smaller collections, and since digital music collections aren't getting smaller, the speed increase for our users with large collections was quite important. Many of our developers, after the switch to mysqle (as we call it, though that's not the official name), have noticed huge speed increases in their day-to-day use of A2, so that speed increase is carrying through to the embedded server as well as the normal server. That was the first knock against SQLite.

The other blow for SQLite came for a totally different reason. Many users (myself included) have multiple computers sharing a single Amarok database. Assuming all the computers have access to the music at the same mount point (and a few other things are configured right), this allows you to scan once, play everywhere, update the same ratings no matter where you play it, and more. Even if you aren't sharing the database among multiple computers, many users want their database stored on a particular server for speed, security, or backup reasons. If you think either of these isn't a common use-case, you'd be quite wrong. MySQL and PostgreSQL were quite happy with this workload. It's a total no-go for SQLite, simply because it's designed for a different purpose. So SQLite had two big knocks against it. K.O.

However, just as we can't rely on the user to set up Strigi/Nepomuk correctly, we can't rely on them to get their tables set up in MySQL or PostgreSQL. So we needed the database to be embeddable, so that it could just work for the user without any setup necessary on their part. MySQL, with libmysqld, had the seeds of this in the 4.1 series; it works decently in 5.0, and it's becoming fully supported (AFAIK) in 5.1. PostgreSQL, on the other hand, does not have any such thing. (They have an interesting and cool concept of their own of embedded SQL though. Update: apparently that is part of the SQL standard. Still pretty cool.
Still totally different from what we mean when we are talking about an embedded server.)

So this leaves us with -- as you guessed -- MySQL. It may not be any particular person's favorite database (although it is for plenty), and I don't know how much overhead it really has in embedded form, but it fit the bill. It's both embeddable and can run standalone on the local or a separate machine (yes, this is not supported yet in A2, but it will be). It is fast and robust for large collections. It is well understood by the development team. And most of all, it is a single-backend solution that fills all of our needs.

If you're still unhappy about our decision, I'm sorry. We try to please most and can't please everyone. But we're the ones that develop and support this thing, and so we made a decision based both upon our needs as developers and the real-world use-cases from the collective feedback of thousands of users that have contacted us over the last few years. Please remember that even if most of the comments on the Dot, or to this post, (i.e. much of the sudden visible feedback) are from people that are unhappy with our decision, it is a decision that will actually suit the vast, vast majority of our users better than the other options we currently have.

We're a project that is known for being good to our users -- we listen to them, we try to implement features they want, try to be responsive with support. It's one of the things that got us where we are today.

So please, dear readers -- put some faith in us. This has not been an easy decision -- we've discussed, we've argued, we've thrown things, we've made up, we've had an after-the-make-up orgy or two -- but in the end it's what we collectively felt was the right way to go, and we feel that, in the long run, it will make Amarok even mores awesomer. Hopefully you'll feel that way too.
Comments
Jeff, you say:
"...multiple computers sharing a single Amarok database..." How does this work with the embedded MySQL? As far as I know, it doesn't run a server? (I know pretty much nothing about MySQL though, only that I read somewhere that the embedded version is without the server).
MySQL/Embedded does not run a server, but MySQL does! As they behave identically once connected to, it's easy to switch between them with minimal development effort. That's why Amarok 2.1 (or whatever) will be able to offer you the option of connecting to an external "real" MySQL server if you want to configure one.
"KDE has this nice Strigi-Nepomuk thing going on...why aren't we using it for scanning music and storing information? "
Strigi and Nepomuk integration in Amarok (especially for tags) is one of the things I was most looking forward to. Even if Strigi is slow, is there no way to put the information Amarok pulls via TagLib into Nepomuk? Otherwise I end up with a regular Amarok collection and a Nepomuk collection of the exact same songs. The possibilities with Nepomuk seem amazing; I'd love to see it well integrated into Amarok. P.S. The progress from beta1 to beta2 has been amazing. So many bugs fixed, Amarok2 is almost ready to replace amarok1 for me.
I don't see any reason why this couldn't be done with a script, given someone with the time and desire to do it.
"Strigi is, by comparison, very slow (it calculates hashes of all files, which means it needs to read the entire file) and pulls out less information."
Not true. Strigi does not calculate hashes by default. If you think the strigi scanner is slow, please back it up by numbers.
Well in the ideal world, Nepomuk will take no time since their music will already be scanned by the time they open Amarok. This is the main use-case for the Nepomuk collection.
If Strigi doesn't calculate hashes by default, it's not well-communicated. See http://www.kde-apps.org/content/show.php?content=40889 and http://en.wikipedia.org/wiki/Strigi and even your own Akademy presentation at http://www.vandenoever.info/software/strigi/akademy2006.pdf, which in the examples indicates that sha1 is available for files.
So, going on the information I had available (Strigi calculating hashes for all files), Strigi would indeed be an order of magnitude slower. I can back this up because embedded AFT requires scanning all data of all files, because it too calculates hashes...and when this was the default mode in A1, we saw enormous speed decreases vs. simply using Taglib alone, which only scans a small part of the file by default.
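To make the I/O difference in this exchange concrete, here is a rough Python sketch. It is not Amarok, Strigi, or TagLib code; the function names and the 64 KiB "header" size are assumptions chosen only for illustration, and a real tag reader's access pattern depends on the file format. It simply counts how many bytes each approach has to read from a file.

```python
import hashlib
import os
import tempfile

# Assumed size of the region a tag reader might inspect (illustrative only).
HEADER_BYTES = 64 * 1024


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash the whole file; returns (hex digest, number of bytes read)."""
    digest = hashlib.sha1()
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            total += len(chunk)
    return digest.hexdigest(), total


def read_header_only(path, limit=HEADER_BYTES):
    """Read only the start of the file; returns the number of bytes read."""
    with open(path, "rb") as handle:
        return len(handle.read(limit))


def make_sample_file(size):
    """Create a throwaway file of `size` bytes standing in for a track."""
    handle = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
    with handle:
        handle.write(b"\0" * size)
    return handle.name


if __name__ == "__main__":
    path = make_sample_file(5 * 1024 * 1024)
    try:
        _, hashed = sha1_of_file(path)
        peeked = read_header_only(path)
        print(f"full hash read {hashed} bytes, header read {peeked} bytes")
    finally:
        os.remove(path)
```

Hashing has to pull every byte of every file across the disk or network, while a header-style read touches a small, roughly fixed amount per file. That gap is what matters most on networked storage.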
/me can confirm the impressive performance jump with a large collection when switching from SQLite to MySQL on A1. That's the (only) reason I have a MySQL server running now, and it's reason enough for me
Very interesting post and well written. I liked it.
I am one of the people who are happy with your decision to use MySQL. I switched to MySQL for Amarok 1 a long time ago, because Amarok was always (no matter if I was playing music or not) the process with the highest CPU usage. After the switch to MySQL, I saw neither Amarok nor MySQL in that ranking. Btw, thanks for A2. I have been running Beta 2 since yesterday and I'm quite impressed by your work and very thankful. That will be an important application for the FLOSS stack
Thanks! Glad you are liking beta2. There's already a ton of fixes and changes and features in SVN since beta2, so lots to look forward to...
I should mention that Amarok 2 does actually have a Nepomuk collection. It was developed by a SoC student and is currently in experimental state.
So it's not like we're ignorant of Nepomuk - on the contrary, we are actively exploring its possibilities.
Yes, sorry, I did forget to mention that. It probably won't see the light of day in 2.0, but most of it is already there, and would be a complementary feature to the SQL collection.
The embedded SQL concept is not a concept of Postgres; it's defined in the SQL Standard. It has nothing to do with an embedded database, though.
I'm still not sure if the decision to use an embedded instance of MySQL is right. The only thing the user would have to do is create a database user. The tables could be created programmatically. Seeing that most people use distribution packages anyway, that could be easily done in the package postprocessing script. This would also save you the dump/import to an external instance for database sharing. Using Nepomuk would have been nice, but I can imagine that it would be too slow issuing all commands through a dcop interface, or even worse through a KIO slave, as it's imho planned for file searching. But perhaps a solution could have been worked on cooperatively, interfacing more closely, optimizing the access to Nepomuk storage and controlling the indexing of Strigi. Btw, is it really optional? I thought it was a compile time dependency already. In the end, I don't want to criticize you. You know much more about Amarok than I do, I just couldn't keep my mouth shut. Congratulations on the nice progress, by the way.
Actually even a non-embedded MySQL instance can be used without requiring any user configuration by shipping a default config and starting the instance as a user process.
Akonadi uses this approach.
"Seeing that most people use distribution packages anyway [...]"
You completely disregard the fact that Amarok 2 is cross platform. Depending on a MySQL server might be relatively easy on Linux, but on Windows: Not so much.
The "Akonadi" approach that you're outlining is actually easier than you make it sound.
Embedded, out-of-process... it mostly doesn't make a difference though. Amarok could've done this.
It looks like Strigi is now a requirement of kdelibs, but Soprano isn't...so even if you have Strigi support in KDE, it doesn't mean you have Nepomuk support.
If Strigi/Nepomuk hasn't been required until now, I think it's about time to make them required! This is awesome technology. But of what use is it, if everyone avoids using it, because it's optional? And if there are issues with it, IMHO it would be cool if the Amarok project could put its weight in and try to resolve them, instead of avoiding using it.
It's already required; it's part of kdelibs.
I just doubt we're ever going to see the day when everyone has it configured. The Nepomuk collection is for people that do have it configured - possibly realizing a fantasy of someone opening up Amarok for the first time and already having their music listed.
Configure? There should be no initial configuration in Nepomuk (and thus none for the Amarok collection), and if currently there is such a need (but I don't think so, I can tag files in KDE right after compiling the latest SVN and starting with a clean user) then this is a bug that should be addressed somehow.
The same applies to possible performance problems, you should report these issues to strigi/nepomuk and get them fixed, just as you're doing with Phonon when you encounter a boundary (and you're doing a great job!).
So Strigi knows exactly where a user's music is? Hint: it's not ~/Music. It's not /mnt/music. It's not /Music. Or rather, it could be some, or all of those, or anywhere else. That's the problem. Strigi/Nepomuk isn't going to scan every single file on the system and on every network drive and all by default. So maybe it'll get some user music, but not necessarily all (or anywhere close) -- at least without configuration.
As for performance -- read the blog post. It's already explained in there.
Amarok could simply tell Nepomuk about the music directory to have it covered. Maybe also Amarok could feed the extracted data into Nepomuk somehow?
Anyway, your original blog sounded a bit like the Amarok developers would not be really interested in the nepomuk stuff, so I just expressed the wish to push things forward for optimal KDE integration. However it seems in the works as I see now. This is good
Yeah, we're interested. Don't worry. We like to have options available for our users
There's a freedesktop "standard" about where user music is stored by *default*. If your user stores music somewhere else, it's up to her to change her strigi/amarok/whatever configuration.
You have a point with remote collections, though, but this is (right now, it's changing more every day) a corner case, and anyway scanning from a remote client IS NOT the right solution. The scanning should be done locally (on your NAS or remote PC) and then the result exported by the best means (mt-daapd, a possible future "remote strigi index" etc)
Yes, we know all about the standard. This is where Nepomuk will immediately find files (if it and Strigi are enabled; just because Strigi is built doesn't mean the user has it turned on). It may also become a default location for Amarok's SQL collection -- if the directory exists, and if the user actually knows about that "standard" location. Assumptions are bad.
I'm not sure what remote client you're talking about scanning with, but I don't generally see a problem with anything populating an Amarok-accessible database with the Amarok-needed data. The entire idea might be to have the scanning run on a server where there is no X installed. This is totally separate from exporting the local scanned music.
Non-kde users probably will have it off.
It's up to the distros to configure it correctly; I wasn't implying the users had to do a bunch of stuff.
100% agree with Michael. Strigi/Nepomuk (especially the latter) are great technologies and it's about time for their "prime time", so Amarok should push them, at least in KDE. Please, don't take on the worst part of being "multiplatform"; use full KDE power when you're in KDE!
Same as the comment above...I did forget to mention that a Nepomuk collection already exists, so we are definitely planning on hooking into the awesomeness that is the Strigi/Nepomuk combination. (I'm not sure if it will make it into 2.0 though...I don't think it will.)
If you need an embedded database but can't trust MySQL for the job (I can't), maybe you should have a look at Firebird:
http://www.firebirdsql.org/ It's said to be a pretty good solution, and one that really supports SQL standards. For example, you get WITH RECURSIVE support, allowing you to manage trees-in-sql effectively. Which I suppose is something you need in A2 collections...
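For readers unfamiliar with the WITH RECURSIVE feature mentioned here, the following is a small sketch of walking a tree stored in a SQL table. It uses SQLite through Python's standard `sqlite3` module purely because that runs anywhere (it needs an SQLite build with common table expression support); the `directories` table and its contents are hypothetical and are not Amarok's schema.

```python
import sqlite3


def build_tree(conn):
    """Create a hypothetical directory tree stored as parent pointers."""
    conn.execute(
        "CREATE TABLE directories ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES directories(id),"
        " name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO directories (id, parent_id, name) VALUES (?, ?, ?)",
        [
            (1, None, "music"),
            (2, 1, "jazz"),
            (3, 1, "rock"),
            (4, 2, "live"),
            (5, None, "podcasts"),
        ],
    )


def descendants(conn, root_id):
    """Return (name, depth) for the root and everything below it."""
    return conn.execute(
        """
        WITH RECURSIVE subtree(id, name, depth) AS (
            SELECT id, name, 0 FROM directories WHERE id = ?
            UNION ALL
            SELECT d.id, d.name, s.depth + 1
            FROM directories AS d
            JOIN subtree AS s ON d.parent_id = s.id
        )
        SELECT name, depth FROM subtree ORDER BY depth, name
        """,
        (root_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_tree(conn)
    for name, depth in descendants(conn, 1):
        print("  " * depth + name)
```

Without a recursive query, the same traversal needs one query per tree level issued from application code, or a different table layout altogether.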
You apparently missed the part where Jeff said we're not DB people. I have no idea what you're talking about.
You really have all my understanding, but I've noticed that most problems of supporting more than one DB (especially those experienced by MySQL devs) come from ignorance of the standard.
I know it may be a pain to support multiple DBs when you really need to squeeze all the horsepower out of a DB and you have to build up complicated queries, but: a) MySQL 5.X is supporting more and more features and getting more standard, b) Amarok doesn't look like an application that has to build complicated queries. If people would stay far from MySQLisms, most of the code would work on many more DBs with no effort. Furthermore, once you support just one DB it becomes MUCH harder to go back on your path. While Qt and KDE find their way onto multiple platforms and mobile devices, Amarok is going to lose an opportunity to find its place on mobile too, where MySQL may be unsuited. But well, things go and they may come back... I hope... Unfortunately my C++ belt is in the closet and it seems a SQL belt won't be enough to help. Good luck
Hey Ivan,
Thanks for the comment. You are right that DBs are becoming more SQL-compliant, but there are still issues. For instance, representation of boolean values. You can fake it by storing (for instance) an integer, but that's hacky. Regardless, sqlite and mysql were much easier to deal with than when integrating postgres into the mix. Anyways, we don't plan on using a lot of mysqlisms -- in fact, we've done very little modification to the database since standardizing on just mysql, and most of that is (AFAIK) quite portable (but runs significantly faster on mysql than sqlite). We may yet end up supporting more than one DB -- the PIC issues on AMD64 are troubling (although with MySQL 5.1 mysqld is once again "officially supported" so that will hopefully change...we have bug reports in) but we'll just have to see how it goes. (We're also definitely aware of the issue with mobile platforms, and while we're interested, realistically it will probably be a while.) Thanks a lot for the input!
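The "fake it by storing an integer" workaround mentioned in this reply can be sketched roughly as follows. This is only an illustration using Python's standard `sqlite3` module; the `tracks` table, the `is_compilation` column, and the helper names are hypothetical and are not taken from Amarok's schema.

```python
import sqlite3


def create_schema(conn):
    """Store the flag as an INTEGER restricted to 0 or 1."""
    conn.execute(
        "CREATE TABLE tracks ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " is_compilation INTEGER NOT NULL DEFAULT 0"
        "   CHECK (is_compilation IN (0, 1)))"
    )


def add_track(conn, title, is_compilation):
    """Convert the Python bool to 0/1 on the way in."""
    conn.execute(
        "INSERT INTO tracks (title, is_compilation) VALUES (?, ?)",
        (title, int(bool(is_compilation))),
    )


def list_tracks(conn):
    """Convert 0/1 back to a Python bool on the way out."""
    rows = conn.execute("SELECT title, is_compilation FROM tracks ORDER BY id")
    return [(title, bool(flag)) for title, flag in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_track(conn, "Track A", True)
    add_track(conn, "Track B", False)
    print(list_tracks(conn))
```

The conversion lives in application code on both sides of the query, which is the "hacky" part: every backend needs the same convention, and nothing in the stored value itself says it is a boolean.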
I don't think many people are likely to have a Firebird server running on their network as a central SQL location they wish to store their collection in. Lots, and lots of people do have MySQL set up like that though. So Firebird might be cool and swank, but it doesn't really work for our users.
No, you missed the point. The solution we pick has to be both embeddable and able to be used from a separate server somewhere across a network. Firebird may do both, but very few (if any) of our users have Firebird servers running on their systems, whereas a large portion have MySQL servers running. Standardizing on Firebird would put an extra burden on users, when it is likely that they already have MySQL (or PostgreSQL) installed, and when there is much more easily accessible help on our mailing lists/IRC/the Web about getting MySQL set up than Firebird.
I'm not saying that Firebird couldn't serve the same purpose, I'm saying it would be too burdensome for our users.
Oh, sorry, you want both...
As much as I'd like PostgreSQL to get used more often, it still does not provide an embedded server, and your point about Firebird is made. I would have thought Amarok could maybe talk directly to other Amaroks on the network, exporting the collection API over dbus & JOLIE for example. If you come to this solution, surely an embedded SQL server is enough. But maybe I should simply realize your choice is made, and on very valid assumptions. Keep up the good work,
Hi again Jeff
I must question the notion that many people have MySQL running. This doesn't match any desktop setup I've seen to date at all; seriously, it kind of sounds like you may have extrapolated from a highly biased sample, such as IRC users would be. Besides, regardless of the ratio of people running MySQL on their Linux desktop, that same ratio on Windows, where Amarok is supposed to run as well, is vanishingly small. Note that this is orthogonal to the other points that have been brought up, because nothing forbids using an efficient collection engine locally and a different, MySQL-based one remotely, right? I see that someone else posted a proposal about decoupling remote access from the collection engine; maybe that person could be drafted into the team, which would solve two problems at once. Thank you for reading Jeff.
I totally support your decision to use MySQL embedded. It's very nice, simple and it works perfectly.
I'd be curious to know what daily tasks were significantly sped up. Is there something other than anecdotal evidence for us to look at? I'm curious whether it has anything to do with Amarok's generally poor use of indexes/keys on its tables. MySQL has an advantage in this since it does a lot of things in memory that sqlite doesn't.
Sure, as soon as you provide something more than your own anecdotal evidence that Amarok has "generally poor use of indexes/keys."
Proof's in the pudding there. Look at your table creation and the queries that don't have indexes, or the tables missing indexes entirely, or the combined index and unique constraint instead of a primary key in some places.
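The thread never names a specific table or query, so the following is only a generic illustration of the kind of check being argued about: asking the database how it plans to run a query, before and after adding an index. It uses SQLite via Python's standard `sqlite3` module; the `tracks` table, the `idx_tracks_artist` index, and the sample data are hypothetical.

```python
import sqlite3


def create_tracks_table(conn):
    """Create a hypothetical tracks table with no index on `artist`."""
    conn.execute(
        "CREATE TABLE tracks ("
        " id INTEGER PRIMARY KEY,"
        " artist TEXT NOT NULL,"
        " title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO tracks (artist, title) VALUES (?, ?)",
        [(f"artist{i % 50}", f"title{i}") for i in range(1000)],
    )


def add_artist_index(conn):
    """Add the index that the filter on `artist` can use."""
    conn.execute("CREATE INDEX idx_tracks_artist ON tracks (artist)")


def query_plan(conn):
    """Return SQLite's plan description for a filter on `artist`."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT title FROM tracks WHERE artist = ?",
        ("artist7",),
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tracks_table(conn)
    print("without index:", query_plan(conn))
    add_artist_index(conn)
    print("with index:   ", query_plan(conn))
```

The exact wording of the plan differs between SQLite versions, but the first plan describes a full table scan and the second a search using the index. A concrete report of this kind, tied to a real query, is the sort of detail that would let developers act on a claim about missing indexes.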
Yea probably so - sqlite apologists always point out that in this way or that way it can be made faster. Doesn't change the fact that we don't have the skills to do this though. MySQL is just faster without requiring more DB knowhow.
If I could build svn I'd have sent patches in already for the schema. I've been trying to get it built since I heard, a couple of weeks ago, about the switch and that there were performance problems.
I've tried repeatedly to get a hand with it as well on IRC but without any luck so...
You could send in schema patches without building from SVN.
I'm not aware of any places where necessary indexes are missing. Who said we were having performance problems? I said MySQL is faster. I didn't say Sqlite was dead slow.
I could very well post patches. I would have no way of testing that they build, work, or even work as intended. Broken patches just waste developer time.
Maybe I misunderstood "The first problem is performance." and the rest of that paragraph. Was it not really a reason for it being removed? Because I completely disregarded the second argument since that's a) likely a smaller section of the user base and b) completely doable with the previous mysql and postgres db backends. Not to mention I'd bet they're better and easier to configure for those network situations.
You need to read, and understand, the various points made in the post.
Thanks, I have read over it several times to make sure I was not crazy.
If you're referring to the table creation, you're also referring to the same power user scenario I mentioned. I think you've missed the entire point of my original post though. I wanted to help improve amarok. I was never defending sqlite even though I do think removing support is a mistake for other reasons. What I was trying to do was look into the symptom that seemed to be a big enough deal to be listed as 1 of 2 reasons for a "KO" in the first part of your post. sqlite is a library used by a great deal of projects that require very responsive applications so sluggishness implies an underlying problem. I was trying to offer my experience with databases to try and help solve that. Your static and resistance to that goal has now driven me away from the project as a whole though. It makes me wonder if there isn't a reason for your inability to support it. Faster without database know-how is just a weak excuse for an opensource project. Especially when you're chasing off developers that are trying to bring database know-how to the table.
"sqlite is a library used by a great deal of projects that require very responsive applications so sluggishness implies an underlying problem. I was trying to offer my experience with databases to try and help solve that."
You are misrepresenting what I said. I said that people that switched to MySQL or PostgreSQL databases in A1 reported enormous speed gains with some collections vs. SQLite, and that people testing the mysqle backend in A2 reported the same. I did not say that SQLite was sluggish. SQLite is indeed used in a variety of applications that need fast responsiveness to user events. And for many of our users it's fast enough. I only ever switched to MySQL to gain support for sharing my collection across multiple computers, and I never timed the two, because my collection is normally composed of only a few thousand tracks. But for those that did switch to something else, with the same database schema, they found it to be faster, in some cases significantly so. There is a difference between saying that we've found SQLite to be sluggish, and that we've found other embedded databases to be faster. So the reason this was a knock against SQLite was not because it was necessarily slow, but that with our normal workload, the new embedded MySQL is faster in our tests, and when running it as a separate MySQL server, it was also faster.

"Your static and resistance to that goal has now driven me away from the project as a whole though. It makes me wonder if there isn't a reason for your inability to support it."..."Especially when you're chasing off developers that are trying to bring database know-how to the table."

Yes, our inability to support it is based off the fact that we looked at the various alternatives and made a decision, and have put a lot of effort towards that goal, with a great deal of success in terms of what it provides for us and speed gains.
I don't understand what you think our resistance/"static" is to your goal, but from what I understood, your "goal" is that we don't have proper indexes and you want to help us out with that, except that so far you've not provided examples of such cases, nor have you provided patches, nor contacted me or the other developers off this blog to ask how you can help. In other words, you have not actually shown anything like you actually trying to bring database know-how to the table. Now your goal seems to be to get us back on SQLite. The end result is that I have no idea what you're trying to achieve, and that right now you just look like you're complaining. We work like most other open-source projects. If you want to get involved, we appreciate it, and you should actually show that you are actually intending to get involved by contacting us through the proper channels, or submitting a proposal, or pointing out details of where/how we can improve things. If you make various statements with few details to back them up, where you by your own admission ignore half of the other very relevant reasons why a decision was made, and you show no initiative towards actually trying to help make things better, then I'll not lose sleep over having "driven you away".
The one thing I'm concerned about here is that SQLite is said to be "lightweight", which sort of implies that other database systems aren't. I'm not sure where "lightweight" manifests itself -- obviously it's not speed -- so I suppose my question is: How much memory overhead does using mysqle instead of sqlite incur?
No idea, but when I said lightweight, I really meant "easily embeddable" and "requires no server process". The entire thing is a single source file and a single header file. So it's easy to maintain within a source tree and easy to embed in the application. As for no server process, that's rather the whole point of an embeddable database, and in this sense mysqle is just as lightweight.
Hi,
Have you considered KexiDB / Predicate as the database abstraction layer?
I did look into it a year or so ago. Didn't really sound like it did what we needed.
Just googled a bit... info on predicate isn't easy to find.
As for sharing music and meta data over the network: How about giving Amarok the ability to stream out music data? Preferably via some standardized protocol (and ideally advertised via zeroconf). Then you could make a really stripped down server version of Amarok to do the scanning and provide both the actual music file data as well as meta data over the network. A user could simply install and run that on a server system and that's it, no need for setting up SQL database servers or NFS/SMB mounting.
A1 had the ability to share with DAAP. That's not in A2, at least not yet (and may never, since DAAP has serious issues). It is possible that the parts of Amarok that scan the collection and do the SQL database loading might be separated some day (right now, the scanning is separate, but the SQL loading is done in the main process). We've thought about it, but generally had our hands full...
Hi Jeff
As the person who started the biggest thread on that dot article I certainly owe you a reply! You said: 'MySQL is going to be Amarok's only SQL backend' with the implication that other collection systems will be there. This is great news! Of course as long as MySQL no longer is a hard dependency to run Amarok then all my concerns are void, and I'm sorry I didn't understand it would be the case at first, the article made it look like MySQL would be required even for vast majority of people who only have a few hundred to thousands of music files that they access locally. Oh and for the records I am somewhat a database person, myself, as perhaps you can guess from my latest reply to taj on the dot, and that is precisely why I didn't think it suited to implement a playlist, although of course once more I don't know what kind of stuff you guys want to do exactly, I asked taj about that. One more thing though, you say "We're a project that is known for being good to our users -- we listen to them, we try to implement features they want, try to be responsive with support. It's one of the things that got us where we are today. So please, dear readers -- put some faith in us" but to be awfully honest with you I had to drop Amarok 1 eventually because my small work machine couldn't cope with the many things it wanted to do that I didn't need personally, so please keep the non-power users in mind, making features that require non trivial dependencies optional like I now understand you did for MySQL makes us so much happier. And yeah for an optional SQL-based collection backend MySQL-embedded is a real good choice I think. Thank you for reading Jeff.
MySQL won't be optional for Amarok for the foreseeable future. I'm sorry you misread things.
I don't understand why you think MySQL is such a heavy dependency. At runtime all it needs are a few libraries Amarok already had implicit dependencies on and some /usr/share data files. Some distros might not be this clean with it, but that's not our fault.
Hi Ian
If collection engines are interchangeable shouldn't that imply they're optional or at least can be made so? Also indeed I did notice you didn't understand why people take issue with this so let's talk no more of that, it seems you have an inflated idea of the importance of a media player and its right to resources and that's okay I guess, trying new bold things is how progress is made, but please accept, if you won't understand, that others may disagree. By the way turns of phrase such as "I'm sorry you misread" are kinda passive aggressive and don't help your cause at all, quite the opposite. Please don't undo the good communication job that people like Jeff and taj did. On that note, thank you for admitting to your lack of know-how in an earlier comment, at least that was honest of you.
Well, Ian is right. You did misread things
I'm not sure what you mean about "interchangeable collections" -- it's really more like "multiple collections". MySQL will be a hard requirement because even if you don't want to use a SQL-based collection, many other collection backends, not to mention things like services and lyrics and statistics and media device support, require a database to cache or store information.
Hi again Jeff
Obviously I'm not denying that I misread. That doesn't excuse a choice of word commonly considered poor ("I'm sorry you are wrong"...). I was not aware SQL was used for so many other things in Amarok either, thank you for this precision. Of course now this opens a whole new can of worms of whether an RDBMS is suitable for those uses and I am so not going there. I'm sorry if I'm being bothersome with this all, mind you I'm the sort of guy whose job is to sort out other people's fuckups so I've become way too touchy about potentially ill-reasoned technical choices, I know in truth I really shouldn't care it's just a media player anyway and it's not like I'm paying. I think I'm going to drop this conversation now, although if you would like to continue it in private please let me know and I'll give you my address. (Also note the correct use of 'I'm sorry' above.)
Okay apologies about that last bit which was outright aggressive. I'm starting to grow a little annoyed I guess.
There are 2 points I'd like to make:
1. I generally do not agree to using MySQL with any valuable data, no matter what the reasoning behind it. That comes from my extensive experience with it, and it being unsuitable for most high availability, high performance, high reliability data storage and manipulation purposes. MySQL was and still is, to a large extent, a “toy” database that can serve limited purpose, as long as the user is aware of its consequences and limitations. A couple of my most annoying gripes with it are: (a) the level of bugs that are still being found and fixed to this day: the database product at this level of age and "maturity" should not be having any (and especially as many) serious data-related bugs; yes, all database servers have bugs, and the bugs are getting fixed, some new ones crop up, they are also getting fixed, but the situation w/MySQL doesn't provide for a good comfort zone for me when relying on the product with my valuable data, especially because I've had to experience several of those critical bugs in a high-availability environment and spent significant time on them. (b) underlying architecture and code for MyISAM is horrible: MyISAM is not going away and the shared code between it and other “pluggable” storage engines creates a virtual hell to debug and write new features for the system; obviously, there are programmers making the programming decisions, and, in my opinion, many KDE and Amarok programmers are well-versed in "good" programming practices. Please take a look at MySQL source code, then compare it against Postgres – I did – the difference is night and day. You can then see why Postgres is able to add advanced features to its product with minimal bugs, while MySQL is having to support basic features and play wack-a-mole with critical bugs. However, in this case, Amarok could be something that is considered a "toy" program (nothing negative). 
There are no important financial transactions taking place, atomicity doesn't need to be guaranteed, probably false deadlocks (InnoDB) are not going to be an issue, etc., etc.. And, the requirements were outlined so that no other database software could fit the bill (was the case being made backwards?). In either case, most Amarok uses of the database should be contained within the MySQL sweet spot operational zone, and are not likely to pose many critical strictly database-related issues. I will probably not use it, however, as I personally am exceptionally wary of it. 2. This issue got me thinking about one of the stated requirements that Amarok be able to connect to a remote database for its data - "Many users (myself included) have multiple computers sharing a single Amarok database. Assuming all the computers have access to the music at the same mount point (and a few other things are configured right), this allows you to scan once, play everywhere, update the same ratings no matter where you play it, and more." This is a good idea, but should not be done at the low level of database access: (a) can't easily work between different Amarok devices as (a1) the data/files available to one user on one computer is not generally freely available to the users of other networked devices in the same form/location, if at all; setting up "same mount point" and "few other things" is not user-friendly and will not be usable by vast majority of casual users; it will end up giving an impression of something that "does not work" or only works for geeks (a2) giving direct database access to other devices/users could lead to privacy and security issues as there would be no way to specify what is shared and with who / what device; (b) The requirement assumes a single-user world when the reality is likely to be users will want to share media between each other (family/roommates/etc.) on the network - see privacy issues above. 
If Amarok is trying to achieve [at least a part of] the media server functionality (what it sounds like from the above requirement), my belief is that there needs to be: 1. A networked API layer (SOAP/XML-RPC/whatever) that will allow 2 Amarok clients to communicate with each other via minimal setup – probably just host name or IP address, and authentication. You can also have it automatically discover advertised services on the subnet. 2. A concept (probably similar to RBAC) presented in a user-friendly manner as to what is allowed shared with who and/or what device. 3. A future concept where you can have Amarok running on a box; and Amarok thin/full clients on any device (small or large) from any recognized user accessing its services via the above API. A thin client could be a media player (without its own full Amarok data backend) that could communicate with Amarok API that could be running anywhere from a portable music player or a cell phone to another PC. A full client would be another instance of Amarok (probably on another device) with its own data store interfacing with and caching metadata between each other. Just a couple of thoughts on the matter - thanks for reading.
Re point 1: All database backends have issues -- we've had them with sqlite in the past too, although generally since we included that source in our project, we found some of the worst ones in SVN versions without releasing. We would actually have really liked to use PostgreSQL but they don't have an embedded server option, which was a must for us.
Although I can't fathom why Sun thought MySQL was worth $1 billion, Sun (or parts of Sun at least) have a good history of good software development, and no one invests that much money in a database that they want to remain a "toy" -- hopefully Sun's acquisition will end up causing it to become a much more mature database (in your eyes at least).
Re point 2: I disagree with much of what you say here, or just don't find it relevant. First things first: the remote database ability is not just for sharing info between Amarok instances, but because some users want their databases on a particular server. That's not something that "only works for geeks" but is rather a common thing to set up. But, speaking of "only works for geeks" -- who cares? Anyone attempting to do that is likely to be somewhat skilled, and if skilled people want to do skilled things, let 'em. I'm not sure why you think that this ability (which already worked in 1.4, for a long time) would give the impression of something that "does not work". Moving on -- I don't know why you think there are privacy and security issues. I don't assume anything about a single-user world. Personally, I use this so that on the two, sometimes three computers I have running Linux around here, I can connect to my own Amarok info from any of them without having to rescan, and having statistics track no matter where I'm playing the music. But that doesn't mean I'm making an assumption that no one will want to share all database info, because people already do this. It's pretty simple -- if you want to share with someone else, give them access to your database. If you don't, don't give them access and don't allow them to share from it. There are no privacy or security issues involved here. Amarok's not intending to become a music server, although ideas like that have been tossed about. Maybe something will come along in the future, but for now our hands are already full.
Thanks for replying. These are my suggestions; obviously, Amarok programmers/maintainers decide how Amarok works. These are just my thoughts below:
Re: embedded requirement (vs. Postgres): I would not recommend embedded database server. Even with MySQL, I would suggest to run the server in a separate process, to prevent Amarok crash from corrupting the database. This can relatively easily be done in a cross-platform manner. If you get past that, there's no difference between MySQL and Postgres - both would be pseudo-embedded. Re: "first things first" I know that: 1. this is not directly relevant to your original post 2. it already works as you described in 1.4. I was simply trying to put my thoughts out there on how to improve on the concept; these thoughts stemmed from the requirement you stated in your post that there needs to be a direct shared database access between Amarok instances. Re: privacy and security (single vs. multi-user) Privacy: 1. the way you describe it is share all or nothing - this is fine for a single-user, but what if I don't want to share some of my files with certain other users; e.g. I'd like to keep my offensive lyrics collection away from my children's visibility, or only share certain selected albums/files with them; or maybe you don't want your brother/roommate to know you are still listening to spice girls ;), etc., etc. 2. giving direct database access to other users lets them access your listening stats (anything and everything that Amarok tracks) and update them with theirs. What if (a) I don't want to let some users know what my listening habits are, and (b) I'd like to separate tracking between different users so my favorites, ratings and other statistics don't get messed up. Security: Giving direct database access to other users also presents the problem of trusting those users with your data. There's nothing to prevent them from severely crippling your unrecoverable metadata, whether accidentally, or intentionally. None of these are a real issue in a single-user environment, but as soon as you have multiple users these issues will start to come up. 
The ideas I suggested were one way of dealing with them. Again, thanks for reading my input. I appreciate that I am able to do so, and be heard on the other side; even if it doesn't change anybody's minds.
It might be worth thinking about spinning off database access into a separate process. Of course, if that process crashes, you're in the same boat.
Re: Privacy. It's still not a problem. Yes, it's an all-or-nothing approach. But it has never been advertised as anything but that, and ability to limit how and what is shared was never a goal. In fact, this ability has never even been "officially" supported, although it has had plenty of unofficial support. Again, this would be a privacy issue if Amarok advertised that you could share databases while separating namespaces or some such thing. But it doesn't -- all this feature is is that it will work if it's shared with another copy of Amarok on another computer. So, the privacy issues are not any greater than any other application in the world which has a database. If you don't want someone peeking, don't give them the password. The only reason you see a privacy issue here is because you are attributing functionality to Amarok which doesn't exist, and then pointing out that it doesn't work. Security: Of course giving other users database access means trusting them with your data. This is not a security issue. If you give someone else your login password and tell them to log in to your computer, and if they then open up one of your files and see something you did not mean for them to see, this does not make it a security issue -- it makes you a dumbass. If Amarok allowed other users to bypass your database password, that would be a security issue. But there is absolutely no security issue, for the exact same reason as there is no privacy issue. Namely: if you don't want others seeing and accessing your stuff, don't give them your password.
RE: database process
You are absolutely correct - but I was suggesting starting the database server itself with minimal required configuration, not just spinning off the embedded DB access into a separate process. Database servers (and the associated threads) have more built-in protection from exiting/crashing gracefully which the embedded model will not fully support; with the embedded model, Amarok's (or any external application's) stability is an added consideration in addition to the database layer itself. And usually, database server (even if it's MySQL) is more stable out of the two. RE: security and privacy I completely understand your point, and I am sorry if you misunderstood mine - I never questioned what was advertised vs. delivered - I was just giving my suggestions how to improve Amarok by thinking about new features that are, in my mind, suited for the multi-user sharing environment. Speaking for myself specifically, the issues I listed are likely to prevent me from sharing my data with others in that manner; but that may not be the case for everyone, if they are comfortable with the type of access given out. Anyway, I believe I communicated my thoughts which is what I intended to do. Unfortunately, due to my time/schedule, I am not able to help with much besides these comments/suggestions (this was actually the first time I gave any feedback), even though they were probably of no help anyway. Thanks for the little discussion.
Ah, like (Akonadi, I think?) does. It could be something we look at in the future.
As for the security/privacy stuff -- if we ever want to move Amarok into a more MPD-like direction, rest assured we will think about these things...thanks for your suggestions for improvement. For right now it's intended as a single-user program, that just so happens to be able to be shared across that user's computers, or, with the user's permission, with someone else. Thanks for the feedback...glad you took the time, even if you don't usually do it!
I can understand the annoyance at working with DBs when you're not really -into- them. (I'm a DB designer and developer. :p).
My question is... Come Amarok 2.1 with the support for "real MySQL" databases, would the team consider a patch to make Amarok use a single database schema? (It can be done, with a few tricks.)
It's possible, although I wouldn't consider it at all likely. It wasn't just the schema, but also things like handling the different naming schemes (text vs. varchar or whatever), bool values, and so on, plus figuring out three changes for when the schema was going to change, and so on.
Essentially: we know it can be done, because we did it in 1.4. But it was a miserable experience to maintain.
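To make that maintenance burden a little more concrete, here is a minimal sketch of the kind of per-backend translation a multi-database application ends up carrying around. It is not Amarok's actual code: the table name, the column names, and the type mapping are all hypothetical, chosen only to illustrate the "text vs. varchar" and bool differences mentioned above for the three backends 1.4 supported.

```python
import sqlite3

# Hypothetical per-backend spellings for the same logical column types.
# This mapping is an illustration, not what Amarok 1.4 actually used.
TYPE_MAP = {
    "sqlite":     {"text": "TEXT",         "bool": "INTEGER"},
    "mysql":      {"text": "VARCHAR(255)", "bool": "TINYINT(1)"},
    "postgresql": {"text": "TEXT",         "bool": "BOOLEAN"},
}

# Hypothetical table definition; the names are illustrative only.
COLUMNS = [("url", "text"), ("title", "text"), ("compilation", "bool")]


def create_table_sql(backend, table="tracks_example"):
    """Render one logical schema as DDL for a single backend."""
    types = TYPE_MAP[backend]
    cols = ", ".join(f"{name} {types[kind]}" for name, kind in COLUMNS)
    return f"CREATE TABLE {table} ({cols})"


def all_backends_sql():
    """Render the same schema once per supported backend."""
    return {backend: create_table_sql(backend) for backend in TYPE_MAP}


if __name__ == "__main__":
    for backend, sql in all_backends_sql().items():
        print(f"{backend}: {sql}")
    # Only the SQLite variant can be checked without a server installed.
    conn = sqlite3.connect(":memory:")
    conn.execute(create_table_sql("sqlite"))
    conn.close()
```

Even in this toy version, one logical table turns into three different statements, and only one of them is easy to test on any given developer's machine. Every schema change multiplies the same way.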
Whenever you make a decision like this, there's inevitably going to be some people unhappy, but know there's also some happy users out there ;D
I started using Amarok after I grabbed a couple hard drives full of music from friends and family. Most of it was crap that I wouldn't listen to, so I needed a decent player to help me sort it to some degree. But most importantly, needed a player that wouldn't crawl up and die when presented with >100GB of music. I never had much luck until I created a mysql database for Amarok and it *flew*. Ever since, I've been hooked. (and even liking the 100 odd features you've bundled in). So anyway I'm really happy with this decision. I think it's definitely for the best. And most importantly it's the best for the 90% of users who don't know or care what MySQL (or a database) is. Nice work. Keep it up.
I'd like to say for the record that I'm NOT in agreement with this.
My collection is just over 2,000 files. I had been running with local files but a local link (gigabit) to my MySQL server. Load times were slow, and I had more than one case where my data got lost in amarok and mysql 4->5 upgrade issues. After I ditched my desktop for a laptop I wanted to play my collection on the road/etc. This didn't work well with encrypted tunnels and performance, of course, was horrid. So I changed over to SQLite. I was shocked when I saw my performance went up by 10-20 times! Yes, track to track changes happened instantly. Before it would lag for a second or two before loading the next track. Also, launch times are less than a second whereas before on MySQL it was 5-10 seconds. I have no idea what others are experiencing, but requiring MySQL SERVER running on my laptop to play my music is a deal breaker. Besides the extra memory required, it adds a lot of bulk. I have been a huge fan of amarok, but killing off a working backend seems shortsighted and is bound to lose users such as myself. Maybe you could consider asking for someone else to help with SQLite instead of just dropping support.
You need to reread the article until you understand, well, any of it.
I understood the article just fine. You went to great lengths to talk up how MySQL server (not embedded since you talked about network sharing, etc) increased performance for many of your devs. I stated that I came from server and found it to be greatly slower then SQLite embedded.
Since you freely admit you have no idea of how much overhead the embedded version will have I was giving my input to my experience with amarok. As an end-user I don't know what others are doing. You, and rightly so, have a better idea than me on that aspect. However, it doesn't change my statement that my performance with MySQL was not the same as yours. Speaking from a statistical point of view, you cannot know how many people are using X,Y,Z and not telling you because it works for them. The squeaky wheel gets the oil and often more attention.
Actually, you did not understand the article at all, because you're too busy pissing and moaning to pay attention. So let's get some things straight.
First: "You went to great lengths to talk up how MySQL server (not embedded since you talked about network sharing, etc) increased performance for many of your devs. I stated that I came from server and found it to be greatly slower then SQLite embedded." No, I talked about how MySQL server increased performance for many users in 1.4, and how MySQL embedded increased performance for many devs in 2.0: "Many of our developers, after the switch to mysqle (as we call it, though that's not the official name), have noticed huge speed increases in their day-to-day use of A2, so that speed increase is carrying through to the embedded server as well as the normal server." Next: "I stated that I came from server and found it to be greatly slower then SQLite embedded." You came from server running on a different machine connected over the network. Even though it was a gigabit link, you're still adding latency. So if you want to compare speeds, how about comparing embedded sqlite to mysql running on localhost, or, more properly, embedded sqlite to embedded mysql. You'd be in the minority here, btw. My collection, which is not that large, saw a speed increase going to mysql over a wifi link vs. local sqlite, and this experience matches the experience of everyone else I've heard from that switched to mysql in 1.4. So if your mysql performance is suffering, you may want to take a look at your equipment...something might be misconfigured. Then: "Speaking from a statistical point of view, you cannot know how many people are using X,Y,Z and not telling you because it works for them. The squeaky wheel gets the oil and often more attention." 
I never claimed to, and in fact, if you'd bothered paying attention to the article, you'd have seen this: "But we're the ones that develop and support this thing, and so we made a decision based both upon our needs as developers and the real-world use-cases from the collective feedback of thousands of users that have contacted us over the last few years." Yes, the feedback we have received has swayed our decision. It's kind of like a democracy. If you don't make yourself heard by voting, you don't get a say in what happens. (Of course, before you pounce on that without reading the rest of this statement, in this example the "voting" would be leaving us feedback.) Finally: "Maybe you could consider asking for someone else to help with SQLite instead of just dropping support." I don't understand why you, and some other people, have a hard time grokking the most important concept in the posting -- I think you're forgetting the forest for the trees -- but: "...maintaining three databases was driving us *crazy*. Every time a minor schema change was needed, it had to be coded up for all three types of databases. Modifying a schema could be trivial for one database type, and super difficult (or impossible) for another. People would report bugs that we couldn't reproduce, only to find out that it was because we didn't quite understand how one database or another behaved (or in some cases, none of the active devs were using that type). And so on. So from the beginning of A2 development (and in our fantasies during A1 development) we knew we wanted just one database." SQLite couldn't serve as a single database that would meet all of our needs, as is explained in detail. It has nothing to do with needing help or "fixing" it.
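The latency point in the reply above can be put into rough numbers. The sketch below is a back-of-the-envelope model only: it assumes queries are issued one after another and that each pays one network round trip, and every figure in it (query count, per-query time, round-trip time) is made up for illustration rather than measured from Amarok.

```python
def total_query_time_ms(queries, per_query_ms, round_trip_ms=0.0):
    """Total time for sequential queries, each paying one network round trip."""
    return queries * (per_query_ms + round_trip_ms)


# Hypothetical workload: 200 small queries at 0.5 ms of database work each.
QUERIES = 200
PER_QUERY_MS = 0.5

# Embedded or localhost database: no network round trip is added.
local_ms = total_query_time_ms(QUERIES, PER_QUERY_MS)

# Remote server: assume 1 ms of round trip per query (an invented figure).
remote_ms = total_query_time_ms(QUERIES, PER_QUERY_MS, round_trip_ms=1.0)

if __name__ == "__main__":
    print(f"local:  {local_ms} ms")
    print(f"remote: {remote_ms} ms")
```

Under these assumed numbers the remote setup spends more time waiting on the network than on the database itself. That is why comparing a remote server against an embedded database says little about the two engines.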
"...maintaining three databases was driving us *crazy*. Every time a minor schema change was needed, it had to be coded up for all three types of databases. Modifying a schema could be trivial for one database type, and super difficult (or impossible) for another. People would report bugs that we couldn't reproduce, only to find out that it was because we didn't quite understand how one database or another behaved (or in some cases, none of the active devs were using that type). And so on. So from the beginning of A2 development (and in our fantasies during A1 development) we knew we wanted just one database."
I suggested that maybe you should consider asking for help with SQLite. There are tons of talented people who do know DBs and maybe they could help you. I never said SQLite was for everyone or a single solution. Only that dropping support for it may negatively impact users such as myself. I would make the same argument for dropping MySQL. As with most opensource, choice is a huge strong point. As you said, I was giving you my feedback. You don't have to agree either. I don't care either way. I did want you to see I was giving you feedback from my experience with amarok and mysql. As my very first post and sentence read: "I'd like to say for the record that I'm NOT in agreement with this."
We didn't want help with SQLite because we weren't going to use it anymore. Simple.
It is duly noted that you are not in agreement with this. Oh well.
Yes. It is too bad SQLite support is dropped from the official project. Maybe someone can fork the SQLite support, so that people with retarded computers like us can actually install Amarok without using 90% of our disk and memory space.
Having a KDE dependency is bad enough, now MySQL is indeed a deal breaker for me. I don't even need database support, xmms was doing fine but it is now dead. Audacious is still buggy and has poor sound quality. It seems my only option is a console/ncurses-based player now.
Go use something else and stop complaining.
Without complaints, humans would still be living in the Stone Age. If I did something some people do not agree with, even if it is only 0.1% of the population, I expect them to go out and complain to me. I do understand that some people have difficult lives, but I do not think many of the responses are appropriate.
Nevertheless, there aren't any decent music players for Linux these days. Amarok had great promise, but forcing the developers' choice down people's throats, without ever conducting polls or other research, will only alienate users. We all know developers are weird and don't think like normal stupid people, being a developer myself.
You're right, we're a project that's noted for never listening to users. We love shoving things down people's throats, and we've never looked at user use cases to determine our choices.
Can you stop complaining already and just go use something else?
You guys should get a new tagline - "Amarok: The only music player with an embedded enterprise database". Absolutely and utterly hilarious.
Just as well Songbird is beginning to be usable, because you guys lost the fucking plot with Amarok 2 /a long time/ ago, and you seemingly don't understand the problems. Creating a music player based on the requirements of 0.001% of people forces you to make design decisions that impact the other 99.99% negatively. Think about that.
I find this to be sad.
MySQL-only software is the wrong approach. What if there are better databases out there that we want to use? I say this because I really wanted to test Amarok on Firebird, but it seems the future is bleak and I should hold off on doing this.
As soon as firebird meets all of our requirements, we'll be happy to consider it...
I generally think that any of the backends could have been made performant enough, but I also get the reason to use a multi-user-capable server if desired. To me it sounds just like "pick one that meets all needs" and move on with the project, which is what I think you've done, so I have no gripes with that.
To address the point of some posters who mentioned performance on MySQL: I am a MySQL performance expert and I'd be happy to help. Please feel free to email me. I should note that I've primarily used Rhythmbox because it's installed in Ubuntu by default, but been quite unhappy with its own performance. (And yes, I could also offer to help them, but I dunno, it never occurred to me -- and here we are, with a nice blog post about Amarok performance -- the immediate opportunity to help here is too obvious, and I'm lazy.)

