新闻聚合器

Drizzle on the Rackspace Cloud Blog

Planet MySQL - 周日, 2010/03/14 - 14:30

Adrian has talked about a few of us Drizzle Hackers joining Rackspace over at the Rackspace Cloud Blog.


PlanetMySQL Voting: Vote UP / Vote DOWN

Drizzle Developer Day 2010

Planet MySQL - 周日, 2010/03/14 - 13:46

Hi one and all!

Interested in database systems? Interested because you use them? Because you manage them? Write SQL that goes to them? Or are you one of the people of questionable sanity like myself who develops them?

Well… do we have the offer for you.

Friday, April 16th. Right after the MySQL Conference and Expo at the Santa Clara Convention Center, you can come along to the Drizzle Developer Day.

You will want to add your name to this wiki page: http://drizzle.org/wiki/Drizzle_Developer_Day_2010_signup

Suggest topics over at:
http://drizzle.org/wiki/Drizzle_Developer_Day_2010

Hope to see you there!


PlanetMySQL Voting: Vote UP / Vote DOWN

Revisiting and defending my Tweets from NoSQL Live in Boston

Planet MySQL - 周日, 2010/03/14 - 04:38
Some people have been confused by my Twitter stream from last week's NoSQL Live in Boston conference.  I've never been very good at staying "on message", or adhering to a set of "talking points".  Some people thought I was there to "defend the status quo", or to defend and promote memcached, or memcached+mysql.

True, I was there in part to teach about and promote memcached, and especially Gear6 Memcached.  I have a great employer, but they are not sending me to conferences because they are a charity.  I taught the memcached  breakout session, and also worked the "hallway track", giving a number of people a crash course in what memcache is and what makes it useful, and handed out a pile of business cards.

As for my tweets and pithy statements... Well, some over-simplification has to happen to reduce a concept to 140 characters

My statement, "NoSQL as cheaper to deploy, manage, and maintain is a myth. it costs just as much, if not more", which I said that into the mike at the start of the scaling panel, was very popular to tweet and retweet.

It's something that is a "Jedi Truth", it's true from a "certain point of view".  When you add together the costs of the much shallower "bench" of hirable operational experience, the increased "minimum useful hardware footprint"  (which seems to be about at least 5 full storage nodes), and the evolving maturity of the client libraries, and such, NoSQL is not going to save you money.  Until an important threshold is reached.  When you scale and/or your data representation "impedance mismatch" hits that nasty inflection point, where the "buck for bang" curve suddenly starts to rise hard, and it looks like you will need to start spending infinite amounts of money to keep growing.  Then the NoSQL approach does become cheaper, because it's actually doable.  And it's probably wise to start considering, researching, and then migrating to NoSQL before you hit that wall.

My statement "people have been wanting DBA-less databases about as long as they have wanted programmer-less programming languages" was also popular.  I stand by it.  NoSQL doesn't crack this nut, nothing ever well.  Some NoSQL solutions look like they are "DBA-less", such as AWS SDB, AWS RDB, FluidDB.  Those systems are not DBA-less, they have DBAs, just that the cost of the DBA is "hidden" in the per-drink rental cost of those systems, instead of sitting on your balance sheet as a salary.

The statement "Twitter is using Cassandra because bursty writes are cheap, compared to others" is something I said not because I knew it, but because I just learned it, and I was a bit surprised by it.  I think that the original statement was by Ryan King of Twitter, who was also on the scaling panel.

My statement "Memcached should be integrated into all NoSQL stores" is something I also firmly believe.  The very-high-performance in-memory distributed key value store is a very useful building block for larger systems, and I think that whatever larger NoSQL systems we end up will use it as a component of their internal implementations.

The statement "being able to drop nodes as important as being able to add, because scalability is pointless w/o reliability" was also by Ryan King.  I tweeted it because it is very much something worth broadcasting and remembering it.  It has a little more context in his next statement "The first day we stood up our #cassandra cluster, 2 nodes had hdd die... Clients never noticed."  Machines fail.  And as they get faster and cheaper, and as clusters get bigger, machine failure must become something that must not be any sort of emergency.

My statement "open source means folks dont need a standards body" was a extreme simplication of part of Sandro Hawke's talk.  I tweeted it because it was something I've felt to be mostly true enough for a long time, and it was nice to see someone else recognize it.  As Sandro stated later in twitter, "I think I added an important "sometimes"!".  And he is correct.  It's not true as an absolute statement.



All in all, NoSQL Live was a very good conference.  I felt that the speakers taught and learned, all the other attendees taught and learned, and the networking and hallway track was first rate.  Thanks to 10gen for organizing it, and being entirely fair to their "competition" in the rapidly growing and evolving NoSQL space.
PlanetMySQL Voting: Vote UP / Vote DOWN

Recent Work on Improving Drizzle’s Storage Engine API

Planet MySQL - 周六, 2010/03/13 - 15:07

Over the past six weeks or so, I have been working on cleaning up the pluggable storage engine API in Drizzle.  I’d like to describe some of this work and talk a bit about the next steps I’m taking in the coming months as we roll towards implementing Log Shipping in Drizzle.

First, how did it come about that I started working on the storage engine API?

From Commands to Transactions

Well, it really goes back to my work on Drizzle’s replication system.  I had implemented a simple, fast, and extensible log which stored records of the data changes made to a server.  Originally, the log was called the Command Log, because the Google Protobuffer messages it contained were called message::Commands.  The API  for implementing replication plugins was very simple and within a month or so of debuting the API, quite a few replication plugins had been built, including one replicating to Memcached, a prototype one replicating to Gearman, and a filtering replicator plugin.

In addition, Marcus Eriksson had created the RabbitReplication project which could replicate from Drizzle to other data stores, including Cassandra and Project Valdemort.  However, Marcus did not actually implement any C/C++ plugins using the Drizzle replication API.  Instead, RabbitReplication simply read the new Command Log, which due to it simply being a file full of Google Protobuffer messages, was quick and easy to read into memory using a variety of different programming languages.  RabbitReplication is written in Java, and it was great to see other programming languages be able to read Drizzle’s replication log so easily.  Marcus later coded up a C++ TransactionApplier plugin which replaces the Drizzle replication log and instead replicates the GPB messages directly to RabbitMQ.

And there, you’ll note that one of the plugins involved in Drizzle’s replication system is called TransactionApplier.  It used to be called CommandApplier. That was because the GPB Command messages were individual row change events for the most part.  However, I made a series of changes to the replication API and now the GPB messages sent through the APIs are of class message::Transaction.  message::Transaction objects contain a transaction context, with information about the transaction’s start and end time, it’s transaction identifer, along with a series of message::Statement objects, each of which representing a part of the data changes that the SQL transaction made.

Thus, the Command Log now turned into the Transaction Log, and everywhere the term Command was used now was replaced with the terms Transaction and Statement (depending on whether you were talking about the entire Transaction or a piece of it). Log entries were now written at COMMIT to the Transaction Log and were not written if no COMMIT occurred1.

After finishing this work to make the transaction log write Transaction messages at commit time, I was keen to begin coding up the publisher and subscriber plugins which represent a node in the replication environment. However, Brian had asked me to delay working on other replication features and ensure that the replication API could support fully distributed transactions via the X/Open XA distributed transaction protocol. XA support had been removed from Drizzle when the MySQL binlog and original replication system was ripped out and needed some TLC. Fair enough, I said. So, off I went to work on XA.

If Only It Were Simple…

As anyone who has worked on the MySQL source code or developed storage engines for MySQL knows, working with the MySQL pluggable storage engine API is sometimes not the easiest or most straightforward thing. I think the biggest problem with the MySQL storage engine API is that, due to understandable historical reasons, it’s an API that was designed with the MyISAM and HEAP storage engines in mind. Much of the transactional pieces of the API seem to be a bolted-on afterthought and can be very confusing to work with.

As an example, Paul McCullagh, developer of the transactional storage engine PBXT, recently emailed the mysql internals mailing list asking how the storage engine could tell when a SQL statement started and ended. You would think that such a seemingly basic functionality would have a simple answer. You’d be wrong. Monty Widenius answered like this:

Why not simply have a counter in your transaction object for how start_stmt – reset(); When this is 0 then you know stmnt ended.

In Maria we count number of calls to external_lock() and when the sum goes to 0 we know the transaction has ended.

To this, Mark Callaghan responded:

Why does the solution need to be so obscure?

Monty answered (emphasis mine):

Historic reasons.

MySQL never kept a count of which handlers are used by a transaction, only which tables.

So the original logic was that external_lock(lock/unlock) is called for each usage of the table, which is normally more than enough information for a handler to know when a statement starts/ends.

The one case this didn’t work was in the case someone does lock tables as then external_lock is not called per statement. It was to satisfy this case that we added a call to start_stmt() for each table.

It’s of course possible to change things so that start_stmt() / end_stmt() would be called once per used handler, but this would be yet another overhead for the upper level to do which the current handlers that tracks call to external_lock() doesn’t need.

Well, in Drizzle-land, we aren’t beholden to “historic reasons” So, after looking through the in-need-of-attention transaction processing code in the kernel, I decided that I would clean up the API so that storage engines did not have to jump through hoops to notify the kernel they participate in a transaction or just to figure out when a statement and a transaction started and ended.

The resulting changes to the API are quite dramatic I think, but I’ll leave it to the storage engine developers to tell me if the changes are good or not. The following is a summary of the changes to the storage engine API that I committed in the last few weeks.

plugin::StorageEngine Split Into Subclasses

The very first thing I did was to split the enormous base plugin class for a storage engine, plugin::StorageEngine, into two other subclasses containing transactional elements. plugin::TransactionalStorageEngine is now the base class for all storage engines which implement SQL transactions:

/** * A type of storage engine which supports SQL transactions. * * This class adds the SQL transactional API to the regular * storage engine. In other words, it adds support for the * following SQL statements: * * START TRANSACTION; * COMMIT; * ROLLBACK; * ROLLBACK TO SAVEPOINT; * SET SAVEPOINT; * RELEASE SAVEPOINT; */ class TransactionalStorageEngine :public StorageEngine { public: TransactionalStorageEngine(const std::string name_arg, const std::bitset<HTON_BIT_SIZE> &flags_arg= HTON_NO_FLAGS);   virtual ~TransactionalStorageEngine(); ... private: void setTransactionReadWrite(Session& session);   /* * Indicates to a storage engine the start of a * new SQL transaction. This is called ONLY in the following * scenarios: * * 1) An explicit BEGIN WORK/START TRANSACTION is called * 2) After an explicit COMMIT AND CHAIN is called * 3) After an explicit ROLLBACK AND RELEASE is called * 4) When in AUTOCOMMIT mode and directly before a new * SQL statement is started. */ virtual int doStartTransaction(Session *session, start_transaction_option_t options) { (void) session; (void) options; return 0; }   /** * Implementing classes should override these to provide savepoint * functionality. */ virtual int doSetSavepoint(Session *session, NamedSavepoint &savepoint)= 0; virtual int doRollbackToSavepoint(Session *session, NamedSavepoint &savepoint)= 0; virtual int doReleaseSavepoint(Session *session, NamedSavepoint &savepoint)= 0;   /** * Commits either the "statement transaction" or the "normal transaction". * * @param[in] The Session * @param[in] true if it's a real commit, that makes persistent changes * false if it's not in fact a commit but an end of the * statement that is part of the transaction. * @note * * 'normal_transaction' is also false in auto-commit mode where 'end of statement' * and 'real commit' mean the same event. */ virtual int doCommit(Session *session, bool normal_transaction)= 0;   /** * Rolls back either the "statement transaction" or the "normal transaction". * * @param[in] The Session * @param[in] true if it's a real commit, that makes persistent changes * false if it's not in fact a commit but an end of the * statement that is part of the transaction. * @note * * 'normal_transaction' is also false in auto-commit mode where 'end of statement' * and 'real commit' mean the same event. */ virtual int doRollback(Session *session, bool normal_transaction)= 0; virtual int doReleaseTemporaryLatches(Session *session) { (void) session; return 0; } virtual int doStartConsistentSnapshot(Session *session) { (void) session; return 0; } };

As you can see, plugin::TransactionalStorageEngine inherits from plugin::StorageEngine and extends it with a series of private pure virtual methods that implement the SQL transaction parts of a query — doCommit(), doRollback(), etc. Implementing classes simply inherit from plugin::TransactionalStorageEngine and implement their internal transaction processing in these private methods.

In addition to the SQL transaction, however, is the concept of an XA transaction, which is for distributed transaction coordination. The XA protocol is a two-phase commit protocol because it implements a PREPARE step before a COMMIT occurs. This XA API is exposed via two other classes, plugin::XaResourceManager and plugin::XaStorageEngine. plugin::XaResourceManager derived classes implement the resource manager API of the XA protocol. plugin::XaStorageEngine is a storage engine subclass which, while also implementing SQL transactions, also implements XA transactions.

Here is the plugin::XaResourceManager class:

/** * An abstract interface class which exposes the participation * of implementing classes in distributed transactions in the XA protocol. */ class XaResourceManager { public: XaResourceManager() {} virtual ~XaResourceManager() {} ... private: /** * Does the COMMIT stage of the two-phase commit. */ virtual int doXaCommit(Session *session, bool normal_transaction)= 0; /** * Does the ROLLBACK stage of the two-phase commit. */ virtual int doXaRollback(Session *session, bool normal_transaction)= 0; /** * Does the PREPARE stage of the two-phase commit. */ virtual int doXaPrepare(Session *session, bool normal_transaction)= 0; /** * Rolls back a transaction identified by a XID. */ virtual int doXaRollbackXid(XID *xid)= 0; /** * Commits a transaction identified by a XID. */ virtual int doXaCommitXid(XID *xid)= 0; /** * Notifies the transaction manager of any transactions * which had been marked prepared but not committed at * crash time or that have been heurtistically completed * by the storage engine. * * @param[out] Reference to a vector of XIDs to add to * * @retval * Returns the number of transactions left to recover * for this engine. */ virtual int doXaRecover(XID * append_to, size_t len)= 0; };

and here is the plugin::XaStorageEngine class:

/** * A type of storage engine which supports distributed * transactions in the XA protocol. */ class XaStorageEngine :public TransactionalStorageEngine, public XaResourceManager { public: XaStorageEngine(const std::string name_arg, const std::bitset<HTON_BIT_SIZE> &flags_arg= HTON_NO_FLAGS);   virtual ~XaStorageEngine(); ... };

Pretty clear. A plugin::XaStorageEngine inherits from both plugin::TransactionStorageEngine and plugin::XaResourceManager because it implements both SQL transactions and XA transactions. The InnobaseEngine is a plugin which inherits from plugin::XaStorageEngine because InnoDB supports SQL transactions as well as XA.

Explicit Statement and Transaction Boundaries

The second major change I made addressed the problem that Mark Callaghan noted in asking why finding out when a statement starts and ends was so obscure. I added two new methods to plugin::StorageEngine called doStartStatement() and doEndStatement(). The kernel now explicitly tells storage engines when a SQL statement starts and ends. This happens before any calls to Cursor::external_lock() happen, and there are no exception cases. In addition, the kernel now always tells transactional storage engines when a new SQL transaction is starting. It does this via an explicit call to plugin::TransactionalStorageEngine::doStartTransaction(). No exceptions, and yes, even for DDL operations.

What this means is that for a transactional storage engine, it no longer needs to “count the calls to Cursor::external_lock()” in order to know when a statement or transaction starts and ends. For a SQL transaction, this means that there is a clear code call path and there is no need for the storage engine to track whether the session is in AUTOCOMMIT mode or not. The kernel does all that work for the storage engine. Imagine a Session executes a single INSERT statement against an InnoDB table while in AUTOCOMMIT mode. This is what the call path looks like:

drizzled::Statement::Insert::execute() | -> drizzled::mysql_lock_tables() | -> drizzled::TransactionServices::registerResourceForTransaction() | -> drizzled::plugin::TransactionalStorageEngine::startTransaction() | -> InnobaseEngine::doStartTransaction() | -> drizzled::plugin::StorageEngine::startStatement() | -> InnobaseEngine::doStartStatement() | -> drizzled::plugin::StorageEngine::getCursor() | -> drizzled::Cursor::write_row() | -> InnobaseCursor::write_row() | -> drizzled::TransactionServices::autocommitOrRollback() | -> drizzled::plugin::TransactionStorageEngine::commit() | -> InnobaseEngine::doCommit()

I think this will come as a welcome change to storage engine developers working with Drizzle.

No More Need for Engine to Call trans_register_ha()

There was an interesting comment in the original documentation for the transaction processing code. It read:

Roles and responsibilities
————————–

The server has no way to know that an engine participates in
the statement and a transaction has been started
in it unless the engine says so. Thus, in order to be
a part of a transaction, the engine must “register” itself.
This is done by invoking trans_register_ha() server call.
Normally the engine registers itself whenever handler::external_lock()
is called. trans_register_ha() can be invoked many times: if
an engine is already registered, the call does nothing.
In case autocommit is not set, the engine must register itself
twice — both in the statement list and in the normal transaction
list.

That comment, and I’ve read it dozens of times, always seemed strange to me. I mean, does the server really not know that an engine participates in a statement or transaction unless the engine tells it? Of course not.

So, I removed the need for a storage engine to “register itself” with the kernel. Now, the transaction manager inside the Drizzle kernel (implemented in the TransactionServices component) automatically monitors which engines are participating in an SQL transaction and the engine doesn’t need to do anything to register itself.

In addition, due to the break-up of the plugin::StorageEngine class and the XA API into plugin::XaResourceManager, Drizzle’s transaction manager can now coordinate XA transactions from plugins other than storage engines. Yep, that’s right. Any plugin which implements plugin::XaResourceManager can participate in an XA transaction and Drizzle will act as the transaction manager. What’s the first plugin that will do this? Drizzle’s transaction log. The transaction log isn’t a storage engine, but it is able to participate in an XA transaction, so it will implement plugin::XaResourceManager but not plugin::StorageEngine.

Performance Impact of Code Changes

So, that “yet another overhead” Monty talked about in the quote above? There wasn’t any noticeable impact in performance or scalability at all. So much for optimize-first coding.

What’s Next?

The next thing I’m working on is removing the notion of the “statement transaction”, which is also a historical by-product, this time because of BerkeleyDB. Gee, I’ve got a lot of work ahead of me…

[1] Actually, there is a way that a transaction that was rolled back can get written to the transaction log. For bulk operations, the server can cut a Transaction message into multiple segments, and if the SQL transaction is rolled back, a special RollbackStatement message is written to the transaction log.


PlanetMySQL Voting: Vote UP / Vote DOWN

Index only

Planet MySQL - 周六, 2010/03/13 - 10:29
A problem with SQL is SQL. It is easy to write queries that require random IO in the worst case. It is usually easy to find queries that do too much random IO on a NoSQL system as you must code the extra data fetches manually.

Digg has begun to write about their reasons for migrating from MySQL to Cassandra. They provide an excellent summary and then describe a performance problem fixed by the migration. I think Cassandra and a few other members of the NoSQL family are amazing technology but I don't think a migration was needed to fix this performance problem. A better index on the Diggs table would have done that. Others have said the same thing. Maybe I don't have all of the details. I can only go on what was written in the blog.

You can learn more about the power of indexes at the MySQL conference.

The Diggs table was the source of the problem:
CREATE TABLE Diggs (
  id      INT(11),
  itemid  INT(11),
  userid  INT(11),
  digdate DATETIME,
  PRIMARY KEY (id),
  KEY user  (userid),
  KEY item  (itemid)
) ENGINE=InnoDB;It supported an important query that was too slow. A simple form of this query is:
SELECT digdate, id
FROM Diggs
WHERE userid in (10, 20, 30) AND itemid = 50
ORDER BY digdate DESC, id DESC LIMIT 4;This query requires too much random IO because it isn't index only. The query can use either the index on itemid or the index on userid. In both cases it will scan more entries than it needs to from the secondary index and then lookup the remaining columns from the primary index. Each lookup on the primary index can do one disk seek. On my test server the plan for this query is:
id    select_type    table    type    possible_keys    key    key_len    ref    rows    Extra
1    SIMPLE    Diggs    ref    user,item    item    5    const    8960    Using where; Using filesortAfter running the query I ran SHOW SESSION STATUS LIKE "Handler_read%" and the result from that is below. The query scanned 5000 entries from the secondary index and would have done more than 5000 disk seeks in the worst case to lookup columns from the primary key index.

Variable_name    Value
Handler_read_first    0
Handler_read_key    3
Handler_read_next    5000
Handler_read_prev    0
Handler_read_rnd    0
Handler_read_rnd_next    0The query is much faster for a table with different indexes
CREATE TABLE DiggsFast (
  id      INT(11),
  itemid  INT(11),
  userid  INT(11),
  digdate DATETIME,
  PRIMARY KEY (itemid,userid,digdate,id),
  UNIQUE KEY (id)
) ENGINE=InnoDB;
The query has a better plan:
id    select_type    table    type    possible_keys    key    key_len    ref    rows    Extra
1    SIMPLE    DiggsFast    range    PRIMARY    PRIMARY    8    NULL    149    Using where; Using index; Using filesort It also is much better in reality. The output from SHOW SESSION STATUS LIKE "Handler_read%" is listed below. With a better index the query scans 150 entries from 5 range scans of the index. It should do about 5 disk seeks in the worst case. It is also index only so it doesn't have to lookup other columns after the index scan. Although that doesn't matter much in this case because the query uses the primary key index which has all columns for an InnoDB table. This query will be much faster than the previous one (5 disk seeks versus 5000).
Variable_name    Value
Handler_read_first    0
Handler_read_key    5
Handler_read_next    150
Handler_read_prev    0
Handler_read_rnd    0
Handler_read_rnd_next    0
PlanetMySQL Voting: Vote UP / Vote DOWN

Understanding Drizzle user authentication options &#8211; Part 2

Planet MySQL - 周六, 2010/03/13 - 06:45

A key differentiator in Drizzle from it’s original MySQL roots is user based authentication. Gone is the host/user and schema/table/column model that was stored in the MyISAM based mysql.user table.

Authentication is now completely pluggable, leveraging existing systems such as PAM, LDAP via PAM and Http authentication.

In this post I’ll talk about HTTP authentication which requires an external http server to implement successfully. You can look at Part 1 for PAM authentication.

Compiling for http auth support

By default during compilation you may find.

checking for libcurl... no configure: WARNING: libcurl development lib not found: not building auth_http plugin. On Debian this is found in libcurl4-gnutls-dev. On RedHat it's in libcurl-devel.

In my case I needed:

$ sudo yum install curl-devel

NOTE: Bug #527255 talks about issues of the message being incorrect for libcurl-devel however this appears it may be valid in Fedora Installs

After successfully installing the necessary pre-requisite you should see.

checking for libcurl... yes checking how to link with libcurl... -lcurl checking if libcurl has CURLOPT_USERNAME... no HTTP Authentication

We need to enable the plugin at server startup.

$ sbin/drizzled --mysql-protocol-port=3399 --plugin_add=auth_http &

You need to ensure the auth_http plugin is active by checking the data dictionary plugin table.

drizzle> select * from data_dictionary.plugins where plugin_name='auth_http'; +-------------+----------------+-----------+-------------+ | PLUGIN_NAME | PLUGIN_TYPE | IS_ACTIVE | MODULE_NAME | +-------------+----------------+-----------+-------------+ | auth_http | Authentication | TRUE | | +-------------+----------------+-----------+-------------+

The auth_http plugin also has the following system variables.

drizzle> SHOW GLOBAL VARIABLES LIKE '%http%'; +------------------+-------------------+ | Variable_name | Value | +------------------+-------------------+ | auth_http_enable | OFF | | auth_http_url | http://localhost/ | +------------------+-------------------+ 2 rows in set (0 sec)

In order to configure Http authentication, you need to have the following settings added to your drizzled.cnf file. For example:

$ cat etc/drizzled.cnf [drizzled] auth_http_enable=TRUE auth_http_url=http://thedrizzler.com/auth

NOTE: Replace the domain name with something you have, even localhost.

A Drizzle restart gives us

$ bin/drizzle -e "SHOW GLOBAL VARIABLES LIKE 'auth_http%'" +------------------+-----------------------------+ | Variable_name | Value | +------------------+-----------------------------+ | auth_http_enable | ON | | auth_http_url | http://thedrizzler.com/auth | +------------------+-----------------------------+

By default, currently if the settings result in an invalid url, then account validation does not fail and you can still login. It is recommended that you always configure pam authentication as well as a fall back.

$ wget -O tmp http://thedrizzler.com/auth --17:32:32-- http://thedrizzler.com/auth Resolving thedrizzler.com... 208.43.73.220 Connecting to thedrizzler.com|208.43.73.220|:80... connected. HTTP request sent, awaiting response... 404 Not Found 17:32:32 ERROR 404: Not Found. $ bin/drizzle drizzle > exit Configuring passwords

To correctly configured your web server to perform the HTTP auth, you can use this Apache syntax as an example.

The following is added to the VirtualHost entry in your web browser.

<Directory /var/www/drizzle/auth> AllowOverride FileInfo All AuthConfig AuthType Basic AuthName "Drizzle Access Only" AuthUserFile /home/drizzle/.authentication Require valid-user </Directory> $ sudo su - $ mkdir /var/www/drizzle/auth $ touch /var/www/drizzle/auth/index.htm $ apachectl graceful

We check we now need permissions for the URL.

$ wget -O tmp http://thedrizzler.com/auth --17:35:48-- http://thedrizzler.com/auth Resolving thedrizzler.com... 208.43.73.220 Connecting to thedrizzler.com|208.43.73.220|:80... connected. HTTP request sent, awaiting response... 401 Authorization Required Authorization failed.

You need to create the username/password for access.

$ htpasswd -cb /home/drizzle/.authentication testuser sakila $ cat /home/drizzle/.authentication testuser:85/7CbdeVql4E

Confirm that the http auth with correct user/password works.

$ wget -O tmp http://thedrizzler.com/auth --user=testuser --password=sakila --17:37:45-- http://thedrizzler.com/auth Resolving thedrizzler.com... 208.43.73.220 Connecting to thedrizzler.com|208.43.73.220|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Drizzle HTTP Authentication in action

By default we now can’t login

$ bin/drizzle ERROR 1045 (28000): Access denied for user ''@'127.0.0.1' (using password: NO) $ bin/drizzle --user=testuser --password=sakila999 ERROR 1045 (28000): Access denied for user 'testuser'@'127.0.0.1' (using password: YES) $ bin/drizzle --user=testuser --password=sakila Welcome to the Drizzle client.. Commands end with ; or \g. Your Drizzle connection id is 6 Server version: 7 Source distribution (trunk) Type 'help;' or '\h' for help. Type '\c' to clear the buffer. drizzle>
PlanetMySQL Voting: Vote UP / Vote DOWN

Understanding Drizzle user authentication options &#8211; Part 1

Planet MySQL - 周六, 2010/03/13 - 05:46

A key differentiator in Drizzle from it’s original MySQL roots is user based authentication. Gone is the host/user and schema/table/column model that was stored in the MyISAM based mysql.user table.

Authentication is now completely pluggable, leveraging existing systems such as PAM, LDAP via PAM and Http authentication.

In this post I’ll talk about PAM authentication which is effectively your current Linux based user security.

This information is based on the current build 1317.

Compiling for PAM support

Your Drizzle environment needs to be compiled with PAM support. You would have received the following warning during a configure.

$ ./configure ... checking for libpam... no configure: WARNING: Couldn't find PAM development support, pam_auth will not be built. On Debian, libpam is in libpam0g-dev. On RedHat it's in pam-devel.

The solution is provided in the warning message which is another great thing about Drizzle. The pre checks for dependencies and the optional messages like these far exceed the MySQL equivalent compilation process. In my case:

$ sudo yum install pam-devel

When correctly configured, it should look like:

checking for libpam... yes checking how to link with libpam... -lpam Working with PAM

You need to enable the PAM authentication plugin at drizzled startup.

sbin/drizzled --plugin_add=auth_pam &

Unfortunately connecting fails to work with

time sbin/drizzle --user=testuser --password=***** --port=4427 real 0m0.003s user 0m0.003s sys 0m0.001s

A look into the source at src/drizzle-2010.03.1317/plugin/auth_pam/auth_pam.cc shows a needed config file

117 retval= pam_start("check_user", userinfo.name, &conv_info, &pamh); Configuring PAM

In order to enable PAM with Drizzle you need to have the following system configuration.

$ cat /etc/pam.d/check_user auth required pam_unix.so account required pam_unix.so $ time sbin/drizzle --user=testuser --password=***** --port=4427 ERROR 1045 (28000): Access denied for user 'testuser'@'127.0.0.1' (using password: YES) real 0m2.055s user 0m0.002s sys 0m0.002s

This did some validation but still failed.

It seems Bug #484069 may fix this problem, however this is not currently in the main line!

Stay Tuned!

You first need to ensure the pam plugin is active by checking the data dictionary plugin table.

drizzle> select * from data_dictionary.plugins where plugin_name=’pam’; +————-+—————-+—————+—————+————————–+—————-+ | PLUGIN_NAME | PLUGIN_VERSION | PLUGIN_STATUS | PLUGIN_AUTHOR | PLUGIN_DESCRIPTION | PLUGIN_LICENSE | +————-+—————-+—————+—————+————————–+—————-+ | pam | 0.1 | ACTIVE | Brian Aker | PAM based authenication. | GPL | +————-+—————-+—————+—————+————————–+—————-+ 1 row in set (0 sec)

-->


PlanetMySQL Voting: Vote UP / Vote DOWN

Thoughts about working in a distributed organization

Planet MySQL - 周六, 2010/03/13 - 04:00

I've been working in a fully distributed work environment for almost 8 years now (I joined MySQL AB in April, 2002). Therefore I've been reading Toni Schneider's blog post about the "5 reasons why your company should be distributed" with great interest – he raised several points that I fully agree with and which I covered in my talks about "Working for a virtual company - how we do it at MySQL" at last year's next09 conference (slides, video) and at FrOSCon 2009 (video).

However, Toni draws a profusely positive picture here, or, as my dear colleague Dean pointed out "The blog overly simplifies the realities of a distributed workforce, making it sound like it's all ponies and rainbows".


Continue reading "Thoughts about working in a distributed organization"
PlanetMySQL Voting: Vote UP / Vote DOWN

451 CAOS Links 2010.03.12

Planet MySQL - 周六, 2010/03/13 - 02:06

Updating the MPL. Funding for Lucid and eXo. StatusNet. And more.

Follow 451 CAOS Links live @caostheory on Twitter and Identi.ca
“Tracking the open source news wires, so you don’t have to.”

Updating the MPL
# ZDnet reported that the 10-year-old Mozilla Public License will be updated by the end of 2010, while Mitchell Baker explained the process.

Funding for Lucid and eXo
# Lucid Imagination raised $10m in series B funding from Shasta Ventures, Granite Ventures and Walden International.

# eXo Platform raised $6m from Auriga Partners and XAnge Capital and confirmed Bob Bickel as its chairman.

Status check
# StatusNet launched the StatusNet Cloud Service (SCS) into public beta, while OStatic published a Q&A with StatusNet’s CEO on the future of the open source microblogging platform provider.

Busy week for Simon Phipps
# Sun’s chief open source officer, Simon Phipps confirmed he will not be joining Oracle, but also confirmed his election as director of the Open Source Initiative. Joe Brockmeier asked if Simon Phipps will be able to energize the OSI.

# Jay Pipes confirmed that he and many of the Sun Drizzle team are now working at Rackspace Cloud.

# Open Source for America responded to the IIPA’s attack on open source.

# In the first of a series of article’s OpenNMS’s Tarus Balog explained what it takes to build an open source business. http://bit.ly/cTbQq1

# Bloomberg reported, and Elliot denied, that it plans to sell Novell’s NetWare and Linux units.

# Engine Yard claimed to have tripled its customer base in the last six months to reach 1,000 customers.

# SpringSource introduced SpringSource tc Server Spring Edition.

# Dirk Riehle outlined the three areas of open source economics.

# The VAR Guy speculated about Red Hat’s apparently imminent move into business intelligence.

# Digg explained its move from MySQL to Apache Cassandra.

# Appcelerator Titanium 1.0 is now generally available.

# SugarCRM launched its Open+ Partner Program.

# Terracotta announced the availability of Ehcache 2.0 as well as upgrades to Terracotta Web Sessions.

# MySQL/Memcached appliance vendor Schooner was ranked 34th on the WSJ’s list of the top 50 venture-backed companies, while Groundwork Open Source was ranked 28th.

# OSS Watch published an explanation of how the threat to copyleft licenses is not proliferation, but incompatibility.

# A short but sweet explanation of Cloudera’s formation and raison d’être.

# An interview with WaveMaker CEO Chris Keene on commercial open source licensing, business, community strategies.

# Squiz updated its MySource Matrix open source CMS with formal support for Funnelback Search.

# The creators of the Hypertable open source distributed (NoSQL) database have formed Hypertable Inc.


PlanetMySQL Voting: Vote UP / Vote DOWN

Log Buffer #182, a Carnival of the Vanities for DBAs

Planet MySQL - 周六, 2010/03/13 - 01:01

This is the 182nd edition of Log Buffer, the weekly review of database blogs. Make sure to read the whole edition so you do not miss where to submit your SQL limerick!

This week started out with me posting about International Women’s Day, and has me personally attending Confoo (Montreal) which is an excellent conference I hope to return to next year. I learned a lot from confoo, especially the blending nosql and sql session I attended.

This week was also the Hotsos Symposium. Doug’s Oracle Blog has a series of posts about Hotsos. If all this talk about conferences has gotten you excited, Joshua Drake notes that 14 days and the hotel is almost full for postgresql conference east which is March 25th-28th in Philadelphia. And the Oracle database insider notes that the Oracle OpenWorld call for papers is now open.

According to Susan Visser this week (ending tomorrow) is also read an e-book week. So if you have not already done so, read an e-book! She links a coupon for an e-book in the post.

Craig Mullins notes that the mainframe is a good career choice in Mainframes: The Safe IT Career Choice. He notes that the mainframe is still not dead:

People having been predicting the death of the mainframe since the advent of client/server in the late 1980s. That is more than 20 years! Think of all the things that have died in that timespan while the mainframe keeps on chugging away: IBM’s PC business, Circuit City, Koogle peanut butter, public pay phones, Johnny Cash… the list is endless.

In other career-related news, Antonio Cangiano is looking for [2] top-notch student hackers for a 16-month internship at IBM in Toronto starting in May. All the details, including how to apply, are in Cangiano’s blog post.

Willie Favero wants to know how you “solve the batch dilemma” for issues like “shrinking your batch window, designing your batch to play nicely with … OLTP” in how’s your batch workload doing? Perhaps Favero should read the updated batch best practices posted by Anthony Shorten.

Bryan Smith surveys a more personal question by asking if you go both ways and “manage both DB2 for Linux, UNIX, and Windows and DB2 for z/OS” in don’t ask, don’t tell, bi-platform DBAs. This week’s Log Buffer editor admits to being a tri-platform DBA — she has tried many platforms, and in fact, many databases (MySQL, Oracle, DB2, SQL Server, Sybase, Postgres and Ingres)!

Hari Prasanna Srinivasan promotes a patching survey in Oracle really wants to hear from you! Patching Survey.

Henrik Loeser explains what a deadlock and a hot spot are by using a real life analogy taken from a police report in deadlock and hot spot in real life.

Jamie Thomson asks why do you abbreviate schema names?. Shlomi Noach tries to solve the issue that “there is no consistent convention as for how to write [about table aliases in] an SQL query” in proper sql table alias use conventions. Noach also gives us a tip: faster than truncate.

Leons Petrazickis reminds us that “rulesets are chains” and it is important to have your rulesets in the proper order in iptables firewall pitfall.

Anyone interested in the history of MySQL AB will be informed after reading Dries Buytaert’s article.
Gavin Towey shares his software that helps centrally manage 120 MySQL servers in qsh.pl: distributed query tool For those who want to learn more about column-oriented databases, particularly in MySQL, Robin Schumacher of the InfiniDB blog announces that there is a MySQL University session recording on MySQL column databases now available. MySQL join-fu expert Jay Pipes has moved his blog to www.joinfu.com and starts with An SQL Puzzle and of course a follow up on the sql puzzle.

Ivan Zoratti is happy that finally, slides posted for the MySQL DW breakfast. Venu Anuganti gives you tips on one of the most common MySQL frustrations: optimizing subqueries in how to improve subqueries derived tables performance. Justin Swanhart posts the way in which he Gets Linux performance information from your MySQL database without shell access and emulates a ‘top’ CPU summary using /proc/stat and MySQL using the same method.

The Oracle Apps blog has an introduction to Oracle user productivity kit (UPK). Even though in this editor’s opinion the article is very sales-pitchy, it has valuable information, and does indeed live up to its promise:

UPK is a software tool that can capture all the steps in a system process. It records every keystroke, every click of the mouse, each menu option chosen and each button pressed. All this is done in the UPK Recorder by going through the transaction and pressing “printscreen” after every user action. From this, without any further effort from the developer, UPK builds a number of valuable outputs.

Allen White gives a great tip on how to optimize queries in keep your data clean.

Mike Dietrich reminds you to remove “old” parameters and events from your init.ora when upgrading, “as keeping them will definitely slow down the database performance in the new release.” He shows evidence of slowness when this is not done. Dietrich also shows how you can be gathering workload statistics “to give the optimizer some good knowledge about how powerful your IO-system might be”, especially “a few days after upgrading to the new release…while a real workload is running.”

Brian Aker shows the exciting features coming soon in Drizzle in Drizzle, Cherry, Roadmap for our Next Release.

Maybe you are thinking of migrating, not upgrading…..The O’Reilly Radar shows how to asses an Oracle to MySQL migration in MySQL migration and risk management. Actually, that article interviews Ronald Bradford on the subject — Bradford has been prolific lately, updating free my.cnf advice series and “Don’t Assume”: MySQL for the Oracle DBA series. Nick Quarmby also talks about migrating Oracle, but not to a new database, just to a new platform, in his primer on migrating Oracle Applications to new platforms. And the big news comes from Carlos of dataprix that Twitter will migrate from MySQL to Cassandra DB.

Paul S. Randal explains his way of benchmarking: 1 Tb table population on SQL Server.

Pete Finnigan shares his slides from a webinar on how to secure oracle, and Denis Pilipchuk shares his approaches for discovering security vulnerabilities in software applications.

Jeff Davis shares his thoughts about scalability and the relational model. Robert Treat responds actually, the relational model doesn’t scale and Baron Schwartz counters with NoSQL doesn’t mean non-relational.

Buck Woody explains “whenever you want to know something about SQL Server’s configuration, whether that’s the Instance itself or a database, you have a few options” — and of course what those options are — in system variables, stored procedures or functions for meta data.

This week’s T-SQL Tuesday topic was I/O. There are many links to great blog posts in the comments; three random posts I chose to highlight: Michael Zilberstein talks about IO capacity planning, while Kalen Delaney talks about using STATISTICS IO in I/O, you know, and Merrill Aldrich chimes in with information on real world SSD’s. Aldrich also begs folks not to waste resources and make more work for developers and DBAs in dear ISV, you’re keeping me awake nights with your VARCHAR() dates.

And we end with a bit of fin: Paul Nielsen wants us all to have a bit of fun; he has posted an SQL limerick and asks readers to create there own in there once was in Dublin a query.


PlanetMySQL Voting: Vote UP / Vote DOWN

Log Buffer #182, a Carnival of the Vanities for DBAs

Pythian Group - 周六, 2010/03/13 - 01:01

This is the 182nd edition of Log Buffer, the weekly review of database blogs. Make sure to read the whole edition so you do not miss where to submit your SQL limerick!

This week started out with me posting about International Women’s Day, and has me personally attending Confoo (Montreal) which is an excellent conference I hope to return to next year. I learned a lot from confoo, especially the blending nosql and sql session I attended.

This week was also the Hotsos Symposium. Doug’s Oracle Blog has a series of posts about Hotsos. If all this talk about conferences has gotten you excited, Joshua Drake notes that 14 days and the hotel is almost full for postgresql conference east which is March 25th-28th in Philadelphia. And the Oracle database insider notes that the Oracle OpenWorld call for papers is now open.

According to Susan Visser this week (ending tomorrow) is also read an e-book week. So if you have not already done so, read an e-book! She links a coupon for an e-book in the post.

Craig Mullins notes that the mainframe is a good career choice in Mainframes: The Safe IT Career Choice. He notes that the mainframe is still not dead:

People having been predicting the death of the mainframe since the advent of client/server in the late 1980s. That is more than 20 years! Think of all the things that have died in that timespan while the mainframe keeps on chugging away: IBM’s PC business, Circuit City, Koogle peanut butter, public pay phones, Johnny Cash… the list is endless.

In other career-related news, Antonio Cangiano is looking for [2] top-notch student hackers for a 16-month internship at IBM in Toronto starting in May. All the details, including how to apply, are in Cangiano’s blog post.

Willie Favero wants to know how you “solve the batch dilemma” for issues like “shrinking your batch window, designing your batch to play nicely with … OLTP” in how’s your batch workload doing? Perhaps Favero should read the updated batch best practices posted by Anthony Shorten.

Bryan Smith surveys a more personal question by asking if you go both ways and “manage both DB2 for Linux, UNIX, and Windows and DB2 for z/OS” in don’t ask, don’t tell, bi-platform DBAs. This week’s Log Buffer editor admits to being a tri-platform DBA — she has tried many platforms, and in fact, many databases (MySQL, Oracle, DB2, SQL Server, Sybase, Postgres and Ingres)!

Hari Prasanna Srinivasan promotes a patching survey in Oracle really wants to hear from you! Patching Survey.

Henrik Loeser explains what a deadlock and a hot spot are by using a real life analogy taken from a police report in deadlock and hot spot in real life.

Jamie Thomson asks why do you abbreviate schema names?. Shlomi Noach tries to solve the issue that “there is no consistent convention as for how to write [about table aliases in] an SQL query” in proper sql table alias use conventions. Noach also gives us a tip: faster than truncate.

Leons Petrazickis reminds us that “rulesets are chains” and it is important to have your rulesets in the proper order in iptables firewall pitfall.

Anyone interested in the history of MySQL AB will be informed after reading Dries Buytaert’s article.
Gavin Towey shares his software that helps centrally manage 120 MySQL servers in qsh.pl: distributed query tool For those who want to learn more about column-oriented databases, particularly in MySQL, Robin Schumacher of the InfiniDB blog announces that there is a MySQL University session recording on MySQL column databases now available. MySQL join-fu expert Jay Pipes has moved his blog to www.joinfu.com and starts with An SQL Puzzle and of course a follow up on the sql puzzle.

Ivan Zoratti is happy that finally, slides posted for the MySQL DW breakfast. Venu Anuganti gives you tips on one of the most common MySQL frustrations: optimizing subqueries in how to improve subqueries derived tables performance. Justin Swanhart posts the way in which he Gets Linux performance information from your MySQL database without shell access and emulates a ‘top’ CPU summary using /proc/stat and MySQL using the same method.

The Oracle Apps blog has an introduction to Oracle user productivity kit (UPK). Even though in this editor’s opinion the article is very sales-pitchy, it has valuable information, and does indeed live up to its promise:

UPK is a software tool that can capture all the steps in a system process. It records every keystroke, every click of the mouse, each menu option chosen and each button pressed. All this is done in the UPK Recorder by going through the transaction and pressing “printscreen” after every user action. From this, without any further effort from the developer, UPK builds a number of valuable outputs.

Allen White gives a great tip on how to optimize queries in keep your data clean.

Mike Dietrich reminds you to remove “old” parameters and events from your init.ora when upgrading, “as keeping them will definitely slow down the database performance in the new release.” He shows evidence of slowness when this is not done. Dietrich also shows how you can be gathering workload statistics “to give the optimizer some good knowledge about how powerful your IO-system might be”, especially “a few days after upgrading to the new release…while a real workload is running.”

Brian Aker shows the exciting features coming soon in Drizzle in Drizzle, Cherry, Roadmap for our Next Release.

Maybe you are thinking of migrating, not upgrading…..The O’Reilly Radar shows how to asses an Oracle to MySQL migration in MySQL migration and risk management. Actually, that article interviews Ronald Bradford on the subject — Bradford has been prolific lately, updating free my.cnf advice series and “Don’t Assume”: MySQL for the Oracle DBA series. Nick Quarmby also talks about migrating Oracle, but not to a new database, just to a new platform, in his primer on migrating Oracle Applications to new platforms. And the big news comes from Carlos of dataprix that Twitter will migrate from MySQL to Cassandra DB.

Paul S. Randal explains his way of benchmarking: 1 Tb table population on SQL Server.

Pete Finnigan shares his slides from a webinar on how to secure oracle, and Denis Pilipchuk shares his approaches for discovering security vulnerabilities in software applications.

Jeff Davis shares his thoughts about scalability and the relational model. Robert Treat responds actually, the relational model doesn’t scale and Baron Schwartz counters with NoSQL doesn’t mean non-relational.

Buck Woody explains “whenever you want to know something about SQL Server’s configuration, whether that’s the Instance itself or a database, you have a few options” — and of course what those options are — in system variables, stored procedures or functions for meta data.

This week’s T-SQL Tuesday topic was I/O. There are many links to great blog posts in the comments; three random posts I chose to highlight: Michael Zilberstein talks about IO capacity planning, while Kalen Delaney talks about using STATISTICS IO in I/O, you know, and Merrill Aldrich chimes in with information on real world SSD’s. Aldrich also begs folks not to waste resources and make more work for developers and DBAs in dear ISV, you’re keeping me awake nights with your VARCHAR() dates.

And we end with a bit of fin: Paul Nielsen wants us all to have a bit of fun; he has posted an SQL limerick and asks readers to create there own in there once was in Dublin a query.

Using ext4 for MySQL

Planet MySQL - 周六, 2010/03/13 - 00:51

This week with a client I saw ext4 used for the first time on a production MySQL system which was running Ubuntu 9.10 (Karmic Koala). I observe today while installing 9.10 Server locally that ext4 is the default option. The ext4 filesystem is described as better performance, reliability and features while there is also information about improvements in journaling.

At OSCON 2009 I attended a presentation on Linux Filesystem Performance for Databases by Selena Deckelmann in which ext4 was included. While providing some improvements in sequential reading and writing, there were issue with random I/O which is the key for RDBMS products.

Is the RAID configuration (e.g. RAID 5, RAID 10), strip size, buffer caches, LVM etc more important then upgrading from ext3 to ext4? I don’t have access to any test equipment in order to determine myself however I’d like to know of any experiences from members of the MySQL community and if anybody has experienced any general problems running ext4.

ext4 References
PlanetMySQL Voting: Vote UP / Vote DOWN

The Art of &quot;What is going on inside of my database?&quot;

Planet MySQL - 周六, 2010/03/13 - 00:50
Yesterday we were having a conversation on IRC about the need for more useful information about the internals of the database.

"SHOW STATUS" is just too primitive in its design to provide the sort of detailed information you need to do operations. Yesterday we got a bug request over the number of "open tables" found after a particular query. The user had assumed the number was off, but what they hadn't realized was that the number was accurate (in this particular case, MySQL fudges a number on open tables because it can't handle count its derived tables).

One of the patches coming into the tree right now fully exposes the contents of the table cache and table definition cache to the user. You can see who holds what locks on what tables, and you can see the actual count on table access per table.

drizzle> select * from TABLE_DEFINITION_CACHE; +-----------------+------------------------+---------+-------------+----------------+ | TABLE_SCHEMA | TABLE_NAME | VERSION | TABLE_COUNT | IS_NAME_LOCKED | +-----------------+------------------------+---------+-------------+----------------+ | data_dictionary | schema_names | 1 | 1 | FALSE | | data_dictionary | table_definition_cache | 1 | 1 | FALSE | | data_dictionary | show_tables | 1 | 1 | FALSE | +-----------------+------------------------+---------+-------------+----------------+ 3 rows in set (0 sec) drizzle> select * from TABLE_CACHE; +------------+-----------------+------------------------+-----------+----------------+---------+----------------+------+----------------+------------+----------------+ | SESSION_ID | TABLE_SCHEMA | TABLE_NAME | ARCHETYPE | ENGINE | VERSION | IS_NAME_LOCKED | ROWS | AVG_ROW_LENGTH | TABLE_SIZE | AUTO_INCREMENT | +------------+-----------------+------------------------+-----------+----------------+---------+----------------+------+----------------+------------+----------------+ | 0 | data_dictionary | schema_names | FUNCTION | FunctionEngine | 1 | FALSE | 100 | 260 | 0 | 0 | | 0 | data_dictionary | show_tables | FUNCTION | FunctionEngine | 1 | FALSE | 100 | 260 | 0 | 0 | | 1 | data_dictionary | table_cache | FUNCTION | FunctionEngine | 1 | FALSE | 100 | 1113 | 0 | 0 | | 0 | data_dictionary | table_definition_cache | FUNCTION | FunctionEngine | 1 | FALSE | 100 | 559 | 0 | 0 | +------------+-----------------+------------------------+-----------+----------------+---------+----------------+------+----------------+------------+----------------+ 4 rows in set (0 sec) drizzle> select * from TABLE_DEFINITION_CACHE WHERE TABLE_COUNT > 1; Empty set (0 sec) drizzle> select * from TABLE_DEFINITION_CACHE WHERE TABLE_COUNT > 0; +-----------------+------------------------+---------+-------------+----------------+ | TABLE_SCHEMA | TABLE_NAME | VERSION | TABLE_COUNT | IS_NAME_LOCKED | +-----------------+------------------------+---------+-------------+----------------+ | data_dictionary | schema_names | 1 | 1 | FALSE | | data_dictionary | table_cache | 1 | 1 | FALSE | | data_dictionary | table_definition_cache | 1 | 1 | FALSE | | data_dictionary | show_tables | 1 | 1 | FALSE | +-----------------+------------------------+---------+-------------+----------------+ 4 rows in set (0 sec) drizzle> select count(*) from TABLE_DEFINITION_CACHE WHERE TABLE_COUNT > 0; +----------+ | count(*) | +----------+ | 4 | +----------+ 1 row in set (0 sec)



The term "ARCHETYPE" is the base primitive about what sort of table was used. It is more detailed then the ANSI "TABLE_TYPE" that exists in I_S. We still have a debate on what exactly this term should mean. One of the things I enjoy about working on Drizzle? I am not stuck in a room full of people who will spend hours on this sort of bike shed decisions.

Version gives you the current definition count for the table. Right now that number is still based on "since opened" but we will soon be storing the metadata for this so you will know how many times in the life of an object it has been changed.

We are still working out the details to SHOW TABLE STATUS. Our SHOW commands are just query rewrites to tables.

Here is a partial example of the new table that is outputted from a SHOW TABLE STATUS:

+---------+--------+-------------------+----------- | Session | Schema | Name | Type +---------+--------+-------------------+----------- | 0 | Schema | b | STANDARD | 0 | Schema | show_tables | FUNCTION | 1 | Schema | show_table_status | FUNCTION | 0 | Schema | schema_names | FUNCTION | 0 | Schema | dfsdf | STANDARD | 1 | Schema | b | TEMPORARY | 1 | Schema | a | TEMPORARY +---------+--------+-------------------+-----------+ 7 rows in set (0 sec)

Notice Session? Notice that type can be Temporary? With our system you can see the current owner of the open table and we now include whatever temporary tables you have in your own session. We have also included a larger table which you can see just your own temporary tables (and most likely I will soon create a table so that you can see all temporary tables open across all sessions). The current "Type" in SHOW TABLE STATUS is an Archetype so we will be changing that so that terms match up across the database. Consistency in design is an awesome thing :)

There is a lot more to come!

P.S. Just wait until I push the code for tracking locks in Drizzle, I demoed it at SCALE and got a lot of both positive and constructive feedback on it.
PlanetMySQL Voting: Vote UP / Vote DOWN

But I DO want MySQL to say “ERROR”!

Planet MySQL - 周五, 2010/03/12 - 12:53

MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data.

  • Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings though).
  • Calling CREATE TEMPORARY TABLE? You get silent commit.
  • Issuing a ROLLBACK on non-transactional involved engines? Have a warning; no error.
  • Using LOCK IN SHARE MODE on non transactional tables? Not a problem. Nothing reported.
  • Adding a FOREIGN KEY on a MyISAM table? Good for you; no action actually taken.
  • Inserting 300 to a TINYINT column in a relaxed sql_mode? Give me 255, I’ll silently drop the remaining 45. I owe you.
Warnings and errors

It would be nice to:

  • Have an auto_propagate_warning_to_error server variable (global/session/both) which, well, does what it says.
  • Have an i_am_really_not_a_dummy server variable which implies stricter checks for all the above and prevents you from doing with anything that may be problematic (or rolls back your transactions on your invalid actions).

Connectors may be nice enough to propagate warnings to errors – that’s good. But not enough: since data is already committed in MySQL.

If I understand correctly, and maybe it’s just a myth, it all relates to the times where MySQL had interest in a widespread adoption across the internet, in such way that it does not interfere too much with the users (hence leading to the common myth that “MySQL just works out of the box and does not require me to configure or understand anything”).

MySQL is a database system, and is now widespread, and is used by serious companies and products. It is time to stop play nice to everyone and provide with strict integrity — or, be nice to everyone, just allow me to specify what “nice” means for me.


PlanetMySQL Voting: Vote UP / Vote DOWN

A Follow Up on the SQL Puzzle

Planet MySQL - 周五, 2010/03/12 - 06:12
Or…What the Heck is Wrong with CREATE TABLE IF NOT EXISTS ... SELECT?

So, earlier this week, I blogged about an SQL puzzle that had come up in my current work on Drizzle’s new transaction log.

I posed the question to readers what the “correct” result of the following would be:

CREATE TABLE t1 (a int, b int); INSERT INTO t1 VALUES (1,1),(1,2); CREATE TEMPORARY TABLE t2 (a int, b int, PRIMARY KEY (a)); BEGIN; INSERT INTO t2 VALUES (100,100); CREATE TEMPORARY TABLE IF NOT EXISTS t2 (PRIMARY KEY (a)) SELECT * FROM t1;   # The above statement will correctly produce an ERROR 23000: Duplicate entry '1' for key 'PRIMARY' # What should the below result be?   SELECT * FROM t2; COMMIT;

A number of readers responded, and, to be fair, most everyone was “correct” in their own way. Why? Well, because the way that MySQL deals with calls to CREATE TABLE ... SELECT, CREATE TABLE IF NOT EXISTS ... SELECT and their temporary-table counterparts is completely stupid, as I learned this week. Rob Wultsch essentially sums up my feelings about the behaviour of DDL statements in regards to transactions in a session:

Implicit commit is evil and stupid. Ideally we the server should error and roll back, imho.

The Officially Correct Answer (at least in MySQL)

OK, so here’s the “official” correct answer:

CREATE TABLE IF NOT EXISTS ... SELECT does not first check for the existence of the table in question. Instead, if the table in question does exist, CREATE TABLE IF NOT EXISTS ... SELECT behaves like an INSERT INTO ... SELECT statement. Yep, you heard right. So, instead of throwing a warning when it notices that the table exists, MySQL instead attempts to insert rows from the SELECT query into the existing table.

Here is the official MySQL explanation:

For CREATE TABLE … SELECT, if IF NOT EXISTS is given and the table already exists, MySQL handles the statement as follows:
* The table definition given in the CREATE TABLE part is ignored. No error occurs, even if the definition does not match that of the existing table.
* If there is a mismatch between the number of columns in the table and the number of columns produced by the SELECT part, the selected values are assigned to the rightmost columns. For example, if the table contains n columns and the SELECT produces m columns, where m < n, the selected values are assigned to the m rightmost columns in the table. Each of the initial n – m columns is assigned its default value, either that specified explicitly in the column definition or the implicit column data type default if the definition contains no default. If the SELECT part produces too many columns (m > n), an error occurs.
* If strict SQL mode is enabled and any of these initial columns do not have an explicit default value, the statement fails with an error.

So, given the above manual explanation, the correct answer to the original blog post is:

a | b 100 | 100

partly because there is an implicit COMMIT directly before the CREATE TABLE is executed (committing the 100,100 record to the table) and the primary key violation kills off the INSERTs of 1,1 in InnoDB. For a MyISAM table, the 1,1 record would be in the table, since MyISAM has no idea what a ROLLBACK is.

I Think Drizzle Should Follow PostgreSQL’s Example Here

On implicit commits before DDL operations, I believe they should all go bye-bye. DDL should be transactional in Drizzle and if a statement cannot be executed in a transaction, it should throw an error if there is an active transaction. Period.

For behaviour of CREATE TABLE ... SELECT acting like an INSERT INTO ... SELECT, that entire code path should be ripped out.

PostgreSQL’s DDL operations, IMHO, are sane. Sane is good. PostgreSQL allows quite a bit of flexibility by implementing the SQL standard’s CREATE TABLE and CREATE TABLE AS statements. I believe Drizzle should scrap all the DDL table-creation code and instead implement PostgreSQL’s much-nicer DDL methods. There, I said it. Slashdot MySQL haters, there ya go.


PlanetMySQL Voting: Vote UP / Vote DOWN

Drizzle&#8217;s Data Dictionary and Global Status

Planet MySQL - 周五, 2010/03/12 - 05:33

With the recent news by Brian about the Data Dictionary in Drizzle replacing the INFORMATION_SCHEMA, I was looking into the server status variables (aka INFORMATION_SCHEMA.GLOBAL_STATUS) and I came across an interesting discovery.

select * from data_dictionary.global_status; ... | Table_locks_immediate | 0 | | Table_locks_waited | 0 | | Threads_connected | 8134064 | | Uptime | 332 | | Uptime_since_flush_status | 332 | +----------------------------+----------------+ 51 rows in set (0 sec)

This only retrieved 51 rows, which is way less then previous. What I wanted was clearly missing, all the old com_ status variables. Looking at what the data_dictionary actually has available revealed a new table.

drizzle> select * from data_dictionary.global_statements; +-----------------------+----------------+ | VARIABLE_NAME | VARIABLE_VALUE | +-----------------------+----------------+ | admin_commands | 0 | | alter_db | 0 | | alter_table | 0 | | analyze | 0 | | begin | 0 | | change_db | 1 | | check | 0 | | checksum | 0 | | commit | 0 | | create_db | 0 | | create_index | 0 | | create_table | 0 | | delete | 0 | | drop_db | 0 | | drop_index | 0 | | drop_table | 0 | | empty_query | 0 | | flush | 0 | | insert | 0 | | insert_select | 0 | | kill | 0 | | load | 0 | | release_savepoint | 0 | | rename_table | 0 | | replace | 0 | | replace_select | 0 | | rollback | 0 | | rollback_to_savepoint | 0 | | savepoint | 0 | | select | 10 | | set_option | 0 | | show_create_db | 0 | | show_create_table | 0 | | show_errors | 0 | | show_warnings | 0 | | truncate | 0 | | unlock_tables | 0 | | update | 0 | +-----------------------+----------------+ 38 rows in set (0 sec)

Kudos to this. Looking at list I saw an obvious omission, of “ping”. Something that caught me out some years ago with huge (300-500 per second admin_commands). I’m also a fan of Mark’s recent work An evening hack – Com_ping in MySQL.


PlanetMySQL Voting: Vote UP / Vote DOWN

Liveblogging at Confoo: Blending NoSQL and SQL

Planet MySQL - 周五, 2010/03/12 - 00:11

Persistence Smoothie: Blending NoSQL and SQL – see user feedback and comments at http://joind.in/talk/view/1332.

Michael Bleigh from Intridea, high-end Ruby and Ruby on Rails consultants, build apps from start to finish, making it scalable. He’s written a lot of stuff, available at http://github.com/intridea. @mbleigh on twitter

NoSQL is a new way to think about persistence. Most NoSQL systems are not ACID compliant (Atomicity, Consistency, Isolation, Durability).

Generally, most NoSQL systems have:

  • Denormalization
  • Eventual Consistency
  • Schema-Free
  • Horizontal Scale

NoSQL tries to scale (more) simply, it is starting to go mainstream – NY Times, BBC, SourceForge, Digg, Sony, ShopWiki, Meebo, and more. But it’s not *entirely* mainstream, it’s still hard to sell due to compliance and other reasons.

NoSQL has gotten very popular, lots of blog posts about them, but they reach this hype peak and obviously it can’t do everything.

“NoSQL is a (growing) collection of tools, not a new way of life.”

What is NoSQL? Can be several things:

  • Key-Value Stores
  • Document Databases
  • Column-oriented data stores
  • Graph Databases

Key-Value Stores
memcached is a “big hash in the sky” – it is a key value store. Similarly, NoSQL key-value stores “add to that big hash in the sky” and store to disk.

Speaker’s favorite is Redis because it’s similar to memcached.

  • key-value store + datatypes (list, sets, scored sets, soon hashes will be there)
  • cache-like functions (like expiration)
  • (Mostly) in-memory

Another interesting key-value store is Riak

  • Combination of key-value store and document database
  • heavy into HTTP REST
  • You can create links between documents, and do “link walking” that you don’t normally get out of a key-value store
  • built-in Map Reduce

Map Reduce:

  • Massively parallel way to process large datasets
  • First you scour data and “map” a new set of dataM
  • Then you “reduce” the data down to a salient result — for example, map reduce function to make a tag cloud: map function makes an array with a tag name and a count of 1 for each instance of that tag, and the reduce tag goes through that array and counts them…
  • http://en.wikipedia.org/wiki/MapReduce

Other key-value stores:

Document Databases
Some say that it’s the “closest” thing to real SQL.

  • MongoDB – Document store that speaks BSON (Binary JSON, which is compact). This is the speaker’s favorite because it has a rich query syntax that makes it close to SQL. Can’t do joins, but can embed objects in other objects, so it’s a tradeoff

    • Also has GridFS that can store large files efficiently, can scale to petabytes of data
    • does have MapReduce but it’s deliberate and you run it every so often.

  • CouchDB
    • Pure JSON Document Store – can query directly with nearly pure javascript (there are auth issues) but it’s an interesting paradigm to be able to run your app almost entirely through javascript.
    • HTTP REST interface
    • MapReduce only to see items in CouchDB. Incremental MapReduce, every time you add or modify a document, it dynamically changes the functions you’ve written. You can do really powerful queries as easy as you can do simple queries. However, some things are really complex, ie, pagination is almost impossible to do.
    • Intelligent Replication – CouchDB is designed to work with offline integration. Could be used instead of SQLite as the HTML5 data store, but you need CouchDB running locally to be doing offline stuff w/CouchDB

Column-oriented store
Columns are stored together (ie, names) instead of rows. Lets you be schema-less because you don’t care about a row’s consistency, you can just add a column to a table very easily.

Graph Databases
speaker’s opinion – there aren’t enough of these.
Neo4J – can handle modeling complex relationships – “friends of friends of cousins” but it requires a license.

When should I use this stuff?
If you have:Use
Complex, slow joins for an “activity stream”Denormalize, use a key-value store.
Variable schema, vertical interactionDocument database or column store
Modeling multi-step relationships (linkedin, friends of friends, etc)Graph

Don’t look for a single tool that does every job. Use more than one if it’s appropriate, weigh the tradeoffs (ie, don’t have 7 different data stores either!)

NoSQL solves real scalability and data design issues. But financial transactions HAVE to be atomic, so don’t use NoSQL for those.

A good presentation is http://www.slideshare.net/bscofield/the-state-of-nosql.

Using SQL and NoSQL together
Why? Well, your data is already in an SQL database (most likely).

You can blend by hand, but the easy way is DataMapper:
Generic, relational ORM (adapters for many SQL dbs and many NoSQL stores)
Implements Identity Map
Module-based inclusion (instead of extending from a class, you just include into a class).

You can set up multiple data targets (default is MySQL, example sets up MongoDB too).
DataMapper is:

  • Ultimate Polyglot ORM
  • simple r’ships btween persistence engines are easy
  • jack of all, master none
  • Sometimes perpetuates false assumptions –
  • If you’re in Ruby, your legacy stuff is in ActiveRecord, so you’re going to have to rewrite your code anyway.

Speaker’s idea to be less generic and better use of features of each data store – Gloo – “Gloo glues together different ORMs by providing relationship proxies.” this software is ALPHA ALPHA ALPHA.

The goal is to be able to define relationships on the terms of any ORM from any class, ORM or not
Right now – partially working activeRecord relationships
Is he doing it wrong? Is it a crazy/stupid idea? Maybe.

Example:
NeedUse
Assume you already have an auth systemit’s already in SQL, so leave it there.
Need users to be able to purchase items from the storefront – Can’t lose transactions, need full ACID complianceuse MySQL.
Social Graph – want to have activity streams and 1-way and 2-way relationships. Need speed, but not consistencyuse Redis
Product Listings — selling moves and books, both have different properties, products are pretty much non-relationaluse MongoDB

He wrote the example in about 3 hours, so integration of multiple data stores can be done quickly and work.


PlanetMySQL Voting: Vote UP / Vote DOWN

Liveblogging at Confoo: Blending NoSQL and SQL

Pythian Group - 周五, 2010/03/12 - 00:11

Persistence Smoothie: Blending NoSQL and SQL – see user feedback and comments at http://joind.in/talk/view/1332.

Michael Bleigh from Intridea, high-end Ruby and Ruby on Rails consultants, build apps from start to finish, making it scalable. He’s written a lot of stuff, available at http://github.com/intridea. @mbleigh on twitter

NoSQL is a new way to think about persistence. Most NoSQL systems are not ACID compliant (Atomicity, Consistency, Isolation, Durability).

Generally, most NoSQL systems have:

  • Denormalization
  • Eventual Consistency
  • Schema-Free
  • Horizontal Scale

NoSQL tries to scale (more) simply, it is starting to go mainstream – NY Times, BBC, SourceForge, Digg, Sony, ShopWiki, Meebo, and more. But it’s not *entirely* mainstream, it’s still hard to sell due to compliance and other reasons.

NoSQL has gotten very popular, lots of blog posts about them, but they reach this hype peak and obviously it can’t do everything.

“NoSQL is a (growing) collection of tools, not a new way of life.”

What is NoSQL? Can be several things:

  • Key-Value Stores
  • Document Databases
  • Column-oriented data stores
  • Graph Databases

Key-Value Stores
memcached is a “big hash in the sky” – it is a key value store. Similarly, NoSQL key-value stores “add to that big hash in the sky” and store to disk.

Speaker’s favorite is Redis because it’s similar to memcached.

  • key-value store + datatypes (list, sets, scored sets, soon hashes will be there)
  • cache-like functions (like expiration)
  • (Mostly) in-memory

Another interesting key-value store is Riak

  • Combination of key-value store and document database
  • heavy into HTTP REST
  • You can create links between documents, and do “link walking” that you don’t normally get out of a key-value store
  • built-in Map Reduce

Map Reduce:

  • Massively parallel way to process large datasets
  • First you scour data and “map” a new set of dataM
  • Then you “reduce” the data down to a salient result — for example, map reduce function to make a tag cloud: map function makes an array with a tag name and a count of 1 for each instance of that tag, and the reduce tag goes through that array and counts them…
  • http://en.wikipedia.org/wiki/MapReduce

Other key-value stores:

Document Databases
Some say that it’s the “closest” thing to real SQL.

  • MongoDB – Document store that speaks BSON (Binary JSON, which is compact). This is the speaker’s favorite because it has a rich query syntax that makes it close to SQL. Can’t do joins, but can embed objects in other objects, so it’s a tradeoff

    • Also has GridFS that can store large files efficiently, can scale to petabytes of data
    • does have MapReduce but it’s deliberate and you run it every so often.

  • CouchDB
    • Pure JSON Document Store – can query directly with nearly pure javascript (there are auth issues) but it’s an interesting paradigm to be able to run your app almost entirely through javascript.
    • HTTP REST interface
    • MapReduce only to see items in CouchDB. Incremental MapReduce, every time you add or modify a document, it dynamically changes the functions you’ve written. You can do really powerful queries as easy as you can do simple queries. However, some things are really complex, ie, pagination is almost impossible to do.
    • Intelligent Replication – CouchDB is designed to work with offline integration. Could be used instead of SQLite as the HTML5 data store, but you need CouchDB running locally to be doing offline stuff w/CouchDB

Column-oriented store
Columns are stored together (ie, names) instead of rows. Lets you be schema-less because you don’t care about a row’s consistency, you can just add a column to a table very easily.

Graph Databases
speaker’s opinion – there aren’t enough of these.
Neo4J – can handle modeling complex relationships – “friends of friends of cousins” but it requires a license.

When should I use this stuff?
If you have:Use
Complex, slow joins for an “activity stream”Denormalize, use a key-value store.
Variable schema, vertical interactionDocument database or column store
Modeling multi-step relationships (linkedin, friends of friends, etc)Graph

Don’t look for a single tool that does every job. Use more than one if it’s appropriate, weigh the tradeoffs (ie, don’t have 7 different data stores either!)

NoSQL solves real scalability and data design issues. But financial transactions HAVE to be atomic, so don’t use NoSQL for those.

A good presentation is http://www.slideshare.net/bscofield/the-state-of-nosql.

Using SQL and NoSQL together
Why? Well, your data is already in an SQL database (most likely).

You can blend by hand, but the easy way is DataMapper:
Generic, relational ORM (adapters for many SQL dbs and many NoSQL stores)
Implements Identity Map
Module-based inclusion (instead of extending from a class, you just include into a class).

You can set up multiple data targets (default is MySQL, example sets up MongoDB too).
DataMapper is:

  • Ultimate Polyglot ORM
  • simple r’ships btween persistence engines are easy
  • jack of all, master none
  • Sometimes perpetuates false assumptions –
  • If you’re in Ruby, your legacy stuff is in ActiveRecord, so you’re going to have to rewrite your code anyway.

Speaker’s idea to be less generic and better use of features of each data store – Gloo – “Gloo glues together different ORMs by providing relationship proxies.” this software is ALPHA ALPHA ALPHA.

The goal is to be able to define relationships on the terms of any ORM from any class, ORM or not
Right now – partially working activeRecord relationships
Is he doing it wrong? Is it a crazy/stupid idea? Maybe.

Example:
NeedUse
Assume you already have an auth systemit’s already in SQL, so leave it there.
Need users to be able to purchase items from the storefront – Can’t lose transactions, need full ACID complianceuse MySQL.
Social Graph – want to have activity streams and 1-way and 2-way relationships. Need speed, but not consistencyuse Redis
Product Listings — selling moves and books, both have different properties, products are pretty much non-relationaluse MongoDB

He wrote the example in about 3 hours, so integration of multiple data stores can be done quickly and work.

1.1.0 Alpha Release Now Available

Planet MySQL - 周四, 2010/03/11 - 23:31

We are excited to announce the availability of the 1.1.0 Alpha release of InfiniDB Community Edition.  This is our initial alpha release for 1.1 and is not recommended for production work. 

New functionality we’ve added with 1.1.0 includes:  



Improved support for queries with multiple types of joins.
Improved create table performance and a much smaller footprint on disk for the system catalog, newly created tables, and tables that contaRead More...
PlanetMySQL Voting: Vote UP / Vote DOWN

Liveblogging at Confoo: [not just] PHP Performance by Rasmus Lerdorf

Planet MySQL - 周四, 2010/03/11 - 22:29

Most of this stuff is not PHP specific, and Python or Ruby or Java or .NET developers can use the tools in this talk.

The session on joind.in, with user comments/feedback, is at http://joind.in/talk/view/1320.

Slides are at http://talks.php.net/show/confoo10

“My name is Rasmus, I’ve been around for a long time. I’ve been doing this web stuff since 1992/1993.”

“Generally performance is not a PHP problem.” Webservers not config’d, no expire headers on images, no favicon.

Tools: Firefox/Firebug extension called YSlow (developed by yahoo) gives you a grade on your site.

Google has developed the Firefox/Firebug pagespeed tool.

Today Rasmus will pick on wordpress. He checks out the code, then uses Siege to do a baseline benchmark — see the slide for the results.

Before you do anything else install an opcode cache like APC. Wordpress really likes this type of caching, see this slide for the results. Set the timezone, to make sure conversions aren’t being done all the time.

Make sure you are cpu-bound, NOT I/O bound. Otherwise, speed up the I/O.

Then strace your webserver processs. There are common config issues that you can spot in your strace code. grep for ENOENT which shows you “No such file or directory” errors.

AllowOverride None to turn off .htaccess for every directory, just read settings once from your config file….(unless you’re an ISP).

Make sure DirectoryIndex is set appropriately, watch your include_path. All this low-hanging fruit has examples on the common config issues slide.

Install pecl/inclued and generate a graph – here is the graph image (I have linked it because you really want to zoom in to the graph…)

In strace output check the open() calls. Conditional includes, function calls that include files, etc. need runtime context before knowing what to open. In the example, every request checks to see if we have the config file, once we have config’d we can get rid of that stuff. Get rid of all the conditionals and hard-code “include wp-config.php”. Examples are on the slide.

His tips to change:
Conditional config include in wp-load.php (as just mentioned)
Conditional did-header check in wp-blog-header.php
Don’t call require_wp_db() from wp-settings.php
Remove conditional require logic from wp_start_object_cache

Then check strace again, now all Rasmus sees is theming and translations, which he decided to keep, because that’s the good benefit of Wordpress – Performance is all about costs vs. flexibility. You don’t want to get rid of all of your flexibility, but you want to be fast.

Set error_reporting(-1) in wp-settings.php to catch all warnings — warnings slow you down, so get rid of all errors. PHP error handling is very slow, so getting rid of errors will make you faster.

The slide of warnings that wordpress throws.

Look at all C-level calls made, using callgrind, which sits under valgrind, a CPU emulator used for debugging. See the image of what callgrind shows.

Now dive into the PHP executor, by installing XDebug.

Check xhprofFacebook open sourced this about a year ago, it’s a PECL extension. The output is pretty cool, try it on your own site, Rasmus does show you how to use it. It shows you functions sorted by the most expensive to the least expensive.

For example, use $_SERVER[REQUEST_TIME] instead of time(). Use pconnect() if MySQL can handle the amount of webserver connections that will be persistent, etc.

After you have changed a lot of the stuff above, benchmark again with siege to see how much faster you are. In this case there is not much gained so far.

So keep going….the blogroll is very slow — Rasmus gets rid of it by commenting out in the sidebar.php file. I’d like to see something to make it “semi-dynamic” — that is, make it a static file that can be re-generated, since you might want the blogroll but links are not changed every second…..

At this point we’re out of low-hanging fruit.

HipHop is a PHP to C++ converter & compiler, including a threaded, event-driven server that replaces apache. Rasmus’ slide says “Wordpress is well-suited for HipHop because it doesn’t have a lot of dynamic runtime code. This is using the standard Wordpress-svn checkout with a few tweaks.”

Then, of course, benchmark again.

The first time you compile Wordpress with HipHop, you give it a list of files to add to the binary, it will complain about php code that generate file names, so you do have to fix that kind of stuff. There’s a huge mess of errors the first time you run it (”pages and pages”), and Rasmus had to patch HipHop (and Wordpress) but the changes in HipHop have been put back into HipHop, so you should be good for the most part.

Check out the errors, lots of them show logical errors like $foo.”bar” instead of $foo.=”bar” and $foo=”bar” instead of $foo==”bar” in an if statement. Which of course is nice for your own code, to find those logical errors.

(Wordpress takes in a $user_ID argument and immediately initializes a global $user_ID variable, which overwrites the argument passed in, so you can change the name of the argument passed in….)

You can also get rid of some code, things that check for existence of the same thing more than once. So it will take a bit of tweaking, but it’s worth it.

There are limitations to HipHop, for example:

  • It doesn’t support any of the new PHP 5.3 language features
  • Private properties don’t really exist under HipHop. They are treated as if they are protected instead.
  • You can’t unset variables. unset will clear the variable, but it will still be in the symbol table.
  • eval and create_function are limited
  • Variable variables $$var are not supported
  • Dynamic defines won’t work: define($name,$value)
  • get_loaded_extensions(), get_extension_funcs(), phpinfo(), debug_backtrace() don’t work
  • Conditional and dynamically created include filenames don’t work as you might expect
  • Default unix-domain socket filename isn’t set for MySQL so connecting to localhost doesn’t work

and HipHop does not support all extensions — see the list Rasmus has of extensions HipHop supports.

Then Rasmus showed an example using Twit (which he wrote) including the benchmarks. He shows that you can see what’s going on, like 5 MySQL calls on the home page and what happens when you don’t have a favicon.ico (in yellow).

In summary, “performance is all about architecture”, “know your costs”.

Be careful, because some tools (like valgrind and xdebug) you don’t want to put it on production systems, you could capture production traffic and replay it on a dev/testing box, but “you just have to minimize the differences and do your best”.


PlanetMySQL Voting: Vote UP / Vote DOWN
聚合内容