PHP, MySQL 4.1 and UTF-8

While trying to use MySQL 4.1 and its new utf-8 features, I ran into character encoding problems.
Even though the HTML page encoding was correctly set to utf-8 and the data in the database was correctly utf-8 encoded, the content displayed in my browser was completely wrong: each non-ASCII character was shown as two characters instead of one, revealing that MySQL was serving latin-1 instead of utf-8 as it should.
Setting every default character set to utf-8 in my.cnf, the MySQL configuration file, or in the database didn't solve the problem. Changing collations and the like didn't help either.
By looking at phpMyAdmin's way to solve the problem, I found this query right after they make a connection to the database:
SET NAMES 'utf8'; 
This statement sets the character encoding for the operations that follow on that connection. This is a design flaw in MySQL: they used a hack instead of doing things properly from the beginning. What happens if you use a database wrapper like PEAR::DB or whatever? You will have to test beforehand whether the db in use is MySQL if you need utf-8. Sounds silly?
So the only way I found to avoid doing $db->query("SET NAMES 'utf8'") every time I have to use MySQL 4.1 was to add this line to my.cnf under the [mysqld] section:
init_connect="SET NAMES utf8" 
This means that every time a client connects, MySQL executes SET NAMES utf8. However, init_connect is not executed for users with the SUPER privilege, such as root. (The argument given is that you would not be able to fix problems from the root account if your init_connect statement were wrong, which ignores the fact that anyone able to change my.cnf can fix them anyway.) So you will have to use another account to test your changes.
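For code that cannot rely on init_connect, the "test beforehand if the db is MySQL" step could look roughly like the sketch below. It is only an illustration: it uses PDO rather than PEAR::DB, the function names connectionInitSql() and openUtf8Connection() are made up for this example, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of PDO or PEAR::DB): returns the statement
// to run right after connecting, or null when the driver needs none.
function connectionInitSql(string $driver, string $charset = 'utf8'): ?string
{
    // Only accept simple charset names, since the value is put into SQL.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $charset)) {
        throw new InvalidArgumentException('Unexpected charset name');
    }
    // Only MySQL needs the SET NAMES workaround described above.
    return $driver === 'mysql' ? "SET NAMES '" . $charset . "'" : null;
}

// Hypothetical wrapper: opens a PDO connection and applies the workaround.
function openUtf8Connection(string $dsn, string $user, string $password): PDO
{
    $pdo = new PDO($dsn, $user, $password);
    $sql = connectionInitSql($pdo->getAttribute(PDO::ATTR_DRIVER_NAME));
    if ($sql !== null) {
        $pdo->exec($sql);
    }
    return $pdo;
}
```

Keeping the driver check in one place at least means the rest of the application does not need to know which database it is talking to.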


Lots of time wasted.

Mnogosearch htdb mode and on-the-fly reindexing

I have been using Mnogosearch for quite a long time. It is a fast search engine that works with multiple backends. But one of the most interesting features it provides is its htdb mode. With this mode, instead of crawling web pages, it gathers the data to be indexed directly from the database. This makes indexing really fast as you get rid of network access times. Coupled with the PHP extension, this makes Mnogosearch the search engine of choice when developing a dynamic website.

Mnogosearch is particularly suited to my needs, as I use MySQL with InnoDB tables because they provide foreign key support. Unfortunately, InnoDB tables do not have fulltext indexing. So basically, I use Mnogosearch as a fulltext indexer for my InnoDB tables. And it works very well.

The main problem I had with this kind of configuration was how to quickly reindex my database content when it is modified. Reindexing a web page is easy: you just need to crawl the page again and look at its modification date. But what about reindexing from an htdb server? Mnogosearch has no way of knowing that a record has changed.

Here is the solution I found:


When a record in the database is deleted I do:
$mnogo = DB::connect($dsn);
$id = 'htdb:/%/'.$recordId;
$mnogo->query('DELETE FROM dict USING dict, url '
.'WHERE url LIKE '.$mnogo->quote($id)
.' AND rec_id = url_id');
$mnogo->query('DELETE FROM url '
.'WHERE url LIKE '.$mnogo->quote($id));
When a record needs to be updated:
$url = 'htdb:/%/'.$recordId;
$cmd = 'indexer -a -u '.$url.' mnogosearch.conf';
exec($cmd);
And last, when a new record is added:
// $server is the url you used in your configuration
$url = 'htdb:/'.$server.'/'.$recordId;
$cmd = 'indexer -i -u '.$url.' mnogosearch.conf';
exec($cmd);
$url = 'htdb:/%/'.$recordId;
$cmd = 'indexer -a -u '.$url.' mnogosearch.conf';
exec($cmd);
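The update and insert cases above can be wrapped in small helpers that also quote the URL before it reaches the shell. This is a sketch, not a tested Mnogosearch integration: the function names are invented here, the indexer flags and the configuration file name are simply the ones used above, and it assumes a Unix-like system (on Windows, escapeshellarg() replaces % with a space, which would break the htdb:/%/ pattern).

```php
<?php
// Hypothetical helper: builds one indexer command line for an htdb URL.
function buildIndexerCommand($flag, $url, $config = 'mnogosearch.conf')
{
    // escapeshellarg() keeps characters such as % or spaces away from the shell.
    return 'indexer ' . $flag . ' -u ' . escapeshellarg($url)
        . ' ' . escapeshellarg($config);
}

// Commands to run when an existing record has been updated.
function updateCommands($recordId)
{
    return array(buildIndexerCommand('-a', 'htdb:/%/' . $recordId));
}

// Commands to run when a new record has been added.
// $server is the URL used in the Mnogosearch configuration.
function insertCommands($server, $recordId)
{
    return array(
        buildIndexerCommand('-i', 'htdb:/' . $server . '/' . $recordId),
        buildIndexerCommand('-a', 'htdb:/%/' . $recordId),
    );
}
```

Each returned string can then be passed to exec() in order, for example with foreach (insertCommands($server, $recordId) as $cmd) { exec($cmd); }.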



Finding gaps in sequence

Here is a subselect query I have used successfully to get a list of missing serial numbers in an auto-incremented (or not so automatic in my case) sequence:

SELECT A.number + 1
FROM orders AS A
WHERE NOT EXISTS (
SELECT B.number FROM orders AS B
WHERE A.number + 1 = B.number)
GROUP BY A.number;

Yes, simple but effective :)

It only works in MySQL 4.1+ though, because it uses a subselect.
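To make the logic of the query easier to follow, here is the same check written in plain PHP over an array of numbers. This is an illustrative sketch: findGapStarts() is a name invented for this example, and the final sort() is an addition, since the query has no ORDER BY. Like the query, it reports only the first missing number of each gap, and it also reports the number right after the current maximum.

```php
<?php
// Hypothetical helper mirroring the query: for every number N in the list,
// report N + 1 when N + 1 is not itself in the list.
function findGapStarts(array $numbers)
{
    // Flip values to keys for fast lookups; duplicates collapse,
    // much like the GROUP BY in the query.
    $present = array_flip($numbers);
    $gaps = array();
    foreach ($present as $number => $unused) {
        if (!isset($present[$number + 1])) {
            $gaps[] = $number + 1;
        }
    }
    sort($gaps);
    return $gaps;
}

// Numbers 3, 6 and 7 are missing; 9 would be the next number after the maximum.
$gaps = findGapStarts(array(1, 2, 4, 5, 8)); // array(3, 6, 9)
```

Note that 7 is not in the result: a gap longer than one number is reported only by its first missing value.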


Oracle buys InnoDB

Scary news this morning: Oracle Announces the Acquisition of Open Source Software Company, Innobase.

InnoDB tables are what make MySQL interesting for me, even though they lack a few important features like full-text search. But they provide foreign key constraints and transactions, which are must-haves for any serious web application. InnoDB tables are even supposed to perform better than MyISAM tables under some conditions.

The subtitle of Oracle's press release says: "Oracle Plans to Increase Support for Open Source Software". This is probably not true. Oracle has taken control of the most interesting part of MySQL. The next step is to either take control of MySQL itself or kill it as a commercial competitor. The future of MySQL doesn't look too bright. Fortunately, the InnoDB code is under the GPL, so they can fork it. Today, this looks like the only way forward for them.

Maybe MySQL didn't realize until today that they were competing with such clever, big and aggressive companies as Oracle. They probably underestimated the risks. At least, judging by the MySQL CEO's statement, it looks like they realize it now:

"This announcement represents further validation of the open source movement. The beauty of open source software and the GPL license is freedom. As with all MySQL code, InnoDB is provided under the GPL license, meaning that users have complete freedom to use, develop, and modify the code base. We are pleased to see even broader industry acceptance of open source database technology. This also means that database developers now have even greater flexibility to use MySQL and Oracle in the same environment."

It is obvious that the focus is on the GPL and open source. Marten Mickos knows that the GPL is today the only thing that protects InnoDB, and that it will give the company some time to deal with what's happening. He wants to reassure the company's customers, supporters and users. But as stated in the Oracle press release, "InnoDB's contractual relationship with MySQL comes up for renewal next year"...


Phreez, object persistence with PHP and PDO

I am going to release Phreez, my object persistence framework for PHP 5 (everyone has one these days, you know), under a BSD license. The project is currently hosted on Novell Forge, and the code is in the Subversion repository.

The code is not very clever, not very clean and not well tested, so I wouldn't recommend that anyone use it at the moment. I plan to add some features and tests, as I have to use Phreez for a real web-based application quite soon. At the moment, only MySQL is supported. Others will be easy to add.

The ideas behind Phreez are:

  • Performance
  • Use PHP Data Objects (PDO)
  • Use Iterators, ArrayAccess (SPL)
  • Use other PHP 5 features like overloading
  • Less code, Rapid Application Development (RAD)
  • No configuration (well, almost)
  • Handle relationships
  • Be flexible
  • Try not to be as good as Hibernate :p
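To give an idea of what the PHP 5 features in this list look like together, here is a small generic sketch. It is not Phreez code and does not reflect its API: the Record class is invented for this example, there is no database access in it, and the return type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical record class (NOT Phreez's API) combining property
// overloading, ArrayAccess and IteratorAggregate over one field array.
class Record implements ArrayAccess, IteratorAggregate
{
    private $fields = array();

    public function __construct(array $fields = array())
    {
        $this->fields = $fields;
    }

    // Overloading: $record->title reads and writes a field.
    public function __get($name)
    {
        return isset($this->fields[$name]) ? $this->fields[$name] : null;
    }

    public function __set($name, $value)
    {
        $this->fields[$name] = $value;
    }

    public function __isset($name)
    {
        return isset($this->fields[$name]);
    }

    // ArrayAccess: $record['title'] works the same way.
    public function offsetExists($offset): bool
    {
        return isset($this->fields[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->__get($offset);
    }

    public function offsetSet($offset, $value): void
    {
        $this->fields[$offset] = $value;
    }

    public function offsetUnset($offset): void
    {
        unset($this->fields[$offset]);
    }

    // IteratorAggregate: foreach ($record as $field => $value) { ... }
    public function getIterator(): Traversable
    {
        return new ArrayIterator($this->fields);
    }
}

$post = new Record(array('id' => 1, 'title' => 'Hello'));
$post->title = 'Hello, world';  // property overloading
$post['author'] = 'me';         // array access
```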

There is still a lot to do. I will post more details on my blog as the project evolves. At least the logo is done.


On the move

There was recently an interesting post about scripting languages by Steve Yegge. What I particularly liked are the hints on other languages he gives and his invitation to try new languages.

I have been programming PHP since 1999 and felt a need to learn something new approximately when they launched Rails a few years ago. But instead of learning Rails, because I don't usually follow the hype, I decided to learn Python.

So, recently, I have been learning Python, then Ruby and now the Lua programming language. Out of these three languages, my favorite is Lua and it is where I spend most of my time now.

They are all excellent languages to learn and in many ways, they are better than PHP. As I said in a previous post, I now see PHP as a big blob which over the years has been developing in too many directions and is now facing a lot of problems because a lot of things were implemented incorrectly and were not carefully planned. It is not obvious at first, but when you start looking at other languages, you understand how some things could have been done and you realize how wrong they are in PHP. I am thinking of all the stuff you find in SPL, create_function(), call_user_func(), ::, $this->, parent::, instanceof, is_a(), the way statics are handled, etc. It's all a mess. And it starts to show in blog posts. From time to time, I read things about how nice it would be to have closures, traits, anonymous functions, weak references or duck typing in PHP, along with some hacks on how to imperfectly implement them.

And because PHP is not idiot-proof, you will find a lot of people there with little knowledge of programming. Like this week's post by someone who says "function in_array() sucks because it is slow"... The more I know about programming, the less interesting I find the PHP community. It's like all the interesting people here are doing things secretly and don't take the time to share their experiences (unless they write books).

And the last thing I have started to dislike is the way a company like Zend is taking over the language. I have also recently very much disliked Oracle buying InnoDB and Sun buying MySQL, although I was not surprised in the case of MySQL given the road they had taken lately, which is actually a lot like the road PHP is taking with Zend at the moment. With rumors about Microsoft, Oracle or IBM buying Zend, I start to feel the fear. Because in the end, who decides what goes where in PHP core?

Here is a list of programming languages that deserve a look soon:


OMG PHP53!

Damn, I haven't really had the time to look at this version yet, but reading the new features list now, it looks like the most interesting PHP version ever.

I mean closures and lambda functions in PHP, I have been waiting for this for too long. Fileinfo, a very useful extension that I always had to compile, is now bundled. Changes to the ternary operator ?:, hehe. A new MySQL driver which should eat less RAM! ext/intl looks amazing too.
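For reference, here is what two of these additions look like in practice: a closure capturing a variable with use, and the short form of the ternary operator. This is a minimal sketch with made-up variable names, not taken from the release notes.

```php
<?php
// Closure capturing $factor from the enclosing scope (PHP 5.3+).
$factor = 3;
$triple = function ($n) use ($factor) {
    return $n * $factor;
};
$tripled = array_map($triple, array(1, 2, 3)); // array(3, 6, 9)

// Short ternary: the middle operand can be omitted (PHP 5.3+).
$name = '';
$display = $name ?: 'anonymous'; // 'anonymous', because '' is falsy
```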

Well, this looks really great. I still have to test this (especially the anonymous functions) and see if it is as cool as it sounds.
Namespaces look weird with their backslash syntax tho... :/

Update: actually, given the long new features list, I start to wonder why they didn't call this version PHP6?