Grant Skinner

The "g" in gskinner. Also the "skinner".

@gskinner

RegExp Bugs With Accented Characters

While developing the Spelling Plus Library, and more recently while adding multilingual support to it, I discovered two serious bugs in how the Regular Expression implementation in ActionScript handles accented characters.

First, RegExp in AS3 does not include accented characters in the word character class. For example, the pattern /\w+/ (match one or more word characters) matches “r” and “sume” in “résume”, when it should match the full string. UPDATE: Arthur has pointed out in the comments that this is correct according to the ECMAScript and POSIX RegEx specifications. \w is intended to match just the set [a-zA-Z0-9_], which it does in AS3. With that understood, it would still be nice to have support for Unicode property sets (which allow you to match word characters in any language, among other things), but I can understand that this may have an unacceptable impact on the size of the Flash Player.
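To make the \w behaviour concrete, here is a minimal sketch in TypeScript rather than AS3 (both follow the ECMAScript definition of \w). The second pattern uses Unicode property escapes, which are an assumption about the engine: they exist in ES2018 and later, not in the Flash Player discussed here. The variable names are purely illustrative.

```typescript
// "résume" written with \u00e9 so the "é" is a single precomposed character.
const word = "r\u00e9sume";

// \w is limited to [A-Za-z0-9_], so the match breaks at the accented character.
const asciiWords: string[] = word.match(/\w+/g) || [];

// Assumption: the engine supports Unicode property escapes (ES2018 and later).
// The pattern is built from a string so it is only parsed at run time.
const unicodeWordPattern = new RegExp("[\\p{L}\\p{N}_]+", "gu");
const unicodeWords: string[] = word.match(unicodeWordPattern) || [];

console.log(asciiWords);   // ["r", "sume"]
console.log(unicodeWords); // ["résume"]
```

The property-escape version treats any letter or digit as a word character, which is roughly what the "unicode property sets" wish above would provide.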

Secondly, there is a somewhat obscure problem with how the Flash Player matches accented characters against \S. Specifically, it appears not to count accented characters properly when matching them against \S, which produces some odd results. This is not the case with the negated whitespace character set [^\s], although the two should behave identically in RegEx. This issue is pretty weird, so I’ll give a few examples:

  1. the pattern /\S+/ (one or more not-whitespace chars) will match the full string of “é aé”, when it should match “é” and “aé” separately.

  2. the same pattern /\S+/ will match “aé” and “bé” correctly for the string “aé bé”.

  3. the pattern /\S{2,}/ (two or more not-whitespace chars) will match the full string “aé bcé” when it should match “aé” and “bcé”.

  4. the same pattern /\S{2,}/ will only match “bcé” for the string “éa bcé”, when it should match “éa” and “bcé”.

All of the above work properly if you substitute [^\s] for \S.
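For reference, the expected results for the four cases can be sketched with the [^\s] form. This is TypeScript, not AS3, so it does not reproduce the Flash Player bug; it only shows the matches a correct engine should return. The helper name allMatches is hypothetical.

```typescript
// The [^\s] equivalents of the two patterns used in the examples above.
const oneOrMore = /[^\s]+/g;
const twoOrMore = /[^\s]{2,}/g;

// Hypothetical helper: returns every match of a global pattern, or an empty list.
function allMatches(pattern: RegExp, text: string): string[] {
  return text.match(pattern) || [];
}

// \u00e9 is "é", written as an escape so file encoding does not matter.
const expectedResults: string[][] = [
  allMatches(oneOrMore, "\u00e9 a\u00e9"),   // case 1: ["é", "aé"]
  allMatches(oneOrMore, "a\u00e9 b\u00e9"),  // case 2: ["aé", "bé"]
  allMatches(twoOrMore, "a\u00e9 bc\u00e9"), // case 3: ["aé", "bcé"]
  allMatches(twoOrMore, "\u00e9a bc\u00e9"), // case 4: ["éa", "bcé"]
];

console.log(expectedResults);
```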

Hopefully this is helpful for other people working with RegExp, especially with languages other than English. It is quite frustrating to work around – I ended up writing a specialized character lexer instead of using RegExp in SPL.

Know of any other RegExp bugs in AS3? Share them in the comments.

Multilingual Spell Checking for Flash and Flex

We’re currently finishing off version 1.2 of the Spelling Plus Library, which brings a number of enhancements, including:

  • various minor fixes and improvements, including resolving an issue with AIR context menus

  • improvements to the text highlighting engine

  • suggestions algorithm has been refactored to be slightly faster and more accurate

  • support for multiple languages using the roman character set

That last one is a big one, and one that we could use a little help with, if any multilingual folk have a bit of time to spare. Our office is hopelessly monolingual, and we are looking for some brave folk to spend 5 minutes or so just playing with the engine and seeing if it returns appropriate results. We’re currently testing French, Spanish, and German.

If you’re interested, either drop a comment below, or send us an email (use the contact link at the top of the page). Honestly, there’s no immediate incentive for doing it, but we’ll try to figure something out.

FitC 2008 Session Notes Posted.

Just finished posting the slides from my FitC 2008 session “My Favourite Things”. I don’t think they’ll make much sense outside the context of my talk, but they might be useful for people who attended my session.

You can access the slides at gskinner.com/talks/.

I’ll also be releasing a bunch of source code and demos from the session on the blog over the next little while, so stay tuned.

As always, FitC was a phenomenal conference. Much love and kudos to Shawn. It was great hanging out with everyone, and catching some interesting sessions. It was also a nice bonus that the weather was really pleasant (for once) during the conference.

I’m Huge in Japan!

I’ve always wanted to say that, even if it is a bald-faced lie.

While in Tokyo I had the distinct pleasure and honour of being interviewed by Director’s Magazine. The interview, along with a rather silly-looking photo of me, appeared in the Aug-Sep edition of the magazine. I just got a copy (bit of a delay, but hey, it came all the freaking way from Japan, man!), and while I have no idea what it says, it’s still pretty cool, so I wanted to post it for posterity.

Here’s a scan of the picture page:

Continue reading →

Failure to Unload: Flash Player 9’s Dirty Secret

Update: Adobe has added the Loader.unloadAndStop method to Flash Player 10 to address some of the issues outlined in this article. You can find more details on this feature in my article “Additional Information on Loader.unloadAndStop”.


Flash Player 9 has a very dirty secret. It doesn’t even try to hide this dirty secret, but it’s still not that widely known. You see, Flash Player has severe problems with separation anxiety – once it’s loaded some content, it has a really hard time ever letting it go. Technically speaking, it is extremely difficult to make Flash Player 9 unload ActionScript 3 content.

In this article, I’ll take an in-depth look at the issue, its implications, suggestions for addressing the problem in the player, and some workarounds for the time being. If this issue seems like it will impact your projects, I’d strongly encourage you to read through the article and educate yourself, then use the link at the end of the article to provide Adobe with feedback on it. Likewise, I would encourage you to share this issue with other developers, both to help spread awareness of the issue, and to give them the opportunity to also provide feedback to the Flash Player team. I see this as one of the most critical issues that should be solved in Flash Player 10, and the more people raise it as an issue with Adobe, the more likely it is to be addressed.

Continue reading →

RegExr: Full Code Sense and Replace Support for RegEx Testing

This will be the last RegExr-related post for a while, I promise! I’m just having a lot of fun playing around with this.

I just uploaded version 0.2.1b of RegExr. This version includes support for using replace with regular expressions, including replacement codes (e.g. $& is replaced with the RegExp match substring). Even better, I wrote a full RegEx lexer / tokenizer, so that RegExr understands the expressions you write at the token level. This allows it to provide accurate capturing group reports, nested token highlighting including display of errors, and contextual information on tokens as you roll over parts of your regular expression.
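As a quick illustration of replacement codes, here is a small sketch in TypeScript using the standard ECMAScript String.replace, not RegExr's own code. The sample strings and variable names are made up for the example.

```typescript
// $& inserts the whole matched substring into the replacement.
const bracketed: string = "hello world".replace(/o/g, "[$&]");

// $1, $2 and $3 insert the text captured by the numbered groups.
const reordered: string = "2008-04-21".replace(/(\d+)-(\d+)-(\d+)/, "$3/$2/$1");

console.log(bracketed); // "hell[o] w[o]rld"
console.log(reordered); // "21/04/2008"
```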

These new features make it way easier to learn regular expressions if you’re just starting out.

Continue reading →