Mike Dimmick's Bleurgh

Monday, 17 May 2004

Programming Taboos

I remember why I wanted to keep that last document. It ties in with things that Larry and Raymond (can't find any references right now) and of course John Robbins have said, and also a thread on the CodeProject Lounge about gotos in the MFC source code.

I believe that, if we're going to deal with leaky abstractions, we have to know and understand what's going on under the abstraction. If you don't, when the system does something crazy - and it will - you won't be able to fix it. While our virtual machines are leaky, and especially while they're unreliable, you have to understand both the VM and the real machine beneath. This is particularly true in the edge cases where you're trying to interface between the VM and some native code.

Java tells you to wrap up your native code in a nice interface using the Java Native Interface. Fine, but that requires that you can write C or C++ code (and I think it ties you to a particular JVM implementation - not sure on this). .NET tells you to marshal using [DllImport] attributes (or Declare statements, for VBites), which are a bit neater, but you suffer the leaky abstraction when the type you're trying to marshal has many incompatible options for implementation on the native side. I've answered (or tried to answer) a number of questions about P/Invoke interop marshalling, and people are always trying to do something crazy with strings. There are many ways to marshal a string, and some are better than others (hint: on the way in, pass an LPCTSTR; if you need to pass a string out, declare two parameters on the native side, an LPTSTR and a buffer size, and use a StringBuilder on the managed side).
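To make the string hint concrete, here is a minimal sketch of both directions. It assumes desktop Windows and borrows two kernel32 functions purely as stand-ins: lstrlen for the "string in" case and GetWindowsDirectory for the "string out" case. The class and helper names are made up for illustration.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;

public static class StringMarshalling
{
    // String in: the native parameter is an LPCTSTR, so a plain string will do.
    [DllImport("kernel32.dll", CharSet = CharSet.Auto)]
    static extern int lstrlen(string lpString);

    // String out: the native side takes an LPTSTR plus the buffer size in
    // characters, so the managed side passes a StringBuilder and its capacity.
    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    static extern uint GetWindowsDirectory(StringBuilder lpBuffer, uint uSize);

    public static int NativeLength(string text)
    {
        return lstrlen(text);
    }

    public static string GetWindowsDirectoryPath()
    {
        StringBuilder buffer = new StringBuilder(260); // MAX_PATH characters
        uint length = GetWindowsDirectory(buffer, (uint)buffer.Capacity);

        // Zero means failure; a value larger than the capacity means the
        // buffer was too small and nothing useful was written.
        if (length == 0 || length > (uint)buffer.Capacity)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        return buffer.ToString();
    }

    static void Main()
    {
        Console.WriteLine(NativeLength("hello"));
        Console.WriteLine(GetWindowsDirectoryPath());
    }
}
```

The point of the StringBuilder is that the marshaller hands the native code a writable buffer of known size and copies the result back afterwards; a plain string parameter is treated as read-only input.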

Anyway, if you need to crack a tough problem, it helps if you have all the tools in your toolbox - you just don't have to use the dangerous ones. Don't get hung up on the taboos - if a tool is useful and appropriate for the situation, use it.

The great clearout, part 1

I suspect this is going to be more than one entry...

This weekend, my parents have a visitor coming to stay from Düsseldorf, Reading's twin town. She needs somewhere to sleep, surprisingly, so I have to clear out my sister's old bedroom - which, since I returned from University, has been both my mum's sewing room and my guitar practice/Xbox/TV/extra storage room. A number of things I brought back from Uni have never been sorted out properly - I've been in a 'ah, it's only temporary' mindset for the best part of three years.

So I decided to tackle one of the boxes - an old Amazon box containing various files (and, for some reason, a bunch of telephone cable, some cable tacks and a phone socket - I think I planned to fit this at the last student house but never got round to it). Inside, I found a few course notes, a collection of unused Christmas cards (I don't remember sending many cards, but I have a bunch of spares anyway) and an absolute stack of papers relating to my second second year (I'll talk about that some time, if I haven't already) group project.

This project required a small hotel management system to be written in Ada with a web front-end (actually, the web front end wasn't essential, but more marks were available if you did one). I essentially designed and implemented the whole system - basically I came up with what was a reasonably efficient design (based on hash tables) using Ada generics. However, I wasn't too good at communicating the design or the implementation to my team-mates - and it did use a lot of the language's features that we'd barely covered in lectures. It just overwhelmed them.

Anyway, I've recycled all the printout and design notes because, well, it's basically irrelevant to anything I'm doing now (different OS, language, environment, etc) and I've got all the source code on my hard disk anyway.

Other noteworthy documents I found (and I'm going to link to, then recycle):

Thursday, 13 May 2004

Unwarranted speculation

Looks like OpenNETCF's back up - with this registrar. Maybe they were just changing hosts and DNS updates were delayed?

Obscure programming languages

Dare talked about the lack of XSLT 2.0 and XPath 2.0 in the forthcoming .NET Framework 2.0, and linked to Mark Fussell, who linked to 99 Bottles of Beer in various languages (programming languages, that is).

I think Ian would particularly like the Common LISP one which is made up of a complicated format string - in the shape of a beer bottle!

Ian's told me before about the BrainF**k language, but I'd not seen it before. Anyway, it seems that other people have written languages for bovines (beer-drinking bovines) and simians (alcoholic apes), along similar principles.

I can't help thinking that the T-SQL version should be set-based, though...

Wednesday, 12 May 2004

New features

Anyone reading in an aggregator might have noticed a bunch of posts disappeared for a short while this morning, then came back later. You might also find that the last month or so of posts suddenly reappeared.

Blogger have implemented comments, and I've gone back through and enabled them (a manual process) for everything that was appearing on the front page. I don't yet know whether items posted through BlogJet will get a comment link - I hope so.

Because of this change, I've removed the Contact link over on the right. I'll still know if you've left a comment. This might reduce the amount of spam I'm getting, though I don't think it'll have any effect on all the spam sent with return addresses at my domain, for which all the bounce messages end up in my inbox grrrrr.

Saturday, 8 May 2004

Reading List

What are the best development books? (blogs.msdn.com/jobsblog)

I tried to answer the question, but I virtually ended up listing the contents of my Programming shelf. There's very little dead weight on there. I've got something of a shelf at work, some of which gets used, some not. Other less immediate references stay at home. Here, then, is the home list: not a lot of commentary. Links go to Amazon UK, no kickback to me if you buy one, feel free to use another vendor. In some cases Amazon lists a different title; I've typed what appears on the cover.

Extreme Programming

Extreme Programming Explained (Beck)
Extreme Programming Installed (Jeffries, Anderson, Hendrickson)
(about four years ago, Ian's then employer were considering XP. I try to use some of the principles as appropriate, but I'm not yet properly using XP)

Programming Languages and Libraries

Programming in Ada 95 (Barnes) - the language taught in my first programming courses at University. Actually not my book, it belongs to Colin.

The C++ Programming Language, 3rd ed (Stroustrup)
The Annotated C++ Reference Manual (Ellis, Stroustrup)
Ruminations on C++ (Koenig, Moo) - actually, I've never finished reading this
Effective STL (Meyers) - I barely get to use STL

C++ In Depth Box Set, consisting of:

(read that last one if you want a template-induced headache - I maintain that Andrei Alexandrescu was the only good thing ever to come out of RealNetworks)

The Java Programming Language (Arnold, Gosling, Holmes)
The Java Language Specification (Gosling, Joy, Steele, Bracha)
I actually don't know Java that well. I bought these books to assist with my degree final project - a tool to produce diagrams of the static class structure of a program. I started out targeting C++ but discovered that C++ is, to put it mildly, a bugger to parse. I threw myself into the C++ parser so hard that I spent too little (i.e. practically no) time on the drawing and layout side. Eventually, to get something working, I abandoned C++ for Java - but too late. I just barely scraped through the project, getting the minimum 40% pass mark, dragging my overall result down to a 2:2.

Windows Programming

I have so many books on Windows and Windows-oriented programming and Windows tools, it's difficult to know how to organise this. That shouldn't be too surprising, given that I work as a Pocket PC and desktop developer for a Microsoft-oriented ISV.

MFC Books

Teach Yourself MFC in 24 Hours (Morrison) - the main book I, er, taught myself MFC from. The main difference between this and other MFC books is that it's entirely code-focused - there's very little use of the Wizards. I maintain that if you want to be able to maintain an MFC program - indeed, any program - you need to know what it's doing, and relying on Wizard-generated code without understanding it is foolish.
Programming Windows with MFC, 2nd Ed (Prosise)

Low-Level & Raw Win32

Programming Applications for Windows, 4th Ed (Richter)
Programming Server-Side Applications for Windows (Richter, Clark)
Inside Windows 2000, 3rd Ed (Solomon, Russinovich)

.NET

Applied Microsoft .NET Framework Programming (Richter)
Programming Microsoft .NET (Prosise)
Shared Source CLI Essentials (Stutz, Neward, Shilling)

COM

Inside COM (Rogerson)
Transactional COM+ (Ewald)
(Essential COM is at work, as are Inside ATL and ATL Internals)

Other Windows

Debugging Applications for Microsoft .NET and Microsoft Windows (Robbins)
International Programming for Windows (Schmitt) - this was actually an error made by my sister (who works in Exchange tech support in the UK) when I asked for Programming Windows, 5th Ed (Petzold) (which is at work). Nevertheless I kept it and read it. Despite the UK being so much closer to countries with other languages and character sets, failure to program for international markets is just as prevalent as in the US (and we have the disadvantage of being based in the GMT time zone for half the year).

Miscellaneous

TCP/IP Network Administration (Hunt) - I ran, among others, a shared computer network at Aston Brook Green between 1998 and 2000. I was responsible for - well, made myself responsible for - DNS and DHCP, which was served from a Linux kernel 2.0.38 (RedHat 5.2, IIRC) box. It was pretty reliable in comparison to Windows 98, but then I wasn't playing games on it or running GUI apps. This was before I ever used Windows NT, which changed everything.

The Mythical Man-Month (Brooks)
Code Complete (McConnell) - I note a second edition is due in about a month.

Programming with POSIX Threads (Butenhof) - there are no good references on Win32 threads, such as when and how to use them. Programming Applications gives little guidance on signalling another thread - it's all about synchronisation. So I picked this up instead.

lex & yacc (Levine, Mason, Brown) - another legacy of my final project. If you want to build a parser, don't use these tools, they're way too confusing (and not powerful enough for C++, if you want an intelligible parser). Elkhound looks much more interesting.

Refactoring (Fowler)

That's everything I have at home; I may update this entry on Monday with what I have at work.

Friday, 7 May 2004

OpenNetCF lets domain slip

I can only assume that the people at www.opennetcf.org forgot to renew their domain registration - and now a Chinese hosting company has stolen the registration.

whois -h whois.pir.org opennetcf.org
Domain ID:D96652469-LROR
Domain Name:OPENNETCF.ORG
Created On:20-Mar-2003 14:15:41 UTC
Last Updated On:07-May-2004 10:54:48 UTC
Expiration Date:20-Mar-2006 14:15:41 UTC
Sponsoring Registrar:R64-LROR
Status:INACTIVE
Registrant ID:ONLC-637769-4
Registrant Name:Buy a domain
Registrant Organization:chinachanel
Registrant Street1:xiao meng china channel

Info courtesy of www.demon.net/external.

Thursday, 6 May 2004

BALEETED!

To confirm my suspicions about the patch I referred to in the last post not having been applied, I downloaded a Linux 2.6.5 kernel tarball. Having read the appropriate source files, I'm now scouring the appropriate section of the hard disk.

Ah, that's better - I don't feel so dirty...

Wednesday, 5 May 2004

I don't know where to start...

CRN: Microsoft Shelves NGSCB Project As NX Moves To Center Stage (via The Inquirer).

Boy, this article and the comments are so wrong I almost don't know where to begin. There follows a slightly edited version of my comment on the article.

NGSCB and No Execute are completely different things. No Execute applies only - I emphasize ONLY - to the ability to tell the processor not to execute code from given pages. Because Windows is a protected-mode operating system, only the kernel can set or clear this bit in the Page Table Entry. NX can help prevent the exploitation of buffer overrun bugs.

Windows NT has always supported the ability to set execute permission on memory independently of read and write permission - look up the VirtualProtect API and the PAGE_EXECUTE flag. Enforcing it requires hardware support, which has been lacking on x86 until now. AMD's 64-bit processors implement a No-Execute bit in the page table entry when the processor is running in 64-bit mode or in 32-bit Physical Address Extension (PAE) mode. Intel's Itanium processors also include an Execute bit in the page table.
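For anyone who hasn't looked at that API, here's a rough sketch of what requesting execute permission looks like, called from C# through P/Invoke. It assumes desktop Windows; the class name and the single 'ret' byte written into the page are only there for illustration, and nothing is ever executed.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public static class PageProtection
{
    // Values from the Win32 headers.
    const uint MEM_COMMIT = 0x1000;
    const uint MEM_RESERVE = 0x2000;
    const uint MEM_RELEASE = 0x8000;
    const uint PAGE_READWRITE = 0x04;
    const uint PAGE_EXECUTE_READ = 0x20;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize,
        uint flAllocationType, uint flProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualProtect(IntPtr lpAddress, UIntPtr dwSize,
        uint flNewProtect, out uint lpflOldProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualFree(IntPtr lpAddress, UIntPtr dwSize, uint dwFreeType);

    // Allocates one writable page, then flips it to read + execute.
    // Returns the protection the page had before the change.
    public static uint MakePageExecutable()
    {
        UIntPtr size = new UIntPtr(4096);
        IntPtr page = VirtualAlloc(IntPtr.Zero, size,
            MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (page == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        try
        {
            Marshal.WriteByte(page, 0xC3); // x86 'ret', written while the page is writable

            uint oldProtect;
            if (!VirtualProtect(page, size, PAGE_EXECUTE_READ, out oldProtect))
                throw new Win32Exception(Marshal.GetLastWin32Error());
            return oldProtect;
        }
        finally
        {
            // MEM_RELEASE requires a size of zero.
            VirtualFree(page, UIntPtr.Zero, MEM_RELEASE);
        }
    }

    static void Main()
    {
        Console.WriteLine("Previous protection: 0x{0:X}", MakePageExecutable());
    }
}
```

On hardware without execute protection the call still succeeds; the processor simply has no way to refuse to run code from a page that lacks the execute flag, which is the gap NX closes.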

Frankly, it's taken far too long to get execute protection on the x86, so AMD should be lauded for finally implementing it.

The Linux patch that another poster referred to: the follow-ups to that message basically damn the proposal on the grounds that it breaks GCC. Things may have changed in 7 years, of course. It works by shortening the Code Segment so that it no longer covers the end of the address space - this causes the x86 processor to generate an Access Violation (oops! we're on a *nix - I mean a segmentation fault) if you try to set the instruction pointer outside this range. A clue to the general effectiveness of this proposal lies in the fact that it still isn't included in the kernel source tree. (Actually, as posted, the code had no effect apart from taking another slot in the Global Descriptor Table: the CS register was set to the newly defined USER_HUGE_CS [the old behaviour] on return from taking a trap, and there was no apparent way to set it to anything else.)

Replicating this on Windows would limit the number of threads dramatically and require a large change in the way DLLs are loaded. All thread stacks would have to be in the area of memory not covered by the code segment, which would have to be at the end because a segment represents a contiguous sequence of virtual addresses. The No Execute protection, by contrast, allows any virtual address to be protected; or rather, all virtual addresses that don't contain code will be protected by default.

NGSCB is about securing users' data and keys in an area only accessible through secure APIs. It has nothing to do with security vulnerabilities, except that both fall under a very rough umbrella of 'security'.

[edit: confirmed my suspicion about the Linux patch not being included in 2.6, and note that the article was revised after I posted this entry.]

Friday, 30 April 2004

Two most common bugs

The two most common bugs in my code are:

Inverting a boolean condition.
Forgetting to increment a loop counter/move a pointer in a while loop.

You'd think I'd have learned that these are my most common issues by now and to check them more thoroughly.

I just managed to lock up a device when installing a program because of a #2 in the install process...
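For what it's worth, bug #2 looks something like the sketch below (the method names and the space-counting task are invented for the example). The while form only terminates because of one easily forgotten line; the for form keeps the increment in the loop header, where it's harder to leave out.

```csharp
using System;

public static class LoopShapes
{
    // while form: delete the marked line and this spins forever.
    public static int CountSpacesWhile(string text)
    {
        int count = 0;
        int i = 0;
        while (i < text.Length)
        {
            if (text[i] == ' ')
                count++;
            i++; // the line that bug #2 forgets
        }
        return count;
    }

    // for form: initialisation, test and increment all sit in one place.
    public static int CountSpacesFor(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == ' ')
                count++;
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountSpacesWhile("a b c") == CountSpacesFor("a b c"));
    }
}
```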

Thursday, 22 April 2004

An expensive day

My car went in for a (scheduled) service today. I asked the garage whether the driver's side mirror housing, which I broke about six months ago (reversing in the dark down my parents' driveway, I encountered a tree...) would pass the MOT (British yearly vehicle inspection, for any foreigners reading). Their considered opinion was that it wouldn't. The whole unit, mirror, housing and all, would need to be replaced.

Of course, the most expensive part of fitting a new mirror is painting the body-coloured back piece of the housing the same colour as the car. This part is available separately on the Focus, but I'd broken the main body of the housing, so the whole lot had to be replaced.

I got a call halfway through the day saying that they'd performed the MOT and it had passed, but they'd discovered that one of the rear brake cylinders (a hydraulic item which presses the brake shoes against the drum) was leaking and should be replaced. Did I want to replace it? If it had been something like a cracked bumper, I'd have said no. But brakes are an essential safety item - despite costing £100, it needed to be done.

So, a third year (37,500 mile) service, plus the mirror, plus the MOT, plus the brake cylinder, plus VAT at 17.5% came to over £500. Ouch.

Oh well, at least it was scheduled, unlike my boss's car battery. /me smirks. During the time I've had the Focus, my boss has had a Mercedes SLK, a Mercedes C-class, and now has a BMW 5-series. The SLK had problems with the roof, the C-class I'm not sure of (perhaps he just didn't like it), and the 5-series can't keep a battery charge.

Something you might consider odd is that while I live in Reading, I had the car serviced at Marlow. I work in Cookham. Why not service it in Reading? Two reasons. Firstly, Reg Vardy run both the Reading Ford garages, and they're - how shall I put this? - crap. Platts at Marlow are a lot better, or so it seems. Secondly, it's actually more convenient to drive to Marlow and get a train back to Cookham (takes 15 minutes, but only once an hour) than it is to drive through Reading town centre, drop the car off, then get a bus back to Reading station, a train to Maidenhead, then change to the once-an-hour train to Cookham - and vice versa at the end of the day.

Wednesday, 21 April 2004

RIP, Sam and Max

It looks like LucasArts has decided to become a Star Wars-only development shop. They were producing a new game featuring Sam and Max, but decided to kill it off - for no apparent reason.

It's a shame. LucasArts adventures are always hilarious, but the last one they did was Escape From Monkey Island back in 2000.

Which reminds me - I was going to buy that Day of the Tentacle/Sam and Max Hit The Road double set. Somewhere along the line I've lost my copy of DOTT.

Tuesday, 20 April 2004

Google Ad-Words Gaffe

I don't think this particular ad is quite relevant...

Class Action Suit - Great deals on thousands of clothing items (EBay)

Thursday, 15 April 2004

Taxes and scarcity culture

US bloggers and websites are all talking about having to do tax returns today. Thank any gods that might be around that I don't have to do one - it's actually very unusual for UK residents with 'normal' tax affairs.

Anyway, the point of this post was to point to yesterday's WLCD comic. An interesting point on how you think differently if you don't have lots of resources.

Automatic DOS

The Windows Update servers have been under pretty serious stress over the last couple of days. Ironically, by encouraging everyone to turn on Automatic Updates, Microsoft have built a distributed Denial-of-Service system - that attacks their own site.

The Automatic Updates tool checks for any updates whenever your network is idle. (Anyone whose connection is charged per-byte should now go and turn it off!) If it finds any, it starts downloading them in the background, depending on how it's configured.

Those of us east of the US were probably out of the office when the latest set of updates were posted, at 10am PST, or (IIRC) 5pm GMT. Everyone whose computer was turned on queried the servers, found the updates, and started downloading...

If you're in a networked environment, you should look into getting Windows Update Services/Software Update Services. For administrators, this allows staging and testing of updates before deploying to workstations. It should reduce your network bandwidth as well as alleviating the load on the servers.

News Aggravator

I really need to sort out my news synchronisation. I read my feeds both at work and at home - the same set of feeds. At work I'm using Awasu and at home, NewsGator.

I spend quite a bit of time marking posts I've already read, as read - a task which is relatively tricky in Awasu.

Why use two aggregators? I don't really want to use NewsGator at work as I feel it would get in the way of what I'm supposed to be doing with Outlook, i.e. working, but then it doesn't seem possible to collect all new posts from all blogs into a single notification in Awasu, so I might change.

With respect to synchronisation, I don't want to pay for NewsGator Online Services. Maybe I need to write an Outlook plugin (no! not another one! it's slow enough already with SpamBayes, SpamNet and NewsGator!) which can do the sync for me. Of course it ought to use SIAM, but it'll probably end up being FTP (or could I just FTP a SIAM file?)

Sunday, 4 April 2004

GNOME usability: still sucks

Miguel de Icaza points to a writeup of GNOME usability.

Taking each section in turn:

Simple Dialog Boxes

First off, let's consider that save dialog. A few questions immediately spring to mind:

How do I overwrite an existing file?
How can I have confidence I'm pointing to the right folder?

The list of files, even though it takes up screen space, allows two things to happen. You can overwrite an existing file by clicking it then clicking OK. You can be sure you're in the right folder, because you can see what else is in it.

It's well-known that users find a list of items easier to manipulate than a drop-down, so I'd argue that Windows' Places Bar is simpler than GNOME's Save In Folder drop-down. Windows appears to have a problem with this when using very large icons, where the bottom icon's label is clipped; however, this dialog is resizable.

There is one good thing about GNOME's Save dialog: the file-type drop-down shows the icon that will be used for the file. The dialog has too many buttons in the upper-right corner: do you really want to minimize a save dialog? What happens if you maximize it: do you get a full-screen dialog with massive edit fields? This is a form that shouldn't resize.

I then find this statement typical: "True, the dialogs may irk those who like tab completion and other esoteric features (I'm guessing their issues will be worked out in future releases)..."

Tab completion? In a selector box? Are you completely insane? It's very, very, very advanced users who will type a path into the File Name box to navigate to a different folder.

Simple Menus and Program Names

"Because free software environments like GNOME are founded upon cooperative development they can avoid the problems caused by corporate competition and branding. A user in Windows XP will have to navigate Windows Media Player, Real Networks Real Player and Apple Quick time in order to play media files. Their applications menu will be cluttered and the number of interfaces to learn is higher than in GNOME where a user must only find and learn Totem Movie Player."

As I've written before, everything would be a lot simpler for Windows users if Real and Apple would follow the platform conventions, as DivXNetworks and the XviD project do, and write their format decoders as codec DLLs, then write their players in terms of codecs. The user's choice of player would then be able to play any of the common formats - even without licensing those formats. But therein lies the problem - and it's also a problem for GNOME, because Real and Apple want to make money out of their formats.

XviD is probably contravening the MPEG-4 patents.

Simple Configuration Tools and Preferences

The Wallpaper Chooser link is broken, so I can't challenge that. In Windows, the Desktop tab of the Display properties is hardly difficult to find. The Position and Color drop-downs take a bit of thought, but not much. The Customize Desktop button is probably misplaced - maybe an Advanced button instead? Fewer options here would suggest fewer choices.

Yes, the IE preferences dialog is a lot busier. It also has many more options than Epiphany. You're trying to compare the interface of a Mercedes E-class with satellite navigation and CD auto-changer to that of a Ford Fiesta with a radio - of course it has more buttons!

Applications - Multimedia

Again, here you're comparing a really basic application with a complicated one. Totem offers no support for playlists, Internet Radio, CD ripping, copying to a portable device, organising your music library, selecting from that library. It's your choice whether these features are included in your media player, but condemning Media Player's interface because it has all these features is ridiculous. The basic operation is pretty simple: once you've ripped a CD, it appears in Media Library; hit the arrow next to Now Playing to get a menu of albums, artists etc, then select what you want to play. The Albums menu could be clearer: at work I have Tom Petty and the Heartbreakers' Greatest Hits and Queen's Greatest Hits I and II. The Albums menu simply shows the album name. However, there's always Media Library.

On top of this, if you want a simpler interface to Media Player, it's actually quite easy to write one. The reason is that the core of Windows Media Player is written as an ActiveX control, and you can write your own wrapper around that. The 'Windows Media Player' executable, at base, is simply a skinning engine that wraps the player control.

The bottom panel shown in the screenshot in this article isn't even shown by default in WMP.

As for WMP's 'bloated menus', they have to be: every feature that's exposed in the UI is also exposed in a menu somewhere.

Wednesday, 24 March 2004

Pot, meet kettle

Real's Glaser exhorts Apple to open iPod (via Paul Thurrott)

Yep, the CEO of RealNetworks is asking Apple to open their product. I still want Real to open up RealAudio and RealVideo formats so that I can use Windows Media Player - or any other player - to view their formats without having to install any of their crappy player.

Turn it around, Rob; are you going to open your player or your store? Didn't think so.

Saturday, 20 March 2004

I take it back, again

RealDownload stopped downloading after 100MB. I think there's something up with either the download server (farm) or the download itself.

Friday, 19 March 2004

I take it back

RealNetworks did produce one useful piece of software: RealDownload.

OK, the user interface is non-standard and a bit crappy (at the time, Real insisted on a massive icon in the top-left, against the Windows standard, which forces drawing the whole of the normal 'chrome' itself). But, unlike most Real software, it works. It does do popup ads which come up whenever a new download starts, but you get used to closing them pretty quickly (and anyway, I normally use it for unattended downloads).

I was joking earlier this week that the only good thing ever to come out of RealNetworks was Andrei Alexandrescu.

I've not used RealDownload for quite a while, actually, but I pulled it out after the Windows XP SP2 Preview download failed at 131MB twice. Since moving to broadband, most downloads have been very reliable just using IE 6.0's built-in download tool.

Why's my pagefile so big?

Larry Osterman answers the question, why do I need such a large page file?

An aside to this is, why does Windows seem to be constantly swapping out? The answer lies in the working set. Windows tries to keep a certain amount of physical memory free in order to allow memory allocations to succeed - in other words, it uses a bit of otherwise-idle time to clean up memory not used recently so that programs don't have to wait as much when they need more memory. It also tries to share out the amount of memory fairly. The downside is that, if your process's memory usage profile is poor, the system trims off a piece of your process's memory that you haven't used in a while - then you immediately reference it. This can happen if you have a large disorganised data structure that you scan oddly, or parts of your program code that are related are a long distance apart.

I thought I was being bitten by this in our server application recently, and tried using SetProcessWorkingSetSize to give us a bit more size. However, it had no effect. I surmise that what was in fact happening was that the Windows message queue (this server uses the Winsock control for incoming data) was growing, and growing, because the server wasn't keeping up. This seemed to be causing a lot of swapping, which led the server to get slower, and slower, causing more and more swapping. It wasn't a memory leak, as such, because the process's working set dropped right back down again as soon as the load was removed.
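For reference, the experiment amounts to something like the sketch below, shown here as a P/Invoke from C#. It assumes desktop Windows; the class name and the 8 MB and 64 MB figures in Main are arbitrary illustrations, not the values used in the server.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public static class WorkingSet
{
    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentProcess();

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetProcessWorkingSetSize(IntPtr hProcess,
        out UIntPtr lpMinimumWorkingSetSize, out UIntPtr lpMaximumWorkingSetSize);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetProcessWorkingSetSize(IntPtr hProcess,
        UIntPtr dwMinimumWorkingSetSize, UIntPtr dwMaximumWorkingSetSize);

    // Reads the current working set limits, in bytes.
    public static void GetLimits(out ulong minBytes, out ulong maxBytes)
    {
        UIntPtr min, max;
        if (!GetProcessWorkingSetSize(GetCurrentProcess(), out min, out max))
            throw new Win32Exception(Marshal.GetLastWin32Error());
        minBytes = min.ToUInt64();
        maxBytes = max.ToUInt64();
    }

    // Asks the memory manager for new limits; it may refuse.
    public static void SetLimits(ulong minBytes, ulong maxBytes)
    {
        if (!SetProcessWorkingSetSize(GetCurrentProcess(),
                new UIntPtr(minBytes), new UIntPtr(maxBytes)))
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }

    static void Main()
    {
        ulong min, max;
        GetLimits(out min, out max);
        Console.WriteLine("Before: {0} - {1} bytes", min, max);

        SetLimits(8UL * 1024 * 1024, 64UL * 1024 * 1024); // illustrative sizes only
        GetLimits(out min, out max);
        Console.WriteLine("After:  {0} - {1} bytes", min, max);
    }
}
```

These are limits on how much the trimmer will leave resident, not a reservation, which fits with the call making no visible difference when the real problem was elsewhere.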

Tuesday, 16 March 2004

.NET CF whinging

My major whinge is just that, where the overload count has been reduced, it's the most configurable overloads that have been dropped. Yes, this saves metadata in the runtime - but it hurts the programmer (in some cases having to avoid the framework entirely because you're boxed in). The original designers of Windows CE got it right - eliminate the simple functions (such as MoveTo, LineTo) which are entirely covered by more complex APIs (such as Polyline). In .NET CF, you can't even create a pen wider than 1 pixel - because the overload that takes a pen width has been eliminated.

Other points of contention: no System.Diagnostics.Process class, so you can't create a process - you have to P/Invoke CreateProcess. You can't wait on more than one handle at a time with WaitHandle.WaitAny or WaitAll, because they've been removed (despite the underlying platform supporting WaitForMultipleObjects, at least for the WaitAny case). You can't poll a wait handle because the only overload of WaitOne left is the one that takes no parameters.
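For comparison, these are the desktop Framework calls I'm missing, in a small sketch. The class, method and event names are invented for the example; the WaitOne and WaitAny overloads are the desktop ones that the text says .NET CF leaves out.

```csharp
using System;
using System.Threading;

public static class WaitExamples
{
    // Poll: a zero timeout returns immediately with the current state.
    public static bool IsSignalled(WaitHandle handle)
    {
        return handle.WaitOne(0, false);
    }

    // Wait on several handles at once; returns the index of the one that fired.
    public static int WaitForEither(WaitHandle first, WaitHandle second)
    {
        return WaitHandle.WaitAny(new WaitHandle[] { first, second });
    }

    static void Main()
    {
        ManualResetEvent dataReady = new ManualResetEvent(false);
        ManualResetEvent shutdown = new ManualResetEvent(false);

        Console.WriteLine(IsSignalled(dataReady));

        shutdown.Set();
        Console.WriteLine(WaitForEither(dataReady, shutdown));
    }
}
```

Without the timeout overload, the only way to ask "is it signalled yet?" is to block, which rather defeats the purpose.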

More seriously, it's not CLI compliant: the Thread class has no Abort or Join methods.

When trying to get back from a background thread to a UI thread, in order to update UI state (which you must, otherwise you may deadlock or have other synchronisation problems) you use Control.Invoke. .NET CF eliminates the overload which can take parameters: you're stuck with using the EventHandler delegate, which gives you the current object and an empty EventArgs. The desktop pattern looks something like:

private void ctl_HandleEvent( object sender, CustomEventArgs e )
{
   if ( this.InvokeRequired )
   {
      // Called on a background thread: re-run this handler on the UI thread
      this.Invoke(
         new CustomEventHandler( ctl_HandleEvent ),
         new object[] { sender, e }
      );
      return;
   }

   // Now on the UI thread: do normal handling
}

You can't do this in .NET CF because you don't have InvokeRequired or the two-argument variant of Invoke. You have to cache whatever's in the CustomEventArgs somewhere, then Invoke a different method.
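A minimal sketch of that workaround follows. It's written against desktop Windows Forms but sticks to the one-argument Invoke with an EventHandler, which is the only form the text says survives in .NET CF; I haven't verified it on a device. StatusForm, ReportStatus and the single cached string are hypothetical stand-ins for whatever your CustomEventArgs carries.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

public class StatusForm : Form
{
    private readonly object pendingLock = new object();
    private string pendingMessage = "";          // cached copy of the event data
    private readonly Label statusLabel = new Label();

    public StatusForm()
    {
        Controls.Add(statusLabel);
    }

    public string StatusText
    {
        get { return statusLabel.Text; }
    }

    // Called on the background thread: stash the data, then hop threads.
    public void ReportStatus(string message)
    {
        lock (pendingLock)
        {
            pendingMessage = message;
        }
        this.Invoke(new EventHandler(UpdateStatusOnUiThread));
    }

    // Runs on the UI thread; sender and e carry nothing useful here,
    // so the real data comes from the cached field.
    private void UpdateStatusOnUiThread(object sender, EventArgs e)
    {
        string message;
        lock (pendingLock)
        {
            message = pendingMessage;
        }
        statusLabel.Text = message;
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        // The window handle exists by now, so Invoke is safe to call.
        Thread worker = new Thread(new ThreadStart(DoWork));
        worker.Start();
    }

    private void DoWork()
    {
        ReportStatus("Finished on worker thread");
    }

    static void Main()
    {
        Application.Run(new StatusForm());
    }
}
```

Note how much scaffolding (a field, a lock, a second method) is needed to replace one object[] argument. With more than one event in flight the single field would need to become a queue.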

So the net result of omitting the metadata for these overloads (which cover the versions with fewer arguments) is that far more metadata is introduced into the program in order to try to achieve the goal.

And don't get me started on marshalling...

Mind your P/Invoke

It's important to get your P/Invoke [DllImport] declarations right in .NET. Josh Williams points out a specific problem on AMD64/x64.

I've been doing a huge amount of P/Invoke recently as we get our existing codebases ported to be usable in C#. We're mostly going pure-managed because we prefer having fewer large binaries to many small ones (which takes less of a hit on the loader). However, since we're dealing with the .NET Compact Framework, there are many, many places where the framework just doesn't have the features.

More on Visual Studio slip

Dan Fernandez posts more information on why Visual Studio "Whidbey" has slipped.

Friday, 12 March 2004

Know your command prompt

Ian Griffiths points to a post by Junfeng Zheng about the NT command prompt, CMD.EXE.

Ian mentions that file and directory name completion are available in the NT command prompt. He doesn't mention that they're not enabled by default (or at least, filename completion is enabled, but with an obscure keystroke - Ctrl + D?) To get completion enabled, use TweakUI. I've set both completion keystrokes to Tab, which I was used to from bash.

Thursday, 11 March 2004

Site maintenance

Quick maintenance update: a link to my blogroll is now over on the right on the website. For those of you reading in ATOM (or a translator, like Arcterex), it's here.

eVC 4.0 SP3

Via Pocket PC Developer Network.

Embedded Visual C++ 4.0 Service Pack 3 is released.

Heh. I wondered why Symbol's documentation for the MC9000-G (custom CE platform version) told you not to install eVC 4.0 SP2. Sorry, guys, no choice - I was already developing for CE.NET 4.2-based devices, including ~~Pocket PC~~ Windows Mobile 2003 on your PPT 8800.

Old news?

After yesterday's entry, I was browsing Robert Scoble's blogroll (OK, OK, I was looking to see if I was in it - apparently not) and discovered a link to Benjamin Mitchell's blog about an ASP.NET presentation at Microsoft UK, where Scott Guthrie stated a Q1 2005 release date for Whidbey. This was posted over a month ago (10th February).

It'd be nice to be kept up to date, really </sarcasm>.

Wednesday, 10 March 2004

Oops, we slipped

Microsoft Watch is reporting that Yukon (the next release of SQL Server) and Whidbey (the next release of Visual Studio) have slipped to 2005.

Damn, I was looking forward to programming devices in C++ using Visual Studio. Embedded Visual C++ is a cut-down and modified version of Visual C++ 6.0. Not done very well, I might add; it's very crash-prone, particularly if used aggressively - I have a very short code-compile-test-debug cycle. Sometimes I could do with more thought, but if I leave it too long before compiling I end up going down blind alleys. Compiling and testing gives me confidence.

I was also looking forward to an improved .NET Compact Framework (I'll post my frustrations with writing a pretty simple control soon) and a more complete .NET Framework - also, a 64-bit Framework.

I wonder which bit slipped? Probably the CLR, if the list of job postings posted by an MS blogger recently (that I can't now find!) was anything to go by.

It'll also get me off Josh Heitzman's back ;)

Saturday, 6 March 2004

Conflict of Interest

Mike Sax: Patents & Offshoring: Did you know that the USPTO has to be completely self-funded (it can't rely on your tax dollars), but a percentage of patent application fees is diverted to other, unrelated agencies by the US Congress? And did you know that the US Congress, not the USPTO determines what application fees are?

Surely this is a massive conflict of interest: the patent office is paid by the people seeking patents. If the patent office want more income, they have to get it by processing more patent applications in less time.

Thursday, 4 March 2004

Stupid conspiracy theories of our time

The Inquirer: If Longhorn runs on Power PC, what need for Intel?

OK, assuming that Microsoft isn't deliberately putting FUD in the channel surrounding Xbox 2 (my original theory was that MS were trying to deceive Sony, but the information is beginning to look a bit too solid for that), what will this mean?

Microsoft won't buy Apple to get access to PowerPC-based hardware. Their customers' investment in x86-compatible hardware and software is too great. The PowerPC G5 only just barely matches the performance of Intel's top-of-the-range chips, and we're about to see another big step in clock speed with the Prescott chips. Nevertheless, the G5s can probably manage to emulate an x86 quickly enough to run original Xbox games (after all, the Xbox only has a 733MHz PIII-class processor). I expect that Intel were too expensive and unwilling to reduce the massive power and cooling requirements of the P4 series - given that one of Microsoft's goals for Xbox 2 is to reduce the physical size, weight and noise of the console, which caused problems selling into the Japanese market. (Actually, an Xbox isn't much larger than a PS2 - it just looks bigger because the PS2 has a rather deceptive case design, with the bottom half of the front panel recessed).

Windows has run on PowerPC processors before. NT 3.51 and NT 4.0 CDs shipped with support for four processor families: x86, Alpha, MIPS and PowerPC. The PowerPC HAL, however, was designed for the Common Hardware Reference Platform - which never took off; the Power Macs aren't CHRP-compliant. Windows 2000 was to have dropped this to two, x86 and Alpha, but Compaq, having bought DEC, decided they would no longer promote or support Windows on Alpha. (This wasn't the end of the story: much of 64-bit Windows was first developed on 64-bit Alpha chips).

I don't expect Microsoft to release a new general port of Windows to Apple hardware. The market simply isn't there - you'd have to persuade an installed base of Apple owners (since Microsoft will never be able to get Windows pre-installed on Macs) that they would prefer to use a system with even less software available than their own. OK, Longhorn's WinFX API will largely be accessed through the .NET Framework, which performs JIT compilation from the Common Intermediate Language (CIL) stored in the binaries to an execution stream suitable for the host processor - but there's a whole host of legacy applications which won't run. Longhorn isn't intended to be all-or-nothing in this way.

Indeed, it looks like the other attempt to move the PC market to a more modern architecture - Itanium - could fall on the sword of poor x86 compatibility. In essence, an Itanium running x86 code using hardware emulation performs like a 1.5GHz 386 - not very well relative to modern machines - because it doesn't do any out-of-order execution or branch prediction. Software emulation (such as the IA-32 Execution Layer) could improve matters - benchmarks indicate that it can get close to the performance of a processor with a similar clock speed. Unfortunately, clock speeds for x86 processors are already over twice that of the fastest Itanium 2 - 3.4GHz for the newest P4Es compared with 1.5GHz for the Itanium 2.

On native code, the Itanium often blitzes a P4 Xeon at double the clock speed, largely because the instruction set is more expressive and the architecture reduces the need to hit main memory. While a modern x86 processor has many more registers internally than are visible through the instruction set, it can't easily tell whether writes to memory are actually only used because the program has run out of registers - so it has to take the whole hit of writing to and reading from RAM just in case the program depends on this state. Yes, writes and reads are cached - but writing to the caches still causes a bit of a stall.

Wednesday, 3 March 2004

The SELECT/UPDATE problem, or, why UPDLOCK?

Ian's been having some deadlock trouble with SQL Server at work. I tried, but failed, to explain that two server processes running the same stored procedure could deadlock.

The problem comes when you need to update some rows in a table, but only when certain other data in each row is set. You can often do this simply by using the WHERE clause in the UPDATE statement, but if you need to set different values depending on the current values, or you need to update multiple tables simultaneously, it becomes more complicated. So we use a SELECT to get the current values and an UPDATE to write the new values, if we choose to.

The first thing to do is to ensure that we only write back if the data hasn't been changed. In SQL, each statement is atomic - either all of its effects are applied, or none are. However, here we need two statements, so we wrap them up in a transaction:

BEGIN TRAN
SELECT
   @value = Col1
FROM Tbl
WHERE
   RowID = @rowID

UPDATE Tbl
SET Col1 = @newValue
WHERE RowID = @rowID

-- Note, should check @@ERROR and ROLLBACK TRAN
-- if the update failed
COMMIT TRAN

Looks fine, right? Not always. Now I need to explain how SQL Server locks work.

Like all concurrent systems, SQL Server typically has more clients than available resources. It has to give an illusion of concurrent operations. The really, really hard way to allow transactions to operate simultaneously is to allocate new resources for every possibly-contending operation and reconcile them at the end of any atomic operation. The easy way is to lock the object to prevent contending operations, then release the lock at the end of the atomic operations. Locking reduces concurrency, but encourages correctness in a simple fashion.

SQL Server uses locking for concurrency. This is fine so long as locks aren't held for a long period of time. Reading a row takes a shared lock until the end of the atomic operation; updating a row takes an exclusive lock. If a shared lock is held, any other process can take a shared lock; a process wanting an exclusive lock must wait. If an exclusive lock is held, all other processes wanting a lock must wait.

With our query above, the SELECT takes a shared lock and holds it; the UPDATE then tries to convert (upgrade) the shared lock to an exclusive lock.

Now, what happens if we run this query on another connection? Let's say we have queries Q1 and Q2, and to simplify things, let's assume that the server has a single processor. If the scheduler decides to run Q1, and is then interrupted to execute Q2, the following could happen: the SELECT from Q1 runs and takes a shared lock. Then, the SELECT from Q2 runs and takes another shared lock. Now Q2 is interrupted and the scheduler runs Q1 again, which tries to take an exclusive lock, which is blocked by Q2's shared lock. Q1 blocks so the scheduler runs Q2, which tries to take an exclusive lock to do an UPDATE, but is blocked by Q1's shared lock. Result: deadlock - neither Q1 nor Q2 can progress because they're both waiting for the other to finish.

You could give SQL Server a lock hint to take an exclusive lock instead of a shared lock when executing the SELECT, by specifying (XLOCK) after the table name. This stops the deadlock, because both will now try to acquire the exclusive lock, which means one must wait for the other. This has the nasty side-effect of preventing anyone else who just wanted to read the data from reading until we decide to update.

For this reason, SQL Server has another lock type: an update lock. The rules for this lock are simple. If no lock is held, or only shared locks are held, the update lock can be taken. Only one process can have an update lock, but other processes can take shared locks while an update lock is held. If the process holding the update lock wants to write, it is upgraded to an exclusive lock. So if we add the update lock hint (UPDLOCK) to our SELECT, Q1 and Q2 will now perform atomically, one after another, without deadlocking, while other processes can read the selected rows (at least, until we UPDATE).

BEGIN TRAN
SELECT
   @value = Col1
FROM Tbl (UPDLOCK)
WHERE
   RowID = @rowID

UPDATE Tbl
SET Col1 = @newValue
WHERE RowID = @rowID

-- Note, should check @@ERROR and ROLLBACK TRAN
-- if the update failed
COMMIT TRAN
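
As a rough illustration of the locking rules above, here's a toy model in Python. It is not SQL Server's actual lock manager: the compatibility table only encodes the rules as I've described them, and the names COMPATIBLE, can_grant and run_interleaved are made up for this sketch. It replays the Q1/Q2 interleaving with each kind of lock taken by the SELECT.

```python
# S = shared, U = update, X = exclusive.
# COMPATIBLE[held][requested]: can `requested` be granted while another
# process holds `held` on the same row?
COMPATIBLE = {
    "S": {"S": True, "U": True, "X": False},
    "U": {"S": True, "U": False, "X": False},
    "X": {"S": False, "U": False, "X": False},
}


def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every lock others hold."""
    return all(COMPATIBLE[held][requested] for held in held_by_others)


def run_interleaved(select_mode):
    """Q1 and Q2 both SELECT (taking `select_mode`), then both UPDATE."""
    held = {}  # query name -> lock mode it holds on the row
    for query in ("Q1", "Q2"):
        others = [m for q, m in held.items() if q != query]
        if can_grant(select_mode, others):
            held[query] = select_mode
        # Otherwise the query waits at its SELECT until the other commits.

    # Every query that got past its SELECT now needs an exclusive lock.
    stuck = [
        query for query in held
        if not can_grant("X", [m for q, m in held.items() if q != query])
    ]
    if len(held) > 1 and len(stuck) == len(held):
        return "deadlock"
    return "serialised"


for mode in ("S", "U", "X"):
    print(mode, run_interleaved(mode))
```

With a shared lock both queries get past the SELECT and then block each other; with an update or exclusive lock the second query waits at its SELECT instead. The difference between the last two is that can_grant("S", ["U"]) is true while can_grant("S", ["X"]) is not, so plain readers aren't shut out by the update lock.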

Monday, 1 March 2004

Why aren't people studying Computer Science any more?

Scoble: Why aren't students going into computer science?

Well, at my time at Aston, the number of CS students was rising. However, when I graduated in June 2001, we'd only just started to see the beginning of the dot-bomb, and the terrorist attacks on New York were still a few months away. Since then, the economic conditions for software developers have become a lot worse.

When I was looking for jobs in July-August 2001, there were literally hundreds of vacancies, covering many pages, advertised in every computing journal and national newspaper. This went down to about half a page last year, and has recovered to about a page. Searching on sites like Monster.com (etc, don't take that as an advert or a recommendation) would also give hundreds of vacancies; now it gets you about five.

I think what's happened is that the people that Ian and I termed the 'mercenaries' have started looking elsewhere. For a while, it looked like you could make a lot of money out of software; now it looks like you can make a living. I don't think you should blame this on Microsoft - their market share is not much greater than it was four years ago (does an extra 1 - 2% mean all that much when it's more than 90% already?). The mercenaries weren't doing it because they loved the challenge of working with software; they were doing it for the money. These tended to be the people who complained that the coursework was too hard.

Well, guess what, software is hard. A lot of people think that they can translate a bit of hacking at simple programs into strong, reliable, easy-to-use programs. You can't. As soon as you need to handle errors, rather than ignoring them, and you need to deal with asynchronous operations, and simultaneous operations, you need to think about how your program will work. You can't always experiment and find out, because testing does not prove that your program is correct. It only proves that no errors occurred during the most recent run of your tests, which may not be sufficiently complete (you can only look for bugs that you think might be there). Those are the software tools we have, but the best tool is in between your ears. A lot of developers never understand this.

The OSS movement try to suggest that software is easy and anyone can hack on it. Not true at all. The whole tone of The Cathedral And The Bazaar tries to suggest that professional software developers are developing a priesthood to restrict the Average Joe from getting involved in programming. I don't think we are; I think professional developers have seen through the superficial simplicity of programming to the murky depths of complexity lurking below.

A CS degree can help educate developers about the need to understand your program, and provide the skills to write software. (It can also indoctrinate people in the One True Way to develop software, which is not a good thing). All told, I'd rather see people with a CS degree developing software than people without; you do get excellent self-taught programmers, but you get a lot of poor ones too.

16-bit Apps

Raymond Chen: Why 16-bit DOS and Windows are still with us

As a development organisation, we have a number of DOS apps that are essential to us. We still deal with a lot of DOS-based hand-held terminal hardware (e.g. the Symbol PDT68xx, 61xx or other 3000-series) - indeed, I think we wrote three or four entirely new DOS-based applications last year. Where possible, though, we try to use our application server software and write the actual application with a desktop development tool (VB6 or, recently, a .NET language). This is only possible in a wireless LAN environment, though - while it works over a wireless WAN, this is obviously quite costly for a thin-client environment.

Until the appropriate vendors come up with Win32-based toolsets for these devices, or they die out completely, we need DOS compatibility. Our main compiler for these platforms is still Visual C++ 1.52 (the second most common is Microsoft C 6.0!) However, we also develop for Windows CE and require eVC and Visual Studio .NET 2003. So I have compilers on my work system that are more than ten years apart (IIRC).

Less important Win16 programs include B-Coder Professional. We're still using version 3.0 because, well, it works, and 4.0 offers only a few extra symbologies for a large outlay of funds. The configuration tool for a D-Link network printer adapter is also a 16-bit app (it has a web configuration tool, but that's not very helpful when you don't know the device's IP address).

However, I'm contemplating moving the development environment into a Virtual PC VM. After all, I don't use the IDE for developing DOS applications, except to maintain the makefile and perform the build. Any coding is usually done in TextPad, if it's not a cross-platform project such as the application server's thin client (where it's normally done in Visual C++ 6.0 or in eVC 3.0).

For the most part, though, Windows CE-based devices now cost less to purchase than the DOS devices, and are getting closer in functionality, design, and battery life. The new MC9000-G looks to be on a par with the old PDT68xx in ergonomic terms.

Sunday, 29 February 2004

Future of HTML

Inspired by Dare: A Look at the xml:base attribute and the .NET Framework's XmlReader

Let's start with a bald statement: I believe that the web will continue to use the current lingua franca of page description, HTML 4.01. This is the zenith of the HTML series of standards: it describes the use of styles to provide separation between document model and appearance, and standardises the use of plug-in objects.

I see XHTML 1.0 and later as being a solution in search of a problem. XHTML 1.0 is a reformulation of HTML using the stricter XML model, which should allow a standard XML parser to parse XHTML successfully.

Unfortunately, XHTML is strictly incompatible with HTML 4.0. The problem stems from the requirement that all elements in XML are closed, whereas HTML does not require this. XML offers a simpler syntax for elements with no content, e.g. <br />. If HTML is interpreted strictly, that / is illegal.

The intent of XHTML is to split HTML down into modules, which can be implemented as required by a browser. The unfortunate part is of course that large swathes of the existing Web already contain elements missing from, for example, the XHTML Basic profile. To remain usable for a given user, the user's browser must implement all of HTML 4.0 - making XHTML basically pointless.

Of course, IE has problems with XHTML anyway. The Jargon File renders strangely on IE due to mismatched character-set information. The server doesn't supply any character set information: the HTTP headers only indicate Content-Type: text/html. The file I linked to is formulated as XHTML (and shouldn't be transmitted as text/html anyway); the <?xml?> processing instruction indicates encoding="UTF-8". IE uses its default character set, Windows-1252, to display the data, leading to the wrong result. It does this because the HTTP header didn't indicate a character set. IE also goes into Quirks mode, because there isn't a valid HTML 4.0 DTD.

Friday, 27 February 2004

Alpha vs Beta test

From Paul Thurrott's WinInfo Short Takes:

"First, Longhorn, the next major version of Windows, is currently in pre-beta (which used to be called alpha, but Microsoft actually refers to as "pre-alpha," which doesn't make sense) [...]"

Note: this article is an expanded version of my comment on WinInfo.

Pre-alpha versus alpha versus beta versus pre-beta: terms often misused. Alpha testing is when you are feature-complete (you've implemented all the features you intend to ship) and are testing internally. Beta releases are once you've performed a lot of internal testing and resolved as many issues as you think customers will encounter. Asking customers to perform their daily work on the software can reveal problems that weren't found in the fairly sterile environment of the alpha test lab. It also allows you to get feedback on how easy or difficult to use the software actually is.

I've not really heard of pre-beta, but I suppose it could refer to an intermediate stage between alpha and beta test (for example, polishing installers - your in-house testers can probably cope with some more crufty installers than the users could).

MS seem to be changing a lot more between beta releases these days: some of the code is still alpha quality, or features are appearing in later betas that weren't present in earlier ones.

Longhorn is still a long way from design complete, let alone feature complete. It can only be said to be 'pre-alpha'. Build 4051 was just a dump to allow developers to get an early look at current thinking - call it a warning shot, if you like.

Reference-based languages

Call me an unreconstructed C zealot ;)

Thanks.

You still have to understand pointers in a reference-based language such as VB(.NET), C# or Java. You just don't see them, and have to remember whether a name means a value or a reference.

You also have to know the difference between ByVal and ByRef in VB, otherwise it bites you when a change to a parameter made inside a method gets reflected up to the caller. Remember that the default for a parameter in VB6 is ByRef, which can cost you more for the simple types.
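
The same trap exists in any reference-based language. Here's a small sketch in Python rather than VB (Python has no ByRef; it always passes the reference itself by value, which is closest to passing an object ByVal in VB). The function names are invented for the example.

```python
def append_item(items):
    # Mutates the object that the caller's name also refers to.
    items.append("changed")


def rebind(items):
    # Rebinds the local name only; the caller's list is untouched.
    items = ["replaced"]
    return items


original = ["a"]
append_item(original)        # the caller sees the change
rebind(original)             # the caller does not see this one

snapshot = list(original)    # an explicit copy breaks the aliasing
append_item(snapshot)

print(original)
print(snapshot)
```

You never see a pointer, but you can't predict what the two print calls show without thinking about one.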

VB6 also has the braindead convention of being able to put parentheses around a parameter to indicate that it's passed by value where it would normally be a reference (actually, I think it's a side-effect - it gets evaluated as an expression, and a reference to a temporary is passed that's thrown away on return).

I'm not that sure that the mental overhead is worth it.

(Inspired by this Ask Joel thread).

Thursday, 26 February 2004

Careful when using udp/1434

I had an interesting problem this afternoon: a client program using a UDP transport was having trouble communicating with its server. The client doesn't call bind() unless you specify a particular port, so it gets a dynamic port (which Windows allocates from 1025). It turned out it was using UDP port 1434...

...which is, by convention, SQL Server's port. It appears that Enterprise Manager sends (as I discovered a couple of days ago) packets to this port to discover if the server is alive, for all registered servers. The client was running on our test box, for which we use the same IP address regardless of what's installed. I'm guessing (but didn't confirm) that a colleague still has a registration for that server.

So the client was getting a single byte just often enough to assume that the server had responded (for reasons I can't go into here, we don't connect() to the server, so we don't get the filtering that would accept packets only from that one end-point). It was processing the packet as 'response received,' but then discarding it because it was too short. Result: going round and round in a retry loop.

The key point: always verify as much of your protocol as possible. Put some magic numbers in it, if you're designing a binary protocol. Reject anything that looks even vaguely wrong.

Some people write code that is too loose, on the misguided assumption that it makes the software more flexible. It does, but it also means that your program may react badly to being sent something it's not expecting. At worst this can lead to security holes.
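
To make the point concrete, here's a minimal sketch of the kind of checking I mean, in Python. The header layout, the MAGIC value and the function names are all invented for the example; this is not the protocol we actually use.

```python
import struct

# Hypothetical header: 4-byte magic, 1-byte version, 2-byte payload length,
# all in network byte order.
MAGIC = 0x4D444B31
VERSION = 1
HEADER = struct.Struct("!IBH")


def build_packet(payload: bytes) -> bytes:
    return HEADER.pack(MAGIC, VERSION, len(payload)) + payload


def parse_packet(datagram: bytes):
    """Return the payload, or None if anything looks even vaguely wrong."""
    if len(datagram) < HEADER.size:
        return None                      # too short to be one of ours
    magic, version, length = HEADER.unpack_from(datagram)
    if magic != MAGIC or version != VERSION:
        return None                      # somebody else's traffic
    payload = datagram[HEADER.size:]
    if len(payload) != length:
        return None                      # truncated or padded
    return payload


print(parse_packet(build_packet(b"hello")))
print(parse_packet(b"\x00"))             # a stray single byte is rejected
```

The important design choice is that a rejected datagram is treated as if nothing arrived at all, so a stray packet can't count as 'response received'.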

Wednesday, 25 February 2004

More character set stuff

I see that MSDN is still having character set trouble: it looks like pages are being encoded with UTF-8, then the encoded page is again passed through UTF-8.

The latest example is in the XP SP2 Windows Firewall information (see 'allow local' near the end of the document). The UTF-8 sequence e2 80 9c (which shows in the document as â€œ) is U+201C, the opening double-quote character -> “
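
The effect is easy to reproduce. This Python sketch assumes the intermediate step treats the UTF-8 bytes as Windows-1252, which is what the characters on the page suggest; I don't know what MSDN's pipeline really does.

```python
original = "\u201c"                      # the opening double-quote

utf8 = original.encode("utf-8")          # first, correct, encoding
misread = utf8.decode("cp1252")          # bytes mistaken for Windows-1252
double_encoded = misread.encode("utf-8") # ...and encoded a second time

print(utf8.hex(" "))
print(misread)
print(double_encoded.hex(" "))

# Undoing the damage means reversing the two wrong steps.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")
print(repaired == original)
```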

The safest way in XML and HTML is to use &# notation (e.g. &#8364; is the Euro symbol, €). HTML 3.2 indicated that these were to be interpreted as ISO Latin-1, whereas HTML 4.0, XML 1.0 and later interpret them as Unicode. The named character references (e.g. &ldquo; for “) only work properly in HTML, not in XML documents. XML processors are required to recognise &lt;, &gt;, &amp;, &apos; and &quot; - < > & ' ", respectively (see section 4.6 of the XML 1.0 specification [link to Tim Bray's annotated version; definitive version at www.w3.org]).
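
A quick way to see the difference between numeric and named references is to push both through an HTML-aware function and a plain XML parser. This sketch uses Python's standard library (html.unescape and xml.etree.ElementTree) purely as convenient stand-ins for 'an HTML processor' and 'an XML processor'.

```python
import html
import xml.etree.ElementTree as ET

# Numeric references work in both worlds.
print(html.unescape("&#8364;"))
print(ET.fromstring("<p>&#8364;</p>").text)

# The five predefined XML entities are always available.
print(ET.fromstring("<p>&lt;&gt;&amp;&apos;&quot;</p>").text)

# A named HTML reference is fine in HTML...
print(html.unescape("&ldquo;"))

# ...but an XML parser with no DTD has never heard of it.
try:
    ET.fromstring("<p>&ldquo;</p>")
    named_ok_in_xml = True
except ET.ParseError:
    named_ok_in_xml = False
print(named_ok_in_xml)

# Going the other way: force non-ASCII text into numeric references.
print("\u20ac".encode("ascii", "xmlcharrefreplace"))
```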

Tuesday, 24 February 2004

'I have an antivirus program that I'll monitor myself'

Dear MS:

With regard to the XP SP2 Security Center, specifically the dialog depicted here. Please could you add a checkbox labelled:

I am not an idiot, and do not require an anti-virus program

Thanks.

Seriously, I believe that anyone practising safe computing does not need an anti-virus program. The rules basically boil down to:

Don't run file attachments to emails.
Don't open emails from people you don't recognise.
Keep up to date with security patches.
Be wary of what you download.

Follow these and you'll be fine.

I was a little concerned earlier today when our router logs indicated that my computer was sending arbitrary data over UDP to an external port 1434, which is SQL Server's well-known service discovery port. It turned out to be Enterprise Manager trying to discover whether a colleague's server was running; he doesn't register with our DNS server, whereas I use that as the primary DNS (we have a partly implemented domain) and instead of finding his computer name, it found computername.co.uk. Thanks, TDImon!

Bet they were pleased to get udp/1434 traffic...

On making tea

Follow-up to the methodology of making tea (jeffdav) - milk in first, or afterwards?

If you're doing it properly, you should be making the tea in a pot first, pouring the milk into the cup or mug, then pouring the tea from the pot into the cup once it's brewed.

I'm quite lazy at work and pour hot water directly onto a teabag in the mug, wait about half a minute for it to brew, remove the teabag then add the milk. Milk + teabag doesn't go together well, IMO.

You can buy two sorts of teabags in the UK: regular and 'one cup'. The difference is basically that 'regular' is stronger, and tends to be a blend of teas. Since 'one cup' teabags are nearly as, if not more, expensive compared to regular ones, a lot of people simply use regular bags for making single cups and whip the bag out sooner.

Character sets

While on the subject of character encoding, I had a big followup which I was going to send to John Robbins about his Bugslayer column in this month's MSDN Magazine. However, I drifted off the point a bit, and it would make a better blog entry.

My basic point relevant to the article is that there's no such thing as 'Greek ASCII' (apart from "it's all Greek to me," of course). ASCII was standardised as ISO 646-US. It's a 7-bit only character set; only code points 0 - 127 (decimal) are defined. The meaning of any other bits (ASCII was defined before standardisation - here de facto, not de jure - on an 8-bit byte) is up to the implementation. There are seven other variations, the simplest being 646-UK, which only swaps £ in for # at code-point 35.

The Danish/Norwegian, German and Swedish forms cause havoc for C programmers, because characters essential for writing C programs (e.g. {}, [], \, |) are replaced (relative to the -US set) with accented vowel characters. C partly gets around this using <iso646.h>, which defines a number of alternate names (macros) for some of the operators that are really messed up by this. C also has trigraph support (officially, although few compilers support it), where these characters can be produced by using ?? and another character (e.g. ??/ => \). C++ also has some digraphs which are easier to type and remember than the trigraphs, but are more limited. In C++, the iso646.h names are now officially keywords in their own right.

The irony of this is that for the most part, very few people now need the trigraphs, digraphs or alternate keywords, because almost everyone is now at least 8-bit capable. The de jure standards for 8-bit character sets - at least, European ones - are the ISO 8859 series, including the well-known ISO 8859-1 suitable for Western European languages. The de facto standard is of course Windows-1252, which defines characters in a region between code points 128 and 159 which 8859 marks as unused (and IANA's iso-8859-1 reserves for control characters). 8859 uses 646-US for the first 128 code points. This often causes havoc on the Web, where many documents are marked as iso-8859-1 but contain windows-1252 characters (although this is usually the least of the problems).
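
The mislabelling problem is easy to demonstrate. In this Python sketch the byte string is made up; the codec names cp1252 and latin-1 are Python's names for Windows-1252 and ISO 8859-1.

```python
import unicodedata

raw = b"\x93quoted\x94"                  # 'smart quotes' as Windows-1252 bytes

as_1252 = raw.decode("cp1252")           # what the author meant
as_latin1 = raw.decode("latin-1")        # what the iso-8859-1 label says

print(as_1252)
print(unicodedata.category(as_latin1[0]))    # 'Cc': a C1 control character

# Outside 128-159 the two agree, which is why the mistake goes unnoticed.
upper = bytes(range(0xA0, 0x100))
print(upper.decode("cp1252") == upper.decode("latin-1"))
```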

8859 is a single-byte character set: a single 8-bit byte defines each character. This doesn't give nearly enough range for Far East character sets, which use double- or multi-byte character sets. An MBCS character set (such as Shift-JIS) reserves some positions as lead bytes, which don't directly encode a character; instead they act as a shift for the following, or trail, bytes. Unfortunately, the trail bytes aren't a distinct set from the lead and single bytes. This gives rise to the classic Microsoft interview question: if you're at an arbitrary position in a string, how do you move backwards one character?
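
For what it's worth, here's one way to answer the interview question, sketched in Python for Shift-JIS. It assumes the string is valid, that you start on a character boundary, and that the lead-byte ranges are 0x81-0x9F and 0xE0-0xFC; the function names are mine. The trick is to count the run of lead-range bytes immediately before the last byte: the byte in front of that run can't be a lead byte, so the run starts on a boundary and pairs up lead/trail from there.

```python
def is_lead_byte(b: int) -> bool:
    # Shift-JIS lead-byte ranges (assumed; extensions vary by vendor).
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def prev_char_start(data: bytes, pos: int) -> int:
    """Given a character boundary pos > 0, return the previous boundary."""
    if pos <= 0:
        raise ValueError("already at the start of the string")
    run = 0
    i = pos - 2
    while i >= 0 and is_lead_byte(data[i]):
        run += 1
        i -= 1
    # Odd run: data[pos - 2] is a real lead byte and data[pos - 1] its trail.
    # Even run: data[pos - 1] stands alone as a single-byte character.
    return pos - 2 if run % 2 == 1 else pos - 1


data = "a\u65e5\u672c\u8a9e".encode("shift_jis")
pos = len(data)
while pos > 0:
    start = prev_char_start(data, pos)
    print(data[start:pos].decode("shift_jis"))
    pos = start
```

Note that you may have to look back a long way before you can decide about a single byte, which is exactly what makes the question nasty.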

For some reason best known to themselves, all byte-oriented character encodings are known in Windows as 'ANSI', except those designed by IBM for the PC, which are known as 'OEM'. If you see 'ANSI code-page', think byte-oriented.

Frankly this is a bit of a nightmare, and a rationalisation was called for. Enter Unicode (or ISO 10646). Now, a lot of programmers seem to believe that Unicode is an answer to all of this, and only ever maps to 16-bit quantities. Unfortunately, once you get outside the Basic Multilingual Plane, you can get Unicode code points that are above U+FFFF. It's better to think of Unicode code points as being a bit abstract; you use an encoding to actually represent Unicode character data in memory. The encoding that Windows calls 'Unicode' is UTF-16, little-endian. This serves to confuse programmers. Actually, versions of Windows before XP used UCS-2, i.e. they didn't understand UTF-16 surrogates, which are used to encode code points above U+FFFF. Again, for backwards compatibility (at least at a programmer level), the first 256 code points of Unicode are identical to ISO 8859-1 (including the C1 controls defined by IANA).

You may have heard of UTF-8. This is a byte-oriented encoding of Unicode. Characters below U+0080 are specified with a single byte; otherwise, a combination of lead and trail bytes are used. This means that good old ASCII can be used directly.

Hang on, that sounds familiar... The difference with UTF-8 is that the characters form distinct subsets; you can tell whether a given byte represents a single code point, a lead byte, a trail byte, and if it's a lead byte, how many trail bytes follow. UTF-16 has the same property; the term surrogate pair is used because there can only be two code words for a code point. UTF-16 can't encode anything after U+10FFFF because of this limitation. This makes it possible to walk backwards, although everyone who has --pwc in their loops has a potential problem.
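
Compare how little work walking backwards takes in UTF-8, where every trail byte has the bit pattern 10xxxxxx and nothing else does. Again a Python sketch with invented function names, assuming well-formed input.

```python
def utf8_prev_start(data: bytes, pos: int) -> int:
    """Given a position pos > 0, return the start of the preceding character."""
    if pos <= 0:
        raise ValueError("already at the start of the string")
    i = pos - 1
    while i > 0 and (data[i] & 0xC0) == 0x80:   # skip trail bytes
        i -= 1
    return i


def utf8_sequence_length(lead: int) -> int:
    """How many bytes the character starting with this byte occupies."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a single or lead byte")


data = "a\u20ac\u6c34z".encode("utf-8")
pos = len(data)
while pos > 0:
    start = utf8_prev_start(data, pos)
    print(start, utf8_sequence_length(data[start]))
    pos = start
```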

UTF-8 is more practical than UTF-16 for Western scripts, but any advantage it has is quickly wiped out for non-Western scripts. The Chinese symbol for water (水) at code-point U+6C34 becomes the sequence e6 b0 b4 in UTF-8 - 3 bytes compared to UTF-16's 2. Its main advantage is that byte-oriented character manipulation code can be used with no changes. Recent *nix APIs largely use UTF-8; Windows NT-based systems use UTF-16LE.
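
The size trade-off is easy to check for yourself. A short Python sketch; U+10400 is just an arbitrary code point from outside the Basic Multilingual Plane, picked to show a surrogate pair.

```python
def sizes(text: str):
    """Return (UTF-8 byte count, UTF-16LE byte count) for a string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


water = "\u6c34"
print(water.encode("utf-8").hex(" "))        # e6 b0 b4
print(sizes(water))                          # 3 bytes against 2
print(sizes("hello"))                        # 5 bytes against 10

beyond_bmp = "\U00010400"
print(beyond_bmp.encode("utf-16-le").hex(" "))   # a surrogate pair
print(sizes(beyond_bmp))                     # 4 bytes either way
```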

The .NET Framework also uses UTF-16 as the internal character encoding of the System.String type, which is exposed by the Chars property. System.Text.ASCIIEncoding does exactly what it says on the tin: converts to ISO 646-US. Anything outside the 7-bit ASCII range is converted to the default character, ?. The unmanaged WideCharToMultiByte API (thank god it's not called UnicodeToAnsi) allows you to specify the default character, but as far as I can see Encoding does not. GetEncoding( 1253 ) will get you a wrapper for Windows-1253, not 'Greek ASCII'.
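
The same behaviour can be shown outside .NET. This sketch uses Python's codecs as stand-ins for the Encoding classes (the ascii codec with errors="replace" for ASCIIEncoding, cp1253 for GetEncoding( 1253 )); it illustrates the idea and says nothing about the .NET API itself.

```python
greek = "\u03b1\u03b2\u03b3"             # alpha, beta, gamma

# A strict 7-bit ASCII encoder has nowhere to put these characters.
print(greek.encode("ascii", errors="replace"))
print("caf\u00e9".encode("ascii", errors="replace"))

# Windows-1253 is a different character set, not a flavour of ASCII.
as_1253 = greek.encode("cp1253")
print(as_1253.hex(" "))
print(as_1253.decode("cp1253") == greek)
```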

Monday, 23 February 2004

BlogJet got cooler

BlogJet now has a Code tab, so I can go and tinker with the few bits it gets wrong ~~and add strikethrough~~.

Recently there was a débacle over débacle, where it had inserted - IIRC - UTF-8 codes which weren't. Didn't render properly in the browser. Let's see if that's fixed...

Nope. The blog is marked as UTF-8, but the é characters appear literally in the code using their Windows-1252 forms. Lessee now, what magic incantation is it... é is in the Unicode Latin-1 Supplement section, so its code is U+00E9, or &#233; to HTML types.

I don't use &eacute; because it tends to break XML, and of course my ATOM feed is an XML document.

Darn. Doesn't have Find-and-Replace in Code view. Ah well, should be easier to copy the code into TextPad and back into BlogJet after extra editing. Couple of stray &nbsp;s too.

crush kill destroy

I definitely stand by an earlier post (at least, I think I've posted this before): Norton Internet Security must die!

Sadly I don't own this computer, or I would be ceremoniously scrubbing the areas of the disk which it occupied and burning the box.

The damn thing is a little child - it needs to be constantly patted on the head and reassured that it's doing the right thing. No, I don't want arbitrary eDonkey users trying to connect to this computer on port 4662 - not that I care, since I'm not running eDonkey, or anything else on that port.

If I say Block, I damn well mean it.

Friday, 20 February 2004

COM Marshalling on CE

Raymond Chen: Why do I get a QueryInterface(IID_IMarshal) and then nothing?

I had this problem with some CE ActiveX controls originally developed for Pocket PC, when running them on a custom platform (PDT7200, if memory serves). The reason? There are two versions of COM for Windows CE platforms, both of which ship with Platform Builder. The simple version supports in-process, MTA components only: on this version, you're only allowed to pass COINIT_MULTITHREADED to CoInitializeEx. This version doesn't support any marshalling; if your component needed it, that's your problem. You have to make your own way back to the UI thread if you need to fire an event.

The other version is known as DCOM in Platform Builder, and supports a pretty complete implementation of Distributed COM, allowing out-of-process components. This includes STAs and marshalling.

The Pocket PC uses the simple-COM implementation (at least, up to Pocket PC 2002 it does; I don't know about Windows Mobile 2003), so my (unmarked) component was fine. The PDT7200 uses the DCOM component; because my component was unmarked, it tried to look for a marshaller - and failed, because the type library wasn't registered.

Tuesday, 17 February 2004

Archaic computer hour

Dammit! BlogJet's Post button looks too much like the Link button

Rory had an 81...

I'm celebrating (I believe) my 20th year of programming this year: we got our first Spectrum in 1984, I think. Yes, he of the rubber keys and the blocky colour graphics.

Over the years we had two Spectrums (Spectra?), a couple of Interface 1s and some Microdrives. Actually, when we bought the second Spectrum second-hand, it was a freebie with the Interface 1 we actually wanted - my dad had written a planning applications database where all the data was stored on the Microdrive. The ROM on the Interface 1 had a bad habit of melting - I think we had about five over the years.

I must confess to killing the last surviving Spectrum by taking it apart, then breaking the membrane cable connecting the keyboard to the motherboard (motherboard? Only board!)

I pestered my dad for an Archimedes for a while (we had these at school) but, sensibly it turned out in the end, he bought an ICL DRS M30 PC (discounted, since he worked for ICL). This had - compared to the Spectrum - a shockingly fast 16MHz 286 processor, a massive 2MB of RAM and an enormous 100MB hard disk (partitioned into three 32MB partitions and a 4MB partition, because it came with MSDOS 3.3). This was in 1991, and I had all the fun of dealing with Windows 3.0 (yes, it was as bad as they say). This was the system I cut my teeth on (and the rest of the family too, considering the number of times I broke it) in the PC world.

This system got upgraded to 4MB, DOS 6.22 and Windows 3.11 over time, and got packed off to university with my sister, and we upgraded to a 486SX-33 with 8MB RAM and 420MB HDD. This served us all for two years before I bought my own PC (P120, 32MB RAM, 1.2GB HDD, Win95) for university.

After another couple of years I upgraded again to a PII-300, 128MB RAM, 8GB HDD, passing the P120 down to my parents. Last year, three years after the PII, I built my own P4 2.8GHz, 512MB RAM, 120GB HDD. That's a system with processor clock 467x, memory size 10,922x, and storage capacity 1.2 million times my old Spectrum.

I never knew floating point was so complicated...

...but then my involvement with it is typically limited to randomisation values or APIs which are more convenient in floating point, but don't actually require that precision.

Visual C++ "Whidbey" (VC8) will have floating point optimisation pragmas.

Just a note to anyone dealing in currency values: don't use float. Instead, use a scaled integer. Some programming languages and environments offer built-in decimal or currency types (e.g. Visual Basic, SQL Server's money type, Ada).
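
A minimal sketch of the scaled-integer idea, in Python for brevity. The `format_pence` helper and the choice of pence as the unit are my own illustration, not any library's API; the point is only that the arithmetic stays in whole numbers and the decimal point appears at display time.

```python
# Binary floating point cannot represent most decimal fractions exactly.
float_total = 0.10 + 0.20
print(float_total == 0.30)    # False: the sum is very slightly off

# Scaled integers: store money as a whole number of the smallest unit.
item_pence = 10
other_pence = 20
total_pence = item_pence + other_pence    # exact integer arithmetic

def format_pence(amount: int) -> str:
    """Format a whole number of pence as pounds.pence (hypothetical helper)."""
    sign = "-" if amount < 0 else ""
    pounds, pence = divmod(abs(amount), 100)
    return f"{sign}{pounds}.{pence:02d}"

print(format_pence(total_pence))    # 0.30
print(format_pence(1999 * 3))       # 59.97
```

Rounding still has to be decided explicitly wherever a division or percentage comes in (tax, interest), but at least it happens where you choose rather than on every addition.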

Tuesday, 10 February 2004

If my boss finds this...

...I'm out of a job.

Also, if he finds the number of posts on CodeProject (782 in 10 months = ~2.6 per day since I joined) I'm definitely fired. Not all of them were posted during work hours... just most of them.

My contract says I'm not allowed to reveal trade secrets. I've skated awfully close to that on occasions.

In mitigation, I can only say that I'm bloody bored at work. It wasn't supposed to go like this. I was supposed to come in in the morning, pick up the project I was working on, for which I have all the requirements and specifications, code and debug all day, then go home. I might even do a little (unpaid) overtime, if I'm really into what I'm working on.

In practice, what happens is that the work expands to fill the time available. Otherwise, you really would be sitting on your backside with nothing to do. Even when there's nothing to do, modifying the few products we have is practically not allowed. R&D? We've heard of it.

Programming Pearls

We appear to have a programming strategy at my employer which could be termed the Pearl Fallacy.

Basically, the principle is that if you shovel enough shit into the codebase, it will eventually turn into a pearl. This only works for oysters.

It's more likely to turn into a gyppo (as defined by Terry Pratchett in Mort, IIRC - this post alludes to the meaning) - a solid looking crust on the outside, but absolutely disgusting if you put your foot through it.

(If you think I'm getting frustrated with my employer, you would be right.)

Wanted: a blog tool with a diff engine

If someone updates a blog post, I'd like to be able to see what the author changed. I've done the best I can with the change I just made.

Saturday, 7 February 2004

On royalties

[Updated 2004-02-10: a reader corrected me on ownership of MPEG-4 AAC, which bores a huge hole in my argument. Whoops. Removed aspersions on Apple.]

Another thing people forget about standards is the issue of royalties. Again, two fairly recent issues: WMA and MPEG4 AAC, and the whole débacle surrounding Rambus' role in SDRAM.

Just because something is a standard doesn't mean it's free to implement. The developers of the standard will often want compensation for having developed it. This applies even to ISO-mandated international standards.

Taking an example, consider the DVD and digital broadcast. Fairly ubiquitous, at least in the West, right? How much do you think you need to pay to make a DVD video, or broadcast a digital programme?

Digital broadcast and DVD both use the MPEG 2 video standard. Royalties apply to decoder hardware, encoder hardware and to recorded media. They're paid into a pool (the MPEG LA company administers this) which then pays out to the various developers. MP3's full name is MPEG 1 Layer 3 audio - and royalties are due to Fraunhofer IIS, who developed it.

AAC was developed by Dolby (licensing info); it was then standardised first in MPEG-2 (for bit rates higher than 32kbps per channel) and later in MPEG-4. I previously wrote in this paragraph that MPEG-4 was basically Apple's QuickTime 3; other sources make it clear that MPEG-4 only uses QuickTime 3's file format, not the audio encoding.

In these terms, it's understandable that Microsoft uses its own WMA format for their own Media Player software, rather than AAC. Once you realise this, it's down to terms and prices as to what gets licensed.

On standards

The software community is currently on a big standards kick. If you've developed something, you try to get it standardised (example: Microsoft pushing the CLI through ECMA and ISO's fast-track process). You then criticise everyone else for 'not following the standard' or for 'extending the standard.'

I don't actually care much about standards. They're useful, yes, but I'll use a non-standard product if it's better. There are two standards for the SQL database query language, SQL-92 and SQL-99. Most database products now support a subset of SQL-92; newer products are targeting SQL-99 (Microsoft's next release of SQL Server will have some SQL-99 features).

Can you produce a useful database application using only SQL-92 features? Possibly. Can you produce a better application using your vendor's proprietary extensions? Almost certainly.

POSIX is the ISO standard for interfacing programs to an operating system, and for presenting programs to a user. Which major operating system has virtually no POSIX features, and yet has over 90% of the installed base? The one that makes sense. The one that has richer APIs. The extent of POSIX support when programming to Win32 is that some of the POSIX extensions to the C run-time library are available - with a leading underscore - in Microsoft's C run-time. Compare _open to CreateFile and you'll get just a flavour of how much more Windows offers to the developer.

As developers, we have to weigh up whether following a standard is beneficial to our users: either in terms of being able to replace our software (erk!) or interoperate with other software. The downside is that it may be difficult to follow the standard, or it may simply be overwhelmingly complex for the particular application.

Microsoft have been criticised recently (i.e. in the last six months) for both the new Office 2003 XML schemas for Word and Excel (a lot of complaints emanated from Sun^H^H^HOpenOffice because Microsoft didn't use their schema) and for the WinFS schema language - why didn't they just use XML Schema?

In the first case, features of Word and Excel simply didn't map onto the OpenOffice schemas in any reasonable form. And in the second, XML Schema is simply too complicated, and doesn't match the WinFS object model. With this much mismatch, the programs would probably contain more code and run slower than they could have, leading to unhappy users. In the former case, OpenOffice's schema is probably a good match for their internal object model - but it doesn't benefit Microsoft.

Take the flip side: when Microsoft were looking for a better authentication technology for Windows 2000 Active Directory domains, they could have invented their own. Instead, they went with Kerberos. Why? Because it had the features they were looking for (a distributed authentication system, without needing to centralise authentication to a single group of servers) and was already known and trusted. However, it didn't have the ability for the administrator to change a user's password remotely. So Microsoft added that feature to their implementation of Kerberos.

Sun, as always, cried foul! Our Kerberos implementation (or, rather, MIT's) doesn't do that. MS must be making their client work best with their server - let's complain to (in this case) the EU.

But who benefits? The users do. The administrator doesn't have to open a remote command shell on the login server to modify the password file directly. Computers can add themselves to the domain, generating their own passwords (in Windows domains, computers have accounts as well as users) without requiring the administrator to explicitly set one on the login server, then enter it correctly on the computer.

It's helpful to control both ends of the connection, because you can make more extensive changes without reference to others. But I don't believe that Microsoft are deliberately trying to disadvantage their competition - they're just looking for ways to improve the user's experience through technology.

On XML 1.1

Dare blogs about XML 1.1.

Reading the summary Dare links to reminds me of C99: a solution looking for a problem.

There isn't anything majorly wrong with XML 1.0, and the things that are wrong with it aren't fixed in 1.1 anyway.

I don't mind standardisation committees working on producing standard versions of what had been slightly-incompatible variants of a technology. However, a standards committee deciding on extensions to an existing standard have a real tendency to either go over the top (C99) or tinker with things that don't need fixing (XML 1.1). I hope C++ 0x doesn't turn into a nightmare, but the signs at present are that the library will get a lot of extensions, the language will get a few minor ones, and (as long as you haven't used any new keywords) your C++98 programs will upgrade directly.

Indeed, a standards committee can end up operating in a rarefied academic atmosphere, inventing something that's only of very vague relevance to commercial software development.

Of course the C++ standardisation committee itself ended up inventing things like export, which actually turns out to be fairly useless. It might slightly reduce the time taken to compile a large C++ program with a lot of templates, but Visual C++'s precompiled headers (apparently something similar is also supported by GCC) already exist and are likely to be more beneficial. The goal of shipping a binary version of a template is scuppered by the fact that C++ allows so much variation between instantiations of a template - it can't simply be implemented by 'dope lists' as Ada generics can. The CLR gets away with it by instantiating at runtime, but this works against the general philosophy of C++.

About the only instance of standards upgrade done right I can think of is Ada 95, but even that broke a number of Ada 83 programs, and introduced a very strange object-based syntax for allegedly object-oriented programming.

Thursday, 5 February 2004

Correcting injustices

I need to correct a bit of an injustice I did to the CLR a couple of weeks ago. Since then I've bought and read Shared Source CLI Essentials, which covers a large proportion of the CLR/CLI codebase.

It turns out that JITted code (i.e. everything that is emitted as IL) actually uses dynamically-generated exception tables with a single exception handler around all JITted code. Any calls to unmanaged code also get an exception handler so that an unmanaged exception can be converted to a managed one. This reduces the number of user-to-kernel-to-user transitions that occur.

I did try to work out how the exception handling works, but disassembling the free build of ntoskrnl.exe is an exercise in frustration, especially when you don't have symbols (my main development machine at home is not networked, but it does have some patches after SP1 installed, so my SP1 kernel symbols don't match). Maybe the checked build would be better...

I had the thought that maybe you could hook the exception handling scheme with a driver, which would perform the whole unwind in user mode, but you'd still take the initial hit of a kernel transition on a throw.

The best conclusion is to realise that exceptions are for exceptional circumstances, where we don't care that it takes a little longer to change the point of execution. Microsoft pull less-used blocks of code (such as error handlers) out of the main path in the system DLLs, which can make it harder to follow; these cold blocks are placed in a different part of the DLL to reduce the working set of the normal execution path. RaiseException in kernel32.dll has two displaced cold blocks, IIRC - one for the case where you pass a NULL lpArguments, and another where you try to pass more than EXCEPTION_MAXIMUM_PARAMETERS in nNumberOfArguments.

Microsoft Offers 64-bit Windows XP Preview

"For the first time, Microsoft has posted a free public preview of a 64-bit version of one of its operating systems, opening up the AMD-based Windows XP 64-bit Edition for 64-bit Extended Systems for one and all. While the company has actually been testing the software since last fall--a previous beta was distributed at October's Professional Developers Conference (PDC) in Los Angeles --this is the first time the company has broadly released this type of preview software." -- Paul Thurrott's WinInfo

"MICROSOFT HAS MADE a trial version of its hotly-anticipated Windows XP for 64-bit AMD chips available for download.

"The trial OS is designed, says the Vole, for systems sporting Athlon64 or Opteron processors. Don't bother trying it on an Itanium, it advises." -- The Inquirer

People have short memories. There was a Windows Advanced Server Limited Edition (basically Windows 2000 SP2) for Itanium, essentially as a preview release, before Windows Server 2003 came out. Windows XP Professional 64-bit (for Itanium processors) is listed on MSDN Subscriber Downloads as released on 26th March 2003.

Most Itanium workstation systems will have been purchased with Windows preinstalled, if the user wants Windows. An Itanium cannot run a 32-bit x86 operating system; it must have a native OS (IIRC). By contrast, an AMD64 system can still run 16-bit code in standard mode, and can boot a 32-bit operating system, so there are currently AMD64 systems out there running the 32-bit versions of Windows.

Monday, 2 February 2004

IE Security Update

It's been a long time coming, but it's here.

I defer to JeffDav for the detailed information.

Apply now: windowsupdate.microsoft.com.

More BlogJet

Yep, definitely something worth having. That last post came out perfect (barring a little monkeying with Paste Special with the quote from Chris' blog).

On COM, pumping and the CLR

Chris Brumme's come up trumps again: Apartments and Pumping in the CLR.

This reminds me of something I was working on last week (and I probably shouldn't be telling you about, but I'm beginning to get fed up anyway). We have a thin-client application server written in VB6. Under a fairly large amount of stress, we got Automation Error -2147417843: "An outgoing call cannot be made since the application is dispatching an input-synchronous call". What had happened, I think, was that we'd tried to call out to one of our 'transaction' objects from within VB's message filter, which it uses to ensure that things like painting and pop-up menus happen. The message filter is used when STA thread A has called an object on thread B (not in the same apartment) and is waiting for B to complete its call. A can't just block, because it needs to be able to handle any recursive calls from B to A.
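
VB reports the error as a signed decimal, which makes it awkward to look up. The sketch below (Python, for illustration; `decode_hresult` is a hypothetical helper) reinterprets the number as an unsigned 32-bit HRESULT and splits it into the usual fields. I'm assuming the same masks as the `HRESULT_FACILITY` and `HRESULT_CODE` macros in winerror.h.

```python
def decode_hresult(value: int) -> dict:
    """Split a signed 32-bit error number into HRESULT fields (hypothetical helper)."""
    unsigned = value & 0xFFFFFFFF    # reinterpret the signed value as unsigned 32-bit
    return {
        "hex": f"0x{unsigned:08X}",
        "failure": bool(unsigned >> 31),          # top bit set means failure
        "facility": (unsigned >> 16) & 0x1FFF,    # mask assumed from HRESULT_FACILITY
        "code": unsigned & 0xFFFF,                # mask assumed from HRESULT_CODE
    }

result = decode_hresult(-2147417843)
print(result)
```

For the error above this gives 0x8001010D, a failure code in facility 1 (the RPC facility), which is the value winerror.h names RPC_E_CANTCALLOUT_ININPUTSYNCCALL.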

I tracked this down to excessive use of DoEvents in certain areas of code. As Chris says, "Deadlocks are easily debugged. Reentrancy is almost impossible to debug." Too right. The simple, and, it turned out, correct approach was simply to remove calls to DoEvents.

VB developers seem to be quite keen on doing the least possible thinking to resolve a problem. Code blocking? DoEvents. Need asynchronous behaviour? Use a Timer control. The Right Answer can often be to get out of the VB environment and write a multithreaded COM component to fire an event when it's finished. With this approach, though, this application server will soon have no VB code left.

Hey, this could be a good thing...

Anyway, this server has bigger problems - like consuming over 50% CPU on a 1.8GHz P4 when handling less than 2000 transactions a minute. SQL Server was just laughing at us, typically idling along at less than 1% CPU (on a separate box). A bit of judicious transferral of operations (changing a chatty interface to a slightly more chunky one, with more work done outside the bottleneck server process) improved matters.

.NET Interop is becoming, er, entertaining, too. The application server does nothing useful on its own - it must host an application, which is a COM object (connected server-to-application using Automation late-binding - simple, but inefficient). COM Interop in the CLR allows us to write applications using VB.NET (thank the gods, an environment that doesn't utterly suck). I'm not sure of myself in this environment yet, though. Should I ever call Marshal.ReleaseComObject? Should I be calling it every time?

Publisher Policy bit me as well. It's not well documented. I should write an article about it (here or on CodeProject). If you don't know what it is, forget about it ;)

I've been vaguely considering re-implementing this server using a .NET language for a while. Might do a bit of that.