Wednesday 30 August 2006

VS6 family completely broken on Vista Pre-RC1 (or so it looked)

Looks like my compatibility problems are solved, in the radical way: Setup is now completely broken on all three applications. It crashes on clicking 'Next' at the welcome screen, and no Program Compatibility settings help.

These tools are essential for my work. If they don't work I cannot upgrade.

UPDATE 2006-08-31: It appears that whatever was causing this may have been a temporary glitch; Visual Studio 6.0 is now installing. Still, given my experiences with eVC before, I can perhaps be forgiven for jumping the gun?

I can't go in and add a note to the bugs I filed because, as a Customer Public Preview user, I can submit bug reports using the Beta Feedback tool but can't log on to Connect to make changes or additional comments. This means I'm wasting someone's time triaging bugs that I can no longer repro.

Tuesday 29 August 2006

People exhibit surprise that Windows Media DRM is 'cracked'

Example surprise.

I’m not at all surprised this is possible. In order to decrypt data, you need two things: the encrypted data, and the decryption key. In order for media playback of DRM-protected files to be possible while disconnected from the Internet, both of those things need to be on your PC. If the key is already on the attacker’s PC, it’s only a matter of time before they find out where it is.

There are of course things that can be done – such as encrypting the decryption key with a master encryption key, so that it isn’t on disk in a usable form, then decrypting it only while it’s actually needed for playback – but ultimately, the key will be visible in the system’s memory somewhere for long enough to be copied.
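A toy sketch of that argument in Python may help. Everything in it is an assumption made for illustration: the XOR "cipher" is not secure, and the key names and the play function are invented. It bears no relation to how Windows Media DRM is actually implemented. It only shows that wrapping the content key does not remove the moment at which the key sits in memory in the clear.

```python
import hashlib
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy symmetric 'cipher': XOR with a repeating key. NOT secure."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# Hypothetical keys, invented for this example.
master_key = hashlib.sha256(b"baked into the player").digest()
content_key = hashlib.sha256(b"licence for one track").digest()

# What sits on disk: the content key wrapped with the master key, plus the media.
wrapped_key_on_disk = xor_bytes(content_key, master_key)
encrypted_media = xor_bytes(b"la la la", content_key)

observed_in_memory = []  # stands in for an attacker reading process memory

def play(wrapped_key: bytes, media: bytes) -> bytes:
    key = xor_bytes(wrapped_key, master_key)  # unwrapped only for playback...
    observed_in_memory.append(key)            # ...but it is in the clear right now
    return xor_bytes(media, key)

clear = play(wrapped_key_on_disk, encrypted_media)
```

Anything that can copy the key during the call to play can decrypt the media later without the player's help.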

Saturday 26 August 2006

TV Licensing has serious issues

(Note for non-Brits reading this [if any]: in the UK we are required, if we want to watch broadcast television, to pay a licence fee of £131.50, which goes to support the BBC – BBC TV has no commercial advertising, except for other BBC programmes. You have to show that your TV is physically incapable of receiving broadcast television to avoid it.)

When I moved into this flat, I bought a TV licence using the TV Licensing website. I did not notice at the time that it had ‘auto-corrected’ the address I entered. This house, an early-20th-century end terrace, was split into two flats by the landlady in 1999 (based on the details from the Council Tax website). The landlady, and all the rental documents, refer to my upstairs flat as ‘17A’. However, the council, for council tax, refers to it as ‘First Floor Flat’. (Actually looking at it right now, the Council Tax website I linked above shows ‘1st Flr Flat’ – they clearly also have a stupidly short field length.)

When I moved my credit cards, I wasn’t aware of the council’s designation, so I used ‘17A’, and that’s what I entered on the TV Licensing website as well. Most UK websites have a gazetteer – a lookup of house number and postcode to pick the correct full address, and this one is no exception. It expanded the street name and town correctly, but dropped the ‘A’, so my licence is actually for number 17, which according to the council, no longer exists. The landlady calls the downstairs flat number 17.

Earlier this year, I decided to get a PVR (Humax PVR9200T, very good thanks). I ordered it with my credit card, and as usual when buying from a new supplier, they insisted it was sent to the card address (i.e. 17A). Whenever you buy TV equipment, this is reported to TV Licensing. Ever since then, I’ve been getting demands to buy a licence for 17A – which I can’t, because the website won’t accept it!

I’ve tried to change the address. You can still edit the address after it’s been expanded out on the change of address page, and I’ve tried that, but the ‘A’ is still dropped.

I’ve sent them letters. They’ve ignored them.

I’ve tried to phone them. They have an automated change of address system. It’s unusable. I’ve tried to leave a phone message. You get about 30 seconds, which is far too little to actually explain the problem. I’ve asked to be phoned back – they haven’t. I’ve tried to be put through to an agent – I just get disconnected.

I’ve sent emails through their website. They’ve been ignored or lost.

What’s actually almost more annoying is the lackadaisical attitude they’ve taken. I bought the PVR in February. I guess when I haven’t had a reply to one of the many attempts to contact them and get this corrected, I’ve been overoptimistic and assumed that they’d sorted it, whereas silence actually means I’m being ignored.

Today I’ve sent two more emails, one using the contact form and the other directly to the email address shown. Hopefully one of them will be processed this time, before the bailiffs come round.

I can’t even try to buy another licence (I’d lose about £40 because this licence still has four months to run, but that’s worth less to me than all this hassle), because I still can’t enter the correct address!

Thursday 24 August 2006

Generic components can only get you so far

We had a strange issue with Meteor Server about two years back. Under stress, the Mem Size column (working set size, in fact) in Task Manager would be up and down like a yo-yo. I initially wondered whether the OS was trimming the working set too aggressively, and tried using the SetProcessWorkingSetSize function to increase the quota. Result: no improvement, it was still happening. The time spent in the memory allocator was slowing the server down significantly, and the slower it got, the worse the problem became, until it virtually ground to a halt.

To avoid the overhead of context switching between multiple runnable worker processes, we moved a long time ago (before I started working on it) from a model where each client would have a dedicated worker process to a much smaller pool of worker processes (the old mode can still be enabled for compatibility with older applications that don’t store their shared state in our session object or otherwise manage their state, but it is highly discouraged for new code). This does mean that there will be times when a client request cannot be handled because there is no worker process free to handle it.

After some thought and experimentation, it became clear that when the server started to slow down, the incoming packets were building up in, of all things, the Windows message queue. I should say at this point that we were using the Winsock ActiveX control supplied with Visual Basic 6 for all network communications. We already had a heuristic that would enable a shortcut path if the average time to handle a request exceeded a certain threshold. This shortcut path simply wasn’t fast enough.

To work around the problem, I added code that would actually close the socket when the average time exceeded the threshold. This was pretty tricky to get right, as we had to reopen the socket in order to send a response out of it, and would then need to close it again if the average time still exceeded the threshold. The memory allocation issue still occurred, but it was contained. I later added an extra condition that would also close the socket if no worker process was available (this does prevent some retries after lost responses, and some requests for additional blocks, from being handled, even though both are dealt with in the server process without using a worker). There was at least one server release where the socket would not be reopened under certain conditions (if I recall, when both the time threshold was exceeded and a worker process became available).

Then, recently, we discovered a problem with the code used to send the second and subsequent packets of responses too large to fit into a single packet (the application server protocol is UDP-based). We weren’t setting the destination (RemoteHost and RemotePort properties) for these packets, assuming that this wouldn’t change. Wrong! If another packet from another client arrives (or is already queued) between calling GetData and SendData, the properties change to the source of the new packet. This sometimes meant that a client would receive half of its own response and half of a different one, which when reassembled would be gibberish (typically this would cause the client to try to allocate some enormous amount of memory, which would fail). I corrected that, but found that the log in which we (optionally) record all incoming and outgoing packets still had some blanks in it where the destination IP and port were supposed to be – these values are retrieved from the RemoteHostIP and RemotePort properties. Where were these packets going? Who knows! Perhaps they were (eek!) being broadcast?
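The hazard can be modelled in a few lines of Python. This is a sketch under assumptions, not the Winsock control's real behaviour in detail: SharedStateSocket, its remote attribute and the client addresses are all invented stand-ins for a component whose destination property is overwritten whenever a packet arrives.

```python
class SharedStateSocket:
    """Mimics a control whose 'remote' property tracks the last packet received."""

    def __init__(self):
        self.remote = None
        self.sent = []  # (destination, payload) pairs actually sent

    def packet_arrived(self, source):
        self.remote = source  # an arrival silently retargets later sends

    def send(self, payload):
        # Implicit destination: whatever 'remote' happens to hold right now.
        self.sent.append((self.remote, payload))

    def send_to(self, destination, payload):
        # Explicit destination: passed in together with the data.
        self.sent.append((destination, payload))

# Hypothetical client addresses.
client_a, client_b = ("10.0.0.1", 5000), ("10.0.0.2", 5000)

buggy = SharedStateSocket()
buggy.packet_arrived(client_a)
buggy.send(b"A block 1")
buggy.packet_arrived(client_b)  # B's request arrives mid-response
buggy.send(b"A block 2")        # ends up addressed to B

fixed = SharedStateSocket()
fixed.packet_arrived(client_a)
fixed.send_to(client_a, b"A block 1")
fixed.packet_arrived(client_b)
fixed.send_to(client_a, b"A block 2")  # still addressed to A
```

Passing the destination with every send removes the dependence on state that another client's packet can change.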

The WinSock control really isn’t designed to be a server component. Frankly it was amazing we were getting around 2,400 transactions per minute (peak) out of it. It was time to go back to the drawing board. Clearly I was going to need an asynchronous way of receiving packets, and the Windows Sockets API really isn’t conducive to use from VB6, so it was going to be a C++ component. Since string manipulation and callbacks were involved, I went with a COM object written with ATL.

I surmise that the WinSock control uses the WSAAsyncSelect API to receive notifications of new packets, and that’s why we were seeing the message queue grow with each packet received. The new component uses WSAEventSelect and has a worker thread which waits on the event for a new packet to arrive. When a packet arrives it synchronously fires an event, which has the effect of waiting until the server finishes processing the packet – either discarding it (as a duplicate, otherwise malformed, or due to excessive load), sending the next block in a multi-block response, or handing the request off to a worker process.
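The shape of that receive loop can be sketched in portable Python. This is only an analogy, not the ATL component: select.select stands in for waiting on the WSAEventSelect event, a plain callback stands in for the COM event, and receive_loop, on_packet and the loopback address are names chosen for this example.

```python
import select
import socket
import threading

def receive_loop(sock, handler, stop):
    """Wait for a datagram, then call handler synchronously before waiting again."""
    while not stop.is_set():
        # Wake up periodically so that the stop flag gets checked.
        readable, _, _ = select.select([sock], [], [], 0.1)
        if not readable:
            continue
        try:
            data, source = sock.recvfrom(65535)
        except BlockingIOError:
            continue
        handler(data, source)  # no further packet is read until this returns

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # any free port on loopback
server.setblocking(False)

received = []
got_packet = threading.Event()

def on_packet(data, source):
    received.append(data)
    got_packet.set()

stop = threading.Event()
worker = threading.Thread(target=receive_loop, args=(server, on_packet, stop), daemon=True)
worker.start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"request 1", server.getsockname())
got_packet.wait(timeout=2.0)

stop.set()
worker.join(timeout=2.0)
client.close()
server.close()
```

Because the handler runs on the receiving thread, packets that arrive while it is busy wait in the socket's own buffer rather than in a message queue.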

This does mean that there could be long delays between checking for packets. Doesn’t that cause a problem? Not really. The TCP/IP stack buffers incoming packets on a UDP socket in a small First-In-First-Out buffer. If the buffer doesn’t have enough space for an incoming packet, a packet is simply discarded (on most stacks it is the newly arrived one that is dropped, rather than the oldest one in the buffer). That behaviour is perfect for our situation. You can vary the buffer size (warning, it’s in kernel mode and taken from non-paged pool, IIRC) by calling setsockopt with the SO_RCVBUF option.
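For reference, this is what adjusting the buffer looks like through Python's socket module, which wraps the same setsockopt call. The 64 KB figure is an arbitrary value chosen for the example, not the size used in Meteor Server.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# The size the OS chose by default for this UDP socket.
default_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

requested = 64 * 1024  # hypothetical size, for illustration only
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)

# Read the value back: the OS is free to round, double or cap the request.
actual_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
```

Reading the value back matters because the request is a hint, so actual_size may not equal requested.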

For added performance the socket is in non-blocking mode, so on sending a packet, it simply gets buffered and the OS sends the data asynchronously.

Net result? No more problems with misdirected packets (my new API requires you to pass the destination in at the same time as the data), a step on the road to IPv6 support (the WinSock control will not be updated for that) – and a substantial performance improvement. My work computer (a hyperthreaded 3.0GHz P4) now does 7,000 transactions per minute (peak) on the same application – and the bottleneck has moved somewhere else, because that figure was achieved with only three worker processes while the earlier one was with eight. We saw much less difference in a VM (on the same machine) I’d been using to get performance baselines, but what we did see was that the performance was much more consistent with the new socket component.

The sizing for this application was previously around 1,500 transactions per minute per server, so this really gives a substantial amount of headroom.

My component would be terrible for general use – but it’s just right for this one.

Wednesday 23 August 2006

Sometimes where code runs is more important than what it is

Sometimes, to get the best performance from some code, you have to change the architecture.

Our application server product, Meteor Server, is a complex beast. To be able to handle requests from clients concurrently, the main MeteorServer.exe process farms out those requests to a pool of worker processes. (Yes, we could switch to a single multi-threaded worker process even with VB6, but it’s a lot of effort and many existing applications may not be threadsafe, so we’d have to offer both schemes, and that’s even more effort.)

We can’t multi-thread the main MeteorServer.exe process because it’s written in VB6, and while you can make an out-of-process COM server process (a local server in COM parlance, an ‘ActiveX EXE’ in VB6 terminology) multi-threaded, you can’t make a ‘Standard EXE’ multi-threaded. Oh, there are hacks, but I’m of the firm opinion that you shouldn’t subvert a technology to make it do something it wasn’t designed to do – when it goes wrong you will get no support.

A Meteor application is a COM object which exposes two methods through a dispatch (Automation) interface – well, one property and one method. The property, called VersionString, is simply to allow Meteor to pick up and display version information for the application. Every other piece of interaction is done through the TerminalEvent method, which receives a couple of interface pointers to allow it to call back into Meteor, a flag indicating whether this is a new client, a numeric event type indicating what the user’s last action was, and a string representing any event data. The application then calls methods on the interface to accumulate a batch of commands to be sent to the client – things like clearing the screen, setting the text colour, displaying text at a given location, sending a menu of options, defining an entry field. When one of these methods is called, it’s turned into an on-the-wire format, with an operation code and a wire-representation of the parameters. When the application returns from TerminalEvent, the server sends the complete batch to the client.

When I first started working on Meteor Server, when the application called a command-generating method, the stub of code in the interface made a call back into the MeteorServer.exe process to perform the wire-format translation. This meant it had to wait for the server process to finish whatever it was doing and go back to waiting for a window message. This made the server process a serious bottleneck – it was a ‘chatty’ interface, which is really not advisable across process boundaries. About three years ago, I looked at the code and realised it actually had no dependencies on any data in the server process, and had the idea to move this formatting code into the worker process that the application object was running in, to improve both performance and scalability. The commands would be batched up in the worker process then only sent across to the server when the batch was complete.
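The idea of formatting locally and crossing the process boundary once per batch can be sketched as follows. This is an illustration only: the class, the operation codes and the byte layout are invented for this example and are not Meteor's actual wire format.

```python
import struct

class CommandBatch:
    """Accumulates wire-format commands locally; nothing crosses a process boundary."""

    # Hypothetical operation codes, invented for this sketch.
    OP_CLEAR_SCREEN = 1
    OP_DISPLAY_TEXT = 2

    def __init__(self):
        self._buffer = bytearray()

    def clear_screen(self):
        self._buffer += struct.pack("<B", self.OP_CLEAR_SCREEN)

    def display_text(self, row, col, text):
        encoded = text.encode("utf-8")
        # Op code, row, column, then a 16-bit length prefix for the text.
        self._buffer += struct.pack("<BBBH", self.OP_DISPLAY_TEXT, row, col, len(encoded))
        self._buffer += encoded

    def retire(self, send_to_server):
        """One cross-process call per request instead of one per command."""
        send_to_server(bytes(self._buffer))
        self._buffer.clear()

calls = []  # stands in for the cross-process interface to the server
batch = CommandBatch()
batch.clear_screen()
batch.display_text(2, 5, "Hello")
batch.retire(calls.append)
```

Two commands were generated, but the stand-in for the server was called only once, which is the property that matters across a process boundary.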

About a year later I actually made the change – we were seeking a significant performance improvement at the time. I don’t have a record of the performance change but I think it was some decent multiple – 3x or so the transaction rate.

About this time last year, or a little before, I was asked to add a new feature. Meteor provides a session state storage object which can store arbitrary strings that the application sets. The new feature was to allow an application to copy the session state data from another session to its own – this allows a user to resume their work on a different client, for example if the hardware is damaged or otherwise fails. I initially put the extra code directly in the batch-retire method that the worker process calls on completing a request, adding a new parameter, but when testing for performance, discovered that the simple test to see if the session should be transferred caused a regression of about 10%, and that would mean the difference between exceeding and failing to meet the performance requirement for a different customer.

The solution was to take the extra code out of the mainline and make it a separate method that the worker process calls only if required, returning the batch-retire method to its previous implementation. With that done, the regression was gone – we were back up to almost exactly the same performance level we’d had before.

Know the environment in which your code has to run.

You must call Dispose

You must call Dispose.

If an object implements the IDisposable pattern, or otherwise offers a Dispose or Close method, you must call it. No exceptions. If an instance member of your class implements IDisposable, your class should too.

OK, if you don’t, a finalizer might clean up after you. But you don’t want it to do that. The finalizer will only run once a GC has occurred. And a GC only runs based on heuristics of how much memory has been allocated (in general; there are a few cases in Compact Framework where the GC can be spurred to run, for example on receiving a WM_HIBERNATE message from the shell to say that physical memory is low). In .NET Framework 1.x and both versions of .NET Compact Framework, the GC has no idea how much unmanaged memory is being used by any object that manages an unmanaged resource. .NET Framework 2.0 does have the GC.AddMemoryPressure method, which can guide the GC to collect earlier than it might otherwise have done.

Finalizers don’t run on a regular application thread. They run on a special finalizer thread. This means you have to be careful around possible synchronisation issues. Objects to be finalized wait in a queue, and only one thread services that queue, so if a finalizer blocks, all undisposed objects will end up hanging around.

Once the finalizer has run, the managed memory isn’t automatically released. You have to wait for GC to run again. On the desktop or server, with the full Framework, you have to wait for it to collect the generation of the heap which the object is now in, which means an even longer wait for the managed memory to be released, which can keep the GC heap larger than it could have been.

The GC propaganda basically tells us we can be lazy. We can’t. We must clean up after ourselves. Treat the finalizer as a safety net.

Best practice is to manage one unmanaged resource with one managed object, and keep that managed object as simple as possible – ideally, it should do nothing but manage that resource. Make your resource manager class implement IDisposable and give it a finalizer as a backstop.
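The same shape can be shown in Python, as an analogy rather than .NET code: the with statement plays the role of C#'s using, close plays the role of Dispose, and weakref.finalize plays the role of the finalizer. ResourceHolder, the released list and the handle value 42 are invented for this sketch.

```python
import weakref

released = []  # records which pretend handles were released, and by which route

def _release(handle, how):
    released.append((handle, how))

class ResourceHolder:
    """Owns exactly one (pretend) unmanaged handle and nothing else."""

    def __init__(self, handle):
        self.handle = handle
        # Backstop: runs at garbage collection if close() was never called.
        # The callback takes the handle, not self, so it keeps nothing alive.
        self._finalizer = weakref.finalize(self, _release, handle, "finalizer")

    def close(self):
        # Deterministic clean-up. detach() cancels the backstop and returns
        # None if it was already cancelled, so the release happens only once.
        if self._finalizer.detach() is not None:
            _release(self.handle, "close")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

with ResourceHolder(42) as holder:
    pass  # use the resource here

holder.close()  # a second call is harmless
```

The handle is released as soon as the with block ends, without waiting for a collection, and the backstop never needs to run.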

Sunday 20 August 2006

Anyone managed to get eVC working on Windows Vista Beta 2?

My job involves a fair amount of C++ development for Pocket PCs and other Windows CE-based devices. I’ve been doing this for five years as of next week, so in that time I’ve used – and deployed projects developed with – eMbedded Visual C++ 3.0 with the Pocket PC 2000 SDK, the Pocket PC 2002 SDK, eMbedded Visual C++ 4.0 with Pocket PC 2003 SDK and now even a couple of things debugged with VS2005 (note: not compiled with VS2005 because they needed to be backwards compatible with PPC2003 and not redistribute the MFC 8.0 runtime – you can’t use the MFC 6ish supplied with the PPC2003 SDK with VS2005 because the headers won’t compile). To be able to upgrade to Windows Vista, compatibility with eVC is an absolute requirement – if it doesn’t work, I can’t upgrade.

(What about running under Virtual PC? Virtual PC doesn’t virtualise USB ports, new versions of ActiveSync don’t offer network sync, I don’t really fancy manual connection setup, and I can’t always rely on having devices with network support.)

Unfortunately it seems at the moment that eVC just doesn’t work under Vista. eVC 3.0 works with the Pocket PC 2000 SDK (only) installed, but as soon as you install the 2002 SDK (which shipped Platform Manager 4.0) it breaks, taking an access violation exception on opening a project. That’s installing with User Account Control enabled. With UAC disabled before installation, it doesn’t even get as far as an empty environment after installing the 2002 SDK.

eVC 4.0 barely even installs. If you launch Setup from the Program Compatibility Wizard, after setting Windows 2000 compatibility, it does install; using the default compatibility options (i.e. none) it crashes before even asking for the product key. I did find that a scripted build (using EVC /MAKE from the command line) would build for some platforms but not others, but again would crash on opening a project, making it impossible to debug. I’m guessing that UAC gets in the way of installing the SDKs to the right place.

This is a shame. I’d hoped that UAC would help with the myriad problems in trying to get eVC (either version) to work under a limited user account.

On Windows XP, to get them working as a limited user, both versions require users to have write-access to their installation directories. This stems from the Access/Jet database used to hold the processor type definitions, VCCEDB.MDB. Write access to this file is needed when running, and if you start more than one copy, Jet needs to create a ‘lock’ (.ldb) file in the same directory to manage concurrency. There are also registry keys under HKEY_LOCAL_MACHINE\Software\Microsoft\Windows CE Tools\Platform Manager holding the platform definitions, which include things like the paths shown under Tools, Options, Directories. If eVC cannot write to this key, it creates an equivalent tree under HKEY_CURRENT_USER, but does not copy the original data, leaving you with a non-functional SDK. You have to copy the settings (maybe I should write a program to do this). I recall that there are other settings that you have to change the permissions on, but can’t recall what they are right now – it’s a while since I last had to do it.

eVC 4.0 also complains that it is unable to update its help system after you install any new SDKs. Running from a command prompt launched by makemeadmin doesn’t help, nor does running eVC as an administrator with Run As.

Somewhat related, VS.NET 2003 has a problem with device debugging as a limited user where it will fail to connect to a device that hasn’t been used for debugging before. You have to run as an admin (via makemeadmin) the first time you connect to a device (or the first time after cold booting). I surmise that some part or parts of the encryption keys are stored in a location that isn’t writable by a limited user.