Friday, 5 March 2010

MFC message handling

Goran, a commenter at Raymond's blog, asks:

As one of those people who only have a hammer :-( ... It looks like MFC just told us off with this? WM_GETDLGCODE has meaningfull wParam and lParam, but CWnd::OnGetDlgCode has none of that. Anyone used OnGetDlgCode? Help?

The specific context is handling the return key - Windows tells you what key caused WM_GETDLGCODE to be fired in the wParam parameter, but MFC doesn't pass that information on to the OnGetDlgCode handler.

Something that often catches people out with MFC is the thought that you have to use the specific handler that MFC declares for a certain message. In fact, it's simply a shortcut - ON_WM_GETDLGCODE in the message map causes the message map handling to go down a certain branch that happens to call the handler function with no parameters.

Instead, you can simply use ON_MESSAGE in the message map, using WM_GETDLGCODE and declaring the function as taking WPARAM, LPARAM and returning LRESULT. Do make sure you don't also include ON_WM_GETDLGCODE.

The message map macros themselves simply produce a table of message number ranges and handler function pointers. The MFC WindowProc scans the table to find a handler for a given message number, then - based on an enumeration value set by the message map macro which indicates the argument types - decodes wParam and lParam, then calls the handler with the appropriate arguments.

Many of the predefined helper message map macros, which usually don't take any arguments, define the function name that you're expected to implement. ON_MESSAGE allows you to specify the function name (as you'd expect, as a generic macro).

The Visual Studio Wizards have always simplified the full power of message maps, not supporting custom message numbers (which you have to use ON_MESSAGE for) and not supporting ranges of command IDs (ON_COMMAND_RANGE, ON_NOTIFY_RANGE, etc). Few people even seem to realise that you can use the same handler function for more than one message map macro, for example if you have discontinuous ranges of control IDs.

Tuesday, 13 October 2009

Four-and-a-half months

…that’s how long I was running Windows 7 RC1 without switching back to my hard disk with Windows Vista. The System event log shows I last booted it on 31 May 2009.

This is practically the last action I’m going to take before wiping this disk – the Vista disk – and installing Windows 7 RTM. Just a little check here and there to find anything that I might not have backed up when transferring over to 7 RC1, then it’s formatting time.

The only thing I’ve missed? The extra 200GB of disk space!

In retrospect Windows Vista has not been a bad operating system, but it had some pretty rough edges, particularly at the start with poor hardware support. It was a rush 18–month project, after the initial three-year disaster of Longhorn, and in some areas it shows. Windows 7 has had a further two-and-a-half years to bake, and a further two-and-a-half years of device driver and hardware development; it’s not fundamentally different but feels more complete.

Saturday, 11 July 2009

Technical details on Humax problem

I thought I’d break this into another post so that the fix is easier to understand for non-technical people.

Comparing the old and new driver source shows that the definition of the libusb_request structure had changed. Visual inspection suggested that this was probably the source of the compatibility issue, when using the old DLL with the new driver.

When using the new DLL, the following error occurs: “The ordinal 76 could not be located in the dynamic link library libusb0.dll.” Inspecting the program and DLL shows that while the DLL’s entry points are exported by name, the program links to the DLL by ordinal. This is peculiar; while it’s a little faster than string-matching, and it saves a little space, care has to be taken to keep the ordinals the same between versions of a DLL. Reading the .def file shows that the ordinals weren’t specified, and that numerous functions were added and removed between the two versions.

My fix is to add specific ordinal declarations to the .def file for functions that humax.exe is using, so that the correct functions are used. I did this, and recompiled the DLL with the Windows Vista WDK (Windows Driver Kit). Success! Or, at least, as much success as I’ve had with Windows XP 32–bit; the box still has a tendency to hang during file transfer.

File Attachment: libusb0.def (1 KB)

Downloading from your Humax PVR9200T on Windows x64

The Humax PVR9200T is a pretty good Freeview/digital terrestrial personal video recorder. I’ve had mine for a few years now. Over time, the disk fills up with some programmes you want to keep. It’s possible to copy recorded programmes from the box to your computer, over USB.

Unfortunately the supplied eLinker software is not highly-regarded. As a result some replacements have been written, such as Humax Media Controller. However, the driver supplied only works on 32–bit Windows.

The USB library and driver used are a Win32 port of libusb, a driver which makes it possible to write your USB transfers fully in user-mode. A later version (0.1.12.2) has both 32– and 64–bit versions of the driver. However, the updated driver doesn’t work with the libusb0.dll supplied with Media Controller 1.05, and the program won’t run if you overwrite it with the new DLL.

I’ve rebuilt the new DLL so it will work with the program. The attached ZIP file contains the new version and the new drivers. To use, extract the contents of the folder. Copy the libusb0.dll in the top-level folder to the Humax Media Controller folder, overwriting the current version. Then, use Update Driver in Device Manager to install the driver from the Driver folder.

Windows Vista introduced a driver signing requirement for x64, and this driver isn’t signed. To get it to load on Windows Vista and Windows 7, reboot, and press F8 until the boot menu appears. From this menu, select Disable Driver Signature Enforcement, then continue to boot normally.

Hopefully this will help other people with the x64 versions of Windows!

File Attachment: HumaxPVR.zip (78 KB)

Wednesday, 30 July 2008

ADSL Router suggestions?

OK, I think my router has become too hackable. The reliability has hit the toilet, and I’ve had two messages from my ISP warning me that there has been odd traffic. The first time I got a warning message I think I was using BitTorrent at the time, but the second was yesterday at around lunchtime – when my computers were all off and I was at work.

So, list of demands requirements:

  • Wired router with a few Ethernet ports
  • 802.11g WiFi with at least WPA encryption (can’t do WPA2 on the handheld)
  • Xbox Live compatible
  • DHCP implementation that works
  • DNS server address passthrough from DHCP server (i.e. I want to use my ISP’s DNS servers, not have the box do DNS for me)
  • ADSL2+ compatible, for whenever that gets rolled out by Demon
  • PPPoA since this is of course a UK requirement
  • A firewall that can be completely switched off or is actually configurable intelligibly
  • Not excessively burdened with blinkenlights
  • Will be updated regularly with patches for whatever its OS is

That last point is critical. There are way too many routers out there that run Linux and aren’t patched. D-Link haven’t produced an update since last November and even that’s not officially official, you have to find it on the FTP site. The last official update is September 2006.

I don’t need 802.11n as a) it’s still a draft, b) none of my other equipment needs it, and c) 54Mbps is already sufficiently fast to access the Internet. I don’t do much copying between devices and I wouldn’t do it when they were running on battery anyway.

I’m half tempted to try building a Windows CE kernel for it myself.

Monday, 10 March 2008

How do I see what the JIT compiled, anyway?

You use the SOS debugger extension’s !U command.

Yeah, that’s not very helpful. Here’s how to get from that terse description to something useful.

As an example, let’s use the following trivial program:

using System;

public class Simple
{
   public static int Main()
   {
       Console.WriteLine( "Hello World!" );
       Console.ReadLine();
       
       return 0;
   }
}

Dump this snippet into Notepad, save as simple.cs file and compile it. We want to see what it’ll be like when the user runs it, so we’ll do a release build. From the VS2005 Command Prompt, run csc /debug:pdbonly /o+ simple.cs. (I find it’s quicker to do command-line compiles for simple test programs than firing up Visual Studio. It’s not as if we’re designing a Form.) If you’re doing this on a 64–bit computer, like me, add /platform:x86 so we get consistent output across platforms. (Marking as x86 means that the executable headers are set to indicate a 32–bit program. The IL is the same whatever you set here, but if you leave it as anycpu, the default, the .NET Framework will create a 64–bit process which obviously gives different output.)

Why call Console.ReadLine? Well, the JIT compiler is helpful (or at least it thinks it is). If you start a program under the debugger, it will generate less optimized code because it thinks you want to debug it, but that changes the code from what the user will see. So I want to add a stop point in the program so I can attach the debugger after the process has started.

The next thing you’ll need is the Debugging Tools for Windows kit. Grab the 32–bit kit – the 64–bit kit can debug 32–bit processes, but SOS (the debugger extension DLL which implements the commands we’re going to use) doesn’t appear to work. We’re going to use WinDBG, which is slightly friendlier than the other debuggers although not a lot!

Open WinDBG. Run simple.exe and, when it’s waiting for input, go to File/Attach to a Process in WinDBG. You’ll see a list of other processes (and if running Windows Vista with UAC enabled, or on XP as a standard user, a load of access denied errors). Select simple.exe from the list and hit OK. WinDBG automatically stops once you’ve attached to a process so you can start manipulating the program straight away. This is the output I got:

Microsoft (R) Windows Debugger Version 6.8.0004.0 X86
Copyright (c) Microsoft Corporation. All rights reserved.

*** wait with pending attach
Symbol search path is: C:\Windows;C:\Windows\System32;C:\Windows\SysWOW64;SRV*C:\WebSymbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
ModLoad: 008c0000 008c8000   C:\Users\Mike\Documents\Programming\simple.exe
ModLoad: 776a0000 77800000   C:\Windows\SysWOW64\ntdll.dll
ModLoad: 79000000 79046000   C:\Windows\system32\mscoree.dll
ModLoad: 75b70000 75c80000   C:\Windows\syswow64\KERNEL32.dll
ModLoad: 76ce0000 76da6000   C:\Windows\syswow64\ADVAPI32.dll
ModLoad: 76e10000 76f00000   C:\Windows\syswow64\RPCRT4.dll
ModLoad: 75850000 758b0000   C:\Windows\syswow64\Secur32.dll
ModLoad: 75a80000 75ad8000   C:\Windows\syswow64\SHLWAPI.dll
ModLoad: 77110000 771a0000   C:\Windows\syswow64\GDI32.dll
ModLoad: 77040000 77110000   C:\Windows\syswow64\USER32.dll
ModLoad: 76c30000 76cda000   C:\Windows\syswow64\msvcrt.dll
ModLoad: 75d00000 75d60000   C:\Windows\system32\IMM32.DLL
ModLoad: 76940000 76a08000   C:\Windows\syswow64\MSCTF.dll
ModLoad: 75a70000 75a79000   C:\Windows\syswow64\LPK.DLL
ModLoad: 759c0000 75a3d000   C:\Windows\syswow64\USP10.dll
ModLoad: 746c0000 7485e000   C:\Windows\WinSxS\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.6001.18000_none_5cdbaa5a083979cc\comctl32.dll
ModLoad: 79e70000 7a3ff000   C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll
ModLoad: 75390000 7542b000   C:\Windows\WinSxS\x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.1434_none_d08b6002442c891f\MSVCR80.dll
ModLoad: 75e30000 7693f000   C:\Windows\syswow64\shell32.dll
ModLoad: 771b0000 772f4000   C:\Windows\syswow64\ole32.dll
ModLoad: 790c0000 79bf6000   C:\Windows\assembly\NativeImages_v2.0.50727_32\mscorlib\5b3e3b0551bcaa722c27dbb089c431e4\mscorlib.ni.dll
ModLoad: 79060000 790b6000   C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorjit.dll
(1dc.12b4): Break instruction exception - code 80000003 (first chance)
eax=7efaf000 ebx=00000000 ecx=00000000 edx=7770d2d4 esi=00000000 edi=00000000
eip=776b0004 esp=04c6fa00 ebp=04c6fa2c iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
ntdll!DbgBreakPoint:
776b0004 cc              int     3

You enter your debugging commands in the edit box at the bottom, next to the prompt 0:003>. This means we’re debugging the 0th process we’re attached to, and our commands by default apply to thread number 3. Four threads? Well, thread 0 is our main thread, thread 1 was created by the CLR’s debugging support in case we attached a debugger, thread 2 is the finalizer thread, and thread 3 was just created by WinDBG so it could stop the process safely. (You can see this by running ~* k, although you’ll need to be set up to get debugging symbols from the symbol server to get good stack traces.) The Debugging Tools debuggers automatically stop the process when you attach, so that you can start manipulating the process straight away.

The first thing we need to do is ask WinDBG to load the SOS extension. This is installed with the CLR, so we ask it to load from the same folder that mscorwks.dll (the DLL which implements the virtual machine, effectively the guts of the CLR) lives in:

0:003> .loadby sos.dll mscorwks

Now we can see what state of the managed threads are in by running !threads. Any command beginning ! comes from a debugger extension DLL, and they can be disambiguated if necessary, but they’re searched in reverse order loaded (last loaded = first searched) so anything we want will be found in SOS anyway.

Right. To see what we can do, run !help.

0:003> !help
-------------------------------------------------------------------------------
SOS is a debugger extension DLL designed to aid in the debugging of managed
programs. Functions are listed by category, then roughly in order of
importance. Shortcut names for popular functions are listed in parenthesis.
Type "!help " for detailed info on that function. 

Object Inspection                  Examining code and stacks
-----------------------------      -----------------------------
DumpObj (do)                       Threads
DumpArray (da)                     CLRStack
DumpStackObjects (dso)             IP2MD
DumpHeap                           U
DumpVC                             DumpStack
GCRoot                             EEStack
ObjSize                            GCInfo
FinalizeQueue                      EHInfo
PrintException (pe)                COMState
TraverseHeap                       BPMD 

Examining CLR data structures      Diagnostic Utilities
-----------------------------      -----------------------------
DumpDomain                         VerifyHeap
EEHeap                             DumpLog
Name2EE                            FindAppDomain
SyncBlk                            SaveModule
DumpMT                             GCHandles
DumpClass                          GCHandleLeaks
DumpMD                             VMMap
Token2EE                           VMStat
EEVersion                          ProcInfo 
DumpModule                         StopOnException (soe)
ThreadPool                         MinidumpMode 
DumpAssembly                       
DumpMethodSig                      Other
DumpRuntimeTypes                   -----------------------------
DumpSig                            FAQ
RCWCleanupList
DumpIL

To find out a little more about !U, let’s run !help U:

0:003> !help U
-------------------------------------------------------------------------------
!U [-gcinfo] [-ehinfo] <MethodDesc address> | <Code address> 

Presents an annotated disassembly of a managed method when given a MethodDesc
pointer for the method, or a code address within the method body. Unlike the
debugger "U" function, the entire method from start to finish is printed,
with annotations that convert metadata tokens to names.


...
03ef015d b901000000       mov     ecx,0x1
03ef0162 ff156477a25b     call   dword ptr [mscorlib_dll+0x3c7764 (5ba27764)] (System.Console.InitializeStdOutError(Boolean), mdToken: 06000713)
03ef0168 a17c20a701       mov     eax,[01a7207c] (Object: SyncTextWriter)
03ef016d 89442414         mov     [esp+0x14],eax

If you pass the -gcinfo flag, you'll get inline display of the GCInfo for
the method. You can also obtain this information with the !GCInfo command.

If you pass the -ehinfo flag, you'll get inline display of exception info
for the method. (Beginning and end of try/finally/catch handlers, etc.).
You can also obtain this information with the !EHInfo command.

Right, so we need the address of a MethodDesc structure, or the address of the code itself. How can we get one of these? There’s a command called DumpMD…

0:003> !help dumpmd
-------------------------------------------------------------------------------
!DumpMD 

This command lists information about a MethodDesc. You can use !IP2MD to turn 
a code address in a managed function into a MethodDesc:

0:000> !dumpmd 902f40
Method Name: Mainy.Main()
Class: 03ee1424
MethodTable: 009032d8
mdToken: 0600000d
Module: 001caa78
IsJitted: yes
m_CodeOrIL: 03ef00b8

If IsJitted is "yes," you can run !U on the m_CodeOrIL pointer to see a 
disassembly of the JITTED code. You can also call !DumpClass, !DumpMT, 
!DumpModule on the Class, MethodTable and Module fields above.

We’re getting a bit closer. Let’s try DumpModule…

0:003> !help DumpModule
-------------------------------------------------------------------------------
!DumpModule [-mt] 

You can get a Module address from !DumpDomain, !DumpAssembly and other 
functions. Here is sample output:

0:000> !dumpmodule 1caa50
Name: C:\pub\unittest.exe
Attributes: PEFile
Assembly: 001ca248
LoaderHeap: 001cab3c
TypeDefToMethodTableMap: 03ec0010
TypeRefToMethodTableMap: 03ec0024
MethodDefToDescMap: 03ec0064
FieldDefToDescMap: 03ec00a4
MemberRefToDescMap: 03ec00e8
FileReferencesMap: 03ec0128
AssemblyReferencesMap: 03ec012c
MetaData start address: 00402230 (1888 bytes)

The Maps listed map metadata tokens to CLR data structures. Without going into 
too much detail, you can examine memory at those addresses to find the 
appropriate structures. For example, the TypeDefToMethodTableMap above can be 
examined:

0:000> dd 3ec0010
03ec0010  00000000 00000000 0090320c 0090375c
03ec0020  009038ec ...

This means TypeDef token 2 maps to a MethodTable with the value 0090320c. You 
can run !DumpMT to verify that. The MethodDefToDescMap takes a MethodDef token 
and maps it to a MethodDesc, which can be passed to !DumpMD.

There is a new option "-mt", which will display the types defined in a module,
and the types referenced by the module. For example:

0:000> !dumpmodule -mt 1aa580
Name: C:\pub\unittest.exe
......
MetaData start address: 0040220c (1696 bytes)

Types defined in this module

      MT    TypeDef Name
------------------------------------------------------------------------------
030d115c 0x02000002 Funny
030d1228 0x02000003 Mainy

Types referenced in this module

      MT    TypeRef Name
------------------------------------------------------------------------------
030b6420 0x01000001 System.ValueType
030b5cb0 0x01000002 System.Object
030fceb4 0x01000003 System.Exception
0334e374 0x0100000c System.Console
03167a50 0x0100000e System.Runtime.InteropServices.GCHandle
0336a048 0x0100000f System.GC

We still need a Module to pass to this command, but we’re getting there. Maybe DumpDomain can help us? I’ll skip the help text this time.

0:003> !DumpDomain
--------------------------------------
System Domain: 7a3bc8b8
LowFrequencyHeap: 7a3bc8dc
HighFrequencyHeap: 7a3bc934
StubHeap: 7a3bc98c
Stage: OPEN
Name: None
--------------------------------------
Shared Domain: 7a3bc560
LowFrequencyHeap: 7a3bc584
HighFrequencyHeap: 7a3bc5dc
StubHeap: 7a3bc634
Stage: OPEN
Name: None
Assembly: 0052f848
--------------------------------------
Domain 1: 00516800
LowFrequencyHeap: 00516824
HighFrequencyHeap: 0051687c
StubHeap: 005168d4
Stage: OPEN
SecurityDescriptor: 00517d30
Name: simple.exe
Assembly: 0052f848 [C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll]
ClassLoader: 0050e470
SecurityDescriptor: 0051f5d0
  Module Name
790c2000 C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll

Assembly: 00535ac0 [C:\Users\Mike\Documents\Programming\simple.exe]
ClassLoader: 0050e630
SecurityDescriptor: 00535a38
  Module Name
000f2c3c C:\Users\Mike\Documents\Programming\simple.exe

At last, we have the address of something we can use. Let’s get the module information for simple.exe. For that we need the address next to it in the module list for the simple.exe assembly.

0:003> !dumpmodule -mt f2c3c
Name: C:\Users\Mike\Documents\Programming\simple.exe
Attributes: PEFile 
Assembly: 00535ac0
LoaderHeap: 00000000
TypeDefToMethodTableMap: 000f0038
TypeRefToMethodTableMap: 000f0040
MethodDefToDescMap: 000f005c
FieldDefToDescMap: 000f0068
MemberRefToDescMap: 000f006c
FileReferencesMap: 000f0088
AssemblyReferencesMap: 000f008c
MetaData start address: 008c206c (740 bytes)

Types defined in this module

      MT    TypeDef Name
------------------------------------------------------------------------------
000f3030 0x02000002 Simple

Types referenced in this module

      MT    TypeRef Name
------------------------------------------------------------------------------
790fd0f0 0x01000001 System.Object
79101118 0x01000006 System.Console

Now we can get the MethodTable for the Simple class:

0:003> !dumpmt -md f3030
EEClass: 000f1174
Module: 000f2c3c
Name: Simple
mdToken: 02000002  (C:\Users\Mike\Documents\Programming\simple.exe)
BaseSize: 0xc
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 6
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
79371278   7914b928   PreJIT System.Object.ToString()
7936b3b0   7914b930   PreJIT System.Object.Equals(System.Object)
7936b3d0   7914b948   PreJIT System.Object.GetHashCode()
793624d0   7914b950   PreJIT System.Object.Finalize()
00310070   000f3020      JIT Simple.Main()
000fc01c   000f3028     NONE Simple..ctor()

Finally we have something we can pass to !U. Note the JIT column. Here, PreJIT means that the code has come from a native image generated by ngen, JIT means that the JIT compiler in this process has generated the code, and NONE means that the method hasn’t been compiled yet. We didn’t override any of System.Object’s virtual methods, so they occupy the first four places in our MethodTable.

I placed the call to Console.ReadLine at the end of the program so everything has already been compiled – remember, the JIT compiles code on-demand, one method at a time, when that method is called (barring very simple methods which might be inlined into their callers). Let’s just get on and show the code for Main.

0:003> !U f3020
Normal JIT generated code
Simple.Main()
Begin 00310070, size 36
00310070 833d8c10920300  cmp     dword ptr ds:[392108Ch],0
00310077 750a            jne     00310083
00310079 b901000000      mov     ecx,1
0031007e e8e1580579      call    mscorlib_ni+0x2a5964 (79365964) (System.Console.InitializeStdOutError(Boolean), mdToken: 06000770)
00310083 8b0d8c109203    mov     ecx,dword ptr ds:[392108Ch] (Object: System.IO.TextWriter+SyncTextWriter)
00310089 8b153c309203    mov     edx,dword ptr ds:[392303Ch] ("Hello World!")
0031008f 8b01            mov     eax,dword ptr [ecx]
00310091 ff90d8000000    call    dword ptr [eax+0D8h]
00310097 e84c750d79      call    mscorlib_ni+0x3275e8 (793e75e8) (System.Console.get_In(), mdToken: 0600076e)
0031009c 8bc8            mov     ecx,eax
0031009e 8b01            mov     eax,dword ptr [ecx]
003100a0 ff5064          call    dword ptr [eax+64h]
003100a3 33c0            xor     eax,eax
003100a5 c3              ret

Interesting. The JIT clearly sees Console.WriteLine(string) and Console.ReadLine as trivial and has inlined the calls. In turn it’s inlined Console.get_Out. The reference to SyncTextWriter isn’t the JIT being clever – it’s SOS telling us that the address 0x0392108C is the base of SyncTextWriter’s vtable, as this call to TextWriter.WriteLine(string) is virtual.

I realise, and hope, that this isn’t everyday usage. More often than not you’ll have crashed somewhere and want to find out where (for which see !IP2MD). But it can be useful to see just how your C# code turns into machine instructions.

What the heck does 'rep ret' mean?

I was looking at what the .NET x64 JIT compiler generates for some code, and saw something very odd at the end of the routine: the last instruction of the function was rep ret. Looking a bit further, this is the same at the end of every JIT-compiled routine.

The thing is, the rep prefix to an instruction is supposed to tell it to be repeated. Repeat the return? How do I do that?

The Intel architecture software developer’s manual set says it’s only defined for the ‘string’ instructions like movs (which moves a byte/word/dword from the address pointed to by ESI to the address pointed to by EDI). The rep prefix repeats the string instruction ECX times. Yes, this means that you can implement memcpy in a single instruction. (You can do memset with a single rep stos instruction, once AL is loaded with the value to be stored.) It’s explicitly undefined for anything else.

So where the heck has this illegal usage come from? I followed a couple of clues and found this patch notification for glibc on x64. And indeed the current version of AMD’s optimization guide [PDF] for Athlon 64 processors says that you should do this. The reason? The branch predictor gets it wrong if the ret instruction is jumped to directly by a branch instruction, or if the ret directly follows a branch instruction.

I’m not sure doing it throughout, even when you’ve got epilog code in there which prevents the bug, is necessary though.

AMD have now published a new optimization guide for their Family 10h processors [PDF] and guess what, the advice has changed. Instead of using a two-byte illegal instruction, they now recommend the three-byte instruction ret 0. The difference between a plain ret and a ret imm16 (where imm16 is an immediate 16–bit value) is that ret imm16 pops the return address, then specified number of bytes from the stack before jumping to the return address. It’s common to see this in 32–bit Windows WINAPI (__stdcall) code as this calling convention requires the called function to clean up the parameters from the stack. (64–bit Windows has only one calling convention and it mainly passes parameters in registers, so stack cleanup is not required.

Still, it’s a shame to see the JIT generating this on my Core 2 Duo laptop, which doesn’t have the bug (as far as I know, but there’s no mention in Intel’s optimization guide). And it’s an even bigger shame on AMD that they a) didn’t fix the damn bug in the processor and b) recommended an illegal instruction as a way round it.

Sunday, 13 January 2008

BT and the case of the missing DSL service

Recently, at the end of September, I got a new neighbour in the flat downstairs. The property is an end terrace, split into two flats in 1999, but the landlady never did a great job of separating the services out (electricity and gas usage is included in the meter counts downstairs, there isn’t a separate supply). When I moved in, the landlady lived downstairs; she allowed a previous tenant of this flat to put in extension wiring to use her old BT landline, as she’d got cable service with NTL (now Virgin Cable, of course), so I was using the BT line. After leaving, other tenants of the downstairs flat had simply taken over the cable service.

When the new bloke moved in downstairs, he wanted a BT service. He claims that he asked them for a new line – they interpreted this to mean taking over the old one, and started the process of transferring my line to him. First I knew of this was a letter dated 16 October saying “we understand you no longer live at the above address and so we’ll disconnect the phone on 30 October.” Obviously I phoned up straight away and said no, I’m not leaving, please cancel this transfer. They prevaricated a bit, then said, “oh, I can’t cancel this on your say-so, the person ordering has to cancel.” Fine, I say – can I give him a direct number to contact you? “Nope.”

Anyway, I ask the new guy if that’s OK, he says he got fed up of BT trying to transfer the line and went with cable anyway, and he’d been trying to get through to cancel and would try again. A day or so later he tells me he’s cancelled and I don’t think any more about it.

In early November my parents tell me they couldn’t get through. (They didn’t actually tell me they were getting a message saying ‘number not in service.’) A few days into November, suddenly, no connection. A horror begins to dawn on me, and my neighbour – BT have gone ahead and done it anyway.

Now comes the reason for the tale. Most of the time I can’t get through to BT Customer Service, even if I wait for half an hour on the phone. The first time I get off hold, there’s silence for a few seconds and the line goes dead. Next time I’ve apparently called the wrong department and they try to transfer me, but again I end up on hold for half an hour (when I give up). After several goes of this I go for an end-around: I call sales and ask for the line to be transferred back to me.

No problem, says the salesman, but there’s a 12–month minimum contract and you can’t have the line until the 27th of November, unless the current customer cancels before that, oh, and you can’t have your old number back. I did try explaining but it seemed futile. Actually I got pretty angry which I did apologise for.

I thought about it for a couple of days and decided to do what should have been done a very long time ago – get a separate line terminating in this flat. You’d expect that to take a long time, right? Longer than transferring the billing from one customer to another?

Nope. I could have it put in on the 20th. So I paid my £125 (£125!) and got it done.

Unfortunately you can’t order DSL until the phone line is in your name and it appears, under the new number, in BT’s database. So the earliest I could order ADSL was the afternoon of the 20th, then it takes another 14 days or so, taking us to an official activation date of 7 December. I got straight through to Demon sales on practically the first ring. They said I couldn’t have my old hostname/website/email address back initially but once activated I could ask for it back.

The 27th came and went and I suddenly realised I hadn’t cancelled the transfer! Never fear, either the second BT salesman cancelled it for me, or (more likely) they cocked up again. My neighbour still hasn’t managed to cancel and I think he’s refusing to pay.

Finally I got DSL back on the 6th, called up Demon support (got through on first ring again) and got my old hostname back, but the contents of my website were lost. So there are a lot of broken images around here, and if you emailed me in November, I didn’t get it.

I’d always heard that NTL, Telewest, and the unbundled broadband suppliers had worse customer service than BT. If it’s worse than BT they must send round people with pitchforks to stab you with if you even dream of calling them.

Return of Best Error Page

CodeProject’s one of my favourite sites – indeed, one of the few real regular sites I visit – but it’s always had a bit of a problem with errors and with the load on the servers. I commented on this before, but sadly the image is gone – lost when my DSL account was closed (BT cock-up, nothing to do with Demon).

Anyway, they recently moved to an all-new ASP.NET implementation of their forums, but unfortunately the errors are still a common occurrence. And the error in the error handler is back:

Error - Windows Internet Explorer

(Yes, running Vista now.)

One of the things I love most about CP is the more… relaxed attitude to reporting problems.

Friday, 7 December 2007

Calling out EMBASSY Trust Suite

Dear Wave Systems,

Fuck right off.

When I bought my Dell Latitude D820 laptop, I went through and deleted a certain amount of the shovelware that was preinstalled, but some of it, to be honest, I didn’t know what it did. I tried to work out what EMBASSY Trust Suite did, and it seems to be something to do with the Trusted Platform Module in the system. Encrypting files using the TPM key, or something like that.

Charles Petzold wrote an article yesterday commenting on random number generators, and I added a comment earlier on mentioning in passing the reported flaw in CryptGenRandom. I decided to see if this seemed to be the case in XP SP2. My answer was inconclusive, as the assembler was hard to follow. By chance I was debugging Internet Explorer with WinDBG (easiest way to force the RSA Enhanced Cryptographic Provider, rsaenh.dll, to load so I could disassemble it) and noticed an odd number of access violation exceptions occurring when I accidentally did a search in the instance of IE I was debugging. That’s weird, I thought – where am I on the stack?

In wxvault.dll. The stack trace was:

0012c888 100065ee wxvault+0x7967
0012cac4 42f8c769 wxvault+0x65ee
0012cadc 42f8cdc9 IEFRAME!PathFileExistsW+0x24
0012cb14 42f8ccf7 IEFRAME!HelperForReadIconInfoFromPropStore+0x97
0012cb98 42f78e53 IEFRAME!CInternetShortcut::_GetIconLocationWithURLHelper+0x156

Looks like we’re trying to get the favourite icon for the address bar. But how is IEFRAME calling into wxvault? Microsoft can’t know that this library exists. Is there something on the stack that somehow isn’t being included (can happen if a function was compiled with the Frame Pointer omitted and no symbols are available to get FPO data [which tells the debugger how to fix up]). Let’s disassemble around PathFileExistsW:

42f8c75e ff7508          push    dword ptr [ebp+8]
42f8c761 8bf8            mov     edi,eax
42f8c763 ff152c13ef42    call    dword ptr [IEFRAME!_imp__GetFileAttributesW (42ef132c)]
42f8c769 83f8ff          cmp     eax,0FFFFFFFFh

That’s weird, we called into GetFileAttributesW. How did we end up in wxvault?

0:000> u kernel32!GetFileAttributesW
kernel32!GetFileAttributesW:
7c80b74c e965ae7f93      jmp     wxvault+0x65b6 (100065b6)

Evil! They patched the running instance of kernel32! What else have they patched?

kernel32!CreateFileW:
7c810760 e9d0587f93      jmp     wxvault+0x6035 (10006035)

Note how they’ve failed to rebase the DLL, using the default 0x10000000 base address, making it collide with everything ever which also uses that default address.

Needless to say this is going to get uninstalled as soon as I take a full backup of the laptop! In my book, this is a user-mode rootkit. I don’t use the feature, so it’s going.

How should they have implemented this? Well, I’d start by seeing if it’s possible to change the algorithm for the Encrypting File System. It should be, it’s implemented using the Cryptographic API and CSPs (involving callbacks into LSASS in usermode!), so I would have thought that simply providing your own CSP would be sufficient.

If that’s not possible, my next port of call would be a file system filter driver. That would have the downside (like this) that every file system call would go through it, rather than the tiny amount of calls which actually target a file encrypted in this way.

The access violation looks like it might ultimately be caused by a bug in IE – it looks like IE tried to pass the URL to the favicon to GetFileAttributesW, which I would hope would fail (or would it try to invoke WebDAV?)

Sunday, 4 November 2007

VS2005 Smart Device projects don't Dispose components

I’ve posted about this on Connect and on Codeproject’s Lounge, and informed my colleagues, but not yet on here. I realise I’m repeating myself for some of my audience, but this is in a more readily searchable location.

I was trying to work out why Compact Framework applications seemed to be consuming a fair amount of memory, often causing the division between Program and Storage memory on Pocket PC 2003 to shift greatly toward Program, often causing inability to create files. I created a simple Windows Mobile 5.0 application with two forms, both with menu bars with actions on them, and had the first form continually create and destroy the second. I then loaded up .NET Compact Framework Remote Performance Monitor to monitor the application. I was seeing 400KB+ of GC heap still being used after a collection. Looking at the GC heap (in .NET CF 2.0 SP2’s RPM) showed that a large number of instances of the second form were still referenced.

Drilling down showed that the MainMenu and MenuItem objects that had been owned by the forms were still rooted, by their finalizers (shown as [root: Finalizer]). The forms were still referenced because I had event handlers connected to the menu items. That meant that all the other controls on the form were also still referenced, by the members of the form class.

To understand why they’re still rooted, even after GC, you have to know how finalization works. When a GC occurs, the collector marks every object that’s referenced by following roots, then sweeps away all those objects that are no longer referenced. When doing the sweep phase, if the object requires finalization (i.e. implements a Finalize method and GC.SuppressFinalize has not been called for that object), instead of actually being deleted and the memory reclaimed, it is placed on a finalization queue. This in itself is a source of roots. The finalizer is not run during the GC itself to reduce the time spent in GC. Instead, a separate finalizer thread processes the queue of objects to be finalized. The object will continue to be reported to GC until its finalizer is run, so may survive multiple GCs if memory demand is high and the finalizers are slow.

The reason I recommend that as far as is possible, you avoid the finalizer is that unless you’re very careful, you can end up with thread affinity problems (trying to destroy something from the wrong thread, or having to marshal back to the creating thread) and you can end up blocking the finalizer thread indefinitely. The issue of freeing memory later than would otherwise be possible isn’t as much of an issue on the desktop, but it also affects GC’s tuning of how often to run. It’s definitely an issue on the limited memory available on devices. For more on how GC works in the Compact Framework, see Steven Pratschner’s “Overview of the .NET Compact Framework Garbage Collector”. I’ve recommended before that you always call Dispose, on any object that implements it (although I’ve discovered it isn’t safe to Dispose the Graphics object passed to you in a PaintEventArgs structure).

So where does the finalizer come from? I studied the assemblies in Reflector. A side note here: the assemblies in Program Files\Microsoft Visual Studio 8\SmartDevices\SDK\CompactFramework\2.0\v2.0\WindowsCE do not actually contain the IL code, only the metadata. For the actual implementation see under SmartDevices\SDK\CompactFramework\2.0\v2.0\Debugger\BCL. Recent versions of Reflector will load assemblies from this location when you create a new assembly list in the File, Open List dialog (or when starting a newly downloaded copy, for the default list).

In Compact Framework 2.0, the System.ComponentModel.Component class implements a finalizer, so anything which derives from this class automatically gets one too. (It calls the virtual Dispose(bool) method, passing false as the parameter). The Dispose method in this class calls GC.SuppressFinalize, so if you dispose of a component properly, it won’t end up on the finalization queue. This matches what the desktop Framework has done since .NET 1.1 at least (I don’t have 1.0 installed to check).

For both desktop and device projects, a newly-created form declares a components member of type System.ComponentModel.Container, to contain the components dragged onto the designer surface. A Dispose(bool) override is generated for you (in Form.Designer.cs/.vb, for .NET 2.0 projects) which calls Dispose on the container, if components is not null. (Presumably the intent was that the container wouldn’t be created until needed, but in fact the initial code in InitializeComponent for a new form does create a Container and assigns it to components.) Container’s Dispose method disposes anything that was added to the container.

For a desktop project, when you drop something that derives from IComponent (I think) onto the form, and which doesn’t derive from Control, the designer generates code to add that component to the container, so it will be disposed of when the form is disposed. Simple. However – and here’s the bug – the device designer doesn’t. In fact it even deletes the creation of the Container object. Result, all your components end up running their finalizer, and thereby consuming memory past the first GC after they died, potentially increasing the overall memory use of the process.

To avoid the problem, you have to write the code to dispose the components yourself. The most straightforward way is to add the code that Visual Studio should have generated to your form’s constructor, after the call to InitializeComponent. That will typically look like:

this.components = new System.ComponentModel.Container();
this.components.Add( mainMenu1 );
// etc for other components

The difficulty is knowing exactly what to add and remembering to update it when you add or remove components. Reflector can help a bit – use the Derived Types view under Component to see the classes that are affected. The other one that affected us was the HardwareButton class (in Microsoft.WindowsCE.Forms). It doesn’t actually override Dispose(bool) so I think you ought to write code to clear AssociatedControl.

This bug still exists in VS2008 Beta 2. To help ensure it doesn’t continue in future versions, please vote for the bug on Connect.

In .NET Compact Framework 1.0, there isn’t a systematic finalization like this – Component doesn’t have a Finalizer or a Dispose method, although weirdly it does have a virtual Dispose(bool) method and a Disposed event. It also doesn’t implement IComponent, so you can’t add a Component-derived class to a Container! And it doesn’t implement IDisposable either.

The MainMenu class unfortunately does have a Finalizer, but it does not have a Dispose method. After a bit of digging around, you can see that you can at least clean up the native resources by setting the form’s Menu property to null. It still causes all the managed objects to remain referenced (if it has menu items that have event handlers). You’ll have to call GC.SuppressFinalize yourself once you’ve forced it to clean up.

The more I think about it, the more I think that garbage collection overpromises and under-delivers. At least the C++ resource-acquisition-is-initialization and automatic variable destruction model ensures that resources are freed in a timely fashion. That’s why the C++/CLI designers hooked that model up to IDisposable – there is no using block because local GC handle destruction does the job.

Friday, 12 October 2007

Tired of whacking rats

I’m fed up with Windows Mobile. Fed up with it from an enterprise device perspective, anyway. See, most customers want a limited set of applications to run on their device. If it’s only one, you can write a full-screen app and hide the Start bar at the top, and maybe the menu bar at the bottom if you don’t need the on-screen keyboard. Two, and you have to find a way to switch between them (and recently it seems that the second one is TomTom Navigator, which doesn’t co-operate anyway) and stop anything else being run.

And so you do this, and bing! up pops a notification. So you work out how to kill that one and bing! here comes another one. It’s like playing Whack-A-Rat.

And I’m tired of it. I’m tired of trying to make my programs survive a cold boot, when your OEM subtly changes how it works from device to device (Symbol MC3090 – breaks the convention of ‘Application folder is for developers’ by lumping the wireless configuration utilities in there, and they have to match the wireless driver version, so you can no longer write to a hex file). But Windows Mobile 5.0 has persistent storage, right? Yes, but OS updates include a wipe of persistent storage, because the registry settings in persistent storage are a diff from the ROM version, so you change the ROM registry, you break the diff, and now you have to work out how to persist your program across clean boots. Or the OEM has inexplicably made the clean boot a simple keypress, with a confirmation prompt, but one so confusing where you press one half of a rocker for YES, I’d like to destroy my applications, and one for NO, actually I’d like to keep them please, and not marked them clearly or the instructions are confusing so the user that meant to say NO says YES by mistake, or they’re labelled the wrong damn way round in the first place and you still have to make the damn program survive!

…and breathe…

…and now you have to come up with a way to do program and firmware upgrades over the air, only it wasn’t designed in, because we were going to use the OEM’s system, but we can’t use that because a) it costs a bomb and b) it sucks and c) it blows as well and you find that you can’t do a complete device reload because the damn ROM image is too big to fit in RAM or persistent storage alongside the updates to your program because they couldn’t find room in ROM for .NET ‘Compact’ Framework and SQL Server ‘Compact’ Edition so you have to do it a bit at a time and now you’re trying to fudge it all at the last minute and then which bit has to chain which other bit and how do we do this on a schedule so updates don’t interfere with the user’s work and I DIDN’T SIGN UP FOR DEVICE MANAGEMENT DAMMIT!

Monday, 3 September 2007

Everyone working on databases should read this

Rico Mariani (now Chief Architect of Visual Studio, formerly just a performance architect on the .NET Framework) has written a great article on database performance, but also covers correctness issues. A good read for any developer working on databases (and isn’t that most of us now?)

Database Performance, Correctness, Compostion, Compromise, and Linq too

Tuesday, 10 July 2007

Another reason not to overload the .NET Framework name

This month’s security bulletin becomes a lot more confusing. It was pretty confusing already, but the extra detail of .NET Framework 3.0 is/is not vulnerable just adds an extra layer.

(Suggestion: update and let your customers know. Since .NET Framework patches are cumulative I expect Barry’s validators are also included.)

Tuesday, 5 June 2007

Wow, long time no post

I have intended to on occasions, never got round to it. It’s been so long that Blogger have changed completely over to Google logins and my old configuration in BlogJet no longer worked because I’d had to switch to the Google login at one point (think I wanted to use my own identity on posting a comment on someone else’s blog).

Indeed the old Google API didn’t work anymore either and I had to grab BlogJet 2.0.x.

Google’s increasing privacy-invasion is making me want to get off this ship (and this one too).

When developers fight...

Microsoft take their ball away.

Tuesday, 16 January 2007

Geek Dinner this Friday (19 Jan)

My friend Colin Mackay, who I know from CodeProject, sent me a message last week to tell me that he was attending this weekend’s Vista and Office Developer Launch in Reading, and to ask if I’d like to meet up while he was here.

I was a bit slow responding and discovered tonight that he’d signed up for Zi Makki’s Geek Dinner on Friday night. If you’re in the area – whether or not you’re going to the event itself (I’m not, and nor is Ian) – why not come along?

Wednesday, 20 December 2006

How to misuse the Office 2007 Ribbon

Dare Obasanjo has noticed a comment of mine on Jensen Harris’s post announcing Microsoft’s licensing of the concept of the Office 2007 ‘Ribbon’ UI. In that comment, I criticised (in a single sentence) Dare’s concept for a future version of RSS Bandit. I should say up-front that I’m a regular user of RSS Bandit; it’s my main RSS reader at home, in which I’m subscribed to over 100 feeds. I want this to remain usable, and my fear is that it won’t be.

Funnily, he doesn’t acknowledge that I made the first comment on that post, in which I go into detail. I said:

It doesn't belong. There's no need to go to an Office-style menu system in RSS Bandit because you barely ever use the menus anyway. It's not like there are loads of features hidden in the depths of the menus and dialogs, and the gallery is particularly over-the-top. How often do you think people will change the style of the newspaper view? Virtually never, in my opinion - they'll pick one that works, and stick with it. These options don't need to be 'in your face' the whole time. RSS Bandit is not document authoring software, it's a browser.

If anything you could follow IE7's lead and drop the menu bar entirely. There aren't that many menu options, and most of them are replicated with some other widget, on one of the toolbars, or in the case of View/Feed Subscriptions and View/Search, the two tabs in the pane.

Most of the other options that aren't duplicated could end up on an extended Tools menu.

Dare links to Mike Torres who comments on the menu-less UI of various Microsoft applications, suggesting that this is something recent. At least two of these have been menu-less for a while, in one case for five years: Windows Media Player. The original version of WMP in Windows XP was without menus:

Windows Media Player for Windows XP (WMP 8)

(screenshot from http://www.winsupersite.com/reviews/wm9series.asp).

The highly-unconventional window shape was toned down in version 9.0 and became virtually conventional in 10.0, although all four corners are rounded whereas the normal XP themes have rounded top corners and square bottom corners.

It appears that the menus first disappeared from MSN Messenger in version 7.0, which was released in April 2005:

MSN Messenger 7.0

(screenshot from http://www.winsupersite.com/reviews/msn_messenger7.asp)

Which Office application is RSS Bandit most like? Word? Excel? No. It’s most like Outlook. Which major Office 2007 application does not get a Ribbon (in its main UI)? Outlook.

I’ve been following Jensen Harris’s blog more-or-less since the beginning. In it, he explains the motivations behind creating the Ribbon, and the data that was used to feed the process of developing it. The Ribbon is mainly about creating better access to creating and formatting documents, by showing the user a gallery of choices and allowing them to refine it. Which part of Outlook gets a Ribbon? The message editor (OK, this is actually part of Word).

RSS Bandit is about viewing other people’s content, for which the best analogy is probably IE7.

I haven’t done any UI studies. I’ve not taken part in any. But Microsoft have analysed their UIs. They’ve gathered data on how those interfaces are used – automatically, in some cases (the Customer Experience Improvement Programs). The Ribbon is an improvement for Office. It’s not going to be right for all applications. Many applications actually suffer in the classic File/Edit/View/Tools/Help system: the menus tend to either be padded with commands that are duplicated elsewhere, or are ridiculously short (e.g. RSS Bandit’s ‘Edit’ menu which only has a ‘Select All’ option, which if you’re currently looking at a browser tab appears to do nothing – it’s only when you switch back to the Feed tab that you notice it’s selected all the items in the current feed or feed group). They’ll suffer equally in the Ribbon, particularly if there are too few features to make a Ribbon worthwhile.

When designing a UI for your application, don’t be too slavish to a particular model. If you find yourself padding out the menus to conform to the File/Edit/View model, or if all your commands are on the Tools menu, a classic menu probably doesn’t fit. If you’re not offering a feature for the user to customise the formatting of something, which the user will use regularly, a Ribbon is probably also wrong. The standard toolbar is probably enough.

Tuesday, 19 December 2006

Another knock-on effect of the stupid WinFX->.NET 3.0 naming decision

The next version of the Compact Framework will be called:

.NET Compact Framework 3.5.

Yeah.

Great way to confuse people.