Thursday, 21 October 2010

Hyper-V from a user's perspective

I am setting up this blog mainly to remind myself how I negotiate the problems I encounter in getting my Hyper-V servers to do what I want. However, before going through all this, it is worth asking why I am here in the first place.

SBS 2008
It all stemmed from an update to my SBS 2008 Premium server which crashed the system. No matter what I tried, I was left with a system that would boot-cycle but would not let me log in. Eventually I bit the bullet, decided I would sacrifice the time needed to set everything up again, and went for a virtualised solution. I had played with Virtual Server and liked the idea, but initially I went for VMware until the full cost implications hit me. What I wanted was a highly available system with minimal downtime, at an acceptable cost. The cost of VMware is anything but acceptable.

UPS vs RCD
My problems were compounded by frequent reboots of my system, brought on by the fact that my power circuits were RCD protected and my UPS would send spikes back down to the RCD and cause it to trip. They don't tell you this. In the three years I have lived here we have had one power cut, but probably a hundred losses of power due to the UPSs. Servers don't like this... Nor, in fact, do alarm systems. The installer of my alarm had connected the alarm to the RCD - a spectacularly stupid idea that has caused friends to be awoken at outrageous hours whenever I go on holiday.

Hyper-V
Hyper-V came along at just the right moment. I was an early adopter of MS Hyper-V Server 2008 R2. My goal from the start was to set up a highly available system using 2 or 3 servers running Hyper-V Server 2008 R2, and on these to run SBS 2008 Premium, with a PDC, a Windows and/or Linux web server, a Linux VPN, and a couple of further Windows Server 2008 DCs.

To do this I planned to use the Failover Clustering mechanism. This should allow you to create a cluster of servers and to migrate virtual servers from one physical server to another fairly seamlessly, or at least to keep downtime to a minimum in that the virtual server can be rapidly restarted on failure. The theoretical rate-limiting step for migration is the time taken to copy the memory over Ethernet from one physical machine to another.
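
Once the cluster is up you can see the state of things and push a virtual machine's group from one node to the other at the command line with cluster.exe. A rough sketch - the group name "SBS 2008" and the node name HV2 are made up for illustration, and as far as I can tell a command-line move of this sort is a quick migration rather than a live one (live migration itself is driven from Failover Cluster Manager or the PowerShell cmdlets):

rem Show the cluster nodes, then the groups with their current owners
cluster node
cluster group

rem Move the group containing the SBS virtual machine over to the other node
cluster group "SBS 2008" /moveto:HV2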


Failover Cluster
Setting up failover clusters looks simple. You enable the facility on the Hyper-V servers and create the cluster. You add in storage, and move the virtual machines onto the storage. I had the impression that I could use the built-in HDDs on each machine, simply allocate these to the cluster, and the cluster would keep them in sync between the servers. To this end I set the servers to boot off flash memory sticks so the HDDs could be committed in their entirety. Right? Er, no. You need specific shared storage - iSCSI or direct-attached storage. You obviously want to avoid a single point of failure - that means either two units or duplicated controllers etc. I looked at the Dell MD3000, but the £5k price tag was prohibitive. This is high availability on a shoestring!
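
In case it helps, the "enable the facility" step on Hyper-V Server 2008 R2 (which has no GUI to speak of) is a one-liner per node at the command prompt; the validation and creation wizards can then be run remotely from Failover Cluster Manager on a machine with the admin tools installed. A rough sketch - the feature name is FailoverCluster-Core, if I remember it rightly:

rem Run on each Hyper-V Server node to install the Failover Clustering feature
start /w ocsetup FailoverCluster-Core

rem The DISM equivalent should do the same job
dism /online /enable-feature /featurename:FailoverCluster-Core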

iSCSI
Eventually I identified the QNAP 259+ as capable of jumping through the hoops I needed. This meant I needed extra LAN ports to allow communication with the iSCSI units, one to each.
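
The connection from each Hyper-V Server to the QNAP can be made entirely from the command prompt with the built-in iscsicli tool. A rough sketch - the IP address and the target IQN below are made up, and for production use you would want a persistent login so the connection survives a reboot:

rem Make sure the Microsoft iSCSI Initiator service is running and starts automatically
sc config msiscsi start= auto
net start msiscsi

rem Tell the initiator about the QNAP and list the targets it offers
iscsicli QAddTargetPortal 192.168.10.1
iscsicli ListTargets

rem Log in to the target reported by ListTargets (this IQN is made up)
iscsicli QLoginTarget iqn.2004-04.com.qnap:ts-259.cluster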




Incompatible NICs
I purchased a dual-port NIC for each server, only to find that these don't work with Hyper-V. Compatibility information is very hard to find, and Microsoft seem to have partly given up on this - as usual at the expense of the user.


I then ordered two Intel Gigabit ET Dual Port Server Adapters, which are supposed to be compatible.

Loss of trust

Whilst waiting for the NICs to arrive and my rack to be properly configured, my servers crashed again, this time causing the death of a USB flash boot disk. It took several days to resurrect this, and in so doing my servers got out of step and lost their trust relationship. The answer to trust issues on both servers and clients seems to be to use the

netdom resetpwd /server:Replication_Partner_Server_Name /userd:domainname\administrator_id /passwordd:*

command. You run this on the afflicted machine; I used the name of the Primary Domain Controller (PDC) in place of the "Replication_Partner". To reset the password on a Domain Controller (DC) you first need to stop the Kerberos Key Distribution Center service and reboot. Do this using "Services": stop the service and set its startup type to Manual.
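
If you would rather do the Kerberos step from the command line than from the Services console, something along these lines should work (kdc is the service's short name):

rem Set the Kerberos Key Distribution Center service to manual start, then stop it
sc config kdc start= demand
net stop kdc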
 
Again, this is as clear as mud on the usual obfuscatory Microsoft website with its minimal worked examples. For example, with a domain called trees and a PDC called oak:

netdom resetpwd /server:oak /userd:trees\sysadmin /passwordd:noddy23

does the trick. It can also be useful to reset the computer account in AD on the PDC.
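
That reset can also be done from the command line on the PDC with dsmod. The distinguished name below is made up - adjust it to wherever the computer object actually sits in your AD:

rem Reset the computer account for a machine called maple in the trees domain
dsmod computer "CN=maple,CN=Computers,DC=trees,DC=local" -reset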



NICs that can't be found
Finally I installed the Intel NICs. To my utter bemusement the systems didn't find all the ports.

ProsetCL
Using the Intel ProsetCl.exe utility I can see the cards are installed:

c:\Program Files\Intel\DMIX\CL>prosetcl adapter_enumerate

    Number of adapters currently present: 4

        1) Intel(R) Gigabit ET Dual Port Server Adapter #2
        2) Broadcom NetXtreme Gigabit Ethernet #4
        3) Broadcom NetXtreme Gigabit Ethernet #3
        4) Intel(R) Gigabit ET Dual Port Server Adapter

In more detail:

c:\Program Files\Intel\DMIX\CL>prosetcl adapter_enumeratesettings 1

 1) Intel(R) Gigabit ET Dual Port Server Adapter #2

    Settings:

        LLIPorts                 -
        DefaultGateway           - 0.0.0.0
        IPAddress                - 169.254.100.204
        SubnetMask               - 255.255.255.0
        EnableDca                - Enabled
        EnableLLI                - Disabled
        *FlowControl             - Rx & Tx Enabled
        *HeaderDataSplit         - Disabled
        *InterruptModeration     - Enabled
        *IPChecksumOffloadIPv4   - Rx & Tx Enabled
        *IPsecOffloadV2          - Disabled
        *JumboPacket             - Disabled
        *LsoV2IPv4               - Enabled
        *LsoV2IPv6               - Enabled
        *MaxRssProcessors        - 8
        *NumaNodeId              - System Default
        *PriorityVLANTag         - Priority & VLAN Enabled
        *RSS                     - Enabled
        *SpeedDuplex             - Auto Negotiation
        *TCPChecksumOffloadIPv4  - Rx & Tx Enabled
        *TCPChecksumOffloadIPv6  - Rx & Tx Enabled
        *UDPChecksumOffloadIPv4  - Rx & Tx Enabled
        *UDPChecksumOffloadIPv6  - Rx & Tx Enabled
        *VMQ                     - Disabled
        *WakeOnMagicPacket       - Disabled
        *WakeOnPattern           - Disabled
        EnablePME                - Disabled
        ITR                      - Adaptive
        LogLinkStateEvent        - Enabled
        MasterSlave              - Auto Detect
        NumRssQueues             - 1 Queue
        WaitAutoNegComplete      - Auto Detect
        WakeOnLink               - Disabled
        EnableDHCP               - Disabled
        *ReceiveBuffers          - 256
        *RssBaseProcNumber       - 0
        *TransmitBuffers         - 512
        NetworkAddress           -
        NameServer               - 0.0.0.0
        ConnectionName           - Local Area Connection 6

Netsh
However when I use netsh they aren't visible:

c:\Program Files\Intel\DMIX\CL>netsh interface ipv4 show interfaces

Idx     Met         MTU          State                Name
---  ----------  ----------  ------------  ---------------------------
  1          50  4294967295  connected     Loopback Pseudo-Interface 1
1034           5        1500  connected     Local Area Connection 7
 39           5        1300  disconnected  Local Area Connection* 35
1031          10        1500  connected     Local Area Connection 4
1038           5        1500  connected     Local Area Connection 8

So here we are: Intel reports these as "Local Area Connection 5" and "Local Area Connection 6", but to netsh they don't exist. WHY?

I uninstalled and reinstalled the drivers, and then eventually just uninstalled them, and hey presto, the devices reappeared - the solution was to remove the drivers. Trying to be too clever is always a problem with Windows, which is nothing if not unpredictable.
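
For what it is worth, there is no Device Manager to click around in locally on Hyper-V Server, but the staged driver packages can be listed and removed with the built-in pnputil tool, roughly as follows. The oem number varies from system to system, so check the output of the first command before deleting anything:

rem List the third-party driver packages staged on the system
pnputil -e

rem Remove the offending package (oem12.inf is just an example number)
pnputil -d oem12.inf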