HP has released its latest Flex-10 Firmware version 3.17 which has been deemed * RECOMMENDED *.
It doesn’t appear yet under a search for HP Virtual Connect Flex-10 10Gb Ethernet Module for c-Class BladeSystem and ESX 4.0, which is odd, but HP’s support site isn’t known as the easiest place to find things!
The 3.17 firmware has been posted here:
It looks like the DNS and PXE issues have been addressed, along with a long list of other fixes.
Hopefully we are getting closer to a stable firmware release, as the recent issues have been very troubling.
HP has discovered a bug in their latest Flex-10 firmware version 3.15.
The bug occurs when you have blank blades going through continuous PXE boot cycles. Each time a blade PXE boots, Virtual Connect sees the status of the port change, which triggers a process that queries the status of all its NICs. The process runs, but it does not release the memory it used.
When Virtual Connect runs out of memory, it restarts some of its internal network processes, which causes a network outage. Reseating the Flex-10 switches frees the leaked memory, and depending on how many blades are power cycling, it can take a couple of days for the memory to run out again. If you are seeing seemingly random network disconnections, this could be the cause.
Future versions of Virtual Connect will apparently have memory usage graphs which may help.
To avoid the issue, power off any blades you are not using rather than letting them power cycle.
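To get a feel for why the number of cycling blades matters, here is a back-of-the-envelope sketch in Python. It is not based on any figures from HP: the free memory, the amount leaked per PXE boot and the boot rate are all placeholder values I made up, and the function name is hypothetical. It only shows that the time to exhaustion shrinks in proportion to the number of blades left power cycling.

```python
def hours_until_exhaustion(free_bytes, leak_bytes_per_boot,
                           boots_per_hour_per_blade, cycling_blades):
    """Estimate hours until a fixed per-boot leak uses up free_bytes.

    All inputs are assumptions supplied by the caller, not HP figures.
    """
    if cycling_blades <= 0 or boots_per_hour_per_blade <= 0 or leak_bytes_per_boot <= 0:
        return float("inf")  # nothing is leaking, so memory never runs out
    leak_per_hour = leak_bytes_per_boot * boots_per_hour_per_blade * cycling_blades
    return free_bytes / leak_per_hour


if __name__ == "__main__":
    # Placeholder numbers purely for illustration.
    free_memory = 64 * 1024 * 1024   # 64 MiB assumed to be available
    leak_per_boot = 16 * 1024        # 16 KiB assumed to leak per PXE boot
    boots_per_hour = 20              # assumed PXE boot attempts per blade per hour
    for blades in (1, 4, 16):
        hours = hours_until_exhaustion(free_memory, leak_per_boot, boots_per_hour, blades)
        print(f"{blades} cycling blade(s): roughly {hours:.1f} hours")
```

Whatever the real figures are, halving the number of cycling blades doubles the time before the next outage, and powering them all off stops the leak from being triggered at all.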
Apparently a 3.17 firmware version is imminent from HP which will fix this issue as well as the DNS issue.
VMware has released a new KB article all about investigating the health of a vCenter database.
I’ve blogged before on the major issue with vCenter being a massive single point of failure and also on some steps to work out excessive growth in the database, which are now included in this article.
This new KB article does provide good advice and plenty of additional troubleshooting steps for working out where your issues are. The fact remains, though, that the current design for vCenter is far too monolithic, relying on a database that vCenter itself can corrupt, especially when VDI may require constant availability and more and more management products “bolt on” to vCenter.
Also, alarmingly, the final troubleshooting step is:
Reinitializing the vCenter database
A reinitialization of the vCenter database will reset it to the default configuration as if the vCenter server was newly installed. The following are a few situations which could warrant resetting the database:
- Rebuild of vCenter is required
- Data corruption is suspected
- At the request of VMware Support
HP has released an updated advisory for their serious DNS issue as described in my previous post.
There’s now a lot more information on the problem, with three scenarios described depending on how you handle IP addressing in your environment, along with the steps to fix it.
I’ve heard HP is seeing this as a global problem affecting many clients so make sure you are protected.
Anyone who needs to build multiple ESX(i) hosts naturally looks to scripting to automate the process. Scripting allows for faster deployment once you have developed the script but, equally importantly, reduces human error. It’s far too easy to mistype a port group name, VLAN number or IP address. Scripting removes this element of human error and allows you to build ESX(i) hosts predictably and quickly.
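As a rough illustration of catching those typos before a build starts, here is a minimal Python sketch that checks a host definition against a list of expected port group names. This is not PowerCLI and not part of any VMware tooling. The function name, the dictionary layout and the example values are all hypothetical, and I am assuming VLAN IDs in the 1 to 4094 range (adjust this if you use 0 or 4095 in your environment).

```python
import ipaddress


def validate_host_config(config, known_port_groups):
    """Return a list of problems found in a host definition (empty means OK).

    The config layout is a made-up example, not a VMware format.
    """
    problems = []

    # A mistyped IP such as 10.0.0.321 is rejected by the standard library.
    try:
        ipaddress.ip_address(config["ip"])
    except ValueError:
        problems.append(f"invalid IP address: {config['ip']!r}")

    for pg in config["port_groups"]:
        # Catch port group names that do not match the agreed list.
        if pg["name"] not in known_port_groups:
            problems.append(f"unknown port group name: {pg['name']!r}")
        # Assumed valid range for tagged VLANs: 1 to 4094.
        vlan = pg["vlan"]
        if not (isinstance(vlan, int) and 1 <= vlan <= 4094):
            problems.append(f"VLAN out of range for {pg['name']!r}: {vlan!r}")

    return problems


if __name__ == "__main__":
    expected = {"VM Network", "vMotion"}
    host = {
        "ip": "10.0.0.321",  # deliberate typo
        "port_groups": [{"name": "VM Netwrok", "vlan": 5000}],  # two more typos
    }
    for problem in validate_host_config(host, expected):
        print(problem)
```

Running a check like this against every host definition before the build script touches anything means a typo is caught once, in the input, rather than on each host.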
Unfortunately, there are some things that PowerCLI cannot natively automate, such as installing HP agents on ESX, as this requires console / SSH access to the ESX host and running the install “locally”.
Hopefully hardware vendors will see the benefits of integration with VMware Update Manager and allow hardware monitoring agents to be installed and updated with Update Manager, but until then we have to make another plan.
It is always painful to have developed a fantastic PowerCLI script to automate your build and at the end still have to manually SSH into your ESX host to install a hardware agent.
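One way to take the manual SSH step out of the loop is to have a script run the remote command for you. The Python sketch below only shows the general shape of that idea and is not the method from this post. It assumes an `ssh` client on the machine running the script, key-based login to the host, and an installer already copied to the host. The host name and installer path are hypothetical placeholders, not real HP file names.

```python
import shlex
import subprocess


def build_ssh_command(host, remote_command, user="root"):
    """Build the argument list for running one command on a host over ssh."""
    return ["ssh", f"{user}@{host}", remote_command]


def run_remote_install(host, installer_path, user="root"):
    """Run an installer script on the remote host; return True on exit code 0.

    installer_path is a placeholder for whatever the hardware vendor ships.
    """
    # Quote the path so spaces or odd characters survive the remote shell.
    remote_command = f"sh {shlex.quote(installer_path)}"
    result = subprocess.run(
        build_ssh_command(host, remote_command, user),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(f"install failed on {host}: {result.stderr.strip()}")
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical host and path, shown without connecting to anything.
    print(build_ssh_command("esx01.example.com", "sh /tmp/agent-installer.sh"))
```

Calling `run_remote_install` for each host at the end of the build keeps the whole process in one run, although it still depends on SSH access being enabled on the host.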
Updated with further information from HP:
HP has released an advisory explaining a serious issue with their blade Flex-10 Switches and DNS which can cause the Flex-10 switches to lose connectivity with each other in a Virtual Connect (VC) domain and cause network outages.
Apparently this is affecting a large number of customers, although it seems to have taken some time to filter through the HP support system. If you are experiencing lost connectivity to your switches and network drops and have logged a call, make sure the support engineer is aware of this advisory.