Nested Home Lab – Part 2 – Networking

While we can run all our services over one network segment, for this lab we really want three VLANs.
Networking setup will differ slightly for ESXi and Workstation but both will use the three VLANs.
Use                  Example IP Range   Note
VM and Management    192.168.0.0        Best to use your existing home network.
VSAN (VLAN 30)       192.168.100.0      Internal only
vMotion (VLAN 40)    192.168.110.0      Internal only

It is considered best practice to separate out various types of network traffic. Usually you would separate out your VM traffic from your management but for this lab we will keep them together. We will separate out vMotion and VSAN traffic though.
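If you like to sanity-check a plan before building it, the three networks can be written down and tested in a few lines of Python. This is only a sketch: the /24 prefix length is my assumption (the ranges above don't state one), and the names LAB_NETWORKS and check_plan are made up for illustration.

```python
import ipaddress

# Lab network plan. The /24 prefix length is an assumption;
# the post only lists the network addresses.
LAB_NETWORKS = {
    "VM and Management": {"vlan": None, "network": "192.168.0.0/24"},
    "VSAN": {"vlan": 30, "network": "192.168.100.0/24"},
    "vMotion": {"vlan": 40, "network": "192.168.110.0/24"},
}


def check_plan(plan):
    """Return a list of problems: overlapping subnets or reused VLAN IDs."""
    problems = []
    names = list(plan)
    nets = {name: ipaddress.ip_network(plan[name]["network"]) for name in names}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if nets[first].overlaps(nets[second]):
                problems.append(f"{first} overlaps {second}")
            vlan_a, vlan_b = plan[first]["vlan"], plan[second]["vlan"]
            if vlan_a is not None and vlan_a == vlan_b:
                problems.append(f"{first} and {second} share VLAN {vlan_a}")
    return problems


if __name__ == "__main__":
    print(check_plan(LAB_NETWORKS) or "plan looks consistent")
```

Swap in your own home network range for the VM and Management entry and the check will tell you if it collides with the two internal ranges.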

ESXi: When I’m designing a vSphere environment with rack-mount servers in mind, I usually put management traffic on its own standard virtual switch (2 x 1Gb ports) and send all other traffic to 2 x 10Gb ports through a distributed virtual switch, using VLANs to separate the traffic further and NIOC to control bandwidth.

All righty then, what’s this going to look like for us?

In Workstation:

 
In ESXi:

So far, so good. For Workstation nothing else needs to be done.

As you can see from the ESXi image above, I haven’t specified a physical adapter for vSwitch1. With ESXi, if network traffic is on the same VLAN and on the same virtual switch, it won’t go to the physical switch. The virtual switch, an in-memory construct, will just pass the traffic along. However, if you need to cross VLANs, the traffic has to be passed to the physical switch for routing. In this case that’s very handy, as we don’t want traffic to pass between the VSAN port group and the vMotion port group. EDIT: What we want to do is set up the LAN port group with VLAN 4095. This will enable ESXi to pass the VLAN traffic about correctly.
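The forwarding rule described above can be written as a toy model. To be clear, this is not how ESXi is implemented and it doesn't touch any VMware API; PortGroup and delivery_path are hypothetical names, and VLAN 4095 trunking is not modelled. It only captures the rule: same vSwitch and same VLAN stays in memory, anything else needs an uplink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortGroup:
    name: str
    vswitch: str
    vlan: int


def delivery_path(src, dst, vswitch_has_uplink):
    """Toy model of where a frame between two port groups ends up."""
    same_switch = src.vswitch == dst.vswitch
    if same_switch and src.vlan == dst.vlan:
        # Same VLAN on the same virtual switch: never leaves the host.
        return "internal"
    if vswitch_has_uplink:
        # Crossing VLANs has to go out to the physical network for routing.
        return "physical switch"
    # No physical adapter on the vSwitch, so there is nowhere to send it.
    return "dropped"


if __name__ == "__main__":
    vsan = PortGroup("VSAN", "vSwitch1", 30)
    vmotion = PortGroup("vMotion", "vSwitch1", 40)
    print(delivery_path(vsan, vsan, vswitch_has_uplink=False))
    print(delivery_path(vsan, vmotion, vswitch_has_uplink=False))
```

With no uplink on vSwitch1, VSAN-to-VSAN traffic comes back as "internal" and VSAN-to-vMotion as "dropped", which is exactly the isolation we are after.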

Now in ESXi we need to make two changes to the vMotion and VSAN port groups, enabling Promiscuous Mode and Forged Transmits:

William Lam of Virtually Ghetto has a great write-up here discussing the reasons for this.

So those are the networking eccentricities. Next we’ll look at getting our first VCSA with a dedicated Platform Services Controller up and running.

Just a note: I had hoped to post this sooner but family and holiday commitments took over.

Nested Home Lab – Part 1 – The Plan

Now before we charge headlong into this lab you need to go and check out the work that Alastair Cooke and Nick Marshall have done with AutoLab over at http://www.labguides.com/. It’s a really good project and it automates much of what we will do manually through the next few posts.

For this first lab we are going to start with the basics: a three-node cluster that will support vMotion, a couple of VMs and VSAN for storage.

Let’s also see if we can apply best practice where possible. This lab will give you a good environment to familiarise yourself with VMware.

You can just as easily build this lab using VMware Workstation; in fact it’s where I first set it up. I still think Workstation is one of the best products that VMware make.

To get started we’ll need a few things:
  1. Computer with at least 16GB of RAM. This will either run ESXi natively (preferred) or Windows/Linux with VMware Workstation.
  2. Windows Server install media.
  3. VMware vSphere ESXi
  4. VMware vCenter Server Appliance
  5. Ubuntu ISO or Tiny Core Linux.
  6. A Plan
While you have the above software downloading, let’s look at the plan.

What we want to do is think about this environment as having three layers, which I’ll keep referring back to. 

Layer 1 – The physical kit. Here we will be running an OS/hypervisor on our physical “server”. Whether this is ESXi or VMware Workstation is, at this stage, immaterial.

Layer 2 – This will be our first virtual layer and where our three ESXi servers, the vCenter Server Appliance (VCSA) and our Domain Controller will live.

Layer 3 – This will be our “nested” layer. Here we will run between one and three VMs. These VMs won’t really do anything except run an OS. I have historically run Linux in this layer as it seems to perform OK. I have listed two Linux distributions that I know run well as nested VMs. But really, you could run any OS if you have a mind to.


As we go through the setup I’ll try to cover both the Workstation and ESXi configs but, as you would expect, this will work much better with ESXi and that’s where I’ll concentrate most.



Next post we’ll look at the networking required for your nested lab.

Home Labbing

Unless you’ve been living under a rock you will have heard two big announcements over the last couple of weeks.
1. vSphere 6 is official.
2. VMUG Advantage now comes with VMware’s EVALexperience
While the vSphere 6 announcement was expected by the community, the EVALexperience was a real surprise, to me anyway.
What does this mean? Well, in addition to all the benefits that come with a VMUG Advantage subscription, you now get the ability to use a bunch of VMware’s software for the duration of your subscription. No rebuilds every couple of months, which makes your home lab more “stable/persistent”, and the list of available software looks quite good.
With each new release of vSphere or SRM or NSX or VSAN or … or … or … a lab becomes more important.

But what do you want out of a lab? Do you want to test new software, create disposable environments, run a permanent infrastructure? I guess it’s really up to you and your budget. For me it’s important to test new software and do early investigation before I approach work and study. Do I need permanently running infrastructure? Not really. I prefer a nested ESX solution. It suits me and my budget. However, there are many instances when you would want a “physical” lab; consultants, for a start.

Anyway, I have only three bits of kit that are really important to creating my home lab.

  • One second hand laptop (Main work horse).
  • One small TP-LINK switch (TL-SG108E)
  • One Lenovo S20 (ESXi – Booted from USB)
Laptop –> Switch –> S20
Right, so the S20 I tricked out a bit. It has a full complement of RAM (24GB), one 500GB SSD and one 1TB SSD. Its connectivity to the world is through the 1Gb interface and it boots from an 8 GB SSD.

The whole lab runs several nested VMs. Usually three ESX servers, VSAN, one VCSA and a DC. However, it has run four ESX servers, two Windows servers with vCenter and SRM, and two NetApp simulators.

In the next post I’ll step through setting up a nested virtual lab.

VMworld Session Review – VAPP2305

Session: Extreme Performance Series – Understanding Applications that Require Extra TLC

Speakers: Vishnu Mohan (VMware), Reza Taheri (VMware).



This session was one of a series covering Extreme Performance.

If you are a virtualization engineer then this should be a session you catch up with, and I am an engineer to my core. Of the three VMworlds I have attended, this was by far the most enjoyable and interesting session I have been to.

This session really looks at edge cases where virtualization technologies would be the cause of performance issues.

Things like standards are not really discussed but assumed, inasmuch as this talk doesn’t cover rookie mistakes and assumes for all scenarios that all best practices are currently being met and the latest VMware stack is being used.

Extreme I/O, latency and timers are covered, dissected and demystified. Both Vishnu and Reza were brutally honest and completely unapologetic about the limitations of virtualization. The issues that were encountered affecting virtualization would affect all platforms and not just VMware’s.

The speakers do make it very clear that for 99% of workloads/applications the default settings will serve you just fine, and they are completely right. When was the last time you needed to “tune” a VM, not the application but the VM?

Also questions are posed along the lines of “You want to use SR-IOV? What for?” A VM can push 1 million packets. Perhaps if you needed extremely low latency and virtualization together. But maybe you would be better off going physical in that case.

For me the big takeaway from this session is know your workloads. Question and analyse.





White Paper Wednesday.

I’ve just finished reading NetApp’s white paper WP-7193, FAS Hardware: Optimized for I/O Expandability, and Reliability.

First off I would say that this is not a paper that is heavy on the technical details. It’s more of a “this is what we do and how we do it” paper. At times it does read like a marketing paper, but overall it’s a paper that would be good for somebody who is new to NetApp and would like to find out a bit more about the technology.

The focus is on their FAS series, which is where I would expect most people’s first contact with NetApp would be, and it covers a fair number of topics, from storage I/O data paths to on-disk error correction.

One of the topics it touches on is the attitude that a storage system (NetApp, EMC, 3PAR, etc) is really just a fancy server with disks attached. While the argument can be made, it usually indicates a lack of understanding of how a dedicated storage appliance really works. Yes, it has an Intel CPU and Toshiba RAM and Hitachi disks, but it is highly optimised to perform two functions: Serve Data, Protect Data. Either way it is an interesting argument. How much is hardware and how much is software? With the availability of very well featured software storage OSes such as FreeNAS the waters get muddied further.

I used FreeNAS extensively when studying for the VCAP-DCA exam; it worked and worked well. However, the question is, can it compare? For certain uses, sure, it’s a viable alternative, cost effective and easy to manage. Could it go head-to-head with a FAS2240? Even though I doubt it, it’s something I am curious to test.

Cisco Nexus 1000v goes free

At my place of work we use the Cisco Nexus 1000v. It was a big part of my drive in the last year to bring all parts of IT into the virtual environment.

Selling the 1000v to the Network department was actually very easy. I gave one of the engineers a Cisco white paper to read on physical switch best practice and VMware. He read it that night, came to the office, went straight to his manager and explained why we needed to buy it.

The whole purpose of this was to bring the network team into the fold. As we moved forward with a fairly aggressive P2V drive the network team has slowly lost control and visibility of a fairly major part of their network. Not being able to apply and guarantee the same network policies across the network estate is a major cause of concern for many network people.

The network team are now approaching me and asking if they can put the virtual versions of the physical security appliances they use into the virtual environment. So when I was at VMworld in Barcelona, I made a point of visiting the Cisco stand to ask about the VSG and the Virtual ASA, to get an idea of how they work with the 1000v, the differences, licensing and other bits and pieces. They told me that Cisco were going to come out with a two-tier licensing model: Essentials and Advanced, in other words free and not free. OK, that is a little unfair, as the Advanced version does have a few more features than the Essentials version; most importantly for us, the Advanced version now comes with a VSG license.

For us this is great. In our small offices (two hosts) we can now start to put in the free version, and in our large datacentres we can keep using the 1000v but now also have the option of using the VSG. This should also be a big benefit for small companies, schools, charities and anybody else who is cash strapped (and yes, home labs too).

While this is good news, Cisco aren’t in the business of giving away free stuff. They don’t even do what I fondly refer to as the drug pusher samples (a small bundle of free licenses to get you hooked). It makes me wonder how strong the uptake of the 1000v architecture really was. We love it and get really good use out of it, but it is expensive, and I believe this cost probably drove most people away.

Either way, I think it’s a good step forward.

New Patch for vSphere

Things got a bit weird a couple of weeks ago while trying to upgrade my home lab from 4.1 to 5.1.

The install of SSO went fine but upgrading the Inventory Service threw up the following error:
Error 29102 . Could not contact Lookup Service. Please check VM_ssoreg.log in system temporary folder for details.

So I dutifully checked the required log file and googled it. It turns out that if the FQDN has the word “port” in it anywhere the install/upgrade will fail. REALLY?!??
It turns out that this is fixed in 5.1.0a (Build 880471). You can find the details here.
Upgrading to 5.1.0a worked. How that bug got in there is mind-boggling.
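If you are stuck on the original 5.1 build for some reason, a quick pre-flight check on the server name might save you the failed install. This is just a sketch: the function name is my own, and I don't know whether the bug is case sensitive, so the check ignores case to be on the safe side.

```python
import socket


def fqdn_trips_port_bug(fqdn):
    """True if the name contains "port" anywhere (case-insensitive by assumption)."""
    return "port" in fqdn.lower()


if __name__ == "__main__":
    # Check the machine this script is running on.
    name = socket.getfqdn()
    if fqdn_trips_port_bug(name):
        print(f"{name} contains 'port'; expect trouble before 5.1.0a")
    else:
        print(f"{name} looks fine")
```

Run it on the machine you are about to upgrade, or call the function with the FQDN you plan to use.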

What is the value of a certification?

Interesting question, as the true value of a certification differs depending on how you present.

For me the true value of being certified has realised itself in two ways:
  1. Got me a second interview with the company I work for.
  2. Improved my skills.
A few years ago I was looking for a new job and had just taken the first of two NetApp exams required to get NCDA certified (now I believe one is required).
I had submitted my CV and managed to get a first interview. On my CV I had stated that I had passed the first exam and intended to take the second to get the certification. By the time the interview rolled around I had taken and passed the second exam. During the interview I mentioned that I had passed the second exam and had the cert.
This went down well and they felt it showed that I was committed.
Other job interviews I have been granted have usually been down to the fact that I had certifications in the relevant areas as well as experience. In other words, it often helps get a foot in the door. Once you are sitting in front of the interview panel it’s up to you to show you know your stuff.
But mostly I do it now to improve my skills.
The VCP certification is VMware’s entry level certification. To get VCP certified you need to attend an official VMware course and then sit the exam. This for me is quite interesting, as VMware are trying to ensure that all VCPs have had exposure to the same or similar material, but if your company won’t sponsor you it can be quite expensive. It’s also fairly easy and quite common these days, but many jobs require you to at least have this cert before you even apply. (Foot in the door!)
The VCAP exams are much more difficult and, I would expect, don’t have a high first-time pass rate. I have taken the VCAP-DCD exam once and missed out by 18 points. Very frustrating.
I took the VCAP4-DCA exam twice before I passed and more recently I took the VCAP5-DCA beta and passed. I guess practice makes perfect. I felt that the DCA exams were similar in format to Red Hat’s exams, which give an objective and let the candidate get on with the lab.
The DCD exams are really about understanding designs. It is important to note that while reading the recommended white papers and books will help, you really do need to understand how to make the decisions required for those designs. Anybody can get a basic VMware setup up and running in next to no time, but does it meet the requirements? That’s what you need to get for this exam.
Now studying for these two exams has helped me as a VMware admin more than the cert has helped my career. People who have taken the exams are very cagey about giving out any info about the questions asked, and for good reason. Apart from the fact that VMware will find you and remove your certification and possibly ban you from taking VMware exams in the future, they require a fair degree of study, which means a not insignificant time sacrifice.
VCDX? Well I haven’t done that…. Yet.
But it’s all about what you want from your studies. Can it open doors? Yes, but then it’s up to you to prove yourself. Can it help you improve your skills? Yes, but the higher you aim, the more time you have to invest.
When all’s said and done, for me it’s been worth it.

It’s the little things….

It’s the little things that make a successful deployment. Getting caught up in planning the perfect design without looking at the fine detail can discredit the time and effort you have put in.

To VMware’s credit, the default install generally runs just fine out of the box with little or no additional work required.

That being said, we recently ran into an issue with one of our drivers which caused a huge headache.

We are running 10GbE on all our hosts. This works well for us and actually was a request from our network department. I did a large amount of testing and settled on an Intel dual port card. It’s fast, efficient and tidy, and since we are only using two cards we have four fibre cables running out of the back of the servers instead of many copper cables.

So on to the headache. I was on the way home when I received a call from a very distressed admin. Apparently one of our ESXi hosts had lost its storage, taking 56 VMs down. The network department reported that the two 10GbE ports were flapping and nobody really knew what was happening.

After plugging in a copper cable and disabling the fibre, we were back up and running. The host was put into maintenance mode, log files exported and a case raised.

As it turns out the driver we were using was a couple of releases out of date. VMware suggested we update to the latest (obviously). I did query why Update Manager hadn’t presented me with the latest driver so hopefully I’ll get an answer for that soon too.

EDIT (02/07/2012): Just had a Twitter conversation with Andre Beukes (@eabeukes), who let me know that only the e1000 and the bge drivers are in VUM at the moment. As far as he knows, the others should be coming soon.
Full conversation:

  • Andre Beukes @eabeukes I dont think the ixgbe driver is in the Update Manager repository – its a FCoE card right?
  • Carel Maritz @carelmaritz The card doesn’t support FCoE but the driver is igbxe. I was surprised that it isn’t updated by VUM.
  • Andre Beukes @eabeukes Only e1000 and bge are in VUM – theres no 10GB cards in there (yet) AFAIK as that will come with soft-FCoE when its released
  • Carel Maritz @carelmaritz ah, thanks for the info. I’ll update my post. Do you mind if I quote this conversation?
  • Andre Beukes @eabeukes Sure no worries

END EDIT

I have updated the driver and it all seems good. I do, however, like to understand exactly what happened. I went to our syslog server (which is something everybody who uses ESXi should have).

Logs as follows:




vmkernel: 345:02:11:53.704 cpu23:73434691)<3>ixgbe: vmnic15: ixgbe_check_tx_hang: Detected Tx Unit Hang
vmkernel:   Tx Queue             <6>
vmkernel:   TDH, TDT             <0>, <3>
vmkernel:   next_to_use          <3>
vmkernel:   next_to_clean        <0>
vmkernel: tx_buffer_info[next_to_clean]
vmkernel:   time_stamp  
vmkernel: 345:02:11:53.704 cpu15:64444910)<6>ixgbe: vmnic15: ixgbe_clean_tx_irq: tx hang 7 detected, resetting adapter



vmkernel: 345:02:11:57.823 cpu1:15580)<3>ixgbe: vmnic17: ixgbe_check_tx_hang: Detected Tx Unit Hang
vmkernel:   Tx Queue             <4>
vmkernel:   TDH, TDT             ,
vmkernel:   next_to_use          
vmkernel:   next_to_clean        
vmkernel: tx_buffer_info[next_to_clean]
vmkernel:   time_stamp    
vmkernel: 345:02:11:57.823 cpu1:15580)<6>ixgbe: vmnic17: ixgbe_clean_tx_irq: tx hang 1 detected, resetting adapter

It was quite interesting. The first NIC (vmnic15) registered a stop in tx packets and the driver reset the port. All the port groups and vmk ports failed over to the other NIC (vmnic17). The driver at this stage had gotten itself into a loop and reset vmnic17. Everything failed over and …… well, you get the picture.
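If your syslog server gives you a plain text file, a few lines of Python will pull the reset sequence out of the noise. This is a rough sketch: the regular expression is written against the two "resetting adapter" lines quoted above and may need adjusting for other driver versions, and find_adapter_resets is just a name I picked.

```python
import re

# Matches the ixgbe "resetting adapter" lines as they appear in the log above.
RESET_LINE = re.compile(
    r"vmkernel: (?P<uptime>\d+:\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"cpu\d+:\d+\)<\d>ixgbe: (?P<nic>vmnic\d+): "
    r"ixgbe_clean_tx_irq: tx hang \d+ detected, resetting adapter"
)


def find_adapter_resets(lines):
    """Return (uptime, nic) for every adapter reset, in log order."""
    resets = []
    for line in lines:
        match = RESET_LINE.search(line)
        if match:
            resets.append((match.group("uptime"), match.group("nic")))
    return resets


if __name__ == "__main__":
    # The two reset lines from the log excerpt in this post.
    sample = [
        "vmkernel: 345:02:11:53.704 cpu15:64444910)<6>ixgbe: vmnic15: "
        "ixgbe_clean_tx_irq: tx hang 7 detected, resetting adapter",
        "vmkernel: 345:02:11:57.823 cpu1:15580)<6>ixgbe: vmnic17: "
        "ixgbe_clean_tx_irq: tx hang 1 detected, resetting adapter",
    ]
    for uptime, nic in find_adapter_resets(sample):
        print(uptime, nic)
```

Seeing vmnic15 and vmnic17 reset about four seconds apart makes the failover ping-pong obvious at a glance.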

So to cut a long story short, it pays to sweat the small stuff in any deployment.

Heading down the VCDX Highway

Once again, I had hoped to blog more this year but as these things happen I haven’t been able to.
However I have been working hard at getting my certs done. I passed the VCAP4-DCA which was an amazing feeling. I had almost given up on the process as it was very time consuming. My wife has been very supportive which has really helped.
The exam itself was not easy. There are lots of scenarios and a wide variety of tasks so you really need to know your stuff. A lab is a must and you should give yourself a good three months to study. My home lab was inspired by Simon Gallagher’s vTardis and ran on my laptop. There are plenty of resources available to help you get going. I would personally recommend Edward Grigson’s guide which can be found at http://www.vexperienced.co.uk.
So with the release of the VCAP5-DCD and the imminent release of the VCAP5-DCA I have decided to go for the VCDX on the vSphere 5 track. I will probably use the design that I put together for my office.
For those of you wanting to go for the VCDX certification, have a look at http://vcdxwannabe.com/. It looks like it is going to be a good community for people after the same thing and when I last checked there were a few VCDXs on there too.
The VCDXs I have met have been really friendly and have always been approachable, answering any questions I have (and there have been plenty).
There is also the question of whether or not to try to be recognised as a vExpert. The vExpert is not a certification but more of a recognition by VMware of an individual’s contribution to the community.
So anyway ramblings aside, I have reading to do.