Tuesday, 28 May 2013

When a snapshot isn't.

I came into the office this morning to find one of our Exchange 2010 VMware boxes in disarray.

Great start to the morning.  Did I say it was raining too?

The DAG had failed over correctly so users were unaffected.  At least one thing was going my way!

The error we saw on vSphere:

"There is no more space for virtual disk xxx-000002.vmdk. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry.  Click Cancel to terminate this session."

We quickly found the LUN that the disk in question was sitting on and then browsed that LUN.

Now is the time to say that we use Veeam for our backups.  What follows has nothing to do with their excellent system, which two days previously had saved us from a SQL DB corruption.  I have to get that point in, as I would not want anyone to think this is in any way Veeam's fault.

OK, back on track...

Veeam had run a backup of Exchange 2010 and failed due to the disk space issue.  Veeam uses snapshots.

Snapshots, as we all know and as I was reminded this morning, are stored on the "default" LUN, which by default is the first LUN you add to the VM.  The snapshots for all 15 disks, ranging from 40GB to 580GB, on my Exchange box were happily sitting on my one 1.8TB LUN.  They did not fit.

vSphere was pointing this out to me.  All my Googling didn't help, as every suggestion was to increase the size of the LUN.  That is perfectly good advice; however, we can't increase this LUN.  Or delete anything on it.  Or move anything off.  Stuck.

Clicking Retry on the vSphere message just resulted in vSphere giving us the same message again a few minutes later.

I stopped the Veeam job and vSphere told me that it was rolling back all the snapshots.  It didn't.

"Unable to access file <unspecified filename> since it is locked"

Oh great.  Useful too.

So we clicked Cancel.  Interestingly, this turns off the machine.  Again, I had Googled for this and found nothing, so here it is once more so that it exists somewhere on the internet.

If you have:

"There is no more space for virtual disk xxx-000002.vmdk. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry.  Click Cancel to terminate this session."

Clicking Cancel turns off the VM.

OK, so now the machine is off, with what look like snapshots in the datastore: disks ending with -00000x.vmdk.  Fine, consolidate the snapshots then.  vSphere says there are no snapshots.  Oh.

So I fired up PuTTY and followed the VMware article:
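If you want to confirm from the command line what is actually sitting in the VM's folder, something like the following might help.  This is only a rough sketch: the function name list_snapshot_disks is made up for illustration, and it assumes the snapshot descriptors follow the six-digit pattern from the error message (xxx-000002.vmdk), each with a matching -delta.vmdk data file next to it.

```shell
#!/bin/sh
# Hypothetical helper: list files in a VM directory that look like
# snapshot disks, i.e. <name>-NNNNNN.vmdk descriptors and their
# <name>-NNNNNN-delta.vmdk data files (naming pattern is an assumption).
list_snapshot_disks() {
    vm_dir="$1"
    for f in "$vm_dir"/*-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk \
             "$vm_dir"/*-[0-9][0-9][0-9][0-9][0-9][0-9]-delta.vmdk; do
        # When a pattern matches nothing the shell leaves it as literal
        # text, so only print names that really exist on disk.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}
```

You would call it with the path to the VM's folder on the datastore, for example list_snapshot_disks /vmfs/volumes/<datastore>/<vm-folder>.  An empty result means no snapshot-style files were found there.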

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002310

:>vim-cmd vmsvc/snapshot.removeall

Nothing.  Snapshot files still there.

However, there is a way out. Despair not, my fellow VMware administrator!

Summary of situation:

A VM with lots of disks with snapshots.  vSphere believes the VM doesn't have snapshots.  The hypervisor believes there are no snapshots.  There are snapshot disks 00000xx.vmdk referenced in the VM's VMFS.  These snapshots are on a LUN which has no space, so the VM cannot power up.

Solution:
Make another snapshot.  It sounds perverse, but making another snapshot gives you access to Delete All in Snapshot Manager.  This then consolidates all the snapshots, even those which supposedly are not snapshots.
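We did this through the vSphere client, but if you only have a shell on the host, the same two steps could probably be driven with vim-cmd.  Treat this as an untested sketch rather than what we actually ran: the function name, the snapshot name "temp-consolidate" and the VIM_CMD override are all made up for illustration, and I am assuming the command-line removeall behaves like Delete All once a snapshot is registered again.

```shell
#!/bin/sh
# VIM_CMD can be overridden for a dry run, e.g. VIM_CMD=echo (illustrative).
VIM_CMD="${VIM_CMD:-vim-cmd}"

# Hypothetical helper: take a throwaway snapshot, then remove all
# snapshots for the VM with the given vmid.
consolidate_via_new_snapshot() {
    vmid="$1"
    if [ -z "$vmid" ]; then
        echo "usage: consolidate_via_new_snapshot <vmid>" >&2
        return 1
    fi
    # Step 1: create a new snapshot so the VM has one registered again.
    "$VIM_CMD" vmsvc/snapshot.create "$vmid" "temp-consolidate" || return 1
    # Step 2: remove all snapshots (assumed CLI counterpart of Delete All).
    "$VIM_CMD" vmsvc/snapshot.removeall "$vmid"
}
```

The vmid comes from the first column of vim-cmd vmsvc/getallvms.  Running with VIM_CMD=echo first just prints the two commands without touching the VM, which seems a sensible thing to do before pointing it at a production Exchange box.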

*phew*

Credit to the Veeam engineer for this.

Just for Google searches:  Manually remove snapshots from a failed Veeam backup.




2 comments:

  1. Holy shit! Thank you so much for posting this, I spent hours attempting to fix this issue.

  2. No worries!

    Thanks for reading and posting a comment.
