I ran into an environment today where a group policy object (GPO) configured at the domain level set security logs to be archived to the C: drive when full. The need for the policy abated and the policy was removed, but the setting to archive event logs persisted on each server and workstation in the domain. Hard drives – physical and virtual – began to fill up, taking down critical services. The fix was easy, but I thought I would share it for others who might run into the same issue.
First, I set a group policy to configure appropriately sized logs and retention policies on domain controllers (Default Domain Policy), servers, and workstations respectively. I also set up audit policies for each of the computer account types. For the floating VMware View workstations I dialed down logging to reduce the Linked Clone growth rate, as the logs are not preserved across refresh/recompose operations. For domain controllers, reference this TechNet article with recommendations for ‘Strengthening Domain Controller Policy Settings’: https://technet.microsoft.com/en-us/library/cc773388%28v=ws.10%29.aspx.
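To confirm that the new policy actually took hold on a given machine, one option is to read back the Security log's configuration. The sketch below is not part of my original cleanup; `Get-SecurityLogMode` is a hypothetical helper name, and it assumes PowerShell 3.0 or later, an elevated session, and remote event log access to each machine. A `LogMode` of `AutoBackup` means the log is still set to archive when full, while `Circular` means it overwrites old events as needed.

```powershell
# Hypothetical helper (not from the original script): reports how the
# Security log is configured on each machine in the list.
function Get-SecurityLogMode {
    param(
        [Parameter(Mandatory = $true)]
        [string[]] $ComputerName
    )
    foreach ($node in $ComputerName) {
        # -ListLog returns the log's configuration rather than its events
        $log = Get-WinEvent -ListLog Security -ComputerName $node
        [pscustomobject]@{
            Computer  = $node
            LogMode   = $log.LogMode   # Circular, AutoBackup, or Retain
            MaxSizeMB = [math]::Round($log.MaximumSizeInBytes / 1MB)
        }
    }
}

# Example usage (assumes servers.txt holds one computer name per line):
#   Get-SecurityLogMode -ComputerName (Get-Content servers.txt) |
#       Where-Object { $_.LogMode -eq 'AutoBackup' }
```

Any machine that still shows `AutoBackup` after a policy refresh is one where the old setting has stuck around.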
Then, I had to do a quick cleanup of a bunch of archived event logs – tens of gigabytes of files on some of the busier servers (domain controllers, print servers, file servers) – that were filling up local system (C:) drives. A quick PowerShell script did the trick for me – I put together a multi-purpose script that could delete files by age or by name filter (or both) against a collection of machines. Here’s the script:
# ---------------------------------------------------------------
# Delete files older than X days or meeting Y name filter
# for all computers in file servers.txt (export from ADUC)
# by Josh Townsend Josh.Townsend@clearpathsg.com
# To delete by the X days method instead, change Where-Object to:
#   Where-Object {$_.LastWriteTime -le $LastWrite}
# ---------------------------------------------------------------
$Days = 7
$Now = Get-Date
$LastWrite = $Now.AddDays(-$Days)
$servers = Get-Content servers.txt
$filter = "Archive*.evtx"
foreach ($node in $servers) {
    # Reach each machine's event log folder through its administrative C$ share
    Get-ChildItem -Recurse "\\$node\C`$\Windows\System32\winevt\Logs" |
        Where-Object {$_.Name -like $filter} |
        Remove-Item -Recurse -Force
}
# --And they lived happily ever after, the end.--------------
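If you want both tests at once (name filter and age) plus a dry run before anything is deleted, the script above could be wrapped in a function along these lines. This is a minimal sketch, not the script I ran: `Remove-OldArchive` is a hypothetical name, and it assumes PowerShell 3.0 or later for the `-File` switch.

```powershell
# Hypothetical helper: applies the name filter AND the age test together,
# and honors -WhatIf so the deletions can be previewed first.
function Remove-OldArchive {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(Mandatory = $true)]
        [string] $Path,
        [string] $Filter = 'Archive*.evtx',
        [int] $Days = 7
    )
    $cutoff = (Get-Date).AddDays(-$Days)
    Get-ChildItem -Path $Path -Recurse -File |
        Where-Object { $_.Name -like $Filter -and $_.LastWriteTime -le $cutoff } |
        ForEach-Object {
            # ShouldProcess returns $false under -WhatIf, so nothing is removed
            if ($PSCmdlet.ShouldProcess($_.FullName, 'Remove file')) {
                Remove-Item -LiteralPath $_.FullName -Force
            }
        }
}

# Example usage (preview first, then drop -WhatIf to delete for real):
#   foreach ($node in Get-Content servers.txt) {
#       Remove-OldArchive -Path "\\$node\C`$\Windows\System32\winevt\Logs" -WhatIf
#   }
```

Running with `-WhatIf` first is cheap insurance when the target list comes from an ADUC export and covers domain controllers.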
That did the trick to clean up all of the archived event logs. Next up, I wanted to deliver a report to the customer that showed any machines that were still at risk of running out of free disk space. I had an old VB script on file that worked, but I figured it was time to get up to date with PowerShell. A quick Google led me to this post that includes the same functionality as my VB script in glorious new PowerShell. Look down in the comments of the post for some updates to the script. The script reports on disk space consumption and free space for multiple servers (either from a .txt input file or from Active Directory), highlights systems with less than a certain (variable-defined) percentage of free space, then generates an HTML report and emails it to designated recipients.
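The core of a report like that is small. The following is my own minimal sketch of the idea, not the script from the linked post: `Get-PercentFree` and `Get-LowDisk` are hypothetical names, the 10 percent default threshold is an assumption, and it relies on `Get-CimInstance` (PowerShell 3.0 or later on Windows, with WinRM reachable on the targets).

```powershell
# Hypothetical helper: percentage of a volume that is free, rounded to 0.1
function Get-PercentFree {
    param(
        [Parameter(Mandatory = $true)] [double] $Size,
        [Parameter(Mandatory = $true)] [double] $FreeSpace
    )
    if ($Size -le 0) { return 0 }   # guard against volumes reporting no size
    [math]::Round(100 * $FreeSpace / $Size, 1)
}

# Hypothetical helper: lists fixed disks below the free-space threshold
function Get-LowDisk {
    param(
        [Parameter(Mandatory = $true)] [string[]] $ComputerName,
        [double] $ThresholdPercent = 10
    )
    foreach ($node in $ComputerName) {
        # DriveType=3 restricts the query to local fixed disks
        Get-CimInstance -ClassName Win32_LogicalDisk -Filter 'DriveType=3' -ComputerName $node |
            ForEach-Object {
                $pct = Get-PercentFree -Size $_.Size -FreeSpace $_.FreeSpace
                if ($pct -lt $ThresholdPercent) {
                    [pscustomobject]@{
                        Computer    = $node
                        Drive       = $_.DeviceID
                        SizeGB      = [math]::Round($_.Size / 1GB, 1)
                        PercentFree = $pct
                    }
                }
            }
    }
}

# Example usage (writes a bare-bones HTML report; emailing is left out):
#   Get-LowDisk -ComputerName (Get-Content servers.txt) |
#       ConvertTo-Html | Out-File lowdisk.html
```

The percentage math lives in its own function so the threshold logic can be checked without touching a live server.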
This got the job done – it fixed the immediate issue and gave the client detailed information on similar or related issues that they could act on before further problems occurred in the environment. While the script might work well for smaller environments or quick fixes, I recommend a long-term, proactive monitoring and alerting solution. VMware vCenter Operations Management Suite (5.6 can monitor within guest VMs using VMware’s Hyperic functionality), SolarWinds, Nagios, or an outsourced monitoring, support, and proactive maintenance solution like Clearpath’s Select are much better options for preventing problems like this before they knock out key systems.
The final step to resolving this particular issue for the customer would involve some Storage vMotions. The event logs filled up the guest disk, causing the thin-provisioned VMDK to grow to its maximum size. A Storage vMotion can shrink the disk back to a thin size that resembles the now less full guest disk. Note that this trick only works when the VMFS block sizes of the source and destination datastores differ; otherwise the fs3dm datamover will offload to VAAI, keeping the free space in the VMDK, per this VMware KB: https://kb.vmware.com/kb/2004155. The KB also has a tip on using the Sysinternals SDelete utility to zero out free space within the guest VMDK, then using vmkfstools -K to release the unused blocks. If you are interested in a deep dive into how Storage vMotion works (or vSphere HA, DRS, SDRS, SIOC, DPM, etc.), I highly recommend Duncan Epping and Frank Denneman’s ‘VMware vSphere 5 Clustering Technical Deepdive’ book. You’ll fully appreciate the automation that takes place within your vSphere environment after reading their book. The book I linked to is updated for vSphere 5.1.
The new Space Efficient Sparse (SE Sparse) VMDK disk format, introduced in vSphere 5.1, takes advantage of SCSI UNMAP (block) or RPC Truncate (NFS) to release free guest blocks back to the datastore filesystem after performing a wipe and consolidate operation within the guest. Read more about SE Sparse VMDKs on Cormac Hogan’s blog here: https://cormachogan.com/2012/09/05/vsphere-5-1-storage-enhancements-part-2-se-sparse-disks/.