
“Shrinking” a Celerra filesystem

There’s no easy way to shrink a filesystem on the Celerra: once a filesystem has been created or extended, its size can’t be reduced. However, you can work around this with a very short outage by copying the data to a new, smaller filesystem and then swapping the two over via a re-mount. Here’s how to do it.

Let’s assume we have a filesystem like this one:

  • Filesystem Name = myfilesystem
  • Filesystem Size = 300GB of pre-allocated space
  • Data Type = file data shared via CIFS totalling 50GB in size
  • DHSM enabled with data archived to a secondary filesystem (CIFS in this case)

In this case we have 250GB of wasted space in our filesystem. The goal is to shrink this down to 100GB of pre-allocated space and have 50% file usage on the filesystem.

1] Create a new filesystem 100GB in size (the target size) = myfilesystem_new
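From the Control Station that might look like this (a minimal sketch, assuming a storage pool named mypool; substitute your own pool, VDM and mount point names):

/nas/bin/nas_fs -name myfilesystem_new -create size=100G pool=mypool
/nas/bin/server_mountpoint vdmname -create /myfilesystem_new
/nas/bin/server_mount vdmname myfilesystem_new /myfilesystem_new

You’ll also want to share the new filesystem (e.g. with server_export) so the copy utility can reach it over CIFS.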

2] Use a copy utility to copy the data from the old filesystem to the new one. If you’re using DHSM you’ll want to use emcopy, which by default will only copy the stubs, as sketched below.
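Something along these lines should do it; the flags here are typical emcopy usage rather than gospel, so check emcopy /? for your version before running it:

emcopy.exe \\mycifsshare\c$\myfilesystem \\mycifsshare\c$\myfilesystem_new /s /o /c /r:1 /w:1 /log:c:\emcopy.log

/s copies subdirectories, /o copies the security information and /c carries on past errors. Run it once for the bulk copy, then again just before the cutover to pick up any recent changes.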

3] Delete all checkpoints and checkpoint schedules on both filesystems (you need to do this before the Celerra will let you rename the filesystems).
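From the Control Station that’s along these lines (a sketch; substitute your own schedule and checkpoint names, which nas_ckpt_schedule -list and nas_fs -list will show you):

/nas/bin/nas_ckpt_schedule -list
/nas/bin/nas_ckpt_schedule -delete myschedule
/nas/bin/nas_fs -list | grep ckpt
/nas/bin/nas_fs -delete myfilesystem_ckpt1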

4] SSH onto the Celerra and type the following to unmount both the myfilesystem and myfilesystem_new filesystems:

/nas/bin/server_umount vdmname -p myfilesystem
/nas/bin/server_umount vdmname -p myfilesystem_new

As soon as you do this the share and filesystem are offline until you remount them. This should only be for a few seconds.

5] Swap the names of the filesystems around:

/nas/bin/nas_fs -rename myfilesystem myfilesystem_old
/nas/bin/nas_fs -rename myfilesystem_new myfilesystem

6] Remount the filesystems now that they’ve been renamed:

/nas/bin/server_mount vdmname myfilesystem /myfilesystem
/nas/bin/server_mount vdmname myfilesystem_old /myfilesystem_old
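Note that nothing has been mounted at /myfilesystem_old before, so if server_mount complains about a missing mount point you may need to create it first:

/nas/bin/server_mountpoint vdmname -create /myfilesystem_old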

Now the filesystem is back online and you can check that it’s working by browsing to the share.

7] Delete the old mount point, /myfilesystem_new, which is no longer in use:

/nas/bin/server_mountpoint vdmname -delete /myfilesystem_new

8] Recreate your checkpoint schedule.
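A sketch of recreating a simple daily schedule (the option names here are from memory, so verify them against nas_ckpt_schedule’s help; the schedule name, run time and retention are examples):

/nas/bin/nas_ckpt_schedule -create myschedule -filesystem myfilesystem -recurrence daily -every 1 -runtimes 20:00 -keep 7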

What we’re left with is the old myfilesystem_new, now called myfilesystem, serving as the production filesystem. The old filesystem is now myfilesystem_old, which can be deleted once you’re happy all the data is there. The unmount, rename and remount steps should take about 10 seconds in total.

You can even do a final check that the data is OK and there weren’t any last-minute writes by comparing \\mycifsshare\c$\myfilesystem with \\mycifsshare\c$\myfilesystem_old.
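One quick way to do that comparison is robocopy’s list-only mode (assuming robocopy is available on your admin box):

robocopy \\mycifsshare\c$\myfilesystem_old \\mycifsshare\c$\myfilesystem /E /L /LOG:compare.log

With /L robocopy lists what it would copy without copying anything, so any file appearing in compare.log differs between the old and new filesystems.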


recallonly [ Migration: FAIL ] and recallonly [ Migration: ERROR ]

If you’re recalling data from an archive filesystem (see previous post) and you get one of these errors:

state                = recallonly [ Migration: FAIL ]

state                = recallonly [ Migration: ERROR ]

then you have at least one file that failed to recall back to the primary storage.

To view which files failed you’ll need to consult the logs, which you can find at the root of the filesystem (e.g. \\mycifsshare.mydomain.com\c$\myfilesystem). The files will be named:

migErr_vdmname_myfilesystem

migLog_vdmname_myfilesystem

and contain a list of files that failed and the log for the recall respectively.

When you have only a few files listed it’s fairly easy to find the name of the file that failed by opening migLog in a text editor and searching for the string “I/O error”. For example:

Chdir to /myvdm/myfilesystem/myfolder/myfolder
Migrating directory myfolder...Mon Jan 1 00:00:00 2009
creating sub-directory myfolder...Mon Jan 1 00:00:00 2009
migrating file myfilename.doc...Mon Jan 1 00:00:00 2009
migrating file myfilename.doc failed at read last byte: I/O error

The filepath in this case is \myfolder\myfolder\myfilename.doc.
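If you’d rather not open the log in an editor, a one-liner from a Windows command prompt does the same search (/i makes findstr case-insensitive and /c: searches for a literal string):

findstr /i /c:"I/O error" \\mycifsshare.mydomain.com\c$\myfilesystem\migLog_vdmname_myfilesystem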

The recall can fail because of an orphan stub (a stub with no data on the secondary storage); that file will need restoring from your backups. Sometimes, though, the recall fails even when the file is on the secondary storage. One way to force the recall is to copy the file and rename it, i.e.

1) Right click the stubbed file and copy

2) Paste the file into the same directory to get a “Copy of myfilename.doc”

3) Delete or rename the stub file

4) Rename the “Copy of myfilename.doc” back to myfilename.doc

This manual workaround works fine when you have a small number of files but quickly becomes a chore if you have tens, hundreds or even thousands of files to perform this trick on.

So here’s a couple of VBScript scripts to help: the first parses the migLog, the second performs the copy-and-rename task.

Syntax:

cscript /nologo parse-migLog.vbs path\to\migLog > failedfiles.txt

For large log files it might help to grep out the relevant lines first to speed the parsing up:

grep -i -E "I/O error|Chdir" migLog_vdmname_myfilesystem > migLog-grepped.txt

which trims out all the successfully migrated files, leaving just the directory changes and the failed-recall lines.

The script itself:

' Parse a DHSM migLog and print the path of every file that failed to recall
Set objFSO = CreateObject("Scripting.FileSystemObject")

strSourceFile=WScript.Arguments.Item(0)
set objFileStream = objFSO.OpenTextFile (strSourceFile, 1)

strCurrPath=""

do until objFileStream.AtEndOfStream
    strLine=objFileStream.ReadLine
    'Track the directory the log is currently working in
    if string_compare("Chdir",strLine) then strCurrPath=get_PathFromChdirLine(strLine)
    'Failed recalls are logged as "migrating file ... failed ...: I/O error"
    if string_compare("I/O error",strLine) then
        strFilePath=ltrim(rtrim(replaceFSwithBS(strCurrPath & "\" & get_FileNameFromMigratingLine(strLine))))
        wscript.echo strFilePath
    end if
loop

'Extract the path from a "Chdir to /path" line (the path starts at character 10)
function get_PathFromChdirLine(strtmpLine)
    get_PathFromChdirLine=Mid(strtmpLine,10,len(strtmpLine))
end function

'Extract the filename from a "migrating file <name> failed ..." line
function get_FileNameFromMigratingLine(strtmpLine)
    strReturn=Mid(strtmpLine,16,len(strtmpLine))
    strReturn=left(strReturn,InStrRev(strReturn," failed"))
    get_FileNameFromMigratingLine=strReturn
end function

'Convert forward slashes to backslashes for a Windows-style path
function replaceFSwithBS(strtmp)
    replaceFSwithBS=replace(strtmp,"/","\")
end function

'Test whether targetstring matches the regular expression in expression
private function string_compare(expression,targetstring)
    if ("" = expression OR "" = targetstring) then
        string_compare=0
        exit function
    end if
    set oReg=new regexp
    oReg.pattern=expression
    oReg.IgnoreCase=TRUE
    if oReg.test(targetstring) then
        string_compare=1
    else
        string_compare=0
    end if
end function


This second script performs the copy and rename using the file list generated by the previous script:

Syntax:

cscript /nologo fix-failedfiles.vbs failedfiles.txt

The script:

' Re-inflate failed stubs: copy each listed file (forcing a recall from the
' secondary storage), then swap the copy in place of the original stub.
' Note: run from a location where the paths in the list resolve (e.g. the
' root of the mapped share), or edit the list to full UNC paths first.
Set objFSO = CreateObject("Scripting.FileSystemObject")

strSourceFile=WScript.Arguments.Item(0)
set objFileStream = objFSO.OpenTextFile (strSourceFile, 1)

do until objFileStream.AtEndOfStream
    strFileName=objFileStream.ReadLine
    on error resume next
    'Copy file to new to force inflate
    objFSO.CopyFile strFileName,strFileName & ".new"
    if err.number > 0 then
        wscript.echo strFileName
    else
        'rename current to .old
        objFSO.MoveFile strFileName,strFileName & ".old"
        'Rename copy back to original file
        objFSO.MoveFile strFileName & ".new",strFileName
        'delete .old
        objFSO.DeleteFile strFileName & ".old"
    end if
    err.clear
loop

If any of the files fail to copy then they will be output to the screen.


Once you’ve fixed the files that failed to recall you can restart the recall process. This time it should complete.

 


Reinflating stubs on the Celerra from secondary storage

After looking around the web I couldn’t see any obvious way to reinflate files from secondary storage back to the primary filesystem on the Celerra.

However, the solution is quite simple: when you delete the DHSM connection from a filesystem you can opt to have the Celerra scan and move all the stubbed data back to the primary storage.

If you’re planning on re-archiving the data to new storage you can do both at the same time.

In this setup we have a Rainfinity, a Centera and CIFS-based archive storage. The aim is to reinflate from the Centera and re-archive to the CIFS storage without a) filling up the primary filesystem or b) auto-extending it.

Here’s an example of a filesystem with a single secondary archive storage (on a Centera in this case, reached via the Rainfinity):

[root@mycelerra bin]# /nas/bin/fs_dhsm -connection myfilesystem -info
myfilesystem:
state                = enabled
offline attr         = on
popup timeout        = 0
backup               = offline
read policy override = none
log file             = on
max log size         = 10MB
 cid                 = 0
   type                 = HTTP
   secondary            = http://myrainfinityserver.mydomain.com/fmroot
   state                = enabled
   read policy override = none
   write policy         =        full
   user                 = rainfinityuser
   options              = httpPort=8000 cgi=n

It goes via the Rainfinity because the Celerra is unable to talk to the Centera directly; with CIFS storage it can, which cuts the Rainfinity out of the chain.

Now to perform the migration:

On the Rainfinity:

1) Create a new policy with the new secondary storage as the destination

2) Disable the existing Rainfinity schedule that archives to the Centera

3) Create a new Rainfinity schedule that archives to the new secondary storage. Select “Capacity Used” as the trigger to start the archiving, and set the percentage about 10% higher than the current filesystem utilization. So if the filesystem is 26% full, set the trigger at about 35 or 40%.

4) Manually run this new schedule against the filesystem. This should automatically create a new cid (so you’ll have two attached to the same filesystem):

[root@mycelerra bin]# /nas/bin/fs_dhsm -connection myfilesystem -info
myfilesystem:
state                = enabled
offline attr         = on
popup timeout        = 0
backup               = offline
read policy override = none
log file             = on
max log size         = 10MB
 cid                 = 0
   type                 = HTTP
   secondary            = http://myrainfinityserver.mydomain.com/fmroot
   state                = enabled
   read policy override = none
   write policy         =        full
   user                 = rainfinityuser
   options              = httpPort=8000 cgi=n
 cid                 = 1
   type                 = CIFS
   secondary            = \\mycifsshare.mydomain.com\mynewarchive$\
   state                = enabled
   read policy override = none
   write policy         =        full
   local_server         = mycelerra.mydomain.com
   admin                = mydomain.com\mycifsuser
   wins                 =

Notice that cid=0 is the old archive storage and cid=1 is the new storage.

Now we can delete the DHSM connection cid=0 with a recall policy of yes, so the data is recalled back from the old secondary storage:

[root@mycelerra bin]# /nas/bin/fs_dhsm -connection myfilesystem -delete 0 -recall_policy yes
myfilesystem:
state                = enabled
offline attr         = on
popup timeout        = 0
backup               = offline
read policy override = none
log file             = on
max log size         = 10MB
 cid                 = 0
   type                 = HTTP
   secondary            = http://myrainfinityserver.mydomain.com/fmroot
   state                = recallonly [ Migration: ON_GOING ]
   read policy override = none
   write policy         =        full
   user                 = rainfinityuser
   options              = httpPort=8000 cgi=n
 cid                 = 1
   type                 = CIFS
   secondary            = \\mycifsshare.mydomain.com\mynewarchive$\
   state                = enabled
   read policy override = none
   write policy         =        full
   local_server         = mycelerra.mydomain.com
   admin                = mydomain.com\mycifsuser
   wins                 =
 Done

As you can see, the “state” of the connection has changed from “enabled” to “recallonly”. This means that no more data will be archived to the old secondary storage and that the stubbed data is being recalled back to the primary. You can check on the status using:

[root@mycelerra bin]# /nas/bin/fs_dhsm -connection myfilesystem -info
myfilesystem:
state                = enabled
offline attr         = on
popup timeout        = 0
backup               = offline
read policy override = none
log file             = on
max log size         = 10MB
 cid                 = 0
   type                 = HTTP
   secondary            = http://myrainfinityserver.mydomain.com/fmroot
   state                = recallonly [ Migration: ON_GOING ]
   read policy override = none
   write policy         =        full
   user                 = rainfinityuser
   options              = httpPort=8000 cgi=n

There are also some log files you can monitor at the root of the filesystem (e.g. \\mycifsshare.mydomain.com\c$\myfilesystem\), named migErr_vdmname_myfilesystem and migLog_vdmname_myfilesystem. The error file contains any filenames that failed to recall; the log file contains a running log of the recall, including errors.
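A rough way to keep an eye on the failure count from a Windows box is to count the matching lines over the administrative share:

findstr /i /c:"I/O error" \\mycifsshare.mydomain.com\c$\myfilesystem\migLog_vdmname_myfilesystem | find /c /v ""

find /c /v "" counts the lines findstr emits, i.e. one per failed file so far.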

Once all the files have been recalled the connection (cid) will be removed. If there is an issue recalling any files the migration status will change to ERROR (meaning there was a problem but the migration is continuing) or FAIL (meaning the migration has hit at least one error and stopped).

As the primary filesystem fills up with the recalled data, the percentage used will grow until it hits the threshold set in the Rainfinity to trigger an archive (40% in our case). Fortunately the archiving process is considerably faster than the recall process, so data will be recalled and then archived repeatedly until all of it has been moved from one secondary storage to the other.

If a user accesses a file on the secondary storage while the recall is running, that access will itself trigger the file to be recalled back to the primary filesystem.

Obviously how long the process takes will depend on the amount of data and the speed of your disks.


Collecting disk usage data from UNC paths

If you’re not using Windows or Linux devices to present your CIFS shares then collecting usage data can be quite tricky. We ran up against this recently while using EMC Celerra devices to present our shares.

The solution we came up with was to mount the UNC path on the WhatsUp box, query the drive mount, then disconnect. As you can imagine this is quite costly, and if you’re collecting from multiple CIFS shares the monitors can clash and try to use the same drive letter. To get around this we added some randomization and a check to see if a letter is free:

'Pick a random drive letter between F (ASCII 70) and Y (ASCII 89)
function random_driveletter()
	strReturn="T"
	Randomize
	intRandom=int(Rnd()*20)
	strReturn=CHR(70+intRandom)
	random_driveletter=strReturn
end function

'Check whether a drive letter is already in use
function driveexists(strtmpDrive)
	boolReturn=0
	Set objtmpFileSys = CreateObject("Scripting.FileSystemObject")
	If objtmpFileSys.DriveExists(strtmpDrive) Then
		boolReturn=1
	End If
	driveexists=boolReturn
end function

Drive letters in this case are chosen from between F (ASCII character 70) and Y (ASCII character 89), and we cycle until we get a free letter. We’ve also added a check so that if we cycle through too many times without finding one, the script tries to clean up all the drive letters rather than looping until it times out:

numAttempts=0
do
	strDrive=random_driveletter() & ":"
	context.logmessage strDrive
	numAttempts = numAttempts+1
	if numAttempts > intCleanupThreshold then
		cleanupdriveletters()
	end if
loop while driveexists(strDrive)

The script uses the “DisplayName” field in the WhatsUp device to get the UNC path, so you’ll need to set up a new device per share (or at least per filesystem). To get the DisplayName field we query the WhatsUp database:

function getDisplayNamefromID(strtmpDeviceID)
	dim strReturn
	' Get the DB instance used by WhatsUp
	set objDatabase = Context.GetDB
	' Check it worked OK
	if "" = objDatabase then
		Context.SetResult  1, "Problem connecting to database"
	else
		' Look up the device's display name in the WhatsUp Device table
		strQuery = "SELECT sDisplayName FROM  [WhatsUp].[dbo].[Device] where nDeviceID=" & strtmpDeviceID
		objResultSet =  objDatabase.Execute(strQuery)
		strReturn = objResultSet(0)
	end if
	getDisplayNamefromID=strReturn
end function

Then use

UNCpath=getDisplayNamefromID(Context.GetProperty("DeviceID"))

to get the path to map. This way we can create a single performance monitor script that is used on many shares; the advantage is that we can then use it in Alert Center to create a single threshold configuration covering all our CIFS shares.

Here’s the full script:

intCleanupThreshold=5

ipAddress=Context.GetProperty("Address")
UNCpath=getDisplayNamefromID(Context.GetProperty("DeviceID"))

' Get the Windows credentials for the device
strWindowsUsername = Context.GetProperty("CredWindows:DomainAndUserid")
strWindowsPassword = Context.GetProperty("CredWindows:Password")

strComputer="."
strDriveMap=UNCpath

'Timestamp
startTime = Timer()

numAttempts=0

do
	strDrive=random_driveletter() & ":"
	context.logmessage strDrive
	numAttempts = numAttempts+1
	if numAttempts > intCleanupThreshold then
		cleanupdriveletters()
	end if
loop while driveexists(strDrive)

context.logmessage strDrive & " " & strDriveMap & " took " & numAttempts & " attempts to get a free letter"

startMapTime = Timer()

Set objNetwork = CreateObject("WScript.Network")

numAttempts=0
do
	err.clear
	on error resume next
	objNetwork.MapNetworkDrive strDrive, strDriveMap,0,strWindowsUsername,strWindowsPassword
	tmpStatus=err.number
	if tmpStatus <> 0 then
		context.logmessage err.Description & " mapping drive " & strDrive & " to " & strDriveMap
		cleanupdriveletters()
	end if
	context.logmessage "err.num=" & tmpStatus
	numAttempts=numAttempts+1
	if numAttempts > intCleanupThreshold then exit do
loop while tmpStatus <> 0

endMapTime = Timer()
intMapDuration=int((endMapTime-startMapTime)*1000)
context.logmessage "Took " & intMapDuration & "ms to map " & strDrive & " to " & strDriveMap

Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
Set colDisks = objWMIService. _
    ExecQuery("Select * from Win32_MappedLogicalDisk  where Caption = """ & strDrive & """")

For Each objDisk In colDisks
	floatPercUsed=percentage_used(objDisk.Size,objDisk.FreeSpace)
	endTime=Timer()
	intDuration=int((endTime-startTime)*1000)
	context.setvalue floatPercUsed
Next

startMapTime = Timer()
objNetwork.RemoveNetworkDrive strDrive
endMapTime = Timer()
intMapDuration=int((endMapTime-startMapTime)*1000)
context.logmessage "Took " & intMapDuration & "ms to unmap " & strDrive & " from " & strDriveMap

function percentage_used(strDiskSize,strFreeSpace)
	floatReturn=0
	floatReturn=100-Round((strFreeSpace/strDiskSize)*100,1)
	percentage_used=floatReturn
end function

'Pick a random drive letter between F (ASCII 70) and Y (ASCII 89)
function random_driveletter()
	strReturn="T"
	Randomize
	intRandom=int(Rnd()*20)
	strReturn=CHR(70+intRandom)
	random_driveletter=strReturn
end function

'Check whether a drive letter is already in use
function driveexists(strtmpDrive)
	boolReturn=0
	Set objtmpFileSys = CreateObject("Scripting.FileSystemObject")
	If objtmpFileSys.DriveExists(strtmpDrive) Then
		boolReturn=1
	End If
	driveexists=boolReturn
end function

function cleanupdriveletters()
	context.logmessage "Cleaning up driveletters to free space"
	Set objtmpNetwork = CreateObject("WScript.Network")
	for i=70 to 89 step 1
		tmpDriveLetter=CHR(i) & ":"
		context.logmessage "Processing letter " & tmpDriveLetter
		on error resume next
		objtmpNetwork.RemoveNetworkDrive tmpDriveLetter
		if err.number then
			context.logmessage tmpDriveLetter & " (" & Replace(Replace(err.description, CHR(13),""),CHR(10),"") & ")"
			err.clear
		else
			context.logmessage "Removed " & tmpDriveLetter
		end if
	next
end function

function getDisplayNamefromID(strtmpDeviceID)
	dim strReturn

	' Get the DB instance used by WhatsUp
	set objDatabase = Context.GetDB

	' Check it worked OK
	if "" = objDatabase then
		Context.SetResult  1, "Problem connecting to database"
	else
		'context.logmessage "Connected to DB OK"

		' Look up the device's display name in the WhatsUp Device table
		strQuery = "SELECT sDisplayName FROM  [WhatsUp].[dbo].[Device] where nDeviceID=" & strtmpDeviceID
		objResultSet =  objDatabase.Execute(strQuery)
		strReturn = objResultSet(0)
	end if
	getDisplayNamefromID=strReturn
end function