## “Shrinking” a Celerra filesystem

There’s no easy way to shrink a filesystem on the Celerra: once a filesystem has been created (or has grown) it can’t be shrunk. However, you can work around this with a very short outage by creating a copy and then swapping the two over via re-mounting. Here’s how to do it.

Let’s assume we have a filesystem like this one:

• Filesystem Name = myfilesystem
• Filesystem Size = 300GB of pre-allocated space
• Data Type = file data shared via CIFS totalling 50GB in size
• DHSM enabled with data archived to secondary filesystem (CIFS in this case)

In this case we have 250GB of wasted space in our filesystem. The goal is to shrink this down to 100GB of pre-allocated space and have 50% file usage on the filesystem.
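The sizing arithmetic is straightforward; as a sketch (using the hypothetical numbers from the example above):

```python
def target_fs_size_gb(data_gb, desired_utilization):
    """Pre-allocated size needed so data_gb fills desired_utilization of it."""
    return data_gb / desired_utilization

allocated_gb = 300  # current pre-allocated size
data_gb = 50        # actual CIFS data on the filesystem

wasted_gb = allocated_gb - data_gb             # 250 GB of slack
new_size_gb = target_fs_size_gb(data_gb, 0.5)  # 100 GB at 50% usage

print(wasted_gb, new_size_gb)
```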

1] Create a new filesystem 100GB in size (the target size) = myfilesystem_new

2] Use a copy utility to copy the data from the old to the new filesystem. If you’re using DHSM you’ll want to use emcopy which by default will only copy the stubs.

3] Delete all checkpoints and checkpoint schedules on both filesystems (you need to do this before it’ll let you rename the filesystem).

4] SSH onto the Celerra and type the following to unmount both the myfilesystem and myfilesystem_new filesystems:

/nas/bin/server_umount vdmname -p myfilesystem
/nas/bin/server_umount vdmname -p myfilesystem_new

As soon as you do this the share and filesystem are offline until you remount them. This should only be a few seconds.

5] Swap the names of the filesystems around

/nas/bin/nas_fs -rename myfilesystem myfilesystem_old
/nas/bin/nas_fs -rename myfilesystem_new myfilesystem

6] Remount the filesystems now that they’ve been renamed:

/nas/bin/server_mount vdmname myfilesystem myfilesystem
/nas/bin/server_mount vdmname myfilesystem_old myfilesystem_old

Now the filesystem is back online and you can test it is working OK by browsing to the share.

7] Delete the leftover mountpoint which is no longer used (the one from the new filesystem’s original name):

/nas/bin/server_mountpoint vdmname -delete myfilesystem_new

What we’re left with is myfilesystem_new now serving as the production filesystem under the name myfilesystem. The old filesystem is now myfilesystem_old, which can be deleted once you’re happy all the data is there. The umount, rename and remount steps should take about 10 seconds in total.

You can even do a final check that the data is OK and there weren’t any last-minute writes by comparing \\mycifsshare\c$\myfilesystem with \\mycifsshare\c$\myfilesystem_old.
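If you’d rather script that final check than eyeball it, something like this hypothetical sketch can diff the two trees by relative path and file size (point it at the two UNC paths above):

```python
import os

def snapshot(root):
    """Map each file's path relative to root to its size in bytes."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            files[os.path.relpath(full, root)] = os.path.getsize(full)
    return files

def compare_trees(old_root, new_root):
    """Return relative paths missing from new_root, and paths whose sizes differ."""
    old, new = snapshot(old_root), snapshot(new_root)
    missing = sorted(set(old) - set(new))
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    return missing, changed
```

Two empty lists back means nothing was written to the old filesystem after the copy.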


## recallonly [ Migration: FAIL ] and recallonly [ Migration: ERROR ]

If you’re recalling data from an archive filesystem (see previous post) and you get one of these errors:

state                = recallonly [ Migration: FAIL ]

state                = recallonly [ Migration: ERROR ]

then you have at least one file that failed to recall back to the primary storage.

To view which files failed you’ll need to consult the logs, which you can find at the root of the filesystem (e.g. \\mycifsshare.mydomain.com\c$\myfilesystem). The files will be named migErr_vdmname_myfilesystem and migLog_vdmname_myfilesystem and contain, respectively, a list of the files that failed and the log for the recall. When you only have a few files listed it’s fairly easy to find the name of the file that failed in migLog by opening it in a text editor and searching for the string “I/O Error”:

```
Chdir to /myvdm/myfilesystem/myfolder/myfolder
Migrating directory myfolder...Mon Jan 1 00:00:00 2009
creating sub-directory myfolder...Mon Jan 1 00:00:00 2009
migrating file myfilename.doc...Mon Jan 1 00:00:00 2009
migrating file myfilename.doc failed at read last byte: I/O error
```

The filepath in this case is \myfolder\myfolder\myfilename.doc.

The recall can fail because of an orphan stub (a stub with no data on the secondary storage). This will need a restore from your backups to get that file back. Sometimes the recall fails but the file is on the secondary storage. One way to force the recall is to copy the file and rename it, i.e.:

1) Right-click the stubbed file and copy it
2) Paste the file into the same directory to get a “Copy of myfilename.doc”
3) Delete or rename the stub file
4) Rename the “Copy of myfilename.doc” back to myfilename.doc

This manual workaround works fine when you have a small number of files but will quickly become a chore if you have tens, hundreds or even thousands of files to perform this trick on. So here’s a couple of VBScript scripts to help parse the migLog and perform the copy-and-rename task. Syntax:

cscript /nologo parse-migLog.vbs path\to\migLog > failedfiles.txt

For large log files it might help to grep out some of the content first to speed the parsing up:

grep -i -E "I\/O error|ChDir" migLog_vdmname_myfilesystem > migLog-grepped.txt

which will trim out all the files copied successfully, leaving just the directory changes and the failed copy lines.
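For illustration, the same parsing logic (track the current directory from the “Chdir” lines, then pull the filename out of each “I/O error” line) can be sketched in Python; the VBScript version the post actually uses follows below:

```python
import re

def parse_miglog(lines):
    """Return the full path of each file that failed with an I/O error."""
    failed = []
    current_path = ""
    for line in lines:
        if line.startswith("Chdir to "):
            current_path = line[len("Chdir to "):].strip()
        elif "I/O error" in line:
            m = re.search(r"migrating file (.+?) failed", line)
            if m:
                failed.append((current_path + "/" + m.group(1)).replace("/", "\\"))
    return failed
```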
The script itself:

```vbscript
Set objFSO = CreateObject("Scripting.FileSystemObject")
strSourceFile=WScript.Arguments.Item(0)
Set objFileStream = objFSO.OpenTextFile(strSourceFile, 1)
strCurrPath=""
Do Until objFileStream.AtEndOfStream
    strLine=objFileStream.ReadLine
    If string_compare("Chdir",strLine) Then strCurrPath=get_PathFromChdirLine(strLine)
    If string_compare("I/O error",strLine) Then
        strFilePath=LTrim(RTrim(replaceFSwithBS(strCurrPath & "\" & get_FileNameFromMigratingLine(strLine))))
        WScript.Echo strFilePath
    End If
Loop

'Chdir
Function get_PathFromChdirLine(strtmpLine)
    strReturn=""
    strReturn=Mid(strtmpLine,10,Len(strtmpLine))
    get_PathFromChdirLine=strReturn
End Function

Function get_FileNameFromMigratingLine(strtmpLine)
    strReturn=""
    intStringLength=Len(strtmpLine)
    strReturn=Mid(strtmpLine,16,intStringLength)
    strReturn=Left(strReturn,InStrRev(strReturn," failed"))
    get_FileNameFromMigratingLine=strReturn
End Function

Function replaceFSwithBS(strtmp)
    strReturn=strtmp
    strReturn=Replace(strtmp,"/","\")
    replaceFSwithBS=strReturn
End Function

'Compare a target string to a regular expression
Private Function string_compare(expression,targetstring)
    Set oReg = New RegExp
    oReg.Pattern=expression
    oReg.IgnoreCase = True
    If ("" = expression Or "" = targetstring) Then
        boolSearchResult=0
    End If
    If oReg.Test(targetstring) Then
        boolSearchResult=1
    Else
        boolSearchResult=0
    End If
    string_compare=boolSearchResult
End Function
```

Download the script (rename to a zip)

This second script performs the copy and rename using the file list generated by the previous script. Syntax:

cscript /nologo fix-failedfiles.vbs failedfiles.txt

The script:

```vbscript
Set objFSO = CreateObject("Scripting.FileSystemObject")
strSourceFile=WScript.Arguments.Item(0)
Set objFileStream = objFSO.OpenTextFile(strSourceFile, 1)
Do Until objFileStream.AtEndOfStream
    strFileName=objFileStream.ReadLine
    On Error Resume Next
    'Copy file to .new to force the inflate
    objFSO.CopyFile strFileName,strFileName & ".new"
    If Err.Number > 0 Then
        WScript.Echo strFileName
    Else
        'Rename current to .old
        objFSO.MoveFile strFileName,strFileName & ".old"
        'Rename copy back to the original file name
        objFSO.MoveFile strFileName & ".new",strFileName
        'Delete the .old
        objFSO.DeleteFile strFileName & ".old"
    End If
    Err.Clear
Loop
```

If any of the files fail to copy then they will be output to the screen.

Download the script (rename to a zip)

Once you’ve fixed the files that failed to recall you can restart the recall process. This time it should complete.

## Reinflating stubs on the Celerra from secondary storage

After looking around the web I couldn’t see any obvious way to reinflate files on a secondary filesystem back to the primary on the Celerra. However, the solution is quite simple: when you delete the dhsm connection from a filesystem you can opt to have the Celerra scan and move all the stubbed data back to the primary storage. If you’re planning on re-archiving the data to new storage you can do both at the same time.

In this setup we have a Rainfinity, a Centera and a CIFS-based archive storage. The aim is to reinflate from the Centera and re-archive back to the CIFS storage without a) filling up the primary storage filesystem or b) auto-extending the primary filesystem.

Here’s an example of a filesystem with a single secondary archive storage (on a Centera in this case, which loops through the Rainfinity):

```
[root@mycelerra bin]# /nas/bin/fs_dhsm -connection myfilesystem -info
myfilesystem:
state                = enabled
offline attr         = on
popup timeout        = 0
backup               = offline
read policy override = none
log file             = on
max log size         = 10MB
 cid                 = 0
 type                = HTTP
 secondary           = http://myrainfinityserver.mydomain.com/fmroot
 state               = enabled
 read policy override = none
 write policy        = full
 user                = rainfinityuser
 options             = httpPort=8000 cgi=n
```

It loops through the Rainfinity as the Celerra is unable to talk to the Centera directly; with CIFS storage it can, cutting the Rainfinity out of the chain.
Now to perform the migration. On the Rainfinity:

1) Create a new policy with the new secondary storage as the destination
2) Disable the existing Rainfinity schedule that archives to the Centera
3) Create a new Rainfinity schedule that archives to the new secondary storage. Select “Capacity Used” as the trigger to start the archiving. You’ll want to set the % about 10% larger than the current filesystem utilization, so if the filesystem is 26% full then set the trigger at about 35 or 40%.
4) Manually run this new schedule against the filesystem.

This should automatically create a new cid (so you’ll have two attached to the same filesystem):

```
[root@mycelerra bin]# /nas/bin/fs_dhsm -connection myfilesystem -info
myfilesystem:
state                = enabled
offline attr         = on
popup timeout        = 0
backup               = offline
read policy override = none
log file             = on
max log size         = 10MB
 cid                 = 0
 type                = HTTP
 secondary           = http://myrainfinityserver.mydomain.com/fmroot
 state               = enabled
 read policy override = none
 write policy        = full
 user                = rainfinityuser
 options             = httpPort=8000 cgi=n
 cid                 = 1
 type                = CIFS
 secondary           = \\mycifsshare.mydomain.com\mynewarchive$\
 state               = enabled
 write policy        = full
 local_server        = mycelerra.mydomain.com
 wins                =
```

Notice that cid = 0 is the old archive storage and cid = 1 is the new storage.
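If you want to check the connection states programmatically, the -info output is easy to post-process; a hypothetical Python sketch that splits it into one record per cid:

```python
def parse_dhsm_info(text):
    """Split fs_dhsm -connection <fs> -info output into per-cid dicts."""
    connections = []
    current = None
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip the "myfilesystem:" header and blank lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "cid":
            current = {"cid": value}
            connections.append(current)
        elif current is not None:
            current[key] = value   # fields before the first cid are global, ignored here
    return connections
```

For example, you could loop over the result and flag any connection whose state isn’t “enabled”.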

Now we can delete the dhsm connection cid = 0 with the recall policy set to yes, which recalls the data back from the old secondary storage:

```
[root@mycelerra bin]# /nas/bin/fs_dhsm -connection myfilesystem -delete 0 -recall_policy yes
myfilesystem:
state                = enabled
offline attr         = on
popup timeout        = 0
backup               = offline
log file             = on
max log size         = 10MB
 cid                 = 0
 type                = HTTP
 secondary           = http://myrainfinityserver.mydomain.com/fmroot
 state               = recallonly [ Migration: ON_GOING ]
 write policy        = full
 user                = rainfinityuser
 options             = httpPort=8000 cgi=n
 cid                 = 1
 type                = CIFS
 secondary           = \\mycifsshare.mydomain.com\mynewarchive$\
 state               = enabled
 read policy override = none
 write policy        = full
 local_server        = mycelerra.mydomain.com
 admin               = mydomain.com\mycifsuser
 wins                =
Done
```

As you can see, the “state” of the connection has changed from “enabled” to “recallonly”. This means that no more data will be archived to the old secondary and that the stubbed data is being recalled back to the primary. You can check on the status using:

```
[root@mycelerra bin]# /nas/bin/fs_dhsm -connection myfilesystem -info
myfilesystem:
state                = enabled
offline attr         = on
popup timeout        = 0
backup               = offline
read policy override = none
log file             = on
max log size         = 10MB
 cid                 = 0
 type                = HTTP
 secondary           = http://myrainfinityserver.mydomain.com/fmroot
 state               = recallonly [ Migration: ON_GOING ]
 read policy override = none
 write policy        = full
 user                = rainfinityuser
 options             = httpPort=8000 cgi=n
```

There are also some log files you can monitor at the root of the filesystem (e.g. \\mycifsshare.mydomain.com\c$\myfilesystem\), named migErr_vdmname_myfilesystem and migLog_vdmname_myfilesystem. The error file will contain any filenames which have failed to be recalled; the log file contains a running log of the recall, including errors.

Once all the files have been recalled the connection (cid) will be removed. If there is an issue recalling any files the migration status will change to ERROR (meaning there was a problem but the migration is continuing) or FAIL (meaning that the migration has had at least one error and stopped).

As the primary filesystem fills up with the recalled data, the % used will grow until it hits the threshold set in Rainfinity to trigger an archive (40% in our case). Fortunately the archiving process is considerably faster than the recall process, so the data will be recalled then archived repeatedly until all of it has been moved from one secondary storage to the other.
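The trigger maths from step 3 above can be expressed as a tiny helper; the 10% headroom and 95% cap here are assumptions of mine, not Rainfinity requirements:

```python
def archive_trigger_percent(current_utilization, headroom=10, cap=95):
    """Suggest a Capacity Used trigger roughly `headroom` points above current usage."""
    return min(current_utilization + headroom, cap)

# e.g. a filesystem at 26% full -> trigger around 36%
```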

If a user accesses a file on the secondary storage while it is being recalled, that access itself will trigger the file to be recalled back to the primary filesystem.

Obviously how long the process takes will depend on the amount of data and the speed of your disks.


## Collecting disk usage data from UNC paths

If you’re not using Windows or Linux devices to present your CIFS shares then collecting usage data can be quite tricky. We ran up against this recently while using EMC Celerra devices to present our shares.

The solution we came up with was to mount the UNC path on the WhatsUp box, query the drive mount, then disconnect. As you can imagine this is quite costly, and if you’re collecting from multiple CIFS shares the monitors can clash and try to use the same drive letter. To get around this we added some randomization and a check to see if a letter is free:

function random_driveletter()
strReturn="T" 'fallback letter only; the caller appends the colon
Randomize
intRandom=(int(Rnd()*19))
strReturn=CHR(70+intRandom)
'context.logmessage "CHAR=" & strReturn
random_driveletter=strReturn
end function

function driveexists(strtmpDrive)
boolReturn=0
Set objtmpFileSys = CreateObject("Scripting.FileSystemObject")
If objtmpFileSys.DriveExists(strtmpDrive) Then
boolReturn=1
'context.logmessage "Drive is already in use."
End If
driveexists=boolReturn
end function


Drive letters in this case are chosen from between F (ASCII character 70) and X (ASCII character 88). We then cycle until we get a free letter. We’ve also added in some extra checking: if we cycle through too many times it will try to clean up all the drive letters, so the script shouldn’t keep cycling until it times out:

numAttempts=0
do
strDrive=random_driveletter() & ":"
context.logmessage strDrive
numAttempts = numAttempts+1
if numAttempts > intCleanupThreshold then
cleanupdriveletters()
end if
loop while driveexists(strDrive)

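The loop above boils down to: pick a random candidate, retry while it’s taken, and fall back to a cleanup pass after too many attempts. The same idea sketched in Python (names and the cleanup callback are hypothetical):

```python
import random
import string

def pick_free_drive_letter(in_use, cleanup, max_attempts=5):
    """Pick a random letter from F..X that isn't in in_use, cleaning up if stuck."""
    attempts = 0
    while True:
        letter = random.choice(string.ascii_uppercase[5:24])  # F..X, as in the VBScript
        if letter not in in_use:
            return letter + ":"
        attempts += 1
        if attempts > max_attempts:
            cleanup(in_use)  # e.g. disconnect stale mappings to free letters up
```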

The script uses the “DisplayName” field of the WhatsUp device to get the UNC path, so you’ll need to set up a new device per share (or at least per filesystem). To get the DisplayName field we query the WhatsUp database:

function getDisplayNamefromID(strtmpDeviceID)
dim strReturn
' Get the DB instance used by WhatsUp
set objDatabase = Context.GetDB
' Check it worked OK
if "" = objDatabase then
Context.SetResult  1, "Problem connecting to database"
else
' Look up the display name for this device in the Device table
strQuery = "SELECT sDisplayName FROM  [WhatsUp].[dbo].[Device] where nDeviceID=" & strtmpDeviceID
objResultSet =  objDatabase.Execute(strQuery)
strReturn = objResultSet(0)
end if
getDisplayNamefromID=strReturn
end function

Then use

UNCpath=getDisplayNamefromID(Context.GetProperty("DeviceID"))

to get the path to use to map. This way we can create a single performance monitor script that is used on many shares. The advantage of this is that we can use it in Alert Center to create a single threshold configuration that includes all our CIFS shares.

Here’s the full script:

intCleanupThreshold=5

UNCpath=getDisplayNamefromID(Context.GetProperty("DeviceID"))

' Get the Windows credentials for the device

strComputer="."
strDriveMap=UNCpath

'Timestamp
startTime = Timer()

numAttempts=0

do
strDrive=random_driveletter() & ":"
context.logmessage strDrive
numAttempts = numAttempts+1
if numAttempts > intCleanupThreshold then
cleanupdriveletters()
end if
loop while driveexists(strDrive)

context.logmessage strDrive & " " & strDriveMap & " took " & numAttempts & " attempts to get a free letter"

startMapTime = Timer()

Set objNetwork = CreateObject("WScript.Network")

numAttempts=0
do
err.clear
on error resume next
'Map the drive (MapNetworkDrive also accepts optional username/password arguments if needed)
objNetwork.MapNetworkDrive strDrive, strDriveMap
tmpStatus=err.number
if tmpStatus <> 0 then
context.logmessage err.Description & " mapping drive " & strDrive & " to " & strDriveMap
cleanupdriveletters()
end if
context.logmessage "err.num=" & tmpStatus
numAttempts=numAttempts+1
if numAttempts > intCleanupThreshold then exit do
loop while tmpStatus <> 0

endMapTime = Timer()
intMapDuration=int((endMapTime-startMapTime)*1000)
context.logmessage "Took " & intMapDuration & "ms to map " & strDrive & " to " & strDriveMap

Set objWMIService = GetObject("winmgmts:" _
& "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
Set colDisks = objWMIService. _
ExecQuery("Select * from Win32_MappedLogicalDisk  where Caption = """ & strDrive & """")

For Each objDisk In colDisks
floatPercUsed=percentage_used(objDisk.Size,objDisk.FreeSpace)
endTime=Timer()
intDuration=int((endTime-startTime)*1000)
context.setvalue floatPercUsed
Next

startMapTime = Timer()
objNetwork.RemoveNetworkDrive strDrive
endMapTime = Timer()
intMapDuration=int((endMapTime-startMapTime)*1000)
context.logmessage "Took " & intMapDuration & "ms to unmap " & strDrive & " from " & strDriveMap

function percentage_used(strDiskSize,strFreeSpace)
floatReturn=0
floatReturn=100-Round((strFreeSpace/strDiskSize)*100,1)
percentage_used=floatReturn
end function

function random_driveletter()
strReturn="T" 'fallback letter only; the caller appends the colon
Randomize
intRandom=(int(Rnd()*19))
'context.logmessage intRandom

strReturn=CHR(70+intRandom)
'context.logmessage "CHAR=" & strReturn
random_driveletter=strReturn
end function

function driveexists(strtmpDrive)
boolReturn=0
Set objtmpFileSys = CreateObject("Scripting.FileSystemObject")
If objtmpFileSys.DriveExists(strtmpDrive) Then
boolReturn=1
'context.logmessage "Drive is already in use."
End If

driveexists=boolReturn
end function

function cleanupdriveletters()
context.logmessage "Cleaning up driveletters to free space"
Set objtmpNetwork = CreateObject("WScript.Network")
for i=70 to 89 step 1
tmpDriveLetter=CHR(i) & ":"
context.logmessage "Processing letter " & tmpDriveLetter
on error resume next
objtmpNetwork.RemoveNetworkDrive tmpDriveLetter
if err.number then
context.logmessage tmpDriveLetter & " (" & Replace(Replace(err.description, CHR(13),""),CHR(10),"") & ")"
err.clear
else
context.logmessage "Removed " & tmpDriveLetter
end if
next
end function

function getDisplayNamefromID(strtmpDeviceID)
dim strReturn

' Get the DB instance used by WhatsUp
set objDatabase = Context.GetDB

' Check it worked OK
if "" = objDatabase then
Context.SetResult  1, "Problem connecting to database"
else
'context.logmessage "Connected to DB OK"

' Look up the display name for this device in the Device table
strQuery = "SELECT sDisplayName FROM  [WhatsUp].[dbo].[Device] where nDeviceID=" & strtmpDeviceID
objResultSet =  objDatabase.Execute(strQuery)
strReturn = objResultSet(0)
end if
getDisplayNamefromID=strReturn
end function