Feb 6 11

/dev/sd to LUN number mapping

by jprice

Occasionally, I need to figure out which LUN number corresponds to which /dev/sd?? device.

You need this in a SAN environment when you need to add/remove a LUN from a given host… it helps to know which device name you actually need to vacate.

This works for Redhat 4 and 5:

ls -l /sys/block/sd*/device | grep ':{lun number}$'

ls -l /sys/block/sd*/device | grep ':43$'
would give (in my case, with 43 LUNs mapped across two different HBAs):

lrwxrwxrwx 1 root root 0 Feb 4 01:59 /sys/block/sdas/device -> ../../devices/pci0000:00/0000:00:0a.0/0000:0e:00.0/host5/target5:0:0/5:0:0:43
lrwxrwxrwx 1 root root 0 Feb 4 01:59 /sys/block/sdck/device -> ../../devices/pci0000:00/0000:00:0a.0/0000:0e:00.1/host6/target6:0:0/6:0:0:43
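
If you want just the device names rather than the full ls output, the same lookup can be sketched as a small shell function. This is only a sketch: the function name lun_to_dev and its optional second argument (an alternate directory to scan, handy for trying it out away from a real /sys) are my own inventions, and it assumes the symlink target ends in host:channel:target:lun as in the output above.

```shell
# Print the sd* device names whose SCSI address ends in the given LUN.
# $1 = LUN number in base 10
# $2 = directory to scan (hypothetical convenience; defaults to /sys/block)
lun_to_dev() {
  lun="$1"
  blockdir="${2:-/sys/block}"
  for dev in "$blockdir"/sd*; do
    # Each block device has a "device" symlink pointing at its SCSI address.
    [ -L "$dev/device" ] || continue
    target=$(readlink "$dev/device")
    case "$target" in
      *:"$lun") basename "$dev" ;;   # matches ...:43 but not ...:430 or ...:4
    esac
  done
}

# Usage: lun_to_dev 43
```

Matching on the end of the symlink target also avoids false hits on timestamps in the ls output.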

Note: The lun numbers are in base 10 (on most arrays, lun numbers are given in base 16).
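
If the array reports the LUN in hex, the shell can do the conversion for you. A minimal sketch (the function name hex_lun_to_dec is made up for illustration):

```shell
# Convert a hex LUN number (with or without a leading 0x) to base 10.
hex_lun_to_dec() {
  printf '%d\n' "0x${1#0x}"
}

hex_lun_to_dec 2B   # prints 43
```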

Dec 15 10

lftp: awesome.

by jprice

Hopefully, no one out there has to deal with FTP anymore. It blows for reasons I won’t bother to belabor.

But if you need an FTP client which doesn’t suck, and gets out of your way, take a long look at lftp. It just works. The scripting ability is baked in and sensible.

Take a look

Dec 7 10

Linux LVM details

by jprice

Post one of many.

From what I can tell, there’s no way to ask a VG which PVs are in it. That’s disappointing. However, you can run ‘pvs’ and get a list of all PVs in the system, and which VGs they are associated with.

# pvs
  PV                VG               Fmt  Attr PSize   PFree
  /dev/cciss/c0d0p2 localdisks       lvm2 a-   279.25G    0
  /dev/sdan         mssql_backups_vg lvm2 a-    26.00G    0
  /dev/sdao         mssql_backups_vg lvm2 a-    26.00G    0
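
To narrow that list down to a single VG, you can filter the pvs output. A rough sketch, assuming pvs is asked for just the pv_name and vg_name columns; the helper name pvs_in_vg is mine, not part of LVM:

```shell
# Read "PV VG" pairs on stdin and print only the PVs in the named VG.
pvs_in_vg() {
  awk -v vg="$1" '$2 == vg { print $1 }'
}

# Usage (as root):
#   pvs --noheadings -o pv_name,vg_name | pvs_in_vg mssql_backups_vg
```

With the output above, that would print /dev/sdan and /dev/sdao.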

Oct 15 10

Backup media (part 2: disk architecture)

by jprice

So, everyone tries to sell backup to disk. To be fair, there are LOTS of advantages to this method, particularly if you’re coming from an exclusively tape based architecture.

The problem is that every single metric the vendors give you assumes you’re coming from a backup-to-tape architecture that takes 2 sets of full backups every day: one for on-site, one for off-site.

So, some things people will try to sell you on:

1) Storage life cycle policies: i.e., use expensive disk, then cheap disk for storage of backups. Don’t get me wrong: for some applications, these lifecycles can be a huge win[2]. The problem is I think the fundamental premise is flawed. If you’re doing backups over a network, the speed difference between 15k rpm SAS and 7200 rpm SATA drives isn’t that big of a deal[1]. The other problem, at least with NetBackup, is that it doesn’t seem to handle tapes very well.

2) Deduplication: Data Domain was first, then NetApp, now Symantec is in the game with PureDisk. They talk about whether to do the de-duplication ‘inline’ or as a ‘post-process’, etc. What I don’t know (but will be finding out in the next few weeks/months) is this: is it better to take the big wins of ‘radically reduce the frequency of full backups’, or will this + dedup help a lot?

[1] I’m assuming you’re not burning money on SSD’s for backups. If you are, can I come work for you?

[2] Storage life cycles: Meaning you take the backups, and write them to media type x. Then they’re duplicated to media type y after some interval (during the quiet period, after 2 weeks, whatever). There can be arbitrary numbers of media pools, usually referred to as ‘platinum’, ‘gold’, ‘silver’, etc.

Oct 15 10

Backup media

by jprice

The world has changed in backup media, if you’re willing to spend the cash.

In the past, tape has been the undisputed king of backup media. It’s durable, high capacity, reasonably cheap, and it has a long, proven shelf life. Every few years you upgrade the robot/tapes/media, and if you’re smart, you kept read compatibility (i.e. LTO2 -> LTO4 and similar). The mix of benefits and cost means that tapes are almost never going to actually be removed from an enterprise (barring a major tech shift)[1], though the use of tape will be increasingly limited to archival style backups.

But if you get to rebuild backups from the ground up, adding a backup-to-disk component is worth it. Some big reasons for this are 1) Restore time is FAR shorter for the vast majority of your restores. Most restores from disk finish before the tape has loaded into the tape drive, and spooled to the right point. 2) You don’t have to fear restore complexity. This takes some explaining: With tapes, you don’t want to have to deal with more than 1 or at most 2 tapes… so you never want to have to restore a Full, then a level 2 backup on top of the full, then a level 3 backup on top of that (etc)[2]. With Disk, assuming all the versions are still on disk, you can just fire off the restore, and let the system figure it out. It automagically does the restores in the proper order.

These two reasons mean you can radically rethink how you take backups. With tapes, you’re usually taking a full every day, or at best every week, with incrementals filling in the other days.

With Disk, you can take a full backup out to a ‘once a month’ time frame, or even further depending on your data change rate. Take a Level2 backup (Cumulative) every week, and a Level3 (Differential) every day, and you’ve got daily granularity, at a fraction of the storage capacity needed.
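
To put rough numbers on ‘a fraction of the storage capacity’, here is a back-of-the-envelope sketch. The inputs are purely hypothetical (a 1000 GB full, 10 GB of changes per day, a 28 day cycle), and it assumes each day’s changes never overlap the previous day’s, so a cumulative taken N days after the full holds N days of changes. The function name backup_cycle_gb is made up.

```shell
# Total GB stored over one cycle of: 1 full, weekly cumulatives, daily differentials.
# $1 = size of a full backup in GB      (hypothetical input)
# $2 = data changed per day in GB       (hypothetical input)
# $3 = number of days in the cycle      (hypothetical input)
backup_cycle_gb() {
  full_gb=$1
  daily_gb=$2
  days=$3
  total=$full_gb                          # day 0: the one full backup
  day=1
  while [ "$day" -lt "$days" ]; do
    if [ $((day % 7)) -eq 0 ]; then
      total=$((total + day * daily_gb))   # weekly cumulative: everything since the full
    else
      total=$((total + daily_gb))         # daily differential: one day of changes
    fi
    day=$((day + 1))
  done
  echo "$total"
}

backup_cycle_gb 1000 10 28   # prints 1660
echo $((28 * 1000))          # prints 28000, the cost of a full every day instead
```

Under those assumptions the disk scheme stores about 1.7 TB per cycle against 28 TB for daily fulls. Real change rates overlap and vary, so treat the ratio as illustrative only.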

Of course, you can add in interesting technologies like DeDuplication at this point as well, but I’m not fully sold on their list of goods. More on this in the future.

[1] Disk backups are awesome, but at reasonable retention periods (>6 months) you need to move the data to something else. Besides, it’s easy to fit tapes into most DR strategies: ship a cartload of tapes to the DR site, and have an admin party.

[2] Granted, the computer can still solve this for you IF you have all the tapes in the robot. But if you don’t, you’re in a world of pain.

Sep 14 10

Long tcpdump and the -C option

by jprice

Note to the wise: If you’re using the -C option to tcpdump[1], the directory the output file is written to must be writable by the pcap user.

You’ll get an error of:

tcpdump: file: Permission denied

The reason is that tcpdump, by default, drops privileges as soon as it has opened the initial file and the interface. This is cool. But it means that most places you’d want to write a dump file will not work, because the later files are created after the drop. The solution is to ‘chown pcap’ the output directory. Or just use /tmp.

The other way around this is to tell tcpdump NOT to drop privileges, or to drop them in favor of a user who can write to the directory, via the ‘-Z’ option.

[1] Make a new output file after the previous one reaches a certain size, specified in megs
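
A quick pre-flight check before starting a long capture might look like the sketch below. The helper name dir_owned_by, the directory /var/tmp/dumps, the interface eth0 and the 100 MB size are all placeholders of mine; the check uses GNU stat syntax and only looks at ownership, which matches the chown fix above but is not a full writability test.

```shell
# Succeeds if directory $1 is owned by user $2 (GNU stat, as found on Linux).
dir_owned_by() {
  [ "$(stat -c %U "$1")" = "$2" ]
}

# Usage sketch (as root; paths and interface are placeholders):
#   dir_owned_by /var/tmp/dumps pcap || chown pcap /var/tmp/dumps
#   tcpdump -i eth0 -C 100 -w /var/tmp/dumps/trace.pcap
# or keep the directory as-is and pick the user with -Z:
#   tcpdump -i eth0 -C 100 -Z root -w /var/tmp/dumps/trace.pcap
```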

Aug 28 10

Backups: Media choice and de-dup

by jprice

Backups by definition eat a lot of storage space. That’s why tapes ruled the backup universe for so long. Tapes offer a relatively cheap and linear cost as the data flow increases. The only changes in the cost expectation come when you need to replace/add robots, or change media types (say LTO3 -> LTO5). The saving grace is that this can usually be taken care of by capital expenses every 3-5 years, after which costs fall back to a flat $/GB rate.

The other huge advantage? Tapes are designed to be taken offsite easily. Certain companies love selling Backup-to-disk architectures these days, but unless the backups live off site (thus incurring a high speed data link cost), you don’t get the advantages of taking the tapes offsite every day.

The Backup-to-disk people have a strong argument though. 1) Random access. No need to spool the tape. 2) Normal restores (non DR) are a lot faster, especially if you have to do a compound restore[1]. Compound restores for tapes are a massive PITA, but with backup-to-disk, you just click ‘go’, and the software does it for you. 3) Finally, with a Backup-to-disk environment, you could also involve some DeDuplication functionality. But that may be the subject of another post.

It seems to me that backup-to-disk wins out easily at a small scale (less than ~3-5 TB of total backup data). Tape wins out at middle scale, and somewhere up the scale line, you get to a situation where a hybrid approach is the best.

What we’re doing now is a hybrid. Our recent history backups are kept on a disk system at a non-primary site. This gives us the ‘off site backups’ audit check mark. This also means the most frequent restores can be retrieved from disk (meaning it’s fast and easy). After 3 months however, we need to resort to tapes to get the data. We replicate the disk based backup images to tape, and at intervals send them to a third site. I’d love to involve a (Recall|Iron Mountain) type company to take our tapes and protect them for us, but I haven’t been able to get the expense approved.

[1] I may have just made up this term. Say you have a L1 (full), a L2 (cumulative), and a L3 (differential) restore to apply to get the right version of the file. A ‘compound restore’ would be 2 or more restores to get the right revision.

Aug 20 10

NetApp and upgrades

by jprice

What NetApp does well, they do really well. Where they do poorly, they’re boneheaded.

Witness upgrades and ‘disk rightsizing’.

I’m still working through the full upgrade process on my v3140, but upgrading the 2050 was pretty painless… especially considering that the firmware of each disk had to be updated. I’d have expected updating the firmware on all the disks to be a full outage kind of deal, but they made it painless.

Step 1: Place the disk firmware in the right folder on the NetApp.

There is no step 2. The NetApp notices that there’s new firmware out there, notices that drives need it, and applies the firmware to each drive individually. It spins down the drive, updates the firmware, spins it back up, and then applies any IO that the drive missed from cache. That’s awesome. I didn’t have this situation, but apparently it’s even smart enough to do the drives it can in parallel… i.e. if you have multiple aggregates, it’ll do one drive per aggregate at a time.

(my 3140 has a problem with a shelf logging an error that netapp’s not happy with… they won’t sign off on my upgrade until it’s fixed. The problem is fixing it may involve powering down a shelf, which means full unit downtime. I hope that doesn’t happen…)

Jul 30 10

rsync and redhat/centos 5

by jprice

Why on earth is Red Hat 5 using rsync version 2.6.8? Red Hat 5 was released in 2007. rsync version 2.6.9 (newer) was released in 2006. Gah.

There are HUGE advantages in rsync version 3+, most notably, building the file list takes MUCH less time. One dataset that I have went from spending nearly 2 hours at ‘Building file list…’ to about 2 seconds.

That’s a life saver.

(Red Hat 6.0 beta2 has v3.0.6. The latest is v3.0.7, which was released on Dec 31, 2009. That’s decent, I guess, but I’d prefer it to be up to date).

So, if rsync is taking way longer than it should, try upgrading to v3.0.7.
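
To check which side of the 3.x line a box is on, something like this sketch works. It assumes the first line of ‘rsync --version’ looks like ‘rsync  version 3.0.7  protocol version 30’; the helper name rsync_major is mine.

```shell
# Read `rsync --version` output on stdin and print the major version number.
rsync_major() {
  awk 'NR == 1 { v = $3; sub(/^v/, "", v); split(v, parts, "."); print parts[1] }'
}

# Usage:
#   [ "$(rsync --version | rsync_major)" -ge 3 ] || echo "old rsync: slow file list builds"
```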

Jul 29 10

Backups (an overview)

by jprice

Since I’ll be rebuilding/migrating our backup environment for the rest of the year, I’ll probably talk about backups a fair bit.

Philosophically, there are 3 driving needs behind backups. People say backups, when they mean:

1) Disaster recovery. The ‘Our building was just hit by a meteor, and we need to rebuild our environment pronto’ scenario. The requirements for this are fairly straightforward: Backup all the data needed to rebuild quickly if the building isn’t there tomorrow. Very data intensive, but the data grows stale very quickly. You’d want 2, maybe 3 sets worth of full data.

2) User error. The ‘Our DBAs dropped this table 3 weeks ago, but forgot about the monthly report that still requires it’ or ‘The VP of marketing blew away this file 2 months ago… is it in our backups?’ scenarios. This can be fairly data intensive, since you need to keep so many more images than would be required by Disaster Recovery, but you can prune out a portion of the data (the files for the OS are a waste for example… you could also prune lots of transient data). You also need to consider what your retention period is, since that’s the big multiplier for your cost.

3) Archival. ‘The IRS requires you to keep this class of financial records for 15 years’ scenario. The good news is this is usually a very thin subset of your data. The better news is that usually it’s just kept in the various databases, and these applications don’t rely on backups to keep the data long term. The bad news is the retention times are crazy long, and you’re often backing up this 6 and 7 year old data multiple times.

Other topics: Backup to tape, or disks? DeDuplication? Netbackup or someone else?