HowTo: Perl: split a file into multiple files based on the host name embedded as the first “column” of the file

Let’s assume you have a file that you need to split into multiple files based on the host name embedded as the first “column” of the file.

For example, the two rows below would go into the file “hosth.out”:

hosth|US-DBA Team|-rw-rw-rw-, 1, db2, dbagroup, 22977, Mar 29 2009, /dba/db2/sqllib_v95/db2dump/d43.log
hosth|US-DBA Team|-rw-rw-rw-, 1, db2, dbagroup, 1299, Mar 29 2009, /dba/db2/sqllib_v95/db2dump/db2.nfy

There are many ways of doing this, but let’s keep it simple and somewhat modular.

For each line, we call the get_file_handle() subroutine. If a file handle for that host already exists in the %file_handles hash, we return it; otherwise we open a new file handle, store it in the hash, and return it. If the input file were large, we would want to eliminate the subroutine call entirely, because the cost of one call per line adds up.

#!/usr/bin/env perl

use strict;
use warnings;

# we need a hash to store the file handles that we're going to write to
#  if we were using Perl 5.10, we could use a static variable (state $variable) in the get_file_handle() subroutine
our %file_handles = ();

#-------------------
sub get_file_handle {
    my $host = shift;

    # if the file handle already exists, then we don't need to create a new one
    unless (exists ($file_handles{$host})) {
        # open a file handle (hostname.out) to write to and store it in the %file_handles hash
        open($file_handles{$host}, ">", $host . '.out')
            or die "ERROR: Unable to open $host.out: $!\n";
    }

    # return the file handle
    return $file_handles{$host};
}

#-------------------
# read one line at a time
while (my $line = <DATA>) {
    # remove new line from end of $line
    chomp $line;

    # split the line on pipes but only return the first field
    my $host = ( split(/\|/, $line) )[0];

    # retrieve the file handle to write to from the hash based on the host name
    my $fh = get_file_handle($host);

    # write the line to the file handle (files)
    printf $fh "%s\n", $line;
}

__END__
hosth|US-DBA Team|-rw-rw-rw-, 1, db2, dbagroup, 22977, Mar 29 2009, /dba/db2/sqllib_v95/db2dump/d43.log
hosth|US-DBA Team|-rw-rw-rw-, 1, db2, dbagroup, 1299, Mar 29 2009, /dba/db2/sqllib_v95/db2dump/db2.nfy
hostc|UNKNOWN|-rwxrwxrwx, 1, sybase, dbagroup, 113, Jan 25 2012, /dba/code/project/me/MDA/homer/sysmon.sh
hostc|UNKNOWN|-rw-rw-rw-, 1, sybase, dbagroup, 633337, Feb 9 2013, /dba/output/active/uber.dtl.20134
hostd|UNKNOWN|-rwxrwxrwx, 1, sybase, dbagroup, 360, Jan 31 2012, /dba/backup/sybbackup/bcp/build_all_lst.ksh
hostd|UNKNOWN|-rwxrwxrwx, 1, sybase, dbagroup, 156, Feb 1 2012, /dba/code/syb/goober/bcp_central/m5custrpt1.dbo.temporegontrl.in.sh
hostd|UNKNOWN|-rwxrwxrwx, 1, sybase, dbagroup, 168, Feb 1 2012, /dba/code/syb/goober/bcp_central/m5custrpt1.dbo.tempdoclookuptypes.in.sh
hostd|UNKNOWN|-rwxrwxrwx, 1, sybase, dbagroup, 162, Feb 1 2012, /dba/code/syb/goober/bcp_central/m5custrpt1.dbo.tempdocsettings.in.sh
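
The comment above %file_handles mentions Perl 5.10's state variables. As a minimal sketch of that idea, assuming Perl 5.10 or later, the hash can move inside get_file_handle() so that it is no longer a package variable. The three sample rows are made up for illustration and are not part of the data above.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use feature 'state';    # state variables need Perl 5.10 or later

# Return the write handle for "<host>.out", opening the file on first use.
sub get_file_handle {
    my $host = shift;

    # created once and kept between calls, but visible only inside this sub
    state %file_handles;

    unless (exists $file_handles{$host}) {
        open($file_handles{$host}, '>', "$host.out")
            or die "ERROR: Unable to open $host.out: $!\n";
    }

    return $file_handles{$host};
}

# hypothetical sample rows, shortened for illustration
my @lines = (
    "hosta|team one|first row",
    "hostb|team two|second row",
    "hosta|team one|third row",
);

for my $line (@lines) {
    # the host name is the first pipe-delimited field
    my $host = ( split(/\|/, $line) )[0];
    my $fh   = get_file_handle($host);
    print {$fh} "$line\n";
}
```

The trade-off is that nothing outside the subroutine can reach the handles any more, so they are closed only when the program exits.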

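Running the script against the data after __END__ creates three files in the current directory, one per host.

hostc.out: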

hostc|UNKNOWN|-rwxrwxrwx, 1, sybase, dbagroup, 113, Jan 25 2012, /dba/code/project/me/MDA/homer/sysmon.sh
hostc|UNKNOWN|-rw-rw-rw-, 1, sybase, dbagroup, 633337, Feb 9 2013, /dba/output/active/uber.dtl.20134

hostd.out:

hostd|UNKNOWN|-rwxrwxrwx, 1, sybase, dbagroup, 360, Jan 31 2012, /dba/backup/sybbackup/bcp/build_all_lst.ksh
hostd|UNKNOWN|-rwxrwxrwx, 1, sybase, dbagroup, 156, Feb 1 2012, /dba/code/syb/goober/bcp_central/m5custrpt1.dbo.temporegontrl.in.sh
hostd|UNKNOWN|-rwxrwxrwx, 1, sybase, dbagroup, 168, Feb 1 2012, /dba/code/syb/goober/bcp_central/m5custrpt1.dbo.tempdoclookuptypes.in.sh
hostd|UNKNOWN|-rwxrwxrwx, 1, sybase, dbagroup, 162, Feb 1 2012, /dba/code/syb/goober/bcp_central/m5custrpt1.dbo.tempdocsettings.in.sh

hosth.out:

hosth|US-DBA Team|-rw-rw-rw-, 1, db2, dbagroup, 22977, Mar 29 2009, /dba/db2/sqllib_v95/db2dump/d43.log
hosth|US-DBA Team|-rw-rw-rw-, 1, db2, dbagroup, 1299, Mar 29 2009, /dba/db2/sqllib_v95/db2dump/db2.nfy
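
For a large input file, the hash lookup can be done inline so that no subroutine is called per line. The following is only a sketch under a few assumptions that are not in the original script: the input comes from a named file instead of the embedded data, the split_by_host() name and its output directory argument are hypothetical, and the handles are closed explicitly at the end.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Split the pipe-delimited file $input into "<host>.out" files under $out_dir.
# Returns a hash reference with the number of lines written per host.
sub split_by_host {
    my ($input, $out_dir) = @_;
    my (%file_handles, %count);

    open(my $in, '<', $input)
        or die "ERROR: Unable to open $input: $!\n";

    while (my $line = <$in>) {
        chomp $line;
        next unless length $line;    # skip empty lines

        # the limit of 2 stops splitting after the first pipe
        my ($host) = split(/\|/, $line, 2);

        # inline lookup: open a handle only the first time a host is seen
        my $fh = $file_handles{$host};
        unless ($fh) {
            my $path = "$out_dir/$host.out";
            open($fh, '>', $path)
                or die "ERROR: Unable to open $path: $!\n";
            $file_handles{$host} = $fh;
        }

        print {$fh} $line, "\n";
        $count{$host}++;
    }
    close $in;

    # closing explicitly reports write errors such as a full disk
    for my $host (keys %file_handles) {
        close($file_handles{$host})
            or die "ERROR: Unable to close $host.out: $!\n";
    }

    return \%count;
}

# usage: perl split_by_host.pl input.txt [output_dir]
if (@ARGV) {
    my $out_dir = @ARGV > 1 ? $ARGV[1] : '.';
    my $count   = split_by_host($ARGV[0], $out_dir);
    printf "%s: %d line(s)\n", $_, $count->{$_} for sort keys %$count;
}
```

Each distinct host keeps one file open until the end, so an input with a very large number of hosts could run into the operating system's limit on open files.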
