
string - With Perl, how do I read records from a file with two possible record separators?

Question:

Here is what I am trying to do:

I want to read a text file into an array of strings, ending each string whenever a certain character is read (mainly ; or |).

For example, the following text

Would you; please

hand me| my coat?

would be stored like this:

$string[0] = 'Would you;';

$string[1] = ' please hand me|';

$string[2] = ' my coat?';

Could I get some help on something like this?

Answer:

One way is to inject another character, like \n, whenever your special character is found, then split on the \n:

use warnings;
use strict;
use Data::Dumper;

while (<DATA>) {
    chomp;
    s/([;|])/$1\n/g;
    my @string = split /\n/;
    print Dumper(\@string);
}

__DATA__
Would you; please hand me| my coat?

Prints out:

$VAR1 = [
          'Would you;',
          ' please hand me|',
          ' my coat?'
        ];

UPDATE: The original question posed by James showed the input text on a single line, as shown in __DATA__ above. Because the question was poorly formatted, others edited it, breaking the one line into two. Only James knows whether one or two lines were intended.

Answer:

This will do it. The trick to using split while preserving the token you're splitting on is to use a zero-width lookbehind assertion: split(/(?<=[;|])/, ...).

Note: mctylr's answer (currently the top rated) isn't actually correct: it will split fields on newlines, because it only works on a single line of the file at a time.

gbacon's answer using the input record separator ($/) is quite clever--it's both space and time efficient--but I don't think I'd want to see it in production code. Putting one split token in the record separator and the other in the split strikes me as a little too unobvious (you have to fight that with Perl ...) which will make it hard to maintain. I'm also not sure why he's deleting multiple newlines (which I don't think you asked for?) and why he's doing that only for the end of '|'-terminated records.

# open file for reading, die with error message if it fails
open(my $fh, '<', 'data.txt') or die "Cannot open data.txt: $!";

# set file reading to slurp (whole file) mode (note that this affects all
# file reads in the enclosing scope)
local $/ = undef;

my $string = <$fh>;
close $fh;

# convert all newlines into spaces, not specified but as per example output
$string =~ s/\n/ /g;

# split string on ; or |, using a zero-width lookbehind (?<=) to preserve the char
my (@strings) = split(/(?<=[;|])/, $string);
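
For anyone who wants to try the lookbehind split without creating data.txt, here is a minimal self-contained sketch. The helper name split_records and the in-memory filehandle are assumptions made for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): split text into records that
# each end with ';' or '|', keeping the terminator on the record.
sub split_records {
    my ($text) = @_;
    $text =~ s/\n/ /g;                  # join lines, as in the example output
    return split /(?<=[;|])/, $text;    # zero-width lookbehind keeps the separator
}

# An in-memory filehandle stands in for data.txt so the sketch runs as-is.
my $data = "Would you; please\nhand me| my coat?";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };      # slurp mode, limited to this do-block
close $fh;

print "[$_]\n" for split_records($text);
```

Wrapping the slurp in a do-block keeps the change to $/ from leaking into later reads.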

Answer:

I prefer @toolic's answer because it deals with multiple separators very easily.

However, if you wanted to overly complicate things, you could always try:

#!/usr/bin/perl

use strict; use warnings;

my @contents = ('');

while ( my $line = <DATA> ) {
    last unless $line =~ /\S/;
    $line =~ s{$/}{ };
    if ( $line =~ /^([^|;]+[|;])(.+)$/ ) {
        $contents[-1] .= $1;
        push @contents, $2;
    }
    else {
        $contents[-1] .= $line;  # no separator on this line: append the whole line
    }
}

print "[$_]\n" for @contents;

__DATA__
Would you; please
hand me| my coat?

Answer:

Something along the lines of

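$text = <INPUTFILE>;

@string = split(/[;|]/, $text);

should do the trick, more or less. Note that <INPUTFILE> in scalar context reads only one line, and that split discards the separator characters.

Edit: I've changed "/;|/" to "/[;|]/".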
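
A plain split on a character class consumes the separators, whereas the question wants them kept. As a rough sketch of the difference (the sub names below are made up for illustration), wrapping the class in capturing parentheses makes split return the separators as extra fields:

```perl
use strict;
use warnings;

# Hypothetical helpers for comparison only.
sub split_plain     { my ($text) = @_; return split /[;|]/,   $text; }
sub split_capturing { my ($text) = @_; return split /([;|])/, $text; }

my $text = 'Would you; please hand me| my coat?';

# Plain split: ';' and '|' are dropped.
print "plain:     [$_]\n" for split_plain($text);

# Capturing split: each separator comes back as its own list element.
print "capturing: [$_]\n" for split_capturing($text);
```

With the capturing form, the separators would still have to be glued back onto the preceding field to match the output the question asks for.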

Answer:

Let Perl do half the work for you by setting $/ (the input record separator) to vertical bar, and then extract semicolon-separated fields:

#!/usr/bin/perl

use warnings;
use strict;

my @string;

*ARGV = *DATA;

$/ = "|";
while (<>) {
  s/\n+$//;
  s/\n/ /g;
  push @string => $1 while s/^(.*;)//;
  push @string => $_;
}

for (my $i = 0; $i < @string; ++$i) {
  print "\$string[$i] = '$string[$i]';\n";
}

__DATA__
Would you; please
hand me| my coat?

Output:

$string[0] = 'Would you;';
$string[1] = ' please hand me|';
$string[2] = ' my coat?';
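
To see what the first half of this approach does on its own, here is a small sketch that only reads "|"-terminated records and leaves the semicolon handling out. The helper read_records and the in-memory filehandle are assumptions for illustration, not part of the answer above; unlike the script above, it sets $/ with local so the change does not outlive the sub.

```perl
use strict;
use warnings;

# Hypothetical helper: read all records terminated by $sep from a string.
sub read_records {
    my ($data, $sep) = @_;
    open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
    local $/ = $sep;          # restored automatically when the sub returns
    my @records = <$fh>;      # list context: one element per record
    close $fh;
    return @records;
}

my @chunks = read_records("Would you; please\nhand me| my coat?\n", "|");

# Each chunk keeps its "|" terminator; newlines are shown escaped.
for my $chunk (@chunks) {
    (my $shown = $chunk) =~ s/\n/\\n/g;
    print "[$shown]\n";
}
```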