Wednesday, 28 August 2013

Command line to match lines with matching first field (sed, awk, etc.)

What is a fast and succinct way to match lines from a text file that
share the same first field?

Sample input:

a|lorem
b|ipsum
b|dolor
c|sit
d|amet
d|consectetur
e|adipisicing
e|elit

Desired output:

b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit

Desired output, alternative:

b|ipsum|dolor
d|amet|consectetur
e|adipisicing|elit

I can imagine many ways to write this, but I suspect there's a smart way
to do it, e.g., with sed, awk, etc. My source file is approx 0.5 GB.
There are some related questions here, e.g., "awk | merge line on the
basis of field matching", but the approach in that other question loads
too much content into memory. I need a streaming method.
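
For illustration, here is a minimal sketch of one streaming approach with awk. It assumes that lines sharing a first field are already adjacent, as they are in the sample input; if the real file is not grouped that way, it would need to be sorted first. The function names matching_lines and merged_lines are hypothetical, chosen only for this example. Each function keeps at most one line or one group in memory at a time.

```shell
# Sketch only: assumes lines with the same first field are adjacent.

# Print every line whose first field also appears on a neighbouring line.
matching_lines() {
  awk -F'|' '
    # Same key as the previous line: print the held first line of the
    # group (once), then the current line.
    NR > 1 && ($1 "") == key {
      if (pending) { print first; pending = 0 }
      print
      next
    }
    # New key: hold this line until we know whether the key repeats.
    # Appending "" forces string (not numeric) comparison of keys.
    { key = $1 ""; first = $0; pending = 1 }
  ' "$@"
}

# Alternative output: merge each repeated group into a single line.
merged_lines() {
  awk -F'|' '
    # Print the current group only if it contained more than one line.
    function emit() { if (count > 1) print merged }
    # Same key: append everything after the first field and its delimiter.
    NR > 1 && ($1 "") == key {
      merged = merged "|" substr($0, length($1) + 2)
      count++
      next
    }
    # New key: finish the previous group and start a new one.
    { emit(); key = $1 ""; merged = $0; count = 1 }
    END { emit() }
  ' "$@"
}
```

Either function reads the named files or standard input, e.g. `matching_lines input.txt` or `merged_lines < input.txt`, where input.txt stands in for the source file.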
