
Peter's Script-Archive - List Processing incl. Mass-Editing


This toolbox compartment contains the heavy-lifting tools for generic, line-based text processing in pipelines.

About Stream-Editing aka Mass-Editing

I think extending the usability of grep to mass-editing files makes for a nice and powerful concept. Grep.xchange below demonstrates it, provided the editing is more or less contained in the grepped lines. But there are also other very useful mass-editing (i.e. non-interactive, aka stream-editing) methods to add to your toolbox:

These range from perl -i.bak -lne ... and sed -i.bak -e ... to more mature implementations of stream-editing such as Lee Eakin's ped, which is somewhat like a sed-done-right rewrite in perl, with flock and everything. Then there are more generic applications that are very suitable for mass-edits within a specific domain. To give just a few examples from system administration, more specifically system and network configuration management: cfengine, puppet, or the 'tool cluster' of augeas/pad/boomerang (PEGs and lenses!).
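To make the "edit in place, keep a backup" idea behind perl -i.bak and sed -i.bak concrete, here is a minimal sketch in Python. It is not how perl, sed or ped implement it; the helper name edit_in_place and its parameters are made up for this illustration, and it does no locking (no flock).

```python
import os
import re
import shutil
import tempfile


def edit_in_place(path, pattern, replacement, backup_suffix=".bak"):
    """Apply a regex substitution to every line of `path`, keeping a backup.

    Hypothetical helper in the spirit of `perl -i.bak` / `sed -i.bak`.
    Returns the number of lines that were changed.
    """
    regex = re.compile(pattern)
    changed = 0
    # Keep the untouched original as path + suffix.
    shutil.copy2(path, path + backup_suffix)
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory and rename it over
    # the original at the end, so nobody sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", newline="") as out, open(path, newline="") as src:
            for line in src:
                new_line = regex.sub(replacement, line)
                if new_line != line:
                    changed += 1
                out.write(new_line)
        # mkstemp creates the file with private permissions; copy the
        # permission bits of the original before replacing it.
        shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the rename: drop the temporary file.
        os.unlink(tmp_path)
        raise
    return changed
```

A call like edit_in_place("hosts.txt", r"foo", "bar") rewrites hosts.txt and leaves the previous content in hosts.txt.bak (the file name is only an example).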

From patching to vimming: Don't forget about patch (and check out the way it is used by Grep.xchange) and a true classic interactive/non-interactive line editor that provided the first base format for patch: ed. Its successor ex, embedded within e.g. vim, is still quite similar. With the vimscript script from this very archive you can extend the usable stream-editing command set to most of vim's commands, including normal mode! Just be a bit careful and turn off unnecessary niceties like syntax highlighting or unlimited undo when editing log files in the 100MB region and beyond. Combined with an external language (vim's embedded Python interface is probably the most stable), things get truly interesting. That's not to say that vim's builtin language is a slouch, but it still does not quite match Python, Perl or Lisp in elegance, expressiveness, speed of writing, community support or range of available modules.

Let's conclude this tiny and very subjective overview with a little gem of an OR-article about perl mass-editing techniques.

Mass-Editing and Regular Expressions

Non-interactive editing requires some way to describe locations and ranges in files. Usually this is done with (sets of) both line-numbers and regular expressions. In this context you can think of the boolean regular expressions of Grep.pm below as just a more convenient way to combine sub-expressions.
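As a rough illustration of both ideas, the sketch below selects sed-style ranges whose endpoints are either line numbers or regular expressions, and combines regex tests with boolean operators. It is written in Python and is not the syntax or API of Grep.pm; all names (select_range, matches, all_of, any_of, negate) are invented for this example.

```python
import re


def select_range(lines, start, end):
    """Yield (line_number, line) for lines inside start,end ranges.

    Each address is a 1-based line number (int) or a regex (str),
    loosely following sed's `start,end` addressing.
    """
    def hits(address, number, line):
        if isinstance(address, int):
            return number == address
        return re.search(address, line) is not None

    inside = False
    for number, line in enumerate(lines, start=1):
        if not inside:
            if hits(start, number, line):
                inside = True
                yield number, line
                # A numeric end at or before the start line gives a
                # one-line range; a regex end is tested from the next line.
                if isinstance(end, int) and end <= number:
                    inside = False
        else:
            yield number, line
            if hits(end, number, line):
                inside = False


def matches(pattern):
    """Turn a regex into a predicate on a single line."""
    regex = re.compile(pattern)
    return lambda line: regex.search(line) is not None


def all_of(*predicates):
    """Boolean AND of line predicates."""
    return lambda line: all(p(line) for p in predicates)


def any_of(*predicates):
    """Boolean OR of line predicates."""
    return lambda line: any(p(line) for p in predicates)


def negate(predicate):
    """Boolean NOT of a line predicate."""
    return lambda line: not predicate(line)
```

For example, select_range(lines, r"^\[server\]", r"^\[") walks from a "[server]" header up to and including the next line starting with "[", while all_of(matches(r"^host"), negate(matches(r"b$"))) accepts lines that start with "host" and do not end in "b".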

Newer developments to watch: The more recent PEGs (e.g. in perl6/Rakudo), tree parsers+transformations, and lenses are also very fun stuff and do extend the Reach Of The GREP and its expressiveness, while reducing the -ahem- line-noise aspect.

A Challenge: Anyone up for implementing a Prolog interpreter that uses an extended regex/PEG engine itself as the implementation of the backtracking code (mail me if you did!) :)? The current feature set of e.g. full perl5.10 regexes goes way beyond classic DFA limits...
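A small taste of what "beyond DFA limits" means, sketched with Python's re module rather than perl: a single backreference is already enough to recognize languages that no finite automaton can, such as "n a's, one b, then exactly n a's again". The function names are made up for this example; Perl 5.10's recursive patterns go further still and are not shown here.

```python
import re

# a^n b a^n: \1 must repeat exactly what group 1 captured, so the
# matcher has to "remember" n, which a finite automaton cannot do.
SAME_COUNT = re.compile(r"(a+)b\1")

# A "square" ww: some non-empty string immediately followed by itself.
SQUARE = re.compile(r"(.+)\1")


def is_same_count(text):
    """True if text is a^n b a^n for some n >= 1."""
    return SAME_COUNT.fullmatch(text) is not None


def is_square(text):
    """True if text is some non-empty string repeated twice."""
    return SQUARE.fullmatch(text) is not None
```

Here is_same_count("aaabaaa") holds while is_same_count("aabaaa") does not, and is_square("abcabc") holds while is_square("abcabd") does not. The price is backtracking: the engine tries different lengths for the captured group until one fits or all fail.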

If you want to go to the real basics of regular expressions and think about the way PCREs (and worse, Perl5.10's own regex implementation) defy and bend the classics, do check out the computer science terms EBNF, recursive descent parsing, deterministic finite automata, the Chomsky hierarchy of grammars (esp. Chomsky type 3), the Turing machine, and regular expressions. It is probably best to work through that list from back to front with Wikipedia as the starting point, but don't forget to look at the seminal papers listed as references.

Commands


jakobi(at)acm.org, 2009-07 - 2012-03