Spatch
The goal of spatch is to allow programmers to express and perform refactorings using a syntax they are already familiar with: the patch syntax. For instance, to remove the second argument of every call to a function foo, one can write this syntactical patch:
// remove_second_arg_foo.spatch
  foo(X
-    ,Y
     )
and then apply it on a codebase with:
$ spatch -f remove_second_arg_foo.spatch *.php
or:
$ find | grep .php | xargs spatch -f remove_second_arg_foo.spatch
This works even if the function call is split over multiple lines or has extra spaces between the comma and the second expression, because spatch works at the abstract syntax tree level, not at the token or string level like patch or sed.
One could also write it as:
// remove_second_arg_foo_alt.spatch
- foo(X,Y)
+ foo(X)
(although it has some caveats as explained in the section about spaces below)
Finally, one can also use the "sed mode" of spatch, as in:
$ spatch -e 's/foo(X,Y)/foo(X)/' *.php
See https://github.com/facebook/pfff/blob/master/main_spatch.ml
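The sed-mode expression reads like the minus/plus form shown above. The before part plays the role of the - line and the after part plays the role of the + line. The Python sketch below assumes that equivalence and makes it concrete. `sed_to_spatch` is a hypothetical helper and is not part of spatch. Its naive split on '/' would also break on patterns that contain a slash themselves, such as a division.

```python
def sed_to_spatch(expr: str) -> str:
    """Hypothetical helper: rewrite 's/before/after/' as a minus/plus spatch.

    Assumes sed mode means "- before" followed by "+ after"; patterns that
    contain '/' (such as a division) are not handled.
    """
    if not (expr.startswith("s/") and expr.endswith("/")):
        raise ValueError("expected an expression of the form s/before/after/")
    parts = expr[2:-1].split("/")
    if len(parts) != 2:
        raise ValueError("expected exactly two parts: before and after")
    before, after = parts
    return f"- {before}\n+ {after}\n"


print(sed_to_spatch("s/foo(X,Y)/foo(X)/"), end="")
```

Under this assumption, the command above corresponds to the two-line remove_second_arg_foo_alt.spatch file.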
Most programming languages do not have refactoring tools, and when they do, as with Java and Eclipse, the programmer is often limited to a restricted set of refactorings such as "dropping an argument", "adding an argument", or "moving a function". Just as with Sgrep, we want to express complex code patterns easily, but also source-to-source transformations on those patterns, in a flexible way. Spatch is a domain-specific language to express such refactorings.
The synopsis is:
$ spatch (-f <spatch_file> | -e <s/before/after/>) [options] <files_or_dirs>
By default, spatch generates a diff on stdout. Once you are confident that your syntactical patch is correct, you can use the --apply-patch option to actually modify the relevant files.
The further options are:
[--apply-patch] [--pretty-printer] [-lang <lang>]
There is support for a few programming languages. See Matrix to check for your favourite programming language.
One can write any PHP expression inside a syntactical patch and annotate subparts of it with - and + in any way you want.
For instance with this spatch:
  f(2,
-   foo(1)
+   foo(2)
   )
we want to replace every call to foo(1) by foo(2), but only when the call is nested inside a specific kind of call to f: the ones where the first argument of f is 2.
On this file:
<?php
f(2, foo(2));
f(1, foo(1));
f(2, foo(1));
f(2,
  foo(1));
spatch will generate:
$ ./spatch -f tests/php/spatch/foo.spatch tests/php/spatch/foo.php
--- tests/php/spatch/foo.php 2010-11-04 22:58:16.000000000 -0700
+++ /tmp/trans-31284-13ff71.php 2010-11-04 23:12:35.000000000 -0700
@@ -5,8 +5,8 @@
 f(1, foo(1));
-f(2, foo(1));
+f(2, foo(2));
 f(2,
-  foo(1));
+  foo(2));
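The context sensitivity of this patch can be imitated on a toy abstract syntax tree. The Python sketch below is not how spatch is implemented: spatch is written in OCaml and matches real PHP ASTs. The tuple representation and the `call`, `rewrite` and `show` helpers are hypothetical. The sketch rewrites foo(1) only when it is the second argument of a call of the form f(2, ...).

```python
# Toy AST: a call is ("call", name, args_tuple); an integer literal is an int.
def call(name, *args):
    return ("call", name, tuple(args))


def rewrite(node):
    """Apply the patch  f(2, - foo(1) + foo(2))  everywhere in node."""
    if not isinstance(node, tuple):             # integer literal: unchanged
        return node
    _, name, args = node
    args = tuple(rewrite(a) for a in args)      # visit nested calls first
    # The context f(2, _) must match; only its foo(1) argument is replaced.
    if name == "f" and args == (2, call("foo", 1)):
        args = (2, call("foo", 2))
    return ("call", name, args)


def show(node):
    """Print a node back as PHP-like source text."""
    if not isinstance(node, tuple):
        return str(node)
    _, name, args = node
    return f"{name}({', '.join(show(a) for a in args)})"


# The four statements of the example file; the line break inside the last
# call is layout only, so its AST is identical to the third statement.
program = [
    call("f", 2, call("foo", 2)),
    call("f", 1, call("foo", 1)),
    call("f", 2, call("foo", 1)),
    call("f", 2, call("foo", 1)),
]
for stmt in program:
    print(show(rewrite(stmt)) + ";")
```

A bare foo(1) outside of f(2, ...) is left untouched, which mirrors the context line in the spatch file.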
Just like Sgrep, spatch supports metavariables, so you can write syntactical patches like:
// remove_second_arg_foo_alt.spatch
- foo(X,Y)
+ foo(X)
You can use metavariables in place of full PHP expressions.
You can also use metavariables for XHP attribute values as in:
  <ui:section-header
-   border=X
  ></ui:section-header>
See Sgrep#Metavariables for more examples.
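Matching with metavariables amounts to binding each metavariable to a sub-tree of the code, then reusing those bindings to instantiate the + code. The Python sketch below illustrates that idea on a toy AST. It is not spatch's implementation. It assumes that metavariables are single uppercase letters, as in the examples on this page. It also assumes, as a simplification, that a metavariable used twice must bind equal expressions. All helper names are hypothetical.

```python
# Toy AST: a call is ("call", name, args_tuple); literals are ints.
def call(name, *args):
    return ("call", name, tuple(args))


def is_metavar(p):
    # Assumption of this sketch: metavariables are single uppercase letters.
    return isinstance(p, str) and len(p) == 1 and p.isupper()


def match(pattern, node, env):
    """Return env extended with the metavariable bindings, or None."""
    if is_metavar(pattern):
        if pattern in env:                      # already bound: must agree
            return env if env[pattern] == node else None
        return {**env, pattern: node}           # bind to any sub-expression
    if isinstance(pattern, tuple) and isinstance(node, tuple):
        if len(pattern) != len(node):
            return None
        for p, n in zip(pattern, node):
            env = match(p, n, env)
            if env is None:
                return None
        return env
    return env if pattern == node else None     # names and literals


def substitute(template, env):
    """Instantiate the + code by replacing metavariables with bindings."""
    if is_metavar(template):
        return env[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, env) for t in template)
    return template


def show(node):
    if not isinstance(node, tuple):
        return str(node)
    _, name, args = node
    return f"{name}({', '.join(show(a) for a in args)})"


# - foo(X,Y)  + foo(X)   applied to   foo(1, bar(2))
env = match(call("foo", "X", "Y"), call("foo", 1, call("bar", 2)), {})
print(f"X = {show(env['X'])}, Y = {show(env['Y'])}")
print(show(substitute(call("foo", "X"), env)))
# A repeated metavariable only matches equal sub-expressions.
print(match(call("foo", "X", "X"), call("foo", 1, 2), {}))
```

Here Y binds to a whole nested call, which is what allows one pattern to cover many different call sites.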
The principle of spatch is to take a pattern file, the spatch file, and match it over a source file. By using metavariables, we get a more flexible pattern that can accommodate more source files. In the same way, the spatch file can contain extra spaces between tokens, or split an expression over multiple lines. It will still match source files that use a different indentation style, because spatch, like sgrep, works at the AST level.
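spatch compares the pattern and the code as syntax trees, so layout never enters the comparison. A simpler token-level comparison, sketched below in Python, already shows the effect. This is an illustration of the idea, not spatch's code. The tiny tokenizer is a hypothetical stand-in for a real PHP lexer.

```python
import re

# Hypothetical toy tokenizer: identifiers, integers, or any single
# non-space character. Whitespace and newlines never become tokens.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(src):
    return TOKEN.findall(src)


one_line = "foo(1, 2);"
split = "foo(1,\n    2);"

print(one_line == split)                  # False: the raw text differs
print(tokens(one_line) == tokens(split))  # True: same token sequence
print(tokens(split))
```

A textual tool like sed sees two different strings here. Any comparison that ignores layout, at the token level or, as in spatch, at the AST level, sees the same code.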
See Sgrep#Isomorphisms for a few other tricks done by spatch, called isomorphisms, which allow the pattern to accommodate more source files.
Unfortunately, spatch sometimes generates diffs that break the indentation of the original code. For instance, on this code:
foo(1,
    2);
the application of this spatch file:
- foo(X, Y)
+ bar(X, Y)
will generate this code:
bar(1, 2);
and not:
bar(1,
    2);
as one would expect.
The following spatch file, on the other hand, does the right thing:
- foo
+ bar
  (X, Y)
which may seem surprising because both spatch files look equivalent.
To understand the difference, one must understand how spatch works internally: how it handles the minus code, the plus code, and the metavariables.
Here is what spatch does internally given this spatch file:
- foo(X, Y)
+ bar(X, Y)
- it extracts the sgrep "pattern" from the spatch file by looking only at the minus and contextual lines. A contextual line is a line without any sign (in our case there are no such lines). So here the extracted pattern is
foo(X, Y)
- it annotates the tokens in the pattern with a minus and/or plus sign, to indicate which transformation to perform on the token. Here: [-foo; -(; -X; -,; -Y; -)+"bar(X,Y)"].
- it then matches the (annotated) pattern on the code and transfers the annotations (the - and +) to the tokens in the actual code. So on the foo(1,2) example, the tokens in the PHP code will then be [-foo; -(; -1; -,; -2; -)+"bar(1,2)"].
- it pretty prints each token and its associated spaces/comments from the original file if the token has no annotation. Otherwise, with a - annotation it does not print the token, and with a + annotation it prints the string attached to the +. So here most tokens will be removed and the last parenthesis will be replaced by the string "bar(1,2)".
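The four steps above can be imitated with a few lines of Python. This is a sketch under stated assumptions, not spatch's OCaml code. Each token here carries the whitespace that follows it, so the comma owns the newline and indentation, as the description suggests. The + string is written "bar(1, 2)" to match the output shown earlier. Because the comma has a - annotation, its spacing is dropped along with it.

```python
# Each token keeps the whitespace that follows it in the original file;
# this layout is an assumption of the sketch, not spatch's data structure.
SOURCE = [("foo", ""), ("(", ""), ("1", ""), (",", "\n    "),
          ("2", ""), (")", ""), (";", "")]


def unparse(tokens, minus, plus):
    """Print unannotated tokens with their spacing, drop '-' tokens along
    with their spacing, and emit the string attached to a '+' annotation."""
    out = []
    for i, (text, space) in enumerate(tokens):
        if i not in minus:
            out.append(text + space)
        if i in plus:
            out.append(plus[i])
    return "".join(out)


# "- foo(X, Y) + bar(X, Y)": tokens 0..5 (foo ( 1 , 2 )) get a '-',
# and the closing parenthesis also carries the instantiated plus code.
minus = {0, 1, 2, 3, 4, 5}
plus = {5: "bar(1, 2)"}
print(unparse(SOURCE, set(), {}))   # the original code, layout intact
print(unparse(SOURCE, minus, plus))
```

The second call prints bar(1, 2); on a single line: the newline disappeared together with the comma it was attached to.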
Here is what spatch does internally with the spatch file below, which should explain why this spatch file is more "space friendly":
- foo
+ bar
  (X, Y)
- it extracts the sgrep pattern, still 'foo(X, Y)'
- it annotates the tokens in the pattern, which this time are [-foo+bar; (; X; ,; Y; )]. As you can see only one token has an annotation.
- it matches the code and transfers the annotations. So on the foo(1,2) example, only the foo token will have an annotation.
- it pretty prints each token and its associated spaces/comments from the original file if the token has no annotation. Here this is the case for most of the tokens involved, including the comma token, whose subsequent newline and tab are therefore printed too.
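The same toy model shows why this version keeps the layout. As in the previous sketch, this is a hypothetical illustration in which each token owns the whitespace that follows it. Only foo now has an annotation, - with the + string "bar", so the comma and its newline are printed unchanged.

```python
# Each token keeps the whitespace that follows it (assumption of the sketch).
SOURCE = [("foo", ""), ("(", ""), ("1", ""), (",", "\n    "),
          ("2", ""), (")", ""), (";", "")]


def unparse(tokens, minus, plus):
    """Print unannotated tokens with their spacing, drop '-' tokens along
    with their spacing, and emit the string attached to a '+' annotation."""
    out = []
    for i, (text, space) in enumerate(tokens):
        if i not in minus:
            out.append(text + space)
        if i in plus:
            out.append(plus[i])
    return "".join(out)


# "- foo + bar (X, Y)": only token 0 (foo) is annotated, as -foo+bar.
print(unparse(SOURCE, {0}, {0: "bar"}))
```

The output is bar(1, followed by the original newline and indentation, then 2);.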
So to minimize the number of spacing issues, try to maximize the number of contextual lines in the spatch file, that is lines without any leading -.
There is a new --pretty-printer option that causes spatch to call a pretty printer on the modified code, to possibly reindent it in a nice way. The pretty printer currently does not support the whole PHP language.
For instance on this code:
//test.php function test1() { return foo('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); }
and this spatch:
//test.spatch
- foo(X)
+ foo(X, 1, 2, 3, 4)
then running
$ spatch --pretty-printer -f test.spatch test.php
will generate:
--- test.php 2011-11-08 14:26:23.000000000 -0800
+++ /tmp/trans-8024-37a89b.php 2011-11-08 14:26:36.000000000 -0800
@@ -1,5 +1,11 @@
 <?php
 function test1() {
-  return foo('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
+  return foo(
+    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
+    1,
+    2,
+    3,
+    4
+  );
 }
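The reindented output above follows a common layout rule: if a call does not fit on one line, put each argument on its own line. The Python sketch below illustrates only that rule. It is not spatch's pretty printer. `format_call`, the 80-column width and the two-space indentation are assumptions chosen for the example.

```python
def format_call(prefix, name, args, indent="  ", width=80):
    """Hypothetical layout rule: keep a call on one line if it fits in
    `width` columns, otherwise put one argument per line."""
    flat = f"{indent}{prefix}{name}({', '.join(args)});"
    if len(flat) <= width or not args:
        return flat
    inner = indent + "  "
    lines = [f"{indent}{prefix}{name}("]
    lines += [f"{inner}{arg}," for arg in args[:-1]]
    lines.append(f"{inner}{args[-1]}")    # no comma after the last argument
    lines.append(f"{indent});")
    return "\n".join(lines)


long_string = "'" + "a" * 61 + "'"
print(format_call("return ", "foo", ["1", "2"]))
print(format_call("return ", "foo", [long_string, "1", "2", "3", "4"]))
```

A short call stays on one line. The call with the long string literal is split into the one-argument-per-line shape seen in the diff.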
spatch is significantly slower than tools like sed, because it works on a more complex structure than a stream of characters: the abstract syntax tree. Nevertheless, you can combine it with git grep piped to xargs to speed things up:
$ git grep -l foo |xargs spatch -f remove_second_arg_foo.spatch
Here is the rename_foo_in_bar.spatch file:
- foo
+ bar
  (...)
If the syntactical patch notation is not expressive enough for your refactoring needs, you can still express the refactoring by using the internal pfff API that works on the ASTs of the source code.
The file pfff/demos/simple_refactoring.ml explains how to use the internal OCaml pfff API to perform a simple refactoring:
http://github.com/facebook/pfff/commit/c7b66cb5471e390a83fd0379754135224a1b34f0
As for sgrep, future work is to generalize spatch patterns to the full PHP language, not just PHP expressions, so that one can refactor class definitions, function headers, statements, etc.
spatch is a continuation of the work I've done on coccinelle (http://coccinelle.lip6.fr/), an advanced refactoring tool for C that I co-designed with Julia Lawall.
Related tools: