@pirate
Last active July 31, 2024 04:13
Bash script to remove accents and special characters from filenames recursively.
#!/usr/bin/env bash
# Recursively remove all special characters from filenames by renaming them to their ASCII-normalized forms.
#
# By default it does a dry run. To actually move the files, uncomment the `mv -vi ...` line.
#
# This is useful for cleaning up network shares that will be shared via SMB/NFS between Unix/macOS/Windows,
# where non-ASCII filenames can sometimes cause "file does not exist" errors when trying to access the files.
#
# This script removes leading/trailing whitespace in filenames and replaces accents and non-English
# characters with their ASCII equivalents. If no ASCII equivalent exists, it removes the character, e.g.:
#    some_name_í.txt -> some_name_i.txt
#    some_name_á.txt -> some_name_a.txt
#    some_name_é.txt -> some_name_e.txt
#    some_name_^.txt -> some_name_.txt
#    some_name_🐞.txt -> some_name_.txt
#    some_name_в.txt -> some_name_.txt
IFS=$'\n'
folder="."
allowed_characters="a-zA-Z0-9_\. \/@#\~&$+()\'!-"
normalize_cmd="
import re
import unicodedata
normalized = unicodedata.normalize('NFD', input()).encode('ascii', 'ignore').decode('utf-8')
stripped = re.sub('[^$allowed_characters]', '', normalized)
print(stripped)"
badfiles=$(
    find "$folder" -name '*' |                                    # find all files in the folder recursively
    grep ".*[^$allowed_characters].*" |                           # keep only filenames containing characters outside the allowed charset
    awk '{ print -length, $0 }' | sort -n -s | cut -d" " -f2-     # sort longest -> shortest so we rename child files before their parent folders to avoid breaking paths
)
for path in $badfiles; do
    oldpath="$path"
    newpath=$(echo "$oldpath" | python3 -c "$normalize_cmd")
    echo "From: $oldpath"
    echo "To: $newpath"
    # mv -vi -- "$oldpath" "$newpath"
    echo "--------------------------------------------"
done
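
A minimal sketch of the normalization step on its own, which may help when checking what a given name would become before renaming anything. `normalize_name` is a hypothetical wrapper, not part of the script above, and its allow-list is a simplified assumption that omits some of the symbols in `allowed_characters`.

```bash
# normalize_name is a hypothetical helper; the allow-list below is a simplified assumption.
normalize_name() {
    printf '%s' "$1" | PYTHONIOENCODING=utf-8 python3 -c '
import re, sys, unicodedata
name = sys.stdin.read()
# NFD splits an accented letter into its base letter plus a combining accent
decomposed = unicodedata.normalize("NFD", name)
# dropping everything non-ASCII removes the accent and any character without an ASCII base
ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
# finally strip any remaining character outside the allow-list
print(re.sub(r"[^a-zA-Z0-9_. /()-]", "", ascii_only))
'
}
```

Calling `normalize_name 'some_name_é.txt'` prints the cleaned name on stdout, so it can be used in a command substitution the same way the script uses `python3 -c "$normalize_cmd"`.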
@didwedo

didwedo commented Feb 25, 2020

Hello,
How can I make this script recursive so that it processes folders and subfolders?
Thank you in advance,
chris

@pirate
Author

pirate commented Feb 27, 2020

It's already recursive @didwedo.

@didwedo

didwedo commented Feb 27, 2020

@pirate: not completely. It throws errors and cannot rename the files because it has already renamed their folder.
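
If the failure comes from normalizing the whole path at once (so the target's parent directory does not exist yet, or has already moved), one possible workaround is to clean only the last path component and let `find -depth` list a directory's contents before the directory itself. This is a sketch under assumptions, not the gist author's method: `rename_tree`, `clean_name`, and the `DRY_RUN` variable are hypothetical names, and `clean_name` is a deliberately crude stand-in that just deletes disallowed bytes. In practice it would be swapped for the Python normalizer.

```bash
# Sketch only: rename_tree, clean_name and DRY_RUN are hypothetical names.

# Crude stand-in cleaner: deletes every byte outside a small allow-list.
clean_name() {
    printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9_. ()-'
}

rename_tree() {
    local root="$1" path dir base new
    # -depth prints a directory's contents before the directory itself.
    # Reading line by line means names containing newlines are not handled.
    find "$root" -mindepth 1 -depth |
    while IFS= read -r path; do
        dir=$(dirname -- "$path")
        base=$(basename -- "$path")
        new=$(clean_name "$base")
        # Skip names that are already clean or would become empty.
        if [ -z "$new" ] || [ "$new" = "$base" ]; then
            continue
        fi
        echo "From: $path"
        echo "To:   $dir/$new"
        # Dry run unless DRY_RUN=0; -n refuses to overwrite an existing target.
        if [ "${DRY_RUN:-1}" = 0 ]; then
            mv -n -- "$path" "$dir/$new"
        fi
    done
}
```

Because only the basename changes, the parent directory of each target always exists at the moment of the `mv`, and a directory is renamed only after everything inside it has been handled. Run `rename_tree .` for a dry run and `DRY_RUN=0 rename_tree .` to rename.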

@pirate
Author

pirate commented Feb 28, 2020

@tylerismith

Thanks! On the two different Ubuntu machines I tried, the script fails with:

tyler@server-pc:~$ ./strip_bad_filename_characters.sh 
./strip_bad_filename_characters.sh: command substitution: line 37: syntax error near unexpected token `|'
./strip_bad_filename_characters.sh: command substitution: line 37: `    | grep ".*[^$allowed_characters].*"                                # filter for filenames containing characters allowed the specified charset'

Condensing the badfiles variable down to one line resolves it for me, e.g.:

badfiles=$(find "$folder" -name '*' | grep ".*[^$allowed_characters].*" | awk '{ print -length, $0 }' | sort -n -s | cut -d" " -f2-)
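
For context on why the one-line form helps: a newline (or a trailing comment followed by a newline) ends a command, so a line that begins with `|` has nothing on its left, which is what the syntax error reports. Ending each line with `|` instead lets the pipeline continue onto the next line and still leaves room for comments. A small sketch of the sorting stage written that way (`longest_first` is a hypothetical name used only for illustration):

```bash
# longest_first is a hypothetical helper: it reads lines on stdin and prints them longest first.
longest_first() {
    awk '{ print -length($0), $0 }' |   # prefix each line with its negated length
    sort -n -s |                        # most negative number (longest line) first
    cut -d" " -f2-                      # drop the length prefix again
}
```

For example, `find . | longest_first` lists deeper paths before the directories that contain them, since a child path is always longer than its parent's.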

@solracsf

solracsf commented Sep 29, 2022

Try this:

echo Renée | iconv -f UTF-8 -t ASCII//TRANSLIT
Renee
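
The `iconv` call can be wrapped in a small function for use inside a rename loop. This is only a sketch: `to_ascii` is a hypothetical name, and the `//TRANSLIT` result depends on the iconv implementation and the active locale (some builds emit `?` or another substitute for characters they cannot map), so it is worth checking the output on the target system first.

```bash
# to_ascii is a hypothetical helper wrapping the iconv call shown above.
to_ascii() {
    # //TRANSLIT asks iconv to approximate characters that have no ASCII form.
    printf '%s\n' "$1" | iconv -f UTF-8 -t ASCII//TRANSLIT
}

# Usage: newname=$(to_ascii "$oldname")
```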

@Julianoe

I'm on Linux Manjaro 22 and, like @tylerismith, I had to condense the badfiles variable onto one line to make it work.

If I'm not mistaken, anyone who wants to use this without recursion needs to replace `find "$folder" -name '*'` with `find "$folder" -maxdepth 1 -name '*'`.
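
A sketch of how that switch could be made optional rather than edited in by hand. `list_candidates` and its second argument are hypothetical; `-mindepth 1` is an added assumption that keeps the starting folder itself out of the list.

```bash
# list_candidates is a hypothetical helper.
# $1: folder to scan, $2: "yes" (default) to recurse, anything else for top level only.
list_candidates() {
    local folder="$1" recursive="${2:-yes}"
    if [ "$recursive" = yes ]; then
        find "$folder" -mindepth 1
    else
        # -maxdepth 1 stops find from descending into subfolders
        find "$folder" -mindepth 1 -maxdepth 1
    fi
}
```

`list_candidates . no` then prints only the entries directly inside the current folder, and its output can be piped into the same `grep | awk | sort | cut` chain the script uses.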

@maltokyo

maltokyo commented Apr 16, 2023

I adapted this to also transliterate non-Latin characters (in this case Cyrillic/Russian) to Latin characters. Here is my version, in case it is useful to anyone.
(Without this, the Cyrillic characters were simply removed as "special" characters, which left many filenames identical. So I thought: why not keep the information, just in Latin characters instead?)

#!/usr/bin/env bash
# Recursively remove all special characters from filenames by renaming them to their ASCII-normalized forms.
#
# Note: this version renames files immediately. For a dry run, comment out the `mv -vi ...` line.
#
# This is useful for cleaning up network shares that will be shared via SMB/NFS between Unix/macOS/Windows,
# where non-ASCII filenames can sometimes cause "file does not exist" errors when trying to access the files.
#
# This script removes leading/trailing whitespace in filenames and replaces accents and non-English
# characters with their ASCII equivalents. If no ASCII equivalent exists, it removes the character, e.g.:
#    some_name_í.txt -> some_name_i.txt
#    some_name_á.txt -> some_name_a.txt
#    some_name_é.txt -> some_name_e.txt
#    some_name_^.txt -> some_name_.txt
#    some_name_🐞.txt -> some_name_.txt
#    some_name_в.txt -> some_name_v.txt

IFS=$'\n'

folder="."
allowed_characters="a-zA-Z0-9_\. \/()-"
normalize_cmd="
import re
from transliterate import translit
normalized = translit(input(), 'ru', reversed=True)
stripped = re.sub('[^$allowed_characters]', '', normalized)
print(stripped)"

badfiles=$(find "$folder" -name '*' | grep ".*[^$allowed_characters].*" | awk '{ print -length, $0 }' | sort -n -s | cut -d" " -f2-)

for path in $badfiles; do
    oldpath="$path"
    newpath=$(echo "$oldpath" | python3 -c "$normalize_cmd")
    echo "From: $oldpath"
    echo "To:   $newpath"
    mv -vi -- "$oldpath" "$newpath"
    echo "--------------------------------------------"
done
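
Because stripping or transliterating characters can map several different names onto the same result, `mv -vi` will stop and prompt whenever a target already exists. A non-interactive alternative is to pick a free name first. The helper below is a sketch; `unique_target` is a hypothetical name and the `_1`, `_2` suffix scheme is an arbitrary choice.

```bash
# unique_target is a hypothetical helper, not part of the scripts above.
# It prints the given path if it is free, otherwise the first free variant
# with a numeric suffix before the extension (name_1.txt, name_2.txt, ...).
unique_target() {
    local target="$1" dir base stem ext candidate n
    dir=$(dirname -- "$target")
    base=$(basename -- "$target")
    case "$base" in
        ?*.*) stem="${base%.*}"; ext=".${base##*.}" ;;  # split off the last extension
        *)    stem="$base";      ext="" ;;              # no extension (or a dotfile)
    esac
    candidate="$target"
    n=1
    while [ -e "$candidate" ]; do
        candidate="$dir/${stem}_${n}${ext}"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}
```

Inside the loop this could be used as `mv -v -- "$oldpath" "$(unique_target "$newpath")"`.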

@pabloab

pabloab commented Jul 22, 2023

Check out detox, which replaces problematic characters in filenames. See `apt show detox` for more of its features.
