Given a directory of image files that all have the .jpg extension regardless of their image type, we need to rename each file so that its extension matches its “actual” image type, using bash.

We can use the file command to view the “actual” image type of each file.

$ file 1.jpg 
1.jpg: JPEG image data, EXIF standard 2.21

This looks easy enough: we can just extract the extension from the 2nd column of the output, right?

Well, not quite. If a filename contains spaces, we cannot rely on “column” position.

$ file 1\ 2.jpg 
1 2.jpg: GIF image data, version 89a, 500 x 715

Well, how about we use : as the “field delimiter”?

Again, a filename that itself contains : would break that.

$ file dir/3\:4.jpg 
dir/3:4.jpg: PNG image, 2000 x 938, 8-bit grayscale, non-interlaced
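
To make both failure modes concrete, here is a small Python sketch that applies the two naive parsing strategies to the output lines shown above. The strings are copied from the examples; nothing here runs the file command itself.

```python
# Output lines copied from the `file` examples above.
space_line = "1 2.jpg: GIF image data, version 89a, 500 x 715"
colon_line = "dir/3:4.jpg: PNG image, 2000 x 938, 8-bit grayscale, non-interlaced"

# Strategy 1: treat whitespace-separated "columns" as fields.
# The space in the filename shifts every column by one.
second_column = space_line.split()[1]
print(second_column)  # 2.jpg:

# Strategy 2: split on the first ':'.
# The colon in the filename cuts the path in half.
name, _, description = colon_line.partition(":")
print(name)         # dir/3
print(description)  # 4.jpg: PNG image, 2000 x 938, ...
```

In both cases the "type" field ends up holding part of the filename instead of the image type.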


Generally it is safe to assume that “using bash” just means “something that can be run from the command line”. Since parsing the output of the file command looks problematic, we will outsource this problem to Python.

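Python has the imghdr module, which “determines the type of image contained in a file or byte stream”. Note that imghdr was deprecated in Python 3.11 and removed in 3.13, so this approach needs Python 3.12 or earlier.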

It also has os.walk which can process a directory tree recursively.

Here’s what the code looks like.

import imghdr, os

start_dir = '.'

for root, dirs, files in os.walk(start_dir):
    for f in files:
        old = os.path.join(root, f)
        what = imghdr.what(old)

        if what:
            if what == 'jpeg': what = 'jpg'

            new = os.path.splitext(f)[0]
            new = os.path.join(root, new + '.' + what)

            print('---')
            print('old:', old)
            print('new:', new)

            os.rename(old, new)
            print('---')

os.walk() yields 3 things for each directory it visits:

  • root being the path of the current directory being processed
  • dirs being a list of the names of the directories contained in the current directory
  • files being a list of the names of the files contained in the current directory
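
The following sketch shows what os.walk() hands back on a small throwaway tree. The file and directory names (a.jpg, dir/b.jpg) are made up for illustration, and the tree is created in a temporary directory so nothing real is touched.

```python
import os
import tempfile

found = []

# Build a throwaway tree: <tmp>/a.jpg and <tmp>/dir/b.jpg (hypothetical names).
with tempfile.TemporaryDirectory() as start_dir:
    os.mkdir(os.path.join(start_dir, "dir"))
    for rel in ("a.jpg", os.path.join("dir", "b.jpg")):
        open(os.path.join(start_dir, rel), "wb").close()

    for root, dirs, files in os.walk(start_dir):
        for f in files:
            # f is a bare filename; join it with root to get a usable path.
            full = os.path.join(root, f)
            # Store the path relative to start_dir so the output is stable.
            found.append(os.path.relpath(full, start_dir))

print(sorted(found))
```

Both files are reported, including the one in the subdirectory, without any explicit recursion on our part.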

So we loop through files on line 6, but as those are just the filenames, we need to use os.path.join() to get the full path.

>>> os.path.join('path/to/dir', 'filename')
'path/to/dir/filename'

The imghdr.what() function can take a filepath, and it returns None if it does not recognize the file as an image.

None is a falsy value in Python, so the body of if what: will only run for image files.
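
On Python versions where imghdr is unavailable (it was removed in 3.13), the same "type or None" behaviour can be approximated by checking a file's leading magic bytes. The sketch below is a minimal stand-in, not imghdr's actual implementation: sniff_header and sniff_image_type are hypothetical names, and only JPEG, PNG and GIF are covered.

```python
def sniff_header(header):
    """Return 'jpeg', 'png', 'gif', or None for the given leading bytes."""
    if header.startswith(b"\xff\xd8\xff"):        # JPEG start-of-image marker
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):   # 8-byte PNG signature
        return "png"
    if header[:6] in (b"GIF87a", b"GIF89a"):      # GIF version strings
        return "gif"
    return None                                   # not a recognized image


def sniff_image_type(path):
    """Read the first bytes of the file at path and classify them."""
    with open(path, "rb") as fh:
        return sniff_header(fh.read(8))


for sample in (b"GIF89a...", b"\x89PNG\r\n\x1a\n", b"plain text"):
    what = sniff_header(sample)
    if what:  # None is falsy, so non-images are skipped
        print("image:", what)
    else:
        print("not an image")
```

Because the function returns None for anything it does not recognize, it can be used in an if what: test in the same way as imghdr.what().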

imghdr.what() will return jpeg for “JPEG data in JFIF or Exif formats”, but we don’t want to use jpeg as the file extension.

This is the reason for the check on line 11. Since every file already has the .jpg extension, a JPEG needs no renaming at all, so we could instead use continue to move to the next loop iteration, i.e.

if what == 'jpeg':
    continue
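
If more types ever need an extension that differs from the name imghdr reports, a lookup table may scale better than a chain of if statements. This is only a sketch: EXTENSION_OVERRIDES and extension_for are hypothetical names, and jpeg is the only override the text above actually calls for.

```python
# Maps a detected type name to the extension we prefer.
# Only 'jpeg' is needed here; any other type falls through unchanged.
EXTENSION_OVERRIDES = {"jpeg": "jpg"}


def extension_for(what):
    """Return the file extension (without the dot) for a detected type."""
    return EXTENSION_OVERRIDES.get(what, what)


print(extension_for("jpeg"))  # jpg
print(extension_for("png"))   # png
```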

After getting the new extension, we need to generate the new filepath to use in our os.rename() call. We do this on lines 13-14 with os.path.splitext(), which lets us extract the original filename without its extension.

>>> os.path.splitext('')
('', '.jpg')
>>> os.path.splitext('')[0]

We then append the new extension and use os.path.join() once again to produce the final full path.
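
The two steps can be wrapped in one small helper. This is a sketch of the same logic as lines 13-14; build_new_path is a hypothetical name, and the sample arguments are taken from the dir/3:4.jpg example earlier.

```python
import os


def build_new_path(root, filename, what):
    """Swap filename's extension for `what` and prefix the directory."""
    # splitext() splits at the last dot only: 'a.b.jpg' -> ('a.b', '.jpg')
    stem, _old_ext = os.path.splitext(filename)
    return os.path.join(root, stem + "." + what)


print(build_new_path("./dir", "3:4.jpg", "png"))
```

Note that spaces and colons in the filename need no special handling here, because the name is never parsed out of a line of text.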

Let’s run the code and check the output.

$ python 
old: ./1 2.jpg
new: ./1 2.gif
old: ./1.jpg
new: ./1.jpg
old: ./dir/3:4.jpg
new: ./dir/3:4.png
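
One caveat when running this on real data: on POSIX systems os.rename() silently replaces an existing destination file, so 1.jpg (really a GIF) could overwrite a genuine 1.gif. A more cautious variant might look like the sketch below. rename_if_safe and its dry_run parameter are hypothetical, and the existence check is not atomic, so it only guards against accidents rather than concurrent writers.

```python
import os


def rename_if_safe(old, new, dry_run=True):
    """Rename old to new unless that is a no-op or would clobber a file.

    Returns 'unchanged', 'skipped', 'would rename', or 'renamed'.
    """
    if old == new:
        return "unchanged"      # extension was already correct
    if os.path.exists(new):
        return "skipped"        # refuse to overwrite an existing file
    if dry_run:
        return "would rename"   # report only, touch nothing
    os.rename(old, new)
    return "renamed"
```

Calling it with dry_run=True first gives a preview similar to the old/new listing above before any file is changed.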

Let’s use find to see if the files were actually renamed.

$ find
./1 2.gif