A 15-year-old Python flaw found in "over 350,000" projects • The Register


At least 350,000 open source projects are believed to be potentially vulnerable to exploitation by a Python module flaw that has remained unpatched for 15 years.

On Tuesday, security firm Trellix said its threat researchers had encountered a vulnerability in Python's tarfile module, which provides a way to read and write tar archives (often compressed). At first, the bug hunters thought they had stumbled upon a zero-day.

It turned out to be a roughly 5,500-day problem: the bug has been living its best life for the past decade and a half while awaiting extinction.

Identified as CVE-2007-4559, the vulnerability was first reported on August 24, 2007, in a post to the Python mailing list by Jan Matejek, who was then the Python package maintainer for SUSE. It can be exploited to overwrite and hijack files on a victim's computer when a vulnerable application opens a malicious tar archive via tarfile.

“The vulnerability is basically this: if I tar a file named "../../../../../etc/passwd" and then make the admin untar it, /etc/passwd gets overwritten,” Matejek explained at the time.
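To see why Matejek's example works, here is a minimal sketch of how a naive extractor would compute the output path, assuming it simply joins each member name onto the destination directory. The function name `naive_target` and the directory `/home/admin/extract` are made up for illustration; `posixpath` is used so the result does not depend on the host OS.

```python
import posixpath

def naive_target(dest_dir: str, member_name: str) -> str:
    # Join the archive member name onto the extraction directory, as a naive
    # extractor would, then collapse any ".." components.
    return posixpath.normpath(posixpath.join(dest_dir, member_name))

# Five ".." components climb past the (hypothetical) destination to "/".
print(naive_target("/home/admin/extract", "../../../../../etc/passwd"))
# → /etc/passwd

# An absolute member name discards the destination directory entirely.
print(naive_target("/home/admin/extract", "/etc/passwd"))
# → /etc/passwd

# A well-behaved name stays inside the destination.
print(naive_target("/home/admin/extract", "docs/readme.txt"))
# → /home/admin/extract/docs/readme.txt
```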

The tarfile directory traversal bug was reported on August 29, 2007, by Tomas Hoger, a software engineer at Red Hat.

But it had already been addressed, sort of. A day earlier, tarfile module maintainer Lars Gustäbel committed a code change that added a check_paths parameter, set to True by default, and a helper function to the TarFile.extractall() method that throws an error if a file path in a tar archive is unsafe.
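The kind of check described here, refusing to extract any member whose path would land outside the destination, can be sketched as follows. This is not Gustäbel's 2007 patch; `is_within_directory` and `safe_extractall` are hypothetical names chosen for illustration.

```python
import os
import tarfile

def is_within_directory(directory: str, target: str) -> bool:
    # Compare fully resolved paths. os.path.commonpath works on whole path
    # components, so "/tmp/out" is not mistaken for a parent of "/tmp/outside".
    abs_dir = os.path.realpath(directory)
    abs_target = os.path.realpath(target)
    return os.path.commonpath([abs_dir, abs_target]) == abs_dir

def safe_extractall(tar: tarfile.TarFile, path: str = ".") -> None:
    # Hypothetical wrapper: vet every member name before extracting anything,
    # so a single bad entry aborts the whole extraction.
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"unsafe path in tar archive: {member.name!r}")
    tar.extractall(path)
```

Note that the sketch only vets member names; symbolic and hard links stored in the archive can still point elsewhere and would need a separate check.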

But the fix did not cover the TarFile.extract() method, which Gustäbel said “shouldn't be used at all,” and it left open the possibility that extracting data from untrusted archives could cause problems.

In a comment thread, Gustäbel explained that he no longer considered this a security issue. “tarfile.py does nothing wrong, its behavior conforms to the pax definition and POSIX pathname resolution guidelines,” he wrote.

“There is no known or possible practical exploit. I [updated] the documentation with a warning that extracting archives from untrusted sources may be dangerous. That is the only thing to do IMO.”

Indeed, the documentation carries this warning:

Warning: Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of path, e.g. members that have absolute filenames starting with "/" or filenames with two dots "..".
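The “prior inspection” the warning calls for can start with listing member names and flagging the two patterns it mentions. A minimal sketch, assuming a hypothetical helper named `suspicious_members` and a small made-up in-memory archive; it checks names only, not link targets.

```python
import io
import tarfile

def suspicious_members(tar: tarfile.TarFile) -> list[str]:
    # Flag the two patterns the warning describes: absolute names starting
    # with "/" and names containing a ".." path component. Backslashes are
    # also treated as separators, since Windows treats them that way.
    flagged = []
    for name in tar.getnames():
        parts = name.replace("\\", "/").split("/")
        if name.startswith("/") or ".." in parts:
            flagged.append(name)
    return flagged

# Build a small in-memory archive of zero-length members to inspect.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ["docs/readme.txt", "../evil.sh", "/etc/passwd", "notes..txt"]:
        tar.addfile(tarfile.TarInfo(name))
buf.seek(0)

with tarfile.open(fileobj=buf, mode="r") as tar:
    flagged = suspicious_members(tar)
print(flagged)
```

Checking whole path components, rather than searching for the substring "..", avoids flagging harmless names such as `notes..txt`.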

And yet here we are, with both extract() and extractall() still open to arbitrary path traversal.

“The vulnerability is a path traversal attack in the extract and extractall functions in the tarfile module that allow an attacker to overwrite arbitrary files by adding the '..' sequence to filenames in a tar archive,” Kasimir Schulz, a vulnerability researcher for Trellix, explained in a blog post.

The “..” sequence refers to the parent directory. Using code like the six-line snippet below, Schulz says, the tarfile module can be told to modify a file's metadata, in this case its name, before the file is added to the tar archive. The result is an exploitable archive.

import tarfile

def change_name(tarinfo):
    tarinfo.name = "../" + tarinfo.name
    return tarinfo

with tarfile.open("exploit.tar", "w:xz") as tar:
    tar.add("malicious_file", filter=change_name)
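To check what that snippet actually stores without writing exploit.tar to disk, the same filter trick can be applied to an in-memory, uncompressed archive and the member names read back. The temporary file and its contents are made up for the demonstration, and nothing is extracted.

```python
import io
import os
import tarfile
import tempfile

def change_name(tarinfo: tarfile.TarInfo) -> tarfile.TarInfo:
    # Same trick as the snippet: prefix the stored name with "../".
    tarinfo.name = "../" + tarinfo.name
    return tarinfo

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "malicious_file")
    with open(src, "w") as f:
        f.write("payload\n")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # arcname keeps the temporary directory out of the stored name.
        tar.add(src, arcname="malicious_file", filter=change_name)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    stored_names = tar.getnames()
print(stored_names)
# → ['../malicious_file']
```

The filter callback runs on each TarInfo before it is written, which is why the rewritten name ends up in the archive header.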

According to Schulz, Trellix created a free tool called Creosote to scan for CVE-2007-4559. The software has already found the bug hidden in applications such as Spyder IDE, an open-source scientific environment written for Python, and Polemarch, an IT infrastructure management service for Linux and Docker.
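How Creosote works internally is not described here; as a rough illustration of the general idea only, a scanner could parse Python source with the standard `ast` module and report calls to methods named `extract` or `extractall`. This is a deliberately crude, hypothetical sketch (`find_extract_calls` is an illustrative name): it cannot tell whether the object is really a `TarFile` or whether the archive is trusted, so every hit needs manual review.

```python
import ast

RISKY_METHODS = {"extract", "extractall"}

def find_extract_calls(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    # Return (line number, method name) for every call of the form
    # x.extract(...) or x.extractall(...) in the given Python source.
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in RISKY_METHODS
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)

sample = '''
import tarfile
with tarfile.open("upload.tar") as tar:
    tar.extractall("out")
'''
print(find_extract_calls(sample))
# → [(4, 'extractall')]
```

Applied to a whole repository, the same function would simply be run over each `.py` file's contents.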

The company estimates the tarfile flaw can be found “in over 350,000 open-source projects and is prevalent in closed-source projects.” It also points out that tarfile is part of Python's standard library, so it is available to every Python project, and that it is present in frameworks created by AWS, Facebook, Google, and Intel, and in applications for machine learning, automation, and Docker containers.

Trellix says it is working to make the fixed code available to affected projects.

“Using our tools, we currently have patches for 11,005 repositories ready for pull requests,” Charles McFarland, vulnerability researcher for Trellix, explained in a blog post. “Each patch will be added to a forked repository and a pull request will be made over time. This will help individuals and organizations alike to become aware of the problem and fix it with a single click.

“Due to the number of vulnerable projects, we expect to continue this process over the next few weeks. This is expected to reach 12.06% of all vulnerable projects, a little over 70,000 projects, by the time of completion.”

The remaining 87.94% of affected projects may want to consider other possible options. ®
