March 23-27

CSAM and Hash-Matching — Tech Tool for Catching Creeps

By David Baker

Child sexual abuse material, or CSAM, presents a host of difficult challenges in law enforcement. Investigators are tasked not only with identifying offenders, but also with protecting victims whose abuse is recorded and repeatedly shared online. Every file represents real harm, and every re-upload can retraumatize victims.

For years, building a case for CSAM possession or production required investigators to manually review and catalog thousands (or even millions) of images and videos. That work was slow, emotionally taxing, and, in many ways, compounded the harm of abuse by repeatedly exposing the material.

Today, technology is helping to change that reality. One of the most important tools in this effort is hash-matching, a method adopted from cryptography that allows investigators and platforms to identify known CSAM quickly, accurately, and with far less human exposure.

NOTE: The term “child pornography” has largely been replaced in criminal justice by “child sexual abuse material” (CSAM) to more accurately portray the reality of the content. “Pornography” implies voluntary participation, which is simply not possible in cases involving children. “CSAM” emphasizes that these images and videos document abuse and exploitation, and that every file represents a victim (or multiple victims). The shift in terminology is intended to emphasize the harm, avoid minimizing the offense, and reinforce that CSAM is evidence of a crime, not a category of media.

What Is a Hash?

Whether they know it or not, most people rely on the hashing process just about every day.

Hashing is a process that creates a “digital fingerprint” of digital content, whether it’s a text string, a document, a photo, or a video. That fingerprint, or hash, is a unique string of numbers and letters generated by a one-way mathematical function.

A common example is password security. You may not realize it, but your bank does not actually store your password. Instead, when you create a password, the system hashes it and stores the resulting value. Later, when you log in, the system hashes the password you type in and compares it to the stored hash. If the hashes match, you are granted access.
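The password flow described above can be sketched in a few lines of Python. This is a simplified illustration only: real authentication systems use salted, deliberately slow algorithms such as bcrypt or Argon2 rather than a bare SHA-256 digest.

```python
import hashlib

def hash_password(password: str) -> str:
    # Illustrative only: production systems add a per-user salt and use a
    # slow key-derivation function (bcrypt, scrypt, Argon2), not plain SHA-256.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# At signup, the system stores only the hash, never the password itself.
stored_hash = hash_password("correct horse battery staple")

# At login, the typed password is hashed and compared to the stored value.
def verify(attempt: str) -> bool:
    return hash_password(attempt) == stored_hash

print(verify("correct horse battery staple"))  # True
print(verify("wrong password"))                # False
```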

The same concept applies to photos, videos, and other kinds of files.

  • Hashing takes an input, such as a string, document, image, or video, and creates a unique, one-way identifier.
  • If anything in the original file changes, even a single letter, a single pixel, or a single frame of video, the resulting hash will be different.
  • Importantly, hashing is a one-way process. You cannot recreate the original file, or any part of it, from the hash.
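These properties are easy to demonstrate with Python's standard hashlib module: changing a single character produces a completely different fingerprint, and the output is always the same fixed length no matter how large the input is.

```python
import hashlib

original = b"This is the original file content."
altered  = b"This is the original file content!"  # one character changed

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(altered).hexdigest()

print(h1 == h2)  # False: a one-byte change yields a completely different hash
print(len(h1))   # 64 hex characters, regardless of input size
```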

Traditional cryptographic hashes, such as MD5 or SHA-256, detect exact duplicates. More advanced systems, known as perceptual hashing, can identify files that are visually similar even if they have been resized, compressed, or slightly altered. This defeats common evasion tactics, such as cropping or watermarking images, or dropping frames from video files, in hopes of avoiding detection.
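The difference between exact and perceptual hashing can be sketched with a toy "average hash," one of the simplest perceptual schemes. Production systems like PhotoDNA and pHash are far more sophisticated, but the core idea is the same: derive bits from the image's visual structure so that small alterations leave the fingerprint mostly unchanged.

```python
import hashlib

def average_hash(pixels):
    # pixels: flattened grayscale values of a small, pre-resized image.
    avg = sum(pixels) / len(pixels)
    # One bit per pixel: 1 if brighter than the average, else 0.
    return [1 if p > avg else 0 for p in pixels]

def hamming_distance(h1, h2):
    # Number of differing bits; a small distance means "visually similar."
    return sum(a != b for a, b in zip(h1, h2))

# A toy 4x4 "image" and a uniformly brightened copy of it.
image      = [10, 200, 30, 180, 20, 210, 40, 190,
              15, 205, 35, 185, 25, 215, 45, 195]
brightened = [p + 5 for p in image]

# An exact hash treats the two as unrelated files...
print(hashlib.sha256(bytes(image)).hexdigest() ==
      hashlib.sha256(bytes(brightened)).hexdigest())   # False

# ...while the perceptual hash still matches (Hamming distance 0 here).
print(hamming_distance(average_hash(image), average_hash(brightened)))
```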

One well-known example is Microsoft’s PhotoDNA, developed in partnership with Dartmouth College, which creates a non-reversible signature for images that can still be matched despite common modifications.

Pro tip: It’s best not to Google “test the quality of my hash” or similar phrases from your work computer. Enough said.

How Are Hashes Used in Law Enforcement?

Not many years ago, CSAM investigations required extensive manual review. Investigators had to open, view, listen to, describe, and catalog every suspected file to determine whether it met the legal definition of child sexual abuse material and could be used as evidence.

That process had two major consequences.

First, it contributed to the re-victimization of children, as the same material was viewed and re-viewed repeatedly across investigations and jurisdictions. Second, it exposed investigators to deeply traumatic content on a regular basis.

Hash-matching has fundamentally changed that workflow.

Today, once a file has been confirmed as CSAM by a trusted entity, it is hashed and the resulting digital fingerprint is added to a vetted database. When law enforcement or a technology platform encounters a file, it can hash that file and compare the result against law enforcement databases.

If there is a match, the system can then flag the file for follow-up by law enforcement. This approach saves time, reduces harm, and allows investigators to dedicate the majority of their time to identifying new victims and offenders.
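The workflow above reduces to a simple membership check against a trusted hash set. The sketch below uses Python's hashlib with a made-up, stand-in hash set; real systems query vetted databases such as NCMEC's and typically rely on perceptual rather than exact hashes.

```python
import hashlib

# Hypothetical vetted hash set, standing in for a database like NCMEC's.
# (This entry is the SHA-256 digest of the bytes b"test".)
KNOWN_HASHES = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def flag_if_known(file_bytes: bytes) -> bool:
    """Hash the file and report whether it matches a vetted database entry.

    A match flags the file for law enforcement follow-up; no one needs to
    open the content to know it has been previously confirmed.
    """
    digest = hashlib.sha256(file_bytes).hexdigest()
    return digest in KNOWN_HASHES

print(flag_if_known(b"test"))        # True: known file, flag for follow-up
print(flag_if_known(b"new upload"))  # False: unknown, needs other detection
```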

However, hash-matching is not a complete solution. It only works for known material. Newly created CSAM, or files that have been significantly altered, still require detection through other methods, including human review and machine-learning tools.


Who Is Involved in Hash-Matching?

Hash-matching is not controlled by a single organization. Rather, it is a collaborative ecosystem involving nonprofits, technology companies, and law enforcement agencies.

National Center for Missing & Exploited Children (NCMEC). NCMEC serves as the central clearinghouse in the United States. It vets reported material, generates hashes, and maintains a large database of known CSAM. NCMEC distributes hashes through secure systems so that platforms and law enforcement can detect known material without accessing the underlying content. The organization also operates the CyberTipline, the nation’s reporting system for suspected child exploitation.

As of 2024, NCMEC has shared more than 9.8 million hashes with dozens of electronic service providers and other partners. It also runs “Take It Down,” a service designed to help remove non-CSAM explicit images of minors.

Microsoft. Microsoft developed PhotoDNA in 2009 and donated it to NCMEC. The technology remains one of the most widely used perceptual hashing tools. PhotoDNA converts images into robust, non-reversible hashes that remain stable even when images are altered. Microsoft makes this technology available to qualified organizations, nonprofits, tech companies, and law enforcement.

Google. Google provides several tools that support hash-matching at scale, particularly for video. CSAI Match helps identify known CSAM in high-volume video environments. Open-source perceptual hashing tools, such as pHash and PDQ, also support image matching and analysis. Google additionally contributes to shared hash databases and reports detected CSAM to NCMEC.

Law Enforcement and Global Partners. In the United States, law enforcement agencies use hash-matching extensively in digital forensics. When devices are seized, forensic tools can automatically scan files and identify known CSAM almost instantly. This reduces the need for manual review and helps investigators focus on new material and victim identification.

Internationally, organizations such as INTERPOL, the Internet Watch Foundation, and the INHOPE network maintain parallel databases and collaborate across jurisdictions. Programs like Project VIC help standardize categorization and improve investigative workflows.

What About Privacy Laws?

The use of hash-matching sits at the intersection of technology and constitutional law. In the United States, the Fourth Amendment protects individuals against unreasonable searches and seizures. When CSAM is suspected on a private device, law enforcement generally needs a search warrant to access and confirm its presence.

The situation becomes more complex in cloud-based environments, such as Google Drive, Microsoft OneDrive, or Dropbox.

A helpful analogy is a storage unit. A facility manager typically cannot enter a rented unit without legal justification. However, rental agreements and legislation may allow entry under certain conditions.

Similarly, cloud service providers often include terms of service that allow them to scan for illegal content, including CSAM. This scanning often occurs during the upload process, before a file is fully stored, or through automated systems that monitor content.

The Private Search Doctrine

The private search doctrine is a Fourth Amendment principle that limits when government action becomes a “search” requiring a warrant. While the Fourth Amendment protects against unreasonable searches by government actors, it doesn’t apply to searches conducted by private individuals acting on their own. In United States v. Jacobsen (466 U.S. 109 (1984)), the U.S. Supreme Court held that when a private party conducts a search that turns up something illegal, law enforcement may replicate that search without a warrant, so long as it does not exceed the scope of what the private party already revealed.

The key limitation is scope. Courts ask whether law enforcement learned anything new beyond what the private search already turned up. If officers go further, such as opening additional files, containers, or digital folders that the private party did not examine, the Fourth Amendment is implicated and a warrant is generally required. This question becomes more complicated with digital searches because devices like computers and smartphones can store vast amounts of personal data.

In CSAM and hash-matching investigations, the doctrine has become a central legal battleground. When a technology company uses automated tools to identify suspected CSAM and reports it to authorities, courts must decide whether a subsequent law enforcement review stays within the scope of the initial, private search. However, courts are divided on how this applies to hash-matching.

Circuits Requiring a Warrant

Some federal courts have held that a hash match alone does not negate a user’s reasonable expectation of privacy in an unopened file. These include the 2nd Circuit, the 4th Circuit, and the 9th Circuit. Here are some important cases:

United States v. Maher (2nd Circuit)

In January 2020, Ryan Maher uploaded an image file to a Google email account. Google’s system analyzed the file and discovered it had the same hash value as a known CSAM image. Google tipped off NCMEC, which forwarded the unopened file to the New York State Police. A police investigator opened and visually examined the file without first obtaining a warrant.

Since Google had performed only a hash match on Maher’s file, not a human review of it, the Second Circuit held the investigator’s later visual inspection went beyond the scope of the private search and therefore required a warrant. Curiously, the court nonetheless allowed the evidence to stand because the investigating officers acted in good faith under the law as it existed at the time.

United States v. Lowers (4th Circuit)

Between 2019 and 2020, Google’s hash-matching system flagged over 150 files uploaded to Nico Lowers’s Google Drive account as potential CSAM. Google referred the information to NCMEC, which tipped off law enforcement. A detective in Virginia later opened several files, some of which had not been previously viewed by any Google employee, without a warrant. The resulting investigation led to Lowers’s arrest and conviction in North Carolina.

After Lowers appealed, the 4th Circuit held a hash match alone does not trigger the private search doctrine. Users retain a reasonable expectation of privacy in cloud-stored files, the court said, and a hash value is merely “raw data” that reveals nothing about the file’s contents. Because no private party had visually inspected the specific files opened by law enforcement, the officer’s warrantless viewing exceeded the scope of any prior private search and violated the Fourth Amendment.

United States v. Wilson (9th Circuit)

In 2015, using automated hash-matching algorithms, Google determined Wilson had uploaded CSAM to his Google account as email attachments. Google alerted NCMEC, though no one at either organization opened or viewed Wilson’s attachments. The tip was sent to the San Diego Internet Crimes Against Children Task Force, and an investigator viewed the attachments without first obtaining a warrant. The task force used that viewing to obtain warrants and charge Wilson.

On appeal, the 9th Circuit held the government search exceeded the scope of the antecedent private search because law enforcement learned new, critical information by opening attachments no human at Google or NCMEC had viewed. Also, the government did not prove “virtual certainty” that the previously viewed Google images were exact duplicates of Wilson’s files.

U.S. v. Wilson is one of the clearest appellate decisions rejecting warrantless police viewing based solely on automated hash-based detection.

Circuits Allowing Warrantless Viewing

Other federal courts have reached the opposite conclusion. These include the 5th and 6th Circuits. Here are two important cases:

United States v. Reddick (5th Circuit)

Henry Reddick uploaded image files to Microsoft SkyDrive in early 2015. Using PhotoDNA and hash-matching, SkyDrive determined some of the uploaded files matched known CSAM images. Microsoft sent the files to NCMEC, which forwarded them to the Corpus Christi Police Department in Texas.

The 5th Circuit treated Microsoft’s hash-based identification as a private search, and held law enforcement’s later review of the images did not intrude on any privacy interest beyond what the private search had already exposed. The case is important because it endorses the view that highly reliable hash-matching can justify warrantless police viewing of the matched files under the private search doctrine. At least in the 5th Circuit.

United States v. Miller (6th Circuit)

Also in 2015, Google used automated hash-matching to determine a Gmail account had uploaded potential CSAM. The company sent the files and other information to NCMEC, which traced the IP information to a user in Fort Mitchell, Kentucky. The report was forwarded to local investigators. An investigator with the Kenton County Police Department opened and viewed two files, then traced the upload to William Miller, the account’s owner. Miller was charged with possession of CSAM.

Unlike in Lichtenberger (discussed below), the 6th Circuit determined Google’s hash-value matching gave investigators “virtual certainty” the files exactly matched already confirmed CSAM, so the investigator’s viewing did not exceed the scope of the prior private search. This case places the 6th Circuit squarely on the side of allowing warrantless viewing of hash-matched files, deepening the circuit split with cases like Wilson and later Maher.

United States v. Lichtenberger (6th Circuit)

In November 2011 in Cridersville, Ohio, after Aron Lichtenberger was arrested for failing to register as a sex offender, his girlfriend Karley Holmes accessed his password-protected laptop and discovered CSAM. Holmes notified police of her discovery. An investigator responded to the home and watched over Holmes’ shoulder as she opened folders and clicked through images on the laptop.

This is not a hashing case, but rather a test of the private search doctrine in the digital context. The 6th Circuit held the investigator’s review of the materials exceeded the scope of Holmes’ private search because he lacked the required “virtual certainty” that he was viewing only the exact files Holmes had already seen. The case has become a foundational digital-search precedent, setting the “virtual certainty” standard for deciding whether a police investigator’s viewing of digital files stays within or exceeds a prior private search.

The split between circuits matters. In practice, whether officers need a warrant before viewing a hash-matched file may depend on the jurisdiction. Issues like this often lead to eventual review by the U.S. Supreme Court. We’ll be monitoring these and similar cases in case certiorari is granted and the Court decides to weigh in on this important constitutional issue.


Practical Takeaways for Law Enforcement

For agencies and investigators working CSAM cases, hash-matching is an essential tool, but it requires both technical and legal understanding. Here are eight practical takeaways for law enforcement:

  1. Understand the technology. It’s critical to know the difference between exact and perceptual hashes, and what each can (and cannot) detect.
  2. Leverage trusted databases. Use vetted hash sets from organizations like NCMEC and IWF when investigating potential CSAM.
  3. Reduce exposure. Use hash-matching to minimize unnecessary viewing of harmful material. This protects both victims and investigators.
  4. Prioritize new content. Focus investigative resources on previously unseen material and victim identification.
  5. Integrate forensic tools. Ensure digital forensics platforms are configured to use current hash lists.
  6. Know your jurisdiction. Stay informed about how your circuit interprets hash-matching and the private search doctrine.
  7. Document your process. Maintain clear records of how matches were identified and handled.
  8. When in doubt, get the warrant. This remains the safest legal approach in uncertain situations.

A Practical Tool for a Difficult Job

Hash-matching has transformed how law enforcement and technology platforms combat CSAM. By turning known abuse material into non-reversible digital fingerprints, it lets investigators identify and stop the spread of that material quickly and at scale.

It doesn’t replace human judgment, obviously, and it doesn’t solve the problem entirely. But it significantly reduces re-victimization, protects investigators, and allows agencies to focus on what matters most: identifying victims, stopping offenders, and preventing further harm.


About the Author

DAVID BAKER is senior manager of content marketing at Lexipol. He's a marketing communications professional with a strong background in writing, editing, and content development. Other areas of expertise include lead generation, digital marketing, thought leadership, and marketing analytics. When he's not wrangling content for the Lexipol blog, he is an avid road racer and trail runner. David has completed over 40 marathons, including five of the six “world majors” (Boston, Chicago, New York City, Berlin, and Tokyo). He recently completed a one-day rim-to-rim-to-rim crossing of the Grand Canyon. David is the proud father of a police officer son.
