Clear Sky Science · en

Deep learning enabled pseudonymization for preserving data privacy of financial identifiers in public documents in India

2026-02-10

Why your signature on an ID card is at risk

Most of us sign our names on government ID cards, bank forms, and tax documents without thinking that those looping lines can be copied, forged, or mined by hackers. As more offices scan and share these documents online, handwritten signatures—still treated as legally binding in many places—have become an appealing target for identity theft. This paper explores a new way to hide signatures on Indian tax ID cards while still keeping the documents useful for record keeping, audits, and even future security checks.

Turning real signatures into safe stand-ins

The authors focus on India’s Permanent Account Number (PAN) card, widely used for financial transactions and tax filing. These cards increasingly appear in emails, cloud drives, and public submissions, where exposed signatures can be copied or printed onto fake documents. Simply blurring or blacking out the signature protects privacy but destroys the document’s value for later verification or investigation. Instead, the researchers use a strategy called pseudonymization: the original signature is detected and replaced with a synthetic look‑alike that keeps the position and structure of the mark, but no longer matches the real person’s handwriting closely enough to be misused.

How a smart vision system finds what to hide

To automate this process, the team builds on a deep‑learning model known as SuperPoint, originally designed to find important points in images—like corners and edges—that stay reliable even if the image is noisy, tilted, or slightly blurred. The method first preprocesses PAN card scans by resizing them and converting them to grayscale to simplify computation. It then isolates the region containing the signature. Inside that region, the SuperPoint network acts like a specialized magnifying glass: one part of the network produces a heatmap showing where distinctive pen strokes lie, and another part generates compact numerical descriptions of those strokes. This combination lets the system pinpoint exactly which parts of the handwriting are most distinctive, and therefore most dangerous to leave exposed.
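The steps described above can be sketched in a few lines of plain Python. This is an illustration only, not the authors' implementation: it does not run SuperPoint, and the function names, the nearest-neighbour resize, the signature box coordinates, and the threshold are all assumptions made here. The heatmap is a small hand-made grid standing in for the network's output.

```python
from typing import List, Tuple

Gray = List[List[float]]  # row-major grayscale image, values in [0, 255]


def to_grayscale(rgb: List[List[Tuple[int, int, int]]]) -> Gray:
    """Convert an RGB image to grayscale using ITU-R BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def resize_nearest(img: Gray, new_h: int, new_w: int) -> Gray:
    """Resize by nearest-neighbour sampling (a simple stand-in for real resampling)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def crop(img: Gray, top: int, left: int, height: int, width: int) -> Gray:
    """Cut out the signature box; the coordinates are assumed to be known."""
    return [row[left:left + width] for row in img[top:top + height]]


def keypoints_from_heatmap(heat: Gray, threshold: float,
                           radius: int = 1) -> List[Tuple[int, int]]:
    """Keep pixels at or above the threshold that are also the largest value
    in their (2 * radius + 1)-wide neighbourhood. Exact ties are all kept."""
    h, w = len(heat), len(heat[0])
    points = []
    for y in range(h):
        for x in range(w):
            v = heat[y][x]
            if v < threshold:
                continue
            neighbours = [heat[j][i]
                          for j in range(max(0, y - radius), min(h, y + radius + 1))
                          for i in range(max(0, x - radius), min(w, x + radius + 1))]
            if v >= max(neighbours):
                points.append((x, y))  # stored as (column, row)
    return points


# Toy heatmap: a strong peak in the centre and a weaker one beside it.
heat = [[0.1, 0.2, 0.1],
        [0.2, 0.9, 0.2],
        [0.1, 0.2, 0.8]]
print(keypoints_from_heatmap(heat, threshold=0.5))  # -> [(1, 1)]
```

The weaker peak at the bottom right is dropped because a stronger value sits right next to it, which is the usual reason for suppressing non-maximal responses: one stroke feature should yield one keypoint rather than a cluster.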

From strokes and keypoints to masked marks

Once the important locations in the signature are identified, the system replaces them with neutral shapes that preserve the overall look of a signed area without revealing the personal style of the writer. Instead of storing the original ink pattern, the model relies on abstract feature maps—mathematical summaries of where the key points were—making it far harder for an attacker to reconstruct the true signature. The authors also use a tool called Kornia to turn the network’s raw outputs into precise coordinates, scales, and orientations, helping ensure that the masked region aligns cleanly with the original signature area and works across different card layouts and scanning qualities.
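As a rough illustration of the masking step, the sketch below paints a neutral gray disc over each keypoint and then writes the patch back into the page at its original position. The disc shape, its radius, the fill value, and the function names are assumptions made here; the paper's actual replacement shapes and its use of Kornia are not reproduced.

```python
from typing import List, Tuple

Gray = List[List[float]]  # row-major grayscale image, values in [0, 255]


def mask_keypoints(region: Gray, keypoints: List[Tuple[int, int]],
                   radius: int = 2, fill: float = 128.0) -> Gray:
    """Return a copy of the region with a filled disc painted over each
    keypoint, given as (column, row). The input region is left untouched."""
    h, w = len(region), len(region[0])
    out = [row[:] for row in region]
    for (cx, cy) in keypoints:
        for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    out[y][x] = fill
    return out


def paste(page: Gray, patch: Gray, top: int, left: int) -> Gray:
    """Write the masked patch into a copy of the full page at the same offset
    it was cropped from. Assumes the patch fits inside the page."""
    out = [row[:] for row in page]
    for dy, row in enumerate(patch):
        out[top + dy][left:left + len(row)] = row
    return out


# Toy example: a 5x5 "signature" region with one keypoint in the middle,
# pasted back into a 7x7 white page.
region = [[0.0] * 5 for _ in range(5)]
masked = mask_keypoints(region, [(2, 2)], radius=1)
page = paste([[255.0] * 7 for _ in range(7)], masked, top=1, left=1)
```

Because the patch returns to the exact offset it came from, the rest of the card stays pixel-for-pixel unchanged, which is what keeps the pseudonymized document visually close to the original.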

How well the new approach stacks up

The framework is tested on more than 500 real PAN card images collected from open datasets, covering many handwriting styles and card designs. Its performance is compared against three widely used traditional feature‑finding methods (ORB, FAST, and SIFT) as well as a deep residual network. The researchers measure how accurately the system finds signature details, how close the masked document remains to the original in appearance, and how much computing power and storage are required. Their method achieves high precision and recall in locating the crucial parts of the signatures and reaches a structural similarity (SSIM) score of about 97 percent, meaning the pseudonymized cards look almost identical to the originals except for the protected marks. At the same time, it uses a moderate number of keypoints and compact descriptors, striking a balance between accuracy, speed, and memory use.
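The two kinds of score mentioned here can be illustrated with a small sketch. It assumes a simple matching rule for precision and recall (a detected point counts as correct if it lies within a few pixels of a reference point that has not already been matched) and computes structural similarity over the whole image as a single window, whereas the standard measure averages over many local windows. The function names and the tolerance are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Gray = List[List[float]]


def precision_recall(detected: List[Point], truth: List[Point],
                     tol: float = 2.0) -> Tuple[float, float]:
    """Greedy one-to-one matching of detected points to reference points."""
    unused = list(truth)
    true_pos = 0
    for (x, y) in detected:
        match = next((t for t in unused
                      if (t[0] - x) ** 2 + (t[1] - y) ** 2 <= tol ** 2), None)
        if match is not None:
            unused.remove(match)  # each reference point can be matched once
            true_pos += 1
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


def global_ssim(a: Gray, b: Gray, dynamic_range: float = 255.0) -> float:
    """Structural similarity computed once over the whole image
    (a simplification of the usual windowed average)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((v - mean_x) ** 2 for v in xs) / n
    var_y = sum((v - mean_y) ** 2 for v in ys) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(xs, ys)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilising constants from the
    c2 = (0.03 * dynamic_range) ** 2  # standard SSIM definition
    return ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2))


# Two of three detections land near a reference point; both references are found.
p, r = precision_recall([(0, 0), (10, 10), (50, 50)], [(1, 0), (10, 11)])
```

In this toy case precision is two thirds (one detection is spurious) and recall is 1.0 (no reference point is missed). An image compared with itself gives a similarity of 1, so a score near 97 percent indicates that only a small part of the card, the signature area, has changed.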

What this means for everyday privacy

For non‑specialists, the key message is that it is now possible to automatically shield one of the most sensitive elements on an ID card—your handwritten signature—without turning the document into a useless blacked‑out rectangle. By replacing real signatures with carefully constructed stand‑ins, the proposed system lets governments and organizations share, store, and analyze scanned IDs while greatly reducing the risk of forgery and identity theft. The authors suggest that similar deep‑learning tools could be built into public‑sector document workflows, helping countries meet modern privacy rules such as GDPR, and could eventually extend beyond PAN cards to passports, licenses, and other identity documents worldwide.

Citation: Roopalakshmi, R., Kailas, S. & Sreelatha, R. Deep learning enabled pseudonymization for preserving data privacy of financial identifiers in public documents in India. Sci Rep 16, 8120 (2026). https://doi.org/10.1038/s41598-026-39309-6

Keywords: signature privacy, identity protection, document anonymization, deep learning security, government ID cards