Typosquatting and phishing are quite a headache for businesses. Creating fake websites that look almost identical to legitimate ones has become a piece of cake for attackers. Experts can spot those fakes, but not everyone is an expert or has the time. In the US alone, businesses lose almost $2 billion each year because their clients are getting phished. These attacks are hitting the jackpot mainly because users lack awareness of the danger.
Typosquatting is a major and often unforeseen problem for mid-sized businesses. Imagine you’ve worked hard to build a brand and a solid online presence. Typosquatters are sneaky troublemakers who register domain names that are similar to yours but with typos or misspellings. They do this to trick customers into visiting their bogus websites instead of yours. This leads to confusion and loss of trust, and the attackers might even steal your customers’ data or money!
Dealing with these attacks is a real challenge. They prey on our weaknesses, making it hard to fend them off completely. The most common method to detect phishing websites is the “blacklist” approach, in which services such as PhishTank maintain and update a database of blacklisted URLs and IPs.
In our search for an all-in-one solution, we’ve come across several projects. Some (such as openSquat) grab a list of URLs or domain names from an online domain source, look for substrings in them, and try to determine whether they’re phishing sites. Others (such as dnstwist) use permutation or heuristic-based detection to catch these attacks. These methods are promising, but their Achilles’ heel is time complexity. As the number of permutations and patterns grows, so does the analysis time, which can delay detection and response to real threats. The computational resources required can also strain infrastructure and cause performance issues. Prioritizing algorithms with better time complexity could hold the key to a more effective and responsive phishing detection system.
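To make the cost of permutation-based detection concrete, here is a minimal sketch of how one-edit typo variants of a domain might be generated. It is not dnstwist’s actual algorithm; the function name `typo_variants` and the four edit types it covers are assumptions made for illustration.

```python
import string
from typing import Set


def typo_variants(domain: str) -> Set[str]:
    """Return simple one-edit typo variants of a domain's first label.

    Hypothetical helper for illustration only; real tools cover many more
    permutation types (homoglyphs, bitsquatting, extra TLDs, and so on).
    """
    name, _, tld = domain.partition(".")
    alphabet = string.ascii_lowercase + string.digits
    variants = set()

    for i in range(len(name)):
        # Omission: drop one character ("amazon" -> "amzon").
        variants.add(name[:i] + name[i + 1:])
        # Repetition: double one character ("amazon" -> "amazzon").
        variants.add(name[:i] + name[i] + name[i:])
        # Replacement: swap one character for another ("amazon" -> "amaz0n").
        for char in alphabet:
            variants.add(name[:i] + char + name[i + 1:])

    for i in range(len(name) - 1):
        # Transposition: swap two adjacent characters ("amazon" -> "amaozn").
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])

    # Remove the original name and any empty label produced by omission.
    variants.discard(name)
    variants.discard("")
    return {f"{variant}.{tld}" for variant in variants}


if __name__ == "__main__":
    candidates = typo_variants("amazon.com")
    print(len(candidates), "candidates to resolve and inspect")
```

Even this small set of edit types yields a few hundred candidates for a six-letter name, and each candidate still has to be resolved and inspected, which is where the analysis time adds up.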
AntiSquat is our attempt to look at the typosquatting problem differently. Our aim is to complement and assist existing methods. It leverages AI techniques such as natural language processing (NLP), large language models (ChatGPT in this case) and more to empower detection.
What sets AntiSquat apart
- Large Language Model / ChatGPT integration
AntiSquat takes a fresh perspective on typosquatting and is meant to complement existing methods rather than replace them. It uses NLP to understand how words are used in language, and an LLM (ChatGPT in this case) to generate domain name variations efficiently. This combined approach improves its ability to identify and counter the deceptive tactics employed by cybercriminals.
- Image processing and optical character recognition
Using Selenium, AntiSquat renders a webpage as a user would see it, then tries extracting all available text from the final version of the page. Some phishers put text inside image assets to evade phishing detection, so AntiSquat also uses OpenCV and Tesseract OCR to extract words from within images. Words collected from suspected phishing sites are compared to the words collected from the original site using algorithms such as Levenshtein distance, which helps the tool generate a similarity index for sites.
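As a rough illustration of the comparison step, the sketch below computes Levenshtein distance and turns it into a similarity index between two word lists. How AntiSquat actually aggregates word distances is not described here, so `similarity_index` and its averaging scheme are assumptions, not the tool’s implementation.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity_index(original_words: Iterable[str],
                     candidate_words: Iterable[str]) -> float:
    """Hypothetical score in [0, 1]: for every word on the original site,
    take its best normalized match on the candidate site, then average."""
    original = [w.lower() for w in original_words if w]
    candidate = [w.lower() for w in candidate_words if w]
    if not original or not candidate:
        return 0.0
    total = 0.0
    for word in original:
        total += max(
            1 - levenshtein(word, other) / max(len(word), len(other))
            for other in candidate
        )
    return total / len(original)


if __name__ == "__main__":
    legit = ["sign", "in", "password", "account"]
    suspect = ["sign", "in", "passw0rd", "acc0unt"]
    print(round(similarity_index(legit, suspect), 2))
```

A score close to 1 would suggest the candidate page reuses most of the original site’s wording, which is one signal that it may be imitating it.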
- Contact information provisioning
AntiSquat integrates with GoDaddy and Whois to determine whether domains are available for sale. It also tries extracting intelligence such as contact information (emails and phone numbers) from site pages so that organizations can contact domain owners. This is helpful if the domain is available for sale privately. Organizations can use this data to buy out these domains in bulk to protect their users.
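Contact extraction of this kind can be approximated with regular expressions over the rendered page text. The patterns and the `extract_contacts` helper below are illustrative assumptions, not AntiSquat’s own code, and they are deliberately loose: real-world phone formats vary a lot.

```python
import re
from typing import Dict, List

# Deliberately simple patterns; both are assumptions made for this sketch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(page_text: str) -> Dict[str, List[str]]:
    """Return de-duplicated, sorted emails and phone-like strings."""
    emails = sorted(set(EMAIL_RE.findall(page_text)))
    phones = sorted({match.strip() for match in PHONE_RE.findall(page_text)})
    return {"emails": emails, "phones": phones}


if __name__ == "__main__":
    sample = "Contact sales@example.com or call +1 (555) 010-9999."
    print(extract_contacts(sample))
```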
How to use
• Clone the project: git clone https://github.com/redhuntlabs/antisquat
• Install all dependencies: pip install -r requirements.txt
• Get a ChatGPT API key at https://platform.openai.com/account/api-keys
• Create a file named .openai-key and paste your ChatGPT API key in it.
• (Optional) Visit https://developer.godaddy.com/keys and grab a GoDaddy API key. Create a file named .godaddy-key and paste your GoDaddy API key in it.
• Create a file named domains.txt. Type in a line-separated list of domains you’d like to scan.
• (Optional) Create a file named blacklist.txt. Type in a line-separated list of domains you’d like to ignore. Regular expressions are supported.
• Run AntiSquat: python3.8 antisquat.py domains.txt
Let’s say you’d like to run AntiSquat on amazon.com. Start by typing amazon.com in domains.txt, then run python3.8 antisquat.py domains.txt
AntiSquat generates several permutations of the domain, iterates through them one by one and tries extracting all contact information from the page.
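The optional blacklist.txt from the steps above can be pictured as a filter applied to the generated candidates before they are scanned. The sketch below assumes each blacklist line is treated as a regular expression that must match the whole domain; whether AntiSquat uses full or partial matching is not stated here, and `filter_blacklisted` is a hypothetical helper.

```python
import re
from typing import Iterable, List


def filter_blacklisted(domains: Iterable[str],
                       patterns: Iterable[str]) -> List[str]:
    """Drop every domain fully matched by at least one blacklist pattern.

    Assumption: patterns are matched against the entire domain name.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        domain for domain in domains
        if not any(regex.fullmatch(domain) for regex in compiled)
    ]


if __name__ == "__main__":
    candidates = ["amazon.com", "amaz0n.com", "amazon.co.uk", "amazon.com.br"]
    blacklist = [r"amazon\.co\.uk", r".*\.com\.br"]
    print(filter_blacklisted(candidates, blacklist))
```

This is useful for excluding domains you already own, such as regional variants of your brand, so that they are not reported as suspicious.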
AntiSquat is our attempt at taking a crack at the menace that is typosquatting and phishing. It is one in a long line of tools that are meant to complement existing approaches, not challenge them. Of course, our solution isn’t fool-proof, but it demonstrates yet another way of tackling this problem: using AI to empower cybersecurity processes with practical, statistically inferred ways of looking at problems, as opposed to spending time brute forcing a solution.
How you can help
Since AntiSquat is an open-source project, we appreciate contributions from the community. Feel free to make pull requests with features, raise issues and feature requests on GitHub. You can also contact us about it.
Contributing to AntiSquat is straightforward. Simply follow these steps:
- Fork the AntiSquat repository on GitHub.
- Create a new branch for your changes.
- Implement your changes or additions locally.
- Commit your changes with descriptive messages.
- Push the changes to your fork.
- Finally, submit a Pull Request (PR) to the main AntiSquat repository.
Our team at RedHunt Labs will review your contributions promptly. Collaborating with the information security community allows us to address a broader range of security challenges and deliver a more powerful and effective tool.
We at RedHunt Labs help organizations discover untracked assets, data exposure, and external attack surface with NVADR, an all-in-one attack surface management SaaS solution.
New attack vectors and vulnerabilities emerge frequently and might affect one (or many) assets across your organization. During such times, having a precise external asset inventory makes it easy to scan for systems affected by the newly published vulnerability.
NVADR also ‘continuously’ enumerates and lists all the technologies used across your external attack surface and thus helps identify affected assets right away. Don’t hesitate to get in touch with us to schedule your free trial today.