Forum Discussion
A way to filter gibberish Emails and Usernames
Identifying gibberish or fake accounts based on patterns in email names or usernames can be challenging, as there are countless variations. However, you can use some Excel functions and formulas to create a set of rules that might help in flagging potential gibberish.
Here is a basic example to get you started:
Let's assume your email addresses are in column A, and usernames are in column B. You can add a new column (let's say column C) with a formula to check for patterns. Use the following formula in cell C2 and drag it down:
=IF(AND(ISNUMBER(SEARCH("^[A-Za-z]{3,}[0-9]{4,}$", A2)), ISNUMBER(SEARCH("^[A-Za-z]{2,}[0-9]{4,}$", B2))), "Potential Gibberish", "Normal")
This formula checks if both the email and username follow a pattern of starting with letters and ending with numbers. You might need to adjust these patterns based on the specific characteristics of the accounts you consider gibberish.
Explanation of the formula:
- SEARCH("^[A-Za-z]{3,}[0-9]{4,}$", A2): Checks if the email starts with at least 3 letters and ends with at least 4 numbers.
- SEARCH("^[A-Za-z]{2,}[0-9]{4,}$", B2): Checks if the username starts with at least 2 letters and ends with at least 4 numbers.
- AND(ISNUMBER(...), ISNUMBER(...)): Checks if both conditions are true.
- If both conditions are true, it returns "Potential Gibberish"; otherwise, it returns "Normal".
Feel free to adjust the patterns in the SEARCH functions based on your observations of gibberish accounts. This is just a starting point, and you may need to refine the criteria based on the specific characteristics of the accounts you encounter. The text was revised with the AI.
My answers are voluntary and without guarantee!
Hope this will help you.
Was the answer useful? Mark as best response and like it!
This will help all forum participants.
Thankyou NikolinoDE! Is there a way to make it also check for gibberish names that dont have numbers in them like dxus or gjdfvxbtgvxcg?
- NikolinoDENov 15, 2023Platinum Contributor
To check for gibberish names that don't have numbers, you can modify the formula to include a condition for names without numbers.
Here is an updated formula:
=IF(AND(
OR(ISNUMBER(SEARCH("^[A-Za-z]{3,}[0-9]{4,}$", A2)), ISNUMBER(SEARCH("^[A-Za-z]{2,}[0-9]{4,}$", B2))),
NOT(OR(ISNUMBER(SEARCH("[0-9]", A2)), ISNUMBER(SEARCH("[0-9]", B2))))
),
"Potential Gibberish",
"Normal"
)
Explanation:
- OR(ISNUMBER(SEARCH("^[A-Za-z]{3,}[0-9]{4,}$", A2)), ISNUMBER(SEARCH("^[A-Za-z]{2,}[0-9]{4,}$", B2))): Checks if either the email or username follows the pattern of starting with letters and ending with numbers.
- NOT(OR(ISNUMBER(SEARCH("[0-9]", A2)), ISNUMBER(SEARCH("[0-9]", B2)))): Checks if neither the email nor the username contains any numbers.
- AND(..., ...): Checks if both conditions are true.
If both conditions are true, it returns "Potential Gibberish"; otherwise, it returns "Normal".
This formula now considers accounts with names that don't have numbers as potential gibberish. Adjustments can be made based on your specific observations and patterns of gibberish names.
- KumaMatataNov 15, 2023Copper ContributorFor either version I cannot seem to get it to "trigger" it to mark an account as "Potential Gibberish". For example the following variations I have tried:
EMAILS (Assume there is some sort of domain after the @):
kouoi26@
cdsa@
adsvgabaw@
gregfdf651dhedr@
jsrtdgftf23@
Usernames:
Nm1234
J23w2365
random_name1
I tried brute forcing a change to "Potential Gibberish" but nothing I did changed it. Is there something I am missing?- NikolinoDENov 15, 2023Platinum Contributor
It looks like the pattern might need some adjustments to better match the criteria you're looking for. The key is to refine the regular expression to capture the patterns you consider as potential gibberish. Based on the examples you provided, here's a modified formula:
This formula checks for two patterns:
=IF( AND( OR(ISNUMBER(SEARCH("^[A-Za-z]{2,}[0-9]{1,}@[a-zA-Z]+\.[a-zA-Z]+$", A2)), ISNUMBER(SEARCH("^[A-Za-z]{1,}[0-9]{1,}$", B2))), NOT(OR(ISNUMBER(SEARCH("[0-9]", A2)), ISNUMBER(SEARCH("[0-9]", B2)))) ), "Potential Gibberish", "Normal" )- For Emails:
- Starts with at least 2 letters.
- Followed by at least 1 number.
- Contains the "@" symbol.
- Followed by at least one letter for the domain.
- Followed by a period "." and at least one more letter for the domain.
- For Usernames:
- Starts with at least 1 letter.
- Followed by at least 1 number.
This formula now accounts for the email structure with a domain part and should flag cases that don't match these patterns as "Potential Gibberish."
You may need to further refine the patterns based on your specific data characteristics. Feel free to experiment and adjust the regular expressions to match your requirements. Regular expressions can sometimes be a bit tricky, and trial and error is often needed to get the exact match you're looking for.