@TytoCapensis
Currently, when analyzing an email with eml_parser, the domains appearing in the body of the email are given in the output, like this:

"domain": [
	"b2b.parallels.com",
	"click.parallels.com",
	"coronavirus.data.gov.uk"
],

However, the full domains are listed, subdomains included. When the domain table contains many subdomains, this makes it harder to identify the entities and actors behind them.

This commit uses publicsuffixlist (already used in eml_parser) to add a table named domain_registered to the data returned by an eml_parser analysis.

The domains in domain_registered are the registered domains, i.e. the public suffix plus the one label directly to its left. Thanks to publicsuffixlist, multi-label public suffixes like co.uk or co.jp are taken into account.

Now, the output looks like this:

"domain": [
	"b2b.parallels.com",
	"click.parallels.com",
	"coronavirus.data.gov.uk"
],
"domain_registered": [
	"parallels.com",
	"data.gov.uk"
],
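
To make the reduction from domain to domain_registered concrete, here is a minimal, dependency-free sketch. It is not the code in this commit: SAMPLE_SUFFIXES, registered_domain and registered_domains are hypothetical names, and the suffix set is a tiny hand-picked stand-in for the full list that publicsuffixlist provides (wildcard and exception rules are ignored).

```python
# Illustration only: a tiny hand-picked subset of the Public Suffix List.
# The real list has thousands of entries plus wildcard and exception
# rules, none of which are handled by this sketch.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk", "gov.uk", "jp", "co.jp"}


def registered_domain(domain, suffixes=SAMPLE_SUFFIXES):
    """Return the longest matching public suffix plus one label, or None."""
    labels = domain.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # i == 0 means the whole name is itself a public suffix.
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None  # no known suffix matched


def registered_domains(domains):
    """Deduplicate registered domains, keeping first-seen order."""
    seen = {}
    for d in domains:
        reg = registered_domain(d)
        if reg is not None:
            seen.setdefault(reg, None)
    return list(seen)


domains = ["b2b.parallels.com", "click.parallels.com", "coronavirus.data.gov.uk"]
print(registered_domains(domains))  # ['parallels.com', 'data.gov.uk']
```

Both parallels.com subdomains collapse into a single entry, while coronavirus.data.gov.uk keeps the data label because gov.uk is itself a public suffix.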

Do not hesitate to suggest improvements, especially regarding the name of the table.

@TytoCapensis changed the title from "Added domain_registered table in result data" to "Adding list of actual "registered" domains in result data" on Aug 10, 2022