Plugin Development Guide
The goal of Scrummage is to be a framework in which users can develop plugins and contribute in a communal fashion. The list of plugins that could be added to such a framework is endless, so Scrummage attempts to narrow it down in a few ways, such as only picking one plugin per category. For example, while there are several exploit databases, Scrummage uses only one of them (the exploit search plugin, Vulners Search) to perform its OSINT searches.
If users of Scrummage are dissatisfied with the included plugins, they should be able to fairly easily create a plugin for themselves, and for others in the community, to use. Developers are free to develop plugins in their own forked repository and request to have them merged into the master branch, subject to a revision process, before they are approved and added to the list of plugins. The revision process ensures any newly developed plugin follows SSSC - Security, Simplification, Standardisation, and Centralisation. Most of these are achieved by leveraging the surrounding Scrummage framework within the plugin.
This wiki page documents the available functions and classes, as well as their default parameters, in the General.py and Common.py libraries, along with a breakdown of a standard plugin. We realise not all plugins fit into a standard set of requirements, hence we are expanding capability over time, but only as necessary.
Both General.py and Common.py are collections of classes and functions for broad use. The reason there are two files is that some functions and classes are also needed by the libraries in the plugins/common directory that existed before the creation of Common.py. For example, the function used to set the date needs to be accessed by all plugins and libraries, as well as the core Scrummage.py file, and it is not a good idea for two libraries to be co-dependent. Libraries can depend on other libraries, but if the dependency goes both ways there is the potential for infinite import loops. Therefore, the Common.py file contains the functions that are used by both the General.py and Connectors.py libraries; as the General.py library has dependencies on both Connectors.py and Common.py, those shared functions are kept separate so that both libraries can safely access them. The Connectors.py library only has a dependency on Common.py, meaning there is a finite end to library imports. This wiki page doesn't cover the functions and classes in the Connectors.py file because none of them are called within any plugins directly; outputs are fed into the General.py library which, in a controlled manner, outputs the data via the Connectors.py library. This will be discussed in more detail below.
[CLASS] Screenshot()
This class is only called by the General.py Output class and the main Scrummage.py file, so it will not be covered in this document as it doesn't impact plugin development.
Get_Limit(Limit)
This function receives the Limit argument fed into the plugin. It checks whether a limit has been provided and whether it is in the correct format. If either of these conditions is not met, it returns the default limit of 10. Otherwise, it returns the limit in the filtered format required by the plugin.
Logging(Directory, Plugin_Name)
Unfortunately, logging has to be done in the plugin file itself, otherwise, the log file would reflect actions in the General.py library. This function receives a given directory, and the plugin's name. It uses this to construct the name of a log file in a location specific to the plugin. This file name is returned and used as the location to log events from the plugin.
[CLASS] Cache(Directory, Plugin_Name)
Cache files are used to mitigate the risk of plugins overwriting output files and attempting to add items to the database that already exist. Similarly to the Logging() function described above, this class has an init function that constructs a file in the same directory based on the required parameters; however, it is a text file for caching, not a log file. After this, the Get_Cache() function can be called to retrieve the cached data, and the Write_Cache(Current_Cached_Data, Data_to_Cache) function can be called with the required inputs to update the cached data, or create it if no data currently exists.
Convert_to_List(String) This function simply converts a string with Comma-Separated Values (CSV) to a list format. The primary use of this is to split a task's query into a list of queries if the query has multiple items in it, for example, a Twitter Search query containing "JoeBiden, BarackObama".
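A rough sketch of how this is used; the exact whitespace handling is an implementation detail of General.py:
Query_List = General.Convert_to_List("JoeBiden, BarackObama")
# Query_List should now look something like ["JoeBiden", "BarackObama"].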
Main_File_Create(Directory, Plugin_Name, Output, Query, Main_File_Extension) This function is responsible for creating the main file for a plugin; the main file usually represents the first data retrieved from the 3rd party site that the plugin leverages. For example, in Twitter Search, this file is a JSON file that is returned as a result of searching Twitter for the given query. The main file doesn't always exist in plugins, but does in most.
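A hypothetical call, assuming the raw API response has been stored in a variable named Output_Data and that the plugin uses the The_File_Extensions dictionary shown in the base template further down this page:
Main_File = General.Main_File_Create(Directory, Plugin_Name, Output_Data, Query, The_File_Extensions["Main"])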
Create_Query_Results_Output_File(Directory, Query, Plugin_Name, Output_Data, Query_Result_Name, The_File_Extension) This function is responsible for creating files for each result. For example, following the Twitter example, let's say we search for "JoeBiden" with a provided limit of 5. The main file will be the returned JSON data with the last 5 tweets from the account @JoeBiden. The plugin then iterates through the results and makes an HTTP request (using the Request_Handler() function from the Common.py library) for each tweet's link. The returned HTML data is then stored in a query file. As part of this process, HTML filtering is leveraged for the best results, which is explained in more depth on the wiki page here.
Data_Type_Discovery(Data_to_Search) This function is quite niche and is currently used by only one plugin, but essentially it's for any plugin that works by scraping data, i.e. obtaining data and iterating through it to understand what is there. The Data_Type_Discovery() function returns a list of discovered content types, which can include:
- Hashes (MD5, SHA1, and SHA256)
- Credentials
- Email Addresses
- URLs
This function ultimately helps you better understand the data you have scraped; a short usage sketch follows.
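A hedged sketch of how this might be wired up in a scraping plugin; Scraped_Data is a placeholder for whatever data your plugin has collected:
Dump_Types = General.Data_Type_Discovery(Scraped_Data)
# Dump_Types can then be passed to the Dump_Types parameter of the Connections Output() function described further below.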
Make_Directory(Plugin_Name) This function is essential for all plugins, as it creates the directory in which all plugin-specific data is stored. For any new plugin it will create the following directory structure in the <SCRUMMAGE_DIR>/lib/static/protected/output directory:
- {Plugin_Name}/{Year}/{Month}/{Day}
For example, running Twitter Search on 01/01/2021 will create the directory "twitter/2021/01/01" (if it doesn't already exist) and return its path.
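In a plugin this typically looks like the following line from the base template further down this page; use the Concat_Plugin_Name value instead where your plugin has one:
Directory = General.Make_Directory(Plugin_Name.lower())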
Get_Title(URL, Requests=False)
This function is helpful when you have a link representing each result returned by a plugin. Let's say you have the 5 latest tweets from the Twitter account @JoeBiden, and when creating each result we want the title from each link. While some APIs will return this in the original data, most won't, and that's where this function comes into play. It will send an HTTP request to the desired link and return its title using the BeautifulSoup web scraping library. The Requests option, when set to True, will leverage the Request_Handler() function from the Common.py library; however, sometimes it is preferable to use the urllib library rather than the requests library leveraged by Request_Handler(). There is no correct answer, as results vary on a case-by-case basis.
Note: If you have the choice, you should always use the option with the least load; if you are able to get the title via the initial API request, that is preferable to calling this function.
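A minimal sketch, assuming Link holds the URL of an individual result; whether to set Requests=True is the case-by-case choice described above:
Title = General.Get_Title(Link, Requests=True)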
JSONDict_to_HTML(JSON_Data, JSON_Data_Output, Title) Note: JSON_Data is the data used to make the conversion; JSON_Data_Output is the data that is being output to a file, which is placed into a raw data text area in the created HTML file. In rare cases, your plugin will only be able to retrieve JSON data. This might be because you're calling an API that has no website for the same data. This function is provided to convert input JSON data into a more visually pleasing HTML report. For this to work you need to provide a JSON payload that starts with a list, then a dictionary, followed by attributes, similar to the following:
[
    {
        "key1": "value1",
        "key2": "value2"
    }
]
This still doesn't really answer the question of when to use it, so I will refer to current examples. When not to: plugins like Twitter Search first create a JSON file as the main file and an HTML file for each result (query file). As there are already HTML files being produced for the results, there isn't much need for this function; while it wouldn't be a problem to use it, it would just be unnecessary. When to: plugins like IPStack Search query data for an IP address and receive JSON data, but this JSON data is the full result for the task and no further action is required. There is also no simple way to query the web for this data in an HTML format, so we are stuck with just the JSON data. We would then use this function to create an HTML version of the data for improved reporting, as in the sketch below.
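A hypothetical sketch of the IPStack-style flow; the variable names, the handling of the parsed data, and the way the returned HTML is written out are assumptions for illustration only:
JSON_Object = Common.JSON_Handler(Response)
JSON_Response = JSON_Object.To_JSON_Loads()
# Depending on the API, the parsed data may need to be wrapped in a list to match the structure described above.
HTML_Output = General.JSONDict_to_HTML(JSON_Response, JSON_Object.Dump_JSON(), f"IPStack Search Result for {Query}")
Output_file = General.Create_Query_Results_Output_File(Directory, Query, Plugin_Name, HTML_Output, Query, The_File_Extension)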
CSV_to_HTML(CSV_Data, Title) Same concept as the above function but for CSV data. The raw data is not included in the created HTML report, so it does not need to be provided. The only plugin that currently uses this is Domain Fuzzer.
CSV_to_JSON(Query, CSV_Data) Again, currently only used by the Domain Fuzzer, but this should be used when your only true data is in a CSV format, as JSON is more versatile.
[CLASS] Connections(Input, Plugin_Name, Domain, Result_Type, Task_ID, Concat_Plugin_Name) This class is responsible for outputting the final data to the configured formats, such as the main DB, CSV and DOCX reports, and other configured systems like Elasticsearch, JIRA, RTIR, etc. The initialisation of this class creates a set of variables that represent the data as it is output. This includes the Input (or Query) provided by the task, the plugin name, the domain of the third-party site, the type of result, the task ID (provided by the task), and the concatenated plugin name (Twitter would just be twitter, but something like NZ_Business_Search would have a secondary plugin name of "nzbusiness"). The type of result has to fit into a pre-defined list, which can be found towards the top of the main Scrummage.py file. The finding types are listed below for convenience:
Finding_Types = ["Darkweb Link", "Company Details", "Blockchain - Address", "Blockchain - Transaction",
"BSB Details", "Certificate", "Search Result", "Credentials", "Domain Information",
"Social Media - Media", "Social Media - Page", "Social Media - Person", "Social Media - Group",
"Social Media - Place", "Application", "Account", "Account Source", "Publication", "Phishing",
"Forum", "News Report", "Torrent", "Vehicle Details", "Domain Spoof", "Exploit",
"Economic Details", "Virus", "Virus Report", "Web Application Architecture", "IP Address Information"]
If you require this list to be extended, a separate request would need to be made to the Scrummage team; altering it yourself can cause issues for the Scrummage Dashboard.
Once initialised, the Output(self, Complete_File_List, Link, DB_Title, Directory_Plugin_Name, Dump_Types=[]) function can be called. Its parameters are described below, followed by a short usage sketch.
- Complete_File_List: A list of the locations of all output files. The value will mostly look like [Main_File, Output_File], with as many output file names as you like. (The actual file data is not stored in the database.)
- Link: The link for the individual result
- DB_Title: Don't be thrown off by the DB part of the name, this is just the Title of your result.
- Directory_Plugin_Name: Just the Plugin_Name, or the Concat_Plugin_Name if there are both.
- Dump_Types: This option is only used if your plugin uses the Data_Type_Discovery() function, listed above.
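A minimal sketch of the output flow, reusing the Twitter-style example from earlier on this page; the "Search Result" finding type and the variable names are illustrative only:
Output_Connections = General.Connections(Query, Plugin_Name, Domain, "Search Result", Task_ID, Plugin_Name.lower())
Output_Connections.Output([Main_File, Output_file], Link, Title, Plugin_Name.lower())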
Set_Configuration_File() This function returns the absolute path of the config.json file, which is used to access API secrets as well as other configuration information.
Date(Additional_Last_Days=0, Date_Only=False, Elastic=False, Full_Timestamp=False) By default this function returns the current date and time in the format (YYYY-MM-DD H:M:S), which is used mostly for logging.
- Additional_Last_Days: Used to return a list of dates, starting from the current date and working back the number of days specified in this parameter. For example, if it is set to 5, the function returns the dates of the last 5 days. This is mainly used by the Dashboard to get records from the last 5 days to show successful and unsuccessful logins, so it is not very relevant to plugin development.
- Date_Only: As the name suggests, this returns only the date and not the time.
- Elastic: Returns the timestamp in the format for the Elasticsearch output option.
- Full_Timestamp: Returns the raw, unformatted, current timestamp.
[CLASS] JSON_Handler() This class removes the need for plugins and libraries to each import and manage the json module. Additionally, this helps with standardisation, as the class has defaults that reflect Scrummage standards. Its inner functions are listed below, followed by a short usage sketch.
- [Inner Function] init(raw_data): The initialisation function sets the input value as the object's core value.
- [Inner Function] Is_JSON(): Returns True if the core value is valid JSON.
- [Inner Function] To_JSON_Load(): Loads JSON to a Python Dict using the .load method.
- [Inner Function] To_JSON_Loads(): Loads JSON to a Python Dict using the .loads method.
- [Inner Function] Dump_JSON(Indentation=2, Sort=True): Uses the .dumps method to output data in a JSON format. By default it beautifies the JSON with an indentation of two and sorts keys in alphabetical and numerical order. (Setting Indentation to 0 results in no indentation at all.)
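A short usage sketch, modelled on the Load_Configuration() example further down this page; the Response variable and the choice of To_JSON_Loads() for an HTTP response body are assumptions:
JSON_Object = Common.JSON_Handler(Response)
if JSON_Object.Is_JSON():
    JSON_Response = JSON_Object.To_JSON_Loads()
    JSON_Output_Response = JSON_Object.Dump_JSON()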
Request_Handler(URL, Method="GET", User_Agent=True, Application_JSON_CT=False, Accept_XML=False, Accept_Language_EN_US=False, Filter=False, Risky_Plugin=False, Full_Response=False, Host="", Data={}, Optional_Headers={}, Scrape_Regex_URL={}) This function removes the need for plugins and libraries to each import and manage the requests module. Additionally, this helps with standardisation, as the function has defaults that reflect Scrummage standards. Its parameters are described below, followed by a short usage sketch.
- URL: This is a string with the URL to send the request to.
- Method: Default Method is GET, but POST is also supported. (Other methods can be added as required, subject to verification by the Scrummage team.)
- User_Agent: Default is True, which means Scrummage sets the User-Agent header to the latest Firefox User-Agent; this helps make the requests appear to come from a normal browser.
- Application_JSON_CT: When True, sets a Content-Type header with a value of "application/json"
- Accept_XML: When True, sets an Accept header with a value of "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
- Accept_Language_EN_US: When True, sets an Accept-Language header with a value of "en-US,en;q=0.5"
- Filter: When True, and used in conjunction with a valid value provided to the Host parameter, this calls the response filter function mentioned below.
- Host: Only set this when using the Filter parameter.
- Risky_Plugin: When True, this indicates that data returned in the response can contain malicious JavaScript code. This should only be used when the Filter parameter is also set to True.
- Full_Response: When True, returns the full response object; by default this function only returns the response data.
- Data: Optional field, to provide data to the HTTP request.
- Optional_Headers: Allows the user to set custom headers. If the headers conflict with defaults, the custom headers will override the defaults.
- Scrape_Regex_URL: Used to scrape URLs from the response data and return them.
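A short usage sketch, based on the Twitter_Search lines quoted later on this page; the "Filtered" key of the returned dictionary is taken from that example:
# Basic request; by default only the response data is returned.
Response = Common.Request_Handler(URL)
# Filtered request, used when the response HTML will be stored as an output file.
Responses = Common.Request_Handler(Link, Filter=True, Host=f"https://{Domain}")
Filtered_Response = Responses["Filtered"]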
Response_Filter(Response, Host, Risky_Plugin=False) This function goes through the response data and converts any relative links to absolute links using the Host parameter's value. If Risky_Plugin is set to True, then depending on the security settings you have configured in config.json for web scraping (refer to the guide here), the function may be prevented from doing this, in case the data is potentially malicious.
Load_Web_Scrape_Risk_Configuration() This function loads web scraping configuration settings used by the Response_Filter() function above.
Regex_Handler(Query, Type="", Custom_Regex="", Findall=False, Get_URL_Components=False) This function performs regular expression matching against a given Query. Type can be used to select a pre-defined regex pattern; otherwise, Custom_Regex can be used to supply your own. Findall, when set to True, returns a list of all matches, versus the default search behaviour, which finds only the first match. Get_URL_Components can only be used when Type is set to "URL"; this breaks any discovered URL into three components (Prefix, Body, and Extension), which can be used to extract domains from URLs, and much more.
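A hedged sketch of the two main ways this can be called; the "URL" type is mentioned above, but any other pre-defined type names would need to be confirmed against Common.py, and the custom pattern is purely illustrative:
# Find all URLs in a block of scraped data.
URL_Matches = Common.Regex_Handler(Scraped_Data, Type="URL", Findall=True)
# Search with a custom pattern; by default only the first match is returned.
Match = Common.Regex_Handler(Query, Custom_Regex=r"[\w\.-]+@[\w\.-]+")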
All plugins start with a standardised base template, which only ever needs to be slightly customised. It is not common to need to extend the imported modules; this is only necessary if you require a custom library for accessing your plugin's API, or if the API returns data in an unusual format that can only be converted with the help of another module.
import plugins.common.General as General, plugins.common.Common as Common, os, logging
The_File_Extension = ".html"
# If your main file and your output files use different file extensions, the above would end up looking something more like:
# The_File_Extensions = {"Main": ".json", "Query": ".html"}
Plugin_Name = "Fake_Search_Engine"
# When your Plugin_Name is longer than one word and uses _ to separate the words, a second variable needs to be set.
Concat_Plugin_Name = "fakesearchengine"
Domain = "fakesearchengine.com"
def Search(Query_List, Task_ID):
    # If, like most plugins, you will be returning more than one result related to the provided query, you must add the argument Limit=10.
    # Additionally, if your plugin has multiple tasks, such as the Instagram plugin having four separate tasks, you must add an argument for Type, and use conditional programming to behave according to the provided Type.

    try:
        # In the following code, the term Simplified_Plugin_Name is a placeholder for Concat_Plugin_Name if available, otherwise Plugin_Name.lower().
        Data_to_Cache = []
        Directory = General.Make_Directory(Simplified_Plugin_Name)
        logger = logging.getLogger()
        logger.setLevel(logging.INFO)
        Log_File = General.Logging(Directory, Simplified_Plugin_Name)
        handler = logging.FileHandler(os.path.join(Directory, Log_File), "w")
        handler.setLevel(logging.DEBUG)
        formatter = logging.Formatter("%(levelname)s - %(message)s")
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        Cached_Data_Object = General.Cache(Directory, Plugin_Name)
        Cached_Data = Cached_Data_Object.Get_Cache()
        Query_List = General.Convert_to_List(Query_List)
        # The below line of code only needs to be added when using the Limit argument:
        Limit = General.Get_Limit(Limit)

        for Query in Query_List:
            <CUSTOMISED_CODE_GOES_HERE>

        Cached_Data_Object.Write_Cache(Data_to_Cache)

    except Exception as e:
        logging.warning(f"{Common.Date()} - {__name__.strip('plugins.')} - {str(e)}")
At this point, you are ready to begin the fun. Depending on whether your plugin uses an API, you may be required to add a Load_Configuration() function to import the details from the config.json file. This would look similar to the below example from the Twitter_Search plugin:
def Load_Configuration():
    logging.info(f"{Common.Date()} - {__name__.strip('plugins.')} - Loading configuration data.")

    try:
        with open(Common.Set_Configuration_File()) as JSON_File:
            JSON_Object = Common.JSON_Handler(JSON_File)
            Configuration_Data = JSON_Object.To_JSON_Load()
            Twitter_Details = Configuration_Data[Plugin_Name.lower()]
            Consumer_Key = Twitter_Details['CONSUMER_KEY']
            Consumer_Secret = Twitter_Details['CONSUMER_SECRET']
            Access_Key = Twitter_Details['ACCESS_KEY']
            Access_Secret = Twitter_Details['ACCESS_SECRET']

            if Consumer_Key and Consumer_Secret and Access_Key and Access_Secret:
                return [Consumer_Key, Consumer_Secret, Access_Key, Access_Secret]

            else:
                return None

    except:
        logging.warning(f"{Common.Date()} - {__name__.strip('plugins.')} - Failed to load Twitter details.")
After developing the plugin, there will be additional steps required if this is the case. Please configure this function exactly the same as in other plugins; this will be reviewed and corrected if submitted incorrectly. From here, use the details, if required, to perform the necessary search against the desired target, and from the result obtain a unique URL for the result, even if it means you have to craft it from something else, as well as a unique identifier such as a title. If the request is made via POST, where the response is the stored result, it is acceptable to create a bogus URL to get around the unique link constraint; however, at the very least the bogus URL should contain the domain, such as: https://www.domain.com?UID. Please note this only occurs in rare circumstances.
If a Limit has been implemented, and unless your API allows you to set a limit field in the request to control the number of results, a Current_Step variable will need to be implemented to help count how many results have been processed. A for loop should be used to iterate through the results, and it should verify that Current_Step is less than the Limit. If only one result is generated, the for loop and limit parts can be omitted. Twitter_Search is an example where a limit can be included in the request itself, so the plugin permits the use of this in the line shown below:
[Line 38] Latest_Tweets = API.user_timeline(screen_name=Handle, count=Limit)
In other plugins, the Current_Step iterator is used within the for loop controlling result output. Take the following lines from Ahmia_Darkweb_Search as an example:
[Line 43] Current_Step = 0
[Line 44] Output_Connections = General.Connections(Query, Tor_Plugin_Name, Domain, "Darkweb Link", Task_ID, Plugin_Name.lower())
[Line 45]
[Line 46] for URL in Tor_Scrape_URLs:
[Line 47]
[Line 48]     if URL not in Cached_Data and URL not in Data_to_Cache and Current_Step < int(Limit):
[Line 49]         Title = f"Ahmia Tor | {URL}"
[Line 50]         Output_Connections.Output([Output_file], URL, Title, Plugin_Name.lower())
[Line 51]         Data_to_Cache.append(URL)
[Line 52]         Current_Step += 1
- Almost any result link should be requested and the response stored in an output file using the Common.Request_Handler() function. Unless you are sure the HTML is going to render perfectly, a filtered response should be requested for the best reporting. This is also used as a form of verification. Refer back to Twitter_Search again for the below example. (Note: if the website contains www., be sure to have https://www. before the Domain variable in the Host parameter.)
[Line 69] Item_Responses = Common.Request_Handler(Link, Filter=True, Host=f"https://{Domain}")
[Line 70] Item_Response = Item_Responses["Filtered"]
- The response should be output to a local file to create a local copy of the link. The function below returns the output file path, which will be used later on. Unique_Result_Identifier can be any text that makes your result unique, other than the Link.
Output_file = General.Create_Query_Results_Output_File(Directory, Query, Plugin_Name, Filtered_Response, Unique_Result_Identifier, The_File_Extension)
- If the Output_file is set, then the "General.Connections" class should be initialised and called as per below:
Output_Connections = General.Connections(Query, Plugin_Name, Domain, "Exploit Kind", Task_ID, Plugin_Name)
Output_Connections.Output([Main_File, Output_file], URL, Title, Simplified_Plugin_Name)
Please use one of the finding types from the approved list, which can be viewed towards the top of the main Scrummage.py file. Finally, if the Limit is implemented, increase Current_Step by 1; also append the link to the Data_to_Cache list regardless of the limit.
Data_to_Cache.append(URL)
# If using the Current_Step iterator:
Current_Step += 1