Data Fields
This document details the fields extracted from the data that Flare collects. These fields can be used in queries written in the Lucene Query Syntax, as in the following examples.
Use case | Query |
---|---|
Activities that have the domain name acme.com in their title | title:"acme.com" |
Listings that have a price between $100 and $200 | price:>100 AND price:<200 |
Retrieve an actor and their publications by their username | author_name:lapifia |
Specifying multiple keywords | your_organisation AND fraud |
Using a time field | metadata.estimated_created_at >= now-1d |
Actors from specific threads | _index:forum_topic AND title:actor_name AND keyword |
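The queries in the table above are plain strings, so they can be assembled programmatically. The following is a minimal sketch that only builds query strings in the same shape as the examples; the helper names (`field_phrase`, `field_between`, `all_of`) are hypothetical and are not part of any Flare API.

```python
def field_phrase(field: str, value: str) -> str:
    """Exact-phrase match on a field, e.g. title:"acme.com"."""
    # Escape backslashes and double quotes so the phrase stays well-formed.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def field_between(field: str, low: float, high: float) -> str:
    """Exclusive numeric range, written the same way as the price example."""
    return f"{field}:>{low} AND {field}:<{high}"


def all_of(*clauses: str) -> str:
    """Require every clause; compound clauses are wrapped in parentheses."""
    wrapped = [
        f"({c})" if " AND " in c or " OR " in c else c
        for c in clauses
    ]
    return " AND ".join(wrapped)


if __name__ == "__main__":
    print(field_phrase("title", "acme.com"))
    print(field_between("price", 100, 200))
    print(all_of("your_organisation", "fraud"))
```

Wrapping compound clauses in parentheses keeps operator precedence explicit when several conditions are combined.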
Some documents have three levels of support. Each of them is documented in the following table.
Level | Description |
---|---|
1 | Base attributes of the entity. As they are part of the document’s identity, fields on this level have the highest chance to be present. |
2 | Additions to the level 1 attributes. Fields on this level add more value from an analysis point of view. As they are not part of the document’s minimal identity, they can be omitted in some cases. |
3 | Complementary information. Fields on this level may be unavailable for some sources or unnecessary for basic analysis. Their presence depends on the source. |
Each document contains a metadata field with subfields describing the origin of the data and some other attributes to diagnose the source of the document. Interesting fields are:
Name | Description |
---|---|
_id | Unique ID of the document. Usually composed of the source and an internal ID at the source. |
metadata.source | Name of the source or website from which the raw data was downloaded. Since a site can have multiple domains that change over time, using names such as dream or wall_st is more reliable. |
metadata.first_crawled_at | Date the document was first found. Useful for approximating when a document was created. |
metadata.last_crawled_at | Date the document was last seen. Useful for approximating when a document was deleted from a site. |
metadata.estimated_created_at | Estimated creation date of the document. Some sources provide an exact creation date; this field uses that date when it is available and falls back to metadata.first_crawled_at otherwise. It is therefore more accurate for the creation date than metadata.first_crawled_at alone. |
author_name | Name of the author associated with the document in question. This may be the seller name for a listing, the repository host on GitHub, etc. |
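The fallback described for metadata.estimated_created_at can be pictured with a short sketch. It only illustrates the rule stated in the table; the function and parameter names are hypothetical, and how Flare actually computes the field is not documented here.

```python
from datetime import datetime
from typing import Optional


def estimate_created_at(
    source_created_at: Optional[datetime],
    first_crawled_at: datetime,
) -> datetime:
    """Prefer the date reported by the source, else fall back to first crawl.

    Assumption: `source_created_at` is None when the source gives no date.
    """
    if source_created_at is not None:
        return source_created_at
    return first_crawled_at
```

When the source supplies no date, the estimate is simply the first crawl date, so it can be later than the true creation date.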
Documents also contain a features field with subfields describing extracted keywords that might identify digital identities.
Name | Description |
---|---|
features.urls | List of URLs extracted from the document. This can be URLs from any scheme, including HTTP, FTP, SSH, etc. |
features.emails | List of email addresses extracted from the document. |
features.domains | List of domains extracted from the document. The list contains every valid variant of a domain, so a document containing the www.flared.io domain will have a list with both www.flared.io and flared.io. |
features.reversed_domains | List of domains extracted from the document, with their components reversed. A document with www.flared.io will have a list with both io.flared.www and io.flared. This field is useful to index and quickly find documents matching a domain’s subdomains with a suffix wildcard query: features.reversed_domains:io.flared.* |
features.ip_addresses_cidr | List of IP addresses extracted from the document. Supports both IPv4 and IPv6. The field type supports queries with CIDRs such as features.ip_addresses_cidr:127.0.0.0/8. |
features.btc_addresses | List of Bitcoin public addresses extracted from the document. |
features.cc_numbers | List of potential credit card numbers extracted from the document. While the numbers have not been validated with an issuer, each number matches a known BIN and passes the Luhn checksum. |
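To make the descriptions above more concrete, the sketch below shows one way the domain variants, reversed domains, CIDR matching, and Luhn check could be reproduced locally. It is an illustration under stated assumptions, not Flare's extraction code: all function names are hypothetical, and the domain helper naively keeps every parent with at least two labels without consulting a public suffix list.

```python
import ipaddress


def domain_variants(domain: str) -> list[str]:
    """Return the domain and each parent that still has two or more labels."""
    labels = domain.lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


def reversed_domain(domain: str) -> str:
    """Reverse the dot-separated components: www.flared.io -> io.flared.www."""
    return ".".join(reversed(domain.lower().split(".")))


def ip_in_cidr(address: str, cidr: str) -> bool:
    """True if the address falls inside the CIDR block (IPv4 or IPv6)."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum; non-digit characters are ignored."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    variants = domain_variants("www.flared.io")
    print(variants)
    print([reversed_domain(d) for d in variants])
    print(ip_in_cidr("127.0.0.1", "127.0.0.0/8"))
```

Reversing the components turns a subdomain search into a prefix match, which is why the wildcard in features.reversed_domains:io.flared.* sits at the end of the term.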
Seller fields level 1
Name | Description |
---|---|
about | Descriptive or biographical text of the seller. |
created_at | Creation date of the seller’s account. |
username | Pseudonym of the seller. |
Seller fields level 2
Name | Description |
---|---|
List of comments or ratings posted on the seller’s page. | |
Seller’s information. | |
rating | Rating (out of 5 stars usually) given to the seller. |
rating_count | Number of ratings given to the seller. |
rating_score | Ratings given to the seller. Usually the sum of positive and negative feedback. |
ship_from | Regions from which the seller delivers the goods. Use ship_from_norm for normalized region across data sources. |
ship_to | Regions where the seller delivers the goods. Use ship_to_norm for normalized region across data sources. |
public_pgp_fingerprint | List of PGP key fingerprints associated with the seller. |
public_pgp_uid | List of PGP key IDs associated with the seller. |
transactions_count | Number of sales made by the seller. |
Seller fields level 3
Name | Description |
---|---|
last_7_days_rating | Rating average for the last 7 days given to the seller. |
last_7_days_rating_count | Number of ratings given to the seller in the last 7 days. |
last_month_rating | Rating average for the last month given to the seller. |
last_month_rating_count | Number of ratings given to the seller during the last month. |
transactions_amount | Total amount of sales made by the seller in USD. |
finalize_early_enabled | Boolean indicating whether the seller requires payment at the time of purchase (finalize early) rather than through an escrow. |
title | Title of the seller on the market. Generally an internal title earned by the experience or reputation of a seller. |
last_active_at | Last activity date of the seller. |
Listing fields level 1
Name | Description |
---|---|
title | Title of the ad. |
description | Description of the ad. |
seller_id | Internal ID of the seller. |
seller_name | Descriptive name of the seller. |
creation_date | Creation date of the ad. |
price | Price of the ad in USD. |
currency | Price’s currency. |
Listing fields level 2
Name | Description |
---|---|
category_id | Internal ID of the category of the ad. Use category for normalized ID across all data sources. |
category_name | Descriptive name of the category of the ad. Use category_name_norm for normalized name across all data sources. |
ship_to | List of regions where the goods can be delivered. Use ship_to_norm for normalized regions across data sources. |
ship_from | Regions from where the goods are shipped. Use ship_from_norm for normalized regions across data sources. |
rating | Rating given to the merchandise or the seller. |
rating_count | Number of ratings given to the merchandise or the seller. |
List of comments or ratings posted on the ad. See the Feedbacks fields table below. | |
sold_count | Number of sales of the ad. |
Listing fields level 3
Name | Description |
---|---|
escrow | Boolean indicating whether the transaction is via an escrow managed by the market platform. |
last_7_days_rating | Rating for the last 7 days given to the merchandise or the seller. |
last_7_days_rating_count | Number of ratings given to the merchandise or the seller in the last 7 days. |
last_month_rating | Rating for the last month given to the merchandise or the seller. |
last_month_rating_count | Number of ratings given to the merchandise or the seller in the last month. |
List of shipping options available for the listing. | |
stock_count | Remaining quantity of the goods. |
view_count | Number of times the listing has been seen. |
Forum fields level 1
Name | Description |
---|---|
username | Username of the user. |
personal_text | Descriptive or biographical text. |
registered_at | Approximate date of account registration. |
Forum fields level 2
Name | Description |
---|---|
contact_info | Profile’s contact info. |
last_posted_at | Approximate date of the last message. |
last_active_at | Approximate date of the last connection. |
public_pgp_fingerprint | Fingerprint list of the PGP keys associated with the account. |
public_pgp_uid | List of PGP key IDs associated with the account. Usually contains a list of email addresses associated with the key. |
rating | User rating. |
rating_pos | Number of positive ratings given to the user. |
rating_neg | Number of negative ratings given to the user. |
signature | Signature attached to each user’s message. |
Forum fields level 3
Name | Description |
---|---|
age | Age of the user. |
avatar | URL to the user’s avatar. |
comments_count | Number of comments from the user. |
following_category_ids | Categories followed by the user. |
following_user_ids | Other users followed by the user. |
location | Location of the user. Unlikely to be accurate. Use location_norm for normalized value across data sources. |
posts | List of posts left on the user profile by other users. See the Forum_post fields tables below. |
posts_count | Number of messages from the user. |
realname | Real name of the user. Unlikely to be an actual real name. |
timezone_offset | Time zone used by the user. |
title | Title of the user (member, moderator, administrator, etc.). |
tags | Tags associated with the user. |
website | Website URL of the user. |
Forum_topic fields level 1
Name | Description |
---|---|
author_id | Internal ID of the original author of the thread. |
author_name | Nickname of the original author of the thread. |
posted_at | Date of creation of the thread. |
title | Title of the thread. |
Forum_topic fields level 2
Name | Description |
---|---|
category_id | Internal ID of the thread’s category. Use category_id_norm for normalized value across data sources. |
category_name | Name of the thread’s category. Use category_name_norm for normalized value across data sources. |
category_path | Internal IDs of the thread’s category and its parent categories. |
Forum_topic fields level 3
Name | Description |
---|---|
first_post_preview | Preview of the first post content. |
last_reply_at | Date of the last reply in the thread. |
profile_id | Related profile if the thread is on a profile page. |
tags | Tags associated with the thread. |
Forum_post fields level 1
Name | Description |
---|---|
author_id | Internal ID of the author of the message. |
author_name | Nickname of the author of the message. |
content | Content of the message. |
posted_at | Date of creation of the message. |
parent_post_id | Parent post to which this post replies. |
topic_id | Internal ID of the thread containing the message. |
topic_title | Title of the thread containing the message. |
Forum_post fields level 2
Name | Description |
---|---|
There are no support level 2 fields for this document. | |
Forum_post fields level 3
Name | Description |
---|---|
post_title | Title of the post. |
rating_pos | Number of positive ratings given to the user. |
Driller fields
Name | Description |
---|---|
html_url | URL of the original document. |
cache_url | Cache URL from Google. |
is_dork | Whether the document came from a dork query. |
title | Title of the document. |
author_id | Internal ID of the author of the message. |
author_name | Nickname of the author of the message. |
author_email | Email of the author of the message. |
created_at | Creation date of the document. |
project_name | Project’s name. |
content | Content of the document. |
size | Size in bytes. |
snippets | Snippets of the document. |
sha | SHA of the document. |
is_truncated | Whether the document is truncated. |
host | Host URL of the document (e.g. github.com). |
filetype | Type of the file. |
filename | Name of the file. |
dirpath | Directory path of the document. |
user | Document’s user. |
issue | Document’s issue. |
project | The project that the document comes from. |
commit | Document commit. |
code | Extracted code from the document. |
Paste documents are mostly raw data coming from paste sites such as Pastebin.
Paste fields
Name | Description |
---|---|
author_id | Internal ID of the author. |
author_name | Username of the author. |
content | Raw content of the paste document. Can be truncated if actual content is too large. |
expire_at | Expiration date of the paste document. Documents remain in Flare's database even after the expiration date. |
is_truncated | Large documents are truncated in Flare's database. If True, content is a truncated version of the actual document. |
posted_at | Date the paste document was created. |
size | Actual size of the paste document (bytes). |
syntax | Document’s syntax defined by the author. |
title | Title given to the paste document. |
title_en | English title given to the paste document. |
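Because content can be shorter than the original paste, the size and is_truncated fields can be combined to estimate how much is missing. The sketch below assumes a paste document represented as a Python dict and assumes that size counts the bytes of the UTF-8 encoded original; both are assumptions for illustration, and `missing_bytes` is a hypothetical helper.

```python
def missing_bytes(paste: dict) -> int:
    """Estimate how many bytes of the original paste are absent from content.

    Assumptions: `size` is the original size in bytes and `content` is
    stored as text that can be measured after UTF-8 encoding.
    """
    if not paste.get("is_truncated"):
        return 0
    stored = len(paste.get("content", "").encode("utf-8"))
    # Never report a negative number if the fields disagree.
    return max(paste.get("size", stored) - stored, 0)
```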
Domain fields
Name | Description |
---|---|
Name | Domain’s name. |
registered_at | Certificate registration date. |
feed | Certificate Transparency feed name. |
identifier_domain | Domain Identifier. |
cert_data | Contains details about the issued certificates and the issuers. |
subject | Domain SSL certificate’s subject. |
issuer | Certificate issuer. |
Host fields
Name | Description |
---|---|
vulnerabilities | The name of a vulnerability present on the host (e.g. CVE-2018-15919). |
country_code | The country code of the location of the IP (e.g. country_code:US). |
service | Service running on an IP (e.g. http, ssh, ftp, mongodb, elastic, etc). |
tags | Other tags that give information about what this IP is doing. Possible values: cloud, cdn, starttls, self-signed, database, vpn, honeypot, iot, devops, videogame, ics, compromised, cryptocurrency, medical, tor, proxy |
Leak fields
Name | Description |
---|---|
links | [Optional] |
name | The email address associated with the leak. |
passwords | List of leaked passwords associated with the email address. |
For each entry in the passwords list: | |
domain | [Optional] Domain name of breached company. |
extra | [Optional] Details on the intrusion method if available. |
hash | Clear password or hash. |
hash_type | [Optional] Hash function. |
id | Unique series of numbers for the leaked credential. |
imported_at | Timestamp at which the password was added to our database. This does not correspond to the date at which the password was leaked. |
breached_at (under source) | [Optional] Date of the breach (not always available). |
description (under source) | Description of the breach in English. |
description_fr (under source) | Description of the breach in French. |
hash_description (under source) | [Optional] Description of the hash function of the source. |
hash_type (under source) | [Optional] Hash function of the source if the leaked passwords are all of the same type. If not, the hash type will be “none”. |
id (under source) | Internal source name and year of breach. |
leaked_at (under source) | Date when the data became publicly available. |
name (under source) | [Optional] Name of the company that was breached. |
related_urls (under source) | [Optional] News articles related to the breach, if available. |
url (under source) | [Optional] Breached website URL (original source not always available). |
source_id | Internal source name. |
source_params - line | Line number where the leaked password was found. |
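The leak table describes a nested structure: a leak has a passwords list, and each entry carries its own source object. The sketch below flattens that structure into one row per password. It assumes the document is available as a Python dict, that "(under source)" means a nested `source` key, and that "source_params - line" means `source_params["line"]`; these are readings of the table, not a confirmed schema. All values in `example_leak` are made-up placeholders.

```python
def summarize_leak(leak: dict) -> list[dict]:
    """Return one flat row per leaked password entry."""
    rows = []
    for entry in leak.get("passwords", []):
        source = entry.get("source", {})
        rows.append({
            "email": leak.get("name"),
            "hash": entry.get("hash"),
            "hash_type": entry.get("hash_type"),
            # The breach name is optional, so fall back to the source ID.
            "breach": source.get("name") or entry.get("source_id"),
            "leaked_at": source.get("leaked_at"),
            "line": entry.get("source_params", {}).get("line"),
        })
    return rows


# Hypothetical document with placeholder values, for illustration only.
example_leak = {
    "name": "user@example.com",
    "passwords": [
        {
            "id": 1,
            "hash": "placeholder-hash",
            "hash_type": "unknown",
            "source_id": "example_source",
            "source_params": {"line": 42},
            "source": {"id": "example_source_2020", "leaked_at": "2020-01-01"},
        }
    ],
}

if __name__ == "__main__":
    for row in summarize_leak(example_leak):
        print(row)
```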
Feedback is usually posted by buyers of an ad and is an essential part of the black market reputation system.
Feedbacks fields
Name | Description |
---|---|
feedbacks.rated_at | Date of the rating or comment. |
feedbacks.rating | Rating given by the buyer. |
feedbacks.comment | Comment posted by the buyer. |
feedbacks.username | Full or partial pseudonym of the buyer. |
feedbacks.purchase_amount | Price of the purchase associated with the comment. |
feedbacks.purchase_currency | Currency used by purchase_amount. |
feedbacks.author_id | Internal ID of the author. |
User fields
Name | Description |
---|---|
Email. | |
company | Company. |
full_name | Full name. |
location | Location. |
followers_count | Number of followers. |
Contact_info fields
Name | Description |
---|---|
contact_info.fdw_forum | FDW Forum internal id. |
contact_info.jabber | Jabber address. |
contact_info.email | Email address. |
contact_info.skype | Skype username. |
contact_info.icq | ICQ. |
contact_info.irc | IRC nickname. |
contact_info.ricochet | Ricochet. |
contact_info.bitmessage | Bitmessage. |
contact_info.btc | Bitcoin wallet. |
contact_info.telegram | Telegram. |
contact_info.discord | Discord. |
Shipping_options fields
Name | Description |
---|---|
shipping_options.description | Description of the option. |
shipping_options.price | Price of the option. |
shipping_options.currency | Currency of the price of the option. |
Project fields
Name | Description |
---|---|
owner_name | Name of the author. |
owner_id | Internal ID of the author. |
owner_type | Author’s association with the document. |
tags | Associated tags. |
last_activity_at | Approximate date of the last activity. |
language | Author’s language. |
followers_count | Number of followers. |
forks_count | Number of forks there are of this repository in the whole network. |
is_fork | Identifies if the repository is a fork. |
Commit fields
Name | Description |
---|---|
committer_name | Name of the committer. |
committer_id | Internal ID of the committer. |
committer_email | Email of the committer. |
author_email | Email of the author. |
sha | SHA. |