How we work with Form D filings

private investment
R
Form D
Author

Camden Blatchly

Published

April 18, 2025

We access raw Form D filings using the dform package. There are two datasets: issuers and offerings. The issuers dataset describes entities which are filing an offering, while the offerings dataset describes the terms of the offerings.

formd_obj <- dForm$new()
formd_obj$load_data(years = c(2010:2023), use_cache = FALSE, remove_duplicates = FALSE)

issuers <- formd_obj$issuers
offerings <- formd_obj$offerings

To clean the issuers dataset, we remove unnecessary whitespace and standardize common entity and city naming conventions (e.g., normalize L.P. to LP).

Next, we associate each issuer with a county. When submitting a Form D filing, issuers must list an address. To convert from the address to a county, we take the following steps:

Issuers must also list a precise year of incorporation when filing. Because the yearofinc_value_entered field may be blank for some of an entity’s submissions, we group by the cik (the unique ID given by the SEC to entities) and the name of the entity, and then take the smallest year entered. If an entity is yet to be formed or is older than five years , the yearofinc_value_entered field may be NA. In these cases, we take the value in the yearofinc_timespan_choice field, which will be either ‘yetToBeFormed,’ ‘withinFiveYears,’ or ‘overFiveYears’

Issuers sometimes choose to change their industry association between submissions. To address these discrepancies, we collapse all industries for an entity into a single comma separated string (e.g., “Retailing, Restaurants”).

After these steps, we are ready to join the issuers and offerings datasets. We join by accessionnumber (a unique ID number assigned by the SEC to each submission), year, and quarter and filter to submissions submitted by the primary issuer only (using the is_primaryissuer_flag field).

The field “totalamountsold” is the basis for much of our Form D analysis. This field reports the cumulative amount of money raised by a company for a given fundraising effort (which may involve multiple offerings). Because this field is cumulative, we must identify the incremental difference in the amount raised between one offering and the last offering within a given fundraising effort. Failure to do this will artificially inflate the amount of funding raised in later periods. For example, if Company A raises $50k in 2016, their 2016 filing will report $50k as the total amount sold. But if they continue fundraising efforts and accrue an extra $20k in 2017, the 2017 filing will report $70k as the total amount sold, resulting in an inflated value of $50k for the 2017 filing year.

The steps to correct the dollar amount sold from each company for each fundraising effort are as follows:

1. Create a business ID

The first 10 digits of the accession number (the unique ID given to offerings) is typically the CIK code (the unique ID given to entities), but not always. When the first 10 digits of the accession number differ from the CIK code, it suggests that there is a difference between that entity and an entity where the CIK code and accession number prefix match. Therefore, we use a combination between the CIK code and accession number prefix to identify offerings from the same business entity.

2. Create a funding round ID and sort filings within the round.

Our goal is to identify entries belonging to unique fundraising efforts for each company. To accomplish this aim, for each business ID, we assign a funding round ID to each group of filings that have the same values for the following fields:

Once a business ID’s unique fundraising efforts have been identified, we determine the order in which each offering was filed using the accession number, which includes digits describing the year and the ordering of filing. After sorting the offerings, we can calculate the incremental amounts raised. The data is then separated into two separate tables (businesses and funds) based upon whether the security includes Pooled Investment Fund interests.

Now the data is ready for analysis or visualization! Take a look at our private funding map to get a feel for how the data can be used.