Overview

Five sets of flat files are built from the CRSP Stock and Index databases. A number of our subscribers have long requested alternate formats of our databases in order to automate or streamline their processes and easily ingest CRSP data into them. These flat files are intended for these purposes.

  • SIZ – 1925 US Stock and Indexes
  • SAZ – 1925 US Stock
  • SXZ – 1962 US Stock and Indexes
  • S6Z – 1962 US Stock
  • SFZ - 1925 US Indexes

File Extensions:

  • SAS: *.sas7bdat
  • ASCII: *.dat
  • R: *.rds
Stock Files File Names
Delist – Daily Return Information sfz_del
Delist – Monthly Return Information sfz_mdel
Distributions sfz_dis
Name History sfz_nam
NASDAQ History sfz_ndi
Security Header Information sfz_hdr
Shares History sfz_shr
Time Series – Daily Primary sfz_dp_dly
Time Series – Daily Secondary sfz_ds_dly
Time Series - Monthly sfz_mind

 

Stock and Index Files* File Names
Index Membership  sfz_mbr
Portfolio Membership - Daily  sfz_portd
Portfolio Membership - Monthly  sfz_portm

* Available in 1925 and 1962 US Stock & Indexes, and 1925 US Indexes, but are NOT available in 1925 and 1962 US Stock.

Index Files File Names
Index Header sfz_indhdr
Rebalance  sfz_rb
Time Series – Daily Index sfz_dind
Time Series – Monthly Index sfz_mind

Comparisons between the legacy CRSPAccess Stock and Index databases and the CRSP flat files will reveal differences, outlined here:

Derived Data

The flat files provide all of the underlying data that can be used to generate derived data items, but do not include the derived items themselves. Examples of derived data that TSQuery and ts_print can generate that are not found in the flat include excess and cumulative returns, last non-missing, recent, previous period versions of time series data items.

Double Precision

CRSPAccess stored most items, including commonly used Price (PRC), Ask or High Price (ASKHI), and Bid or Low Price (BIDLO), as 4-byte floating point numbers. The SAS format defaults to storing those items as 8-byte (double precision) floating point numbers. For the majority of historical values, this does not result in any difference. However, the conversion to SAS’s double precision, makes more apparent a known issue of phantom precision related to decimal pricing. For example, IBM’s price on 11/27/2015 of $138.46 will, if displayed with five decimal places, show $138.46001. The ASCII version, by outputting only seven significant digits in scientific notation, will have just 1.384600E+02, and no phantom precision.

CRSP is re-engineering our back office to correct the phantom precision and increase the real precision throughout the Stock and Indexes flat files in the coming year.

9-Character CUSIP

Subscribers have long-requested the addition of 9-character cusips to our databases. The flat files include them in the name- and header history files.

CRSP Total Market

The investable CRSP Total Market Index is included in the index files associated with the stock and index database. Daily price-only and total returns, levels, and counts are included.

Volumes differences

The introduction of double precision allows for the monthly volumes to be stored in same unit (1 share) as daily volumes rather than as a unit of 100 shares. So monthly volumes in the flat files are now stored as one hundred times the value in legacy CRSPAccess. The change also allowed for four missing daily volumes for Citigroup in 2009 and 2010 to be replaced with actual values that were in excess of two billion, and for those monthly volumes to be recalculated.

KYPERMNO CALDT New VOL MCALDT New MVOL
70519 08/05/2009 2,674,463,281 08/31/2009 22,798,732,177
70519 12/17/2009 3,772,638,437 12/31/2009 15,021,795,593
70519 12/18/2009 2,813,697,156
70519 12/07/2010 3,267,829,406 08/31/2009 13,427,190,606

Data Category

Each data item in each table is associated with a Data Category. The Data Category identifies the type of item and the associated formats for use with SAS, ASCII, R, and SQL.

Data Category Description SAS Type SAS Format ASCII SQL Type R Type
DESCRIPTION Wide Character Field Containing Text Iinformation CHARACTER $W.   VARCHAR(W) TEXT
ID Field Containing an Alphanumeric Identifier CHARACTER $W.   VARCHAR(W) TEXT
FLAG Alphanumeric Field Containing a Code Value CHARACTER $W.   VARCHAR(W) TEXT
NAME Alphanumeric Field for Names CHARACTER $W.   VARCHAR(W) TEXT
DATE Date Field NUMERIC YYYYMMDD10. YYYY-MM-DD DATE DATE
START DATE Start of a Date Range - Paired with an End Date NUMERIC YYYYMMDD10. YYYY-MM-DD DATE DATE
END DATE End of a Date Range - Paired with a Start Date NUMERIC YYYYMMDD10. YYYY-MM-DD DATE DATE
TIMESTAMP Date and Time, Including Seconds NUMERIC DATETIME YYYY-MM-DD HH:MM:SS DATETIME DATETIME
AMOUNT Fixed Point Number NUMERIC W.N WW.NNNNN DECIMAL (P,N) FLOAT
RATIO Calculated Floating Point Number NUMERIC PERCENTW.N 1.234567890123E+12 FLOAT FLOAT
VALUE Field with a Wide Range of Values NUMERIC E20. 1.234567890123E+12 FLOAT (OR DECIMAL) FLOAT
CODE Integer Field that Represents One or More Characteristics NUMERIC W. NNNN INT INT
KEY Integer Field that is Used as a Key NUMERIC W. NNNNNNN INT INT
NUMBER Integer Value < 2,000,000,000 NUMERIC W. NNNNNNNNNN INT INT
QUANTITY Integer Field with Some Values in Excess of 2,000,000,000 NUMERIC COMMAW. NNNNNNNNNNNNN BIGINT FLOAT

Missing Values

In CRSPAccess, missing values were indicated by the use of defined non-null values (e.g -99 for returns and 0 for prices). In the flat files, missing values are now represented by null values.

Missing values are displayed as follows:

  • ASCII missing values, regardless of type, is an empty string 
  • SAS missing values are an empty string for character fields and SAS missing (displayed as a.) for numberic fields
  • R missing values are an empty string for character fields, and ‘R’ missing (displayed as N/A) for float, int, date and datatime.