Information about publicly traded companies has become far more available in recent decades. However, much of this data is unstructured. As a result, many people spend countless hours manually reading through business documents.

Automatic extraction of information from business documents cuts down on manual labor for both researchers and investors and allows them to more quickly gain insights into how companies operate.


Diagram showing the process of the demo.

What We Extract

We apply neural network sequence models and other machine learning models, such as support vector machines (SVMs), to Corporate Governance Guidelines. The goal is to detect corporate structure automatically and convert unstructured text into machine-readable data.
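To make "machine-readable data" concrete, here is a minimal sketch of what one extracted record might look like once serialized to JSON. The class name `GuidelineRecord`, its field names, and the sample values (including "Example Corp") are hypothetical and chosen only for illustration; they are not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class GuidelineRecord:
    """Hypothetical container for items extracted from one guideline document."""

    company_name: str
    # ISO-formatted revision dates found in the document.
    revision_dates: List[str] = field(default_factory=list)
    # Board's stated preference, e.g. "combined", "separate", or None if not stated.
    ceo_chairman_preference: Optional[str] = None
    # Responsibilities attributed to the Lead Independent Director.
    lid_responsibilities: List[str] = field(default_factory=list)


def to_json(record: GuidelineRecord) -> str:
    """Serialize a record to a JSON string."""
    return json.dumps(asdict(record), indent=2)


# Illustrative values only; not taken from any real document.
example = GuidelineRecord(
    company_name="Example Corp",
    revision_dates=["2019-02-14"],
    ceo_chairman_preference="separate",
    lid_responsibilities=["Presides over executive sessions"],
)
print(to_json(example))
```

A flat record like this is easy to load into a spreadsheet or database, which is the practical benefit of converting free text into structured fields.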

Lead Independent Director

The roles and duties of the Lead Independent Director (LID) can vary widely between companies. Extracting these responsibilities can provide insight into how much power the LID holds.

Preferred Arrangement for Chief Executive Officer and Chairman Duality

CEO duality means that one individual holds both the "CEO" and "Chairman of the Board" roles. We extract the board's stated preference on whether to appoint one individual to both roles or to have them filled by different people.

Company Name and Guideline Revision Dates

Extracting the company name and the revision dates from the document helps link it to its source and shows how often it is revised. Dates are extracted with regular expressions.
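As a rough illustration of regex-based date extraction, the sketch below finds dates written as "February 14, 2019" or "02/14/2019" and normalizes them. The pattern, the function name `extract_dates`, and the assumption of US-style month/day ordering are ours for illustration; the project's actual expressions are not shown here and may differ.

```python
import re
from datetime import datetime

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Two alternatives: "Month D, YYYY" (comma optional) or "M/D/YYYY".
DATE_PATTERN = re.compile(
    rf"\b(?P<month>{MONTHS})\s+(?P<day>\d{{1,2}}),?\s+(?P<year>\d{{4}})\b"
    r"|\b(?P<m>\d{1,2})/(?P<d>\d{1,2})/(?P<y>\d{4})\b"
)


def extract_dates(text):
    """Return the sorted, de-duplicated dates found in text."""
    found = set()
    for match in DATE_PATTERN.finditer(text):
        if match.group("month"):
            raw = f"{match.group('month')} {match.group('day')} {match.group('year')}"
            fmt = "%B %d %Y"
        else:
            # Assumes US-style month/day/year ordering.
            raw = f"{match.group('m')}/{match.group('d')}/{match.group('y')}"
            fmt = "%m/%d/%Y"
        try:
            found.add(datetime.strptime(raw, fmt).date())
        except ValueError:
            # Skip strings that look like dates but are not valid (e.g. 13/45/2020).
            continue
    return sorted(found)


sample = (
    "Adopted on March 3, 2015. Last revised February 14, 2019 "
    "and amended 11/02/2020."
)
for d in extract_dates(sample):
    print(d.isoformat())
```

Parsing each match with `datetime.strptime` filters out strings that fit the pattern but are not real dates, and sorting the results makes it straightforward to see how often a guideline was revised.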

The Team

Tongqing Ding

Project Sponsor

Ryan Baten

Co Team Lead + Machine Learning Lead

Ben King

Co Team Lead + Business Research Lead

Ameya Bahirat

CEO/Chairman Duality Lead

Zachary Johnson

Front End Development Lead

Alan Moy

Technical Lead

Jianyi (Jerry) Chen

Testing Lead