Abstract
Advances in machine learning and artificial intelligence have demonstrated enormous potential for building intelligent systems and growing knowledge bases. However, current data marketplaces are not efficient enough to support long-term technological and economic advancement. An efficient data market would allow participants to strategically sell or purchase data and receive fair compensation for their efforts to collect and curate it. In the existing data market, by contrast, the originators of data mostly lose control over their data and are not paid for the data harvested from them. Big companies analyze user data to improve product design, customer retention, and other revenue-generating initiatives; however, the users who contribute the data go unrecognized and uncompensated. The inefficiency of the current data market is due in part to the centralized data curation model; more importantly, there is little consensus on how to determine the value of data, a consensus that would otherwise empower legislators to regulate the data market. In this project, we plan to investigate the theoretical and algorithmic foundations of data valuation and to implement the results of these theoretical studies in a blockchain-based decentralized data marketplace that helps manage transactions of patients’ data in a clinical study.