Open data has seen great progress in recent years, but new opportunities and challenges continue to emerge. Future progress requires greater attention to equity, security, and a better understanding of context and culture.
“Collaboration between traditional civil society and civic technologies or between journalists and private-sector application providers is driving new uses of data that could highlight corruption, promote public integrity, or shape public policy debate,” summarized Silvana Fumega, PhD, Researcher and Policy Director of ILDA, at a recent event hosted by the 浪花直播 Center in collaboration with Google. This event, “Open Data: What’s Next in Policy and Practice?”, brought together academia, government, and industry representatives to explore past progress and future opportunities to drive open data forward. The speakers agreed: open data involves more than just putting datasets online. Truly accessible and equitable open data requires tools, resources, collaboration, and standardization. Recommendations emerged from the panelists’ current work and their analysis of gaps and opportunities in the open data movement.
1. Open data exists on a spectrum
Making datasets public is the first step toward evening the scales between those who use data and those who could. However, data access is not a closed-open binary. Like other open movements, open data exists on a spectrum, influenced by different databases and different standards. Reaching the potential promised by open data involves many challenges in both policy and practice.
Making open data not just FAIR but also equitable is the next frontier that industry, academia, and government must tackle, and this cannot happen in silos. Equity also doesn’t stop at access; it extends to who is asking the questions and which questions are prioritized. Stefaan G. Verhulst, Co-Founder, Chief of R&D, and Director of the Data Program of the Governance Laboratory (The GovLab), pointed out that “not everyone is part of formulating the questions and as a result, we already have an equity issue from the first part of the scientific enterprise.” Chris Marcum, Assistant Director for Open Science and Data Policy, White House Office of Science & Technology Policy, responded by saying “we need to ensure community is engaged in providing data, engaged in using this data, [and] are also asking the questions that are interesting to them.”
2. Open data needs to be reusable and interoperable
Analyzing data can be time-consuming and costly. Data is shared in different formats, and fields within the data often express the same thing in different ways. Something as simple as two datasets expressing a field differently (e.g., “State” represented as Texas and TX) requires additional cleaning before the data can be used. Panelists called for standards, from how the data is stored (CSV, PDF, etc.) to the normalization of data. Guha V. Ramanathan, PhD, gave an example of this, stating “the Bureau of Labor Statistics and the Labor and Economic Analysis Division definitions of employment are different, so today it is up to the user of the data to dissolve all these differences.” Reusability creates transparency by allowing research results to be more broadly verified. However, even when data is coded and formatted uniformly, there can be underlying disagreements on definitions and categorization.
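To make the “Texas” versus “TX” cleaning step concrete, here is a minimal Python sketch of field normalization. It is only an illustration under stated assumptions: the lookup table, the `normalize_state` helper, and the sample records are all hypothetical and do not come from any dataset or tool discussed at the event.

```python
# Hypothetical lookup table; a real pipeline would cover every state and territory.
STATE_CODES = {"texas": "TX", "california": "CA", "new york": "NY"}


def normalize_state(value: str) -> str:
    """Map a state name or code to a two-letter code; leave unknown values unchanged."""
    cleaned = " ".join(value.split())  # trim and collapse stray whitespace
    if cleaned.upper() in STATE_CODES.values():
        return cleaned.upper()  # already a known two-letter code
    return STATE_CODES.get(cleaned.lower(), cleaned)


# Made-up records standing in for rows from datasets that spell the same field differently.
records = [
    {"state": "Texas", "employed": 10},
    {"state": "TX", "employed": 5},
    {"state": " texas ", "employed": 2},
]

# Aggregate only after normalizing, so all three rows land in the same bucket.
totals = {}
for row in records:
    code = normalize_state(row["state"])
    totals[code] = totals.get(code, 0) + row["employed"]

print(totals)  # {'TX': 17}
```

Without the normalization step, the three rows would be counted as three different states. Note that this only reconciles spelling: it does nothing about deeper disagreements, such as two agencies defining “employment” differently.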
3. Context and culture need to be taken into account when assessing open data, especially internationally; this creates the need for shared definitions
The open data movement is not US-centric; the international lens shows the need for more collaboration. Challenges arise around different legal and social contexts, and definitions may differ drastically from country to country. Although open data principles remain largely the same, the context can differ. “We can't really say that the infrastructure is the same in a developed country as in a developing one. The barriers and the gaps and the infrastructure are clearly not the same, so we need to learn a little bit about the context to also understand what we are measuring, despite that the survey is exactly the same,” said Dr. Fumega. For example, different Latin American countries have different definitions of femicide, impacting data collection and categorization. Context, when analyzing these datasets, adds value.
4. Incentive structures should encourage open data
Dr. Ramanathan stated, “You’re not going to get usable effective open data without the right incentive structures.” For example, the Department of Energy (DOE) is promoting Digital Object Identifiers (DOIs) for datasets and to affiliated researchers. Awards and other rewards can incentivize usable datasets that bolster research. New incentives could mean overhauling promotion and tenure systems so that open datasets are acknowledged like publications; naming and promoting datasets in the same way as research studies could foster a culture that more explicitly values this work.
5. Technical and sociocultural barriers limit the use of open data
Technical barriers include the need for massive computing power, tedious data cleaning, and navigating discipline-specific databases. Many people, groups, and researchers do not have the skills or money required to process this data. Although this is a barrier, current initiatives are providing funding and resources to alleviate these well-identified issues. The sociocultural barriers, however, are harder to pinpoint and overcome. Elena Steponaitis, PhD, Program Executive in NASA's Chief Science Data Office, described the Transform to Open Science (TOPS) mission at NASA, saying “TOPS aims to increase understanding and adoption of open science and accelerate major scientific discoveries.” The initiative and its certificate program show what that could look like. Building capacity for an open ecosystem can bridge silos through common language and goals.
In recent years, open data has seen great progress, and there is still tremendous potential for greater impact. A huge amount of rich data is now publicly available, there is a foundation of government support at the highest levels, and open data is the default for many research programs. However, it is often still seen as a side project. According to OpenGov, the movement has entered a new wave; this wave “takes a much more purpose-directed approach than prior waves; it seeks not simply to open data, but to do so in a way that focuses on impactful reuse, especially through inter-sectoral collaborations and partnerships.” Now is the time to talk about next steps and to recognize that open data for the sake of open data is not enough to drive the change that is possible. 浪花直播 has previously explored the importance of data quality from open tools, including where data fits in the open science community.
Many of the needs in creating more open and FAIR data point to one thing: the need for cultural change. Collaboration between expert communities in government, academia and private industry will build bridges between them. Additionally, creating incentives to collect and publish open data that is reusable and interoperable will assist in creating more equitable open data. Keeping these needs in mind can help to create more equity and accessibility in the open data movement.
The Science and Technology Innovation Program (STIP) serves as the bridge between technologists, policymakers, industry, and global stakeholders.