Big data

Consider structured and unstructured data from many sources to gain insights
Jeremie Lebel, Nicholas Charney

Big data generally refers to datasets so large and complex that traditional data management and analysis tools struggle to process them within practical timeframes. The term is also often used for the application of predictive analytics or other advanced methods to extract value from data. These datasets exist because electronic devices keep growing in number and power, and are being put to new uses.

The sources of big data are numerous and growing: transactions from financial markets and e-commerce sites, chats on social networks, signals from RFID tags, cell phone conversations, urban traffic cameras, surveillance cameras, web search and browsing patterns, weather satellites, etc. For industries such as telecom, media, and banking, big data collection is already the norm and growing exponentially in size and complexity year after year. 

Advantages

  • Exploit weak signals and/or lead indicators by connecting disparate datasets that amplify, complement or extrapolate from these early indicators.
  • Nurture experimentation and generate enterprise-wide insights at scale by allowing data from wide-ranging sources to be analyzed and hypotheses to be tested concurrently.
  • Obtain results and make decisions faster through automation and algorithmic analysis.

Limitations

  • It can be difficult to distinguish between correlation and causation, because large amounts of data can generate a near-infinite number of seemingly meaningful interrelations.
  • Algorithms can contain biased assumptions. For example, average values for human behaviour can be interpreted by people as “normal” and thus “better”. When that bias is included in an algorithm, the system will flag outlier data as unsound, and human interpreters may perceive the results as more neutral than they really are.
  • Big data directs attention toward data that can be collected and organized. A lot of relevant information is either confidential, hard to find or difficult to categorize.
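
The first limitation above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not a method taken from any of the programs described here: it generates variables that are completely independent random noise, then counts how many pairs still look "strongly" correlated. The function names, the 30-observation sample size and the 0.4 threshold are all hypothetical choices made for the example.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def count_spurious_pairs(n_variables, n_observations, threshold, seed=0):
    """Count pairs of independent noise variables whose |r| exceeds threshold.

    Every variable is drawn independently, so any correlation that is
    found is a coincidence, not a real relationship.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0, 1) for _ in range(n_observations)]
        for _ in range(n_variables)
    ]
    strong = sum(
        1
        for a, b in combinations(columns, 2)
        if abs(pearson(a, b)) > threshold
    )
    total = n_variables * (n_variables - 1) // 2
    return strong, total


if __name__ == "__main__":
    # Hypothetical settings: 30 observations per variable, |r| > 0.4.
    for n_variables in (10, 50, 200):
        strong, total = count_spurious_pairs(
            n_variables, n_observations=30, threshold=0.4
        )
        print(f"{n_variables} variables: {strong} of {total} pairs have |r| > 0.4")
```

The number of pairs grows roughly with the square of the number of variables, so the count of coincidental "findings" rises quickly as more data sources are connected, even though none of the variables is related to any other.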

Policy Opportunity

  • Big data can help government improve any activity that is too complex to be left entirely to human observation and reflection. The ability to collect, store and analyze big data has allowed government entities across the world to:
    • Locate people with outstanding arrest warrants through licence plate recognition.
    • Identify families in need of social help.
    • Better identify bulls that can breed high-yielding cows.
    • Improve weather and climate forecasts.
    • Better understand and predict traffic flows in cities.

Considerations

  • More data can mean less privacy. According to a survey conducted across 12 countries, consumers believe that telecommunications companies, government agencies and banks are the organizations most vulnerable to personal data breaches. The ability to spot patterns in seemingly unorganized datasets means that even data thought of as innocuous can be used to breach people’s sense of privacy.
  • Datasets need to be interoperable.
  • Organizations need employees with advanced skills to make good use of big data.
  • All aspects of data collection and analysis should be subject to rigorous policy scrutiny to check for hidden biases. These two articles on bias are a good starting point:
  1. How to address the inherent bias in algorithmic decision-making
  2. Invisible Women: Exposing Data Bias in a World Designed for Men

Government of Canada

  • The Canada Revenue Agency uses advanced predictive analytics to more precisely and rapidly address non-compliance, and to better understand taxpayer decisions and actions with respect to tax debt.
  • FINTRAC is in the pilot phase of an Analytical Modernization to connect data entered from a variety of sources to detect and identify money laundering and terrorist financing.
  • Health Canada is exploring the use of big data to enhance its surveillance of diseases and air quality. It is also undertaking a pilot project to assess the safety of imported consumer products before they reach Canada.
  • Employment and Social Development Canada is using big data to assess the Employment Insurance program’s labour market impacts and outcomes for various demographic groups.
  • The National Research Council is developing big data solutions for security agencies and the private sector. It is also continuously learning about new technologies that appear on the market.

Best in Class

  • The UK’s Troubled Families initiative has successfully combined data on families with a collaborative effort across public services at a local level.
  • US federal and state security agencies are using Palantir’s services to track roadside bomb deployment, investigate Medicare fraud tips, locate missing children, and more.
  • New York City’s Fire Department developed a new fire-prevention strategy that prioritizes inspections based on risk assessments derived from building data. The city reduced the number of annual fire deaths to the lowest level since 1916, and increased the share of inspections resulting in a vacate order from 13 to 70 percent.