The “Internet of things” (IoT) is an increasingly common topic of conversation both in the workplace and outside of it. It’s a concept with the potential to change not only how we live but also how we work. But what exactly is the “Internet of things,” and what impact is it going to have on you, if any? There are a lot of complexities around the IoT, but I want to stick to the basics. Plenty of technical and policy-related conversations are underway, but many people are still just trying to grasp what the heck these conversations are about.
Let’s start with understanding a few things.
Broadband Internet is becoming more widely available, the cost of connecting is decreasing, more devices are being created with Wi-Fi capabilities and sensors built into them, technology costs are going down, and smartphone penetration is sky-rocketing. All of these things are creating a “perfect storm” for the IoT.
What Is The Internet Of Things?
Simply put, the IoT is the concept of connecting any device with an on/off switch to the Internet (and/or to each other). That includes everything from cellphones, coffee makers, washing machines, headphones, lamps and wearable devices to almost anything else you can think of. It also applies to components of machines: the jet engine of an airplane, for example, or the drill of an oil rig. As I mentioned, if it has an on/off switch, then chances are it can be a part of the IoT. The analyst firm Gartner says that by 2020 there will be over 26 billion connected devices, and some even estimate the number to be much higher, over 100 billion. That’s a lot of connections. The IoT is a giant network of connected “things” (which also includes people), with relationships between people and people, people and things, and things and things.
The idea of adding sensors and intelligence to basic objects was discussed throughout the 1980s and 1990s (and there are arguably some much earlier ancestors), but apart from some early projects — including an internet-connected vending machine — progress was slow simply because the technology wasn’t ready. Chips were too big and bulky and there was no way for objects to communicate effectively.
Processors that were cheap and power-frugal enough to be all but disposable were needed before it finally became cost-effective to connect up billions of devices. The adoption of RFID tags — low-power chips that can communicate wirelessly — solved some of this issue, along with the increasing availability of broadband internet and cellular and wireless networking. The adoption of IPv6 — which, among other things, should provide enough IP addresses for every device the world (or indeed this galaxy) is ever likely to need — was also a necessary step for the IoT to scale.
Kevin Ashton coined the phrase ‘Internet of Things’ in 1999, although it took at least another decade for the technology to catch up with the vision.
How big is the Internet of Things?
Big and getting bigger — there are already more connected things than people in the world.
Tech analyst company IDC predicts that there will be 41.6 billion connected IoT devices, or “things,” by 2025. It also suggests industrial and automotive equipment represent the largest opportunity for connected “things,” but it sees strong adoption of smart home and wearable devices in the near term as well.
Another tech analyst, Gartner, predicts that the enterprise and automotive sectors will account for 5.8 billion devices this year, up almost a quarter on 2019. Utilities will be the highest user of IoT, thanks to the continuing rollout of smart meters. Security devices, in the form of intruder detection and web cameras, will be the second biggest use of IoT devices. Building automation – like connected lighting – will be the fastest-growing sector, followed by automotive (connected cars) and healthcare (monitoring of chronic conditions).
What are the benefits of the Internet of Things for business?
The benefits of the IoT for business depend on the particular implementation; agility and efficiency are usually top considerations. The idea is that enterprises should have access to more data about their own products and their own internal systems, and a greater ability to make changes as a result.
What is the Industrial Internet of Things?
The Industrial Internet of Things (IIoT) or the fourth industrial revolution or Industry 4.0 are all names given to the use of IoT technology in a business setting. The concept is the same as for the consumer IoT devices in the home, but in this case the aim is to use a combination of sensors, wireless networks, big data, AI and analytics to measure and optimise industrial processes.
If introduced across an entire supply chain, rather than just individual companies, the impact could be even greater with just-in-time delivery of materials and the management of production from start to finish. Increasing workforce productivity or cost savings are two potential aims, but the IIoT can also create new revenue streams for businesses; rather than just selling a standalone product – for example, like an engine – manufacturers can also sell predictive maintenance of the engine.
What are the benefits of the Internet of Things for consumers?
The IoT promises to make our environment — our homes and offices and vehicles — smarter, more measurable, and… chattier. Smart speakers like Amazon’s Echo and Google Home make it easier to play music, set timers, or get information. Home security systems make it easier to monitor what’s going on inside and outside, or to see and talk to visitors. Meanwhile, smart thermostats can help us heat our homes before we arrive back, and smart lightbulbs can make it look like we’re home even when we’re out.
Looking beyond the home, sensors can help us to understand how noisy or polluted our environment might be. Self-driving cars and smart cities could change how we build and manage our public spaces.
However, many of these innovations could have major implications for our personal privacy.
The Internet of Things and smart homes
For consumers, the smart home is where they are most likely to come into contact with internet-enabled things, and it’s one area where the big tech companies (in particular Amazon, Google, and Apple) are competing hard.
The most obvious of these are smart speakers like Amazon’s Echo, but there are also smart plugs, lightbulbs, cameras, thermostats, and the much-mocked smart fridge. But as well as showing off your enthusiasm for shiny new gadgets, there’s a more serious side to smart home applications. They may be able to help keep older people independent and in their own homes longer by making it easier for family and carers to communicate with them and monitor how they are getting on. A better understanding of how our homes operate, and the ability to tweak those settings, could help save energy — by cutting heating costs, for example.
What about Internet of Things security?
Security is one of the biggest issues with the IoT. In many cases these sensors are collecting extremely sensitive data — what you say and do in your own home, for example. Keeping that secure is vital to consumer trust, but so far the IoT’s security track record has been extremely poor. Too many IoT devices give little thought to the basics of security, like encrypting data in transit and at rest.
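To make one of those basics concrete, here is a minimal Python sketch of adding integrity protection to a sensor payload in transit: the device attaches an HMAC tag so the receiver can detect tampering. The device key and field names are invented for illustration; a real device would also encrypt the payload and provision keys securely.

```python
import hashlib
import hmac
import json

# Hypothetical per-device secret, provisioned at manufacture time.
DEVICE_KEY = b"per-device-secret"

def sign_reading(reading):
    """Attach an HMAC-SHA256 tag so the server can detect tampering in transit."""
    payload = json.dumps(reading, sort_keys=True).encode()
    tag = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": reading, "tag": tag}

def verify_reading(message):
    """Recompute the tag server-side and compare in constant time."""
    payload = json.dumps(message["payload"], sort_keys=True).encode()
    expected = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])

message = sign_reading({"sensor": "thermostat-7", "temp_c": 21.5})
print(verify_reading(message))  # True for an untampered message
```

Authentication like this is cheap enough even for constrained devices, which is part of why its frequent absence is so striking.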
Flaws in software — even old and well-used code — are discovered on a regular basis, but many IoT devices lack the capability to be patched, which means they are permanently at risk. Hackers are now actively targeting IoT devices such as routers and webcams because their inherent lack of security makes them easy to compromise and roll up into giant botnets.
Flaws have left smart home devices like refrigerators, ovens, and dishwashers open to hackers. Researchers found 100,000 webcams that could be hacked with ease, while some internet-connected smartwatches for children have been found to contain security vulnerabilities that allow hackers to track the wearer’s location, eavesdrop on conversations, or even communicate with the user.
Governments are growing worried about the risks here. The UK government has published its own guidelines around the security of consumer IoT devices. It expects devices to have unique passwords, that companies will provide a public point of contact so anyone can report a vulnerability (and that these will be acted on), and that manufacturers will explicitly state how long devices will get security updates. It’s a modest list, but a start.
When the cost of making smart objects becomes negligible, these problems will only become more widespread and intractable.
All of this applies in business as well, but the stakes are even higher. Connecting industrial machinery to IoT networks increases the potential risk of hackers discovering and attacking these devices. Industrial espionage or a destructive attack on critical infrastructure are both potential risks. That means businesses will need to make sure that these networks are isolated and protected, with data encryption and the security of sensors, gateways and other components a necessity. The current state of IoT technology makes that harder to ensure, however, as does a lack of consistent IoT security planning across organisations. That’s very worrying considering the documented willingness of hackers to tamper with industrial systems that have been connected to the internet but left unprotected.
The IoT bridges the gap between the digital world and the physical world, which means that hacking into devices can have dangerous real-world consequences. Hacking into the sensors controlling the temperature in a power station could trick the operators into making a catastrophic decision; taking control of a driverless car could also end in disaster.
What about privacy and the Internet of Things?
With all those sensors collecting data on everything you do, the IoT is a potentially vast privacy and security headache. Take the smart home: it can tell when you wake up (when the smart coffee machine is activated) and how well you brush your teeth (thanks to your smart toothbrush), what radio station you listen to (thanks to your smart speaker), what type of food you eat (thanks to your smart oven or fridge), what your children think (thanks to their smart toys), and who visits you and passes by your house (thanks to your smart doorbell). While companies will make money from selling you the smart object in the first place, their IoT business model probably involves selling at least some of that data, too.
What happens to that data is a vitally important privacy matter. Not all smart home companies build their business model around harvesting and selling your data, but some do.
And it’s worth remembering that IoT data can be combined with other bits of data to create a surprisingly detailed picture of you; a few different sensor readings are enough to learn a lot about a person. In one project, a researcher found that by analysing data charting just a home’s energy consumption, carbon monoxide and carbon dioxide levels, temperature, and humidity throughout the day, they could work out what someone was having for dinner.
IoT, privacy and business
Consumers need to understand the exchange they are making and whether they are happy with that. Some of the same issues apply to business: would your executive team be happy to discuss a merger in a meeting room equipped with smart speakers and cameras, for example? One recent survey found that four out of five companies would be unable to identify all the IoT devices on their network.
Badly installed IoT products could easily open up corporate networks to attack by hackers, or simply leak data. It might seem like a trivial threat but imagine if the smart locks at your office refused to open one morning or the smart weather station in the CEO’s office was used by hackers to create a backdoor into your network.
The IoT and cyberwarfare
The IoT makes computing physical. So if things go wrong with IoT devices, there can be major real-world consequences — something that nations planning their cyberwarfare strategies are now taking into account.
US intelligence community briefings have warned that the country’s adversaries already have the ability to threaten its critical infrastructure as well “as the broader ecosystem of connected consumer and industrial devices known as the Internet of Things”. US intelligence has also warned that connected thermostats, cameras, and cookers could all be used either to spy on citizens of another country, or to cause havoc if they were hacked. Adding key elements of national critical infrastructure (like dams, bridges, and elements of the electricity grid) to the IoT makes it even more vital that security is as tight as possible.
The Internet of Things and data
An IoT device will likely contain one or more sensors which it will use to collect data. Just what those sensors are collecting will depend on the individual device and its task. Sensors inside industrial machinery might measure temperature or pressure; a security camera might have a proximity sensor along with sound and video, while your home weather station will probably be packing a humidity sensor. All this sensor data – and much, much more – will have to be sent somewhere. That means IoT devices will need to transmit data and will do it via Wi-Fi, 4G, 5G and more.
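As a rough illustration of what one of those transmissions might contain, here is a hedged Python sketch of a telemetry payload. The device ID, field names and JSON format are all invented choices for the example; the point is that the payload is independent of whether it travels over Wi-Fi, 4G or 5G.

```python
import json
import time

def build_telemetry(device_id, temperature_c, pressure_kpa):
    """Package one batch of sensor readings for transmission over any
    transport; the link layer (Wi-Fi, 4G, 5G) doesn't change the payload."""
    return json.dumps({
        "device_id": device_id,
        "timestamp": int(time.time()),  # seconds since the Unix epoch
        "readings": {
            "temperature_c": temperature_c,
            "pressure_kpa": pressure_kpa,
        },
    })

packet = build_telemetry("press-04", 87.2, 101.3)
print(packet)
```

In practice many devices use a compact binary encoding or a publish/subscribe protocol such as MQTT rather than raw JSON, but the shape of the data is much the same.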
Tech analyst IDC calculates that within five years IoT gadgets will be creating 79.4 zettabytes of data. Some of this IoT data will be “small and bursty” says IDC – a quick update like a temperature reading from a sensor or a reading from a smart meter. Other devices might create huge amounts of data traffic, like a video surveillance camera using computer vision.
IDC said the amount of data created by IoT devices will grow rapidly in the next few years. Most of the data is being generated by video surveillance, it said, but other industrial and medical uses will generate more data over time.
It said drones will also be a big driver of data creation using cameras. Looking further out, self-driving cars will also generate vast amounts of rich sensor data including audio and video, as well as more specialised automotive sensor data.
Internet of Things and big data analytics
The IoT generates vast amounts of data: from sensors attached to machine parts or environment sensors, or the words we shout at our smart speakers. That means the IoT is a significant driver of big-data analytics projects because it allows companies to create vast data sets and analyse them. Giving a manufacturer vast amounts of data about how its components behave in real-world situations can help them to make improvements much more rapidly, while data culled from sensors around a city could help planners make traffic flow more efficiently.
That data will come in many different forms – voice requests, video, temperature or other sensor readings – all of which can be mined for insight. As analyst IDC notes, the IoT metadata category is a growing source of data to be managed and leveraged. “Metadata is a prime candidate to be fed into NoSQL databases like MongoDB to bring structure to unstructured content or fed into cognitive systems to bring new levels of understanding, intelligence, and order to outwardly random environments,” it said.
In particular, the IoT will deliver large amounts of real-time data. Cisco calculates that machine-to-machine connections that support IoT applications will account for more than half of the total 27.1 billion devices and connections, and will account for 5% of global IP traffic by 2021.
Internet of Things and the cloud
The huge amount of data that IoT applications generate means that many companies will choose to do their data processing in the cloud rather than build huge amounts of in-house capacity. Cloud computing giants are already courting these companies: Microsoft has its Azure IoT suite, while Amazon Web Services provides a range of IoT services, as does Google Cloud.
The Internet of Things and smart cities
By spreading a vast number of sensors over a town or city, planners can get a better idea of what’s really happening, in real time. As a result, smart cities projects are a key feature of the IoT. Cities already generate large amounts of data (from security cameras and environmental sensors) and already contain big infrastructure networks (like those controlling traffic lights). IoT projects aim to connect these up, and then add further intelligence into the system.
There are plans to blanket Spain’s Balearic Islands with half a million sensors and turn them into a lab for IoT projects, for example. One scheme could involve the regional social-services department using the sensors to help the elderly, while another could identify if a beach has become too crowded and offer alternatives to swimmers. In another example, AT&T is launching a service to monitor infrastructure such as bridges, roadways, and railways with LTE-enabled sensors to monitor structural changes such as cracks and tilts.
The ability to better understand how a city is functioning should allow planners to make changes and monitor how this improves residents’ lives.
Big tech companies see smart cities projects as a potentially huge area, and many — including mobile operators and networking companies — are now positioning themselves to get involved.
How do Internet of Things and 5G connect and share data?
IoT devices use a variety of methods to connect and share data, although most will use some form of wireless connectivity: homes and offices will use standard Wi-Fi, Zigbee or Bluetooth Low Energy (or even Ethernet if they aren’t especially mobile); other devices will use LTE (existing technologies include Narrowband IoT and LTE-M, largely aimed at small devices sending limited amounts of data) or even satellite connections to communicate. However, the vast number of different options has already led some to argue that IoT communications standards need to be as accepted and interoperable as Wi-Fi is today.
One area of growth in the next few years will undoubtedly be the use of 5G networks to support IoT projects. 5G offers the ability to fit as many as one million devices in a square kilometre, which means that it will be possible to use a vast number of sensors in a very small area, making large-scale industrial IoT deployments more feasible. The UK has just started a trial of 5G and the IoT at two ‘smart factories’. However, it could be some time before 5G deployments are widespread: Ericsson predicts that there will be somewhere around five billion IoT devices connected to cellular networks by 2025, but only around a quarter of those will be broadband IoT, with 4G connecting the majority.
Outdoor surveillance cameras will be the largest market for 5G IoT devices in the near term, according to Gartner, accounting for the majority (70%) of the 5G IoT devices this year, before dropping to around 30% by the end of 2023, at which point they will be overtaken by connected cars.
The analyst firm predicts that there will be 3.5 million 5G IoT devices in use this year, and nearly 50 million by 2023. Longer term the automotive industry will be the largest sector for 5G IoT use cases, it predicted.
One likely trend is that, as the IoT develops, it could be that less data will be sent for processing in the cloud. To keep costs down, more processing could be done on-device with only the useful data sent back to the cloud – a strategy known as ‘edge computing’. This will require new technology – like tamper-proof edge servers that can collect and analyse data far from the cloud or corporate data center.
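A minimal sketch of that edge-computing idea, with an invented over-temperature threshold and a hypothetical upload step: the device summarises readings locally and transmits only when something is worth acting on, rather than streaming every raw sample to the cloud.

```python
# Invented threshold for this example; a real deployment would calibrate it.
THRESHOLD_C = 80.0

def process_on_device(readings):
    """Summarise a batch of temperature readings locally ('at the edge').
    Returns an alert message to upload, or None when nothing notable happened,
    so most batches generate no cloud traffic at all."""
    maximum = max(readings)
    if maximum > THRESHOLD_C:
        # Send only the anomaly summary, not the whole raw stream.
        return {"alert": "over-temperature", "max_c": maximum}
    return None

batch = [71.2, 69.8, 84.5, 70.1]
message = process_on_device(batch)
print(message)  # {'alert': 'over-temperature', 'max_c': 84.5}
```

Four readings in, at most one small message out: multiplied across thousands of sensors, that is where the bandwidth and cost savings of edge processing come from.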
IoT data and artificial intelligence
IoT devices generate vast amounts of data; that might be information about an engine’s temperature or whether a door is open or closed or the reading from a smart meter. All this IoT data has to be collected, stored and analysed. One way companies are making the most of this data is to feed it into artificial intelligence (AI) systems that will take that IoT data and use it to make predictions.
For example, Google has put an AI in charge of its data centre cooling system. The AI uses data pulled from thousands of IoT sensors, feeding it into deep neural networks that predict how different choices will affect future energy consumption. By using machine learning and AI, Google has been able to make its data centres more efficient, and it has said the same technology could have uses in other industrial settings.
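As a toy illustration of the predictive idea (vastly simpler than the deep neural networks described above), the sketch below fits a straight line to a handful of invented sensor readings and forecasts the next value:

```python
def fit_line(ys):
    """Least-squares fit of y = slope*x + intercept over equally spaced samples."""
    xs = range(len(ys))
    n = len(ys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented cooling-loop temperature readings, one per time step.
temps = [21.0, 21.4, 21.9, 22.3, 22.8]
slope, intercept = fit_line(temps)
next_temp = slope * len(temps) + intercept  # forecast for the next time step
print(round(next_temp, 2))
```

The principle is the same at Google's scale: learn a model of how readings evolve, then act on the prediction (for example, adjusting cooling before temperatures actually rise).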
IoT evolution: Where does the Internet of Things go next?
As the price of sensors and communications continues to drop, it becomes cost-effective to add more devices to the IoT – even if in some cases there’s little obvious benefit to consumers. Deployments are at an early stage; most companies that are engaging with the IoT are at the trial stage right now, largely because the necessary technologies – sensors, 5G and machine-learning-powered analytics – are still at a reasonably early stage of development. There are many competing platforms and standards, and many different vendors, from device makers to software companies to network operators, want a slice of the pie. It’s still not clear which of those will win out. But without standards, and with security an ongoing issue, we are likely to see some more big IoT security mishaps in the next few years.
As the number of connected devices continues to rise, our living and working environments will become filled with smart products – assuming we are willing to accept the security and privacy trade-offs. Some will welcome the new era of smart things. Others will pine for the days when a chair was simply a chair.
What Is Big Data?
Big data refers to the large, diverse sets of information that grow at ever-increasing rates. It encompasses the volume of information, the velocity or speed at which it is created and collected, and the variety or scope of the data points being covered. Big data often comes from multiple sources and arrives in multiple formats.
Big data is a combination of structured, semistructured and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modeling and other advanced analytics applications.
History of Big Data
The term “big data” refers to data that is so large, fast or complex that it’s difficult or impossible to process using traditional methods. The act of accessing and storing large amounts of information for analytics has been around a long time. But the concept of big data gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three V’s:
Volume: Organizations collect data from a variety of sources, including business transactions, smart (IoT) devices, industrial equipment, videos, social media and more. In the past, storing it would have been a problem – but cheaper storage on platforms like data lakes and Hadoop have eased the burden.
Velocity: With the growth in the Internet of Things, data streams in to businesses at an unprecedented speed and must be handled in a timely manner. RFID tags, sensors and smart meters are driving the need to deal with these torrents of data in near-real time.
Variety: Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audios, stock ticker data and financial transactions.
How Big Data Works
Big data can be categorized as unstructured or structured. Structured data consists of information already managed by the organization in databases and spreadsheets; it is frequently numeric in nature. Unstructured data is information that is unorganized and does not fall into a pre-determined model or format. It includes data gathered from social media sources, which helps institutions gather information on customer needs.
Systems that process and store big data have become a common component of data management architectures in organizations. Big data is often characterized by the 3Vs: the large volume of data in many environments, the wide variety of data types stored in big data systems and the velocity at which the data is generated, collected and processed. These characteristics were first identified by Doug Laney, then an analyst at Meta Group Inc., in 2001; Gartner further popularized them after it acquired Meta Group in 2005. More recently, several other Vs have been added to different descriptions of big data, including veracity, value and variability.
Although big data doesn’t equate to any specific volume of data, big data deployments often involve terabytes (TB), petabytes (PB) and even exabytes (EB) of data captured over time.
Importance of big data
Companies use the big data accumulated in their systems to improve operations, provide better customer service, create personalized marketing campaigns based on specific customer preferences and, ultimately, increase profitability. Businesses that utilize big data hold a potential competitive advantage over those that don’t since they’re able to make faster and more informed business decisions, provided they use the data effectively.
For example, big data can provide companies with valuable insights into their customers that can be used to refine marketing campaigns and techniques in order to increase customer engagement and conversion rates.
Examples of big data
Big data comes from myriad different sources, such as business transaction systems, customer databases, medical records, internet clickstream logs, mobile applications, social networks, scientific research repositories, machine-generated data and real-time data sensors used in internet of things (IoT) environments. The data may be left in its raw form in big data systems or preprocessed using data mining tools or data preparation software so it’s ready for particular analytics uses.
Using customer data as an example, the different branches of analytics that can be done with the information found in sets of big data include the following:
- Comparative analysis. This includes the examination of user behavior metrics and the observation of real-time customer engagement in order to compare one company’s products, services and brand authority with those of its competition.
- Social media listening. This is information about what people are saying on social media about a specific business or product that goes beyond what can be delivered in a poll or survey. This data can be used to help identify target audiences for marketing campaigns by observing the activity surrounding specific topics across various sources.
- Marketing analysis. This includes information that can be used to make the promotion of new products, services and initiatives more informed and innovative.
- Customer satisfaction and sentiment analysis. All of the information gathered can reveal how customers are feeling about a company or brand, if any potential issues may arise, how brand loyalty might be preserved and how customer service efforts might be improved.
Breaking down the Vs of big data
Volume is the most commonly cited characteristic of big data. A big data environment doesn’t have to contain a large amount of data, but most do because of the nature of the data being collected and stored in them. Clickstreams, system logs and stream processing systems are among the sources that typically produce massive volumes of big data on an ongoing basis.
Big data also encompasses a wide variety of data types, including the following:
- structured data in databases and data warehouses based on Structured Query Language (SQL);
- unstructured data, such as text and document files held in Hadoop clusters or NoSQL database systems; and
- semistructured data, such as web server logs or streaming data from sensors.
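To illustrate the semistructured case, the sketch below parses one Apache-style web server log line into a structured record that could be loaded into a SQL table. The field names are our own choice for the example.

```python
import re

# Common Apache-style access-log format: IP, identity, user, timestamp,
# request line, status code, response size in bytes.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)

line = '203.0.113.9 - - [01/Mar/2020:08:15:00 +0000] "GET /sensors/7 HTTP/1.1" 200 512'
match = LOG_PATTERN.match(line)
# Convert purely numeric fields to ints; leave the rest as strings.
record = {k: int(v) if v.isdigit() else v for k, v in match.groupdict().items()}
print(record)
```

The raw line has a recognisable shape but no schema; after parsing, each field has a name and a type, which is exactly the transformation that moves data from the semistructured to the structured column above.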
All of the various data types can be stored together in a data lake, which typically is based on Hadoop or a cloud object storage service. In addition, big data applications often include multiple data sources that may not otherwise be integrated. For example, a big data analytics project may attempt to gauge a product’s success and future sales by correlating past sales data, return data and online buyer review data for that product.
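The correlation step in that example might be sketched like this, joining invented sales, returns and review figures by product ID; in a real project these would be three separate source systems landed in the data lake.

```python
# Invented sample data standing in for three otherwise unintegrated sources.
sales   = {"widget-a": 1200, "widget-b": 300}
returns = {"widget-a": 36,   "widget-b": 45}
reviews = {"widget-a": 4.4,  "widget-b": 2.9}

# Join the three sources on product ID into one analysable record per product.
report = {
    product: {
        "units_sold": sold,
        "return_rate": returns[product] / sold,
        "avg_review": reviews[product],
    }
    for product, sold in sales.items()
}
print(report["widget-b"])
```

A 15% return rate alongside a 2.9-star average review says far more about widget-b's prospects than any one of the three sources would on its own.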
Velocity refers to the speed at which big data is generated and must be processed and analyzed. In many cases, sets of big data are updated on a real- or near-real-time basis, instead of the daily, weekly or monthly updates made in many traditional data warehouses. Big data analytics applications ingest, correlate and analyze the incoming data and then render an answer or result based on an overarching query. This means data scientists and other data analysts must have a detailed understanding of the available data and possess some sense of what answers they’re looking for to make sure the information they get is valid and up to date.
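One simple way to render a continuously updated answer over streaming data is a fixed-size sliding window, sketched here with invented readings; each new value immediately refreshes the result instead of waiting for a daily batch.

```python
from collections import deque

class SlidingAverage:
    """Keep only the most recent `size` readings and report their mean,
    so the answer updates as fast as the data arrives."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # old values fall off automatically

    def ingest(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)

monitor = SlidingAverage(size=3)
for reading in [10.0, 12.0, 11.0, 30.0]:
    current = monitor.ingest(reading)
print(round(current, 2))  # mean of the last three readings
```

Stream-processing frameworks generalise this same idea, with windows keyed by time or by sensor and distributed across many machines.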
Managing data velocity is also important as big data analysis expands into fields like machine learning and artificial intelligence (AI), where analytical processes automatically find patterns in the collected data and use them to generate insights.
More characteristics of big data
Looking beyond the original 3Vs, data veracity refers to the degree of certainty in data sets. Uncertain raw data collected from multiple sources — such as social media platforms and webpages — can cause serious data quality issues that may be difficult to pinpoint. For example, a company that collects sets of big data from hundreds of sources may be able to identify inaccurate data, but its analysts need data lineage information to trace where the data is stored so they can correct the issues.
Bad data leads to inaccurate analysis and may undermine the value of business analytics because it can cause executives to mistrust data as a whole. The amount of uncertain data in an organization must be accounted for before it is used in big data analytics applications. IT and analytics teams also need to ensure that they have enough accurate data available to produce valid results.
Some data scientists also add value to the list of characteristics of big data. As explained above, not all data collected has real business value, and the use of inaccurate data can weaken the insights provided by analytics applications. It’s critical that organizations employ practices such as data cleansing and confirm that data relates to relevant business issues before they use it in a big data analytics project.
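A basic data-cleansing pass might look like the following sketch, which drops missing, duplicated and physically implausible readings before analysis; the validity bounds and sample records are invented.

```python
# Invented raw records from a temperature sensor feed.
raw = [
    {"id": 1, "temp_c": 21.5},
    {"id": 1, "temp_c": 21.5},   # duplicate record
    {"id": 2, "temp_c": None},   # missing value
    {"id": 3, "temp_c": 999.0},  # sensor glitch, physically implausible
    {"id": 4, "temp_c": 20.9},
]

seen = set()
clean = []
for row in raw:
    # Drop missing values and readings outside a plausible range.
    if row["temp_c"] is None or not (-40.0 <= row["temp_c"] <= 60.0):
        continue
    # Drop duplicate record IDs.
    if row["id"] in seen:
        continue
    seen.add(row["id"])
    clean.append(row)

print([r["id"] for r in clean])  # [1, 4]
```

Rules like these are simple, but applying them consistently across hundreds of sources is what separates data that analysts trust from data that they quietly ignore.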
Variability also often applies to sets of big data, which are less consistent than conventional transaction data and may have multiple meanings or be formatted in different ways from one data source to another — factors that further complicate efforts to process and analyze the data. Some people ascribe even more Vs to big data; data scientists and consultants have created various lists with between seven and 10 Vs.
How big data is stored and processed
The need to handle big data velocity imposes unique demands on the underlying compute infrastructure. The computing power required to quickly process huge volumes and varieties of data can overwhelm a single server or server cluster. Organizations must apply adequate processing capacity to big data tasks in order to achieve the required velocity. This can potentially demand hundreds or thousands of servers that can distribute the processing work and operate collaboratively in a clustered architecture, often based on technologies like Hadoop and Apache Spark.
Achieving such velocity in a cost-effective manner is also a challenge. Many enterprise leaders are reluctant to invest in an extensive server and storage infrastructure to support big data workloads, particularly ones that don’t run 24/7. As a result, public cloud computing is now a primary vehicle for hosting big data systems. A public cloud provider can store petabytes of data and scale up the required number of servers just long enough to complete a big data analytics project. The business only pays for the storage and compute time actually used, and the cloud instances can be turned off until they’re needed again.
To improve service levels even further, public cloud providers offer big data capabilities through managed services that include the following:
- Amazon EMR (formerly Elastic MapReduce)
- Microsoft Azure HDInsight
- Google Cloud Dataproc
In cloud environments, big data can be stored in the following:
- Hadoop Distributed File System (HDFS);
- lower-cost cloud object storage, such as Amazon Simple Storage Service (S3);
- NoSQL databases; and
- relational databases.
For organizations that want to deploy on-premises big data systems, commonly used Apache open source technologies in addition to Hadoop and Spark include the following:
- YARN, Hadoop’s built-in resource manager and job scheduler, which stands for Yet Another Resource Negotiator but is commonly known by the acronym alone;
- the MapReduce programming framework, also a core component of Hadoop;
- Kafka, an application-to-application messaging and data streaming platform;
- the HBase database; and
- SQL-on-Hadoop query engines, like Drill, Hive, Impala and Presto.
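The MapReduce programming model from the list above can be illustrated in miniature with the classic word count. This is a single-process sketch of the model's three stages, not Hadoop's actual Java API: map emits key-value pairs, a shuffle groups them by key, and reduce aggregates each group.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the document.
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into a final count.
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data needs big clusters", "clusters process big data"]
pairs = chain.from_iterable(map_phase(doc) for doc in documents)
print(reduce_phase(shuffle(pairs)))
```

On a real cluster, the map and reduce functions run in parallel across many servers and the shuffle moves data between them over the network; the programming model stays the same.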
Users can install the open source versions of the technologies themselves or turn to commercial big data platforms offered by Cloudera, which merged with former rival Hortonworks in January 2019, or Hewlett Packard Enterprise (HPE), which bought the assets of big data vendor MapR Technologies in August 2019. The Cloudera and MapR platforms are also supported in the cloud.
Big data challenges
Besides the processing capacity and cost issues, designing a big data architecture is another common challenge for users. Big data systems must be tailored to an organization's particular needs, a DIY undertaking that requires IT teams and application developers to piece together a set of tools from all the available technologies. Deploying and managing big data systems also demands skills beyond those of database administrators (DBAs) and developers focused on relational software.
Both of those issues can be eased by using a managed cloud service, but IT managers need to keep a close eye on cloud usage to make sure costs don’t get out of hand. Also, migrating on-premises data sets and processing workloads to the cloud is often a complex process for organizations.
Making the data in big data systems accessible to data scientists and other analysts is also a challenge, especially in distributed environments that include a mix of different platforms and data stores. To help analysts find relevant data, IT and analytics teams are increasingly working to build data catalogs that incorporate metadata management and data lineage functions. Data quality and data governance also need to be priorities to ensure that sets of big data are clean, consistent and used properly.
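The shape of a data catalog entry, combining descriptive metadata with lineage, can be sketched as a simple data structure. The field names below are illustrative assumptions, not any catalog product's actual schema:

```python
from dataclasses import dataclass, field

# Illustrative catalog entry: metadata that helps analysts find a data set,
# plus lineage recording which upstream sources it was derived from.
# Field names are assumptions, not a specific product's schema.
@dataclass
class CatalogEntry:
    name: str
    owner: str
    description: str
    platform: str                                # e.g. "HDFS", "S3", "NoSQL"
    tags: list = field(default_factory=list)
    lineage: list = field(default_factory=list)  # upstream source data sets

catalog = [
    CatalogEntry(
        name="sales_daily",
        owner="analytics-team",
        description="Daily sales aggregated from raw order events",
        platform="S3",
        tags=["sales", "curated"],
        lineage=["raw_orders", "currency_rates"],
    ),
]

def search(catalog, tag):
    # Tag-based discovery: the kind of lookup a catalog UI offers analysts.
    return [entry.name for entry in catalog if tag in entry.tags]

print(search(catalog, "sales"))  # → ['sales_daily']
```

Commercial and open source catalogs add much more (automated metadata harvesting, access controls, quality scores), but discovery and lineage tracking are the core functions this sketch captures.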
Big data collection practices and regulations
For many years, companies had few restrictions on the data they collected from their customers. However, as the collection and use of big data have increased, so has data misuse. Concerned citizens who have experienced the mishandling of their personal data or have been victims of a data breach are calling for laws around data collection transparency and consumer data privacy.
The outcry about personal privacy violations led the European Union to pass the General Data Protection Regulation (GDPR), which took effect in May 2018; it limits the types of data that organizations can collect and requires opt-in consent from individuals or compliance with other specified lawful grounds for collecting personal data. GDPR also includes a right-to-be-forgotten provision, which lets EU residents ask companies to delete their data.
While there aren’t similar federal laws in the U.S., the California Consumer Privacy Act (CCPA) aims to give California residents more control over the collection and use of their personal information by companies. CCPA was signed into law in 2018 and is scheduled to take effect on Jan. 1, 2020. In addition, government officials in the U.S. are investigating data handling practices, specifically among companies that collect consumer data and sell it to other companies for unknown use.
The human side of big data analytics
Ultimately, the value and effectiveness of big data depend on the workers tasked with understanding the data and formulating the proper queries to direct big data analytics projects. Some big data tools address specialized niches and enable less technical users to apply everyday business data in predictive analytics applications. Other technologies — such as Hadoop-based big data appliances — help businesses implement a suitable compute infrastructure to tackle big data projects, while minimizing the need for hardware and distributed software know-how.
Big data can be contrasted with small data, another evolving term that’s often used to describe data whose volume and format can be easily used for self-service analytics. A commonly quoted axiom is that “big data is for machines; small data is for people.”