Did you know it would take 1.2 million mosquitoes, each sucking once, to completely drain the blood of an average human? My latest post about the publicly available COVID-19 app has 5,815 views so far. The text analytics market will be worth US$ 18.25 Bn by 2025. Interesting, isn’t it?
Do you think those numbers are correct? Is it even important? How many posts and articles do you read only because they contain numbers in the headline or in the first one or two sentences?
People are fascinated by numbers. I believe there are many theories about why this is so – evolution, genes, collective behavior… I like numbers, too. It’s comfortable to know exactly how many loaves of bread I should buy, to see numbers when comparing two possible scenarios before deciding where to go on vacation, to watch the reach of my posts on social media.
“If you can’t measure it, you can’t improve it.”
Data is something I can use to improve my business, my decisions, my lifestyle…
All in all, data is here to be measured.
Measures are numbers.
Data is numbers.
Wrong.
…but somehow many people think like this. Despite the fact that 80% of all data we generate is unstructured (not sure if the number is correct, but it looks better than “much”), we are still focused on analyzing and measuring numbers.
A few years ago, at Inphinity (even before it was named Inphinity, when the team was still part of Emark), we developed the Mole unstructured data connector – the first Qlik custom connector for unstructured data. To be honest, it doesn’t have the hundreds of users today that we expected.
WHAT DO QLIK LUMINARIES THINK?
Why are we so focused on numbers? According to Konrad Mattheis (Deloitte): “The first reason is that there is a lot of structured data. Secondly, most human beings need and want structure – unstructured is chaos, unpredictable, for them.”
However, many leaders with years of experience in the data analytics universe discuss the importance of unstructured data (see the article by Martin Kostic) or already deal with it (Visualizing HR 660 by Dalton Ruer). If you are interested in some creative analysis, check US Presidential Speeches Analysis by Terezia Blaskova (Emark).
I can only agree with Adrian Parker (Differentia Consulting): “For the full story, we need to show both structured and unstructured data in a single tool.”
The opinion of Radovan Oresky (Emark) is more skeptical, but it also confirms the value of using all data, not only structured data: “Unstructured data will surely have its part in the future (not the only one). But I am doubtful, that it will be mainstream.”
Angelika Klidas (Data literacy evangelist) confirms it, too: “I think the power is within the combination of the two, so yes I do believe that unstructured data is in combination with structured the future.”
All of this leads me to the conclusion that unstructured data needs to be analyzed together with structured data – it should be accessible and easy for users to understand. We are on the way to “avoid the data racism”, but it is a step-by-step process. Let’s start by combining the data…
COMBINING DATA TOGETHER
Thanks to a new Inphinity Forms feature, users can simply upload any file to a Qlik app. When you combine it with the Mole unstructured data connector in the background, the result can be a simple “file search engine” that I think many users will appreciate. “Show me all documents where Customer X or John Doe is mentioned.” Connect your files to structured data and create an overview of your sales team together with their CVs and soft-skills or language certificates. Use the Qlik associative engine to identify anomalies in invoice payments by using clients’ data and invoices in the very same Qlik app. Check whether customer satisfaction correlates with the project documentation your team provided.
I think we agree that it doesn’t end here. And I am not thinking only of almost sci-fi scenarios in which AI rules the world. Unstructured data can be crucial for specific but very important use cases, exactly as Radovan Oresky (Emark) mentioned: “The main effort will still be to capture the most important data in a structured, easily analyzable form (with help from RPA). Analysis of unstructured data will be used to either enrich key analysis or be used for very specific, niche use-cases (customer/population sentiment, fraud detection, context search).”
After we become more familiar with it, I think we will come up with more and more use cases for it and more ways to get value from it. Also, the technology needed to process it will become more accessible. Some of us already think this way. Dalton Ruer (Qlik) is a person I really admire for his ideas and opinions when it comes to the technical future of data analytics. When we were in touch about this specifically, he emailed me (apart from commenting that the video needed sound): “I don’t believe it alone holds all the answers as it generally requires very sophisticated NLP to make sense of it to get meaning. But getting data into the application is at least half of the battle. Because I could then call out to whatever R/Python NLP library I want to use.”
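To make the “file search engine” idea a bit more tangible, here is a minimal Python sketch of the logic behind it. It is not how Inphinity Forms or the Mole connector work internally; the file names, the extracted texts, and the `customers` table are all made up for illustration. It only shows the principle: once the text of each file is available, a whole-phrase search can be linked to rows of structured data.

```python
import re

# Hypothetical text already extracted from uploaded files
# (in practice a connector or a PDF/Word extractor would produce this).
documents = {
    "offer_2020.pdf": "Offer prepared for Customer X by John Doe.",
    "minutes.docx": "Meeting minutes. Attendees: Jane Roe, John Doe.",
    "invoice_17.pdf": "Invoice 17 issued to Customer Y.",
}

# Hypothetical structured data, e.g. rows from a CRM table.
customers = [
    {"customer": "Customer X", "owner": "John Doe"},
    {"customer": "Customer Y", "owner": "Jane Roe"},
]


def normalize(text):
    """Lowercase the text and keep only word characters, separated by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def find_documents(phrase, docs):
    """Return the sorted names of all documents that mention the phrase as whole words."""
    pattern = re.compile(r"\b" + re.escape(normalize(phrase)) + r"\b")
    return sorted(name for name, text in docs.items() if pattern.search(normalize(text)))


def documents_per_customer(rows, docs):
    """Link each structured row to the documents that mention its customer."""
    return {row["customer"]: find_documents(row["customer"], docs) for row in rows}


print(find_documents("John Doe", documents))
print(documents_per_customer(customers, documents))
```

With the sample data above, searching for “John Doe” finds the meeting minutes and the offer, and each customer is linked to exactly one file. In a real app the associative engine would take over the linking part; the sketch just spells out what “connect your files to structured data” means.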
Getting files with unstructured data into the application is at least half of the battle.
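The other half of the battle is what you do with the text once it is in. The sketch below is only a stand-in for a real NLP library: it uses nothing but the Python standard library to count the most frequent meaningful words in a text. The stop-word list, the sample sentence, and the function name `top_terms` are my own assumptions for illustration; a real sentiment or entity analysis would need far more than this.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption; real libraries ship much larger ones).
STOPWORDS = {"the", "a", "is", "and", "of", "to", "in", "was", "but"}


def top_terms(text, n=3):
    """Return the n most frequent words, ignoring stop words and words of one or two letters."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)


# Hypothetical text extracted from an uploaded customer complaint.
complaint = "Delivery was late. The delivery team apologized, but the late delivery cost us."
print(top_terms(complaint, n=2))
```

For the sample complaint, the two top terms are “delivery” (three times) and “late” (twice), which is already a hint about what the customer is unhappy about. Swapping `top_terms` for a call to a proper NLP library is the step Dalton describes.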
Yes, there are known workarounds for processing unstructured data in Qlik. However, they are neither easy nor intuitive. Not even for a Qlik guru like Rob Wunderlich (Panalytics, Inc.): “I’ve occasionally dealt with unstructured data either brute-forced it with Qlik Script or wrote a processor/extractor in another language. I have usually shied away from PDF as a source because it can be so complex. So, your solution for PDF looks great!”
Unstructured data is just a type of data. Files are just another data source. In my opinion, it is like women in tech: there is a long journey ahead before we truly forget the differences (men and women, structured and unstructured) and, simply put, analyze “all data”. But we know it’s time to start… somehow. And you can start by downloading the app from the video yourself here 😉.
SOME INSPIRATIONS
Oh, yes, and some inspiration for how something like this can be used right now 😉:
Adrian Parker: “This solution is great for empowering users to self-serve with sensitive data recorded on unstructured documents. Delivery accurate and rapid results. Every department and business function will have this type of data. It can reduce exposure to compliance risk, e.g. the GDPR. If the data is stored only in one place it can be found.”
Angelika Klidas: “Uhm, this is more from an HR perspective, what is handy due to the fact that from the systems I know certificates are just copied, or scanned and added in the personals file. I love it that you can add it to a list and that the PDF is saved as well. I like what I see, but I have to think a bit further… it could also work for adding budget information, or expected deals to close, etc…”
Radovan Oresky: “Review, search and semantic analysis in Construction (engineering documentation), Legal (contracts, laws), Manufacturing / High-tech (product documentation, supplier contracts), Consulting (deliverables, knowledge base).”
Me: “I personally used it a year ago for managing all those presentations I downloaded from Qonnections 😁. You can start with something straightforward like CVs and certificates, legal documents, or technical documentation. Once you become familiar with unstructured data, I think many more ideas will come.”