Mastering content aggregation with Northern Light at KMWorld Connect 2021
Enterprise search solutions are built on the assumption that content repositories reside on the corporate intranet. Content aggregation and integration are therefore not problems the search solution needs to address, beyond being pointed at the repositories on the enterprise network.
But for some use cases, such as competitive intelligence and market research, much of the valuable content comes from third-party sources.
Aggregating content from multiple external parties is operationally challenging because of the myriad publishing systems, document formats, metadata conventions, publishing schedules, and technical access gateways involved, not to mention complex and widely varying document access rights.
At KMWorld Connect 2021, David Seuss, CEO, Northern Light, reviewed the dizzying array of issues, as well as the strategies for addressing them.
The problem of content silos is where it all begins, Seuss said. Content goes unused because it is fragmented across these silos, which creates a nightmare scenario from the users' perspective.
Business research requires content of a different nature than that required by internal operational systems, he explained. It's about the external world, and it's coming from the external world. We need a KM system to put it all together.
The IT department is in control, but when it comes to aggregating content from outside the company, there is a mismatch.
“There’s this huge mismatch between what the publishers want to do and the IT department expects to do with the content,” Seuss said.
Northern Light's approach to working with research content providers is flexible for publishers, he explained. Aggregating this content requires a complicated set of skills and activities that IT isn't responsible for.
Northern Light can use a variety of aggregation techniques, including APIs, FTP, RSS feeds, and more. Manual creation is another option for getting content indexed.
Licensing compliance must be built into the solution, he noted. Though web content is unruly, it offers important data. He recommended looking at news from informed industry journalists, conference presentations, thought leader research, competitors' white papers, and more.
The content found on webpages is almost always copyrighted, he explained, so copyright compliance must be built into the solution. You cannot redistribute copyrighted material, even internally within your company.
Fair use provides guidance on what one can and cannot do with the information. It permits aggregating and indexing the full text for search.
Normalizing search across many disparate sources involves capturing the metadata from each source and indexing the full-text content of every source with the same search technology and relevance ranking algorithms. All the content from all sources should also be deeply tagged with every relevant taxonomy.
“The future belongs to those who can make the most effective use of the enormous amount of insight loaded content that exists in the business and technology research publishing community, and on the web,” Seuss said.
KMWorld Connect 2021 is going on this week, November 15-18, with workshops on Friday, November 19. On-demand replays of sessions will be available for a limited time to registered attendees and many presenters are also making their slide decks available through the conference portal. Access to session archives will be available on or about November 29, 2021, so be sure to check back for on-demand replays. For more information, go to www.kmworld.com/conference/2021.