What’s this I hear about a new Knowledge Base? #PQsummit

Presenter: Yvette Diven, Product Manager Lead, Data Services (ProQuest)

Diven provided a status update and more information on the work to improve the ProQuest knowledge base (KB). The original KB has been around for more than 15 years (it was originally called “Serials Solutions Knowledge Works”) and focused on e-resource metadata (titles, databases, providers, dates, etc.). The KB provider is responsible for getting content from providers and for cleaning and normalizing the data. Because the KB is centralized and singular, it serves all products; that is the benefit. Some reasons Diven outlined for a new approach to building an improved KB include:

  • Scope: now global and more diverse (audio, streaming video, etc)
  • Scale:  cloud-based capabilities
  • Systems: need for speed and efficiency  (time to work with providers and to get systems updated)
  • Services: APIs  and interoperability. BIG PART OF THE NEW KB

Current efforts include adding OA titles and packages, related titles and formats, A&I coverage information [does this include title level?], more descriptive content, the ability to add and describe more and new content types, and more information about package changes over time.

Libraries benefit from having the entire collection centrally supported, from being able to follow and manage new and emerging business models (e.g. DDA), and from being able to see and share integrated data from many sources.

It is an evolution and a transformation, not a migration.

Integration of data is enhanced by a new “relational” data model that is built on FRBR and RDA and is available to share via APIs and interoperability.
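To make the FRBR idea concrete, here is a minimal sketch of how related formats of one title could hang off a single record. It uses FRBR's Work, Expression and Manifestation levels; the class names, fields, title and identifiers are all illustrative assumptions, not ProQuest's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    # A concrete published form of an expression (hypothetical fields).
    format: str        # e.g. "print" or "online"
    identifier: str    # placeholder ISSN/ISBN-style value


@dataclass
class Expression:
    # A realization of a work, e.g. in a particular language.
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    # The abstract title that ties all related formats together.
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def formats(self) -> List[str]:
        """Return the distinct formats available for this work, sorted."""
        return sorted(
            {m.format for e in self.expressions for m in e.manifestations}
        )


# Made-up example record: one title available in print and online.
work = Work(
    title="Example Journal",
    expressions=[
        Expression(
            language="en",
            manifestations=[
                Manifestation("print", "0000-0001"),
                Manifestation("online", "0000-0002"),
            ],
        )
    ],
)
print(work.formats())
```

The point of the nesting is that a question about "this title" can be answered across all of its formats at once, rather than per flat record.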

For library workflows, this means more effective assessment through a unified view of related resources, a more efficient way to track changes to titles and packages over time, and expanded coverage for more effective and automated overlap analysis.
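As a rough illustration of what automated overlap analysis involves, the sketch below compares the title identifiers held in each pair of packages and reports what they share. The function name, package names and identifiers are hypothetical; a real analysis would draw holdings from the KB and would also need to compare coverage dates.

```python
from itertools import combinations


def package_overlap(packages):
    """Map each pair of package names to the sorted identifiers they share.

    `packages` maps a package name to a set of title identifiers.
    Pairs with no shared titles are omitted.
    """
    overlaps = {}
    for name_a, name_b in combinations(sorted(packages), 2):
        shared = packages[name_a] & packages[name_b]
        if shared:
            overlaps[(name_a, name_b)] = sorted(shared)
    return overlaps


# Made-up holdings: identifiers are placeholders, not real ISSNs.
holdings = {
    "Package A": {"1234-5678", "2345-6789", "3456-7890"},
    "Package B": {"2345-6789", "9876-5432"},
    "Package C": {"3456-7890", "2345-6789"},
}

for pair, titles in package_overlap(holdings).items():
    print(pair, titles)
```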

For research, this means improved discovery through more relational data points, including recommendations, impact, and additional vocabularies.

Questions from the audience:

Q: On the integration of various metadata: the RM index is separate from Summon. Can you talk more about the integration of these two?

A: The knowledge base is the supporting metadata for the full content that is in Summon. Summon will be able to gain additional, richer data about the resources it indexes. Availability data does not live in the KB, but holdings data does. The API will query availability in the ILS.

Q: Talk about preservation data and the ability to add in.

A: The ability to do that exists; it is just a matter of getting the data in.

Q: Problem with ProQuest support passing the buck to libraries to report to publishers their article link errors or missing titles in packages.

A: Because OpenURL is often the cause, we are working with vendors to implement the new IEDL technology, which is also being implemented by other discovery systems. Gale (the number one offender) reports that it has implemented it.

Q: Is there a continued commitment to fixing errors, especially as FRBR may introduce more opportunities for calling the same things by different names and so create more potential for errors?

A: Provider education. An entire group of staff at ProQuest is committed to addressing this.

Q: New content types are great, can you speak more to enhancing the metadata for these, especially classification of streaming content?

A: The structure is now in place in the knowledge base; it is now a matter of getting that metadata from providers.

Q: Is there something more important than streaming metadata?

A: Ebook data quality. The goal is to have a single knowledge base without exceptions for MARC (other) records (e.g. one of the status fields created in Intota is “Subscribed – Local MARC records”; we would like to stop having to use that).

Q: Timeline for product onboarding?

A: The new KB is currently in use in Intota. Onboarding is planned for Q1 of next year for Summon, the Ejournal platform, and Link.

  • [Followup Q/A asked by this reporter]: The new KB does not include onboarding for Client Center RM, and probably will not in the future, because that architecture is not able to support it (a swimming-pool-to-tin-can problem).

