“The most important contribution management needs to make in the 21st Century is to increase the productivity of knowledge work and the knowledge worker”, said Peter F. Drucker in 1999, and time has proven him right.
NASA is no exception, and it faces a number of challenges: it has hundreds of millions of documents, reports, project data, lessons learned, scientific research, medical analyses, geospatial data, IT logs, and all kinds of other data stored nationwide.
The data is growing in variety, velocity, volume, value and veracity. NASA needs to provide access to engineering data sources whose visibility is currently limited. To convert data into knowledge, a convergence of Knowledge Management, Information Architecture and Data Science is necessary.
This is what David Meza, Acting Branch Chief – People Analytics, Sr. Data Scientist at NASA, calls “Knowledge Architecture”: the people, processes, and technology of designing, implementing, and applying the intellectual infrastructure of organizations. You can listen to the Podcast below:
Transcript below
If you are more of a visual type, you can also watch the presentation
Slides available here
— Transcript —
David Meza 00:00:00
Thank you. I'm glad to be here. As Nigel said, my name is David Meza. I'm the Chief Knowledge Architect at NASA, stationed out of Houston, Texas, in the USA. First of all, thank you, Nigel, and the rest of the team for inviting me here; I really appreciate that. It's a great opportunity for me to get up here to London, my first visit, and I'm really enjoying it so far. I'm planning to stay a couple of extra days and explore your fine city. A little bit about myself, so you understand how I've progressed through the years: I have primarily an IT background, 20 to 25 years in IT, doing all different types of development, all the way back to Windows development. I was a SQL developer, actually; if you remember Access 1.0 and 2.0, I was a developer in Access, SQL, and Windows applications.
David Meza 00:00:48
I never thought at that point that web applications would take off. As a Windows developer, I realized I had made a big mistake, so I dropped that and started doing a lot of web development. About five years ago, I was asked to join the knowledge management group at NASA. For those of you not familiar with it, knowledge management is just the set of strategies behind how you take data and turn it into knowledge. So what I'm going to talk to you about today is: why connected data, why we're here; a little bit about my path from leading the knowledge management group into this world of being able to connect our data together; and then a little bit about knowledge architecture, how I define it, and the position I kind of created for myself at NASA to be able to take this information and really turn it into valuable knowledge. We'll look at some opportunities where I think we can use this type of information, and then I'll have a little time for questions at the end.
David Meza 00:01:44
So, connected data. Why connected data? You can see this is just your standard star chart with all the different constellations up there. We've been using this for centuries: sailors used the Big Dipper and the North Star to find their direction. So we've been using connected data for centuries, before it was even called that. The main reason is that our brains work on connected data. Your brain is probably the largest connected data set, or network graph, you have available: a bunch of entities in there with a lot of connections firing back and forth. That's how we visualize, that's how we see things, and we see patterns; our brains see patterns a lot quicker than when trying to read through a list.
David Meza 00:02:29
So utilizing graphs, or connected data, allows us to see information a lot more easily. And why is this important to our organization? I always start off with a little introduction to knowledge architecture and the value of information. Peter Drucker, the famous business management analyst, said back in 1999 that the most important contribution management needs to make in the 21st century is to increase the productivity of knowledge work and the knowledge worker. It's important because we've got such a great explosion of data. Gartner analysts, back in 2012 I believe, said that by 2018 there would be an 8,000% increase in data, and 80% of it would be unstructured. I think they're going to fall well short of that mark; I'm sorry, I think we're going to blow past that mark. Where we end up is going to be much larger than what they predicted back in 2012.
David Meza 00:03:21
So we’ve got a lot of data coming out there and unstructured data is probably our greatest amount of information. You see it on social networks, you see it on in your customer logs and your customer information. Now we see it all the time in our project reports and we don’t have weed right now. I have a very difficult time getting that information out of our project. I’m going to show you how we’re trying to get that information out of there to give engineers or scientists or medical doctors, the ability to get that information a lot quicker as we go through it. Yeah. How do we do this? Well, let me talk a little bit first about challenges as we go through here, give a little idea of what’s going on in NASA, down at the map down here, you see all the different centers we have out there.
David Meza 00:04:01
We have 10 centers and about another seven or eight facilities, so about 18 different locations throughout NASA. We have hundreds and hundreds of millions of documents that get generated each and every year, and it's growing by leaps and bounds. What are we up to now, five or seven Vs to define big data? It used to be volume, velocity, and variety; now they've added value and veracity, and more Vs keep being proposed. The bottom line is that big data is just getting bigger. Now, I define big data a little bit differently than most folks: if you don't have the computing power to take care of your data, it doesn't matter whether you've got 10 gigabytes or 10 petabytes. It's big data to you if you don't have the ability to take care of it.
David Meza 00:04:45
So don’t think just because you have 10 gigabytes and you don’t have big data, you do, and you have an issue there. You got to look at it, you’ve got to figure out how to handle that information. Of course, we’ve also had problems with the accessibility of our engineering data sources. I don’t know how you guys have with organizations, but most engineers we found have to look, look, try to look at 13 different sources to try to find the, you mentioned they’re looking for, and even then they’re not, they’re still not finding it. So we’ll talk a little bit about some of the opportunities in search to help those engineers do that. And of course our visibility is limited. Primarily our visibility is limited because engineers, it’s just a very protective of their data. They don’t want to let it go. So we have a lot of siloed data and that’s another area that we have to try to break through as we work through it, work through our issues.
David Meza 00:05:32
So let's talk a little bit about knowledge architecture and how I define it. Knowledge architecture, to me, is the combination of three different disciplines: knowledge management, information architecture, and data science. Knowledge management is primarily the strategy we utilize to turn data into knowledge, all the different techniques, whether it's lessons learned, case studies, or pause-and-learn sessions, that we use to make sure we capture, store, and make that information available to our end users. Information architecture is the pipes, the infrastructure that transmits the data to the end users. Data science is the methodologies, the algorithms, the models necessary to transform that data into knowledge. So you've got knowledge management, which worries about the knowledge; information architecture, which moves the data; and then data science, which transforms that data into knowledge. If you don't have all three of those, you're going to have a difficult time trying to extract that knowledge out of your data. I've worked with a lot of organizations across the different centers at NASA; generally everybody has all of these within their organization, but more often than not, they don't work together.
David Meza 00:06:48
And I’m gonna tell you a little story. I’m not going to read all this to you. You guys can look at the slides, but why they don’t work together. But when I told you five years ago, when I first started and the management group, they were having a bi-week by monthly meeting, we go in and meet with the taxonomist, my information group, my information architecture group, and in our data sciences group, they get into the room and they start talking. And within five, 10 minutes, I realized we had a real big problem here at the, the taxonomists is talking about needing metadata, wanting to make sure there was metadata attached to the document for the classification and organizing, okay. The it guy was saying, we got through all your metadata. It’s all there. No, don’t worry. No, we don’t have metadata. So they went back and forth for five or 10 minutes.
David Meza 00:07:33
And I realized that even though they were using the same definition of metadata, data about the data, they were talking about two different things. The taxonomist was looking for metadata on the documents: the title, the author, whether there was an abstract, information that could help them classify those documents based on their taxonomy or their ontology. The information architect was looking at metadata as the size of the file, whether a field was a Boolean or a string, more from a database development point of view. So while they were speaking the same language, they were totally missing each other. And that's when I first started thinking about knowledge architecture: you need to have an individual, or a group of people, who can talk all three of those languages in order to communicate across each other.
David Meza 00:08:22
Once I got them to understand that they were really talking about two different things and got them working on the same page, we started to move forward a little. So you really need to look at your knowledge architecture, define it, and try to understand how you can work with all three different groups. And why is this important? Eric Schmidt, formerly of Google, made the comment that we have an opportunity because there's so much ubiquitous information out there, and information is power. Now, I'm going to play a semantic game here, because I'm going to challenge that: I don't think the information is the power; it's the knowledge you get from that information that is the actual power. I know I'm playing with words, but I want to stress a point.
David Meza 00:09:04
You can have all this information, but if you're not able to take it and turn it into something that's useful to your end users, then you haven't done anything but store a lot of information. With this great power of information comes a great responsibility, and that responsibility belongs to you: to take that information and turn it into viable knowledge, doing it in a way that users can actually see it, develop the patterns, and use that information in their daily work, whether it's more from a customer or consumer base, or whether you're working internally from a technical or project management aspect. There's a lot of work that can be done, and it's up to you to do it. So let's talk a little bit about areas of opportunity, and if you have any questions, feel free to chime in in the meantime if I can answer something.
David Meza 00:09:55
As I looked at my opportunities, I decided to focus on three that were around areas giving me headaches. One was search, and we'll talk a little bit about why that's such a big headache. The others were storage, and data-driven visualization. By data-driven visualization, I mean the data should drive how you visualize it for your end user, not the other way around: building a visualization, putting it out there, and hoping it meets the end users' needs. I break knowledge architecture into four different layers. Access and storage, where you'll see all the different types of databases: graph databases, document databases, standard relational databases, and flat files. Integration: how do we actually get that information out of the data sources? Then analysis, where you do your data modeling, your data science activity, your algorithms, and there are so many different possibilities there for models.
David Meza 00:10:54
Whether it’s R Python, math lab, SPSS was a lot of different tools that are available. And then of course at end began this visualization. That is the real key to me. At least that’s the area I focus on because we fail a lot on how we ended up visualizing it, you know, more often than not. When you get up into a website and you try to find information, what do you get? You get pages and pages, pages of lists of different documents. You’ve got to click on each and every relate to make sure you get to the right information. We can do better than that. So let’s talk about opportunity. Number one, service in the enterprise. And I want to differentiate between search in the enterprise and search out in the, in the public because two different things. Cause Google’s really spoiled us on how we find information.
David Meza 00:11:37
When I walk around the center and talk to people about search and ask, what do you want out of search, they say: we want it to be just like Google. Well, I'm sorry, we can't do that, and I'll explain why in a minute. Here's why this matters; these statistics are very important. 46% of our workers can't find the information they need about half the time. That's impressive. 30% of total R&D funds are spent redoing what we've already done once before, and I'm going to give you a story in a minute about how improving our search helped us there. And 54% of our decisions are made with incomplete, inconsistent, or inadequate information. That's scary, especially in my line of work, where we're trying to get people to the Moon and Mars.
David Meza 00:12:21
And we’re only using the 54% of our knowledge to be able to, to actually make decisions. We have to improve upon how to do that. So one of the reasons Google doesn’t work for us are often that Google is a keyword search based engine. I’m going to be very high-level here and I’m going to oversimplify it. And I’ve done this presentation in front of Google folks before, and they’ve basically dotted their hand said, we totally understand. We totally agree. So I’m not saying anything that they’re, they’re gonna argue too much about, but primarily when you take a document, you put it in there, they break it up into keywords, and then those keywords get basically put into the algorithm and it’s broken down and it becomes nothing but a bunch of keywords that you look for for in your document. And while in the, in the public at works, maybe.
David Meza 00:13:08
Yes, well, Google’s algorithm basically what to do the index, everything. First off we can’t, and we can’t afford to index everything. We over a couple of hundred million documents. We probably only index 10 million. So it’s not, not a whole lot of it. It’s a one 10th of our known data. Are we able to index because that’s all we can afford to do. We use, we used to have a Google search appliance, then they apply about 200 different algorithms on top of that to try to model and classify and organize it. Then it’s up to the users by patronizing, the more oftentimes the pages click their link or selected the higher. It goes up again in the rank, right? Well, again, okay, that works. If you’re Google, Google gets 5 billion queries a day. So from that aspect, you can divide. You can develop a very good model is going to tell me within certain standard deviation, if I’m looking for something, I’m going to find it right there.
David Meza 00:13:58
At the top of that curve, that's what you get with 5 billion queries a day. From my perspective at JSC, Johnson Space Center, we have a thousand queries a day; well, there are probably 10,000 people at Johnson Space Center, so maybe we get 10,000 queries a day. And this is where the semantic part will help us as we go forward. Take, for example, mercury. There are three different ways of looking for mercury: the planet Mercury, the element mercury, or Project Mercury, which was the program back in the late fifties and early sixties, the rockets we were testing as we were getting ready for Apollo and the Moon. Well, if the scientists and doctors have mostly been looking for mercury the element, or others have been looking for Mercury...
David Meza 00:14:45
...the planet, that page rank is going to be way over there; it will give me a selection at the top of the curve. But if what I'm really looking for is something on Project Mercury as an engineer, something that hasn't been looked at in 50 years, I'm never going to find it, or it's going to be on a list 200 pages down. It's going to be very difficult for me. That's why I believe Google in the enterprise, or keyword-based search in the enterprise applied like this, will not work. We have to do something different. So we went ahead and started doing some evaluation. I got my team together and said, let's try to figure out what we can do; how can we look at search? We looked at many different types of search: faceted, semantic, computational, cognitive, natural language queries.
David Meza 00:15:28
And we’ve done a lot of identifying the different types of search. We really figured out there is no one solution. You have to use a hybrid approach depending on your data. And it’s going to instinct really going to have to decide more on your ability, your funding, and what you can utilize in different types of tools that are out there based on the data that you have. But some of the essentials there are, you should have a master data management plan of some time work with your knowledge management group, work with your taxonomies, work with those people that help you understand and get a data management plan into your organization to understand how you’re going to capture it. How are you going to store it? How are you going to use it that way? Everybody’s kind of on the same page. We’re constantly having problems across NASA with all the different centers there.
David Meza 00:16:10
There’s a common. Usually we go to the center as a kind of a running joke. If you’ve been the one NASA center, you’ve been to one NASA center because where it’s totally different, everybody does everything differently, all the way down to her. It wasn’t up until about five years ago that I could actually connect to some of the other centers because they were blocking the ICU different protocols. At one time, it was Cisco versus Novell. And, and we had a lot of issues there, but we’re getting better. We’re starting to work with already. They’re trying to communicate with each other. Okay. The other thing is focus on your critical data. Yeah. Don’t try to do everything. What’s important to you, your organization, your people. You know, I looked at our search, our search engine or search laws, and they were really trying to focus on everything.
David Meza 00:16:56
The easy things: FAQs, frequently asked questions, how to apply for scholarship information, things that really weren't essential to the engineers and the scientists. You need to focus on your critical data and develop standards across your organization. From my perspective, we have a lot of contractors at JSC: besides the US government, which runs the NASA facility, we also have about 50 contractors working there. So you have 50 different ways of doing things on top of the many different ways the government does things. So try to develop standards in your organization; in your supply chain, you may want to make sure that up and down the chain everybody is talking the same type of language, so you don't have confusion back and forth. That's fairly important. Analysis is also essential.
David Meza 00:17:48
Don’t just think that the tool that you purchased or utilize is going to do it all for you get some data scientists in your group, get some folks that can do the algorithms and understand that can understand within that black box, that the people trying to sell you, that you’ll have an ability to make some of the decisions yourself and metadata. And actually I think to stress that metadata metadata metadata is because if you don’t have the metadata data on your data to make it even harder for you to find things, to make sure you have a good way of collecting that metadata within your organization. And some of the analytics that you can do on that, that on your document, you can pull it out entity extraction and get some of those information and apply that back to your document. Through that, through those analysis, that was a quite helpful do parts of speech, topic analysis.
David Meza 00:18:35
So, in talking to our users, these are the top things they said they were looking for. As you can see, nobody here said keyword search. When we really started talking to them and asking them, they didn't say, we want keyword search; they said, we want semantic search, we want the ability to connect what we're actually asking for. If I say Project Mercury, it should understand that I'm talking about a program or a project or an engineering effort, not a planet or an element, because the words that are connected together can get me to the point of what I'm really looking for. Cognitive computing is also something you need to look at; that's where we can do things like clustering or topic modeling in order to extract information. I'll give you a quick example a little later about topic modeling and how we were able to use it in our lessons learned database for a particular engineer.
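For readers who want to see what the clustering he mentions can look like in practice, here is a small sketch that groups documents by their TF-IDF content. scikit-learn and the toy corpus are our assumptions; the tool NASA piloted used its own (vendor) clustering, as David notes in the Q&A.

```python
# Minimal sketch: cluster short documents by content with TF-IDF + k-means (assumed tooling).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "parachute failed to inflate during capsule drop test",
    "uprighting bag did not deploy after splashdown",
    "battery acid leak contaminated the storage tank",
    "relief valve stuck open on oxidizer tank",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
for doc, label in zip(docs, km.labels_):
    print(label, doc)
```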
David Meza 00:19:23
If you’re not familiar with faceting, you can you do it all the time and Amazon, or some of these other for you, were you looking for a TV? And then it’s actually what type of TV? And you break it down, those of your facets that they wanted some type of facets at key point, at least in my analysis was pository specific searches in your organization. Of course, make sure that you’re looking at particular information, a group of engineers in particular engineers that may be looking at more from a propulsion system, really don’t want to look at medical information that came from a radiation analysis. They really want to have specific information what they’re looking for. So as you identify your critical data, make sure you understand where those repositories are and be able to build indexes on your searches. Based on those specific repositories, you can search them all if you want, but the ability to get down and look at specific searches as you go through that, of course, to save time, they want to be able to save searches and they want to have alerts and alarms.
David Meza 00:20:15
Basically, if a new document gets put into that index, it tells the person: here's a new document that meets your criteria, now available for you to look at. So those are the highlights of my approach to knowledge architecture and what we do, and it gives you an idea of how we apply it to search. So, time for a little story, which goes back to search. I had a very sad engineer, and this is true; the story happened about six or eight months ago. The Orion project is the next capsule we're building as we go to Mars. They were doing a test, and Orion is fairly similar to the old Apollo: if you remember Apollo, when it came down it would land in the water, and bags would open up and upright the capsule. Well, while they were doing a test on Orion, we had a partial failure.
David Meza 00:21:04
One of the bags did not open up, so they had to figure out what they could do: how they could find information to redesign their engineering work on this capsule so they wouldn't be delayed. In doing this, they searched everything they could in our standard Google search appliance; they spent hours, actually two or three weeks if I remember, to finally look through all the information. But remember, only one-tenth of our data was indexed, so they really weren't looking at the right information, and they didn't know that. So then they went to our JSC historians, the keepers of everything we've had over the last 60 years. The historian, unfortunately, had yet another keyword-based search capability within their database, and it could only match a keyword against the title.
David Meza 00:21:59
And that was it, nothing from inside the document. The historian spent eight hours looking for information, found three documents, and spent another three hours reviewing those documents to see whether they were of any value, because some of them were a couple of hundred pages long, and they had to search inside them to get at that information. Very, very difficult for them. The engineer even went so far as to look up retired engineers, go to their homes, and talk to them; a couple of them said, wait a minute, I've got some documents up in the attic, and brought them down. He was getting pretty frustrated. They were looking at having to redo the whole thing, which was going to take them two years to do a test article for development testing and probably cost a couple of million dollars. Now here's where the white knight comes in. That's me, if you haven't figured it out yet.
David Meza 00:22:47
We were doing the pilot of an application that allows us to do everything I've talked about: semantic search, clustering, a lot of different things. I had included the JSC historian in on this, and she said to the engineer, why don't you try this tool? So the engineer went into the tool. First, though, this is our standard search capability that the engineer had gone through: a typical Google-style result list where you go down all the different links, and there are some things about uprighting systems, but they're very recent, maybe a photograph or something mentioned in an article, not really good information. And this is what the application we were prototyping looks like. Right down the center here...
David Meza 00:23:37
...you have your keyword search results, with a lot of semantic-type information connected to the Apollo uprighting system. The other benefit is that it takes you right to the point in the document where that query is answered, so rather than having to read 200 pages, you get right to the point. On the other side here, you have your repository-specific search: you can click in here and tell it which repository you want. You have your saved searches, everything we talked about from our requirements. Here's how we can facet it; if there were more indexes, we'd have more facets. I should mention only one repository was indexed here, so it's limited. And this is where our cognitive computing, our clustering and topic modeling, comes in: we have a lot of different parameters, causes and effects, that we can connect semantically back and forth to this uprighting system.
David Meza 00:24:25
We’ve got our prior system made up of compressor, inflated bag or an airbag. So as you see, the engineer can focus in on the information and he won it in three hours. He found 200 relevant documents in the first 10. He found the answer he was looking for. We saved him those two years and $2 million at this point, you know, looking at this. So needless to say, my management was very happy and we’re going forward to implementing this into production. So very, very good tool. We were able to utilize that again, primarily because of the way it was all linked together, the connected data, the semantic, the clustering, the ability to do all of this within your search appliance. And you’re going to hear a lot more throughout the day from, from some of these fine speakers that I looked at them. That’ll give you more detailed information, kind of want to give you an overview of everything we’ve done.
David Meza 00:25:11
To set the ballpark here: rather than searches that look like this, a standard Amazon search for a graph drawing book, we want to be able to search like this, which is another way of doing it: the same results, but in a graph, in a connected model. It's the same book with all the information here, and you've still got your list over here, but then it's grouped by different types: over here, I believe, is JavaScript and writing, down here is graph theory, some analytics, and up here is how to program. So you can easily jump to the areas you're looking for rather than having to go through a ton of lists to get information. Most users, I've found, like this a lot better than having to read through a whole bunch of lists.
David Meza 00:25:57
So let’s talk about opportunity. Number two, storage and access. I’m going to focus on the top two document database and graph databases primarily because working in a small team right now, these were the easier things for me to work through. As I started to develop some of the more wide column, the Hadoop and the key value. So if we look at a graph data of a document database to graph, we’re looking at taking our information that we collect, and we have put it into a document type database through a connector that in this case, Neil 4k has developed a bullying between Mongo DB and Neo four J you can actually create a graph on the fly from your documents if they get ingested into the, into the database. So we’ve done this well, we’re we’re right now, I’ve had a couple of interns actually working on this to get the information of our lessons learned database it’s into a graph that automatically puts it into the graph.
David Meza 00:26:53
And then you start seeing patterns emerge. This is actually a quick graph of a subset of our ontology on spacecraft, so we could see where the different spacecraft are connected, where each was developed, and which companies are working on those spacecraft. You can see those patterns, and this is the main thing: as you start working with connected data, you start seeing all these patterns come up, and you're able to find information a lot faster. And of course, if you can find it a lot faster, you can make decisions a lot better as you go forward. So, story number two. This time we have an inquisitive young engineer. One of our practices for any new program or project we start is to go look at the lessons learned database, to see what has happened in the past and how we can use it.
David Meza 00:27:39
This engineer had 23 different key terms that he wanted to search against the lessons learned database. He went to our standard JSC search, which I showed you earlier, where he had to put in each key term, look through the results, and then read the entire lesson. And remember, we index 10 million documents, not just the lessons learned database, so he was having to pick through what was a lesson learned and what wasn't, and he was spending a lot of time. So he came to me and said, hey, can you do something? Well, let me go talk to the IT guys, the search team: can you just give me an index of the lessons learned database, apply these key terms to it, and give it back to me? Yeah, we can do that. A couple of days later, I got an Excel spreadsheet with 23 tabs, one for each keyword, with a list of links that the engineer would still have to go through and click.
David Meza 00:28:29
I said, no, no, no, we can do better than this; we have to do better than this. So here's an example of our public lessons learned data, not the database I was working on; I ran a model against this one, very similar to what I did internally, primarily because a lot of our information is classified, or sensitive but unclassified, and I'm not allowed to put it out publicly. But this is very similar to what our lessons learned look like. Again, it's just a faceted keyword search: he can only filter by which center it came from or a particular date range. I said, we can do better. So here's where some of the data science comes in: I applied a topic modeling algorithm, Latent Dirichlet Allocation, primarily.
David Meza 00:29:12
What that basically does is take the words within the documents and their frequencies in each document and try to develop topics. At the end of the analysis, after the algorithm runs, you get a list of your key terms or words, their frequencies, and the probability that a document is about those words or topics. As the patterns develop, you start seeing topics. Out of roughly 2,000 lessons learned, I generated 27 different topics. Then I started looking at my model: how do I develop a model to apply this as a cross-section? So here, I know it might be a little difficult to see, are my topics. Each topic is associated to a category, because some of the metadata within that lessons learned database included a category, a self-assigned category.
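For readers who want to try the LDA step he describes, here is a minimal sketch over a toy stand-in corpus. gensim is our tooling assumption (the talk only names R and Python generically); the real analysis ran over roughly 2,000 lessons and produced 27 topics.

```python
# Minimal sketch: LDA topic modeling over lessons-learned text (assumed tooling, toy corpus).
from gensim import corpora, models
from gensim.utils import simple_preprocess

lessons = [
    "relief valve stuck open on the oxidizer storage tank during loading",
    "battery acid leak contaminated the fuel tank and valve assembly",
    "parachute riser tangled during capsule drop test recovery",
]

# Tokenize, then build the dictionary and bag-of-words corpus LDA expects.
tokenized = [simple_preprocess(text) for text in lessons]
dictionary = corpora.Dictionary(tokenized)
bow_corpus = [dictionary.doc2bow(tokens) for tokens in tokenized]

lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
                      passes=10, random_state=42)

# Top words per topic, plus each document's topic probabilities.
for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)
for doc_id, bow in enumerate(bow_corpus):
    print(doc_id, lda.get_document_topics(bow))
```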
David Meza 00:30:04
A category contains this particular lesson; the lesson was submitted by this person, occurred at this center, falls under this topic, and the topic contains this term. I was also able to correlate the topics based on their self-assigned category, how well those topics relate to each other, both positively and negatively, so I can move across them quite easily. As we put that into Neo4j, you get something like this, very similar, very easy; you'll see more of this style later on. I'm starting to develop a good database of where my documents are, in this case the lessons learned. All these lessons here are in these categories, and my topic is in blue right here. So you can start seeing patterns. This is good for a developer, for somebody who's working through it and can do the analysis, right?
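One simple way to get the topic-to-topic correlations he describes, positive and negative, is to build each topic's profile across the self-assigned categories and correlate those profiles. The exact method NASA used isn't spelled out in the talk; this pandas sketch with made-up assignments is only an illustration of the idea.

```python
# Minimal sketch: correlate topics via their category profiles (assumed method, toy data).
import pandas as pd

# Rows: lessons. Columns: assigned topic and self-assigned category.
assignments = pd.DataFrame({
    "topic":    ["tank_contamination", "fire_hazard_batteries", "fuel_valves",
                 "tank_contamination", "fire_hazard_batteries", "fuel_valves"],
    "category": ["cryogenics", "cryogenics", "propulsion",
                 "propulsion", "ground_ops", "ground_ops"],
})

# Topic-by-category count matrix, then pairwise correlation between topic columns.
profile = pd.crosstab(assignments["category"], assignments["topic"])
topic_corr = profile.corr()  # Pearson correlation across categories
print(topic_corr.round(2))
```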
David Meza 00:31:02
But for the end users, I needed to find something a little better, because most of them don't know Cypher queries. Here's a bit more of an example in Neo4j: again, you have your topics here and your different categories. I could see that this topic not only has these lessons in this category, but also these lessons over here that are associated with that category. So I have multiple ways my end user can find these lessons, by topic or by category, and then make those connections, rather than having to look through a list of documents. He can go to one topic or one category and get all the lessons associated with it, based on what he's looking for. So, as I said, the graph database, while it's good, and I like it, is good for developers.
David Meza 00:31:44
It's good for analysis if you know Cypher queries, but we had to make it a little easier for the end user to find this information, because most of our end users don't want to sit there and learn a new language; they just want to be able to point and click and get information. So we looked at another tool, which will be talked about a little later on, and we predefined some queries for the end user. In this particular case, we did a little search through the database, through this tool, for "fuel and oxidizer storage tank", and found a lesson learned on relief valves for storage tanks. The user can go through it, with all the information they need right here, and then click on the link that takes them directly to that lesson on our website if they want.
David Meza 00:32:28
But all the information is right here, very easy to see, and if they click and expand, they get more information. Again, it's very similar to the Neo4j graph view: here's the topic it falls into, fuel, water, valve, and you can see some of those same words in the lesson, so it fell into that topic fairly well. It happened at the Kennedy Space Center, it was written by this gentleman, David Pennington, and in this particular case the category was "not applicable"; they didn't assign one at the time. But again, this makes the information easier for the end user, the engineer, to go through. The engineer was able to go through these lessons learned and find information a lot quicker, and he can do it all in one location. Remember, on a regular Google-style list he'd have to click each result and open it up; here he can see all the topics and all the documents associated with a topic really quickly, and judge whether the information is important to him or not, rather than having to go through so many different pages. Very, very helpful to them.
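Below is a sketch of the kind of predefined, parameterized query an end user could trigger by pointing and clicking, without ever writing Cypher themselves. The node labels, relationship types, and properties mirror the earlier sketches and are assumptions; the production tool's actual queries were built into the vendor application.

```python
# Minimal sketch: a canned "find lessons for a search term" query run through the Neo4j driver.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

LESSONS_FOR_SEARCH = """
MATCH (l:Lesson)-[:HAS_TOPIC]->(t:Topic)
WHERE toLower(l.title) CONTAINS toLower($term)
OPTIONAL MATCH (l)-[:IN_CATEGORY]->(c:Category)
RETURN l.title AS lesson, t.name AS topic, c.name AS category, l.url AS url
ORDER BY lesson
"""

def lessons_for(term: str):
    """Run the predefined query and return plain dictionaries for the UI layer."""
    with driver.session() as session:
        return [record.data() for record in session.run(LESSONS_FOR_SEARCH, term=term)]

for row in lessons_for("fuel and oxidizer storage tank"):
    print(row["topic"], "|", row["lesson"], "|", row["url"])
```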
David Meza 00:33:28
Another built-in query we looked at is how the topics are associated with categories. In this case, the engineer is looking at a topic on valve contamination and tank contamination, and there are two different categories. If he's looking for the cryogenics category, he can see that all of these lessons are in the cryogenics category within this topic. So really quickly he was able to hone in on the information he was looking for, without having to search through everything that was there. It sped up his timeframe to start the project: rather than taking two weeks to do all this information gathering, he was able to do it in less than three days and get the project started. His manager was grateful, in the sense that the project wasn't getting delayed; it was happening faster.
David Meza 00:34:18
In fact, it went a lot faster than they thought it would because of the information they had available: they had allocated about three weeks, and they got it done in three days. That was pretty good. The last thing I wanted to show on this particular type of analysis: remember when I talked about correlating the topics to each other based on the categories? Here's the topic over here, valve contamination and tank contamination. I said, give me all the topics that are correlated to that topic. It comes over here to another topic on air pumps, or damage to air pumps; okay, I can understand how those two may be correlated, how they relate to each other. Over here you've got one on fuel valves and water valves; okay, fuel valves and water valves could cause valve or tank contamination, so there are some issues there. The last one over here, correlating along this line, is fire hazards from batteries.
David Meza 00:35:16
How do lessons about fire hazards from a battery correlate to valve contamination or tank contamination? So I started doing a little digging. I looked at the lessons within that topic, and there were several lessons in there that talked about battery leaks: acid leaks that came out of a battery, or heat build-up in a battery, where the acid ended up leaking into the tank, which ended up contaminating the tank and contaminating the valve. On a regular list, these documents over here would probably have been on page 15 or 20; you usually would have stopped somewhere around page two and never seen these lessons. But with something like this, where you're using connected data, your graph database, and your analysis, you're able to make quick jumps from one topic to another and gather your information a lot faster.
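The "give me all the topics correlated to this topic" hop he describes could look something like the query below, again over the assumed Lesson/Topic schema from the earlier sketches, with correlations stored as weighted relationships between topics. This is an illustration of the traversal, not the production tool's query.

```python
# Minimal sketch: follow topic-to-topic correlations to surface related lessons (assumed schema).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CORRELATED_LESSONS = """
MATCH (t:Topic {name: $topic})-[r:CORRELATED_WITH]-(other:Topic)
WHERE abs(r.weight) >= $min_weight
MATCH (other)<-[:HAS_TOPIC]-(l:Lesson)
RETURN other.name AS related_topic, r.weight AS correlation,
       collect(l.title)[..5] AS sample_lessons
ORDER BY abs(correlation) DESC
"""

with driver.session() as session:
    rows = session.run(CORRELATED_LESSONS,
                       topic="valve and tank contamination", min_weight=0.4)
    for row in rows:
        print(row["related_topic"], row["correlation"], row["sample_lessons"])
```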
David Meza 00:36:09
These are things that have benefited us within our program engineering and within our research, speeding up some of the work we're doing and saving money, as I talked about with our search. So I'll leave you with what can get accomplished if you take care of all of this: empowering faster, more informed decision makers, because we sped up the time to find things; leveraging our lessons from the past, because we're able to dig into that information and bring out those golden nuggets of data and get the knowledge out of them; reducing the learning curve for new engineers and new employees, because they can go through the information a lot quicker when it's grouped by topics and categories; and of course, enhancing and extending our content and document management systems, getting more information out of our data by applying these tools to it. Here's my contact information if anybody wants to reach out; I'd be more than happy to answer any questions or connect. I'll leave that up if anybody wants to write it down. Now, any questions?
speaker 1 00:37:07
Does anyone have any questions? Yes, we have one. So, did you have any resistance to doing this, or did people just trust it?
David Meza 00:37:27
Oh, definitely, there's always resistance to anything that's new. Sorry, for the recording: the question was whether there was any resistance to doing this, or whether people just trusted it. Anytime you do anything new, especially in my organization, where we're working with highly technical, highly skilled individuals who believe they know it all, it definitely takes them a little while to understand and be comfortable with the new technology. So it was a sales job on my part: going out and talking to the different folks in the groups and showing them, very similarly to how I'm showing you here, but sometimes in a little more detail, how they can get that information out. I stressed the data science a lot, because again, I'm talking to highly technical folks; if I can show them that I'm talking the same language they are in the technical world, they tend to appreciate that a bit more, and then they make the connection a lot faster. If I came in and just tried to sell them something at a high level, the sweet stuff if you want to call it that, they would not take the time.
David Meza 00:38:35
They really want the technical details. Any other questions?
speaker 3 00:38:39
Thanks for the presentation. I have two questions. The shorter one: did you do any UX work on the graph presentation? The edges look a bit unusual to me, and I was curious how people found it, and why you chose those types of edges between the nodes. And maybe the longer one is on the clustering you did: in the first screenshot of your new interface, on the right-hand side, you have different clusters that look relation-based, and I'm curious how you infer those and how you do the clustering in your new search tool.
David Meza 00:39:23
Let me answer that one first. On the search tool, let me go back to that. So, this part here, unfortunately, is part of the application vendor's capabilities. While I've asked them what types of algorithms they use, they don't give me the actual algorithms themselves, but they're definitely doing different types of hierarchical clustering, as well as, if I remember correctly, some k-means clustering, along with semantic and parts-of-speech analysis within the documents to try to make the connections. Now, on the edges, and why I chose those types of edges: at the point when I was making the slide, I was just going with the built-in edge styles, so it really wasn't a major decision on the colors or the types. Oh, you mean the shapes of the connectors, so instead of arrows you have those...
speaker 2 00:40:34
Yes, and does that depend on the application, like this one versus the last one?
David Meza 00:40:43
I think you’ll have a presentation on later on about how you can change those presentations from link heroes themselves. I just chose a different color and I then used the default connections on that, but I can’t change the colors. I can put the icons on there if I wanted to and do a lot of different information about different ways of doing it here. I just try to show a different color just to try to pop out that it really is a different relationship at this point with the different colors. And they’re all bi-directional here. So there there’s not any one direction. Yeah.
speaker 4 00:41:20
Hi, I’m Michael. Yeah. I have two interesting two questions. So when you’re identifying topics from a document, do you also try to assert how each topic like relate to each other or just the, the fact that the document is related to a topic?
speaker 1 00:41:43
Well, both, I guess.
David Meza 00:41:47
First, from the topic modeling side: the topic modeling algorithm will define the actual topics that a document falls into. Actually, when you're doing topic modeling, every document has all the topics; one topic will come up to the top in a probabilistic sense, so you find the one with the highest probability and assign that topic to the document as the most likely one. Now, based on the metadata you may have available within these documents, in this case I had categories, so I chose the category metadata, I was able to use that information, together with the values from the topic modeling, to correlate those topics based on the category. That's how I correlated the topics. I could have used other metadata to correlate the topics together; in this particular case, the topics were correlated based on their category. On my blog, I actually write up how I did that in the analysis, with the actual code, if you're interested.
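The per-document assignment he describes, every document gets a probability for every topic and is labeled with the most probable one, is a one-liner on top of the earlier LDA sketch. Names here refer to that sketch and are assumptions.

```python
# Tiny follow-on to the LDA sketch above: label each document with its most probable topic.
def most_likely_topic(lda_model, bow):
    """Return (topic_id, probability) for the highest-probability topic of one document."""
    topic_probs = lda_model.get_document_topics(bow, minimum_probability=0.0)
    return max(topic_probs, key=lambda pair: pair[1])

# e.g., using `lda` and `bow_corpus` from the earlier sketch:
# labels = [most_likely_topic(lda, bow) for bow in bow_corpus]
```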
speaker 4 00:42:55
Cool. The second question: the semantic part currently happens in the identification of topics and the categorization at the data layer; that's what I understand you've been doing. To what extent do you try to take on natural language semantic parsing, to get the complete semantic search experience? That is, going all the way to the natural language parsing part, where you actually identify the concepts within the users' queries and then translate them into the semantic search at the lower level. Do you actually try to push through to that?
David Meza 00:43:34
We’re working on that? Yes, we haven’t gotten there yet because there are also different types of training we have to do to our algorithm to look at the parts of speech or get the end of the extract and stuff so that we can combine the two, we’re doing some work on sentiment analysis to try to see what some of the documents with them in the comment I can cook cool stories. And we’re working with EV every mission that’s finished. An astronaut has to give a report out on what the top issues within the, within the international space station. So we’ve got years of information that we’ve been doing, sentiment analysis using top to top parts of speech in, in natural language analysis on top of that, trying to figure out what was their most or their worst or best issue. And was there any trends for those issues across the missions? Was it getting better in a particular category or worse? So we are doing some work we’re not there totally yet, but we are doing some work on that. We, there was a question back over here, or just a question about the grass and notice that it’s
speaker 1 00:44:38
Was that just happenstance, maybe, or...? [remainder inaudible]
David Meza 00:44:49
Rather really, it’s just what I started doing from, from my database background. It just, I made a lot more logical sense to me, you know, coming from a relational database management moral. And when I first started looking at different types of database models, the graph model, the Neo four J that that type of model made more sense to me and how I can associate, could make, create those relationships to this entities and actress I’m properties to those entities. To me, it just works a lot easier on the world that I’m working on.
speaker 1 00:45:19
There was a question over here. Yeah, I was curious how difficult it was to actually apply a schema to this, because document databases tend to be highly unstructured. Was there a lot of effort associated with figuring out some sort of structure to follow and ensuring that the data, when it's transferred, aligns to your model?
David Meza 00:45:41
One of the benefits of a graph database is that it's schema-less. So you've got a graph model here, and as I developed the graph model, if I go back, it's really about sitting down and talking to your group, talking to your individuals, talking to the people who understand and know the data: what's the best way to model this data? So there's some conversation there. The good thing is that it's really easy to update, really easy to add things to. In a relational database, if you have a schema and you've got to change it, you can have some problems; in a graph model, it's very easy to add another entity or relationship into the model. It makes things really quick and easy.
speaker 1 00:46:19
With semantic data, how do you guarantee that you're not adding something that's semantically incorrect, if you don't have a model that guarantees that?
David Meza 00:46:27
The model itself can be changed, so you can guarantee it in the sense that you have control over the model. And I believe, was it you who had some good information on connecting the semantic model to your graph? Are you going to talk about that later on? You are, yeah. So we've got a good lead-in to his talk about semantic and graph database models. It's a good article.
speaker 5 00:46:55
Some great questions there, really challenging David, and you obviously paid great attention to that tool, which was fascinating. Thank you very much.