
Your perspective related to my perspective

October 11, 2014


What a great word. I love that word.

Perspective: “a particular attitude toward or way of regarding something; a point of view.” – Google

So two people see something. Even though they are seeing the same thing, they each have a unique perspective: not just a different physical perspective, but a different mental and emotional perspective as well.

Let’s examine the case where a person asks a question or gives a piece of information to someone else. This is the case that made me want to write this little post.

The person has thought of the question. The question is formulated based on the information in their head, information that they know about; they can even picture things in their mind and have an emotional response to certain pieces of information. They’ve painted a mental picture surrounding this question over a certain period of time.

With all that information that the questioner has at their disposal, they seem to assume that the person/people that they ask the question of have that same exact set of information and reaction, the same mental picture, even the same emotional response. The people being asked the question are hearing the question for the first time and have not had time to paint the picture in their minds, and it definitely won’t be the same picture. The questioner assumes that the question is clear, or the reference is clear, or whatever it is. But that assumption couldn’t be further from the truth.

Okay … try this. Put yourself in everyone else’s shoes. From where they’re sitting, how would you ask the question so that it is clear, concise, and to the point, and conveys enough information for them to deliver an answer that is useful to you and to them? And how much time have you put into asking the question? Would it not be prudent to allow them to take some time to formulate an answer?

Same with note taking. Don’t assume that everyone sees your perspective on a note. Include links, include definitions, write out acronyms, include enough information so that everyone has the same information, or can get to that information.

And here’s a nice quote that I brought up with someone recently.

I have a perspective on an issue, you have a perspective on the same issue, and somewhere in the middle is the truth.


ESIP Summer Meeting 2014

August 18, 2014

A little late getting this one out. I was busy catching up after the conference on the work I didn’t get to during the conference, then had a week-long vacation, and then spent a week catching up on the work I didn’t get done during vacation. And there you have it.

ESIP Summer Meeting 2014 was again, as it was last year, one of my better conferences. Better than AGU for me. The focus is on informatics and all that goes into data science. Actually, someone asked me during ESIP what I thought Data Science was. Not a definition that I might get from Professor Fox’s Data Science class (though he explains it much better than I ever could), a class that all Computer Science and Earth and Environmental Science students at RPI should take. But what Data Science means to me. I hadn’t thought about it in quite that way. I could definitely have responded to the question as if it were a question asked during class. Data Science is “advancing inductive conduct of science driven by the greater volumes, complexity and heterogeneity of data being made available over the Internet. Data science combines aspects of data management, library science, computer science, and physical science using supporting cyberinfrastructure and information technology. As such it is changing the way all of these disciplines do both their individual and collaborative work.” Man, I just love that description.

But what does it mean to me? I’ve been pondering this question for the past few weeks. To me, being the Principal Software Engineer in the Tetherless World Constellation at Rensselaer Polytechnic Institute, it means leading the way, leading by example, in the development of data management, information technology, and the supporting cyberinfrastructure related to data collected in the pursuit of knowledge about our natural world, the universe, and all that exists. It means talking about, discussing, and learning from the various disciplines in order to better provide them with the tools necessary to do their jobs. And in discussing these various topics, I want to be able to lead by example, by showing people what we’ve done in our own lab to support this ever-growing science that is Data Science. And there are so many aspects of data management, so many technical issues to be tackled in data science: volume, discovery, search, access to and knowledge about the data, visualizations, tool support, archiving, licensing, citation, provenance, storage, and speedy access to both data and metadata. And I know I didn’t even scratch the surface.

The week for me started on Saturday July 6, traveling up to Copper Mountain, Colorado (I live in Lafayette, Colorado … so a nice drive up the mountains), getting posters printed, and continuing to prepare for a panel discussion, presentations, and so on. Sunday and Monday were the DataONE Users Group. Monday and Tuesday were the OPeNDAP Developers Meeting. Tuesday through Friday was the actual ESIP Summer Meeting 2014.

Presenting our work on OPeNDAP provenance was by far the best part of the week for me. OPeNDAP is an open source project developing software to provide access to, manipulation and transformation of, and transmission of data to users. Okay, I wrote most of the Back-End Server and that’s why I love it. Problem is, no provenance information is captured within the OPeNDAP Hyrax software framework. What would it take to provide this provenance information? What coding changes would need to be developed? And what architectural changes would we need to make in the software stack? So we gave a presentation on Tuesday, and a poster on Sunday and Wednesday, about our work on OPeNDAP provenance. Some good coding to be done.

And there is some interest in using the BES directly. I’ve felt for a while that OPeNDAP shouldn’t be in the business of providing a front-end UI. There are plenty of data portals out there that have their own user interfaces, their own authentication and authorization systems, and their own discovery, search, and browse architecture. All we have to do is provide access to, and manipulation and transformation of, the data. And that could be done by a simple web service talking to the BES, not a UI. Just a thought. Of course, we could provide a user interface for those who don’t have one already, with hooks for providing authentication and authorization, saved sessions, discovery of the data, custom catalogs, faceted browsing, server-side functions and visualizations, visualizations provided in the Cloud instead of having to download data and applications to a local machine, and much, much more. I can visualize what could be and I really enjoy working on it. I just hope there’s time to do the work.

So some great collaborative possibilities with folks that I had discussions with regarding OPeNDAP.

And I noticed that this year, at DataONE and ESIP, there’s a greater amount of interest in providing citation, attribution, and provenance information within systems. And I feel that the Tetherless World Constellation is one of the groups at the forefront of this work. TWC had quite a few people involved in the development of the W3C PROV Recommendation, the PROV-O provenance ontology, and ping back. The ping back interface is of particular interest to me within provenance. A shout out to Tim Lebo on this, as he’s the person who brought this to the Recommendation. He and I worked (mostly Tim) on providing ping back within an OPeNDAP system. Basically, it’s a web service, or some mechanism, for allowing upstream users to tell downstream data providers about derived products. So a user requests a data product from an OPeNDAP server. The data product retrieved from the OPeNDAP server is then used in the creation of another data product, like a report. The user can then let the OPeNDAP server know that a derived product, the report, was created using the data product returned by OPeNDAP. Ping back! And then the OPeNDAP server could ping back to the data portal that the report was created from data in the portal. And the data portal could ping back to the data provider that the report was generated using data they provided. And so on and so on.

Of course, this leads to the question: how do we represent software in our knowledge system? Another set of great discussions at ESIP, DataONE, and OPeNDAP. Damn, so much fun stuff to work on. Anyway, OPeNDAP Hyrax is a piece of software that needs to be represented in the Knowledge Store. The components of Hyrax need to be represented as well, since Hyrax has a modular design, with modules provided even by other organizations. So the libraries and dynamically loaded modules in Hyrax need to have a representation in the Knowledge Store. And the modules use other software libraries. Of course the people and organizations that developed these software components would like to be recognized and cited for their good work. It also helps in getting additional funding if software developers can point to systems that use their software and data products generated through the use of their software. For example, that report above was derived from a data product generated by the OPeNDAP Hyrax Software Stack. And I was a leading contributor to the OPeNDAP Hyrax Back-End Server. And I work at the Tetherless World Constellation at Rensselaer Polytechnic Institute with collaborators from OPeNDAP, Inc. The publisher of the Hyrax software is OPeNDAP, Inc. DOAP (Description Of A Project) is an ontology that could be used to represent software projects. There are already systems that use DOAP, and I believe we’ll start using it within the Tetherless World Constellation to represent our software products.

Ahhh … Linked Data Rules!!!

ToolMatch was the other big project that I discussed during the week: matching datasets to tools that can be used to access, manipulate, transform, analyze, and visualize the data. This project has been particularly challenging for me. It took me a while to determine the architecture for this project and the services that we needed to provide. Specifically, developing and using rules within a semantically enabled knowledge store. In this case, writing rules that, given the features of a dataset, match them to the capabilities of tools. Also, how do we represent these features within a dataset and the capabilities provided by the tools? This will require a great amount of feedback from tool developers, domain scientists who really understand their data, and the ToolMatch team. This is an aspect of my career that I have not had a lot of experience in. Sure, I’ve discussed requirements and use cases with clients, developed user requirement documents, and gotten feedback from users, but utilizing crowdsourcing for the development of a system is quite different. At least, this is how I see it.
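To make the matching idea concrete, here's a deliberately toy sketch (not the actual ToolMatch implementation): datasets declare features, tools declare capabilities, and a rule says a tool matches if it supports every feature the dataset declares. All names and features are invented for illustration:

```python
# Toy feature/capability vocabularies, expressed as string tags
datasets = {
    "sst_grid": {"format:netcdf", "access:opendap", "structure:grid"},
    "buoy_obs": {"format:csv", "structure:point"},
}

tools = {
    "GridViewer": {"format:netcdf", "access:opendap", "structure:grid"},
    "TableTool":  {"format:csv", "structure:point"},
}

def matching_tools(dataset_features, tools):
    """A tool matches if its capabilities cover every dataset feature."""
    return [name for name, caps in tools.items()
            if dataset_features <= caps]

for ds, feats in datasets.items():
    print(ds, "->", matching_tools(feats, tools))
```

In the real system the features and capabilities would live as classes and properties in the knowledge store, and the matching would be done by rules over that graph rather than by set comparison, but the shape of the problem is the same.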

So … a HUGE week for me, and a great success for me and the folks who attended from RPI.

I’m really looking forward to continuing my work with DataONE, ESIP, and OPeNDAP.

Going on the Defensive

June 20, 2014

Had an interesting conversation today. The content of the conversation and the person I had the conversation with is unimportant for this post. It turned out to be a good discussion, and I imagine in the near future I’m going to learn a lot about the topic discussed. But first, I had to open up.

You see, the discussion started with me going on the defensive because my position was being attacked and ridiculed. So much so that I felt that I was personally under attack. My experience is that when someone’s position is attacked, they will more than likely go on the defensive, close up, and not be able to listen to the other person’s point of view, let alone change their mind. All the person ends up doing is defending their own position because it’s under attack. The harsher the attack, and the more personal it is, the more you end up defending your position instead of opening up to the possibilities and having a nice lively discussion.

We see it all the time in politics to be sure. Instead of having constructive, lively discussions on the issues, the one side goes on the attack causing the other side to go on the defensive. Nothing gets accomplished. And when it turns personal, it becomes even worse. Not only is the position attacked, but the person as well. You’re busy defending your position and yourself.

And it took me a while to realize that I had closed down. Instead of listening to what the other person had to say, I was searching for the right weapon to fend off the attack, even coming up with “facts” to support my position. But once I realized what I was doing I was able to open up, listen to what the other person had to say, request research articles that I might be able to learn more, potentially grow as a human being, learn something new, and potentially change my position.

There is not one single issue out there that I am 100% on. I have a position, and I can discuss my position. But I’m never 100% positive that I am right. I mean, how could I be? For example, if I have my numbers right, 90+% of scientists are 95+% sure that global warming and global climate change are a critical issue and are human caused. But hey, there’s still that ~5% probability that we could be wrong. Highly unlikely, but possible. The world is a big place with many complex systems interacting together to create our world. And we understand such a small part of that interconnectedness. So how could we possibly be 100% sure? How could I be so absolute in my positions? I try to approach discussions, debates, and conversations in such a way that I will speak to and promote my position, but there is a distinct possibility that I could be wrong. And I’ll be the first to admit it.

So let’s have a lively, constructive conversation on the issues, grow in our understanding and the various perspectives, and be open to the possibilities.

DCO Data Science Day 2014

June 14, 2014

Knowledge Representation

One key point that I heard Mark Ghiorso bring up: a data model he was using had 20 parameters, and he needed to make changes to one or more of them, but he did not know or understand what those 20 parameters were, even with all his years of experience. There wasn’t any information that labeled or described the parameters. The knowledge of these parameters was missing, or had to be searched for.

The point that I took from this is that there needs to be a knowledge representation of the models and the parameters that go into the models. And more than that, the information about the people who put together the model, where the data came from that they use in the model, the software and theories used to create the model, and so on. A semantic representation of the 20 parameters would be ideal. Labels, descriptions, citations, and related documents would be examples of concepts that we could be keeping track of. And this knowledge store could include information about what would happen if one were to modify the parameters.

When I first started working with the Semantic Web here at the Tetherless World Constellation at RPI we used terms like Knowledge Systems, Knowledge Provenance, Knowledge Store, and even Knowledge Information. Using that word Knowledge meant a lot to me. It was more than just creating directed graphs using concepts and properties, linking things together, creating models, and other engineering tasks. To me it’s more about representing knowledge. This knowledge is in people’s minds, text files on their personal computers, blog posts, scripts that they wrote (probably not well documented). For the most part the knowledge is in their minds.

As I’ve said for some time now, concepts and properties, the relationships between concepts, are all first-class citizens. I want to be able to ask questions about the individuals to be sure, say a particular dataset, a measured parameter in the dataset, the sensor that measured the parameter, the instrument that that sensor is on, the deployment of that instrument, and so on … but I also want to be able to ask questions about the concepts themselves, the properties, the relationships. Tell me about the concept Cruise. What is an instrument? What is a sensor? Tell me about the idea of measuring parameters by a particular sensor. Describe them to me. Share the knowledge.

Unfortunately this requires someone to enter information. That’s right … data entry. Not a lot of people seem to want to do data entry. Far too busy to do that, not as interesting as the cool science and engineering tasks. Or, in some cases for those with large egos, too important to do data entry. Well, nobody is too important for data entry. Should lead by example anyway. Or, in the least, get someone to do your data entry for you. And we all know what is meant by “I’ll do it later.” It probably won’t get done. And sometimes you just gotta enter the information in by hand, no need to try and create some cool application to do it for you. Just start slapping those keys.

Don’t just declare the concepts and relationships, define and describe them. Don’t just create an individual, label it and describe it. Don’t just do the cool science and engineering, let others understand and follow your cool science and engineering. There are curious minds out there thirsty for not just words, but for knowledge.

Data Science … What?

Some of the presentations given at the DCO data science day contained a great deal of information about the specific science that the presenter specialized in. Deep carbon, extreme physics, deep life, deep energy, volcanology, chemistry, biology, whatever. My take from this … they don’t know, or don’t know how to talk about, data science. As scientists they understand very well the special area of research in which they participate. But data? That’s okay … I’m a data scientist and I know very little about solar terrestrial science, oceanography, deep carbon, and other sciences. Well, that’s why we organized this event. And of course I could certainly say, being a data scientist, that everyone should know about data science. Well, that’s because all the scientists collect data, need to do something with the data, store the data somewhere, and want their data to be used (eventually) and cited. They want recognition for their work, and rightfully so. All of that, and more, is data science. Not that a scientist working in the field of Deep Carbon necessarily needs to know how to do any of this, but a general understanding of what data science is would be beneficial, specifically data management and the development of data management plans.

21st Century Workshop

One of these days we’ll have a conference/workshop that includes live documentation, not printed on paper, in a folder, in a booklet, whatever. But a live document that has links, is clickable, is searchable.

Could find out information about a person, look at their bio, find out where they do their research, information about the organizations, search for content, search for sessions, search for posters, be able to contact researchers of interest, learn more about research, and more and more. The linked data world.

Day 2 … WOW!

On day 2 of the workshop we presented the DCO Data Science platform. This includes the community portal (Drupal), semantically represented information creation and management (VIVO), and data storage and management (CKAN). And I learned a lot. Namely, there’s still a lot to do. Our team has done a tremendous amount of work and has done a tremendous job.

Having a specific number of people working with the tools (about a dozen for Drupal, just a handful for VIVO, and even fewer for CKAN) is one thing. But getting all the scientists at the workshop to start using the tools? That’s a different story.

Lots of exciting research and work to do here. Lots of great features and community tools are on their way!

The Art of Communication

February 26, 2014

The Art of Communication

Yeah, I know. There’s been plenty written on this topic over the years. I personally wish the art of communication were taught in at least high school. I wish nonviolent communication were taught in high school.

Here’s my philosophy … everyone wants to be heard. Everyone. Not just you. Not just the team lead. Not just the meeting organizer. Not just the professor. And everyone deserves the chance to be heard completely.

The key, in my opinion, to successful communication is … listening. It’s not good enough to just hear someone talking; you have to hear what they have to say. Anything less is unfair to both the talker and the listener. That’s what I mean by being heard completely. Listen, don’t interrupt. Sometimes it’s even good to say back to someone what it is you heard them say, just to verify its accuracy. That’s always good in relationship conversations. Don’t think about how you’re going to answer, or what you’re going to say in reply. You don’t even need to react to what they’re saying. Just listen to them, hear them, and try to understand them.

There are some who love to hear themselves talk. They just talk and talk and talk. They interrupt. They don’t let people finish what they have to say. And they make it difficult for people to get a word in edgewise. They make it really difficult for others to be heard, let alone be heard completely. This is unfortunate, whether in a classroom or a meeting room. Hey dude, it’s not all about you.

In an academic environment, yes, there is a teacher, or professor, or TA, whose job it is to teach. But there are students who want to learn, they want to ask and answer questions, they want to express opinions, they want to convey ideas and thoughts. Just think how rich the conversation would be if everyone had a chance to be heard.

Meetings are my big frustration. Meetings are between 2 or more people. Not just one person. More than likely, everyone in that meeting is very busy, has a lot to do, a lot on their plate. I know I do. And I know that I want to get that meeting over with in as efficient a way as possible so that I can get back to work.

I will be the first to admit that I am not the best at meetings. I am easily distracted and sometimes end up pulling others into my distraction … SQUIRREL. And I promise that I will work really hard at that.

For project meetings, I am very interested in finishing the meetings as quickly as possible, not to take up everyone else’s time, and even try to finish up early. I promise that I will not discuss things in meetings that not everyone needs to be a part of. Not everyone needs to discuss every aspect of every part of the project. So if just 2 or 3 people out of 10 need to discuss something, they can take it offline and discuss it where they aren’t wasting everyone else’s time.

In a recent meeting there were quite a few questions asked during the meeting. I personally felt that just asking those questions was good enough for the meeting. Jot down the question, make sure you know who needs to participate in the discussion and answering of the question, and move on. No need to discuss the actual questions in the meeting. That’s definitely not the case in every meeting or for every question. It was just that in this meeting, which usually lasts a little over an hour on a good day, the questions were being asked, and then discussed and answered by just the 2 people who needed to participate in the discussion. The others in the meeting, including me, just did something else, like actually getting work done for the project.

So here’s my check list.

  1. Talk only when it’s your turn to talk. Don’t hog the entire conversation.
  2. Let others talk, be heard, and be heard completely. Don’t interrupt them. Don’t even do the “Mmhmm” thing while they’re talking. Let them finish. Focus on what they’re saying, not on how you’re going to respond. Even if you disagree with them 100%.
  3. Only talk about what is relevant. Don’t talk about what you’re doing for another project that is unrelated to this project or class.
  4. Only talk about what the entire group needs to talk about. Take smaller conversations offline and include only those people that need to be included.
  5. Just because a question is asked, doesn’t mean it needs to be discussed or answered in the meeting. Again, take it offline and include only those people that need to be included.
  6. And the very most important point … listen. Hear completely what someone has to say.

Oh, one other thing, and something else that I’m going to work really hard at. Pay attention. Even if you have to pretend that you are paying attention. Don’t work on something else, don’t check email, don’t have your nose buried in your iPad, and take off those Google Glasses. I’ve heard people say that they are hearing what is being said while they’re working on something else … but I don’t believe it. Maybe it’s because I can’t do that. Look up, pay attention, and participate fully.

Tell Me a Story

February 10, 2014

A great many times, when we start a project related to the semantic web, people go directly to creating the directed graphs, drawing the data model on the white board. These are classes, those are properties. So the discussion focuses on the creation of the data model, the schema, the implementation, as most of the time people are talking about RDFS and OWL.

But wait … slow down folks. Before we get to implementing anything, tell me a Story. The story is the narrative. It draws out the main ideas that you’re wanting to express in your work. Write about what someone is wanting to accomplish, why they want to accomplish it, what they are researching, measurements they might be using to generate a plot or graph or image or whatever. Describe where they are getting their data, how they are getting it, what they are looking for when they get their data, how they wish they could get their data. Tell me a story using a natural language. Even draw me a picture.

Senator Smith, from the great state of Meh, has just received a PDF of the Ecosystem Status Report from his science team. He reads that the document is generated every two years by various organizations and scientists regarding the health of the environment, tracking changes in key indicators of climate, physical forcing, ecosystem dynamics, and the role of humans in this system. One of the chapters in the document contains information that the Senator wants to learn more about, to discover what information was used to generate a statement in the document.

The Senator clicks on the plot related to the statement. By clicking on the plot he is taken to a splash page for the image, generated from the semantic expression of the plot. Included in the page is information about how the plot was created, from an IPython Notebook that loaded a couple of datasets, who created the plot, what role that person plays and for what organization. If he wanted to he could learn that an IPython Notebook is a collection of cells, each cell containing code that is run that does certain things, like loading a data file containing measurements or derived measurements of a particular indicator, and plotting it. The Senator finds out who ran the notebook to generate the plot, who wrote the code that generates the plot, where the dataset came from, the definition of the indicator and other measurements, and continues to click on datasets that were used to derive the current dataset until he gets to the original dataset of measurements taken on a cruise hosted by the Woods Hole Oceanographic Institution with PI Tony Sullivan.

There’s the story. The story is what we need to start with.

From this story we can then pull out the information and relationships that we need to model. But still we’re not talking about a data model. No! What we want to do now is develop the information model. List the concepts. List the relationships between the concepts. What do we want to keep track of and what do we want to link together?

From the story above we see that there is an ecosystem status report, which is a document. That document has chapters. And in the chapters there are images, graphs, plots, citations, references to other documents, and referenced datasets. The plots, images, graphs, etc. are clickable. The plot was generated from the IPython notebook that has cells, authors, and someone who ran the notebook. The plot was created in a cell in the notebook, the cell has an author, and the cell loads a data file in from a dataset derived from datasets collected on a cruise run by WHOI for a project with a PI. The data can be traced back to an organization and funding information (it should have citation information and licensing information as well). On the cruise was an instrument with a sensor attached that collected the data that is in the dataset. The cruise is a deployment of a ship that is owned by WHOI and was, for this deployment, captained by Captain Phillips.
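The concepts and relationships pulled from the story can be listed as plainly as this before any schema work starts. All the names here are illustrative, a first pass rather than a finished model:

```python
# Concepts (the "things" the story keeps track of)
concepts = ["Report", "Chapter", "Plot", "Notebook", "Cell",
            "Dataset", "Cruise", "Instrument", "Sensor", "Person", "Ship"]

# Relationships (what links the things together), as (subject, link, object)
relationships = [
    ("Report",     "hasChapter",   "Chapter"),
    ("Chapter",    "containsPlot", "Plot"),
    ("Plot",       "generatedBy",  "Cell"),
    ("Cell",       "partOf",       "Notebook"),
    ("Notebook",   "hasAuthor",    "Person"),
    ("Cell",       "loadsData",    "Dataset"),
    ("Dataset",    "derivedFrom",  "Dataset"),
    ("Dataset",    "collectedOn",  "Cruise"),
    ("Cruise",     "carries",      "Instrument"),
    ("Instrument", "hasSensor",    "Sensor"),
    ("Cruise",     "deploymentOf", "Ship"),
]

# Sanity check: every relationship endpoint is a declared concept
for subj, _, obj in relationships:
    assert subj in concepts and obj in concepts
print(f"{len(concepts)} concepts, {len(relationships)} relationships")
```

Only after a list like this is agreed on does it make sense to start arguing about classes, properties, and ontologies.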

Wow, this use case contains a LOT of information.

The creation of this story is one of the first pieces of the semantic web iterative methodology that was developed at the Tetherless World Constellation of Rensselaer Polytechnic Institute by Peter Fox and Deborah McGuinness.

Semantic Web Methodology and Technology Development Process


Another first step in the process is bringing together a small set of diverse people from various fields who will participate in the telling of the story and the implementation of the story. In many of these projects I play the role of the information modeler. Sometimes the project manager. In some I play the role of the system architect, and in many the role of a software engineer. Those last two are my favorite, by the way.

As a software engineer I see a whole set of software use cases related to this story. So I use the term use case differently from the way the methodology describes it. I would consider the methodology use case to be the story. And from the story you develop a whole set of implementation use cases, project management use cases, modeling use cases, etc…

Each of these use cases is much, much smaller in scope and fits the classic definition of a use case. The use cases are formalized and link to requirements documents and specific implementation information. Pieces of that are turned into tickets that are assigned to an individual and are clear in their expectations. These tickets can be organized and prioritized, fit into a schedule, assigned to an agile sprint, and eventually marked as completed.

Once the tasks for a use case are completed the use case can be marked as completed. Once the use cases for the story are completed the story can then have a happy ending. At each step a sense of accomplishment at the completion of something.

All that information that I talked about in the beginning is more than just a technical accomplishment. The information represents knowledge. There’s information about all that went into the completion of that Ecosystem Status Report, so now it can be recognized and referenced. The importance of each step is revealed: how the information is being collected, who collected it, and the value of all of that information and of each of the steps. An appreciation for the hard work and the resources that went into the report becomes more apparent.

ESIP Winter Meeting 2014

February 3, 2014

Another great meeting. Had a lot of great conversations, a lot of good meetings, met with collaborators, good sessions to attend.

And it started right out of the gate with the first speaker, who asked a lot of great questions related to software in the sciences. He talked about software being publicly available, in code repositories like Subversion and Git, having unit tests, etc. The exam that he had us take was very interesting, asking us questions like “how many of you use code repositories?” and “how many of you write unit tests?” But there are additional questions that could be asked.

  1. How many of you use formal documentation practices such as Javadoc, Doxygen, pydoc, etc…?
  2. How many of you can write a formal use case document?
  3. How many of you use a content management system to document installation, configuration, architecture, technical infrastructure, and decision making?
  4. How many of you have an expressive representation of your software so that it can be referenced or cited elsewhere? (It's one of my own goals.)
  5. How many of you have gone through a formal code review process?

Consider representing the software in such a way that it can be referenced and cited. By representation I mean an expressive representation using RDF. OPeNDAP Hyrax is a software package that includes various software components and dynamically loadable modules. Each of these components and modules can be represented, as can the versions of the software components, and even the installation and configuration of the software. The URI that represents the installed and running piece of software can then be used in provenance capture.
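A minimal sketch of what such a representation might look like, using plain subject-predicate-object triples. All of the URIs and the `ex:` property names here are made up for illustration; a real deployment would mint its own identifiers and use real vocabularies:

```python
# Sketch of an RDF-style representation of a software package and one
# installed instance of it, as triples. URIs are hypothetical.

HYRAX = "http://example.org/sw/opendap-hyrax"
INSTALL = "http://example.org/deploy/hyrax-at-site-42"

triples = [
    (HYRAX, "rdf:type", "ex:SoftwarePackage"),
    (HYRAX, "ex:version", "1.8"),
    (HYRAX, "ex:hasModule", "http://example.org/sw/netcdf-handler"),
    # The installed, running instance gets its own URI ...
    (INSTALL, "ex:installationOf", HYRAX),
    (INSTALL, "ex:configuration", "http://example.org/deploy/hyrax-conf-42"),
    # ... which can then be referenced in provenance capture.
    ("http://example.org/data/product-7", "prov:wasGeneratedBy", INSTALL),
]

def describe(subject):
    # Everything asserted about a given URI.
    return {(p, o) for s, p, o in triples if s == subject}
```

The key move is that the installation has its own URI distinct from the package's, so a data product can point at exactly the installed, configured thing that generated it.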

Let’s not forget licensing and citation information either. A representation of the license and information on how to cite the software that was used to generate data products.

Expressing the data vs. sharing the data

Now there's an interesting concept. The speaker (Kevin Ashley) stated that there's nothing that says you can't say that data exists without making the data available. In other words, there are lots of scientists out there who don't make their data available, nor do they even state that the data exists. Why not? You can say that the data exists, even have a representation of the data with at least some basic metadata and/or semantic representation, without making the data itself available. Nicely put.
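The distinction between expressing and sharing can be made concrete with a small sketch. The field names below are illustrative, loosely in the spirit of catalog metadata, not any particular standard:

```python
# Sketch: a dataset can be *expressed* (basic metadata, a citable
# identifier) without being *shared* (no access URL). Field names
# are hypothetical.

record = {
    "identifier": "http://example.org/data/survey-2013",
    "title": "2013 field survey (unpublished)",
    "description": "Stated to exist and citable, but not downloadable.",
    "creator": "Example Lab",
    "accessURL": None,  # deliberately absent: expressed, not shared
}

def is_expressed(rec):
    # Expressed: it has an identifier and at least a title.
    return bool(rec.get("identifier") and rec.get("title"))

def is_shared(rec):
    # Shared: there is actually a way to get at the data.
    return rec.get("accessURL") is not None
```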

When I spoke with Kevin later in the conference, he wondered how many times a certain question had been raised and researched, failed, but never documented or shared. When the research and data aren't expressed in any way, not even a statement that they existed, things end up being re-researched, re-examined, etc… Why waste the time? Or, by making that information available, perhaps someone can say "I was going to do that, but what if we tried this instead?" So instead of repeating the error, they try a different approach.

And that sends me back to a conversation I was having with Peter Fox a while ago, talking about research. The idea of research is that you have a question that you want to explore. You might not know how to approach the problem or the question, how to proceed, or what the results will be. But you have that question. The goal of the research isn’t to succeed, in that you are trying to prove something, but to succeed in that you’re performing the research. The experiments might fail. The research might fail. You might learn that what you had originally thought isn’t really accurate, or the “truth”. That’s all part of the research. Share your data, share your results, and express the data and research semantically so we get a nice rich expression of information.

Information model vs. Data model

I think I've talked about this before, but maybe it bears repeating. There is no need to come to an agreement on terms anymore. Believing that we have to have one model to rule them all would be closed-world thinking; the Semantic Web is open world. If we determine that two terms mean the same thing, and have the same relationships with other terms, then we can equate the terms.
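Here is a small sketch of that open-world move: two groups keep their own terms, and instead of forcing one model we assert an equivalence between them. The term names are invented for illustration:

```python
# Sketch: equating terms across two vocabularies instead of merging
# them into one model. Terms and data are hypothetical.

equivalences = {
    # The two groups determined these mean the same thing.
    ("groupA:Sensor", "groupB:Instrument"),
}

facts = [
    ("obs-1", "rdf:type", "groupA:Sensor"),
    ("obs-2", "rdf:type", "groupB:Instrument"),
]

def instances_of(term):
    # Collect the term plus anything equated with it, then match facts.
    same = {term}
    for a, b in equivalences:
        if a in same or b in same:
            same.update((a, b))
    return [s for s, p, o in facts if p == "rdf:type" and o in same]
```

Queried through either vocabulary, both observations show up; neither group had to give up its terminology.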

One thing that I like to do when first getting into a project with a new group is to generate the use case story together. And from the story we come up with the information model: the concepts that we want represented, and the relationships between the concepts. Not data modeling, not writing a schema, not an ontology or anything, but an information model. In the back of my mind I might know of an ontology that we could use, but I don't bring it up just yet; I jot down some notes. The goal at this point is to come up with the information model, the things that we want to formally represent, and their relationships.

Later, when we begin to formalize the information model and decide on the representation (OWL, RDFS, relational schema, whatever, though I tend toward more semantic representations these days, so RDFS at least), in other words, when we develop the data model, even then I don't necessarily use someone else's terms or another ontology. We stay in the context of the user story: the way that they talk, the terms that they use, the relationships, etc… Stay with what they know. Again, we can keep in mind various ontologies that we might want to utilize, but we make sure we stay with the terminology that the user is used to.

At some future time, then we start utilizing other ontologies and their capabilities, their terminology, their meaning of terms. But we do this with the users, with their knowledge, with their buy-in.

It's not the term itself, necessarily, though terms are very important: words carry meaning for people, carry emotion, and very specific perspectives. It's more about the meaning behind the words, the terms. Encoding that perspective, encoding the meaning.

And when you decide to use another ontology, you are deciding to use the meanings of the concepts and relationships, not just the terms. So understanding the meaning of the terms is very important.

And here’s the reason I’m talking about this. We’re talking about PROV-ES. It seems that the usage of some of the terms is inconsistent with the meaning of the terms as decided by W3C. For example, it seems that the PROV-ES team equates a prov:Agent with a foaf:Agent. And that is not the case. A foaf:Agent is really a prov:Entity. A prov:Agent is a foaf:Agent that is performing an action. For example, a piece of software would be considered a prov:Entity. Let’s say OPeNDAPHyrax is-a prov:Entity. The running of that piece of software would be considered a prov:Agent.

Here's what Tim Lebo said: "An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity. So yes, Agents "steer" Activities and are "responsible" for the Entities that they influence. I go by the rule of thumb that a single code file (Entity) invoked many times is many Agents, and the code file is distinct from each of the agents. Whether you say the agents are the same is up to your needs."

A prov:SoftwareAgent has inputs and outputs and uses a particular configuration, the environment in which it is running. It’s the software plus something else.

So when PROV-ES says that a NASA project is a prov:Agent, that's inaccurate; what they really mean is that a NASA project is a prov:Entity.
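The distinction argued above can be written down as triples: the piece of software itself (an Entity) versus one particular running invocation of it (a SoftwareAgent). The URIs and the `ex:` property names here are invented for illustration:

```python
# Sketch of the Entity-vs-Agent distinction: the code is a
# prov:Entity; a running invocation of it is a prov:SoftwareAgent.
# URIs are hypothetical.

SOFTWARE = "ex:OPeNDAPHyrax"              # the piece of software itself
RUN = "ex:OPeNDAPHyrax-run-2014-02-03"    # one invocation of it

triples = [
    (SOFTWARE, "rdf:type", "prov:Entity"),
    (RUN, "rdf:type", "prov:SoftwareAgent"),
    (RUN, "ex:invocationOf", SOFTWARE),   # many runs, one code file
    (RUN, "ex:usedConfiguration", "ex:hyrax-conf-42"),
    ("ex:activity-1", "prov:wasAssociatedWith", RUN),
]

def types_of(uri):
    return {o for s, p, o in triples if s == uri and p == "rdf:type"}
```

Per Tim Lebo's rule of thumb quoted above, a second run on another day would be a second SoftwareAgent, both pointing back at the same Entity.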

Discovery and Search

Perhaps I’m missing something here as well. It seems to me that the discussion still revolved around data granule browsing and search. Within a given portal, now that the user knows of the portal, find the data that the user is interested in. Here’s some search terms, keywords, etc, etc…

But how did the user get to that portal? How did they discover that this portal, that portal, and that other portal contained data that they might be interested in? This is what I would consider discovery. And I go back to the graphic that Peter Fox has used in the past. There are three levels: discovery, browse and search, and access. All are really forms of discovery, even access: I've found the data, now look inside. Discover what's there, the representation, and what tools can be used to access, manipulate, transform, and visualize the data. Even that's a form of discovery. But to me, discovery is finding the various data portals, catalogs … virtual observatories, that have information that I might be interested in.

The question: I'm looking for a particular piece of information, but I don't know where to start looking. So I need to discover where to look. Google, Wikipedia, a dictionary, a thesaurus would be the results of that inquiry. Now that I know where to look, I can begin my search for the information. Once I've found some virtual observatories, I search through and browse their holdings. And once I find the data that I'm interested in, I access it.

Ahh … just thinking out loud!


Imagine if all of our projects are expressed in the knowledge store. All the working groups in the project. All the meetings for the project. All the collaborators working on the project and what they are doing specifically for the project. All the publications and presentations from the project. All the events attended that reference the project. All the announcements for the project too. And so on.

Imagine if all the people in the lab are represented; a bio for them and a current bio picture; announcements about accomplishments, awards, appointments, etc…; all their publications and presentations; all the projects they are working on; all the classes they are attending, teaching, or supporting; all the events they attend. So here’s the visualization:

I go to Peter Fox's page, and see his picture, contact information, additional pages to visit for more information, a bio, and his current roles and affiliations, and his interests and skills. There's a link for announcements, and that list is dynamically generated from the knowledge store. Same with his publications, presentations, and events attended, currently attending, and future events. A page for the projects he's working on, and his role in the project. So we have events, announcements, publications, presentations, projects, classes being taught, uhhh … what else?

Imagine seeing a list of events that members of the lab are currently participating in or will be participating in. Awesome. The list of papers, publications, and presentations that anyone and everyone in the lab has authored or co-authored. The list of projects we're currently participating in. And so on.

And it’s all linked together with other representations, other sites, other data, other semantic representations (linked open data). Ahh … just imagine.

Imagine how impressive it would be if all of that were semantically represented in the Tetherless World web site.

Wait a minute … it is. Well, a lot of it is. Hmm, so what’s missing? Oh yeah … the one thing that people just hate to do … data entry!

Comment: The Semantic Web, to me, is all about expressing knowledge, sharing knowledge, which is why I like to call it Knowledge Store instead of triple store, Knowledge Provenance, instead of just provenance, etc… We can automate, we can write scripts, make it all machine readable, but in the end, it’s all about providing knowledge to humans, sharing knowledge, even sharing experiences.

It seems that people try to automate things in order to avoid doing data entry, when what we really need is for people to do more data entry. Just do it. Take the time and do the data entry. It's worth it in the short term and in the long term. Enter a good description, enter as much relationship information as you can, enter as much as you can. Once you say "Oh, I'll get to that later," later becomes "probably won't do it."