Where Is The Data Catalog? | DoltHub Blog
Why is there no place on the internet to get useful, maintained data? This question has puzzled me since 2013. We can rent a server. We can rent a database. Why can't we rent the data in the database? Something like that would be extremely useful. It's weird for it not to exist. Another way to state this question: how do we share data today? What are some of the limitations of those methods? Do any of those methods produce efficiency through centralization?
Surveying the data landscape, I came up with three ways data is shared. I will order these from least sophisticated to most sophisticated.
Least Sophisticated: Email a CSV file
If the data is small enough and the schema isn't complicated, just export to CSV and email it to me. I'll load it into Excel or Google Sheets and do my thing. When I want a new copy with the updates, I'll let you know.
Moderately Sophisticated: A cloud storage bucket with JSON in it
The data is too big for email or the schema is relatively complicated, i.e. it has one-to-many mappings. We need another way. Well, let's just make a whole lot of JSON blobs and put those in a cloud storage bucket. Here are the permissions to the bucket, go download it. If the data changes, I'll let you know and you can download it again.
Most Sophisticated: A private or public API
This data is pretty valuable. I don't want to give you all of it. Just send up a key and I'll give you back the information you need. Don't store a copy because the data will be changing and you always want the latest. Plus, you could be stealing from me if you are storing it. If you have an application that needs access to the whole thing, too bad.
These methods all share one thing in common: there is a single view of the data. The data you are getting is "the truth", at least at that point in time. If the data changes, either in the source or by your hand, that is the new truth. You can either accept a single truth or fork. A fork means you maintain your own copy of the data forevermore.
At this point, I think it makes sense to introduce two general buckets of data: immutable log data and dictionary data.
Immutable log data is colloquially called "big data." It's all the information that we are collecting from our ever-increasing electronic footprint: our phones, our web browsers, our cameras. Humans generally don't modify this data. The sensor is the single source of truth. We spend our time aggregating and labeling this data to find problems in the sensor readings and predict future sensor readings. There's been massive advancement in the tools and capabilities we enjoy in this space in the last ten years.
Dictionary data is human curated. It comes in the following form: there are one or more keys and a bunch of columns with more information about said keys. It's used to connect multiple streams of immutable log data or to add context to immutable log data. Some examples are IP to geographical location mapping, or product SKU to corporate security mapping. This data is usually about the "three Ps": people, places, or products. There has not been much innovation here since the invention and adoption of the API back in the mid-2000s.
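To make the distinction concrete, here is a minimal Python sketch of dictionary data adding context to immutable log data. Everything in it is made up for illustration: the `ip_to_location` table, the `access_log` events, the `enrich` helper, and the simplification of matching on the first three octets of an IP address.

```python
# Hypothetical dictionary data: one key (an IP prefix) plus human-curated columns.
ip_to_location = {
    "203.0.113": {"country": "US", "city": "Springfield"},
    "198.51.100": {"country": "DE", "city": "Berlin"},
}

# Hypothetical immutable log data: events recorded once and never edited.
access_log = [
    {"ts": "2020-01-01T00:00:00Z", "ip": "203.0.113.7", "path": "/"},
    {"ts": "2020-01-01T00:00:05Z", "ip": "198.51.100.42", "path": "/pricing"},
    {"ts": "2020-01-01T00:00:09Z", "ip": "192.0.2.1", "path": "/"},
]


def enrich(event, dictionary):
    """Return a copy of a log event with the dictionary columns attached."""
    # Assumption for this sketch: the key is the first three octets of the IP.
    prefix = event["ip"].rsplit(".", 1)[0]
    context = dictionary.get(prefix, {"country": None, "city": None})
    return {**event, **context}


# The log itself is left untouched; the context lives in the dictionary.
enriched = [enrich(event, ip_to_location) for event in access_log]
```

If someone corrects a row in `ip_to_location`, every past and future log event picks up the fix the next time it is enriched, which is why the upkeep of this kind of data matters so much.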
Back to this place on the internet with the data I yearn for. A JSON-filled cloud storage bucket may be the best solution for sharing immutable log data. Cataloguing these is what the Registry of Open Data on AWS and Google Dataset Search are trying to do. There are some API catalogs like RapidAPI. APIs are the best way to get dictionary-style data today. As far as I can tell, none of these have achieved the status of the place on the internet to get useful, maintained data.
I think the reason none of these solutions have caught on is that the method for sharing does not encourage collaboration. The current methods encourage complete trust or forking. For an internet data catalog to emerge, a format to encourage internet-style collaboration must exist. This is especially true of dictionary-style data, where the truth evolves and different people can have different views of the truth.
Pre-2000, we didn't collaborate on source code much either. The pre-2000 source code world looked a lot like the data world today: large institutions asserting power over the source code they produced, with smaller players getting crushed or acquired if they got too much traction. Google is the new Microsoft. Facebook is the new Oracle.
The rest of us started to share source code because we didn't like Microsoft or Oracle. We also agreed on a format to distribute open source: first patch files, then CVS, then Git. We think by porting the semantics of version control to databases, specifically merging and branching semantics, we have a chance to create the same collaboration dynamic in data that we see in source code. We all need to band together to topple the giants and usher in an age of data collaboration so any small player with a great idea can flourish. We can only do this if we have a format built for data collaboration.
We built this with Dolt. Dolt is a database built from the engine up to encourage collaboration and sharing. Dolt is Git semantics on top of a SQL database. You can see who changed what data and why. You can branch a copy, make some writes, and still get updates from the master branch. If both sides change the same value, the merge reports a conflict. We also built DoltHub, a place on the internet to share these databases. With a little help from you, we think DoltHub can evolve into the place on the internet to get useful, maintained data.
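To illustrate what merging and branching semantics mean for table data, here is a minimal Python sketch of a cell-level three-way merge. It is not Dolt's implementation: the `three_way_merge` function, the table layout (primary key to a dict of columns), and the sample SKUs are all assumptions for illustration, and row deletions are not modeled.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of a keyed table against their common ancestor.

    Each table maps a primary key to a dict of column -> value.
    Returns (merged, conflicts); conflicts lists the (key, column) cells
    that both sides changed to different values.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key, {}), ours.get(key, {}), theirs.get(key, {})
        row = {}
        for col in sorted(set(b) | set(o) | set(t)):
            bv, ov, tv = b.get(col), o.get(col), t.get(col)
            if ov == tv:      # both sides agree (or neither changed)
                row[col] = ov
            elif ov == bv:    # only their side changed the cell
                row[col] = tv
            elif tv == bv:    # only our side changed the cell
                row[col] = ov
            else:             # both changed it differently: a conflict
                conflicts.append((key, col))
                row[col] = ov  # keep our value until someone resolves it
        merged[key] = row
    return merged, conflicts


# Hypothetical product table at the point where the branch was created.
base = {
    "sku-1": {"name": "Widget", "price": 10},
    "sku-2": {"name": "Gadget", "price": 20},
}
# Our branch: a price change and a rename.
ours = {
    "sku-1": {"name": "Widget", "price": 12},
    "sku-2": {"name": "Gadget Pro", "price": 20},
}
# Upstream: two renames and a new row.
theirs = {
    "sku-1": {"name": "Widget v2", "price": 10},
    "sku-2": {"name": "Gadget Plus", "price": 20},
    "sku-3": {"name": "Doohickey", "price": 5},
}

merged, conflicts = three_way_merge(base, ours, theirs)
```

Here `sku-1` ends up with the upstream name and our price, `sku-3` arrives from upstream untouched, and only the `name` cell of `sku-2` is flagged because both sides edited it. Edits to different cells of the same row merge cleanly, which is the property that lets a branch keep taking updates from the source.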