[DAO: bafkrei] Creation of DAO-Owned Data Aggregation Layer for Decentraland Core

by 0xe400a85a6169bd8be439bb0dc9eac81f19f26843 (howieDoin)

Should the following Tier 4: up to $60,000 USD, 6 months vesting (1 month cliff) grant in the Platform Contributor category be approved?

Abstract

We believe Decentraland player usage data should be collected by the DAO and made available to all Decentraland users.

User location data is currently available through use of the /islands endpoint on DCL content server nodes. This information can be collected periodically (e.g. every few seconds) and stored in a database to be served up via API queries available to the Decentraland community. It is not feasible for multiple parties to independently collect this information as it would cause undue strain on Decentraland infrastructure.

This data should be collected and owned by the DAO, and therefore be owned by Decentraland’s users and not a private company or individual.

Grant size

57,300 USD

Beneficiary address

0xe64581F067Cfdce58657E3c0F58175e638C30f2B

Email address

howie@atlascorp.io

Description

Why collect this data and make it available?

Atlas CORP has been in the analytics business for 18 months, and we know there are immediate benefits the Decentraland community could realize through the creation of this service:

1. Grow the number of successful Decentraland Builders

Move the conversation from “why Decentraland?” to “why your use case?” for builders looking to win clients or raise funds.

Providing high-level Decentraland statistics will increase the growth and success of builders. Often the first question asked of teams looking to build or sell in Decentraland focuses on the metaverse itself rather than the team’s use case; although Decentraland is the most decentralized metaverse, it still exists in a competitive landscape. Investors and clients often need to justify their investment or choice of metaverse before they can focus on a team’s specific use case.

By providing data on Daily Active Users, Total User Growth of Decentraland, and traffic by parcel, builders can refer to these open source analytics instead of each team attempting to obtain them by themselves.

2. Prevent undue load on Decentraland Infrastructure

While the data in question is public, not everyone can query it for themselves without an adverse impact on Decentraland nodes.

There has already been discussion in the forums about shutting down external access to player position data due to the increased load on Decentraland content server nodes. Too many concurrent requests to these endpoints would have the effect of a DDoS attack, which could degrade the quality of service each node can provide.

Therefore instead of a) playing out the Tragedy of the Commons that would occur if everyone collected the data for themselves, b) removing access such that nobody can benefit from this data, or c) allowing the data to be acquired by the highest bidder/private entity – we believe the ecosystem will benefit most from the data collection being done once and everyone sharing in access to that data.

3. Prevent Private Monopolization of the Data

We at Atlas CORP have first-hand knowledge of how valuable this data can be to those operating in Decentraland. User position data can be used to determine how many users attended events – critical for event hosts to understand how well their event performed. Daily active user data is crucial to those seeking to invest in the metaverse to help understand returns on investment.

We believe that no private institution should be able to gate-keep this information from the rest of the community. We at Atlas CORP used to make this information freely available using our own private hosting infrastructure, but demand outpaced our capacity, requiring a move to more dynamic scaling solutions. That is why we are bringing this proposal forward.

tl;dr

The DAO can provide the Decentraland community a free source of user data via API for up to one year at a cost of $57,300. The existence of this data set will help grow the builder and entrepreneurial community by providing important metrics needed to win clients and funding, and it will prevent monopolization by private entities. Atlas CORP is a suitable candidate for this development given its extensive history in Decentraland data collection and analytics.

Specification

What Data is in Scope?

This proposal is only for Decentraland user data as reported by the /comms/islands endpoint on Decentraland Content Servers.

This proposal does NOT include:

  • Content Server data (e.g. scene files, user profile history)
  • Scene-specific data (e.g. object clicks and interactions)
  • Any personally identifiable information (PII), excluding ETH wallet address
  • User IRL location data
  • Any derived content from a user’s ETH wallet

Data will be collected from all active, registered Decentraland Content Servers which at the time of writing includes hephaestus, hela, heimdallr, baldr, loki, dg, odin, unicorn, marvel, and athena.

This data set can provide a platform for building more sophisticated reports by the DCL developer community. The DAO could also choose to one day monetize access to this data (e.g. when the data is being used directly for profit), as well as augment what data is collected and made available. It is important to note that this proposal is currently limited in scope to the one data set and free access to the community, and that these next steps may be the subject of future proposals.

How will this work?

Data will be collected every 20 seconds and piped into a Mongo Atlas cloud database, with a Digital Ocean server set up to provide API access to queries on the data.

An automated feed will be set up to collect data from an authoritative source for each active DAO node. Collecting data every 20 seconds yields 3 datapoints per minute, so graphs at one-minute granularity will have 3 datapoints to aggregate. Data collection will be set up redundantly on two or more servers, or on a single load-balanced cluster, to minimize downtime in data collection. This data set is expected to grow at roughly 1 GB per day, a rate which may accelerate with the growth of daily active users.
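To make the collection rails concrete, here is a minimal TypeScript sketch of the polling loop described above. It is illustrative only: the catalyst URLs are placeholders, Node 18+ is assumed for the built-in fetch API, and storeSnapshot is a hypothetical hook whose Mongo-backed body is sketched further below.

```typescript
// Minimal collection-rail sketch (assumes Node 18+ for the global fetch API).
// Catalyst URLs are placeholders; the real list would come from the DAO's
// registry of active content servers.
const CATALYSTS = [
  "https://catalyst-1.example.org",
  "https://catalyst-2.example.org",
];

const POLL_INTERVAL_MS = 20_000; // 20 seconds => 3 snapshots per minute

// Hypothetical storage hook; a possible Mongo-backed body is sketched
// after the Mongo Atlas paragraph below.
async function storeSnapshot(catalyst: string, payload: unknown): Promise<void> {
  console.log(`snapshot from ${catalyst} at ${new Date().toISOString()}`);
}

async function pollOnce(): Promise<void> {
  const results = await Promise.allSettled(
    CATALYSTS.map(async (base) => {
      const res = await fetch(`${base}/comms/islands`);
      if (!res.ok) throw new Error(`${base} responded ${res.status}`);
      // The payload is kept raw, exactly as the catalyst returned it.
      await storeSnapshot(base, await res.json());
    })
  );
  results
    .filter((r): r is PromiseRejectedResult => r.status === "rejected")
    .forEach((r) => console.error("collection failed:", r.reason));
}

// Each redundant collector runs this same loop; duplicates can be reconciled
// at the database layer (e.g. by catalyst + capture timestamp).
setInterval(() => {
  void pollOnce();
}, POLL_INTERVAL_MS);
```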

As the data is natively in JSON, we propose to use Mongo Atlas as the cloud database of choice. The database will also be deployed with multiple nodes to minimize downtime. Mongo provides an easy way to scale for future needs – whether through increasing storage capacity or sharding the deployment for increased API and query load.
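As a sketch of the write path under that choice, the following uses the official MongoDB Node.js driver to store each raw /comms/islands payload alongside its source catalyst and capture time. The database, collection, field, and environment-variable names are illustrative assumptions, not a finalized schema.

```typescript
import { MongoClient } from "mongodb";

// The connection string would point at the DAO-controlled Atlas cluster;
// the environment-variable name and fallback here are placeholders.
const client = new MongoClient(process.env.MONGO_URI ?? "mongodb://localhost:27017");

// Illustrative document shape: the payload is the raw /comms/islands response,
// tagged with the source catalyst and capture time.
interface IslandSnapshot {
  catalyst: string;
  capturedAt: Date;
  payload: unknown;
}

export async function storeSnapshot(catalyst: string, payload: unknown): Promise<void> {
  const snapshots = client
    .db("dcl_data")
    .collection<IslandSnapshot>("island_snapshots");
  await snapshots.insertOne({ catalyst, capturedAt: new Date(), payload });
}

// Call once at startup, before the polling loop begins.
export async function init(): Promise<void> {
  await client.connect();
}
```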

To minimize infrastructure costs, we propose keeping only 3 months of data available for public consumption. A process will be designed to back up, purge, and post/host historical data so that users can perform historical analysis without excessive cost to the DAO.
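One possible way to enforce the 3-month public window is a MongoDB TTL index on the capture timestamp, sketched below; the index field and window length are assumptions, and the backup/export step is only noted in comments because its destination has not been decided.

```typescript
import { MongoClient } from "mongodb";

const NINETY_DAYS_IN_SECONDS = 90 * 24 * 60 * 60;

// Illustrative retention setup: documents whose capturedAt is older than
// ~3 months are expired automatically by MongoDB's TTL monitor.
// In practice the backup/export job would run first (e.g. nightly), writing
// aging data to cheaper storage before the TTL purge removes it.
export async function configureRetention(client: MongoClient): Promise<void> {
  const snapshots = client.db("dcl_data").collection("island_snapshots");
  await snapshots.createIndex(
    { capturedAt: 1 },
    { expireAfterSeconds: NINETY_DAYS_IN_SECONDS }
  );
}
```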

An API server will be written to provide simplified API access to the data in the database. These queries may include users per minute (global or per parcel), daily active users (global or per parcel), and unique Decentraland visitors (global or per parcel). The API code can be made open source and hosted on GitLab, while the service itself is hosted privately on Digital Ocean to prevent unauthorized access to the DAO’s database. Additional access can be granted to DAO representatives if deemed appropriate.
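As a hedged illustration of one such query, the sketch below exposes an average users-per-minute endpoint over Express and a Mongo aggregation. It assumes an ingest step has already derived a numeric userCount field (concurrent users at capture time) for each stored snapshot and that the cluster runs MongoDB 5.0+ (for $dateTrunc); the route path and field names are illustrative, not the final API.

```typescript
import express from "express";
import { MongoClient } from "mongodb";

const app = express();
const client = new MongoClient(process.env.MONGO_URI ?? "mongodb://localhost:27017");

// Hypothetical route: average concurrent users per minute over the last hour.
// Assumes ingest has derived a numeric `userCount` per snapshot document.
app.get("/stats/users-per-minute", async (_req, res) => {
  const snapshots = client.db("dcl_data").collection("island_snapshots");
  const series = await snapshots
    .aggregate([
      { $match: { capturedAt: { $gte: new Date(Date.now() - 60 * 60 * 1000) } } },
      {
        $group: {
          _id: { $dateTrunc: { date: "$capturedAt", unit: "minute" } },
          users: { $avg: "$userCount" },
        },
      },
      { $sort: { _id: 1 } },
    ])
    .toArray();
  res.json(series);
});

client.connect().then(() => app.listen(3000));
```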

API access will remain open, although a throttling mechanism will be put in place per IP address to prevent DDoS of the API servers. In addition, query data (e.g. who’s asking for what) could be saved in the database and made available as APIs for complete transparency.
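For the per-IP throttling, one common option in the Node ecosystem is the express-rate-limit middleware; the sketch below shows one possible configuration, and the specific limits are placeholders rather than the proposal’s actual policy.

```typescript
import express from "express";
import rateLimit from "express-rate-limit";

const app = express();

// Placeholder policy: each IP gets up to 60 requests per minute;
// excess requests receive HTTP 429 until the window resets.
app.use(
  rateLimit({
    windowMs: 60 * 1000,
    max: 60,
    standardHeaders: true, // advertise limits via RateLimit-* headers
    legacyHeaders: false,
  })
);

app.listen(3000);
```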

A dashboard will be made available to show recent Decentraland population data and daily active users for the reportable data set.

Personnel

The Development Team

The development of this data platform would be done by the Atlas Corporation @atlascorp_dcl team, who have extensive experience working with this data set:

  • HowieDoin – Lead Analytics & Infrastructure innovation
  • MorrisMustang – Lead DCL & Solidity innovation
  • JosephAaron – Operations and task management
  • StaleDegree – Senior Solidity/UI Development
  • RyanNFT – Junior Developer

Roadmap and milestones

What will this cost the Decentraland DAO?

This project will cost the DAO $57,300 and will take approximately 3 months to develop.

This breaks down into:

$7,500.00* - Infrastructure and Hosting Costs:

  • Budget for Mongo Atlas DB for one year
  • Budget for Digital Ocean API servers for one year
  • Budget for Digital Ocean Data collection servers for one year
  • ENS domain for three years

*This estimate is based on current pricing for each of the above platforms. $7,500 may not last a full year if pricing is altered by the provider.

$45,000.00 - Development Costs (estimated 3-month delivery)

  • Data collection development resources, with redundancy and failover, to set up the data collection rails
  • API query development resources to produce queries and prevent overuse
  • DevOps and infrastructure development resources to automate as much as possible
  • Technical writing resources to provide user-facing API documentation
  • Front-end development resources to create the dashboard

$4,800 - Ongoing Support Costs (3 months post-deployment)

  • Code updates when breaking changes occur due to external forces
  • API query user support in the Decentraland Discord

Vote on this proposal on the Decentraland DAO

View this proposal on Snapshot

In general I think this is a great idea, but isn’t the /comms/islands endpoint being deprecated in a few weeks as part of changes to the infrastructure?

That would be the second time the Foundation has recommended removing that endpoint, to which the community needs to say NO.

Initially the discussion was around the burden that endpoint puts on the servers. This is the precise reason for this proposal. That endpoint needs to be saved so developers can access that data from a cached layer that can be scaled to meet the needs of app developers without causing conflict for those in-world.

The Foundation does not explicitly have control of the roadmap as it relates to the Decentraland protocols, which include the content server and its API. That API endpoint is what has made it possible for us to do the work we do in Decentraland for the last 18 months.

Removing that endpoint means the data is no longer open and accessible. It would then only be available to the Foundation, as they are able to gather it directly from the client.


but from my understanding it’s kind of already happening - see the catalyst channel in the main dcl server

So much for users being in control of their data. This is completely unacceptable and not in line with the mission they are projecting, nor were the key players who make use of that data consulted in this process. A major misstep by the catalyst team. Disenfranchising developers who have worked hard on platform developments is not the way. The Foundation team should not be deprecating that endpoint. Closing that endpoint is an attempt to control data that should be open, and it should be rejected by the community.


so maybe first proposal should be to stop that endpoint from being deprecated - maybe these could be combined though, or if the DAO takes over handling stats in an opensource way that would also be fine - but yeah, i agree, just losing access to that data with almost no warning is pretty lame


not gonna lie a lot of this goes over my head but @MorrisMustang and @HowieDoin have been amazing to the Decentraland Community and I have confidence they will execute this properly, so this is an easy YES vote from me!


Lol when 2 people can pass the proposal. Joke ass vp system. give these boys the money collect that data.

@MorrisMustang @HowieDoin this got sidestepped in other discussions, but my big question here is if ATLAS starts collecting data and distributing it as the main proxy of the catalyst server infrastructure as a whole, will we still have access to the same granularity of data as we do now in a public way?

it sounds like the API you would devise just gives access to relatively basic pre-computed stats, but maybe i have that wrong.

Currently, at a granularity of 20-second intervals, the data is about 1 GB per day with the current user base. We would prefer to save data at 5-second intervals, but that would require a significant increase in infrastructure costs. This is definitely up for discussion, and it’s a balance between costs and benefits that can be decided by the community as it puts the endpoints to use.
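For context, here is a rough back-of-envelope sketch of how the sampling interval drives storage growth, using the ~1 GB/day figure above; the implied per-snapshot size is derived from that figure rather than measured.

```typescript
// Back-of-envelope storage estimate derived from the ~1 GB/day figure above.
const snapshotsPerDay = (intervalSeconds: number) => (24 * 60 * 60) / intervalSeconds;

const at20s = snapshotsPerDay(20); // 4,320 snapshots/day
const impliedSnapshotBytes = 1e9 / at20s; // ~230 KB per snapshot (derived, not measured)

const at5s = snapshotsPerDay(5); // 17,280 snapshots/day
const estimatedDailyBytesAt5s = at5s * impliedSnapshotBytes; // ~4 GB/day, i.e. 4x the storage

console.log({ at20s, impliedSnapshotBytes, at5s, estimatedDailyBytesAt5s });
```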

As this data aggregation layer is built out, our team would love feedback from the community about what queries should be supported. The data will be stored raw, with queries built to unpack the data into different useful insights. All of the data will also be available for bulk download, which would allow you to spin up an independent database to build whatever queries you may want. In an ideal world, community developed queries could be added to the protocol.


I voted YES on this proposal - The Atlas Corp team, to my knowledge, has been a provider of ad-hoc reports to multiple organizations & individuals in Decentraland for various land KPI metrics - pulling & transforming data from a source that a normal user may not easily be able to access and query.

Decentraland users should have the ability to access real-time data to utilize for their independent reporting needs (land activity, understanding total activity in DCL at a given day/place, marketplace data). Personally, I would love to pull data and perform my own analytics over Decentraland’s land activity for education purposes. I’m sure there are a few data analysts in our community who may feel the same.

From my perspective, this API database to be created by Atlas Corp will help users access a raw dataset & then transform the data and analyze it for their needs. This can enable data scientists and developers in our community to band together and help create dashboards that may help the broader community understand the current stats of DCL.

I would love to see a few members of the community get together and create a focus group to help Atlas determine which data points would be most beneficial to include, making this an efficient dataset, and which data points may not be as crucial and could be dropped.

The budget seems reasonable given the scope of this project.

As long as the Atlas team is able to access the data, this proposal is needed for decentralized ad-hoc data reporting. I am not familiar with the change in the catalyst infrastructure in place - this might be another proposal or discussion.

The other solution is for the community to rely on an organization that has high-level reporting dashboards, reporting on an XXX basis. However, some developers may only need specific parcel data, or drilled-down details that may not be reflected in a “basic” report.

Just my two cents…

Thanks,

Maryana


Based on the reputations of the authors and those who voted in favor of this proposal, I’m voting YES. Admittedly this goes a bit above my head, but I believe the data should be freely accessible and not only accessible to the foundation.

I appreciate the in-depth explanations from Morris and Maryana. This also helped me make this decision.