A really interesting topic. Sorry this is a bit of a ‘long stream of consciousness’ post.
I agree with mixmix’s idea about just starting. Migrating data between formats is not a big job. Having said that, I have found that working with Python and Mongo has been good in the past for its flexibility of data storage, simplicity for linking with APIs, available tools for analysing data, and outputting all kinds of formats.
My approach to these things is to think about what questions you’d like answered. Perhaps I’ll get the ball rolling by putting a few out there, each with some brief potential answers. I find a good approach is to put all the ideas out there at first and not worry if they’re not very feasible; someone else might come up with a nearby idea which is feasible. At a later stage it’s then good to choose the best/most feasible ones and run with them. But please take these ideas as simply being thought-provoking and feel free to critique/add to them.
- What are the aims and objectives of the data collection? What’s the data for, why do we want to collect it?
+ These are always difficult questions to answer, which is why big corporations like the BBC can spend silly money on projects without having a clear reason as to why they're doing it. I think the questions are important though. Sometimes "because we can" is a good enough answer, but I think it's worthwhile to at least give this question some thought.
+ For me, I'm interested in the idea of understanding co-ops as a kind of ecosystem. Different co-ops have different relationships with one another, there are different 'species' of co-op that work in different niches, etc. I am interested in modelling this ecosystem computationally/mathematically in some way, and data can always inform models. The larger goal would be that by understanding how the ecosystem works, we can adapt to it better.
- What classes of data would we want to collect?
+ Online data - the online presences of the different co-ops.
+ Financial/trading data - how co-ops trade with other companies.
+ Geo-location data - where the co-ops have offices and outlets, etc.
+ Member/employee data
+ Temporal data - how these things have changed over time
- What accessible data are currently out there?
+ I guess all cooperatives have some kind of Internet presence. One idea might be to make a collection of their websites. You could glean geolocation data from the sites and look at how the different sites link to one another. I have some thoughts about adapting a sampling algorithm, and network analysis tools, which I have already written, to do this.
+ A second kind of Internet presence might be on social media. I've done some work mapping Twitter accounts in the past and finding out how they link into groups (https://www.theguardian.com/news/datablog/2013/mar/15/twitter-users-tribes-language-analysis-tweets). I could look into doing something along these lines, as I have quite a few Twitter API tools. That would be quite limiting, since it is tied mainly to Twitter, though I'm always up for moving to other APIs. Also, co-ops are perhaps often a little under the social media radar, so the social media approach might not be so great?
+ I guess Companies House would have some records?
+ It might be possible to send out a questionnaire to co-ops to glean some of these data?
+ What have others already done which might be useful?
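To make the "how the different sites link to one another" idea from the list above a bit more concrete, here is a minimal sketch in Python. It is not the sampling algorithm or the network analysis tools mentioned above, just an illustration using only the standard library. The page contents and the `.example` domains are made up, and the function names (`build_link_graph`, `in_degree`) are hypothetical; in practice the HTML would come from actually fetching each co-op's site.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def domain_of(url):
    """Return the host of a URL, lower-cased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def build_link_graph(pages):
    """Map each known site to the set of other known sites it links to.

    `pages` maps a page URL to its HTML text.
    """
    known = {domain_of(url) for url in pages}
    graph = defaultdict(set)
    for url, html in pages.items():
        source = domain_of(url)
        graph[source]  # ensure sites with no outgoing links still appear
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links against the page URL before comparing
            target = domain_of(urljoin(url, href))
            if target in known and target != source:
                graph[source].add(target)
    return dict(graph)


def in_degree(graph):
    """Count how many other known sites link to each site."""
    counts = {site: 0 for site in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


# Made-up example pages standing in for real co-op websites (assumption)
pages = {
    "https://www.alpha.example/": (
        '<a href="https://beta.example/about">Beta</a> '
        '<a href="/contact">Contact</a>'
    ),
    "https://beta.example/": (
        '<a href="https://www.alpha.example/">Alpha</a> '
        '<a href="https://gamma.example/">Gamma</a>'
    ),
    "https://gamma.example/": (
        '<a href="https://alpha.example/shop">Alpha</a> '
        '<a href="https://elsewhere.example/">Other</a>'
    ),
}

graph = build_link_graph(pages)
print(graph)
print(in_degree(graph))
```

Links to sites outside the collection are ignored here, which keeps the graph limited to the co-ops being studied; whether that is the right choice depends on the questions being asked. The resulting dictionary of sets could be stored as documents in Mongo or handed to a network analysis library.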