Data + Curiosity: Data science for fun and profit, a conversation with Tanya Cashorali

In this episode of Data + Curiosity, I had the absolute pleasure to chat with Tanya Cashorali about her experiences building a data science company and using Shiny to help out her WoW Classic guild. I learned so much from Tanya, and am thrilled to be able to share the conversation with you here:

[EMBED VIDEO]

You can also read a lightly edited transcript of our conversation below:

JESSE MOSTIPAK: I am on the edge of going all in on Final Fantasy-- I mean, I'm in. I'm in. I love the game. But I am-- I can feel myself on the edge of going from someone who enjoys the game to someone who lives the game.

TANYA CASHORALI: What other way is there to game? 

JESSE: Do you remember how we met? 

TANYA: Oh, boy. It had to be Twitter related. It had to be-- I'm thinking video game related. 

JESSE: 100%. 

TANYA: Was it talking about WoW or? 

JESSE: No. So I looked this up, and the DM is still there. We have known each other since 2016. 

TANYA: Wow! When the world started to go crazy. 

JESSE: So we have been friends since 2016. And you were one of the first people on Twitter to welcome me to the Data Science community. You were like, I have this data science Slack group. And this was before Slack was ubiquitous. 

This was the first time I had heard of Slack. I was like, I don't know what Slack is! Just let me download this app and log in. And so you were like, yeah, it's for data science but we have a video games channel that I think you may like.

TANYA: That sounds exactly right. That was when I started Friendly Tech Space. 

JESSE: Way back when. You've been going strong with that for-- I mean, I think of you as one of the original data science community builders, right? Like that was-- everybody was in there. 

TANYA: I'm trying to reinvigorate it actually. So the whole reason I started it was-- so I had been at these different startup jobs and yada-yada. Doing my thing with a big company and trying different things. And I knew I wanted to start my own company. And I knew I wanted to go out on my own. 

But I had met so many awesome people just along the way at conferences, on Twitter, just all over the place. And I'm like, I need to get all these people together. And I also didn't want to be disconnected from everyone in the community or people at my job when I went out on my own. 

So it was totally a selfish thing where I was just like, I want to invite all these smart, interesting people so that I could talk to them. And also see them interact. Because I love being a fly on the wall when I introduce two people that I think would just explode and talk about crazy things. 

I remember messaging my friend, Jimmy, who you know. Who you’ve talked to. And I was like, dude, I have this idea. What should I name it? And I think before he even responded, I had created it. I was like, don't worry about it. I got it. Friendly Tech Space. It's all done. You're invited. 

And I invited all my computer science college friends, and old coworkers, and again, just people I met. And it was supposed to be a place where you could talk about anything. Privacy was huge. No recruiters allowed. Like, I have an issue at work. Or I have a technical question and I don't feel like dealing with the dregs of Stack Overflow and getting flamed for not having a reprex because I have proprietary data, stuff like that.

Where I was just like, there could be a much friendlier, less toxic community where we could talk about these things. And it grew organically. It was invite-only. And people were like, I have a really great person that should come join Friendly Tech Space. And I even pull contractors from there now. People are looking for consulting work. 

But anyway, all that to say, COVID, a lot of stuff happened. I wasn't as active in it. And we just relaunched our quarterly-- we do these Tech Talks. And we just had one yesterday with a member who talked about the history of data storage. 

Just so nerdy. And it was actually super fascinating and he gave an awesome talk. He went from talking about how data used to be stored and carved onto bones, all the way up to the SSD. 

JESSE: I remember when data lake was a thing. It did not even cross my mind to go back further. 

TANYA: Yeah, he went to punch cards. And punch cards, but it was made-- it was for textiles, and the thread and needle would go through certain holes on the punch card. It was pretty cool. So that's all recorded, and if people are interested, I can get them more info on that. I can share those links later.

JESSE: Friendly Tech Space, I do remember lots of data science stuff. I remember people have asked coding questions. For a really long time, there was a very active Game of Thrones channel. I was so sad when I saw that archived because it was the end of an era for me. 

TANYA: I know. I know. Well, and now there's House of the Dragon which I have not watched yet. 

JESSE: I will say that the show will be successful despite the show runner's work. So I think the acting is phenomenal. I think the storylines are very rushed. I don't think there's a lot of character development. But I think that the acting is phenomenal. And I hope that the show runners just slow down and let the story breathe a little bit. 

TANYA: Interesting. OK. Well, I waited till they were all out so we can binge it. 

JESSE: Yeah, that's fair. That's fair. I've been rewatching Game of Thrones and I am realizing, just blitzing through it, I'm at a point where I'm like, oh, yeah. I know that I watched this and I don't remember anything that happened. I'm googling, did Jaime Lannister die? What happened to Jaime Lannister?

TANYA: It's like a lifetime ago, Game of Thrones. That was pre-COVID. Everything pre-COVID is-- 

JESSE: It's just a different--

TANYA: It's like BC, Before COVID. 

JESSE: So speaking of before COVID, you worked as a data scientist primarily in biotech, right? And then you started your own company. Can you give-- how did that happen? What is the path that you took? 

TANYA: Yeah. It's an interesting one. Let's see. So I started at a biotech company after-- I started learning R, actually, in 2005 or 2006, which was pretty early. And worked for just really smart PhDs and MDs at Harvard Medical. And I felt way in over my head but learned a ton. 

Got a job at a small biotech startup called GNS Healthcare. Worked for a lot of smart people. Learned a ton. But there was something that I knew I wanted to do. I had started to go to these entrepreneur meetups at Venture Cafe in downtown Cambridge. Was talking to VCs. And it was just a really exciting idea to go out and do something on my own.

I wanted to always be my own boss. I wanted to always have my own schedule. Sitting at a desk for eight, nine hours a day and being told what to do just-- I was miserable. 

Even in the coolest job, and not just at my first job, all my jobs. I was happy for a year or two and then I'm like, OK, I'm sick of this. I'm sick of either doing the same thing or being underutilized or undervalued. And I think it's a common theme for a lot of people in this industry, especially the people that really care, and want to do good work, and just make a difference.

So anyway, I tossed around a lot of product ideas. For some reason I thought I needed a product, and I needed to raise capital, venture funding. And pretty quickly after hopping around different startups, and even working at non-healthcare startups, I was like, I have a lot of skills that I could use and that people will pay for. 

So yeah. I think I went to-- what happened was I burned out a little bit from working at startups for so long. Took a break and decided I want to go try a big company. Fortune 500. I think Biogen's in the Fortune 500 and I went there for a data science position. And I thought, I'll just take my time, learn some things, do one thing really well. And that lasted all of nine months. 

I had an old boss hand me a lead for a consulting project that I started on the side. And then I had a Harvard professor come to me that wanted sports data scraped from ESPN to automate his betting. And before I knew it, I was doing stuff on the weekends and just, it was too much. And I thought, you know what, I'm just going to go for it. 

I bought the domain name, TCB Analytics. It's not clever. It's just my initials. Same way I did Friendly Tech Space. I just impulsively do things sometimes without planning which is why I'm fortunate I have my wife who's very much not a risk taker, to even me out. 

So yeah. So I told my boss at Biogen. He was great. He was like, we'll be a client. And then all these different departments I worked with at Biogen, drug manufacturing, safety, commercial, they all came on as clients. So that was awesome. 

And I remember being scared leaving and thinking, the checks are going to stop coming soon, like two weeks from now. So it's all up to me now. And I loved that feeling. I was like, there's no one else to blame now but myself. 

And I like that sense of ownership and accountability. And I've been-- I feel like I've been talking a big game for a long time so why don't I just prove it to myself. And yeah, I haven't looked back since. It's been awesome. 

JESSE: So how long has TCB Analytics been around? 

TANYA: So I started in, I want to say, November of 2015 or so. So right around when I started Friendly Tech Space. Right around when I messaged you. 

It's fortunate to go-- and my wife's health insurance, of course. And we had her income so that helps. But I definitely recommend to anyone doing it, save as much as you can before you make the leap. 

What I really did was bootstrap. Have paying clients while I'm building this thing. You're grabbing at every possible opportunity in the beginning but now we're starting to be more selective. We're forming more of a niche and where we want to focus.

JESSE: So I think there's two things that are really-- I mean there's a ton of interesting things. There are two interesting things that I want to focus on. One is that you've done all of this with primarily the R programming language, correct? Like R and SQL? 

TANYA: R, SQL. Maybe a little bit of Python here and there, not much. Shiny is our main focus now, building out Shiny dashboards. And then, every now and then, some BI tools. We'll help with Tableau, or Power BI, or something. But trying to move away from that and focus mainly on Shiny and R.

JESSE: Which is pretty incredible because I think when people think about data science, machine learning, moving into that space, I think that there is a predisposition to assume that it's all done in Python. And you are-- you have almost been turning away clients. And you work primarily in R, which I think is just an incredible success story. 

TANYA: Yeah. I think mainly it's because of where I came from in the healthcare space. So R was just the go-to, de facto language created by statisticians, for statisticians. And it had all the packages to analyze gene expression data way back when. And I think having the experience of using that outside of healthcare as well was what was really valuable. I was like, oh, I can do things really quickly and effectively in R.

And my coworkers would be coming up to me. I worked at a telecom company. And I'd create a chart, ggplot, post it in Slack and they'd run over to my desk and be like, how did you do that so fast? 

So you could only imagine when Shiny came about, it was like, wow, now I can build data products. 

JESSE: And so the other thing is, you left Biogen with clients, right? People were reaching out to you. You were talking about this guy from Harvard with the sports data. And this to me suggests a lot of relationship building. And is that something that you-- 

I consider us both extroverts. You may be an introvert. I don't mean to tell you what you are. But we both enjoy talking to people, to some degree, and connecting people, and introducing people to each other. Is that something that you built intentionally as you were progressing through your career or did it just happen because you will talk to everyone? 

TANYA: A little bit of column A, a little bit of column B. I couldn't wait to get out of my hometown, honestly, in Rhode Island, and get out, and meet interesting people, and learn from people. And I always had that-- yeah. I want to meet-- I just want to meet interesting people and get out of my little bubble. 

And then I-- my first co-op, I was really lucky to have a mentor who advised me on adding every single person on LinkedIn that I met. And so I did. And I started collecting-- building my LinkedIn network in 2004 or something. 

And I'd collect-- I'd go to all these conferences. I was going to the biotech Tuesday meetup in Cambridge. So I was just always at these events, always at these meetups, just having a blast talking to people. You meet some really cool people. Not everyone is, but most of the time, people are pretty-- if they're at these things, they want to meet people too. And it's fun. 

So yeah. I would go home with a stack of business cards. And just the next day, go on LinkedIn, and just add them, and try to keep in touch with the ones that really resonated. Whether it's a conversation about anything, it didn't even have to be data related. Just met a lot of cool people that way. 

And then, I think I was even hired at one point for some of my connections in the healthcare space in Boston. So I was really lucky to just work for a lot of well-known, established people early on in my career. 

JESSE: That's pretty incredible. And I mean, we met because of video games. And ultimately, I've done some work for TCB Analytics, right? So you never know. You don't always have to talk shop.

But speaking of video games and hanging out with other people, we have lots of things in common. But one of the things we have in common is World of Warcraft. And I am so excited to talk to you. 

TANYA: It's something that a lot of people don't want to admit, right? Actually, my wife was more comfortable telling people that she was gay than she was-- that she played World of Warcraft. 

JESSE: So I mean, you probably-- how long have you been playing WoW? 

TANYA: So she got me in. I blame her, OK? Because I played games-- yes. I played games a lot when I was younger. Counter-Strike, I was more like, StarCraft, Counter-Strike, Diablo, shooter games, you name it. I've been playing a lot of games but I never played an MMO. 

And I avoided it in college on purpose because I-- there would be guys failing out of school because of WoW. There was a guy I worked with on a co-op that would be falling asleep at his desk because he was up raiding the night before, right? 

JESSE: Yeah. There would be newspaper articles where people would be like, I realized when I had to pull over in the middle of a family vacation at a truck stop to raid for three hours and then my family wouldn't talk to me, that maybe I had a problem. 

TANYA: Yeah. So I started during the pandemic. Actually, sorry, right before the pandemic, I think. All my friends. It came back out. Classic relaunched. So it's a game from 15 years ago, right? And so I want to say it was like 2019 that Classic relaunched. 

JESSE: Well, there were the vanilla servers, right? So way back when, you could play on a vanilla server. These were community led. But there was a big demand for these servers. 

And so I remember-- I don't remember the details. I can't even remember the guy's name. But there was like a sit down with Blizzard. And then Blizzard was like, I think we can monetize this. So then relaunched as WoW Classic. 

Which, for comparison, while retail is, you play. You get all of these quality of life perks and changes to the game. WoW Classic is, you get what you got when the game was released.

TANYA: 15 years ago. So all my friends-- a lot of my friends that play video games were like, guys, we should all play, and form a guild, and do this. And I'm like, all right. I'll try it because it's being reinvigorated. There's a community now. And there's all kinds of people in the low levels. And you can-- the whole idea is as a community where you can party up and play with people.

So I gave it a try. And when I had my-- I think I did my first raid. And I found out about this thing called Warcraft Logs. And that's when I got super-hooked, right? Because Warcraft Logs is all data. It's just a ton of data on literally everything. What you casted, what time, what second, what you used, where you moved, I mean, there's a lot. 

So immediately I was like, oh, I can use this to get better. And so I started diving into the Logs. And now that's my data driven weekly video game event. 

JESSE: So do you sit down-- and I think with Logs, you can even pull other people's Logs, right? If you're in the same raid, you can then parse the data for your entire raid and very much say, hey, Tanya, you stood in the fire. You wiped. This is all your fault.

TANYA: --to do that. Yeah. Exactly. 

JESSE: But you're in a guild where-- it's not a DKP Guild, is it? 

TANYA: Actually, I think it's GDKP. It's like a gold-- I don't even know what they said. It's a G-bid where, yeah, we basically do these raids weekly. And anyone can come and sign up to be what's called a buyer. Mostly though, it's just our group right now. Just you want to get all the good gear, and loot, and upgrades for your people first. And then you can carry other people through the content.

So what happens is a piece, let's say, a sword drops that everyone wants. People bid on it with the in-game currency, gold. And let's say, it's going for 300 gold. Someone can bid 400. It's literally an auction. And the highest bidder gets the piece. At the end of the run, let's say there's like 25 pieces, that gets divided up and split between the whole raid. And the Guild host gets a cut, like 10% or something for organizing, running it all. 
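The payout arithmetic Tanya describes can be sketched in a few lines. This is only an illustration, written in Python rather than the R she actually uses, and it is not the guild's real tool. The function name `split_pot`, the whole-gold rounding, and taking the host's cut off the top before the even split are all assumptions; the conversation only says the cut is "like 10% or something."

```python
from dataclasses import dataclass


@dataclass
class Payout:
    pot: int         # total gold bid across every item in the run
    host_cut: int    # gold kept by the host for organizing
    per_raider: int  # equal share paid to each raider
    remainder: int   # leftover gold that does not divide evenly


def split_pot(winning_bids, raid_size, host_rate_percent=10):
    """Split the winning bids of one run (assumed rules, whole gold only)."""
    pot = sum(winning_bids)
    # Assumption: the host's cut comes off the top, rounded down.
    host_cut = pot * host_rate_percent // 100
    per_raider, remainder = divmod(pot - host_cut, raid_size)
    return Payout(pot, host_cut, per_raider, remainder)


# Three hypothetical winning bids in a 25-person raid.
payout = split_pot([400, 300, 550], raid_size=25)
print(payout)
```

With these made-up bids the pot is 1,250 gold, the host keeps 125, and each of the 25 raiders receives 45.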

I mean, it's a lot of-- it's a good exercise in how hard it is to coordinate 25 people to do something. We all know if we worked in-- 

JESSE: Because you're doing 25-man runs. And it's all bids. It's all gold bids, or g-bids, for every piece of loot that drops. And loot generally drops from a boss. But a boss can drop more than one piece of loot. So it's not even 1 to 1. What are you doing-- you, I believe, are doing something data sciencey to help your Guild manage this? 

TANYA: Yes. So this is-- any time I see an opportunity to automate something, or make it like less Excely, or get the hell out of Excel, I jump at the opportunity, right? So yeah. 

There's all these crazy Google Sheet templates that are out there for people to use. And what you have to do is copy-paste all the data in. It's very manual. And someone asked me a question because they knew I'm a data nerd in my Guild. They're like, hey, can you calculate this for me? And I was like, you know what, no. In Google Sheets, it actually would take me longer to figure out than if I were to just get the data in R and do it there.

So that's what we started to do. So now there's a couple of add-ons that export the data in JSON, CSV, all these formats. They make it really easy. We have a shared sheet. And we talked through everything like the data formats. And I was like, put it here, copy-paste the bid data here, and copy-paste the raiders that attended here, right? 

So now I literally have a job that runs on EC2 that pulls that data at half hour increments and populates, pre-processes it all, calculates the host cut. Totals up how much people are spending. How many raids they've attended. You can slice the data. You can look at like, how much did this sword go for. And I have a Shiny app and so now we use that, which is actually a lot easier. Everyone seems to love it. Like after raid, everyone's like, can you update the app, Tanya?
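The summaries mentioned here (spend per person, raids attended, what an item went for) are simple aggregations once the bid and attendance rows are loaded. The sketch below is a guess at the shape of that step, in Python rather than the R the real job uses. The column names (`raid`, `item`, `buyer`, `gold`, `raider`) and the sample rows are hypothetical, and fetching the sheet is left out entirely.

```python
from collections import defaultdict

# Hypothetical rows, shaped like what might be pasted into the shared sheet.
bids = [
    {"raid": "2022-10-01", "item": "Sword", "buyer": "Alice", "gold": 400},
    {"raid": "2022-10-01", "item": "Shield", "buyer": "Bob", "gold": 150},
    {"raid": "2022-10-08", "item": "Sword", "buyer": "Bob", "gold": 300},
]
attendance = [
    {"raid": "2022-10-01", "raider": "Alice"},
    {"raid": "2022-10-01", "raider": "Bob"},
    {"raid": "2022-10-08", "raider": "Bob"},
]


def summarize(bids, attendance):
    """Return (spend per buyer, raids attended per raider, price history per item)."""
    spend = defaultdict(int)
    prices = defaultdict(list)
    for bid in bids:
        spend[bid["buyer"]] += bid["gold"]
        prices[bid["item"]].append((bid["raid"], bid["gold"]))

    # A set per raider so a duplicated attendance row is not counted twice.
    raids = defaultdict(set)
    for row in attendance:
        raids[row["raider"]].add(row["raid"])
    raids_attended = {raider: len(seen) for raider, seen in raids.items()}

    return dict(spend), raids_attended, dict(prices)


spend, raids_attended, prices = summarize(bids, attendance)
print(spend)            # gold spent per buyer
print(raids_attended)   # distinct raids per raider
print(prices["Sword"])  # what the sword went for, raid by raid
```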

So it's cool to see people nerd out about data, though, and then get excited about it. And then be like, oh, how are you doing this? And are you hosting this? I'm like, shinyapps.io is hosting it.

JESSE: Have you converted anybody to a Shiny programmer through World of Warcraft? 

TANYA: I think it has-- I think it has happened, yeah. Because I've had people ask me about it and then get interested. And say like, oh, it's free? And then they want to go build their own thing. But yeah, people are pretty-- they're pretty impressed by it. They're like, wow, you did this all by yourself? 

JESSE: And I think-- well, and you didn't just do it but you strung together literally a full stack using Shiny, right? Like you've got the data extraction, transformation, load-- the ETL. It's got a new name now, right? Data engineering? I don't know. I can't keep up with it all. And then you built that--

TANYA: Data pipeline. 

JESSE: Right? You built the app but then the app has visualizations and it's interactive. And then you have it hosted, right? It's deployed somewhere. It's in prod. And people use it.

TANYA: Yeah. And I said-- so I'm using the free tier right now, which is, I think shinyapps.io gives you 25 active hours. And I told them. I'm like, if we go over and I have to pay $10 a month for this, I expect some in-game gold.

JESSE: Yeah, absolutely. 

TANYA: So they do. They give me gold. So now it's pretty sweet. And I had to show them how to make a lookup table. So you can have multiple characters. And so multiple names. And so we need a primary key, like lookup, for one person. And then roll up all their alt accounts to the main account. 
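The lookup table described here maps every character name to one person's main character, so that spending by alts rolls up to a single key. A minimal sketch of that idea follows, in Python rather than R. The character names and the `roll_up` helper are invented for illustration, and treating an unlisted character as its own main is an assumption.

```python
# Hypothetical lookup: every character name points at its owner's main.
alt_to_main = {
    "Thrallina": "Thrallina",  # a main maps to itself
    "Stabbs": "Thrallina",     # alt
    "Bankalt": "Thrallina",    # alt
}


def roll_up(spend_by_character, alt_to_main):
    """Sum per-character gold under each player's main character."""
    totals = {}
    for character, gold in spend_by_character.items():
        # Assumption: a character missing from the lookup is its own main.
        main = alt_to_main.get(character, character)
        totals[main] = totals.get(main, 0) + gold
    return totals


rolled = roll_up({"Thrallina": 400, "Stabbs": 150, "Newguy": 50}, alt_to_main)
print(rolled)
```

Here the 150 gold spent by the alt is added to the main's 400, while the unlisted character keeps its own row.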

So these are very-- these are problems we see all across every industry. I mean, you have-- I have definitely used the same mechanism before where we don't need a full blown database. This data is small. We just want to use Google Sheets. So the client can put stuff there. We pull it. And we do stuff with it. 

If you need more audit and controls around more sensitive data, obviously, you want to do something else. But for stuff like video game log data, that's great. I've done the same thing for auditing Shiny App logins. Like, when they logged in, how many times, what-- so there's applications where you don't need to go crazy with a full-blown database. 

JESSE: Yeah, I think this is something that you do incredibly well where you are really focused on balancing. You get people to the thing that they need pretty quickly. And you do a lot of balancing between what is going to solve the problem and solve it to the level of difficulty that they need, right? 

You could have set up a whole authenticated database and made a really complex system for this. You could have overengineered it, that's what I'm looking for, right? You're really good at figuring out where to overengineer and where it doesn't need to be overengineered. Do you have-- is there any heuristic or internal guidelines you have, or is this just trying to get stuff done quickly? 

TANYA: It's my constant daily battle between-- yeah. It's a really good question. Because there's pros and cons to this whole "rapid prototyping thing" that I've been talking about now for years. 

You want to build something quickly-- the thing is, I've struggled between really fully fleshed-out requirements that are pages and pages long that almost always change. Once you build the thing, they're like, oh, wait a minute. Now they see it, never mind.

So it's this balance between building something tangible that people can react to and just get that feedback faster. All you're doing is creating a faster feedback loop. It's this whole like-- what's the term everyone loves, human-in-the-loop type thing. Where you can't have that loop until you see something. You can sit here and come up with every rule, every exception, you're never going to-- with data products, you're never going to predict every possible thing that can go wrong. It's just not possible.

So yeah. What I like to start with, and what I always try to start with, is let's talk about why we're doing this. What is the question you have, right? And people have a really hard time answering that sometimes. I mean, some people want to just get the data, and put it all somewhere, and do stuff with it. And that's not helpful. 

So for example, the WoW thing, I'm like, what are you trying to do? Well, we want to make sure people are, number one, not attending and just taking all the money. Like, never paying or buying anything. So you're just tracking a lifetime profit, right? OK. We want to know how many raids people are attending for attendance purposes. OK. And the last one was something like, we want to know how much items are going for historically. 

And those are-- that's fine. Some of those are really helpful. I think the first one is the most helpful because there's action that can be had. If someone's not pulling their weight and always receiving, OK, you kick them, right? You don't invite them to the next one. 
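The first question, who keeps collecting payouts without ever buying, comes down to a lifetime net per raider plus a filter. The sketch below is one possible reading, in Python rather than R. The `min_raids` threshold, the function names, and the sample numbers are assumptions; the conversation does not say how the guild actually decides who to stop inviting.

```python
def lifetime_net(payouts, spend):
    """Gold received from raid splits minus gold spent on bids, per raider."""
    raiders = set(payouts) | set(spend)
    return {r: payouts.get(r, 0) - spend.get(r, 0) for r in raiders}


def never_buys(payouts, spend, raids_attended, min_raids=3):
    """Raiders who attended at least min_raids runs, were paid, and never bid."""
    return sorted(
        raider
        for raider, count in raids_attended.items()
        if count >= min_raids
        and spend.get(raider, 0) == 0
        and payouts.get(raider, 0) > 0
    )


# Hypothetical lifetime totals.
payouts = {"Alice": 135, "Bob": 135, "Cara": 45}
spend = {"Alice": 400, "Bob": 0}
raids_attended = {"Alice": 3, "Bob": 3, "Cara": 1}

net = lifetime_net(payouts, spend)
flagged = never_buys(payouts, spend, raids_attended)
print(net)
print(flagged)
```

In this example only Bob is flagged: Cara has not bought anything either, but one raid is below the assumed threshold.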

So those are the most valuable business questions when you can actually do something about it. If someone says, I just want to see spend by person by day. I'm like, I can do that. But why? What do you want to do with that information? And that's the way our clients can sometimes get really hung up. They don't know. 

Again, I think that's the Holy Grail of data science is, why the hell are we doing this, right? Why are we building a machine learning model when all you need is to know average sales here and figure out which sales guy to talk to?

In closing

Thanks for tuning in to our latest episode of Data + Curiosity! I’d love to hear your thoughts on this interview, how you’ve used data science in your hobbies, and what you’re curious about! Let me know in the video comments – I can’t wait to hear from you!

Show notes

🧡 Friendly Tech Space (YouTube)

🐉 House of the Dragon

👑 Game of Thrones

⚕️ GNS Healthcare

🧠 Biogen

📈 TCB Analytics

💙 Shiny

⚔️ World of Warcraft (WoW)

🏔️ WoW Classic

🗒️ Warcraft Logs

🌐 shinyapps.io

💚 Baseten

🌉 Truss, an open source Python package for model deployment

💙 Blueprint, the fastest way to work with large generative models

Follow Tanya online

🦜 on Twitter

📈 TCB Analytics