Entry 4: Get the Data

In Entry 3 I defined my problem as: Holding all other factors constant, what mass is needed to retain an atmosphere on Mars? I need data to solve it.

The Problem

This sounds like a dataset I’m going to have to create myself.

Hands-On Machine Learning with Scikit-Learn & TensorFlow recommends automating as much of the data acquisition process as possible. If I were going to automate any part of building this dataset, it would be scraping table data from an HTML page. However, this is a one-off dataset and the known parameters of planets don’t change very often, so I’m not going to worry about it in this entry.
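For reference, scraping a table out of an HTML page could look roughly like the sketch below. It is not what I did for this entry. It uses only Python's standard library, and the `SAMPLE_HTML` string, the `TableParser` class, and the `scrape_table` helper are all hypothetical stand-ins for a real downloaded page and real parsing code; the two rows of values are copied from the dataset shown later in this entry.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical stand-in for a downloaded page (assumption, not a real source).
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>mass_1024kg</th><th>diameter_km</th></tr>
  <tr><td>Mercury</td><td>0.330</td><td>4879</td></tr>
  <tr><td>Venus</td><td>4.87</td><td>12104</td></tr>
</table>
"""


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
print(records)
```

Every value comes back as a string, so a real pipeline would still need a conversion step before any numeric work.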

If I were working on a project where I would connect to the data source over and over, like a Twitter natural language processing (NLP) project, I would most certainly want to automate pulling data. Sounds like a project for another entry on a different mini-project.

The Options

The types of bodies I can draw the necessary data from are rather limited: the planets, moons, and dwarf planets of this solar system, and possibly exoplanets in other systems.

The Proposed Solution

Fortunately, I didn’t have to comb through information on each and every planetary body individually. The planetary fact sheet has many of the features I need. It covers 8 planets, 1 moon, and 1 dwarf planet. Starting with this as a base and adding a second moon, I gathered 27 features on 11 planetary bodies.

I considered including more moons (Jupiter has 79, Saturn 82, Uranus 27, and Neptune 14), but couldn’t find sufficient information on the necessary features. Most importantly, there was a lack of information on atmospheric mass. The same was true for exoplanets. They’re just too far away to have good measurements.

The planetary bodies and features placed into a pandas DataFrame look like this:

name type mass_1024kg diameter_km density_kg_m3 gravity_m_s2 escape_vel_km_s rotation_period_hr day_len_hr distance_from_sun_106_km ... mean_temp_c surface_pressure_bars nbr_moons rings magnetic_field equatorial_radius_km mean_radius_km V(1,0) (mag) geometric_albedo atmospheric_mass_kg
0 Mercury planet 0.3300 4879.0 5427 3.7 4.3 1407.6 4222.6 57.9 ... 167 1.000000e-14 0 No Yes 2440.5300 2439.4000 -0.60 0.106 1.000000e+03
1 Venus planet 4.8700 12104.0 5243 8.9 10.4 -5832.5 2802.0 108.2 ... 464 9.200000e+01 0 No No 6051.8000 6051.8000 -4.47 0.650 4.800000e+20
2 Earth planet 5.9700 12756.0 5514 9.8 11.2 23.9 24.0 149.6 ... 15 1.014000e+00 1 No Yes 6378.1366 6371.0084 -3.86 0.367 1.400000e+21
3 Moon moon 0.0730 3475.0 3340 1.6 2.4 655.7 708.7 149.6 ... -20 3.000000e-15 0 No No 1737.5000 1737.4000 -0.08 0.120 1.000000e+05
4 Mars planet 0.6420 6792.0 3933 3.7 5.0 24.6 24.7 227.9 ... -65 1.000000e-02 2 No No 3396.1900 3389.5000 -1.52 0.150 2.500000e+16
5 Jupiter planet 1898.0000 142984.0 1326 23.1 59.5 9.9 9.9 778.6 ... -110 2.000000e+00 79 Yes Yes 71492.0000 69911.0000 -9.40 0.520 1.900000e+27
6 Saturn planet 568.0000 120536.0 687 9.0 35.5 10.7 10.7 1433.5 ... -140 1.000000e+03 82 Yes Yes 60268.0000 58232.0000 -8.88 0.470 5.400000e+26
7 Titan moon 0.1260 5149.4 1882 1.4 2.6 382.0 382.0 1433.5 ... -179 1.600000e+00 0 No No 2574.7000 2574.7000 -8.10 0.210 9.100000e+18
8 Uranus planet 86.8000 51118.0 1271 8.7 21.3 -17.2 17.2 2872.5 ... -195 1.000000e+03 27 Yes Yes 25559.0000 25362.0000 -7.19 0.510 8.600000e+25
9 Neptune planet 102.0000 49528.0 1638 11.0 23.5 16.1 16.1 4495.1 ... -200 1.000000e+03 14 Yes Yes 24764.0000 24622.0000 -6.87 0.410 1.000000e+26
10 Pluto dwarf 0.0146 2370.0 2095 0.7 1.3 -153.3 153.3 5906.4 ... -225 1.000000e-05 5 No No 1188.3000 1188.3000 -1.00 0.300 1.300000e+14

11 rows × 27 columns
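As a rough illustration of how hand-collected values like these can end up in a DataFrame, here is a minimal sketch. It assumes a list of dictionaries as the intermediate format, which is my choice for the example rather than a record of how the full dataset was assembled, and it uses only three bodies and five of the 27 columns, with values copied from the output above.

```python
import pandas as pd

# Three of the eleven bodies and five of the 27 columns,
# copied from the DataFrame output shown above.
rows = [
    {"name": "Earth", "type": "planet", "mass_1024kg": 5.97,
     "diameter_km": 12756.0, "atmospheric_mass_kg": 1.4e21},
    {"name": "Mars", "type": "planet", "mass_1024kg": 0.642,
     "diameter_km": 6792.0, "atmospheric_mass_kg": 2.5e16},
    {"name": "Titan", "type": "moon", "mass_1024kg": 0.126,
     "diameter_km": 5149.4, "atmospheric_mass_kg": 9.1e18},
]

# Each dict becomes one row; the keys become the column names.
df = pd.DataFrame(rows)

print(df.shape)   # (rows, columns)
print(df.dtypes)  # confirms the numeric columns were not read as text
```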

The Failure

This step was virtually error-free. Other than spending hours looking through a bunch of different sources, trying to find more information on moons and exoplanets, it went pretty smoothly.

The limited number of examples (only 11) may be problematic in future steps. For example, measuring how well the model performs could prove extremely challenging.
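To make the concern concrete: with 11 bodies, a conventional held-out test set would contain only two or three examples. One possible workaround, which is an assumption on my part and not something I have committed to for this project, is leave-one-out evaluation, where each body is held out once while the other ten are used for training. The `leave_one_out` helper below is a hypothetical, standard-library-only sketch of that splitting scheme.

```python
# The 11 planetary bodies in the dataset, in row order.
names = ["Mercury", "Venus", "Earth", "Moon", "Mars", "Jupiter",
         "Saturn", "Titan", "Uranus", "Neptune", "Pluto"]


def leave_one_out(items):
    """Yield (train, held_out) pairs where each item is held out exactly once."""
    for i, held_out in enumerate(items):
        train = items[:i] + items[i + 1:]
        yield train, held_out


splits = list(leave_one_out(names))

print(len(splits))        # one split per body
print(len(splits[0][0]))  # bodies left for training in each split
```

Even with this scheme, every performance estimate would rest on a single held-out body at a time, so the scores would be noisy.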

Up Next

Explore the data

Resources:

The planetary body data was retrieved from a variety of sources. There were also a couple of one-off searches for some of the more obscure information, like the V(1,0)(mag) for Titan. These were the major contributors: